BalLOT: Balanced $k$-means clustering with optimal transport

Published: arXiv: 2512.05926v1
Authors

Wenyan Luo, Dustin G. Mixon

Abstract

We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce an optimal transport approach to alternating minimization called BalLOT, and we show that it delivers a fast and effective solution to this problem. We establish this with a variety of numerical experiments before proving several theoretical guarantees. First, we prove that for generic data, BalLOT produces integral couplings at each step. Next, we perform a landscape analysis to provide theoretical guarantees for both exact and partial recoveries of planted clusters under the stochastic ball model. Finally, we propose initialization schemes that achieve one-step recovery of planted clusters.

Paper Summary

Problem
The paper addresses balanced $k$-means clustering: partitioning data points into $k$ clusters of equal size while minimizing the usual $k$-means objective. Standard $k$-means algorithms place no constraint on cluster sizes, so they often return clusters of very different sizes, which is unsuitable for applications that need an even split.
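One common way to write this problem is shown below. The notation is an assumption made here for illustration, not taken from the paper: the data points are $x_1, \dots, x_n \in \mathbb{R}^d$, the clusters $C_1, \dots, C_k$ partition the index set, and $n$ is divisible by $k$.

$$
\min_{C_1 \sqcup \cdots \sqcup C_k = \{1, \dots, n\}} \; \sum_{j=1}^{k} \sum_{i \in C_j} \Big\| x_i - \frac{1}{|C_j|} \sum_{l \in C_j} x_l \Big\|^2
\quad \text{subject to} \quad |C_j| = \frac{n}{k} \ \text{for all } j.
$$

Dropping the size constraint recovers the ordinary $k$-means objective.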
Key Innovation
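The key innovation is BalLOT, an alternating minimization algorithm that uses optimal transport to solve balanced $k$-means. The authors report that it is fast and effective in numerical experiments, and they prove three kinds of guarantees. First, for generic data, BalLOT produces integral couplings at each step. Second, a landscape analysis gives exact and partial recovery of planted clusters under the stochastic ball model. Third, the proposed initialization schemes achieve one-step recovery of planted clusters.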
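The alternating structure can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the number of points is divisible by $k$, and it replaces a general optimal transport solver with SciPy's `linear_sum_assignment` applied to a cost matrix in which each centroid is copied into $n/k$ "slots". The function name `balanced_kmeans_sketch`, the random initialization, and the stopping rule are all hypothetical choices made for this example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def balanced_kmeans_sketch(X, k, n_iter=50, seed=0):
    """Alternate a balanced assignment step with a centroid update.

    Illustrative only (not the paper's code). Assumes len(X) is divisible by k.
    Returns (labels, centers).
    """
    n, _ = X.shape
    if n % k != 0:
        raise ValueError("this sketch assumes n is divisible by k")
    m = n // k  # required number of points per cluster

    rng = np.random.default_rng(seed)
    # Hypothetical initialization: k distinct data points chosen at random.
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)

    labels = None
    for _ in range(n_iter):
        # Squared Euclidean cost between every point and every centroid, shape (n, k).
        cost = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

        # Copy each centroid column m times so that a one-to-one matching of
        # n points to n slots corresponds to a balanced clustering.
        slot_cost = np.repeat(cost, m, axis=1)  # shape (n, n)
        rows, cols = linear_sum_assignment(slot_cost)

        new_labels = np.empty(n, dtype=int)
        new_labels[rows] = cols // m  # slot index -> centroid index

        # Stop once the assignment no longer changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # Centroid update: mean of the points assigned to each cluster.
        centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])

    return labels, centers
```

A call such as `labels, centers = balanced_kmeans_sketch(X, k=3)` returns one label per row of `X`, with every label used exactly `len(X) // 3` times. A hard assignment solver is a plausible stand-in here because, according to the abstract, BalLOT's couplings are integral at each step for generic data. The slot construction builds an $n \times n$ matrix, so it is only practical for small inputs.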
Practical Impact
The work offers a fast method for balanced $k$-means clustering, a problem that arises in fields such as wireless sensor networks, frequency-sensitive competitive learning, and market basket analysis. Constraining clusters to equal size can also surface groupings that unconstrained $k$-means would miss.
Analogy / Intuitive Explanation
Think of clustering as grouping students into classrooms. Traditional $k$-means assigns each student to the classroom that best matches their interests, so some classrooms may end up much fuller than others. Balanced $k$-means requires every classroom to hold the same number of students. BalLOT is like a teacher who uses optimal transport to fill the classrooms evenly while keeping the students in each classroom as similar to one another as possible.
Paper Information
Categories: stat.ML, cs.DS, cs.IT, cs.LG, math.OC
arXiv ID: 2512.05926v1
