Fast KMeans++
fastkmeanspp is a Python package that reimplements scikit-learn's KMeans with a faster centroid initialization and uses FAISS to speed up clustering. It is designed as a drop-in replacement for scikit-learn's KMeans.
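The page names the initialization k-means++ but does not spell out the seeding procedure. For reference, the sketch below shows the greedy k-means++ variant used by scikit-learn. Each new centroid is the best of n_local_trials candidates, sampled with probability proportional to the squared distance from the nearest existing centroid. The helper name kmeans_pp_init and the default of 2 + int(log(n_clusters)) trials are borrowed from scikit-learn as assumptions. They are not taken from fastkmeanspp, whose FAISS-based implementation will differ.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(
    X: List[Point],
    n_clusters: int,
    n_local_trials: Optional[int] = None,
    random_state: Optional[int] = None,
) -> List[Point]:
    """Greedy k-means++ seeding (hypothetical helper, not the package API)."""
    rng = random.Random(random_state)
    if n_local_trials is None:
        # Assumed default, copied from scikit-learn's greedy k-means++.
        n_local_trials = 2 + int(math.log(n_clusters))
    # First centroid: a uniformly random data point.
    centers = [X[rng.randrange(len(X))]]
    # Squared distance from each point to its closest chosen centroid.
    closest = [sq_dist(x, centers[0]) for x in X]
    for _ in range(1, n_clusters):
        if sum(closest) == 0.0:
            # Every point already sits on a centroid; sample uniformly.
            candidates = [rng.randrange(len(X)) for _ in range(n_local_trials)]
        else:
            # Sample candidate indices with probability proportional to D(x)^2.
            candidates = rng.choices(range(len(X)), weights=closest, k=n_local_trials)
        best_idx, best_pot, best_closest = -1, math.inf, closest
        for idx in candidates:
            # Closest distances if this candidate were added as a centroid.
            new_closest = [min(d, sq_dist(x, X[idx])) for d, x in zip(closest, X)]
            pot = sum(new_closest)  # total squared distance ("potential")
            if pot < best_pot:
                best_idx, best_pot, best_closest = idx, pot, new_closest
        # Keep the candidate that reduces the potential the most.
        centers.append(X[best_idx])
        closest = best_closest
    return centers
```

Each trial costs a full pass over the data, which is why the number of local trials is a speed/quality knob and why a fast nearest-neighbour backend matters for this step.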
Installation
You can install the package via pip:
pip install fastkmeanspp
API Reference
- class KMeans(n_clusters=8, n_iter=20, n_local_trials=None, random_state=None)
Bases: BaseEstimator, ClusterMixin
K-means clustering using FAISS.
- Variables:
n_clusters (int) – The number of clusters to form.
n_iter (int) – The number of iterations to run the k-means algorithm.
n_local_trials (int | None) – The number of seeding trials used for centroid initialization.
X_ (np.ndarray | None) – The input data matrix.
random_state (int | None) – Random seed for centroid initialization.
cluster_centers_ (np.ndarray | None) – Coordinates of the cluster centers.
labels_ (np.ndarray | None) – Cluster label (index) of each point in X_.
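n_iter, cluster_centers_ and labels_ correspond to the standard Lloyd iteration. That iteration alternately assigns each point to its nearest centroid and moves each centroid to the mean of its assigned points. The plain-Python sketch below illustrates the loop. It assumes Euclidean distance and a fixed iteration count with no early stopping. The names lloyd and assign are hypothetical, and the class is documented as running k-means with FAISS rather than with code like this.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(X: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
        for x in X
    ]


def lloyd(
    X: Sequence[Point], init_centers: Sequence[Point], n_iter: int = 20
) -> Tuple[List[List[float]], List[int]]:
    """Run exactly n_iter Lloyd iterations (hypothetical helper)."""
    centers = [list(c) for c in init_centers]
    for _ in range(n_iter):
        labels = assign(X, centers)  # assignment step
        for k in range(len(centers)):  # update step
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so the returned labels match the returned centroids.
    return centers, assign(X, centers)
```

For example, with two well-separated pairs of points and one initial centroid in each pair, the centroids move to the pair means after the first iteration and stay there.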
- Parameters:
n_clusters (int)
n_iter (int)
n_local_trials (int | None)
random_state (int | None)
- n_clusters: int
- n_iter: int
- n_local_trials: int | None
- random_state: int | None
- X_: ndarray | None
- cluster_centers_: ndarray | None
- labels_: ndarray | None
- fit(X, y=None)
Run k-means clustering on the input data X.
- Parameters:
X (npt.ArrayLike) – Input data matrix to cluster.
y (None, optional) – Ignored; present for scikit-learn API compatibility.
- Raises:
ValueError – If X contains inf or NaN values.
- Returns:
The fitted model.
- Return type:
Self
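fit is documented to raise ValueError when X contains inf or NaN values. A finiteness check like the one below is one way to enforce that contract. This stdlib-only sketch uses a hypothetical helper name, check_finite, and is not the package's own validation code.

```python
import math
from typing import Iterable, List


def check_finite(X: Iterable[Iterable[float]]) -> List[List[float]]:
    """Convert X to float rows, rejecting inf and NaN (hypothetical helper)."""
    rows = [[float(v) for v in row] for row in X]
    for i, row in enumerate(rows):
        # math.isfinite is False for inf, -inf and NaN.
        if not all(math.isfinite(v) for v in row):
            raise ValueError(f"X contains inf or NaN values (row {i})")
    return rows
```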
- predict(X)
Predict the nearest cluster index for each input data point.
- Parameters:
X (npt.ArrayLike) – The input data.
- Raises:
ValueError – If X contains inf or NaN values.
ValueError – If self.cluster_centers_ is not set.
- Returns:
The predicted cluster indices.
- Return type:
ndarray
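predict amounts to a nearest-centroid lookup. The sketch below mirrors the documented behaviour in plain Python. It raises ValueError when no centroids are available or when the input contains inf or NaN. Otherwise it returns the index of the closest centroid by squared Euclidean distance. The function predict_nearest and its explicit cluster_centers argument are hypothetical. The class is documented as FAISS-based, so its actual search is implemented differently.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


def predict_nearest(
    X: Sequence[Point], cluster_centers: Optional[Sequence[Point]]
) -> List[int]:
    """Nearest-centroid labels following predict's contract (hypothetical helper)."""
    if cluster_centers is None:
        raise ValueError("cluster_centers_ is not set; call fit first")
    # Validate the whole input before doing any distance computations.
    for row in X:
        if not all(math.isfinite(v) for v in row):
            raise ValueError("X contains inf or NaN values")
    labels = []
    for row in X:
        # Squared Euclidean distance from this point to every centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in cluster_centers]
        labels.append(min(range(len(dists)), key=dists.__getitem__))
    return labels
```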