Fast KMeans++

fastkmeanspp is a Python package that provides a clone of scikit-learn’s KMeans with much faster centroid initialization, using FAISS to speed up clustering. It is designed to be a drop-in replacement for scikit-learn’s KMeans implementation.

Installation

You can install the package via pip:

pip install fastkmeanspp

API Reference

class KMeans(n_clusters=8, n_iter=20, n_local_trials=None, random_state=None)

Bases: BaseEstimator, ClusterMixin

K-means clustering using FAISS.

Variables:

n_clusters (int) – The number of clusters to form.
n_iter (int) – The number of iterations to run the k-means algorithm.
n_local_trials (int | None) – The number of seeding trials for centroid initialization.
X_ (np.ndarray | None) – The input data matrix.
random_state (int | None) – Random seed used for centroid initialization.
cluster_centers_ (np.ndarray | None) – Coordinates of cluster centers.
labels_ (np.ndarray | None) – Label (cluster index) of each point in X_.

Parameters:

n_clusters (int)
n_iter (int)
n_local_trials (int | None)
random_state (int | None)
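The n_local_trials parameter is easiest to read against greedy k-means++ seeding, where each new center is picked as the best of several sampled candidates. The following is a minimal NumPy sketch of that idea, not the package's FAISS-based implementation. The function name kmeanspp_seed is hypothetical, and the default of 2 + log(n_clusters) trials is scikit-learn's heuristic, assumed here rather than confirmed for fastkmeanspp.

```python
import numpy as np


def kmeanspp_seed(X, n_clusters, n_local_trials=None, random_state=None):
    """Pick initial centers with greedy k-means++ (illustrative sketch)."""
    rng = np.random.default_rng(random_state)
    n_samples = X.shape[0]
    if n_local_trials is None:
        # scikit-learn's default heuristic; an assumption for fastkmeanspp.
        n_local_trials = 2 + int(np.log(n_clusters))

    # First center: one data point chosen uniformly at random.
    centers = [X[rng.integers(n_samples)]]
    # Squared distance from every point to its closest chosen center.
    closest_sq = ((X - centers[0]) ** 2).sum(axis=1)

    for _ in range(1, n_clusters):
        # Sample candidates with probability proportional to squared distance.
        probs = closest_sq / closest_sq.sum()
        candidates = rng.choice(n_samples, size=n_local_trials, p=probs)
        # Keep the candidate that lowers the total squared distance the most.
        best_sq, best_idx = None, None
        for idx in candidates:
            new_sq = np.minimum(closest_sq, ((X - X[idx]) ** 2).sum(axis=1))
            if best_sq is None or new_sq.sum() < best_sq.sum():
                best_sq, best_idx = new_sq, idx
        centers.append(X[best_idx])
        closest_sq = best_sq

    return np.stack(centers)
```

Passing the same random_state twice yields the same centers, which is the reproducibility the random_state parameter is meant to provide. The sketch assumes X contains at least n_clusters distinct points.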

n_clusters: int

n_iter: int

n_local_trials: int | None

random_state: int | None

X_: ndarray | None

cluster_centers_: ndarray | None

labels_: ndarray | None

fit(X, y=None)

Run k-means clustering on the input data X.

Parameters:

X (npt.ArrayLike) – Input data matrix to cluster.
y (None, optional) – Unused placeholder, kept for compatibility with the scikit-learn estimator API.

Raises:

ValueError – If X contains inf or NaN values.

Returns:

The fitted model.

Return type:

Self
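To make the roles of n_iter and the ValueError concrete, here is a rough NumPy sketch of what a fit step does conceptually: validate the input, then run a fixed number of Lloyd iterations. The helper name lloyd_iterations is hypothetical, and the package itself delegates this work to FAISS rather than to code like this. The float32 cast mirrors the dtype FAISS operates on.

```python
import numpy as np


def lloyd_iterations(X, init_centers, n_iter=20):
    """Run a fixed number of Lloyd iterations (illustrative sketch)."""
    X = np.asarray(X, dtype=np.float32)
    if not np.isfinite(X).all():
        raise ValueError("X contains inf or NaN values.")

    centers = np.array(init_centers, dtype=np.float32)

    def assign(points, current_centers):
        # Squared Euclidean distance from every point to every center.
        diff = points[:, None, :] - current_centers[None, :, :]
        return (diff ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        # Assignment step: label each point with its nearest center.
        labels = assign(X, centers)
        # Update step: move each center to the mean of its points.
        for k in range(centers.shape[0]):
            members = X[labels == k]
            if len(members) > 0:  # empty clusters keep their old center
                centers[k] = members.mean(axis=0)

    # Labels consistent with the final centers.
    return centers, assign(X, centers)
```

Unlike scikit-learn's tolerance-based stopping, this sketch always runs exactly n_iter iterations, which matches how n_iter is described above.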

predict(X)

Predict the nearest cluster index for each input data point.

Parameters:

X (npt.ArrayLike) – The input data.

Raises:

ValueError – If X contains inf or NaN values.
ValueError – If self.cluster_centers_ is not set.

Returns:

The predicted cluster indices.

Return type:

ndarray
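Conceptually, prediction is a nearest-centroid lookup. The sketch below shows that lookup with brute-force NumPy distances, including the two error conditions listed above. The function name nearest_center is hypothetical, and the package is expected to perform this search through FAISS instead.

```python
import numpy as np


def nearest_center(X, cluster_centers):
    """Return the index of the closest center for each row of X (sketch)."""
    if cluster_centers is None:
        raise ValueError("cluster_centers_ is not set; call fit first.")
    X = np.asarray(X, dtype=np.float32)
    if not np.isfinite(X).all():
        raise ValueError("X contains inf or NaN values.")

    centers = np.asarray(cluster_centers, dtype=np.float32)
    # Pairwise squared Euclidean distances, shape (n_samples, n_clusters).
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)
```

The brute-force distance matrix needs memory proportional to n_samples times n_clusters, which is one reason an indexed search is preferable on large inputs.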

property inertia_: float

Get the inertia of the fitted model.

Raises:

ValueError – If self.X_, self.labels_ and self.cluster_centers_ are not all set.

Returns:

The inertia of the fitted model.

Return type:

float
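Inertia in the scikit-learn sense is the sum of squared distances from each sample to its assigned cluster center. Assuming fastkmeanspp follows that definition, the value can be reproduced from the three fitted attributes as in this sketch. The function name inertia is hypothetical and stands in for the property.

```python
import numpy as np


def inertia(X, labels, cluster_centers):
    """Sum of squared distances to assigned centers (illustrative sketch)."""
    if X is None or labels is None or cluster_centers is None:
        raise ValueError("X_, labels_ and cluster_centers_ must all be set.")
    X = np.asarray(X, dtype=np.float64)
    centers = np.asarray(cluster_centers, dtype=np.float64)
    # Look up each point's own center, then sum the squared offsets.
    diffs = X - centers[np.asarray(labels)]
    return float((diffs ** 2).sum())
```

Lower values indicate tighter clusters, so the number is mainly useful for comparing runs on the same data with the same n_clusters.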