Fast KMeans++

fastkmeanspp is a Python package that provides a KMeans estimator modeled on scikit-learn’s, with much faster centroid initialization and FAISS-accelerated clustering. It is designed to be a drop-in replacement for scikit-learn’s KMeans implementation.

Installation

You can install the package via pip:

pip install fastkmeanspp

API Reference

class KMeans(n_clusters=8, n_iter=20, n_local_trials=None, random_state=None)[source]

Bases: BaseEstimator, ClusterMixin

K-means clustering using FAISS.

Variables:
  • n_clusters (int) – The number of clusters to form.

  • n_iter (int) – The number of iterations to run the k-means algorithm.

  • n_local_trials (int | None) – The number of seeding trials for centroid initialization.

  • X_ (np.ndarray | None) – The input data matrix.

  • random_state (int | None) – Seed for the random number generator used in centroid initialization.

  • cluster_centers_ (np.ndarray | None) – Coordinates of cluster centers.

  • labels_ (np.ndarray | None) – Label (cluster index) of each point in X_.

Parameters:
  • n_clusters (int)

  • n_iter (int)

  • n_local_trials (int | None)

  • random_state (int | None)
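The n_local_trials parameter is only described briefly above. Assuming it has the same meaning as in scikit-learn's k-means++ seeding (several candidate centers are sampled at each step and the one that most reduces the total squared distance is kept), the idea can be sketched in plain NumPy. The function name greedy_kmeanspp_init and the default of 2 + log(k) trials are assumptions borrowed from scikit-learn, not taken from the fastkmeanspp source, and the sketch does not use FAISS.

```python
import numpy as np


def greedy_kmeanspp_init(X, n_clusters, n_local_trials=None, random_state=None):
    """Pick initial centers with greedy k-means++ (illustrative sketch only)."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(random_state)
    n_samples = X.shape[0]
    if n_local_trials is None:
        # Assumed default, borrowed from scikit-learn: 2 + log(k) candidates.
        n_local_trials = 2 + int(np.log(n_clusters))

    # First center: one sample drawn uniformly at random.
    centers = [X[rng.integers(n_samples)]]
    # Squared distance from every sample to its closest chosen center.
    closest_sq = ((X - centers[0]) ** 2).sum(axis=1)

    for _ in range(1, n_clusters):
        total = closest_sq.sum()
        # Sample candidates proportionally to squared distance
        # (uniformly if every sample already coincides with a center).
        probs = closest_sq / total if total > 0 else None
        candidates = rng.choice(n_samples, size=n_local_trials, p=probs)
        # Squared distances candidate -> sample, shape (trials, n_samples).
        cand_sq = ((X[candidates][:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        # Closest-center distances if each candidate were added.
        new_closest = np.minimum(closest_sq, cand_sq)
        # Keep the candidate that gives the lowest total squared distance.
        best = int(np.argmin(new_closest.sum(axis=1)))
        centers.append(X[candidates[best]])
        closest_sq = new_closest[best]

    return np.stack(centers)
```

More trials per step cost more distance computations but tend to give a better starting point, which is the trade-off this parameter controls under the assumption above.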

n_clusters: int
n_iter: int
n_local_trials: int | None
random_state: int | None
X_: ndarray | None
cluster_centers_: ndarray | None
labels_: ndarray | None
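A minimal usage sketch based on the constructor signature and fitted attributes listed above. The import path `from fastkmeanspp import KMeans` is an assumption (the reference does not state it), as are the synthetic data and the helper name run_demo; the input is cast to float32 because that is the dtype FAISS works with natively.

```python
import numpy as np


def run_demo():
    """Fit the estimator on two synthetic blobs and return its results."""
    # Assumed import path; adjust if the package exposes KMeans elsewhere.
    from fastkmeanspp import KMeans

    rng = np.random.default_rng(0)
    # Two well-separated 2-D blobs of 150 points each.
    X = np.vstack([
        rng.normal(0.0, 0.5, size=(150, 2)),
        rng.normal(5.0, 0.5, size=(150, 2)),
    ]).astype(np.float32)

    # fit returns the fitted model, so the calls can be chained.
    model = KMeans(n_clusters=2, n_iter=20, random_state=0).fit(X)
    labels = model.predict(X)
    return model.cluster_centers_, labels, model.inertia_
```

Calling run_demo() returns the cluster centers, one label per input row, and the inertia of the fit.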
fit(X, y=None)[source]

Run k-means clustering on the input data X.

Parameters:
  • X (npt.ArrayLike) – Input data matrix to cluster.

  • y (None, optional) – Ignored; present for compatibility with the scikit-learn estimator API.

Raises:

ValueError – If X contains inf or NaN values.

Returns:

The fitted model.

Return type:

Self
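Because fit raises ValueError on inf or NaN values, it can help to check or clean the data first. The sketch below shows one way to do that with NumPy; check_finite and drop_non_finite_rows are hypothetical helpers, not part of fastkmeanspp, and the package's own validation may differ in detail.

```python
import numpy as np


def check_finite(X):
    """Return X as an array, raising ValueError if it holds inf or NaN."""
    X = np.asarray(X, dtype=np.float64)
    if not np.isfinite(X).all():
        raise ValueError("X contains inf or NaN values.")
    return X


def drop_non_finite_rows(X):
    """Return only the rows of X whose entries are all finite."""
    X = np.asarray(X, dtype=np.float64)
    return X[np.isfinite(X).all(axis=1)]
```

Dropping rows changes the number of samples, so any labels computed afterwards refer to the filtered array, not the original one.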

predict(X)[source]

Predict the nearest cluster index for each input data point.

Parameters:

X (npt.ArrayLike) – The input data.

Raises:
  • ValueError – If X contains inf or NaN values.

  • ValueError – If self.cluster_centers_ is not set.

Returns:

The predicted cluster indices.

Return type:

ndarray
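Conceptually, predict assigns each row to the cluster center at the smallest Euclidean distance. A brute-force NumPy equivalent, useful for sanity-checking results on small inputs, might look like the following. nearest_center is a hypothetical helper; the package itself is described as using FAISS, whose search internals are not reproduced here.

```python
import numpy as np


def nearest_center(X, centers):
    """Index of the closest center for each row of X (squared Euclidean)."""
    X = np.asarray(X, dtype=np.float64)
    centers = np.asarray(centers, dtype=np.float64)
    # Pairwise squared distances, shape (n_samples, n_centers).
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)
```

This builds an (n_samples, n_centers, n_features) intermediate array, so it is only practical for small data.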

property inertia_: float[source]

Get the inertia of the fitted model. As a property, it takes no arguments and is read from the fitted estimator.

Raises:

ValueError – If self.X_, self.labels_ and self.cluster_centers_ are not all set.

Returns:

The inertia of the fitted model.

Return type:

float
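Assuming inertia_ follows scikit-learn's definition (the sum of squared distances from each sample to its assigned cluster center), the value can be reproduced from X_, labels_ and cluster_centers_ with a few lines of NumPy. The function name inertia is hypothetical, and results computed in float32 by FAISS may differ slightly from this float64 version.

```python
import numpy as np


def inertia(X, labels, centers):
    """Sum of squared distances from each sample to its assigned center."""
    X = np.asarray(X, dtype=np.float64)
    centers = np.asarray(centers, dtype=np.float64)
    labels = np.asarray(labels)
    # centers[labels] lines up each sample with its own center.
    diffs = X - centers[labels]
    return float((diffs ** 2).sum())
```

Lower values indicate tighter clusters for a fixed number of clusters; the value always decreases as n_clusters grows, so it is not comparable across different cluster counts.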