Spectral Bridges
================

**sbcluster** is a Python package that implements **Spectral Bridges**, a novel clustering algorithm combining k-means and spectral clustering. It computes the affinity matrix efficiently and merges clusters according to a connectivity measure inspired by the SVM margin concept. The package is designed to provide robust clustering, particularly on large datasets.

Features
--------

- **Spectral Bridges clustering algorithm**: integrates k-means and spectral clustering, with efficient affinity matrix calculation, for improved clustering results.
- **Scalability**: designed to handle large datasets through efficient affinity matrix computation.
- **Customizable**: parameters such as the number of clusters, the number of iterations, and the random state can be set to suit the clustering task.
- **Model selection**: automatic selection of the number of nodes (m) according to a normalized eigengap metric.
- **scikit-learn compatibility**: follows the standard scikit-learn API, so model selection and evaluation tools work directly with the estimator.

Speed
-----

Spectral Bridges uses the k-means implementation from fastkmeanspp, which keeps it fast even on large-scale datasets.

Installation
------------

You can install the package via pip:

.. code-block:: bash

    pip install sbcluster

Usage
-----

Example:

.. code-block:: python

    from time import time

    import matplotlib.pyplot as plt
    import numpy as np
    from sbcluster import SpectralBridges, ngap_scorer
    from sklearn.cluster import SpectralClustering
    from sklearn.metrics import adjusted_rand_score
    from sklearn.model_selection import GridSearchCV

    # Load some synthetic data (last column holds the true labels)
    data = np.genfromtxt("datasets/impossible.csv", delimiter=",")
    X, y = data[:, :-1], data[:, -1]

    # Define the parameter grid
    param_grid = {"n_clusters": [2, 3, 4, 5, 6, 7, 8, 9, 10]}

    # Clustering has no held-out set: train and score on the full dataset,
    # repeating the same split five times
    cv = [(np.arange(X.shape[0]), np.arange(X.shape[0]))] * 5

    # Perform grid search for optimal parameters
    grid_search = GridSearchCV(
        estimator=SpectralBridges(n_clusters=2, n_nodes=250),
        param_grid=param_grid,
        scoring=ngap_scorer,
        cv=cv,
        verbose=1,
    )

    # Fit the grid search
    grid_search.fit(X)

    # Print the results
    print(grid_search.cv_results_["mean_test_score"])
    print(grid_search.best_params_)

    # Make predictions with the best model
    guess = grid_search.best_estimator_.predict(X)
    ari = adjusted_rand_score(y, guess)

    # Print the ARI
    print(f"Adjusted Rand Index: {ari}")

    # Visualize the clustering results
    plt.scatter(X[:, 0], X[:, 1], c=guess, alpha=0.1)
    plt.scatter(
        grid_search.best_estimator_.cluster_centers_[:, 0],
        grid_search.best_estimator_.cluster_centers_[:, 1],
        c=grid_search.best_estimator_.cluster_labels_,
        marker="X",
    )
    plt.title("Clustered data and centroids with best SpectralBridges fit")
    plt.show()

    # Compare with sklearn's SpectralClustering (default gamma is 1.0)
    sc_low = SpectralClustering(n_clusters=7).fit(X)
    plt.scatter(X[:, 0], X[:, 1], c=sc_low.labels_)
    plt.title("Spectral Clustering of the original dataset, gamma=1.0")
    plt.show()

    sc_high = SpectralClustering(n_clusters=7, gamma=5).fit(X)
    plt.scatter(X[:, 0], X[:, 1], c=sc_high.labels_)
    plt.title("Spectral Clustering of the original dataset, gamma=5.0")
    plt.show()

    # Compare fit times
    start = time()
    grid_search.best_estimator_.fit(X)
    end = time()
    print("SpectralBridges fit time:", end - start)

    start = time()
    sc_low.fit(X)
    end = time()
    print("SpectralClustering fit time:", end - start)

API Reference
-------------

.. autoclass:: sbcluster.SpectralBridges
    :members:
    :undoc-members:
    :show-inheritance:
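
Grid search without held-out data
---------------------------------

The usage example passes identical train and test indices to ``GridSearchCV``, since clustering has no held-out labels to score against. The sketch below reproduces that pattern using only scikit-learn, so it runs without the example dataset. ``KMeans`` and the silhouette score are stand-ins chosen for illustration and are not part of sbcluster; ``silhouette_scorer`` is a hypothetical helper name, and the blob centers are made-up toy data.

.. code-block:: python

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score
    from sklearn.model_selection import GridSearchCV


    def silhouette_scorer(estimator, X, y=None):
        """Hypothetical unsupervised scorer: higher means better-separated clusters."""
        labels = estimator.predict(X)
        return silhouette_score(X, labels)


    # Toy data: three well-separated blobs (illustrative values only)
    X, _ = make_blobs(
        n_samples=300,
        centers=[[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]],
        cluster_std=0.5,
        random_state=0,
    )

    # A single "split" whose train and test sets are both the full dataset
    indices = np.arange(X.shape[0])
    cv = [(indices, indices)]

    search = GridSearchCV(
        estimator=KMeans(n_init=10, random_state=0),
        param_grid={"n_clusters": [2, 3, 4, 5]},
        scoring=silhouette_scorer,
        cv=cv,
    )

    # No labels are passed, so the scorer is called with X only
    search.fit(X)
    print(search.best_params_)

Because the scorer accepts ``y=None``, the search can be fitted on ``X`` alone. Repeating the same split several times, as the usage example does with ``* 5``, refits each candidate that many times and averages the scores, which presumably smooths out differences between random initializations when no random state is fixed.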