MimiSBM

mimisbm is a Python package implementing the Mixture of Multilayer Integrator Stochastic Block Model (MimiSBM) proposed by De Santiago, Szafranski, and Ambroise (2024). It jointly groups nodes into clusters and layers into components, so that layers sharing a connectivity pattern are described by the same stochastic block model.

Features

  • Multilayer Clustering: Jointly identifies node communities and layer components in a single probabilistic framework.

  • Variational EM: Efficient inference using a Variational Expectation-Maximization (VEM) algorithm for large-scale networks.

  • Bayesian Framework: Places Dirichlet priors on the mixing proportions and Beta priors on the edge probabilities, with Jeffreys priors as the default.

  • Component-wise SBMs: Groups layers sharing similar block-model structures into distinct mixture components.

  • scikit-learn API: Native BaseEstimator and ClusterMixin integration with a familiar fit / predict interface.

Installation

You can install the package via pip:

pip install mimisbm

Usage

Example:

import numpy as np
from mimisbm import MimiSBM

# Generate a synthetic multilayer adjacency tensor (20 nodes, 5 layers)
np.random.seed(42)
N, V = 20, 5
A = np.random.randint(0, 2, size=(N, N, V))

# Ensure the adjacency matrices are symmetric (undirected)
for v in range(V):
    A[..., v] = np.tril(A[..., v], -1) + np.tril(A[..., v], -1).T

# Initialize the model with 3 node clusters and 2 layer components
model = MimiSBM(n_clusters=3, n_components=2, random_state=42)

# Fit the model to the multilayer network
model.fit(A)

# Predict node cluster and layer component assignments
node_labels, layer_labels = model.predict()

print(f"Node clusters: {node_labels}")
print(f"Layer components: {layer_labels}")
print(f"Final ELBO: {model.elbo_:.2f}")
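
Real data often arrives as one edge list per layer rather than as a dense tensor. The following is a minimal sketch of how such data could be stacked into the (N, N, V) layout used above. The helper name edges_to_tensor and the toy edge lists are hypothetical and not part of mimisbm; dropping self-loops and mirroring each edge are assumptions that match the symmetric, zero-diagonal layers built in the example.

```python
import numpy as np

def edges_to_tensor(edge_lists, n_nodes):
    """Stack per-layer undirected edge lists into an (N, N, V) 0/1 tensor."""
    n_layers = len(edge_lists)
    A = np.zeros((n_nodes, n_nodes, n_layers), dtype=int)
    for v, edges in enumerate(edge_lists):
        for i, j in edges:
            if i == j:
                continue  # assumption: self-loops are ignored
            A[i, j, v] = 1
            A[j, i, v] = 1  # mirror the edge so the layer stays symmetric
    return A

# Hypothetical toy data: 4 nodes observed in 2 layers
layers = [
    [(0, 1), (1, 2)],
    [(0, 3), (2, 3), (1, 1)],  # (1, 1) is a self-loop and is skipped
]
A = edges_to_tensor(layers, n_nodes=4)
print(A.shape)  # (4, 4, 2)
```

The resulting array can then be passed to fit in place of the synthetic tensor.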

Citation

If you use MimiSBM in your research, please cite the original authors’ paper:

@article{de2024mixture,
  title={Mixture of multilayer stochastic block models for multiview clustering},
  author={De Santiago, Kylliann and Szafranski, Marie and Ambroise, Christophe},
  journal={arXiv preprint arXiv:2401.04682},
  year={2024}
}

For more details, see the preprint: https://arxiv.org/abs/2401.04682

API Reference

class MimiSBM(n_clusters=2, n_components=2, *, clusters_prior='jeffreys', components_prior='jeffreys', adjacency_prior='jeffreys', max_iter=100, tol=0.0001, warm_start=False, random_state=None)

Bases: ClusterMixin, BaseEstimator

Mixture of Multilayer Integrator Stochastic Block Model (MimiSBM).

The MimiSBM is a generative model for multilayer networks that identifies mesoscale structures by grouping nodes into clusters and layers into components.

Each component represents a distinct Stochastic Block Model (SBM) shared by a subset of layers. Inference relies on a Variational Expectation-Maximization (VEM) algorithm, which approximates the posterior distributions of the assignments and of the model parameters.
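
To make the generative description concrete, here is a rough sketch of how a network could be sampled from such a mixture of SBMs. It is not taken from the package: the function name sample_multilayer_sbm and the symbols pi, rho, and alpha are illustrative, and the sketch assumes undirected binary layers with Bernoulli edges and no self-loops.

```python
import numpy as np

def sample_multilayer_sbm(n_nodes, n_layers, pi, rho, alpha, seed=None):
    """Draw an undirected binary multilayer network from a mixture of SBMs.

    pi:    (K,) node cluster proportions
    rho:   (Q,) layer component proportions
    alpha: (K, K, Q) edge probabilities, symmetric in its first two axes
    """
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=n_nodes, p=pi)     # node cluster labels
    w = rng.choice(len(rho), size=n_layers, p=rho)  # layer component labels
    A = np.zeros((n_nodes, n_nodes, n_layers), dtype=int)
    rows, cols = np.triu_indices(n_nodes, k=1)      # pairs i < j, no diagonal
    for v in range(n_layers):
        # Edge probability of each pair under the SBM of this layer's component
        p = alpha[z[rows], z[cols], w[v]]
        edges = rng.random(rows.size) < p
        A[rows, cols, v] = edges
        A[cols, rows, v] = edges                    # keep the layer symmetric
    return A, z, w

# Two components: one assortative, one disassortative block structure
alpha = np.stack(
    [np.array([[0.8, 0.1], [0.1, 0.8]]), np.array([[0.1, 0.7], [0.7, 0.1]])],
    axis=-1,
)
A, z, w = sample_multilayer_sbm(30, 6, pi=[0.5, 0.5], rho=[0.5, 0.5], alpha=alpha, seed=0)
print(A.shape, z.shape, w.shape)
```

Layers drawn from the same component share one block structure, which is the pattern the model is designed to recover.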

Model settings:
  • n_clusters: Number of clusters for the nodes.

  • n_components: Number of mixture components for the layers.

Prior settings:
  • clusters_prior: Dirichlet prior for the node cluster mixing proportions.

  • components_prior: Dirichlet prior for the layer component mixing proportions.

  • adjacency_prior: Beta prior for the edge probabilities within and between clusters for each component.
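
For reference, the Jeffreys prior of a categorical distribution is a Dirichlet with all parameters equal to 1/2, and the Jeffreys prior of a Bernoulli probability is a Beta(1/2, 1/2). The sketch below writes these priors out as explicit arrays. How the package expands the 'jeffreys' string internally is not documented here, so the helper jeffreys_priors and the (2, K, K, Q) layout, which mirrors the documented shape of the adjacency posterior, are assumptions.

```python
import numpy as np

def jeffreys_priors(n_clusters, n_components):
    """Return Jeffreys prior parameters as explicit arrays (assumed layout)."""
    K, Q = n_clusters, n_components
    clusters_prior = np.full(K, 0.5)              # Dirichlet(1/2, ..., 1/2) over clusters
    components_prior = np.full(Q, 0.5)            # Dirichlet(1/2, ..., 1/2) over components
    adjacency_prior = np.full((2, K, K, Q), 0.5)  # Beta(1/2, 1/2) per block and component
    return clusters_prior, components_prior, adjacency_prior

clusters_prior, components_prior, adjacency_prior = jeffreys_priors(3, 2)
print(clusters_prior.shape, components_prior.shape, adjacency_prior.shape)
# (3,) (2,) (2, 3, 3, 2)
```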

VEM settings:
  • max_iter: Maximum number of iterations for the VEM algorithm.

  • tol: Convergence tolerance based on the Evidence Lower Bound (ELBO).

  • warm_start: If True, reuse the responsibilities from the previous fit as initialization.
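
The interplay between max_iter and tol can be pictured with a small loop. This is only a sketch of a typical ELBO-based stopping rule, not the package's code: the function run_until_converged and the toy step are hypothetical, and comparing the absolute change in ELBO against tol is an assumption (a relative change would be an equally plausible criterion).

```python
def run_until_converged(step, max_iter=100, tol=1e-4):
    """Call step() until the ELBO change drops below tol or max_iter is reached.

    step is any callable that performs one update and returns the current ELBO.
    Returns (last ELBO, number of iterations run, converged flag).
    """
    previous = float("-inf")
    elbo = previous
    for n_iter in range(1, max_iter + 1):
        elbo = step()
        if abs(elbo - previous) < tol:
            return elbo, n_iter, True
        previous = elbo
    return elbo, max_iter, False

# Toy stand-in for a VEM iteration: the k-th call returns an ELBO of -100 * 0.5**k
def make_toy_step():
    values = (-100.0 * 0.5**k for k in range(1000))
    return lambda: next(values)

elbo, n_iter, converged = run_until_converged(make_toy_step())
print(n_iter, converged)  # 21 True
```

With a smaller max_iter the loop stops early and reports that it did not converge, which corresponds to the converged flag documented below.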

Variables:
  • n_clusters (int) – Number of node clusters.

  • n_components (int) – Number of layer components.

  • clusters_prior (np.ndarray) – Prior parameters for node clusters.

  • components_prior (np.ndarray) – Prior parameters for layer components.

  • adjacency_prior (np.ndarray) – Prior parameters for edge connections.

  • max_iter (int) – Maximum number of iterations for the VEM algorithm.

  • tol (float) – Tolerance to declare convergence based on the ELBO.

  • warm_start (bool) – Whether to reuse the solution of the previous call to fit as initialization.

  • random_state (int | None) – Random state for initialization.

  • cluster_responsibilities_ (np.ndarray) – Posterior probabilities of node cluster assignments, shape (N, K).

  • component_responsibilities_ (np.ndarray) – Posterior probabilities of layer component assignments, shape (V, Q).

  • cluster_posterior_ (np.ndarray) – Dirichlet posterior parameters for clusters.

  • component_posterior_ (np.ndarray) – Dirichlet posterior parameters for components.

  • adjacency_posterior_ (np.ndarray) – Beta posterior parameters for edge connections, shape (2, K, K, Q).

  • elbo_ (float) – Evidence Lower Bound of the fitted model.

  • converged_ (bool) – True if the algorithm converged, False otherwise.
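
The posterior parameters can be turned into point estimates using the standard means of the Dirichlet and Beta distributions. The sketch below uses made-up arrays in place of a fitted model. The helper posterior_means is hypothetical, and it assumes that the first axis of the adjacency posterior stores the two Beta parameters in the order (a, b), which this reference does not state.

```python
import numpy as np

def posterior_means(cluster_posterior, component_posterior, adjacency_posterior):
    """Posterior means of the mixing proportions and edge probabilities."""
    pi_hat = cluster_posterior / cluster_posterior.sum()       # Dirichlet mean
    rho_hat = component_posterior / component_posterior.sum()  # Dirichlet mean
    a, b = adjacency_posterior  # assumed order along axis 0: (a, b)
    edge_prob = a / (a + b)     # Beta mean, shape (K, K, Q)
    return pi_hat, rho_hat, edge_prob

# Made-up posterior parameters for K = 2 clusters and Q = 2 components
cluster_posterior = np.array([3.0, 1.0])
component_posterior = np.array([2.0, 6.0])
adjacency_posterior = np.ones((2, 2, 2, 2))
adjacency_posterior[0, 0, 0, 0] = 3.0

pi_hat, rho_hat, edge_prob = posterior_means(
    cluster_posterior, component_posterior, adjacency_posterior
)
print(pi_hat, rho_hat, edge_prob[0, 0, 0])
```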

Parameters:
  • n_clusters (int)

  • n_components (int)

  • clusters_prior (ndarray)

  • components_prior (ndarray)

  • adjacency_prior (ndarray)

  • max_iter (int)

  • tol (float)

  • warm_start (bool)

  • random_state (int | None)

Examples

>>> from mimisbm import MimiSBM
>>> import numpy as np
>>> A = np.random.randint(0, 2, size=(10, 10, 5))
>>> model = MimiSBM(n_clusters=2, n_components=2)
>>> model.fit(A)
>>> node_labels, layer_labels = model.predict()
cluster_responsibilities_: ndarray
component_responsibilities_: ndarray
cluster_posterior_: ndarray
component_posterior_: ndarray
adjacency_posterior_: ndarray
elbo_: float
converged_: bool
n_clusters: int
n_components: int
clusters_prior: ndarray
components_prior: ndarray
adjacency_prior: ndarray
max_iter: int
tol: float
warm_start: bool
random_state: int | None
set_fit_request(*, A='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • A (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for A parameter in fit.

Returns:

self – The updated object.

Return type:

object

fit(A)

Fits the MimiSBM model to the multilayer adjacency tensor.

Initializes the model responsibilities and iteratively updates them using the VEM algorithm. The process continues until the ELBO converges or the maximum number of iterations is reached.

Parameters:

A (np.typing.ArrayLike) – A 3D numpy array-like representing the multilayer adjacency tensor of shape (N, N, V).

Returns:

The fitted model instance.

Return type:

Self
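
Before calling fit, it can help to check that the input has the expected layout. The following is a sketch of such a check, not something the package is known to perform: check_adjacency_tensor is a hypothetical helper, and the requirements that entries be 0 or 1 and that each layer be symmetric are assumptions based on the usage example.

```python
import numpy as np

def check_adjacency_tensor(A):
    """Validate an (N, N, V) binary, symmetric multilayer adjacency tensor."""
    A = np.asarray(A)
    if A.ndim != 3 or A.shape[0] != A.shape[1]:
        raise ValueError(f"expected shape (N, N, V), got {A.shape}")
    if not np.isin(A, (0, 1)).all():
        raise ValueError("entries must be 0 or 1")
    if not np.array_equal(A, A.transpose(1, 0, 2)):
        raise ValueError("each layer must be symmetric")
    return A

# A valid tensor: 3 nodes, 2 layers, one undirected edge in the first layer
A_ok = np.zeros((3, 3, 2), dtype=int)
A_ok[0, 1, 0] = A_ok[1, 0, 0] = 1
print(check_adjacency_tensor(A_ok).shape)  # (3, 3, 2)
```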

predict()

Predicts node cluster and layer component labels.

Assigns each node to the cluster, and each layer to the component, with the highest posterior probability.

Returns:

A tuple containing:
  • node_labels (np.ndarray): Predicted cluster for each node (N,).

  • layer_labels (np.ndarray): Predicted component for each layer (V,).

Return type:

tuple[np.ndarray, np.ndarray]
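
The assignment rule described above amounts to a row-wise argmax over the responsibility matrices. A minimal sketch with made-up responsibilities follows. The helper hard_labels is hypothetical and only illustrates the rule, under the documented shapes (N, K) and (V, Q).

```python
import numpy as np

def hard_labels(cluster_responsibilities, component_responsibilities):
    """Pick the most probable cluster per node and component per layer."""
    node_labels = np.argmax(cluster_responsibilities, axis=1)     # shape (N,)
    layer_labels = np.argmax(component_responsibilities, axis=1)  # shape (V,)
    return node_labels, layer_labels

# Made-up responsibilities: 3 nodes over 2 clusters, 2 layers over 2 components
tau = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
nu = np.array([[0.3, 0.7], [0.55, 0.45]])

node_labels, layer_labels = hard_labels(tau, nu)
print(node_labels, layer_labels)  # [0 1 0] [1 0]
```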

fit_predict(A)

Fits the MimiSBM model to the multilayer adjacency tensor and returns the predicted labels.

Initializes the model responsibilities and iteratively updates them using the VEM algorithm. The process continues until the ELBO converges or the maximum number of iterations is reached.

Parameters:

A (np.typing.ArrayLike) – A 3D numpy array-like representing the multilayer adjacency tensor of shape (N, N, V).

Returns:

A tuple containing:
  • node_labels (np.ndarray): Predicted cluster for each node (N,).

  • layer_labels (np.ndarray): Predicted component for each layer (V,).

Return type:

tuple[np.ndarray, np.ndarray]