MultispatiPCA

class multispaeti.MultispatiPCA(n_components=None, *, connectivity=None, center_sparse=False, use_gpu=False, random_state=None)

MULTISPATI-PCA

Unlike principal component analysis (PCA), MULTISPATI-PCA does not maximize the variance explained by each component but rather the product of the variance and Moran’s I. Eigenvalues can therefore be negative, namely when a component is negatively autocorrelated.

The problem is solved by diagonalizing the symmetric matrix \(H = \frac{1}{2n} X^\top (W + W^\top) X\), where \(X\) is the matrix of \(n\) observations \(\times\) \(d\) features and \(W\) is the matrix of connectivities between observations.
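The construction above can be sketched in plain NumPy. This is an illustration of the formula only, not the library's implementation: the data, the random connectivity matrix, and all variable names are assumptions made for the example, and X is centered by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 4  # n observations, d features

# Hypothetical data: X is centered per feature, W is a random binary
# connectivity matrix without self-connections (not necessarily symmetric).
X = rng.normal(size=(n, d))
X = X - X.mean(axis=0)
W = (rng.random((n, n)) < 0.1).astype(float)
np.fill_diagonal(W, 0.0)

# H = 1/(2n) * X^T (W + W^T) X is symmetric by construction.
H = X.T @ (W + W.T) @ X / (2 * n)

# eigh returns eigenvalues in ascending order; reverse to descending.
eigenvalues, eigenvectors = np.linalg.eigh(H)
eigenvalues = eigenvalues[::-1]
components = eigenvectors[:, ::-1].T  # shape (d, d), one component per row

# Keep the top 2 and bottom 1 components, analogous to n_components=(2, 1).
n_top, n_bottom = 2, 1
keep = np.r_[0:n_top, d - n_bottom:d]
selected_values = eigenvalues[keep]
selected_components = components[keep]
```

Because H is symmetric, a symmetric eigensolver such as np.linalg.eigh applies and the eigenvalues are real; they may be positive or negative.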

Parameters:
  • n_components (int or tuple[int, int], optional) – Number of components to keep. If None, all components are kept. If an int, the top n_components are kept. If a tuple, its two entries give the number of top and bottom components to keep, respectively.

  • connectivity (sparray or spmatrix) – Matrix of row-wise neighbor definitions, i.e. \(c_{ij}\) is the connectivity of i \(\to\) j. The matrix does not have to be symmetric. It can be a binary adjacency matrix or a matrix of connectivities, in which case \(c_{ij}\) should be larger if i and j are close. A distance matrix should be transformed to connectivities beforehand, e.g. by calculating \(1-d/d_{max}\).

  • center_sparse (bool) – Whether to center X if it is a sparse array. By default sparse X will not be centered as this requires transforming it to a dense array, potentially raising out-of-memory errors.

  • use_gpu (bool) – Whether to use the GPU implementation based on cupy and cupyx.scipy (not installed by default). Eigendecomposition on the GPU is not yet as mature. For sparse arrays, only min(n, d)-3 instead of min(n, d)-1 eigenvalues/eigenvectors can be calculated (which in most cases won’t be a problem). For dense arrays, all eigenvalues have to be calculated and subsequently subset.

  • random_state (int | RandomState | None) – Used when X is sparse and center_sparse is False, and for Moran’s I bound estimation. Pass an int for reproducible results across multiple function calls.
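As an illustration of the \(1-d/d_{max}\) transformation mentioned for the connectivity parameter, the following sketch converts pairwise distances into connectivities. The coordinates are made up, and zeroing the diagonal is a choice made for this example rather than a documented requirement.

```python
import numpy as np

# Hypothetical 2-D coordinates of five observations.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]])

# Pairwise Euclidean distances, shape (5, 5).
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Rescale so that close pairs get values near 1 and the most distant pair gets 0.
conn = 1.0 - dist / dist.max()

# Assumption of this sketch: drop self-connections, which would otherwise be 1.
np.fill_diagonal(conn, 0.0)
```

Since the parameter is documented as a sparray or spmatrix, the dense result would still need to be wrapped in a SciPy sparse container (for example scipy.sparse.csr_matrix(conn)) before being passed on.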

components_

The estimated components. Array of shape (n_components, n_features).

Type:

numpy.ndarray | cupy.ndarray

eigenvalues_

The eigenvalues corresponding to each of the selected components. Array of shape (n_components,).

Type:

ndarray

variance_

The estimated variance part of the eigenvalues. Array of shape (n_components,).

Type:

ndarray

moransI_

The estimated Moran’s I part of the eigenvalues. Array of shape (n_components,).

Type:

ndarray
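The split of each eigenvalue into a variance part and a Moran’s I part can be checked numerically. The sketch below assumes a centered X and a row-normalized W (each row sums to 1); whether the library normalizes the connectivity in this way is an assumption here, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 40, 3

# Hypothetical centered data.
X = rng.normal(size=(n, d))
X = X - X.mean(axis=0)

# Hypothetical connectivity: random links plus a ring so no row is empty.
idx = np.arange(n)
W = (rng.random((n, n)) < 0.1).astype(float)
W[idx, (idx + 1) % n] = 1.0
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)  # row-normalize (assumption)

H = X.T @ (W + W.T) @ X / (2 * n)
eigenvalues, eigenvectors = np.linalg.eigh(H)

# Project the data onto every eigenvector; one column of scores per component.
scores = X @ eigenvectors
sum_sq = (scores ** 2).sum(axis=0)

# Variance of each score vector (scores are centered because X is).
variance = sum_sq / n

# Moran's I of each score vector: (n / S0) * z^T W z / z^T z, S0 = sum of weights.
cross = np.einsum("ij,ij->j", scores, W @ scores)
morans_i = n / W.sum() * cross / sum_sq
```

Under these assumptions the product variance * morans_i reproduces eigenvalues, so a negative eigenvalue corresponds to a component with negative Moran’s I.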

mean_

Per-feature empirical mean, estimated from the training set if X is not sparse. Array of shape (n_features,).

Type:

numpy.ndarray | cupy.ndarray | None

n_components_

The estimated number of components.

Type:

int

n_samples_

Number of samples in the training data.

Type:

int

n_features_in_

Number of features seen during fit.

Type:

int

References

Dray, Stéphane, Sonia Saïd, and François Débias. “Spatial ordination of vegetation data using a generalization of Wartenberg’s multivariate spatial correlation.” Journal of Vegetation Science 19.1 (2008): 45-56.

Methods

fit

Fit MULTISPATI-PCA projection.

fit_transform

Fit and transform the data using MULTISPATI-PCA projection.

get_feature_names_out

Get output feature names for transformation.

get_metadata_routing

Get metadata routing of this object.

get_params

Get parameters for this estimator.

moransI_bounds

Calculate the minimum and maximum bounds of Moran's I given the connectivity, and its expected value given the number of observations.

set_output

Set output container.

set_params

Set the parameters of this estimator.

transform

Transform the data using fitted MULTISPATI-PCA projection.

transform_spatial_lag

Transform the data using fitted MULTISPATI-PCA projection and calculate the spatial lag.
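For orientation, the spatial lag of a set of component scores is commonly computed as the connectivity-weighted average of each observation's neighbors. The sketch below shows that general idea on a small ring graph; the scores, the row-normalized W, and the variable names are assumptions and do not reproduce the internals of transform_spatial_lag.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2  # n observations, k components

# Hypothetical component scores, e.g. the result of a projection.
scores = rng.normal(size=(n, k))

# Ring graph: every observation is connected to its two neighbors.
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 1.0
W[idx, (idx - 1) % n] = 1.0
W = W / W.sum(axis=1, keepdims=True)  # row-normalize so rows sum to 1

# Spatial lag: each row becomes the weighted mean of its neighbors' scores.
lagged = W @ scores
```

With row-normalized weights, each lagged value is a plain average over neighbors, so plotting lagged against scores gives a Moran scatterplot per component.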