When the input matrix has only one row with nonzero variance and every other row has zero variance, the model is never fitted.
Error is:
NotFittedError: This UMAP instance is not fitted yet.
on line:
connected_vertices_mask = ~disconnected_vertices(reducer)
Proposed solution:
from typing import Optional, Tuple

import numpy as np
from umap import UMAP
from umap.utils import disconnected_vertices


def umap_embedding(
    X: np.ndarray,
    n_neighbors: int = 5,
    min_dist: float = 0.12,
    spread: float = 9.0,
    random_state: int = 42,
    n_components: int = 2,
    metric: str = "correlation",
    n_epochs: int = 1500,
    **kwargs,
) -> Tuple[np.ndarray, np.ndarray, Optional[UMAP]]:
    """
    Perform UMAP embedding on input data.

    Args:
        X: Input data with shape (n_samples, n_features).
        n_neighbors: Number of neighbors to consider for each point.
        min_dist: Minimum distance between points in the embedding space.
        spread: Determines how spread out all embedded points are overall.
        random_state: Random seed for reproducibility.
        n_components: Number of dimensions in the embedding space.
        metric: Distance metric to use.
        n_epochs: Number of training epochs for embedding optimization.
        **kwargs: Additional keyword arguments for UMAP.

    Returns:
        A tuple containing:
        - embedding: The UMAP embedding (n_samples, n_components). Rows that
          were dropped are NaN; all rows are NaN if there is insufficient data.
        - mask: Boolean mask (length n_samples) showing which rows had nonzero
          variance and were connected.
        - reducer: The fitted UMAP object, or None if there is insufficient data.

    Raises:
        ValueError: If n_components is too large relative to sample size.

    Note:
        This function flattens inputs with more than two dimensions and
        removes constant rows before fitting.
    """
    if n_components > X.shape[0] - 2:
        raise ValueError(
            "number of components must be 2 smaller than sample size. "
            "See: https://github.com/lmcinnes/umap/issues/201"
        )

    if len(X.shape) > 2:
        # Flatten (n_samples, n_features_1, ...) → (n_samples, n_features)
        X = X.reshape(X.shape[0], -1)

    # Prepare an output array of NaNs.
    n_samples = X.shape[0]
    embedding = np.full((n_samples, n_components), np.nan)

    # Mask out rows that have zero (or near-zero) variance.
    mask = ~np.isclose(X.std(axis=1), 0)
    X_nonconst = X[mask]

    # If fewer than 2 rows remain, skip UMAP and return an embedding of NaNs.
    # This avoids calling disconnected_vertices() on an unfitted reducer.
    if X_nonconst.shape[0] < 2:
        return embedding, mask, None

    # Fit UMAP on the non-constant rows only.
    reducer = UMAP(
        n_neighbors=n_neighbors,
        min_dist=min_dist,
        random_state=random_state,
        n_components=n_components,
        metric=metric,
        spread=spread,
        n_epochs=n_epochs,
        **kwargs,
    )
    _embedding = reducer.fit_transform(X_nonconst)

    # Remove any "disconnected" vertices UMAP couldn't place
    # (e.g. if the graph is disjoint).
    connected_vertices_mask = ~disconnected_vertices(reducer)

    # Incorporate the connected-vertices mask into our existing mask.
    # mask[mask] selects exactly the rows that were passed to UMAP.
    mask[mask] = connected_vertices_mask

    # Place the valid embeddings back into the final array.
    embedding[mask] = _embedding[connected_vertices_mask]
    return embedding, mask, reducer
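
The masking steps in the proposed solution can be checked without running UMAP at all. Below is a minimal NumPy-only sketch of them. The helper names `nonconstant_row_mask` and `scatter_rows` are hypothetical and not part of umap, and the `connected` array is a hand-written stand-in for what `~disconnected_vertices(reducer)` would return.

```python
import numpy as np


def nonconstant_row_mask(X):
    """Return True for rows whose standard deviation is not (near) zero."""
    return ~np.isclose(X.std(axis=1), 0)


def scatter_rows(values, mask, n_components):
    """Write `values` into the rows where `mask` is True; other rows stay NaN."""
    out = np.full((mask.shape[0], n_components), np.nan)
    out[mask] = values
    return out


# The failing case from this report: only one row has nonzero variance.
X = np.array([
    [1.0, 1.0, 1.0],
    [0.0, 2.0, 5.0],
    [3.0, 3.0, 3.0],
    [7.0, 7.0, 7.0],
])
mask = nonconstant_row_mask(X)
too_few_rows = X[mask].shape[0] < 2  # the guard that skips fitting

# Combining a row mask with a per-fitted-row "connected" mask.
# `connected` is a stand-in for ~disconnected_vertices(reducer).
combined = np.array([True, False, True, True])
connected = np.array([True, False, True])
combined[combined] = connected  # only rows that were fitted get updated

# Stand-in for the embedding of the three fitted rows.
fitted = np.arange(6, dtype=float).reshape(3, 2)
placed = scatter_rows(fitted[connected], combined, n_components=2)
```

With this input, `too_few_rows` is true, so the function would return before `disconnected_vertices` is ever called. In the second part, only the first and last rows of `placed` receive values and the rest remain NaN.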