Add RAM Estimation Feature for Ripser Parameters Calculation (Fixes #145) #185

minimalProviderAgentMarket

Pull Request Description

Title: Add Memory Estimation Functionality to Ripser

Related Issue: #145 - Knowing required RAM to run ripser

Background:
This pull request addresses issue #145, in which users report out-of-memory errors when running Ripser. Memory use depends on parameters such as the number of points, the number of features, and the maximum homology dimension (maxdim). To help users avoid these failures, this PR adds a function that estimates the RAM required before Ripser is run.

Summary of Changes Implemented:

  1. New Memory Estimation Function:

    • Implemented the function estimate_ripser_memory, which lets users estimate RAM requirements from the following inputs and modeled memory terms:
      • Number of Points (n): The number of data points in the dataset.
      • Maxdim: The maximum homology dimension the user intends to compute.
      • Distance Matrix Memory: Calculated as O(n^2).
      • Simplicial Complex Memory: Calculated as O(n^(d+1)), where d is the dimension of the data.
      • Persistence Computation Memory: Calculated as O(n^3).
  2. Testing:

    • Comprehensive test cases have been added in the file test/test_memory_estimation.py to ensure:
      • Memory usage correctly increases with the dimensionality.
      • Memory usage scales with the number of points.
      • The function operates correctly with both point clouds and distance matrices.
  3. Documentation Updates:

    • Updated the __all__ list in the module to include the new memory estimation function.
    • Added an example usage section in README.md, showcasing how to utilize the estimate_ripser_memory function to assess memory requirements.
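
As a rough illustration of the O(n^2) distance-matrix term listed above, the sketch below computes the size of a dense n x n matrix. It assumes 8-byte (float64) entries and covers only the input matrix itself. It is not the PR's estimate_ripser_memory and says nothing about Ripser's internal memory use. The helper name dense_distance_matrix_bytes is hypothetical.

```python
def dense_distance_matrix_bytes(n_points: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to hold a dense n x n distance matrix.

    Hypothetical helper for illustration only; bytes_per_entry=8 assumes
    float64 entries, as in a default NumPy array.
    """
    return n_points * n_points * bytes_per_entry


if __name__ == "__main__":
    # Quadratic growth: 10x more points means 100x more memory.
    for n in (1_000, 10_000, 100_000):
        gib = dense_distance_matrix_bytes(n) / 2**30
        print(f"n = {n:>7,}: {gib:,.2f} GiB")
```

Because the term is quadratic, going from 10,000 to 100,000 points multiplies the size of the matrix by 100.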

Usage Example:
Users can estimate the RAM needed for their runs of Ripser using the following code snippet:

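import numpy as np
from ripser import estimate_ripser_memory  # function added by this PR

# Random point cloud: 1000 points in 2 dimensions
data = np.random.random((1000, 2))

# Estimate the RAM (in GB) for computing homology up to dimension 2
estimated_gb = estimate_ripser_memory(data, maxdim=2)
print(f"Estimated RAM required: {estimated_gb:.2f} GB")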

This lets users gauge the memory needs of a computation before running it, reducing the likelihood of out-of-memory errors during execution.

Conclusion:
This pull request aims to improve the user experience by providing a reliable method for estimating RAM usage, addressing issue #145.

Fixes #145.

ubauer (Collaborator) commented Jan 15, 2025

I am sorry, but the estimates implemented here do not apply to Ripser. The Rips complex is not constructed explicitly as a whole. Moreover, the estimate O(n^3) for persistence computation is incorrect. It is true that matrix reduction uses time at most O(m^3), where m is the number of simplices. But this is a time bound, not a memory bound, and the number of vertices is very different from the number of simplices.
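
To illustrate how far apart the number of vertices n and the number of simplices m can be, here is a minimal sketch that counts the simplices on n vertices when every possible simplex up to a given dimension is present (no distance threshold). This is a worst-case count for illustration, not a memory estimate for Ripser, which, as noted above, does not build the whole complex explicitly. The function name full_rips_simplex_count is hypothetical.

```python
from math import comb


def full_rips_simplex_count(n_vertices: int, max_simplex_dim: int) -> int:
    """Count simplices of dimension 0..max_simplex_dim on n_vertices vertices,
    assuming every possible simplex is present (no distance threshold).

    A k-dimensional simplex has k + 1 vertices, so there are C(n, k + 1) of them.
    """
    return sum(comb(n_vertices, k + 1) for k in range(max_simplex_dim + 1))


if __name__ == "__main__":
    n = 1000
    # Homology in degree d involves simplices up to dimension d + 1.
    for maxdim in range(3):
        m = full_rips_simplex_count(n, maxdim + 1)
        print(f"n = {n}, maxdim = {maxdim}: up to {m:,} simplices")
```

For n = 1000, the count already exceeds 41 billion simplices once dimension 3 is included, so a bound stated in terms of m cannot be read as a bound in terms of n.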

ubauer closed this Jan 15, 2025

ctralie (Member) commented Jan 15, 2025

Thanks Uli. Yet another example of bullshit coming out of a large language model :)
