Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Crystal Space and Oxidation Utilities to the Documentation #337

Merged
merged 17 commits into from
Dec 1, 2024

Conversation

AntObi
Copy link
Collaborator

@AntObi AntObi commented Dec 1, 2024

Documentation updates

Description

This PR addresses the following:

  • Crystal Space functions are now rendered in the documentation
  • The Oxidation Utilities module (Creating oxidation state lists from the ICSD data) is now rendered in the documentation
  • Hyperlinks have been updated in the documentation and tutorial notebooks
  • Updated type hinting in the codebase

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

  • Documentation has been rendered locally to check if changes are reflected in the docs
  • Tests passed locally before making PR

Test Configuration:

  • Python version: 3.10
  • Operating System: macOS

Reviewers

N/A

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Summary by CodeRabbit

Release Notes

  • New Features

    • Added new documentation sections for the Crystal Space and Oxidation modules, enhancing user guidance and module visibility.
    • Updated Jupyter notebooks with improved clarity and formatting for better user experience.
  • Bug Fixes

    • Corrected import paths in tutorials to reflect the new directory structure.
  • Documentation

    • Expanded and clarified documentation across various modules and notebooks, including detailed descriptions of functions and parameters.
    • Updated links in the README for easier access to examples and enhanced project visibility.
  • Chores

    • Significant updates to the requirements.txt file, adding new dependencies and updating existing ones to ensure compatibility and performance.
    • Added a requirement in the .readthedocs.yaml file to include dependencies from requirements.txt.

@AntObi AntObi added the docs label Dec 1, 2024
Copy link
Contributor

coderabbitai bot commented Dec 1, 2024

Walkthrough

The pull request introduces several changes across multiple documentation files and Python modules. Key modifications include updates to the Sphinx configuration in docs/conf.py, the addition of new documentation sections for various modules, and restructuring of import paths in Python files. Additionally, the requirements.txt file has been significantly updated with new and modified package dependencies. Changes also include the removal of certain examples from the documentation and enhancements to Jupyter notebooks, particularly in formatting and clarity.

Changes

File Path Change Summary
docs/conf.py Renamed variable jupyter_execute_notebooks to nb_execution_mode, commented out html_static_path, added new MOCK_MODULES.
docs/examples.rst Removed entry examples/neutral_combos from toctree.
docs/examples/doper.ipynb Updated header formatting and clarified output description, added new section for plot_dopants.
docs/smact.utils.crystal_space.download_compounds_with_mp_api.rst Added documentation section for download_compounds_with_mp_api.
docs/smact.utils.crystal_space.generate_composition_with_smact.rst Added documentation section for generate_composition_with_smact.
docs/smact.utils.crystal_space.plot_embedding.rst Added documentation section for plot_embedding.
docs/smact.utils.crystal_space.rst Created new documentation file for crystal_space module.
docs/smact.utils.oxidation.rst Created new documentation file for oxidation module.
docs/smact.utils.rst Updated submodules: removed five, added crystal_space and oxidation.
docs/tutorials/crystal_space.ipynb Updated import paths for functions and refined tutorial content.
docs/tutorials/crystal_space_visualisation.ipynb Updated prerequisite reference from crystal_space_with_smact.ipynb to crystal_space.ipynb.
docs/tutorials/oxidation_states.ipynb Changed header from second-level to first-level for prominence.
requirements.txt Added and updated numerous package dependencies, with several packages upgraded to newer versions.
smact/utils/crystal_space/download_compounds_with_mp_api.py Updated import path and corrected save_dir parameter in docstring.
smact/utils/crystal_space/generate_composition_with_smact.py Updated import path, added return type annotations, and expanded docstrings for clarity.
smact/utils/oxidation.py Simplified filename parameter type in write method of ICSD24OxStatesFilter class.

Possibly related PRs

Suggested labels

enhancement, feature, python, refactor

🐇 In the garden of code, we hop with delight,
New docs and modules, all shining so bright.
With paths rearranged and examples refined,
Our project now sparkles, with treasures to find!
From Sphinx to Jupyter, we dance through the night,
Celebrating changes, oh what a sight! 🌟


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ac6604f and 9c823f4.

📒 Files selected for processing (2)
  • .readthedocs.yaml (1 hunks)
  • README.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • README.md
🔇 Additional comments (2)
.readthedocs.yaml (2)

16-16: Verify the necessity of installing both requirements files

The addition of requirements.txt alongside docs/requirements.txt could lead to duplicate dependencies or version conflicts. Consider consolidating these files if possible.

✅ Verification successful

Both requirements files serve distinct purposes and should be kept separate

The analysis of both files shows they serve different purposes with minimal overlap:

  • requirements.txt contains the main project dependencies (SMACT, pymatgen, etc.)
  • docs/requirements.txt contains documentation-specific packages (Sphinx, MyST-NB, etc.)

The only shared dependencies are basic utilities (attrs, typing-extensions, etc.) which are common across Python projects and don't pose version conflicts. The addition of requirements.txt is necessary as the documentation likely needs access to the main project's code and dependencies to:

  • Generate API documentation
  • Execute code examples
  • Build interactive documentation features
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for duplicate dependencies and potential conflicts

# Compare requirements files for duplicates
echo "Checking for duplicate packages:"
comm -12 <(sort requirements.txt) <(sort docs/requirements.txt)

# Display both files for manual review
echo -e "\nContents of requirements.txt:"
cat requirements.txt
echo -e "\nContents of docs/requirements.txt:"
cat docs/requirements.txt

Length of output: 11487


16-16: Monitor documentation build performance

The inclusion of additional requirements might impact build time and memory usage. Consider monitoring the ReadTheDocs build logs after this change.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (10)
requirements.txt (1)

Line range hint 1-375: Consider documenting direct dependencies separately

The requirements.txt file contains both direct and transitive dependencies. Consider maintaining a separate requirements-direct.txt file listing only the direct dependencies of the project for better maintainability.

Create a new file requirements-direct.txt with only the direct dependencies (those marked with "via SMACT (setup.py)"), then generate requirements.txt from it:

+ # Create requirements-direct.txt with:
+ ase==3.23.0
+ dash==2.18.2
+ elementembeddings==0.6.1
+ # ... other direct dependencies
+ 
+ # Then generate requirements.txt using:
+ pip-compile --extra=strict requirements-direct.txt -o requirements.txt
docs/examples/doper.ipynb (3)

14-15: Consider using proper markdown heading instead of bold text

For better document structure and navigation, consider using a proper markdown heading (e.g., ## Output Format) instead of bold text. This would maintain consistent heading hierarchy and improve the notebook's table of contents generation.

-**Output Format**
+## Output Format

Line range hint 1-24: Consider adding error handling examples

The documentation would benefit from examples demonstrating:

  • How to handle invalid ionic species
  • Error cases when using incompatible oxidation states
  • Edge cases with unusual dopant combinations

This would help users understand the module's behaviour in non-ideal scenarios.


Line range hint 8-9: Enhance the alternative metrics documentation

The skipspecies metric explanation could be improved by:

  • Adding a brief comparison of accuracy between traditional and skipspecies approaches
  • Including performance considerations
  • Providing guidance on when to use each metric

This would help users make informed decisions about which metric to use for their specific use case.

smact/utils/crystal_space/download_compounds_with_mp_api.py (2)

Line range hint 16-24: Add type hints to improve code clarity.

Consider adding type hints to the function signature for better code maintainability and IDE support.

def download_mp_data(
-   mp_api_key: str | None = None,
-   num_elements: int = 2,
-   max_stoich: int = 8,
-   save_dir: str = "data/binary/mp_api",
-   request_interval: float = 0.1,
+   mp_api_key: str | None = None,
+   num_elements: int = 2,
+   max_stoich: int = 8,
+   save_dir: str = "data/binary/mp_api",
+   request_interval: float = 0.1,
+) -> None:

Line range hint 89-89: Consider implementing exponential backoff for API requests.

The current implementation uses a fixed interval between requests. Consider implementing exponential backoff to better handle potential API rate limits.

-        time.sleep(request_interval)
+        # Implement exponential backoff
+        retry_count = 0
+        while retry_count < 3:
+            try:
+                time.sleep(request_interval * (2 ** retry_count))
+                break
+            except Exception as e:
+                retry_count += 1
+                if retry_count == 3:
+                    raise e
smact/utils/crystal_space/generate_composition_with_smact.py (2)

Line range hint 77-86: Consider using context manager for multiprocessing pools.

Using a context manager with multiprocessing pools ensures proper cleanup of resources.

-    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count() if num_processes is None else num_processes)
-    compounds = list(
-        tqdm(
-            pool.imap_unordered(
-                partial(
-                    convert_formula,
-                    num_elements=num_elements,
-                    max_stoich=max_stoich,
-                ),
-                combinations,
-            ),
-            total=len(combinations),
-        )
-    )
-    pool.close()
-    pool.join()
+    with multiprocessing.Pool(processes=multiprocessing.cpu_count() if num_processes is None else num_processes) as pool:
+        compounds = list(
+            tqdm(
+                pool.imap_unordered(
+                    partial(
+                        convert_formula,
+                        num_elements=num_elements,
+                        max_stoich=max_stoich,
+                    ),
+                    combinations,
+                ),
+                total=len(combinations),
+            )
+        )

Line range hint 115-115: Remove redundant strict=False parameter in zip().

The strict parameter is redundant in Python 3.10+ as it's the default behaviour.

-            symbols_stoich = zip(res[0], res[2], strict=False)
+            symbols_stoich = zip(res[0], res[2])
docs/tutorials/crystal_space_visualisation.ipynb (2)

Line range hint 32-38: Consider optimising the embedding computation for large datasets.

The current implementation processes embeddings one at a time, which could be inefficient for large datasets. Consider vectorising the computation or processing in batches.

-    embeddings = []
-    for f in tqdm(formula):
-        try:
-            compositional_embedding = CompositionalEmbedding(f, embedding=embedding)
-            embeddings.append(compositional_embedding.feature_vector(stats=stats))
-        except Exception as e:
-            embeddings.append(np.full(embedding_dim, np.nan))
+    def process_batch(formulas, batch_size=100):
+        for i in range(0, len(formulas), batch_size):
+            batch = formulas[i:i + batch_size]
+            yield [
+                CompositionalEmbedding(f, embedding=embedding).feature_vector(stats=stats)
+                if f else np.full(embedding_dim, np.nan)
+                for f in batch
+            ]
+
+    embeddings = np.vstack(list(process_batch(formula)))

Also applies to: 39-40, 41-42


Line range hint 148-149: Consider adding error handling for specific exceptions.

The dimension reduction implementation could benefit from more specific error handling to provide better feedback to users.

     try:
         reducer = PCA(n_components=n_components, **kwargs)
     elif reducer == "tsne":
         reducer = TSNE(n_components=n_components, **kwargs)
     elif reducer == "umap":
         reducer = UMAP(n_components=n_components, **kwargs)
     else:
         raise ValueError("reducer must be one of ['pca', 'tsne', 'umap']")
+    try:
+        reduced_embeddings = reducer.fit_transform(embeddings.values)
+    except ValueError as e:
+        raise ValueError(f"Failed to reduce dimensions: {str(e)}")
+    except MemoryError:
+        raise MemoryError("Insufficient memory for dimension reduction. Consider reducing the dataset size.")

Also applies to: 150-151, 152-153, 154-155, 156-157, 158-159

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 691f9d4 and ac6604f.

📒 Files selected for processing (16)
  • docs/conf.py (3 hunks)
  • docs/examples.rst (0 hunks)
  • docs/examples/doper.ipynb (1 hunks)
  • docs/smact.utils.crystal_space.download_compounds_with_mp_api.rst (1 hunks)
  • docs/smact.utils.crystal_space.generate_composition_with_smact.rst (1 hunks)
  • docs/smact.utils.crystal_space.plot_embedding.rst (1 hunks)
  • docs/smact.utils.crystal_space.rst (1 hunks)
  • docs/smact.utils.oxidation.rst (1 hunks)
  • docs/smact.utils.rst (1 hunks)
  • docs/tutorials/crystal_space.ipynb (2 hunks)
  • docs/tutorials/crystal_space_visualisation.ipynb (1 hunks)
  • docs/tutorials/oxidation_states.ipynb (1 hunks)
  • requirements.txt (1 hunks)
  • smact/utils/crystal_space/download_compounds_with_mp_api.py (1 hunks)
  • smact/utils/crystal_space/generate_composition_with_smact.py (3 hunks)
  • smact/utils/oxidation.py (1 hunks)
💤 Files with no reviewable changes (1)
  • docs/examples.rst
✅ Files skipped from review due to trivial changes (7)
  • docs/smact.utils.oxidation.rst
  • docs/smact.utils.crystal_space.rst
  • docs/smact.utils.crystal_space.download_compounds_with_mp_api.rst
  • docs/smact.utils.crystal_space.plot_embedding.rst
  • docs/smact.utils.crystal_space.generate_composition_with_smact.rst
  • docs/smact.utils.rst
  • docs/tutorials/oxidation_states.ipynb
🔇 Additional comments (10)
requirements.txt (1)

224-231: Verify security advisories for pymatgen

The pymatgen package is a critical dependency. Given its recent version (2024.11.13), we should verify there are no known security issues.

✅ Verification successful

pymatgen 2024.11.13 is secure and up-to-date

The version 2024.11.13 of pymatgen being used is secure as it's well above the patched versions for all known vulnerabilities:

  • Critical vulnerability (arbitrary code execution) was patched in 2024.2.20
  • Moderate vulnerability (ReDoS) was patched in versions after 2022.9.21
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for security advisories for pymatgen

# Check PyPI for latest versions and potential security issues
curl -s https://pypi.org/pypi/pymatgen/json | jq '.info.version'

# Check for security advisories
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "pymatgen") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 949

docs/examples/doper.ipynb (1)

Line range hint 1-1: Verify integration with Crystal Space functionality

Given that this PR aims to document Crystal Space functions, consider adding examples or references showing how the Doper module integrates with Crystal Space functionality, if applicable.

smact/utils/crystal_space/download_compounds_with_mp_api.py (1)

13-13: LGTM! Import statement has been simplified.

The import statement has been updated to use a more concise path whilst maintaining the same functionality.

smact/utils/crystal_space/generate_composition_with_smact.py (2)

12-12: LGTM! Import statement has been simplified.

The import statement has been updated to use a more concise path whilst maintaining the same functionality.


21-21: LGTM! Return type hints added.

The addition of return type hints improves code clarity and IDE support.

Also applies to: 47-47

docs/tutorials/crystal_space.ipynb (1)

47-47: LGTM! Documentation hyperlinks updated.

The hyperlinks to source files have been updated to reflect the new file structure, improving navigation and documentation clarity.

Also applies to: 130-130

docs/tutorials/crystal_space_visualisation.ipynb (1)

14-14: LGTM: Prerequisite reference updated correctly.

The prerequisite reference has been updated from crystal_space_with_smact.ipynb to crystal_space.ipynb, which aligns with the PR objectives.

docs/conf.py (3)

48-48: LGTM: Configuration updated to use modern Sphinx notebook execution setting.

The change from jupyter_execute_notebooks to nb_execution_mode aligns with current Sphinx-notebook practices.


421-423: LGTM: Mock modules added for documentation build.

The addition of mp_api, mp_api.client, and emmet to MOCK_MODULES is appropriate for handling external dependencies during documentation builds.


157-157: Verify the impact of commenting out static path.

While commenting out html_static_path is valid, ensure that no static assets (CSS, images, etc.) are affected.

✅ Verification successful

Commenting out html_static_path has no impact on the codebase

The verification shows that:

  • No _static directory exists in the repository
  • The only references to static assets in documentation are images in dev_docs/design_docs/StructurePredictor.md, which use a different path (../images/)
  • The only occurrence of static_path is the commented line itself in conf.py

Therefore, commenting out the unused html_static_path configuration is safe and won't affect any static assets.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any static assets that might be affected
# Test: Look for references to static assets in the documentation
rg -g "*.rst" -g "*.md" -g "*.ipynb" "_static"

Length of output: 46


Script:

#!/bin/bash
# Check if _static directory exists and list its contents
fd -t d "_static"
fd -t f -p "_static"

# Check for any static file references in HTML templates or Python files
rg -g "*.html" -g "*.py" "_static"

# Check for any other common static asset patterns in docs
rg -g "*.rst" -g "*.md" -g "*.ipynb" -e "\.\./\S*\.(png|jpg|jpeg|gif|css|js)" -e "\.\./static" -e "\./_static"

# Check sphinx configuration for other static path configurations
rg -g "*.py" "static_path"

Length of output: 474

@@ -112,15 +111,15 @@ def get_species_occurrences_df(

def write(
self,
filename: str | os.PathLike,
filename: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider maintaining support for PathLike objects

The change to restrict the filename parameter to only strings reduces API flexibility. Python's path handling typically accepts both string and PathLike objects for better interoperability.

Apply this diff to restore PathLike support:

-        filename: str,
+        filename: str | os.PathLike,

And update the docstring:

-            filename (str): The filename to write the filtered oxidation states list to.
+            filename (str | os.PathLike): The filename to write the filtered oxidation states list to.

Also applies to: 122-122

Copy link

codecov bot commented Dec 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.44%. Comparing base (691f9d4) to head (9c823f4).
Report is 18 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #337      +/-   ##
==========================================
- Coverage   77.45%   77.44%   -0.01%     
==========================================
  Files          31       31              
  Lines        2590     2589       -1     
==========================================
- Hits         2006     2005       -1     
  Misses        584      584              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AntObi AntObi merged commit 70122e8 into master Dec 1, 2024
13 checks passed
@AntObi AntObi deleted the documentation_fixes branch December 1, 2024 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant