
Refactor media storage for scalability #45

Merged


sebastienbarbier (Member) commented Oct 9, 2024

Summary by CodeRabbit

  • New Features

    • Introduced a structured directory path for storing performance-related files.
    • Enhanced file management capabilities with automatic cleanup of directories upon deletion of performance instances.
    • Integrated AWS S3 for static and media file storage, allowing for larger file uploads.
    • Added a signal to automatically delete user directories when a profile is removed.
  • Bug Fixes

    • Fixed the admin % lighthouse inQueue value.
    • Corrected incident duration calculation.
  • Documentation

    • Updated the CHANGELOG.md to reflect notable changes and bug fixes for version 0.10.0.

sebastienbarbier added this to the v0.10.0 milestone Oct 9, 2024

coderabbitai bot commented Oct 9, 2024

Walkthrough

The changes in this pull request include updates to the CHANGELOG.md to document notable fixes in version 0.10.0, modifications to the Django settings in django/core/settings.py for AWS S3 integration, the introduction of a new MediaStorage class in django/core/storage_backends.py, and enhancements to the Performance model in django/performances/models.py by adding a method for directory path generation and improving the deletion process. Additionally, the save_report function in django/performances/api.py has been updated for improved file path handling.

Changes

| File path | Change summary |
| --- | --- |
| CHANGELOG.md | Updated for version 0.10.0 with bug fixes; modified version 0.9.2 for the incident duration fix. |
| django/core/settings.py | Integrated AWS S3 for static/media storage; updated STATIC_URL, MEDIA_URL, and file upload limits. |
| django/core/storage_backends.py | Added MediaStorage class for handling media files in S3. |
| django/performances/api.py | Updated save_report function for file path generation; streamlined filename error handling. |
| django/performances/models.py | Added directory_path method; enhanced delete method for directory cleanup; updated user_directory_path. |
| django/projects/models.py | Added directory_path method to Project class for directory path generation. |
| django/settings/apps.py | Introduced ready method in SettingsConfig class to load signals at application startup. |
| django/settings/models.py | Added directory_path method to Profile class for directory path generation. |
| django/settings/signals.py | Added delete_user_directory function to handle directory cleanup on Profile deletion. |
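The per-model directory_path() helpers listed above build a nested storage layout. A minimal sketch of that idea, assuming the formats that appear later in this review (user_{id} for a profile, user_{id}/prjct_{id} for a project); the perf_{id} segment for a performance is a hypothetical placeholder, since the PR's exact format is not shown here. Plain dataclasses stand in for the Django models.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class Profile:
    user_pk: int

    def directory_path(self) -> str:
        # Format taken from the review's Profile.directory_path suggestion
        return f"user_{self.user_pk}"


@dataclass
class Project:
    pk: int
    user_id: int

    def directory_path(self) -> str:
        # Storage names always use "/", so posixpath is used on every OS
        return posixpath.join(f"user_{self.user_id}", f"prjct_{self.pk}")


@dataclass
class Performance:
    pk: int
    project: Project

    def directory_path(self) -> str:
        # Hypothetical "perf_" segment nested under the owning project
        return posixpath.join(self.project.directory_path(), f"perf_{self.pk}")


project = Project(pk=7, user_id=3)
print(Profile(user_pk=3).directory_path())                   # user_3
print(Performance(pk=42, project=project).directory_path())  # user_3/prjct_7/perf_42
```

With every path rooted at user_{id}, a single recursive delete of a profile's folder would also cover everything stored below it.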

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant API
    participant PerformanceModel

    User->>API: Save Performance Report
    API->>PerformanceModel: Call save_report()
    PerformanceModel->>PerformanceModel: Generate directory path
    PerformanceModel->>PerformanceModel: Store report in directory
    PerformanceModel->>API: Return success
    API->>User: Confirm report saved


coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
CHANGELOG.md (2)

26-27: LGTM! Consider adding more details to the bug fix descriptions.

The additions to version 0.10.0 are well-formatted and follow the changelog guidelines. Good job on including the issue numbers for reference.

To improve clarity, consider adding brief explanations of the impact or importance of these fixes. For example:

- Fix admin % lighthouse inQueue value (#42) - Ensures accurate reporting of queued items
- Performance folder is not deleted with model (#44) - Prevents orphaned data and improves storage management

Line range hint 30-34: Enhance consistency and maintain chronological order in version 0.9.2.

While the new entry is relevant, there are a few suggestions to improve consistency:

  1. Consider using the emoji-based category system as seen in other versions. Replace "Improvements" with "🛠 Improvements".
  2. To maintain chronological order, place the new entry at the bottom of the list unless it's intentionally highlighted as the most recent change.
  3. For consistency with other entries, consider adding a brief description of the impact.

Here's a suggested revision:

## [0.9.2] - 2024-05-20
### 🛠 Improvements
- Minor modification to README.md (#34)
- Update dependencies (#36)
- Minor updates on website (#37)
- Fix incident duration calculation (#29) - Ensures accurate reporting of incident timeframes
django/performances/api.py (1)

Line range hint 63-67: Improve error handling in filename and screenshot generation

The current implementation uses bare except clauses, which can mask unexpected errors and make debugging more difficult. Consider improving the error handling in the following ways:

  1. For filename generation:
 try:
     filename = f'{data["audits"]["final-screenshot"]["details"]["timestamp"]}'
-except:
+except KeyError as e:
     filename = f'{timezone.now().timestamp()}'
+    print(f"Warning: Could not find timestamp in audit data. Using current time. Error: {e}")
  2. For screenshot generation:
 try:
     screenshot_as_a_string = data['audits']['final-screenshot']['details']['data']
     screenshot_as_a_string = screenshot_as_a_string.replace('data:image/jpeg;base64,', '')
     screenshot = default_storage.save(f'{path}/{filename}.jpg', ContentFile(base64.b64decode(screenshot_as_a_string)))
-except:
+except (KeyError, base64.binascii.Error) as e:
     screenshot = None
+    print(f"Warning: Failed to generate screenshot. Error: {e}")

These changes will provide more specific error handling and logging, which can be helpful for debugging and monitoring the application's behavior.

Also applies to: 80-85
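The same suggestions can be folded into two small helpers that log through the logging module instead of print. A sketch, assuming the Lighthouse report is a parsed JSON dict with the audits/final-screenshot/details layout used above; report_filename and screenshot_bytes are hypothetical names, and time.time() stands in for timezone.now().timestamp() so the snippet runs outside Django.

```python
import base64
import binascii
import logging
import time
from typing import Optional

logger = logging.getLogger(__name__)

JPEG_PREFIX = "data:image/jpeg;base64,"


def report_filename(data: dict) -> str:
    """Prefer the screenshot timestamp; fall back to the current time."""
    try:
        return f'{data["audits"]["final-screenshot"]["details"]["timestamp"]}'
    except (KeyError, TypeError) as e:
        # time.time() stands in for timezone.now().timestamp() outside Django
        logger.warning("No timestamp in audit data, using current time: %r", e)
        return f"{time.time()}"


def screenshot_bytes(data: dict) -> Optional[bytes]:
    """Decode the base64 JPEG screenshot, or return None if it is unusable."""
    try:
        encoded = data["audits"]["final-screenshot"]["details"]["data"]
        # validate=True rejects non-base64 characters instead of skipping them
        return base64.b64decode(encoded.replace(JPEG_PREFIX, ""), validate=True)
    except (KeyError, TypeError, AttributeError, binascii.Error) as e:
        logger.warning("Could not decode screenshot: %r", e)
        return None


sample = {"audits": {"final-screenshot": {"details": {
    "timestamp": 1728460800000,
    "data": JPEG_PREFIX + base64.b64encode(b"jpeg").decode(),
}}}}
print(report_filename(sample))   # 1728460800000
print(screenshot_bytes(sample))  # b'jpeg'
```

The caller would then save the returned bytes with default_storage.save(...) only when screenshot_bytes() does not return None.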

🧰 Tools
🪛 Ruff

65-65: Do not use bare except

(E722)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 6e50aeb and 4fe749d.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • django/performances/api.py (1 hunks)
  • django/performances/apps.py (0 hunks)
  • django/performances/models.py (2 hunks)
💤 Files with no reviewable changes (1)
  • django/performances/apps.py
🧰 Additional context used
🔇 Additional comments (3)
CHANGELOG.md (1)

Line range hint 1-43: Overall, the CHANGELOG.md updates are well-structured and informative.

The changes made to the changelog effectively document the recent updates to the project. The use of issue numbers for reference is particularly helpful.

To further enhance the changelog:

  1. Maintain consistency in category labeling across versions (e.g., using emojis).
  2. Consider adding brief explanations of the impact of each change when possible.
  3. Ensure chronological order within each version's list of changes.

These minor adjustments will make the changelog even more valuable for users and contributors.

django/performances/api.py (2)

Line range hint 1-115: Overall assessment of changes in django/performances/api.py

The changes made to the save_report function, specifically the use of performance.directory_path(), are appropriate and likely improve code maintainability. The core functionality of the function remains intact, and the change is consistent with good coding practices.

To further enhance the code quality, consider implementing the suggested improvements in error handling for both filename and screenshot generation. These changes will make the code more robust and easier to debug.

🧰 Tools
🪛 Ruff

65-65: Do not use bare except

(E722)


62-62: Approve the use of performance.directory_path()

The change to use performance.directory_path() for generating the file path is a good improvement. It encapsulates the path generation logic within the Performance model, which can lead to better maintainability and consistency across the codebase.

To ensure the correctness of this change, please verify the implementation of the directory_path() method in the Performance model. Run the following script to check its implementation:

Additionally, check if this method is used consistently across the codebase:

✅ Verification successful

Verified the implementation and usage of directory_path()

The directory_path() method is properly implemented in the Performance model within django/performances/models.py and is consistently used across the codebase, including in django/performances/api.py. No issues were found related to this change.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of the directory_path() method in the Performance model

# Test: Search for the directory_path method in the Performance model
ast-grep --lang python --pattern 'class Performance($$$):
  $$$
  def directory_path(self):
    $$$'

Length of output: 3879


Script:

#!/bin/bash
# Description: Check for consistent usage of directory_path() method

# Test: Search for other occurrences of directory_path() method
rg --type python 'directory_path\(\)'

Length of output: 70


Script:

#!/bin/bash
# Description: Retry searching for occurrences of directory_path() method without specifying file type

rg 'directory_path\(\)'

Length of output: 379


Script:

#!/bin/bash
# Description: Alternative approach to find usage of directory_path() method by searching Python files

# Find all Python files and search for directory_path()
fd --extension py | xargs rg 'directory_path\(\)'

Length of output: 405

django/performances/models.py (resolved)
django/performances/models.py (outdated, resolved)
django/performances/models.py (outdated, resolved)
sebastienbarbier changed the title from "Fix folder delete on Performance delete" to "Refactor media storage for scalability #44" Oct 9, 2024
sebastienbarbier changed the title from "Refactor media storage for scalability #44" to "Refactor media storage for scalability" Oct 9, 2024
coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
django/performances/models.py (1)

5-6: Remove unused import

The slugify function is imported but not used in the visible code. Consider removing this unused import to keep the codebase clean.

Apply this change:

from django.core.files.storage import default_storage
-from django.template.defaultfilters import slugify
🧰 Tools
🪛 Ruff

6-6: django.template.defaultfilters.slugify imported but unused

Remove unused import: django.template.defaultfilters.slugify

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 4fe749d and 395307c.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • django/core/settings.py (1 hunks)
  • django/core/storage_backends.py (1 hunks)
  • django/performances/models.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.md
🧰 Additional context used
🪛 Ruff
django/performances/models.py

6-6: django.template.defaultfilters.slugify imported but unused

Remove unused import: django.template.defaultfilters.slugify

(F401)

🔇 Additional comments (4)
django/performances/models.py (3)

50-54: Approved: Improved directory path generation

The directory_path method provides a consistent and secure way to generate directory paths for performances. By using primary keys instead of potentially sensitive information, it addresses the previous concern about PII exposure in file paths.


69-69: Approved: Consistent use of directory path

The update to user_directory_path function ensures consistency in directory path generation by utilizing the directory_path method of the Performance model. This change aligns with the improved approach to handle file paths securely.
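Django calls an upload_to callable with (instance, filename) and stores the file under the returned name. A sketch of how user_directory_path can delegate to the model, assuming the instance carrying the file exposes a performance attribute with directory_path(); that attribute name is an assumption, and SimpleNamespace objects stand in for real models.

```python
from types import SimpleNamespace


def user_directory_path(instance, filename):
    # Django passes the model instance and the uploaded file's name;
    # used as e.g. models.FileField(upload_to=user_directory_path)
    return f"{instance.performance.directory_path()}/{filename}"


# Stand-in objects instead of real models (hypothetical path value)
report = SimpleNamespace(
    performance=SimpleNamespace(directory_path=lambda: "user_3/prjct_7/perf_42")
)
print(user_directory_path(report, "1728460800000.json"))
# user_3/prjct_7/perf_42/1728460800000.json
```

Keeping the path logic in directory_path() means the upload location and the cleanup code in delete() cannot drift apart.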


55-64: ⚠️ Potential issue

Improve exception handling and adjust deletion order

While the current implementation includes exception handling, there are a few areas for improvement:

  1. The order of operations: Delete the directory before deleting the model instance to prevent orphaned files if directory deletion fails.
  2. Use proper logging instead of print statements for error handling.

Consider applying these changes:

+import logging

 def delete(self):
+    logger = logging.getLogger(__name__)
+    dir_path = self.directory_path()
+    if default_storage.exists(dir_path):
+        try:
+            # Storage.delete() is not recursive: the folder's contents are not removed
+            default_storage.delete(dir_path)
+        except Exception as e:
+            logger.error(f"Error deleting folder: {e}")
+            # Optionally, re-raise the exception if you want to prevent deletion
+            # raise e
     # Delete the model instance after files are deleted
     super().delete()
-    if default_storage.exists(self.directory_path()):
-        try:
-            # Deletes the folder and all its contents
-            default_storage.delete(self.directory_path())
-        except Exception as e:
-            print(f"Error deleting folder: {e}")

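With the re-raise enabled, a failed file deletion leaves the model instance in the database, so the deletion can be retried or the error handled appropriately. As written, with raise commented out, the error is only logged and the instance is still deleted.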

django/core/settings.py (1)

98-100: Verify the necessity of STATIC_ROOT when using S3 for static files

When using Amazon S3 as the storage backend for static files (STATICFILES_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'), the collectstatic command uploads files directly to S3. Setting STATIC_ROOT may not be necessary because static files are not collected into a local directory. Please confirm if STATIC_ROOT = os.path.join(BASE_DIR, 'collectstatic') is required in this context.
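For reference, Django 4.2 added the STORAGES setting, which replaces STATICFILES_STORAGE and DEFAULT_FILE_STORAGE (both removed in Django 5.1). A sketch of how the S3 setup might be expressed with it, assuming a USE_S3 environment toggle and the dotted path core.storage_backends.MediaStorage, neither of which is confirmed by this PR. STATIC_ROOT only appears on the local branch, where collectstatic copies files into it; the fail-fast check mirrors the AWS_S3_CUSTOM_DOMAIN validation discussed in this review.

```python
import os

# Django's own default backends for the local (non-S3) case
LOCAL_STORAGES = {
    "default": {"BACKEND": "django.core.files.storage.FileSystemStorage"},
    "staticfiles": {"BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage"},
}


def storage_settings(env, base_dir="."):
    """Return storage-related settings for local disk or S3 (sketch)."""
    if env.get("USE_S3", "false").lower() != "true":
        return {
            "STORAGES": LOCAL_STORAGES,
            # collectstatic copies files here only for the local backend
            "STATIC_ROOT": os.path.join(base_dir, "collectstatic"),
            "STATIC_URL": "/static/",
            "MEDIA_URL": "/media/",
        }

    # Fail fast at startup; a real settings.py would typically raise
    # django.core.exceptions.ImproperlyConfigured instead of RuntimeError
    missing = [k for k in ("AWS_STORAGE_BUCKET_NAME", "AWS_S3_CUSTOM_DOMAIN") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing S3 settings: {', '.join(missing)}")

    domain = env["AWS_S3_CUSTOM_DOMAIN"]
    return {
        "AWS_STORAGE_BUCKET_NAME": env["AWS_STORAGE_BUCKET_NAME"],
        "AWS_S3_CUSTOM_DOMAIN": domain,
        "STORAGES": {
            # Assumed dotted path for the PR's MediaStorage class
            "default": {"BACKEND": "core.storage_backends.MediaStorage"},
            "staticfiles": {
                "BACKEND": "storages.backends.s3boto3.S3Boto3Storage",
                "OPTIONS": {"location": "static"},
            },
        },
        "STATIC_URL": f"https://{domain}/static/",
        "MEDIA_URL": f"https://{domain}/media/",
    }


# In settings.py, one option: globals().update(storage_settings(os.environ, BASE_DIR))
```

Wrapping the branches in a function keeps them testable; inlining the same values at module level in settings.py works just as well.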

django/core/storage_backends.py (resolved)
django/core/settings.py (resolved)
coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 395307c and ec56b7e.

📒 Files selected for processing (2)
  • django/core/settings.py (1 hunks)
  • django/performances/models.py (2 hunks)
🧰 Additional context used
🪛 Ruff
django/performances/models.py

6-6: django.template.defaultfilters.slugify imported but unused

Remove unused import: django.template.defaultfilters.slugify

(F401)

🔇 Additional comments (4)
django/performances/models.py (2)

69-69: LGTM: Improved user_directory_path function

The updated user_directory_path function looks good. It correctly utilizes the new directory_path method of the performance instance, resulting in a more consistent and maintainable directory structure for uploaded files.


55-63: ⚠️ Potential issue

Improve robustness and error handling in delete method

The current implementation of the delete method has some potential issues:

  1. The order of operations could lead to orphaned files if directory deletion fails.
  2. Using print for error handling is not suitable for production code.
  3. The method might not work correctly with all storage backends.

Here's a suggested implementation that addresses these issues:

import logging
import posixpath

from django.core.files.storage import default_storage

logger = logging.getLogger(__name__)

def _delete_tree(path):
    # Storage.delete() is not recursive, so walk the tree depth first
    dirs, files = default_storage.listdir(path)
    for file in files:
        default_storage.delete(posixpath.join(path, file))
    for subdir in dirs:
        _delete_tree(posixpath.join(path, subdir))
    # Remove the now-empty directory (typically a no-op on prefix-based backends such as S3)
    default_storage.delete(path)

def delete(self, *args, **kwargs):
    dir_path = self.directory_path()
    if default_storage.exists(dir_path):
        try:
            _delete_tree(dir_path)
        except Exception:
            logger.exception("Error deleting directory %s", dir_path)
            raise  # Re-raise the exception to prevent deletion of the model instance
    # Delete the model instance after files are deleted
    return super().delete(*args, **kwargs)

This implementation:

  • Attempts to delete the directory and its contents before deleting the model instance.
  • Uses proper logging for error handling.
  • Implements a more thorough directory deletion process that should work with various storage backends.
  • Re-raises the exception if directory deletion fails, so the model instance is kept rather than deleted while its files remain orphaned.

To ensure this implementation works correctly with your storage backend, you may want to test it thoroughly, especially if you're using a cloud storage service like S3.
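One way to test the deletion logic without S3 is a small in-memory stand-in that mimics the parts of Django's Storage API used here: exists(), listdir() returning (dirs, files), and delete() removing a single name. The InMemoryStorage class below is a hypothetical test double, not django-storages' behaviour. In particular, it treats a directory prefix as existing, whereas with S3-backed storage a "directory" is only a key prefix, so exists() on it may return False and silently skip the cleanup; that is worth checking against the real backend.

```python
import posixpath


class InMemoryStorage:
    """Tiny stand-in for a Storage backend; names are '/'-separated."""

    def __init__(self, names):
        self.files = set(names)

    def exists(self, name):
        prefix = name.rstrip("/") + "/"
        return name in self.files or any(f.startswith(prefix) for f in self.files)

    def listdir(self, path):
        prefix = path.rstrip("/") + "/"
        dirs, files = set(), set()
        for name in self.files:
            if name.startswith(prefix):
                head, sep, _ = name[len(prefix):].partition("/")
                (dirs if sep else files).add(head)
        return sorted(dirs), sorted(files)

    def delete(self, name):
        # Like Storage.delete(): removes one name, never a whole tree
        self.files.discard(name)


def delete_tree(storage, path):
    """Delete every file below `path`, depth first."""
    dirs, files = storage.listdir(path)
    for file in files:
        storage.delete(posixpath.join(path, file))
    for subdir in dirs:
        delete_tree(storage, posixpath.join(path, subdir))
    # On FileSystemStorage this removes the now-empty folder; here it is a no-op
    storage.delete(path)


storage = InMemoryStorage({
    "user_3/prjct_7/perf_42/a.json",
    "user_3/prjct_7/perf_42/a.jpg",
    "user_3/prjct_7/perf_43/b.json",
})
delete_tree(storage, "user_3/prjct_7/perf_42")
print(sorted(storage.files))  # ['user_3/prjct_7/perf_43/b.json']
```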

django/core/settings.py (2)

98-98: Verify that STATIC_ROOT change to 'collectstatic' is appropriate

Changing STATIC_ROOT to os.path.join(BASE_DIR, 'collectstatic') may affect your deployment setup, especially if your deployment scripts or services expect static files in the previous directory (staticfiles). Please ensure that this change aligns with your deployment processes, and update any related configurations if necessary.


105-106: Validation check for AWS_S3_CUSTOM_DOMAIN added successfully

Adding the validation check for AWS_S3_CUSTOM_DOMAIN ensures that the application will alert you if the custom domain is not set when using S3 storage, preventing potential runtime errors due to misconfiguration.

django/performances/models.py (outdated, resolved)
django/performances/models.py (outdated, resolved)
coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
django/settings/models.py (1)

14-15: LGTM! Consider adding a docstring for clarity.

The directory_path method is a good addition to the Profile model. It provides a clean way to generate a unique directory path for each user, which can be useful for organizing user-specific files or data.

Consider adding a brief docstring to explain the purpose and return value of the method. For example:

def directory_path(self):
    """
    Generate a unique directory path for the user.
    
    Returns:
        str: A string in the format 'user_{user_id}'.
    """
    return f'user_{self.user.pk}'

This will improve code readability and make it easier for other developers to understand the method's purpose at a glance.

django/settings/signals.py (1)

1-7: Remove unused import

The django.conf.settings import on line 3 is not used in this file. Consider removing it to keep the imports clean and relevant.

Apply this diff to remove the unused import:

-from django.conf import settings
🧰 Tools
🪛 Ruff

3-3: django.conf.settings imported but unused

Remove unused import: django.conf.settings

(F401)

django/projects/models.py (1)

122-123: Approve the new directory_path method with suggestions for improvement.

The new directory_path method is a good addition to the Project model, providing a consistent way to generate unique directory paths for projects. However, consider the following suggestions to improve robustness:

  1. Add input validation to ensure the user and project primary keys (self.user_id, self.pk) are not None.
  2. Build the path with posixpath.join: storage names use forward slashes on every platform, whereas os.path.join would insert backslashes on Windows.

Here's an improved version:

import posixpath

def directory_path(self):
    if self.user_id is None or self.pk is None:
        raise ValueError("User or Project primary key is missing")
    return posixpath.join(f'user_{self.user_id}', f'prjct_{self.pk}')

This version handles potential errors and always joins path segments with forward slashes.
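Storage names (and S3 keys) always use forward slashes, while os.path.join uses the host separator, which is a backslash on Windows. The standard library's ntpath and posixpath modules (the implementations behind os.path on Windows and POSIX hosts) make the difference visible on any OS. project_directory_path is a hypothetical free-function version of the method above.

```python
import ntpath
import posixpath

# What os.path.join would produce on a Windows host vs. a POSIX host
print(ntpath.join("user_3", "prjct_7"))     # user_3\prjct_7
print(posixpath.join("user_3", "prjct_7"))  # user_3/prjct_7


def project_directory_path(user_id, pk):
    """Hypothetical stand-alone version of Project.directory_path()."""
    if user_id is None or pk is None:
        raise ValueError("User or Project primary key is missing")
    return posixpath.join(f"user_{user_id}", f"prjct_{pk}")
```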

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between ec56b7e and 4d604ec.

📒 Files selected for processing (5)
  • django/performances/models.py (2 hunks)
  • django/projects/models.py (1 hunks)
  • django/settings/apps.py (1 hunks)
  • django/settings/models.py (1 hunks)
  • django/settings/signals.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • django/performances/models.py
🧰 Additional context used
🪛 Ruff
django/settings/apps.py

8-8: settings.signals imported but unused

Remove unused import: settings.signals

(F401)

django/settings/signals.py

3-3: django.conf.settings imported but unused

Remove unused import: django.conf.settings

(F401)

🔇 Additional comments (3)
django/settings/apps.py (1)

7-8: LGTM! The ready method is correctly implemented.

The addition of the ready method in the SettingsConfig class is a good practice in Django for loading and registering signals when the application starts. This ensures that the signal handlers in settings.signals are properly set up.

Regarding the static analysis warning about the unused import:

  • This is a false positive. In Django, importing modules in the ready method is often done for their side effects (like registering signal handlers) rather than for direct usage.
  • The comment "# Load signals" provides clear context for why this import is necessary.

No changes are needed. The implementation is correct and follows Django best practices.

🧰 Tools
🪛 Ruff

8-8: settings.signals imported but unused

Remove unused import: settings.signals

(F401)

django/settings/signals.py (1)

9-10: LGTM: Signal receiver implementation

The @receiver decorator is correctly used to connect the delete_user_directory function to the post_delete signal of the Profile model. This ensures that user directories are cleaned up when a profile is deleted.

django/projects/models.py (1)

122-123: Verify the usage of the new directory_path method across the codebase.

The new directory_path method seems to be intended for use in other parts of the application, likely for file storage or organization. To ensure it's being used correctly and consistently, please verify its usage across the codebase.

Run the following script to check for usage of the directory_path method:

This will help identify where the method is being used and if there are any places where similar path construction is happening manually, which could be replaced with this new method.

✅ Verification successful

Usage of the directory_path method is verified across the codebase.

  • Found usages in:

    • django/performances/models.py
    • django/performances/api.py
    • django/settings/signals.py
  • No instances of manual path constructions were detected.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for usage of the directory_path method in the Project model

# Test: Search for method calls to directory_path on Project instances
rg --type python -A 5 $'\.directory_path\('

# Test: Search for string patterns that might be manually constructing similar paths
rg --type python -A 5 $'user_\d+/prjct_\d+'

Length of output: 151


Script:

#!/bin/bash
# Description: Recheck for usage of the directory_path method in the Project model using file extension

# Test: Search for method calls to directory_path on Project instances within .py files
rg '\.directory_path\(' --glob '*.py' -A 5

# Test: Search for string patterns that might be manually constructing similar paths within .py files
rg 'user_\d+/prjct_\d+' --glob '*.py' -A 5

Length of output: 2331

django/settings/models.py (resolved)
django/settings/signals.py (outdated, resolved)
sebastienbarbier merged commit 03be925 into develop Oct 9, 2024
2 checks passed
coderabbitai bot mentioned this pull request Oct 14, 2024