-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
COG-989 feat: make tasks a configurable argument in the cognify function #442
Conversation
WalkthroughThe changes modify the Changes
Possibly related issues
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (18)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
cognee/api/v1/cognify/cognify_v2.py (4)
39-39
: UseList
fromtyping
module for better type safety.Consider using
List[Task]
instead oflist[Task]
for better compatibility with older Python versions and static type checkers.- tasks: list[Task] = None, + tasks: List[Task] = None,
71-71
: Track the UI lock TODO comment.There's an unaddressed TODO comment about adding a UI lock to prevent multiple backend requests. This should be tracked for future implementation.
Would you like me to create a GitHub issue to track this TODO item?
101-106
: Enhance task validation with more descriptive error messages.The validation logic could be improved with more descriptive error messages and moved to a separate function for reusability.
Consider refactoring to:
+def validate_tasks(tasks: List[Task]) -> None: + """Validate that tasks is a list of Task instances.""" + if not isinstance(tasks, list): + raise ValueError("Expected tasks to be a list, got {type(tasks)}") + + for idx, task in enumerate(tasks): + if not isinstance(task, Task): + raise ValueError( + f"Task at index {idx} is not a Task instance: {type(task)}" + ) async def run_cognify_pipeline(dataset: Dataset, user: User, tasks: list[Task]): # ... - if not isinstance(tasks, list): - raise ValueError("Tasks must be a list") - - for task in tasks: - if not isinstance(task, Task): - raise ValueError(f"Task {task} is not an instance of Task") + validate_tasks(tasks)
143-167
: Add docstring and consider task configuration validation.The function would benefit from documentation explaining the default task sequence and its purpose. Also, consider validating task configurations.
Add a docstring and enhance error handling:
async def get_default_tasks( user: User = None, graph_model: BaseModel = KnowledgeGraph ) -> list[Task]: + """Generate the default sequence of tasks for the cognify pipeline. + + The default sequence includes: + 1. Document classification + 2. Permission checking + 3. Chunk extraction + 4. Graph extraction + 5. Text summarization + + Args: + user: The user context for permission checking. Defaults to default user. + graph_model: The graph model to use. Defaults to KnowledgeGraph. + + Returns: + List[Task]: The default sequence of tasks. + + Raises: + ConfigError: If cognify configuration is invalid + ValueError: If task configuration is invalid + """ if user is None: user = await get_default_user() try: cognee_config = get_cognify_config() + # Validate configuration + if not cognee_config.summarization_model: + raise ValueError("Summarization model not configured") + default_tasks = [ Task(classify_documents), Task(check_permissions_on_documents, user=user, permissions=["write"]), Task(extract_chunks_from_documents), Task( extract_graph_from_data, graph_model=graph_model, task_config={"batch_size": 10} ), Task( summarize_text, summarization_model=cognee_config.summarization_model, task_config={"batch_size": 10}, ), ] - except Exception as error: + except (ConfigError, ValueError) as error: send_telemetry("cognee.cognify DEFAULT TASKS CREATION ERRORED", user.id) raise error return default_tasks
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/api/v1/cognify/cognify_v2.py
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: docker-compose-test
🔇 Additional comments (2)
cognee/api/v1/cognify/cognify_v2.py (2)
59-61
: LGTM! Good handling of default tasks.The code properly handles the case when tasks are not provided by fetching default tasks.
39-39
: Verify all callers are updated for the signature changes.The signatures of both
cognify
andrun_cognify_pipeline
have changed. Let's verify all callers are updated.Also applies to: 71-71
✅ Verification successful
All callers are compatible with the signature changes ✅
The
tasks
parameter is optional with a default value ofNone
, maintaining backward compatibility with all existing callers. No updates to the calling code are required.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for direct calls to these functions echo "Searching for cognify function calls..." rg -A 2 "cognify\(" --type py echo "Searching for run_cognify_pipeline function calls..." rg -A 2 "run_cognify_pipeline\(" --type pyLength of output: 4347
|
GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
---|---|---|---|---|---|
9573981 | Triggered | Generic Password | de85dfa | notebooks/cognee_graphiti_demo.ipynb | View secret |
8719688 | Triggered | Generic Password | de85dfa | notebooks/cognee_graphiti_demo.ipynb | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Summary by CodeRabbit
New Features
cognify
function with optional custom task list support.Improvements