Docs update #107

Merged
merged 4 commits into from Jun 9, 2024
40 changes: 37 additions & 3 deletions cognee/api/v1/topology/add_topology.py
@@ -1,10 +1,43 @@
import os
import json
import pandas as pd
from pydantic import BaseModel, Field
from typing import Any, Dict, List, Optional, Type, Union
from cognee.infrastructure.databases.graph.get_graph_client import get_graph_client
from cognee.modules.topology.topology import TopologyEngine, GitHubRepositoryModel
from cognee.infrastructure.databases.graph.config import get_graph_config


class Relationship(BaseModel):
type: str = Field(..., description="The type of relationship, e.g., 'belongs_to'.")
source: Optional[str] = Field(None, description="The identifier of the source node in the relationship, e.g. a directory or subdirectory.")
target: Optional[str] = Field(None, description="The identifier of the target node in the relationship, e.g. a directory, subdirectory, or file.")
properties: Optional[Dict[str, Any]] = Field(None, description="A dictionary of additional properties and values related to the relationship.")

class JSONEntity(BaseModel):
name: str
set_type_as: Optional[str] = None
property_columns: List[str]
description: Optional[str] = None
Comment on lines +25 to +29 (Contributor):
Consider adding validation for optional fields in JSONEntity to ensure data integrity.
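One way to act on this suggestion — shown here as a sketch with a stdlib dataclass stand-in rather than the pydantic models above, since validator syntax differs between pydantic versions — is to accept `None` for optional fields but reject empty or whitespace-only strings:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EntitySketch:
    # Hypothetical stand-in for JSONEntity; pydantic would express the
    # same checks as field validators on the model.
    name: str
    set_type_as: Optional[str] = None
    description: Optional[str] = None

    def __post_init__(self):
        # Optional fields may be absent (None), but if supplied they must
        # carry real content, not an empty or whitespace-only string.
        for field_name in ("set_type_as", "description"):
            value = getattr(self, field_name)
            if value is not None and not value.strip():
                raise ValueError(f"{field_name} must be None or non-empty")
```

The same rule would apply unchanged to the optional `description` field of JSONPattern below.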


class JSONPattern(BaseModel):
head: str
relation: str
tail: str
description: Optional[str] = None
Comment on lines +31 to +35 (Contributor):
Similar to JSONEntity, consider adding validation for optional fields in JSONPattern.


class JSONModel(BaseModel):
node_id: str
entities: List[JSONEntity]
patterns: List[JSONPattern]

USER_ID = "default_user"

async def add_topology(directory: str = "example", model: BaseModel = GitHubRepositoryModel) -> Any:
@@ -44,11 +77,12 @@ def flatten_repository(repo_model: BaseModel) -> List[Dict[str, Any]]:
""" Flatten the entire repository model, starting with the top-level model """
return recursive_flatten(repo_model)

flt_topology = flatten_repository(topology)
async def add_graph_topology():

flt_topology = flatten_repository(topology)

df = pd.DataFrame(flt_topology)
df = pd.DataFrame(flt_topology)
Comment on lines +80 to +84 (Contributor):
The add_graph_topology function is complex. Consider refactoring to improve readability and maintainability.

Ruff: 84-84: Local variable df is assigned to but never used (F841)

print(df.head(10))

for _, row in df.iterrows():
node_data = row.to_dict()
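The flatten-then-load pattern in this hunk — recursively walking a nested repository model into flat rows, then iterating those rows to create graph nodes — can be illustrated with plain dicts (a sketch only; the field names below are hypothetical, not the actual GitHubRepositoryModel schema):

```python
from typing import Any, Dict, List

def flatten_tree(node: Dict[str, Any], parent_id: str = "") -> List[Dict[str, Any]]:
    # Each node becomes one flat row; children recurse with a path-like id,
    # mirroring how flatten_repository linearises the repository model.
    node_id = f"{parent_id}/{node['name']}" if parent_id else node["name"]
    rows = [{"id": node_id, "type": node.get("type", "file")}]
    for child in node.get("children", []):
        rows.extend(flatten_tree(child, node_id))
    return rows

repo = {
    "name": "repo", "type": "directory",
    "children": [
        {"name": "src", "type": "directory",
         "children": [{"name": "main.py", "type": "file"}]},
    ],
}
rows = flatten_tree(repo)
# Each flat row can then be inserted as one graph node, as the loop over
# df.iterrows() does above.
```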
12 changes: 12 additions & 0 deletions cognee/infrastructure/data/chunking/LangchainChunkingEngine.py
@@ -29,6 +29,9 @@ def chunk_data(

if chunk_strategy == ChunkStrategy.CODE:
chunked_data = LangchainChunkEngine.chunk_data_by_code(source_data, chunk_size, chunk_overlap)

elif chunk_strategy == ChunkStrategy.LANGCHAIN_CHARACTER:
chunked_data = LangchainChunkEngine.chunk_data_by_character(source_data, chunk_size, chunk_overlap)
else:
chunked_data = DefaultChunkEngine.chunk_data_by_paragraph(source_data, chunk_size, chunk_overlap)
return chunked_data
@@ -50,3 +53,12 @@ def chunk_data_by_code(data_chunks, chunk_size, chunk_overlap, language=None):

return only_content

def chunk_data_by_character(self, data_chunks, chunk_size, chunk_overlap):
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
data = splitter.create_documents([data_chunks])

only_content = [chunk.page_content for chunk in data]

return only_content
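For intuition about what the character strategy produces, a stripped-down splitter with overlap can be written in plain Python (an illustration only, not langchain's RecursiveCharacterTextSplitter, which additionally prefers splitting at separator boundaries):

```python
from typing import List

def split_by_character(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    # Slide a fixed-size window over the text; consecutive chunks share
    # chunk_overlap characters so context spans chunk boundaries.
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_by_character("abcdefghij", chunk_size=4, chunk_overlap=2)
# chunks -> ["abcd", "cdef", "efgh", "ghij", "ij"]
```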
Comment on lines +56 to +63 (Contributor):
Well implemented method for character-based chunking. Consider adding unit tests to ensure its functionality.

Would you like me to help with writing the unit tests for this method?


1 change: 1 addition & 0 deletions cognee/shared/data_models.py
@@ -35,6 +35,7 @@ class ChunkStrategy(Enum):
PARAGRAPH = "paragraph"
SENTENCE = "sentence"
CODE = "code"
LANGCHAIN_CHARACTER = "langchain_character"

class MemorySummary(BaseModel):
""" Memory summary. """
1 change: 1 addition & 0 deletions docs/research.md
@@ -5,6 +5,7 @@ This page is dedicated to collecting research gathered in the past
This is not an exhaustive list, and PRs are welcome.

### Research Papers
- [2024/06/04] [Transformers and episodic memory](https://arxiv.org/abs/2405.14992)
- [2024/03/24] [Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs](https://arxiv.org/abs/2404.07103)
- [2024/03/24] [Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention](https://arxiv.org/abs/2404.07143)
- [2024/03/24] [Compound AI systems](https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/)