Merge branch 'main' into docsum_gt_merge
MSCetin37 authored Nov 15, 2024
2 parents 1b8e1b1 + 405a632 commit 0155165
Showing 10 changed files with 242 additions and 107 deletions.
4 changes: 2 additions & 2 deletions comps/cores/mega/gateway.py
@@ -1100,13 +1100,13 @@ def parser_input(data, TypeClass, key):
if isinstance(response, StreamingResponse):
return response
last_node = runtime_graph.all_leaves()[-1]
- response = result_dict[last_node]["text"]
+ response_content = result_dict[last_node]["choices"][0]["message"]["content"]
choices = []
usage = UsageInfo()
choices.append(
ChatCompletionResponseChoice(
index=0,
- message=ChatMessage(role="assistant", content=response),
+ message=ChatMessage(role="assistant", content=response_content),
finish_reason="stop",
)
)
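The gateway change above stops reading a flat `"text"` field and instead pulls the assistant message out of an OpenAI-style completion dict. A minimal sketch of the assumed response shape (node name and content are illustrative; the real code takes `runtime_graph.all_leaves()[-1]`):

```python
# Assumed shape of result_dict after the pipeline runs: the leaf node's
# output follows the OpenAI chat-completion layout rather than {"text": ...}.
result_dict = {
    "llm_node": {
        "choices": [
            {
                "message": {"role": "assistant", "content": "Hello!"},
                "finish_reason": "stop",
            }
        ]
    }
}

last_node = "llm_node"  # in the real code: runtime_graph.all_leaves()[-1]

# This is the indexing path the diff introduces:
response_content = result_dict[last_node]["choices"][0]["message"]["content"]
print(response_content)  # → Hello!
```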
5 changes: 3 additions & 2 deletions comps/dataprep/multimodal/redis/langchain/multimodal_utils.py
@@ -250,6 +250,7 @@ def extract_frames_and_generate_captions(
# Set up location to store frames and annotations
os.makedirs(output_dir, exist_ok=True)
os.makedirs(os.path.join(output_dir, "frames"), exist_ok=True)
+ is_video = os.path.splitext(video_path)[-1] == ".mp4"

# Load video and get fps
vidcap = cv2.VideoCapture(video_path)
@@ -294,8 +295,8 @@ def extract_frames_and_generate_captions(
"video_name": os.path.basename(video_path),
"b64_img_str": b64_img_str,
"caption": text,
- "time": mid_time_ms,
- "frame_no": frame_no,
+ "time": mid_time_ms if is_video else 0.0,
+ "frame_no": frame_no if is_video else 0,
"sub_video_id": idx,
}
)
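The diff above detects whether the input is a video by its `.mp4` extension and zeroes out the temporal fields for non-video (image) inputs, which have no frame position. A minimal sketch of that fallback, using a hypothetical `frame_metadata` helper (the real code builds this dict inline inside `extract_frames_and_generate_captions`):

```python
import os


def frame_metadata(video_path: str, mid_time_ms: float, frame_no: int) -> dict:
    """Hypothetical helper mirroring the diff: image inputs (anything
    without a .mp4 extension) get zeroed time/frame fields, since a
    still image has no temporal position in a video."""
    is_video = os.path.splitext(video_path)[-1] == ".mp4"
    return {
        "video_name": os.path.basename(video_path),
        "time": mid_time_ms if is_video else 0.0,
        "frame_no": frame_no if is_video else 0,
    }


print(frame_metadata("clip.mp4", 1500.0, 45))   # video keeps real values
print(frame_metadata("photo.png", 1500.0, 45))  # image falls back to zeros
```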
14 changes: 14 additions & 0 deletions comps/dataprep/neo4j/llama_index/README.md
@@ -1,5 +1,14 @@
# Dataprep Microservice with Neo4J

This Dataprep microservice performs:

- Graph extraction (entities, relationships, and descriptions) using an LLM
- Hierarchical_leiden clustering to identify communities in the knowledge graph
- Generation of a community summary for each community
- Storage of all of the above in the Neo4j graph DB

This microservice follows the GraphRAG approach defined in the Microsoft paper ["From Local to Global: A Graph RAG Approach to Query-Focused Summarization"](https://www.microsoft.com/en-us/research/publication/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization/), with some differences: 1) only level-zero cluster summaries are leveraged, and 2) the input context for final answer generation is trimmed to fit the maximum context length.
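The second difference noted above (trimming the input context) can be sketched as follows. This is an illustrative helper, not the microservice's actual implementation; the function name, character budget, and greedy strategy are assumptions:

```python
def trim_context(summaries: list[str], max_chars: int = 4000) -> str:
    """Illustrative sketch: concatenate level-zero community summaries,
    stopping before the final-answer prompt exceeds a maximum context
    budget (measured here in characters for simplicity; a real
    implementation would count tokens)."""
    kept, used = [], 0
    for summary in summaries:
        if used + len(summary) > max_chars:
            break  # drop remaining summaries rather than overflow
        kept.append(summary)
        used += len(summary)
    return "\n".join(kept)


# With a 7-character budget, only the first two 3-character summaries fit.
print(trim_context(["aaa", "bbb", "ccc"], max_chars=7))
```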

This dataprep microservice ingests the input files and uses an LLM (TGI, or an OpenAI model when OPENAI_API_KEY is set) to extract entities, relationships, and their descriptions to build a graph-based text index.

## Setup Environment Variables
@@ -78,6 +87,11 @@ curl -X POST \
http://${host_ip}:6004/v1/dataprep
```

Please note that clustering of the extracted entities and summarization happen during this data preparation step. As a result:

- Processing time can be long for large datasets: an LLM call is made to summarize each cluster, which may produce a large volume of LLM calls.
- The graph DB entity_info and Cluster data need to be cleaned if dataprep is run multiple times, since cluster numbering differs between consecutive runs and stale clusters will corrupt the results.

We support table extraction from PDF documents. You can specify process_table and table_strategy with the following commands. "table_strategy" refers to the strategy used to understand tables for table retrieval. As the setting progresses from "fast" to "hq" to "llm", the focus shifts toward deeper table understanding at the expense of processing speed. The default strategy is "fast".

Note: If you specify "table_strategy=llm", the TGI service will be used.