Merge pull request #421 from magicyuan876/main
Optimize the embedding similarity cache mechanism: add an LLM similarity check and improve caching
LarFii authored Dec 9, 2024
2 parents 67c4acb + 779ed60 commit 2644a29
Showing 6 changed files with 205 additions and 276 deletions.
README.md (1 addition & 5 deletions)
@@ -596,11 +596,7 @@ if __name__ == "__main__":
 | **enable\_llm\_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` |
 | **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese"}`: sets example limit and output language | `example_number: all examples, language: English` |
 | **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` |
-| **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains two parameters:
-- `enabled`: Boolean value to enable/disable caching functionality. When enabled, questions and answers will be cached.
-- `similarity_threshold`: Float value (0-1), similarity threshold. When a new question's similarity with a cached question exceeds this threshold, the cached answer will be returned directly without calling the LLM.
-
-Default: `{"enabled": False, "similarity_threshold": 0.95}` | `{"enabled": False, "similarity_threshold": 0.95}` |
+| **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains three parameters:<br>- `enabled`: Boolean value to enable/disable cache lookup functionality. When enabled, the system will check cached responses before generating new answers.<br>- `similarity_threshold`: Float value (0-1), similarity threshold. When a new question's similarity with a cached question exceeds this threshold, the cached answer will be returned directly without calling the LLM.<br>- `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, LLM will be used as a secondary check to verify the similarity between questions before returning cached answers. | Default: `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` |
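
For context, the option added above might be used like this; a minimal usage sketch, assuming only the constructor arguments visible in this diff (`"./rag_storage"` is a placeholder working directory, not from the PR):

```python
from lightrag import LightRAG

# Minimal usage sketch (not part of this diff): enable the question-answer
# cache and the new LLM similarity verification.
rag = LightRAG(
    working_dir="./rag_storage",
    embedding_cache_config={
        "enabled": True,               # check the cache before generating a new answer
        "similarity_threshold": 0.95,  # minimum similarity required for a cache hit
        "use_llm_check": True,         # secondary LLM check before returning a cached answer
    },
)
```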

## API Server Implementation

lightrag/lightrag.py (7 additions & 2 deletions)
@@ -87,7 +87,11 @@ class LightRAG:
     )
     # Default not to use embedding cache
     embedding_cache_config: dict = field(
-        default_factory=lambda: {"enabled": False, "similarity_threshold": 0.95}
+        default_factory=lambda: {
+            "enabled": False,
+            "similarity_threshold": 0.95,
+            "use_llm_check": False,
+        }
     )
     kv_storage: str = field(default="JsonKVStorage")
     vector_storage: str = field(default="NanoVectorDBStorage")
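
The field above only carries configuration; the lookup it controls works roughly as follows. A minimal sketch, assuming cosine similarity between the new question's embedding and cached question embeddings (the helper name and cache shape are illustrative, not LightRAG's internal API):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def lookup_cached_answer(query_emb: np.ndarray, cache: list, threshold: float = 0.95):
    """Return the cached (question, answer) whose question embedding clears
    the similarity threshold, else None.

    `cache` holds (question, question_embedding, answer) tuples; this shape
    is illustrative only.
    """
    best_item, best_score = None, -1.0
    for question, emb, answer in cache:
        score = cosine_similarity(query_emb, emb)
        if score > best_score:
            best_item, best_score = (question, answer), score
    if best_item is not None and best_score >= threshold:
        return best_item  # cache hit: answer can be returned without calling the LLM
    return None
```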
@@ -174,7 +178,6 @@ def __post_init__(self):
             if self.enable_llm_cache
             else None
         )
-
         self.embedding_func = limit_async_func_call(self.embedding_func_max_async)(
             self.embedding_func
         )
@@ -481,6 +484,7 @@ async def aquery(self, query: str, param: QueryParam = QueryParam()):
                 self.text_chunks,
                 param,
                 asdict(self),
+                hashing_kv=self.llm_response_cache,
             )
         elif param.mode == "naive":
             response = await naive_query(
@@ -489,6 +493,7 @@ async def aquery(self, query: str, param: QueryParam = QueryParam()):
                 self.text_chunks,
                 param,
                 asdict(self),
+                hashing_kv=self.llm_response_cache,
             )
         else:
             raise ValueError(f"Unknown mode {param.mode}")
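
Per the README entry above, `use_llm_check` adds a secondary verification gate before a cached answer is returned. A rough sketch of that gate, where `llm_func` is any async prompt-to-string callable and the prompt wording is an assumption, not the exact code shipped in this PR:

```python
async def llm_similarity_check(llm_func, new_question: str, cached_question: str) -> bool:
    """Ask the LLM to confirm that two questions ask for the same information.

    `llm_func` and the prompt below are illustrative; they are not the exact
    code added by this PR.
    """
    prompt = (
        "Do the following two questions ask for the same information? "
        "Answer only 'yes' or 'no'.\n"
        f"Question 1: {new_question}\n"
        f"Question 2: {cached_question}"
    )
    response = await llm_func(prompt)
    return response.strip().lower().startswith("yes")
```

Only when such a check passes would the cached answer be returned; otherwise the query falls through to normal generation.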
