-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add chromadb store #86
Conversation
yewentao256
commented
Jul 17, 2024
- chromadb support
- only roots have metadata
similarity_func, mode, descend = self.registered_similarity[similarity_name] | ||
|
||
if mode == "embedding": | ||
assert self.embed, "Chosen similarity needs embed model." | ||
assert len(query) > 0, "Query should not be empty." | ||
query_embedding = ast.literal_eval(self.embed(query)) | ||
query_embedding = self.embed(query) | ||
for node in nodes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里没有一个异步或并行化的处理么
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
异步或者并行我理解是性能方面的提升了,需要的话可以加一个,不过想着不影响功能就先没加和测试
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我这边记下了,P0做完后还有空间的话就加上
|
||
@metadata.setter | ||
def metadata(self, metadata: Dict) -> None: | ||
self._metadata = metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
如果非root节点,设置了metadata,要怎么读出来呀;下_excluded_embed_metadata_keys同
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
目前设计下,非root节点我们不会设置metadata
for group in node_groups | ||
} | ||
self.placeholder_length = len(embed("a")) | ||
self.try_load_store() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
现有的逻辑是用到谁读取谁,新的逻辑是一次性读取到内层中,我思考一下哪个更合适
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
还有一个问题,既然这个函数定义在了基类作为抽象方法,那理应基类去调用,而不是子类去用;但子类不初始化又无法完成load,所以看看这里怎么设计合适
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
用谁读取谁的话,那一层的父子节点都没处理好,作为metadata str存储在node里,数据结构会出现不一致性,导致很多地方都会有问题。一次性读取O(N)处理好,性能上我理解也不会差很多(或者说性能优化主要点不在这里)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“还有一个问题,既然这个函数定义在了基类作为抽象方法,那理应基类去调用,而不是子类去用;但子类不初始化又无法完成load,所以看看这里怎么设计合适” 已讨论,暂不变更
lazyllm/tools/rag/store.py
Outdated
if node.is_saved: | ||
continue | ||
if not node.has_embedding(): | ||
node.embedding = [-1] * self.placeholder_length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[-1] * self.placeholder_length直接在init的时候存好,不重复计算?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
可以的,已优化