Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add chromadb store #86

Merged
merged 16 commits into from
Jul 23, 2024
Merged

add chromadb store #86

merged 16 commits into from
Jul 23, 2024

Conversation

yewentao256
Copy link
Collaborator

  • chromadb support
  • only roots have metadata

lazyllm/tools/rag/doc_impl.py Show resolved Hide resolved
similarity_func, mode, descend = self.registered_similarity[similarity_name]

if mode == "embedding":
assert self.embed, "Chosen similarity needs embed model."
assert len(query) > 0, "Query should not be empty."
query_embedding = ast.literal_eval(self.embed(query))
query_embedding = self.embed(query)
for node in nodes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里没有一个异步或并行化的处理么

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

异步或者并行我理解是性能方面的提升了,需要的话可以加一个,不过想着不影响功能就先没加和测试

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我这边记下了,P0做完后还有空间的话就加上


@metadata.setter
def metadata(self, metadata: Dict) -> None:
self._metadata = metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果非root节点,设置了metadata,要怎么读出来呀;下_excluded_embed_metadata_keys同

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

目前设计下,非root节点我们不会设置metadata

for group in node_groups
}
self.placeholder_length = len(embed("a"))
self.try_load_store()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

现有的逻辑是用到谁读取谁,新的逻辑是一次性读取到内层中,我思考一下哪个更合适

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

还有一个问题,既然这个函数定义在了基类作为抽象方法,那理应基类去调用,而不是子类去用;但子类不初始化又无法完成load,所以看看这里怎么设计合适

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

用谁读取谁的话,那一层的父子节点都没处理好,作为metadata str存储在node里,数据结构会出现不一致性,导致很多地方都会有问题。一次性读取O(N)处理好,性能上我理解也不会差很多(或者说性能优化主要点不在这里)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“还有一个问题,既然这个函数定义在了基类作为抽象方法,那理应基类去调用,而不是子类去用;但子类不初始化又无法完成load,所以看看这里怎么设计合适” 已讨论,暂不变更

if node.is_saved:
continue
if not node.has_embedding():
node.embedding = [-1] * self.placeholder_length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[-1] * self.placeholder_length直接在init的时候存好,不重复计算?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以的,已优化

@wzh1994 wzh1994 changed the title Ywt/feature chromadb add chromadb store Jul 23, 2024
@wzh1994 wzh1994 merged commit 1efc667 into LazyAGI:main Jul 23, 2024
7 checks passed
@wzh1994 wzh1994 mentioned this pull request Aug 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants