Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] 整合类似chatpdf的功能 #960

Closed
wwy94621 opened this issue Apr 21, 2023 · 33 comments
Closed

[Feature] 整合类似chatpdf的功能 #960

wwy94621 opened this issue Apr 21, 2023 · 33 comments

Comments

@wwy94621
Copy link

能不能整合类似chatpdf的功能?比如新建聊天时可以选中文件,然后基于文件开始聊天。
OpenAI的GitHub上有基本的实现,但是UI太差了。

@wwy94621 wwy94621 changed the title [Feature] [Feature] 整合类似chatpdf的功能 Apr 21, 2023
@Yidadaa
Copy link
Collaborator

Yidadaa commented Apr 21, 2023

这种功能需要建立向量索引,在纯前端比较难搞,你看到的开源项目都是把文件传到服务器做处理。

不过我在研究怎么在浏览器里跑 embedding 模型,之后会加此功能。

@wwy94621
Copy link
Author

那个项目是用纯next.js做的,没有连矢量数据库。自己部署肯定是没问题的,难道Vercel不行吗?我还以为可以比较容易的整合进来呢!

@yaoleifly
Copy link

这个功能实现就太感谢开发者了

@Yidadaa
Copy link
Collaborator

Yidadaa commented Apr 21, 2023

@wwy94621
Copy link
Author

嗯嗯!确实,不过反正是用户提供自己的Key嘛!这样可以比较快的来实现,多谢大神!

@jun0315
Copy link

jun0315 commented Apr 21, 2023

https://github.com/talkingwallace/ChatGPT-Paper-Reader
https://github.com/binary-husky/chatgpt_academic
https://github.com/mukulpatnaik/researchgpt
一些类似可以借鉴的项目

@reonokiy
Copy link

reonokiy commented Apr 22, 2023

LangChain 有 Python 和 JS 的版本,可以和很多信息源集成
https://github.com/hwchase17/langchain
https://github.com/hwchase17/langchainjs
可以在Browser运行 https://js.langchain.com/docs/getting-started/install#browser

@Yidadaa
Copy link
Collaborator

Yidadaa commented Apr 22, 2023

@sperjar 不会使用 langchain,这个库过于重了。

我重申一遍,这个功能实现起来并不难,只需要解析 pdf 内容,然后调 openai 接口进行向量化,然后再去做检索就行了。

这个功能现在没做的原因是优先级比较低,我正筹备 2.0 版本的开发,v2.0 的重磅功能是预设角色,chatpdf 的功能会归到外挂知识库的需求里去做,可能是 v2.5,也可能是 v3.0,可以确认的是近期不会实现该功能。

@RudRho
Copy link

RudRho commented Apr 27, 2023

@sperjar 不会使用 langchain,这个库过于重了。

我重申一遍,这个功能实现起来并不难,只需要解析 pdf 内容,然后调 openai 接口进行向量化,然后再去做检索就行了。

这个功能现在没做的原因是优先级比较低,我正筹备 2.0 版本的开发,v2.0 的重磅功能是预设角色,chatpdf 的功能会归到外挂知识库的需求里去做,可能是 v2.5,也可能是 v3.0,可以确认的是近期不会实现该功能。

感谢大神,@Yidadaa,一些想法:

  • llama-index 13k stars 可以看一下,比langchain 轻很多,支持本地化 embedding.
  • 「重磅功能是预设角色」 现在各家都在做角色预设,但是我看不清角色预设之后的好处,能不能简单讲一下想法。感谢!

@Yidadaa
Copy link
Collaborator

Yidadaa commented Apr 27, 2023

预设角色的用处: #138

https://www.allabtai.com/prompt-engineering-tips-zero-one-and-few-shot-prompting/

别人的预设角色只不过是预设一个 prompt,你可以列几个竞品,应该功能都没我的好。

@JiangYain
Copy link

@Yidadaa 老师您好,以下为羊驼索引Llamindex的参考链接请您参阅:https://gpt-index.readthedocs.io/en/latest/index.html;我目前已经尝试使用Llamaindex0.6.9构建了一个侧边栏插件(不过只能在谷歌114Beta上运行side panel,且基于本地)和您的项目(最重要的是mask功能)一起配合使用,由于羊驼索引有太多的index方式,比如关键词、树索引、向量索引等等,且目前index还可以进行嵌套等等,除了index也有很多需要深度开发的部分,所以在我认为这个项目目前如果只是使用会很简单,但是想要使用的好会很有难度,我支持你的想法:即“chatpdf 的功能会归到外挂知识库的需求里去做,可能是 v2.5,也可能是 v3.0,可以确认的是近期不会实现该功能”,这个项目现在很活跃基本几天就是一个更新,在给Llamaindex一点时间,让子弹飞一会

希望以上链接能给你一些帮助,至于有人偷盗公众号文章私自转载这件事,希望老师您不要放在心上,如果需要额外的经济支持我愿意尽一些微薄之力!祝你开心

@pptt121212
Copy link

pptt121212 commented Jun 14, 2023

这种功能需要建立向量索引,在纯前端比较困难的情况下,你看到的源项目都是把文件传到服务器做处理。

不过我在研究怎么在浏览器里跑嵌入模型,之后会增加这个功能。

PDF文本总结应该是将PDF分段总结后在内存里临时存放,最后输出最终总结结果。和向量检索PDF里的段落应该是两个方向的方案。

@Yidadaa
Copy link
Collaborator

Yidadaa commented Jun 17, 2023

技术选型:

@Yidadaa
Copy link
Collaborator

Yidadaa commented Jun 17, 2023

此功能将于 v2.9 版本加入。

@daiaji
Copy link

daiaji commented Jun 18, 2023

这种功能需要建立向量索引,在纯前端比较难搞,你看到的开源项目都是把文件传到服务器做处理。

不过我在研究怎么在浏览器里跑 embedding 模型,之后会加此功能。

用js跑embedding模型?纯前端听起来很酷。

@daiaji
Copy link

daiaji commented Jun 21, 2023

此功能将于 v2.9 版本加入。

结合现有的历史摘要功能,是否可以实现把每一个生成的历史摘要向量化到向量数据库里,然后实现GPT对于整个事件的长期记忆,而不是只局限于上下文和近期的历史摘要?

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


This function will be added in v2.9 version.

Combined with the existing historical summary function, is it possible to vectorize each generated historical summary into a vector database, and then realize GPT's long-term memory for the entire event, instead of being limited to context and recent historical summaries?

@daiaji
Copy link

daiaji commented Jun 21, 2023

此功能将于 v2.9 版本加入。

结合现有的历史摘要功能,是否可以实现把每一个生成的历史摘要向量化到向量数据库里,然后实现GPT对于整个事件的长期记忆,而不是只局限于上下文和近期的历史摘要?

从这个实践的最后示例来看,似乎是可行的。
https://github.com/TomLBZ/koishi-plugin-openai

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


This function will be added in v2.9 version.

Combined with the existing historical summary function, is it possible to vectorize each generated historical summary into a vector database, and then realize GPT's long-term memory for the entire event, instead of being limited to context and recent historical summaries?

From this last example of practice, the market works.
https://github.com/TomLBZ/koishi-plugin-openai

@alanwu4321
Copy link

bump

@vual

This comment was marked as abuse.

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


Hello teacher @Yidadaa, please refer to the following reference link for the alpaca index Llamindex: [https://gpt-index.readthedocs.io/en/latest/index.html; I have tried to use Llamindex0.6.9 to build Added a sidebar plugin (but only works on Google 114Beta side](https://gpt-index.readthedocs.io/en/latest/index.html%EF%BC%9B%E6%88%91% E7%9B%AE%E5%89%8D%E5%B7%B2%E7%BB%8F%E5%B0%9D%E8%AF%95%E4%BD%BF%E7%94%A8Llamaindex0.6.9% E6%9E%84%E5%BB%BA%E4%BA%86%E4%B8%80%E4%B8%AA%E4%BE%A7%E8%BE%B9%E6%A0%8F%E6% 8F%92%E4%BB%B6%EF%BC%88%E4%B8%8D%E8%BF%87%E5%8F%AA%E8%83%BD%E5%9C%A8%E8%B0% B7%E6%AD%8C114Beta%E4%B8%8A%E8%BF%90%E8%A1%8Cside) panel, and based on local) and your project (the most important is the mask function), because the sheep Camel index has too many index methods, such as keywords, tree index, vector index, etc., and currently index can also be nested, etc. In addition to index, there are many parts that need in-depth development, so in my opinion, if this project is currently It’s very simple to use, but it’s very difficult to use it well. I support your idea: “The function of chatpdf will be included in the requirements of the plug-in knowledge base, which may be v2.5 or v3 .0, it can be confirmed that this function will not be implemented in the near future", this project is very active now, basically an update for a few days, give Llamaindex a little time, let the bullets fly for a while

I hope the above link can give you some help. As for the fact that someone stole the article from the official account and reposted it privately, I hope you don’t take it to heart. If you need additional financial support, I am willing to do some modest efforts! wish you happy

You made the plug-in, do you have a demo address? Can I see the effect?

@johnfelipe
Copy link
Contributor

i want to know if in roadmap upload or link pdf editable files?

@maristeslk
Copy link

后续会支持接入 azure embedding模型吗?

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


Will it support access to the azure embedding model in the future?

@iccyuan
Copy link

iccyuan commented Aug 8, 2023

看到有个网站依赖 https://qdrant.tech 实现

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


I saw that there is a website that relies on to achieve

@whl1207
Copy link

whl1207 commented Aug 14, 2023

纯前端做,我用了nlp.js匹配问题和知识库的相关性,然后给到提示词里面,但参数有些难调

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


Purely front-end, I used nlp.js to match the correlation between the problem and the knowledge base, and then gave it to the prompt word, but the parameters are a bit difficult to adjust

@johnfelipe
Copy link
Contributor

Witch % is complete this feature?

@tatakof
Copy link

tatakof commented Jun 27, 2024

any updates on this?

pd: superb project!

@lloydzhou
Copy link
Contributor

  1. 放弃在项目中跑embedding的想法
  2. v2.15.0支持使用插件的方式调用外部知识库
  3. 这里有一个将fastgpt包装成插件使用的示例: https://github.com/ChatGPTNextWeb/NextChat-Awesome-Plugins/tree/main/plugins/fastgpt

@lloydzhou
Copy link
Contributor

new plugin:

https://github.com/ChatGPTNextWeb/NextChat-Awesome-Plugins/tree/main/plugins/chatpdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests