
Is it possible to use open source embedding methods rather than OpenAIEmbeddings? #7619

Closed
3 of 14 tasks
ykemiche opened this issue Jul 12, 2023 · 2 comments
Labels
🤖:question A specific question about the codebase, product, project, or how to use a feature

Comments


ykemiche commented Jul 12, 2023

System Info

LangChain version : 0.0.216
Python version : 3.11.4
System: Windows

Who can help?

No response

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

I want to create a chatbot that retrieves information from my own PDF in response to a query, using Google's PaLM model. I followed these steps:
- Load the PDF
- Split it using RecursiveCharacterTextSplitter
- Store its embeddings in a Chroma vector store
- Then create a chain ...

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

loader = PyPDFLoader("path/to/pdf.pdf")
chroma_dir = "./chroma"
pages = loader.load()

# Split pages into ~1000-character chunks with 150 characters of overlap
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separators=["\n\n", "\n", " ", ""]
)

splits = splitter.split_documents(pages)

# I want to replace this with an embedding method that doesn't require API authentication
embeddings = OpenAIEmbeddings()

# Embed every chunk and persist the vectors to disk
vector_db = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=chroma_dir
)

However, the only embedding method I can find in the LangChain documentation is OpenAIEmbeddings. How can I do this without it?

Expected behavior

All the split embeddings are stored in the Chroma vector store without using OpenAIEmbeddings().

@dosubot dosubot bot added the 🤖:question A specific question about the codebase, product, project, or how to use a feature label Jul 12, 2023

mswoff commented Jul 13, 2023

If you want to use Google PaLM:

from langchain.embeddings.google_palm import GooglePalmEmbeddings
...
embeddings = GooglePalmEmbeddings()

Alternatively, you can use VertexAIEmbeddings instead of OpenAIEmbeddings. This avoids the OpenAI API, although Vertex AI still requires Google Cloud credentials. Here is how you can modify your code:

from langchain.embeddings.vertexai import VertexAIEmbeddings

# Replace OpenAIEmbeddings with VertexAIEmbeddings
embeddings = VertexAIEmbeddings()

vector_db = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=chroma_dir
)
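Swapping the embedding object works because Chroma only calls two methods on it: embed_documents when storing chunks and embed_query when searching. The sketch below is a minimal, dependency-free illustration of that interface, assuming duck typing is enough for your LangChain version (if it is not, subclass langchain.embeddings.base.Embeddings instead). LocalHashingEmbeddings and top_match are hypothetical names, and the feature-hashing scheme is a toy rather than an open-source embedding model, so it needs no API key but its retrieval quality will be far below a real model. For real key-free, open-source embeddings, LangChain's HuggingFaceEmbeddings wrapper around sentence-transformers runs locally after a one-time model download.

```python
import hashlib
import math
import re
from typing import List


class LocalHashingEmbeddings:
    """Hypothetical toy embedder that runs fully offline (no API key, no model).

    It exposes the same two methods as LangChain's Embeddings interface:
    embed_documents(texts) and embed_query(text).
    """

    def __init__(self, dim: int = 256) -> None:
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        # Feature hashing: map each lowercase word token to one of `dim` buckets.
        for token in re.findall(r"\w+", text.lower()):
            digest = hashlib.md5(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "little") % self.dim
            vec[bucket] += 1.0
        # L2-normalise so that a dot product equals cosine similarity.
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)


def top_match(query: str, docs: List[str], emb: LocalHashingEmbeddings) -> str:
    """Return the document whose vector has the highest cosine similarity to the query."""
    q = emb.embed_query(query)
    doc_vecs = emb.embed_documents(docs)
    scores = [sum(a * b for a, b in zip(q, d)) for d in doc_vecs]
    return docs[scores.index(max(scores))]


if __name__ == "__main__":
    docs = [
        "PaLM is a large language model from Google.",
        "Chroma is a vector store that persists embeddings to disk.",
    ]
    print(top_match("Which vector store persists embeddings?", docs, LocalHashingEmbeddings()))
```

With LangChain installed, an instance could be passed as embedding= to Chroma.from_documents in the same way as the snippets above; replacing it with HuggingFaceEmbeddings later would not require any other change to the pipeline.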


Fofna commented Apr 26, 2024

It requires a GOOGLE_API_KEY.
