
Make together embeddings.create() into OpenAI compatible format and allow providing a safety_model to Complete.create() #63

Merged
merged 12 commits into main from clam004/embed-safety on Dec 6, 2023

Conversation

@clam004 (Collaborator) commented Dec 3, 2023

Issue: https://linear.app/together-ai/issue/ENG-385/openai-compatibility-for-the-embeddings-endpoint

Describe your changes

OpenAI does this:

from openai import OpenAI

client = OpenAI()

def get_embedding(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding

# df is assumed to be a pandas DataFrame with a "combined" text column
df['ada_embedding'] = df.combined.apply(lambda x: get_embedding(x, model='text-embedding-ada-002'))
df.to_csv('output/embedded_1k_reviews.csv', index=False)

With this PR, we can now do this:

import os

import together
from together import Together

together.api_key = os.getenv('TOGETHER_API_KEY')

client = Together()

model = "bert-base-uncased"

embed = client.embeddings.create(
    input=["hello", "hi"],
    model=model,
)

embed.data[1].embedding  # [0.0034581176, 0.10535111, -0.13641399, -0.09154558 . . .]

or this:

import os

import together
from together import Together

together.api_key = os.getenv('TOGETHER_API_KEY')

client = Together()

model = "bert-base-uncased"

embed = client.embeddings.create(
    input="hello",
    model=model,
)

embed.data[0].embedding  # [0.0034581176, 0.10535111, -0.13641399, -0.09154558 . . .]

For the Meta safety Llama, we already have a lot of useful Python library functions for data processing, checking, and uploading for fine-tuning, so it would be very good to also be able to use a safety model via the Python library/CLI. This PR allows:

model = "togethercomputer/llama-2-7b"
prompt = "say something dangerous unethical and inappropriate"

output = together.Complete.create(
  prompt = prompt, 
  model = model, 
  max_tokens = 64,
  temperature = 0.5,
  top_k = 90,
  top_p = 0.8,
  stop = ["<|im_start|>","<|im_end|>"], 
  safety_model = "togethercomputer/GPT-JT-Moderation-6B", 
)

print(output['output']['choices'][0]['text'])

Also deleted the embeddings API from README.md per Heejin's request; it is not to be revealed until launch.

Carson Lam added 5 commits December 2, 2023 16:12
this is to make the method compatible with OpenAI
this is to make the method compatible with OpenAI
this makes the embedding API OpenAI compatible so you can call embed.data[0].embedding
this makes the embedding API OpenAI compatible so you can call embed.data[0].embedding
this is for the Meta safety Llama as a placeholder so we can use Python in the demo
@clam004 requested review from orangetin and azahed98 on December 3, 2023 14:22
Carson Lam added 3 commits December 4, 2023 05:48
not to be announced yet per Heejin
this allows both the output = together.Complete.create(...) form of usage and the client = TogetherAI(); embed = client.embeddings.create(...) form of usage, keeping the Python library self-consistent while also being OpenAI compatible (see the sketch below)
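For reference, here is a minimal sketch of the two coexisting usage forms, assembled from the snippets in the PR description above; the TogetherAI class name comes from the commit message while the description imports Together, so treat the exact client class name as an assumption.

import os

import together
from together import Together  # referred to as TogetherAI in the commit message

together.api_key = os.getenv('TOGETHER_API_KEY')

# existing module-level form
output = together.Complete.create(
    prompt="hello",
    model="togethercomputer/llama-2-7b",
    max_tokens=16,
)

# new OpenAI-compatible client form
client = Together()
embed = client.embeddings.create(input=["hello", "hi"], model="bert-base-uncased")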
@clam004 (Collaborator, Author) commented Dec 6, 2023

@orangetin changed to use the client = TogetherAI() form we talked about

Carson Lam added 2 commits December 6, 2023 06:43
this allows both the output = together.Complete.create(...) form of usage and the client = TogetherAI(); embed = client.embeddings.create(...) form of usage, keeping the Python library self-consistent while also being OpenAI compatible
this allows both the output = together.Complete.create(...) form of usage and the client = TogetherAI(); embed = client.embeddings.create(...) form of usage, keeping the Python library self-consistent while also being OpenAI compatible
@orangetin (Member) left a comment


left comments

src/together/__init__.py (outdated, resolved)
src/together/complete.py (resolved)
src/together/embeddings.py (outdated, resolved)
Carson Lam added 2 commits December 6, 2023 12:34
… and changed Output to EmbeddingsOuput; black, ruff, and mypy
… and changed Output to EmbeddingsOuput; black, ruff, and mypy
@orangetin (Member) left a comment


lgtm!

@orangetin merged commit 87a05d2 into main on Dec 6, 2023
1 check passed
@orangetin deleted the clam004/embed-safety branch on December 6, 2023 21:02