
feat: Concurrent OpenAI Embeddings #8

Merged

Conversation

@Butch78 (Contributor) commented Nov 1, 2023

Hi @santiagomed,
Thank you for creating the PDF example; it helped so much with what I want to do!

I think my laptop is a little under-powered because it is taking forever to make the Bert Embeddings 😅
So I had a go at making the OpenAI embedding requests concurrent; it ran much faster when I tested this revision on my local PDFs.

@Butch78 Butch78 changed the title feat: Concurrent Embeddings feat: Concurrent OpenAI Embeddings Nov 1, 2023
@santiagomed (Owner):

Hi @Butch78!

Thank you for opening the PR and for diving into concurrency to improve the speed of generating embeddings! I'm glad the PDF example was helpful for your use case.

I've previously looked into concurrent processing for the Bert embeddings using the Rayon crate. One challenge I faced was ensuring the order of the output embeddings (Vec<Vec<f32>>) matched the order of the input prompts (Vec<Box<dyn Prompt>>). When tasks are executed concurrently, they can finish at different times, making the order unpredictable.

Did your implementation successfully maintain the correct order? I'd be interested to see how you tackled this challenge.

Additionally, you might find tokio::spawn useful—it's another approach that could offer a neat concurrency solution for this context.
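
For illustration, here is a rough sketch (not Orca's actual code; embed_one is just a placeholder for the real OpenAI request) of how tokio::spawn can keep the output order aligned with the input order, by awaiting the join handles in the order they were spawned:

// Sketch: spawn one task per prompt, then await the handles in spawn order.
async fn embed_all(prompts: Vec<String>) -> Vec<Vec<f32>> {
    // Keep the join handles in the same order as the input prompts.
    let handles: Vec<_> = prompts
        .into_iter()
        .map(|prompt| tokio::spawn(async move { embed_one(prompt).await }))
        .collect();

    // Awaiting in spawn order means the output Vec lines up with the input,
    // even though the requests themselves run and finish concurrently.
    let mut embeddings = Vec::with_capacity(handles.len());
    for handle in handles {
        embeddings.push(handle.await.expect("embedding task panicked"));
    }
    embeddings
}

// Placeholder for the actual OpenAI embedding request.
async fn embed_one(_prompt: String) -> Vec<f32> {
    // the real version would call the OpenAI embeddings endpoint here
    Vec::new()
}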

Looking forward to your thoughts!

@Butch78 (Contributor, Author) commented Nov 2, 2023

My Embeddings implementation lacks order preservation, primarily due to my initial oversight of its importance.

I did some further research, and we could use a queue system, which would solve issue #9. One possible solution is creating something similar to the recently released Hugging Face repository for text embeddings inference. This would hopefully ensure that order is consistently maintained while still giving good performance!

I'll change my pull request to use tokio::spawn soon :)

@Butch78 (Contributor, Author) commented Nov 2, 2023

I updated my code to use tokio::spawn, and I also used the crossbeam_channel crate to preserve the order of the OpenAI embeddings :)

@santiagomed (Owner) left a review:

Overall I really like your solution! It's easy to understand yet effective. I just left some suggestions to make it cleaner (I think) using Tokio. I like to avoid adding extra dependencies unless absolutely necessary (I am looking at ways to reduce the number of dependencies we have right now). Also, if you think any of my suggestions are wrong or there is a better way to do it, let me know.

Review thread on orca/src/llm/openai.rs (outdated):
tokio::spawn(async move {
    let res = client.execute(req).await.map_err(|e| e.to_string())?;
    let response = res.json::<OpenAIEmbeddingResponse>().await.map_err(|e| e.to_string())?;
    sender.send((i, response)).unwrap();
@santiagomed (Owner):

This could look something like sender.send((i, response)).await; using tokio channels

@santiagomed (Owner):

Edit: Actually, the current method of error handling within the spawned tasks can be problematic. It might be better to send back a Result type over the channel, so the receiver can decide how to handle errors.

tokio::spawn(async move {
    let result = async {
        let res = client.execute(req).await.map_err(|e| EmbeddingError::RequestError(e.to_string()))?;
        let response = res.json::<OpenAIEmbeddingResponse>().await.map_err(|e| EmbeddingError::ResponseError(e.to_string()))?;
        Ok(response)
    }.await;

    // Send back the result (success or error) via the channel.
    sender.send((i, result)).await.expect("Failed to send over channel");
});
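
And on the receiving end, something like this could collect the results back into input order (just a sketch; OpenAIEmbeddingResponse and EmbeddingError are the names from the snippet above, used here purely for illustration):

use tokio::sync::mpsc::Receiver;

// Sketch of the receiver side: each result arrives tagged with its input
// index, so it can be written straight into the matching slot.
async fn collect_embeddings(
    mut receiver: Receiver<(usize, Result<OpenAIEmbeddingResponse, EmbeddingError>)>,
    num_prompts: usize,
) -> Result<Vec<OpenAIEmbeddingResponse>, EmbeddingError> {
    let mut slots: Vec<Option<OpenAIEmbeddingResponse>> =
        (0..num_prompts).map(|_| None).collect();

    // recv() returns None once every sender has been dropped.
    while let Some((i, result)) = receiver.recv().await {
        slots[i] = Some(result?);
    }

    // Every slot should be filled by now; fail loudly if one is missing.
    Ok(slots.into_iter().map(|s| s.expect("missing embedding")).collect())
}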

@Butch78 (Contributor, Author):

I just added simple format! errors, but I was wondering: did you want to create a thiserror type for EmbeddingError like in your example?

@santiagomed (Owner):

format! should be fine for now! I just made up some generic type there for example purposes. Tbh Rust error handling still confuses me, so I just try to use anyhow for everything. In the future my plan is to clean up Orca's error handling.
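
If we do go the thiserror route later, it could look something like this (purely illustrative, not part of this PR):

use thiserror::Error;

// Illustrative only; the variants mirror the snippet above and are not
// (yet) real types in Orca.
#[derive(Debug, Error)]
pub enum EmbeddingError {
    #[error("embedding request failed: {0}")]
    RequestError(String),
    #[error("failed to parse embedding response: {0}")]
    ResponseError(String),
}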

Two further review threads on orca/src/llm/openai.rs (outdated, resolved).
@Butch78 (Contributor, Author) commented Nov 3, 2023

I keep getting this error whenever I try to run cargo make test. I was wondering if you are seeing the same error?

[screenshot of the cargo make test error]

@santiagomed (Owner) commented Nov 3, 2023

@Butch78 Huh, strange. Maybe they removed the Clone implementation for ModelWeights in the Candle crate. The solution here would be to use an Arc around the ModelWeights, I think. I'll look into it later today after work.
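
Roughly what I have in mind is something like this (untested sketch; ModelWeights stands in for the Candle type, and tokio::spawn assumes we are inside a Tokio runtime):

use std::sync::Arc;

// Sketch: share one set of weights across tasks instead of cloning them.
// Cloning the Arc only bumps a reference count, so this works even if
// ModelWeights itself no longer implements Clone. If inference needs
// &mut access, it would also need a Mutex (or one instance per worker).
fn share_weights<W: Send + Sync + 'static>(weights: W, num_tasks: usize) {
    let weights = Arc::new(weights);
    for _ in 0..num_tasks {
        let weights = Arc::clone(&weights);
        tokio::spawn(async move {
            // read-only access to `weights` here
            let _ = &*weights;
        });
    }
}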

@santiagomed (Owner):

It's also very strange to me how the CI workflow keeps failing on PRs but it passes on pushes to main branch...

@santiagomed (Owner):

Hi @Butch78, I made a small change to drop the sender because the receiver was looping infinitely; this was caused by the original sender never being dropped. According to the Tokio docs:

The channel is closed when all senders have been dropped

Since there was still one sender "alive", the receiver loop did not exit. I just added a simple drop(sender) to fix this. I also modified the embeddings vector to start with default values, as with_capacity was not working as intended. Regarding the problem you mentioned last week, I was not encountering that issue. Can you do a cargo update and try again? Let me know if the issue is still happening for you.
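
For reference, the shape of the sender fix was roughly this (simplified fragment, assuming OpenAIEmbeddingResponse implements Default; not the exact code in the PR):

// Pre-fill with default values instead of only reserving capacity, so
// writing to embeddings[i] is valid for any index.
let mut embeddings: Vec<OpenAIEmbeddingResponse> =
    (0..prompts.len()).map(|_| Default::default()).collect();

// ... spawn one task per prompt, each task cloning `sender` ...

// Drop the original sender; otherwise receiver.recv().await never returns
// None and the loop below spins forever.
drop(sender);

while let Some((i, result)) = receiver.recv().await {
    embeddings[i] = result?;
}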

@Butch78 (Contributor, Author) commented Nov 7, 2023

Amazing work! Once I ran cargo update I didn't run into the testing issue anymore! I'll be sure to run that more often when working on different Repos! :)

@santiagomed (Owner) left a review:

Approving!

@santiagomed santiagomed merged commit d3b42c7 into santiagomed:main Nov 7, 2023
1 check failed