Hello, I have a question about generating a large amount of data when only one API key is available. The data generation rate is very slow in practice. Increasing `batch_size` and `code_batch_size` to large values does not help much. Additionally, after starting one process I cannot start another, even though I changed the `session_output` path. Starting multiple threads also did not noticeably speed things up. I would appreciate a response as soon as possible. Thank you.
Hi, this depends on the rate limit of your API. Although this codebase supports multiprocessing, we found that the main bottleneck is still the rate limit when calling those proprietary models. In my experience, we can generate roughly 10k-15k samples per day using the Anthropic API.
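For reference, here is a minimal sketch of the usual pattern when the provider's rate limit is the bottleneck: cap client-side concurrency and retry with exponential backoff on throttling errors. The `call_model` function, `MAX_CONCURRENCY`, and the error handling below are placeholders and assumptions, not part of this repo; adapt them to your client library and your actual requests-per-minute quota.

```python
# Sketch: throttled concurrent generation with retry/backoff.
# `call_model` is a hypothetical stand-in for your actual API call.
import random
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 4   # keep this below your requests-per-minute limit
MAX_RETRIES = 5


def call_model(prompt: str) -> str:
    """Hypothetical placeholder for the real provider call (e.g. Anthropic)."""
    raise NotImplementedError


def generate_with_backoff(prompt: str) -> str:
    # Retry with exponential backoff plus jitter when the provider throttles us.
    for attempt in range(MAX_RETRIES):
        try:
            return call_model(prompt)
        except Exception:  # in practice, catch the provider's rate-limit error type
            time.sleep(min(60, 2 ** attempt) + random.random())
    raise RuntimeError("gave up after repeated rate-limit errors")


def generate_batch(prompts: list[str]) -> list[str]:
    # A small worker pool; adding more workers (or processes) than the rate
    # limit allows only produces more throttling errors, which is why larger
    # batch sizes do not speed things up much.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        return list(pool.map(generate_with_backoff, prompts))
```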