-
-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incompatibility between Atomic Agents and Instructor when using Gemini models #56
Comments
@marcortola I was looking into this, but then decided not to make any changes since google is providing an OpenAI compatible API nowadays, which means you can just use the openAI provider, see: https://github.com/BrainBlend-AI/atomic-agents/blob/main/atomic-examples/quickstart/quickstart/4_basic_chatbot_different_providers.py From what I tested it seemed to do what you need it to do, and I didn't want to muddy up configuration or make examples more confusing at that time, especially since with gemini you need to pass the model when executing the That being said, if you make sure it works for images, your solution makes sense as a temporary solution, because other people will inevitably try to use For a longer term solution, I'm starting to think it might be worth abstracting the client creation to provide a better developer experience, that way internally, we could still just use the cleanest/easiest way, which would be using Thoughts? Ideas? Pros? Cons? |
Thanks for considering this issue and providing your perspective, @KennyVaneetvelde. Regarding using the OpenAI adapter, while it does seem to work, I was hesitant to rely on it in production since Google has marked it as experimental. They also explicitly recommend calling the Gemini API directly if you aren't already using the OpenAI libraries. I haven't done any latency tests yet, but that was my main reasoning for not going with the from_openai approach initially. I agree that passing the model to the run method feels weird and not quite right. It's not an ideal developer experience. I also think that adding vendor-specific code to the Atomic Agents core codebase with my proposed solution would break the OCP and could introduce maintenance difficulties down the line as we support more vendors (or if Instructor adds breaking changes). While it could work as a temporary workaround, it's not ideal for the long term. Abstracting client creation is an interesting idea. While it solves a slightly different problem, I can see how it would enable us to implement new clients without changing the Atomic Agents user-facing code. The Atomic Agents client creation API could act as a stable contract. As an example, I'm currently using this rudimentary client abstraction in my projects: import os
from functools import cache
import anthropic
import instructor
import openai
from instructor import Instructor
class ClientFactory:
@classmethod
@cache
def create(cls, model_name: str, streaming: bool = False) -> Instructor:
if model_name.startswith("gemini-"):
return cls._create_gemini_client(streaming)
elif model_name.startswith("gpt-"):
return cls._create_openai_client(streaming)
elif model_name.startswith("claude-"):
return cls._create_anthropic_client(streaming)
raise ValueError(f"Unsupported model: {model_name}")
@classmethod
def _create_gemini_client(cls, streaming: bool) -> Instructor:
api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
raise ValueError("GEMINI_API_KEY environment variable is not set")
if streaming:
client = openai.AsyncOpenAI(
api_key=api_key,
base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
else:
client = openai.OpenAI(
api_key=api_key,
base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
return instructor.from_openai(client, mode=instructor.Mode.JSON)
@classmethod
def _create_openai_client(cls, streaming: bool) -> Instructor:
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
raise ValueError("OPENAI_API_KEY environment variable is not set")
if streaming:
client = openai.AsyncOpenAI(api_key=api_key)
else:
client = openai.OpenAI(api_key=api_key)
return instructor.from_openai(client)
@classmethod
def _create_anthropic_client(cls, streaming: bool) -> Instructor:
api_key = os.getenv("ANTHROPIC_API_KEY")
if not api_key:
raise ValueError("ANTHROPIC_API_KEY environment variable is not set")
if streaming:
client = anthropic.AsyncAnthropic(api_key=api_key)
else:
client = anthropic.Anthropic(api_key=api_key)
return instructor.from_anthropic(client)
# Usage example:
client = ClientFactory.create("gpt-4-turbo-preview") # or "gemini-1.5-pro", "claude-3-opus" Perhaps we could:
Would love to hear your thoughts on this approach. |
@marcortola My first thoughts are that this seems like a nice approach, let me think on it for a bit and get back to you! |
@marcortola Thought about it, I am liking this approach, but then I thought "what about using Ollama? And what about people who use something like LMStudio which is sliiiightly different" And then I also started thinking about custom GPT-4o finetunes and the like.. Perhaps what we should do is:
That way we can have the control of your original proposal, with more configurability in case of local models/other servers/custom finetunes/... Does this make sense? |
When using Instructor with Gemini models, there's an incompatibility issue between Atomic Agents and Instructor's implementation. Atomic Agents always includes the
model
parameter in thecreate
method call, while Instructor explicitly raises an exception when this parameter is provided for Gemini models.This incompatibility prevents users from using Atomic Agents with Instructor when working with Gemini models, limiting the functionality and integration possibilities.
Note that while we can still use Gemini like this:
It would be safer in the long term to use the recommended Instructor approach as stated in its docs: https://python.useinstructor.com/integrations/google/
Current Behavior
https://github.com/instructor-ai/instructor/blob/887316c8267df621b985c736a2ebba504edf2399/instructor/process_response.py#L446
Proposed Solution
Modify the base agent implementation to conditionally include the model parameter based on the client mode, without introducing any BC:
Note: While this is the current behavior in Instructor, it might be worth considering a longer-term solution where Instructor handles the model parameter more gracefully (e.g., with a warning instead of an exception) for Gemini models.
I can prepare a PR if this approach is agreed upon.
The text was updated successfully, but these errors were encountered: