-
Because LLMs want to control Home Assistant too.

Context

We have 3 integrations for large language models: OpenAI, Google and Ollama. All 3 have the same thing in common: they cannot control Home Assistant, and the only information they have is the information provided in the prompt (based on a template) sent to them on first interaction. But a user wants to talk to LLMs via voice and control their house. This is not currently possible.

For our default conversation agent we have a set of intents to control Home Assistant. We match incoming text, extract the intention from it, and call the matching intent. There is a custom component, OpenAI Extended Conversation, that allows OpenAI to call APIs in Home Assistant to control devices, create automations and more. Google also supports function calling. Fun fact: OpenAI Extended Conversation relies on OpenAI's built-in knowledge of which services exist in Home Assistant.

We want to expose a Home Assistant API interface to LLMs. LLMs don't work like other code, and we cannot just give them our WebSocket or REST API. We also want to be very careful in the beginning about what we expose, as this is still experimental.

Decision

Add a new option to each AI agent to allow it to access the Home Assistant API. We don't want each LLM integration to define its own API to Home Assistant, so we want to introduce a helper that defines an LLM API that can be shared.

To get the structure in place, we want to start by exposing all intents as APIs to LLMs. This puts the LLM at the same level as our built-in conversation agent. In the future, we want to expand the LLM API with things like being able to query entities, devices and areas, or do administration. When we do this, it will be done by creating LLM-specific intents, as we wouldn't want to expose those intents to normal voice operations.

We don't want integrations like OpenAI to integrate intents directly, because that couples the intent API too tightly to LLMs. The LLM API helper can be a small wrapper or interface that exposes the intents for the first iteration. It is the responsibility of the LLM integration to expose the LLM API to the LLM and to translate responses from the LLM into calls into the LLM API helper. For OpenAI this will be done by leveraging the "tools" keyword argument. For models whose API does not support function calling, integrations can try to make it work by adding text to the prompt asking for JSON responses. This is up to the integration to figure out and is not the responsibility of the LLM API helper.
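To make the shape of that first iteration concrete, here is a minimal sketch of how an integration could map the exposed intents onto OpenAI's `tools` keyword argument and route a tool call back through the helper. The helper names `async_get_llm_tools` and `async_call_llm_tool` are hypothetical placeholders, not the actual Home Assistant API.

```python
# Hypothetical sketch: expose Home Assistant intents to an LLM as OpenAI "tools".
# async_get_llm_tools / async_call_llm_tool are illustrative placeholders for the
# LLM API helper described above, not real Home Assistant functions.
import json
from typing import Any

from homeassistant.core import HomeAssistant


async def build_openai_tools(hass: HomeAssistant) -> list[dict[str, Any]]:
    """Convert each intent exposed by the LLM API helper into an OpenAI tool spec."""
    tools = []
    for tool in await async_get_llm_tools(hass):  # hypothetical helper call
        tools.append(
            {
                "type": "function",
                "function": {
                    "name": tool.name,                # e.g. "HassTurnOn"
                    "description": tool.description,  # e.g. "Turns on a device or entity"
                    "parameters": tool.json_schema,   # JSON Schema for the intent slots
                },
            }
        )
    return tools


async def handle_tool_call(hass: HomeAssistant, tool_call: dict[str, Any]) -> Any:
    """Route a tool call from the model back into the LLM API helper."""
    arguments = json.loads(tool_call["function"]["arguments"])  # OpenAI sends a JSON string
    return await async_call_llm_tool(  # hypothetical helper call
        hass,
        name=tool_call["function"]["name"],
        arguments=arguments,
    )
```

An integration for a model without native tool calling could render the same tool list into the prompt and ask the model to answer with JSON, as noted above; that translation stays inside the integration, not the helper.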
-
Having a common interface for LLM integrations is brilliant. I hope to see more from these conversations.
-
Currently I am using it that way:

classificator (extracts context from user sentences):
system message: prompt, floors, areas, short house description
user input: user sentence
result: indoor|outdoor, home assistant domain, floor, area, type: order|question

Then based on this information I am triggering a specified agent that has detailed information about a room or area (like the garden), or a specialized one like a meteo station specialist:
system message: prompt, description of area, list of entities
user input: original user sentence
result: action, domain, entity id, data, type: order|question

Then it triggers an executor that creates the service calls; agents are per domain and trigger lights, broadcast messages, etc.

The idea here is to have a lot of specialized agents and only trigger the specific one, so it is a 2-3 layer operation: classify -> run interpreter agent -> run executor. So there is no need to send a huge context and all data to one agent. In addition you can pass user preferences, for example how lights should behave based on the time of day.

Example of the same concept for a light controller:
level 0 - classificator:
level 1 - description of weather
level 2 - floor lights controller:

And I never had to use aliases or intents in HA.
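A rough sketch of that classify -> interpret -> execute layering, assuming a generic `chat_completion` coroutine for whichever model is used (illustrative only, not the commenter's actual code):

```python
# Illustrative sketch of the classify -> interpret -> execute layering described above.
# chat_completion stands in for whatever LLM client is used; the prompts are simplified.
import json

from homeassistant.core import HomeAssistant


async def classify(chat_completion, user_sentence: str, house_summary: str) -> dict:
    """Level 0: extract coarse context (indoor/outdoor, domain, floor, area, order/question)."""
    reply = await chat_completion(
        system=f"Classify the request as JSON. House layout: {house_summary}",
        user=user_sentence,
    )
    # e.g. {"scope": "indoor", "domain": "light", "area": "kitchen", "type": "order"}
    return json.loads(reply)


async def interpret(chat_completion, user_sentence: str, area_context: str) -> dict:
    """Level 1: a specialized agent that only sees the matched area's entities."""
    reply = await chat_completion(
        system=f"You control this area only. Answer as JSON. {area_context}",
        user=user_sentence,
    )
    # e.g. {"action": "turn_on", "domain": "light", "entity_id": "light.kitchen", "data": {}}
    return json.loads(reply)


async def execute(hass: HomeAssistant, plan: dict) -> None:
    """Level 2: translate the interpreted plan into a Home Assistant service call."""
    await hass.services.async_call(
        plan["domain"],
        plan["action"],
        {"entity_id": plan["entity_id"], **plan.get("data", {})},
        blocking=True,
    )
```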
-
Sounds great. Just need the AI to translate intent into meaningful commands Home Assistant will recognize efficiently. Maybe a next step is a model as you have described, but used for training, so the AI manages the pieces you'd intend to code, or as much of it as possible, so only the AI API hooks on the endpoints need coding.
-
I like it. Does this have a feedback mechanism so that acknowledgments/error handling can happen internally and be rerouted or retried after conditions are met? Logging of the chain changes to a log repository? Are you going to try to maintain state until the intention has concluded in an acceptable action for the user? Great stuff.
On May 7, 2024, lorerave85 wrote:
I'm using a decoupling layer between the LLM and Home Assistant: it's called LangChain.
I created some tools that respond differently based on the question, and it is the LLM model's job to understand which tool to go for.
I don't know if it can be useful, but as a layer it helps a lot and leaves you free to choose the LLM model you prefer.
The peculiarity of this layer is that it allows you to create chains that process and execute operations locally, without having to expose APIs externally.
The only flaw is that, in order to be processed, the result must pass through an LLM that interprets it and responds in natural language.
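As an illustration of that kind of local tool layer, here is a minimal sketch using LangChain's `@tool` decorator; the light tool and the `hass` handle are assumptions for the example, not the commenter's actual chain.

```python
# Minimal sketch of a local LangChain "tool" the model can choose, as described above.
# The tool body and the hass handle are illustrative assumptions.
from langchain_core.tools import tool

from homeassistant.core import HomeAssistant


def build_light_tool(hass: HomeAssistant):
    """Create a tool the LLM can pick whenever the request is about lights."""

    @tool
    async def turn_on_light(entity_id: str) -> str:
        """Turn on a light by its Home Assistant entity_id."""
        await hass.services.async_call(
            "light", "turn_on", {"entity_id": entity_id}, blocking=True
        )
        return f"Turned on {entity_id}"

    return turn_on_light
```

The service call itself stays local; the model only picks a tool and fills in its arguments, which matches the point above about not having to expose APIs externally.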
-
This was implemented and released.
-
Awesome, thanks for the info!
Just saw it was completed.
On Sat, Sep 28, 2024, Paulus Schoutsen wrote:
It was released in June. More info at https://www.home-assistant.io/blog/2024/06/07/ai-agents-for-the-smart-home/