-
Because LLMs want to control Home Assistant too.

Context

We have 3 integrations for large language models: OpenAI, Google and Ollama. All 3 have the same thing in common: they cannot control Home Assistant, and the only information they have is the information provided in the prompt (based on a template) sent to them on first interaction. But a user wants to talk to LLMs via voice and control their house. This is not currently possible.

For our default conversation agent we have a set of intents to control Home Assistant. We match incoming text, extract the intention from it, and call the matching intent. There is a custom component, OpenAI Extended Conversation, that allows OpenAI to call APIs in Home Assistant to control devices, create automations and more. Google also supports function calling. Fun fact: OpenAI Extended Conversation relies on OpenAI's built-in knowledge of which services exist in Home Assistant.

We want to expose a Home Assistant API interface to LLMs. LLMs don't work like other code, and we cannot just give them our WebSocket or REST API. We also want to be very careful in the beginning about what we expose, as this is still experimental.

Decision

Add a new option to each AI agent to allow it to access the Home Assistant API. We don't want each LLM integration to define its own API to Home Assistant, so we want to introduce a helper that defines an LLM API that can be shared.

To get the structure in place, we want to start by exposing all intents as APIs to LLMs. This puts the LLM at the same level as our built-in conversation agent. In the future, we want to expand the LLM API with things like being able to query entities, devices and areas, or do administration. When we do this, it will be done by creating LLM-specific intents, as we wouldn't want to expose those intents to normal voice operations.

We don't want integrations like OpenAI to integrate intents directly, because that couples the intent API too tightly to LLMs. The LLM API helper can be a small wrapper or interface that exposes the intents for the first iteration. It is the responsibility of the LLM integration to expose the LLM API to the LLM and to translate responses from the LLM into calls into the LLM API helper. For OpenAI this will be done by leveraging the "tools" keyword argument. For models whose API does not support function calling, integrations can try to make it work by adding text to the prompt asking for JSON responses. This is up to the integration to figure out and is not the responsibility of the LLM API helper.
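To make the shape of that first iteration concrete, here is a minimal sketch of how an integration could map the exposed intents onto OpenAI's `tools` keyword argument and route a tool call back through the helper. The helper names `async_get_llm_tools` and `async_call_llm_tool` are hypothetical placeholders, not the actual Home Assistant API.

```python
# Hypothetical sketch: expose Home Assistant intents to an LLM as OpenAI "tools".
# async_get_llm_tools / async_call_llm_tool are illustrative placeholders for the
# LLM API helper described above, not real Home Assistant functions.
import json
from typing import Any

from homeassistant.core import HomeAssistant


async def build_openai_tools(hass: HomeAssistant) -> list[dict[str, Any]]:
    """Convert each intent exposed by the LLM API helper into an OpenAI tool spec."""
    tools = []
    for tool in await async_get_llm_tools(hass):  # hypothetical helper call
        tools.append(
            {
                "type": "function",
                "function": {
                    "name": tool.name,                # e.g. "HassTurnOn"
                    "description": tool.description,  # e.g. "Turns on a device or entity"
                    "parameters": tool.json_schema,   # JSON Schema for the intent slots
                },
            }
        )
    return tools


async def handle_tool_call(hass: HomeAssistant, tool_call: dict[str, Any]) -> Any:
    """Route a tool call from the model back into the LLM API helper."""
    arguments = json.loads(tool_call["function"]["arguments"])  # OpenAI sends a JSON string
    return await async_call_llm_tool(  # hypothetical helper call
        hass,
        name=tool_call["function"]["name"],
        arguments=arguments,
    )
```

An integration for a model without native tool calling could render the same tool list into the prompt and ask the model to answer with JSON, as noted above; that translation stays inside the integration, not the helper.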
-
Having a common interface for LLM integrations is brilliant. I hope to see more from these conversations.
-
Currently I am using it that way:

classificator (extracts context from user sentences):
system message: prompt, floors, areas, short house description
user input: user sentence
result: indoor|outdoor, home assistant domain, floor, area, type: order|question

Then based on this information I am triggering a specified agent that has detailed information about a room or area (like the garden), or a specialized one like a meteo station specialist:
system message: prompt, description of area, list of entities
user input: original user sentence
result: action, domain, entity id, data, type: order|question

Then it triggers an executor that creates the service calls; agents are per domain and trigger lights, broadcast messages, etc.

The idea here is to have a lot of specialized agents and only trigger the specific one, so it is a 2-3 layer operation: classify -> run interpreter agent -> run executor. So there is no need to send a huge context and all data to one agent. In addition you can pass user preferences, for example how lights should behave based on the time of day.

Example of the same concept for a light controller:
level 0 - classificator:
level 1 - description of weather
level 2 - floor lights controller:

And I never had to use aliases or intents in HA.
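A rough sketch of that classify -> interpret -> execute layering, assuming a generic `chat_completion` coroutine for whichever model is used (illustrative only, not the commenter's actual code):

```python
# Illustrative sketch of the classify -> interpret -> execute layering described above.
# chat_completion stands in for whatever LLM client is used; the prompts are simplified.
import json

from homeassistant.core import HomeAssistant


async def classify(chat_completion, user_sentence: str, house_summary: str) -> dict:
    """Level 0: extract coarse context (indoor/outdoor, domain, floor, area, order/question)."""
    reply = await chat_completion(
        system=f"Classify the request as JSON. House layout: {house_summary}",
        user=user_sentence,
    )
    # e.g. {"scope": "indoor", "domain": "light", "area": "kitchen", "type": "order"}
    return json.loads(reply)


async def interpret(chat_completion, user_sentence: str, area_context: str) -> dict:
    """Level 1: a specialized agent that only sees the matched area's entities."""
    reply = await chat_completion(
        system=f"You control this area only. Answer as JSON. {area_context}",
        user=user_sentence,
    )
    # e.g. {"action": "turn_on", "domain": "light", "entity_id": "light.kitchen", "data": {}}
    return json.loads(reply)


async def execute(hass: HomeAssistant, plan: dict) -> None:
    """Level 2: translate the interpreted plan into a Home Assistant service call."""
    await hass.services.async_call(
        plan["domain"],
        plan["action"],
        {"entity_id": plan["entity_id"], **plan.get("data", {})},
        blocking=True,
    )
```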
-
Sounds great. Just need the AI to translate intent into meaningful commands Home Assistant will recognize efficiently. Maybe a next step is a model as you have described, but used for training, so the AI manages the pieces you'd intend to code, or as much of it as possible, so only the AI API hooks on the endpoints need coding.
-
I like it. Does this have a feedback mechanism so that acknowledgments/error handling can happen internally and be rerouted or retried after conditions are met? Logging of the chain changes to a log repository? Are you going to try to maintain state until the intention has concluded in an acceptable action for the user? Great stuff.
On May 7, 2024, lorerave85 wrote:
I'm using a decoupling layer between the LLM and Home Assistant: it's called LangChain.
I created some tools that respond differently based on the question, and it is the LLM model's job to understand which tool to go for.
I don't know if it can be useful, but as a layer it helps a lot and leaves you free to choose the LLM model you prefer.
The peculiarity of this layer is that it allows you to create chains that process and execute operations locally, without having to expose APIs externally.
The only flaw is that, in order to be processed, the result must pass through an LLM that interprets it and responds in natural language.
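As an illustration of that kind of local tool layer, here is a minimal sketch using LangChain's `@tool` decorator; the light tool and the `hass` handle are assumptions for the example, not the commenter's actual chain.

```python
# Minimal sketch of a local LangChain "tool" the model can choose, as described above.
# The tool body and the hass handle are illustrative assumptions.
from langchain_core.tools import tool

from homeassistant.core import HomeAssistant


def build_light_tool(hass: HomeAssistant):
    """Create a tool the LLM can pick whenever the request is about lights."""

    @tool
    async def turn_on_light(entity_id: str) -> str:
        """Turn on a light by its Home Assistant entity_id."""
        await hass.services.async_call(
            "light", "turn_on", {"entity_id": entity_id}, blocking=True
        )
        return f"Turned on {entity_id}"

    return turn_on_light
```

The service call itself stays local; the model only picks a tool and fills in its arguments, which matches the point above about not having to expose APIs externally.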
-
This was implemented and released.
-
Awesome, thanks for the info!
Just saw it was completed.
On Sat, Sep 28, 2024, Paulus Schoutsen wrote:
It was released in June. More info at https://www.home-assistant.io/blog/2024/06/07/ai-agents-for-the-smart-home/