- Add ability to output according to a JSON spec.
- Add Gemini 2.0 Flash, Llama 3.3, and QwQ models.
- Fix Open AI context length sizes, which are mostly smaller than advertised.
- Add JSON mode for most providers, with the exception of Claude.
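  As a rough sketch of what requesting JSON output can look like (the
  :response-format keyword and the values it accepts are assumptions here,
  not confirmed API details):

    (require 'llm)
    (require 'llm-openai)

    ;; ASSUMPTION: :response-format takes either the symbol 'json (plain
    ;; JSON mode) or a plist-style spec describing the desired JSON shape.
    (defvar my-provider (make-llm-openai :key "OPENAI-API-KEY"))

    (llm-chat my-provider
              (llm-make-chat-prompt
               "Return the capital of France as JSON."
               :response-format 'json))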
- Add ability for keys to be functions, thanks to Daniel Mendler.
- Fix extra argument in llm-batch-embeddings-async.
- Add media handling, for images, videos, and audio.
- Add batch embeddings capability (currently for just Open AI and Ollama).
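  A minimal sketch of how batch embeddings might be called; the synchronous
  name llm-batch-embeddings and its return shape are assumptions inferred
  from llm-batch-embeddings-async mentioned above:

    (require 'llm)
    (require 'llm-ollama)

    ;; ASSUMED signature: (llm-batch-embeddings PROVIDER LIST-OF-STRINGS),
    ;; returning one embedding vector per input string.
    (defvar my-embedder (make-llm-ollama :embedding-model "nomic-embed-text"))

    (llm-batch-embeddings my-embedder
                          '("first document" "second document" "third document"))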
- Add support for Microsoft Azure’s Open AI.
- Remove testing and other development files from ELPA packaging.
- Remove vendored plz-event-source and plz-media-type, and add requirements.
- Update list of Ollama models for function calling.
- Centralize model list so things like Vertex and Open AI compatible libraries can have more accurate context lengths and capabilities.
- Update default Gemini chat model to Gemini 1.5 Pro.
- Update default Claude chat model to latest Sonnet version.
- Fix issue in some Open AI compatible providers with empty function call arguments.
- Fix problem with Open AI’s llm-chat-token-limit.
- Fix Open AI and Gemini’s parallel function calling.
- Add variable llm-prompt-default-max-tokens to put a cap on the number of tokens regardless of model size.
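  For example, to cap prompt filling at 4096 tokens even when the model
  allows more (a sketch of how the variable above is meant to be used):

    ;; Never build prompts larger than 4096 tokens, regardless of the
    ;; model's advertised context size.
    (setq llm-prompt-default-max-tokens 4096)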
- More fixes with Claude and Ollama function calling conversation, thanks to Paul Nelson.
- Make llm-chat-streaming-to-point more efficient, just inserting new text, thanks to Paul Nelson.
- Don’t output streaming information when llm-debug is true, since it tended to be overwhelming.
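  A sketch of the streaming helper mentioned above, assuming the argument
  order PROVIDER PROMPT BUFFER POINT FINISH-CALLBACK (the callback's arity
  is also an assumption, so it is written to accept anything):

    (require 'llm)

    ;; `my-provider' stands for any llm provider object created elsewhere.
    ;; Stream the response into the current buffer at point, inserting only
    ;; the newly arrived text.
    (llm-chat-streaming-to-point
     my-provider
     (llm-make-chat-prompt "Write a haiku about Emacs.")
     (current-buffer)
     (point)
     (lambda (&rest _) (message "Streaming finished")))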
- Fix compiled functions not being evaluated in llm-prompt.
- Use Ollama’s new embed API instead of the obsolete one.
- Fix Claude function calling conversations
- Fix issue in Open AI streaming function calling.
- Update Open AI and Claude default chat models to the latest models.
- Support Ollama function calling, for models which support it.
- Make sure every model, even unknown models, returns some value for llm-chat-token-limit.
- Add token count for llama3.1 model.
- Make llm-capabilities work model-by-model for embeddings and functions
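  For instance, a capability can be checked before relying on it; the exact
  symbols returned (such as embeddings below) are assumptions:

    (require 'llm)

    ;; `my-provider' stands for any llm provider object created elsewhere.
    (when (member 'embeddings (llm-capabilities my-provider))
      (llm-embedding my-provider "some text to embed"))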
- Introduced llm-prompt for prompt management and creation from generators.
- Removed Gemini and Vertex token counting, because llm-prompt uses token counting often and it’s better to have a quick estimate than a more expensive, more accurate count.
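  A rough sketch of defining and filling a prompt with the llm-prompt
  library introduced above; llm-defprompt and llm-prompt-fill are the names
  implied by the entries, and the calling convention shown is an assumption:

    (require 'llm-prompt)

    ;; Define a named template; {{variables}} are substituted at fill time.
    (llm-defprompt my-summary-prompt
      "Summarize the following notes for {{user}}:\n{{notes}}")

    ;; Fill the template against a provider so token limits are respected.
    (llm-prompt-fill 'my-summary-prompt my-provider
                     :user "Alice"
                     :notes "Monday: reviewed patches.  Tuesday: wrote docs.")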
- Fix Open AI’s gpt4-o context length, which is lower for most paying users than the max.
- Add support for HTTP / HTTPS proxies.
- Add “non-standard params” to set per-provider options.
- Add default parameters for chat providers.
- Move to plz backend, which uses curl. This helps move this package to a stronger foundation backed by parsing to spec. Thanks to Roman Scherer for contributing the plz extensions that enable this, which are currently bundled in this package but will eventually become their own separate package.
- Add model context information for Open AI’s GPT 4-o.
- Add model context information for Gemini’s 1.5 models.
- Fix mangled copyright line (needed to get ELPA version unstuck).
- Fix Vertex response handling bug.
- Fix various issues with the 0.14 release
- Introduce new way of creating prompts: llm-make-chat-prompt, deprecating the older ways.
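  A minimal sketch of the newer entry point; the keyword arguments shown
  (:context, :examples, :temperature) reflect concepts mentioned elsewhere
  in this changelog and should be treated as illustrative:

    (require 'llm)

    ;; `my-provider' stands for any llm provider object created elsewhere.
    (llm-chat my-provider
              (llm-make-chat-prompt
               "What does use-package do?"
               :context "You are a concise Emacs assistant."
               :examples '(("What is a major mode?" . "A buffer-local editing mode."))
               :temperature 0.2))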
- Improve Vertex error handling
- Add Claude’s new support for function calling.
- Refactor of providers to centralize embedding and chat logic.
- Remove connection buffers after use.
- Fixes to provide more specific error messages for most providers.
- Refactor of warn-non-nonfree methods.
- Add non-free warnings for Gemini and Claude.
- Send connection issues to error callbacks, and fix an error handling issue in Ollama.
- Fix issue where, in some cases, streaming does not work the first time attempted.
- Fix issue in llm-ollama with not using provider host for sync embeddings.
- Fix issue in llm-openai where it was incompatible with some Open AI-compatible backends due to assumptions about inconsequential JSON details.
- Add provider llm-claude, for Anthropic’s Claude.
- Introduce function calling, now available only in Open AI and Gemini.
- Introduce llm-capabilities, which returns a list of extra capabilities for each backend.
- Fix issue where we were logging when we weren’t supposed to.
- Introduce llm logging (for help with developing against llm); set llm-log to non-nil to enable logging of all interactions with the llm package.
- Change the default interaction with ollama to one more suited for conversations (thanks to Thomas Allen).
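  Enabling the log is a one-liner:

    ;; Log all interactions with the llm package (useful when developing
    ;; against it); set back to nil to stop logging.
    (setq llm-log t)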
- Default to the new “text-embedding-3-small” model for Open AI. Important: Anyone who has stored embeddings should either regenerate embeddings (recommended) or hard-code the old embedding model (“text-embedding-ada-002”).
- Fix response breaking when prompts run afoul of Gemini / Vertex’s safety checks.
- Change Gemini streaming to be the correct URL. This doesn’t seem to have an effect on behavior.
- Add llm-chat-token-limit to find the token limit based on the model.
- Add request timeout customization.
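  For example, the limit can be queried to decide how much input to send
  (a sketch):

    (require 'llm)

    ;; `my-provider' stands for any llm provider object created elsewhere.
    (message "This model allows roughly %d tokens"
             (llm-chat-token-limit my-provider))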
- Allow users to change the Open AI URL, to allow for proxies and other services that re-use the API.
- Add llm-name and llm-cancel-request to the API.
- Standardize handling of how context, examples and history are folded into llm-chat-prompt-interactions.
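  A sketch of how the two additions can work together, assuming
  llm-chat-async returns a request object that llm-cancel-request accepts
  and that the error callback receives a type and a message:

    (require 'llm)

    ;; `my-provider' stands for any llm provider object created elsewhere.
    (defvar my-request
      (llm-chat-async my-provider
                      (llm-make-chat-prompt "Tell me a long story.")
                      (lambda (response)
                        (message "From %s: %s" (llm-name my-provider) response))
                      (lambda (_type msg)
                        (message "llm error: %s" msg))))

    ;; Later, if the response is no longer wanted:
    (llm-cancel-request my-request)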
- Upgrade Google Cloud Vertex to Gemini - previous models are no longer available.
- Added gemini provider, which is an alternate endpoint with alternate (and easier) authentication and setup compared to Cloud Vertex.
- Provide default for llm-chat-async to fall back to streaming if not defined for a provider.
- Add provider llm-llamacpp.
- Fix issue with Google Cloud Vertex not responding to messages with a system interaction.
- Fix use of (pos-eol) which is not compatible with Emacs 28.1.
- Fix incompatibility with older Emacs introduced in Version 0.5.1.
- Add support for Google Cloud Vertex model text-bison and variants.
- llm-ollama can now be configured with a scheme (http vs https).
- Implement token counting for Google Cloud Vertex via their API.
- Fix issue with Google Cloud Vertex erroring on multibyte strings.
- Fix issue with small bits of missing text in Open AI and Ollama streaming chat.
- Fixes for conversation context storage, requiring clients to handle ongoing conversations slightly differently.
- Fixes for proper sync request http error code handling.
- llm-ollama can now be configured with a different hostname.
- Callbacks now always attempt to be in the client’s original buffer.
- Add provider llm-gpt4all.
- Add helper function llm-chat-streaming-to-point.
- Add provider llm-ollama.
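  Creating the Ollama provider might look like the following; the keyword
  names (:host, :port, :chat-model, :embedding-model) are the ones suggested
  by other entries in this changelog, so check the provider documentation
  before relying on them:

    (require 'llm-ollama)

    ;; A local Ollama instance with explicit models; every keyword is
    ;; assumed optional here.
    (defvar my-ollama
      (make-llm-ollama :host "localhost"
                       :port 11434
                       :chat-model "llama3"
                       :embedding-model "nomic-embed-text"))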
- Streaming support in the API, and for the Open AI and Vertex models.
- Properly encode and decode in utf-8 so double-width or other character sizes don’t cause problems.
- Changes in how we make and listen to requests, in preparation for streaming functionality.
- Fix overzealous change hook creation when using async llm requests.
- Remove the dependency on non-GNU request library.