A centralized, language-agnostic, open-source approach to storing and sharing model definitions like context windows, cost per token, etc.
Nearly every project that incorporates multiple models needs to handle knowledge about each underlying model, such as its context window length.
This leads to essentially the same code proliferating across codebases, all of which must be updated whenever new models are released, token costs change, a model is deprecated, and so on. For example:
- LangChain: `base/base_language/count_tokens.ts`
- LiteLLM: `model_prices_and_context_window.json`
  - Note: This was my inspiration for this initiative
- AutoGen: `token_count_utils.py`
- tokentrim: `model_map.py`
  - Note: Open Interpreter relies on this
- AI Research Assistant: `sevices/openai/models`
  - Note: This is one of my projects
- Mentat: `llm_api.py`
- AutoGPT: `autogpt/core/resource/model_providers/openai.py`
- AgentGPT: `next/src/types/modelSettings.ts`
- MetaGPT: `utils/token_counter.py`
Have more examples? Create a Pull Request.
Centralized ownership (by an open-source foundation) of a tech-stack-agnostic utility that defines model information and lets developers easily import and consume these definitions in their own codebases.
A JSON Schema definition can be found in `model-metadata.schema.json`, and example Model Metadata definitions can be found in the `/models` directory.
This schema defines properties that are relevant to the model and developers who wish to leverage it in their own codebases.
- `model_id`: The identifier of the model that the provider uses
  - Example: `gpt-3.5-turbo`
- `model_name`: The human-friendly name of the model
  - Example: `GPT-3.5 Turbo`
- `model_type`: The type of model (`chat`, `completion`, or `embedding`)
  - Example: `chat`
- `context_window`: The maximum number of tokens in the model's context window
  - Example: `4096`
- `model_provider`: The provider of the model in lowercase
  - Example: `openai`
- `model_description`: A human-friendly description of the model
- `model_version`: The version of the model
  - Example: `0613`
- `cost_per_token`: The cost per token in USD
  - Example: `{ "input": 0.0000015, "output": 0.000002 }`
  - Note: Supports either a basic number or an object with `input` and `output` numbers to define different costs for input tokens and output tokens
- `knowledge_cutoff`: The training data cutoff date for the model
  - Note: This is helpful in applications where you may need to know whether to supplement the model's training data with more recent information
- `token_encoding`: The encoding the model uses for tokens
  - Example: `cl100k_base`
  - Note: This is helpful when using `tiktoken`, `gpt-tokenizer`, etc., or when determining whether a model requires an alternate approach to counting tokens
- `tuning`: The types of tuning the model has been given, in array format; currently supports `function`, `instruction`, `code`, `multilingual`, and `multimodal`
  - Example: `["function", "instruction"]`
  - Note: This is helpful when deciding which models are suitable for given tasks
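Because `cost_per_token` may be either a bare number or an `{input, output}` object, consumers need to normalize it before computing costs. A minimal Python sketch (the `normalize_cost` and `request_cost_usd` helpers are illustrative, not part of this project):

```python
def normalize_cost(cost_per_token):
    """Return (input_cost, output_cost) whether the field is a number or an object."""
    if isinstance(cost_per_token, (int, float)):
        return float(cost_per_token), float(cost_per_token)
    return float(cost_per_token["input"]), float(cost_per_token["output"])

def request_cost_usd(cost_per_token, input_tokens, output_tokens):
    """Estimate the USD cost of a single request."""
    input_cost, output_cost = normalize_cost(cost_per_token)
    return input_tokens * input_cost + output_tokens * output_cost

# Object form, with different input/output prices:
print(request_cost_usd({"input": 0.0000015, "output": 0.000002}, 1000, 500))
# Bare-number form, with one price for all tokens:
print(request_cost_usd(0.000002, 1000, 500))
```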
Here is an example Model Metadata definition for OpenAI's GPT-3.5 Turbo model:

```yaml
model_id: gpt-3.5-turbo
model_name: GPT-3.5 Turbo
model_provider: openai
model_description: Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003.
model_info: https://platform.openai.com/docs/models/gpt-3-5
model_version: latest
model_type: chat
context_window: 4097
max_tokens: 4095
cost_per_token:
  input: 0.0000015
  output: 0.000002
knowledge_cutoff: 2021-09-01
token_encoding: cl100k_base
tuning:
  - function
```
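A consumer might use such a definition to check that a request fits the model's limits before sending it. A sketch in Python, assuming the YAML above has already been parsed into a dict (`fits_context` is a hypothetical helper, not part of this project):

```python
# Parsed subset of the GPT-3.5 Turbo definition above
# (in practice, load the YAML file with a YAML parser).
metadata = {
    "model_id": "gpt-3.5-turbo",
    "context_window": 4097,
    "max_tokens": 4095,
    "cost_per_token": {"input": 0.0000015, "output": 0.000002},
}

def fits_context(metadata, prompt_tokens, completion_tokens):
    """Check that prompt + completion fit within the model's context window."""
    return prompt_tokens + completion_tokens <= metadata["context_window"]

print(fits_context(metadata, 3000, 1000))  # True  (4000 <= 4097)
print(fits_context(metadata, 4000, 1000))  # False (5000 > 4097)
```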
Note: This project is open to feedback at every stage of this roadmap.
- Create JSON Schema
- Generate example model definitions
- Discuss schema with AI Engineer Foundation
- Rename to `model-metadata-central`
- Integrate a GitHub Action to validate Model Metadata definitions against schema
- Publish JSON Schema to GitHub Pages
- Add guidance for including metadata definitions via git submodule
- Generate language-specific packages for importing these definitions into other codebases
- TypeScript
- Python
- Rust
- Go
- Other languages? Open an issue to request one!
- Integrate metadata for more models
- Integrate a GitHub Action to generate these packages
- Integrate a GitHub Action to publish these packages to NPM, PyPI, etc.
- Donate project to AI Engineer Foundation
- Update publishing actions to publish to the AI Engineer Foundation's NPM, PyPI, etc.
- Ongoing support, evangelism, and maintenance of the project
- Hopeful: Generate static GitHub Pages site that lists models, displays their metadata, and allows filtering by properties
- Hopeful: Get projects to adopt the language-specific packages and extend them as necessary instead of creating their own model definitions
- Aspirational: Get model providers to adopt the schema definition and update the repo when model metadata changes or new models are released
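The validation action in the roadmap could use any standard JSON Schema validator (e.g. the Python `jsonschema` package) against `model-metadata.schema.json`. As a dependency-free sketch of the idea, a check of required keys and types might look like this (the field list here is an assumption for illustration, not the real schema):

```python
# Hypothetical subset of required fields and their expected types.
# The authoritative rules live in model-metadata.schema.json.
REQUIRED_FIELDS = {
    "model_id": str,
    "model_name": str,
    "model_type": str,
    "model_provider": str,
    "context_window": int,
}

def validate_definition(definition):
    """Return a list of error messages; an empty list means the definition passed."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in definition:
            errors.append(f"missing required field: {field}")
        elif not isinstance(definition[field], expected_type):
            errors.append(f"{field} should be of type {expected_type.__name__}")
    if definition.get("model_type") not in ("chat", "completion", "embedding"):
        errors.append("model_type must be chat, completion, or embedding")
    return errors

good = {"model_id": "gpt-3.5-turbo", "model_name": "GPT-3.5 Turbo",
        "model_type": "chat", "model_provider": "openai", "context_window": 4097}
print(validate_definition(good))  # []
print(validate_definition({"model_id": "x"}))  # lists the missing fields
```

A GitHub Action would run such a check over every file in `/models` and fail the build on a non-empty error list.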