Prompterator is a Streamlit-based prompt-iterating IDE. It runs locally and connects to external APIs exposed by various LLM providers.
A screenshot of the Prompterator interface, with highlighted features / areas of interest: 1. Data Upload, 2. Compose Prompt, 3. Run Prompt, 4. Evaluate and 5. Prompt History.
Create a virtual environment that uses Python 3.10. Then, install project requirements:
- `pip install poetry==1.4.2`
- `poetry install --no-root`
Set up the following environment variables as needed (if you use PyCharm, consider storing them in your run configuration):

- `OPENAI_API_KEY`: Optional. Only needed if you want to use OpenAI models (ChatGPT, GPT-4, etc.) via OpenAI APIs.
- `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_API_BASE`: Optional. Only needed if you want to use OpenAI models (ChatGPT, GPT-4, etc.) via Azure OpenAI APIs.
- `AZURE_OPENAI_USE_DEFAULT_CREDENTIALS`: Optional. If set to `True`, the Azure OpenAI API will use the default credentials from the environment, as per https://learn.microsoft.com/en-us/python/api/overview/azure/identity-readme?view=azure-python
- `PROMPTERATOR_DATA_DIR`: Optional. Where to store the files with your prompts and generated texts. Defaults to `~/prompterator-data`. If you plan to work on prompts for different tasks or datasets, it's a good idea to use a separate directory for each one.
If you do not happen to have access to an OpenAI API key, feel free to use the `mock-gpt-3.5-turbo` model, which is a mocked version of OpenAI's GPT-3.5 model. This is also very helpful when developing Prompterator itself.
From the root of the repository, run:
```
make run
```
If you want to run the app directly from PyCharm, create a run configuration:
- Right-click `prompterator/main.py` -> More Run/Debug -> Modify Run Configuration
- Under "Interpreter options", enter `-m poetry run streamlit run`
- Optionally, configure environment variables as described above
- Save and use 🚀
To use the models Prompterator supports out of the box, you generally need to specify at least an API key and/or the endpoint Prompterator should use when contacting them.
The sections below specify how to do that for each supported model family.
- To use OpenAI APIs, set the `OPENAI_API_KEY` environment variable as per the docs.
- To use Azure OpenAI APIs, set:
  - `AZURE_OPENAI_API_KEY`
  - `AZURE_OPENAI_API_BASE` -- the base endpoint URL, excluding the `/openai/deployments/...` ending
  - `AZURE_OPENAI_API_VERSION` if your version differs from the default `2023-05-15`; see the docs
- Set the `GOOGLE_VERTEX_AUTH_TOKEN` environment variable to the output of `gcloud auth print-access-token`.
- Set the `TEXT_BISON_URL` environment variable to the URL that belongs to your `PROJECT_ID`, as per the docs.
To use the AWS Bedrock-provided models, a version of `boto3` that supports AWS Bedrock needs to be installed.
- Set the `COHERE_API_KEY` environment variable to your Cohere API key, as per the docs.

Note that to use the Cohere models, the `cohere` package needs to be installed as well.
Prompterator supports vision models (for now, `gpt-4-vision-preview`) that take text and an image as input and output text. To use them, you need to upload a CSV file with the following columns:

- `text`: the basic requirement, just like for other models
- `image`: full base64 encoding of an image (example: `data:image/jpeg;base64,/9j/4AA...`)

The image will be rendered inside the displayed dataframe and next to the "generated text" area.

(Note: you also need an `OPENAI_API_KEY` environment variable to use `gpt-4-vision-preview`.)
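For illustration, an input file with these columns could be put together with a short Python script. This is a sketch only: the image path and prompt text are placeholders, while the `text` and `image` column names come from the requirements above.

```python
import base64

import pandas as pd


def to_data_url(path: str, mime: str = "image/jpeg") -> str:
    """Encode an image file as a base64 data URL, as expected in the `image` column."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


df = pd.DataFrame(
    {
        "text": ["What is shown in this picture?"],  # the usual required column
        "image": [to_data_url("photo.jpg")],         # placeholder image path
    }
)
df.to_csv("vision_inputs.csv", index=False)
```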
Prompterator accepts CSV files as input. Additionally, the CSV data should follow these rules:
- be parseable using a `pd.read_csv` call with the default argument values. This means e.g. having column names in the first row, using a comma as the separator, and enclosing values (where needed) in double quotes (`"`)
- have a column named `text`
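For example, a minimal valid input file could be produced and sanity-checked like this (the file name and example texts are just placeholders):

```python
import pandas as pd

# Build a tiny dataset with the required `text` column and save it as CSV.
df = pd.DataFrame({"text": ["Summarise this article.", "Translate this sentence to German."]})
df.to_csv("inputs.csv", index=False)

# Sanity check: the file must be readable with a default pd.read_csv call.
print(pd.read_csv("inputs.csv"))
```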
The user/system prompt textboxes support Jinja templates. Don't worry if you're new to Jinja -- Prompterator can show you a real-time "compiled" preview of your prompts to help you write the templates.
Given a column named `text` in your uploaded CSV data, you can include values from this column by writing the simple `{{text}}` template in your prompt.
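Under the hood, this is plain Jinja substitution: roughly speaking, for each row of your data, something like the following happens (a standalone sketch using the `jinja2` library directly, with a made-up prompt and row):

```python
from jinja2 import Template

# One row of the uploaded CSV, as a mapping of column name -> value.
row = {"text": "Write a polite reply to this customer message."}

# The prompt template as typed into the user/system prompt textbox.
template = Template("Here is your task:\n{{text}}")

print(template.render(**row))
# Here is your task:
# Write a polite reply to this customer message.
```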
If the values in your column represent more complex objects, you can still work with them, but make sure they are either valid JSON strings or valid Python expressions accepted by `ast.literal_eval`.
To parse string representations of objects, use:

- `fromjson`: for valid JSON strings, e.g. `'["A", "B"]'`
- `fromAstString`: for Python expressions such as dicts/lists/tuples/... (see the accepted types of `ast.literal_eval`), e.g. `"{'key': 'value'}"`
For example, given a CSV column `texts` with a value `"[""A"", ""B"", ""C""]"`, you can utilise this template to enumerate the individual list items in your prompt:
```
{% for item in fromjson(texts) -%}
- {{ item }}
{% endfor %}
```
which would lead to this in your prompt:
```
- A
- B
- C
```
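To see which helper fits which kind of value, it can help to compare the two underlying parsing approaches in plain Python. This assumes `fromjson` and `fromAstString` behave like `json.loads` and `ast.literal_eval`, respectively, as their descriptions above suggest:

```python
import ast
import json

# A valid JSON string: parseable as JSON -> use fromjson in your template.
print(json.loads('["A", "B", "C"]'))         # ['A', 'B', 'C']

# A Python literal (single quotes, so not valid JSON): parseable by
# ast.literal_eval -> use fromAstString in your template.
print(ast.literal_eval("{'key': 'value'}"))  # {'key': 'value'}
```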
> Structured Outputs is a feature that ensures the model will always generate responses that adhere to your supplied JSON Schema, so you don't need to worry about the model omitting a required key, or hallucinating an invalid enum value. -- OpenAI website
To use structured outputs in Prompterator:

- Check with the model developer/provider whether the model supports some kind of structured output.
- Toggle the structured output switch.
- Select one of the supported structured output methods (a model might support all of them, but also none of them):
  - `None` -- no structured output is used (equivalent to the toggle being off)
  - `Function calling` -- a hacky way of implementing structured outputs before `Response format` was added to the API
  - `Response format` -- the newer way of implementing structured outputs
- Provide a JSON schema in the `json schema` text input (it can be generated from a `pydantic` model, as sketched below, or from `zod` if you use `nodejs`), where `title` must satisfy `'^[a-zA-Z0-9_-]+$'`:

```
{
  "title": "get_delivery_date",
  "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
  "type": "object",
  "properties": {
    "order_id": {
      "type": "string"
    }
  },
  "required": ["order_id"],
  "additionalProperties": false
}
```
When working with LLMs, you often want to postprocess the raw generated text. Prompterator supports this use case so that you can iterate on your prompts based on inspecting/annotating postprocessed model outputs.
By default, no postprocessing is carried out. You can change this by rewriting the `postprocess` function in `prompterator/postprocess_output.py`. The function receives one raw model-generated text at a time and should output its postprocessed version. Both the raw and the postprocessed text are kept and saved.
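As an illustration, a trivial replacement for the function might look like this (a sketch only; check the actual signature in your copy of `prompterator/postprocess_output.py`, and note that the trimming logic below is purely hypothetical):

```python
def postprocess(text: str) -> str:
    """Return the postprocessed version of one raw model-generated text."""
    # Hypothetical example: trim surrounding whitespace and quotation marks.
    return text.strip().strip('"')
```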
While iterating your prompt on a dataset, you may find yourself annotating a model output that you already annotated in an earlier round. You can choose to automatically reuse such previously assigned labels by toggling "reuse past labels". To speed up your annotation process even more, you can toggle "skip past label rows" so that you only go through the rows for which no previously assigned label was found.
How this feature works:
- Existing labels are searched for in the current list of files in the sidebar, where a match requires both the `response` and all the input columns' values to match.
- If multiple different labels are found for a given input+output combination (a sign of inconsistent past annotation work), the most recent label is reused.
You can find more information on Prompterator in the associated paper: https://aclanthology.org/2023.emnlp-demo.43/
If you found Prompterator helpful in your research, please consider citing it:
```
@inproceedings{sucik-etal-2023-prompterator,
    title = "Prompterator: Iterate Efficiently towards More Effective Prompts",
    author = "Su{\v{c}}ik, Samuel and
      Skala, Daniel and
      {\v{S}}vec, Andrej and
      Hra{\v{s}}ka, Peter and
      {\v{S}}uppa, Marek",
    editor = "Feng, Yansong and
      Lefever, Els",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-demo.43",
    doi = "10.18653/v1/2023.emnlp-demo.43",
    pages = "471--478",
    abstract = "With the advent of Large Language Models (LLMs) the process known as prompting, which entices the LLM to solve an arbitrary language processing task without the need for finetuning, has risen to prominence. Finding well-performing prompts, however, is a non-trivial task which requires experimentation in order to arrive at a prompt that solves a specific task. When a given task does not readily reduce to one that can be easily measured with well established metrics, human evaluation of the results obtained by prompting is often necessary. In this work we present prompterator, a tool that helps the user interactively iterate over various potential prompts and choose the best performing one based on human feedback. It is distributed as an open source package with out-of-the-box support for various LLM providers and was designed to be easily extensible.",
}
```