[v0.2.1] change default server settings (add small and large), enable env variables to config appl, update cursorrules

* Can configure default small and large models, unifying the interface (server name) for developers and users.
* Enable using environment variables to update APPL configs (requires `jsonargparse`).
* Update cursorrules to make better use of Cursor for composing code.
* Update docs for setting up observability platforms.
* Support caching requests with `response_format` as a Pydantic model.
* Add `.env.example` to help set up new projects.
* Add a hint when Langfuse or Lunary is not configured.
dhh1995 committed Dec 23, 2024
1 parent 4fbdab8 commit 248c409
Showing 22 changed files with 348 additions and 104 deletions.
21 changes: 14 additions & 7 deletions .cursorrules
@@ -7,7 +7,7 @@ Best Practices for writing code:
</general_rules>

<working_with_llms>
When calling Large Language Models (LLMs), you should use the package `appl` to build the application.
When calling Large Language Models (LLMs), you should use the package `appl` to build the application (the package name on PyPI is `applang`, so it should be installed with `pip install -U applang`).

When working with LLMs, you should provide clear format instructions to guide them toward the desired output, and an appropriate parser to convert that output into the desired format. The format should be clear and easy to parse; consider using a format like Markdown code blocks.
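For instance, a minimal parser for the Markdown code-block format mentioned above could look like the following sketch (illustrative only, not part of this commit; the fence convention and regex are assumptions):

```python
import re

FENCE = "`" * 3  # markdown code fence


def extract_code_block(text: str, lang: str = "python") -> str:
    """Return the contents of the first fenced code block in an LLM response."""
    pattern = re.escape(FENCE + lang) + r"\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced code block found in the response")
    return match.group(1).strip()
```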

@@ -16,13 +16,14 @@ You should add `stream=True` for tasks that require LLMs to generate a large amo

<explain_appl>
APPL is a package that integrates LLM prompts into code.
- `@ppl` is a decorator that marks a function as a prompt function; the function cannot be a coroutine (async function).
- Grow your prompt by calling `grow()`; an implicit newline is added between components. When asked to be implicit, you can remove the `grow()` call and leave its content as it is; APPL will automatically add the `grow()` call for you at runtime.
- The docstring of a `@ppl` function is not counted as part of the prompt by default. If it is meant to be the system prompt, you can specify that using `@ppl(docstring_as="system")`.
- The `gen` function is a wrapper around `litellm.completion`; it returns a future object and automatically takes the prompt captured so far as the prompt for the LLM. See the example below for more details. Note that you do not need to wrap `gen` in an AIRole() scope to call it for generation.
- You can use `with role:` to specify the role of a message, for example `with AIRole():` to mark the prompt grown in that scope as the assistant message. The default scope is `user`.
- To get the result of `gen` immediately, use `str()` to convert it to a string. Otherwise, it is a `Generation` object whose `result` attribute holds the result.
- Try to delay the time you get the result of `gen` as much as possible, so that the code can be more parallelized.
- You should avoid using multi-line strings in `@ppl` functions as much as possible. When needed, write them with indentation aligned with the code; they will be dedented (similar to docstrings) before being used in the code.
- Try to delay getting the result of `gen` as much as possible, so that the code can be more parallelized. See the example below for more details.
- When writing a multi-line prompt, it is recommended to `grow` the prompt multiple times to take advantage of the implicit compositor that adds a newline between components. This gives better control over the prompt, since you can easily comment out parts of it. You can also use a multi-line string with indentation aligned with the code (it will be dedented, similar to a docstring, before being used in the code).
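A minimal sketch tying these points together (illustrative only and not part of this commit; the `AIRole` import path is an assumption):

```python
from appl import AIRole, gen, grow, ppl


@ppl(docstring_as="system")
def answer(question: str):
    """You are a concise assistant."""
    grow(question)  # grown as a user message by default
    with AIRole():
        grow("The answer is:")  # grown as the beginning of the assistant message
    return gen()  # returns a future; the LLM call starts in the background


# both calls are launched before either result is needed, so they can run in parallel
a = answer("Who painted the Mona Lisa? Answer with the name only.")
b = answer("What is the capital of France? Answer with the name only.")
print(str(a), str(b))  # str() waits for and returns the generated text
```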

<example>

@@ -94,6 +95,7 @@ Prompt:
```
Output: Mona Lisa.

The two questions are answered in parallel: the generations are future objects until they are evaluated by `str` or printed, so the main process is not blocked by the LLM.
</example>

<example>
@@ -158,6 +160,8 @@ The prompt for both `gen` calls in `hello1` and `hello2` will look like:

<example>
You can use compositors to build the prompt; they specify the indexing, indentation, and separation between the parts of the prompt (grown by `grow()`) inside their scope. Some useful compositors include: Tagged, NumberedList.
Tagged wraps the content inside with opening and closing tags, and NumberedList numbers each prompt part.
You are encouraged to use Tagged to wrap content to make the prompt more readable.

```python
from appl import ppl, gen, grow
@@ -169,6 +173,7 @@ def guess_output(hints: list[str], inputs: str):
grow("Guess the output of the input.")
with Tagged("hints"):
with NumberedList():
grow("First hint")
grow(hints) # list will be captured one by one

with Tagged("inputs"):
@@ -178,15 +183,17 @@ def guess_output(hints: list[str], inputs: str):

    return gen()

print(guess_output(["The output is the sum of the numbers"], "1, 2, 3"))
print(guess_output(["The output involves addition", "The output is a single number"], "1, 2, 3"))
```

The prompt will look like:
```yaml
- User:
  Guess the output of the input.
  <hints>
  1. The output is the sum of the numbers
  1. First hint
  2. The output involves addition
  3. The output is a single number
  </hints>
  <inputs>
  1, 2, 3
@@ -197,7 +204,7 @@ The prompt will look like:
</example>

<best_practices>
Though you can call LLMs with simple tasks sharing the same context multiple times, you are encouraged to combine them into a single call with proper formatting and parsing (or using pydantic model) to reduce cost. For example, when being asked to generate a person's name and age:
Though you can call LLMs to generate one thing at a time, you are encouraged to combine such calls into a single one with proper formatting and parsing (or using a Pydantic model as `response_format`) to reduce cost. For example, when asked to generate a person's name and age:
```python
class Person(BaseModel):
    name: str
@@ -226,7 +233,7 @@ def parse_to_get_name_and_age() -> Person:
    person: Person = parse_response(response)
    return person

# or this (generally more recommended)
# or this (generally more recommended, you do not need to include format instructions in the prompt)
@ppl
def pydantic_to_get_name_and_age() -> Person:
grow("Generate a person's name and age.")
14 changes: 14 additions & 0 deletions .env.example
@@ -0,0 +1,14 @@
# API keys
OPENAI_API_KEY=<your-openai-api-key>
ANTHROPIC_API_KEY=<your-anthropic-api-key>

# Observability platform
## Langfuse
## You can find the keys at: <your-langfuse-host>/project/<project-id>/setup (Project Dashboard -> Configure Tracing)
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=<your-langfuse-host>

## Lunary
LUNARY_PUBLIC_KEY=<your-lunary-public-key>
LUNARY_API_URL=<your-lunary-api-url>
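These keys are read from environment variables (the `.env` workflow is described in `docs/setup.md` below). If you want to load them in a standalone script, a minimal sketch using `python-dotenv` (an assumed extra dependency, not added by this commit) could be:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv (assumption)

load_dotenv()  # reads the .env file from the current working directory
print("OPENAI_API_KEY configured:", bool(os.getenv("OPENAI_API_KEY")))
```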
3 changes: 3 additions & 0 deletions .gitignore
@@ -17,6 +17,9 @@ dumps
docs/reference
docs/docs

# changelog
changelogs

# appl
appl.yml

10 changes: 6 additions & 4 deletions README.md
@@ -212,14 +212,16 @@ For a more comprehensive tutorial, please refer to the [tutorial](https://appl-t
- [Prompt Coding Helpers](https://appl-team.github.io/appl/tutorials/6_prompt_coding)
- [Using Tracing](https://appl-team.github.io/appl/tutorials/7_tracing)

### Cookbook
### Cookbook and Applications
For more detailed usage and examples, please refer to the [cookbook](https://appl-team.github.io/appl/cookbook).

APPL can be used to reproduce some popular LM-based applications easily, such as:
* [Tree of Thoughts](https://github.com/princeton-nlp/tree-of-thought-llm)[[APPL implementation](examples/advanced/tree_of_thoughts/)]: deliberate problem solving with Large Language Models.
We use APPL to reimplement popular LLM and prompting algorithms in [Reppl](https://github.com/appl-team/reppl), such as:
* [Tree of Thoughts](https://github.com/princeton-nlp/tree-of-thought-llm) [[Re-implementation](https://github.com/appl-team/reppl/tree/main/tree-of-thoughts/)] [[APPL Example](examples/advanced/tree_of_thoughts/)]: deliberate problem solving with Large Language Models.

We use APPL to build popular LM-based applications, such as:
* [Wordware's TwitterPersonality](https://twitter.wordware.ai/) [[APPL implementation](https://github.com/appl-team/TwitterPersonality)]: analyzes your tweets to determine your Twitter personality.

We also use APPL to build small LLM-powered libraries, such as:
We use APPL to build small LLM-powered libraries, such as:
* [AutoNaming](https://github.com/appl-team/AutoNaming): automatically generate names for experiments based on argparse arguments.
* [ExplErr](https://github.com/appl-team/ExplErr): a library for error explanation with LLMs.

53 changes: 51 additions & 2 deletions docs/setup.md
@@ -16,6 +16,11 @@ For example, you can create a `.env` file with the following content to specify
OPENAI_API_KEY=<your openai api key>
```

We provide an example `.env.example` file in the root directory; you can copy it to your project directory and modify it.
```bash title=".env.example"
--8<-- ".env.example"
```

### Export or Shell Configuration
Alternatively, you can export the environment variables directly in your terminal, or add them to your shell configuration file (e.g., `.bashrc`, `.zshrc`). For example:
```bash
@@ -30,8 +35,15 @@ export OPENAI_API_KEY=<your openai api key>
--8<-- "src/appl/default_configs.yaml"
```

??? note "You should setup your own default server."
The default server (currently `gpt-4o-mini`) set in APPL could be outdated and changed in the future. We recommend you to specify your own default model in the `appl.yaml` file.
??? note "Setup your default models"
You should specify your own default model in the `appl.yaml` file. You may also specify the default "small" and "large" models, which fall back to the default model if not specified.
The name can be a server name in your configuration (`servers` section), or a model name that is supported by litellm.
```yaml title="appl.yaml (example)"
settings:
  model:
    default: gpt-4o-mini  # the small model falls back to this if not set
    large: gpt-4o
```
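Assuming the `server` argument of `gen` selects among these defaults (as the comments in `examples/usage/cmd_args.py` later in this commit suggest), usage might look like the following sketch:

```python
from appl import gen, grow, ppl


@ppl
def summarize(text: str):
    grow(f"Summarize the following in one sentence:\n{text}")
    # gen() with no `server` uses the `default` model;
    # server="small" / server="large" pick the corresponding defaults
    return gen(server="small")


print(summarize("APPL integrates LLM prompts into Python code."))
```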

### Override Configs
You can override these configurations by creating an `appl.yaml` file in the root directory of your project (or other directories; see [Priority of Configurations](#priority-of-configurations) for more details). A typical usage is to override the `servers` configuration to specify the LLM servers you want to use, as shown in the following example `appl.yaml` file.
@@ -96,6 +108,43 @@ settings:
To resume from a previous trace, you can specify the `APPL_RESUME_TRACE` environment variable with the path to the trace file. See more details in the [tutorial](./tutorials/7_tracing.md).
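For example (the script name below is a placeholder):

```bash
APPL_RESUME_TRACE=<path-to-previous-trace> python your_script.py
```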

## Visualize Traces

### Langfuse (Recommended)

Langfuse is an open-source web-based tool for visualizing traces and LLM calls.

You can host Langfuse [locally](https://langfuse.com/self-hosting) or use the [public version](https://langfuse.com/).

```bash
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up
```

Then you can set the environment variables for the Langfuse server as follows:

```bash title=".env"
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=<your-langfuse-host>
# Set to http://localhost:3000 if you are hosting Langfuse locally
```
You can find your Langfuse public and private API keys on the project settings page (Project Dashboard -> Configure Tracing).

Please see [the tutorial](./tutorials/7_tracing.md#langfuse-recommended) for more details.

You can see conversations like:

![Langfuse Conversation](./_assets/tracing/langfuse_convo.png)

and the timeline like:

![Langfuse Timeline](./_assets/tracing/langfuse_timeline.png)

### Lunary
Please see [the tutorial](./tutorials/7_tracing.md#lunary) for more details.

### LangSmith
To enable [LangSmith](https://docs.smith.langchain.com/) tracing, you need to [obtain your API key](https://smith.langchain.com/settings) from LangSmith and add the following environment variables to your `.env` file:

14 changes: 9 additions & 5 deletions docs/tutorials/7_tracing.md
@@ -78,12 +78,13 @@ docker compose up

Then you can set the environment variables for the Langfuse server as follows:

```bash
export LANGFUSE_PUBLIC_KEY=<your public key>
export LANGFUSE_SECRET_KEY=<your secret key>
export LANGFUSE_HOST=http://localhost:3000
```bash title=".env"
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=<your-langfuse-host>
# Set to http://localhost:3000 if you are hosting Langfuse locally
```
You can find your Langfuse public and private API keys in the project settings page.
You can find your Langfuse public and private API keys on the project settings page (Project Dashboard -> Configure Tracing).

Then you can visualize the traces by:

@@ -99,6 +100,9 @@ and the timeline like:

![Langfuse Timeline](../_assets/tracing/langfuse_timeline.png)

??? question "Troubleshooting: Incomplete traces on Langfuse"
You may see incomplete traces (function-call trees) in Langfuse when you click through from the `Traces` page. This might be because Langfuse applies a filter based on the timestamp; try removing the `?timestamp=<timestamp>` part of the URL and refreshing the page.

### Lunary

Lunary is another open-source web-based tool for visualizing traces and LLM calls.
4 changes: 3 additions & 1 deletion examples/advanced/tree_of_thoughts/appl.yaml
@@ -1,5 +1,7 @@
servers:
default_servers:
  default: gpt4o-t07

servers:
  gpt4o-t07:
    model: gpt-4o
    temperature: 0.7
4 changes: 3 additions & 1 deletion examples/appl.yaml
@@ -11,9 +11,11 @@ settings:
  tracing:
    enabled: true

# default_servers:
#   default: azure-gpt35 # override the default server according to your needs

# example for setting up servers
servers:
  # default: azure-gpt35 # override the default server according to your needs
  azure-gpt35: # the name of the server
    model: azure/gpt35t # the model name
    # temperature: 1.0 # set the default temperature for the calls to this server
13 changes: 9 additions & 4 deletions examples/usage/cmd_args.py
@@ -3,15 +3,20 @@
import appl
from appl import gen, ppl

# option 1: update part of the configs in a dict
# appl.init(servers={"default": "gpt-4o"})
# * option 1: update part of the configs in a dict
# appl.init(default_servers={"default": "gpt-4o", "small": "gpt-4o-mini"})
# !! update the default server to `gpt-4o`, and the small server to `gpt-4o-mini`
# !! used for gen(...) and gen(server="small", ...)

# option 2: get options from command line
# * option 2: get options from command line or environment variables
parser = appl.get_parser()
parser.add_argument("--name", type=str, default="APPL")
args = parser.parse_args()
appl.update_appl_configs(args.appl)
# python cmd_args.py --appl.servers.default gpt-4o

# * Both change the default server to `gpt-4o`
# python cmd_args.py --appl.default_servers.default gpt-4o
# _APPL__DEFAULT_SERVERS__DEFAULT=gpt-4o python cmd_args.py


@ppl # the @ppl decorator marks the function as an `APPL function`
16 changes: 15 additions & 1 deletion pdm.lock

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "pdm.backend"

[project]
name = "applang"
version = "0.2.0"
version = "0.2.1"
description = "A Prompt Programming Language"
authors = [
{ name = "Honghua Dong", email = "[email protected]" },
@@ -30,6 +30,7 @@ dependencies = [
"rich>=13.8.1",
"pillow>=11.0.0",
"deprecated>=1.2.15",
"jsonargparse>=4.35.0",
]
requires-python = ">=3.9"
readme = "README.md"
6 changes: 4 additions & 2 deletions src/appl/__init__.py
@@ -90,9 +90,11 @@
from .version import __version__


def get_parser():
def get_parser(
    env_prefix: str = "", default_env: bool = True, **kwargs: Any
) -> ArgumentParser:
    """Get an argument parser with configurable APPL configs."""
    parser = ArgumentParser()
    parser = ArgumentParser(env_prefix=env_prefix, default_env=default_env, **kwargs)
    parser.add_argument("--appl", type=APPLConfigs, default=global_vars.configs)
    return parser
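A hedged usage sketch of the new signature (mirroring `examples/usage/cmd_args.py` above; the environment-variable naming comes from jsonargparse's `default_env` handling):

```python
import appl

# with default_env=True, jsonargparse also reads values from environment variables,
# e.g. _APPL__DEFAULT_SERVERS__DEFAULT=gpt-4o overrides the default server
parser = appl.get_parser()
args = parser.parse_args()
appl.update_appl_configs(args.appl)
```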

19 changes: 17 additions & 2 deletions src/appl/caching/db.py
@@ -7,6 +7,7 @@

from litellm import ModelResponse
from loguru import logger
from pydantic import BaseModel

from ..core.globals import global_vars
from ..core.types.caching import DBCacheBase
@@ -186,6 +187,20 @@ def insert(
)


def _serialize_args(args: Dict[str, Any]) -> str:
    args = args.copy()
    for k, v in args.items():
        # dump as schema if it is a pydantic model
        if isinstance(v, type):
            if issubclass(v, BaseModel):
                args[k] = v.model_json_schema()
            else:
                # TODO: convert to a schema
                logger.warning(f"Unknown type during serialization: {type(v)}")
                args[k] = str(v)
    return json.dumps(args)


def find_in_cache(
    args: Dict[str, Any], cache: Optional[DBCacheBase] = None
) -> Optional[ModelResponse]:
@@ -207,7 +222,7 @@ def find_in_cache(
    ):
        return None
    # only cache the completions with temperature == 0
    value = cache.find(json.dumps(args))
    value = cache.find(_serialize_args(args))
    if value is None:
        return None
    return dict_to_pydantic(value, ModelResponse)
@@ -226,7 +241,7 @@ def add_to_cache(
        and not global_vars.configs.settings.caching.allow_temp_greater_than_0
    ):
        return
    args_str = json.dumps(args)
    args_str = _serialize_args(args)
    value_dict = pydantic_to_dict(value)
    logger.info(f"Adding to cache, args: {args_str}, value: {value_dict}")
    cache.insert(args_str, value_dict)
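As an illustration (not part of the diff), the new `_serialize_args` lets a Pydantic `response_format` be folded into the cache key via its JSON schema instead of failing to serialize; the snippet below mirrors that logic with a standalone model:

```python
import json

from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int


args = {"model": "gpt-4o-mini", "temperature": 0, "response_format": Person}
# a BaseModel subclass is dumped as its JSON schema, so the key is JSON-serializable and stable
serialized = {
    k: v.model_json_schema() if isinstance(v, type) and issubclass(v, BaseModel) else v
    for k, v in args.items()
}
print(json.dumps(serialized))
```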