
BUG: Many chat models never use SQLiteCache because the cache instance's __repr__ changes between instances! #23257

Closed
5 tasks done
thiswillbeyourgithub opened this issue Jun 21, 2024 · 7 comments
Assignees: eyurtsev
Labels: 01 bug (Confirmed bug), 🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature)

Comments

@thiswillbeyourgithub (Contributor)

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import os
import json
from pathlib import Path
from langchain_community.cache import SQLiteCache
from typing import Callable, List

model_list = [
    'ChatAnthropic',  # <- has several instances of this bug, not only SQLiteCache
    'ChatBaichuan',
    'ChatCohere',
    'ChatCoze',
    'ChatDeepInfra',
    'ChatEverlyAI',
    'ChatFireworks',
    'ChatFriendli',
    'ChatGooglePalm',
    'ChatHunyuan',
    'ChatLiteLLM',
    'ChatOctoAI',
    'ChatOllama',
    'ChatOpenAI',
    'ChatPerplexity',
    'ChatYuan2',
    'ChatZhipuAI',

    # Below are the models I didn't test, as well as the reason why I haven't
    # 'ChatAnyscale',  # needs a model name
    # 'ChatDatabricks',  # needs some params
    # 'ChatHuggingFace',  # needs a modelname
    # 'ChatJavelinAIGateway',  # needs some params
    # 'ChatKinetica',  # not installed
    # 'ChatKonko',  # not installed
    # 'ChatLiteLLMRouter',  # needs router arg
    # 'ChatLlamaCpp',  # needs some params
    # 'ChatMLflowAIGateway',  # not installed
    # 'ChatMaritalk',  # needs some params
    # 'ChatMlflow',  # not installed
    # 'ChatMLX',  # needs some params
    # 'ChatPremAI',  # not installed
    # 'ChatSparkLLM',  # issue with api key
    # 'ChatTongyi',  # not installed
    # 'ChatVertexAI',  # not installed
    # 'ChatYandexGPT',  # needs some params
]

# import the models
for m in model_list:
    exec(f"from langchain_community.chat_models import {m}")

# set fake api keys
for m in model_list:
    backend = m[4:].upper()
    os.environ[f"{backend}_API_KEY"] = "aaaaaa"
    os.environ[f"{backend}_API_TOKEN"] = "aaaaaa"
    os.environ[f"{backend}_TOKEN"] = "aaaaaa"
os.environ["GOOGLE_API_KEY"] = "aaaaaa"
os.environ["HUNYUAN_APP_ID"] = "aaaaaa"
os.environ["HUNYUAN_SECRET_ID"] = "aaaaaa"
os.environ["HUNYUAN_SECRET_KEY"] = "aaaaaa"
os.environ["PPLX_API_KEY"] = "aaaaaa"
os.environ["IFLYTEK_SPARK_APP_ID"] = "aaaaaa"
os.environ["SPARK_API_KEY"] = "aaaaaa"
os.environ["DASHSCOPE_API_KEY"] = "aaaaaa"
os.environ["YC_API_KEY"] = "aaaaaa"

# create two brand-new caches pointing at the same database
Path("test_cache.db").unlink(missing_ok=True)
c1 = SQLiteCache(database_path="test_cache.db")
c2 = SQLiteCache(database_path="test_cache.db")

def recur_dict_check(val: dict) -> List[str]:
    "recursively collect the values whose repr leaked an object id"
    found = []
    for k, v in val.items():
        # only values whose string form contains a memory address are suspicious
        if " object at " in str(v):
            if isinstance(v, dict):
                found.append(recur_dict_check(v))
            else:
                found.append(v)
    # flatten the list
    out = []
    for f in found:
        if isinstance(f, list):
            out.extend(f)
        else:
            out.append(f)
    assert out
    return [str(o) for o in out]


def check(chat_model: Callable, verbose: bool = False) -> bool:
    "check a given chat model"
    llm1 = chat_model(cache=c1)
    llm2 = chat_model(cache=c2)
    backend = llm1.get_lc_namespace()[-1]

    # only the serialized-model part matters, not the call params after "---"
    str1 = llm1._get_llm_string().split("---")[0]
    str2 = llm2._get_llm_string().split("---")[0]

    if verbose:
        print(f"LLM1:\n{str1}")
        print(f"LLM2:\n{str2}")

    if str1 == str2:
        print(f"{backend.title()} does not have the bug")
        return True
    else:
        print(f"{backend.title()} HAS the bug")
        j1, j2 = json.loads(str1), json.loads(str2)
        assert j1.keys() == j2.keys()
        diff1 = recur_dict_check(j1)
        diff2 = recur_dict_check(j2)
        assert len(diff1) == len(diff2)
        # the same objects must appear in both, modulo their memory address
        diffs = [str(v).split("object at ")[0] for v in diff1 + diff2]
        assert all(diffs.count(elem) == 2 for elem in diffs)

        print(f"List of buggy objects for model {backend.title()}:")
        for d in diff1:
            print(f"    - {d}")

        return False

failed = []
for model in model_list:
    if not check(locals()[model]):
        failed.append(model)

print(f"The culprit is at least SQLiteCache repr string:\n{c1}\n{c2}")
c1.__class__.__repr__ = lambda x=None : "<langchain_community.cache.SQLiteCache>"
c2.__class__.__repr__ = lambda x=None : "<langchain_community.cache.SQLiteCache>"
print(f"Now fixed:\n{c1}\n{c2}\n")

# Anthropic still has issues
assert not check(locals()["ChatAnthropic"])

for model in failed:
    if model == "ChatAnthropic":  # anthropic actually has more issues!
        continue
    assert check(locals()[model]), model
print("Fixed it for most models!")

print(f"Models with the issue: {len(failed)} / {len(model_list)}")
for f in failed:
    print(f"    - {f}")

Error Message and Stack Trace (if applicable)

No response

Description

Being affected by this bug in my DocToolsLLM project, I ended up using ChatOpenAI directly whenever the requested model is from OpenAI anyway, instead of going through ChatLiteLLM for all models.

The other day I noticed that my SQLiteCache was being systematically ignored, but only by ChatOpenAI, and I ended up figuring out the culprit:

  • To know whether a value is present in the cache, the prompt AND a string characterizing the LLM are used as the key.
  • The method used to characterize the LLM is _get_llm_string().
  • This method's implementation is inconsistent across chat models, causing outputs to contain the unfiltered repr of objects such as the cache, callbacks, etc.
  • The issue is that for a lot of instances, the repr returns something like <langchain_community.cache.SQLiteCache object at SOME_ADDRESS>, which changes on every instantiation.
  • I found that manually setting the __repr__ on the class of those objects is a viable workaround (see the sketch below).

To help you fix this ASAP, I coded a loop that checks all chat models and tells you which instances are causing the issue.
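A minimal standalone sketch of the mechanism and the workaround (illustrative only, not part of the reproduction script; the repr value shown in the comment is what Python's default repr produces):

from langchain_community.cache import SQLiteCache

c1 = SQLiteCache(database_path="test_cache.db")
c2 = SQLiteCache(database_path="test_cache.db")

# the default repr embeds the memory address, e.g.
# <langchain_community.cache.SQLiteCache object at 0x7f9c2c1d3e50>,
# so two otherwise identical caches never produce the same string
assert repr(c1) != repr(c2)

# workaround: patch the class with a stable, address-free repr
SQLiteCache.__repr__ = lambda self: "<langchain_community.cache.SQLiteCache>"
assert repr(c1) == repr(c2)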

System Info

python -m langchain_core.sys_info

System Information

OS: Linux
OS Version: #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2
Python Version: 3.11.7 (main, Jun 12 2024, 12:57:34) [GCC 11.4.0]

Package Information

langchain_core: 0.2.7
langchain: 0.2.5
langchain_community: 0.2.5
langsmith: 0.1.77
langchain_mistralai: 0.1.8
langchain_openai: 0.1.8
langchain_text_splitters: 0.2.1

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

@dosubot dosubot bot added the 🤖:bug label Jun 21, 2024
@eyurtsev eyurtsev closed this as not planned Jun 21, 2024
@eyurtsev (Collaborator)

eyurtsev commented Jun 21, 2024

Re-opened after reading a bit more carefully. @thiswillbeyourgithub, if you're sharing a minimal example in the future, it's much better to share the example itself first, and then any utility code to identify more cases. The utility code uses a bunch of dynamic functionality (e.g., exec) -- that made this look like spam on a first read.

@eyurtsev eyurtsev reopened this Jun 21, 2024
@thiswillbeyourgithub (Contributor Author)

I'm writing a minimal example right now.

@eyurtsev (Collaborator)

eyurtsev commented Jun 21, 2024

Hi @thiswillbeyourgithub, thanks!

I confirmed locally already, so we're all set :)

from langchain_anthropic import ChatAnthropic
from langchain_core.caches import InMemoryCache

cache = InMemoryCache()
model = ChatAnthropic(cache=cache, model_name='hello')
model._get_llm_string()
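Extending that snippet (a sketch; the api_key argument is added here only so the constructor succeeds offline, and the comparison reflects affected versions):

from langchain_anthropic import ChatAnthropic
from langchain_core.caches import InMemoryCache

m1 = ChatAnthropic(cache=InMemoryCache(), model_name='hello', api_key='fake')
m2 = ChatAnthropic(cache=InMemoryCache(), model_name='hello', api_key='fake')

# each llm string embeds the repr of its own InMemoryCache, object id
# included, so lookups keyed on one never hit entries written by the other
print(m1._get_llm_string() == m2._get_llm_string())  # False on affected versions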

@eyurtsev eyurtsev added the 01 bug Confirmed bug label Jun 21, 2024
@thiswillbeyourgithub (Contributor Author)

Alright, sorry for the extensive reproduction. At first I showed this on ChatOpenAI only, but then I noticed that the issue was extensive (affecting at least 7 chat models) and not only related to the cache but sometimes to other attributes as well, for example for Anthropic, as you saw.

@thiswillbeyourgithub (Contributor Author)

So my original code is a bit awkward, but it allows one to quickly see a lower bound on which attributes of which models are posing problems.

@eyurtsev (Collaborator)

Issue is here:

Likely affected by any other helper objects (e.g., client)
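For reference, a rough sketch of how the llm string appears to be built for serializable chat models (reconstructed from the reproduction above, which splits on "---"; the helper name and details here are assumptions, not the actual implementation):

from langchain_core.load import dumps

def _get_llm_string_sketch(model, stop=None, **kwargs) -> str:
    # prefix: the serialized model; non-serializable fields (cache,
    # client, callbacks, ...) fall back to repr(), which embeds an
    # object id that changes on every instantiation
    params = {**kwargs, "stop": stop}
    return dumps(model) + "---" + str(sorted(params.items()))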

@eyurtsev eyurtsev self-assigned this Jun 21, 2024
eyurtsev added a commit that referenced this issue Jul 1, 2024
…23416)

Fix LLM string representation for serializable objects.

Fix for issue: #23257

The llm string of serializable chat models is the serialized
representation of the object. LangChain serialization dumps some basic
information about non-serializable objects, including their repr(),
which includes an object id.

This means that if a chat model has any non-serializable fields (e.g., a
cache), then any new instantiation of those fields will change the
llm representation of the chat model and cause cache misses.

i.e., re-instantiating a postgres cache would result in cache misses!
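As an illustration of that explanation (a sketch; the fake api_key is only so the constructor succeeds offline, and the exact serialized JSON shape is an assumption):

from langchain_core.load import dumps
from langchain_community.cache import SQLiteCache
from langchain_openai import ChatOpenAI

model = ChatOpenAI(api_key="fake", cache=SQLiteCache(database_path="test_cache.db"))
serialized = dumps(model)

# the non-serializable cache is dumped as a stub that carries its repr(),
# e.g. "<langchain_community.cache.SQLiteCache object at 0x7f...>"
print(" object at " in serialized)  # True on affected versions, False after the fix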
@dosubot dosubot bot added the stale label Sep 20, 2024
@dosubot dosubot bot closed this as not planned Sep 27, 2024
@dosubot dosubot bot removed the stale label Sep 27, 2024
@thiswillbeyourgithub (Contributor Author)

This is a critical bug, don't you think, @eyurtsev?
