Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core[minor]: support pydantic v2 models in PydanticOutputParser #18811

Merged
merged 14 commits into from
Mar 27, 2024

Conversation

jnis23
Copy link
Contributor

@jnis23 jnis23 commented Mar 8, 2024

As mentioned in #18322, the current PydanticOutputParser won't work for anyone trying to parse to pydantic v2 models. This PR adds a separate PydanticV2OutputParser, as well as a langchain_core.pydantic_v2 namespace that will fail on import to any projects using pydantic<2. Happy to update the docs for output parsers if this is something we're interesting in adding.

On a separate note, I also updated check_pydantic.sh to detect pydantic imports with leading whitespace and excluded the internal namespaces. That change can be separated into its own PR if needed.

Copy link

vercel bot commented Mar 8, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Mar 27, 2024 2:13pm

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. Ɑ: parsing Related to output parser module 🤖:improvement Medium size change to existing code to handle new use-cases labels Mar 8, 2024
Copy link
Collaborator

@eyurtsev eyurtsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will require investigation

@@ -0,0 +1,55 @@
import json
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit worried about getting this into the code base. Definitely into core as it will create duplicated code in core when it upgrades to pydantic 2

Not sure, but this could potentially create issues when using in a chain or a runnable map together since it'll cause mixing of pydantic v1 and v2 code.

@eyurtsev eyurtsev added investigate Flagged for investigation. Ɑ: pydantic Related to pydantic or pydantic migration labels Mar 8, 2024
@eyurtsev
Copy link
Collaborator

eyurtsev commented Mar 8, 2024

@jnis23 Apologies for blocking. I think we need to do a bit of investigation first to figure out the pathway for a pydantic v2 migration and how this would interact with it.

@eyurtsev eyurtsev self-assigned this Mar 8, 2024
@jnis23
Copy link
Contributor Author

jnis23 commented Mar 8, 2024

@eyurtsev No problem at all. Agreed it feels bad to dupe that much code, but couldn't think of a better way without adding some additional abstraction. Lmk if there is anything I can do to help re: the investigation!

@eyurtsev
Copy link
Collaborator

eyurtsev commented Mar 8, 2024

@jnis23 we basically have two options:

  1. spec out full support for pydantic v2 for LangChain (at least in terms of roadmap)
  2. get some partial fixes in and check that they don't fail in unexpected ways.

If you want to investigate (2), I'd check that it still works with something like prompt | model | parser chain and write a unit test for that. I'm mostly worried about type inference (via OutputType/InputType) ending up and mixing pydnatic v1 with pydantic v2 types (which would crash things), I don't know if it can happen, but would love some code to double check it.


In this case, could we not do something like:

TBaseModel = TypeVar("TBaseModel", bound=PydanticBaseModel)

with PydanticBaseModel being Union[BaseModelv2, BaseModelv1] if pydantic version == 2, and
PydanticBaseModel = BaseModelv1 if pydantic version == 1?

  • other conditional behavior based on version, so it's still a single interface

@eyurtsev
Copy link
Collaborator

eyurtsev commented Mar 8, 2024

for

  1. spec out full support for pydantic v2 for LangChain (at least in terms of roadmap)

I'll try to check a potential pathway forward for this next week, and will share if it looks promising

@jnis23
Copy link
Contributor Author

jnis23 commented Mar 8, 2024

@jnis23 we basically have two options:

1. spec out full support for pydantic v2 for LangChain (at least in terms of roadmap)

2. get some partial fixes in and check that they don't fail in unexpected ways.

If you want to investigate (2), I'd check that it still works with something like prompt | model | parser chain and write a unit test for that. I'm mostly worried about type inference (via OutputType/InputType) ending up and mixing pydnatic v1 with pydantic v2 types (which would crash things), I don't know if it can happen, but would love some code to double check it.

I'm currently using the v2 parser in a chain without issue so I can add a test for that. Are there supported use cases for outputs getting piped back into other runnables? I can definitely see that being a problem if the runnable depends on v1. If you know of any example off the top of your head, I can start there, otherwise I can dig.

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Mar 8, 2024
@jnis23
Copy link
Contributor Author

jnis23 commented Mar 11, 2024

@eyurtsev Was able to get both versions working under the existing PydanticOutputParser. Pyright can figure out the typing, but Mypy gets hung up on the type alias being re-declared. I also don't love that we need the conditional checks in _parse_obj for both the BaseModel and the ValidationError, but it should get the job done until a full transition to V2.

If this looks good, I can nuke the V2 class

# * This change is easier to roll out and roll back.


if get_pydantic_major_version() < 2:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no side effects on import since the import is triggered by default rather than user action

@@ -0,0 +1,24 @@
import warnings
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that the namespace is needed yet -- could you review the code snippets -- i think some form of either those versions or else local scope imports will work

if PYDANTIC_MAJOR_VERSION == 2 and issubclass(
self.pydantic_object, V2BaseModel
):
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation error can be imported here locally -- the import is cached so there's no issue in terms of speed

if VERSION = 2:
   if issubclass(self.pydantic_object, V1):
      .....
       except pydantic.v1.ValidationError:
   else:
       except pydantic.ValidationError:


from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.outputs import Generation
from langchain_core.pydantic_v1 import BaseModel, ValidationError
from langchain_core.pydantic_v2 import BaseModel as V2BaseModel
from langchain_core.pydantic_v2 import ValidationError as V2ValidationError
from langchain_core.utils.pydantic import PYDANTIC_MAJOR_VERSION

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pydantic_v2 module could probably be eliminated to do something like this instead:

import pydantic

if VERSION = 2:
  PydanticBaseModel = Union[pydantic.BaseModel, pydantic.v1.BaseModel]
else:
 PydanticBaseModel = pydantic.BaseModel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eyurtsev Definitely like this approach better, it's much cleaner. The namespace was to get around the check_pydantic script that gets run as part of linting and in CI. If we're fine adding an exclusion for this file and the test, I can push up these changes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah OK skipping this file or adding some # pydantic: ignore type filter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, added support for # pydantic: ignore. Lmk how the rest looks

@eyurtsev
Copy link
Collaborator

@jnis23 looks good. you can remove the older version of the code. I added a few suggestions that I think can further reduce the code foot print of the change, let me know if those work

@jnis23 jnis23 changed the title core: add PydanticV2OutputParser core: support pydantic v2 models in PydanticOutputParser Mar 26, 2024
@jnis23
Copy link
Contributor Author

jnis23 commented Mar 27, 2024

@eyurtsev ready for you to re-review when you can

@eyurtsev eyurtsev self-requested a review March 27, 2024 19:34
@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Mar 27, 2024
@eyurtsev eyurtsev changed the title core: support pydantic v2 models in PydanticOutputParser core[minor]: support pydantic v2 models in PydanticOutputParser Mar 27, 2024
@eyurtsev eyurtsev merged commit 2e0ddd6 into langchain-ai:master Mar 27, 2024
95 checks passed
@eyurtsev
Copy link
Collaborator

Thank you @jnis23!

@jnis23 jnis23 deleted the pydantic-v2-parser branch March 27, 2024 20:19
gkorland pushed a commit to FalkorDB/langchain that referenced this pull request Mar 30, 2024
…chain-ai#18811)

As mentioned in langchain-ai#18322, the current PydanticOutputParser won't work for
anyone trying to parse to pydantic v2 models. This PR adds a separate
`PydanticV2OutputParser`, as well as a `langchain_core.pydantic_v2`
namespace that will fail on import to any projects using pydantic<2.
Happy to update the docs for output parsers if this is something we're
interesting in adding.

On a separate note, I also updated `check_pydantic.sh` to detect
pydantic imports with leading whitespace and excluded the internal
namespaces. That change can be separated into its own PR if needed.

---------

Co-authored-by: Jan Nissen <[email protected]>
hinthornw pushed a commit that referenced this pull request Apr 26, 2024
As mentioned in #18322, the current PydanticOutputParser won't work for
anyone trying to parse to pydantic v2 models. This PR adds a separate
`PydanticV2OutputParser`, as well as a `langchain_core.pydantic_v2`
namespace that will fail on import to any projects using pydantic<2.
Happy to update the docs for output parsers if this is something we're
interesting in adding.

On a separate note, I also updated `check_pydantic.sh` to detect
pydantic imports with leading whitespace and excluded the internal
namespaces. That change can be separated into its own PR if needed.

---------

Co-authored-by: Jan Nissen <[email protected]>
@eyurtsev
Copy link
Collaborator

eyurtsev commented Jun 7, 2024

Hi @jnis23 not sure if you have any cycles to look into this: #22669

Potentially related to this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:improvement Medium size change to existing code to handle new use-cases investigate Flagged for investigation. lgtm PR looks good. Use to confirm that a PR is ready for merging. Ɑ: parsing Related to output parser module Ɑ: pydantic Related to pydantic or pydantic migration size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants