Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make tokenizers optional for Python 3.13 compatibility #719

Closed
wants to merge 4 commits into from

Conversation

simonw
Copy link

@simonw simonw commented Oct 29, 2024

Refs:

This package cannot be install on Python 3.13 at the moment because the tokenizers package it depends on does not have a release for that Python version yet.

Since tokenizers is only used for an undocumented count_tokens() feature that works for Claude 2 but not Claude 3/3.5 I think making it an optional dependency is a good idea.

@simonw simonw requested a review from a team as a code owner October 29, 2024 14:04
@simonw
Copy link
Author

simonw commented Oct 29, 2024

It looks like CI tests don't run against different Python versions at the moment - I ran these tests against Python 3.13 on my laptop like this:

cd anthropic-sdk-python
uv venv --python 3.13
./scripts/mock --daemon # Start mock server
uv run pytest

@simonw
Copy link
Author

simonw commented Oct 29, 2024

Since this PR makes the tokenizers package optional (which is the thing blocking a Python 3.13 release) the legacy, undocumented count_tokens() method now does this:

>>> import anthropic
>>> client = anthropic.Anthropic()
>>> client.count_tokens("hi")
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    client.count_tokens("hi")
    ~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/simon/Dropbox/Development/anthropic-sdk-python/src/anthropic/_client.py", line 281, in count_tokens
    tokenizer = self.get_tokenizer()
  File "/Users/simon/Dropbox/Development/anthropic-sdk-python/src/anthropic/_client.py", line 286, in get_tokenizer
    return sync_get_tokenizer()
  File "/Users/simon/Dropbox/Development/anthropic-sdk-python/src/anthropic/_tokenizers.py", line 41, in sync_get_tokenizer
    return _load_tokenizer(text)
  File "/Users/simon/Dropbox/Development/anthropic-sdk-python/src/anthropic/_tokenizers.py", line 29, in _load_tokenizer
    from tokenizers import Tokenizer
ModuleNotFoundError: No module named 'tokenizers'

I think that's OK. Running pip install tokenizers (not on Python 3.13) fixes that.

Also, that method is presumably undocumented for a reason: it uses an old Claude 2 tokenizer. I wish there WAS a way to count tokens for Claude 3 and 3.5 but there currently is not!

@simonw simonw changed the title Make tokenizers compatible for Python 3.13 compatibility Make tokenizers optional for Python 3.13 compatibility Oct 29, 2024
"jiter>=0.4.0, <1",
]
requires-python = ">= 3.7"
requires-python = ">= 3.8"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was necessary because otherwise the jiter dependency cannot be successfully resolved:

% uv sync
  × No solution found when resolving dependencies:
  ╰─▶ Because the requested Python version (>=3.7) does not satisfy Python>=3.9 and the requested Python version (>=3.7) does not satisfy
      Python>=3.8,<3.9, we can conclude that Python>=3.8 is incompatible.
      And because jiter>=0.4.0 depends on Python>=3.8 and only the following versions of jiter are available:
          jiter<=0.4.0
          jiter==0.4.1
          jiter==0.4.2
          jiter==0.5.0
          jiter==0.6.0
          jiter==0.6.1
      we can conclude that jiter>=0.4.0 cannot be used.
      And because your project depends on jiter>=0.4.0 and your project requires anthropic[vertex], we can conclude that your projects's
      requirements are unsatisfiable.

      hint: The `requires-python` value (>=3.7) includes Python versions that are not supported by your dependencies (e.g., jiter>=0.4.0 only
      supports >=3.8). Consider using a more restrictive `requires-python` value (like >=3.8).

@@ -14,11 +14,9 @@ dependencies = [
"anyio>=3.5.0, <5",
"distro>=1.7.0, <2",
"sniffio",
"cached-property; python_version < '3.8'",
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this since the requires-python line means Python 3.7 is no longer supported, so this polyfill is not necessary.

The Python 3.7 EOL was June 27, 2023 so it's been unsupported for over a year at this point (technically 3.8 is unsupported now as well).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it makes sense to drop 3.7 support now, we just haven't had a compelling reason to do so yet. Will double check internally first before confirming that dropping support is okay.

Copy link
Author

@simonw simonw Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 3.7 support may have been accidentally dropped already.

Running 3.7 these days is a bit tricky, but I managed to get a working interpreter using https://github.com/indygreg/python-build-standalone/releases/tag/20191025 like this:

cd /tm
mkdir py37
cd py37
wget 'https://github.com/indygreg/python-build-standalone/releases/download/20191025/cpython-3.7.5-macos-20191026T0535.tar.zst'
unzstd cpython-3.7.5-macos-20191026T0535.tar.zst
tar -xvf cpython-3.7.5-macos-20191026T0535.tar
cd python/install/bin/
./python3 -m pip install cowsay

That demonstrated that pip install cowsay worked. But then:

./python3 -m pip install jiter

Said this:

ERROR: Could not find a version that satisfies the requirement jiter (from versions: none)
ERROR: No matching distribution found for jiter

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! thanks for catching that, we really need multi-version tests in CI...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're due for an upgrade from rye to uv, at which point adding multi-version tests (both in CI and on local laptops) should be pretty simple.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, I'm hoping to make that change soon

@RobertCraigie
Copy link
Collaborator

Thanks for the PR! I agree it just does make more sense for tokenizers to be an optional dependency but it is a breaking change so I'm checking if we have any sense of how many people are using .count_tokens() & if this is an acceptable breakage

@RobertCraigie
Copy link
Collaborator

looks like the tests are failing because tokenizers needs to be added as a dev dep now

pyproject.toml Outdated Show resolved Hide resolved
Co-authored-by: Peter Schilling <[email protected]>
@simonw
Copy link
Author

simonw commented Nov 1, 2024

Now that Claude has an official API version of count tokens the count_tokens() method that uses tokenizers is even less necessary:

@RobertCraigie
Copy link
Collaborator

Thanks for the PR Simon, we've decided to just completely remove client.count_tokens() in favour of the new client.beta.messages.count_tokens() method in the next release.

@RobertCraigie
Copy link
Collaborator

We've also dropped support for Python 3.7 and Python 3.13 should now be supported! Really appreciate the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants