Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: Remove tiktoken alternative #4991

Merged
merged 2 commits into from
May 23, 2023
Merged

build: Remove tiktoken alternative #4991

merged 2 commits into from
May 23, 2023

Conversation

julian-risch
Copy link
Member

@julian-risch julian-risch commented May 23, 2023

Related Issues

Proposed Changes:

  • Remove conditional statements from tiktoken in open ai utils as we don't need to handle any alternative anymore
  • Remove limitations on tiktoken from pyproject.toml

How did you test it?

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added tests that demonstrate the correct behavior of the change
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

Sorry, something went wrong.

Verified

This commit was signed with the committer’s verified signature.
julian-risch Julian Risch
@julian-risch julian-risch requested a review from a team as a code owner May 23, 2023 09:53
@julian-risch julian-risch requested review from silvanocerza and removed request for a team May 23, 2023 09:53
@julian-risch julian-risch requested review from ZanSara and removed request for silvanocerza May 23, 2023 09:56
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small improvement and it's ready to go!

return tokenizer

logger.debug("Using tiktoken %s tokenizer", tokenizer_name)
return tiktoken.get_encoding(tokenizer_name)


def count_openai_tokens(text: str, tokenizer) -> int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably remove this function altogether

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, that's done now. 🙂

@coveralls
Copy link
Collaborator

coveralls commented May 23, 2023

Pull Request Test Coverage Report for Build 5056081722

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 81 unchanged lines in 5 files lost coverage.
  • Overall coverage increased (+0.004%) to 39.534%

Files with Coverage Reduction New Missed Lines %
init.py 2 83.33%
nodes/answer_generator/openai.py 11 72.22%
utils/openai_utils.py 13 75.64%
nodes/retriever/_openai_encoder.py 26 30.68%
nodes/prompt/invocation_layer/open_ai.py 29 30.26%
Totals Coverage Status
Change from base Build 5055409480: 0.004%
Covered Lines: 8858
Relevant Lines: 22406

💛 - Coveralls

Verified

This commit was signed with the committer’s verified signature.
julian-risch Julian Risch
@github-actions github-actions bot added the type:documentation Improvements on the docs label May 23, 2023
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ZanSara ZanSara merged commit 9e4feb6 into main May 23, 2023
@ZanSara ZanSara deleted the no-tiktoken-alternative branch May 23, 2023 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants