Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: upgrade urllib3 #3482

Closed
david-waterworth opened this issue Jul 2, 2024 · 5 comments · Fixed by #3493
Closed

[Feature]: upgrade urllib3 #3482

david-waterworth opened this issue Jul 2, 2024 · 5 comments · Fixed by #3493
Assignees
Labels
feature A new feature

Comments

@david-waterworth
Copy link
Contributor

david-waterworth commented Jul 2, 2024

Problem statement

flair requires urllib3<2.0.0 which is quite old (latest is 2.2, the 2.0 alpha was released in late 2022)

https://github.com/flairNLP/flair/blob/59bd7053a1a73da03293b0bdb68113cf383d6b9e/requirements.txt#L26C1-L26C75

I have other dependencies that require urllib3 >= 2.0 (i.e. tritonclient[http] (>=2.47.0,<3.0.0) requires urllib3 (>=2.0.7))

Solution

I'm not sure that flair directly uses urllib3, I suspect the require was added as per the comment due to 3rd party libraries not requiring urllib3 < 2 so resulting in a broken installation when urllib3 v2 was released. I would hope this is no longer a constraint and it can be removed?

Additional Context

In general I think flair imports to much by default, i.e. I have to downgrade scipy because even though I'm not using any gensim based features the fact that gensim uses deprecated scipy features that are now removed (triu) I get import errors (gensim has fixed this but not released the fix yet). Also note a recent "fix" added by gensim pins numpy < 2 which flair will inherit which could cause issues down the line.

I think most 3rd party (gensim, spacy, huggingface) functionality should be "optional" - i.e. pip install flair[gensim] to enable gensim features. I'm not a big fan of getting errors due to flair importing gensim when I'm training a tagger with transformer embeddings.

@david-waterworth david-waterworth added the feature A new feature label Jul 2, 2024
@alanakbik
Copy link
Collaborator

Hi @david-waterworth, generally agree that we have too many dependencies.

@helpmefindaname did an analysis of all dependencies and identified a core set of dependencies plus optional ones, also wanting to get rid of gensim. He proposed a solution essentially exactly like you, e.g. using something like pip install flair[gensim] to install bigger libraries.

My main problem is that gensim is used in some of our main taggers (for instance 'ner') and then this starter example would not work anymore: https://flairnlp.github.io/docs/intro#requirements-and-installation (installation plus example 1).

@david-waterworth
Copy link
Contributor Author

The example appears to work for me without using gensim features? My assumption is gensim is used by the WordEmbeddings class in flair/embeddings/token.py (specifically gensim.models.KeyedVectors to load word2vec serialized files). My testing was a bit ad-hoc though so I could very well have missed something.

But tagger = Classifier.load('ner') appears to load a serialised pytorch state dict so (based on stepping through the code and putting breakpoints on lines that appear to use gensim) it seemed two work.

  • token.py is probably the most obvious example of a module that could be refactored - even if gensim wasn't optional, if each class in token.py was moved into a separate file then I could probably import the TransformerEmbedding class without flair importing gensim, spacy etc.

PS as an aside a transformer encoder fined tuned on my corpus using FLERT works extremely well!

@helpmefindaname
Copy link
Collaborator

Hi @david-waterworth

the urllib3<2.0.0 constraint was a hot fix to a dependency resolution issue (see this or this) that came up this year. We can remove that constraint again.

I agree that there are too many standard dependencies, @alanakbik will be discussing this again.

@sglbl
Copy link

sglbl commented Jul 8, 2024

I had the same error with gradio, I downgraded the versionn of gradio but still it was using urllib >= 2.0.

The conflict is caused by:
gradio 4.37.2 depends on urllib3~=2.0
flair 0.13.1 depends on urllib3<2.0.0 and >=1.0.0

At the end my solution was this:

pip install --use-deprecated=legacy-resolver -r requirements.txt

But it's deprecated and soon pip can remove this feature without notifying it and I'm not sure how can I install both of them together.

@david-waterworth
Copy link
Contributor Author

@helpmefindaname I don't think those issues are really problems that should be fixed this way - my understanding of the issue is for packages that don't include a manifest as a separate file, poetry has to download and extract the wheel, and it does it once only and for many packages (unless you clear the cache), not just urllib. It's annoying the first time but users could always pin urllib3 in their project. From memory the main issue is boto3 doesn't support the manifest format and there's tons of different wheels which initially triggered poetry to download every boto3 wheel to check dependencies when urllib3 > 2.0.0 was released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants