
Trainer only uses TextClassifier.load_from_file #474

Closed
fbenites opened this issue Feb 8, 2019 · 15 comments
Labels
question Further information is requested

Comments

@fbenites

fbenites commented Feb 8, 2019

I am using 0.4, but master seems to have the same behavior:
I inherited from TextClassifier and overloaded basic methods (I changed the decoder to be a full-blown NN). I get very good results in the training phase. The problem is in the test phase, when the trainer checks for the type TextClassifier and uses TextClassifier's load_from_file instead of my overloaded method:
https://github.com/zalandoresearch/flair/blob/master/flair/trainers/trainer.py#L271
Wouldn't it be better to get the class of the instance? My first guess would be:
self.model = self.model.__class__.load_from_file(base_path / 'best-model.pt')
But that also seems wrong. Maybe load_from_file should not be a class method?
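A minimal sketch of that idea, dispatching on the class of the model instance the trainer already holds (the surrounding trainer code is simplified, not the actual implementation):

# sketch: type(self.model) resolves to the user's subclass, so an overloaded
# load_from_file classmethod is called instead of TextClassifier's
self.model = type(self.model).load_from_file(base_path / 'best-model.pt')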

fbenites added the question label Feb 8, 2019
@alanakbik
Collaborator

Hello @fbenites thank you for pointing this out - this in fact raises a few issues with the current implementation of ModelTrainer that we need to address.

Specifically, I think some more work needs to be done so that all explicit references to SequenceTagger and TextClassifier are removed from the ModelTrainer. Instead, the flair.nn.Model interface needs to be extended so that it contains everything required by the model trainer, including load_from_file(), save_checkpoint(), etc. This would allow users to simply implement the flair.nn.Model interface and directly get access to all ModelTrainer functionality. I am not sure we can do this in time for the 0.4.1 release, which has already been pushed back too far, but we could get started on this right after the release.
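A rough sketch of what such an extended interface could look like (the exact method set and signatures beyond load_from_file() and save_checkpoint() are assumptions, not a committed design):

from abc import abstractmethod
from pathlib import Path

import torch.nn


class Model(torch.nn.Module):
    # Sketch of an extended flair.nn.Model: everything the ModelTrainer needs
    # lives on the interface, so the trainer never references concrete classes.

    @classmethod
    @abstractmethod
    def load_from_file(cls, model_file: Path) -> 'Model':
        ...

    @abstractmethod
    def save(self, model_file: Path):
        ...

    @abstractmethod
    def save_checkpoint(self, model_file: Path, optimizer_state: dict,
                        scheduler_state: dict, epoch: int, loss: float):
        ...

    @abstractmethod
    def evaluate(self, sentences, eval_mini_batch_size: int = 32):
        ...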

Another question: Your TextClassifier modification sounds very interesting. Would you consider contributing it to Flair?

@fbenites
Author

fbenites commented Feb 9, 2019

Hi @alanakbik, thank you for the great framework!
I am still figuring out whether my model is any good. At the moment I am trying it on simple-questions relation prediction; if I get decent results (currently 78% accuracy, where 88% is state of the art, but I am masking the entities completely) I will post something more specific. In principle, I am using a capsule implementation from a Kaggle kernel (https://www.kaggle.com/sfzero/fork-of-gamma-bianli-feature-1-1-i-16) instead of the default decoder, i.e. self.decoder = NeuralNet() (see the sketch below). Another problem is that the document LSTM gives only a single vector (the sentence embedding?), which is probably useless for the GRU+LSTM of the capsule network, so I will try to figure that out first. I will also try some hierarchical classification with BERT and Flair and let you know how that works.
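A minimal sketch of what such a subclass can look like (NeuralNet stands in for the capsule network from the Kaggle kernel and is assumed to be defined elsewhere; the TextClassifier internals shown here are simplified, not the actual implementation):

import torch

from flair.models import TextClassifier


class CapsuleTextClassifier(TextClassifier):
    # Sketch: reuse TextClassifier's embedding and loss machinery, but swap the
    # default linear decoder for a custom network.

    def __init__(self, document_embeddings, label_dictionary, multi_label=False):
        super().__init__(document_embeddings, label_dictionary, multi_label)
        # replace the single linear layer with a full-blown decoder network
        self.decoder = NeuralNet(
            input_size=document_embeddings.embedding_length,
            output_size=len(label_dictionary),
        )

    def forward(self, sentences):
        # embed the sentences, stack the document vectors, run the custom decoder
        self.document_embeddings.embed(sentences)
        text_embeddings = torch.stack(
            [sentence.get_embedding() for sentence in sentences]
        )
        return self.decoder(text_embeddings)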

@dancsalo

Hi @fbenites and @alanakbik, I did some work to isolate the parts of ModelTrainer that are currently dependent on other classes, and I've pushed my work here: https://github.com/dancsalo/classyflair.

It allows users to add their own architectures to train by subclassing TextClassifier and passing that subclass in as the model to ModelTrainer. Only one function in ModelTrainer needs to be modified for a simple training / evaluation run (final_test()).
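Roughly, the usage pattern looks like this (class, corpus and path names are placeholders, assuming an API close to Flair's ModelTrainer):

from flair.trainers import ModelTrainer

# MyTextClassifier is a user-defined TextClassifier subclass;
# document_embeddings, label_dictionary and my_corpus are prepared elsewhere
classifier = MyTextClassifier(document_embeddings, label_dictionary)
trainer = ModelTrainer(classifier, my_corpus)
trainer.train('resources/classifiers/my-model', max_epochs=10)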

I would be interested in contributing this idea if it is in line with where you are taking the framework! And of course, open to feedback.

@alanakbik
Collaborator

@dancsalo sounds very interesting and help is always appreciated :)

After the 0.4.1 release I'll take a look and get back to you!

alanakbik mentioned this issue Feb 24, 2019
@FlorentPajot

FlorentPajot commented Feb 27, 2019

Hello, following your discussion: I also had to inherit from TextClassifier and overload basic methods for the same reasons, and I ran into another issue with load_from_file, which uses torch.load() in the TextClassifier._load_state() method. When I want to run a model trained on GPU on a CPU-only device for inference, I cannot reach the map_location parameter of torch.load, which forces me to overload all the related loading methods as well. Could you add parameter injection to load_from_file so that this torch feature can be used?
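A sketch of the kind of parameter injection being asked for; the helper below is illustrative, not the actual TextClassifier._load_state() implementation:

import torch

# forward an optional map_location straight to torch.load, so CUDA tensors in a
# GPU-trained checkpoint are remapped to CPU on a CPU-only machine
def load_state(model_file, map_location=None):
    with open(model_file, 'rb') as f:
        return torch.load(f, map_location=map_location)


state = load_state('best-model.pt', map_location=torch.device('cpu'))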

@alanakbik
Collaborator

Hello @FlorentPajot, sure, we can add this. Could you help me better understand why you are currently overloading these methods? We train models on GPU a lot and use them in CPU-only environments, and so far this seems to work.

@FlorentPajot

FlorentPajot commented Feb 28, 2019

Thanks for your answer. You're right: I currently use model.cpu().save() to save my model and then use it on a CPU instance. I only hit the issue with models that were saved differently, which required access to torch.load(). If the model is saved the right way, it's not an issue.
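For reference, the workaround mentioned above looks roughly like this (the path is a placeholder):

# move the trained model to CPU before saving, so the checkpoint contains no
# CUDA tensors and loads cleanly on CPU-only machines
classifier.cpu().save('resources/classifiers/best-model-cpu.pt')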

alanakbik pushed a commit that referenced this issue Mar 14, 2019
alanakbik pushed a commit that referenced this issue Apr 20, 2019
alanakbik pushed a commit that referenced this issue Apr 23, 2019
alanakbik pushed a commit that referenced this issue Apr 26, 2019
stefan-it pushed a commit that referenced this issue Apr 26, 2019
@aayushsanghavi

@alanakbik in reference to your earlier comment -

We train models on GPU a lot and use them in CPU only environments

I trained a SequenceTagger model with ELMo embeddings on Google Colab's GPU but I wish to evaluate it on my local machine. I tried doing the following -

from os.path import join

import torch
import flair
from flair.models import SequenceTagger

flair.device = torch.device('cpu')
model = SequenceTagger.load_from_file(join(args.model_path, 'best-model.pt'))

but it doesn't work. I get a RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at the statement model.predict(sentence).

Could you please tell me how I could train a model on a GPU and use it on a CPU environment?

@alanakbik
Collaborator

Hm that is strange and should normally work. Could you try calling model.to(flair.device) before doing the predict? Also, which version of Flair are you using?
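In other words, something along these lines (a sketch; sentence is a flair.data.Sentence prepared elsewhere):

import torch
import flair
from flair.models import SequenceTagger

flair.device = torch.device('cpu')
tagger = SequenceTagger.load_from_file('best-model.pt')
tagger.to(flair.device)  # explicitly move all registered parameters/buffers to CPU
tagger.predict(sentence)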

@aayushsanghavi

I tried that, but it didn't work. I got the following error after that: https://pastebin.com/sEB6z3kB
I guess it isn't able to load the ELMo embeddings on CPU. I'm using Flair 0.4.1.

@alanakbik
Collaborator

You're right, it looks like this has something to do with ELMo embeddings. The ELMoEmbedder class we use from AllenNLP does not inherit from torch.nn.Module and so is not handled by the PyTorch .to(device) logic. So unfortunately this currently does not work. Perhaps someone who knows AllenNLP well could take a look?

@aayushsanghavi

No worries. But if I were to use FlairEmbeddings instead of ELMo and trained on a GPU, would I be able to load that model in a CPU environment with the lines above added?

@alanakbik
Collaborator

Yes, if you use FlairEmbeddings you don't have to do anything. It will automatically detect what type of machine it is on and handle everything. For instance, all our pre-trained models were trained on GPU, but you can just load them on CPU and use them.
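For example, something along these lines works on a CPU-only machine (using one of the standard pre-trained taggers):

from flair.data import Sentence
from flair.models import SequenceTagger

# this tagger was trained on GPU, but loads and runs fine on CPU
tagger = SequenceTagger.load('ner')
sentence = Sentence('George Washington went to Washington .')
tagger.predict(sentence)
print(sentence.to_tagged_string())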

@aayushsanghavi

Perfect. Thank you so much!

@stale

stale bot commented Apr 30, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Apr 30, 2020
alanakbik removed the wontfix label Apr 30, 2020