setfit usage #83
-
Hi @davidberenstein1957!

This is something we also tried previously, and we came to the same conclusion. SetFit works much better with very little data since it essentially uses the strength of the fine-tuned (Transformer) model. With Model2Vec, you don't have that luxury anymore since the model is fully static.

However, the m2v encoder should be much faster, which is the main draw. I tested it with your code example by benchmarking the time for encoding the entire training set (timing `embedding = model.model_body.encode(dataset["train"]["sentence"])` before and after replacing the encode with Model2Vec), which gives 23.9 seconds for the SetFit model vs. 0.53 seconds for the Model2Vec model, so roughly a 45x increase in speed.

In this case, you essentially trade off ~14 percentage points of accuracy (68% to 54%) for a 45x speedup. The exact trade-off differs per use case and task, but we believe the speedup is substantial enough to enable use cases that are not possible with existing models, while still maintaining decent performance.
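For reference, here is a minimal sketch of that timing comparison. It assumes SST-2 as the dataset (matching the `dataset["train"]["sentence"]` column in the snippet above) and uses illustrative checkpoints; the exact models used in the benchmark are not specified in this thread.

```python
import time

from datasets import load_dataset
from model2vec import StaticModel
from setfit import SetFitModel

# Assumed dataset: SST-2, which has a "sentence" column matching the snippet above.
dataset = load_dataset("sst2")
sentences = dataset["train"]["sentence"]

# Time the SetFit body (a SentenceTransformer) on the full training set.
setfit_model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2"  # illustrative checkpoint
)
start = time.perf_counter()
setfit_embeddings = setfit_model.model_body.encode(sentences)
print(f"SetFit encode: {time.perf_counter() - start:.2f}s")

# Time a static Model2Vec model on the same sentences.
m2v_model = StaticModel.from_pretrained("minishlab/potion-base-8M")  # illustrative
start = time.perf_counter()
m2v_embeddings = m2v_model.encode(sentences)
print(f"Model2Vec encode: {time.perf_counter() - start:.2f}s")
```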
-
Yes, I agree it could be worth it. Just to note: I initially fine-tune the model, then convert it to a static model, and on top of the static model I then fine-tune the classification head, which I would have expected to work slightly better for a 2-label use case (I did not look into class imbalance).
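A minimal sketch of that pipeline, assuming the fine-tuned SetFit body was already saved to a local path and using a scikit-learn logistic regression as the classification head; the checkpoint path and dataset here are hypothetical, not the exact setup from this thread:

```python
from datasets import load_dataset
from model2vec.distill import distill
from sklearn.linear_model import LogisticRegression

# 1. Distill the fine-tuned transformer body into a static Model2Vec model.
#    "path/to/finetuned-setfit-body" is a hypothetical local checkpoint.
static_model = distill(model_name="path/to/finetuned-setfit-body", pca_dims=256)

# 2. Encode the training set with the static model.
dataset = load_dataset("sst2")  # assumed dataset with a "sentence" column
X_train = static_model.encode(dataset["train"]["sentence"])
y_train = dataset["train"]["label"]

# 3. Fine-tune only the classification head on top of the static embeddings.
head = LogisticRegression(max_iter=1000).fit(X_train, y_train)
```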
-
I played around a bit, but sadly the performance is too poor.