I'm trying to use this model in Google Colab with BERTopic for topic modeling and am unable to run the model. As the data, I'm using a subset of the arXiv dataset with the title and abstract of each paper concatenated.
When the fit_transform() method is called, the following error occurs:
RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>
      5
      6 topic_model = BERTopic(embedding_model=ASPIRE, language="english", nr_topics="auto", verbose=True)
----> 7 topics, probs = topic_model.fit_transform(less_docs)

12 frames
/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    235         if self.position_embedding_type == "absolute":
    236             position_embeddings = self.position_embeddings(position_ids)
--> 237             embeddings += position_embeddings
    238         embeddings = self.LayerNorm(embeddings)
    239         embeddings = self.dropout(embeddings)

RuntimeError: The size of tensor a (541) must match the size of tensor b (512) at non-singleton dimension 1
I have not used the model in this manner before, so I can't say definitively what is wrong. From the looks of it, though, it may be a problem with the tokenizer not truncating the input documents to 512 tokens: the error says one of the inputs is 541 subword tokens long, which exceeds BERT's 512-position limit. If BERTopic has an option to truncate the input documents, you can try that. Otherwise, you can manually truncate the individual documents of arxiv_docs to around 450 (whitespace-tokenized) tokens.
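For reference, here is a minimal sketch of the manual-truncation workaround, assuming `less_docs` and `ASPIRE` are the document list and embedding model from the traceback above. The ~450-word budget is deliberately below 512 because BERT's WordPiece tokenizer typically splits a whitespace token into several subwords:

```python
from bertopic import BERTopic

MAX_WORDS = 450  # whitespace-token budget, leaving headroom below BERT's 512-subword limit

def truncate(doc, max_words=MAX_WORDS):
    """Keep at most `max_words` whitespace-separated tokens of a document."""
    return " ".join(doc.split()[:max_words])

# `less_docs` and `ASPIRE` are assumed to be defined earlier in the
# notebook, as in the traceback above.
truncated_docs = [truncate(doc) for doc in less_docs]

topic_model = BERTopic(embedding_model=ASPIRE, language="english",
                       nr_topics="auto", verbose=True)
topics, probs = topic_model.fit_transform(truncated_docs)
```

Alternatively, if ASPIRE is wrapped in a sentence-transformers `SentenceTransformer`, setting `ASPIRE.max_seq_length = 512` before passing it to BERTopic makes the encoder truncate long inputs itself.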