Tensor sizes not matching #3

Open
ericchagnon15 opened this issue Jan 20, 2023 · 1 comment

Comments

ericchagnon15 commented Jan 20, 2023

I'm trying to use this model with BERTopic in Google Colab for topic modeling, but I'm unable to run the model. The data is a subset of the arXiv dataset, with each document being the title and abstract concatenated.

from transformers import pipeline
from bertopic import BERTopic

# Use the Aspire sentence embedder as the embedding backend for BERTopic.
ASPIRE = pipeline("feature-extraction", model="allenai/aspire-sentence-embedder")

# Fit on a 200-document subset of the concatenated title + abstract strings.
less_docs = arxiv_docs[:200]
topic_model = BERTopic(embedding_model=ASPIRE, language="english", nr_topics="auto", verbose=True)
topics, probs = topic_model.fit_transform(less_docs)

When the fit_transform() method is called, the following error occurs:
RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>
      5
      6 topic_model = BERTopic(embedding_model=ASPIRE, language="english", nr_topics="auto", verbose=True )
----> 7 topics, probs = topic_model.fit_transform(less_docs)

12 frames
/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    235         if self.position_embedding_type == "absolute":
    236             position_embeddings = self.position_embeddings(position_ids)
--> 237             embeddings += position_embeddings
    238         embeddings = self.LayerNorm(embeddings)
    239         embeddings = self.dropout(embeddings)

RuntimeError: The size of tensor a (541) must match the size of tensor b (512) at non-singleton dimension 1

@MSheshera
Contributor

I have not used the model in this manner before, so I can't say definitively what is wrong. From the looks of it, though, the tokenizer may not be truncating the input documents to 512 tokens. If BERTopic has an option to truncate the input documents, you can try that. Otherwise, you can manually truncate the individual documents in arxiv_docs to roughly 450 (whitespace-tokenized) tokens; a sketch of that is below.
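For reference, a minimal sketch of the manual truncation suggested above, assuming arxiv_docs is a plain list of strings (the helper name truncate_doc is just for illustration, not part of any library):

def truncate_doc(doc, max_tokens=450):
    # Keep roughly the first max_tokens whitespace-separated tokens so that,
    # after subword tokenization, the input stays under BERT's 512-token limit.
    return " ".join(doc.split()[:max_tokens])

less_docs = [truncate_doc(d) for d in arxiv_docs[:200]]
topics, probs = topic_model.fit_transform(less_docs)

Whitespace tokens usually expand into more than one subword token, which is why the suggested cutoff is ~450 rather than 512.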
