Result too large #1168
Comments
Hey. How about sharing some code first? You have to make a bit of an effort if you want us to help you, we're not magicians 🧙‍♂️ |
@MaxHalford I apologize, I really appreciate both the product and any help you're willing to give. I've added an explanation and some code below. I'm using DBStream for online topic modeling with BERTopic as described here. I run:
from river import cluster
from river import stream

class River:
    def __init__(self, model):
        self.model = model

    def partial_fit(self, umap_embeddings):
        # Update the clustering model one embedding at a time.
        for umap_embedding, _ in stream.iter_array(umap_embeddings):
            self.model = self.model.learn_one(umap_embedding)

        # Assign a cluster label to every embedding in the batch.
        labels = []
        for umap_embedding, _ in stream.iter_array(umap_embeddings):
            label = self.model.predict_one(umap_embedding)
            labels.append(label)

        self.labels_ = labels
        return self

    def predict(self, umap_embeddings):
        labels = []
        for umap_embedding, _ in stream.iter_array(umap_embeddings):
            label = self.model.predict_one(umap_embedding)
            labels.append(label)
        return labels

river_cluster_model = River(cluster.DBSTREAM(
    clustering_threshold=1,
    fading_factor=0.01,
    cleanup_interval=2,
    intersection_factor=0.3,
    minimum_weight=10.0
)) |
Thanks a lot @vantubbe, I'll look into it. FYI @hoanganhngo610. |
Small update, was able to bypass the issue. In
self.micro_clusters[i].weight = (
    self.micro_clusters[i].weight
    * 2 ** (-self.fading_factor * (self.time_stamp - self.micro_clusters[i].last_update))
    + 1
)
or
self.s[i][j] = (
    self.s[i][j]
    * 2 ** (-self.fading_factor * (self.time_stamp - self.s_t[i][j]))
    + 1
)
self.s_t[i][j] = self.time_stamp
If
I doubt this is a valid issue. It's likely that my setup, config, or saving and loading the model is causing these weights to grow too large or too small, but I'm not sure what situation would cause this. |
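For readers hitting the same error: both update expressions quoted in this comment share the factor 2 ** (-fading_factor * elapsed). Below is a minimal sketch of a guarded version, assuming that a negative elapsed time (a stored timestamp ahead of the current one) should simply be treated as zero. `faded_update` is a hypothetical helper, not part of river, and it is not necessarily the bypass used above.

```python
def faded_update(value, fading_factor, time_stamp, last_update):
    """Return value * 2 ** (-fading_factor * elapsed) + 1.

    Hypothetical helper mirroring the two update expressions quoted in
    the thread. Assumes fading_factor >= 0.
    """
    # Assumption: if last_update is ahead of time_stamp, apply no decay
    # rather than letting the exponent become positive.
    elapsed = max(time_stamp - last_update, 0)
    # With elapsed >= 0 the exponent is <= 0, so the factor stays in [0, 1]
    # and the power cannot overflow; very old entries decay towards 0.0.
    return value * 2.0 ** (-fading_factor * elapsed) + 1


print(faded_update(10.0, 0.01, 100, 0))      # regular decay
print(faded_update(10.0, 0.01, 0, 10**6))    # out-of-order timestamps
```

Whether treating out-of-order timestamps as "no decay" is appropriate depends on why they occur in the first place, so this is a workaround sketch and not a fix for the underlying cause.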
Sorry for the late response @vantubbe @MaxHalford. I will have a look within this week, and hopefully come back with a response ASAP. |
Hi @hoanganhngo610, I'm facing the same issue when using DBStream for online topic modeling with BERTopic. May I check if there's any resolution? Thank you! |
Hi @vantubbe and @lshihui. I am really sorry for letting this issue slip through for such a long time. If the problem still persists for you, would you mind giving me the actual use case that caused this error? If I understand correctly, even if the elapsed-time term in either of the update expressions quoted earlier in this thread is getting large, the negative sign and the positive fading factor mean this should not cause the problem at all. |
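This reasoning can be checked with plain Python floats. The sketch below evaluates only the factor 2 ** (-fading_factor * elapsed) that the update expressions share. `decay_factor` is a hypothetical helper, and the negative elapsed value is an assumed scenario (for example, a stored timestamp that is ahead of the current one), not something confirmed in this thread.

```python
def decay_factor(fading_factor, elapsed):
    """Hypothetical helper: the factor shared by both update expressions."""
    return 2 ** (-fading_factor * elapsed)


# Large positive elapsed time: the exponent is very negative and the
# result underflows quietly to 0.0 instead of raising.
print(decay_factor(0.01, 10_000_000))

# Negative elapsed time: the exponent becomes positive, and once the
# result no longer fits in a float the power raises OverflowError.
try:
    decay_factor(0.01, -10_000_000)
except OverflowError as exc:
    print("OverflowError:", exc)
```

The exact message attached to the OverflowError depends on the platform's C library, so the wording may differ from the one reported in this issue.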
I’m running a model using DBStream, and after training on about 100k datapoints, I get 34 Result too large. I’m sure this is something I’m doing wrong and not an issue with the library. Would appreciate any suggestions on how to handle this.