We should display the metrics we look at often in W&B. The final merged corpus size after deduplication is something I check periodically to understand how aggressive the cleaning is overall. We could also display the corpus size after each cleaning stage, as we discussed with @gregtatum; that should probably be part of the analysis job.
I think the idea of this ticket was also to display the size of the corpus after the different cleaning steps, but we can start by uploading only the size of the final corpus.
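For discussion, here is a rough sketch of what the upload could look like. It assumes corpus size means line count and that each stage writes a plain-text or gzip file; the stage names, metric keys, `job_type` value and the helper functions are all hypothetical, not anything that exists in the pipeline today. The W&B part uses only `wandb.init`, `run.summary.update` and `run.finish`.

```python
import gzip
from pathlib import Path


def count_lines(path):
    """Count lines in a plain-text or gzip-compressed corpus file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as f:
        return sum(1 for _ in f)


def corpus_size_metrics(stage_files):
    """Build a metrics dict from an ordered {stage name: file path} mapping.

    For each stage it records the line count and the fraction of lines
    retained relative to the first stage in the mapping.
    """
    metrics = {}
    first = None
    for stage, path in stage_files.items():
        size = count_lines(path)
        if first is None:
            first = size
        # Metric key names are an assumption, chosen only for illustration.
        metrics[f"corpus_size/{stage}"] = size
        metrics[f"corpus_retained/{stage}"] = size / first if first else 0.0
    return metrics


def log_to_wandb(metrics, project):
    """Write the metrics to a run summary (needs the wandb package and a login)."""
    import wandb  # imported lazily so the counting code works without it

    run = wandb.init(project=project, job_type="corpus-analysis")
    run.summary.update(metrics)
    run.finish()
```

Starting with only the final corpus would mean passing a single-entry mapping; the per-stage sizes could be added later by extending the mapping, without changing the logging code.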