Calculating total events to generate #78

endorama · 2023-03-14T16:26:30Z


Currently we calculate the total number of events by dividing the requested size by the size of a single event that we generate as a sample.

E.g.: the requested size is 100kB. We run the templating function once; say it produces an event of size 1kB. We then set the number of events to generate to 100 (100kB / 1kB).
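To make the current calculation concrete, here is a minimal Python sketch of it. The function name `estimate_event_count` and its parameters are hypothetical and used only for illustration; they are not taken from the tool's code. It assumes 1kB = 1000 bytes.

```python
def estimate_event_count(requested_bytes: int, sample_event: bytes) -> int:
    """Estimate how many events fit in requested_bytes from one sample event.

    Hypothetical helper: the name and signature are assumptions made for
    this sketch, not the tool's actual API.
    """
    sample_size = len(sample_event)
    if sample_size == 0:
        raise ValueError("sample event is empty, cannot estimate a count")
    # Integer division: a partial event is not counted.
    return requested_bytes // sample_size


# 100kB requested, one sample event of 1kB.
count = estimate_event_count(100_000, b"x" * 1_000)
```

With the reference values from the example, `count` is 100. If the single sample happens to be unusually small or large, the whole estimate shifts with it.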

This method is very fast but not precise, especially when the size of a generated event varies widely depending on random values, which skews the result of the division.

There are different ways we could change the current behaviour to improve the accuracy of the estimated total number of events. Our goal is not to be perfectly precise, but to be precise enough to create relevant datasets.

This discussion is to present and discuss possible improvements to this calculation. If in presenting your idea you want to provide examples, please use 100kB as a reference size.

endorama · 2023-03-14T16:27:49Z


Averaging multiple sample events

Instead of generating just 1 event, generate a set (5 or 10) and average the sizes. Use this averaged size to compute the number of total events to generate.

This should be more precise than using a single value, but it may still produce wrong approximations for datasets with high variance in event size.
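A minimal Python sketch of the averaging idea, under stated assumptions: `render_event` is a hypothetical stand-in for the templating function (any zero-argument callable returning the rendered event as bytes), and `estimate_event_count_avg` is a name made up for this example.

```python
from typing import Callable


def estimate_event_count_avg(
    requested_bytes: int,
    render_event: Callable[[], bytes],
    samples: int = 10,
) -> int:
    """Estimate the event count from the mean size of several sample events.

    render_event is a hypothetical stand-in for the templating function.
    """
    if samples < 1:
        raise ValueError("at least one sample is required")
    sizes = [len(render_event()) for _ in range(samples)]
    average_size = sum(sizes) / samples
    if average_size == 0:
        raise ValueError("all sample events are empty")
    # Truncate: a partial event is not counted.
    return int(requested_bytes / average_size)
```

For a 100kB request where the samples alternate between 0.8kB and 1.2kB, the mean is 1kB and the estimate is 100 events. The cost is `samples` extra template renders before generation starts.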


endorama · 2023-03-14T16:31:21Z


Generated event size counter

A different approach would be to subtract, at each iteration, the size of the generated event from the total size of the corpus to generate. This way we would have maximum precision.

E.g.: we need to generate 100kB. Generation starts and the first event size is 0.8kB. This value is subtracted from the total: 100 - 0.8 = 99.2kB. The second event size is 1.2kB: 99.2 - 1.2 = 98kB. This is repeated at each iteration. Generation stops when the remaining size is either "less than what is needed for 1 more event" or "less than zero" (as we are approximating, the latter may be preferred).
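The loop described above could look roughly like this in Python. This is a sketch, not the tool's implementation: `render_event` is a hypothetical stand-in for the templating function, and the `allow_overshoot` flag is an assumption introduced here to show both stop conditions.

```python
from typing import Callable, List


def generate_until_budget(
    requested_bytes: int,
    render_event: Callable[[], bytes],
    allow_overshoot: bool = True,
) -> List[bytes]:
    """Generate events until the size budget is used up.

    allow_overshoot=True  -> stop once the remaining size is zero or below
                             (the last event may exceed the budget).
    allow_overshoot=False -> stop as soon as the next event does not fit
                             (that event is rendered, then discarded).
    Both the flag and render_event are assumptions made for this sketch.
    """
    remaining = requested_bytes
    events: List[bytes] = []
    while remaining > 0:
        event = render_event()
        if not event:
            # An empty event would never reduce the budget.
            raise ValueError("rendered event is empty")
        if not allow_overshoot and len(event) > remaining:
            break
        events.append(event)
        remaining -= len(event)
    return events
```

With a 100kB budget and events of a constant 30kB, the overshooting variant produces 4 events (120kB) and the strict variant produces 3 (90kB). Unlike the estimate-first approaches, the total number of events is only known once generation finishes.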

