Replies: 2 comments
Averaging multiple sample events
Instead of generating just one event, generate a small set (5 or 10) and average their sizes, then use the averaged size to compute the total number of events to generate. This should be more precise than using a single value, but it may still produce poor approximations for datasets with high variance in event size.
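A minimal sketch of the averaging idea, in Python for illustration only. `render_event` is a hypothetical stand-in for the templating function (it is not the tool's real API), the 200 to 1800 byte range is made up to simulate size variance, and 100kB is taken as 100,000 bytes.

```python
import random
from statistics import mean

MIN_EVENT_BYTES = 200    # assumed lower bound, for illustration only
MAX_EVENT_BYTES = 1800   # assumed upper bound, for illustration only


def render_event(rng: random.Random) -> bytes:
    """Hypothetical templating function: returns one event of random size."""
    return b"x" * rng.randint(MIN_EVENT_BYTES, MAX_EVENT_BYTES)


def estimate_from_average(requested_bytes: int, rng: random.Random, samples: int = 10) -> int:
    """Estimate the event count from the mean size of several sample events."""
    sizes = [len(render_event(rng)) for _ in range(samples)]
    return round(requested_bytes / mean(sizes))


rng = random.Random(42)
requested_bytes = 100_000  # the 100kB reference size, assumed to mean 100,000 bytes
averaged_estimate = estimate_from_average(requested_bytes, rng, samples=10)
print(f"estimated events: {averaged_estimate}")
```

A larger `samples` value narrows the spread of the estimate at the cost of a few extra template runs, but the final corpus size can still drift because the events actually generated later are drawn independently of the samples.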
Generated event size counter
A different approach would be to subtract, at each iteration, the size of the generated event from the total size of the corpus to generate. This would give us maximum precision. E.g.: we need to generate
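The counter approach could be sketched roughly as follows, again in Python for illustration only. `render_event` is a hypothetical stand-in for the templating function, the size range is invented, and 100kB is assumed to mean 100,000 bytes. Stopping once the budget is exhausted is one possible policy, not necessarily the one the tool would adopt.

```python
import random

MIN_EVENT_BYTES = 200    # assumed lower bound, for illustration only
MAX_EVENT_BYTES = 1800   # assumed upper bound, for illustration only


def render_event(rng: random.Random) -> bytes:
    """Hypothetical templating function: returns one event of random size."""
    return b"x" * rng.randint(MIN_EVENT_BYTES, MAX_EVENT_BYTES)


def generate_corpus(requested_bytes: int, rng: random.Random) -> list[bytes]:
    """Generate events until the remaining byte budget is used up."""
    remaining = requested_bytes
    events = []
    while remaining > 0:
        event = render_event(rng)
        events.append(event)
        remaining -= len(event)  # subtract the real size of each generated event
    return events


rng = random.Random(42)
requested_bytes = 100_000  # the 100kB reference size, assumed to mean 100,000 bytes
corpus = generate_corpus(requested_bytes, rng)
corpus_bytes = sum(len(event) for event in corpus)
print(f"{len(corpus)} events, {corpus_bytes} bytes")
```

With this stopping rule the corpus overshoots the requested size by less than one event. Checking the size before appending would instead undershoot by less than one event. Either way the number of events is only known after generation finishes.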
Currently we calculate the total number of events by dividing the requested size by the size of a single event that we generate as a sample.
E.g.: the requested size is 100kB. We run the templating function once and, say, it produces an event of 1kB. We then set the number of events to generate to 100 (100kB / 1kB).
This method is very fast but not precise, especially when event size varies greatly depending on random values: a single unrepresentative sample skews the division result.
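To make the problem concrete, here is a rough Python sketch of the single-sample calculation. It is not the tool's actual code: `render_event` is a hypothetical stand-in for the templating function, the 200 to 1800 byte range is invented to simulate variance, and 100kB is assumed to mean 100,000 bytes.

```python
import random

MIN_EVENT_BYTES = 200    # assumed lower bound, for illustration only
MAX_EVENT_BYTES = 1800   # assumed upper bound, for illustration only


def render_event(rng: random.Random) -> bytes:
    """Hypothetical templating function: returns one event of random size."""
    return b"x" * rng.randint(MIN_EVENT_BYTES, MAX_EVENT_BYTES)


def estimate_from_single_sample(requested_bytes: int, rng: random.Random) -> int:
    """Divide the requested size by the size of one sample event."""
    sample_size = len(render_event(rng))
    return requested_bytes // sample_size


requested_bytes = 100_000  # the 100kB reference size, assumed to mean 100,000 bytes

# Repeat the estimate with different seeds to see how much it moves around.
single_estimates = [
    estimate_from_single_sample(requested_bytes, random.Random(seed))
    for seed in range(20)
]
print(f"min estimate: {min(single_estimates)}, max estimate: {max(single_estimates)}")
```

With the assumed size range, one sample can put the estimate anywhere between 55 and 500 events for the same 100kB request, which is the skew described above.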
There are different ways to change the current behaviour and improve the accuracy of the total-events forecast. Our goal is not to be perfectly precise, but to be precise enough to create relevant datasets.
This discussion is to present and discuss possible improvements to this calculation. If in presenting your idea you want to provide examples, please use 100kB as a reference size.