Ties and data in JSON #280
Adding a third digit is absolutely doable (just a few changes in the runner scripts); however, as we're moving into the milliseconds scale, additional factors could affect the results:

Please give your feedback; I may be overestimating the effects of printing and network operations, in which case changes to the runner scripts alone would be enough.
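For a rough sense of the overhead being discussed, the start/stop round trip could be timed with something like the sketch below (Python used purely for illustration; the endpoint, port, and helper names are made up and are not part of the actual runner scripts):

```python
import statistics
import time
import urllib.request

# Hypothetical runner endpoint; the real start/stop URLs live in the runner scripts.
URL = "http://127.0.0.1:9000/start"

def round_trip_ms():
    """Time one start/stop-style HTTP round trip in milliseconds."""
    start = time.perf_counter()
    urllib.request.urlopen(URL, timeout=1).read()
    return (time.perf_counter() - start) * 1000

samples = [round_trip_ms() for _ in range(100)]
print(f"median round-trip overhead: {statistics.median(samples):.3f} ms")
```

If that overhead turns out to be a small fraction of a millisecond, changing only the runner scripts may indeed be enough; if it is comparable to the timings themselves, the other mitigations become more relevant.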
I think increasing the data size will result in measuring more of the JSON parsing, and the other factors become less significant. So if that is the goal, the price is longer test times. At GB/s JSON parsing speeds, 100 MB isn't a lot. I noticed it uses max time; is worst case the desired measurement, or did I read that wrong?
The results are given as arithmetic means with standard deviations, and the original timing is measured between network requests to the runner (start/stop measuring). While the arithmetic mean may not be the best statistic to rely on, I think it's good enough for relative comparisons. I think increasing the sample JSON is acceptable. I had to decrease it at the time because my home server took forever to run the tests, but now I have newer hardware and could use a bigger fixture file.
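For context, this is roughly the kind of aggregation being described, sketched in Python (the sample values and the helper name are made up, not taken from the runner):

```python
import statistics

def fmt_mean_stdev(samples):
    """Render per-run timings the way the current table does: mean ± stddev."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return f"{mean:.2f} ± {stdev:.2f}"

# Hypothetical timings (seconds) collected between the start/stop requests:
print(fmt_mean_stdev([1.52, 1.49, 1.55, 1.50, 1.51]))  # "1.51 ± 0.02"
```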
Another quick one, regarding the issues and memory usage. Would adding another column for usage before parsing (mostly the JSON string) be useful? I don't think subtracting it is doable (the Haskell example is streaming), but knowing the change from before to after might help library users see whether the cost/impact on memory is worth it.
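As a rough illustration of the before/after idea (Python, POSIX-only because of the `resource` module; the fixture here is synthetic and `ru_maxrss` is a peak value, so this is not the benchmark's actual measurement code):

```python
import json
import resource

def rss_kb():
    """Peak resident set size; kilobytes on Linux, bytes on macOS."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Synthetic JSON text standing in for the fixture file.
text = "[" + ",".join('{"x": %d}' % i for i in range(100_000)) + "]"

before = rss_kb()        # usage with the JSON string already in memory
data = json.loads(text)  # the part the benchmark actually measures
after = rss_kb()

print(f"base: {before} KB, increase during parsing: {after - before} KB")
```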
First, let me note that I did some research (please see additional details below) and found out that the current math gives slightly misleading results. For example, in my tests I had the values (0.78, 0.78, ..., 0.78, 0.82), and that last fluctuation worsened the final results. Therefore, I've decided to switch to medians. Unfortunately, GitHub doesn't natively support rendering math formulas (hence no pleasant way to show quartiles), so I think I'm going to use the format MEDIAN<sub>MAD</sub> (e.g. 0.78<sub>0.01</sub>), where MAD is the median absolute deviation.

Second, regarding memory usage. Haskell isn't the only test with streaming parsing; some others have it too. I think it's worth showing the memory increase within the benchmark, but the table is already quite wide, so maybe it's better to change the format of the existing column, i.e. to use BASE_MEDIAN<sub>BASE_MAD</sub> + MEM_DIFF_MEDIAN<sub>MEM_DIFF_MAD</sub>, where BASE is the memory before the benchmark and MEM_DIFF is the memory increase during the benchmark. For example, the value 122.91 ± 05.96 would become something like 22.91<sub>3.94</sub> + 100.00<sub>2.02</sub>.

Back to the research I did:
It's all addressed in #281. In addition to the changes mentioned above, I've also changed the time measurements to use nanoseconds instead of floating-point seconds.
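A minimal sketch of the median/MAD aggregation and the compact subscript formatting described above (Python for illustration; the helper names are made up, and the actual change is in #281):

```python
import statistics

def median_mad(samples):
    """Median plus median absolute deviation; a single outlier barely moves either."""
    med = statistics.median(samples)
    mad = statistics.median(abs(x - med) for x in samples)
    return med, mad

def fmt(med, mad):
    """Render as MEDIAN<sub>MAD</sub> for the GitHub-flavoured results table."""
    return f"{med:.2f}<sub>{mad:.2f}</sub>"

samples = [0.78] * 9 + [0.82]     # the fluctuation example from above
print(fmt(*median_mad(samples)))  # "0.78<sub>0.00</sub>"
```

With the same input, the mean-based format would report 0.78 ± 0.01, i.e. the single 0.82 run inflates the deviation, which is the misleading effect described above.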
Would adding a third digit in the timing make sense for the JSON tests, where the results are coming pretty close now? Or adding more data, though that has the effect of taking longer and making it harder to compare with older results.
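If the "more data" route is taken, a larger fixture could be generated along these lines (a sketch; the field names, record count, and output path are illustrative, not the repository's actual generator):

```python
import json
import random

def make_fixture(path, records=1_000_000):
    """Write a larger JSON fixture; ~1M coordinate-like records is very roughly 100 MB."""
    coords = [
        {"x": random.random(), "y": random.random(), "z": random.random()}
        for _ in range(records)
    ]
    with open(path, "w") as f:
        json.dump({"coordinates": coords}, f)

make_fixture("/tmp/sample.json")
```

The trade-off mentioned above still applies: a bigger fixture dilutes the fixed overheads but makes runs slower and the results harder to compare with older ones.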