Benchmarking #872

Closed
jayvdb opened this issue Jul 26, 2017 · 2 comments

@jayvdb
Contributor

jayvdb commented Jul 26, 2017

I've found it non-trivial to do benchmarking using timeit, which isn't the most accurate approach but is likely good enough for decisions on this util. This is my current attempt, which I am certain is wrong, as I think I need to restore sys.stdout after each run:

setup="import csvkit.utilities.csvjson as csvjson; import os, sys; sys.stdout = open(os.devnull, 'w'); sys.argv = ['csvjson','--no-inference','--lon=longitude','--lat=latitude','--indent=2','--key=id','data.csv']"
statement="csvjson.launch_new_instance()"

python3 -m timeit -s "$setup" "$statement"

Would be good if we can make that a little easier.
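
One way to handle the stdout concern above is `contextlib.redirect_stdout`, which puts `sys.stdout` back when its block exits. The following is only a minimal sketch: `noisy_workload` and `time_quietly` are hypothetical names, and the workload is a stand-in rather than csvkit itself. It only helps with code that looks up `sys.stdout` at call time.

```python
import contextlib
import os
import sys
import timeit


def noisy_workload():
    """Hypothetical stand-in for an entry point that prints to stdout."""
    print("some output")


def time_quietly(func, number=100, repeat=3):
    """Time func with stdout sent to os.devnull, restoring stdout afterwards."""
    with open(os.devnull, "w") as devnull, contextlib.redirect_stdout(devnull):
        timings = timeit.repeat(func, number=number, repeat=repeat)
    # Each timing is the total for `number` calls; report the best per-call time.
    return min(timings) / number


if __name__ == "__main__":
    original = sys.stdout
    per_call = time_quietly(noisy_workload)
    assert sys.stdout is original  # stdout is back once the with-block exits
    print(f"{per_call * 1e3:.4f} msec per loop")
```

Calling timeit from Python like this also avoids quoting the whole setup as a shell string.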

Also I need to read the CSV in the setup, and avoid the JSON write, as those parts of the process are drowning out the other performance issues.

@jayvdb
Contributor Author

jayvdb commented Jul 26, 2017

I've got semi-reasonable stable-ish timing data out of timeit by manually disabling the dump_json, with:

$ filename='examples/test_geo.csv'
$ setup="filename='$filename';from csvkit.utilities.csvjson import CSVJSON; import os, sys, io; devnull = open(os.devnull, 'w'); data = open(filename).read(); sys.argv = ['csvjson']"
$ statement="x = CSVJSON(); x.output_file = devnull; x.args.no_inference = True; x.args.lon = 'longitude'; x.args.lat = 'latitude'; x.args.input_file = filename; x.input_file = io.StringIO(data); x.main()"

$ python3 -m timeit -s "$setup" "$statement"
100 loops, best of 3: 2.96 msec per loop

Ideally, separating reading and writing from the processing would make that a lot cleaner.
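
To illustrate what that separation could look like, here is a rough sketch using only the standard library. It is not csvkit's code: `read_rows`, `process`, `write`, and the sample data are all hypothetical, and the processing step is only a simplified imitation of a CSV-to-GeoJSON conversion. The point is that the input is parsed once up front and only the processing step is timed.

```python
import csv
import io
import json
import timeit

# Hypothetical sample data; stands in for a file such as examples/test_geo.csv.
CSV_TEXT = "id,latitude,longitude\n1,32.35,-95.3\n2,32.30,-95.2\n"


def read_rows(text):
    """Reading step: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def process(rows):
    """Processing step (illustrative only): build GeoJSON-like features."""
    return [
        {
            "type": "Feature",
            "id": row["id"],
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
        }
        for row in rows
    ]


def write(features, stream):
    """Writing step: serialize the features as JSON."""
    json.dump(features, stream, indent=2)


rows = read_rows(CSV_TEXT)  # done once, outside the timed statement
number = 1000
best = min(timeit.repeat(lambda: process(rows), number=number, repeat=3))
print(f"process only: {best / number * 1e6:.2f} usec per loop")
```

With the three steps split like this, each one can be timed on its own, so I/O cost no longer hides the cost of the conversion.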

@jpmckinney
Member

Thanks for performing the benchmarking for #867! Closing as I'm not sure there is an issue to resolve.
