-
Notifications
You must be signed in to change notification settings - Fork 604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
csvjson: Support types other than Point #867
Conversation
I'm not super happy about the |
Re: What if you make them class properties ( |
csvkit/utilities/csvjson.py
Outdated
|
||
for i, c in enumerate(row): | ||
if i == lat_column: | ||
if not c: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue is that if c
is None
, then float(c)
raises TypeError
instead of ValueError
. Can you make a separate PR that fixes the issue of c
being None
, by catching TypeError
? It's separate from the rest of this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, correct.
However, this only occurs when the type is not Point
, and fixing this exception only causes the tool to generate a .geojson which is broken in other ways. Fixing only the exception doesnt make the tool better; in fact, I would say that the exception is a good thing, as it prevents people from believing that their input was correctly parsed.
But, happy to split to separate it to a separate commit - other improvements needed anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nevermind, I confused this with a different comment you made about lat/lon being empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if a '0' is read as an integer 0
instead of the string '0'
, then we should check if c is None
instead.
csvkit/utilities/csvjson.py
Outdated
update_boundary_lon(coords[0]) | ||
update_boundary_lat(coords[1]) | ||
else: | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain in what scenario the coords from the geojson will fail to iterate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didnt find any. Removing.
(I did a quick check of types that I dont use using https://github.com/node-geojson/geojson-flatten/tree/master/test/fixture , resulting in #871 ). If there are still some types which might cause an exception, we need exceptions here to catch and fix them.
My main reason to avoid this is efficiency. Any non 'local' name results in a lot of extra name resolution work for large files, and this computation of the bbox is IMO not always desirable (see #868) so I am reticent to make it also inefficient . |
Do you have a real speed comparison? |
Happy to provide it this evening (WIB) when I get back into this project. |
On real data for the bbox computation, using cProfile, Sorting by total time instead, it is the 10th most significant component. Ordered by number of calls, number 11-13 are:
That is a lot of calls! I do not have analysis of switching from local vars to instance variables, as that is going to need a more involved benchmark/profile rig to avoid the CSV read/JSON write phases. (c.f. #872 ) But as seen by the number of calls, this is not trivial due to the recursive nature, as using instance variables is going to require at least 2 * 2 * 285911 extra |
Parses and removes 'geojson' from the properties, putting the coordinate into the GeoJSON feature geometry. Fixes wireservice#866
1 similar comment
Do you still want me to get rid of the |
Considering reading/writing take up so much time (and are absolutely necessary to do anything with csvkit), I think it's unnecessary optimization to make that loop more efficient, at the cost of potentially hard-to-fix bugs in the future due to the use of globals. So, removing globals would be best. |
ok |
Parses and removes 'geojson' from the properties,
putting the coordinate into the GeoJSON feature geometry.
Fixes #866