Round-trip GeoJSON #868

Closed
jayvdb opened this issue Jul 24, 2017 · 5 comments

@jayvdb
Contributor

jayvdb commented Jul 24, 2017

I would like to be able to round-trip GeoJSON through csvkit, so that small operations can be done using csvkit in the middle, without resulting in strange diffs. #867 is part of this effort.

That is, the final diff in the following sequence should produce minimal output.

$ wget https://raw.githubusercontent.com/lyzidiamond/learn-geojson/master/geojson/pdxplaces.geojson
$ in2csv -f geojson pdxplaces.geojson > pdxplaces.csv
$ csvjson --lon longitude --lat latitude --indent=2 pdxplaces.csv > pdxplaces.csv.geojson
$ diff -u pdxplaces.geojson pdxplaces.csv.geojson
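
A textual diff like the one above is sensitive to key order and whitespace. As a complement, here is a minimal Python sketch that compares the two documents after parsing, so that only real value changes count. The helper name `same_geojson` and the sample strings are made up for illustration; nothing here is part of csvkit.

```python
import json


def same_geojson(text_a: str, text_b: str) -> bool:
    """Return True if two JSON documents are equal once parsed.

    Python dicts compare equal regardless of key order, so this ignores
    member ordering and whitespace, but not value types or array order.
    """
    return json.loads(text_a) == json.loads(text_b)


# Illustrative inputs only (not taken from pdxplaces.geojson).
original = '{"type": "Feature", "properties": {"Reason": 2}, "geometry": null}'
reordered = '{"geometry": null, "type": "Feature", "properties": {"Reason": 2}}'
retyped = '{"type": "Feature", "properties": {"Reason": "2"}, "geometry": null}'

print(same_geojson(original, reordered))  # only the key order differs
print(same_geojson(original, retyped))    # the number 2 became the string "2"
```

This does not replace the goal of a clean `diff -u`; it only helps separate ordering noise from changes in the data itself.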

Some of the diff results which need to be controllable using command line args:

  1. Sequence of keys in a feature, i.e. whether geometry appears before properties or the reverse. Different tools seem to make different choices for this ordering; ideally in2csv could annotate its output with the ordering (assuming it is consistent throughout the input) so that csvjson can reuse it.
  2. Do not emit empty properties. Currently they are emitted, and mostly they shouldn't be. (csvjson: Do not emit empty properties #869)
  3. Order the properties by key. in2csv adds a column when it sees a new property, which doesn't work well if many properties are absent from the first Feature but appear in later ones. Sorting could be a feature of in2csv or csvjson.
  4. Generation of the bbox. It should be possible to disable computing and adding it, and it is non-trivial to compute for complex GeoJSON (cf. csvjson: Support types other than Point #867).
  5. Other metadata (in2csv: GeoJSON metadata discarded #870)
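
Items 1 to 3 above can be sketched as a single normalization pass over each feature. This is a hypothetical helper, not csvkit code: the name `normalize_feature`, the `member_order` parameter, and the choice to treat `None` and `""` as "empty" are all assumptions made for illustration.

```python
def normalize_feature(feature, member_order=("type", "id", "properties", "geometry")):
    """Return a copy of a GeoJSON feature with a fixed member order,
    properties sorted by key, and empty properties dropped.

    Hypothetical helper for illustration; not part of csvkit.
    """
    props = feature.get("properties") or {}
    # Drop properties whose value is None or the empty string, then sort by key.
    # Values such as 0 or False are kept.
    cleaned = {k: props[k] for k in sorted(props) if props[k] not in (None, "")}

    normalized = {}
    for name in member_order:
        if name == "properties":
            # Note: a null or missing "properties" member becomes {} here.
            normalized[name] = cleaned
        elif name in feature:
            normalized[name] = feature[name]
    # Keep any members not named in member_order (e.g. "bbox") at the end.
    for name, value in feature.items():
        if name not in normalized:
            normalized[name] = value
    return normalized


feature = {
    "geometry": {"type": "Point", "coordinates": [-122.6, 45.5]},
    "properties": {"Reason": "Mystic zombie area", "my polygon": "", "Name": "Square 54"},
    "type": "Feature",
}
result = normalize_feature(feature)
print(list(result))                # member order follows member_order
print(list(result["properties"]))  # sorted keys, empty "my polygon" dropped
```

Passing a different `member_order` tuple would cover tools that put geometry before properties.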

After simplistic solutions for those three, the diff looks like:

diff -u pdxplaces.geojson pdxplaces.csv.geojson
--- pdxplaces.geojson	2017-07-24 10:22:24.234446535 +0700
+++ pdxplaces.csv.geojson	2017-07-24 12:58:46.408943761 +0700
@@ -106,7 +106,7 @@
       "properties": {
         "Name": "place 2",
         "Contributor": "geografa",
-        "Reason": 2
+        "Reason": "2"
       },
       "geometry": {
         "type": "Point",
@@ -164,8 +164,7 @@
     {
       "type": "Feature",
       "properties": {
-        "my polygon": "it's here",
-        "Name": ""
+        "my polygon": "it's here"
       },
       "geometry": {
         "type": "Polygon",
@@ -251,8 +250,8 @@
       "type": "Feature",
       "properties": {
         "Name": "The Commons Brewery",
-        "Reason": "Farmhouse ales, duh.",
-        "Contributor": "Dillon Mahmoudi"
+        "Contributor": "Dillon Mahmoudi",
+        "Reason": "Farmhouse ales, duh."
       },
       "geometry": {
         "type": "Point",
@@ -267,7 +266,7 @@
       "properties": {
         "Name": "Kenilworth Coffeehouse",
         "Utility": "Excellent biscuits",
-        "Coffee": "Yes"
+        "Coffee": true
       },
       "geometry": {
         "type": "Point",
@@ -327,8 +326,7 @@
       "properties": {
         "Name": "Square 54",
         "Contributor": "Henrik",
-        "Reason": "Mystic zombie area",
-        "my polygon": ""
+        "Reason": "Mystic zombie area"
       },
       "geometry": {
         "type": "Polygon",

The change to "Coffee" is concerning (and maybe there is a command line arg which would prevent that), but the rest of those changes are IMO a good "linted" output.

@jayvdb
Contributor Author

jayvdb commented Jul 24, 2017

After a bit of fiddling, I now have a diff for the round trip of the 20 MB data file I am interested in. There are only four distinct sets of changes, all of them problematic.

@@ -598962,7 +598965,7 @@
       "id": "way/93984451",
       "properties": {
         "@id": "way/93984451",
-        "mooring": "yes",
+        "mooring": true,
         "natural": "coastline",
         "source": "Yahoo hires"
       },
@@ -1167190,7 +1167193,6 @@
       "properties": {
         "@id": "node/259453536",
         "amenity": "swimming_pool",
-        "covered": "no",
         "horse": "destination",
         "name": "Pantai Sigandu",
         "sport": "swimming"
@@ -1167280,7 +1167282,7 @@
         "@id": "node/1551798713",
         "name": "Wind Jammer Beach",
         "natural": "beach",
-        "wheelchair": "yes",
+        "wheelchair": true,
         "wheelchair:description": "Kein Behinderten-WC"
       },
       "geometry": {
@@ -1167430,7 +1167430,7 @@
       "properties": {
         "@id": "node/3902525069",
         "addr:city": "Nusa Penida",
-        "addr:housenumber": "1",
+        "addr:housenumber": true,
         "addr:postcode": "80771",
         "addr:street": "Desa Jl. Batu Nunggul",
         "name": "Nusa Garden Bungalow",

@jpmckinney
Member

What if you run it with --no-inference?

@jayvdb
Contributor Author

jayvdb commented Jul 26, 2017

csvjson --no-inference does the trick!

(ftr, in2csv --no-inference had no effect)
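
The "yes" to true and "1" to true changes above are consistent with per-column type inference: if every non-empty value in a sparse column happens to look boolean, the whole column is cast. The following is a toy model of that behaviour, not the actual csvkit or agate code; the value sets and the function name `infer_boolean_column` are assumptions for illustration, and the real library's lists may differ.

```python
# Assumed value sets for illustration; the real library's lists may differ.
TRUE_VALUES = {"yes", "y", "true", "t", "1"}
FALSE_VALUES = {"no", "n", "false", "f", "0"}


def infer_boolean_column(values):
    """Cast a column of strings to booleans if every non-empty value
    looks boolean; otherwise return the values unchanged.

    Empty strings become None when the column is cast.
    """
    non_empty = [v for v in values if v != ""]
    looks_boolean = bool(non_empty) and all(
        v.lower() in TRUE_VALUES | FALSE_VALUES for v in non_empty
    )
    if not looks_boolean:
        return list(values)
    return [None if v == "" else v.lower() in TRUE_VALUES for v in values]


# A sparse column with a single house number "1" is cast to boolean...
print(infer_boolean_column(["", "1", ""]))
# ...while a column that also holds "12a" stays as text.
print(infer_boolean_column(["1", "12a", ""]))
```

Disabling inference keeps every value as the string that was read, which is why it removes this class of diff.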

@jpmckinney
Member

Cool, leaving open as a meta-issue for the other issues.

@jpmckinney
Member

I added a --no-bbox option. #870 is still open. For the reordering of properties, I don't think there's much that can be done. The original GeoJSON can have different ordering of keys/properties in different features, and there's no reasonable way to preserve that information.
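
For context on why the bbox is non-trivial beyond Point features, here is a minimal sketch that walks arbitrarily nested coordinate arrays. It is not csvjson's implementation; the function names are made up, and it assumes every geometry has a "coordinates" member (so no GeometryCollection), at least one position overall, and no antimeridian crossing.

```python
def iter_positions(coords):
    """Yield (lon, lat) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [lon, lat] or [lon, lat, elevation].
        yield coords[0], coords[1]
    else:
        for child in coords:
            yield from iter_positions(child)


def bbox(geometries):
    """Return [min_lon, min_lat, max_lon, max_lat] for a list of geometry dicts.

    Sketch only: assumes each geometry has "coordinates" and that at
    least one position exists.
    """
    lons, lats = [], []
    for geometry in geometries:
        for lon, lat in iter_positions(geometry["coordinates"]):
            lons.append(lon)
            lats.append(lat)
    return [min(lons), min(lats), max(lons), max(lats)]


point = {"type": "Point", "coordinates": [-122.65, 45.51]}
polygon = {
    "type": "Polygon",
    "coordinates": [[[-122.7, 45.5], [-122.6, 45.5], [-122.6, 45.6], [-122.7, 45.5]]],
}
print(bbox([point, polygon]))
```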
