Skip to content

Commit

Permalink
Merge #199
Browse files Browse the repository at this point in the history
199: GeoJSON to/from custom struct using serde r=urschrei,rmanoka a=michaelkirk

- [x] I agree to follow the project's [code of conduct](https://github.com/georust/geo/blob/master/CODE_OF_CONDUCT.md).
- [x] I added an entry to `CHANGES.md` if knowledge of this change could be valuable to users.
---

This is an attempt to improve the ergonomics of parsing GeoJson using serde (FIXES #184) . 

~~This PR is a draft because there are a lot of error paths related to invalid parsing that I'd like to add tests for, but I first wanted to check in on overall direction of the API. What do people think?~~ I think this is ready for review!

# Examples

Given some geojson like this:
```rust
let feature_collection_string = r#"{
    "type": "FeatureCollection",
    "features": [
        {
           "type": "Feature",
           "geometry": {
             "type": "Point",
             "coordinates": [125.6, 10.1]
           },
           "properties": {
             "name": "Dinagat Islands",
             "age": 123
           }
        },
        {
           "type": "Feature",
           "geometry": {
             "type": "Point",
             "coordinates": [2.3, 4.5]
           },
           "properties": {
             "name": "Neverland",
             "age": 456
           }
         }
   ]
}"#
.as_bytes();

let io_reader = std::io::BufReader::new(feature_collection_string);
```

## Before 

### Deserialization

You used to parse it like this:
```
use geojson:: FeatureIterator;
let feature_reader = FeatureIterator::new(io_reader);
for feature in feature_reader.features() {
    let feature = feature.expect("valid geojson feature");

    let name = feature.property("name").unwrap().as_str().unwrap();
    let age = feature.property("age").unwrap().as_u64().unwrap();
    let geometry = feature.geometry.value.try_into().unwrap();

    if name == "Dinagat Islands" {
        assert_eq!(123, age);
        assert_matches!(geometry, geo_types::Point::new(125.6, 10.1).into());
    } else if name == "Neverland" {
        assert_eq!(456, age);
        assert_matches!(geometry, geo_types::Point::new(2.3, 4.5).into());
    } else {
        panic!("unexpected name: {}", name);
    }
}
```

### Serialization

Then, to write it back to geojson, you'd have to either do all your processing strictly with the geojson types, or somehow convert your entities from and back to one of the GeoJson entities:

Something like:
```
// The implementation of this method is potentially a little messy / boilerplatey
let feature_collection: geojson::FeatureCollection = some_custom_method_to_convert_to_geojson(&my_structs);

// This part is easy enough though
serde_json::to_string(&geojson);
```

## After

But now you also have the option of parsing it into your own declarative struct using serde like this:

### Declaration
```
use geojson::{ser::serialize_geometry, de::deserialize_geometry};
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct MyStruct {
    // You can parse directly to geo_types via these helpers, otherwise this field will need to be a `geojson::Geometry`
    #[serde(serialize_with = "serialize_geometry", deserialize_with = "deserialize_geometry")]
    geometry: geo_types::Point<f64>,
    name: String,
    age: u64,
}
```

### Deserialization
```
for feature in geojson::de::deserialize_feature_collection::<MyStruct>(io_reader).unwrap() {
    let my_struct = feature.expect("valid geojson feature");

    if my_struct.name == "Dinagat Islands" {
        assert_eq!(my_struct.age, 123);
        assert_eq!(my_struct.geometry, geo_types::Point::new(125.6, 10.1));
    } else if my_struct.name == "Neverland" {
        assert_eq!(my_struct.age, 456);
        assert_eq!(my_struct.geometry, geo_types::Point::new(2.3, 4.5));
    } else {
        panic!("unexpected name: {}", my_struct.name);
    }
}
```

### Serialization

```
let my_structs: Vec<MyStruct> = get_my_structs();
geojson::ser::to_feature_collection_writer(writer, &my_structs).expect("valid serialization");
```


## Caveats

### Performance

Performance currently isn't great. There's a couple of things which seem ridiculous in the code that I've marked with `PERF:` that I don't have an immediate solution for. This is my first time really diving into the internals of serde and it's kind of a lot! My hope is that performance improvements would be possible with no or little changes to the API.


Some specific numbers (from some admittedly crude benchmarks):

**Old Deserialization:**
```
FeatureReader::features (countries.geojson)                                                                                                                                  
                        time:   [5.8497 ms 5.8607 ms 5.8728 ms]                                                                                                              
```

**New Deserialization:**
```                                                                                                                                                                             
FeatureReader::deserialize (countries.geojson)                                                                                                                               
                        time:   [7.1702 ms 7.1865 ms 7.2035 ms]                                                                                                              
```

**Old serialization:**
```
serialize geojson::FeatureCollection struct (countries.geojson)                                                                             
                        time:   [3.1471 ms 3.1552 ms 3.1637 ms]
```

**New serialization:**
```
serialize custom struct (countries.geojson)                                                                             
                        time:   [3.8076 ms 3.8144 ms 3.8219 ms]
```

So the new "ergonomic" serialization/deserialization takes about 1.2x the time as the old way. Though it's actually probably a bit better than that because with the new form you have all your data ready to use. With the old way, you still need to go through this dance before you can start your analysis:
```
let name = feature.property("name").unwrap().as_str().unwrap();
let age = feature.property("age").unwrap().as_u64().unwrap();
let geometry = feature.geometry.value.try_into().unwrap();
```

Anyway, I think this kind of speed difference is well worth the improved usability for my use cases.

### Foreign Members

This doesn't support anything besides `geometry` and `properties` - e.g. foreign members are dropped. I'm hopeful that this is useful even with that limitation, but if not, maybe we can think of a way to accommodate most people's needs.

Co-authored-by: Michael Kirk <[email protected]>
  • Loading branch information
bors[bot] and michaelkirk authored Aug 29, 2022
2 parents d58a05f + b936683 commit 956aa80
Show file tree
Hide file tree
Showing 16 changed files with 1,877 additions and 56 deletions.
28 changes: 26 additions & 2 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,36 @@

* Fix: FeatureIterator errors when reading "features" field before "type" field.
* <https://github.com/georust/geojson/pull/200>
* Added `geojson::{ser, de}` helpers to convert your custom struct to and from GeoJSON.
* For external geometry types like geo-types, use the `serialize_geometry`/`deserialize_geometry` helpers.
* Example:
```
#[derive(Serialize, Deserialize)]
struct MyStruct {
#[serde(serialize_with = "serialize_geometry", deserialize_with = "deserialize_geometry")]
geometry: geo_types::Point<f64>,
name: String,
age: u64,
}
// read your input
let my_structs: Vec<MyStruct> = geojson::de::deserialize_feature_collection(geojson_reader).unwrap();
// do some processing
process(&mut my_structs);
// write back your results
geojson::ser::to_feature_collection_string(&my_structs).unwrap();
```
* PR: <https://github.com/georust/geojson/pull/199>
* Added IntoIter implementation for FeatureCollection.
* <https://github.com/georust/geojson/pull/196>
* Add `geojson::Result<T>`
* Added `geojson::Result<T>`.
* <https://github.com/georust/geojson/pull/198>
* BREAKING: Change the Result type of FeatureIterator from io::Result to crate::Result
* <https://github.com/georust/geojson/pull/199>
* Add `TryFrom<&geometry::Value>` for geo_type variants.
* <https://github.com/georust/geojson/pull/202>
* <https://github.com/georust/geojson/pull/202>
* Changed the format of the error produced when converting a geometry to an incompatible type - e.g. a LineString into a Point.
* <https://github.com/georust/geojson/pull/203>
Expand Down
10 changes: 7 additions & 3 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,11 @@ edition = "2018"
default = ["geo-types"]

[dependencies]
serde = "~1.0"
serde = { version="~1.0", features = ["derive"] }
serde_json = "~1.0"
serde_derive = "~1.0"
geo-types = { version = "0.7", optional = true }
geo-types = { version = "0.7", features = ["serde"], optional = true }
thiserror = "1.0.20"
log = "0.4.17"

[dev-dependencies]
num-traits = "0.2"
Expand All @@ -28,6 +28,10 @@ criterion = "0.3"
name = "parse"
harness = false

[[bench]]
name = "serialize"
harness = false

[[bench]]
name = "to_geo_types"
harness = false
Expand Down
69 changes: 60 additions & 9 deletions benches/parse.rs
Original file line number Diff line number Diff line change
@@ -1,11 +1,14 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use geojson::de::deserialize_geometry;
use geojson::GeoJson;

use criterion::{black_box, criterion_group, criterion_main, Criterion};

use std::io::BufReader;

fn parse_feature_collection_benchmark(c: &mut Criterion) {
c.bench_function("parse (countries.geojson)", |b| {
let geojson_str = include_str!("../tests/fixtures/countries.geojson");
let geojson_str = include_str!("../tests/fixtures/countries.geojson");

c.bench_function("parse (countries.geojson)", |b| {
b.iter(|| {
let _ = black_box({
match geojson_str.parse::<geojson::GeoJson>() {
Expand All @@ -18,21 +21,69 @@ fn parse_feature_collection_benchmark(c: &mut Criterion) {
})
});

c.bench_function("FeatureIter (countries.geojson)", |b| {
let geojson_str = include_str!("../tests/fixtures/countries.geojson");
c.bench_function("FeatureReader::features (countries.geojson)", |b| {
b.iter(|| {
let feature_reader =
geojson::FeatureReader::from_reader(BufReader::new(geojson_str.as_bytes()));
let _ = black_box({
let mut count = 0;
for feature in feature_reader.features() {
let _ = feature.unwrap();
count += 1;
}
assert_eq!(count, 180);
});
});
});

c.bench_function("FeatureReader::deserialize (countries.geojson)", |b| {
b.iter(|| {
let feature_iter =
geojson::FeatureIterator::new(BufReader::new(geojson_str.as_bytes()));
#[allow(unused)]
#[derive(serde::Deserialize)]
struct Country {
geometry: geojson::Geometry,
name: String,
}
let feature_reader =
geojson::FeatureReader::from_reader(BufReader::new(geojson_str.as_bytes()));

let _ = black_box({
let mut count = 0;
for _ in feature_iter {
for feature in feature_reader.deserialize::<Country>().unwrap() {
let _ = feature.unwrap();
count += 1;
}
assert_eq!(count, 184);
assert_eq!(count, 180);
});
});
});

#[cfg(feature="geo-types")]
c.bench_function(
"FeatureReader::deserialize to geo-types (countries.geojson)",
|b| {
b.iter(|| {
#[allow(unused)]
#[derive(serde::Deserialize)]
struct Country {
#[serde(deserialize_with = "deserialize_geometry")]
geometry: geo_types::Geometry,
name: String,
}
let feature_reader =
geojson::FeatureReader::from_reader(BufReader::new(geojson_str.as_bytes()));

let _ = black_box({
let mut count = 0;
for feature in feature_reader.deserialize::<Country>().unwrap() {
let _ = feature.unwrap();
count += 1;
}
assert_eq!(count, 180);
});
});
},
);
}

fn parse_geometry_collection_benchmark(c: &mut Criterion) {
Expand Down
78 changes: 78 additions & 0 deletions benches/serialize.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use geojson::{de::deserialize_geometry, ser::serialize_geometry};

fn serialize_feature_collection_benchmark(c: &mut Criterion) {
let geojson_str = include_str!("../tests/fixtures/countries.geojson");

c.bench_function(
"serialize geojson::FeatureCollection struct (countries.geojson)",
|b| {
let geojson = geojson_str.parse::<geojson::GeoJson>().unwrap();

b.iter(|| {
black_box({
let geojson_string = serde_json::to_string(&geojson).unwrap();
// Sanity check that we serialized a long string of some kind.
assert_eq!(geojson_string.len(), 256890);
});
});
},
);

c.bench_function("serialize custom struct (countries.geojson)", |b| {
#[derive(serde::Serialize, serde::Deserialize)]
struct Country {
geometry: geojson::Geometry,
name: String,
}
let features =
geojson::de::deserialize_feature_collection_str_to_vec::<Country>(geojson_str).unwrap();
assert_eq!(features.len(), 180);

b.iter(|| {
black_box({
let geojson_string = geojson::ser::to_feature_collection_string(&features).unwrap();
// Sanity check that we serialized a long string of some kind.
//
// Note this is slightly shorter than the GeoJson round-trip above because we drop
// some fields, like foreign members
assert_eq!(geojson_string.len(), 254908);
});
});
});

#[cfg(feature="geo-types")]
c.bench_function(
"serialize custom struct to geo-types (countries.geojson)",
|b| {
#[derive(serde::Serialize, serde::Deserialize)]
struct Country {
#[serde(
serialize_with = "serialize_geometry",
deserialize_with = "deserialize_geometry"
)]
geometry: geo_types::Geometry,
name: String,
}
let features =
geojson::de::deserialize_feature_collection_str_to_vec::<Country>(geojson_str)
.unwrap();
assert_eq!(features.len(), 180);

b.iter(|| {
black_box({
let geojson_string =
geojson::ser::to_feature_collection_string(&features).unwrap();
// Sanity check that we serialized a long string of some kind.
//
// Note this is slightly shorter than the GeoJson round-trip above because we drop
// some fields, like foreign members
assert_eq!(geojson_string.len(), 254908);
});
});
},
);
}

criterion_group!(benches, serialize_feature_collection_benchmark);
criterion_main!(benches);
1 change: 1 addition & 0 deletions benches/to_geo_types.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ fn benchmark_group(c: &mut Criterion) {
let geojson_str = include_str!("../tests/fixtures/countries.geojson");
let geojson = geojson_str.parse::<geojson::GeoJson>().unwrap();

#[cfg(feature="geo-types")]
c.bench_function("quick_collection", move |b| {
b.iter(|| {
let _: Result<geo_types::GeometryCollection<f64>, _> =
Expand Down
28 changes: 28 additions & 0 deletions examples/deserialize.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Country {
// see the geo_types example if you want to store
// geotypes in your struct
geometry: geojson::Geometry,
name: String,
}

use std::error::Error;
use std::fs::File;
use std::io::{BufReader, BufWriter};

fn main() -> Result<(), Box<dyn Error>> {
let file_reader = BufReader::new(File::open("tests/fixtures/countries.geojson")?);

// Create a Vec of Country structs from the GeoJSON
let countries: Vec<Country> =
geojson::de::deserialize_feature_collection_to_vec::<Country>(file_reader)?;
assert_eq!(countries.len(), 180);

// Write the structs back to GeoJSON
let file_writer = BufWriter::new(File::create("example-output-countries.geojson")?);
geojson::ser::to_feature_collection_writer(file_writer, &countries)?;

Ok(())
}
37 changes: 37 additions & 0 deletions examples/deserialize_to_geo_types.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
use geojson::{de::deserialize_geometry, ser::serialize_geometry};

use serde::{Deserialize, Serialize};
use std::error::Error;
use std::fs::File;
use std::io::{BufReader, BufWriter};

#[cfg(not(feature="geo-types"))]
fn main() -> Result<(), Box<dyn Error>> {
panic!("this example requires geo-types")
}

#[cfg(feature="geo-types")]
fn main() -> Result<(), Box<dyn Error>> {
#[derive(Serialize, Deserialize)]
struct Country {
#[serde(
serialize_with = "serialize_geometry",
deserialize_with = "deserialize_geometry"
)]
geometry: geo_types::Geometry,
name: String,
}

let file_reader = BufReader::new(File::open("tests/fixtures/countries.geojson")?);

// Create a Vec of Country structs from the GeoJSON
let countries: Vec<Country> =
geojson::de::deserialize_feature_collection_to_vec::<Country>(file_reader)?;
assert_eq!(countries.len(), 180);

// Write the structs back to GeoJSON
let file_writer = BufWriter::new(File::create("example-output-countries.geojson")?);
geojson::ser::to_feature_collection_writer(file_writer, &countries)?;

Ok(())
}
18 changes: 18 additions & 0 deletions examples/deserialize_to_geojson_types.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
use std::error::Error;
use std::fs::File;
use std::io::{BufReader, BufWriter};

use geojson::FeatureCollection;

fn main() -> Result<(), Box<dyn Error>> {
let file_reader = BufReader::new(File::open("tests/fixtures/countries.geojson")?);

let countries: FeatureCollection = serde_json::from_reader(file_reader)?;
assert_eq!(countries.features.len(), 180);

// Write the structs back to GeoJSON
let file_writer = BufWriter::new(File::create("example-output-countries.geojson")?);
serde_json::to_writer(file_writer, &countries)?;

Ok(())
}
Loading

0 comments on commit 956aa80

Please sign in to comment.