Rewrite `rapidml` from scratch #158

mpadge · 2018-11-25T21:10:25Z

I now know a lot more than when this package started, and can see that the rapidxml header could fairly easily be rewritten from scratch as a custom OSM XML parser that stores the data during the initial read. Interestingly, and excitingly for @mdsumner's silicate work, this direct store-on-read procedure is only really possible with, and because of, silicate. The OSM structure is in essence fully silicate-compliant, and can be stored directly, line for line, as read.

This should ultimately enable the entire package to be rewritten to simply dump directly to SC format, and then use silicate to convert outputs to other formats (plus some additional fiddling to insert the "hidden" but necessary row names containing OSM IDs). Mike, I've done some preliminary comparisons of direct SC storage, and for the test data set (a chunk of about 1/3 of Melbourne streets), the current 15-16s reduces to about 0.4s. So we're looking at at least a tenfold boost in speed, which is well worth pursuing.
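
As a rough illustration of why OSM data maps so directly onto flat tables, here is a minimal R sketch using xml2. It is not the C++ store-on-read parser discussed here, and it does not reproduce the actual `osmdata_sc` output: the tiny OSM document is made up, and the table and column names (`vertex`, `way_refs`, `way_tags`, `vertex_`, `x_`, `y_`) are assumptions loosely modelled on silicate conventions. Each `<node>`, `<nd>` and `<tag>` element simply becomes one row.

```r
library(xml2)

# Tiny hand-written OSM document (hypothetical data, for illustration only)
osm_txt <- '<osm version="0.6">
  <node id="1" lat="-37.81" lon="144.96"/>
  <node id="2" lat="-37.82" lon="144.97"/>
  <node id="3" lat="-37.83" lon="144.98"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>'

doc <- read_xml(osm_txt)
nodes <- xml_find_all(doc, "//node")
ways <- xml_find_all(doc, "//way")

# One row per <node>: the OSM ID plus coordinates (column names are assumed)
vertex <- data.frame(
  vertex_ = xml_attr(nodes, "id"),
  x_ = as.numeric(xml_attr(nodes, "lon")),
  y_ = as.numeric(xml_attr(nodes, "lat")),
  stringsAsFactors = FALSE
)

# One row per <nd> reference, in document order, keyed by the parent way ID
way_refs <- do.call(rbind, lapply(seq_along(ways), function(i) {
  refs <- xml_attr(xml_find_all(ways[[i]], "./nd"), "ref")
  data.frame(
    way_id = rep(xml_attr(ways[[i]], "id"), length(refs)),
    node_ref = refs,
    stringsAsFactors = FALSE
  )
}))

# One row per <tag>: key-value pairs keyed by the parent way ID
way_tags <- do.call(rbind, lapply(seq_along(ways), function(i) {
  tags <- xml_find_all(ways[[i]], "./tag")
  data.frame(
    way_id = rep(xml_attr(ways[[i]], "id"), length(tags)),
    key = xml_attr(tags, "k"),
    value = xml_attr(tags, "v"),
    stringsAsFactors = FALSE
  )
}))
```

Because no geometry has to be assembled while reading, every element can be appended to its table as soon as it is encountered, which is the property a single-pass parser would rely on.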

Related to general osmdata_sc issue #148.

mpadge · 2018-11-29T12:29:05Z

So it's not really rapidxml that needs rewriting: the xml2::read_xml() function takes almost all of the XML pre-processing time, with rapidxml taking just a tiny fraction of it. Instead, the commit linked above completes an entire rewrite of the C++ side of osmdata_sc, with the following results tested on a very large OSM document (50 MB or so):

> rbenchmark::benchmark (
+                        x <- osmdata_sf (q, doc),
+                        x <- osmdata_sc (q, doc),
+                        replications = 10)
                     test replications elapsed relative user.self sys.self user.child sys.child
2 x <- osmdata_sc(q, doc)           10  17.070    1.000    17.017    0.030          0         0
1 x <- osmdata_sf(q, doc)           10  80.834    4.735    77.251    3.434          0         0

So osmdata_sc() is now around 5 times faster than osmdata_sf(), which leaves plenty of spare processing time for a conversion from sc to sf to still likely be more efficient than the current osmdata_sf().

mdsumner · 2018-11-29T12:44:15Z

wow ;)

mpadge mentioned this issue Nov 25, 2018

open-elevation #157

Closed

This was referenced Nov 27, 2018

Parallel #129

Open

osmdata_sc function #148

Closed

mpadge closed this as completed in 0fc6daa Nov 29, 2018

mpadge added a commit that referenced this issue Nov 29, 2018

clean out redundant code in src/osmdata-sc after #158

d00bc82
