Code Reorganization and Geodesic Routines #4

yeesian · 2015-08-31T00:06:28Z

Here's some collected thoughts after having worked on it over the weekend,

Code Organization

We should separate the low-level C-facing routines from the high-level user-facing ones.

lower-level

In the low-level C-facing routines (src/geodesic.jl and src/proj_capi.jl), that's where

pointer arithmetic should be handled
error messages should be dealt with
C-type immutables/structs (for the express purpose of interfacing with the PROJ.4 library, rather than for users) should be defined, and dealt with locally

Preferably the only function arguments that should be pointers are native objects from the proj.4 library. I've made an exception for one of the _transform methods, because it gets re-used alot in the other calls to transform (for reshaping between Array{T,2} and Vector{T}).

higher-level

The higher-level user-facing routines are organized into src/proj_types.jl and src/proj_functions.jl. This being a wrapper library, I'm biased towards

ordered arguments, rather than keyword arguments.
concretely-typed variables, rather than loosely-typed ones
surfacing decisions for the user (through function arguments/naming), rather than making decisions for them; convenience functions can be added at a later stage

This is most obviously felt in transform, transform2 and transform3 (both with and without the bang). It might feel like code-bloat, but the wider variety of function calls is actually useful for the end-user: most of the time, they would know whether or not the data they have is 2 or 3 dimensional. In such cases, they will prefer to save on the checking/branching by calling the most specific function, especially if they're doing the geodesic operations in a tight loop.

Support

We should also provide consistency in user expectation (if violated, it's always from my own wrongdoing), rather than simply documenting interfaces:

calls to functions with vector arguments return vector results, and calls with NxM arrays return NxM results.
Results should not end up mysteriously transposed/etc.
If a argument of a certain type is allowed in some functions, it should be allowed in all "similar"/reasonable functions

(This has implications in the "other thoughts section" when it comes to tightening the range of allowable argument types.)

I don't mind if someone else comes along, and submits a PR to support v0.3, but I personally don't have the energy to maintain that level of support, sorry! I have gone back to clean up all the unused functions (geodesic2geocentric, latlong_from_projection, etc) to scope it down into something I can manage. Also, I suggest we stick with Float64/EPSG/ESRI, and support more general user arguments (Int/Real/etc) only when we have the resources (and are in a good position) to. I personally think that time/effort is better spent on developing the rest of the missing functionality for working with geospatial data (e.g. OGR/etc) though.

Other thoughts

The semantics of Array{T,2} is not entirely clear-cut to me; I can imagine, for instance, that the columns (rather than the rows) might correspond to distinct points. Although it might feel more "general"(i.e. a vector can be "reshaped" into a 2-d matrix), it introduces a lot of degrees of freedom in how it might/should be interpreted. Moreover, it is weirder to support Arrays without also supporting Vectors, rather than vice versa
The length of the code is secondary to me for now; it can be shortened via parameterizing the functions, or meta-programming at a later stage
I have been rather wild and loose with my individual commits, since there was (and still is) a lot of ground to cover. I expect to change after we tag the first release, and hope it doesn't annoy you too much.
re: Version 0.1 todo #2 (comment), I work from a macbook, so I can help with building/testing for OS X, but I hope you can figure out how the build service works too, to increase the bus factor of this package.

to separate the following 1. user-facing convenience functions (in `src/proj_functions.jl`) 2. the low-level wrapper functions (in `src/proj_capi.jl`) 3. type definitions (in `src/proj_types.jl`) The module file (`src/Proj4.jl`) itself should be primarily for organizing imports/exports/versioning-cruft.

Vector{T} will return Vector{T}, and Array{T,2} will return Array{T,2}. (We might want to drop support for 2-dimensional arrays, and request that the user handle it themselves? They can always do a loop and call `vec()`) Tests passing on v0.4 now. Some attempt at trying to provide support for v0.3, but it has segfaults I don’t have the energy to trace.

and code clean-up, removing unused functions

yeesian · 2015-08-31T01:00:34Z

Alright, I'm done here (for now). Have at it!

c42f · 2015-08-31T01:46:29Z

src/proj_functions.jl

+is_geocent(proj::Projection) = _is_geocent(proj.rep)
+
+@doc "Return true if the projection is a geocentric coordinate system" ->
+is_identical(p1::Projection, p2::Projection) = _compare_datums(p1.rep, p2.rep)


I'm not an expert on this stuff, but my impression is that pj_compare_datum only compares a very specific part of the projection definition: the "datum" ie, the underlying latlong coordinate system.

Unfortunately proj doesn't seem to provide a way to fully compare projections - see the comment at the top of pj_transform: https://github.com/OSGeo/proj.4/blob/master/src/pj_transform.c#L77

Thus, I guess there's no neat way to implement is_identical.

oh sure, i should stick to the original name compare_datums then. Copied wrongly from https://github.com/OSGeo/proj.4/blob/5d2af3a89be0898458e4f0fe272affc36642d304/src/pj_transform.c#L448-L453

yeesian · 2015-08-31T01:58:59Z

For reference: https://github.com/jswhit/pyproj/blob/master/_proj.pyx

c42f · 2015-09-04T11:35:16Z

src/Proj4.jl

-    transform!(src, dest, @compat(map(Float64,position)), radians=radians)
+export Projection, # proj_types.jl
+       transform2, transform2!, transform3, transform3!, transform, transform!,  # proj_functions.jl
+       is_latlong, is_geocent, is_identical, spheroid_params,


is_identical -> compare_datums now that you changed it. Ditto for the tests.

Sorry about that!

c42f · 2015-09-04T13:47:14Z

Right, I've only just had the chance to get back to this. Some thoughts to start with:

The low level / high level split is something I had already started on in a minor way and I think it makes sense to move into separate files as you've done here.

higher-level

The higher-level user-facing routines are organized into src/proj_types.jl and src/proj_functions.jl. This being a wrapper library, I'm biased towards

ordered arguments, rather than keyword arguments.

I'm naturally biased toward readability, and I think keyword arguments are far better for "configuration" type stuff like the radians argument, where the usage pattern is very likely to involve a literal constant value in the calling code.

# Consider the positional argument version where a reader will need to look up
# the Pro4.jl API:
transform(srcCoords, destCoords, position, true)

# vs the entirely self-documenting alternative:
transform(srcCoords, destCoords, position, radians=true)

Now, I haven't measured the performance impact of doing this, and that may be a good counteragument. In principle it's zero, but that doesn't seem to currently be the case. I looked at the code_native output for a simple synthetic keyword example and it was a bit of a complexity disaster compared to the positional argument version.

(Example in question:

foo(x, s=false) = s ? x*x : x
bar(x) = foo(x, true)
code_native(bar, (Int,))  # super neat

foo2(x; s::Bool=false) = s ? x*x : x
bar2(x) = foo2(x, s=true)
code_native(bar2, (Int,)) # oh no what

)

concretely-typed variables, rather than loosely-typed ones

Concrete typing makes sense when it's required for passing reference to the underlying C API. Clearly there's no point defining a transform! with anything other than Cdouble arrays, since the underlying API truly doesn't support anything else. OTOH, what exactly does tight typing buy us in the case of value types? From a user point of view it seems the difference between just working and tediously having to cast types?

surfacing decisions for the user (through function arguments/naming), rather than making decisions for them; convenience functions can be added at a later stage

I agree in the cases where the decision may affect performance, and where
there's no clearly correct API choice.

This is most obviously felt in transform, transform2 and transform3 (both with and without the bang). It might feel like code-bloat, but the wider variety of function calls is actually useful for the end-user: most of the time, they would know whether or not the data they have is 2 or 3 dimensional. In such cases, they will prefer to save on the checking/branching by calling the most specific function, especially if they're doing the geodesic operations in a tight loop.

I don't really buy the argument that checking for dimensionality inside transform() - or other similarly simple operations - could be significant for performance. Even simple cartographic projections are likely to involve quite a bit of floating point math internally which should swamp a simple and predictable branch instruction. Lucikly, it should be easy to benchmark and actually find out :-)

c42f · 2015-09-04T13:55:28Z

Another note regarding the radians argument - this is entirely a convenience feature which I stole from pyproj. Under the premise of providing convenience functions at a later stage, it should just be removed entirely, and people will have to stick with radians.

Not that I'm necessarily arguing for removing it - I put in the API because I thought the pyproj people had made a good call about the need for the option, and the appropriate default, given that the world is basically stuck with degrees rather than following the one true way ;-)

yeesian · 2015-09-04T13:56:18Z

Luckily for keyword args, you can define a new function:

transform(srcCoords, destCoords, position; radians=false) = transform(srcCoords, destCoords, position, radians)

c42f · 2015-09-04T14:03:21Z

Good enough I guess, though a bit featury having so many versions of the same thing.

yeesian · 2015-09-04T14:05:07Z

OTOH, what exactly does tight typing buy us in the case of value types? From a user point of view it seems the difference between just working and tediously having to cast types?

As another user coming from python, I agree with you too, about how tedious it is, and it is definitely felt when writing tests. Some thoughts:

If it's for a casual end-user, playing around and exploring the code, it is mildly annoying. Most times though, people should be reading it from databases etc, in which case it should "just work"
If it's for another program, it could range from ("gosh, so annoying") to ("okay, good to know"). I don't have the bandwidth to benchmark, but it makes sense to perform all the operations for conversions in batch, before performing all the projection operations.
We can define higher-order functions that perform the conversions. I just elected to have not done it yet (the whole point of multi-dispatch was to have different behavior for different types; for wrapper libraries, not all types are created equal)
It would involve more work than I am willing to support in a single PR (c.f. 3) as, you have said it too, it feels feature-y

Which brings me to my central point: You could argue -- why not have "one interface to rule it all" -- performing all the "tedious work" for users. The annoyance of having a library function do too much checks is what makes them slower than user/handwritten ones. With multi-dispatch, it is "free" to provide convenience functions around specialized ones, but not vice-versa. It does amount to more work for library developers (but it is work that makes sense) that we don't have to support if we lack the effort/energy for now.

(It doesn't sound like too much work -- until you consider the fact that if the transform function accepts values <: Real, we'll need to support it for all the other functions too, for consistency of user experience)

c42f · 2015-09-04T14:35:43Z

On to the other points.

Support

We should also provide consistency in user expectation (if violated, it's always from my own wrongdoing), rather than simply documenting interfaces:

Yes, API designers sometimes blame users for not reading documentation, but I think that's often just as much lazy design as lazy users. If a design is really done well, users should be able to guess, and guess correctly :)

calls to functions with vector arguments return vector results, and calls with NxM arrays return NxM results.

Agreed.

Results should not end up mysteriously transposed/etc.

Double agreed. This is something that matlab APIs suffer from terribly, the main reason being that they try to guess intended meaning based on matrix shape. Invariably this works in prototyping, only to fail horribly on an edge case in production code.

If a argument of a certain type is allowed in some functions, it should be allowed in all "similar"/reasonable functions

Sounds reasonable.

(This has implications in the "other thoughts section" when it comes to tightening the range of allowable argument types.)

I don't mind if someone else comes along, and submits a PR to support v0.3, but I personally don't have the energy to maintain that level of support, sorry! I have gone back to clean up all the unused functions (geodesic2geocentric, latlong_from_projection, etc) to scope it down into something I can manage. Also, I suggest we stick with Float64/EPSG/ESRI, and support more general user arguments (Int/Real/etc) only when we have the resources (and are in a good position) to. I personally think that time/effort is better spent on developing the rest of the missing functionality for working with geospatial data (e.g. OGR/etc) though.

I never had any particular intention of supporting 0.3, at least not to start with. I'm certainly not against it, but it's just not a priority for me with 0.4 so close. I'm perfectly happy if you just rip out 0.3 support if you like.

Other thoughts

The semantics of Array{T,2} is not entirely clear-cut to me; I can imagine, for instance, that the columns (rather than the rows) might correspond to distinct points. Although it might feel more "general"(i.e. a vector can be "reshaped" into a 2-d matrix), it introduces a lot of degrees of freedom in how it might/should be interpreted. Moreover, it is weirder to support Arrays without also supporting Vectors, rather than vice versa

Indeed, transforming a single vector should be the base functionality, I only started with matrices because large sets of points are my use case. The data layout (Nx3 vs 3xN) is a particularly nasty decision because both conventions are used in practice.

To make matters worse, some people may want to represent their data as vectors of ImmutableArrays.Vector3, or whatever the successor to that library will be once a more efficient NTuple appears in 0.5. In terms of data layout, this is compatible with a 3xN matrix, but the user will be stuck with reinterpreting the data by hand.

I'm afraid this might be a hopeless case until some strong consensus arises in the julia community around the representation for small fixed size vectors.

The length of the code is secondary to me for now; it can be shortened via parameterizing the functions, or meta-programming at a later stage

I have been rather wild and loose with my individual commits, since there was (and still is) a lot of ground to cover. I expect to change after we tag the first release, and hope it doesn't annoy you too much.

No major problem, I'm reviewing all commits here as a big chunk. If the individual commits themselves are messy and not very meaningful, you can always squash them together to neaten things up.

re: Version 0.1 todo #2 (comment), I work from a macbook, so I can help with building/testing for OS X, but I hope you can figure out how the build service works too, to increase the bus factor of this package.

I'll try to tackle this when I get time. No guarantees this will be really soon - I expect real life to intrude in the shape of a new baby within a couple of weeks or so!

c42f · 2015-09-04T14:44:13Z

src/proj_capi.jl

+
+@doc "forward projection from Lat/Lon to X/Y (only supports 2 dimensions)" ->
+function _fwd!(lonlat::Vector{Cdouble}, proj_ptr::Ptr{Void})
+    xy = ccall((:pj_fwd, libproj), ProjUV, (ProjUV, Ptr{Void}), ProjUV(lonlat[1], lonlat[2]), proj_ptr)


pj_fwd and pj_inv both set errno - we should check it.

and identified unnecessary allocations of memory when transforming between radians and degrees

c42f · 2015-09-05T01:35:55Z

src/proj_capi.jl

+@doc "forward projection from Lat/Lon to X/Y (only supports 2 dimensions)" ->
+function _fwd!(lonlat::Vector{Cdouble}, proj_ptr::Ptr{Void})
+    xy = ccall((:pj_fwd, libproj), ProjUV, (ProjUV, Ptr{Void}), ProjUV(lonlat[1], lonlat[2]), proj_ptr)
+    _errno() != 0 &&  error("forward projection error: $(_strerrno())")


Stylistic nitpick: I find this really hard to read. The following "assertion style" equivalent works much better for me, since it can be read as a valid English sentence:

_errno() == 0 || error("forward projection error: $(_strerrno())")

c42f · 2015-09-07T13:39:07Z

It's looking fairly solid, and I'm inclined to merge this soon rather than any later since it's basically blocking me from doing anything further (and probably you too). I have more cleanup and decisions on the naming of API functions in mind, but those can be done later.

However, I tried to run the tests, and found that the geod_ set of functions must be a relatively recent addition - they're not in my version of libproj as installed from the package manager (4.8.0). It'd be good if the wrapper was graceful about not requiring a very recent proj version, given that the basic proj.4 functionality is extremely mature. PyCall has an example of one way to do something similar - it does some switching on pyversion.

c42f · 2015-09-07T13:41:24Z

It's great to have an example benchmark by the way. It might be worth considering putting the test loops inside a function so that the optimizer has something to chew on and a chance to inline the calls to transform! if it sees fit.

yeesian · 2015-09-07T16:16:44Z

However, I tried to run the tests, and found that the geod_ set of functions must be a relatively recent addition - they're not in my version of libproj as installed from the package manager (4.8.0). It'd be good if the wrapper was graceful about not requiring a very recent proj version, given that the basic proj.4 functionality is extremely mature. PyCall has an example of one way to do something similar - it does some switching on pyversion.

Sure. They don't provide an easy way to query the major.minor version of Proj4 though, so we'll probably have to parse the string returned by pj_get_release().

May I check what it returns for you? E.g.

julia> using Proj4
Proj4.
julia> Proj4.version() # calls pj_get_release()
"Rel. 4.9.1, 04 March 2015"

yeesian · 2015-09-07T16:38:46Z

However, I tried to run the tests, and found that the geod_ set of functions must be a relatively recent addition - they're not in my version of libproj as installed from the package manager (4.8.0). It'd be good if the wrapper was graceful about not requiring a very recent proj version, given that the basic proj.4 functionality is extremely mature.

Ah, that's our blocker, haha. No wonder you prefer the decoupling of the geodesic stuff from the Projections. (The issue is that even the initialization of the Projection [in this PR] already relies on a geod_ function)

I'll see what I can do about it.

yeesian · 2015-09-07T17:15:38Z

Give it a shot now? We'll probably move it into the build script when we eventually have one

yeesian · 2015-09-07T17:17:57Z

It's great to have an example benchmark by the way. It might be worth considering putting the test loops inside a function so that the optimizer has something to chew on and a chance to inline the calls to transform! if it sees fit.

For another PR/issue maybe? (:

c42f · 2015-09-07T23:18:04Z

It's great to have an example benchmark by the way. It might be worth considering putting the test loops inside a function so that the optimizer has something to chew on and a chance to inline the calls to transform! if it sees fit.

For another PR/issue maybe? (:

Sure. I just meant: thanks for putting together the benchmark with tech_square.osm.

c42f · 2015-09-07T23:20:17Z

The version installed with ubuntu 14.04:

julia> Proj4.libproj_version()
"Rel. 4.8.0, 6 March 2012"

yeesian · 2015-09-07T23:44:26Z

Ah, the regex I wrote should work for your version too then. See if it runs (and passes) the relevant tests now?

c42f · 2015-09-08T00:45:16Z

Tests pass now! There's a couple of niggles left which I'll note on the associated lines.

c42f · 2015-09-08T00:52:20Z

src/proj_geodesic.jl

+    Algorithms for geodesics (arXiv:1109.4448v2 [physics.geo-ph] 28 Mar 2012)
+""" ->
+function _geod_geodesic(a::Cdouble, f::Cdouble)
+    g_ptr = pointer_from_objref(geod_geodesic())


This seems dangerous, and will likely cause segfaults later. From the docs:

pointer_from_objref(object_instance)

Get the memory address of a Julia object as a Ptr. The existence of the resulting Ptr will not protect the object from garbage collection, so you must ensure that the object remains referenced for the whole time that the Ptr will be used.

The critical thing being that we're not holding a julia reference to the geod_geodesic instance (rather, we're returning a bare pointer) so garbage collection will eventually collect the instance and our pointer will be left dangling.

to geod_(direct | inverse | destination | distance)

c42f · 2015-09-08T01:55:24Z

Ah cool you fixed the REQUIRE, I had to tweak that locally.

Personally I think the only really good thing about the geod_* names is that they're consistent with the underlying library. geod_destination is definitely better than destination though.

c42f · 2015-09-08T02:04:24Z

Having said that, I think I'll merge this now, and do a full review of the API naming consistency once it's clearer what exactly is going to be included.

c42f · 2015-09-08T02:05:46Z

Oh, I like that you renamed ellps_distance for consistency. I didn't notice that immediately.

c42f · 2015-09-08T02:12:29Z

Ok, I rebased and merged. Thanks for all the hard work.

yeesian · 2015-09-08T02:13:03Z

👍 Thanks for your code reviews! the resulting code has significantly improved for it

c42f · 2015-09-08T03:12:03Z

Cheers :)

c42f · 2015-09-08T04:04:39Z

Oh, BTW I did rebase your changes before merging to make the history simpler. It might be worth it if you reset your master before doing anything more, just to keep the history tidy. I hope that's not confusing, it's always hard to know how much attention to pay to keeping the history graph readable.

yeesian · 2015-09-08T04:21:10Z

I'm afraid it is confusing (I could always nuke my repo, and re-fork); if it's not too much trouble, can you do a set of instructions for me to follow (after which I'll probably have a good sense of what's happening)?

c42f · 2015-09-08T04:29:02Z

Ack sorry about that, now that I've been using if for quite a while I sometimes forget how confusing git is.

All you should need to do is reset your master branch to FugroRoames/master:

git checkout master
# Now, ensure you don't have any local changes on master or in the working copy!
# The following resets the current branch (master) to point to FugroRoames/master,
# assuming your git remote name for the FugroRoames/Proj4 is actually FugroRoames.
git reset --hard FugroRoames/master

After that, branch any new changes off the top of your master.

yeesian added 13 commits August 28, 2015 22:59

Provide ESRI and EPSG codes

f77c2fd

fix mispelling

2ac8d6f

backwards compatibility for v0.3

359b6cc

runtests: print projection strings that couldn't be parsed

d596b15

merge upstream changes

a30cb0e

low-level wrapper

b1a83b0

wrap more low-level proj_api functions

9795c8e

implement functions for geodesic operations

40f8dab

added new methods and tests

e92bc95

bugfixes for v0.3

4761bfc

drop support for v0.3

a9e5fcf

and code clean-up, removing unused functions

yeesian mentioned this pull request Aug 31, 2015

Some reorganization #3

Closed

yeesian mentioned this pull request Aug 31, 2015

Version 0.1 todo #2

Closed

5 tasks

c42f reviewed Aug 31, 2015
View reviewed changes

c42f reviewed Sep 4, 2015
View reviewed changes

rename is_identical -> compare_datums

ed926de

c42f reviewed Sep 4, 2015
View reviewed changes

remove transform2 and transform3

ee4228b

and identified unnecessary allocations of memory when transforming between radians and degrees

c42f reviewed Sep 5, 2015
View reviewed changes

improved assertion style

650827e

conditionally load geodesic routines

517dcdf

c42f reviewed Sep 8, 2015
View reviewed changes

yeesian added 3 commits September 7, 2015 21:13

fix dangling pointer for geod_geodesic

d21c115

update REQUIRE

cf5e8ea

rename geodesic methods

36f3864

to geod_(direct | inverse | destination | distance)

c42f closed this Sep 8, 2015

yeesian deleted the geod branch September 8, 2015 04:33

Code Reorganization and Geodesic Routines #4

Code Reorganization and Geodesic Routines #4

Conversation

yeesian commented Aug 31, 2015

Code Organization

lower-level

higher-level

Support

Other thoughts

yeesian commented Aug 31, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yeesian commented Aug 31, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

c42f commented Sep 4, 2015

higher-level

c42f commented Sep 4, 2015

yeesian commented Sep 4, 2015

c42f commented Sep 4, 2015

yeesian commented Sep 4, 2015

c42f commented Sep 4, 2015

Support

Other thoughts

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

c42f commented Sep 7, 2015

c42f commented Sep 7, 2015

yeesian commented Sep 7, 2015

yeesian commented Sep 7, 2015

yeesian commented Sep 7, 2015

yeesian commented Sep 7, 2015

c42f commented Sep 7, 2015

c42f commented Sep 7, 2015

yeesian commented Sep 7, 2015

c42f commented Sep 8, 2015

Choose a reason for hiding this comment

c42f commented Sep 8, 2015

c42f commented Sep 8, 2015

c42f commented Sep 8, 2015

c42f commented Sep 8, 2015

yeesian commented Sep 8, 2015

c42f commented Sep 8, 2015

c42f commented Sep 8, 2015

yeesian commented Sep 8, 2015

c42f commented Sep 8, 2015