Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code Reorganization and Geodesic Routines #4

Closed
wants to merge 22 commits into from
Closed

Code Reorganization and Geodesic Routines #4

wants to merge 22 commits into from

Conversation

yeesian
Copy link
Member

@yeesian yeesian commented Aug 31, 2015

Here's some collected thoughts after having worked on it over the weekend,

Code Organization

We should separate the low-level C-facing routines from the high-level user-facing ones.

lower-level

In the low-level C-facing routines (src/geodesic.jl and src/proj_capi.jl), that's where

  • pointer arithmetic should be handled
  • error messages should be dealt with
  • C-type immutables/structs (for the express purpose of interfacing with the PROJ.4 library, rather than for users) should be defined, and dealt with locally

Preferably the only function arguments that should be pointers are native objects from the proj.4 library. I've made an exception for one of the _transform methods, because it gets re-used alot in the other calls to transform (for reshaping between Array{T,2} and Vector{T}).

higher-level

The higher-level user-facing routines are organized into src/proj_types.jl and src/proj_functions.jl. This being a wrapper library, I'm biased towards

  • ordered arguments, rather than keyword arguments.
  • concretely-typed variables, rather than loosely-typed ones
  • surfacing decisions for the user (through function arguments/naming), rather than making decisions for them; convenience functions can be added at a later stage

This is most obviously felt in transform, transform2 and transform3 (both with and without the bang). It might feel like code-bloat, but the wider variety of function calls is actually useful for the end-user: most of the time, they would know whether or not the data they have is 2 or 3 dimensional. In such cases, they will prefer to save on the checking/branching by calling the most specific function, especially if they're doing the geodesic operations in a tight loop.

Support

We should also provide consistency in user expectation (if violated, it's always from my own wrongdoing), rather than simply documenting interfaces:

  1. calls to functions with vector arguments return vector results, and calls with NxM arrays return NxM results.
  2. Results should not end up mysteriously transposed/etc.
  3. If a argument of a certain type is allowed in some functions, it should be allowed in all "similar"/reasonable functions

(This has implications in the "other thoughts section" when it comes to tightening the range of allowable argument types.)

I don't mind if someone else comes along, and submits a PR to support v0.3, but I personally don't have the energy to maintain that level of support, sorry! I have gone back to clean up all the unused functions (geodesic2geocentric, latlong_from_projection, etc) to scope it down into something I can manage. Also, I suggest we stick with Float64/EPSG/ESRI, and support more general user arguments (Int/Real/etc) only when we have the resources (and are in a good position) to. I personally think that time/effort is better spent on developing the rest of the missing functionality for working with geospatial data (e.g. OGR/etc) though.

Other thoughts

  • The semantics of Array{T,2} is not entirely clear-cut to me; I can imagine, for instance, that the columns (rather than the rows) might correspond to distinct points. Although it might feel more "general"(i.e. a vector can be "reshaped" into a 2-d matrix), it introduces a lot of degrees of freedom in how it might/should be interpreted. Moreover, it is weirder to support Arrays without also supporting Vectors, rather than vice versa
  • The length of the code is secondary to me for now; it can be shortened via parameterizing the functions, or meta-programming at a later stage
  • I have been rather wild and loose with my individual commits, since there was (and still is) a lot of ground to cover. I expect to change after we tag the first release, and hope it doesn't annoy you too much.
  • re: Version 0.1 todo #2 (comment), I work from a macbook, so I can help with building/testing for OS X, but I hope you can figure out how the build service works too, to increase the bus factor of this package.

yeesian added 13 commits August 28, 2015 22:59
to separate the following

1. user-facing convenience functions (in `src/proj_functions.jl`)
2. the low-level wrapper functions (in `src/proj_capi.jl`)
3. type definitions (in `src/proj_types.jl`)

The module file (`src/Proj4.jl`) itself should be primarily for
organizing imports/exports/versioning-cruft.
Vector{T} will return Vector{T}, and Array{T,2} will return Array{T,2}.
(We might want to drop support for 2-dimensional arrays, and request
that the user handle it themselves? They can always do a loop and call
`vec()`)

Tests passing on v0.4 now.

Some attempt at trying to provide support for v0.3, but it has
segfaults I don’t have the energy to trace.
and code clean-up, removing unused functions
@yeesian yeesian mentioned this pull request Aug 31, 2015
@yeesian
Copy link
Member Author

yeesian commented Aug 31, 2015

Alright, I'm done here (for now). Have at it!

@yeesian yeesian mentioned this pull request Aug 31, 2015
5 tasks
is_geocent(proj::Projection) = _is_geocent(proj.rep)

@doc "Return true if the projection is a geocentric coordinate system" ->
is_identical(p1::Projection, p2::Projection) = _compare_datums(p1.rep, p2.rep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert on this stuff, but my impression is that pj_compare_datum only compares a very specific part of the projection definition: the "datum" ie, the underlying latlong coordinate system.

Unfortunately proj doesn't seem to provide a way to fully compare projections - see the comment at the top of pj_transform: https://github.com/OSGeo/proj.4/blob/master/src/pj_transform.c#L77

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus, I guess there's no neat way to implement is_identical.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sure, i should stick to the original name compare_datums then. Copied wrongly from https://github.com/OSGeo/proj.4/blob/5d2af3a89be0898458e4f0fe272affc36642d304/src/pj_transform.c#L448-L453

@yeesian
Copy link
Member Author

yeesian commented Aug 31, 2015

transform!(src, dest, @compat(map(Float64,position)), radians=radians)
export Projection, # proj_types.jl
transform2, transform2!, transform3, transform3!, transform, transform!, # proj_functions.jl
is_latlong, is_geocent, is_identical, spheroid_params,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_identical -> compare_datums now that you changed it. Ditto for the tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that!

@c42f
Copy link
Member

c42f commented Sep 4, 2015

Right, I've only just had the chance to get back to this. Some thoughts to start with:

The low level / high level split is something I had already started on in a minor way and I think it makes sense to move into separate files as you've done here.

higher-level

The higher-level user-facing routines are organized into src/proj_types.jl and src/proj_functions.jl. This being a wrapper library, I'm biased towards

  • ordered arguments, rather than keyword arguments.

I'm naturally biased toward readability, and I think keyword arguments are far better for "configuration" type stuff like the radians argument, where the usage pattern is very likely to involve a literal constant value in the calling code.

# Consider the positional argument version where a reader will need to look up
# the Pro4.jl API:
transform(srcCoords, destCoords, position, true)

# vs the entirely self-documenting alternative:
transform(srcCoords, destCoords, position, radians=true)

Now, I haven't measured the performance impact of doing this, and that may be a good counteragument. In principle it's zero, but that doesn't seem to currently be the case. I looked at the code_native output for a simple synthetic keyword example and it was a bit of a complexity disaster compared to the positional argument version.

(Example in question:

foo(x, s=false) = s ? x*x : x
bar(x) = foo(x, true)
code_native(bar, (Int,))  # super neat

foo2(x; s::Bool=false) = s ? x*x : x
bar2(x) = foo2(x, s=true)
code_native(bar2, (Int,)) # oh no what

)

  • concretely-typed variables, rather than loosely-typed ones

Concrete typing makes sense when it's required for passing reference to the underlying C API. Clearly there's no point defining a transform! with anything other than Cdouble arrays, since the underlying API truly doesn't support anything else. OTOH, what exactly does tight typing buy us in the case of value types? From a user point of view it seems the difference between just working and tediously having to cast types?

  • surfacing decisions for the user (through function arguments/naming), rather than making decisions for them; convenience functions can be added at a later stage

I agree in the cases where the decision may affect performance, and where
there's no clearly correct API choice.

This is most obviously felt in transform, transform2 and transform3 (both with and without the bang). It might feel like code-bloat, but the wider variety of function calls is actually useful for the end-user: most of the time, they would know whether or not the data they have is 2 or 3 dimensional. In such cases, they will prefer to save on the checking/branching by calling the most specific function, especially if they're doing the geodesic operations in a tight loop.

I don't really buy the argument that checking for dimensionality inside transform() - or other similarly simple operations - could be significant for performance. Even simple cartographic projections are likely to involve quite a bit of floating point math internally which should swamp a simple and predictable branch instruction. Lucikly, it should be easy to benchmark and actually find out :-)

@c42f
Copy link
Member

c42f commented Sep 4, 2015

Another note regarding the radians argument - this is entirely a convenience feature which I stole from pyproj. Under the premise of providing convenience functions at a later stage, it should just be removed entirely, and people will have to stick with radians.

Not that I'm necessarily arguing for removing it - I put in the API because I thought the pyproj people had made a good call about the need for the option, and the appropriate default, given that the world is basically stuck with degrees rather than following the one true way ;-)

@yeesian
Copy link
Member Author

yeesian commented Sep 4, 2015

Luckily for keyword args, you can define a new function:

transform(srcCoords, destCoords, position; radians=false) = transform(srcCoords, destCoords, position, radians)

@c42f
Copy link
Member

c42f commented Sep 4, 2015

Good enough I guess, though a bit featury having so many versions of the same thing.

@yeesian
Copy link
Member Author

yeesian commented Sep 4, 2015

OTOH, what exactly does tight typing buy us in the case of value types? From a user point of view it seems the difference between just working and tediously having to cast types?

As another user coming from python, I agree with you too, about how tedious it is, and it is definitely felt when writing tests. Some thoughts:

  1. If it's for a casual end-user, playing around and exploring the code, it is mildly annoying. Most times though, people should be reading it from databases etc, in which case it should "just work"
  2. If it's for another program, it could range from ("gosh, so annoying") to ("okay, good to know"). I don't have the bandwidth to benchmark, but it makes sense to perform all the operations for conversions in batch, before performing all the projection operations.
  3. We can define higher-order functions that perform the conversions. I just elected to have not done it yet (the whole point of multi-dispatch was to have different behavior for different types; for wrapper libraries, not all types are created equal)
  4. It would involve more work than I am willing to support in a single PR (c.f. 3) as, you have said it too, it feels feature-y

Which brings me to my central point: You could argue -- why not have "one interface to rule it all" -- performing all the "tedious work" for users. The annoyance of having a library function do too much checks is what makes them slower than user/handwritten ones. With multi-dispatch, it is "free" to provide convenience functions around specialized ones, but not vice-versa. It does amount to more work for library developers (but it is work that makes sense) that we don't have to support if we lack the effort/energy for now.

(It doesn't sound like too much work -- until you consider the fact that if the transform function accepts values <: Real, we'll need to support it for all the other functions too, for consistency of user experience)

@c42f
Copy link
Member

c42f commented Sep 4, 2015

On to the other points.

Support

We should also provide consistency in user expectation (if violated, it's always from my own wrongdoing), rather than simply documenting interfaces:

Yes, API designers sometimes blame users for not reading documentation, but I think that's often just as much lazy design as lazy users. If a design is really done well, users should be able to guess, and guess correctly :)

  1. calls to functions with vector arguments return vector results, and calls with NxM arrays return NxM results.

Agreed.

  1. Results should not end up mysteriously transposed/etc.

Double agreed. This is something that matlab APIs suffer from terribly, the main reason being that they try to guess intended meaning based on matrix shape. Invariably this works in prototyping, only to fail horribly on an edge case in production code.

  1. If a argument of a certain type is allowed in some functions, it should be allowed in all "similar"/reasonable functions

Sounds reasonable.

(This has implications in the "other thoughts section" when it comes to tightening the range of allowable argument types.)

I don't mind if someone else comes along, and submits a PR to support v0.3, but I personally don't have the energy to maintain that level of support, sorry! I have gone back to clean up all the unused functions (geodesic2geocentric, latlong_from_projection, etc) to scope it down into something I can manage. Also, I suggest we stick with Float64/EPSG/ESRI, and support more general user arguments (Int/Real/etc) only when we have the resources (and are in a good position) to. I personally think that time/effort is better spent on developing the rest of the missing functionality for working with geospatial data (e.g. OGR/etc) though.

I never had any particular intention of supporting 0.3, at least not to start with. I'm certainly not against it, but it's just not a priority for me with 0.4 so close. I'm perfectly happy if you just rip out 0.3 support if you like.

Other thoughts

  • The semantics of Array{T,2} is not entirely clear-cut to me; I can imagine, for instance, that the columns (rather than the rows) might correspond to distinct points. Although it might feel more "general"(i.e. a vector can be "reshaped" into a 2-d matrix), it introduces a lot of degrees of freedom in how it might/should be interpreted. Moreover, it is weirder to support Arrays without also supporting Vectors, rather than vice versa

Indeed, transforming a single vector should be the base functionality, I only started with matrices because large sets of points are my use case. The data layout (Nx3 vs 3xN) is a particularly nasty decision because both conventions are used in practice.

To make matters worse, some people may want to represent their data as vectors of ImmutableArrays.Vector3, or whatever the successor to that library will be once a more efficient NTuple appears in 0.5. In terms of data layout, this is compatible with a 3xN matrix, but the user will be stuck with reinterpreting the data by hand.

I'm afraid this might be a hopeless case until some strong consensus arises in the julia community around the representation for small fixed size vectors.

  • The length of the code is secondary to me for now; it can be shortened via parameterizing the functions, or meta-programming at a later stage
  • I have been rather wild and loose with my individual commits, since there was (and still is) a lot of ground to cover. I expect to change after we tag the first release, and hope it doesn't annoy you too much.

No major problem, I'm reviewing all commits here as a big chunk. If the individual commits themselves are messy and not very meaningful, you can always squash them together to neaten things up.

  • re: Version 0.1 todo #2 (comment), I work from a macbook, so I can help with building/testing for OS X, but I hope you can figure out how the build service works too, to increase the bus factor of this package.

I'll try to tackle this when I get time. No guarantees this will be really soon - I expect real life to intrude in the shape of a new baby within a couple of weeks or so!


@doc "forward projection from Lat/Lon to X/Y (only supports 2 dimensions)" ->
function _fwd!(lonlat::Vector{Cdouble}, proj_ptr::Ptr{Void})
xy = ccall((:pj_fwd, libproj), ProjUV, (ProjUV, Ptr{Void}), ProjUV(lonlat[1], lonlat[2]), proj_ptr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pj_fwd and pj_inv both set errno - we should check it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

and identified unnecessary allocations of memory when transforming between radians and degrees
@doc "forward projection from Lat/Lon to X/Y (only supports 2 dimensions)" ->
function _fwd!(lonlat::Vector{Cdouble}, proj_ptr::Ptr{Void})
xy = ccall((:pj_fwd, libproj), ProjUV, (ProjUV, Ptr{Void}), ProjUV(lonlat[1], lonlat[2]), proj_ptr)
_errno() != 0 && error("forward projection error: $(_strerrno())")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic nitpick: I find this really hard to read. The following "assertion style" equivalent works much better for me, since it can be read as a valid English sentence:

_errno() == 0 || error("forward projection error: $(_strerrno())")

@c42f
Copy link
Member

c42f commented Sep 7, 2015

It's looking fairly solid, and I'm inclined to merge this soon rather than any later since it's basically blocking me from doing anything further (and probably you too). I have more cleanup and decisions on the naming of API functions in mind, but those can be done later.

However, I tried to run the tests, and found that the geod_ set of functions must be a relatively recent addition - they're not in my version of libproj as installed from the package manager (4.8.0). It'd be good if the wrapper was graceful about not requiring a very recent proj version, given that the basic proj.4 functionality is extremely mature. PyCall has an example of one way to do something similar - it does some switching on pyversion.

@c42f
Copy link
Member

c42f commented Sep 7, 2015

It's great to have an example benchmark by the way. It might be worth considering putting the test loops inside a function so that the optimizer has something to chew on and a chance to inline the calls to transform! if it sees fit.

@yeesian
Copy link
Member Author

yeesian commented Sep 7, 2015

However, I tried to run the tests, and found that the geod_ set of functions must be a relatively recent addition - they're not in my version of libproj as installed from the package manager (4.8.0). It'd be good if the wrapper was graceful about not requiring a very recent proj version, given that the basic proj.4 functionality is extremely mature. PyCall has an example of one way to do something similar - it does some switching on pyversion.

Sure. They don't provide an easy way to query the major.minor version of Proj4 though, so we'll probably have to parse the string returned by pj_get_release().

May I check what it returns for you? E.g.

julia> using Proj4
Proj4.
julia> Proj4.version() # calls pj_get_release()
"Rel. 4.9.1, 04 March 2015"

@yeesian
Copy link
Member Author

yeesian commented Sep 7, 2015

However, I tried to run the tests, and found that the geod_ set of functions must be a relatively recent addition - they're not in my version of libproj as installed from the package manager (4.8.0). It'd be good if the wrapper was graceful about not requiring a very recent proj version, given that the basic proj.4 functionality is extremely mature.

Ah, that's our blocker, haha. No wonder you prefer the decoupling of the geodesic stuff from the Projections. (The issue is that even the initialization of the Projection [in this PR] already relies on a geod_ function)

I'll see what I can do about it.

@yeesian
Copy link
Member Author

yeesian commented Sep 7, 2015

Give it a shot now? We'll probably move it into the build script when we eventually have one

@yeesian
Copy link
Member Author

yeesian commented Sep 7, 2015

It's great to have an example benchmark by the way. It might be worth considering putting the test loops inside a function so that the optimizer has something to chew on and a chance to inline the calls to transform! if it sees fit.

For another PR/issue maybe? (:

@c42f
Copy link
Member

c42f commented Sep 7, 2015

It's great to have an example benchmark by the way. It might be worth considering putting the test loops inside a function so that the optimizer has something to chew on and a chance to inline the calls to transform! if it sees fit.

For another PR/issue maybe? (:

Sure. I just meant: thanks for putting together the benchmark with tech_square.osm.

@c42f
Copy link
Member

c42f commented Sep 7, 2015

The version installed with ubuntu 14.04:

julia> Proj4.libproj_version()
"Rel. 4.8.0, 6 March 2012"

@yeesian
Copy link
Member Author

yeesian commented Sep 7, 2015

Ah, the regex I wrote should work for your version too then. See if it runs (and passes) the relevant tests now?

@c42f
Copy link
Member

c42f commented Sep 8, 2015

Tests pass now! There's a couple of niggles left which I'll note on the associated lines.

Algorithms for geodesics (arXiv:1109.4448v2 [physics.geo-ph] 28 Mar 2012)
""" ->
function _geod_geodesic(a::Cdouble, f::Cdouble)
g_ptr = pointer_from_objref(geod_geodesic())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems dangerous, and will likely cause segfaults later. From the docs:

pointer_from_objref(object_instance)

Get the memory address of a Julia object as a Ptr. The existence of the resulting Ptr will not protect the object from garbage collection, so you must ensure that the object remains referenced for the whole time that the Ptr will be used.

The critical thing being that we're not holding a julia reference to the geod_geodesic instance (rather, we're returning a bare pointer) so garbage collection will eventually collect the instance and our pointer will be left dangling.

@c42f
Copy link
Member

c42f commented Sep 8, 2015

Ah cool you fixed the REQUIRE, I had to tweak that locally.

Personally I think the only really good thing about the geod_* names is that they're consistent with the underlying library. geod_destination is definitely better than destination though.

@c42f
Copy link
Member

c42f commented Sep 8, 2015

Having said that, I think I'll merge this now, and do a full review of the API naming consistency once it's clearer what exactly is going to be included.

@c42f
Copy link
Member

c42f commented Sep 8, 2015

Oh, I like that you renamed ellps_distance for consistency. I didn't notice that immediately.

@c42f
Copy link
Member

c42f commented Sep 8, 2015

Ok, I rebased and merged. Thanks for all the hard work.

@c42f c42f closed this Sep 8, 2015
@yeesian
Copy link
Member Author

yeesian commented Sep 8, 2015

👍 Thanks for your code reviews! the resulting code has significantly improved for it

@c42f
Copy link
Member

c42f commented Sep 8, 2015

Cheers :)

@c42f
Copy link
Member

c42f commented Sep 8, 2015

Oh, BTW I did rebase your changes before merging to make the history simpler. It might be worth it if you reset your master before doing anything more, just to keep the history tidy. I hope that's not confusing, it's always hard to know how much attention to pay to keeping the history graph readable.

@yeesian
Copy link
Member Author

yeesian commented Sep 8, 2015

I'm afraid it is confusing (I could always nuke my repo, and re-fork); if it's not too much trouble, can you do a set of instructions for me to follow (after which I'll probably have a good sense of what's happening)?

@c42f
Copy link
Member

c42f commented Sep 8, 2015

Ack sorry about that, now that I've been using if for quite a while I sometimes forget how confusing git is.

All you should need to do is reset your master branch to FugroRoames/master:

git checkout master
# Now, ensure you don't have any local changes on master or in the working copy!
# The following resets the current branch (master) to point to FugroRoames/master,
# assuming your git remote name for the FugroRoames/Proj4 is actually FugroRoames.
git reset --hard FugroRoames/master

After that, branch any new changes off the top of your master.

@yeesian yeesian deleted the geod branch September 8, 2015 04:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants