-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code Reorganization and Geodesic Routines #4
Conversation
to separate the following 1. user-facing convenience functions (in `src/proj_functions.jl`) 2. the low-level wrapper functions (in `src/proj_capi.jl`) 3. type definitions (in `src/proj_types.jl`) The module file (`src/Proj4.jl`) itself should be primarily for organizing imports/exports/versioning-cruft.
Vector{T} will return Vector{T}, and Array{T,2} will return Array{T,2}. (We might want to drop support for 2-dimensional arrays, and request that the user handle it themselves? They can always do a loop and call `vec()`) Tests passing on v0.4 now. Some attempt at trying to provide support for v0.3, but it has segfaults I don’t have the energy to trace.
and code clean-up, removing unused functions
Alright, I'm done here (for now). Have at it! |
is_geocent(proj::Projection) = _is_geocent(proj.rep) | ||
|
||
@doc "Return true if the projection is a geocentric coordinate system" -> | ||
is_identical(p1::Projection, p2::Projection) = _compare_datums(p1.rep, p2.rep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert on this stuff, but my impression is that pj_compare_datum
only compares a very specific part of the projection definition: the "datum" ie, the underlying latlong coordinate system.
Unfortunately proj doesn't seem to provide a way to fully compare projections - see the comment at the top of pj_transform
: https://github.com/OSGeo/proj.4/blob/master/src/pj_transform.c#L77
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thus, I guess there's no neat way to implement is_identical
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh sure, i should stick to the original name compare_datums
then. Copied wrongly from https://github.com/OSGeo/proj.4/blob/5d2af3a89be0898458e4f0fe272affc36642d304/src/pj_transform.c#L448-L453
transform!(src, dest, @compat(map(Float64,position)), radians=radians) | ||
export Projection, # proj_types.jl | ||
transform2, transform2!, transform3, transform3!, transform, transform!, # proj_functions.jl | ||
is_latlong, is_geocent, is_identical, spheroid_params, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_identical -> compare_datums now that you changed it. Ditto for the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that!
Right, I've only just had the chance to get back to this. Some thoughts to start with: The low level / high level split is something I had already started on in a minor way and I think it makes sense to move into separate files as you've done here.
I'm naturally biased toward readability, and I think keyword arguments are far better for "configuration" type stuff like the radians argument, where the usage pattern is very likely to involve a literal constant value in the calling code. # Consider the positional argument version where a reader will need to look up
# the Pro4.jl API:
transform(srcCoords, destCoords, position, true)
# vs the entirely self-documenting alternative:
transform(srcCoords, destCoords, position, radians=true) Now, I haven't measured the performance impact of doing this, and that may be a good counteragument. In principle it's zero, but that doesn't seem to currently be the case. I looked at the code_native output for a simple synthetic keyword example and it was a bit of a complexity disaster compared to the positional argument version. (Example in question: foo(x, s=false) = s ? x*x : x
bar(x) = foo(x, true)
code_native(bar, (Int,)) # super neat
foo2(x; s::Bool=false) = s ? x*x : x
bar2(x) = foo2(x, s=true)
code_native(bar2, (Int,)) # oh no what )
Concrete typing makes sense when it's required for passing reference to the underlying C API. Clearly there's no point defining a
I agree in the cases where the decision may affect performance, and where
I don't really buy the argument that checking for dimensionality inside |
Another note regarding the radians argument - this is entirely a convenience feature which I stole from pyproj. Under the premise of providing convenience functions at a later stage, it should just be removed entirely, and people will have to stick with radians. Not that I'm necessarily arguing for removing it - I put in the API because I thought the pyproj people had made a good call about the need for the option, and the appropriate default, given that the world is basically stuck with degrees rather than following the one true way ;-) |
Luckily for keyword args, you can define a new function: transform(srcCoords, destCoords, position; radians=false) = transform(srcCoords, destCoords, position, radians) |
Good enough I guess, though a bit featury having so many versions of the same thing. |
As another user coming from python, I agree with you too, about how tedious it is, and it is definitely felt when writing tests. Some thoughts:
Which brings me to my central point: You could argue -- why not have "one interface to rule it all" -- performing all the "tedious work" for users. The annoyance of having a library function do too much checks is what makes them slower than user/handwritten ones. With multi-dispatch, it is "free" to provide convenience functions around specialized ones, but not vice-versa. It does amount to more work for library developers (but it is work that makes sense) that we don't have to support if we lack the effort/energy for now. (It doesn't sound like too much work -- until you consider the fact that if the |
On to the other points.
Yes, API designers sometimes blame users for not reading documentation, but I think that's often just as much lazy design as lazy users. If a design is really done well, users should be able to guess, and guess correctly :)
Agreed.
Double agreed. This is something that matlab APIs suffer from terribly, the main reason being that they try to guess intended meaning based on matrix shape. Invariably this works in prototyping, only to fail horribly on an edge case in production code.
Sounds reasonable.
I never had any particular intention of supporting 0.3, at least not to start with. I'm certainly not against it, but it's just not a priority for me with 0.4 so close. I'm perfectly happy if you just rip out 0.3 support if you like.
Indeed, transforming a single vector should be the base functionality, I only started with matrices because large sets of points are my use case. The data layout (Nx3 vs 3xN) is a particularly nasty decision because both conventions are used in practice. To make matters worse, some people may want to represent their data as vectors of ImmutableArrays.Vector3, or whatever the successor to that library will be once a more efficient NTuple appears in 0.5. In terms of data layout, this is compatible with a 3xN matrix, but the user will be stuck with reinterpreting the data by hand. I'm afraid this might be a hopeless case until some strong consensus arises in the julia community around the representation for small fixed size vectors.
No major problem, I'm reviewing all commits here as a big chunk. If the individual commits themselves are messy and not very meaningful, you can always squash them together to neaten things up.
I'll try to tackle this when I get time. No guarantees this will be really soon - I expect real life to intrude in the shape of a new baby within a couple of weeks or so! |
|
||
@doc "forward projection from Lat/Lon to X/Y (only supports 2 dimensions)" -> | ||
function _fwd!(lonlat::Vector{Cdouble}, proj_ptr::Ptr{Void}) | ||
xy = ccall((:pj_fwd, libproj), ProjUV, (ProjUV, Ptr{Void}), ProjUV(lonlat[1], lonlat[2]), proj_ptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pj_fwd and pj_inv both set errno - we should check it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
and identified unnecessary allocations of memory when transforming between radians and degrees
@doc "forward projection from Lat/Lon to X/Y (only supports 2 dimensions)" -> | ||
function _fwd!(lonlat::Vector{Cdouble}, proj_ptr::Ptr{Void}) | ||
xy = ccall((:pj_fwd, libproj), ProjUV, (ProjUV, Ptr{Void}), ProjUV(lonlat[1], lonlat[2]), proj_ptr) | ||
_errno() != 0 && error("forward projection error: $(_strerrno())") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistic nitpick: I find this really hard to read. The following "assertion style" equivalent works much better for me, since it can be read as a valid English sentence:
_errno() == 0 || error("forward projection error: $(_strerrno())")
It's looking fairly solid, and I'm inclined to merge this soon rather than any later since it's basically blocking me from doing anything further (and probably you too). I have more cleanup and decisions on the naming of API functions in mind, but those can be done later. However, I tried to run the tests, and found that the |
It's great to have an example benchmark by the way. It might be worth considering putting the test loops inside a function so that the optimizer has something to chew on and a chance to inline the calls to |
Sure. They don't provide an easy way to query the major.minor version of Proj4 though, so we'll probably have to parse the string returned by May I check what it returns for you? E.g. julia> using Proj4
Proj4.
julia> Proj4.version() # calls pj_get_release()
"Rel. 4.9.1, 04 March 2015" |
Ah, that's our blocker, haha. No wonder you prefer the decoupling of the geodesic stuff from the Projections. (The issue is that even the initialization of the Projection [in this PR] already relies on a I'll see what I can do about it. |
Give it a shot now? We'll probably move it into the build script when we eventually have one |
For another PR/issue maybe? (: |
Sure. I just meant: thanks for putting together the benchmark with tech_square.osm. |
The version installed with ubuntu 14.04:
|
Ah, the regex I wrote should work for your version too then. See if it runs (and passes) the relevant tests now? |
Tests pass now! There's a couple of niggles left which I'll note on the associated lines. |
Algorithms for geodesics (arXiv:1109.4448v2 [physics.geo-ph] 28 Mar 2012) | ||
""" -> | ||
function _geod_geodesic(a::Cdouble, f::Cdouble) | ||
g_ptr = pointer_from_objref(geod_geodesic()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems dangerous, and will likely cause segfaults later. From the docs:
pointer_from_objref(object_instance)
Get the memory address of a Julia object as a Ptr. The existence of the resulting Ptr will not protect the object from garbage collection, so you must ensure that the object remains referenced for the whole time that the Ptr will be used.
The critical thing being that we're not holding a julia reference to the geod_geodesic
instance (rather, we're returning a bare pointer) so garbage collection will eventually collect the instance and our pointer will be left dangling.
to geod_(direct | inverse | destination | distance)
Ah cool you fixed the REQUIRE, I had to tweak that locally. Personally I think the only really good thing about the |
Having said that, I think I'll merge this now, and do a full review of the API naming consistency once it's clearer what exactly is going to be included. |
Oh, I like that you renamed |
Ok, I rebased and merged. Thanks for all the hard work. |
👍 Thanks for your code reviews! the resulting code has significantly improved for it |
Cheers :) |
Oh, BTW I did rebase your changes before merging to make the history simpler. It might be worth it if you reset your master before doing anything more, just to keep the history tidy. I hope that's not confusing, it's always hard to know how much attention to pay to keeping the history graph readable. |
I'm afraid it is confusing (I could always nuke my repo, and re-fork); if it's not too much trouble, can you do a set of instructions for me to follow (after which I'll probably have a good sense of what's happening)? |
Ack sorry about that, now that I've been using if for quite a while I sometimes forget how confusing git is. All you should need to do is reset your master branch to FugroRoames/master: git checkout master
# Now, ensure you don't have any local changes on master or in the working copy!
# The following resets the current branch (master) to point to FugroRoames/master,
# assuming your git remote name for the FugroRoames/Proj4 is actually FugroRoames.
git reset --hard FugroRoames/master After that, branch any new changes off the top of your master. |
Here's some collected thoughts after having worked on it over the weekend,
Code Organization
We should separate the low-level C-facing routines from the high-level user-facing ones.
lower-level
In the low-level C-facing routines (
src/geodesic.jl
andsrc/proj_capi.jl
), that's wherePreferably the only function arguments that should be pointers are native objects from the proj.4 library. I've made an exception for one of the
_transform
methods, because it gets re-used alot in the other calls totransform
(for reshaping betweenArray{T,2}
andVector{T}
).higher-level
The higher-level user-facing routines are organized into
src/proj_types.jl
andsrc/proj_functions.jl
. This being a wrapper library, I'm biased towardsThis is most obviously felt in
transform
,transform2
andtransform3
(both with and without the bang). It might feel like code-bloat, but the wider variety of function calls is actually useful for the end-user: most of the time, they would know whether or not the data they have is 2 or 3 dimensional. In such cases, they will prefer to save on the checking/branching by calling the most specific function, especially if they're doing the geodesic operations in a tight loop.Support
We should also provide consistency in user expectation (if violated, it's always from my own wrongdoing), rather than simply documenting interfaces:
(This has implications in the "other thoughts section" when it comes to tightening the range of allowable argument types.)
I don't mind if someone else comes along, and submits a PR to support v0.3, but I personally don't have the energy to maintain that level of support, sorry! I have gone back to clean up all the unused functions (
geodesic2geocentric
,latlong_from_projection
, etc) to scope it down into something I can manage. Also, I suggest we stick with Float64/EPSG/ESRI, and support more general user arguments (Int/Real/etc) only when we have the resources (and are in a good position) to. I personally think that time/effort is better spent on developing the rest of the missing functionality for working with geospatial data (e.g. OGR/etc) though.Other thoughts
Array{T,2}
is not entirely clear-cut to me; I can imagine, for instance, that the columns (rather than the rows) might correspond to distinct points. Although it might feel more "general"(i.e. a vector can be "reshaped" into a 2-d matrix), it introduces a lot of degrees of freedom in how it might/should be interpreted. Moreover, it is weirder to support Arrays without also supporting Vectors, rather than vice versa