Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

N-Opa and optimizations #1271

Open
wants to merge 48 commits into
base: master
Choose a base branch
from
Open

N-Opa and optimizations #1271

wants to merge 48 commits into from

Conversation

doublestranded
Copy link
Contributor

@doublestranded doublestranded commented Feb 3, 2019

Hello Transitland - I’m submitting this PR after some time reflecting on the distance calculation problem for RSPs and shapes/stops without supplied distances in general. In particular, I was curious to see whether the Alevras paper in the comments could really be applied here. I’m not sure how this PR will fit with Transitland’s future plans, but right now it’s still more of a conceptual PR that shouldn’t be merged just yet. I hope it’s useful!

Some of the goals of this PR are:

  1. Improve the accuracy of distance calculation, at least in terms of the stop-segment assignment matching approach. The OTP/backtracking algorithm with an additional pre-selection heuristic seems pretty good (and performant!), but now I believe it is not the best tool for the job. One could construct a reasonable case that could thwart OTP; in fact the specs hide such a case if I’m not mistaken. The new approach “pulverizes” (for lack of a better term) the route line into smaller segments so that the “N-OPA” algorithm can make more granular assignments. I think the pulverization can be applied in such a way as to make the solution exact, and I’m thinking through that. Currently, the line is pulverized into same-size segments using an error value, and that’s probably very inefficient.
  2. Clean up a lot of the code in the distance calculation / geometry module. A lot of the source of the convolution came from handling what I called “inverted” matches - stops that matched to the same segment, but out of order in their nearest points. Fortunately, it appears the N-OPA algorithm handles these, so there’s no need to hack their resolution on the fly (and miss some more complicated cases of inversion). I’ve also done some restructuring to decouple the “pure” matching from Transitland’s data model and rules about quality and first/last stop matching.
  3. Optimize performance. I’ve noticed a few areas that can be improved (e.g, loading and memoizing an RSP’s stops from SQL?). I’m hoping more progress can be made on the application of N-OPA and performance. There’s a few options, such as keeping a pre-selection heuristic to ensure only “complex” route lines need N-OPA, writing N-OPA in a faster language like C or Go, or adding a heuristic within N-OPA taking advantage of some mathematical properties, or all of the above. It probably seems strange that an heuristic algorithm would be replaced with another, but it’s still better to base the heuristic on an algorithm that’s more exact; or at least can be configured to be exact. I wouldn’t merge this just yet, as performance is still not great.

@drewda
Copy link
Member

drewda commented Feb 14, 2019

Hi @doublestranded! Great to hear that you've been continuing to think about and work on these hard route geometry problems.

I'll have a look through your PR -- and will ping @irees to do so as well -- as time allow.

In terms of practical effects, what would this PR change externally: Would releasing it mean that current RSPs should be recalculated? Would the distances in SSPs need to be recalculated?

Finally, re your option 3: we've actually been starting to move some of the heaviest GTFS import steps to Go -- it's not yet ready for public GitHub, but glad to chat more about that with you some time if you're interested.

@irees
Copy link
Member

irees commented Feb 15, 2019

@doublestranded this is great - I am reading through the approach now. The new data importer currently uses simple linear interpolation to add distances and missing values, but when it is ready we can definitely implement this approach there as well.

@doublestranded
Copy link
Contributor Author

Thank you for taking a look @drewda and @irees! That's a great question about re-calculating both the RSP and SSPs. I say there's no rush to do that, even once this is complete and merged. Distances errors are still very theoretical right now (one slightly broken spec doesn't mean much in my opinion), but I do suspect they are some somewhere with such a large repository of data. Assuming continual imports of feeds when changed, manual imports feels a little unnecessary. I'm doing some local imports and investigation (when time permits for me as well) to test; if I start finding more legitimate errors with the existing algorithm, I'd say reconsider. Another thing to do might be to alert the Valhalla team of an impending merge.

I'm glad you mentioned that @irees - that's exactly what I had in mind. This is a problem in general with the shapes and stops without distances given, and the code should appropriately reflect the layers of abstraction involved.

@drewda As far as imports, thanks for the heads ups on the switch to Go. I probably don't need to be in the loop for now, but I'd be curious to see it when completed. I'll let you know of any progress with improving performance and finding inaccuracies.

@doublestranded
Copy link
Contributor Author

doublestranded commented Jun 13, 2019

@drewda @irees - I’m starting to wrap up this PR. I’ve been testing it on some feeds, and fixing small bugs here and there. So far so good.

For performance, I noticed that there was a bug in the way I was making (unnecessary) stack calls in the N-Opa code, and fixing that greatly improved performance (!). It’s possible there are other improvements to be made, but imports seem to run in reasonable time.

I held off on implementing an “exact” solution. That solution, in my thinking at least, would have involved “pulverizing” each and every segment at all stops’ closest points to the segment, instead of breaking the segments based on a uniform small size. This would have involved an extra sorting step that would have impacted performance. Instead, I opted for a computation that ensures that each stop gets an assignment, and that segment sizes are small enough to account for stops being close together.

I also simplified some of the logic regarding first stops and last stops. I felt that treating these two cases differently than the other stops was a too much of a judgement call, and one that conflicted with the new algorithm’s ability to compute the surrounding stops accurately. This necessitated updating a few of the tests to meet then new expectations.

Finally, I ran into some issues with Ruby versions and gem dependencies - you may already be aware? I initially suspected Ruby 2.3 was causing problems with the builds, so I tried merging in the PR for Ruby 2.5 (it's probably best to go ahead and upgrade). Then some of the convex hull computations broke. From what I could tell, there are some issues transforming coordinates when proj4 is activated in RGeo (locally the computations match the test expectations). Ultimately, I copied out the buffer method from RGeo so we can bypass the transformations (I’ll provide a link in the comments). I think the best solution might turn out to be upgrading RGeo to the latest, along with its companion gems. Because one of those gems is ‘activerecord-postgis-adapter’, an upgrade to Rails 5 looks to be in order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants