N-Opa and optimizations #1271

doublestranded · 2019-02-03T03:57:18Z

Hello Transitland - I’m submitting this PR after some time reflecting on the distance calculation problem for RSPs and shapes/stops without supplied distances in general. In particular, I was curious to see whether the Alevras paper in the comments could really be applied here. I’m not sure how this PR will fit with Transitland’s future plans, but right now it’s still more of a conceptual PR that shouldn’t be merged just yet. I hope it’s useful!

Some of the goals of this PR are:

1. Improve the accuracy of distance calculation, at least in terms of the stop-segment assignment matching approach. The OTP/backtracking algorithm with an additional pre-selection heuristic seems pretty good (and performant!), but I now believe it is not the best tool for the job. One could construct a reasonable case that thwarts OTP; in fact, the specs hide such a case if I’m not mistaken. The new approach “pulverizes” (for lack of a better term) the route line into smaller segments so that the “N-OPA” algorithm can make more granular assignments. I think the pulverization can be applied in such a way as to make the solution exact, and I’m thinking through that. Currently, the line is pulverized into same-size segments using an error value, and that’s probably very inefficient.
2. Clean up a lot of the code in the distance calculation / geometry module. Much of the convolution came from handling what I called “inverted” matches: stops that matched to the same segment, but whose nearest points on it were out of order. Fortunately, it appears the N-OPA algorithm handles these, so there’s no need to hack their resolution on the fly (and miss some more complicated cases of inversion). I’ve also done some restructuring to decouple the “pure” matching from Transitland’s data model and rules about quality and first/last stop matching.
3. Optimize performance. I’ve noticed a few areas that can be improved (~~e.g., loading and memoizing an RSP’s stops from SQL?~~). I’m hoping more progress can be made on the application of N-OPA and performance. There are a few options, such as keeping a pre-selection heuristic to ensure only “complex” route lines need N-OPA, writing N-OPA in a faster language like C or Go, or adding a heuristic within N-OPA that takes advantage of some mathematical properties, or all of the above. It probably seems strange to replace one heuristic algorithm with another, but it’s still better to base the heuristic on an algorithm that’s more exact, or at least can be configured to be exact. I wouldn’t merge this just yet, as performance is still not great.
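To make the “pulverize” idea in the first goal concrete, here is a minimal sketch, not the code in this PR. It assumes planar `[x, y]` coordinates and a hypothetical `pulverize(points, max_len)` helper, where `max_len` plays the role of the error value mentioned above; the actual implementation works on route geometries and may choose segment sizes differently.

```ruby
# Hypothetical helper (not the PR's implementation): split every segment of a
# polyline into equal pieces no longer than max_len. Planar coordinates are
# assumed for simplicity.
def pulverize(points, max_len)
  raise ArgumentError, "max_len must be positive" unless max_len > 0
  return points.dup if points.size < 2

  result = [points.first]
  points.each_cons(2) do |(x1, y1), (x2, y2)|
    length = Math.hypot(x2 - x1, y2 - y1)
    # At least one piece, so zero-length segments still emit their end point.
    pieces = [(length / max_len).ceil, 1].max
    (1..pieces).each do |i|
      if i == pieces
        result << [x2, y2] # keep original vertices exactly
      else
        t = i.to_f / pieces
        result << [x1 + (x2 - x1) * t, y1 + (y2 - y1) * t]
      end
    end
  end
  result
end

line = [[0.0, 0.0], [10.0, 0.0], [10.0, 5.0]]
dense = pulverize(line, 2.5)
```

Every original vertex survives the split, so the densified line has the same shape and length; only the number of candidate segments for stop assignment grows, which is where the performance cost comes from.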


drewda · 2019-02-14T04:37:12Z

Hi @doublestranded! Great to hear that you've been continuing to think about and work on these hard route geometry problems.

I'll have a look through your PR -- and will ping @irees to do so as well -- as time allows.

In terms of practical effects, what would this PR change externally: Would releasing it mean that current RSPs should be recalculated? Would the distances in SSPs need to be recalculated?

Finally, re your option 3: we've actually been starting to move some of the heaviest GTFS import steps to Go -- it's not yet ready for public GitHub, but glad to chat more about that with you some time if you're interested.

irees · 2019-02-15T18:26:56Z

@doublestranded this is great - I am reading through the approach now. The new data importer currently uses simple linear interpolation to add distances and missing values, but when it is ready we can definitely implement this approach there as well.
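For readers unfamiliar with the baseline mentioned here, the following is a rough sketch of filling missing distances by linear interpolation. It is an illustration only: the `interpolate_missing` helper is hypothetical, and weighting by position in the stop sequence is an assumption; the new importer is not public and may interpolate differently.

```ruby
# Hypothetical helper: fill nil entries by linear interpolation between the
# nearest known neighbours, weighting by position in the sequence (an
# assumption made for this sketch). Leading and trailing nils are left as-is.
def interpolate_missing(values)
  filled = values.dup
  known = filled.each_index.select { |i| !filled[i].nil? }
  known.each_cons(2) do |lo, hi|
    span = hi - lo
    ((lo + 1)...hi).each do |i|
      filled[i] = filled[lo] + (filled[hi] - filled[lo]) * (i - lo).to_f / span
    end
  end
  filled
end

filled = interpolate_missing([0.0, nil, nil, 300.0, nil, 500.0])
```

This only needs the known distances on either side of a gap, which is why it is cheap, and also why it cannot notice a stop that sits on the wrong part of a looping or self-crossing shape.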

doublestranded · 2019-02-19T03:16:08Z

Thank you for taking a look @drewda and @irees! That's a great question about re-calculating both the RSPs and SSPs. I'd say there's no rush to do that, even once this is complete and merged. Distance errors are still very theoretical right now (one slightly broken spec doesn't mean much in my opinion), but I do suspect there are some somewhere in such a large repository of data. Assuming continual imports of feeds when changed, manual imports feel a little unnecessary. I'm doing some local imports and investigation (when time permits for me as well) to test; if I start finding more legitimate errors with the existing algorithm, I'd say reconsider. Another thing to do might be to alert the Valhalla team of an impending merge.

I'm glad you mentioned that @irees - that's exactly what I had in mind. This is a problem in general with the shapes and stops without distances given, and the code should appropriately reflect the layers of abstraction involved.

@drewda As far as imports go, thanks for the heads-up on the switch to Go. I probably don't need to be in the loop for now, but I'd be curious to see it when completed. I'll let you know of any progress with improving performance and finding inaccuracies.

doublestranded · 2019-06-13T03:32:20Z

@drewda @irees - I’m starting to wrap up this PR. I’ve been testing it on some feeds, and fixing small bugs here and there. So far so good.

For performance, I noticed that there was a bug in the way I was making (unnecessary) stack calls in the N-Opa code, and fixing that greatly improved performance (!). It’s possible there are other improvements to be made, but imports seem to run in reasonable time.

I held off on implementing an “exact” solution. That solution, in my thinking at least, would have involved “pulverizing” each and every segment at all stops’ closest points to the segment, instead of breaking the segments based on a uniform small size. This would have involved an extra sorting step that would have impacted performance. Instead, I opted for a computation that ensures that each stop gets an assignment, and that segment sizes are small enough to account for stops being close together.
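As a rough illustration of the “exact” variant described above (the one that was not implemented), the sketch below splits a single segment at each stop's closest point. The helper names `closest_fraction` and `split_at_stops` are hypothetical, and planar `[x, y]` coordinates are assumed; the sort over the cut positions is the extra step referred to.

```ruby
# Hypothetical helper: fraction t in [0, 1] along segment a->b of the point
# closest to p, using planar coordinates.
def closest_fraction(a, b, p)
  dx = b[0] - a[0]
  dy = b[1] - a[1]
  len_sq = (dx * dx + dy * dy).to_f
  return 0.0 if len_sq.zero?

  t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len_sq
  t.clamp(0.0, 1.0)
end

# Hypothetical helper: split segment a->b at the closest point of every stop.
# Sorting the cut fractions is the extra per-segment cost.
def split_at_stops(a, b, stops)
  fractions = stops.map { |s| closest_fraction(a, b, s) }
  cuts = ([0.0, 1.0] + fractions).uniq.sort
  cuts.map { |t| [a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t] }
end

stops = [[7.0, 1.0], [2.0, -1.0], [12.0, 0.0]]
pieces = split_at_stops([0.0, 0.0], [10.0, 0.0], stops)
```

In this example the third stop lies beyond the end of the segment, so its closest point is the end point itself and adds no new cut. Doing this for every segment against every stop is what makes the exact approach more expensive than uniform splitting.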

I also simplified some of the logic regarding first stops and last stops. I felt that treating these two cases differently from the other stops was too much of a judgement call, and one that conflicted with the new algorithm’s ability to compute the surrounding stops accurately. This necessitated updating a few of the tests to meet the new expectations.

Finally, I ran into some issues with Ruby versions and gem dependencies - you may already be aware? I initially suspected Ruby 2.3 was causing problems with the builds, so I tried merging in the PR for Ruby 2.5 (it's probably best to go ahead and upgrade). Then some of the convex hull computations broke. From what I could tell, there are some issues transforming coordinates when proj4 is activated in RGeo (locally the computations match the test expectations). Ultimately, I copied out the buffer method from RGeo so we can bypass the transformations (I’ll provide a link in the comments). I think the best solution might turn out to be upgrading RGeo to the latest, along with its companion gems. Because one of those gems is ‘activerecord-postgis-adapter’, an upgrade to Rails 5 looks to be in order.

drewda and others added 30 commits September 21, 2018 10:22

upgrade to Ruby 2.5.1

afaf8e7

Merge branch 'master' into ruby-2.5.1

f6a35dc

Ruby 2.5.3 is now the latest

e6d7adb

https://www.ruby-lang.org/en/news/2018/10/18/ruby-2-5-3-released/

fixing query

b1183ef

clarification to algo description

f5297de

WIP ripping out inverted segment logic temporarily

814ad93

denestify some conditionals

1fa8eb1

pulverize line segments

521f23d

WIP major refactor

b3a8a8e

more fixes, including to straight line dist

25b1ecb

prepare_stop_distances method

62f1761

moving methods

94d059d

fix to fallback

cafbf84

disable test because it might be overzealous

67a0b36

match_array_within

b998f86

modifying hdpt test bc pulverize provides more accurate result

17a120e

another spec adjustment from pulverize; adjusting pulverize e term

c5b4878

checks all stops for outlier

2d3b126

WIP initial replacement of EnhancedOTP with OpapWc; minor refactor; s…

14044a0

…till need skip stops

fixes

ca4be32

adjusting test to remove judgement call

d68a972

syntax fix

a146e8a

implement complex condition and remove otp algorithm code

39c0b59

disable test - need to match to the same segment

59b0c75

going back to larger e value in pulverize

35e6312

nil guard; remove complex check

e9633c3

rename class

e7ff1a6

refactor tl entities out of DistanceCalculation

3d70307

simplify complicated first/last assignment logic

4692bfa

Merge branch 'optimizations' into opap-wc

ac231c2

doublestranded added 4 commits December 29, 2018 22:45

adjust spec

1d2b87b

avoid cartesian point, rename n_opa

3a11a32

adjust spec

e0bf095

leniency for spec

b9cef49

doublestranded added the in progress label Feb 3, 2019

doublestranded added 2 commits February 5, 2019 21:28

Gemfile fix

a02cc96

drop memoize, be selective

a120ff2

doublestranded added 12 commits March 9, 2019 14:28

bug fix and optional first pass heuristic

31a5ece

fixing wrong initialize in update_computed_attributes

eea24a6

handling more edge cases of outlier stops

82396cb

n_opa version bump

ec826b7

stop locators only needed in first pass

89557e6

Merge branch 'ruby-2.5.1' into optimizations

d74358a

Merge branch 'master' into optimizations

747136f

better choice of pulverize error, simplify first/last stops

9fd8b39

various fixes related to ruby 2.5

3e56b76

gem will not be published

1d8d45e

adjusting tests to match results of new strategy and rounding errors

5b7acb0

temp buffer method for single operator stop convex hull

cdd5ad4

doublestranded requested review from irees and drewda June 19, 2019 00:38

doublestranded removed the in progress label Oct 30, 2019
