-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Splitter refactor #2762
Splitter refactor #2762
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @evansiroky, this seems like an important fix. I'm not completely sure I understand all the context, so I've added a few comments at the points where questions arise. If you can answer these questions I'll be able to give a more complete review.
Also, as I understand it, this is not a recent problem in OTP, so is present in both 1.x and 2.x. If so I will port the eventual fix over to 2.x upon merge.
|
||
//We build a spatial index if it isn't provided | ||
if (hashGridSpatialIndex == null) { | ||
// build a nice private spatial index, since we're adding and removing edges | ||
idx = new HashGridSpatialIndex<>(); | ||
for (StreetEdge se : Iterables.filter(graph.getEdges(), StreetEdge.class)) { | ||
// Note: Although it's up to the caller of idx.insert() to make it thread-safe, we don't need to | ||
// worry about it here as it's inside the constructor of SimpleStreetSplitter | ||
// and it's guaranteed that there is only one thread doing this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify how it is guaranteed that only one thread can call this constructor at once. It does look like it's only called in single-threaded contexts. The rules about graph updaters running sequentially in a single thread provide some guarantee there. But someone could easily call this constructor in a multi-threaded context. Maybe update the "note" in the Javadoc on this constructor to specifically say it must be called in a single thread only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I didn't actually author this comment, so I have to agree with you here. I copied this verbatim from here: https://github.com/opentripplanner/OpenTripPlanner/pull/2596/files#diff-ff5b71da473283e2fbb823607004cf6c.
After looking this over rather thoroughly, it appears that even though there are 2 "notes" that say only 1 SimpleStreetSplitter should be in use per graph in the Javadoc of this class (1 in each constructor), this SimpleStreetSplitter class is being recreated numerous times on the same graph in various places, thus creating the potential for multiple inconsistent street indexes in the splitter. Given this issue, I am going to significantly revise this PR.
But even with these revisions, I still foresee that it would be easy for someone to perform thread-unsafe write modifications to a HashGridSpatialIndex instance despite there being notes in comments in various places about how you're not supposed to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easy for someone to do it, but if they read the Javadoc on the classes and methods they are using, they should be warned not to do it. There are a lot of non-threadsafe data structures, and it's generally expected that someone will consult the documentation before mutating a data structure in multi-threaded code. That said, it doesn't hurt to add checks where feasible that contracts are being respected.
this SimpleStreetSplitter class is being recreated numerous times on the same graph in various places, thus creating the potential for multiple inconsistent street indexes in the splitter.
I'm glad you spotted this fact, this clarifies somewhat what's going on here. It seems like this problematic code has been around for a long time in the bike parking updaters. Apparently the warning about having only one instance was not visible enough, and/or was not taken seriously because there was no explanation of why it was important to only have one. The warning should probably be moved from the SimpleStreetSplitter constructor to the class-level Javadoc, made more visible, and/or expanded with some explanation. And possibly duplicated on the constructor for good measure.
I think that class definitely needs some additional documentation also explaining what "destructive" vs. "non-destructive" splitting is, since it's not just about destroying edges, but also about whether they are indexed and therefore visible to subsequent searches and splitting operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see that destructive/non-destructive split documentation has been added in cf74d81. Will review there.
@@ -96,30 +101,26 @@ | |||
|
|||
private static GeometryFactory geometryFactory = GeometryUtils.getGeometryFactory(); | |||
|
|||
//If true edges are split and new edges are created (used when linking transit stops etc. during graph building) | |||
//If false new temporary edges are created and no edges are deleted (Used when searching for origin/destination) | |||
private final boolean destructiveSplitting; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like the core change, and quite a good one. The splitter itself is not configured to do one kind of splitting - each call must specify whether it's destructive or not. Can you confirm for any reviewers that this is the core idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the newly refactored code, the major purpose of this PR is to attempt to have only one StreetSplitter instantiated per graph. Since there's only one splitter per graph, the StreetSplitter must be able to accept an argument to indicate the type of street splitting that should occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify: the main purpose of the PR is still the same right, to resolve the bug #2758? Do you mean you've adopted a different approach to fixing that bug, which involves ensuring there is a single spatial index of discoverable edges (including split edges)?
src/main/java/org/opentripplanner/graph_builder/linking/SimpleStreetSplitter.java
Show resolved
Hide resolved
src/main/java/org/opentripplanner/graph_builder/linking/SimpleStreetSplitter.java
Show resolved
Hide resolved
src/main/java/org/opentripplanner/graph_builder/linking/SimpleStreetSplitter.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opentripplanner/graph_builder/linking/SimpleStreetSplitter.java
Show resolved
Hide resolved
- rename SimpleStreetSplitter to StreetSplitter as it is the only StreetSplitter in use - add some checks to make sure duplicate indexing doesn't occur on the same graph - refactor some areas where duplicate indexing was occurring
I have gone ahead and refactored this even more. The problem that was causing #2758 was that numerous StreetSplitter instances were being created on the same graph. The Javadoc for both constructors of the SimpleStreetSplitter class (now renamed to StreetSplitter) clearly stated:
In order to try to prevent regressions, I have added a check to make sure only one StreetSplitter is instantiated per graph. Once I implemented that check, numerous test cases failed. So I then thoroughly reviewed and fixed the offending code. I reviewed code that interacts with the StreetSplitter class and decided to do some cleanup of removing code that is no longer referenced by any other code and also removing code paths that will never get executed. |
src/main/java/org/opentripplanner/graph_builder/module/PruneFloatingIslands.java
Show resolved
Hide resolved
src/main/java/org/opentripplanner/graph_builder/module/PruneFloatingIslands.java
Show resolved
Hide resolved
@@ -682,6 +682,11 @@ public static Graph load(File file) throws IOException { | |||
* TODO: do we really need a factory for different street vertex indexes? | |||
*/ | |||
public void index (StreetVertexIndexFactory indexFactory) { | |||
if (streetIndex != null) { | |||
throw new UnsupportedOperationException( | |||
"A streetIndex has already been defined. Can't you just reuse the existing one?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a snarky error message here, as I don't think there would be a legitimate reason to re-index a graph, but maybe I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this one of the checks to ensure that constraints are respected (that there's only one spatial index created per graph)? If so it would be good to explain to the person seeing the message why they should just reuse the existing one, and why it's unwise to create another one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a check that was added to prevent the creation of more StreetIndexes in downstream code. However, I honestly don't know if there are benefits to reindexing a graph, so I could use some guidance here.
src/main/java/org/opentripplanner/routing/impl/StreetVertexIndexServiceImpl.java
Show resolved
Hide resolved
src/main/java/org/opentripplanner/graph_builder/module/NearbyStopFinder.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the responses @evansiroky, I am reviewing this again, with a better understanding that it's fixing the problem reported in #2758 by cherry-picking changes originally contributed in #2596. The first time around I had not fully grasped that this was a batch of changes from another author.
I added some free-form comments before starting a proper review, so sorry the comments are a bit scattered around. Unfortunately this is not a complete review yet but I thought I should pass on the comments and questions I have so far, because I need to wrap up for the day.
My main observation is that it's quite hard to review this PR due to the very limited information available, both here and in the PR that was the source of some of the changes. I've already spent over four hours today just trying to comprehend and articulate what I'm looking at. I think this is another case that illustrates the importance of explanations provided in pull requests by the original authors of the code.
Comments, starting with PR #2596 which was the source of the cherry-pick: These fixes were originally submitted last summer, seemingly in the context of work for Coord, which provides a bikeshare information API. The description of that PR does not really provide any detail. It just says it "adds new features and fixes some bugs" with a cursory list of 5 bullet points. In retrospect this is why that PR remains unreviewed and unmerged: there's just not enough information to evaluate it. One of the bullet points on the original source PR (apparently relevant to the PR at hand) is "Make BikeParkUpdater and BikeRentalUpdater use the same street splitter as the one used by the Graph as opposed to creating their own splitter". It does not state clearly that this is a bugfix, what bug it fixes, nor the mechanism by which the bug is resolved, anywhere in the code, PR comments, or commit messages. I am guessing from your comments that you have figured out issue #2758 is caused by the existence of multiple street splitters, and that cherry-picking this change resolves the issue by using a single street splitter, but still, nowhere is the mechanism of action clearly stated and elaborated upon.
The cherry-pick commit has the message "Cherry-pick street splitting improvements from Coord branch" but I'm finding myself reverse-engineering those third-party changes to determine whether these are verifiably "improvements" from the perspective of the issue being confronted, and from the perspective of all other use cases of OTP. This is not a small task, because this single cherry-picked commit changes 51 files with 1,476 additions and 401 deletions, with a cursory commit message of "Coord changes to OpenTripPlanner". It's hard to tell what else might have changed and what the repercussions will be.
However, looking more closely however, I see that your commit 4ae9647 does not appear to be a full cherry-pick in the git sense of the referenced commit cadbeb3. In fact it looks much smaller. Can you confirm whether this is a carefully selected subset of the changes in the referenced commit? Are you using the expression "cherry-pick" in a looser/more general sense or is this commit the result of a git cherry-pick operation?
Turning to this PR itself (rather than that source PR): the description says "refactor the splitting of street edges which was previously not correctly adding the split edges back into the StreetIndex". I think this statement needs to be greatly expanded upon, both in the PR and within OTP itself. Of course the lack of detail on these subjects is a pre-existing problem, but since people are already thinking through the source of the problem and how these components are intended to behave, that information should be recorded as much as possible. Taking that step now could easily save multiple days of work in the future.
I see that some comments have already been added, but these are my observations while reviewing the code changes: Within OTP (via Javadoc on methods etc.) we should clarify exactly when split edges are indexed and when they are not, and why they are sometimes indexed and sometimes not. And we need to explain in a prominent location why it's important that there be only one splitter. My understanding is that it's because the splitter holds the spatial index of edges that are candidates for linking and splitting, and if you have multiple splitters, you end up with multiple such indexes.
The use of synchronized blocks to increase the thread safety of spatial index writes has raised some additional questions for me - I added a few comments inline but will have to look into that in more detail tomorrow.
for (StreetEdge se : Iterables.filter(graph.getEdges(), StreetEdge.class)) { | ||
idx.insert(se.getGeometry(), se); | ||
// place a lock while doing this to ensure thread-safe writes to the index | ||
synchronized (this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be more clear to synchronize on the index instance rather than the StreetSplitter that contains it. This synchronization does resolve the concern about multiple updaters possible writing to the index at once. I'm not sure whether it's safe to read from the index while writing to it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, this is not locking individual writes to the index, it's locking the entire process of creating the index. Is there actually a risk of multiple threads creating indexes at once on a single StreetSplitter? This is also inside a constructor. How would multiple threads be running a constructor on the same instance?
//If true edges are split and new edges are created (used when linking transit stops etc. during graph building) | ||
//If false new temporary edges are created and no edges are deleted (Used when searching for origin/destination) | ||
private final boolean destructiveSplitting; | ||
// used to keep track of splitters associated with graphs to make sure that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems incomplete. I'm a little hesitant about maintaining static state to enforce a simple usage constraint but I guess it gets the job done. There might however be some possible race conditions where multiple threads create StreetSplitters on the same graph at the same time (though perhaps in practice that can't happen with graph updater runnables, if their initialization is guaranteed to be done sequentially).
LineString geometry = e.getGeometry(); | ||
if (geometry == null) { | ||
continue; | ||
synchronized (this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this postSetup method intended to be called from multiple threads? The presence of a synchronized block implies that it is. It seems to index the entire graph though, which I wouldn't expect to be done more than once.
@@ -25,6 +22,8 @@ | |||
import org.slf4j.Logger; | |||
import org.slf4j.LoggerFactory; | |||
|
|||
import static org.opentripplanner.graph_builder.linking.StreetSplitter.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally we should avoid wildcard imports.
} | ||
|
||
/** | ||
* Tests whether the proper edge is linked in a request where the original StreetEdge has been |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please expand Javadoc to explain what is meant by the "proper edge". I'm thinking the idea is that once you split the edge to insert a bike rental station, the resulting sub-edges should be seen rather than the pre-split entire edge.
I will update the original PR description to try to elaborate more on the changes being proposed.
Basically, yes. In my work coding up #2680, I copy-pasted a lot of code from that PR specifically related to the bikeshare regions which allows the creation of floating bikeshare rentals that can be dropped off anywhere inside the region. Some time later, once #2758 was reported, I remembered that I also saw some changes to the street splitting in #2596 and upon further examination, it looked like that subset of changes would lead to the resolution of #2758. This PR is a culmination of that.
Yes to both. I did use a
I will update the original PR description to try to elaborate more on the changes being proposed.
That's correct, multiple splitters were being created and thus the newly created edges that were split were not being added back into a "master" index. I was thinking about this a little further and was wondering if it would make sense to have a splitter and index be contained within the
I honestly don't have a whole lot of experience/knowledge about making code threadsafe, so I might have been too liberal in my application of using synchronized code blocks, so comments are appreciated. |
This might be its own issue. Right now the graph is indexed when starting up. Should this happen during the graph build and then be serialized? I think this would help a lot with startup time. |
I just added a few changes to this PR that combines a few unnecessary files together and refactors the Graph.index method to more tightly manage the creation of StreetVertexIndexServices. I wrote more details about these changes in an edited version of this PR description. |
This was a trade-off between storage space, load time, and overall startup time. When we used the Java serialization serialized data could be really big and take a long time to load - it was much faster to recreate it on load. In addition, since the index is derived data, always reconstructing it from the source data could be considered good form. So taking all those reasons together it was clearly better to reconstruct the index at the time. The conclusion may be different now. |
The street splitter has been refactored in OTP2 where the original edge is restored when the rental vehicle is no longer there. Also, there hasn't been any development on the 1.5 branch for a few years. Therefore I'm closing this issue. If that was a mistake, please re-open. |
To be completed by pull request submitter:
closes #45
).To be completed by @opentripplanner/plc:
This PR seeks to do the following:
Graph.index
method so that it manages the creation of StreetVertexIndexService instances.Fix bug #2758
#2758 is a bug wherein edges split by updaters such as bike or car rental updaters were not being used as starting or ending edges in trip plans. The issue is primarily the result of numerous StreetSplitters being created which each contain their own StreetIndex. The updaters would keep their version of the StreetIndex updated, but the StreetIndex used in routing requests was not updated.
Refactor StreetSplitter class
The solution being proposed in this changeset is to add in a mechanism to obtain a single source of truth StreetSplitter. There are also additional checks added in that throw errors if this process is attempted to be circumvented. An initial attempt to resolve this was included as a subset of #2596. This PR brings in that subset of changes and expands upon that in an attempt to prevent this issue from happening in the future. A test was also added to demonstrate the bug and prevent future regressions.
Refactor and clean up stuff that's related to the StreetSplitter
This includes the following:
TransitToStreetNetworkModule.java
as it was never used in any codepathsNearbyStopFinder.java
,StreetLinkerModule.java
andTransitToTaggedStopsModule.java
to ensure the use of only one StreetSplitter is used per graphStreetVertexIndexServiceImpl.java
to contain the single source of truth StreetSplitter and also remove some unused codepathsBikeParkUpdater.java
andBikeRentalUpdater.java
to reuse an existing StreetSplitterInitialStopsTest.java
,LinkStopToPlatformTest.java
,FakeGraph.java
,LinkingTest.java
,TestIntermediatePlaces.java
,TestOpenStreetMapGraphBuilder.java
,StateEditorTest.java
andGraphSerializationTest.java
) that either created numerous StreetSplitters on the same graph or re-indexed the graph a bunch. I'm not 100% sure if a re-index of the graph is actually needed though, but it was recreating new StreetSplitters so I removed it. All tests still pass so ¯_(ツ)_/¯ .Refactor the
Graph.index
methodA work-in-progress commit of this PR resulted in a bug being thrown for times when a graph was being built over wire. See ibi-group#11. In order to resolve that issue a further refactor of the
Graph.index
method was done in order to manage the creation of StreetVertexIndexService instances within this method. This method was the only place where StreetVertexIndexService instances were created and a lot of code was being duplicated with TODOs in a number of places asking why this is being done. So, I ended up doing a deep dive and looked at the whole build process and each of the places where theGraph.index
method was being called. I discovered that there were 3 use cases for rebuilding the index.Use case 1: Building a new graph
Firstly, during graph building, the StreetVertexIndexService is required to be used, but not all modules need the data produced from the indexes in order to work properly. See below for a table detailing how the data is used within each graph builder module.
Use case 2: indexing data after loading a pre-built graph
The second use case involves loading a graph. In this case, the graph has already been built, so a fresh streetIndex is created and will be initialized with all of the data that is needed.
Use case 3: re-indexing data and performing routing queries in the same program immediately after building a graph
The third problematic use case involves building a graph and then immediately using that graph after it is built. In this case, some vertices and edges created during the StreetLinkerModule and DirectTransferGenerator may not be in every index made available via running of the Graph.index method. Therefore, in this case, it does make sense to reindex the graph (including the streetIndex) so all edges are indexed properly. The risk for using 2 StreetSplitters might still exist, however, in practice, most future StreetSplits would occur in updaters such as the BikeRentalUpdater which would not use the StreetVertexIndexService created during graph build.
Refactor the creation of StreetVertexIndexService so it doesn't have an interface and factory
Another pain point as noted in several existing code comments was the use of a Factory and interface for the StreetVertexIndexService. I went ahead and removed all the factory classes and refactored the StreetVertexIndexService class to have all the implementing code that was needed.