Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency issues in GraphUpdaters #2545

Closed
abyrd opened this issue Apr 2, 2018 · 5 comments
Closed

Concurrency issues in GraphUpdaters #2545

abyrd opened this issue Apr 2, 2018 · 5 comments
Assignees
Labels
Stale This issue is stale, no activity for 90 days. Remove stale label or comment within 30 days. X OTP1 ~ Not in use any more ~ Fix or backport to the 1.x version of OTP
Milestone

Comments

@abyrd
Copy link
Member

abyrd commented Apr 2, 2018

On the opentripplanner-users mailing list, Combitrip reports a ConcurrentModificationException when starting up a GBFS bike rental updater:

14:27:31.296 ERROR (GraphUpdaterManager.java:153) Error while setting up updater org.opentripplanner.updater.bike_rental.BikeRentalUpdater:
java.util.ConcurrentModificationException: null
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) ~[na:1.8.0_131]
	at java.util.HashMap$ValueIterator.next(HashMap.java:1466) ~[na:1.8.0_131]
	at org.opentripplanner.routing.graph.Graph.getEdges(Graph.java:337) ~[otp-1.2.0-shaded.jar:1.1]
	at org.opentripplanner.graph_builder.linking.SimpleStreetSplitter.<init>(SimpleStreetSplitter.java:106) ~[otp-1.2.0-shaded.jar:1.1]
	at org.opentripplanner.graph_builder.linking.SimpleStreetSplitter.<init>(SimpleStreetSplitter.java:123) ~[otp-1.2.0-shaded.jar:1.1]
	at org.opentripplanner.updater.bike_rental.BikeRentalUpdater.setup(BikeRentalUpdater.java:137) ~[otp-1.2.0-shaded.jar:1.1]
	at org.opentripplanner.updater.GraphUpdaterManager$1.run(GraphUpdaterManager.java:146) ~[otp-1.2.0-shaded.jar:1.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]

This led me to do a broad review of the GraphUpdater code and I have found several potential concurrency problems. Before going into detail on those, I will diagnose the particular exception that was reported. Based on the stack trace, we are modifying the map of vertices in the Graph while we are iterating over it.

The iteration over the vertices is happening because the BikeRentalUpdater has its own SimpleStreetSplitter, which creates its own HashGridSpatialIndex of every edge in the graph. The Graph provides a list of all its edges by concatenating the edge lists of all its vertices.

The modification of this vertex collection cannot be pinpointed from the stack trace, but since this is happening during graph updater startup, we might suspect the creation of bike rental vertices or other dynamically inserted vertices.

At first it was hard to imagine how these could be happening at the same time, because I had assumed that nothing multithreaded was going on here. But after a closer look I see that while the setup() and run() methods of each GraphUpdater are run sequentially, if there are multiple GraphUpdaters their setup() methods are being run simultaneously in different threads. Concurrent execution of the setup() methods seems unnecessary - we should probably ensure that all setup code runs single threaded and only when all setup has completed will the fetching/updating code be run concurrently in separate threads.

@abyrd abyrd self-assigned this Apr 2, 2018
@abyrd
Copy link
Member Author

abyrd commented Apr 2, 2018

Here are other observations I made while reviewing the GraphUpdater code:

  • Each instance of BikeRentalUpdater or BikeParkUpdater has its own SimpleStreetSplitter instance. Each SimpleStreetSplitter instance has its own HashGridSpatialIndex. So if you create two BikeRentalUpdaters and one BikeParkUpdater you get three identical indexes on every edge in the graph. If more than one piece of code needs this index, there should be a single shared instance (e.g. a transient lazy-initialized field on the Graph itself). Furthermore, the Javadoc on SimpleStreetSplitter says there should only be one instance of this class active at a time for a given Graph. So if we just make a single shared lazy-initialized SimpleStreetSplitter we could save a lot of memory and some startup time and eliminate any undefined behavior due to multiple simultaneously active street splitters.
  • BikeRentalUpdater uses BikeRentalStationService, which seems to have concurrent collection iteration and modification problems. An HTTP API seems to allow fetching a list of stations while that list is being updated.
  • The GraphUpdater interface Javadoc does not make it clear which of its methods will be run sequentially and which will be run in a separate thread. One might assume that the setup methods are called sequentially as the OTP server starts up. But in reality the setup() and run() methods are both run in a separate thread. This creates a lot more unpredictable situations, because one setup method can be running while another updater is already trying to change the graph, and setup methods could still be running when routing requests come in. We should change the system so all setup() methods are run in sequence in the calling thread. Javadoc on GraphUpdater should also be updated to make this clear.
  • All GraphUpdaters' run() methods are run only once and occupy an entire thread in the thread pool. Even the polling updaters never exit their run() method. Instead, the polling graph updaters sleep() most of the time in a loop in their run() methods. This only works with larger numbers of updaters because we're assigning them to a thread pool that's allowed to grow to any size. While not ideal, this system works but because it's a bit convoluted it should be better documented in the code. Calling thread.sleep() can be considered bad form in a production system. Ideally we should be telling a scheduler to execute a short-duration run() method a regular intervals. However we do want to keep the current behavior where the periodic data-fetching runnables are concurrent, but produce GraphWriterRunnables that are executed sequentially by another executor.
  • The Javadoc on our single-threaded ScheduledExecutorService for graph writing says it executes things periodically. However we don't seem to use the periodic scheduling functionality of the single threaded executor service at all. All the runnables are scheduled for immediate execution. This is the intended behavior, but the Javadoc (and perhaps the type of this field) should be updated accordingly.
  • The scheduler and updaterPool fields are initialized twice, once at the field definition and once in the constructor.
  • According to Javadoc on its interface, GbfsBikeRentalDataSource.update() should return true if any update happened. The logical operator between the multiple constituent update operations should be 'or' rather than 'and'.
  • BikeRentalUpdater inserts edges into the graph while searches are may be taking place on that same graph. It creates RentABikeOnEdge and RentABikeOffEdge instances, which causes addOutgoing and addIncoming to be called on the from and to vertices. However, there is synchronization on these methods and they use a copy-on-write strategy which means they are tolerant of being changed while a search is happening. There should be comments explaining this in the updater code though, to avoid a future maintainer digging through the code in search of a concurrency problem.

@abyrd
Copy link
Member Author

abyrd commented Apr 2, 2018

@laurentg could you give some feedback or comments on my observations above about GraphUpdaters?

@evansiroky
Copy link
Contributor

@abyrd would this issue be potentially resolved with #2762?

@t2gran t2gran added X OTP1 ~ Not in use any more ~ Fix or backport to the 1.x version of OTP OTP2 labels Jun 11, 2019
@abyrd
Copy link
Member Author

abyrd commented Oct 13, 2019

Taking another look at this one as we've been working with GraphUpdaters. Setup methods are now run sequentially in a single thread since commit 544e1b1, and Javadoc makes it more clear what runs sequentially (the setup methods) and what runs in parallel (the run methods, after setup is complete). So the original issue is resolved, but I would like to keep this issue open until all the other points identified in the code audit (in #2545 (comment) above) are checked out and resolved.

@abyrd abyrd added this to the 2.1 milestone Oct 13, 2020
@abyrd abyrd removed the OTP2 label Oct 13, 2020
@t2gran t2gran modified the milestones: 2.1, 2.2 Mar 14, 2022
@github-actions
Copy link

github-actions bot commented Jul 9, 2022

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days

@github-actions github-actions bot added the Stale This issue is stale, no activity for 90 days. Remove stale label or comment within 30 days. label Jul 9, 2022
@github-actions github-actions bot closed this as completed Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale This issue is stale, no activity for 90 days. Remove stale label or comment within 30 days. X OTP1 ~ Not in use any more ~ Fix or backport to the 1.x version of OTP
Projects
None yet
Development

No branches or pull requests

4 participants