fix: FsNeo4jCSVLoader fails if nodes have disjoint keys #408
Summary of Changes
I discovered a bug where `FsNeo4jCSVLoader` fails if two nodes of the same type have the same number of attributes but different attribute names. This happened to us for `DashboardChart`, due to the code in `amundsendatabuilder/databuilder/models/dashboard/dashboard_chart.py`, lines 58 to 77 at commit d91c0c5.
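Two records with the same label and the same number of keys, but different key names, are enough to trigger this kind of failure. The sketch below reproduces it in isolation. It assumes, for illustration only, that one `csv.DictWriter` is cached per (label, number of keys) and that each file's header is taken from the first record written to it. The records, the `LABEL`/`KEY` field names and the `write_by_key_count` helper are hypothetical; this is not the loader's actual code. The one real behavior it relies on is that `csv.DictWriter` raises `ValueError` for a row containing a key missing from `fieldnames` (the default `extrasaction='raise'`).

```python
import csv
import io

# Hypothetical records: same label and same number of keys, but different key names.
records = [
    {"KEY": "person_1", "LABEL": "Person", "job": "engineer"},
    {"KEY": "person_2", "LABEL": "Person", "pet": "cat"},
]

# One writer per (label, number of keys): the naming scheme that causes the bug.
writers = {}
buffers = {}


def write_by_key_count(record):
    file_key = (record["LABEL"], len(record))
    if file_key not in writers:
        buf = io.StringIO()
        # The header is fixed by whichever record opens the file first.
        writer = csv.DictWriter(buf, fieldnames=list(record.keys()))
        writer.writeheader()
        writers[file_key] = writer
        buffers[file_key] = buf
    # csv.DictWriter raises ValueError if the row has a key missing from the header.
    writers[file_key].writerow(record)


error = None
try:
    for rec in records:
        write_by_key_count(rec)
except ValueError as exc:
    error = exc

print(error)  # dict contains fields not in fieldnames: 'pet'
```

Both records map to the same `("Person", 3)` writer, so the second one is rejected even though it is a perfectly valid node.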
The root cause is that `FsNeo4jCSVLoader` names CSV files by the number of keys a node has, so two nodes of the same type whose key counts match but whose key names differ are written to the same file. My new test demonstrates the problem: when the first node is loaded, a CSV is created with a column for `job`. When the loader then tries to write the second node, it fails because the CSV has no column named `pet`.
This PR fixes the problem by making the file key depend on the actual set of record keys. I tried two implementations; the second keeps a mapping of `fieldset -> ID` and assigns increasing IDs. I went with the second to avoid excessively long filenames.
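The `fieldset -> ID` approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the `FieldsetIds` helper, the `LABEL`/`KEY` field names, the `<label>_<id>.csv` naming and the use of in-memory buffers instead of real files are all hypothetical.

```python
import csv
import io
from typing import Dict, FrozenSet, Iterable


class FieldsetIds:
    """Hypothetical helper: maps each distinct set of record keys to an increasing integer ID."""

    def __init__(self) -> None:
        self._ids: Dict[FrozenSet[str], int] = {}

    def get(self, keys: Iterable[str]) -> int:
        fieldset = frozenset(keys)  # key order does not matter, only which keys exist
        if fieldset not in self._ids:
            self._ids[fieldset] = len(self._ids)  # next ID: 0, 1, 2, ...
        return self._ids[fieldset]


records = [
    {"KEY": "person_1", "LABEL": "Person", "job": "engineer"},
    {"KEY": "person_2", "LABEL": "Person", "pet": "cat"},
    {"KEY": "person_3", "LABEL": "Person", "job": "chef"},
]

fieldset_ids = FieldsetIds()
files: Dict[str, io.StringIO] = {}
writers: Dict[str, csv.DictWriter] = {}

for record in records:
    # The file name depends on the label and on which keys are present, not how many.
    file_name = f"{record['LABEL']}_{fieldset_ids.get(record.keys())}.csv"
    if file_name not in writers:
        files[file_name] = io.StringIO()
        writers[file_name] = csv.DictWriter(files[file_name], fieldnames=list(record.keys()))
        writers[file_name].writeheader()
    writers[file_name].writerow(record)

print(sorted(files))  # ['Person_0.csv', 'Person_1.csv']
```

The `job` and `pet` records now land in separate files, while a later record with the same keys as the first reuses `Person_0.csv`. Because the ID comes from a counter rather than from the key names themselves, file names stay short no matter how many columns a node has.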
Tests
I added a unit test that catches the bug. You can check out the "Add failing test" commit and run `make test` to observe it.
Documentation
N/A
CheckList
Make sure you have checked all steps below to ensure a timely review.
make test