-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: Multi-indexing fails in a very specific case #249
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the conciseness of the unit test, but is it possible to make it clearer why we expect the state we're asserting on? It's difficult to know what the test is about without a bunch of external context.
I also added an extra assertion, which I hope makes things clearer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Fixes: #248, in a wild turn of events, my projected 3 part PR series has grown to 4. I found this bug while working on #245. The fix is inversely proportional to the amount of time it took to diagnose and recreate the issue in a unit test..
I had to mock the method which gives the available multi-index for an edge in the graph because it would not give me the same behaviour as I was seeing with a larger dataset. The problem happens when this method outputs an available multi-index that is sandwiched by multi-indices already being used on the graph. In this small example it would always give me the largest index being used +1, which did not recreate the error.
Stripped from all the context, the problem was setting a dataframe with an output of a groupby (of the same dataframe) which has inherited indexing from the groupby operation. This did not match the indexing of the original dataframe, so when trying to set it back in, it resulted in loss of data.