Add tests for 1PC COPY on append and hash-distributed tables #2244
Conversation
Force-pushed from 3121431 to 6c01440.
Codecov Report
@@ Coverage Diff @@
##           master    #2244    +/- ##
=========================================
- Coverage   93.78%   93.68%   -0.1%
=========================================
  Files         110      106      -4
  Lines       27576    26638    -938
=========================================
- Hits        25861    24956    -905
+ Misses       1715     1682     -33
I think one general open question for adding the failure testing is the following:
Do we need to add tests for all possible scenarios for all tests (e.g., #2212) including
* Replication factor 1 vs 2
* Parallel vs Sequential mode (if applicable)
* 1PC vs 2PC
* Cancel tests for each step as we do for failures
If we add all the tests, we'd end up with huge duplication, since we'd be exercising the same code paths over and over again. On the other hand, we'd only be adding these tests once, and with all of them in place we'd be able to stop worrying about almost all network failure scenarios.
What would be your thoughts on this? I'm also curious what @marcocitus would think about that.
SELECT * FROM citus.dump_network_traffic();
 conn |   source    |      message
------+-------------+-------------------
    0 | coordinator | [initial message]
I can see that this network dump is super useful, and having it during development or when adding new tests is very helpful.
But Travis sometimes fails due to the placement of + 0 | coordinator | ['Terminate()'].
A random idea for not getting into this state in general: shall we (or could we) filter out only the packets that we're going to test from dump_network_traffic()?
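A minimal sketch of that filtering idea, assuming only the (conn, source, message) columns shown in the dump above (the LIKE pattern is a hypothetical example, not something the proxy provides):

```sql
-- Hypothetical: keep only the packets the test actually asserts on,
-- so unrelated late traffic (e.g. a stray Terminate()) can't change the output.
SELECT conn, source, message
FROM citus.dump_network_traffic()
WHERE message NOT LIKE '%Terminate()%'
ORDER BY conn;
```

This would make the expected output stable even if connections are torn down at slightly different times across runs.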
"But Travis sometimes fails due to the placement of + 0 | coordinator | ['Terminate()']."
This is a bug that should be fixed. I guess it's a timing issue: mitmdump should ignore any packets from connections which existed before the last time the buffer was cleared. I can fix it tomorrow!
I've removed the packet trace for now.
CONTEXT: while executing command on localhost:57640
ERROR: failure on connection marked as essential: localhost:57640
SELECT * FROM pg_dist_shard s, pg_dist_shard_placement p
WHERE (s.shardid = p.shardid) AND s.logicalrelid = 'copy_test'::regclass;
Let's add ORDER BY to all queries selecting from pg_dist* to be on the safe side.
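A sketch of the suggestion applied to the placement query above (column names taken from the standard Citus pg_dist_shard / pg_dist_shard_placement catalogs):

```sql
-- Sort by placementid so the expected test output is deterministic
SELECT s.shardid, p.placementid, p.shardstate, p.nodename, p.nodeport
FROM pg_dist_shard s, pg_dist_shard_placement p
WHERE s.shardid = p.shardid
  AND s.logicalrelid = 'copy_test'::regclass
ORDER BY p.placementid;
```

Without the ORDER BY, the row order depends on scan order and can differ between runs, which makes regression test output flaky.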
Good point, pushed a new commit where we always sort by placementid.
Hey @marcocitus, any ideas on this? We will start to write automated tests next week, so if you have some comments, please let us know.
At some point, adding a lot more failure tests using the current framework will have diminishing returns while there are other areas that are not yet covered, in particular network timeouts and network failures behind a pgbouncer (e.g. BEGIN results in a …).
More on topic: the way 1PC vs. 2PC works in COPY is not different than in other parts of the code. It might make sense to dedicate a test file specifically to commit/abort failures in 1PC and 2PC, while not doing commit-time fault-injection in other tests. Some code paths with different failure handling logic:
Ideally, we would test all of them with replication vs. no replication, and test cancellation. I don't think there are other dimensions we need to consider.
Force-pushed from 6c01440 to 3e309e3.
@lithp, could you rebase this onto master and make it ready to merge?
Are these tests checked in #2262?
Just noting that other tests are blocked on this for Travis, so it would be good if you prioritize this over others :)
(4 rows)

-- the COPY data fails
SELECT citus.mitmproxy('conn.onCopyData().killall()');
Maybe add one more step:
SELECT citus.mitmproxy('conn.onCommandComplete(command="COPY 4").kill()');
I don't see a lot of value in this test because, from the coordinator's POV, it looks exactly the same as the conn.onCopyData().killall() test, but I've just pushed a commit which adds it.
@lithp Could you follow up on the action items and close this PR out?
@onderkalaci I've rebased on master and addressed, I think, all your comments. |
The tests are pretty comprehensive, and all looks good after addressing two minor comments.
(Also, I think you could consider merging the ephemeral port PR first and updating the test outputs here.)
-- ==== okay, run some tests where there's only one active shard ====
COPY copy_test FROM PROGRAM 'echo 0, 0 && echo 1, 1 && echo 2, 4 && echo 3, 9' WITH CSV;
SELECT * FROM copy_test;
Minor: shall we add ORDER BY? Though the data comes from a single shard, there is a possibility of ending up with a different order.
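A sketch of the fix, assuming copy_test's two CSV columns are named key and value (hypothetical names, not taken from the test schema):

```sql
-- ORDER BY makes the output stable even if rows arrive in a different order
SELECT * FROM copy_test ORDER BY key, value;  -- 'key'/'value' are assumed column names
```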
added!
CONTEXT: while executing command on localhost:57640
WARNING: connection not open
CONTEXT: while executing command on localhost:57640
ERROR: failure on connection marked as essential: localhost:57640
Pretty minor: this test took some time for me to digest. Since we haven't printed the placements just after the successful COPY operation two lines earlier, I thought "oh, we're erroring out but also creating the placements successfully, what's happening?" 😄
So it could be useful to print the placements just before starting the test as well.
good point, added!
Force-pushed from 81a9493 to c23bd5c.