Mitmproxy-based automated failure testing #2119
Conversation
Force-pushed from 8a24953 to cade93b
Codecov Report

@@           Coverage Diff            @@
##           master    #2119    +/-   ##
=========================================
+ Coverage     93.7%    93.7%   +<.01%
=========================================
  Files          110      110
  Lines        27446    27453       +7
=========================================
+ Hits         25719    25726       +7
  Misses        1727     1727
Force-pushed from cbfd90f to 05108bb
It's finally running (albeit in a rather hacky way) and passing on Travis!
Force-pushed from 05108bb to fedadb2
Rebased onto master
Clearing out some outdated TODOs:
Cancellation testing: The proxy sits at the network layer instead of acting directly on the binary. This means it's more stable under refactoring, but also means we have much less control over what the binary is doing when we take some action. If the action is sending a signal (to cancel the binary), it's likely that precise timing will be more important: we'll want to send a signal right at some critical section. So, maybe cancellation testing is better handled a different way. For instance, we could embed macro calls like … which raise a signal if the test script has previously called …
With mitmproxy: If mitmproxy is told the name of the process, it could send a signal when asked to. We could embed a script like …
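Purely as an illustration of that idea (this is not the script the comment refers to; the pgrep-based process lookup, the SIGINT default, and the example pattern are all assumptions):

```python
import os
import signal
import subprocess

def send_signal_to_process(process_name, signum=signal.SIGINT):
    # Look up pids whose command line matches the given name (assumes pgrep is available).
    pids = subprocess.check_output(['pgrep', '-f', process_name]).split()
    for pid in pids:
        # Deliver the signal, e.g. to interrupt a backend mid-operation.
        os.kill(int(pid), signum)

# A proxy command handler could call this when the test script asks for a cancellation,
# e.g. send_signal_to_process('postgres: .* COPY')  -- the pattern is hypothetical.
```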
Here's the traceback when the race condition happens:
Force-pushed from c1dd90f to aa71c3d
All the tests from our test document:
-- these don't exist in the release testing but let's try to get them as well
Things to play with:
Useful:
Q: How do you tell which executor is being used, to know which one a specific join uses?
There are additional tests here: https://docs.google.com/document/d/10mEwsYftun6ONoRCmtwlNcKvgdHEuGzqTuBnAZJuUag/edit
Can you add any test cases for this: #2031
WARNING: connection not open
CONTEXT: while executing command on localhost:57640
COPY copy_test, line 1: "0, 0"
ERROR: failure on connection marked as essential: localhost:57640
I think this is a bug. If the other worker is accessible we shouldn't error out, we should just mark the placement inactive
This probably means the server terminated abnormally before or while processing the request.
CONTEXT: while executing command on localhost:57640
COPY copy_test, line 1: "0, 0"
Same here, the other worker is doing fine, this should mark the placement inactive and continue.
(1 row)

COPY copy_test FROM PROGRAM 'echo 0, 0 && echo 1, 1 && echo 2, 4 && echo 3, 9' WITH CSV;
ERROR: failed to COPY to shard 100400 on localhost:57640
And again here, if the COPY fails while some rows are being sent we error out, instead of marking the placement inactive.
.travis.yml
Outdated
@@ -39,7 +49,7 @@ install:
sudo dpkg --force-confold --force-confdef --force-all -i *hll*.deb
fi
before_script: citus_indent --quiet --check
- script: CFLAGS=-Werror pg_travis_multi_test check
+ script: CFLAGS=-Werror pg_travis_multi_test check-failure
Let's not forget this here :)
Force-pushed from da1181a to 49ab26a
Also noting the things we've discussed here:
- Try to avoid the code changes in Citus [Brain]
- Update the older tests (old APIs are used); try to avoid .source files and remove them if not a small change for now
- Add tests for DDLs and real-time SELECTs [Onder]
import structs

'''
Use with a command line like this:
move to a readme under the folder
done
There are also some special commands. This proxy also records every packet and lets you inspect them:

recorder.dump() - emits a list of captured packets in COPY text format
also add dump_network_traffic / clear_network_traffic
also done
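As a side note, driving these recorder commands by hand could look roughly like the sketch below. It assumes the one-command-per-line fifo protocol suggested by the diffs in this PR; the fifo path and the command are just examples, not taken verbatim from the branch:

```python
def send_proxy_command(fifoname, command):
    # The proxy is assumed to be blocked reading the fifo; writing a line unblocks it.
    with open(fifoname, mode='w') as fifo:
        fifo.write('{}\n'.format(command))
    # The proxy then reopens the fifo for writing and sends its result back.
    with open(fifoname, mode='r') as fifo:
        return fifo.read()

if __name__ == '__main__':
    # e.g. ask the proxy to dump every packet it has captured so far
    print(send_proxy_command('/tmp/mitm.fifo', 'recorder.dump()'))
```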
src/test/regress/pg_regress_multi.pl
Outdated
@@ -275,6 +287,18 @@ sub revert_replace_postgres
push(@pgOptions, '-c', "citus.remote_task_check_interval=1ms");
push(@pgOptions, '-c', "citus.shard_replication_factor=2");
push(@pgOptions, '-c', "citus.node_connection_timeout=${connectionTimeout}");
+ push(@pgOptions, '-c', "citus.sslmode=disable");
Only do it when failure testing
done
@@ -178,6 +185,11 @@ ()
MESSAGE
}

if ($useMitmproxy)
error out on check-full and check-failure
I've added check-failure to check-full.
I've also added a line item to the Travis build matrix: on PG10 we run both the regular tests and the failure tests.
with open(fifoname, mode='w') as fifo:
    fifo.write('{}\n'.format(result))

def replace_thread(fifoname):
maybe create_thread
done
self.root = self
self.command = None

def dump(self, normalize_shards=True, dump_unknown_messages=False):
dump_unknown_messages: maybe remove it?
removed
CREATE FUNCTION citus.dump_network_traffic(
    normalize_shards bool default true,
    dump_unknown_messages bool default false
) RETURNS TABLE(conn int, from_client bool, message text) AS $$
from_client bool: use a text instead
It's now source text, and a lot easier to read!
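For anyone reading along, here is a quick sketch of pulling the captured traffic out of a session from Python. The connection parameters are assumptions and the column names are inferred from the exchange above, so treat it as illustrative only:

```python
import psycopg2

# Connect to the node used by the failure tests; host/port/dbname are assumptions.
conn = psycopg2.connect(host='localhost', port=9700, dbname='postgres')
cur = conn.cursor()
cur.execute('SELECT conn, source, message FROM citus.dump_network_traffic()')
for conn_id, source, message in cur.fetchall():
    # source is now text rather than a from_client boolean, which reads much better
    print(conn_id, source, message)
conn.close()
```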
@@ -79,6 +79,10 @@ check-follower-cluster: all
	$(pg_regress_multi_check) --load-extension=citus --follower-cluster \
	-- $(MULTI_REGRESS_OPTS) --schedule=$(citus_abs_srcdir)/multi_follower_schedule $(EXTRA_TESTS)

check-failure: all
If it's simple: direct stderr to a file instead of stdout
already done
Force-pushed from 09b40b5 to 8eea61c
I've rewritten history so that this branch never messed with the citus version or added a .sql file for it, and instead always used …
Force-pushed from 8eea61c to 9133912
I've removed the changes to prepared transaction id generation and reworked the test, …
All looks good, I've added the tests in #2212. Once you address the feedback we've added together here, feel free to ping me. I'd like to have a final look at the changes.
I think I won't have time to add proper regression tests for … We should probably get #2212 and #2210 to master (plus one other extensive test written by you).
Force-pushed from 9133912 to 466fed0
I've merged #2182 and rebased onto master so that this PR now includes 0 changes to citus code.
@onderkalaci ready for you to review again :) I think this is my only remaining work:
I have a slight preference not to add all the tests in this PR. I think it'd be better to concentrate on each test separately during the release testing. That said, feel free to not remove them; I can look a bit closer at each.
I think we're almost done. I'll have a final look & final test once you address the minor notes you've commented on.
@@ -0,0 +1,63 @@
CREATE TABLE test (a int, b int); |
do we need this file at all?
removed
@@ -0,0 +1,165 @@
Automated Failure testing |
Great readme, thanks a lot!
😀
def _handle(self, flow, message):
    flow.kill() # tell mitmproxy this connection should be closed

    client_conn = flow.client_conn # connections.ClientConnection(tcp.BaseHandler)
is connections.ClientConnection(tcp.BaseHandler) forgotten?
I can remove this; I think I added it while I was figuring out how to get the actual socket. This is just a note so I could remember what the type of client_conn is.
Changed the comment to something clearer
@@ -0,0 +1,328 @@
{ |
Do we need to include Pipfile.lock? Isn't it something that is generated on the fly?
I don't fully understand the reasoning but it sounds like adding Pipfile.lock is recommended: pypa/pipenv#598 (comment)
> I don't fully understand the reasoning

Having just read waaaay too much about the craptastic Python dependency-management universe (I do not understand why it is so Balkanized), we should definitely check this in. Pipfile has the general version constraints and Pipfile.lock has the specific versions that we're using which meet those constraints. We'd only want to not provide Pipfile.lock if we were developing a library, I think.
Force-pushed from 3121431 to 6c01440
@onderkalaci addressed your feedback, removed all the tests (I'll open them again in new PRs, already opened #2244), sent all output to a file, and squashed; this is ready for you again!
All looks good, after considering pretty minor comments
@@ -1,6 +1,8 @@
sudo: required
dist: trusty
language: c
python:
One minor note that I don't have a good answer for right now:
For example, the test-automation repo comes with Python 2.7, and it's a bit painful to install 3.5 to execute the tests here. Also, any other 3.X Python gives warnings etc. when starting the tests.
Is there anything we can do about relaxing the version checks? It seems not for now?
Hmm, interesting. This script unfortunately hooks quite deeply into mitmproxy, which is why the version requirement is so specific.
One solution might be to move test-automation from 2.7 to 3.5.
Another would be to replace mitmproxy with our own proxy; we're not using very much of mitmproxy, so this wouldn't be a huge change, probably only a week.
I've been using pyenv to install other Python versions and using pipenv on top of that to isolate all this. This works reliably well now.
# II. Running mitmproxy manually

$ mkfifo /tmp/mitm.fifo # first, you need a fifo
Maybe add cd src/test/regress/mitmscripts?
means mitmdump will accept connections on port 9702 and forward them to the worker listening on port 9700.

Now, open psql and run:
maybe add cd src/test/regress
You also want to tell the UDFs how to talk to mitmproxy (careful, this must be an absolute path):

# SET citus.mitmfifo = '/tmp/mitm.fifo';
Shall we note that this is not actually a GUC, but Postgres still allows us to read it? Though I'm now familiar with it, I was confused when I first realized this.
Sure, I can add a quick note
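For context on why this works at all: Postgres accepts dot-qualified settings it has never seen declared (they become placeholder settings), which is what the fifo path rides on. A tiny illustration, with connection parameters as assumptions:

```python
import psycopg2

conn = psycopg2.connect(host='localhost', port=9700, dbname='postgres')
cur = conn.cursor()
# citus.mitmfifo is not declared as a GUC anywhere, but the dotted name is accepted
# as a placeholder setting, and the test UDFs can read it back from the session.
cur.execute("SET citus.mitmfifo = '/tmp/mitm.fifo'")
cur.execute("SHOW citus.mitmfifo")
print(cur.fetchone()[0])  # -> /tmp/mitm.fifo
conn.close()
```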
@lithp, when are we expecting to merge this?
- Lots of detail is in src/test/regress/mitmscripts/README
- Create a new target, make check-failure, which runs tests
- Tells travis how to install everything and run the tests
Force-pushed from 6c01440 to 3e309e3
Some slides explaining how this works: https://docs.google.com/presentation/d/1zxF32GvcFJp0s6UClL39hla9WqhbGvYgo30e2WYnADQ/edit?usp=sharing
A modernized #2044. This one works at the network level instead of the process level. One drawback: we don't have a good way of exposing timing problems. For example, maybe the process crashes if a packet comes in before it is ready for it. That class of bugs seems rare to me, so I think this approach is safe to move forward with.
It's also much easier! This doesn't require sudo, just an extra daemon which packets flow through. It also has the advantage of not caring at all how the process runs, so the same tests should work modulo any refactoring which we choose to do.
Some work left to be done:
- … 'NoneType' object has no attribute 'tell'. What is going wrong?
- … netcat in order to keep using COPY)
- The Handler and Mixin hierarchy is cute but there's definitely a better way. Separating out the Building and the Handling seems like it'd improve things.
- pg_regress_multi.pl or fluent.py should redirect the proxy output to a log file
- … citus.mitmproxy() to the test functions?
- DROP TABLE crashes if you drop the connection after DROP TABLE IF EXISTS is sent. Open a ticket!
- recorder.dump() returns an empty line, fix that!
- enable_unique_job_ids -> enable_unique_transaction_ids
- Make the failure_task_tracker_executor test reproducible (don't hard-code filepaths)