Although this behavior is expected, I'd like to open an issue to keep track of it.
For replication factor 2, even if a worker is down, Citus successfully handles failures and returns the result. It looks like the decision we've taken for pull & push execution breaks that. See references [1], [2]
See the steps:
-- create a distributed table with replication factor = 2
SET citus.shard_replication_factor TO 2;
CREATE TABLE users_table (user_id int, time timestamp, value_1 int, value_2 int, value_3 float, value_4 bigint);
SELECT create_distributed_table('users_table', 'user_id');
-- generate some random data
INSERT INTO users_table SELECT (i * random())::int % 10000, timestamp '2014-01-10 20:00:00' +
random() * (timestamp '2014-01-20 20:00:00' - timestamp '2014-01-10 10:00:00'), (i * random())::int % 10000, (i * random())::int % 10000, (i * random())::int % 10000 FROM generate_series(0, 10000) i;
-- stop one of the workers
pg-latest/bin/pg_ctl -D citus-installation/data/ -m i stop
-- run a real-time query, it'll get the results
SELECT count(*) FROM users_table;
WARNING: connection error: 10.192.0.174:5432
DETAIL: could not connect to server: Connection refused
Is the server running on host "10.192.0.174" and accepting
TCP/IP connections on port 5432?
WARNING: connection error: 10.192.0.174:5432
-- some more warnings
 count
-------
 10001
(1 row)
-- now, run the same query via pull & push execution
SELECT * FROM (SELECT count(*) FROM users_table OFFSET 0) as foo;
DEBUG: generating subplan 51_1 for subquery SELECT count(*) AS count FROM public.users_table OFFSET 0
DEBUG: Plan 51 query after replacing subqueries and CTEs: SELECT count FROM (SELECT intermediate_result.count FROM read_intermediate_result('51_1'::text, 'binary'::citus_copy_format) intermediate_result(count bigint)) foo
WARNING: connection error: 10.192.0.174:5432
DETAIL: could not send data to server: Connection refused
could not send SSL negotiation packet: Connection refused
ERROR: failure on connection marked as essential: 10.192.0.174:5432
The reason is that we've marked the connections for pushing results back as critical, which leads to this issue.
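To make the difference concrete, here is a toy model in Python. It is only a sketch and not Citus code: every name in it (`connect`, `run_with_failover`, `push_result_to_all`, the worker names) is hypothetical, and it assumes that the intermediate result is sent to every worker. It contrasts a read that can fall back to another replica with a push where each connection is treated as essential.

```python
# Toy model only: these names are hypothetical and are not Citus internals.

class ConnectionFailed(Exception):
    """Raised when a (simulated) worker cannot be reached."""


def connect(worker, down_workers):
    # Simulate a connection attempt; a stopped worker refuses it.
    if worker in down_workers:
        raise ConnectionFailed(worker)
    return worker


def run_with_failover(placements, down_workers):
    """Try each replica of a shard in turn and use the first reachable one."""
    for worker in placements:
        try:
            return connect(worker, down_workers)
        except ConnectionFailed:
            continue  # comparable to the WARNING: move on to the next replica
    raise ConnectionFailed("no reachable placement")


def push_result_to_all(workers, down_workers):
    """Send a result to every worker; each connection is essential."""
    for worker in workers:
        connect(worker, down_workers)  # any failure aborts the whole query
    return list(workers)


workers = ["worker-1", "worker-2"]
down = {"worker-2"}

# Replication factor 2: the shard has a placement on both workers,
# so the read succeeds even though worker-2 is stopped.
served_by = run_with_failover(["worker-2", "worker-1"], down)

# Assumption: the push targets all workers, so the stopped one fails it.
try:
    push_result_to_all(workers, down)
    push_outcome = "ok"
except ConnectionFailed as exc:
    push_outcome = f"failed on {exc}"
```

In this model the read is served by the surviving replica, while the push fails as soon as it reaches the stopped worker, which mirrors the WARNING in the first query and the ERROR in the second.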