This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

v1.15 upgrade broken ipv6 replication config (ipv6 literals) #7695

Open
grinapo opened this issue Jun 14, 2020 · 8 comments
Labels
z-bug (Deprecated Label) z-regression (Deprecated Label)

Comments

@grinapo

grinapo commented Jun 14, 2020

Description

A workers config that worked on v1.13.x broke after upgrading to v1.15.0. The server dies in mysterious ways.

2020-06-14 23:49:11,767 - synapse.http.client - 283 - INFO - replication-POSITION-27- Sending request GET http://::1:None/_synapse/replication/get_repl_stream_updates/federation/XSBWfTbBUR?from_token=0&upto_token=2
2020-06-14 23:49:11,767 - synapse.http.client - 330 - INFO - replication-POSITION-27- Error sending request to  GET http://::1:None/_synapse/replication/get_repl_stream_updates/federation/XSBWfTbBUR?from_token=0&upto_token=2: URLParseError expected integer for port, not ':1:None'
2020-06-14 23:49:11,768 - synapse.metrics.background_process_metrics - 215 - ERROR - replication-POSITION-27- Background process 'replication-POSITION' threw an exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/hyperlink/_url.py", line 993, in from_text
    port = int(port)
ValueError: invalid literal for int() with base 10: ':1:None'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/synapse/metrics/background_process_metrics.py", line 213, in run
    return (yield result)
hyperlink._url.URLParseError: expected integer for port, not ':1:None'

So far it has been working as:

worker_app: synapse.app.synchrotron
worker_replication_host: '::1'
worker_replication_port: 9092

Apart from the fact that it may now require worker_replication_http_port (not sure though), the …_host setting is completely broken. I tried various valid combinations without success, such as

worker_replication_host: '[::1]'

but then it stupidly tries to interpret it as a hostname. I did not find a working solution, so I had to downgrade all connections to ipv4.

Steps to reproduce

  • use ipv6 literals in replication
  • nothing works

Version information

  • Homeserver: grin.hu
  • Version: 1.15.0
  • Install method: Debian (non NV)
  • Platform: debian/buster
@clokep
Member

clokep commented Jun 15, 2020

Thanks for the bug report! I was able to reproduce this in sytest by running in worker mode with a command like:

docker run --rm -it -e POSTGRES=true -e WORKERS=true -v /Users/clokep/matrix/synapse\:/src:ro -v /Users/clokep/matrix/sytest/logs\:/logs -v /Users/clokep/matrix/sytest\:/sytest:ro matrixdotorg/sytest-synapse:py35 tests/10apidoc/35room-typing.pl

After modifying sytests to bind to ::1:

diff --git a/lib/SyTest/Homeserver/Synapse.pm b/lib/SyTest/Homeserver/Synapse.pm
--- a/lib/SyTest/Homeserver/Synapse.pm
+++ b/lib/SyTest/Homeserver/Synapse.pm
@@ -719,7 +719,7 @@ sub wrap_synapse_command
          "worker_app"              => "synapse.app.pusher",
          "worker_pid_file"         => "$hsdir/pusher.pid",
          "worker_log_config"       => $self->configure_logger("pusher"),
-         "worker_replication_host" => "$bind_host",
+         "worker_replication_host" => "::1",
          "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
          "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
          "worker_listeners"        => [
@@ -745,7 +745,7 @@ sub wrap_synapse_command
          "worker_app"              => "synapse.app.appservice",
          "worker_pid_file"         => "$hsdir/appservice.pid",
          "worker_log_config"       => $self->configure_logger("appservice"),
-         "worker_replication_host" => "$bind_host",
+         "worker_replication_host" => "::1",
          "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
          "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
          "worker_listeners"        => [
@@ -771,7 +771,7 @@ sub wrap_synapse_command
          "worker_app"              => "synapse.app.federation_sender",
          "worker_pid_file"         => "$hsdir/federation_sender.pid",
          "worker_log_config"       => $self->configure_logger("federation_sender"),
-         "worker_replication_host" => "$bind_host",
+         "worker_replication_host" => "::1",
          "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
          "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
          "worker_listeners"        => [
@@ -797,7 +797,7 @@ sub wrap_synapse_command
          "worker_app"              => "synapse.app.synchrotron",
          "worker_pid_file"         => "$hsdir/synchrotron.pid",
          "worker_log_config"       => $self->configure_logger("synchrotron"),
-         "worker_replication_host" => "$bind_host",
+         "worker_replication_host" => "::1",
          "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
          "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
          "worker_listeners"        => [
@@ -831,7 +831,7 @@ sub wrap_synapse_command
          "worker_app"              => "synapse.app.federation_reader",
          "worker_pid_file"         => "$hsdir/federation_reader.pid",
          "worker_log_config"       => $self->configure_logger("federation_reader"),
-         "worker_replication_host" => "$bind_host",
+         "worker_replication_host" => "::1",
          "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
          "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
          "worker_listeners"        => [
@@ -865,7 +865,7 @@ sub wrap_synapse_command
          "worker_app"              => "synapse.app.media_repository",
          "worker_pid_file"         => "$hsdir/media_repository.pid",
          "worker_log_config"       => $self->configure_logger("media_repository"),
-         "worker_replication_host" => "$bind_host",
+         "worker_replication_host" => "::1",
          "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
          "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
          "worker_listeners"        => [
@@ -933,7 +933,7 @@ sub wrap_synapse_command
          "worker_app"              => "synapse.app.user_dir",
          "worker_pid_file"         => "$hsdir/user_dir.pid",
          "worker_log_config"       => $self->configure_logger("user_dir"),
-         "worker_replication_host" => "$bind_host",
+         "worker_replication_host" => "::1",
          "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
          "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
          "worker_listeners"        => [

However, I then tried to bisect what broke this, and now I'm not able to reproduce it anymore...

@clokep clokep added the z-bug (Deprecated Label) label Jun 15, 2020
@clokep
Member

clokep commented Jun 15, 2020

Looks like this got broken in v1.14.0 by #7517; I was able to reproduce it with the following:

docker run --rm -it -e POSTGRES=true -e WORKERS=true -v /Users/clokep/matrix/synapse\:/src:ro -v /Users/clokep/matrix/sytest/logs\:/logs -v /Users/clokep/matrix/sytest\:/sytest:ro matrixdotorg/sytest-synapse:py35 tests/10apidoc/12device_management.pl

@clokep
Member

clokep commented Jun 16, 2020

From #synapse-dev:matrix.org:

I guess this has actually always been broken, just that likely their set up didn't use replication http pokes before
we changed the replication protocol so that we requested missing updates via http rather than in band on the tcp connection
https://github.com/matrix-org/synapse/blob/master/synapse/replication/http/_base.py#L191 is the offending line
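The URL in the reporter's log can be reproduced with a minimal sketch. It assumes the URL is assembled with plain string formatting and that the authority is split at its first colon; neither function is Synapse's or hyperlink's actual code, and both names are hypothetical. The literal "None" in the logged URL is consistent with a port value that was never set.

```python
def build_naive_url(host, port, path):
    # Plain string formatting: an IPv6 literal gets no brackets, and a
    # missing port (None) is rendered as the text "None".
    return "http://%s:%s%s" % (host, port, path)


def split_authority_naively(url):
    # Take the text between "http://" and the next "/", then split it at
    # the FIRST colon. This is an illustration, not hyperlink's parser.
    authority = url[len("http://"):].split("/", 1)[0]
    host, _, port = authority.partition(":")
    return host, port


url = build_naive_url("::1", None, "/_synapse/replication/get_repl_stream_updates")
host, port = split_authority_naively(url)

try:
    int(port)
    port_error = None
except ValueError as exc:
    # Same kind of failure as the ValueError in the traceback.
    port_error = exc
```

With an unbracketed "::1", the first colon belongs to the address itself, so everything after it is treated as the port and the integer conversion fails.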

@clokep
Member

clokep commented Jun 16, 2020

I think this is python-hyper/hyperlink#68, which is also the cause of #4092 it seems?

@erikjohnston
Member

Maybe, though I'll note that we don't construct the URL correctly in the first place, as we don't enclose IPv6 literals in [..]. Even after doing that, we may still run into the linked issue.
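Enclosing IPv6 literals in brackets when building the URL could look roughly like the sketch below. It uses only the Python standard library; the helper names and the port 9093 are assumptions for illustration, not taken from Synapse.

```python
import ipaddress
from urllib.parse import urlsplit


def host_for_url(host: str) -> str:
    """Wrap a bare IPv6 literal in brackets; leave anything else unchanged."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal (a hostname, or already bracketed).
        is_v6 = False
    return "[%s]" % host if is_v6 else host


def build_url(host: str, port: int, path: str) -> str:
    # %d rejects a missing port (None) instead of printing "None".
    return "http://%s:%d%s" % (host_for_url(host), port, path)


# 9093 is a hypothetical HTTP replication port.
url = build_url("::1", 9093, "/_synapse/replication")
parts = urlsplit(url)
```

This only addresses how the URL is constructed. As noted in this thread, a correctly bracketed URL may still hit the separate hyperlink issue further down the stack.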

@clokep
Member

clokep commented Jun 16, 2020

@erikjohnston I attempted putting it directly in the config (as @grinapo suggested in the description) and then ran into the hyperlink issue.

I also tried making the URL that we construct a byte string instead (as that goes through a slightly different code path). I'm not convinced that really made any difference, though.

@clokep
Member

clokep commented Aug 18, 2020

I think #4478 is a duplicate of this, although that has a workaround:

A workaround is using ip6-localhost in the URL.

@AluisioASG

I see a similar issue to #4478 when configuring an appservice with an IPv6 literal address in the url registration field. Using ip6-localhost instead works.

synapse.appservice.api: [as-sender-494a742717f3d068f2e73680da4c2366d8d489df6ce56a050e85fed8c599d097-0] push_bulk to http://[::1]:54554/transactions/1 threw exception Codepoint U+003A at position 1 of '::1' not allowed
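The hostname workaround depends on the name actually resolving to the IPv6 loopback address, which varies by system (ip6-localhost is typically defined in the hosts file on Debian-style systems, but that is an assumption about the local setup). A small standard-library check, with a hypothetical function name, might look like this:

```python
import ipaddress
import socket


def resolves_to_ipv6_loopback(hostname: str) -> bool:
    """Return True if hostname resolves over IPv6 and every result is loopback."""
    try:
        infos = socket.getaddrinfo(
            hostname, None, family=socket.AF_INET6, type=socket.SOCK_STREAM
        )
    except (OSError, UnicodeError):
        # Resolution failed (unknown name, no IPv6 support, bad encoding).
        return False
    # The sockaddr for AF_INET6 is (address, port, flowinfo, scope_id).
    # Drop any "%scope" suffix before parsing the address.
    addresses = {info[4][0].split("%")[0] for info in infos}
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_loopback for addr in addresses
    )


# Result depends on the local resolver configuration.
usable = resolves_to_ipv6_loopback("ip6-localhost")
```

If the check returns False, the name would need to be added to the local hosts file (or another name chosen) before using it in place of the literal.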


4 participants