
connections inside a container shown as going between containers #1733

Closed · monowai opened this issue Jul 30, 2016 · 7 comments
Labels: bug (Broken end user or developer functionality; not working as the developers intended it)
Milestone: July2016

Comments

monowai commented Jul 30, 2016

The container view shows an incorrect link between Rabbit and Riak.
[screenshot: container view showing a link between Rabbit and Riak]

Both services do use Erlang, but they run in independent containers that are otherwise unaware of each other.
[screenshot: the independent RabbitMQ and Riak containers]


rade commented Jul 30, 2016

Would you mind attaching the report ("</>" button in bottom right corner) for this?


monowai commented Jul 30, 2016

rade added the bug label on Jul 30, 2016

rade commented Jul 30, 2016

Thanks. I've managed to reproduce this with

docker run -d rabbitmq
docker run -d lapax/riak

Both of these run a separate epmd process inside the container, and the main Erlang beam process is supposed to connect to it. Based on my quick investigation, I reckon the connection from the Riak beam is mis-attributed as going to the RabbitMQ epmd instead of the Riak epmd.

And I've just reproduced the same kind of mis-attribution with two alpine containers that each run

nc -l -p 1122 &
nc 127.0.0.1 1122

Thanks for reporting this.

rade changed the title from "Incorrect container relationship involving Erlang processes" to "connections inside a container shown as going between containers" on Jul 30, 2016
rade added this to the July2016 milestone on Jul 30, 2016

rade commented Aug 4, 2016

@paulbellamy's and my theory here is that connection endpoints in containers are generally just identified by IP and port. That is fine (and indeed required) for overlay networks, but clearly wrong for localhost connections inside a container.

So a (hopefully) quick fix would be to include the container id in the identity of localhost connection endpoints.

2opremio self-assigned this on Aug 8, 2016

2opremio commented Aug 8, 2016

> @paulbellamy's and my theory here is that connection endpoints in containers are generally just identified by IP and port. That is fine (and indeed required) for overlay networks, but clearly wrong for localhost connections inside a container.

We also use PIDs for persistent connections (like these). Proc-based tracking (which provides PIDs) and conntrack-based tracking run in parallel, and it could be that we don't correctly prioritize proc-tracked connections.

Also, even if the proc-tracked connections were used, we would probably be able to reproduce this with short-lived (conntrack-based) connections, so we need to handle the loopback interface specially.
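
For illustration only, a minimal sketch of that prioritization, using made-up types (`connection`, `merge`) rather than Scope's actual data structures: when both trackers report the same four-tuple, the proc-tracked entry (which carries a PID) should win.

```go
// Hypothetical sketch (not Scope's actual code): merging connections seen by
// the /proc-based tracker (which knows the owning PID) with connections seen
// by conntrack (which does not), preferring the proc-tracked entry.
package main

import "fmt"

// connection is keyed by its four-tuple; pid is zero when the connection was
// only seen by conntrack.
type connection struct {
	fourTuple string // "srcIP:srcPort->dstIP:dstPort"
	pid       int
}

// merge indexes connections by four-tuple and lets a proc-tracked entry
// (pid != 0) win over a conntrack-only entry for the same tuple.
func merge(procTracked, conntrackTracked []connection) map[string]connection {
	out := map[string]connection{}
	for _, c := range conntrackTracked {
		out[c.fourTuple] = c
	}
	for _, c := range procTracked {
		out[c.fourTuple] = c // proc-tracked entry wins: it carries the PID
	}
	return out
}

func main() {
	proc := []connection{{"127.0.0.1:40000->127.0.0.1:1122", 1234}}
	ct := []connection{{"127.0.0.1:40000->127.0.0.1:1122", 0}}
	fmt.Println(merge(proc, ct)) // keeps the entry with pid 1234
}
```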


2opremio commented Aug 8, 2016

Actually, host-scoping aside, the problem matches the theory from @rade and @paulbellamy.

After reproducing the problem in the way suggested by @rade:

> [...] two alpine containers that each run
>
> nc -l -p 1122 &
> nc 127.0.0.1 1122

I inspected the Endpoint topology and found:

[screenshot of the Endpoint topology, 2016-08-08]

vagrant-ubuntu-wily-64 is the hostname of my machine (the Docker host). So connections are keyed by hostname/IP/port, which is not good enough for loopback connections inside containers.

This causes a key clash between the processes listening on 127.0.0.1:1122, making the containers fight for the PID entry in the LatestMap and always producing a connection across the containers (both clients are identified as talking to a single server).
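
To illustrate the clash (with hypothetical key strings and a plain map standing in for Scope's actual LatestMap), here is roughly how the two listeners collapse onto one Endpoint node when the key is only host;address;port:

```go
// Hypothetical sketch (not Scope's actual code): two listeners on
// 127.0.0.1:1122 in different containers produce the same Endpoint node key,
// so the PID entry is simply overwritten by whichever report arrives last.
package main

import "fmt"

func main() {
	key := func(host, addr string, port int) string {
		return fmt.Sprintf("%s;%s;%d", host, addr, port)
	}

	latest := map[string]int{} // Endpoint node key -> latest PID seen

	// Container A's "nc -l -p 1122" (say PID 100) and container B's
	// "nc -l -p 1122" (say PID 200) produce the same key...
	latest[key("vagrant-ubuntu-wily-64", "127.0.0.1", 1122)] = 100
	latest[key("vagrant-ubuntu-wily-64", "127.0.0.1", 1122)] = 200

	// ...so only one entry remains and both clients appear to talk to it.
	fmt.Println(len(latest), latest) // 1 map[vagrant-ubuntu-wily-64;127.0.0.1;1122:200]
}
```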

Two possible solutions are:

  • Append the network namespace inode to the Endpoint node key
  • Append the PID to the Endpoint node keys (the namespace inode is only available when we know the PID anyway).

Also, in case it's not done already, we should make sure that loopback interface IPs are discarded when tracking short-lived connections, since they cannot be uniquely attributed to a container (related: #1260).


2opremio commented Aug 8, 2016

> Two possible solutions are:
>
>   • Append the network namespace inode to the Endpoint node key
>   • Append the PID to the Endpoint node keys (the namespace inode is only available when we know the PID anyway).

@rade pointed out offline that we can only do this for loopback connections (since the PID/namespace scope won't match for connections across hosts).
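
A rough sketch of what that constraint could look like, assuming a hypothetical endpointNodeID helper rather than Scope's actual key-building code: the namespace inode is appended only when the address is a loopback address, so cross-host connections keep the plain key.

```go
// Hypothetical sketch (not Scope's actual code): append the network namespace
// inode to the Endpoint node key, but only for loopback addresses, so that
// cross-host connections (where namespaces cannot match) keep the plain key.
package main

import (
	"fmt"
	"net"
)

func endpointNodeID(hostID string, netnsInode uint64, addr string, port uint16) string {
	if ip := net.ParseIP(addr); ip != nil && ip.IsLoopback() {
		return fmt.Sprintf("%s;netns:%d;%s;%d", hostID, netnsInode, addr, port)
	}
	return fmt.Sprintf("%s;%s;%d", hostID, addr, port)
}

func main() {
	// The two nc listeners now get distinct keys because each container has
	// its own network namespace inode (the inode values here are made up).
	fmt.Println(endpointNodeID("vagrant-ubuntu-wily-64", 4026532201, "127.0.0.1", 1122))
	fmt.Println(endpointNodeID("vagrant-ubuntu-wily-64", 4026532305, "127.0.0.1", 1122))
	// A non-loopback endpoint keeps the old, unscoped form.
	fmt.Println(endpointNodeID("vagrant-ubuntu-wily-64", 4026532201, "10.0.0.5", 5672))
}
```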
