Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panicked at 'Removing an unknown connection' #61

Closed
theprogrammer67 opened this issue Nov 29, 2017 · 24 comments
Closed

panicked at 'Removing an unknown connection' #61

theprogrammer67 opened this issue Nov 29, 2017 · 24 comments
Labels

Comments

@theprogrammer67
Copy link

sometimes the program terminates with an error:

2017-11-29 20:43:02.240 ERROR TcpConnection: 10.0.0.2:48290 -> 13.107.3.128:443 Cannot read: [ConnectionReset] Удаленный хост принудительно разорвал существующее подключение. (os error 10054)
2017-11-29 20:43:02.241 INFO TcpConnection: 10.0.0.2:48290 -> 13.107.3.128:443 Close
thread 'main' panicked at 'Removing an unknown connection', src\libcore\option.rs:839:4
note: Run with RUST_BACKTRACE=1 for a backtrace.

What is the reason for this?

@rom1v rom1v self-assigned this Nov 29, 2017
@rom1v rom1v added the bug label Nov 29, 2017
@rom1v
Copy link
Collaborator

rom1v commented Nov 29, 2017

Thank you for your report.

Is it on the last release (2.1), or on the dev branch?

@rom1v
Copy link
Collaborator

rom1v commented Nov 29, 2017

I fail to reproduce, and by reading the code, for now I can't think of any execution leading to remove an unknown connection.

Does it always happen after a Cannot read: [ConnectionReset]? Do you have more context?

@theprogrammer67
Copy link
Author

release 2.1
I did not find the law. Sometimes after 10 minutes, sometimes after 2 hours:

2017-11-30 11:03:43.102 ERROR TcpConnection: 10.0.0.2:50815 -> 13.107.3.128:443 Cannot read: [ConnectionReset] Удаленный хост принудительно разорвал существующее подключение. (os error 10054)
2017-11-30 11:03:43.102 INFO TcpConnection: 10.0.0.2:50815 -> 13.107.3.128:443 Close
thread 'main' panicked at 'Removing an unknown connection', src\libcore\option.rs:839:4
note: Run with RUST_BACKTRACE=1 for a backtrace.

@theprogrammer67
Copy link
Author

In my program I will restart after an error.
I also need a "clean" launch command, without the adb command.
Also add myself to gnirehtet.exe

@theprogrammer67
Copy link
Author

I'm developing a Windows service based on your software
Thank you for your work!

@rom1v
Copy link
Collaborator

rom1v commented Nov 30, 2017

I did not find the law.

OK, the error log seems consistent. i will think about what the cause could be.

I also need a "clean" launch command, without the adb command.
Also add myself to gnirehtet.exe

What do you mean?

I did not understand.

@theprogrammer67
Copy link
Author

I again inattentive. Everything is already there:
impl Command for RelayCommand {
fn command (& self) -> & 'static str {
"relay"
}

@rom1v rom1v removed their assignment Feb 6, 2018
rom1v added a commit that referenced this issue Mar 6, 2018
In a single poll() result, the client may be ready to receive a packet
from the device, that will be sent to a specific network connection, and
in the same time that connection may be ready to send a packet to the
client.

There is a possibility that both events will lead to close the
connection, and remove it from the router. In that case, the second
removal did panic.

Instead, remove it only if it does not exist.

See:
 - <#61>
 - <#85>
@rom1v
Copy link
Collaborator

rom1v commented Mar 6, 2018

Fixed on dev branch. See #85 (comment)

@rom1v rom1v closed this as completed Mar 6, 2018
@aruediger
Copy link

aruediger commented Mar 6, 2018

Running latest master (350e39b) on OS X 10.12.6 I get:

2018-03-06 13:42:59.962 ERROR TcpConnection: 10.0.0.2:4001 -> XXX Cannot read: [ConnectionRefused] Connection refused (os error 61)
2018-03-06 13:42:59.962 INFO TcpConnection: 10.0.0.2:4001 -> XXX Close
thread 'main' panicked at 'Removing an unknown connection', src/libcore/option.rs:891:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::panicking::default_hook::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic
   4: std::panicking::begin_panic_fmt
   5: core::panicking::panic_fmt
   6: core::option::expect_failed
   7: relaylib::relay::router::Router::remove
   8: <F as relaylib::relay::selector::EventHandler>::on_ready
   9: relaylib::relay::relay::Relay::run
  10: <gnirehtet::RunCommand as gnirehtet::Command>::execute
  11: gnirehtet::main
  12: std::rt::lang_start::{{closure}}
  13: main

@rom1v
Copy link
Collaborator

rom1v commented Mar 6, 2018

Running latest master

It's fixed on dev, not on master.

@aruediger
Copy link

Oops. Thanks! Now with dev I get:

2018-03-06 13:52:49.418 ERROR TcpConnection: 10.0.0.2:4001 -> XXX Cannot read: [ConnectionRefused] Connection refused (os error 61)
2018-03-06 13:52:49.418 INFO TcpConnection: 10.0.0.2:4001 -> XXX Close
2018-03-06 13:52:49.428 INFO TcpConnection: 10.0.0.2:4001 -> XXX Close
thread 'main' panicked at 'Token not found', src/libcore/option.rs:891:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::panicking::default_hook::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic
   4: std::panicking::begin_panic_fmt
   5: core::panicking::panic_fmt
   6: core::option::expect_failed
   7: relaylib::relay::relay::Relay::run
   8: <gnirehtet::RunCommand as gnirehtet::Command>::execute
   9: gnirehtet::main
  10: std::rt::lang_start::{{closure}}
  11: main

@aruediger
Copy link

...as well as these:

2018-03-06 13:58:21.356 ERROR TcpConnection: 10.0.0.2:4001 -> XXX Cannot read: [ConnectionRefused] Connection refused (os error 61)
2018-03-06 13:58:21.356 INFO TcpConnection: 10.0.0.2:4001 -> XXX Close
2018-03-06 13:58:21.360 INFO TcpConnection: 10.0.0.2:4001 -> XXX Open
2018-03-06 13:58:21.384 ERROR TcpConnection: 10.0.0.2:4001 -> XXX Cannot read: [ConnectionRefused] Connection refused (os error 61)
2018-03-06 13:58:21.384 INFO TcpConnection: 10.0.0.2:4001 -> XXX Close
2018-03-06 13:58:21.389 INFO TcpConnection: 10.0.0.2:4001 -> XXX Open
2018-03-06 13:58:21.750 ERROR TcpConnection: 10.0.0.2:4001 -> XXX Cannot read: [ConnectionRefused] Connection refused (os error 61)
2018-03-06 13:58:21.750 INFO TcpConnection: 10.0.0.2:4001 -> XXX Close
2018-03-06 13:58:21.770 INFO TcpConnection: 10.0.0.2:4001 -> XXX Close
thread 'main' panicked at 'invalid key', slab-0.4.0/src/lib.rs:681:17
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::panicking::default_hook::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic
   4: <slab::Slab<T>>::remove
   5: relaylib::relay::relay::Relay::run
   6: <gnirehtet::RunCommand as gnirehtet::Command>::execute
   7: gnirehtet::main
   8: std::rt::lang_start::{{closure}}
   9: main

@rom1v rom1v reopened this Mar 6, 2018
rom1v added a commit that referenced this issue Mar 6, 2018
This reverts commit 10d5e32.

This does not fix the root cause, so the panic still happens.

<#61 (comment)>
@rom1v
Copy link
Collaborator

rom1v commented Mar 6, 2018

Thank you @2Beaucoup, I reverted the patch.

I need to reproduce it myself. How do you reproduce it often enough?

@rom1v
Copy link
Collaborator

rom1v commented Mar 6, 2018

@2Beaucoup If you can reproduce the problem easily, could you please test the debug-issue-61 branch, and give me the logs when it crashes.

Thank you :)

@aruediger
Copy link

aruediger commented Mar 6, 2018

Sure. :)

...
2018-03-06 20:20:01.594 DEBUG Router: connection already closed on send_to_network? 10.0.0.2:4001 -> VVV false
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> VVV handle_first_packet()
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> VVV Initialized seq=152906120; ack=2679289588
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> VVV State = SynSent
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> VVV interests: Writable
2018-03-06 20:20:01.594 DEBUG Selector: event=Event { kind: Readable, token: Token(1) }
2018-03-06 20:20:01.594 INFO TcpConnection: 10.0.0.2:4001 -> YYY Open
2018-03-06 20:20:01.594 DEBUG Router: connection already closed on send_to_network? 10.0.0.2:4001 -> YYY false
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY handle_first_packet()
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY Initialized seq=3482950945; ack=963613825
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY State = SynSent
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY interests: Writable
2018-03-06 20:20:01.594 DEBUG Selector: event=Event { kind: Writable, token: Token(146) }
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY State = SynReceived
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY Forging empty response (flags=18) (seq=3482950945, ack=963613825)
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY Acking (seq=3482950945, ack=963613825)
2018-03-06 20:20:01.594 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY interests: (empty)
2018-03-06 20:20:01.594 DEBUG Selector: event=Event { kind: Writable, token: Token(1) }
2018-03-06 20:20:01.596 DEBUG Selector: event=Event { kind: Writable | Error | Hup, token: Token(148) }
2018-03-06 20:20:01.596 DEBUG TcpConnection: 10.0.0.2:4001 -> XXX State = SynReceived
2018-03-06 20:20:01.596 DEBUG TcpConnection: 10.0.0.2:4001 -> XXX Forging empty response (flags=18) (seq=4252117833, ack=2644448553)
2018-03-06 20:20:01.596 DEBUG TcpConnection: 10.0.0.2:4001 -> XXX Acking (seq=4252117833, ack=2644448553)
2018-03-06 20:20:01.596 DEBUG TcpConnection: 10.0.0.2:4001 -> XXX interests: (empty)
2018-03-06 20:20:01.596 DEBUG Selector: event=Event { kind: Writable, token: Token(1) }
2018-03-06 20:20:01.598 DEBUG Selector: event=Event { kind: Readable, token: Token(1) }
2018-03-06 20:20:01.598 DEBUG Router: connection already closed on send_to_network? 10.0.0.2:4001 -> YYY false
2018-03-06 20:20:01.598 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY Receiving expected packet 963613825 (flags=16)
2018-03-06 20:20:01.598 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY Client acked 3482950946
2018-03-06 20:20:01.598 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY handle_ack()
2018-03-06 20:20:01.598 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY State = Established
2018-03-06 20:20:01.598 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY interests: Readable
2018-03-06 20:20:01.598 INFO TcpConnection: 10.0.0.2:4001 -> ZZZ Open
2018-03-06 20:20:01.598 DEBUG Router: connection already closed on send_to_network? 10.0.0.2:4001 -> ZZZ false
2018-03-06 20:20:01.598 DEBUG TcpConnection: 10.0.0.2:4001 -> ZZZ handle_first_packet()
2018-03-06 20:20:01.598 DEBUG TcpConnection: 10.0.0.2:4001 -> ZZZ Initialized seq=2708259458; ack=1278256463
2018-03-06 20:20:01.598 DEBUG TcpConnection: 10.0.0.2:4001 -> ZZZ State = SynSent
2018-03-06 20:20:01.598 DEBUG TcpConnection: 10.0.0.2:4001 -> ZZZ interests: Writable
2018-03-06 20:20:01.598 DEBUG Selector: event=Event { kind: Readable, token: Token(146) }
2018-03-06 20:20:01.598 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY Packet (44 bytes) sent to client (seq=3482950946, ack=963613825)
2018-03-06 20:20:01.598 DEBUG TcpConnection: 10.0.0.2:4001 -> YYY interests: Readable
2018-03-06 20:20:01.598 DEBUG Selector: event=Event { kind: Writable, token: Token(1) }
2018-03-06 20:20:01.602 DEBUG Selector: event=Event { kind: Readable, token: Token(1) }
2018-03-06 20:20:01.602 DEBUG Router: connection already closed on send_to_network? 10.0.0.2:4001 -> XXX false
2018-03-06 20:20:01.602 DEBUG TcpConnection: 10.0.0.2:4001 -> XXX Receiving expected packet 2644448553 (flags=16)
2018-03-06 20:20:01.602 DEBUG TcpConnection: 10.0.0.2:4001 -> XXX Client acked 4252117834
2018-03-06 20:20:01.602 DEBUG TcpConnection: 10.0.0.2:4001 -> XXX handle_ack()
2018-03-06 20:20:01.602 DEBUG TcpConnection: 10.0.0.2:4001 -> XXX State = Established
2018-03-06 20:20:01.602 DEBUG TcpConnection: 10.0.0.2:4001 -> XXX interests: Readable
2018-03-06 20:20:01.603 DEBUG Selector: event=Event { kind: Readable | Error | Hup, token: Token(148) }
2018-03-06 20:20:01.603 ERROR TcpConnection: 10.0.0.2:4001 -> XXX Cannot read: [ConnectionRefused] Connection refused (os error 61)
2018-03-06 20:20:01.603 DEBUG TcpConnection: 10.0.0.2:4001 -> XXX Forging empty response (flags=4) (seq=4252117834, ack=2644448553)
2018-03-06 20:20:01.603 INFO TcpConnection: 10.0.0.2:4001 -> XXX Close
2018-03-06 20:20:01.603 DEBUG Router: connection already closed on self-remove? 10.0.0.2:4001 -> XXX true
thread 'main' panicked at 'Removing an unknown connection', src/libcore/option.rs:891:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::panicking::default_hook::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic
   4: std::panicking::begin_panic_fmt
   5: core::panicking::panic_fmt
   6: core::option::expect_failed
   7: relaylib::relay::router::Router::remove
   8: <F as relaylib::relay::selector::EventHandler>::on_ready
   9: relaylib::relay::relay::Relay::run
  10: <gnirehtet::RunCommand as gnirehtet::Command>::execute
  11: gnirehtet::main
  12: std::rt::lang_start::{{closure}}
  13: main

@rom1v
Copy link
Collaborator

rom1v commented Mar 6, 2018

Thank you. 👍

May I ask you to use different XXX for different connections, please? (for example, XXX, YYY)
This would help.

(for instance, the two Open are necessarily different connections)

Additional question: are there more logs containing connection from router before the ones you pasted?

@aruediger
Copy link

Of course. I updated the output. Just ping if you need more. Thanks!

@rom1v
Copy link
Collaborator

rom1v commented Mar 6, 2018

Thank you very much. I located the issue, but don't understand it.

In Router::remove(), ptr::eq() gives unexpected results: ptr::eq(x, y) returns false while the representation {:p} of x and y are equal.

To simplify:

let x = …;
let y = …;
println!("{:p} == {:p} ? {}", x, y, ptr::eq(x, y));

It prints:

0x7f87a8c3a618 == 0x7f87a8c3a618 ? false

This is scary. And for now, I failed to reproduce in a sample (the sample always works as expected).

rom1v added a commit that referenced this issue Mar 6, 2018
Connection is a trait, so "*const Connection" is a fat pointer (128
bits, 64 bits pointing to the instance, 64 bits pointing to the vtable).

For some reason (a bug?), in Router::remove(), the connection passed as
argument has a vtable different from the (same) connection stored in the
"connections" field.

This has the unexpected consequence that for some x and y:

    println!("{:p} == {:p} ? {}", x, y, ptr::eq(x, y));

prints:

    0x7f87a8c3a618 == 0x7f87a8c3a618 ? false

Thus, the connection to remove was not found, leading to the panic
"Removing an unknown connection".

See <#61 (comment)>

Thank you to mbrubeck on IRC freenode/##rust, who helped a lot to
understand the issue.
@rom1v
Copy link
Collaborator

rom1v commented Mar 6, 2018

Fixed by c36fa4d.

My craziest Rust bug ever.

@rom1v rom1v closed this as completed Mar 6, 2018
@rom1v
Copy link
Collaborator

rom1v commented Mar 6, 2018

Thanks a lot @2Beaucoup for your help.

@aruediger
Copy link

Nice! Will give it a try tomorrow.

rom1v added a commit that referenced this issue Mar 6, 2018
Connection is a trait, so "*const Connection" is a fat pointer (128
bits, 64 bits pointing to the instance, 64 bits pointing to the vtable).

For some reason (a bug?), in Router::remove(), the connection passed as
argument has a vtable different from the (same) connection stored in the
"connections" field.

This has the unexpected consequence that for some x and y:

    println!("{:p} == {:p} ? {}", x, y, ptr::eq(x, y));

prints:

    0x7f87a8c3a618 == 0x7f87a8c3a618 ? false

Thus, the connection to remove was not found, leading to the panic
"Removing an unknown connection".

See <#61 (comment)>

Thanks to mbrubeck on IRC freenode/##rust, who helped a lot to
understand the issue.
rom1v added a commit that referenced this issue Mar 6, 2018
Connection is a trait, so "*const Connection" is a fat pointer (128
bits, 64 bits pointing to the instance, 64 bits pointing to the vtable).

For some reason (a bug?), in Router::remove(), the connection passed as
argument has a vtable different from the (same) connection stored in the
"connections" field.

This has the unexpected consequence that for some x and y:

    println!("{:p} == {:p} ? {}", x, y, ptr::eq(x, y));

prints:

    0x7f87a8c3a618 == 0x7f87a8c3a618 ? false

Thus, the connection to remove was not found, leading to the panic
"Removing an unknown connection".

Instead, compare only the data part, by casting the fat pointers to thin
pointers.

See <#61 (comment)>

Thanks to mbrubeck on IRC freenode/##rust, who helped a lot to
understand the issue.
rom1v added a commit that referenced this issue Mar 6, 2018
Connection is a trait, so "*const Connection" is a fat pointer (128
bits, 64 bits pointing to the instance, 64 bits pointing to the vtable).

For some reason (a bug?), in Router::remove(), the connection passed as
argument has a vtable different from the (same) connection stored in the
"connections" field.

This has the unexpected consequence that for some x and y:

    println!("{:p} == {:p} ? {}", x, y, ptr::eq(x, y));

prints:

    0x7f87a8c3a618 == 0x7f87a8c3a618 ? false

Thus, the connection to remove was not found, leading to the panic
"Removing an unknown connection".

Instead, compare only the data part, by casting the fat pointers to thin
pointers.

See <#61 (comment)>

Thanks to mbrubeck on IRC freenode/##rust, who helped a lot to
understand the issue.
@aruediger
Copy link

works like a charm. :)

@aruediger
Copy link

FYI: these are the currently logged errors for a sample session:

Cannot read: [ConnectionRefused] Connection refused (os error 61)
Cannot read: [Other] Network is unreachable (os error 51)
Cannot read: [TimedOut] Operation timed out (os error 60)
Cannot read: [ConnectionReset] Connection reset by peer (os error 54)
Cannot write: [BrokenPipe] Broken pipe (os error 32)
Unexpected first packet 4161411899; acking 773087661; flags=24

@rom1v
Copy link
Collaborator

rom1v commented Mar 7, 2018

@2Beaucoup yes, the crash occurred after network socket errors.

Now it does not crash anymore, but you still have network socket errors (your computer cannot connect to the requested server) :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants