Trippy BSOD on NetBSD when resizing window #276

Closed
fujiapple852 opened this issue Aug 19, 2022 · 30 comments · Fixed by #670
Labels
bug Something isn't working netbsd
Milestone

Comments

@fujiapple852
Owner

fujiapple852 commented Aug 19, 2022

Starting trippy (0.6.0-dev) and resizing the window will result in a BSOD with the error:

IO error: Interrupted system call (os error 4)
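For context: on Unix-like systems OS error 4 is `EINTR`, which Rust's standard library reports as `std::io::ErrorKind::Interrupted`. A minimal sketch of recognizing that case follows. The helper names `eintr_error` and `is_interrupted` are made up for illustration and are not part of trippy, and the raw code 4 is an assumption that holds on NetBSD, Linux and macOS but not on Windows.

```rust
use std::io;

/// Hypothetical helper: rebuilds the reported error from its raw OS code.
/// Assumes a Unix-like target where EINTR is 4 (NetBSD, Linux, macOS).
fn eintr_error() -> io::Error {
    io::Error::from_raw_os_error(4)
}

/// Hypothetical helper: true when an I/O error is the
/// "Interrupted system call" case rather than a genuine failure.
fn is_interrupted(err: &io::Error) -> bool {
    err.kind() == io::ErrorKind::Interrupted
}
```

Calling `is_interrupted(&eintr_error())` on such a target should report `true`, which is the condition a caller would check before deciding whether the error is fatal.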
@fujiapple852 fujiapple852 added bug Something isn't working netbsd labels Aug 19, 2022
@fujiapple852 fujiapple852 self-assigned this Aug 19, 2022
@fujiapple852 fujiapple852 added this to the 0.6.0 milestone Aug 19, 2022
@fujiapple852
Owner Author

Also occurs with the existing 0.5.0 release and so not a new regression. @0323pin do you recall if you ever saw this issue?

@fujiapple852 fujiapple852 changed the title BSOD on NetBSD when resizing window Trippy BSOD on NetBSD when resizing window Aug 19, 2022
@0323pin
Contributor

0323pin commented Aug 19, 2022

@fujiapple852 I don't, because I do not resize windows; I use leftwm (a tiling window manager).

But, I can reproduce the error with 0.5.0 when I do a window resize.

@fujiapple852
Owner Author

Thank you @0323pin! Given it isn't a regression bug I'll proceed with the 0.6.0 release and investigate this with a view to fixing in 0.7.0.

@fujiapple852 fujiapple852 modified the milestones: 0.6.0, 0.7.0 Aug 19, 2022
@0323pin
Contributor

0323pin commented Aug 19, 2022

No worries, @fujiapple852
Weird that this happens, though.

@0323pin
Contributor

0323pin commented Aug 19, 2022

@fujiapple852 Just so you know, I've updated the package already but, didn't have the time to merge it. Most probably late this evening or, tomorrow early morning.

@fujiapple852 fujiapple852 modified the milestones: 0.7.0, 0.8.0 Mar 25, 2023
@fujiapple852
Owner Author

fujiapple852 commented May 12, 2023

Added #552 and #153 to assist with diagnosing this issue.

@fujiapple852
Owner Author

Best guess is that this relates to not handling EINTR properly.

https://unix.stackexchange.com/questions/509375/what-is-interrupted-system-call
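If that guess is right, the usual remedy is to retry a call that fails with `EINTR` instead of treating it as fatal. The sketch below shows that general pattern in Rust. `retry_on_eintr` is a hypothetical helper written for this illustration; it is not trippy's code and not necessarily what the eventual fix does.

```rust
use std::io;

/// Hypothetical helper: runs `op` repeatedly until it either succeeds or
/// fails with an error other than `ErrorKind::Interrupted` (EINTR).
fn retry_on_eintr<T>(mut op: impl FnMut() -> io::Result<T>) -> io::Result<T> {
    loop {
        match op() {
            // A signal (for example one raised by a terminal resize) arrived
            // during the call; nothing actually failed, so try again.
            Err(err) if err.kind() == io::ErrorKind::Interrupted => continue,
            // Success or a genuine error: hand it back to the caller.
            other => return other,
        }
    }
}
```

A caller would wrap the blocking operation in a closure, for example `retry_on_eintr(|| reader.read(&mut buf))`, where `reader` and `buf` are placeholders for whatever I/O the program performs.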

@fujiapple852 fujiapple852 removed this from the 0.8.0 milestone May 13, 2023
@0323pin
Contributor

0323pin commented May 13, 2023

Sorry for the slow reply, been AFK for a few days.

I've asked internally, if we are missing something obvious here.

@fujiapple852
Owner Author

fujiapple852 commented May 14, 2023

Thanks @0323pin, there is nothing needed right now, I've managed to get an AWS NetBSD instance working again and so I should be able to debug this now.

Edit: I spoke too soon, I'm unable to build the latest master (or even the previous 0.7.0) on my AWS environment. I can install Rust (1.64, a bit old but should be ok) but it fails to build some core Rust packages.

error: could not compile `syn`

Caused by:
  process didn't exit successfully: `rustc --crate-name syn --edition=2018 /root/.cargo/registry/src/github.com-1ecc6299db9ec823/syn-1.0.109/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 --cfg 'feature="clone-impls"' --cfg 'feature="default"' --cfg 'feature="derive"' --cfg 'feature="extra-traits"' --cfg 'feature="full"' --cfg 'feature="parsing"' --cfg 'feature="printing"' --cfg 'feature="proc-macro"' --cfg 'feature="quote"' --cfg 'feature="visit"' --cfg 'feature="visit-mut"' -C metadata=e75730d19c6f40fa -C extra-filename=-e75730d19c6f40fa --out-dir /root/trippy/target/debug/deps -L dependency=/root/trippy/target/debug/deps --extern proc_macro2=/root/trippy/target/debug/deps/libproc_macro2-2836222ef2fb938b.rmeta --extern quote=/root/trippy/target/debug/deps/libquote-fd31c030121713c2.rmeta --extern unicode_ident=/root/trippy/target/debug/deps/libunicode_ident-7347f110e8245f8d.rmeta --cap-lints allow --cfg syn_disable_nightly_tests` (signal: 9, SIGKILL: kill)

I'm using the public AMI ami-041f8cb5cca00f023 which describes itself as NetBSD 9 arm64 2021-07-01a and NetBSD/evbarm-aarch64 9. It is also described as arm64 but in reality it is an evbarm.

It appears to be NetBSD 9.2_STABLE, as that is what pkgin tells me and so I've configured it to pull packages from http://ftp.netbsd.org/pub/pkgsrc/packages/NetBSD/aarch64/9.2/All. I do see an evbppc package repo but not evbarm. There is also https://wiki.netbsd.org/ports/evbarm/

Aside: I can pkgin install trippy and run it without issue, I just can't build it!

@0323pin
Contributor

0323pin commented May 14, 2023

I'm unable to build the latest master (or even the previous 0.7.0) ...

If you want, I can do a test build on bare-metal x86_64.

@fujiapple852
Owner Author

@0323pin that would be useful, thanks. I'm going to release 0.8.0 soon so it would be good to confirm everything still works before I do.

Apart from checking that it builds from master, I was also hoping to run against a branch with some new logging enabled to capture more details about the failure above. Would you be able to do that if it's not too much trouble?

git checkout feat-error-context
cargo build
sudo target/debug/trip example.com -m silent -v --log-span-events full > trippy.log

When running, resize the window and it should fail with the Interrupted system call error, then send me the log. Note that the above will generate a lot of output, so I'd only run it for a few seconds before resizing the window.

@0323pin
Contributor

0323pin commented May 14, 2023

I was also hoping to run against a branch with some new logging enabled to capture more details about the failure above. Would you be able to do that if it's not too much trouble?

Yes, I can do this, but I've already built from commit 5b5ca30:

~> trip -V
trip 0.8.0-dev
~> uname -v
NetBSD 10.99.4 (GENERIC) #0: Fri May 12 13:29:41 UTC 2023  [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC
~> pkgin list | grep rust
rust-1.69.0          Safe, concurrent, practical language

[Screenshot: 2023-05-14-130643_1366x768_scrot]

Resizing the window causes the expected failure. I'll try to find the time to build with the logging feature enabled later today. But, at least you know it builds and runs.

@0323pin
Contributor

0323pin commented May 14, 2023

target/debug/trip example.com -m silent -v --log-span-events full > trippy.log with a window resize sent by e-mail.

Hopefully it comes through, it's 19 MB. Let me know if it doesn't get to you and I'll host it on git.

@fujiapple852
Owner Author

Thanks, I got the file.

From the trace it doesn't look like it crashed, so I guess it only crashes when you resize when the TUI is running and the backend tracing is running, which the current tracing code doesn't support.

To fix this I think I'll need a NetBSD env I can use to debug this directly (I also tried a VM in VirtualBox without much luck). What would be a good forum to ask for help with this?

Anyway, good news that the latest code builds OK, so I can proceed with the release.

@0323pin
Contributor

0323pin commented May 15, 2023

I guess it only crashes when you resize when the TUI is running and the backend tracing is running, which the current tracing code doesn't support.

Yeah, I tried that too, but got "verbose option is not available in TUI mode" or something along those lines.

To fix this I think I'll need a NetBSD env ... I tried a VM in VirtualBox without much luck

I've never used VirtualBox, I always use QEMU

What would be a good forum to ask for help for this?

www.unitedbsd.com
Check this thread, https://www.unitedbsd.com/d/348-how-to-use-qemu-to-run-netbsd-91/5
Yes, I'm active there :)

EDIT: 0.8.0 merged, https://mail-index.netbsd.org/pkgsrc-changes/2023/05/15/msg274784.html

@fujiapple852
Owner Author

fujiapple852 commented Aug 31, 2023

@0323pin I was eventually able to get NetBSD running locally on macOS (Intel Mac) using qemu.

I installed qemu:

brew install qemu

I downloaded:

http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-9/202308261920Z/images/NetBSD-9.3_STABLE-amd64.iso

I created the VM with:

qemu-img create virtualmachine.img 10G

For the initial install I ran:

qemu-system-x86_64 -boot d -cdrom ~/Downloads/NetBSD-9.3-amd64.iso -enable-kvm -m 3G -hda virtualmachine.img

...and followed the prompts (picked defaults for most things, enabled sshd and ntpd)

After installation, I'm running it with (I've tried a few variations on this to try and speed things up):

qemu-system-x86_64 -m 3G -M q35 -cpu host -smp 4 -hda virtualmachine.img -accel hvf

I then installed pkgin by uncommenting the PKG_PATH line in .profile. I had to change from the default https://cdn.netbsd.org to http://ftp.netbsd.org to get it to work.

I don't know why but pkg_add & pkgin run really slowly (multi-minute to search for or install a package). Not sure if it is a qemu issue or not, everything else seems to run at a sensible speed.

I then installed trippy version 0.8.0:

pkgin install trippy

It runs, though I'm obviously missing some setup as it looks like the following:

[Screenshot: 2023-08-31 at 11 50 35 PM]

I don't see any ICMP traffic being received, which I guess is some qemu config I need to tweak. A standard traceroute seems to have the same problem so I suspect it is config.

Because I'm working directly in the console I don't have any way to trigger the bug with a window "resize", I guess I need some kind of graphical environment?

I wasn't able to ssh into the VM from my Mac host, so I'm just working in the qemu window that pops up.

@0323pin
Contributor

0323pin commented Aug 31, 2023

Hi, great that you have managed to install it :)

I don't know why but pkg_add & pkgin run really slowly (multi-minute to search for or install a package).

I've never experienced this. Ok, it's not as fast as xbps but, it's not slower than apt. Location/mirror?

Because I'm working directly in the console I don't have any way to trigger the bug with a window "resize", I guess I need some kind of graphical environment?

There are two window managers (kind of anyway, twm and the default ctwm) in the base install, as well as Xorg and three shells.

@0323pin
Contributor

0323pin commented Aug 31, 2023

@fujiapple852 Are you on Matrix? I'm really short of time today but we could set up a time to chat through your issues.

@0323pin
Contributor

0323pin commented Sep 1, 2023

@fujiapple852 On second thought ...

If your intention is only to debug the window-resize issue on a disposable qemu-vm, which you do not intend to keep, NetBSD has everything you need on the base install.

Xorg is part of base and ctwm the default WM, so if you just run startx from the tty, you will be on a graphical env
A bare-bones one with a white xterm (create .Xresources, if you want to define other colors) but, nevertheless a graphic env 😄

Sorry, if I can't give you proper .xinitrc and .Xresources files but, I haven't used modified defaults in ages. These days, I'm using alacritty built from git-HEAD and not xterm (no need for .Xresources) and elvish also built from git-HEAD as my default shell, configured to start a graphical env straight from login.

If you, by any chance, run into issues with .Xauthority (not able to start the X-server) make sure your /etc/hosts is properly configured. It should contain the proper machine name (the name you gave your host during install) and your DNS domain and it should look like this:

#	$NetBSD: hosts,v 1.9 2013/11/24 07:20:01 dholland Exp $
#
# Host name database.
#
# This file contains addresses and aliases for local hosts whose names
# need to be resolvable during system boot; typically this includes only
# the address and FQDN for this machine's hostname.
#
# By default this file is consulted before DNS, so adding additional
# material here that then becomes out of date can lead to confusion.
# See nsswitch.conf(5).
#
::1			mybox.my.domain mybox
000.0.0.0		mybox.my.domain mybox
#
# RFC 1918 specifies that these networks are "internal":
# 10.0.0.0        -   10.255.255.255  (10/8 prefix)
# 172.16.0.0      -   172.31.255.255  (172.16/12 prefix)
# 192.168.0.0     -   192.168.255.255 (192.168/16 prefix)

Note: I posted this on a forum a long time ago, so I've hidden the numbers; 000.0.0.0 is actually something else. Don't touch the default, just fix the domain name and hostname.

Now, you should be able to resize your terminal and reproduce the issue.

@fujiapple852
Owner Author

Are you on Matrix?

@0323pin I am now! @fujiapple852:matrix.org

@fujiapple852
Owner Author

If your intention is only to debug the window-resize issue on a disposable qemu-vm, which you do not intend to keep, NetBSD has everything you need on the base install.

Yeah, disposable is fine. I'd like to be able to fire up trippy on NetBSD and run a basic test before each release like I used to do on a cloud env.

Xorg is part of base and ctwm the default WM, so if you just run startx from the tty, you will be on a graphical env

Wow, that just...worked :)

From the graphical env I was able to start trippy, resize the window and observe the crash. Nice!

@c-git
Collaborator

c-git commented Sep 1, 2023

Yeah, disposable is fine. I'd like to be able to fire up trippy on NetBSD and run a basic test before each release like I used to do on a cloud env.

Is there an easy way to add this to the pre-release CI?

@c-git
Collaborator

c-git commented Sep 1, 2023

I also joined matrix @one.----:matrix.org

@0323pin
Contributor

0323pin commented Sep 1, 2023

Yeah, disposable is fine. I'd like to be able to fire up trippy on NetBSD and run a basic test before each release like I used to do on a cloud env.

Is there an easy way to add this to the pre-release CI?

I know nothing about CIs but, there's now support for NetBSD in https://cirrus-ci.com/build/6221284932583424

@fujiapple852
Owner Author

fujiapple852 commented Sep 1, 2023

@0323pin provisional fix available in #670

It seems to fix it in my qemu-vm environment, would you care to try it?

One caveat here is that I've had to temporarily downgrade clap, as its latest version requires Rust 1.70 and the latest version available on NetBSD 9.3 appears to be 1.69 (the latest Rust release is 1.72). Once NetBSD bumps to 1.70 I can merge this fix, which is fine as I don't plan a trippy release for a while anyway. If needed I could backport the fix to 0.8.x and release it, but I think that may be overkill.
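The version constraint described above amounts to comparing the toolchain a platform ships against the minimum a dependency needs. A rough sketch of that comparison follows; `parse_major_minor` and `meets_minimum` are hypothetical helpers written for this note, and the version strings used with them are simply the ones mentioned in this thread.

```rust
/// Hypothetical helper: parses "major.minor[.patch]" into (major, minor).
/// Returns None when either of the first two components is not a number.
fn parse_major_minor(version: &str) -> Option<(u32, u32)> {
    let mut parts = version.trim().split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Hypothetical helper: Some(true) when `available` is at least `required`,
/// Some(false) when it is older, None when either string cannot be parsed.
fn meets_minimum(available: &str, required: &str) -> Option<bool> {
    // Tuples compare lexicographically: major first, then minor.
    Some(parse_major_minor(available)? >= parse_major_minor(required)?)
}
```

With the numbers from this thread, `meets_minimum("1.69.0", "1.70")` gives `Some(false)`, which is why the dependency had to be held back, while `meets_minimum("1.71.1", "1.70")` gives `Some(true)`.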

@fujiapple852
Owner Author

I know nothing about CIs but, there's now support for NetBSD in https://cirrus-ci.com/build/6221284932583424

That's awesome!

Now I wonder if it allows raw sockets and ICMP (GH actions do not...)

@0323pin
Contributor

0323pin commented Sep 1, 2023

Thanks! I'll take it for a spin after the weekend.
Sorry, family visit.

Did you see my comment on the Rust version in your branch commit?

@fujiapple852
Owner Author

fujiapple852 commented Sep 1, 2023

@0323pin yes I did, that's fine. No rush here, we've waited this long (over a year!) it can wait a few more weeks :)

@0323pin
Contributor

0323pin commented Sep 2, 2023

@fujiapple852 It works 😄

[Screenshot: 2023-09-02-173905_1366x768_scrot]

Built with Rust-1.71.1, resize without crash.

@fujiapple852
Owner Author

Thanks! The fix has been merged and will go into the 0.9.0 release.

@fujiapple852 fujiapple852 removed their assignment Sep 2, 2023
@fujiapple852 fujiapple852 added this to the 0.9.0 milestone Sep 2, 2023
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Dec 1, 2023
[0.9.0] - 2023-11-30

Added

- Added support for tracing flows
  ([#776](fujiapple852/trippy#776))
- Added support for `icmp` extensions
  ([#33](fujiapple852/trippy#33))
- Added support for `MPLS` label stack class `icmp` extension
  objects ([#753](fujiapple852/trippy#753))
- Added support for
  [paris](https://github.com/libparistraceroute/libparistraceroute) ECMP
  routing for `IPv6/udp` ([#749](fujiapple852/trippy#749))
- Added `--unprivileged` (`-u`) flag to allow tracing without elevated
  privileges (macOS only) ([#101](fujiapple852/trippy#101))
- Added `--tui-privacy-max-ttl` flag to hide host and IP details for low ttl
  hops ([#766](fujiapple852/trippy#766))
- Added `toggle-privacy` (default: `p`) key binding to show or hide private
  hops ([#823](fujiapple852/trippy#823))
- Added `toggle-flows` (default: `f`) key binding to show or hide tracing
  flows ([#777](fujiapple852/trippy#777))
- Added `--dns-resolve-all` (`-y`) flag to allow tracing to all IPs resolved
  from DNS lookup entry ([#743](fujiapple852/trippy#743))
- Added `dot` report mode (`-m dot`) to output hop graph in Graphviz `DOT`
  format ([#582](fujiapple852/trippy#582))
- Added `flows` report mode (`-m flows`) to output a list of all unique tracing
  flows ([#770](fujiapple852/trippy#770))
- Added `--icmp-extensions` (`-e`) flag for parsing `IPv4`/`IPv6` `icmp`
  extensions ([#751](fujiapple852/trippy#751))
- Added `--tui-icmp-extension-mode` flag to control how `icmp` extensions are
  rendered ([#752](fujiapple852/trippy#752))
- Added `--print-config-template` flag to output a template config
  file ([#792](fujiapple852/trippy#792))
- Added `--icmp` flag as a shortcut for `--protocol icmp`
  ([#649](fujiapple852/trippy#649))
- Added `toggle-help-alt` (default: `?`) key binding to show or hide
  help ([#694](fujiapple852/trippy#694))
- Added panic handling to Tui
  ([#784](fujiapple852/trippy#784))
- Added official Windows `scoop` package
  ([#462](fujiapple852/trippy#462))
- Added official Windows `winget` package
  ([#460](fujiapple852/trippy#460))
- Release `musl` Debian `deb` binary asset
  ([#568](fujiapple852/trippy#568))
- Release `armv7` Linux binary assets
  ([#712](fujiapple852/trippy#712))
- Release `aarch64-apple-darwin` (aka macOS Apple Silicon) binary
  assets ([#801](fujiapple852/trippy#801))
- Added additional Rust Tier 1 and Tier 2 binary assets
  ([#811](fujiapple852/trippy#811))

Changed

- [BREAKING CHANGE] `icmp` extension object data added to `json` and `stream`
  reports ([#806](fujiapple852/trippy#806))
- [BREAKING CHANGE] IPs field added to `csv` and all tabular
  reports ([#597](fujiapple852/trippy#597))
- [BREAKING CHANGE] Command line flags `--dns-lookup-as-info` and
  `--tui-preserve-screen` no longer require a boolean
  argument ([#708](fujiapple852/trippy#708))
- [BREAKING CHANGE] Default key binding for `ToggleFreeze` changed from `f`
  to `ctrl+f` ([#785](fujiapple852/trippy#785))
- Always render AS lines in hop details mode
  ([#825](fujiapple852/trippy#825))
- Expose DNS resolver module as part of `trippy` library
  ([#754](fujiapple852/trippy#754))
- Replaced unmaintained `tui-rs` crate with `ratatui` crate
  ([#569](fujiapple852/trippy#569))

Fixed

- Reverse DNS lookup not working in reports
  ([#509](fujiapple852/trippy#509))
- Crash on NetBSD during window resizing
  ([#276](fujiapple852/trippy#276))
- Protocol mismatch causes tracer panic
  ([#745](fujiapple852/trippy#745))
- Incorrect row height in Tui hop detail navigation view for hops with no
  responses ([#765](fujiapple852/trippy#765))
- Unnecessary socket creation in certain tracing modes
  ([#647](fujiapple852/trippy#647))
- Incorrect byte order in `IPv4` packet length calculation
  ([#686](fujiapple852/trippy#686))

3 participants