Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid calls to (*os.File).Fd() and operations on raw file descriptor ints #167

Merged
merged 6 commits into from
Oct 28, 2023

Conversation

sio
Copy link
Contributor

@sio sio commented Apr 28, 2023

Several users (me included) were surprised by blocking i/o on ptmx file descriptor.

ptmx.Read() would hang indefinitely and could not have been interrupted neither by (*os.File).SetDeadline nor by (*os.File).Close. This could lead to goroutine leaks (if Read() was called in goroutine) or to whole application hanging indefinitely with no way to gracefully unblock.

In addition to unwanted freezes, using (*os.File).Fd() could lead to data races:

This PR removes all calls to (*os.File).Fd() which would implicitly set file descriptor into blocking mode (go doc, go source). Most such calls were related to ioctl() function, so I added a wrapper that takes (*os.File) instead of raw fd and uses (*os.File).SyscallConn() under the hood. The only other place where Fd() was used is Solaris-specific - I've added a SyscallConn() workaround there too.

There still remain a few places where raw fds are passed around in pty_darwin.go/pty_freebsd.go/pty_solaris.go, but all of them are limited to pty opening phase and are unlikely be a source of data races (in my understanding).

I've also added tests to detect blocked i/o (based on example provided by @gcrtnst in #162). As to be expected, these tests fail on current master branch and succeed after applying this PR.

Caveats:

  • If (*os.File).SyscalConn fails we fall back to old racy/blocking behavior
  • darwin intentionally uses blocking i/o for ptmx (#52, #53, golang/go#22099). I did not touch that, so the tests for unblocking are not executed there. Still, this PR is beneficial to Darwin users because of removal of data races.
  • (*os.File).SyscallConn was introduced in go1.12. Older platforms will continue using original racy/blocking behavior.

sio added 6 commits April 28, 2023 09:23
(*os.File).Fd() implicitly sets file descriptor into blocking mode [1], [2]
We avoid that by calling (*os.File).SyscallConn instead.
This method was added in go1.12 (released on 2019-02-25)

[1]: https://pkg.go.dev/os#File.Fd
[2]: https://cs.opensource.google/go/go/+/refs/tags/go1.20.3:src/os/file_unix.go;l=80-87
This filename trick forces git to render previous commit diff
in a more comprehensible form
When compiled with go older than 1.12 creack/pty will not include a fix
for blocking Read() and will be prone to data races - but at least it will work

For more information see issues: creack#88 creack#114 creack#156 creack#162
@nichtverstehen
Copy link

@sio Kudos on getting to the bottom of this! Would be great to see this change merged. @creack, are there any concerns with it?

@creack
Copy link
Owner

creack commented Oct 28, 2023

My main concern would be that some project have been using the old behavior for 10+ years. It is probably best to make this part of the v2 alongside windows support which has other breaking changes.

@sio
Copy link
Contributor Author

sio commented Oct 28, 2023

I'm not sure how deadlocks and data races could be considered a feature someone would rely upon, but then there is always https://xkcd.com/1172/

Jokes aside, I don't mind bumping major version if you think that's required. In my opinion, no existing workflow should be broken by this change because it enables canceling Read() which was not possible before, but it does not cancel anything automatically unless instructed by user. So default behavior stays the same, we just have one less failure mode.

@creack
Copy link
Owner

creack commented Oct 28, 2023

It has been a long long time since, but back when I used it within Docker, I recall relying on the blocking behavior for something.

I may be wrong, but just to be safe and as we plan a v2 anyway, might as well keep the existing behavior for v1 and this PR will be the first part of v2.

That being said, if you are confident I am wrong and chaning the blocking flag of the fd will have no impact, I can push it as part of the last patch of v1.

@creack
Copy link
Owner

creack commented Oct 28, 2023

After more testing, I think you are right. I'll merge this as part of v1.

@creack
Copy link
Owner

creack commented Oct 28, 2023

Thank you for the big contribution!

@creack creack merged commit 1985fd4 into creack:master Oct 28, 2023
@nichtverstehen
Copy link

nichtverstehen commented Oct 28, 2023

Thanks for the comments!

Any change has a potential to be a breaking change for someone but I have a few arguments to merge this nonetheless :). FWIW, I am not the author of the PR, I am a user of this library.

  1. The issue is quite a subtle one, so many users might not realize that they are affected. In fact, we as users didn't realize for a long time (we were just silently leaking processes).
  2. A singnificant number of open issues has this as the root cause.
  3. The behavior actually has changed since 10 years ago!

Until go1.8 (2017) the IO subsystem used blocking IO for normal files. So calling File.Fd() didn't affect subsequent IO:
https://cs.opensource.google/go/go/+/refs/tags/go1.8:src/os/file_unix.go;l=43-48

Go1.9 changed file IO use the poller:
https://cs.opensource.google/go/go/+/c05b06a12d005f50e4776095a60d6bd9c2c91fac

Now File.Fd() changed the FD to blocking mode:
https://cs.opensource.google/go/go/+/refs/tags/go1.9:src/os/file_unix.go;l=66-68;bpv=1;bpt=0

This affects subsequent IO operations since the file is never deregistered from the poll subsystem correctly (maybe a bug in Go stdlib?).

So Go1.9 caused this regression.

Update. My eloquence was not needed. Thanks folks for the resolution! Hope to see a minor release soon :)

renovate bot referenced this pull request in kairos-io/provider-kairos Oct 28, 2023
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/creack/pty](https://togithub.com/creack/pty) | require |
patch | `v1.1.19-0.20220421211855-0d412c9fbeb1` -> `v1.1.20` |

---

### Release Notes

<details>
<summary>creack/pty (github.com/creack/pty)</summary>

### [`v1.1.20`](https://togithub.com/creack/pty/releases/tag/v1.1.20)

[Compare
Source](https://togithub.com/creack/pty/compare/v1.1.19...v1.1.20)

#### What's Changed

- Avoid calls to (\*os.File).Fd() and operations on raw file descriptor
ints by [@&#8203;sio](https://togithub.com/sio) in
[https://github.com/creack/pty/pull/167](https://togithub.com/creack/pty/pull/167)

**Full Changelog**:
creack/pty@v1.1.19...v1.1.20

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "after 11pm every weekday,before 7am
every weekday,every weekend" in timezone Europe/Brussels, Automerge - At
any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/kairos-io/provider-kairos).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zMS41IiwidXBkYXRlZEluVmVyIjoiMzcuMzEuNSIsInRhcmdldEJyYW5jaCI6Im1haW4ifQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
@creack
Copy link
Owner

creack commented Oct 30, 2023

@sio, this PR seems to have broken ioctls on Linux. Still digging but about to revert the PR.

Simplest example would be the Getsize function always returning 0 and Setsize having no effects.

Edit: Might have been something wrong on my end, can't reproduce the issue. Will not revert.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants