-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
doc: clarify the behavior of readFile with fd #10853
Conversation
Makes it explicit that readFile doesn't seek to the beginning.
We probably need to get a feeling for user expectations on this, and if the current behaviour is unexpected, weigh that against the breakage of changing it? |
Are you sure you quoted the right sentence? I don't see the connection.
Well we have the following two facts:
|
Oh, sorry. What I basically meant to say was “I’m not sure whether this fixes #9699 or not because we haven’t made any decision on behavioural changes yet.” |
A bug was reported against fs because when reading from a fd, it did not read the whole file. I think its weird to not read the whole file. I also think the seek could be wrapped with try/catch and ignored on |
Can you link to it? |
@sam-github Um, yeah, that's my PR. Have you read my original comment? |
@jorangreef reported a bug in #9671 and you reference it. You also claim that "no one has complained about this behaviour as far as I know" in #10853 (comment). And in this PR you just document the weird behaviour based on "no one has complained". @seishun Seems contradictory to me, perhaps you can clear up my confusion? |
@sam-github @jorangreef's issue isn't specifically about |
Can you link to the PR or issue where this is happening? |
I was planning to look into it after this PR is merged. |
So here are some of my thoughts: rants? The history of this is a tangled mess, reaching from here, back to #10809, to #3163, to nodejs/node-v0.x-archive#8471 and nodejs/node-v0.x-archive#8522, and seems to have had some fd leaks and other quality problems on the way. Adding fd support was thought of as a "sure, why not?", but we didn't adequately consider what that means when the fd is not currently seeked to start-of-file, or what to do on an unseekable fd, or the maintenance cost, which is clearly At this point, we have tests running on stdin, which fail, but the tests are probably invalid, looks like they were written agains stdin without documenting what the behaviour is with stdin, or what it even means to "read file" from a fd, when the fd is a pipe. And now we have an API that is complex to describe, and doesn't really do the same thing when called with a fd and a path. This is a classic example of an API that should never have been implemented in node, it could perfectly reasonably be a npm module, or two... or three, one for every variant of thought on how it should work, but since its a node API, we have to state a definitive position, and possibly break backwards compat in the process. I'm personally tempted to revert the entire "allow fd" feature, then let people debate how it should work with fds adequately before it gets merged in again. And explain why a feature to "read a file" should be able to read a pipe, or read any fd, really. Same for writeFile. What does it even do when given a fd? The docs say only:
Clearly, they weren't updated when the fd feature was added, I very much doubt that when passing a fd to writeFile, it replaces the file if it already exists, even if the fd was to a file, which it doesn't have to be. So, at this point, what do we do? I don't know. If the best we can do is doc how it works now, then so it goes. @seishun Thanks for taking a shot at cleaning this up. I look forward to seeing what you can do. Consider my suggestion for reverting the "fd" feature, please, but if you go the doc route, look at writeFile, too, it doesn't document what it does with an fd (and may not test it, either). Also, you mentioned positional reads on Windows above. I happen to know that pread/pwrite don't change the fd position on Unix, and that fs.read/write with positions are implemented in their terms, but the docs for fs.read and fs.write say absolutely nothing about what happens to the current fd position, and in particular, don't say that what happens depends on Windows vs. Unix. So there is plenty to doc there, too. And possibly to test. |
My issue #9671 was specifically about |
If it was specifically about |
I can certainly see the point behind “this should never have been introduced in the first place” but I doubt that removing it is feasible. |
@sam-github I agree with your summary of how we got here and the history of readFile() and writeFile(). I would like to add a few points to yours:
readFile(fd) is useful for working with long-lived files, where the open call can be amortized across the life of the program. For example, a key-value store might open several files at startup, then use read() and write() calls on the file at various positions, and then a compaction job might use readFile(fd) to read in the entire file, which should be read from offset 0. I think readFile(fd) for regular files is worth keeping, since such a program would otherwise have to implement the same itself, and the implementation has proved difficult enough. Perhaps we are nearly there now. If anything should be deprecated, I think it should be readFile(fd) for special files. Alternatively, this could be special-cased. |
Yes, I have an issue with that. Sure, I discovered it when working with sparse files on Windows. Would it have made a difference if I found the problem when working with sparse files on Linux? The issue is with readFile specifically not reading the whole file, regardless of platform. Or what did you think my issue was about? |
readFile('/dev/stdin') has always worked afaik. fd vs path has nothing to do with regularness.
Perhaps you would like to update the issue description then. As it stands now, the issue in |
I was referring to a regular file (which can be accessed by fd or path) vs a special file (stdin, socket etc.).
If it helps, sure.
That's besides the point. My test case showed that readFile does not read the whole file at least on Windows. I care that readFile reads the whole file on all platforms. The culprit may or may not be fs.read but that's an implementation detail. I do not mind too much how readFile works, so long as it does what it claims to do. |
No, that's not an implementation detail, but a precondition in your test case. Your test case basically uses If your issue indeed doesn't have anything to do with |
|
It doesn't assume it's 0, it just starts reading from the current position. The docs are unclear on whether this behavior is correct, so this PR fixes the docs. If you think the behavior needs to be changed instead, then please take a look at the options I listed in #10809 (comment). |
That's an understanding of readFile contributed by this PR. According to the present docs, readFile is supposed to "asynchronously read the entire contents of a file". If "it just starts reading from the current position" then readFile must be assuming that the current position is 0.
The docs are clear as to "asynchronously read the entire contents of a file". Where the docs are not clear is that readFile and writeFile were designed for regular files which can be seeked. Support for special files happened to work while readFile made assumptions about the fd's seek position.
I would go for option 2, adding special casing for special files, or option 3, deprecating support for special files, pending a review of whether readFile(fd)/writeFile(fd) is in fact ever used for these. This PR will likely not avert corruption issues for programs such as the key-value store use-case mentioned above, since readFile has been available for a while in Node and I would wager that most people expect readFile(path) and readFile(fd) to return the same data when working with regular files. |
Well apparently opinions differ, see for instance @addaleax's comment in #10809 (comment).
I'm not a huge fan of either of these options. Option 2 would make the behavior inconsistent between seekable and non-seekable files. Option 3 would introduce an arbitrary limitation by making |
Not really, Option 2 would read from the beginning of a seekable file, and read from the current position in a non-seekable file. That would match user expectations in both cases. |
Are you sure this matches user expectations? As far as I know, yours was the first issue that brought it up, and you'd never have noticed it if you used Linux. Which shows that either most people are fine with the current behavior, or no one really uses |
Of course. Or do you think reading half a regular file would match user expectations better? |
Why do you say that? Problem is clearly reproduceable on Linux:
The file was read the first time. The file was not read completely the second time, clearly in contradiction of the docs, and I'm having a hard time docing this as "expected". Sometimes we find old node APIs, that have done something for a long time that isn't what we would want if we could do it over, and we have to bite the bullet and just document the behaviour. This isn't one of those cases, IMO, its a new API, and its not very common to call readFile with a fd. Allowing fds is not well justified for this API, as I've stated, but one possible justification is because you do intend to call readFile repeatedly, and don't want to pay the open price every time. This use pattern is currently not supported. I agree with #10853 (comment) At this point, reading entire files from position 0 for seekable fds, and doing best we can, reading from current position for unseekable fds is least surprising. The API is documented and called "readFile"/"read entire file", not "readRemaing"/"read rest of file". Under what conditions, given a seekable fd, would it be useful to read from current position to end of file? |
His test case in #9671 wasn't reproducible on Linux.
Okay, maybe you're right, but I'd like some other people to chime in. @addaleax and @cjihrig approved this PR so perhaps they have some arguments against your proposal. I assume you want to basically do the same as in 4444e73, but fall back to the old behavior with |
I liked 4444e73, that's why I approved it :-). I gather the problem is it adds a requirement that the fd be seekable/be preadable? If it fails on unseekable streams, I'm OK with that. What does it even mean to "Asynchronously reads the entire contents of a file." on an unseekable fd? But if we decide to define "entire contents" as "everything we can get" for unseekable streams, I'm OK with that, too, as long as we document it. |
I find this whole concept of “support” for different types of files a bit hard to grasp. Maybe it’s my UNIX background speaking (I have no idea how Windows handles things, but I believe it’s kind of same?), but in my mind one Great Thing about file descriptors is that they are universal; you can use the same code on all kinds of files and you’ll get about the same result. That’s also part of why I find the current behaviour to be the expected one; when I read the API description of Also, “Supporting” special files isn’t superfluous; I can definitely see applications using
Even if it matches user expectations, your description of that behaviour does sound like the definition of an inconsistency to me ;)
That sounds more like a good reason to implement
I’m not sure this is something that happens in The Real World a lot (then again nothing in this discussion happens a lot, I’d guess), but I can see an application reading the beginning of a file, to inspect a file header and the like, then deciding whether it wants to read the rest or not. |
@@ -1347,7 +1347,8 @@ added: v0.1.29 | |||
* `flag` {String} default = `'r'` | |||
* `callback` {Function} | |||
|
|||
Asynchronously reads the entire contents of a file. Example: | |||
Asynchronously reads the entire contents of a file. If a file descriptor is | |||
specified as the `file`, it will read from the current file position. Example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make it more explicit:
If a file descriptor is specified as the `file`, reading will begin from the current
file position and not from the beginning of the file.
To me, too. There is no definition of this API that is consistent with path and fd, since using args that are so different do different things, but you were -1 on removing the API. OK, I'm abstaining from voting on whether we just doc this or fix it. The pros and cons seem of equal weight. This kind of API probilem is one more example of why node should be decreasing its API surface, though. |
I don't think the difference between path and fd is the problem here. Imagine if
Me too. Should this be brought to CTC? |
If readFile only took a fd, it wouldn't have been called readFile :-) I don't have an opinion on whether it should hit the CTC. I'm OK with it going either way. @nodejs/ctc |
Updates on this one? |
@jasnell Still unclear whether the current behavior should be considered a bug or a feature. |
There hasn't been any activity here. I'm closing this. Feel free to reopen (or ping a collaborator) if closed in error. |
Makes it explicit that readFile doesn't seek to the beginning.
This is related to #9671, but that issue is actually a combination of two issues.
fs.readFile
reads from the current offset when called with anfd
instead of from the beginning.Fixing at least one of the above would fix @jorangreef's issue. I tried fixing the second one in #9699, but that broke Linux and Mac OS tests (see #10809). This PR clarifies that it's not a bug but a feature.
Not sure which of the above mentioned issues should be mentioned in the commit message, if any.
Checklist
Affected core subsystem(s)
doc