Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

custom WriteStream on writes end of stream #202

Closed
doowb opened this issue Aug 12, 2016 · 10 comments
Closed

custom WriteStream on writes end of stream #202

doowb opened this issue Aug 12, 2016 · 10 comments

Comments

@doowb
Copy link
Member

doowb commented Aug 12, 2016

When a stream is large enough, only the end is written to the file.

The code here is using this.pos which is set to null when the append flag is set. This is good for writeFile which writes the entire file at once, but it should be different inside the WriteStream.

I've tried just passing in null so the contents will be written to the end of the file and also, updating the this.pos in the onWrite callback. I think the latter is a better approach since the append flag may still be used.

I think updating the onWrite callback would look like this:

  function onWrite(writeErr) {
    if (writeErr) {
      self.destroy();
      callback(writeErr);
      return;
    }

    if (self.pos !== null) {
      self.pos += data.length;
    }
    callback();
  }

I can submit a PR if you'd like. I'm not sure the best approach for testing this.

Thanks to @criticalmash for discovering this and working with me on debugging.

@phated
Copy link
Member

phated commented Aug 12, 2016

@doowb that's interesting because I copied most of the logic from node's write stream. From my initial digging, I found that fs.write maintains the position internally if you don't pass a position. Can you provide an example repo of this?

@doowb
Copy link
Member Author

doowb commented Aug 12, 2016

fs.write does maintain the position internally and uses that if you pass a non number in. But this line is passing in this.pos, which is set to 0 here.

I'll put a repo together showing it so we can get a test out of it.

@phated
Copy link
Member

phated commented Aug 12, 2016

Oh, that might have happened during refactoring (notice the double semi-colon). We can probably get away with hardcoding that as null.

@doowb
Copy link
Member Author

doowb commented Aug 12, 2016

That was my first thought too, but will that just "append" to the file if it already has contents?

@phated
Copy link
Member

phated commented Aug 12, 2016

@doowb the fs.open call is still made with the proper flag, so I would hope so. Otherwise something is seriously messed up in node core.

@doowb
Copy link
Member Author

doowb commented Aug 12, 2016

fs.open call is still made with the proper flag

K, that's the part I missed. Passing null should be good.

@doowb
Copy link
Member Author

doowb commented Aug 12, 2016

Here's a simple repo showing the issue in case it can be used for a test: https://github.com/doowb/vinyl-fs-write-stream

doowb added a commit to doowb/vinyl-fs that referenced this issue Aug 12, 2016
@criticalmash
Copy link

Played around with @doowb's repo, and I think's it's doing more than just cutting off beginning of the file. I tested files of different lengths and found this when using a file 1450 lines long.

of text
[1449] this is a line in a large file of texts a line in a large file of text
[2] this is a line in a large file of text
[3] this is a line in a large file of text
[4] this is a line in a large file of text
...
[1447] this is a line in a large file of text
[1448] this is a line in a large file 

It looks like script reaches the 1449th line then decides to go back to the top of the file and overwrite the existing content (lines 0 & 1). It seems that the script just runs out of memory and decides to cannibalise existing memory.

I created and even larger file, 3550 lines, and passed all of them through index.js.

ls -la fixtures

-rw-r--r--   1 john  staff   90889 Aug 13 15:07 large.txt
-rw-r--r--   1 john  staff   65589 Aug 13 15:21 mid.txt
-rw-r--r--   1 john  staff      29 Aug 13 15:07 small.txt
-rw-r--r--   1 john  staff  162189 Aug 13 15:32 verylarge.txt
ls -la actual
-rw-r--r--   1 john  staff  65536 Aug 13 15:07 large.txt
-rw-r--r--   1 john  staff  65536 Aug 13 15:21 mid.txt
-rw-r--r--   1 john  staff     29 Aug 13 15:07 small.txt
-rw-r--r--   1 john  staff  65536 Aug 13 15:32 verylarge.txt

So the outputted files are all capped at 65536 bytes, or 2 to the 16th. I'm thinking we're running up against some system wide limit here.

@doowb
Copy link
Member Author

doowb commented Aug 13, 2016

@criticalmash data comes through the stream in chunks (probably 64k) so each chunk is written to the file starting at position 0. If the last chunk is smaller than the max, then the data is written at the beginning of the file and the rest of the data from the previous chunk is kept.

I did a PR that changes this to pass in null so fs.read keeps track of the position internally.

You can try out the changes by doing npm i doowb/vinyl-fs#write-stream in the test repo and you should see the entire file copied.

@criticalmash
Copy link

criticalmash commented Aug 13, 2016

Though 64k was a bit low as a limit. But given that it's streaming chunks of content in 64k packages it now makes sense.

Tried your fork, it works as expected.

@phated phated closed this as completed in 7b2a84c Aug 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants