-
-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update First(n int) to read no more data than necessary #199
Update First(n int) to read no more data than necessary #199
Conversation
Thanks @bartdeboer! A little more context would be helpful for me on this one. |
Hi John, I'm happy to provide more context. The func (p *Pipe) FilterScan(filter func(string, io.Writer)) *Pipe {
return p.Filter(func(r io.Reader, w io.Writer) error {
scanner := newScanner(r)
for scanner.Scan() {
filter(scanner.Text(), w)
}
return scanner.Err()
})
}
func (p *Pipe) First(n int) *Pipe {
if p.Error() != nil {
return p
}
if n <= 0 {
return NewPipe()
}
i := 0
return p.FilterScan(func(line string, w io.Writer) {
if i >= n {
return
}
fmt.Fprintln(w, line)
i++
})
} It filters out n number of lines, but keeps scanning until EOF even after n lines have been received. This means that when used to scan for user input we're not sending in EOF, the function doesn't break out of the scanning loop. My modified version solves this by breaking out of the scanner once the n number of lines is exceeded: func (p *Pipe) First(n int) *Pipe {
if p.Error() != nil {
return p
}
if n <= 0 {
return NewPipe()
}
return p.Filter(func(r io.Reader, w io.Writer) error {
scanner := newScanner(r)
i := 0
for scanner.Scan() {
_, err := fmt.Fprintln(w, scanner.Text())
if err != nil {
return err
}
i++
if i >= n {
break
}
}
return scanner.Err()
})
} |
I'm not sure this is true, if I'm understanding you correctly. For example, consider this program: script.Stdin().First(1).Stdout() If we run this, and type "hello" followed by a newline, we immediately see the output "hello". Can you help me out by showing an example program that doesn't work as you'd expect with the current implementation? |
Here's some examples of what I'm seeing: With .String() fmt.Print("User input: ")
input, _ := script.Stdin().First(1).String()
fmt.Printf("Input was %s\n", input) With PR:
Without PR:
With .Stdout() fmt.Print("User input: ")
bytes, _ := script.Stdin().First(1).Stdout()
fmt.Printf("Input was %d bytes\n", bytes) With PR:
Without PR:
I guess herein lies the difference in behavior. If the pipe should terminate after n lines have been read or keep streaming? I think maybe it should terminate. It isn't going to send anything more. What happens if I want to read the first 10 lines of a 1GB log file? Considering it should work the same like |
This is a very good point. As a matter of fact, no What do you think about this slightly more compact way of writing the same logic? return p.Filter(func(r io.Reader, w io.Writer) error {
scanner := newScanner(r)
for i := 0; i < n && scanner.Scan(); i++ {
_, err := fmt.Fprintln(w, scanner.Text())
if err != nil {
return err
}
}
return scanner.Err()
}) |
I think your version writes it more elegantly, yes. Would you like to commit that change to the PR or shall I? Next I think I/we should perhaps drop the Prompt() function from this PR and perhaps do that in a separate PR? |
Yes, go ahead and make the change by all means. Also, shall we document this behaviour in the doc comment? Something like: // First produces only the first n lines of the pipe's contents, or all the
// lines if there are less than n. If n is zero or negative, there is no output
// at all. When n lines have been produced, First stops reading its input and
// sends EOF to its output.
Yes, good idea—let's make these changes atomic. |
Great! Also, we should probably have a test that shows this behaviour, shouldn't we? What about something like a |
Something like this? It checks if there's remaining bytes in the reader after running First (accounting for Scanner's 4096 buffer size). func TestFirstHasRemainingBytesAfterNLinesOfInput(t *testing.T) {
t.Parallel()
lineSize, testNumLines, totalLines := 1024, 3, 9
var input strings.Builder
// Generate input that is bigger than Scanner's 4096 byte buffer size.
for i := 0; i < totalLines; i++ {
line := strings.Repeat(fmt.Sprintf("%d", i), lineSize-1) + "\n"
input.WriteString(line)
}
inputReader := strings.NewReader(input.String())
got, _ := script.NewPipe().WithReader(inputReader).First(testNumLines).String()
if len(got) == 0 {
t.Errorf("Expected bytes from First, got 0")
}
remInputBytes, err := io.ReadAll(inputReader)
if err != nil {
t.Fatal("Error reading remaining input bytes:", err)
}
if len(remInputBytes) == 0 {
t.Errorf("Expected remaining bytes in the inputReader, got 0")
}
} |
Great! I forgot about the buffer. What about this slightly shorter version of the same idea: func TestFirstDoesNotConsumeUnnecessaryData(t *testing.T) {
t.Parallel()
// First uses a 4096-byte buffer, so will always read at least
// that much, but no more (once N lines have been read).
r := strings.NewReader(strings.Repeat("line\n", 1000))
got, err := script.NewPipe().WithReader(r).First(1).String()
if err != nil {
t.Fatal(err)
}
want := "line\n"
if want != got {
t.Errorf("want output %q, got %q", want, got)
}
if r.Len() == 0 {
t.Errorf("no data left in reader")
}
} |
This slightly modifies
First
and introducesPrompt
that can be used for user input:or
Fixes #51, #132