Update First(n int) to read no more data than necessary #199

bartdeboer · 2024-05-02T22:53:44Z

This slightly modifies First and introduces Prompt that can be used for user input:

fmt.Print("User input: ")
input, _ := script.Stdin().First(1).String()
fmt.Printf("Input was %s\n", input)

or

input, _ := script.Prompt("User input: ").String()
fmt.Printf("Input was %s\n", input)

Fixes #51, #132

bitfield · 2024-05-03T08:53:41Z

Thanks @bartdeboer! A little more context would be helpful for me on this one. Prompt makes sense, but what effect does the change to First have that makes it necessary here?

bartdeboer · 2024-05-03T09:29:13Z

Hi John,

I'm happy to provide more context. The First(n int) function is designed to read the first n lines from a stream. However, the old implementation had a limitation:

func (p *Pipe) FilterScan(filter func(string, io.Writer)) *Pipe {
	return p.Filter(func(r io.Reader, w io.Writer) error {
		scanner := newScanner(r)
		for scanner.Scan() {
			filter(scanner.Text(), w)
		}
		return scanner.Err()
	})
}

func (p *Pipe) First(n int) *Pipe {
	if p.Error() != nil {
		return p
	}
	if n <= 0 {
		return NewPipe()
	}
	i := 0
	return p.FilterScan(func(line string, w io.Writer) {
		if i >= n {
			return
		}
		fmt.Fprintln(w, line)
		i++
	})
}

It filters out n number of lines, but keeps scanning until EOF even after n lines have been received. This means that when used to scan for user input we're not sending in EOF, the function doesn't break out of the scanning loop.

My modified version solves this by breaking out of the scanner once the n number of lines is exceeded:

func (p *Pipe) First(n int) *Pipe {
	if p.Error() != nil {
		return p
	}
	if n <= 0 {
		return NewPipe()
	}
	return p.Filter(func(r io.Reader, w io.Writer) error {
		scanner := newScanner(r)
		i := 0
		for scanner.Scan() {
			_, err := fmt.Fprintln(w, scanner.Text())
			if err != nil {
				return err
			}
			i++
			if i >= n {
				break
			}
		}
		return scanner.Err()
	})
}

bitfield · 2024-05-03T10:18:48Z

It filters out n number of lines, but keeps scanning until EOF even after n lines have been received.

I'm not sure this is true, if I'm understanding you correctly. For example, consider this program:

script.Stdin().First(1).Stdout()

If we run this, and type "hello" followed by a newline, we immediately see the output "hello". First is not waiting for EOF in order to produce its result. All further input lines are ignored, as you'd expect, and the pipe itself doesn't terminate until the input ends, also as you'd expect.

Can you help me out by showing an example program that doesn't work as you'd expect with the current implementation?

bartdeboer · 2024-05-03T11:04:54Z

Here's some examples of what I'm seeing:

With .String()

fmt.Print("User input: ")
input, _ := script.Stdin().First(1).String()
fmt.Printf("Input was %s\n", input)

With PR:

User input: 1st input\n
Input was 1st input

Without PR:

User input: 1st input\n
2nd input\n
3rd input\n
Ctrl+D
Input was 1st input

With .Stdout()

fmt.Print("User input: ")
bytes, _ := script.Stdin().First(1).Stdout()
fmt.Printf("Input was %d bytes\n", bytes)

With PR:

User input: 1st input\n
1st input
Input was 10 bytes

Without PR:

User input: 1st input\n
1st input
2nd input\n
3rd input\n
Ctrl+D
Input was 10 bytes

I guess herein lies the difference in behavior. If the pipe should terminate after n lines have been read or keep streaming?

I think maybe it should terminate. It isn't going to send anything more. What happens if I want to read the first 10 lines of a 1GB log file?

Considering it should work the same like head -1 I think it also terminates after n lines?

bitfield · 2024-05-05T10:38:21Z

I guess herein lies the difference in behavior. If the pipe should terminate after n lines have been read or keep streaming?

This is a very good point. As a matter of fact, no script filter currently closes the pipe by itself. I agree it seems logical that First should do that, though, since reading any further lines is guaranteed to be wasted work for all downstream stages. And head does indeed behave this way.

What do you think about this slightly more compact way of writing the same logic?

	return p.Filter(func(r io.Reader, w io.Writer) error {
		scanner := newScanner(r)
		for i := 0; i < n && scanner.Scan(); i++ {
			_, err := fmt.Fprintln(w, scanner.Text())
			if err != nil {
				return err
			}
		}
		return scanner.Err()
	})

bartdeboer · 2024-05-05T10:49:18Z

I think your version writes it more elegantly, yes.

Would you like to commit that change to the PR or shall I?

Next I think I/we should perhaps drop the Prompt() function from this PR and perhaps do that in a separate PR?

bitfield · 2024-05-05T12:17:55Z

Yes, go ahead and make the change by all means. Also, shall we document this behaviour in the doc comment? Something like:

// First produces only the first n lines of the pipe's contents, or all the
// lines if there are less than n. If n is zero or negative, there is no output
// at all. When n lines have been produced, First stops reading its input and
// sends EOF to its output.

Next I think I/we should perhaps drop the Prompt() function from this PR and perhaps do that in a separate PR?

Yes, good idea—let's make these changes atomic.

…ed by John

bitfield · 2024-05-05T13:11:37Z

Great! Also, we should probably have a test that shows this behaviour, shouldn't we? What about something like a strings.Reader that contains several lines, and then after reading it through a pipe with First(1) we can prove that there are still lines left in the reader.

bartdeboer · 2024-05-05T18:11:52Z

Something like this?

It checks if there's remaining bytes in the reader after running First (accounting for Scanner's 4096 buffer size).

func TestFirstHasRemainingBytesAfterNLinesOfInput(t *testing.T) {
	t.Parallel()
	lineSize, testNumLines, totalLines := 1024, 3, 9
	var input strings.Builder
	// Generate input that is bigger than Scanner's 4096 byte buffer size.
	for i := 0; i < totalLines; i++ {
		line := strings.Repeat(fmt.Sprintf("%d", i), lineSize-1) + "\n"
		input.WriteString(line)
	}
	inputReader := strings.NewReader(input.String())
	got, _ := script.NewPipe().WithReader(inputReader).First(testNumLines).String()
	if len(got) == 0 {
		t.Errorf("Expected bytes from First, got 0")
	}
	remInputBytes, err := io.ReadAll(inputReader)
	if err != nil {
		t.Fatal("Error reading remaining input bytes:", err)
	}
	if len(remInputBytes) == 0 {
		t.Errorf("Expected remaining bytes in the inputReader, got 0")
	}
}

bitfield · 2024-05-05T19:12:11Z

Great! I forgot about the buffer. What about this slightly shorter version of the same idea:

func TestFirstDoesNotConsumeUnnecessaryData(t *testing.T) {
	t.Parallel()
	// First uses a 4096-byte buffer, so will always read at least
	// that much, but no more (once N lines have been read).
	r := strings.NewReader(strings.Repeat("line\n", 1000))
	got, err := script.NewPipe().WithReader(r).First(1).String()
	if err != nil {
		t.Fatal(err)
	}
	want := "line\n"
	if want != got {
		t.Errorf("want output %q, got %q", want, got)
	}
	if r.Len() == 0 {
		t.Errorf("no data left in reader")
	}
}

update First(n int) so that it can be used for user input

fc29cd8

bartdeboer mentioned this pull request May 3, 2024

Expand prompt logic for PR #199 #201

Open

bartdeboer added 2 commits May 5, 2024 14:27

make the update to First() more elegant and update comment as suggest…

819d15c

…ed by John

remove the Prompt function

9d02069

add test TestFirstDoesNotConsumeUnnecessaryData

53ab797

bitfield changed the title ~~Update First(n int) so that it can be used for user input~~ Update First(n int) to read no more data than necessary May 6, 2024

bitfield merged commit 507316b into bitfield:master May 6, 2024
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update First(n int) to read no more data than necessary #199

Update First(n int) to read no more data than necessary #199

bartdeboer commented May 2, 2024 •

edited

Loading

bitfield commented May 3, 2024

bartdeboer commented May 3, 2024

bitfield commented May 3, 2024

bartdeboer commented May 3, 2024

bitfield commented May 5, 2024

bartdeboer commented May 5, 2024

bitfield commented May 5, 2024

bitfield commented May 5, 2024

bartdeboer commented May 5, 2024

bitfield commented May 5, 2024

Update First(n int) to read no more data than necessary #199

Update First(n int) to read no more data than necessary #199

Conversation

bartdeboer commented May 2, 2024 • edited Loading

bitfield commented May 3, 2024

bartdeboer commented May 3, 2024

bitfield commented May 3, 2024

bartdeboer commented May 3, 2024

bitfield commented May 5, 2024

bartdeboer commented May 5, 2024

bitfield commented May 5, 2024

bitfield commented May 5, 2024

bartdeboer commented May 5, 2024

bitfield commented May 5, 2024

bartdeboer commented May 2, 2024 •

edited

Loading