Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update First(n int) to read no more data than necessary #199

Merged
merged 4 commits into from
May 6, 2024

Conversation

bartdeboer
Copy link
Contributor

@bartdeboer bartdeboer commented May 2, 2024

This slightly modifies First and introduces Prompt that can be used for user input:

fmt.Print("User input: ")
input, _ := script.Stdin().First(1).String()
fmt.Printf("Input was %s\n", input)

or

input, _ := script.Prompt("User input: ").String()
fmt.Printf("Input was %s\n", input)

Fixes #51, #132

@bitfield
Copy link
Owner

bitfield commented May 3, 2024

Thanks @bartdeboer! A little more context would be helpful for me on this one. Prompt makes sense, but what effect does the change to First have that makes it necessary here?

@bartdeboer
Copy link
Contributor Author

Hi John,

I'm happy to provide more context. The First(n int) function is designed to read the first n lines from a stream. However, the old implementation had a limitation:

func (p *Pipe) FilterScan(filter func(string, io.Writer)) *Pipe {
	return p.Filter(func(r io.Reader, w io.Writer) error {
		scanner := newScanner(r)
		for scanner.Scan() {
			filter(scanner.Text(), w)
		}
		return scanner.Err()
	})
}

func (p *Pipe) First(n int) *Pipe {
	if p.Error() != nil {
		return p
	}
	if n <= 0 {
		return NewPipe()
	}
	i := 0
	return p.FilterScan(func(line string, w io.Writer) {
		if i >= n {
			return
		}
		fmt.Fprintln(w, line)
		i++
	})
}

It filters out n number of lines, but keeps scanning until EOF even after n lines have been received. This means that when used to scan for user input we're not sending in EOF, the function doesn't break out of the scanning loop.

My modified version solves this by breaking out of the scanner once the n number of lines is exceeded:

func (p *Pipe) First(n int) *Pipe {
	if p.Error() != nil {
		return p
	}
	if n <= 0 {
		return NewPipe()
	}
	return p.Filter(func(r io.Reader, w io.Writer) error {
		scanner := newScanner(r)
		i := 0
		for scanner.Scan() {
			_, err := fmt.Fprintln(w, scanner.Text())
			if err != nil {
				return err
			}
			i++
			if i >= n {
				break
			}
		}
		return scanner.Err()
	})
}

@bitfield
Copy link
Owner

bitfield commented May 3, 2024

It filters out n number of lines, but keeps scanning until EOF even after n lines have been received.

I'm not sure this is true, if I'm understanding you correctly. For example, consider this program:

script.Stdin().First(1).Stdout()

If we run this, and type "hello" followed by a newline, we immediately see the output "hello". First is not waiting for EOF in order to produce its result. All further input lines are ignored, as you'd expect, and the pipe itself doesn't terminate until the input ends, also as you'd expect.

Can you help me out by showing an example program that doesn't work as you'd expect with the current implementation?

@bartdeboer
Copy link
Contributor Author

Here's some examples of what I'm seeing:

With .String()

fmt.Print("User input: ")
input, _ := script.Stdin().First(1).String()
fmt.Printf("Input was %s\n", input)

With PR:

User input: 1st input\n
Input was 1st input

Without PR:

User input: 1st input\n
2nd input\n
3rd input\n
Ctrl+D
Input was 1st input

With .Stdout()

fmt.Print("User input: ")
bytes, _ := script.Stdin().First(1).Stdout()
fmt.Printf("Input was %d bytes\n", bytes)

With PR:

User input: 1st input\n
1st input
Input was 10 bytes

Without PR:

User input: 1st input\n
1st input
2nd input\n
3rd input\n
Ctrl+D
Input was 10 bytes

I guess herein lies the difference in behavior. If the pipe should terminate after n lines have been read or keep streaming?

I think maybe it should terminate. It isn't going to send anything more. What happens if I want to read the first 10 lines of a 1GB log file?

Considering it should work the same like head -1 I think it also terminates after n lines?

@bitfield
Copy link
Owner

bitfield commented May 5, 2024

I guess herein lies the difference in behavior. If the pipe should terminate after n lines have been read or keep streaming?

This is a very good point. As a matter of fact, no script filter currently closes the pipe by itself. I agree it seems logical that First should do that, though, since reading any further lines is guaranteed to be wasted work for all downstream stages. And head does indeed behave this way.

What do you think about this slightly more compact way of writing the same logic?

	return p.Filter(func(r io.Reader, w io.Writer) error {
		scanner := newScanner(r)
		for i := 0; i < n && scanner.Scan(); i++ {
			_, err := fmt.Fprintln(w, scanner.Text())
			if err != nil {
				return err
			}
		}
		return scanner.Err()
	})

@bartdeboer
Copy link
Contributor Author

I think your version writes it more elegantly, yes.

Would you like to commit that change to the PR or shall I?

Next I think I/we should perhaps drop the Prompt() function from this PR and perhaps do that in a separate PR?

@bitfield
Copy link
Owner

bitfield commented May 5, 2024

Yes, go ahead and make the change by all means. Also, shall we document this behaviour in the doc comment? Something like:

// First produces only the first n lines of the pipe's contents, or all the
// lines if there are less than n. If n is zero or negative, there is no output
// at all. When n lines have been produced, First stops reading its input and
// sends EOF to its output.

Next I think I/we should perhaps drop the Prompt() function from this PR and perhaps do that in a separate PR?

Yes, good idea—let's make these changes atomic.

@bitfield
Copy link
Owner

bitfield commented May 5, 2024

Great! Also, we should probably have a test that shows this behaviour, shouldn't we? What about something like a strings.Reader that contains several lines, and then after reading it through a pipe with First(1) we can prove that there are still lines left in the reader.

@bartdeboer
Copy link
Contributor Author

Something like this?

It checks if there's remaining bytes in the reader after running First (accounting for Scanner's 4096 buffer size).

func TestFirstHasRemainingBytesAfterNLinesOfInput(t *testing.T) {
	t.Parallel()
	lineSize, testNumLines, totalLines := 1024, 3, 9
	var input strings.Builder
	// Generate input that is bigger than Scanner's 4096 byte buffer size.
	for i := 0; i < totalLines; i++ {
		line := strings.Repeat(fmt.Sprintf("%d", i), lineSize-1) + "\n"
		input.WriteString(line)
	}
	inputReader := strings.NewReader(input.String())
	got, _ := script.NewPipe().WithReader(inputReader).First(testNumLines).String()
	if len(got) == 0 {
		t.Errorf("Expected bytes from First, got 0")
	}
	remInputBytes, err := io.ReadAll(inputReader)
	if err != nil {
		t.Fatal("Error reading remaining input bytes:", err)
	}
	if len(remInputBytes) == 0 {
		t.Errorf("Expected remaining bytes in the inputReader, got 0")
	}
}

@bitfield
Copy link
Owner

bitfield commented May 5, 2024

Great! I forgot about the buffer. What about this slightly shorter version of the same idea:

func TestFirstDoesNotConsumeUnnecessaryData(t *testing.T) {
	t.Parallel()
	// First uses a 4096-byte buffer, so will always read at least
	// that much, but no more (once N lines have been read).
	r := strings.NewReader(strings.Repeat("line\n", 1000))
	got, err := script.NewPipe().WithReader(r).First(1).String()
	if err != nil {
		t.Fatal(err)
	}
	want := "line\n"
	if want != got {
		t.Errorf("want output %q, got %q", want, got)
	}
	if r.Len() == 0 {
		t.Errorf("no data left in reader")
	}
}

@bitfield bitfield changed the title Update First(n int) so that it can be used for user input Update First(n int) to read no more data than necessary May 6, 2024
@bitfield bitfield merged commit 507316b into bitfield:master May 6, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get user input
2 participants