Put excess characters from regexp matching back into the buffer #2

krnowak · 2016-01-13T14:01:18Z

Put excess unmatched characters back into the buffer

The regexp.FindReaderSubmatchIndex() function may read more characters than it is necessary from the stream, thus possibly making subsequent regex expects to fail, like in following scenario:

String in stream is

prefix: foo
prefix: bar

A regexp we expect is (?m)^prefix: (.+)\s??$, the final \s?? is for matching the \r character in the \r\n newline

The first call to the expectRegexFind function may result in matches = ["prefix: foo", "foo"], output = "prefix: foo\npr" and err = nil. The output has the beginning of the second line in it.

The second call will then result in matches = [], output = "efix: bar\n" and err = "ExpectRegex didn't ..."

alban · 2016-01-13T14:19:25Z

gexpect_test.go

+func TestRegexFindNoExcessBytes(t *testing.T) {
+	t.Logf("Testing Regular Expressions returning output with no excess strings")
+	repeats := 100
+	seqCmd := fmt.Sprintf("`seq 1 %d`", repeats)


Can you also do a test with seq -s ' ' 1 100? I'd like to make sure that two matches on the same line work fine.

mmh, I guess I meant echo -n in the next line, rather than seq -s ' '.

Do you mean something like matching the a \d+ b regexp in "a 1 b a 2 b a 3 c" string? Sure, I can test it.

Yes, that would be good.

alban · 2016-01-13T14:26:02Z

Thanks for fixing this :)

alban · 2016-01-13T14:38:10Z

LGTM

krnowak · 2016-01-14T09:39:45Z

Fixed the code putting bytes back into the stream (it should put the bytes back after the whole match, not after the last match). Updated the tests. And hopefully wrote a better commit message now.

The expectRegexOutput uses the regexp.FindReaderSubmatchIndex function to do the matching. As its docs say, it may read more characters than it is necessary from the stream. This may be problematic if the regex we use can match chunks of the stream that come right after another. This is because the function might match the first chunk and read into the middle of the second chunk, so the subsequent call to the function will match the third chunk, not the second. For example: data: "one two three" regexp: "\w{3,}" The first call to the function may return "one" as a match and "one tw" as an output. We can see that the output has the beginning of the second word which comes after the matched first word. This part is not available anymore in the stream, thus the following call to the function will return "three" as a match (instead of "two") and "o three" as an output. To fix it, we put the excess characters coming after the whole match back into the stream, so the subsequent calls to the function may start their matching from the place when last whole match ended.

alban · 2016-01-14T10:02:02Z

gexpect_test.go

+				t.Fatalf("Failed to get the match number %d: %v", i, err)
+			}
+			if len(matches) != 2 {
+				t.Fatalf("Expected only 2 matches, got %d", len(matches))


// matches contain an index pair for each match t.Fatalf("Expected only 1 match, got %d", len(matches)/2)

No no, the ExpectRegexFindWithOutput returns matches as a slice of strings. The underlying function returns a slice of ints being index pairs, but that's an implementation detail.

In this case we expect the matches slice to have the whole match (like "prefix: 1 line") and a submatch for the number only ("1").

mmhokay.

LGTM then.

alban · 2016-01-14T10:03:26Z

small nit, otherwise LGTM

alban · 2016-01-14T16:36:51Z

@steveej who has merge-powers on this repository?

jonboulle · 2016-01-14T17:13:09Z

I do

Put excess characters from regexp matching back into the buffer

alban reviewed Jan 13, 2016
View reviewed changes

krnowak force-pushed the krnowak/regexp-matching-fixes branch from c183861 to 83e29f6 Compare January 14, 2016 09:36

krnowak added 2 commits January 14, 2016 10:55

Add a test for putting the excess characters back into the buffer

3bc350a

krnowak force-pushed the krnowak/regexp-matching-fixes branch from 83e29f6 to 3bc350a Compare January 14, 2016 09:55

alban reviewed Jan 14, 2016
View reviewed changes

krnowak mentioned this pull request Jan 14, 2016

stage1 handling and configuration rkt/rkt#1977

Closed

jonboulle added a commit that referenced this pull request Jan 14, 2016

Merge pull request #2 from kinvolk/krnowak/regexp-matching-fixes

aefa734

Put excess characters from regexp matching back into the buffer

jonboulle merged commit aefa734 into coreos:master Jan 14, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Put excess characters from regexp matching back into the buffer #2

Put excess characters from regexp matching back into the buffer #2

krnowak commented Jan 13, 2016

alban Jan 13, 2016

alban Jan 13, 2016

krnowak Jan 13, 2016

alban Jan 13, 2016

alban commented Jan 13, 2016

alban commented Jan 13, 2016

krnowak commented Jan 14, 2016

alban Jan 14, 2016

krnowak Jan 14, 2016

krnowak Jan 14, 2016

alban Jan 14, 2016

alban commented Jan 14, 2016

alban commented Jan 14, 2016

jonboulle commented Jan 14, 2016

Put excess characters from regexp matching back into the buffer #2

Put excess characters from regexp matching back into the buffer #2

Conversation

krnowak commented Jan 13, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alban commented Jan 13, 2016

alban commented Jan 13, 2016

krnowak commented Jan 14, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alban commented Jan 14, 2016

alban commented Jan 14, 2016

jonboulle commented Jan 14, 2016