-
Notifications
You must be signed in to change notification settings - Fork 10
Put excess characters from regexp matching back into the buffer #2
Put excess characters from regexp matching back into the buffer #2
Conversation
func TestRegexFindNoExcessBytes(t *testing.T) { | ||
t.Logf("Testing Regular Expressions returning output with no excess strings") | ||
repeats := 100 | ||
seqCmd := fmt.Sprintf("`seq 1 %d`", repeats) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also do a test with seq -s ' ' 1 100
? I'd like to make sure that two matches on the same line work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmh, I guess I meant echo -n
in the next line, rather than seq -s ' '
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean something like matching the a \d+ b
regexp in "a 1 b a 2 b a 3 c" string? Sure, I can test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be good.
Thanks for fixing this :) |
LGTM |
c183861
to
83e29f6
Compare
Fixed the code putting bytes back into the stream (it should put the bytes back after the whole match, not after the last match). Updated the tests. And hopefully wrote a better commit message now. |
The expectRegexOutput uses the regexp.FindReaderSubmatchIndex function to do the matching. As its docs say, it may read more characters than it is necessary from the stream. This may be problematic if the regex we use can match chunks of the stream that come right after another. This is because the function might match the first chunk and read into the middle of the second chunk, so the subsequent call to the function will match the third chunk, not the second. For example: data: "one two three" regexp: "\w{3,}" The first call to the function may return "one" as a match and "one tw" as an output. We can see that the output has the beginning of the second word which comes after the matched first word. This part is not available anymore in the stream, thus the following call to the function will return "three" as a match (instead of "two") and "o three" as an output. To fix it, we put the excess characters coming after the whole match back into the stream, so the subsequent calls to the function may start their matching from the place when last whole match ended.
83e29f6
to
3bc350a
Compare
t.Fatalf("Failed to get the match number %d: %v", i, err) | ||
} | ||
if len(matches) != 2 { | ||
t.Fatalf("Expected only 2 matches, got %d", len(matches)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// matches contain an index pair for each match
t.Fatalf("Expected only 1 match, got %d", len(matches)/2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No no, the ExpectRegexFindWithOutput
returns matches
as a slice of strings. The underlying function returns a slice of ints being index pairs, but that's an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we expect the matches
slice to have the whole match (like "prefix: 1 line") and a submatch for the number only ("1").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmhokay.
LGTM then.
small nit, otherwise LGTM |
@steveej who has merge-powers on this repository? |
I do |
Put excess characters from regexp matching back into the buffer
Put excess unmatched characters back into the buffer
The regexp.FindReaderSubmatchIndex() function may read more characters than it is necessary from the stream, thus possibly making subsequent regex expects to fail, like in following scenario:
String in stream is
A regexp we expect is
(?m)^prefix: (.+)\s??$
, the final\s??
is for matching the\r
character in the\r\n
newlineThe first call to the
expectRegexFind
function may result inmatches = ["prefix: foo", "foo"]
,output = "prefix: foo\npr"
anderr = nil
. The output has the beginning of the second line in it.The second call will then result in
matches = []
,output = "efix: bar\n"
anderr = "ExpectRegex didn't ..."