expose optional match offsets in grep-searcher::SinkMatch #1634

Open
BurntSushi opened this issue Jul 3, 2020 · 1 comment
Labels
enhancement An enhancement to the functionality of the software. libripgrep An issue related to modularizing ripgrep into libraries. question An issue that is lacking clarity on one or more points.

Comments

@BurntSushi

Originally asked as a discussion question here: #1633

I'm using the grep crate (AKA: libripgrep) to try and match exact byte portions of a slice. I've found that when my Matcher returns a match with a given range, this information is lost when it's sent over to the Sink as a SinkMatch struct.

I've created a small repository to reproduce the issue. Here's the entire code: https://github.com/acheronfail/grep-sink-example/blob/master/src/main.rs. See the README.md for the output of the program.

What did I expect?

That the Match object which returned a range of 2..8 in the haystack would translate to a SinkMatch with a bytes portion that maps roughly to HAYSTACK[2..8].

What happens?

The SinkMatch struct's bytes field includes the entire line matched, and not the matched portion.
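To make the difference concrete, here is a minimal std-only sketch of how a match range widens to its enclosing line when results are reported per line. The haystack and the `enclosing_line` helper are made up for illustration; this is not the code from the linked repository and not the `grep` crate's implementation.

```rust
use std::ops::Range;

/// Expand a match range to the full line that contains it, including the
/// line terminator. Hypothetical helper, assuming the match does not span
/// a '\n'.
fn enclosing_line(haystack: &[u8], m: Range<usize>) -> Range<usize> {
    // The line starts just after the previous '\n', or at 0.
    let start = haystack[..m.start]
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);
    // The line ends just after the next '\n', or at the end of the haystack.
    let end = haystack[m.end..]
        .iter()
        .position(|&b| b == b'\n')
        .map_or(haystack.len(), |i| m.end + i + 1);
    start..end
}

fn main() {
    // Assumed example input, not the haystack from the linked repository.
    let haystack = b"a needle here\nno match\n";
    let m = 2..8;
    let line = enclosing_line(haystack, m.clone());
    println!("matcher found: {:?}", std::str::from_utf8(&haystack[m]).unwrap());
    println!("sink receives: {:?}", std::str::from_utf8(&haystack[line]).unwrap());
}
```

In this model the matcher finds `needle` at `2..8`, but the per-line report covers the whole first line, which mirrors the behavior described above.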

Am I using this incorrectly? Should the SinkMatch be giving me the matched bytes, or should I be doing this via the captures trait methods instead?

@BurntSushi

Re-posting my response from: #1633 (comment)


You aren't using it incorrectly, and this is indeed expected behavior, although it is perhaps a design flaw. In particular, the grep-searcher and grep-matcher architectures generally assume two things:

  1. Detecting whether a match exists in a particular line is cheaper (possibly substantially so) than finding the exact boundary of that match.
  2. A match is generally infrequent compared to the size of the haystack.

This is why hamfisted APIs like Matcher::find_candidate_line exist. Indeed, in the most basic grep output format (no color), you never even need the match boundaries in the first place. All you need is the line that matched.

If, however, you're working on a problem in which you always find the precise match offsets up front, then the current architecture drops them on the floor, which would require you to re-run your search for each matching line. If matching lines are rare in your particular situation, then this shouldn't be a problem. If matching lines are frequent, then this could be a significant performance problem.
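A rough sketch of that workaround, using only the standard library: `search_lines` is a hypothetical stand-in for a line-oriented searcher that hands whole lines to a callback, and the callback runs the match a second time to recover offsets. None of these names come from the `grep` crate, and the substring search is a naive placeholder for a real matcher.

```rust
use std::ops::Range;

/// Naive substring search; a placeholder for a real matcher.
fn find(haystack: &[u8], needle: &[u8]) -> Option<Range<usize>> {
    if needle.is_empty() {
        return None;
    }
    haystack
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|s| s..s + needle.len())
}

/// Hypothetical line-oriented searcher: for each matching line, calls `sink`
/// with the line's absolute start offset and the whole line. The match
/// offsets found here are discarded, as in the issue.
fn search_lines<'h>(haystack: &'h [u8], needle: &[u8], mut sink: impl FnMut(usize, &'h [u8])) {
    let mut offset = 0;
    for line in haystack.split_inclusive(|&b| b == b'\n') {
        if find(line, needle).is_some() {
            sink(offset, line);
        }
        offset += line.len();
    }
}

/// Recover absolute offsets of the first match on each matching line by
/// searching every reported line a second time.
fn match_offsets(haystack: &[u8], needle: &[u8]) -> Vec<Range<usize>> {
    let mut out = Vec::new();
    search_lines(haystack, needle, |line_start, line| {
        if let Some(m) = find(line, needle) {
            out.push(line_start + m.start..line_start + m.end);
        }
    });
    out
}

fn main() {
    let haystack = b"a needle here\nno match\nneedle again\n";
    println!("{:?}", match_offsets(haystack, b"needle"));
}
```

Every matching line is searched twice here, once by the searcher and once by the sink. That is the extra cost that becomes noticeable when matching lines are frequent.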

I do think the API is flexible enough that we could potentially fix this by enriching the SinkMatch structure, but it would require some thinking and some design work.
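Purely as an illustration of what "enriching" could look like, and not a proposed or existing API: the `LineMatch` type below is a hypothetical stand-in that carries the line bytes plus an optional span, so a matcher that already knows its offsets can pass them through while others leave the field empty.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a sink match record; not the real `SinkMatch`.
struct LineMatch<'b> {
    /// The full matching line, as reported today.
    bytes: &'b [u8],
    /// Match offsets relative to `bytes`, present only when the matcher
    /// already computed them.
    span: Option<Range<usize>>,
}

impl<'b> LineMatch<'b> {
    /// The matched portion when offsets are known, otherwise the whole line.
    fn matched_bytes(&self) -> &'b [u8] {
        let bytes = self.bytes;
        match &self.span {
            Some(r) => &bytes[r.clone()],
            None => bytes,
        }
    }
}

fn main() {
    let with_span = LineMatch { bytes: b"a needle here\n", span: Some(2..8) };
    let without_span = LineMatch { bytes: b"a needle here\n", span: None };
    println!("{:?}", std::str::from_utf8(with_span.matched_bytes()).unwrap());
    println!("{:?}", std::str::from_utf8(without_span.matched_bytes()).unwrap());
}
```

Keeping the span optional would preserve the cheap path for callers that only need the matching line, which matches the two assumptions listed earlier.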
