nimgrep improvements #12779
This has a lot of great changes, but I think you'd have more luck with smaller PRs.
Well, nimgrep improvements don't imply stdlib changes, and I don't understand "Posix newlines": a file in Unix land is simply a sequence of bytes; it doesn't have to end in a newline.
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
Well, but nimgrep also works on binary files, which have no "Posix lines". It works on binary files because only binary files exist. There are no "text" files. Too bad, huh?
@Araq, I agree that for this particular PR "newlines" is more of an aesthetic feature: I don't like it when Nim's splitLines/countLines add 1 additional empty line at the end of each file, which is not actually there. It only became apparent with the introduction of context printing, which prints some number of lines after the match. But I believe it should be addressed anyway because of potential misunderstanding; see "bug" #12335 for example. The explanation of the feature: there are 2 popular line definitions: the Windows one, which considers its separator (CR LF) a "line break", and the Posix one, which considers it a "line ending". So in the Windows world "A\r\lB\r\l" is 3 lines: "A", "B", "". In the Posix world the corresponding "A\lB\l" is just 2 lines, "A" and "B", without any additional empty line. BTW this is why many Linux-origin tools like git, gcc, etc. by default report warnings when there is no newline at the end of a file: they consider it a "corrupted" text file in some sense. That is addressed by an additional parameter. The default behavior of splitLines/countLines has not been changed; it sticks to the Windows "line break" definition.
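To make the two definitions concrete, here is a minimal Nim sketch. `posixLines` is a hypothetical helper written only for illustration; it is not the parameter proposed in this PR and not part of the stdlib. It assumes that under the "line ending" definition an empty remainder after the final newline is not a line.

```nim
import std/strutils

# Hypothetical helper, not part of the Nim stdlib: treats a newline as a
# line *terminator* ("line ending") rather than a *separator* ("line break").
proc posixLines(s: string): seq[string] =
  result = s.splitLines()
  # splitLines always yields the text after the last newline, even when it is
  # empty; under the "line ending" definition that empty remainder is not a line.
  if result.len > 0 and result[^1].len == 0:
    result.setLen(result.len - 1)

# "Line break" view (default splitLines behavior): a trailing empty element.
doAssert "A\r\nB\r\n".splitLines() == @["A", "B", ""]
doAssert "A\nB\n".splitLines() == @["A", "B", ""]

# "Line ending" view: just the two lines that are actually there.
doAssert "A\nB\n".posixLines() == @["A", "B"]

# A final line without a terminating newline is still reported.
doAssert "A\nB".posixLines() == @["A", "B"]
```

Dropping the last element only when it is empty keeps genuinely empty lines in the middle or at the end of the text: `"A\n\n"` still yields `"A"` followed by one empty line.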
There are. At least there is the definition in the link provided by zedeus above. Though AFAIK this requirement of having a newline at the end of a file is only essential for the Unix command line workflow, e.g. when concatenating 2 text files by
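The concatenation concern can be illustrated with a small Nim sketch that uses plain string concatenation as a stand-in for joining two files byte by byte (the file contents here are made up for the example): when the first file lacks its final newline, its last line fuses with the first line of the second file.

```nim
import std/strutils

let complete = "A\nB\n"    # ends with a newline
let incomplete = "A\nB"    # last line lacks its terminating newline
let second = "C\nD\n"

# With a proper line ending, every line survives concatenation intact.
doAssert (complete & second).splitLines() == @["A", "B", "C", "D", ""]

# Without it, "B" and "C" merge into a single line "BC".
doAssert (incomplete & second).splitLines() == @["A", "BC", "D", ""]
```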
But that's just a document somebody wrote, claiming to be a "standard". In reality there are no text files; there is no known way to open a file and see whether it's "binary" or "text". I know the heuristic checking for
No, that was the text of the actual Posix standard; it's called IEEE Std 1003.1, 2017 version. Otherwise you are mostly correct. Indeed tools like GNU grep detect binary files by
In reality, nowadays most new Linux programs (including git) can work with non-conforming files, so it's probably not particularly relevant. Decide for yourself whether Nim should support that newline complication or not.
Just want to chime in that I agree strongly with the change to the recursive flag...
Also, the "line break" vs "line ending" dichotomy is not just about any standard; it's about the formal definition of the text file grammar according to which splitLines does its parsing:
Result for splitLines:
Well, that's my point: I don't like the stdlib additions. But I do like your nimgrep improvements!
* add context printing (lines after and before a match)
* nimgrep: add exclude/include options
* nimgrep: improve error printing & symlink handling
* nimgrep: rename dangerous `-r` argument
* add a `--newLine` style option for starting matching/context lines from a new line
* add color themes: 3 new themes besides default `simple`
* enable printing of multi-line matches with line numbers
* proper display of replace when there was another match replaced at the same line / context block
* improve cmdline arguments error reporting
2a34de1 to ec69425
OK, as you wish.
Can this be backported into 1.0.x? cc @narimiran. Edit: will need to backport #13958 as well if this is accepted.
@genotrance Why do you need it for 1.0.x? It changes the meaning of the
Mainly because I need --follow for nimterop, but I understand if breaking changes cannot go into 1.0.x.
introduced in nimgrep improvements nim-lang#12779
* nimgrep: speed up by threads and Channels
* nimgrep: add --bin, --text, --count options
* nimgrep: add --sortTime option
* allow Peg in all matches including --includeFile, --excludeFile, --excludeDir
* add --match and --noMatch options
* add --includeDir option
* add --limit (-m) and --onlyAscii (-o) options
* fix performance regression introduced in nimgrep improvements #12779
* better error handling
* add option --fit
* fix groups in --replace
* fix flushing, --replace, improve --count
* use "." as the default directory, not full path
* fix --fit for Windows
* force target to C for macosx
* validate non-negative int input for options #15318
* switch nimgrep to using --gc:orc
* address review: implement cropping in matches,...
* implement stdin/pipe & revise --help
* address stylistic review & add limitations
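Those are a few changes to nimgrep:
* add context printing (lines after and before a match)
* add exclude/include options
* improve error printing & symlink handling
* add a `--newLine` style option for starting matching/context lines from a new line
* add color themes: 3 new themes besides default `simple`
* enable printing of multi-line matches with line numbers
* proper display of replace when there was another match replaced at the same line / context block
* improve cmdline arguments error reporting
* rename dangerous `-r` argument. Having -r as --replace is dangerous: it can be easily confused with --recursive, and indeed a few popular Unix tools like GNU grep, ack, FreeBSD grep, etc. have -r as recursive, which makes it especially unfortunate.
Here is an example of invocation: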
Also, a minor bug with terminal.writeStyled/styledWrite for Posix was found; see the fix and the (visual) test in the 1st commit.
One minor thing: I also observed that there is no way in Nim to detect a Posix-style line ending at the end of a file, that is, when '\l' at the end of a file is considered the end of the previous line and does not generate a next line, according to the Posix standard. The 2nd commit here makes it possible to handle this correctly in splitLines/countLines (it does not change the default behavior). The test is updated.