nimgrep improvements #12779

a-mr · 2019-11-30T16:43:36Z

Those are a few changes to nimgrep:

add context printing (lines after and before a match)
add exclude/include options for files and exclude for directories
add a --newLine style option for starting matching/context
lines from a new line
add color themes: 3 new themes (ack, gnu, bnw) besides default simple
enable printing of multi-line matches with line numbers
proper display of replace when there was another match replaced at
the same line / context block
improve cmdline arguments error reporting
improve error printing
improve symlink handling
context printing: handle Posix file endings with native appearance
rename dangerous -r argument. Having -r as --replace is dangerous: it can be easily confused with --recursive, and indeed a few popular Unix tools like GNU grep, ack, freebsd grep, etc have -r as recursive which makes it especially unfortunate.

Here is an example of invocation:

nimgrep -y --recursive --color --excludeDir:"\.git$" --excludeDir:"nimcache$" \
  --noExt --newLine --colortheme:ack --afterContext:1 --beforeContext:3 \
  linesBefore .|less

Also a minor bug with terminal.writeStyled/styledWrite for Posix was found, see fix and the (visual) test in the 1st commit.

One minor thing: I also observed that there is no way to detect for Posix style file ending at the end of a file in Nim, that is when '\l' at the end of a file is considered as end of previous line and does not generate next line according to Posix standard. 2nd commit here is to be able to handle this correctly in splitLines/countLines (it does not change the default behavior). The test is updated.

zedeus · 2019-11-30T18:14:35Z

This has a lot of great changes but I think you'd have more luck with smaller PRs

Araq · 2019-11-30T23:47:02Z

Well nimgrep improvements don't imply stdlib changes and I don't understand "Posix newlines", a file in Unix land is simply a sequence of bytes, they don't have to end in a newline.

zedeus · 2019-12-01T00:03:41Z

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206

3.206 Line
A sequence of zero or more non- characters plus a terminating <newline> character.

Araq · 2019-12-01T00:11:49Z

Well but nimgrep also works on binary files too which have no "Posix lines". It does work on binary files too because only binary files exist. There are no "text" files. Too bad huh?

a-mr · 2019-12-01T13:47:45Z

@Araq, I agree that for this particular PR "newlines" is more of an aesthetic feature: I don't like when Nim splitLines/countLines add 1 additional empty line at the end of each file. Which is not actually there. It's only become apparent with introduction of context printing that prints some number of lines after the match.

But I believe it should be addressed anyway because of potential misunderstanding, see "bug" #12335 for example.

The explanation of feature:

There are 2 popular line definitions: Windows one, which considers its separator (CR LF) as "line break", and Posix one, which considers it as "line ending". So for windows world "A\r\lB\r\l" is 3 lines "A","B","". While for posix world the corresponding "A\lB\l" are just 2 lines "A", "B" without any additional empty line. BTW this is why many Linux-origin tools like git, gcc, etc by default report warnings when there is no newline at the end of file, they consider it as a "corrupted" text file in some sense.

That is addressed by additional parameter newlineEnds parameter that changes default behavior from "line break" to "line ending", in conjunction with new function endsLikePosix which detects that file has only '\l' at its end. So when using splitLines/countLines with newlineEnds=file.endsLikePosix they will work correctly both for text files generated by Windows and Linux programs: a "CR LF" at the end of a file will generate 1 additional empty line while "LF" will not.

Default behavior of splitLines/countLines has not been changed, it sticks to Windows "line break" definition.

a-mr · 2019-12-01T14:07:34Z

There are no "text" files

There are. At least there is the definition in the link provided by zedeus above.

Though AFAIK this requirement of having newline at the end of file is only essential for Unix command line workflow, e.g. when concatenating 2 text files by cat which does byte-by-byte copying. So "corrupted" files like "A\nB" and "C\nD" would concatenate into the wrong "A\nBC\nD".

Araq · 2019-12-01T14:54:54Z

There are. At least there is the definition in the link provided by zedeus above.

But that's just a document somebody wrote, claiming to be a "standard". In reality there are no text files, there are no known ways to open a file to see whether it's "binary" or "text", I know the heuristic checking for \0 inside the file, but it's only that, a heuristic.

a-mr · 2019-12-01T15:26:32Z

No, that was the text of actual Posix standard, it's called IEEE Std 1003.1, version of 2017.

Otherwise you are mostly correct. Indeed tools like GNU grep detect binary files by \0, it originally goes not only from C programming language, but also from the clause of Posix standard "3.403 Text File":

A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2017 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

In reality nowadays most of new Linux programs (including git) can work with non-conforming files, so probably it's not particularly relevant.

Decide it yourself whether Nim should support that newline complication or not.

Varriount · 2019-12-02T08:21:04Z

Just want to chime in that I agree strongly with the change to the recursive flag...

a-mr · 2019-12-02T20:52:57Z

Also the "line break" vs "line ending" dichotomy is not just about any standard, it's about the formal definition of text files grammar according to which splitLines does its parsing:

<line> = { character-not-separator }
<textFileB> = <line> [ separator <textFileB> ]
<textFileE> = { <line> separator }

Result for splitLines:

string	Break	Ending
""	[""]	[]
"\n"	["", ""]	[""]
"A"	["A"]	illegal ["A"]
"A\n"	["A", ""]	"A"
"A\nB"	["A", "B"]	illegal ["A", "B"]
"A\nB\n"	["A", "B", ""]	["A", "B"]

Araq · 2019-12-03T08:29:26Z

Decide it yourself whether Nim should support that newline complication or not.

Well that's my point, I don't like the stdlib additions. But I do like your nimgrep improvements!

* add context printing (lines after and before a match) * nimgrep: add exclude/include options * nimgrep: improve error printing & symlink handling * nimgrep: rename dangerous `-r` argument * add a `--newLine` style option for starting matching/context lines from a new line * add color themes: 3 new themes besides default `simple` * enable printing of multi-line matches with line numbers * proper display of replace when there was another match replaced at the same line / context block * improve cmdline arguments error reporting

a-mr · 2019-12-04T21:20:51Z

OK, as you wish

genotrance · 2020-10-04T04:10:57Z

Can this be backported into 1.0.x?

cc @narimiran

Edit: will need to backport #13958 as well if this is accepted.

Araq · 2020-10-05T07:09:03Z

@genotrance Why do you need for 1.0.x? It changes the meaning of the -r parameter, it would be a backport of a breaking change.

genotrance · 2020-10-05T11:29:55Z

Mainly because I need --follow for nimterop but I understand if breaking changes cannot go into 1.0.x.

introduced in nimgrep improvements nim-lang#12779

* nimgrep: speed up by threads and Channels * nimgrep: add --bin, --text, --count options * nimgrep: add --sortTime option * allow Peg in all matches including --includeFile, --excludeFile, --excludeDir * add --match and --noMatch options * add --includeDir option * add --limit (-m) and --onlyAscii (-o) options * fix performance regression introduced in nimgrep improvements #12779 * better error handling * add option --fit * fix groups in --replace * fix flushing, --replace, improve --count * use "." as the default directory, not full path * fix --fit for Windows * force target to C for macosx * validate non-negative int input for options #15318 * switch nimgrep to using --gc:orc * address review: implement cropping in matches,... * implement stdin/pipe & revise --help * address stylistic review & add limitations

* nimgrep: speed up by threads and Channels * nimgrep: add --bin, --text, --count options * nimgrep: add --sortTime option * allow Peg in all matches including --includeFile, --excludeFile, --excludeDir * add --match and --noMatch options * add --includeDir option * add --limit (-m) and --onlyAscii (-o) options * fix performance regression introduced in nimgrep improvements nim-lang#12779 * better error handling * add option --fit * fix groups in --replace * fix flushing, --replace, improve --count * use "." as the default directory, not full path * fix --fit for Windows * force target to C for macosx * validate non-negative int input for options nim-lang#15318 * switch nimgrep to using --gc:orc * address review: implement cropping in matches,... * implement stdin/pipe & revise --help * address stylistic review & add limitations

fix sticky colors in styledWrite

ef2997d

a-mr changed the title ~~Feat/nimgrep improvements~~ nimgrep improvements Nov 30, 2019

a-mr force-pushed the feat/nimgrep-improvements branch from 2a34de1 to ec69425 Compare December 4, 2019 21:14

Araq merged commit 26074f5 into nim-lang:devel Dec 5, 2019

genotrance mentioned this pull request Apr 12, 2020

Improve #12920 fix #13958

Merged

a-mr added a commit to a-mr/Nim that referenced this pull request Oct 17, 2020

fix performance regression

9676021

introduced in nimgrep improvements nim-lang#12779

a-mr mentioned this pull request Oct 17, 2020

Nimgrep improvements 2 #15612

Merged

a-mr added a commit to a-mr/Nim that referenced this pull request Oct 18, 2020

fix performance regression

5f433eb

introduced in nimgrep improvements nim-lang#12779

a-mr added a commit to a-mr/Nim that referenced this pull request Nov 8, 2020

fix performance regression

d7c090f

introduced in nimgrep improvements nim-lang#12779

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nimgrep improvements #12779

nimgrep improvements #12779

a-mr commented Nov 30, 2019 •

edited

Loading

zedeus commented Nov 30, 2019

Araq commented Nov 30, 2019

zedeus commented Dec 1, 2019

Araq commented Dec 1, 2019

a-mr commented Dec 1, 2019

a-mr commented Dec 1, 2019

Araq commented Dec 1, 2019

a-mr commented Dec 1, 2019

Varriount commented Dec 2, 2019

a-mr commented Dec 2, 2019

Araq commented Dec 3, 2019

a-mr commented Dec 4, 2019

genotrance commented Oct 4, 2020 •

edited

Loading

Araq commented Oct 5, 2020

genotrance commented Oct 5, 2020

nimgrep improvements #12779

nimgrep improvements #12779

Conversation

a-mr commented Nov 30, 2019 • edited Loading

zedeus commented Nov 30, 2019

Araq commented Nov 30, 2019

zedeus commented Dec 1, 2019

Araq commented Dec 1, 2019

a-mr commented Dec 1, 2019

a-mr commented Dec 1, 2019

Araq commented Dec 1, 2019

a-mr commented Dec 1, 2019

Varriount commented Dec 2, 2019

a-mr commented Dec 2, 2019

Araq commented Dec 3, 2019

a-mr commented Dec 4, 2019

genotrance commented Oct 4, 2020 • edited Loading

Araq commented Oct 5, 2020

genotrance commented Oct 5, 2020

a-mr commented Nov 30, 2019 •

edited

Loading

genotrance commented Oct 4, 2020 •

edited

Loading