Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nimgrep improvements #12779

Merged
merged 2 commits into from
Dec 5, 2019
Merged

nimgrep improvements #12779

merged 2 commits into from
Dec 5, 2019

Conversation

a-mr
Copy link
Contributor

@a-mr a-mr commented Nov 30, 2019

Those are a few changes to nimgrep:

  • add context printing (lines after and before a match)
  • add exclude/include options for files and exclude for directories
  • add a --newLine style option for starting matching/context
    lines from a new line
  • add color themes: 3 new themes (ack, gnu, bnw) besides default simple
  • enable printing of multi-line matches with line numbers
  • proper display of replace when there was another match replaced at
    the same line / context block
  • improve cmdline arguments error reporting
  • improve error printing
  • improve symlink handling
  • context printing: handle Posix file endings with native appearance
  • rename dangerous -r argument. Having -r as --replace is dangerous: it can be easily confused with --recursive, and indeed a few popular Unix tools like GNU grep, ack, freebsd grep, etc have -r as recursive which makes it especially unfortunate.

Here is an example of invocation:

nimgrep -y --recursive --color --excludeDir:"\.git$" --excludeDir:"nimcache$" \
  --noExt --newLine --colortheme:ack --afterContext:1 --beforeContext:3 \
  linesBefore .|less

image

Also a minor bug with terminal.writeStyled/styledWrite for Posix was found, see fix and the (visual) test in the 1st commit.

One minor thing: I also observed that there is no way to detect for Posix style file ending at the end of a file in Nim, that is when '\l' at the end of a file is considered as end of previous line and does not generate next line according to Posix standard. 2nd commit here is to be able to handle this correctly in splitLines/countLines (it does not change the default behavior). The test is updated.

@a-mr a-mr changed the title Feat/nimgrep improvements nimgrep improvements Nov 30, 2019
@zedeus
Copy link
Contributor

zedeus commented Nov 30, 2019

This has a lot of great changes but I think you'd have more luck with smaller PRs

@Araq
Copy link
Member

Araq commented Nov 30, 2019

Well nimgrep improvements don't imply stdlib changes and I don't understand "Posix newlines", a file in Unix land is simply a sequence of bytes, they don't have to end in a newline.

@zedeus
Copy link
Contributor

zedeus commented Dec 1, 2019

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206

3.206 Line
A sequence of zero or more non- characters plus a terminating <newline> character.

@Araq
Copy link
Member

Araq commented Dec 1, 2019

Well but nimgrep also works on binary files too which have no "Posix lines". It does work on binary files too because only binary files exist. There are no "text" files. Too bad huh?

@a-mr
Copy link
Contributor Author

a-mr commented Dec 1, 2019

@Araq, I agree that for this particular PR "newlines" is more of an aesthetic feature: I don't like when Nim splitLines/countLines add 1 additional empty line at the end of each file. Which is not actually there. It's only become apparent with introduction of context printing that prints some number of lines after the match.

But I believe it should be addressed anyway because of potential misunderstanding, see "bug" #12335 for example.

The explanation of feature:

There are 2 popular line definitions: Windows one, which considers its separator (CR LF) as "line break", and Posix one, which considers it as "line ending". So for windows world "A\r\lB\r\l" is 3 lines "A","B","". While for posix world the corresponding "A\lB\l" are just 2 lines "A", "B" without any additional empty line. BTW this is why many Linux-origin tools like git, gcc, etc by default report warnings when there is no newline at the end of file, they consider it as a "corrupted" text file in some sense.

That is addressed by additional parameter newlineEnds parameter that changes default behavior from "line break" to "line ending", in conjunction with new function endsLikePosix which detects that file has only '\l' at its end. So when using splitLines/countLines with newlineEnds=file.endsLikePosix they will work correctly both for text files generated by Windows and Linux programs: a "CR LF" at the end of a file will generate 1 additional empty line while "LF" will not.

Default behavior of splitLines/countLines has not been changed, it sticks to Windows "line break" definition.

@a-mr
Copy link
Contributor Author

a-mr commented Dec 1, 2019

There are no "text" files

There are. At least there is the definition in the link provided by zedeus above.

Though AFAIK this requirement of having newline at the end of file is only essential for Unix command line workflow, e.g. when concatenating 2 text files by cat which does byte-by-byte copying. So "corrupted" files like "A\nB" and "C\nD" would concatenate into the wrong "A\nBC\nD".

@Araq
Copy link
Member

Araq commented Dec 1, 2019

There are. At least there is the definition in the link provided by zedeus above.

But that's just a document somebody wrote, claiming to be a "standard". In reality there are no text files, there are no known ways to open a file to see whether it's "binary" or "text", I know the heuristic checking for \0 inside the file, but it's only that, a heuristic.

@a-mr
Copy link
Contributor Author

a-mr commented Dec 1, 2019

No, that was the text of actual Posix standard, it's called IEEE Std 1003.1, version of 2017.

Otherwise you are mostly correct. Indeed tools like GNU grep detect binary files by \0, it originally goes not only from C programming language, but also from the clause of Posix standard "3.403 Text File":

A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2017 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

In reality nowadays most of new Linux programs (including git) can work with non-conforming files, so probably it's not particularly relevant.

Decide it yourself whether Nim should support that newline complication or not.

@Varriount
Copy link
Contributor

Just want to chime in that I agree strongly with the change to the recursive flag...

@a-mr
Copy link
Contributor Author

a-mr commented Dec 2, 2019

Also the "line break" vs "line ending" dichotomy is not just about any standard, it's about the formal definition of text files grammar according to which splitLines does its parsing:

<line> = { character-not-separator }
<textFileB> = <line> [ separator <textFileB> ]
<textFileE> = { <line> separator } 

Result for splitLines:

string Break Ending
"" [""] []
"\n" ["", ""] [""]
"A" ["A"] illegal ["A"]
"A\n" ["A", ""] "A"
"A\nB" ["A", "B"] illegal ["A", "B"]
"A\nB\n" ["A", "B", ""] ["A", "B"]

@Araq
Copy link
Member

Araq commented Dec 3, 2019

Decide it yourself whether Nim should support that newline complication or not.

Well that's my point, I don't like the stdlib additions. But I do like your nimgrep improvements!

* add context printing (lines after and before a match)
* nimgrep: add exclude/include options
* nimgrep: improve error printing & symlink handling
* nimgrep: rename dangerous `-r` argument
* add a `--newLine` style option for starting matching/context
  lines from a new line
* add color themes: 3 new themes besides default `simple`
* enable printing of multi-line matches with line numbers
* proper display of replace when there was another match replaced at
  the same line / context block
* improve cmdline arguments error reporting
@a-mr a-mr force-pushed the feat/nimgrep-improvements branch from 2a34de1 to ec69425 Compare December 4, 2019 21:14
@a-mr
Copy link
Contributor Author

a-mr commented Dec 4, 2019

OK, as you wish

@Araq Araq merged commit 26074f5 into nim-lang:devel Dec 5, 2019
@genotrance genotrance mentioned this pull request Apr 12, 2020
@genotrance
Copy link
Contributor

genotrance commented Oct 4, 2020

Can this be backported into 1.0.x?

cc @narimiran

Edit: will need to backport #13958 as well if this is accepted.

@Araq
Copy link
Member

Araq commented Oct 5, 2020

@genotrance Why do you need for 1.0.x? It changes the meaning of the -r parameter, it would be a backport of a breaking change.

@genotrance
Copy link
Contributor

Mainly because I need --follow for nimterop but I understand if breaking changes cannot go into 1.0.x.

a-mr added a commit to a-mr/Nim that referenced this pull request Oct 17, 2020
introduced in nimgrep improvements nim-lang#12779
@a-mr a-mr mentioned this pull request Oct 17, 2020
a-mr added a commit to a-mr/Nim that referenced this pull request Oct 18, 2020
introduced in nimgrep improvements nim-lang#12779
a-mr added a commit to a-mr/Nim that referenced this pull request Nov 8, 2020
introduced in nimgrep improvements nim-lang#12779
Araq pushed a commit that referenced this pull request Nov 9, 2020
* nimgrep: speed up by threads and Channels
* nimgrep: add --bin, --text, --count options
* nimgrep: add --sortTime option
* allow Peg in all matches
including --includeFile, --excludeFile, --excludeDir

* add --match and --noMatch options
* add --includeDir option
* add --limit (-m) and --onlyAscii (-o) options
* fix performance regression

introduced in nimgrep improvements #12779

* better error handling
* add option --fit
* fix groups in --replace
* fix flushing, --replace, improve --count
* use "." as the default directory, not full path
* fix --fit for Windows
* force target to C for macosx
* validate non-negative int input for options #15318
* switch nimgrep to using --gc:orc
* address review: implement cropping in matches,...
* implement stdin/pipe & revise --help
* address stylistic review & add limitations
PMunch pushed a commit to PMunch/Nim that referenced this pull request Jan 6, 2021
* nimgrep: speed up by threads and Channels
* nimgrep: add --bin, --text, --count options
* nimgrep: add --sortTime option
* allow Peg in all matches
including --includeFile, --excludeFile, --excludeDir

* add --match and --noMatch options
* add --includeDir option
* add --limit (-m) and --onlyAscii (-o) options
* fix performance regression

introduced in nimgrep improvements nim-lang#12779

* better error handling
* add option --fit
* fix groups in --replace
* fix flushing, --replace, improve --count
* use "." as the default directory, not full path
* fix --fit for Windows
* force target to C for macosx
* validate non-negative int input for options nim-lang#15318
* switch nimgrep to using --gc:orc
* address review: implement cropping in matches,...
* implement stdin/pipe & revise --help
* address stylistic review & add limitations
mildred pushed a commit to mildred/Nim that referenced this pull request Jan 11, 2021
* nimgrep: speed up by threads and Channels
* nimgrep: add --bin, --text, --count options
* nimgrep: add --sortTime option
* allow Peg in all matches
including --includeFile, --excludeFile, --excludeDir

* add --match and --noMatch options
* add --includeDir option
* add --limit (-m) and --onlyAscii (-o) options
* fix performance regression

introduced in nimgrep improvements nim-lang#12779

* better error handling
* add option --fit
* fix groups in --replace
* fix flushing, --replace, improve --count
* use "." as the default directory, not full path
* fix --fit for Windows
* force target to C for macosx
* validate non-negative int input for options nim-lang#15318
* switch nimgrep to using --gc:orc
* address review: implement cropping in matches,...
* implement stdin/pipe & revise --help
* address stylistic review & add limitations
irdassis pushed a commit to irdassis/Nim that referenced this pull request Mar 16, 2021
* nimgrep: speed up by threads and Channels
* nimgrep: add --bin, --text, --count options
* nimgrep: add --sortTime option
* allow Peg in all matches
including --includeFile, --excludeFile, --excludeDir

* add --match and --noMatch options
* add --includeDir option
* add --limit (-m) and --onlyAscii (-o) options
* fix performance regression

introduced in nimgrep improvements nim-lang#12779

* better error handling
* add option --fit
* fix groups in --replace
* fix flushing, --replace, improve --count
* use "." as the default directory, not full path
* fix --fit for Windows
* force target to C for macosx
* validate non-negative int input for options nim-lang#15318
* switch nimgrep to using --gc:orc
* address review: implement cropping in matches,...
* implement stdin/pipe & revise --help
* address stylistic review & add limitations
ardek66 pushed a commit to ardek66/Nim that referenced this pull request Mar 26, 2021
* nimgrep: speed up by threads and Channels
* nimgrep: add --bin, --text, --count options
* nimgrep: add --sortTime option
* allow Peg in all matches
including --includeFile, --excludeFile, --excludeDir

* add --match and --noMatch options
* add --includeDir option
* add --limit (-m) and --onlyAscii (-o) options
* fix performance regression

introduced in nimgrep improvements nim-lang#12779

* better error handling
* add option --fit
* fix groups in --replace
* fix flushing, --replace, improve --count
* use "." as the default directory, not full path
* fix --fit for Windows
* force target to C for macosx
* validate non-negative int input for options nim-lang#15318
* switch nimgrep to using --gc:orc
* address review: implement cropping in matches,...
* implement stdin/pipe & revise --help
* address stylistic review & add limitations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants