Getting error: "adapter: poppler Error: Broken pipe" #113

Closed
bbardsley opened this issue Aug 19, 2021 · 26 comments

Comments

@bbardsley

bbardsley commented Aug 19, 2021

After running rga --files-with-matches "search-term", I get this error for almost all the PDF files in my directory:

name-of-file.pdf:
adapter: poppler
Error: Broken pipe (os error 32)

For some files it also says:

adapter: poppler
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', src/adapters/spawning.rs:93:53
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any { .. }', src/adapters/spawning.rs:98:6

I have no idea what causes this error.

@phiresky
Owner

Please send an example PDF file and show the output of pdftotext -v.

@bbardsley
Author

The output of pdftotext -v is:
pdftotext version 21.08.0
Copyright 2005-2021 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC

The error happens for example for
https://www.maths.ed.ac.uk/~v1ranick/papers/witten.pdf

@phiresky
Owner

I can't reproduce the issue. For me, rga foo in a directory containing only the above PDF works fine. Does rga-preproc witten.pdf work for you?

@bbardsley
Author

rga foo works fine for me. rga --files-with-matches foo also works fine, but somehow rga --files-with-matches thing gives an error. rga-preproc witten.pdf gives a normal output.

@Ornanovitch

I have been seeing the very same behavior for a few weeks (I only noticed it now because I use rga together with fzf, and fzf doesn't show any errors).

Everything works fine except rga --files-with-matches, which fails for most of my PDFs:

adapter: poppler
Error: Broken pipe (os error 32)

I can't tell whether some libraries have been upgraded...

@phiresky
Owner

I can't see anything relevant in ripgrep's changelog, but this could happen if ripgrep stops reading from the preprocessor once it finds the first match when using the --files-with-matches option, or when it detects that a file is binary. The preprocessor would then be writing into a closed pipe, which is what "Broken pipe" means. This should be fixed in the next version.
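
The situation described above can be reproduced with standard tools. This is only an illustrative sketch, not rga's code: seq stands in for the preprocessor, head -n 1 for a reader that stops after the first match, and the function names are made up for the example.

```shell
# Stand-in for the preprocessor: writes far more output than a pipe buffer holds.
many_lines() {
    seq 1 200000
}

# Stand-in for a reader that only needs the first match and then closes the pipe.
first_match_only() {
    head -n 1
}

# The reader exits after one line, so the rest of the writer's output is never read.
# Any "Broken pipe" complaint from the writer goes to stderr, which is discarded here.
stop_after_first_match() {
    many_lines 2>/dev/null | first_match_only
}
```

Calling stop_after_first_match prints a single line; the writer is cut off as soon as the reader exits. A writer that treats the resulting write error as fatal (for example by calling unwrap() on it) fails in the way shown in the traces in this thread.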

@Ornanovitch

So today I tested with 6 brand-new PDF files. I named them 1.pdf, 2.pdf, and so on, and put them all in a new folder.

1. first attempt with --files-with-matches:

❯ rga --files-with-matches "essai"
5.pdf:
-------------------------------------------------------------------------------
adapter: poppler
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', src/adapters/spawning.rs:93:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src/adapters/spawning.rs:92:5
-------------------------------------------------------------------------------
2.pdf:
-------------------------------------------------------------------------------
adapter: poppler
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', src/adapters/spawning.rs:93:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src/adapters/spawning.rs:92:5
-------------------------------------------------------------------------------
6.pdf:
-------------------------------------------------------------------------------
adapter: poppler
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "could not finish zstd"', src/preproc.rs:117:42
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
-------------------------------------------------------------------------------
4.pdf:
-------------------------------------------------------------------------------
adapter: poppler
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', src/adapters/spawning.rs:93:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src/adapters/spawning.rs:92:5
-------------------------------------------------------------------------------
1.pdf:
-------------------------------------------------------------------------------
adapter: poppler
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', src/adapters/spawning.rs:93:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src/adapters/spawning.rs:92:5
-------------------------------------------------------------------------------

2. second attempt without --files-with-matches:

❯ rga "essai"
4.pdf
Page 6: tâches ont conduit à de nécessaires précisions dans des textes ultérieurs, tant
Page 12: HUGHES E.C. (1996), Le Regard sociologique. Essais choisis, Éditions de

6.pdf
Page 17: deviennent le monopole des journalistes et autres essayistes, il faut cesser de considérer l’empathie comme une condition nécessaire pour

2.pdf
Page 4: pénale essaie désormais d’agir le plus
Page 5: 16 octobre 2020 prescrivant les mesures générales nécessaires pour faire face à l’épidémie

5.pdf
Page 7: sur un sujet donné, les dirigeants du think tank essaient d’identifier des experts
Page 15: d’une dimension qui légitime le dessaisissement du traitement du problème par
Page 22: par Terra Nova, le capital symbolique (Sapiro, 2009) nécessaire à des interventions individuelles (signature de tribune dans la presse, invitation à la télévision
Page 25: Eymeri-Douzans J.-M., 2010, « Ce que faire l’expert pour la Commission européenne veut dire. Essai d’auto-analyse d’une trajectoire de socialisation », in

1.pdf
Page 5: légitimes ». Cf. Hughes (H.), Le regard sociologique. Essais choisis, Paris, Éditions de l’EHESS, 1996, p. 157.
Page 7: 20. Muller (P.), Le technocrate et le paysan. Essai sur la politique française de modernisation de l’agriculture de
Page 9: pour lesquels ils ne disposaient que de toutes petites lignes budgétaires expérimentales… Et dès […] qu’il devient nécessaire de re-financer 20 000 contrats
Page 10: « rigueur » favorise une phase d’« essaimage » de ces initiatives, qui aboutit à la
Page 20: avec d’autres, en son temps, L’ABC du créateur, bon on essaie de vulgariser les
Page 23: financements publics nécessaires à la pérennisation de leur propre poste.

3. third attempt with --files-with-matches:

❯ rga --files-with-matches "essai"
2.pdf
1.pdf:
-------------------------------------------------------------------------------
adapter: poppler
Error: Broken pipe (os error 32)
-------------------------------------------------------------------------------
4.pdf
6.pdf
5.pdf:
-------------------------------------------------------------------------------
adapter: poppler
Error: Broken pipe (os error 32)
-------------------------------------------------------------------------------

So it gets better after a first rga run, but the problem doesn't go away completely. Weird...

@TwistingTwists

I am getting the same error.

1. rga locust -- works
2. rga locust --files-with-matches -- gives error

adapter: poppler
Error: Broken pipe (os error 32)

@akobel

akobel commented Mar 4, 2022

adapter: poppler is a red herring, I guess: the problem also occurs for cached data, where pdftotext is not called at all.
The problem occurs quite randomly (0.9.6 on an up-to-date Arch installation) and is not tied to any specific document. It looks to me like a race condition in the stream handling of pipe_output (probably not in the cache lookup: I first suspected the zstd decoder handling, but I could also reproduce the problem with a clean cache).

However, the problem seems gone in master, where pipe_output has been reworked fundamentally, as far as I can tell, although the comment regarding threading doesn't sound too stable yet...

@azu azu mentioned this issue Mar 27, 2022
@ygm-nob

ygm-nob commented Apr 4, 2022

As a workaround, I use the script below, which does not need --files-with-matches. I'm using the fish shell on a Mac.

#!/opt/homebrew/bin/fish
/usr/bin/open (rga $argv[1] | awk 'BEGIN{FS=":"}{print $1}' | sort -u | fzf --sort )
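
For readers who don't use fish, the same idea can be sketched as a POSIX shell function. This assumes that rga prints matches as path:text when its output is piped and that file names contain no colon; the name files_from_matches is made up for this example.

```shell
# Reads "path:match text" lines on stdin and prints each path once, sorted.
# Hypothetical helper; it assumes there is no ':' inside the file names.
files_from_matches() {
    awk -F: '{ print $1 }' | sort -u
}

# Intended use (requires rga): let rga read each file to the end,
# then reduce the matches to file names.
#   rga "search-term" | files_from_matches
```

Because rga prints every match here instead of stopping at the first one, the reader never closes the pipe early, at the cost of scanning each file completely.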

@mathieupost
Contributor

mathieupost commented Jun 9, 2022

I'm having the exact same issue as @Ornanovitch in #113 (comment)

Getting this error for every pdf at the first try (or when running with --rga-no-cache)

adapter: poppler
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', src/adapters/spawning.rs:93:53
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any { .. }', src/adapters/spawning.rs:98:6

Running with RUST_BACKTRACE=1:

adapter: poppler
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: __pthread_start
', src/adapters/spawning.rs:93:53
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: crossbeam_utils::thread::ScopedThreadBuilder::spawn::{{closure}}
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any { .. }', src/adapters/spawning.rs:98:6
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: ripgrep_all::adapters::spawning::pipe_output
   4: ripgrep_all::adapters::spawning::<impl ripgrep_all::adapters::FileAdapter for T>::adapt
   5: ripgrep_all::preproc::rga_preproc
   6: rga_preproc::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

After running once without --files-with-matches, random PDFs get this error on every run:

adapter: poppler
Error: Broken pipe (os error 32)

❯ rga --version
ripgrep-all 0.9.6

❯ rg --version
ripgrep 13.0.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

❯ pdftotext -v
pdftotext version 22.04.0
Copyright 2005-2022 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC

All installed from nixpkgs-22.05 on macOS

@sinnpi

sinnpi commented Jun 27, 2022

I had the same issue with the version provided in the Arch Linux repository (0.9.6). Installing from the master branch solved it.

However, the master branch has the peculiarity of not showing page numbers with the results, for some reason. So I (admittedly) duct-taped a "solution" (or, to be more honest, a hack): my rga-fzf command uses a locally stored build of rga from the master branch to list the files with --files-with-matches, and the latest release to do the actual search... Not proud of it, but it works for now.

@TwistingTwists

Can you share your rga-fzf function?

And how do I get the two different versions of rga you mentioned?

Thanks!

@jonaustin

jonaustin commented Jun 30, 2022

Not the original poster, but the way I did it is super simple:

git clone https://github.com/phiresky/ripgrep-all.git
cd ripgrep-all
cargo install --locked --path . --force # install master
pacman -S ripgrep-all # 0.9.6

The rga-fzf shell function is exactly the same as the one in this repo's readme, except that the first rga command uses master:

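rga-fzf () {
    # File list: use the master build installed by cargo, which does not hit the broken pipe error.
    RG_PREFIX="~/.cargo/bin/rga --files-with-matches"
    local file
    # Preview: use the packaged 0.9.6 rga on PATH, which still prints page numbers.
    file="$(FZF_DEFAULT_COMMAND="$RG_PREFIX '$1'" \
    fzf --sort --preview="[[ ! -z {} ]] && rga --pretty --context 5 {q} {}" \
    --phony -q "$1" \
    --bind "change:reload:$RG_PREFIX {q}" \
    --preview-window="70%:wrap"
    )"  && echo "opening $file" && xdg-open "$file"
}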

I poked at the code a bit to see why page numbers no longer show up on master. It looks like there was a major refactoring at some point: the poppler adapter has moved into adapters/custom.rs, and the page-number postprocessor just hasn't been added back yet (it is still commented out):

            // postprocessors: [{name: "add_page_numbers_by_pagebreaks"}]

https://github.com/phiresky/ripgrep-all/blob/master/src/adapters/custom.rs

old code:

oup.write_all(format!("{}Page {}: {}\n", line_prefix, page, line).as_bytes())?;
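
Until that postprocessor is restored, the page prefixes can be approximated outside rga. The sketch below is not rga's implementation: it assumes that pdftotext separates pages with a form feed character and only mimics the "Page N: " prefix of the old code. The function name add_page_numbers is hypothetical.

```shell
# Reads pdftotext output on stdin and prefixes each non-empty line with its page number.
add_page_numbers() {
    awk '
        BEGIN { page = 1 }
        {
            # A form feed marks a page break; text after it belongs to the next page.
            n = split($0, parts, "\f")
            for (i = 1; i <= n; i++) {
                if (i > 1) page++
                # Empty lines and the empty fragments around a page break are skipped.
                if (parts[i] != "") print "Page " page ": " parts[i]
            }
        }
    '
}

# Intended use (requires pdftotext and ripgrep):
#   pdftotext file.pdf - | add_page_numbers | rg "search-term"
```

Unlike the old code, this version drops blank lines and has no line_prefix argument, which keeps the example short.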

@temberature

I built from tag v0.9.6 following @jonaustin's steps above, and the error disappeared with no change in behavior.

@mpr1255

mpr1255 commented Oct 25, 2022

FWIW, for anyone else who comes across this: I'm on macOS 12.6, and rga wasn't working after installing with brew (even after uninstalling and reinstalling everything). But jonaustin's 30 Jun post did the job: clone the repo and install with cargo. It even added rga to the PATH, so 'rga' at the terminal just works. rga-fzf too.

@Bellavene

brew install rga --HEAD

@juffis

juffis commented Mar 26, 2023

I just ran into this, too. Any progress on this?

@Bellavene

What progress do you need if you can always install the latest (--HEAD) version?

@jonaustin

Using the function in this comment is still better because the current version still doesn't return page numbers with the search: #113 (comment)

@Bellavene

Thank you for letting me know.

@juffis

juffis commented Mar 28, 2023

I'm trying to get this to work in termux as I have all my pdfs on my phone.

Do you mean the function with two different versions as posted by you?

@jonaustin

Yeah (I mean it's better if you want the page numbers for each search result anyway).

No idea about termux. Neat though; post back if/how you get it working.

@juffis

juffis commented Mar 30, 2023

Sure, I've never built anything with/for termux so this is gonna take a while, but if I get it working I'll post back.

@phiresky
Owner

Fixed in 0.10
