BUG: Can't preview image and pdf also can't converte to PDF #3816

Open
PhamPham92 opened this issue Jul 18, 2024 · 14 comments
Labels
support To track support requests

Comments

@PhamPham92

Hello,

I get this message when I try to preview an image or a PDF:
[three images]

Can someone help me? Thank you.

PhamPham92 added the labels bug (Things that should work, but don’t) and triage (These issues need to be reviewed by the Aleph team) on Jul 18, 2024
PhamPham92 changed the title from "BUG: Can't preview image and pdf also can't converted to PDF" to "BUG: Can't preview image and pdf also can't converte to PDF" on Jul 18, 2024
@tillprochaska
Contributor

Hi, thanks for using Aleph. Please make sure to provide the information specified in the issue template when opening issues in this repository.

Based on the error message, I think your ingest-file process might not use the correct database. How are you running Aleph? Did you specify a custom database URI (using the ALEPH_DATABASE_URI configuration option)?

catileptic added the label support (To track support requests) and removed the labels bug (Things that should work, but don’t) and triage (These issues need to be reviewed by the Aleph team) on Jul 30, 2024
@PhamPham92
Author

PhamPham92 commented Jul 31, 2024

Hello,
I use the docker-compose setup from the production deployment docs and I didn't touch anything:
[image]

@tillprochaska
Contributor

Which Aleph version are you using?

@PhamPham92
Author

I am using 3.17.0:
[image]

@Amerousful

Amerousful commented Sep 3, 2024

@PhamPham92 Hey! I also faced the same issue with ingest_cache. It occurred after switching to 3.17. Have you tried running aleph upgrade? It helped me.

UPD: Frankly, it only helped for one particular document! Unfortunately, I reproduced this bug with other documents...

@PhamPham92
Author

Hello, thanks for your answer. I have already tried, but it didn't help.

@gethert

gethert commented Sep 11, 2024

I have the same issue with 3.17.0. The errors are mainly "OperationalError('(sqlite3.OperationalError) no such table: ingest_cache')", plus a few "Could not extract PDF file: RuntimeError('Set changed size during iteration')" and:
Failed to open image: (sqlite3.OperationalError) no such table: ingest_cache [SQL: SELECT ingest_cache.value FROM ingest_cache WHERE ingest_cache."key" = ?] [parameters: ('ocr:4e272afede8878a8d943ac3d854b97d768613274',)] (Background on this error at: https://sqlalche.me/e/20/e3q8)
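
For reference, the SQLite part of this error can be reproduced outside Aleph. The following is a minimal sketch using only Python's standard sqlite3 module. The query text is copied from the message above, but the table layout is an assumption (only the key and value columns are visible in the query) and the key "ocr:example" is made up. It is not how ingest-file itself accesses the cache.

```python
import sqlite3

# The query text is copied from the error message in this thread.
QUERY = 'SELECT ingest_cache.value FROM ingest_cache WHERE ingest_cache."key" = ?'


def lookup(conn, key):
    """Return the cached value for key, or None if there is no such row."""
    row = conn.execute(QUERY, (key,)).fetchone()
    return row[0] if row else None


# A freshly created database contains no tables at all.
conn = sqlite3.connect(":memory:")

try:
    lookup(conn, "ocr:example")  # hypothetical key, for illustration only
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)

# Assumed schema: only the two columns visible in the query above.
conn.execute('CREATE TABLE ingest_cache ("key" TEXT PRIMARY KEY, value TEXT)')
print(lookup(conn, "ocr:example"))
```

The first lookup fails with the same "no such table: ingest_cache" text, and the second one returns None once the table exists. So the message by itself only says that the process opened a SQLite database in which this table was never created; it does not say which database that is.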

In docker-compose.yml I tried commenting out "~:/host", and in aleph.tmpl I tried uncommenting ARCHIVE_TYPE=file and ARCHIVE_PATH=/data.

With a dataset of 20k files (2.2 GB), text files are indexed reliably, small images often, and larger images and PDFs almost never.

When I tried to crawl 2 .pngs and 2 .pdfs, 1 pdf was successfully indexed.

With a dataset of 6k files (860.9 MB) the success rate for pdfs was about 50 %, but it still failed with larger images.

@PhamPham92
Author

Hello,

Exactly the same issue for me!

@riotbib

riotbib commented Sep 17, 2024

One of my instances is also getting this bug:

No preview is available for this document
Failed to open image: (sqlite3.OperationalError) no such table: ingest_cache [SQL: SELECT ingest_cache.value FROM ingest_cache WHERE ingest_cache."key" = ?] [parameters: ('ocr:[redacted-string]:deu',)] (Background on this error at: https://sqlalche.me/e/20/e3q8)

I am using Aleph version 3.17.0 and ingest-file version 3.22.0. This bug did not occur with previous versions (that I know of).

Even though ALEPH_DATABASE_URI is commented out in aleph.env, a Postgres instance is deployed via docker-compose.yml, and ingest-file depends on it. [Edit: Bad thinking at the end of the day…]

Please let me know if I can help track down the cause of the bug with my setup.

@simonwoerpel
Contributor

Could you all open a shell in a running ingest-file container and print the result of echo $FTM_STORE_URI? It should point to the default Postgres URI if you didn't touch anything and are using the official Docker builds.

@riotbib

riotbib commented Sep 17, 2024

$ docker-compose run --rm ingest-file bash
WARN[0000] /opt/aleph/docker-compose.yml: `version` is obsolete
[+] Creating 2/0
 ✔ Container aleph-redis-1     Running    0.0s
 ✔ Container aleph-postgres-1  Running    0.0s
root@ingest:/ingestors# echo $FTM_STORE_URI
postgresql://aleph:aleph@postgres/aleph
root@ingest:/ingestors# echo $ALEPH_DATABASE_URI

Thanks, @simonwoerpel. I also echoed ALEPH_DATABASE_URI which is empty. Maybe that's why it falls back to SQLite?
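
The same check can be scripted so that the password in the URI is not printed. This is a rough sketch using only the Python standard library, meant to be run inside the container. `describe_db_uri` is a hypothetical helper, not part of Aleph or ingest-file, and it only looks at the two variables echoed above.

```python
import os
from urllib.parse import urlsplit


def describe_db_uri(name, environ):
    """Summarise a database URI from environ without revealing credentials."""
    value = environ.get(name, "")
    if not value:
        return f"{name}: not set"
    parts = urlsplit(value)
    # Drop an optional driver suffix such as "postgresql+psycopg2".
    backend = parts.scheme.split("+")[0]
    host = parts.hostname or "(no host)"
    database = parts.path.lstrip("/")
    return f"{name}: backend={backend} host={host} database={database}"


# Report on the two variables that were echoed by hand in the shell session.
for variable in ("FTM_STORE_URI", "ALEPH_DATABASE_URI"):
    print(describe_db_uri(variable, os.environ))
```

With the values shown in the shell session, this would report a postgresql backend for FTM_STORE_URI and "not set" for ALEPH_DATABASE_URI.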

@simonwoerpel
Contributor

No, ingest-file doesn't even know about aleph in that sense, it only knows about the ftm store :)

@PhamPham92
Author

[image]

@sjinks

sjinks commented Nov 17, 2024

It looks like the servicelayer (from ingest-file) uses the TAGS_DATABASE_URI environment variable to set the location of the tags database. By default, the variable is not set.

I was able to fix the issue by adding

TAGS_DATABASE_URI=sqlite:///data/tags.sqlite

to aleph.env (OK, adding it to the environment of ingest-file would probably be enough. Oh well…) and restarting ingest-file: docker compose up -d ingest-file --force-recreate

I then had to reingest all documents, but at least it works now.
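
The thread does not say why an unset variable would lead to a missing table. One possible explanation, which is an assumption and not confirmed by the maintainers here, is that the fallback is an in-memory SQLite database. With SQLite, every new connection to an in-memory database starts out empty, while connections to a file share the same tables. The sketch below demonstrates only that SQLite behaviour, using Python's standard sqlite3 module with an assumed table layout; it does not use servicelayer or ingest-file code.

```python
import os
import sqlite3
import tempfile


def table_visible_from_second_connection(database):
    """Create a table on one connection, then look for it on another."""
    writer = sqlite3.connect(database)
    reader = sqlite3.connect(database)
    try:
        # Assumed schema, for illustration only.
        writer.execute('CREATE TABLE ingest_cache ("key" TEXT PRIMARY KEY, value TEXT)')
        writer.commit()
        try:
            reader.execute("SELECT value FROM ingest_cache").fetchall()
            return True
        except sqlite3.OperationalError:
            # Raised as "no such table: ingest_cache" when the table is absent.
            return False
    finally:
        reader.close()
        writer.close()


# Two connections to ":memory:" are two separate, private databases.
in_memory = table_visible_from_second_connection(":memory:")

# Two connections to the same file see the same tables.
with tempfile.TemporaryDirectory() as tmp:
    on_disk = table_visible_from_second_connection(os.path.join(tmp, "tags.sqlite"))

print("in-memory:", in_memory)
print("on disk:", on_disk)
```

If that assumption holds, it would also fit the reports above that some documents work and others fail, since it would depend on which connection handles a given lookup. Pointing the variable at a file, as in the fix above, avoids the problem either way.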
