Convert shell scripts to single Python script #48

pedropombeiro · 2024-09-24T22:15:21Z

This PR converts the scripts to a single Python 3 script (scanner.py), making the code more readable and more maintainable. The log output was also improved in order to increase readability. Improvements are welcome, as I'm not versed in Python.

NOTES:

The code should be in a good enough condition to use daily, but there might be bugs, especially in the scripts that I don't use (trigger*.py).
~~There have been recent changes that aren't yet incorporated in the Python script.~~
~~The remove_blank.sh script hasn't been converted fully yet, as that involves parsing the output of some command line utilities.~~ UPDATE: fixed

~~Later on, we might want to remove the remaining shell scripts by changing the contents of /opt/brother/scanner/brscan-skey/brscan-skey.config to refer directly to the Python scripts:~~

IMAGE="python3  /opt/brother/scanner/brscan-skey/script/scantoimage.py"
OCR="python3  /opt/brother/scanner/brscan-skey/script/scantoocr.py"
EMAIL="python3  /opt/brother/scanner/brscan-skey/script/scantoemail.py"
FILE="python3  /opt/brother/scanner/brscan-skey/script/scantofile.py"

Related to #42

PhilippMundhenk · 2024-09-25T18:11:11Z

WOW! Just wow! This is awesome, thank you!! I will need some time to take a look at this, though. I'll have to push this to the weekend (hopefully). Very sorry about that.

pedropombeiro · 2024-09-25T18:57:20Z

> WOW! Just wow! This is awesome, thank you!! I will need some time to take a look at this, though. I'll have to push this to the weekend (hopefully). Very sorry about that.

No worries @PhilippMundhenk, I already have it running on my NAS as my daily driver by overriding the scripts, so I'm in no rush 🙂

pedropombeiro · 2024-10-07T20:28:14Z

The latest version should have fixed the scan order:

I'll now debug the scanner disconnection (no idea what could be causing this). Do you know if this is only happening in this branch - not in master?

pedropombeiro · 2024-10-07T20:30:15Z

Turns out that the duplicate logs were caused by a lingering batch I had in /tmp. Maybe we should clean incomplete batches on container startup (at least if the back pages are there but no .scan_pid file).
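
A rough sketch of how such lingering batches could be detected, assuming each batch lives in its own directory under /tmp and that back pages follow the `*-back-page*.pnm` naming seen in the scan logs. The function name `find_incomplete_batches` is hypothetical and not part of scanner.py:

```python
from pathlib import Path
from typing import List


def find_incomplete_batches(tmp_dir: Path) -> List[Path]:
    """Return batch directories that contain back pages but no .scan_pid file."""
    incomplete = []
    for batch_dir in sorted(p for p in tmp_dir.iterdir() if p.is_dir()):
        # Assumption: back pages are named like <batch>-back-page0001.pnm
        has_back_pages = any(batch_dir.glob("*-back-page*.pnm"))
        has_pid_file = (batch_dir / ".scan_pid").exists()
        if has_back_pages and not has_pid_file:
            incomplete.append(batch_dir)
    return incomplete
```

The caller would decide what to do with the returned directories (log them, delete them with `shutil.rmtree`, and so on).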

pedropombeiro · 2024-10-07T20:39:06Z

I tried a few more times and this time I didn't see any disconnections 🤷

Can you please try the latest image? Normally, OCR should be the only problem left.

pedropombeiro · 2024-10-07T20:47:30Z

OCR should also be fixed.

PhilippMundhenk · 2024-10-08T19:11:20Z

Thank you so much for the super fast response! I don't manage to be this fast.

I can confirm that ordering is correct now!
For OCR, the current behavior (kept for backward compatibility) is to trigger the OCR call after the conversion of either the front pages or the front and rear pages on scantofile or scantoemail (i.e., any scan), if the OCR variables are set. The behavior for scantoimage and scantoocr is currently undefined. I only took a quick look, but I don't quite understand how OCR is enabled on the OCR key but not on the other keys.

I also tested with OCR key, but OCR also does not seem to work, maybe due to this error:

- Scanning rear to latest batch 2024-10-08-19-04-12
  rear side: Found front-side batch: 2024-10-08-19-04-12
  rear side: ERROR: scan_pid file {path} not found.
  Analyzing 4 pages in /tmp/2024-10-08-19-04-12/2024-10-08-19-04-12.pdf with threshold 0.3%

I also do not observe any more hanging.
Regarding leftover batches, I think it would make sense not to clean on container start, but rather on scan start, if we can be sure there is no conversion process running. We should still allow the case where two sets of front pages are scanned quickly in a row, though, e.g., by checking the age of .scan_pid.
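
The age check could look roughly like this. It is only a sketch: the name `is_batch_stale`, the 10-minute default, and treating a missing .scan_pid as stale are all assumptions, and the threshold would need to be longer than the 2-minute wait before conversion that shows up in the logs:

```python
import time
from pathlib import Path
from typing import Optional


def is_batch_stale(
    batch_dir: Path,
    max_age_seconds: float = 600.0,  # assumed threshold, not taken from the PR
    now: Optional[float] = None,
) -> bool:
    """Return True when .scan_pid is missing or older than max_age_seconds."""
    pid_file = batch_dir / ".scan_pid"
    if not pid_file.exists():
        # Assumption: no pid file means no conversion process owns this batch
        return True
    current = time.time() if now is None else now
    return current - pid_file.stat().st_mtime > max_age_seconds
```

A batch with a recent .scan_pid is left alone, so two front-side scans started shortly after each other would not delete each other's files.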

I (accidentally) noticed that aborting a started scan leads to some issues. Logs (scanning stopped at scanner before first page being pulled in, second scan started before second line):

Scanning page 1
  front side: Waiting for 2 minutes before starting file conversion for 2024-10-08-18-57-25
  front side: converting to PDF for 2024-10-08-18-57-25...
  DEBUG: Executing command: ['gm', 'convert', '/tmp/2024-10-08-18-57-25/2024-10-08-18-57-25-front-page0001.pnm', '/tmp/2024-10-08-18-57-25/2024-10-08-18-57-25.pdf'], kwargs={'check': True}
  DEBUG: Moving /tmp/2024-10-08-18-57-25/2024-10-08-18-57-25.pdf to /scans/2024-10-08-18-57-25.pdf
  INFO: SSH environment variables not set, skipping inotify trigger.
  INFO: TELEGRAM_TOKEN or TELEGRAM_CHATID environment variables not set, skipping Telegram trigger.
scanimage: sane_read: Error during device I/O
Scanned page 1. (scanner status = 9)
Batch terminated, 1 page scanned
object address  : 0x7aa11b385780
object refcount : 3
object type     : 0x6037ff263140
object type name: CalledProcessError
object repr     : CalledProcessError(9, ['scanimage', '-l', '0', '-t', '0', '-x', '215', '-y', '297', '--format=pnm', '--resolution=300', '--batch=/tmp/2024-10-08-18-58-20/2024-10-08-18-58-20-front-page%04d.pnm'])
lost sys.stderr
- Scanning front to batch /tmp/2024-10-08-19-00-41/2024-10-08-19-00-41-front-page%04d.pnm
  DEBUG: Executing command: ['scanimage', '-l', '0', '-t', '0', '-x', '215', '-y', '297', '--format=pnm', '--mode=True Gray', '--resolution=300', '--batch=/tmp/2024-10-08-19-00-41/2024-10-08-19-00-41-front-page%04d.pnm'], kwargs={'check': True}
scanimage: rounded value of br-x from 215 to 211.881

I have not tested FTPS, inotify, Telegram

pedropombeiro · 2024-10-08T21:12:55Z

> I also tested with OCR key, but OCR also does not seem to work, maybe due to this error:

@PhilippMundhenk that's strange, because OCR key is mapped to scan_front(log, device, ["--mode=True Gray"]), and the logs you mention show a rear scan 🤔

> I also do not observe any more hanging.

🎉

> I (accidentally) noticed that aborting a started scan leads to some issues. Logs (scanning stopped at scanner before first page being pulled in, second scan started before second line):

I've tested the scenario locally and pushed a fix for it, which also logs what happened.
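
For readers following along: the `CalledProcessError` in the log comes from running the scan command with `check=True`. One way to handle an aborted scan, shown here only as a sketch and not necessarily what the pushed commit does, is to catch the exception and report the exit status. The helper name `run_scan` is hypothetical:

```python
import subprocess
from typing import List


def run_scan(command: List[str]) -> bool:
    """Run a scan command; log and return False instead of raising on failure."""
    try:
        subprocess.run(command, check=True)
    except subprocess.CalledProcessError as error:
        # A cancelled scan ends up here (exit status 9 in the log above)
        print(f"  ERROR: scan command exited with status {error.returncode}, skipping batch.")
        return False
    return True
```

The caller can then skip conversion and clean up the batch directory when `run_scan` returns `False`.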

PhilippMundhenk · 2024-10-09T18:35:30Z

OCR: Oh ok, misunderstanding. I thought I could add rear pages and it would still run through OCR. I started with scantoocr and then ran scantoemail for the rear pages. I think we should just add OCR to front and front and rear scans on the scantofile and scantoemail buttons, if OCR variables are set. This would be backward compatible behavior.
I tested only front pages with scantoocr (my button calls it "Text") now and I don't see any difference in behavior. No OCR is being triggered.

- Scanning front to batch /tmp/2024-10-09-18-22-57/2024-10-09-18-22-57-front-page%04d.pnm
  DEBUG: Executing command: ['scanimage', '-l', '0', '-t', '0', '-x', '215', '-y', '297', '--format=pnm', '--mode=True Gray', '--resolution=300', '--batch=/tmp/2024-10-09-18-22-57/2024-10-09-18-22-57-front-page%04d.pnm'], kwargs={'check': True}
scanimage: rounded value of br-x from 215 to 211.881
scanimage: rounded value of br-y from 297 to 296.973
Scanning infinity pages, incrementing by 1, numbering from 1
Scanning page 1
Scanned page 1. (scanner status = 5)
Scanning page 2
Scanned page 2. (scanner status = 5)
Scanning page 3
scanimage: sane_start: Document feeder out of documents
Batch terminated, 2 pages scanned
  front side: INFO: Waiting to start conversion process for 2024-10-09-18-22-57 in process with PID 98
  front side: Waiting for 2 minutes before starting file conversion for 2024-10-09-18-22-57
  front side: converting to PDF for 2024-10-09-18-22-57...
  DEBUG: Executing command: ['gm', 'convert', '/tmp/2024-10-09-18-22-57/2024-10-09-18-22-57-front-page0001.pnm', '/tmp/2024-10-09-18-22-57/2024-10-09-18-22-57-front-page0002.pnm', '/tmp/2024-10-09-18-22-57/2024-10-09-18-22-57.pdf'], kwargs={'check': True}
  DEBUG: Moving /tmp/2024-10-09-18-22-57/2024-10-09-18-22-57.pdf to /scans/2024-10-09-18-22-57.pdf
  INFO: SSH environment variables not set, skipping inotify trigger.
  INFO: TELEGRAM_TOKEN or TELEGRAM_CHATID environment variables not set, skipping Telegram trigger.

Aborts: I noticed now that it seems to be the scan command that is hanging. There is probably little we can do about that, at least not now. Same behavior in master today. Never had issues with this though, so maybe that is not even a practical situation.

pedropombeiro · 2024-10-09T19:34:52Z

> I think we should just add OCR to front and front and rear scans on the scantofile and scantoemail buttons, if OCR variables are set. This would be backward compatible behavior.

I'm not sure I follow. Right now we're calling OCR for any document that has finished scanning - regardless of which buttons were used. This seems logical, no?

> I tested only front pages with scantoocr (my button calls it "Text") now and I don't see any difference in behavior. No OCR is being triggered

@PhilippMundhenk I don't see the following lines on your log. Were they present?

    print(f"  {side} side: Conversion and post-processing for finished.")
    print("-----------------------------------")

PhilippMundhenk · 2024-10-09T19:37:08Z

Yes, that would be the ideal behavior, indeed.

Nope, never saw those lines...

pedropombeiro · 2024-10-09T19:39:18Z

> Nope, never saw those lines...

@PhilippMundhenk I wonder if the OCR variables are present at that point. I tested in the REPL inside the container and the check seems to work as expected. Would you be able to add some prints to your script to see what is happening?
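
Something along these lines would show whether the variables are visible at that point. It is a sketch under assumptions: the helper name `ocr_settings` is made up, and it treats OCR as enabled only when OCR_SERVER, OCR_PORT and OCR_PATH are all set and non-empty:

```python
import os
from typing import Dict, Mapping, Optional


def ocr_settings(env: Mapping[str, str] = os.environ) -> Optional[Dict[str, str]]:
    """Return the OCR settings, or None when any of them is unset or empty."""
    names = ("OCR_SERVER", "OCR_PORT", "OCR_PATH")
    values = {name: env.get(name) for name in names}
    for name, value in values.items():
        # !r makes an unset variable (None) distinguishable from an empty string
        print(f"{name}: {value!r}")
    if not all(values.values()):
        return None
    return values
```

Passing `env` explicitly makes it easy to try the check in the REPL with a plain dict.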

PhilippMundhenk · 2024-10-09T19:55:33Z

Ok, so the issue is not within OCR, it is within the telegram notification. We never reach OCR:

    print("notifying...")
    notify(log, output_pdf_file, f"{job_name}.pdf ({side}) scanned")
    print("notified")

    print("cleaning...")
    clean_job_files(log, side, job_name)
    print("cleaned")

    print("OCRing...")
    # Check for OCR environment variables
    ocr_server = os.getenv("OCR_SERVER")
    ocr_port = os.getenv("OCR_PORT")
    ocr_path = os.getenv("OCR_PATH")

    # f-strings print "None" for unset variables; "+" would raise TypeError
    print(f"OCR_SERVER: {ocr_server}")
    print(f"OCR_PORT: {ocr_port}")
    print(f"OCR_PATH: {ocr_path}")

log:

- Scanning rear to latest batch 2024-10-09-19-51-12
  rear side: Found front-side batch: 2024-10-09-19-51-12
  rear side: Read pid from /tmp/2024-10-09-19-51-12/.scan_pid, killing front processing job 69
  DEBUG: Executing command: ['scanimage', '-l', '0', '-t', '0', '-x', '215', '-y', '297', '--format=pnm', '--resolution=300', '--batch=/tmp/2024-10-09-19-51-12/2024-10-09-19-51-12-back-page%04d.pnm'], kwargs={'check': True}
scanimage: rounded value of br-x from 215 to 211.881
scanimage: rounded value of br-y from 297 to 296.973
Scanning infinity pages, incrementing by 1, numbering from 1
Scanning page 1
Scanned page 1. (scanner status = 5)
Scanning page 2
scanimage: sane_start: Document feeder out of documents
Batch terminated, 1 page scanned
  rear side: INFO: number of pages scanned: 1
  rear side: DEBUG: renamed 2024-10-09-19-51-12-front-page0001.pnm to index001-1-2024-10-09-19-51-12-front-page0001.pnm
  rear side: DEBUG: renamed 2024-10-09-19-51-12-back-page0001.pnm to index001-2-2024-10-09-19-51-12-back-page0001.pnm
  rear side: INFO: number of pages scanned: 1
  rear side: DEBUG: renamed 2024-10-09-19-51-12-front-page0001.pnm to index001-1-2024-10-09-19-51-12-front-page0001.pnm
  rear side: DEBUG: renamed 2024-10-09-19-51-12-back-page0001.pnm to index001-2-2024-10-09-19-51-12-back-page0001.pnm
  rear side: converting to PDF for 2024-10-09-19-51-12...
  DEBUG: Executing command: ['gm', 'convert', '/tmp/2024-10-09-19-51-12/index001-1-2024-10-09-19-51-12-front-page0001.pnm', '/tmp/2024-10-09-19-51-12/index001-2-2024-10-09-19-51-12-back-page0001.pnm', '/tmp/2024-10-09-19-51-12/2024-10-09-19-51-12.pdf'], kwargs={'check': True}
  Analyzing 2 pages in /tmp/2024-10-09-19-51-12/2024-10-09-19-51-12.pdf with threshold 0.3%
    Page 1: delete (ink coverage: 0.01%)
    Page 2: delete (ink coverage: 0.01%)
  DEBUG: Executing command: ['/usr/bin/pdftk', '/tmp/2024-10-09-19-51-12/2024-10-09-19-51-12.pdf', 'cat', 'output', '/tmp/2024-10-09-19-51-12/2024-10-09-19-51-12_noblank.pdf'], kwargs={'check': True}
  Removed 2 blank pages and saved as /tmp/2024-10-09-19-51-12/2024-10-09-19-51-12.pdf
  DEBUG: Moving /tmp/2024-10-09-19-51-12/2024-10-09-19-51-12.pdf to /scans/2024-10-09-19-51-12.pdf
notifying...
  INFO: SSH environment variables not set, skipping inotify trigger.
  INFO: TELEGRAM_TOKEN or TELEGRAM_CHATID environment variables not set, skipping Telegram trigger.
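
The rename lines in the log above suggest a naming scheme in which each sheet gets an `indexNNN-1-` prefix for its front and an `indexNNN-2-` prefix for its back, so that sorting by name interleaves the pages. A sketch of that scheme follows. The function name is hypothetical, and `back_is_reversed` encodes an assumption the single-page log cannot confirm: that the stack is flipped before the rear scan, so rear pages arrive last sheet first.

```python
from typing import Dict, List


def interleaved_names(
    front: List[str], back: List[str], back_is_reversed: bool = True
) -> Dict[str, str]:
    """Map original page file names to names that sort in reading order."""
    renames = {}
    for sheet, name in enumerate(front, start=1):
        renames[name] = f"index{sheet:03d}-1-{name}"
    # Assumption: with a flipped stack the first rear scan belongs to the last sheet
    ordered_back = list(reversed(back)) if back_is_reversed else list(back)
    for sheet, name in enumerate(ordered_back, start=1):
        renames[name] = f"index{sheet:03d}-2-{name}"
    return renames
```

Sorting the new names then gives the order in which the files would be passed to `gm convert`.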

PhilippMundhenk · 2024-10-09T19:56:36Z

Well, yeah, makes sense. We actually exit(1) there, rather than returning :)

made sure scan doesn't exit if telegram not found
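
To illustrate the difference, here is a minimal sketch of a notification helper that returns to its caller when Telegram is not configured, so cleanup and OCR still run afterwards. The function name and the injected `send` callable are hypothetical; only the TELEGRAM_TOKEN and TELEGRAM_CHATID variable names come from the log:

```python
import os
from typing import Callable, Mapping, Optional


def notify_telegram(
    message: str,
    env: Mapping[str, str] = os.environ,
    send: Optional[Callable[[str, str, str], None]] = None,
) -> bool:
    """Send a Telegram message if configured; return False when skipped."""
    token = env.get("TELEGRAM_TOKEN")
    chat_id = env.get("TELEGRAM_CHATID")
    if not token or not chat_id:
        print("  INFO: TELEGRAM_TOKEN or TELEGRAM_CHATID environment variables not set, skipping Telegram trigger.")
        # Returning keeps the process alive; sys.exit(1) here would end the whole job
        return False
    if send is not None:
        send(token, chat_id, message)  # hypothetical sender supplied by the caller
    return True
```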

pedropombeiro · 2024-10-09T20:02:39Z

> Well, yeah, makes sense. We actually exit(1) there, rather than returning :)

😄 yeah, that would do it!

PhilippMundhenk · 2024-10-09T20:10:52Z

OCR: Something still off, but I'm looking into it.

Empty pages: Are you sure that removal of empty pages works? I just scanned a bunch of empties, it also logs that the pages have been removed, but I receive a PDF with two empty pages.

pedropombeiro · 2024-10-09T20:13:56Z

> Empty pages: Are you sure that removal of empty pages works? I just scanned a bunch of empties, it also logs that the pages have been removed, but I receive a PDF with two empty pages.

Is it a document consisting of all empties? If so, I've noticed that the file comes out as original, but I don't think that's necessarily bad, since that's not a usual scenario, and at least it allows you to see what is wrong with the document, instead of receiving a PDF with zero pages. I've tried scanning pages with some empty backs, and it worked as expected.
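
The selection rule described here can be sketched as follows. This is an illustration rather than the code in scanner.py: the function name is made up, the 0.3% default mirrors the threshold in the log, and treating a page as blank when its coverage is strictly below the threshold is an assumption.

```python
from typing import List


def pages_to_keep(ink_coverage_percent: List[float], threshold_percent: float = 0.3) -> List[int]:
    """Return 1-based page numbers to keep; keep everything if all pages look blank."""
    keep = [
        page
        for page, coverage in enumerate(ink_coverage_percent, start=1)
        if coverage >= threshold_percent  # assumption: below the threshold means blank
    ]
    if not keep:
        # All pages blank: return the original document rather than a zero-page PDF
        return list(range(1, len(ink_coverage_percent) + 1))
    return keep
```

For the all-blank scan in the log, `pages_to_keep([0.01, 0.01])` keeps both pages. As far as I know, pdftk's `cat` with no page ranges copies every page, which would match the `cat output` command logged in that case.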

PhilippMundhenk · 2024-10-09T20:21:39Z

Ah ok, yes, indeed, special case. I was too lazy to put paper in and just scanned a bunch of nothing. We can leave it like that.

OCR: Works!!

FTP: It tries to upload even though no variable is set. I changed the condition that checks for this.

PhilippMundhenk

lgtm

PhilippMundhenk · 2024-10-06T19:24:00Z

files/brscan-skey.config

Ah, just now I said in #53 that 1) there is not an issue, but here it is. If we remove the shell files altogether, we will also need to make sure the web interface is directly calling the python scripts...

PhilippMundhenk · 2024-10-06T19:30:40Z

html/active.php

We probably don't need any of the php cleanups, once #32 is in...

PhilippMundhenk · 2024-10-06T19:33:01Z

script/scantoemail.py

Wow! This is really neat! Much easier to handle and for users to adapt

I feel we could make it even simpler, by getting logging and some formalities out of the way, but this is really nitpicking. Let's keep some todos for the future ;)

PhilippMundhenk · 2024-10-09T20:26:07Z

Oh, more of a "note-to-self": One thing I noticed in OCR is that the files are now massive: 33MB for a 14MB input. They used to come out much smaller. Not sure why that is...

I will take a look at that another day. But this can be merged into dev, so that we can get started on the web UI integration...

pedropombeiro force-pushed the pedropombeiro/convert-to-python3 branch 5 times, most recently from 59ece31 to 3b031d4 Compare September 25, 2024 11:34

pedropombeiro changed the title ~~Draft: Convert shell scripts to single Python script~~ Convert shell scripts to single Python script Sep 25, 2024

pedropombeiro requested a review from PhilippMundhenk September 25, 2024 11:34

pedropombeiro force-pushed the pedropombeiro/convert-to-python3 branch 2 times, most recently from 99c9a0a to 9f3d47e Compare September 25, 2024 13:54

pedropombeiro self-assigned this Sep 25, 2024

pedropombeiro force-pushed the pedropombeiro/convert-to-python3 branch 4 times, most recently from 87cb872 to b84fabc Compare September 25, 2024 16:56

pedropombeiro force-pushed the pedropombeiro/convert-to-python3 branch 14 times, most recently from bff36c1 to 9ee9978 Compare October 1, 2024 20:34

Fix page order

5210fef

Rename scan_rear function to match filenames

0de5d6f

Try to fix OCR

665fa8c

pedropombeiro force-pushed the pedropombeiro/convert-to-python3 branch from 9b36b70 to 665fa8c Compare October 7, 2024 20:46

Handle cancelled scans

63efb30

fixed premature exit

a2daf48

made sure scan doesn't exit if telegram not found

fixed curl call

da283d4

PhilippMundhenk added 2 commits October 9, 2024 22:20

fixed FTPS condition

2d8f3d2

unifying style

6152198

PhilippMundhenk approved these changes Oct 9, 2024

unifying style

69490bc

PhilippMundhenk merged commit 0cb8ebe into development Oct 11, 2024
1 check passed

pedropombeiro deleted the pedropombeiro/convert-to-python3 branch October 11, 2024 16:30
