Allow output text file to be loaded later for editing #315

Shreeshrii · 2018-02-14T16:21:09Z

This is a request for a new feature.

This will help users edit and proofread the OCRed text, especially when using multi-page TIFF or PDF input and plain text output.

Currently the output pane allows corrections and editing, and this output can be saved. However, for large files, it may not be possible to finish in a single session. As far as I know, it is not possible to reload this output in plain text mode.

It would be useful if the program allowed loading of output files (same filename with a .txt extension) so that they can be edited later.

Also helpful in this scenario would be a synced view of the page image and the output text - specially if page breaks are saved with FF in the text file.

manisandro · 2018-02-14T19:11:26Z

As far as opening an existing text file is concerned, sure, it could be done, but you could also just copy and paste the text into the text widget.

As for setting the displayed image according to the cursor position in the output text (I suppose this is what was asked): first of all, what do you mean by "specially if page breaks are saved with FF in the text file."? In any event, I suppose it could be done, assuming that the user chooses to have the filename/page markers added to the text output and leaves them in place while editing, but needless to say it is very fragile.

Shreeshrii · 2018-02-15T03:29:40Z

Please see tesseract-ocr/tesseract#1140 regarding the use of FF (form feed) as the default page break.

I agree that the user could edit the markers out, which would break the implementation. However, it might still be helpful for editing.

Thanks for pointing out a workaround for editing long files. The output pane opens only after OCR, so the user would have to OCR a page and then replace the text with the contents of a file.

I will give it a try.

Shreeshrii · 2018-02-15T03:33:01Z

By syncing, what I had in mind was that if the page counter for the image is changed using the up and down arrow control, then the output text should also advance to that page.

You are referring to the inverse: changing the image page based on the cursor position in the output text.
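To illustrate the page -> text direction, here is a minimal Python sketch. It assumes the plain text output separates pages with a form feed character (`"\f"`), as discussed in tesseract-ocr/tesseract#1140, and that the user has not removed any of them while editing. The function names `page_start_offsets` and `offset_for_page` are hypothetical and are not part of gImageReader.

```python
FORM_FEED = "\f"  # assumed page separator in the plain text output


def page_start_offsets(text: str) -> list[int]:
    """Return the character offset at which each page starts."""
    starts = [0]  # the first page always starts at the beginning
    for index, char in enumerate(text):
        if char == FORM_FEED:
            # the next page starts right after the separator
            starts.append(index + 1)
    return starts


def offset_for_page(text: str, page: int) -> int:
    """Return the text offset to scroll to for a zero-based page index."""
    starts = page_start_offsets(text)
    if not 0 <= page < len(starts):
        raise IndexError(f"page {page} is not present in the text")
    return starts[page]
```

If the text ends with a form feed, this sketch reports one extra, empty page at the end; a real implementation would have to decide how to treat that case.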

manisandro · 2018-02-15T13:56:46Z

There actually is a button to open the output pane at any moment, just next to the recognize button.

I think cursor -> image is more useful in general? I.e. the typical workflow is that you go through the text and compare it to the image.
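As a rough sketch of the cursor -> image direction in Python, assuming pages in the output text are separated by a form feed character (`"\f"`) and that those separators survive editing: the page under the cursor is simply the number of separators before it. The function name `page_for_cursor` is hypothetical and is not part of gImageReader.

```python
FORM_FEED = "\f"  # assumed page separator in the plain text output


def page_for_cursor(text: str, cursor: int) -> int:
    """Return the zero-based page index for a cursor offset in the text."""
    if not 0 <= cursor <= len(text):
        raise ValueError(f"cursor {cursor} is outside the text")
    # every separator before the cursor means one more page has been passed
    return text.count(FORM_FEED, 0, cursor)
```

This is exactly as fragile as noted earlier in the thread: deleting or adding a form feed while editing shifts every following page by one.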

Shreeshrii · 2018-02-15T15:09:35Z

> There actually is a button to open the output pane at any moment, just next to the recognize button.

Thanks for pointing that out. There are many features in the program that I am not aware of.

manisandro closed this as completed Sep 26, 2018
