Allow output text file to be loaded later for editing #315

Closed
Shreeshrii opened this issue Feb 14, 2018 · 5 comments

Comments

@Shreeshrii
Contributor

This is a request for a new feature.

It would help users edit and proofread the OCRed text, especially when using multi-page TIFF or PDF input and plain text output.

Currently the output pane allows corrections and editing, and this output can be saved. However, for large files it may not be possible to finish in a single session. As far as I know, it is not possible to reload this output in plain text mode.

It would be useful if the program allowed loading output files (same filename with a .txt extension) so that they can be edited later.

Also helpful in this scenario would be a synced view of the page image and the output text, especially if page breaks are saved as FF (form feed) characters in the text file.

@manisandro
Owner

As far as opening an existing text file is concerned, sure it could be done, but you could also just copy-paste into the text widget.

As far as setting the displayed image according to the cursor position in the output text is concerned (I suppose this is what was asked): first of all, what do you mean by "especially if page breaks are saved with FF in the text file"? In any event, I suppose it could be done, assuming that the user chooses for the filename/page markers to be added to the text output and leaves them there while editing. Needless to say, that is very fragile.

@Shreeshrii
Contributor Author

Please see tesseract-ocr/tesseract#1140 regarding use of FF as default page break.

I agree that the user could edit the markers out, which would break the implementation. However, it might still be helpful for editing.

Thanks for pointing out a workaround for editing long files. The output pane opens only after OCR, so the user would have to OCR a page and then replace the text with the contents of a file.

I will give it a try.

@Shreeshrii
Contributor Author

By syncing, what I had in mind was that if the page counter for the image is changed using the up and down arrow control, then the output text should also advance to that page.

You are referring to the inverse: changing the image page based on the cursor position in the output text.
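
To make the two directions concrete, here is a minimal sketch, assuming the saved text separates pages with a single form feed character (`\f`) and that the user has not deleted any of them. The function names are hypothetical and are not part of gImageReader.

```python
from bisect import bisect_right

FORM_FEED = "\f"  # assumed page separator in the saved text file


def page_starts(text):
    """Return the character offset at which each page begins."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == FORM_FEED:
            starts.append(i + 1)  # the next page starts right after the FF
    return starts


def page_for_offset(text, offset):
    """Cursor -> image: zero-based index of the page containing `offset`."""
    return bisect_right(page_starts(text), offset) - 1


def offset_for_page(text, page):
    """Image -> text: offset where the given zero-based page begins."""
    starts = page_starts(text)
    page = min(max(page, 0), len(starts) - 1)  # clamp out-of-range pages
    return starts[page]
```

Both lookups depend entirely on the separators surviving the edit, which is the fragility mentioned above. A trailing separator at the end of the file would also show up as one extra, empty page in this sketch.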

@manisandro
Owner

There actually is a button to open the output pane at any moment, just next to the recognize button.

I think cursor -> image is more useful in general? I.e. the typical workflow is that you go through the text and compare it to the image.

@Shreeshrii
Contributor Author

> There actually is a button to open the output pane at any moment, just next to the recognize button.

Thanks for pointing that out. There are many features in the program that I am not aware of.
