Saving file doesn't work #2
Comments
Hi, thanks for your response. Unfortunately the project is at a very early stage: at the moment it is more of an "hOCR viewer" than an "editor". It currently implements just the basic ideas of an OCR web proofreader, to see what's possible. When I have the time, I'll continue developing it. Help is welcome ;-) To your question: saving documents will be out of scope for this project anyway. This project covers just the frontend part of the editor (to be embedded in other applications). Providing backend storage is your part ;-)
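For anyone wondering what that hand-off to a backend could look like, here is a minimal sketch. It assumes the embedding application can already produce the edited hOCR markup as a string and exposes an HTTP endpoint that accepts it. The function name `saveHocr`, the use of `PUT`, and the example URL are all hypothetical and not part of hocr-proofreader.

```javascript
// Hypothetical helper: sends edited hOCR markup to a backend you provide.
// The endpoint URL and the PUT method are assumptions about that backend.
// fetchImpl is injectable so the function can be exercised without a network.
async function saveHocr(url, hocrMarkup, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url, {
    method: 'PUT',
    headers: { 'Content-Type': 'text/html; charset=utf-8' },
    body: hocrMarkup,
  });
  // Surface server-side failures instead of silently dropping the edits.
  if (!response.ok) {
    throw new Error(`Saving failed with HTTP status ${response.status}`);
  }
  return response;
}
```

A caller would then do something like `await saveHocr('/hocr/page-1', editedMarkup)`, where the path is whatever the backend defines.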
Hi! Thanks for the reply! I see. I'll keep following the project and mention it to colleagues in Helsinki who work on similar topics. We have quite a lot of books that should be proofread, and I haven't found a solution that works well for proofreading hOCR output from Tesseract. Ideally the output would be saved with page coordinates as well, but I know that gets messy after manual edits. I very much liked how navigating the text is implemented here. In principle, setting up the backend is no problem either. Good luck with your project!
Yes, same for me. That was also my reason for starting this project, as I didn't find a good existing solution. My plan was also to preserve the page coordinates as well as possible, i.e. split the bounding boxes when a whitespace is inserted, allow manually editing/correcting the bounding boxes, etc. One goal is to render image-with-text-behind PDFs from those hOCRs, so the coordinates are very important. It would be great to find some more developers interested in this. The current implementation is just ~450 lines of pure JavaScript using recent browser features, so it's quite manageable. ;-)
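To illustrate the bounding-box splitting idea, here is a rough sketch of one way it might work. It is not taken from the project's code: `splitWordBox` is a hypothetical helper, and it assumes that all characters in a word are equally wide, so the cut position is simply proportional to the character index. Real glyph widths vary, so the result is only an approximation that a user could then correct by hand.

```javascript
// Hypothetical helper, not part of hocr-proofreader.
// bbox is [x0, y0, x1, y1], matching the hOCR "bbox x0 y0 x1 y1" property.
// index is the character position at which the whitespace is inserted.
function splitWordBox(text, bbox, index) {
  if (index <= 0 || index >= text.length) {
    throw new RangeError('split index must fall strictly inside the word');
  }
  const [x0, y0, x1, y1] = bbox;
  // Assumption: equal character widths, so the cut is proportional to index.
  const cut = Math.round(x0 + (x1 - x0) * (index / text.length));
  return [
    { text: text.slice(0, index), bbox: [x0, y0, cut, y1] },
    { text: text.slice(index), bbox: [cut, y0, x1, y1] },
  ];
}

// Splitting "helloworld" after "hello" halves the box horizontally.
const parts = splitWordBox('helloworld', [100, 50, 200, 80], 5);
console.log(parts);
```

Both halves keep the original vertical extent. Only the horizontal cut is estimated.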
I got the hocr-proofreader to display my files very nicely, and I'll still experiment with it quite a bit. Great work! The bounding-box problems seem common to all editors, but I agree, having the coordinates is very important. Drawing them manually sounds like a good idea; I don't think I have seen that option in other editors. I'll come up with some solution to save the hOCR file for now, and I'll also look deeper into JavaScript, although I'm not so familiar with it. Anyway, I very much like how lightweight it is and how painlessly it handles basic document navigation. I'll keep you updated. In case you are curious, I'm working in Helsinki on Tesseract models for an alphabet used in the Soviet Union for the Komi-Zyrian language in the 1920s. I'm getting to the point where proofreading starts to be sensible, so I'm looking into all the alternatives.
Thanks. Cool, very interesting :-)
Hi! When I click the Save button, I get the message:
The editor is very nicely designed and I would like to test it further.
Thanks for your useful work!