Saving file doesn't work #2

Open
nikopartanen opened this issue May 30, 2018 · 5 comments

Comments

@nikopartanen

Hi! When I click the Save button, I get this message:

Failed to load resource: the server responded with a status of 404 (Not Found)
/hocr-proofreader/save.php:1 

The editor is very nicely designed and I would like to test it further.

Thanks for your useful work!

@not-implemented
Owner

Hi, thanks for your response. Unfortunately the project is at a very early stage: currently it is more an "hOCR viewer" than an "editor". It implements just the basic ideas of an OCR web proofreader, to see what's possible.

When I have the time, I'll continue developing it. Help is welcome ;-)

To your question: saving documents will be out of scope for this project anyway. This project covers just the frontend part of the editor (to be embedded in other applications). Providing backend storage is your part ;-)
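For readers who need such a backend, here is a minimal sketch of what one could look like in Node.js. Everything about the interface is an assumption made for illustration: the `/save` path, the `file` query parameter, the raw hOCR markup as the request body, and the names `resolveTarget` and `createSaveServer` are all hypothetical. The request the frontend actually sends is not described in this thread, so the handler would have to be adapted to it.

```javascript
const http = require("http");
const fs = require("fs");
const path = require("path");

// Hypothetical helper: map a client-supplied file name to a path inside
// baseDir. Returns null for anything that is not a plain .hocr/.html name,
// which also blocks path traversal such as "../secret".
function resolveTarget(baseDir, name) {
  const base = path.basename(String(name));
  if (base !== name || !/^[\w.-]+\.(hocr|html)$/.test(base)) {
    return null;
  }
  return path.join(baseDir, base);
}

// Hypothetical endpoint: POST /save?file=page1.hocr with the markup as body.
function createSaveServer(baseDir) {
  return http.createServer((req, res) => {
    const url = new URL(req.url, "http://localhost");
    const target = resolveTarget(baseDir, url.searchParams.get("file"));
    if (req.method !== "POST" || url.pathname !== "/save" || target === null) {
      res.writeHead(400).end("bad request");
      return;
    }
    const chunks = [];
    req.on("data", (chunk) => chunks.push(chunk));
    req.on("end", () => {
      // Write the received bytes unchanged; reply 204 on success, 500 on error.
      fs.writeFile(target, Buffer.concat(chunks), (err) => {
        res.writeHead(err ? 500 : 204).end();
      });
    });
  });
}

// Usage (not started automatically):
// createSaveServer("./hocr-files").listen(8080);
```

The sketch has no authentication and no size limit on uploads, so it is only suitable for local experiments.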

@nikopartanen
Author

Hi! Thanks for the reply! I see. I'll keep following the project and mention it to colleagues in Helsinki who work on similar topics. We have quite a lot of books that should be proofread, and I haven't found a solution that works well for proofreading hOCR output from Tesseract. Ideally the output would be saved with page coordinates as well, but I know that gets messy after manual edits. I very much liked how navigating the text is implemented here. In principle, setting up the backend is no problem either. Good luck with your project!

@not-implemented
Owner

Yes, same for me. That was also my reason for starting this project, as I didn't find a good existing solution.

It was also my plan to keep the page coordinates as accurate as possible, i.e. split the bounding boxes when inserting whitespace, allow manually editing/correcting the bounding boxes, etc. One goal is to render Image-With-Text-Beyond PDFs from those hOCRs, so the coordinates are very important.
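As a rough illustration of the bounding-box split, here is a minimal sketch. It assumes the word's box is stored in the usual hOCR `title="bbox x0 y0 x1 y1; ..."` form and that all characters are equally wide, which is only an approximation for proportional fonts. The function names `parseBbox` and `splitBbox` are hypothetical and not part of the project.

```javascript
// Parse "bbox x0 y0 x1 y1" out of an hOCR title attribute.
// Returns [x0, y0, x1, y1] as numbers, or null if no bbox is present.
function parseBbox(title) {
  const m = /bbox (\d+) (\d+) (\d+) (\d+)/.exec(title);
  return m ? m.slice(1).map(Number) : null;
}

// Split a word's box at character index `at` (where the space was inserted),
// assuming equal-width characters. Returns [leftBox, rightBox].
function splitBbox(bbox, text, at) {
  const [x0, y0, x1, y1] = bbox;
  const cut = Math.round(x0 + ((x1 - x0) * at) / text.length);
  return [
    [x0, y0, cut, y1],
    [cut, y0, x1, y1],
  ];
}

// Example: "helloworld" becomes "hello world".
const box = parseBbox("bbox 100 200 300 240; x_wconf 93");
const [left, right] = splitBbox(box, "helloworld", 5);
// left  is [100, 200, 200, 240]
// right is [200, 200, 300, 240]
```

A proportional split like this is just a starting point; manual correction of the boxes would still be needed where glyph widths differ a lot.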

It would be great to find some more developers interested in this. The current implementation is just about 450 lines of pure JavaScript using recent browser features, so it's quite manageable. ;-)

@nikopartanen
Author

I got hocr-proofreader to display my files very nicely, and I'll still experiment with it quite a bit. Great work! The bounding box problems seem common to all editors, but I agree that having the coordinates is very important. Drawing them manually sounds like a good idea; I don't think I've seen that option in other editors.

I'll come up with some solution for saving the hOCR file for now. I'll also look deeper into JavaScript, although I'm not so familiar with it. Anyway, I very much like how lightweight it is and how painlessly it handles basic document navigation. I'll keep you updated.

In case you are curious, I'm working in Helsinki on Tesseract models for an alphabet used in the Soviet Union for the Komi-Zyrian language in the 1920s. I'm getting to the point where proofreading starts to make sense, so I'm looking into all the alternatives.


@not-implemented
Owner

Thanks. Cool, very interesting :-)

Repository owner deleted a comment from nikhilchh Nov 23, 2018