haravijaya proofreading #38

Open
ppasedach opened this issue Dec 16, 2020 · 6 comments

Comments

@ppasedach

I would like to correct a few cantos of the Haravijaya OCR text, if that is welcome. I would start with canto 49, which is the last one missing from the incomplete e-text produced many years ago by Diwakar and Rabi Acharya. If it goes well, I would in due course correct a few more cantos. I then plan to convert them to IAST and integrate them into my digital critical edition of the Haravijaya (a work in progress, of course).

I have started with the first few verses, but I am not sure how to encode the footnote markers, which, I am afraid, usually break in OCR. Should I just leave them out, so that whoever is interested in the variant readings has to consult the scan of the edition, or later the electronic critical edition? Adding the numbers to the running text would make the affected words more or less ungreppable. Or do you have a convention for that?

[Image: Selection_087]

वक्रारविन्दनिमितं करगाढसैद्ध-
पार्श्वद्वयं युधि बभार स पाञ्चजन्यम् ।
वैरिञ्चमण्डमिव निःश्वसितानिलोल-
पर्यस्यमानमुदरान्तरतो विनिर्यत् ॥ २ ॥

Here the ru, together with footnote marker 3, was OCRed as sai (the सै in करगाढसैद्ध).

@vvasuki
Contributor

vvasuki commented Dec 17, 2020

Namaste!

Very happy to know that you are proofreading the text. I've converted the text to markdown and separated chapters into sections for convenience. Please keep sending periodic pull requests.

Of course, leaving out variant readings is an option. Otherwise you can do one of the following:

I personally prefer this modification of the convention followed on the sanskritdocuments website:

स-शङ्ख-चक्रं सकिरीट-कुण्डलं  
सपीत-वस्त्रं सरसी-रुहेक्षणम् ।  
सहारवक्षःस्थल-कौस्तुभश्रियं+++(var  स्थलशोभिकौस्तुभं)+++    
नमामि विष्णुं शिरसा चतुर्-भुजम् ॥ ६॥  

This renders as: [image]
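To address the grep concern raised earlier, here is a minimal sketch of how inline variants in this style could be stripped or listed before searching. It assumes the annotation always has the form `+++(var  reading)+++` as in the example above; the function names are hypothetical and not part of any existing script.

```python
import re

# Assumed annotation syntax, taken from the example verse: +++(var  reading)+++
VARIANT = re.compile(r"\+\+\+\(var\s+(.*?)\)\+\+\+")


def strip_variants(text):
    """Return the text with every inline variant annotation removed."""
    return VARIANT.sub("", text)


def list_variants(text):
    """Return the variant readings found in the text, in order."""
    return VARIANT.findall(text)


if __name__ == "__main__":
    line = "सहारवक्षःस्थल-कौस्तुभश्रियं+++(var  स्थलशोभिकौस्तुभं)+++"
    print(strip_variants(line))  # main reading only, suitable for grep
    print(list_variants(line))   # the recorded variant readings
```

Because the annotation follows the word instead of interrupting it, the main reading stays searchable even without stripping; the helper only matters when the variants themselves should not match.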

@vvasuki
Contributor

vvasuki commented Aug 11, 2021

Looks like TEI was adapted and that the text is available at - https://github.com/ppasedach/ratnakara-tei.git

vvasuki closed this as completed Aug 11, 2021

@ppasedach
Author

> Looks like TEI was adapted and that the text is available at - https://github.com/ppasedach/ratnakara-tei.git

No, this is not what has happened. I have not yet gotten around to working further on your OCRed text. What you see in the ratnakara-tei repository (or, properly displayed, using Charles Li's upama engine) is for the most part an old e-text produced by Diwakar and Rabi Acharya. I converted it from Velthuis encoding to IAST and added TEI markup. But it lacks the commentary and a few cantos. Some other cantos have recently been typed in from various manuscript sources, which is an ongoing process.

Particularly for those cantos missing in the old e-text I should sometime soon create something similar from your raw file, and I'd then like to do that in such a way that corrections which are made can then be reintegrated into your repository, which is one reason that has stopped me from doing it so far. It is much easier to just perform some conversions and corrections on a piece of text, and forget about the original source. If one wants to incorporate the changes to the original, one will need a more thought-out approach.
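For illustration, the kind of Velthuis-to-IAST conversion mentioned above can be sketched as a longest-match table lookup. This is only a sketch under assumptions: the table covers the common diacritics, the function name is hypothetical, and it is not the script actually used for the ratnakara-tei repository.

```python
# Partial Velthuis -> IAST table (common diacritics only; an assumption,
# not an exhaustive scheme definition).
VELTHUIS_TO_IAST = {
    "aa": "ā", "ii": "ī", "uu": "ū",
    ".rr": "ṝ", ".r": "ṛ", ".l": "ḷ",
    ".m": "ṃ", ".h": "ḥ",
    '"n': "ṅ", "~n": "ñ",
    ".t": "ṭ", ".d": "ḍ", ".n": "ṇ",
    '"s': "ś", ".s": "ṣ",
}

# Try longer keys first so that ".rr" wins over ".r".
_KEYS = sorted(VELTHUIS_TO_IAST, key=len, reverse=True)


def velthuis_to_iast(text):
    """Convert Velthuis-encoded Sanskrit to IAST by greedy longest match."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(VELTHUIS_TO_IAST[key])
                i += len(key)
                break
        else:
            # No multi-character sequence starts here: copy the character.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(velthuis_to_iast("ratnaakara"))
    print(velthuis_to_iast("k.r.s.na"))
```

A one-way conversion like this is easy to run and forget; carrying corrections back to the source text would additionally need a stable mapping between converted and original lines, which the sketch does not attempt.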

@vvasuki
Contributor

vvasuki commented Aug 11, 2021

> Particularly for those cantos missing in the old e-text I should sometime soon create something similar from your raw file, and I'd then like to do that in such a way that corrections which are made can then be reintegrated into your repository, which is one reason that has stopped me from doing it so far.

Ah I see - so I presume that you will add the missing canto-s to your TEI repo, and we can then use our regular TEI-to-markdown scripts to update our text. Please update this thread to notify me once this can be done. Curious to know your name, BTW.
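As a rough idea of what a TEI-to-markdown step might look like, the sketch below reads verse groups (`<lg>`) and lines (`<l>`) from a TEI file and emits markdown with hard line breaks. The sample document, its `<div n="…">` structure, and the function name are assumptions made for illustration; the actual scripts and the actual encoding in the ratnakara-tei repository may differ, and transliteration to Devanāgarī is left out.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical minimal TEI input; real files will carry a header,
# apparatus entries, and more attributes.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="canto" n="49">
      <lg n="1"><l>first line</l><l>second line</l></lg>
      <lg n="2"><l>third line</l><l>fourth line</l></lg>
    </div>
  </body></text>
</TEI>"""


def tei_to_markdown(xml_text):
    """Turn each <div> into a '## n' section and each <lg> into a verse."""
    root = ET.fromstring(xml_text)
    out = []
    # Note: './/tei:div' also matches nested divs; the sample has none.
    for div in root.iterfind(".//tei:div", TEI_NS):
        out.append(("## " + div.get("n", "")).rstrip())
        out.append("")
        for lg in div.iterfind("tei:lg", TEI_NS):
            lines = [
                "".join(l.itertext()).strip()
                for l in lg.iterfind("tei:l", TEI_NS)
            ]
            # Two trailing spaces force a markdown line break within a verse.
            out.append("  \n".join(lines))
            out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    print(tei_to_markdown(SAMPLE))
```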

@ppasedach
Author

You can call me Peter. https://www.aai.uni-hamburg.de/indtib/personen/pasedach.html. Yes, that would probably be an easier approach, at least on my end. But my TEI will be encoded as IAST, if that's not a problem for you? Upama can switch to Devanāgarī display, though I'm afraid not for export. Do you actually train your OCR with corrections?

@vvasuki
Contributor

vvasuki commented Aug 11, 2021

> You can call me Peter. https://www.aai.uni-hamburg.de/indtib/personen/pasedach.html

Pleased to e-meet you!

> Yes, that would probably be an easier approach, at least on my end. But my TEI will be encoded as IAST, if that's not a problem for you?

No problem - my script will transliterate.

> Do you actually train your OCR with corrections?

No - just whatever I get with Google Vision or Google Drive.

vvasuki reopened this Jun 14, 2022

vvasuki changed the title from "How to encode footnote markers?" to "haravijaya proofreading" Jun 14, 2022