haravijaya proofreading #38

ppasedach · 2020-12-16T21:48:13Z

I would like to correct a few cantos of the Haravijaya OCR text, if that is welcome. I would start with canto 49, which is the last one missing in the incomplete e-text produced many years ago by Diwakar and Rabi Acharya. If it goes well, I would in due course correct a few more cantos. I then plan to convert them to IAST and integrate them into my digital critical edition of the Haravijaya (a work in progress, of course).

I have started with the first few verses, but am not sure how I should encode the footnote markers, which, I am afraid, usually break in OCR. Should I just leave them out, so that whoever is interested in the variant readings has to consult the scan of the edition, or later the digital critical edition? Adding the numbers in the running text would make the affected words more or less ungreppable. Or do you have a convention for that?

वक्रारविन्दनिमितं करगाढसैद्ध-
पार्श्वद्वयं युधि बभार स पाञ्चजन्यम् ।
वैरिञ्चमण्डमिव निःश्वसितानिलोल-
पर्यस्यमानमुदरान्तरतो विनिर्यत् ॥ २ ॥

Here the ru together with footnote marker 3 was OCRed as sai.

vvasuki · 2020-12-17T02:03:51Z

Namaste!

Very happy to know that you are proofreading the text. I've converted the text to markdown and separated chapters into sections for convenience. Please keep sending periodic pull requests.

Of course, leaving out variant readings is an option. Otherwise you can do one of the following:

- Use markdown conventions for footnotes - https://www.markdownguide.org/extended-syntax/#footnotes . To avoid breaking grep, you can move the footnote marker to the next space.
- Otherwise, even a simple parenthesized para below the shloka suffices.
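
As a rough illustration of moving a marker to the next space: the sketch below assumes the OCR leaves footnote markers as ASCII digits inside a word (an assumption; real OCR output may garble them differently) and rewrites them as markdown footnote references at the end of that word. The function name `move_markers` is made up for this example.

```python
import re

# Assumption: markers are runs of ASCII digits. Devanagari verse numbers
# such as २ are deliberately not matched by [0-9].
ASCII_DIGITS = re.compile(r"[0-9]+")


def move_markers(line):
    """Move digit markers embedded in a word to the end of that word."""

    def fix(match):
        token = match.group(0)
        if "[^" in token:  # already converted, leave untouched
            return token
        markers = ASCII_DIGITS.findall(token)
        if not markers:
            return token
        word = ASCII_DIGITS.sub("", token)
        # Append one markdown footnote reference per marker found.
        return word + "".join(f"[^{m}]" for m in markers)

    # Process each whitespace-delimited token separately.
    return re.sub(r"\S+", fix, line)
```

Because the reference ends up after the word rather than inside it, a plain-text search for the word matches again.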

I personally prefer this modification of the convention followed on the sanskritdocuments website:

स-शङ्ख-चक्रं सकिरीट-कुण्डलं  
सपीत-वस्त्रं सरसी-रुहेक्षणम् ।  
सहारवक्षःस्थल-कौस्तुभश्रियं+++(var  स्थलशोभिकौस्तुभं)+++    
नमामि विष्णुं शिरसा चतुर्-भुजम् ॥ ६॥

This renders as
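
For searching, the inline annotations in the example above can be separated from the running text again with a small filter. This is a minimal sketch that assumes annotations always have the form `+++(...)+++` and are never nested; the function name `split_annotations` is hypothetical.

```python
import re

# Matches the inline annotation syntax shown above: +++( ... )+++
ANNOTATION = re.compile(r"\+\+\+\((.*?)\)\+\+\+")


def split_annotations(line):
    """Return (text without annotations, list of annotation contents)."""
    # Collapse runs of whitespace inside each annotation.
    notes = [" ".join(note.split()) for note in ANNOTATION.findall(line)]
    # Trailing spaces (markdown hard line breaks) are left in place.
    return ANNOTATION.sub("", line), notes
```

Running the stripped text through grep then finds the main reading, while the returned list keeps the variants available.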

vvasuki · 2021-08-11T06:03:55Z

Looks like TEI was adapted and that the text is available at - https://github.com/ppasedach/ratnakara-tei.git

ppasedach · 2021-08-11T06:44:47Z

> Looks like TEI was adapted and that the text is available at - https://github.com/ppasedach/ratnakara-tei.git

No, this is not what has happened. I have not yet gotten around to working further on your OCRed text. What you see in the ratnakara-tei repository (or, properly displayed, in Charles Li's Upama engine) is for the most part an old e-text produced by Diwakar and Rabi Acharya. I converted it from Velthuis encoding to IAST and added TEI markup. But it lacks the commentary and a few cantos. Some other cantos have recently been typed in from various manuscript sources, which is an ongoing process.

Particularly for those cantos missing in the old e-text I should sometime soon create something similar from your raw file, and I'd then like to do that in such a way that corrections which are made can then be reintegrated into your repository, which is one reason that has stopped me from doing it so far. It is much easier to just perform some conversions and corrections on a piece of text, and forget about the original source. If one wants to incorporate the changes to the original, one will need a more thought-out approach.
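
For readers unfamiliar with the Velthuis-to-IAST step mentioned above, here is a minimal sketch of such a conversion. It is not the script actually used for the ratnakara-tei repository. The table covers only the common Velthuis digraphs, and it assumes the input contains no ordinary punctuation that collides with them (for example a full stop directly before `s`) and no vowel hiatus such as a followed by a.

```python
import re

# Common Velthuis sequences and their IAST equivalents (not exhaustive).
VELTHUIS_TO_IAST = {
    "aa": "ā", "ii": "ī", "uu": "ū",
    ".rr": "ṝ", ".r": "ṛ", ".l": "ḷ",
    ".m": "ṃ", ".h": "ḥ",
    '"n': "ṅ", "~n": "ñ",
    ".t": "ṭ", ".d": "ḍ", ".n": "ṇ",
    '"s': "ś", ".s": "ṣ",
}

# Longest sequences first, so that ".rr" is tried before ".r".
_PATTERN = re.compile(
    "|".join(
        re.escape(key)
        for key in sorted(VELTHUIS_TO_IAST, key=len, reverse=True)
    )
)


def velthuis_to_iast(text):
    """Replace every known Velthuis sequence with its IAST character."""
    return _PATTERN.sub(lambda m: VELTHUIS_TO_IAST[m.group(0)], text)
```

A one-way conversion like this is exactly the easy case described above: it says nothing about how corrections made on the converted text would flow back to the source.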

vvasuki · 2021-08-11T08:49:37Z

> Particularly for those cantos missing in the old e-text I should sometime soon create something similar from your raw file, and I'd then like to do that in such a way that corrections which are made can then be reintegrated into your repository, which is one reason that has stopped me from doing it so far.

Ah I see - so I presume that you will add the missing canto-s to your TEI repo, and we can then use our regular TEI-to-markdown scripts to update our text. Please update this thread to notify me once this can be done. Curious to know your name, BTW.
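
To make the TEI-to-markdown idea concrete, here is a minimal sketch. It is not the actual script referred to above. It assumes verses are encoded as TEI `<lg>` groups of `<l>` lines, with variants in `<app>`/`<lem>`/`<rdg>`, and it renders variants in the `+++(var ...)+++` style. The sample document is made-up placeholder text, and nested `<lg>` elements are not handled.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Made-up placeholder TEI, not taken from any real edition.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<lg n="1"><l>first <app><lem>pada</lem><rdg>paada</rdg></app> here</l>
<l>second line</l></lg>
</body></text></TEI>"""


def render_line(line_el):
    """Flatten one <l>, turning <app> readings into inline annotations."""
    parts = [line_el.text or ""]
    for child in line_el:
        if child.tag == TEI + "app":
            lem = child.find(TEI + "lem")
            if lem is not None:
                parts.append("".join(lem.itertext()))
            for rdg in child.findall(TEI + "rdg"):
                parts.append("+++(var " + "".join(rdg.itertext()) + ")+++")
        else:
            # Any other inline element: keep its text content only.
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts).strip()


def tei_to_markdown(xml_text):
    """Render each <lg> as a block of lines joined by hard line breaks."""
    root = ET.fromstring(xml_text)
    verses = []
    for lg in root.iter(TEI + "lg"):
        lines = [render_line(l) for l in lg.findall(TEI + "l")]
        verses.append("  \n".join(lines))
    return "\n\n".join(verses)
```

Calling `tei_to_markdown(SAMPLE)` yields one markdown block per verse; transliteration from IAST to Devanāgarī would be a separate step.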

ppasedach · 2021-08-11T10:52:56Z

You can call me Peter. https://www.aai.uni-hamburg.de/indtib/personen/pasedach.html . Yes, that would probably be an easier approach, at least on my end. But my TEI will be encoded as IAST, if that's not a problem for you? In Upama you can switch to Devanāgarī display though, but I'm afraid not for export. Do you actually train your OCR with corrections?

vvasuki · 2021-08-11T16:10:23Z

> You can call me Peter. https://www.aai.uni-hamburg.de/indtib/personen/pasedach.html .

Pleased to e-meet you!

> Yes, that would probably be an easier approach, at least on my end. But my TEI will be encoded as IAST, if that's not a problem for you?

No problem - my script will transliterate.

> Do you actually train your OCR with corrections?

No - just whatever I get with Google Vision or Google Drive.

vvasuki closed this as completed Aug 11, 2021

vvasuki reopened this Jun 14, 2022

vvasuki changed the title ~~How to encode footnote markers?~~ haravijaya proofreading Jun 14, 2022
