OCR Request: Vṛddha Sūryāruṇa Karma Vipāka | वृद्ध सूर्यारुण कर्म विपाक (1937) #49

Open
anooj27 opened this issue Mar 1, 2016 · 5 comments


anooj27 commented Mar 1, 2016

Welcome! Answer all these questions.

Do you want a text OCR-ed? If yes continue below. If not, clear all this text and type your issue.

  • Has the text already been OCR-ed? Have you searched online (using both devanAgarI and latin transliterations)? For example in the large repositories of digitized texts listed here.
    • Answer: No, it hasn't been OCR-ed. It doesn't exist in any of these repositories.
  • If you commit to proofreading the result within a reasonable timeframe (say, 1 month for 300 pages), we will be especially keen to oblige you. Are you willing to make such a commitment? See here to get an idea of what it involves.
    • Answer: I can certainly give it a try, but I can't promise anything since I have serious health issues.
  • Are you OK with the scan quality that we currently offer? Or are you able to provide your own OCR text?
    • Answer: Yes, I'm OK with it. No, I don't have any means to OCR devanāgarī texts. If you have working software for this, please share it with me. I have tried a few open-source programs in the past without much success.
  • What other factors should our OCR/ other volunteers consider when deciding whether to take up this request? In other words, what's so important about this text?
    • Answer: It is a very important text in Āyurveda and Jyotiṣa that lists remedies, in the form of mantras, for many problems (ailments) in life.
  • Provide a link to the pdf of the printed book which needs to be OCR-ed. You can host the pdf online on several sites such as http://archive.org (← strongly recommended), http://dropbox.com or http://sites.google.com.
  • Now for some details: please enter the following metadata (use both devanAgarI and latin alphabet forms if you can).
    • Title: Vṛddha Sūryāruṇa Karma Vipāka | वृद्ध सूर्यारुण कर्म विपाक
    • Author: unknown
    • Commentator: Gaṅgāviṣṇu | गङ्गाविष्णु
  • Is there any other information you want to provide?
    • Answer: This text comes from the lineage of an ancient sage called Sūrya and is most likely linked to the Yāmala Bhāskara stream, i.e., those who worshipped the Tantrik form of the Sun, the sampradāya to which Maya (author of the Sūrya Siddhānta) belonged. It is a very important text. It would be appreciated if we could later find someone to translate it for the sake of humanity.

Ok - thanks for answering the above questions 🙏. Subscribe to this thread to stay updated.

vvasuki (Contributor) commented Mar 1, 2016

Also uploaded to archive.org. The scan quality seems poor, so I expect the OCR quality to be worse.

vvasuki (Contributor) commented Mar 3, 2016

As predicted, the gocr tool produced poor output (see https://github.com/sanskrit-coders/sanskrit-ocr-r0/blob/master/kalpa/sUryAruNa-karma-vikpAka/sUryAruNa-gocr.txt). Perhaps other software can do better?

anooj27 (Author) commented Mar 3, 2016

Sakhe (friend),

> As predicted, the gocr tool produced poor output - https://github.com/sanskrit-coders/sanskrit-ocr-r0/blob/master/kalpa/sUryAruNa-karma-vikpAka/sUryAruNa-gocr.txt

However, although the first few pages look bad prima facie, if you scroll further down, the sample OCR output looks decent and would need very little human intervention (correction).

> Perhaps another software can do better?

Maybe?
Unfortunately, I don't know of or have such software. I am myself looking for reliable software to OCR devanāgarī lipi (script). Do you know of other alternatives for OCR-ing devanāgarī texts?

vvasuki (Contributor) commented Mar 6, 2016

The following conversation shows that sanskritocr was quite unsuccessful as well:

2016-03-03 5:44 GMT+05:30 विश्वासो वासुकिजः (Vishvas Vasuki):
Svasti, friend Śrīvatsa,

Will you try applying your software to https://github.com/sanskrit-coders/sanskrit-ocr-r0/issues/6 ? I am curious.

卍
Vishvas / विश्वासः

Hariḥ Om,
Vishvas,

These images have many defects.

  1. The scan DPI is quite low; we need at least 300 DPI for good OCR. Almost all anusvAras have been eaten away. I didn't check the state of e, ai, au, ardharepha with au, ardharepha with ai, ardharepha with anusvAra, etc., but I assume they are quite bad.

  2. Even when the DPI is low, we often get acceptable output if the scanned images aren't warped. But many of these images are warped.

The text of a few pages is attached to this message. If you wish, I will OCR the remaining pages too.

I did about 200-odd pages but stopped the OCR when I found that word detection was failing because of the warped pages.

It may be a better idea for us to find a copy of the book somewhere and scan it again. Otherwise, we will have to get the book typed in. I know a good, efficient typist. You can put all these proposals before the gentleman who requested the conversion of the images to text.

This is another example of how bad the work in the DLI (Digital Library of India) has been. It makes me extremely sad.

Svasti,
May Bhavānī Bhāratī be ever victorious,
Śrīvatsa
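The 300 DPI threshold mentioned above can be checked before running OCR by dividing a page image's pixel dimensions by the printed page's physical size. The sketch below is a hypothetical helper, not part of gocr, sanskritocr, or any other tool discussed in this thread; the function names and the 5.5 × 8.25 inch page size in the example are assumptions, not measurements of this book.

```python
# Hypothetical helper (not from any tool in this thread): estimate the
# effective scan resolution from an image's pixel size and the physical
# size of the printed page it captures.
def effective_dpi(width_px: int, height_px: int,
                  page_width_in: float, page_height_in: float) -> float:
    """Return the lower of the horizontal and vertical resolutions in DPI."""
    if page_width_in <= 0 or page_height_in <= 0:
        raise ValueError("page dimensions must be positive")
    return min(width_px / page_width_in, height_px / page_height_in)


MIN_OCR_DPI = 300  # threshold suggested in the conversation above


def good_enough_for_ocr(width_px: int, height_px: int,
                        page_width_in: float, page_height_in: float) -> bool:
    """True if the scan meets the assumed minimum resolution for OCR."""
    return effective_dpi(width_px, height_px,
                         page_width_in, page_height_in) >= MIN_OCR_DPI


# Example with assumed numbers: a 1100 x 1650 px image of a 5.5 x 8.25 inch page.
print(effective_dpi(1100, 1650, 5.5, 8.25))  # 200.0
```

This only measures resolution; it says nothing about the page warping described in point 2, which would need a separate dewarping step.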

vvasuki transferred this issue from sanskrit/sanskrit-ocr-r0 on Oct 9, 2021.
vvasuki (Contributor) commented Oct 9, 2021

In the end, was a typed-in text obtained?

vvasuki added the OCR label on Oct 9, 2021.