Question about the en_box_ocr? #3

Open
sky-fly97 opened this issue Jul 11, 2024 · 2 comments
Comments

@sky-fly97

Hello, I was looking at the coordinates of the en_box_ocr task and noticed that they don't seem to match very well. What is the format of the coordinates, or am I interpreting them incorrectly?
"image": "en_45.png", "conversations": [ { "from": "human", "value": "<image>\nGive the OCR results of the box [113, 660, 886, 914]." }, { "from": "gpt", "value": "10 This operation is sometimes called śabdātmikā bhāvanā (literally, verbal effectuation). See Kumārila’s Tantravārttika \non Mīmāṃsāsūtra 2.1.1 (TV II.344.14-16): “Of those, that operation, which belongs to linguistic expressions like \nexhortative verbal endings and which motivates a person towards object-directed effectuation (arthātmikā bhāvanā), is \nthe second kind of effectuation (bhāvanā), which is a property of linguistic expressions, which has the nature of \ndesignation (abhidhā), and which is said to be the vidhi.” (tatrārthātmikāyāṃ bhāvānāyāṃ liṅādiśabdānāṃ yaḥ puruṣaṃ prati \nprayojakavyāpāraḥ, sā dvitīyā śabdadharmo ’bhidhātmikā bhāvanā vidhir ucyate|) \n11 Throughout this essay, I will take the meaning (artha) of any linguistic expression (roughly, speaking) to be the object \nthat is conveyed to a linguistically competent hearer by that expression. On the Prābhākara view that Maṇḍana \npresents, the meaning of exhortative verbal endings is niyoga or injunction, also referred to as apūrva (VVS §7, or \nVVMG 36-78). When an agent hears an exhortative verbal ending in the context of an exhortation addressed to her, \nshe undergoes an awareness-event of the form: “I am enjoined’’ (niyukto ’smi). Such self-ascriptions are supposed to \ntrack an entity called injunction—according to one interpretation, something to be done or brought about—which is \nnot accessible by any means of knowing other than language, and, unlike other entities that are part of the natural \nfabric of reality, does not exist in the past, the present, or the future. 
For discussion of this view in its sources, see \nPrabhākara Miśra’s sub-commentary Bṛhatī on Mīmāṃsāsūtra 2.1.5 along with Śālikanātha Miśra’s Ṛjuvimalā (Bṛ 319-\n324) and the second chapter of Śālikanātha’s Vākyārthamātṛkā in Prakaraṇapañcikā (PP 417-450). For discussion of \nPrabhākara’s view, see Clooney (1990, pp. 245ff) and Yoshimizu (1997, pp. 96ff)." } ]

@ucaslcl
Owner

ucaslcl commented Jul 11, 2024

The coordinates give the top-left corner and the bottom-right corner of the box, and their values are normalized by the image width and height. To check a box, you can resize the image to (1000, 1000) and draw the box again.
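For readers who would rather keep the image at its original size, the normalization described above can be inverted. The following is a minimal sketch, not code from the repository: it assumes the boxes are stored as [x1, y1, x2, y2] on a 0 to 1000 grid (inferred from the suggestion to resize to (1000, 1000)), and the function name `denormalize_box`, the `scale` parameter, and the example image size are all hypothetical.

```python
def denormalize_box(box, width, height, scale=1000):
    """Map [x1, y1, x2, y2] from a 0..scale grid to pixel coordinates.

    Assumption: x values were normalized by the image width and y values
    by the image height, both onto the same 0..scale range.
    """
    x1, y1, x2, y2 = box
    return (
        round(x1 * width / scale),
        round(y1 * height / scale),
        round(x2 * width / scale),
        round(y2 * height / scale),
    )


# Box taken from the annotation quoted in this issue; the image size
# (2000 x 500 pixels) is made up purely for illustration.
pixel_box = denormalize_box([113, 660, 886, 914], width=2000, height=500)
print(pixel_box)
```

The resulting tuple can then be passed to whatever drawing routine is in use, for example a rectangle-drawing call in an image library.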

@sky-fly97
Author

sky-fly97 commented Jul 11, 2024

> The coordinates are the top-left corner and the bottom-right corner, and their values are normalized by the image width/height. You can resize the image into (1000, 1000) and draw the box again.

Well, I don't see this information in the README, and using the example shown there as-is may lead to a usage error:

prompt = ann["conversations"][0]["value"]
image_file = ann["image"]
image_file_path = os.path.join(args.image_path, image_file)
image = load_image(image_file_path)
outputs = model.generate(image, prompt)
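Because the box appears only inside the prompt string, anyone who wants to inspect or visualize it first has to pull the four numbers out of the text. Here is a minimal sketch of one way to do that with a regular expression; the helper name `extract_box` and the pattern are hypothetical and assume the prompt always contains a single bracketed list of four integers, as in the annotation above.

```python
import re

# Matches a bracketed list of four non-negative integers, e.g. [113, 660, 886, 914].
BOX_PATTERN = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def extract_box(prompt):
    """Return the first [x1, y1, x2, y2] box found in the prompt, or None."""
    match = BOX_PATTERN.search(prompt)
    if match is None:
        return None
    return [int(group) for group in match.groups()]


# Prompt copied from the annotation quoted in this issue.
prompt = "<image>\nGive the OCR results of the box [113, 660, 886, 914]."
box = extract_box(prompt)
print(box)
```

The extracted values are still normalized coordinates, so they need to be rescaled before being compared with pixel positions in the original image.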

By the way, are there Table 4 test results for other models that you could share?
