Can't read Data from img #91

itlaosiji · 2017-11-06T08:45:20Z

I can't read the verification code from this website:
http://www.miitbeian.gov.cn/getVerifyCode?73
I tried the image you provided and it works fine. I need your help, please.

You can save this website's verification code as XXX.JPG.

thiagoalessio · 2017-11-06T21:20:15Z

hehehe trying to break captchas my friend ;D
Using this repo's issues for that is a bit out of scope, but here it goes:

Given the original picture:

Cleaning

You'll need to use ImageMagick (or something similar) to clean up the picture noise before sending it to tesseract.
That can be achieved by playing around with some filters (a sequence of modulate, contrast-stretch and gaussian-blur) in order to minimize (or even get rid of) the thin strokes that compromise text recognition, for example:

$ convert -colorspace gray -modulate 120 -contrast-stretch 10%x80% -modulate 140 -gaussian-blur 1 -contrast-stretch 5%x50% +repage -negate -gaussian-blur 4 -negate -modulate 130 original.jpeg clean.jpeg

would give you the following image:

Recognizing

Now pass the clean image to tesseract:

echo (new TesseractOCR('clean.jpeg'))->run();
// outputs 655V,3A

There is an undesired comma (,) in the output, because the cleaning wasn't 100% perfect.
But since you know that this particular captcha is only formed of numbers and uppercase letters, you can give this hint to tesseract, making the recognition more effective:

echo (new TesseractOCR('clean.jpeg'))->whitelist(range(0, 9), range('A', 'Z'))->run();
// outputs 655V3A

And there you have it ... But I have to tell you, it will not work every time. So make sure you collect a large number of captchas from this source, build the best sequence of cleaning filters you can, and prepare your code to keep trying new captchas until it succeeds.
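The "keep trying until it succeeds" part could look roughly like the sketch below. Everything in it is an assumption for illustration: the helper names (`normalize_code`, `is_plausible_code`, `solve_with_retries`) are hypothetical, and the expected length of 6 characters is guessed from the single sample above (655V3A). The command that fetches, cleans and recognizes one captcha is left to the caller.

```shell
#!/bin/sh
# Hypothetical helpers, not part of tesseract or ImageMagick.

# Drop every character that is not a digit or an uppercase letter,
# e.g. the stray comma left over from imperfect cleaning.
normalize_code() {
    printf '%s' "$1" | tr -cd '0-9A-Z'
}

# Accept only results that look like a full captcha.
# ASSUMPTION: captchas are exactly 6 characters long (based on one sample).
is_plausible_code() {
    printf '%s\n' "$1" | grep -Eq '^[0-9A-Z]{6}$'
}

# Usage: solve_with_retries MAX_ATTEMPTS COMMAND [ARGS...]
# COMMAND is supplied by the caller and must print one raw OCR result
# per call (download a fresh captcha, clean it, run the recognition).
solve_with_retries() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # A failing command counts as an empty (rejected) result.
        raw=$("$@") || raw=''
        code=$(normalize_code "$raw")
        if is_plausible_code "$code"; then
            printf '%s\n' "$code"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, `solve_with_retries 5 my_fetch_and_ocr` would call a (hypothetical) `my_fetch_and_ocr` function up to five times and print the first result that passes the check. Rejecting implausible results early is cheaper than submitting a wrong code to the site.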

itlaosiji · 2017-11-07T02:02:03Z

Thank you very much.

itlaosiji changed the title ~~Can't read Data from png~~ Can't read Data from img Nov 6, 2017

thiagoalessio closed this as completed Nov 6, 2017

thiagoalessio mentioned this issue Nov 22, 2017

Get text from image #93

Closed

thiagoalessio added the invalid Not a real issue on the library label Feb 18, 2020

Repository owner locked as off-topic and limited conversation to collaborators Feb 18, 2020
