Barcode scanning fails with "Unknown encoding" for ISO-8859-1 encoded data matrix #218

Open
dspoeri opened this issue Jan 10, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

@dspoeri

dspoeri commented Jan 10, 2021

The official German medication plan data matrix ("BMP", Bundeseinheitlicher Medikationsplan) expects data to be encoded with ISO-8859-1. If the data contains a German umlaut, Google Vision barcode scanning fails with an "Unknown encoding" error.

Scanning the following data matrix reproduces the bug:
[image: barcode]

This bug sadly renders Google Vision barcode scanning useless for the mentioned use case.

Two suggested solutions:

  • accept other encodings than ASCII and UTF-8
  • provide access to the raw data through a byte array
@GarryKelly

Just saw this, and it is similar to an issue reported last year. I commented on that here: #44 (comment). Unfortunately, it meant the library just didn't work for our use cases. It's a shame, as it's an excellent library otherwise.

I agree with the suggested solutions. It would be wonderful for the library either to support the ISO-8859-1 character set as an option, or to provide access to the scanned data as a byte array without going through any character set conversions. Both options would allow reading of all barcodes.

I noticed that a new version, com.google.firebase:firebase-ml-vision-barcode-model:16.1.2, was released later in 2020, but I haven't had time to see whether it provides that access.

@ivan200

ivan200 commented Jan 13, 2021

At least com.google.mlkit:barcode-scanning:16.1.0
contains barcode.rawBytes

Returns raw bytes as it was encoded in the barcode.
Returns null if the raw bytes can not be determined.

so I think you can make
return String(barcode.rawBytes, StandardCharsets.ISO_8859_1)
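
Since the documentation quoted above says rawBytes can be null, a null-safe variant might look like the sketch below. `decodeLatin1` is a hypothetical helper name, and its parameter stands in for the value of `barcode.rawBytes`; whether those bytes really hold the original payload depends on the library.

```kotlin
import java.nio.charset.StandardCharsets

// Hypothetical helper: decodes the raw barcode bytes as ISO-8859-1.
// Returns null when no raw bytes are available.
fun decodeLatin1(rawBytes: ByteArray?): String? =
    rawBytes?.let { String(it, StandardCharsets.ISO_8859_1) }
```

Usage would be something like `decodeLatin1(barcode.rawBytes)`. ISO-8859-1 maps every byte value to a character, so the decoding step itself cannot fail.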

@dspoeri
Author

dspoeri commented Jan 13, 2021

At least com.google.mlkit:barcode-scanning:16.1.0
contains barcode.rawBytes

Returns raw bytes as it was encoded in the barcode.
Returns null if the raw bytes can not be determined.

so I think you can make
return String(barcode.rawBytes, StandardCharsets.ISO_8859_1)

It doesn't help:
rawBytes returns an array with 16 bytes representing the string Unknown encoding.
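
As a stopgap, an app could at least detect this case instead of showing the error text as if it were barcode content. A minimal Kotlin sketch, assuming the placeholder is exactly the 16 ASCII bytes described above; the names `UNKNOWN_ENCODING` and `isUnknownEncodingPlaceholder` are made up for illustration:

```kotlin
import java.nio.charset.StandardCharsets

// The 16 ASCII bytes of the error text reported in this thread.
val UNKNOWN_ENCODING: ByteArray =
    "Unknown encoding".toByteArray(StandardCharsets.US_ASCII)

// Hypothetical guard: true if the raw bytes are the error text
// rather than the barcode payload.
fun isUnknownEncodingPlaceholder(rawBytes: ByteArray?): Boolean =
    rawBytes != null && rawBytes.contentEquals(UNKNOWN_ENCODING)
```

This only flags the failure; it cannot recover the data. A barcode whose real content is the text "Unknown encoding" would also be flagged.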

@cs-googler

Hi, we are working on a fix internally.

cs-googler added the "enhancement" (New feature or request) label on Apr 5, 2021
@pke

pke commented Mar 14, 2023

So, @cs-googler, how is the internal fix going? How about letting the user specify the encoding via BarcodeScannerOptions?

@mattemyoo

mattemyoo commented Dec 9, 2024

@cs-googler

Hello. We are also trying to scan barcodes that include letters with umlauts.

When we are scanning a barcode that contains "ä", it becomes a "d". Even in the rawBytes array, we are getting the value 100, which corresponds to ASCII character "d".

@pke Did you get any new information about this?

@mattemyoo

So, I did some research, and it seems like the type that you are using covers only the first 128 values (7 bits), after which the value wraps around to 0 again.

For example, the character "µ" has the ISO-8859-1 code 181 (outside the 7-bit ASCII range), but in the rawBytes, I am getting the value 53 (ASCII char "5").

Another example is the inverted exclamation mark ("¡", ISO-8859-1 code 161). The actual raw byte I am receiving is 33 (161 - 128 = 33), which is the ASCII char "!".
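
All the values reported in this thread are consistent with the high bit of each byte being dropped. A small Kotlin sketch of that arithmetic follows; `stripHighBit` and `observations` are illustrative names, and the real cause inside the library is not confirmed here.

```kotlin
// Keep only the low 7 bits of a byte value, i.e. drop the high bit.
fun stripHighBit(codePoint: Int): Int = codePoint and 0x7F

// ISO-8859-1 code points mentioned in this thread, paired with the
// raw byte value that was actually observed.
val observations: List<Pair<Int, Int>> = listOf(
    0xE4 to 100, // a-umlaut (228) was read as "d"
    0xB5 to 53,  // micro sign (181) was read as "5"
    0xA1 to 33   // inverted exclamation mark (161) was read as "!"
)
```

Each observed value equals `stripHighBit` of the expected code point. If that is what happens, the original character cannot be reconstructed from rawBytes afterwards, because an "ä" and a real "d" both arrive as 100.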
