"invalid multibyte sequence" reading generated PO file #131

Open
delmicio opened this issue Sep 20, 2016 · 4 comments

delmicio commented Sep 20, 2016

When I try to open the generated PO file, it is corrupted and I can't open it with, for example, Poedit:
"invalid multibyte sequence"

I think it is a problem with the encoding.

This is the code I use to generate the file:

use Gettext\Extractors;
use Gettext\Translations;

if(!isset(Extractors\PhpCode::$functions['_'])) {
    Extractors\PhpCode::$functions['_'] = '__';
}

$translations = new Translations();
$translations->setLanguage('en');
$translations->setHeader('Report-Msgid-Bugs-To', '[email protected]');

$files = array_merge(
    glob(__DIR__.'/*.php'),
    glob(__DIR__.'/*.PHP'),
    glob(__DIR__.'/*.html'),
    glob(__DIR__.'/*.HTML'),
    glob(__DIR__.'/*.htm'),
    glob(__DIR__.'/*.HTM')
);
foreach ($files as $key => $file) {
    // glob() returns full paths, so look for a /vendor/ segment anywhere in the path
    if (strpos($file, '/vendor/') !== false) continue;
    $translations->addFromPhpCodeFile($file);
}

//And then, export all translations in a single .po file
$translations->toPoFile('file.po');

I must say that not all of my project files are UTF-8 encoded. Maybe this is the problem?

Using v3.

oscarotero commented

I have never had this problem with SublimeText, so maybe it is caused by the mixed file encodings in your project (I don't know). SublimeText lets you reopen a file with a specific encoding, so reopening it as UTF-8 shouldn't report this error.

delmicio commented

@oscarotero I don't really care much about SublimeText; that was just a comment.
I've solved it by not allowing any encoding other than UTF-8:

$filecontent = file_get_contents($file);
// UTF-8 is listed first: any byte sequence is valid ISO-8859-1, so it works best as the fallback
$encoding = mb_detect_encoding($filecontent, 'UTF-8, ISO-8859-1', true);
if($encoding && $encoding != 'UTF-8') {
    $filecontent = mb_convert_encoding($filecontent, 'UTF-8', $encoding);
    $encoding = 'UTF-8';
}
if($encoding == 'UTF-8') {
    $translations->addFromPhpCodeString($filecontent);
}

But this makes it slower.
Maybe there is already an option for this, like xgettext's --from-code=UTF-8?
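
A possibly cheaper variant of the workaround above is to validate first and convert only the files that are not already valid UTF-8. This is only a sketch: it assumes every non-UTF-8 file is ISO-8859-1 (as in the snippet above), and `toUtf8` is a hypothetical helper name, not part of the Gettext library.

```php
<?php
// Hypothetical helper (not part of Gettext): return $content as UTF-8.
// Assumption: anything that is not valid UTF-8 is encoded as $fallback.
function toUtf8(string $content, string $fallback = 'ISO-8859-1'): string
{
    // Valid UTF-8 (which includes plain ASCII) passes through untouched.
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }

    // Otherwise reinterpret the bytes as the fallback encoding.
    return mb_convert_encoding($content, 'UTF-8', $fallback);
}
```

It could then be used as `$translations->addFromPhpCodeString(toUtf8(file_get_contents($file)));`, so files that are already UTF-8 skip the conversion step.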

lxg commented Jun 29, 2017

@delmicio This is a problem with your source files and the way you create the catalog.

  • By default, GNU Gettext and Poedit expect your input strings to be ASCII only.
  • When you create a .po file where the input strings have non-ASCII characters, you must set an appropriate header.

So you must either fix your source files or set the correct header before creating the .po file:

$translations->setHeader("Content-Type", "text/plain; charset=UTF-8");
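
To tell the two cases apart (a missing charset header versus bytes that are not valid UTF-8), a quick check of the generated file may help. The following is a rough diagnostic sketch in plain PHP; `checkPoEncoding` is a hypothetical name, and the header test is a simple pattern match, not a real PO parser.

```php
<?php
// Hypothetical diagnostic helper: list encoding problems found in the text of a PO file.
function checkPoEncoding(string $po): array
{
    $problems = [];

    // Bytes copied from a non-UTF-8 source file show up here.
    if (!mb_check_encoding($po, 'UTF-8')) {
        $problems[] = 'file contains bytes that are not valid UTF-8';
    }

    // The header lives in a quoted string of the first entry, for example:
    // "Content-Type: text/plain; charset=UTF-8\n"
    if (!preg_match('/Content-Type:[^"\\\\]*charset=UTF-8/i', $po)) {
        $problems[] = 'no "Content-Type: ...; charset=UTF-8" header found';
    }

    return $problems;
}
```

For example, `print_r(checkPoEncoding(file_get_contents('file.po')));` prints an empty array when the file both declares UTF-8 and contains only valid UTF-8 bytes.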

oscarotero commented

In theory, this header is added by default. https://github.com/oscarotero/Gettext/blob/master/src/Translations.php#L109
