fatal: iconv_open(UTF-8,UTF-8-MAC) failed, but needed: #50

Open
asmeurer opened this issue Jan 14, 2019 · 37 comments

Comments

@asmeurer
Member

Sometimes when using git commands I get this error:

fatal: iconv_open(UTF-8,UTF-8-MAC) failed, but needed:
    precomposed unicode is not supported.
    If you want to use decomposed unicode, run
    "git config core.precomposeunicode false"

The error does not occur with other builds of git. I even built the same version of git from source and didn't have the issue.

This is on the osx build.

@scopatz
Member

scopatz commented Jan 14, 2019

hmm weird.

@asmeurer
Member Author

Also it seems to be repository dependent. Some repos exhibit the problem and some don't. Presumably the repos with the problem have a Unicode character somewhere (whether in a file or in the git metadata I don't know).

@SultanOrazbayev

SultanOrazbayev commented Nov 3, 2019

I ran into the same problem today. Pushing and pulling work fine, but when I check the status I get the same error message as in the original report. I'm running git version 2.23.0.

@SultanOrazbayev

SultanOrazbayev commented Nov 3, 2019

So it turns out that in my case the problem was 'solved' by setting a local config:
git config --local status.showUntrackedFiles no. My tracked files don't have any Unicode characters in their names, so now I don't get the error message.

For reference: https://www.git-tower.com/help/mac/faq-and-tips/faq/unicode-filenames
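
A minimal sketch of that workaround as two small helpers, in case someone wants to apply it and later undo it. The function names are made up for illustration. Note the trade-off: git status stops listing untracked files altogether, so this hides the problem rather than fixing it.

```shell
# Stop "git status" from scanning untracked files in one repository.
# Usage: hide_untracked [path-to-repo]   (defaults to the current directory)
hide_untracked() {
    git -C "${1:-.}" config --local status.showUntrackedFiles no
}

# Undo the workaround by removing the local setting again.
# Usage: restore_untracked [path-to-repo]
restore_untracked() {
    git -C "${1:-.}" config --local --unset status.showUntrackedFiles
}
```

Calling hide_untracked with no argument from inside the affected repository is equivalent to the command above.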

@phil-blain
Contributor

phil-blain commented Nov 9, 2019

I also just had the issue after running git status in a repository with an untracked file with a non-breaking space in its name (created by accident!).
After reading the above link from @SultanOrazbayev and a bit of googling and grepping, I think I understand the problem correctly:

  1. When Git detects it is running on a Mac, it sets core.precomposeunicode to true in .git/config when doing git init or git clone, because this setting is needed for repos to be portable between macOS and Linux/Windows, since macOS uses decomposed Unicode (NFD) whereas Linux and Windows use precomposed unicode (NFC) (see link in above comment).

  2. When core.precomposeunicode is set, some special code is activated in Git to make sure that the "precomposed" versions of paths are stored in the repo, so that the repo is portable. This code is in compat/precompose_utf8.c:

    $ cd ${path_to_git_source}
    $ git grep -n "UTF-8-MAC"
    compat/precompose_utf8.c:15:static const char *path_encoding = "UTF-8-MAC";
    $ git grep -A 4 -n "iconv_open(%s,%s) failed"
    compat/precompose_utf8.c:136:               die("iconv_open(%s,%s) failed, but needed:\n"
    compat/precompose_utf8.c-137-                       "    precomposed unicode is not supported.\n"
    compat/precompose_utf8.c-138-                       "    If you want to use decomposed unicode, run\n"
    compat/precompose_utf8.c-139-                       "    \"git config core.precomposeunicode false\"\n",
    compat/precompose_utf8.c-140-                       repo_encoding, path_encoding);
  3. This code calls iconv_open from libiconv to convert from the "Mac" version of UTF-8 ("UTF-8-MAC", decomposed) to UTF-8 (precomposed).
    However, only the Apple version of iconv is aware of this special "UTF-8-MAC" encoding:

    $ $CONDA_PREFIX/bin/iconv -l | grep UTF-8
    UTF-8
    $ /usr/bin/iconv -l | grep UTF-8
    UTF-8 UTF8
    UTF-8-MAC UTF8-MAC
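
To make the NFC/NFD distinction from point 1 concrete, here is a small sketch (not taken from Git's source) that writes "ü" in both forms and dumps the bytes. The variable names and the hex helper are made up for illustration; the byte values are the standard UTF-8 encodings of U+00FC and of "u" followed by U+0308.

```shell
# "ü" as one precomposed code point, U+00FC (NFC): bytes c3 bc
nfc=$(printf '\303\274')
# "ü" as "u" plus combining diaeresis U+0308 (NFD): bytes 75 cc 88
nfd=$(printf 'u\314\210')

# Print the bytes of a string as lowercase hex without separators.
hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

echo "NFC: $(hex "$nfc")"   # NFC: c3bc
echo "NFD: $(hex "$nfd")"   # NFD: 75cc88

# Both render as the same character, but they are different byte strings,
# so without normalization they are different file names to Git.
if [ "$nfc" = "$nfd" ]; then echo "same bytes"; else echo "different bytes"; fi
```

Converting the second form into the first is the job Git hands to iconv with the "UTF-8-MAC" source encoding.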

So basically, since we build Git against the (GNU) libiconv from conda-forge:

    --with-iconv="${PREFIX}/lib" \

    - libiconv # [unix]

I don't think there is any way around that error at the moment (GNU libiconv would have to add support for "UTF-8-MAC"). [EDIT] Searching the libiconv mailing list reveals this probably won't happen. [EDIT]
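
A rough way to check this on a given machine is to ask each iconv binary to open the same converter Git needs. This is only a sketch: the helper name has_utf8_mac is made up, and it assumes the iconv command-line tool in a prefix is built from the same library that the git in that prefix links against, which I have not verified.

```shell
# Return 0 if the given iconv binary can open a UTF-8-MAC -> UTF-8 converter.
# Usage: has_utf8_mac [path-to-iconv]   (defaults to the iconv found on PATH)
has_utf8_mac() {
    iconv_bin=${1:-iconv}
    # Converting empty input only succeeds if the encoding name is known.
    printf '' | "$iconv_bin" -f UTF-8-MAC -t UTF-8 >/dev/null 2>&1
}

# Compare the system iconv with the one from the active conda environment.
for candidate in /usr/bin/iconv "${CONDA_PREFIX:-/nonexistent}/bin/iconv"; do
    if has_utf8_mac "$candidate"; then
        echo "$candidate: UTF-8-MAC supported"
    else
        echo "$candidate: UTF-8-MAC not supported (or binary not found)"
    fi
done
```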

I did however find https://opensource.apple.com/tarballs/libiconv/ so maybe there could be a libiconv-mac feedstock that builds from opensource.apple.com and we would build Git against this libiconv just for macOS.
I'm not sure if this would actually be feasible in terms of licensing and glibc compatibility; I don't know enough about these subjects, but I thought it would at least be worth throwing the idea out there.

@jakirkham
Member

If someone could supply a reproducer, I would expect that to help us address this issue.

@phil-blain
Contributor

phil-blain commented Nov 10, 2019

@jakirkham :

mkdir test
cd test
git init
touch fileü
git status

@asmeurer
Member Author

If it's on the Apple open source site does that mean it comes with macOS (or the developer tools)?

@phil-blain
Contributor

I think it just means that Apple respects the terms of libiconv's license, and thus, since they modify it, they make their version available.

I do think it comes with macOS.

@jakirkham
Member

It's very hard to tell what is going on with stuff on that site unfortunately as there is no version control (at least none that I've seen).

@asmeurer
Member Author

Can we confirm that it comes with macOS (and not just the developer tools)? If it does it seems the clear fix here is to link git against the system version.

@phil-blain
Contributor

I will try to confirm that on a Mac without the developer tools installed later today.
I thought that stuff on conda-forge was not supposed to link against system libraries; that is why I suggested a new feedstock.

@asmeurer
Member Author

I don't know the conda-forge policies, though I don't see why that would be a problem. It seems to me that having two separate libiconv packages would be hard to do.

@phil-blain
Contributor

phil-blain commented Nov 13, 2019

So I checked on a Mac without the developer tools and it has iconv installed.
@scopatz, @jakirkham can you share some insights on this one? Would it be OK to build Git against the system libiconv (on macOS only), or would it be better to create a separate "libiconv-apple" feedstock (or similar) to host the Apple version of libiconv that is needed to build Git on Mac, and use that feedstock to build Git on macOS?

@noloader

noloader commented May 6, 2020

Sorry to step in on an old report.

The problem is that libiconv does not support the UTF-8-Mac encoding. The maintainers don't approve of Apple's encoding, so they left it out of the library, and they don't appear to have any interest in supporting it. Apple supplies a patched version of libiconv.

Also see:

And the same maintainers declined to support UTF-8-Mac in Glibc. Also see Glibc Issue 14130, Add utf-8-mac encoding to iconv.

@asmeurer
Member Author

Are the Apple patches public? We could patch the conda-forge libiconv if they are.

@phil-blain
Contributor

I think it would be easier to just build against the system libiconv on macOS, no?

@asmeurer
Member Author

Probably. I don't know what the conda-forge policies are on either thing (building against system packages vs. shipping and also patching conda-forge shipped packages). We need to do one of them because right now the git feedstock is completely broken on Mac for any "bad" repo.

@noloader

noloader commented May 12, 2020

This is based on my observations from a repo called Build-Scripts, which supplies OpenSSH, Git, cURL, Wget and a few other packages on platforms like CentOS 5, OS X 10.5 and Solaris 11. (I need the tools for other things I do on the older platforms.)

If you are building non-trivial GNU packages, like iConv, GetText or pieces of Gnulib, then you should probably use libiconv-utf8mac because it provides UTF8-Mac support everywhere (Linux, Solaris, OS X, BSDs, etc).

If you don't use a UTF8-Mac-enabled libiconv on OS X, then self tests in iConv, GetText or pieces of Gnulib will fail. This is the reason I started using libiconv-utf8mac. (Previously I was using GNU's libiconv.)

If you use Apple's libiconv-59, then it is about 5 versions behind the current release. There are at least 3 CVEs present in Apple's libiconv-59 that have been cleared in GNU libiconv 1.16 (and libiconv-utf8mac). This is another reason I started using libiconv-utf8mac.

If you can build GNU's libiconv, then you can build libiconv-utf8mac. There's nothing special once you create the tarball. Or, grab the tarball from here.

@asmeurer
Member Author

Is there a reason to provide utf8-mac on platforms other than Mac?

Also if we are talking about changing the conda-forge libiconv we should move the conversation over to the issue tracker here https://github.com/conda-forge/libiconv-feedstock. Or is the suggestion to make libiconv-utf8mac a separate package?

@analog-cbarber

If you are working with a repo that already has files containing Unicode characters, is it safe to set core.precomposeunicode to false?

@phil-blain
Contributor

@analog-cbarber I think asking that question on the Git mailing list (see https://git-scm.com/community) would be better, that way you would reach more knowledgeable people :)

@analog-cbarber

analog-cbarber commented Jan 4, 2021

Thanks, although anyone who googles this issue is going to find this ticket first.

In my case, I think I can probably just rename the offending file.

@phil-blain
Contributor

Indeed :) so I suggest adding a link to your post on the mailing list if you send one, if it gets useful answers.

@noloader

noloader commented Jan 4, 2021

The Git folks are at the mercy of the Glibc folks and the iconv folks.

Also see Git 2.26.2 and failed self tests on OS X on the Git mailing list and Missing UTF-8-Mac is causing programs to fail on the iconv mailing list.

@analog-cbarber

Yes, I saw those. Personally, I would never check in a file using non-ascii characters or spaces, but you don't have control over other people's repos.

@analog-cbarber

I am not sure the setting matters for existing files in the repo, but setting precomposeunicode to false could result in decomposed unicode names being added if any other such files are added to your repo, so I would guess that this would not be especially recommended.
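
If the goal is only to get past the error without changing what is stored in .git/config, a one-off override might be enough. This is a sketch, not something tested on an affected build: it assumes the failing conversion is only attempted when the setting is true, which matches the error text quoted earlier in this thread. Both function names are made up.

```shell
# Run one status with precomposition disabled; the -c override is not persisted.
# Usage: status_without_precompose [path-to-repo]
status_without_precompose() {
    git -C "${1:-.}" -c core.precomposeunicode=false status --short
}

# Show the value stored in the repository's own config (prints nothing if unset).
# Usage: stored_precompose_setting [path-to-repo]
stored_precompose_setting() {
    git -C "${1:-.}" config --local --get core.precomposeunicode
}
```

Because nothing is written to the config, later commits from the same clone still go through whatever the repository setting says, so this is for inspection rather than for adding files.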

@asmeurer
Member Author

asmeurer commented Jan 4, 2021

The second mailing list post you reference points to this patch from Apple https://opensource.apple.com/source/libiconv/libiconv-26/patches/macutf8.patch.auto.html. Is it sufficient to make the conda-forge libiconv recipe on Mac include this patch (and possibly others)?

I don't see any issues on the libiconv feedstock about this https://github.com/conda-forge/libiconv-feedstock

@noloader

noloader commented Jan 4, 2021

Is it sufficient to make the conda-forge libiconv recipe on Mac include this patch (and possibly others)?

The Apple provided patch is for libiconv 1.11 from 2007.

If you want to build a modern libiconv with utf8-mac support, then use https://github.com/fumiyas/libiconv-utf8mac.

I eat my own dogfood. I use https://github.com/fumiyas/libiconv-utf8mac. Also see build-iconv-utf8mac.sh.

@asmeurer
Member Author

asmeurer commented Jan 4, 2021

Is it possible to express that as a patch file on top of the existing libiconv sources? I imagine that will be easier to swallow for the libiconv recipe maintainers than using a forked source.

@analog-cbarber

I use git from a conda environment, so I would really want a conda package. I don't want to hand patch my conda environments.

Obviously, it would be best if git just took care of this for you. Users shouldn't have to worry about patching binaries potentially in multiple places to make git work properly.

@AtomicNess123

Is it possible to know which files are the troublesome ones? I have thousands of files, and it would be nice if the error gave you a list of them.

@SultanOrazbayev

In my experience these were files that had non-ascii (but utf-8) symbols in their name (including weird non-ascii whitespace characters).

@AtomicNess123

Thanks, yes, that seems to be the case. But it'd be nice to be told exactly which they are without having to search for them.

@SultanOrazbayev

This seems to work:

LC_ALL=C find . -name '*[! -~]*'

(taken from https://unix.stackexchange.com/questions/109747/identify-files-with-non-ascii-or-non-printable-characters-in-file-name)
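
For what it's worth, here is the same idea wrapped in a small function that also skips the .git directory. The function name is made up. The pattern works because LC_ALL=C makes the bracket range from space to tilde byte-based, so any name containing a byte outside printable ASCII matches.

```shell
# List paths under a directory whose names contain a byte outside printable ASCII.
# Usage: list_non_ascii_names [directory]   (defaults to the current directory)
list_non_ascii_names() {
    # -name .git -prune keeps find from descending into Git's own metadata.
    LC_ALL=C find "${1:-.}" -name .git -prune -o -name '*[! -~]*' -print
}

list_non_ascii_names .
```

It only looks at file and directory names, not at file contents, which seems to be all that matters for this error.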

@AtomicNess123

That's excellent. It'd be even more excellent if magit told you this without you having to do it yourself :D

idiapbbb pushed a commit to bioidiap/bob.devtools that referenced this issue Oct 1, 2021
@sethrj

sethrj commented Feb 21, 2022

I ran across this today using a spack-installed git (linked against its own libiconv) for the first time in one repo that uses https://github.com/zeux/pugixml:

$ git status
fatal: iconv_open(UTF-8,UTF-8-MAC) failed, but needed:
    precomposed unicode is not supported.
    If you want to use decomposed unicode, run
    "git config core.precomposeunicode false"
$ LC_ALL=C find . -name '*[! -~]*'
./tests/data/тест.xml

Thank you @SultanOrazbayev for the clever find command and @phil-blain for the detailed debugging.


9 participants