Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisiting ICU dependency #154

Closed
caesay opened this issue Mar 10, 2024 · 4 comments
Closed

Revisiting ICU dependency #154

caesay opened this issue Mar 10, 2024 · 4 comments

Comments

@caesay
Copy link
Contributor

caesay commented Mar 10, 2024

Hi there,

I didn't just want to open a PR because we've gone back and forth on this a bit already. The current solution helped for Windows, but I had a user who just could not stomach the ICU dependency on apple clang. It was very complicated for them to set up with cmake, and added 30+mb's to their app size just for one function. I added a few more escape hatches in the following code. Since my userbase will largely be using Qt, and Qt already has string functions, it makes sense to use them. Also, Since I still don't actually need unicode support for what I'm doing, I've also added an option to fall back to the ASCII version. Let me know if you would like me to clean this up as a PR or not.

#if defined(QT_CORE_LIB)

#include <QString>
static std::string VeloString_ToLower(std::string_view s)
{
    QString t = QString::fromStdString(std::string { s });
    return t.toLower().toStdString();
}

static std::string VeloString_ToUpper(std::string_view s)
{
    QString t = QString::fromStdString(std::string { s });
    return t.toUpper().toStdString();
}

#elif defined(_WIN32)

#include <Windows.h>
static std::string VeloString_Win32LCMap(std::string_view s, DWORD flags)
{
    int size = MultiByteToWideChar(CP_UTF8, 0, s.data(), (int)s.size(), nullptr, 0);
    std::wstring wide(size, 0);
    MultiByteToWideChar(CP_UTF8, 0, s.data(), (int)s.size(), wide.data(), size);
    size = LCMapStringEx(LOCALE_NAME_SYSTEM_DEFAULT, LCMAP_LINGUISTIC_CASING | flags, wide.data(), size, nullptr, 0, nullptr, nullptr, 0);
    std::wstring wideResult(size, 0);
    LCMapStringEx(LOCALE_NAME_SYSTEM_DEFAULT, LCMAP_LINGUISTIC_CASING | flags, wide.data(), wide.size(), wideResult.data(), size, nullptr, nullptr, 0);
    int resultSize = WideCharToMultiByte(CP_UTF8, 0, wideResult.data(), size, nullptr, 0, nullptr, nullptr);
    std::string result(resultSize, 0);
    WideCharToMultiByte(CP_UTF8, 0, wideResult.data(), size, result.data(), resultSize, nullptr, nullptr);
    return result;
}

static std::string VeloString_ToLower(std::string_view s)
{
    return VeloString_Win32LCMap(s, LCMAP_LOWERCASE);
}

static std::string VeloString_ToUpper(std::string_view s)
{
    return VeloString_Win32LCMap(s, LCMAP_UPPERCASE);
}

#elif defined(VELOPACK_NO_ICU)

static std::string VeloString_ToLower(std::string_view s)
{
    std::string data(s);
    std::transform(data.begin(), data.end(), data.begin(), [](unsigned char c){ return std::tolower(c); });
    return data;
}

static std::string VeloString_ToUpper(std::string_view s)
{
    std::string data(s);
    std::transform(data.begin(), data.end(), data.begin(), [](unsigned char c){ return std::toupper(c); });
    return data;
}

#else

#include <unicode/unistr.h>

static std::string VeloString_ToLower(std::string_view s)
{
    std::string result;
    return icu::UnicodeString::fromUTF8(s).toLower().toUTF8String(result);
}

static std::string VeloString_ToUpper(std::string_view s)
{
    std::string result;
    return icu::UnicodeString::fromUTF8(s).toUpper().toUTF8String(result);
}

#endif
@pfusik
Copy link
Collaborator

pfusik commented Mar 11, 2024

Thanks for discussing it first!

Ideally, the Fusion outputs would only depend on the target language and its library (including the OS library). Unfortunately, C++ doesn't have Unicode case operations built-in.

Qt is an option. I've even considered making an option to emit QString and Qt collections instead of their STL counterparts. But, if that's for macOS/iOS support, don't these OSes provide functions for Unicode case change, just like Windows does? I think that would be better (and more performant) than relying on Qt.

As for conditionally switching to ASCII-only mode, I'm against it. If the use case is case-insensitive comparison against ASCII constants, we should directly implement that (#151). fut could then emit ASCII-only functions when it detects that one side of the equation is an ASCII literal.

@caesay
Copy link
Contributor Author

caesay commented Mar 11, 2024

There's nothing in the posix or C++ standard for this. As far as the code I provided above, I feel like if Qt is available, it should be the default (rather than the OS), because a cross-platform Qt application likely favors having consistency across platforms. For example, the current locale changes the results of localisation, and if the developer had created some configurable Locale for the user in the app settings, if we use Qt everywhere this will provide a uniform user experience.

I have not tested the following code, but I think it's possible to use something like this with the CoreFoundation lib on OSX. Foundation is Obj-C only, but CoreFoundation is pure C.

#include <CoreFoundation/CoreFoundation.h>
#include <iostream>
#include <string>
#include <vector>

std::string toLowerCF(const std::string& input) {
    // Convert std::string to CFStringRef
    CFStringRef inputStringRef = CFStringCreateWithBytes(
        kCFAllocatorDefault,
        reinterpret_cast<const UInt8*>(input.c_str()),
        input.length(),
        kCFStringEncodingUTF8,
        false
    );

    // Convert to lowercase
    CFMutableStringRef lowerCaseStringRef = CFStringCreateMutableCopy(NULL, 0, inputStringRef);
    CFStringLowercase(lowerCaseStringRef, nullptr); // nullptr for the current locale

    // Calculate the buffer size needed for the lowercased string
    CFIndex length = CFStringGetLength(lowerCaseStringRef);
    CFIndex maxSize = CFStringGetMaximumSizeForEncoding(length, kCFStringEncodingUTF8) + 1; // +1 for null terminator

    // Use a std::vector<char> as the buffer
    std::vector<char> buffer(maxSize);

    // Convert CFStringRef back to std::string safely, checking for successful conversion
    CFStringGetCString(lowerCaseStringRef, buffer.data(), maxSize, kCFStringEncodingUTF8);

    // Clean up Core Foundation objects
    CFRelease(inputStringRef);
    CFRelease(lowerCaseStringRef);

    return std::string(buffer.data());
}

This of course does not solve the problem for Linux, however for me having Linux support via Qt is ok.

@pfusik
Copy link
Collaborator

pfusik commented Mar 11, 2024

As you already have noticed, the results are not 100% identical between the languages. For example, in https://github.com/fusionlanguage/fut/blob/4a3d11a412e909b86fb8b5119eeeed15238eb563/test/StringToLower.fu the input is not the full string from the comment, because that breaks on Node.js.

And it is not Fusion's goal to have zero differences between languages. Relying on Qt is ok for you, but for other people that's quite a heavy dependency. ICU is a much smaller package and it comes preinstalled on the Ubuntus in our CI.

It's also unclear to me that on one hand you propose limiting to ASCII letters, on the other you are worried about Windows vs macOS Unicode handling differences.

ToLower/ToUpper implementations get a little lengthy with so many #ifdefs.

@caesay
Copy link
Contributor Author

caesay commented Mar 11, 2024

It was not my suggestion that people go out of their way to install Qt just for string handling, but that many people building cross platform apps will already have Qt so it's a safe bet to use it if it's already available. Also, my point was that if a developer is using Qt on one platform, it's likely they are using it on all other platforms too, so Qt should be the preferred implementation in that case to provide cross-platform consistency.

It's fine, I'll close this. I'm using the above code for C++ because the current behavior in fut doesn't work for my users, and if you'd accept a PR which adds in additional implementations I'd be happy write one, just let me know.

@caesay caesay closed this as completed Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants