Revisiting ICU dependency #154

caesay · 2024-03-10T14:39:41Z

Hi there,

I didn't just want to open a PR because we've gone back and forth on this a bit already. The current solution helped for Windows, but I had a user who just could not stomach the ICU dependency on apple clang. It was very complicated for them to set up with cmake, and added 30+mb's to their app size just for one function. I added a few more escape hatches in the following code. Since my userbase will largely be using Qt, and Qt already has string functions, it makes sense to use them. Also, Since I still don't actually need unicode support for what I'm doing, I've also added an option to fall back to the ASCII version. Let me know if you would like me to clean this up as a PR or not.

#if defined(QT_CORE_LIB)

#include <QString>
static std::string VeloString_ToLower(std::string_view s)
{
    QString t = QString::fromStdString(std::string { s });
    return t.toLower().toStdString();
}

static std::string VeloString_ToUpper(std::string_view s)
{
    QString t = QString::fromStdString(std::string { s });
    return t.toUpper().toStdString();
}

#elif defined(_WIN32)

#include <Windows.h>
static std::string VeloString_Win32LCMap(std::string_view s, DWORD flags)
{
    int size = MultiByteToWideChar(CP_UTF8, 0, s.data(), (int)s.size(), nullptr, 0);
    std::wstring wide(size, 0);
    MultiByteToWideChar(CP_UTF8, 0, s.data(), (int)s.size(), wide.data(), size);
    size = LCMapStringEx(LOCALE_NAME_SYSTEM_DEFAULT, LCMAP_LINGUISTIC_CASING | flags, wide.data(), size, nullptr, 0, nullptr, nullptr, 0);
    std::wstring wideResult(size, 0);
    LCMapStringEx(LOCALE_NAME_SYSTEM_DEFAULT, LCMAP_LINGUISTIC_CASING | flags, wide.data(), wide.size(), wideResult.data(), size, nullptr, nullptr, 0);
    int resultSize = WideCharToMultiByte(CP_UTF8, 0, wideResult.data(), size, nullptr, 0, nullptr, nullptr);
    std::string result(resultSize, 0);
    WideCharToMultiByte(CP_UTF8, 0, wideResult.data(), size, result.data(), resultSize, nullptr, nullptr);
    return result;
}

static std::string VeloString_ToLower(std::string_view s)
{
    return VeloString_Win32LCMap(s, LCMAP_LOWERCASE);
}

static std::string VeloString_ToUpper(std::string_view s)
{
    return VeloString_Win32LCMap(s, LCMAP_UPPERCASE);
}

#elif defined(VELOPACK_NO_ICU)

static std::string VeloString_ToLower(std::string_view s)
{
    std::string data(s);
    std::transform(data.begin(), data.end(), data.begin(), [](unsigned char c){ return std::tolower(c); });
    return data;
}

static std::string VeloString_ToUpper(std::string_view s)
{
    std::string data(s);
    std::transform(data.begin(), data.end(), data.begin(), [](unsigned char c){ return std::toupper(c); });
    return data;
}

#else

#include <unicode/unistr.h>

static std::string VeloString_ToLower(std::string_view s)
{
    std::string result;
    return icu::UnicodeString::fromUTF8(s).toLower().toUTF8String(result);
}

static std::string VeloString_ToUpper(std::string_view s)
{
    std::string result;
    return icu::UnicodeString::fromUTF8(s).toUpper().toUTF8String(result);
}

#endif

pfusik · 2024-03-11T07:05:51Z

Thanks for discussing it first!

Ideally, the Fusion outputs would only depend on the target language and its library (including the OS library). Unfortunately, C++ doesn't have Unicode case operations built-in.

Qt is an option. I've even considered making an option to emit QString and Qt collections instead of their STL counterparts. But, if that's for macOS/iOS support, don't these OSes provide functions for Unicode case change, just like Windows does? I think that would be better (and more performant) than relying on Qt.

As for conditionally switching to ASCII-only mode, I'm against it. If the use case is case-insensitive comparison against ASCII constants, we should directly implement that (#151). fut could then emit ASCII-only functions when it detects that one side of the equation is an ASCII literal.

caesay · 2024-03-11T07:56:41Z

There's nothing in the posix or C++ standard for this. As far as the code I provided above, I feel like if Qt is available, it should be the default (rather than the OS), because a cross-platform Qt application likely favors having consistency across platforms. For example, the current locale changes the results of localisation, and if the developer had created some configurable Locale for the user in the app settings, if we use Qt everywhere this will provide a uniform user experience.

I have not tested the following code, but I think it's possible to use something like this with the CoreFoundation lib on OSX. Foundation is Obj-C only, but CoreFoundation is pure C.

#include <CoreFoundation/CoreFoundation.h>
#include <iostream>
#include <string>
#include <vector>

std::string toLowerCF(const std::string& input) {
    // Convert std::string to CFStringRef
    CFStringRef inputStringRef = CFStringCreateWithBytes(
        kCFAllocatorDefault,
        reinterpret_cast<const UInt8*>(input.c_str()),
        input.length(),
        kCFStringEncodingUTF8,
        false
    );

    // Convert to lowercase
    CFMutableStringRef lowerCaseStringRef = CFStringCreateMutableCopy(NULL, 0, inputStringRef);
    CFStringLowercase(lowerCaseStringRef, nullptr); // nullptr for the current locale

    // Calculate the buffer size needed for the lowercased string
    CFIndex length = CFStringGetLength(lowerCaseStringRef);
    CFIndex maxSize = CFStringGetMaximumSizeForEncoding(length, kCFStringEncodingUTF8) + 1; // +1 for null terminator

    // Use a std::vector<char> as the buffer
    std::vector<char> buffer(maxSize);

    // Convert CFStringRef back to std::string safely, checking for successful conversion
    CFStringGetCString(lowerCaseStringRef, buffer.data(), maxSize, kCFStringEncodingUTF8);

    // Clean up Core Foundation objects
    CFRelease(inputStringRef);
    CFRelease(lowerCaseStringRef);

    return std::string(buffer.data());
}

This of course does not solve the problem for Linux, however for me having Linux support via Qt is ok.

pfusik · 2024-03-11T11:52:27Z

As you already have noticed, the results are not 100% identical between the languages. For example, in https://github.com/fusionlanguage/fut/blob/4a3d11a412e909b86fb8b5119eeeed15238eb563/test/StringToLower.fu the input is not the full string from the comment, because that breaks on Node.js.

And it is not Fusion's goal to have zero differences between languages. Relying on Qt is ok for you, but for other people that's quite a heavy dependency. ICU is a much smaller package and it comes preinstalled on the Ubuntus in our CI.

It's also unclear to me that on one hand you propose limiting to ASCII letters, on the other you are worried about Windows vs macOS Unicode handling differences.

ToLower/ToUpper implementations get a little lengthy with so many #ifdefs.

caesay · 2024-03-11T11:58:43Z

It was not my suggestion that people go out of their way to install Qt just for string handling, but that many people building cross platform apps will already have Qt so it's a safe bet to use it if it's already available. Also, my point was that if a developer is using Qt on one platform, it's likely they are using it on all other platforms too, so Qt should be the preferred implementation in that case to provide cross-platform consistency.

It's fine, I'll close this. I'm using the above code for C++ because the current behavior in fut doesn't work for my users, and if you'd accept a PR which adds in additional implementations I'd be happy write one, just let me know.

caesay closed this as completed Mar 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revisiting ICU dependency #154

Revisiting ICU dependency #154

caesay commented Mar 10, 2024

pfusik commented Mar 11, 2024

caesay commented Mar 11, 2024 •

edited

Loading

pfusik commented Mar 11, 2024

caesay commented Mar 11, 2024 •

edited

Loading

Revisiting ICU dependency #154

Revisiting ICU dependency #154

Comments

caesay commented Mar 10, 2024

pfusik commented Mar 11, 2024

caesay commented Mar 11, 2024 • edited Loading

pfusik commented Mar 11, 2024

caesay commented Mar 11, 2024 • edited Loading

caesay commented Mar 11, 2024 •

edited

Loading

caesay commented Mar 11, 2024 •

edited

Loading