cmake: fix UNICODE-escaped characters on aarch64 #8851
Please sort out the DCO failure; it cannot be merged without that.
Done, @patrick-stephens. Thanks
Let me check on this.
This patch makes it possible to process emoji on aarch64 machines using the repro posted in #8521.
Please let me know if any further action is required to merge this PR. I do see some sporadic unit test failures for
Are there any other potential side effects of this change? I could be wrong, but it seems to me the problem lies elsewhere...
Some of the details are posted on the Arm website: unsigned-char-and-signed-char. So, we have two solutions: Fix the
For example, PR #3522 tries
According to the Arm website, this could be caused by compatibility with older ARM instructions:
Also, it is mentioned as a workaround:
However, my two cents: handling char as signed char would align its behavior with x86 processors.
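To make the difference concrete, here is a minimal C sketch (hypothetical helper names, not Fluent Bit code) showing how the same byte widens to different int values depending on whether it is treated as signed char or unsigned char. The byte used is 0xE4, the first byte of the UTF-8 encoding of the Chinese character 中 (E4 B8 AD). Plain char behaves like one or the other depending on the target: typically signed on x86 and unsigned on Linux/aarch64, unless overridden with -fsigned-char or -funsigned-char.

```c
#include <limits.h>

/* Hypothetical helpers for illustration only (not part of Fluent Bit). */

/* Returns 1 if plain char is signed for this compiler/target, else 0.
 * CHAR_MIN is negative for signed plain char and 0 for unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Widen a raw byte the way a signed char would: bytes >= 0x80 become
 * negative (GCC and Clang convert modulo 2^8, so 0xE4 -> -28). */
int widen_as_signed_char(unsigned char byte)
{
    return (signed char)byte;
}

/* Widen a raw byte the way an unsigned char would: always 0..255. */
int widen_as_unsigned_char(unsigned char byte)
{
    return byte;
}
```

On a typical x86 build plain_char_is_signed() returns 1; on aarch64 Linux it returns 0 unless the code is compiled with -fsigned-char, which is the behavior this PR changes.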
Looks good to me.
@cosmo0920, thank you for the review. I would like to get this PR merged, or make any additional changes that are required. Thanks again.
Checking again... Thanks
How about adding a new option only for Linux on Arm architectures?
diff --git a/CMakeLists.txt b/CMakeLists.txt
index b3e7a2585..09807f595 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -30,6 +30,10 @@ endif()
if(CMAKE_SYSTEM_NAME MATCHES "Linux")
set(FLB_SYSTEM_LINUX On)
add_definitions(-DFLB_SYSTEM_LINUX)
+ if (CMAKE_SYSTEM_PROCESSOR MATCHES "^(arm64|aarch64)")
+ set(FLB_LINUX_ON_ARM On)
+ add_definitions(-DFLB_LINUX_ON_ARM)
+ endif()
endif()
# Update CFLAGS
@@ -146,6 +150,9 @@ option(FLB_WINDOWS_DEFAULTS "Build with predefined Windows settings" Yes)
option(FLB_WASM "Build with WASM runtime support" Yes)
option(FLB_WAMRC "Build with WASM AOT compiler executable" No)
option(FLB_WASM_STACK_PROTECT "Build with WASM runtime with strong stack protector flags" No)
+if (FLB_LINUX_ON_ARM)
+ option(FLB_PREFER_SIGNED_CHAR "Build with signed char (Linux on ARM only)" No)
+endif()
# Native Metrics Support (cmetrics)
option(FLB_METRICS "Enable metrics support" Yes)
@@ -405,6 +412,14 @@ if (FLB_SYSTEM_LINUX)
include(cmake/s390x.cmake)
endif ()
+# Enable signed char support on Linux ARM if specified
+if (FLB_LINUX_ON_ARM)
+ if (FLB_PREFER_SIGNED_CHAR)
+ set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsigned-char")
+ message(STATUS "Enabling signed char")
+ endif()
+endif()
+
# Extract Git commit information for debug output.
# Note that this is only set when cmake is run, the intent here is to use in CI for verification of releases so is acceptable.
# For a better solution see https://jonathanhamberg.com/post/cmake-embedding-git-hash/ but this is simple and easy.
This should be reasonable for most cases. The problem on ARM is that the LDRSB instruction is not available on all ARM processors.
So, for now, adding an ARM-only option seems reasonable, I guess.
Thanks @cosmo0920. So, option Are there any other issues blocking the merge of this PR? Thank you.
@cosmo0920, I validated the change you suggested in #8851 (review) and pushed 253254f.
I was unaware of this compiler-dependent aspect, but unless I'm missing something, most if not all of us work under the expectation that char means signed char, so I wonder if we shouldn't make this platform-agnostic across the supported compilers (gcc and clang?).
Side note: according to MSDN, MSVC already defaults char to signed, and users need to opt in to make it unsigned, so that compiler isn't addressed by this change. That kind of makes me think that, regardless of the platform, we should adopt this to ensure consistent behavior and expectations.
@patrick-stephens and @edsiper, could you please chime in?
I think the general recommendation is never to rely on the size or signedness of primitive types in code; we should be explicit. Older compilers/standards in particular just pick an "optimal" type, leaving it to the underlying compiler/OS to decide what is optimal.
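Following the advice to be explicit, byte-inspection code can convert to unsigned char before comparing, so the result no longer depends on whether plain char is signed. A minimal sketch with hypothetical helper names (not Fluent Bit APIs):

```c
/* Hypothetical helpers (not Fluent Bit APIs). Each converts to unsigned char
 * first, so the result is the same whether plain char is signed or unsigned. */

/* Returns 1 if the byte is 7-bit ASCII (0x00..0x7F). */
int byte_is_ascii(char c)
{
    return (unsigned char)c < 0x80;
}

/* Length of the UTF-8 sequence introduced by this lead byte: 1..4, or 0 for
 * a continuation byte (10xxxxxx) or an unrecognized pattern. This sketch only
 * checks the bit pattern; it does not reject overlong leads such as 0xC0/0xC1
 * or leads above 0xF4. */
int utf8_sequence_length(char lead)
{
    unsigned char b = (unsigned char)lead;

    if (b < 0x80) return 1;            /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}
```

By contrast, a test such as c < 0x80 on a plain char is true for every byte when char is signed, because bytes 0x80..0xFF become negative, and the same source then behaves differently on x86 and aarch64.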
According to Apple's documentation, macOS on Apple silicon (arm64 macOS) also uses signed char. I found that there is still a linking issue when opting into signed char on RISC-V, so we can only use the signed char opt-in on aarch64. However, we shouldn't force signed char on other non-x86 platforms.
My point is that, given the assumptions made in our codebase, it would make sense to make
@ensean, would you please rephrase your previous message? Is it a breaking change for Chinese support, or do you mean it also fixes a problem with Chinese characters?
Sorry for the confusion. In my case, we have Chinese characters in our logs. When our app runs on x86 servers, Fluent Bit handles the Chinese logs correctly. To save cost, we are planning to move to Arm-based instances (Graviton on AWS). During the PoC everything worked, except that Fluent Bit did not handle the Chinese logs correctly, for example
As more and more cloud providers offer Arm-based servers for better price-performance, I think more Fluent Bit users will run into this unicode-escaped issue, so I think it's necessary to get this PR merged.
@ensean, thanks for describing the current problem. With the proposed changes in this PR, does the problem go away?
@edsiper Yep, I confirmed that this PR solves my problem.
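The symptom described above (multi-byte characters coming out unicode-escaped only on Arm) is typical of code that widens a plain char and relies on its sign. The following is a purely hypothetical C sketch, not Fluent Bit's actual encoder; the treat_as_signed parameter is an assumption added so that both signedness behaviors can be compared on one machine.

```c
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL escaper for illustration only; NOT Fluent Bit's encoder.
 * It copies printable ASCII and "negative" bytes verbatim (assuming that a
 * negative char means part of a multi-byte UTF-8 sequence) and writes every
 * other byte as a six-character escape: backslash, 'u', then 4 hex digits.
 * treat_as_signed simulates plain char: 1 = signed (typical x86),
 * 0 = unsigned (aarch64 Linux default). Returns the output length. */
size_t escape_hypothetical(const char *in, char *out, size_t out_size,
                           int treat_as_signed)
{
    size_t len = strlen(in);
    size_t o = 0;

    if (out_size == 0) {
        return 0;
    }
    for (size_t i = 0; i < len; i++) {
        unsigned char b = (unsigned char)in[i];
        /* Widen the byte the way plain char would on the simulated target. */
        int c = treat_as_signed ? (signed char)b : b;

        if (c < 0 || (c >= 0x20 && c <= 0x7E)) {
            if (o + 1 >= out_size) break;   /* keep room for the NUL */
            out[o++] = (char)b;
        } else {
            if (o + 6 >= out_size) break;   /* 6-char escape plus NUL */
            snprintf(out + o, out_size - o, "\\u%04x", (unsigned)c);
            o += 6;
        }
    }
    out[o] = '\0';
    return o;
}
```

With treat_as_signed set to 1 the three bytes of 中 are copied through unchanged; with 0 they become \u00e4\u00b8\u00ad, the kind of unicode-escaped output this PR is about. Building with -fsigned-char makes plain char behave like the first case, which hides this class of bug; converting explicitly to unsigned char and testing for >= 0x80 would address it at the source.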
@RamaMalladiAWS Hi, we added an aarch64 CI task to confirm that the internal test cases exit normally on aarch64.
diff --git a/.github/workflows/unit-tests.yaml b/.github/workflows/unit-tests.yaml
index 0fa50f118..49593a696 100644
--- a/.github/workflows/unit-tests.yaml
+++ b/.github/workflows/unit-tests.yaml
@@ -125,7 +125,7 @@ jobs:
config:
- name: "Aarch64 actuated testing"
flb_option: "-DFLB_WITHOUT_flb-it-network=1 -DFLB_WITHOUT_flb-it-fstore=1"
- omit_option: "-DFLB_WITHOUT_flb-it-utils=1 -DFLB_WITHOUT_flb-it-pack=1"
+ omit_option: ""
global_option: "-DFLB_BACKTRACE=Off -DFLB_SHARED_LIB=Off -DFLB_DEBUG=On -DFLB_ALL=On -DFLB_EXAMPLES=Off"
unit_test_option: "-DFLB_TESTS_INTERNAL=On"
compiler: gcc
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 9e42d4faf..d1d6b33b5 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -30,6 +30,10 @@ endif()
if(CMAKE_SYSTEM_NAME MATCHES "Linux")
set(FLB_SYSTEM_LINUX On)
add_definitions(-DFLB_SYSTEM_LINUX)
+ if (CMAKE_SYSTEM_PROCESSOR MATCHES "^(arm64|aarch64)")
+ set(FLB_LINUX_ON_AARCH64 On)
+ add_definitions(-DFLB_LINUX_ON_AARCH64)
+ endif()
endif()
# Update CFLAGS
@@ -301,6 +305,12 @@ if (FLB_SYSTEM_LINUX)
include(cmake/s390x.cmake)
endif ()
+# Enable signed char support on Linux AARCH64 if specified
+if (FLB_LINUX_ON_AARCH64)
+ set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsigned-char")
+ message(STATUS "Enabling signed char")
+endif()
+
# Extract Git commit information for debug output.
# Note that this is only set when cmake is run, the intent here is to use in CI for verification of releases so is acceptable.
# For a better solution see https://jonathanhamberg.com/post/cmake-embedding-git-hash/ but this is simple and easy.
Also, could you rebase onto the current master? This would help confirm that characters of 2 or more bytes are handled correctly on aarch64 Linux.
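To check that characters of two or more bytes are handled the same way regardless of char signedness, a test could count code points in strings containing 2-, 3- and 4-byte UTF-8 characters. A minimal sketch with a hypothetical function name; it is not one of Fluent Bit's internal tests.

```c
#include <stddef.h>

/* Hypothetical helper: counts UTF-8 code points by skipping continuation
 * bytes (10xxxxxx). Converting to unsigned char first makes the result
 * independent of whether plain char is signed. Assumes valid UTF-8 input. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        unsigned char b = (unsigned char)*s;
        if ((b & 0xC0) != 0x80) {
            count++; /* ASCII byte or lead byte starts a new code point */
        }
    }
    return count;
}
```

Because every comparison is done on an unsigned char, the same assertions hold on x86 and on aarch64 with or without -fsigned-char.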
Force-pushed from 253254f to 3fd8b9d.
@cosmo0920, I rebased my changes onto the latest master and pushed. Would you like me to change the PR itself to the code you have here: #8851 (comment)? Thanks
Yes. This AArch64 PR needs to confirm that using signed char fixes the Unicode issue on AArch64 Linux.
Force-pushed from 3fd8b9d to 7140cc0.
@RamaMalladiAWS Could you fix the typo?
Signed-off-by: Rama Malladi <[email protected]>
Force-pushed from 7140cc0 to 9d75cf4.
A huge 👍 The CI task on AArch64 Linux is now green. 🟢
Great! Some checks failed (https://github.com/fluent/fluent-bit/actions/runs/10831994739/job/30060741701?pr=8851). Are they benign/unrelated? Thanks.
The macOS ones, I think, are known flakes.
Can we have this PR approved and merged if no further action is pending? Thanks
Checking again... Can we please merge this PR? Thanks
Signed-off-by: Rama Malladi <[email protected]>
Signed-off-by: AdheipSingh <[email protected]>
Build with -fsigned-char on aarch64 to resolve issue #8521.
Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.