-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling of 4-byte UTF8 values on Windows #5803
Conversation
c871631
to
c47cbdf
Compare
38dabaa
to
479b026
Compare
@astigsen perhaps you can have a look at the changes in unicode.cpp. |
src/realm/query_engine.hpp
Outdated
@@ -1582,7 +1571,7 @@ class StringNode : public StringNodeBase { | |||
auto upper = case_map(v, true); | |||
auto lower = case_map(v, false); | |||
if (!upper || !lower) { | |||
error_code = "Malformed UTF-8: " + std::string(v); | |||
throw query_parser::InvalidQueryError(util::format("Malformed UTF-8: %1", v)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not "query parser" errors, they are general query errors. Can we use another name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - changed to std::runtime_error.
const char* begin = source.data(); | ||
const char* end = begin + source.size(); | ||
auto output = result.begin(); | ||
while (begin != end) { | ||
int n = static_cast<int>(sequence_length(*begin)); | ||
if (n == 0 || end - begin < n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need something like this check don't we? What if there are random bytes at the end of the string which appear to be a multibyte character but are just garbage such that sequence_length()
returns 4 but there is only one character left in the string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the logic a bit so that trailing garbage should be handled. I did not have a chance to test it on a Windows machine today. I will do that tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'd be more convinced that this works if there was a test for trailing garbage as well 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should not specify how trailing garbage would be treated and fall back to "garbage in -> garbage out". This is anyway how the non-windows version works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let's match the non-windows version. Can you add a test with trailing garbage to check the consistency though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I can make a test that has the same outcome on both Windows and Linux. On Linux we will just copy over anything that is not an ASCII letter, but on Windows it will fail if there is something that cannot be translated. That was what I meant with my comment: We should not specify how it behaves as is behaves differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will concede platform dependent behaviour here. However, I'd still request a (platform dependent) test of garbage characters to verify that we do not crash.
@@ -176,6 +176,25 @@ class NoSubscriptionForWrite : public std::runtime_error { | |||
NoSubscriptionForWrite(const std::string& msg); | |||
}; | |||
|
|||
namespace query_parser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have put my previous comment on naming here. Since these exceptions are thrown from the query engine now and not necessarily from the query parser, can we change this namespace to something more general such as query
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not change this in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sure. In fact, I'd expect all these exceptions will changing soon with the upcoming error handling work.
7f3d7c3
to
91f94e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, after addressing clang-format 👍
91f94e9
to
22c91fa
Compare
…nification * origin/master: Stop forcing enums to be 64 bits unnecessarily clean up documentation of internal fields in config structs SyncConfig should be default constructible Traversal functions use IteratorControl values rather than true/false which is more expressive (#5857) Fix handling of 4-byte UTF8 values on Windows (#5803) Encode links in a way the server can understand (#5835) expose `Group::remove_table` in the C API (#5860) Disable auto refresh for old realm during migration (#5856) Expose `list_find` in the c api (#5848) Do not allow asymmetric tables in pbs (#5853) Refactor link tracing code (#5796) Expose `Obj::get_parent_object` in the C API (#5851) Update app.hpp (#5854) Fix appending to list ignores existing query (#5850)
What, How & Why?
Fixes #5825
Some unused error handling is removed from
ParentNode
. Replaced by throwing.☑️ ToDos