-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feature](functions) impl scalar functions normal_cdf,to_iso8601,from_iso8601_date #40695
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
run buildall |
TPC-H: Total hot run time: 38395 ms
|
TPC-DS: Total hot run time: 192793 ms
|
ClickBench: Total hot run time: 32.12 s
|
b924777
to
77ebd97
Compare
run buildall |
TPC-H: Total hot run time: 37981 ms
|
TPC-DS: Total hot run time: 198063 ms
|
ClickBench: Total hot run time: 31.2 s
|
FunctionSignature.ret(VarcharType.SYSTEM_DEFAULT).args(DateTimeV2Type.SYSTEM_DEFAULT), | ||
FunctionSignature.ret(VarcharType.SYSTEM_DEFAULT).args(DateV2Type.INSTANCE), | ||
FunctionSignature.ret(VarcharType.SYSTEM_DEFAULT).args(DateTimeType.INSTANCE), | ||
FunctionSignature.ret(VarcharType.SYSTEM_DEFAULT).args(DateType.INSTANCE), | ||
FunctionSignature.ret(StringType.INSTANCE).args(DateTimeV2Type.SYSTEM_DEFAULT), | ||
FunctionSignature.ret(StringType.INSTANCE).args(DateV2Type.INSTANCE), | ||
FunctionSignature.ret(StringType.INSTANCE).args(DateTimeType.INSTANCE), | ||
FunctionSignature.ret(StringType.INSTANCE).args(DateType.INSTANCE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should only return string or varchar. just like c/c++ ret type is not a part of function's signature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will fix it
* ScalarFunction 'to_iso8601'. This class is generated by GenerateFunction. | ||
*/ | ||
public class ToIso8601 extends ScalarFunction | ||
implements UnaryExpression, ExplicitlyCastableSignature, PropagateNullableOnDateLikeV2Args { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use PropagateNullableOnDateLikeV2Args? this is a new function should have same nullable property for date/datetime v1 and date/datetime v2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will fix it.
@@ -901,6 +904,18 @@ default R visitConnectionId(ConnectionId connectionId, C context) { | |||
return visitScalarFunction(connectionId, context); | |||
} | |||
|
|||
default R visitNormalCdf(NormalCdf normalCdf, C context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please sort by lexicographical order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will fix it
size_t result, size_t input_rows_count) { | ||
const auto* src_column_ptr = block.get_by_position(arguments[0]).column.get(); | ||
|
||
std::regex calendar_regex(R"((\d{4})(-)?(\d{2})(-)?(\d{2}))"); // YYYY-MM-DD or YYYYMMDD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never use std::regex
, it's worse than RE2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will change it.
int day_of_year = std::stoi(match[3]); | ||
|
||
if (is_leap(year)) { | ||
if (day_of_year < 0 || day_of_year > 366) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better for judgement like this, set it to [[unlikely]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will add it
struct FromIso8601DateV2 { | ||
static constexpr auto name = "from_iso8601_date"; | ||
|
||
static bool is_variadic() { return true; } //todo : check is or not is_variadic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this function support variadic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't support variadic. I forgot to delete this line.
be/src/vec/functions/math.cpp
Outdated
static constexpr auto name = "normal_cdf"; | ||
String get_name() const override { return name; } | ||
static FunctionPtr create() { return std::make_shared<FunctionNormalCdf>(); } | ||
DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { | ||
return make_nullable(std::make_shared<DataTypeFloat64>()); | ||
} | ||
DataTypes get_variadic_argument_types_impl() const override { | ||
return {std::make_shared<DataTypeFloat64>(), std::make_shared<DataTypeFloat64>(), | ||
std::make_shared<DataTypeFloat64>()}; | ||
} | ||
size_t get_number_of_arguments() const override { return 3; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better add blank line between functions/variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will add it
std::regex calendar_regex(R"((\d{4})(-)?(\d{2})(-)?(\d{2}))"); // YYYY-MM-DD or YYYYMMDD | ||
std::regex year_month_regex(R"((\d{4})-(\d{2}))"); // YYYY-MM | ||
std::regex year_only_regex(R"((\d{4}))"); // YYYY | ||
std::regex week_regex( | ||
R"((\d{4})(-)?W(\d{2})(-)?(\d)?)"); // YYYY-Www or YYYY-Www-D or YYYYWww or YYYYWwwD | ||
std::regex day_of_year_regex(R"((\d{4})(-)?(\d{3}))"); // YYYY-DDD or YYYYDDD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider add const
for them and move outside of execute()
to improve the efficiency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will use static const to fix it.
b212b6c
to
3cb8dde
Compare
run buildall |
TPC-H: Total hot run time: 42992 ms
|
TPC-DS: Total hot run time: 195634 ms
|
TeamCity be ut coverage result: |
ClickBench: Total hot run time: 31.07 s
|
0dce774
to
e56b87e
Compare
run buildall |
02aa222
to
d13fe4a
Compare
run buildall |
TPC-H: Total hot run time: 43414 ms
|
TeamCity be ut coverage result: |
TPC-DS: Total hot run time: 195758 ms
|
ClickBench: Total hot run time: 31.06 s
|
TPC-H: Total hot run time: 41620 ms
|
TPC-DS: Total hot run time: 190496 ms
|
ClickBench: Total hot run time: 32.9 s
|
TeamCity be ut coverage result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR approved by at least one committer and no changes requested. |
…_iso8601_date (apache#40695) Added three functions: `normal_cdf`, `to_iso8601`, `from_iso8601_date` 1. `normal_cdf(mean, sd, v) → double` : Compute the Normal cdf with given mean and standard deviation (sd): P(N < value; mean, sd). The mean and value must be real values and the standard deviation must be a real and positive value (all of type DOUBLE). 2. `to_iso8601(DATE/DATETIME) → string` For `DATE` type, it will be converted to `YYYY-MM-DD`. For `DATETIME` type, it will be converted to `YYYY-MM-DDThh:mm:ss.xxxxxx`. 3. `from_iso8601_date(string) → DATE` Convert iso8601 string to `DATE` type. The supported iso8601 string formats are as follows: a. Year : YYYY b. Calendar dates : YYYY-MM-DD , YYYYMMDD , YYYY-MM c. Week dates : YYYY-Www , YYYYWww , YYYY-Www-D , YYYYWwwD d. Ordinal dates : YYYY-DDD , YYYYDDD You can refer to this [document](https://en.wikipedia.org/wiki/ISO_8601) to understand what these formats mean Please note that: a. Each date and time value has a fixed number of digits that must be padded with leading zeros. b. The range of YYYY is 0001-9999. outside these rules, the correctness of the results is not guaranteed. For illegal values, the result is NULL.
before pr : #40695 ## Proposed changes Fix the bug that be will have this error when compiling on mac. ``` be/src/vec/functions/math.cpp:416:39: error: no member named 'numbers' in namespace 'std' constexpr double sqrt2 = std::numbers::sqrt2; ~~~~~^ 1 error generated. ```
…_iso8601_date (apache#40695) ## Proposed changes Added three functions: `normal_cdf`, `to_iso8601`, `from_iso8601_date` 1. `normal_cdf(mean, sd, v) → double` : Compute the Normal cdf with given mean and standard deviation (sd): P(N < value; mean, sd). The mean and value must be real values and the standard deviation must be a real and positive value (all of type DOUBLE). 2. `to_iso8601(DATE/DATETIME) → string` For `DATE` type, it will be converted to `YYYY-MM-DD`. For `DATETIME` type, it will be converted to `YYYY-MM-DDThh:mm:ss.xxxxxx`. 3. `from_iso8601_date(string) → DATE` Convert iso8601 string to `DATE` type. The supported iso8601 string formats are as follows: a. Year : YYYY b. Calendar dates : YYYY-MM-DD , YYYYMMDD , YYYY-MM c. Week dates : YYYY-Www , YYYYWww , YYYY-Www-D , YYYYWwwD d. Ordinal dates : YYYY-DDD , YYYYDDD You can refer to this [document](https://en.wikipedia.org/wiki/ISO_8601) to understand what these formats mean Please note that: a. Each date and time value has a fixed number of digits that must be padded with leading zeros. b. The range of YYYY is 0001-9999. outside these rules, the correctness of the results is not guaranteed. For illegal values, the result is NULL.
…41075) before pr : apache#40695 ## Proposed changes Fix the bug that be will have this error when compiling on mac. ``` be/src/vec/functions/math.cpp:416:39: error: no member named 'numbers' in namespace 'std' constexpr double sqrt2 = std::numbers::sqrt2; ~~~~~^ 1 error generated. ```
…_iso8601_date (#40695 and #41075) (#41600) bp #40695 and #41075 #41075 : fix #40695 mac compile ## Proposed changes Added three functions: `normal_cdf`, `to_iso8601`, `from_iso8601_date` 1. `normal_cdf(mean, sd, v) → double` : Compute the Normal cdf with given mean and standard deviation (sd): P(N < value; mean, sd). The mean and value must be real values and the standard deviation must be a real and positive value (all of type DOUBLE). 2. `to_iso8601(DATE/DATETIME) → string` For `DATE` type, it will be converted to `YYYY-MM-DD`. For `DATETIME` type, it will be converted to `YYYY-MM-DDThh:mm:ss.xxxxxx`. 3. `from_iso8601_date(string) → DATE` Convert iso8601 string to `DATE` type. The supported iso8601 string formats are as follows: a. Year : YYYY b. Calendar dates : YYYY-MM-DD , YYYYMMDD , YYYY-MM c. Week dates : YYYY-Www , YYYYWww , YYYY-Www-D , YYYYWwwD d. Ordinal dates : YYYY-DDD , YYYYDDD You can refer to this [document](https://en.wikipedia.org/wiki/ISO_8601) to understand what these formats mean Please note that: a. Each date and time value has a fixed number of digits that must be padded with leading zeros. b. The range of YYYY is 0001-9999. outside these rules, the correctness of the results is not guaranteed. For illegal values, the result is NULL.
Proposed changes
Added three functions:
normal_cdf
,to_iso8601
,from_iso8601_date
normal_cdf(mean, sd, v) → double
:Compute the Normal cdf with given mean and standard deviation (sd): P(N < value; mean, sd). The mean and value must be real values and the standard deviation must be a real and positive value (all of type DOUBLE).
to_iso8601(DATE/DATETIME) → string
For
DATE
type, it will be converted toYYYY-MM-DD
.For
DATETIME
type, it will be converted toYYYY-MM-DDThh:mm:ss.xxxxxx
.from_iso8601_date(string) → DATE
Convert iso8601 string to
DATE
type.The supported iso8601 string formats are as follows:
a. Year : YYYY
b. Calendar dates : YYYY-MM-DD , YYYYMMDD , YYYY-MM
c. Week dates : YYYY-Www , YYYYWww , YYYY-Www-D , YYYYWwwD
d. Ordinal dates : YYYY-DDD , YYYYDDD
You can refer to this document to understand what these formats mean
Please note that:
a. Each date and time value has a fixed number of digits that must be padded with leading zeros.
b. The range of YYYY is 0001-9999.
outside these rules, the correctness of the results is not guaranteed.
For illegal values, the result is NULL.