feat: add test file format #680

scgkiran · 2024-08-14T04:12:54Z

No description provided.

github-actions · 2024-08-14T04:36:43Z

ACTION NEEDED

Substrait follows the Conventional Commits
specification for
release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

EpsilonPrime · 2024-08-14T17:33:57Z

tests/cases/arithmetic_decimal/power.test

+power(13::decimal<38, 0>, 10::decimal<38, 0>) = 137858491849::fp64
+
+# result_more_than_input_precison: Examples demonstrating result with more precision than input
+power(16::decimal<2, 0>, 4::decimal<38, 0>) = 65536::fp64


Perhaps we could reuse the text format's definition of literals?

https://github.com/substrait-io/substrait-cpp/blob/main/src/substrait/textplan/README.md

My preference would be to stick with :: format generally. It is a cast format consistent with many systems the people looking at these tests will be familiar with and I believe much more intuitive understood than underscore.

I'm similarly inclined to try to find more standard ways to represent other things as opposed to what the text format has done. There are several examples of alternative ways to implement things to what is commonly done in SQL (which has the most consistent way to approach these things across systems). Examples beyond the use of cast syntax versus underscores include. (1) String literals are double quoted and backticked while sql string literals are single quoted and (2) brackets being overloaded to mean any of intervals, lists, maps and structs.

The :: format is a type subclass operator. As such it requires a mental context switch every time I look at it. The underscore was chosen in the text format to create a visual separation between the literal value and its type thus aiding in readability (and it's also requires less keystrokes).

Braces are used for literal values and angle brackets are used for type composition and parameterization. This helps to reinforce the distinction between value and type. I have yet to be able to determine what the value of any interval literal is in the proposed format. However the text format's use of labels makes it ultra-clear what each part of the interval is.

The text format's string implementation does draw from other sources than SQL but that's mostly because the SQL version has its own warts which make it harder to read. SQL's implicit concatenation is the main SQL string feature that is difficult for humans to parse (it requires the counting of active single quotes).

The text format was reviewed in the community meeting twice (February 1st, 2023 and April 12th, 2023) so presumably there is some community agreement about its design validity.

It's not that the text format defined in substrait-cpp isn't valid. It's definitely valid.

I think we come from different perspectives. I'm coming from the perspective that we should try to leverage existing patterns as much as possible. My expectation is that the vast majority of systems that will be tested are SQL systems so leveraging SQL patterns is desirable as much as reasonably possible and comes with the added benefit of being obvious in meaning to a large group of people reviewing the test case. If we look at the set of BFT dialects, for example, we have the following:

Dialect API Single Quotes for Literals Supports :: for Cast

cudf cudf api no no

datafusion SQL yes no

duckdb SQL yes yes

postgres SQL yes yes

snowflake SQL yes yes

sqllite SQL yes no

velox velox api no no

So a decent portion are SQL based. Single-quoted literals are the most common way to represent strings in those systems. 3/7 have support for the double colon as meaning cast. Given that, I prefer that over something invented.

EpsilonPrime · 2024-08-14T17:47:18Z

I'm guessing we're going to need to make sure that we have something that tests the validity of the test files. Would a parse test be sufficient enough for that purpose? Alternatively we could step it up and validate the tests against the function definitions ensuring that they refer to existing functions.

tests/cases/arithmetic_decimal/power.test

tests/cases/datetime/lt_datetime.test

vbarua · 2024-08-14T19:45:48Z

tests/README.md

+- **timestamp**: `YYYY-MM-DD HH:MM:SS[.fraction]`, example: `2021-01-01 12:00:00`
+- **timestamp_tz**: `YYYY-MM-DD HH:MM:SS[.fraction]±HH:MM`, example: `2021-01-01 12:00:00+05:30`
+- **interval year**: `INTERVAL 'P[n]Y[n]M'`, example: `INTERVAL 'P2Y3M'` (2 years, 3 months)
+- **interval days**: `INTERVAL 'P[n]DT[n]H[n]M[n]S'`, example: `INTERVAL 'P2DT3H2M9S'` (2 days, 3 hours, 2 minutes, 9 seconds)


This a somewhat recent change, but also need to encode precision for interval_day now.

See https://substrait.io/types/type_classes/#compound-types

EpsilonPrime · 2024-08-15T05:38:20Z

It's a different use case. You're interested in testing whether a subset is compliant or not. BFT, until now, has been focused on showing all of the support (positive and negative) to provide a world view (the website). It can also be used as a way of showing total compliance which can be used to drive adoption.

…

On Wed, Aug 14, 2024, 21:56 Jacques Nadeau ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In tests/cases/arithmetic_decimal/power.test <#680 (comment)> : > @@ -0,0 +1,21 @@ +### SUBSTRAIT_SCALAR_TEST: 1.0 +### SUBSTRAIT_INCLUDE: extensions/functions_arithmetic_decimal.yaml + +# basic: Basic examples without any special cases +power(8::decimal, 2::decimal<38, 0>) = 64::fp64 I don't understand your comment Rich. The dialect should declare what option values are allowed for which functions. The test should skip test cases whose combination is not supported by a dialect. Why would we need to refer to test cases? — Reply to this email directly, view it on GitHub <#680 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABDDCF6AITUJN3CECANWI3ZRQYG3AVCNFSM6AAAAABMPQFXWCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDEMZZGY3DENJQGI> . You are receiving this because you commented.Message ID: ***@***.***>

tests/README.md

westonpace · 2024-08-15T19:35:33Z

tests/README.md

+- **arguments**: Comma-separated list of arguments to the function. The arguments must be literals.
+- **options**: Optional comma-separated list of options in `key:value` format. The options describe the behavior of the function. The test should be run only on dialects that support the options. If options are not specified, the test should be run for all permutations of the options.
+- **result**: The expected result of the function. Either `SUBSTRAIT_ERROR` or a literal value.
+- **literal**: In the format `<name>::<datatype>`


name? Maybe value?

changed to value

tests/README.md

westonpace · 2024-08-15T19:57:41Z

tests/README.md

+add(126::i8, 1::i8) = 127::i8
+
+# Arithmetic Overflow Tests
+add(127::i8, 1::i8) [overflow:ERROR] = <!ERROR>  #check overflow


Where is <!ERROR> defined? I see something about SUBSTRAIT_ERROR in the grammar but nothing about a ! or <> (and SUBSTRAIT_ERROR != ERROR)

I added it now.

tests/cases/arithmetic/add.test

westonpace · 2024-08-15T20:00:32Z

tests/cases/arithmetic/add.test

+# rounding: Examples demonstrating floating point rounding behavior
+add(4.5::fp32, 2.500001::fp32) [rounding:TIE_TO_EVEN] = 7.000001::fp32


This doesn't really demonstrate rounding error. If we're sticking with the 6/15 digits of precision thing then it is impossible to demonstrate rounding error. Should we just remove these tests?

jacques-n · 2024-08-17T21:53:22Z

It's a different use case...

I believe the original request was some way to refer to test cases individually. My main thought is that the case itself should probably be the "pointer" (or a hash of the case) as opposed to some kind of hand-defined naming system. From a substrait core perspective, I think there should be no nuance to describe "I support funcA with optionA and optionB but sometimes I return different results than what is defined in the test cases for funcA". To me, that's not funcA--it's an entirely different function. That system should say "I support a funcAprime which might be similar to funcA but has a different set of test cases".

My inclination is we should avoid trying to create some system that says "these portion of test cases for funcA also apply to funcAprime" Feels like an over-engineer.

EpsilonPrime · 2024-08-19T06:24:51Z

The usage is more like "this system doesn't support this function" sufficiently. A compatibility matrix lists all the systems that support a functionality and which ones that don't. While we could say that only these tests pass, the set of passes and failures provide a more complete story of compatibility.

westonpace

Thanks for tackling the changes.

I support funcA with optionA and optionB but sometimes I return different results than what is defined in the test cases for funcA

@jacques-n

There are some functions (maybe very few) where avoiding this will be challenging. I'm not saying you're wrong, I really like the idea of "if engine supports function X then behavior is exactly defined" (and I don't think we need to figure it out in this PR). It's more food for thought. My primary example is the divide function:

LHS	RHS	Postgres	DuckDb	Datafusion
0	0	ERR	NAN	NAN
0	NAN	NULL	NAN	NAN
0	INF	0	0	0
NAN	0	NULL	NULL	NAN
NAN	NAN	NULL	NAN	NAN
NAN	INF	NULL	NAN	NAN
INF	0	ERR	NULL	INF
INF	NAN	NULL	NAN	NAN
INF	INF	NULL	NAN	NAN

The division options we have today (on_domain_error and on_division_by_zero) are insufficient to describe any one of these three (fairly significant) engines. There is no engine that matches any combination of these values. In fact, we can't even introduce more values to the options because engines are inconsistent (e.g. postgres' divide by zero behavior changes if the numerator is NAN).

Some options:

Just state that there is no engine that provides the divide function and hope we can drive compliance.
Come up with lot's of options in the YAMLs (we almost need a divide_by_zero_but_numerator_is_inf option or at the extreme, one option per combination of inputs)
State that the engines do implement the divide function "except at the boundaries"
Add an undefined to the two options we have today and just accept that all engines will use this value

jacques-n · 2024-08-20T00:08:58Z

There are some functions (maybe very few) where avoiding this will be challenging. [snip]... example is the divide function...

These are just different functions from my pov. We shouldn't be fearful of clearing documenting different divide functions (through options and/or variants). In fact, I see this as one of the key values of Substrait for people--saying there are 3 distinct divide functions in each of these systems and how each differs. (FWIW, in this particular example I see both using variant and/or options as reasonable.)

My point in general here is that whether a particular engine implements a particular function is a property of the dialect and not the function. Functions should be well defined outside of a specific instance of use.

Now, many people may be fine to do lossy translation between these divide functions and we should make that easy with substrait tooling. But I don't think we should confound that with the functions themselves. I'm generally of the perspective that options and different functions should solve these problems without having to introduce additional primitives.

For example, we may need to introduce the following kind of psuedo-mappings in substrait:

mappings:
  from: postgres.yaml:div(float,float)
  to:  duckdb.yaml:div(float,float) 
  differences:
    - divide by NAN => NAN in duckdb, NULL in postgres
    - ...
    - ...

Then someone using the two systems could get a list of lossy translations and evaluate whether that list is compatible with their use case. (Maybe okay for general analytics, not for financial analytics).

In many cases, I would suspect that we could also define expression trees that provide lossless conversions in these cases. Odds are those scenarios might be more compute expensive.

EpsilonPrime · 2024-08-20T08:46:04Z

tests/README.md

+result      := <substrait_error> | <literal>
+options     := <optLiteral>, <optLiteral>, ... <optLiteral>
+optLiteral  := <option_name>:<option_value>
+lietral_value := string | integer | decimal | float | boolean | date | time | timestamp | timestamp_tz | interval year | interval days | null


EpsilonPrime · 2024-08-20T08:49:55Z

tests/README.md

+    add(126::i8, 1::i8) = 127::i8  # addition of two numbers
+    ```
+
+### Spec


If we define the specification here and in the Antlr grammar file will we end up changing both places all the time?

Yes, hopefully only when new types are introduced.

EpsilonPrime · 2024-08-20T08:51:31Z

tests/README.md

+argument    := <literal>
+literal     := <literal_value>::<datatype>
+result      := <substrait_error> | <literal>
+options     := <optLiteral>, <optLiteral>, ... <optLiteral>


Options are values passed to the functions but not Substrait literals. Probably best to avoid the confusion here.

EpsilonPrime · 2024-08-20T08:52:21Z

tests/README.md

+`<literal_value>` described in this section.
+
+#### String
+- **string**, **fixedchar**, **varchar**: A sequence of characters enclosed in single quotes. Example: 'Hello, world!'


How are escapes handled?

EpsilonPrime · 2024-08-20T08:55:56Z

tests/README.md

+- **time**: `HH:MM:SS[.fraction]`, example: `12:00:00.000`
+- **timestamp**: `YYYY-MM-DD HH:MM:SS[.fraction]`, example: `2021-01-01 12:00:00`
+- **timestamp_tz**: `YYYY-MM-DD HH:MM:SS[.fraction]±HH:MM`, example: `2021-01-01 12:00:00+05:30`
+- **interval year**: `'P[n]Y[n]M'`, example: `'P2Y3M'` (2 years, 3 months)


What does P stand for?

EpsilonPrime · 2024-08-20T09:00:17Z

tests/README.md

+- **i64**: 64-bit signed integer
+- **f32**: 32-bit floating point number
+- **f64**: 64-bit floating point number
+- **dec**: Fixed-point `decimal<P,S>`


Do short forms still need to be parameterized? Can we refer to the list elsewhere? Can long names be interchangeably used?

yes parametrization is needed. I prefer to not to use long names in the tests.

EpsilonPrime · 2024-08-20T09:02:43Z

tests/README.md

+add(126::i8, 1::i8) = 127::i8
+
+# Arithmetic Overflow Tests
+add(127::i8, 1::i8) [overflow:ERROR] = ERROR  #check overflow


Can comments sometimes be recognized as group names? Do we care?

yes. Empty groups can be ignored by the testing framework.

EpsilonPrime · 2024-08-20T09:03:31Z

tests/cases/arithmetic/add.test

+### SUBSTRAIT_SCALAR_TEST: 1.0
+### SUBSTRAIT_INCLUDE: /extensions/functions_arithmetic.yaml
+
+# basic: Basic examples without any special cases


Is the name the whole line or only up to the semicolon?

EpsilonPrime · 2024-08-20T09:05:04Z

tests/cases/arithmetic_decimal/power.test

+power(16::decimal<2, 0>, 4::decimal<38, 0>) = 65536::fp64
+
+# floating_exception: Examples demonstrating exceptional floating point cases
+power(1.5e+10::decimal<38, 0>, 1.5e+20::decimal<38, 0>) = inf::fp64


Should this be the short form names as noted above or are both legal?

yes it should be short form. Fixed it.

EpsilonPrime · 2024-08-20T09:05:57Z

tests/cases/datetime/lt_datetime.test

+
+# interval: examples using the interval type
+lt('P7D'::iday, 'P6D'::iday) = false::bool
+lt('P5D'::iday, '6D'::iday) = true::bool


Is the P here optional?

no, it was a typo. Fixed it.

scgkiran · 2024-12-09T13:03:31Z

Closing this PR, as changes are merged using separate PRs for test file format grammar and testcases

scgkiran added 2 commits August 14, 2024 09:41

Proposal for test file format

39de59e

rename test files

d4d69e1

scgkiran changed the title ~~feat: add test file format~~ WIP: feat: add test file format Aug 14, 2024

scgkiran changed the title ~~WIP: feat: add test file format~~ feat: add test file format Aug 14, 2024

scgkiran mentioned this pull request Aug 14, 2024

Proposal to define a test file format #681

Open

EpsilonPrime reviewed Aug 14, 2024

View reviewed changes

tests/cases/arithmetic_decimal/power.test Outdated Show resolved Hide resolved

vbarua reviewed Aug 14, 2024

View reviewed changes

use shortType name

8676a4c

westonpace reviewed Aug 15, 2024

View reviewed changes

Address review comments.

759a755

westonpace previously approved these changes Aug 19, 2024

View reviewed changes

EpsilonPrime reviewed Aug 20, 2024

View reviewed changes

Address review comments

3a541c0

scgkiran dismissed westonpace’s stale review via 3a541c0 August 26, 2024 06:09

scgkiran marked this pull request as ready for review August 26, 2024 06:11

scgkiran requested a review from cpcloud as a code owner August 26, 2024 06:11

scgkiran closed this Dec 9, 2024

Dialect	API	Single Quotes for Literals	Supports :: for Cast
cudf	cudf api	no	no
datafusion	SQL	yes	no
duckdb	SQL	yes	yes
postgres	SQL	yes	yes
snowflake	SQL	yes	yes
sqllite	SQL	yes	no
velox	velox api	no	no

		# rounding: Examples demonstrating floating point rounding behavior
		add(4.5::fp32, 2.500001::fp32) [rounding:TIE_TO_EVEN] = 7.000001::fp32

feat: add test file format #680

feat: add test file format #680

Conversation

scgkiran commented Aug 14, 2024

github-actions bot commented Aug 14, 2024

EpsilonPrime Aug 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

EpsilonPrime commented Aug 14, 2024

Choose a reason for hiding this comment

EpsilonPrime commented Aug 15, 2024 via email

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jacques-n commented Aug 17, 2024

EpsilonPrime commented Aug 19, 2024

westonpace left a comment

Choose a reason for hiding this comment

jacques-n commented Aug 20, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scgkiran Aug 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scgkiran Aug 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scgkiran commented Dec 9, 2024

EpsilonPrime Aug 14, 2024 •

edited

Loading

scgkiran Aug 26, 2024 •

edited

Loading

scgkiran Aug 26, 2024 •

edited

Loading