Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support window functions #224

Merged
merged 2 commits into from
Jul 25, 2022

Conversation

weijietong
Copy link
Contributor

No description provided.

@jacques-n
Copy link
Contributor

It looks like this is failing yaml validation. Have you reviewed the yaml schema? Also, it looks like there are several properties that are part of window function in the yaml schema that should probably be marked required and are not currently. For example, I think we should make window_type required.

@weijietong
Copy link
Contributor Author

The specification says the window_type property is optional.

@weijietong
Copy link
Contributor Author

The yaml schema also need to be changed to support the windowFunction. Now everything is ok.

@jacques-n
Copy link
Contributor

I see here that you've defined intermediate types but also state that these are not decomposable. Why have you provided intermediate types?

Additionally, you are stating that nullability is declared output but your output is all non-nullable. So none of these functions can ever return a null value? (Just confirming here, I don't recall the nullability behavior of these window functions.)

I also see you're including an empty args array. Is this for the yaml schema or just for clarity. (again, just curious).

@weijietong
Copy link
Contributor Author

Sorry, I don't know how to express non-nullable. If I don't have the nullability deaclared there, then it means non-nullable or the nullability's default value does?

The same question is to the intermediate, thorough declared these functions as non-decomposable, the intermediate property seems required from the web specification. I will remove them.

The empty args array is for clarity.

@weijietong weijietong force-pushed the window_functions branch 2 times, most recently from 56ac2e0 to 8db9261 Compare June 22, 2022 01:24
@weijietong
Copy link
Contributor Author

I removed the intermediate property from the yaml schema's required property.

@jacques-n
Copy link
Contributor

I removed the intermediate property from the yaml schema's required property.

Is there way to express a more complex relationship here? E.g. require intermediate if things are decomposable?

@jacques-n
Copy link
Contributor

For nullability behavior: declared output means whatever you say the output type is, is exactly the output type. If you say output is boolean? then it is nullable. If you say it is boolean then it is non-nullable.

If you use "MIRROR" then if at least one of the input arguments is nullable, the output is nullable. You can see the specifics here.

@weijietong
Copy link
Contributor Author

I don't get the point. If one function is decomposable, then it can declare the intermediate property. However, now, to simple_extensions_schema.yaml's windowFunction definition, one function must declare the intermediate type, even those functions are not decomposable.

@weijietong
Copy link
Contributor Author

Thanks for the nullability explanation.

@jacques-n
Copy link
Contributor

I don't get the point. If one function is decomposable, then it can declare the intermediate property. However, now, to simple_extensions_schema.yaml's windowFunction definition, one function must declare the intermediate type, even those functions are not decomposable.

I'm asking if it is possible to express this relationship in the json/yaml schema. "If function is decomposable, make sure intermediate is declared. If function is not decomposable, ensure there is no intermediate declared."

@weijietong weijietong force-pushed the window_functions branch 3 times, most recently from e1e1779 to 81a74ae Compare June 29, 2022 14:29
@weijietong
Copy link
Contributor Author

seems the CI flow went wrong,

I tried to add a validation rule that if one declare a decomposable with value ONE or MANY, it must declare intermedidate. This rule works at the vscode redhat yaml plugin. I don't know how to express that once declared decomposable property NONE,should not declare a intermediate property.

@jacques-n
Copy link
Contributor

seems the CI flow went wrong,

I tried to add a validation rule that if one declare a decomposable with value ONE or MANY, it must declare intermedidate. This rule works at the vscode redhat yaml plugin. I don't know how to express that once declared decomposable property NONE,should not declare a intermediate property.

Can you change this to use a different yaml validator if the current one is incomplete?

@jvanstraten
Copy link
Contributor

This rule works at the vscode redhat yaml plugin.

Does it really? That plugin also seems based on jsonschema, and jsonschema has no concept of "if" and "then" that I'm aware of. What I'm pretty sure you did there instead is define that two new properties of the impls items are now allowed, namely if (which must be an object that may have a key named composable, which in turn must map to ONE or MANY), and then (with an illegal schema definition, that the strict CI check is complaining about because it's set to complain about keys it doesn't know about, while the default behavior of jsonschema is to just ignore things it doesn't understand).

AFAICT from the spec, jsonschema can't do this directly. The best it can do is dependentRequired, but that triggers based on the existence of another key in the object, not a value. What I think you could hypothetically do is abuse a oneOf relation for the object, by basically duplicating both branches aside from the allowed variants for the composable enum and the existence of the intermediate field, but what a mess that would make.

Can you change this to use a different yaml validator if the current one is incomplete?

Just so you know, I went through quite a bit of trouble to get jsonschema into the validator and coerce it to deal with YAML files instead, so I have some reservations about this...

Honestly, I don't think it's reasonable to expect this level of semantic validation from a generic schema validator. This kind of constraint would be like trying to get a Substrait table schema to assert that if the value for a record in one column is X, then, and only then, column Y is nullable (or exists at all). intermediate should just be optional as far as the schema is concerned, just as Y would just be declared nullable.

@jacques-n
Copy link
Contributor

AFAICT from the spec, jsonschema can't do this directly. The best it can do is dependentRequired

I wasn't sure it could. If it can't and/or isn't spec'd in json schem nor the yaml schema extension. Let's skip it and just loosen the schema required properties as you originally did.

@weijietong
Copy link
Contributor Author

weijietong commented Jun 30, 2022

@jvanstraten the json schema has a definition of if , else, then.

This rule works at the vscode redhat yaml plugin.

Yeah, I tried following schema format :

if: type: object properties: decomposable: enum: [ONE, MANY] then: dependentRequired: decomposable: [intermediate]

or

if: type: object properties: decomposable: enum: [ONE, MANY] then: required: [intermediate]

If you don't declare an intermedidate property, while declaring decomposable property with value of ONE or MANY, then the vscode will aler you that you missed a intermediate property.

But I find it only work at the aggregate function , no effect at the window function. That's weird.

If we agree to skip these rules, I'll removed them.

@jacques-n
Copy link
Contributor

@jvanstraten the json schema has a definition of if , else, then.

This rule works at the vscode redhat yaml plugin.

Yeah, I tried following schema format :

if: type: object properties: decomposable: enum: [ONE, MANY] then: dependentRequired: decomposable: [intermediate]

or

if: type: object properties: decomposable: enum: [ONE, MANY] then: required: [intermediate]

If you don't declare an intermedidate property, while declaring decomposable property with value of ONE or MANY, then the vscode will aler you that you missed a intermediate property.

But I find it only work at the aggregate function , no effect at the window function. That's weird.

If we agree to skip these rules, I'll removed them.

Yes, let's just remove these.

Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending fixes to schema.

Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you confirm that all of these functions return only non-null values? You currently saying they are declared output with a non null value.

@weijietong
Copy link
Contributor Author

weijietong commented Jul 4, 2022

Thanks for the tips, you are right, they are all nullable. To a non-existense partition by row, or a filtered ( the filter where clause) row, its window function value will be null.

@weijietong weijietong requested a review from jacques-n July 18, 2022 02:08
@weijietong
Copy link
Contributor Author

@jacques-n I think it'ready to merge now.

@jacques-n jacques-n merged commit 4b2072a into substrait-io:main Jul 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants