Add support for Substrait Struct literals and type #10622

Merged
5 commits merged into apache:main from avo/substrait-literal-structs on May 23, 2024

Blizzara
Contributor

@Blizzara Blizzara commented May 22, 2024

Which issue does this PR close?

Closes #.

Extracted part of #10531: not a necessary part of it, but somewhat related.

Rationale for this change

What changes are included in this PR?

Adds support for converting un-named DataFusion Struct ScalarValues into Substrait Struct Literals and back, as well as converting Substrait Struct types into DataFusion Struct types (the other direction already existed).
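To illustrate the type-level direction, here is a minimal sketch assuming heavily simplified stand-in types. `DfType`, `DfField`, `SubstraitType`, `to_substrait_type` and `from_substrait_type` are hypothetical names, not the `datafusion` or `substrait` crate APIs. The point it shows: a Substrait struct type lists only member types, so converting to DataFusion has to invent field names, while converting the other way simply drops them.

```rust
// Hypothetical, simplified stand-ins for DataFusion's `DataType`/`Field`
// and Substrait's `Type`; these are NOT the real crate APIs.
#[derive(Debug, Clone, PartialEq)]
enum DfType {
    Boolean,
    Int32,
    Struct(Vec<DfField>),
}

#[derive(Debug, Clone, PartialEq)]
struct DfField {
    name: String,
    data_type: DfType,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
enum SubstraitType {
    Bool { nullable: bool },
    I32 { nullable: bool },
    // A Substrait struct type lists member types only: no field names.
    Struct { types: Vec<SubstraitType>, nullable: bool },
}

fn is_nullable(t: &SubstraitType) -> bool {
    match t {
        SubstraitType::Bool { nullable }
        | SubstraitType::I32 { nullable }
        | SubstraitType::Struct { nullable, .. } => *nullable,
    }
}

/// DataFusion -> Substrait: field names are dropped, order and nullability kept.
fn to_substrait_type(t: &DfType, nullable: bool) -> SubstraitType {
    match t {
        DfType::Boolean => SubstraitType::Bool { nullable },
        DfType::Int32 => SubstraitType::I32 { nullable },
        DfType::Struct(fields) => SubstraitType::Struct {
            types: fields
                .iter()
                .map(|f| to_substrait_type(&f.data_type, f.nullable))
                .collect(),
            nullable,
        },
    }
}

/// Substrait -> DataFusion: struct fields get default names "c0", "c1", ...
fn from_substrait_type(t: &SubstraitType) -> DfType {
    match t {
        SubstraitType::Bool { .. } => DfType::Boolean,
        SubstraitType::I32 { .. } => DfType::Int32,
        SubstraitType::Struct { types, .. } => DfType::Struct(
            types
                .iter()
                .enumerate()
                .map(|(i, member)| DfField {
                    name: format!("c{i}"),
                    data_type: from_substrait_type(member),
                    nullable: is_nullable(member),
                })
                .collect(),
        ),
    }
}
```

With this model, a struct type with fields `a: Boolean` and `b: Int32` comes back from a round trip as `c0: Boolean` and `c1: Int32`, with nullability preserved per field.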

Substrait doesn't store names in its Struct fields; for example, the names for the initial schema are provided separately. This PR only properly supports un-named structs, which in DataFusion get default field names of the form "c0", "c1", and so on. Other structs can also be converted, but their fields will be automatically renamed to the default names.
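The renaming behaviour for literal values can be sketched as follows. This is a minimal model, assuming hypothetical `Scalar` and `Literal` enums that only loosely mirror DataFusion's `ScalarValue::Struct` and Substrait's struct literal; `to_literal` and `from_literal` are illustrative names, not the actual conversion functions, and nulls are left out to keep it short.

```rust
// Hypothetical, simplified stand-ins for DataFusion's `ScalarValue` and
// Substrait's `Literal`; NOT the real crate types. Nulls are omitted.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Boolean(bool),
    Int32(i32),
    // DataFusion-style struct value: (field name, field value) pairs.
    Struct(Vec<(String, Scalar)>),
}

#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Boolean(bool),
    I32(i32),
    // Substrait-style struct literal: positional values, no names.
    Struct(Vec<Literal>),
}

/// DataFusion -> Substrait: field names are dropped, order is kept.
fn to_literal(v: &Scalar) -> Literal {
    match v {
        Scalar::Boolean(b) => Literal::Boolean(*b),
        Scalar::Int32(i) => Literal::I32(*i),
        Scalar::Struct(fields) => {
            Literal::Struct(fields.iter().map(|(_, value)| to_literal(value)).collect())
        }
    }
}

/// Substrait -> DataFusion: fields get the default names "c0", "c1", ...
fn from_literal(l: &Literal) -> Scalar {
    match l {
        Literal::Boolean(b) => Scalar::Boolean(*b),
        Literal::I32(i) => Scalar::Int32(*i),
        Literal::Struct(values) => Scalar::Struct(
            values
                .iter()
                .enumerate()
                .map(|(i, value)| (format!("c{i}"), from_literal(value)))
                .collect(),
        ),
    }
}
```

In this model a struct with fields named `a` and `b` comes back as `c0` and `c1`, while a struct whose fields already carry the default names round-trips unchanged, which matches why un-named structs are the properly supported case.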

The VirtualTable PR will then provide better support for named structs.

Are these changes tested?

Adds round-trip unit tests

Are there any user-facing changes?

More things are now supported, but I don't think the documentation currently covers Substrait support status?

@Blizzara Blizzara marked this pull request as ready for review May 22, 2024 15:33
@Blizzara
Contributor Author

Blizzara commented May 22, 2024

@jonahgao second extracted part :)

This one overlaps a bit with the list PR, but I'm happy to fix conflicts once one of them is merged (the order shouldn't matter). Update: conflicts are now resolved.

Blizzara added 2 commits May 22, 2024 22:14
Adds support for converting from DataFusion Struct ScalarValues into Substrait Struct Literals and back.
All structs are assumed to be unnamed - ie fields are renamed
into "c0", "c1", etc
@Blizzara Blizzara force-pushed the avo/substrait-literal-structs branch from 4f0dfef to bd964f8 Compare May 22, 2024 20:18
Contributor

@alamb alamb left a comment

Thank you @Blizzara -- this looks good to me. I suggested one more test, but I think we could also do it as a follow-on PR.

Please let me know if you would prefer to modify this PR or make a follow-on.

Thanks again for the contribution!

@@ -2125,6 +2141,15 @@ mod test {
),
)))?;

let struct_field_1 = Field::new("c0", DataType::Boolean, true);
let struct_field_2 = Field::new("c1", DataType::Int32, true);
round_trip_literal(
Contributor

I suggest one more test with null values too -- specifically something like

            ScalarStructBuilder::new()
                .with_scalar(struct_field_1, ScalarValue::Boolean(None))
                .with_scalar(struct_field_2, ScalarValue::Int32(None))
                .build()?,
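Why nulls inside a struct are worth a dedicated test can be shown with a small sketch: a null field still has to remember its type, otherwise `Boolean(None)` could not come back as `Boolean(None)`. This assumes a hypothetical model where a null literal carries a type tag (`Kind`), loosely in the spirit of Substrait's typed nulls; `Scalar`, `Literal`, `Kind` and `round_trip_struct_fields` are illustrative names only, not DataFusion's `round_trip_literal` helper or the real conversion code.

```rust
// Hypothetical simplified model, NOT the DataFusion/Substrait APIs:
// nullable scalars and a typed-null literal.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Boolean(Option<bool>),
    Int32(Option<i32>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Boolean,
    I32,
}

#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Boolean(bool),
    I32(i32),
    // A null still records which type it is a null of.
    Null(Kind),
}

fn to_literal(v: &Scalar) -> Literal {
    match v {
        Scalar::Boolean(Some(b)) => Literal::Boolean(*b),
        Scalar::Boolean(None) => Literal::Null(Kind::Boolean),
        Scalar::Int32(Some(i)) => Literal::I32(*i),
        Scalar::Int32(None) => Literal::Null(Kind::I32),
    }
}

fn from_literal(l: &Literal) -> Scalar {
    match l {
        Literal::Boolean(b) => Scalar::Boolean(Some(*b)),
        Literal::I32(i) => Scalar::Int32(Some(*i)),
        // The type tag decides which typed null to rebuild.
        Literal::Null(Kind::Boolean) => Scalar::Boolean(None),
        Literal::Null(Kind::I32) => Scalar::Int32(None),
    }
}

/// Hypothetical helper: converts an un-named struct's field values to
/// positional literals and back, returning the reconstructed values.
fn round_trip_struct_fields(fields: &[Scalar]) -> Vec<Scalar> {
    let literal_fields: Vec<Literal> = fields.iter().map(to_literal).collect();
    literal_fields.iter().map(from_literal).collect()
}
```

An all-null struct such as `[Boolean(None), Int32(None)]` only survives the round trip here because each null keeps its type tag; an untyped null would lose that information.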

Contributor Author

Added (though incorporated into the existing test) in 84ccaa3!

@Blizzara
Contributor Author

Thank you for the review! I added the null case here :)

@Blizzara
Contributor Author

Not sure what's up with the checks here - cargo check runs fine for me locally and I don't see how the errors would be related to my changes?

@jonahgao
Member

CI is attempting to merge the current PR into a problematic commit d2f6faf

commit 90ed2f64c675f0a52114b8fcdfccef45fa258181 (grafted, HEAD, pull/10622/merge)
Author: Arttu <[email protected]>
Date:   Thu May 23 13:59:56 2024 +0200

    Merge 84ccaa3361f64dc4a247487b43a57367734b9548 into d2f6faf56784867beafaa2ee88df37ffc3d20720

I am not sure if pushing a new commit to the current PR will fix it.

git commit --allow-empty -m "retry ci"

@Blizzara
Contributor Author

Thanks @jonahgao, that seems to have worked. This is ready to merge from my side :)

@alamb
Contributor

alamb commented May 23, 2024

I think this was a logical conflict that @phillipleblanc fixed earlier today in #10637 🙏

@alamb alamb merged commit 19d9174 into apache:main May 23, 2024
23 checks passed
@alamb
Contributor

alamb commented May 23, 2024

Thanks again @Blizzara and @jonahgao

@Blizzara Blizzara deleted the avo/substrait-literal-structs branch May 23, 2024 17:49
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* Add support for (un-named) Substrait Struct literal

Adds support for converting from DataFusion Struct ScalarValues into Substrait Struct Literals and back.
All structs are assumed to be unnamed - ie fields are renamed
into "c0", "c1", etc

* add converting from Substrait Struct type

* cargo fmt --all

* Unit test for NULL inside Struct

* retry ci
3 participants