Release Datafusion 6.0.0 #890

Closed
houqp opened this issue Aug 15, 2021 · 31 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@houqp
Member

houqp commented Aug 15, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

We had some oversights in the 5.0.0 release (#771) that prevented us from releasing the Python binding and datafusion-cli.

Describe the solution you'd like

Release Datafusion 5.1.0 with an improved process that supports releasing the Python binding and the CLI.

Describe alternatives you've considered

Additional context
See #887, #883, and #837.

@houqp houqp added the enhancement New feature or request label Aug 15, 2021
@houqp houqp self-assigned this Aug 15, 2021
@houqp houqp added this to the 5.1.0 milestone Aug 20, 2021
@mmuru
Contributor

mmuru commented Sep 3, 2021

@houqp: Do we have an ETA for the Python binding release? Thanks.

@houqp
Member Author

houqp commented Sep 3, 2021

@mmuru probably 2-3 weeks. We need to get some legal issues resolved in #920 before we can cut a release tarball and start the voting process.

@mmuru
Contributor

mmuru commented Sep 3, 2021

@houqp: Thanks for the update. I am trying to build the Python wheel locally, but I noticed that the datafusion dependency in Cargo.toml is still pinned to an older git revision:
`datafusion = { git = "https://github.com/apache/arrow-datafusion.git", rev = "4d61196dee8526998aee7e7bb10ea88422e5f9e1" }`
It has not been updated. Will `maturin develop` pick up the latest datafusion code? Could you clarify? Thanks again.

@houqp
Member Author

houqp commented Sep 4, 2021

@mmuru it will not. I just sent #967 to update the datafusion dependency.

@mmuru
Contributor

mmuru commented Sep 5, 2021

@houqp: Thanks for the clarification and the quick fix. I verified your changes and have two notes:

  1. requirements.txt is locked to Python 3.8. I think this file should not be checked into the source tree, since developers may use a different Python version. In my case, it was Python 3.7.
  2. I fixed #949 (Creating dataframe with Recordbatch using pyarrow.Table.to_batches gives "type16 not valid error" when schema includes date32[day] type) by adding support for the pyarrow data types date32, date64, and timestamp.
    I would like to submit my changes as part of #967 (update datafusion to 5.1.0 for python binding). Please let me know.

@houqp
Member Author

houqp commented Sep 5, 2021

> requirements.txt is locked to Python 3.8.

requirements.txt is supposed to work for all Python versions supported by the Python binding. If it's broken for Python 3.7, could you file a separate issue with the error message? We can continue troubleshooting there.

> I would like to submit my changes as part of #967 (update datafusion to 5.1.0 for python binding). Please let me know.

That's great. I recommend sending a separate PR based on the branch in #967, or collaborating on #969 to get that issue addressed.

@mmuru
Contributor

mmuru commented Sep 6, 2021

@houqp: Sure, I created #975. Ping me if you need more information.

@houqp
Member Author

houqp commented Sep 30, 2021

@andygrove @alamb @Dandandan @jorgecarleitao @nevi-me Given that many major breaking changes have been merged since the 5.x release, I think it may be better to skip 5.1 and go straight to 6.0 after #1010 is merged. What do you think?
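The reasoning follows semantic versioning, assuming the project applies it to its crate versions: breaking changes call for a major bump, so a planned 5.1.0 becomes 6.0.0. A minimal sketch of that rule, using a hypothetical `next_version` helper that is not part of DataFusion's release tooling:

```python
# Hypothetical helper (not part of DataFusion's release tooling) sketching
# the semantic-versioning rule behind "skip 5.1 and go to 6.0".
# Assumes plain MAJOR.MINOR.PATCH strings with MAJOR >= 1.


def next_version(current: str, breaking: bool, new_features: bool) -> str:
    """Return the next MAJOR.MINOR.PATCH version under SemVer bump rules."""
    major, minor, patch = (int(part) for part in current.split("."))
    if breaking:
        # Any breaking API change forces a major bump and resets minor/patch.
        return f"{major + 1}.0.0"
    if new_features:
        # Backwards-compatible additions bump the minor version.
        return f"{major}.{minor + 1}.0"
    # Only bug fixes: bump the patch version.
    return f"{major}.{minor}.{patch + 1}"


# 5.0.0 plus breaking changes: the planned 5.1.0 becomes 6.0.0.
print(next_version("5.0.0", breaking=True, new_features=True))  # prints 6.0.0
```

Pre-1.0 versions are deliberately left out of this sketch, since Cargo treats a `0.x` minor bump as breaking.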

@alamb
Contributor

alamb commented Sep 30, 2021

> merged since the 5.x release

I think using 6.0 is a good idea. I also don't think we need to wait for #1010 to be merged before releasing, if we need to get the Python binding and CLI out sooner.

@houqp
Member Author

houqp commented Oct 2, 2021

Sounds good. I think I can try to help push #873 to the finish line after you have Arrow 6 released.

@alamb
Contributor

alamb commented Oct 2, 2021

> Sounds good. I think I can try to help push #873 to the finish line after you have Arrow 6 released.

It sounds like we are aiming to release Arrow 6.0 in about two weeks.

@houqp houqp changed the title Release Datafusion 5.1.0 Release Datafusion 6.0.0 Oct 9, 2021
@tupshin

tupshin commented Oct 19, 2021

Arrow 6 has been released. Is there an ETA for this one? I'm particularly looking forward to an up-to-date Python API.

@houqp
Member Author

houqp commented Oct 20, 2021

@tupshin I pinged #873 again. Once that's merged, we can kick off the release process, which usually takes 3-5 days.

@jimexist
Member

Related: Homebrew/homebrew-core#88184

@tupshin

tupshin commented Nov 5, 2021

Not to nag, but I see #873 is merged. How are we doing?

@houqp
Member Author

houqp commented Nov 6, 2021

I am working on the changelog and the release PR; they should be out this weekend.

@alamb
Contributor

alamb commented Nov 6, 2021

FYI see #1253

@houqp
Member Author

houqp commented Nov 14, 2021

The rc0 tag has been pushed. I'm now working on automation to package and sign the Python wheels. Once that's done, I will send out the request-for-vote email.
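Apache release artifacts are normally accompanied by a detached signature and a checksum file. The actual packaging automation is not shown in this thread, so the following is only a minimal sketch of the checksum half, using a hypothetical `write_sha512_files` helper and assuming the wheels and sdists sit together in one directory. GPG signing is left out.

```python
import hashlib
from pathlib import Path
from typing import List


def write_sha512_files(artifact_dir: str) -> List[Path]:
    """Write `<artifact>.sha512` next to every wheel and sdist in artifact_dir.

    Hypothetical helper: GPG signing (`.asc` files) is not shown here.
    """
    root = Path(artifact_dir)
    artifacts = sorted(root.glob("*.whl")) + sorted(root.glob("*.tar.gz"))
    written = []
    for artifact in artifacts:
        digest = hashlib.sha512()
        with artifact.open("rb") as f:
            # Hash in 1 MiB chunks so large wheels are not read into memory at once.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        checksum_file = artifact.with_name(artifact.name + ".sha512")
        # "<hex digest>  <file name>", the layout printed by `shasum -a 512`.
        checksum_file.write_text(f"{digest.hexdigest()}  {artifact.name}\n")
        written.append(checksum_file)
    return written
```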

@houqp
Member Author

houqp commented Nov 17, 2021

The vote passed, and I have pushed the release tags to GitHub. The release steps require PMC member access. @alamb @andygrove @jorgecarleitao @kszucs, could one of you follow the steps in https://github.com/apache/arrow-datafusion/tree/master/dev/release#finalize-the-release to complete the release?

The remaining steps are:

  • run `./dev/release/release-tarball.sh 6.0.0 0`
  • publish to crates.io
  • publish to PyPI

@houqp
Member Author

houqp commented Nov 17, 2021

@jimexist we should be able to update datafusion-cli in Homebrew as well.

@jimexist
Member

Homebrew/homebrew-core#89562

@alamb
Contributor

alamb commented Nov 17, 2021

> The vote passed, and I have pushed the release tags to GitHub. The release steps require PMC member access. @alamb @andygrove @jorgecarleitao @kszucs, could one of you follow the steps in https://github.com/apache/arrow-datafusion/tree/master/dev/release#finalize-the-release to complete the release?

@houqp I will do so now. Thank you for all the work on this.

@houqp
Member Author

houqp commented Nov 17, 2021

Thanks @alamb! I just noticed I forgot to add `(cd datafusion-cli && cargo publish)` to the release doc. Could you help run that command as well? I will send a PR to update the doc later today.

@andygrove @jorgecarleitao @kou @kszucs @xhochy we will need your help to publish the Python binding to PyPI, since only you are listed as maintainers of the PyPI package. The steps are documented at https://github.com/apache/arrow-datafusion/tree/master/dev/release#publish-on-pypi

@xhochy
Member

xhochy commented Nov 17, 2021

I'd rather give more people access to PyPI ;)

@kou
Member

kou commented Nov 17, 2021

I'm trying.

I found a typo in the document:

diff --git a/dev/release/README.md b/dev/release/README.md
index 2127dc23..73b3eb1a 100644
--- a/dev/release/README.md
+++ b/dev/release/README.md
@@ -304,7 +304,7 @@ PyPI, in order to conform to Apache Software Foundation governance standards.
 First, download all official python release artifacts:
 
 ```shell
-svn co https://dist.apache.org/repos/dist/release/arrow/apache-arrow-datafusion-5.1.0-rc0/python ./python-artifacts
+svn co https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-5.1.0/python ./python-artifacts
 ```
 
 Use [twine](https://pypi.org/project/twine/) to perform the upload.

https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-6.0.0/python/ uses version 0.4.0, not 6.0.0. Is that OK?

@houqp
Member Author

houqp commented Nov 17, 2021

> I'd rather give more people access to PyPI ;)

+1 :D

Good catch, @kou! I will include that fix in my docs PR. The version difference is expected: we want the two versions decoupled so that we can release a major version change in the Python binding without forcing a major version bump in datafusion.

@kou
Member

kou commented Nov 17, 2021

OK. I've published them: https://pypi.org/project/datafusion/0.4.0/

I found one more typo:

diff --git a/dev/release/README.md b/dev/release/README.md
index 2127dc23..fcf090e3 100644
--- a/dev/release/README.md
+++ b/dev/release/README.md
@@ -310,7 +310,7 @@ svn co https://dist.apache.org/repos/dist/release/arrow/apache-arrow-datafusion-
 Use [twine](https://pypi.org/project/twine/) to perform the upload.
 
 ```shell
-twine upload ./python-artifactl/*.{tar.gz,whl}
+twine upload ./python-artifacts/*.{tar.gz,whl}
 ```
 
 ### Call the vote
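The `python-artifactl` typo is easy to miss, because a misspelled directory simply makes the upload glob match nothing. A minimal pre-upload check could look like the sketch below: `collect_upload_candidates` is a hypothetical helper, not part of the release scripts, that gathers the same `*.tar.gz` and `*.whl` files the `twine upload` glob targets and fails loudly if the directory is missing or empty.

```python
from pathlib import Path
from typing import List


def collect_upload_candidates(artifact_dir: str) -> List[Path]:
    """Return the sdists and wheels that `twine upload` should receive.

    Hypothetical helper: raises instead of silently returning nothing.
    """
    root = Path(artifact_dir)
    if not root.is_dir():
        # A typo such as ./python-artifactl ends up here.
        raise FileNotFoundError(f"artifact directory not found: {root}")
    # Mirror the shell glob ./<dir>/*.{tar.gz,whl}
    files = sorted(root.glob("*.tar.gz")) + sorted(root.glob("*.whl"))
    if not files:
        raise ValueError(f"no .tar.gz or .whl files in {root}")
    return files
```

The returned paths could then be passed to twine, for example with `subprocess.run(["twine", "upload", *map(str, files)], check=True)`.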

@houqp
Member Author

houqp commented Nov 17, 2021

Thank you, @kou! I will include that fix in my docs PR as well :)

@houqp houqp closed this as completed Nov 17, 2021
@alamb
Contributor

alamb commented Nov 18, 2021

> Thanks @alamb! I just noticed I forgot to add `(cd datafusion-cli && cargo publish)` to the release doc. Could you help run that command as well? I will send a PR to update the doc later today.

Hi @houqp

I tried to publish datafusion-cli and got the following error. It looks like datafusion-cli depends on ballista somehow:

(arrow_dev) alamb@MacBook-Pro:~/Downloads/apache-arrow-datafusion-6.0.0/datafusion-cli$ cargo publish
    Updating crates.io index
warning: manifest has no description.
See https://doc.rust-lang.org/cargo/reference/manifest.html#package-metadata for more info.
   Packaging datafusion-cli v5.1.0-SNAPSHOT (/Users/alamb/Downloads/apache-arrow-datafusion-6.0.0/datafusion-cli)
error: failed to prepare local package for uploading

Caused by:
  failed to select a version for the requirement `ballista = "^0.6.0"`
  candidate versions found which didn't match: 0.5.0, 0.3.0, 0.2.5, ...
  location searched: crates.io index
  required by package `datafusion-cli v5.1.0-SNAPSHOT (/Users/alamb/Downloads/apache-arrow-datafusion-6.0.0/datafusion-cli)`

@houqp
Member Author

houqp commented Nov 18, 2021

Oh yeah, datafusion-cli supports ballista as a way to perform remote query execution. @alamb, it looks like we haven't published the ballista crates yet. Could you do that first by following https://github.com/apache/arrow-datafusion/tree/master/dev/release#publish-on-cratesio?
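The failure is an ordering problem: crates.io resolves `ballista = "^0.6.0"` against already-published versions, so each workspace crate has to be published after the crates it depends on. A minimal sketch using Python's `graphlib` (Python 3.9+). The dependency graph below is a simplified, assumed picture of the workspace for illustration, not read from the actual manifests.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# ASSUMED, simplified dependency graph for illustration only:
# crate -> crates it depends on (must be published first).
ASSUMED_DEPS: Dict[str, Set[str]] = {
    "datafusion": set(),
    "ballista-core": {"datafusion"},
    "ballista-executor": {"ballista-core", "datafusion"},
    "ballista-scheduler": {"ballista-core", "datafusion"},
    "ballista": {"ballista-core", "ballista-executor", "ballista-scheduler", "datafusion"},
    "datafusion-cli": {"datafusion", "ballista"},
}


def publish_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every crate follows all of its dependencies."""
    # TopologicalSorter takes node -> predecessors; static_order() raises
    # graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())


for crate in publish_order(ASSUMED_DEPS):
    print(f"(cd {crate} && cargo publish)")
```

A topological sort guarantees only that dependencies come first; crates with no path between them (here `ballista-executor` and `ballista-scheduler`) may appear in either order.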

@alamb
Contributor

alamb commented Nov 18, 2021

> Could you do that first by following https://github.com/apache/arrow-datafusion/tree/master/dev/release#publish-on-cratesio?

Done (updated instructions in #1331)

It turns out I still can't upload the datafusion-cli package:

(arrow_dev) alamb@MacBook-Pro:~/Downloads/apache-arrow-datafusion-6.0.0$ (cd datafusion-cli && cargo publish)
....
    Finished dev [unoptimized + debuginfo] target(s) in 1m 20s
   Uploading datafusion-cli v5.1.0-SNAPSHOT (/Users/alamb/Downloads/apache-arrow-datafusion-6.0.0/datafusion-cli)
error: failed to publish to registry at https://crates.io

Caused by:
  the remote server responded with an error: invalid upload request: invalid length 7, expected at most 5 keywords per crate at line 1 column 3441

I will file a ticket (now tracked by #1332).
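The upload was rejected because crates.io accepts at most 5 keywords per crate, while the manifest listed 7. A pre-publish check could catch this before `cargo publish` runs. Below is a minimal sketch with a hypothetical `check_publish_metadata` helper that takes an already-parsed `Cargo.toml` as a dict (for example via `tomllib.load` on Python 3.11+). It also flags the missing description that cargo warned about in the earlier log. The manifest values used in the usage note are made up for illustration.

```python
from typing import Any, Dict, List

# Limit reported by crates.io in the error above.
MAX_CRATES_IO_KEYWORDS = 5


def check_publish_metadata(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with [package] metadata before publishing.

    Hypothetical helper: checks only the keyword count and the description.
    """
    package = manifest.get("package", {})
    problems = []
    keywords = package.get("keywords", [])
    if len(keywords) > MAX_CRATES_IO_KEYWORDS:
        problems.append(
            f"{len(keywords)} keywords, crates.io allows at most {MAX_CRATES_IO_KEYWORDS}"
        )
    if not package.get("description"):
        # cargo printed "warning: manifest has no description." for this case.
        problems.append("missing [package].description")
    return problems
```

For example, a manifest dict with seven keywords and a description yields a single problem, `"7 keywords, crates.io allows at most 5"`.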


7 participants