-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move the udf-examples module from spark-rapids repository to here #94
Conversation
Signed-off-by: Chong Gao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Spark-Rapids
directory seems redundant, as this repository is already named spark-rapids-examples
. I suggest just dropping the Spark-Rapids
directory and using udf-examples
directly or possibly renaming it something like RAPIDS-accelerated-UDFs
.
<!-- Note: rapids-4-spark release version is not published in the Maven Central --> | ||
<!-- Depends on its release version 21.12.0 --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This comment is redundant with the comment next to to the <version>
tag of the dependency below, and this comment has the drawback that it needs to be kept in sync with the version used below. I suggest removing this and just keeping the comment below (if any comment at all).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
|
||
<parent> | ||
<groupId>com.nvidia</groupId> | ||
<artifactId>rapids-4-spark-parent</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the need for this project to depend on the parent project of a separate repository. We should just copy the few relevant sections here (i.e.: license
and developers
), add its own scm
section, etc. All the profile shenanigans and dependencyMgmt should not apply here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the dependency to Rapids parent project.
Removed the databricks and cdh311 profiles. The pre-merge and nightly build will be tracked by NVIDIA/spark-rapids#4704.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This examples are target to Spark 3.0.1, is it necessary to add profiles for other Spark versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be fine. The UDF examples are using public, stable APIs of Spark and thus should not need to be recompiled per version of Spark.
If we had to compile per Spark verssion then we'd need to deploy multiple versions of the UDF examples jar with a Spark version classifier.
Can we change this pr target to 22.02, the udf-examples can be found more easily. Since we will update 22.04 as the default branch two months later(the date when rapids 22.04 released) |
Rebased to branch 22.02 |
user defined functions for use with the RAPIDS Accelerator | ||
for Apache Spark | ||
</description> | ||
<version>22.04.0-SNAPSHOT</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're targeting branch-22.02 but marking it as 22.04-SNAPSHOT?
Changing this to 22.02-SNAPSHOT probably is not the right fix. I don't think it makes sense to target branch-22.02 because for version 22.02 of the plugin repo, the "official" artifact for 22.02 still comes from the plugin repository, and thus we'd have two separate codebases that claim to publish version 22.02 of the artifact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to version 0.1
</parent> | ||
<artifactId>rapids-4-spark-udf-examples_2.12</artifactId> | ||
<groupId>com.nvidia</groupId> | ||
<artifactId>RAPIDS-accelerated-UDFs_2.12</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment about RAPIDS-accelerated-UDFs
referred to the directory name in the repo, not the artifact name.
I think we should prefer the previous artifact name. The previous name leverages the same rapids-4-spark name of the plugin, so it's clearly tied to and related to that project, and also notes that these jars contain UDF examples.
user defined functions for use with the RAPIDS Accelerator | ||
for Apache Spark | ||
</description> | ||
<version>0.1</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reasoning behind picking version 0.1? We're using calendar versioning, and this is currently targeting branch-22.02 (but it should target branch-22.04 per my previous comment, as the RAPIDS Accelerator version 22.02 has the "official" source for this artifact for 22.02).
This should be version 22.04.0-SNAPSHOT and updated to 22.04.0 just before deploying the official artifact when we ship version 22.04 across our repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saw the version of Spark-ETL+XGBoost is 0.2.2.
Updated to version 22.04.0-SNAPSHOT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we also need to change Spark-ETL+XGBoost version to calendar version to keep in sync, I will update it in the coming up PR.
[BUG] Dependencies missing of spark-rapids v21.12.0 release jars NVIDIA/spark-rapids#4253
Move the udf-examples module from spark-rapids repository to here
Move the udf-examples to the external repository spark-rapids-examples and the spark-rapids will no longer contain the udf-examples module.
The moved udf-examples depends on the previous release dist jar, can't depends on the Snapshot jar, because of the Snapshot jar is not published to the Maven Central.
e.g.: udf-examples 22.04 Snapshot depends on dist 22.02.0 version.
Signed-off-by: Chong Gao [email protected]