Update dependencies doc #487
Conversation
kathweinschenkprophecy
commented
Dec 20, 2024
- To see the specific tasks where the Asana app for GitHub is being used, see:
- https://app.asana.com/0/0/1208013806020375
- https://app.asana.com/0/0/1208580452462191
| Field | Description |
|---|---|
| Scope | The dependency is enabled at the Project level or the Pipeline level. |
| Type | The dependency comes from the Package Hub, Scala (Maven), or Python (PyPI). |
| Name | Identifies the dependency. |
| Version/Package/Coordinates | For Package Hub dependencies, enter the package version. For Scala, use Maven coordinates in the `groupId:artifactId:version` format, for example `org.postgresql:postgresql:42.3.3`. For Python, enter the package name and version number. |
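The `groupId:artifactId:version` coordinate format in the table can be illustrated with a short sketch. This is a hypothetical helper, not Prophecy's actual validation logic, and the regex is a simplification of real Maven coordinate rules:

```python
import re

# Simplified pattern: three non-empty, colon-separated parts.
# Real Maven coordinates may also carry classifier/packaging parts,
# which this sketch deliberately ignores.
COORD_RE = re.compile(r"^(?P<groupId>[^:]+):(?P<artifactId>[^:]+):(?P<version>[^:]+)$")


def parse_maven_coordinates(coord: str) -> dict:
    """Split a 'groupId:artifactId:version' string into its parts."""
    match = COORD_RE.match(coord)
    if match is None:
        raise ValueError(f"not a valid Maven coordinate: {coord!r}")
    return match.groupdict()


parts = parse_maven_coordinates("org.postgresql:postgresql:42.3.3")
```

For the example from the table, this yields `groupId` `org.postgresql`, `artifactId` `postgresql`, and `version` `42.3.3`.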
I don't see the option to choose "Package Hub" when installing a Pipelines dependency. Can you provide a reason why?
You may be using a project that has Prophecy Managed Git. I believe that may not allow you to use Package Hub; we can discuss more over chat.
I still don't see it on a project using external Git. Though it doesn't really matter for these docs at the moment... we can return to the question.
Just realized that there is one more section we need to add:
Scala Dependencies in PySpark: how Python projects track Scala dependencies.
We need to note how that gets tracked in pbt_project.yaml and add a hyperlink to the PBT page (which still needs to be written) describing the option to build Python WHLs with a dummy POM.xml file. For now we can just hyperlink to the main PBT page.
I can provide more details on this tomorrow.
Additional info for the new section: When deploying pipelines in the WHL format, we must consider dependencies in both Python and Scala. The Scala JARs are used by Spark applications in the underlying JVM (even in PySpark applications). The WHL format inherently records Python dependencies; however, there is no industry standard for WHL files to specify non-Python dependencies (JAR files). As a result, we have enhanced PBT to record the Scala dependency information and store it in the WHL file. We recently made an improvement to the prophecy-build-tool that generates the Scala dependencies of PySpark pipelines: just run pbt once on your project, and it will generate those files and add them to the WHL.
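Since a WHL is just a standard zip archive, any extra metadata file that a build tool adds can be read back with the standard library. This is a sketch only; the member path used in the usage example below is hypothetical, not the actual location PBT writes to:

```python
import zipfile
from typing import Optional


def read_extra_metadata(whl_path: str, member: str) -> Optional[str]:
    """Return the contents of an extra metadata file stored inside a WHL.

    A .whl file is a zip archive, so non-standard members (such as a
    recorded Scala dependency list) can be retrieved like any zip entry.
    Returns None if the member is not present.
    """
    with zipfile.ZipFile(whl_path) as whl:
        if member in whl.namelist():
            return whl.read(member).decode("utf-8")
    return None
```

Usage might look like `read_extra_metadata("pipeline-0.1-py3-none-any.whl", "pipeline-0.1.dist-info/scala_deps.pom")`, where both names are placeholders for illustration.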
@kathweinschenkprophecy, I made a few changes, mostly pointing out that this is only necessary if the user did not create a Job in the Prophecy UI and is deploying the WHL file manually. Please go ahead and correct anything if I made any bad suggestions.
# Conflicts:
#	docusaurus.config.js