DEVELOPMENT.md

Packaging

For Databricks to properly install a C++ extension, one must take a detour through PyPI. Use twine to upload the package to PyPI:

cd python

python setup.py sdist

twine upload dist/pysarplus-*.tar.gz
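
A hedged aside: twine can also validate the archive before it goes out. twine check and the TestPyPI upload URL below are standard twine/PyPI features rather than anything specific to this repo, and the dry run is optional.

# optional: validate the sdist metadata before uploading
twine check dist/pysarplus-*.tar.gz
# optional: dry-run the upload against TestPyPI first
twine upload --repository-url https://test.pypi.org/legacy/ dist/pysarplus-*.tar.gz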

On Spark, one can install all three components (C++, Python, Scala) in one pass by creating a Spark Package. The documentation is rather sparse. Steps to install:

  1. Package and publish the pip package (see above)
  2. Package the Spark package, which includes the Scala formatter and references the pip package (see below)
  3. Upload the zipped Scala package to Spark Packages through a browser. sbt spPublish has a few issues, so it always fails for me. Don't use spPublishLocal, as the packages are not created properly (names don't match up, issue) and furthermore fail to install if published to Spark-Packages.org.
cd scala
sbt spPublish

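Once the package is available on Spark-Packages.org, a minimal sketch of pulling all three components into a Spark session is the standard --packages flag; the coordinate below is hypothetical and must be replaced with the group, artifact, and version that were actually published.

# hypothetical coordinate -- substitute the published group:artifact:version
pyspark --packages eisber:sarplus:0.2.6
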
Testing

To test the Python UDF + C++ backend:

cd python
python setup.py install && pytest -s tests/
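
A small sketch, using only the standard venv and pip tooling, of running the same tests in a throwaway environment so the locally built extension does not end up in the system Python; the explicit pytest install is the only assumption beyond the commands above.

python -m venv .venv                 # create an isolated environment
. .venv/bin/activate                 # on Windows: .venv\Scripts\activate
pip install pytest                   # test runner used above
python setup.py install && pytest -s tests/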

To test the Scala formatter:

cd scala
sbt test

(Use ~test and sbt will automatically re-run the tests when source files change, but it does not watch build.sbt.)
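
A short sketch of the interactive workflow described above; testOnly is a standard sbt task, but the suite name below is hypothetical and should be replaced with an actual test class from this project.

cd scala
sbt                       # start an interactive sbt session
> ~test                   # re-run the tests whenever a source file changes
> testOnly *SARSpec       # hypothetical suite name: run a single test class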