-
One other point... by trial-and-error I found an approach for getting UDFs working on Databricks, but it is very different from the one described in the README. I got things working by tinkering with the environment variable DOTNET_ASSEMBLY_SEARCH_PATHS and pointing it at files in /dbfs/FileStore/MyAppWhatever. That seemed to work as far as my simple testing was concerned, but obviously I don't want to keep using it if it isn't best practice. It may have been only by accident that it worked at all, and I kept wondering what circumstances would cause my related assemblies to stop being loaded properly. I finally found the README, thankfully before heading too far down the wrong path. Hopefully the approach in the README will also be able to support any *.deps.json annotations that may be needed.
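To make that concrete, here is a minimal sketch of the kind of app where this mattered. The helper type MyCompany.Helpers.Text, the JSON file, and the column name are made up for illustration; only the DBFS path and the environment variable are the ones from my actual setup, and this is my workaround rather than the documented approach.

```csharp
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace MyAppWhatever
{
    public static class Program
    {
        public static void Main()
        {
            SparkSession spark = SparkSession.Builder().GetOrCreate();
            DataFrame df = spark.Read().Json("/dbfs/FileStore/MyAppWhatever/people.json");

            // The lambda below is not run by the driver; it is serialized and executed by
            // Microsoft.Spark.Worker on each executor. Because it calls into a companion
            // assembly (MyCompany.Helpers.dll, a hypothetical dependency), the worker
            // process has to be able to load that DLL. In the workaround above, the
            // cluster environment variable DOTNET_ASSEMBLY_SEARCH_PATHS pointed at
            // /dbfs/FileStore/MyAppWhatever, which is where the worker found it.
            var normalize = Udf<string, string>(s => MyCompany.Helpers.Text.Normalize(s));

            df.Select(normalize(df["name"])).Show();
        }
    }
}
```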
-
@Niharikadutta do you happen to know how to update the docs on docs.microsoft.com?
-
By trial-and-error I was able to get UDFs working on Databricks.
It wasn't until recently that I noticed there were actually some instructions for it here:
https://github.com/dotnet/spark/blob/main/deployment/README.md#databricks
Previously I had only been following the instructions at docs.microsoft.com, which are found at the following locations...
... those docs are very vague about UDFs. That is unfortunate, since UDFs are where we get most of the benefits of Spark. Notice that those docs simply say...
Microsoft.Spark.Worker helps Apache Spark execute your app, such as any user-defined functions (UDFs) you may have written.
Is there someone who can make a change to those Databricks-specific docs and get them to link to the README.md in this GitHub project? That would have saved me from flailing around for so long. The most important part of the GitHub README is the section about deploying the application assemblies that are used by the workers (below).
Oddly, the instructions at docs.microsoft.com don't refer at all to these assemblies that are needed by Microsoft.Spark.Worker. The omission is significant because it gives the false impression that all workers will have direct access to the contents of the app.zip that was deployed. Anyone who has worked with Databricks/Scala will know that the jars you add to the cluster are available to both the driver and the workers, so it is natural to assume the same about the app.zip.
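As far as I understand, the jar analogy breaks down specifically for UDFs: built-in column functions are translated into JVM expressions and run inside the executors' JVM, so they work even when the workers have never seen your .NET assemblies, while a UDF's lambda is executed by Microsoft.Spark.Worker in a .NET process on each executor. A hedged sketch (the file path and column name are made up):

```csharp
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

SparkSession spark = SparkSession.Builder().GetOrCreate();
DataFrame df = spark.Read().Json("/dbfs/FileStore/people.json");  // hypothetical path

// Built-in functions like Upper() are evaluated by the JVM on the executors,
// so no .NET assemblies are needed on the workers for this to run.
df.Select(Upper(df["name"])).Show();

// The UDF below is serialized and executed by Microsoft.Spark.Worker on each
// executor, so the worker must be able to load the app assembly containing it
// (plus anything that assembly references); hence the deployment steps in the README.
var greet = Udf<string, string>(name => $"hello, {name}");
df.Select(greet(df["name"])).Show();
```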