Troubleshooting: Extracting verbose logs of the Cosmos DB OLTP Spark connector from Databricks
Create a bash-script file "configurelog4j.sh" that sets the log level and redirects the logs for the Cosmos DB OLTP Spark connector. The sample below collects the logs most relevant for bulk ingestion; you can modify it for your workload, and if the log files grow too big you can reduce the log levels from TRACE to INFO or WARN. Make sure the bash-script file uses Unix-EOL (end-of-line) characters (only LF, not CR+LF); otherwise the initialization script fails with "Cluster scoped init script dbfs:/ failed: Script exit status is non-zero". A snippet for normalizing the line endings follows the script below.
#!/bin/bash
echo "Executing on Driver: $DB_IS_DRIVER"
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
LOG4J_PATH="/home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties"
else
LOG4J_PATH="/home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties"
fi
echo ""
echo "Showing original log4j.properties for: ${LOG4J_PATH}"
echo "$(cat ${LOG4J_PATH})"
echo "" >> ${LOG4J_PATH}
echo "# Enable TRACE logging for Spark connector" >> ${LOG4J_PATH}
echo "com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor=TRACE" >> ${LOG4J_PATH}
echo "com.microsoft.azure.cosmosdb.spark.CosmosDBSpark=TRACE" >> ${LOG4J_PATH}
echo "log4j.logger.com.microsoft.azure.cosmosdb.spark.CosmosDBConnectionCache=TRACE" >> ${LOG4J_PATH}
echo "log4j.logger.com.microsoft.azure.cosmosdb.spark.AsyncCosmosDBConnection=TRACE" >> ${LOG4J_PATH}
echo "import com.microsoft.azure.cosmosdb.rx.AsyncDocumentClient=TRACE" >> ${LOG4J_PATH}
echo "com.microsoft.azure.cosmosdb.spark.streaming.CosmosDBWriteStreamRetryPolicyUtil=TRACE" >> ${LOG4J_PATH}
echo ""
echo "Showing modified log4j.properties for: ${LOG4J_PATH}"
echo "$(cat ${LOG4J_PATH})"
Enable logging to the DBFS filesystem for the cluster
Specify the (possibly modified) configurelog4j.sh script as a cluster-scoped init script for the cluster (a sample cluster specification follows the steps below):
- Install the Databricks CLI: https://docs.databricks.com/dev-tools/cli/index.html#set-up-the-cli
- Copy configurelog4j.sh to the Databricks file system via the Databricks CLI ("databricks fs cp <LocalSourceDirectory>\configurelog4j.sh dbfs:/configurelog4j.sh")
- Restart the cluster
- Run the repro
- Copy the log files to the local machine ("databricks fs cp dbfs:/cluster-logs <LocalLogDirectory> -r")
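For reference, a minimal sketch of the cluster-specification fields involved (Clusters API 2.0 JSON); the dbfs destinations match the paths used above. Note that "databricks clusters edit" expects the complete cluster specification, so merge these fields into your existing spec instead of submitting them on their own:
# Sketch: attach the init script and enable DBFS log delivery.
# <your-cluster-id> is a placeholder; spark_version, node_type_id and
# the other required fields of your cluster spec are omitted here.
cat > cluster-logging.json <<'EOF'
{
  "cluster_id": "<your-cluster-id>",
  "init_scripts": [
    { "dbfs": { "destination": "dbfs:/configurelog4j.sh" } }
  ],
  "cluster_log_conf": {
    "dbfs": { "destination": "dbfs:/cluster-logs" }
  }
}
EOF
databricks clusters edit --json-file cluster-logging.json
With cluster_log_conf set, Databricks delivers driver, executor, and init-script logs under dbfs:/cluster-logs/<cluster-id>/, which is what the final copy command above retrieves.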