Sort Delta log objects when comparing and avoid caching all logs (NVIDIA#7456)

Signed-off-by: Jason Lowe <[email protected]>

jlowe authored Jan 5, 2023
1 parent e9e605b commit 17343d5
Showing 2 changed files with 9 additions and 2 deletions.
5 changes: 3 additions & 2 deletions integration_tests/run_pyspark_from_build.sh
@@ -1,5 +1,5 @@
 #!/bin/bash
-# Copyright (c) 2020-2022, NVIDIA CORPORATION.
+# Copyright (c) 2020-2023, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -220,7 +220,8 @@ else
     export PYSP_TEST_spark_jars="${ALL_JARS//:/,}"
 fi
 
-export PYSP_TEST_spark_driver_extraJavaOptions="-ea -Duser.timezone=UTC $COVERAGE_SUBMIT_FLAGS"
+# Set the Delta log cache size to prevent the driver from caching every Delta log indefinitely
+export PYSP_TEST_spark_driver_extraJavaOptions="-ea -Duser.timezone=UTC -Ddelta.log.cacheSize=10 $COVERAGE_SUBMIT_FLAGS"
 export PYSP_TEST_spark_executor_extraJavaOptions='-ea -Duser.timezone=UTC'
 export PYSP_TEST_spark_ui_showConsoleProgress='false'
 export PYSP_TEST_spark_sql_session_timeZone='UTC'
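The `-Ddelta.log.cacheSize=10` option added above bounds Delta Lake's driver-side Delta log cache, so long test runs stop accumulating one cached log per table. The effect of such a size cap can be illustrated with a bounded LRU cache in Python (a sketch of the caching behavior only; `parse_delta_log` is a hypothetical stand-in, not Delta's implementation):

```python
from functools import lru_cache

# Hypothetical stand-in for loading a table's Delta log on the driver.
# maxsize=10 mirrors -Ddelta.log.cacheSize=10: at most 10 logs stay cached,
# and older entries are evicted instead of growing without bound.
@lru_cache(maxsize=10)
def parse_delta_log(table_path):
    return {"table": table_path, "entries": []}  # placeholder payload

# Touch 25 distinct tables, as a test suite creating many tables would
for i in range(25):
    parse_delta_log(f"/tmp/table_{i}")

# The cache stays bounded despite 25 distinct tables
assert parse_delta_log.cache_info().currsize == 10
```

Without the cap, an unbounded cache would hold all 25 entries for the lifetime of the driver.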
6 changes: 6 additions & 0 deletions integration_tests/src/main/python/delta_lake_write_test.py
@@ -90,6 +90,12 @@ def decode_jsons(json_data):
         # Skip whitespace between records
         while idx < len(json_data) and json_data[idx].isspace():
             idx += 1
+    # reorder to produce a consistent output for comparison
+    def json_to_sort_key(j):
+        keys = sorted(j.keys())
+        paths = sorted([ v.get("path", "") for v in j.values() ])
+        return ','.join(keys + paths)
+    jsons.sort(key=json_to_sort_key)
     return jsons
 
 def assert_gpu_and_cpu_delta_logs_equivalent(spark, data_path):
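The sorting change above makes log comparison order-insensitive: `json_to_sort_key` builds a key from each record's action names and file paths, so two logs containing the same records sort identically regardless of on-disk ordering. A standalone sketch (the sample records are invented, shaped loosely like Delta log actions):

```python
def json_to_sort_key(j):
    # Build a deterministic key from each record's action names and file paths
    keys = sorted(j.keys())
    paths = sorted([v.get("path", "") for v in j.values()])
    return ','.join(keys + paths)

# Invented sample records, loosely shaped like Delta log actions
logs = [
    {"add": {"path": "part-00001-b.parquet"}},
    {"commitInfo": {"operation": "WRITE"}},
    {"add": {"path": "part-00000-a.parquet"}},
]

# The same records sort to the same sequence from any input order,
# which is what lets CPU and GPU logs be compared record-for-record
assert sorted(logs, key=json_to_sort_key) == sorted(logs[::-1], key=json_to_sort_key)
```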
