Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write cuDF version in Parquet "created_by" metadata field #14721

Merged
merged 12 commits into from
Jan 10, 2024
8 changes: 7 additions & 1 deletion cpp/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# =============================================================================
# Copyright (c) 2018-2023, NVIDIA CORPORATION.
# Copyright (c) 2018-2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
Expand Down Expand Up @@ -658,6 +658,12 @@ set_source_files_properties(
PROPERTIES COMPILE_DEFINITIONS "_FILE_OFFSET_BITS=64"
)

set_property(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line 650 above this is using set_source_files_properties so should we use it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a CMake expert 😅. I used set_property since it's also used below for setting the same property for jit/cache.cpp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not cmake expert. Okay probably they both can achieve the same output 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_property is probably easier to work with here because you can APPEND with it. This solution is fine.

SOURCE src/io/parquet/writer_impl.cu
APPEND
PROPERTY COMPILE_DEFINITIONS "CUDF_VERSION=${PROJECT_VERSION}"
)

set_target_properties(
cudf
PROPERTIES BUILD_RPATH "\$ORIGIN"
Expand Down
9 changes: 6 additions & 3 deletions cpp/src/io/parquet/writer_impl.cu
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2023, NVIDIA CORPORATION.
* Copyright (c) 2019-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -99,6 +99,10 @@ struct aggregate_writer_metadata {
}
}

#ifndef CUDF_VERSION
#error "CUDF_VERSION is not defined"
#endif
Copy link
Contributor

@ttnghia ttnghia Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the middle of the aggregate_writer_metadata class so the indentation here is terrible. I would recommend moving it to the top of this file, or moving it inside of the get_metadata below, just right before line 115.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted it close to where it's used, but it looked uglier to me in the middle of the function. Moved it to the top just after the includes.


FileMetaData get_metadata(size_t part)
{
CUDF_EXPECTS(part < files.size(), "Invalid part index queried");
Expand All @@ -108,7 +112,7 @@ struct aggregate_writer_metadata {
meta.num_rows = this->files[part].num_rows;
meta.row_groups = this->files[part].row_groups;
meta.key_value_metadata = this->files[part].key_value_metadata;
meta.created_by = this->created_by;
meta.created_by = "cudf version " CUDF_STRINGIFY(CUDF_VERSION);
ttnghia marked this conversation as resolved.
Show resolved Hide resolved
meta.column_orders = this->column_orders;
return meta;
}
Expand Down Expand Up @@ -171,7 +175,6 @@ struct aggregate_writer_metadata {
std::vector<std::vector<uint8_t>> column_indexes;
};
std::vector<per_file_metadata> files;
std::string created_by = "";
thrust::optional<std::vector<ColumnOrder>> column_orders = thrust::nullopt;
};

Expand Down