Use cuFile direct device reads/writes by default in cuIO #9722

Merged
merged 15 commits into from
Nov 19, 2021
36 changes: 36 additions & 0 deletions cpp/src/io/utilities/config_utils.hpp
@@ -30,6 +30,41 @@ inline std::string getenv_or(std::string const& env_var_name, std::string_view d
return std::string{(env_val == nullptr) ? default_val : env_val};
}

namespace cufile_integration {

namespace {
/**
* @brief Defines which cuFile usage to enable.
*/
enum class usage_policy : uint8_t { OFF, GDS, ALWAYS };

/**
* @brief Get the current usage policy.
*/
inline usage_policy get_env_policy()
{
static auto const env_val = getenv_or("LIBCUDF_CUFILE_POLICY", "GDS");
if (env_val == "OFF") return usage_policy::OFF;
if (env_val == "ALWAYS") return usage_policy::ALWAYS;
return usage_policy::GDS;
}
} // namespace

/**
* @brief Returns true if cuFile and its compatibility mode are enabled.
*/
inline bool is_always_enabled() { return get_env_policy() == usage_policy::ALWAYS; }

/**
* @brief Returns true if only direct IO through cuFile is enabled (compatibility mode is disabled).
*/
inline bool is_gds_enabled()
{
return is_always_enabled() or get_env_policy() == usage_policy::GDS;
}

} // namespace cufile_integration

namespace nvcomp_integration {

namespace {
@@ -64,4 +99,5 @@ inline bool is_stable_enabled()
}

} // namespace nvcomp_integration

} // namespace cudf::io::detail
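
The policy mapping above can be exercised in isolation. The following is a minimal sketch, not the libcudf implementation: `parse_policy` is a hypothetical helper that takes the raw string instead of reading the environment, and the two predicates take the policy as an argument, so the fall-through to `GDS` for unrecognized values is easy to check.

```cpp
#include <cstdint>
#include <string>

// Mirrors the enum introduced in the diff.
enum class usage_policy : std::uint8_t { OFF, GDS, ALWAYS };

// Hypothetical helper: maps a raw LIBCUDF_CUFILE_POLICY value to a policy.
// Any value other than "OFF" or "ALWAYS" (including an empty one) yields GDS.
inline usage_policy parse_policy(std::string const& value)
{
  if (value == "OFF") return usage_policy::OFF;
  if (value == "ALWAYS") return usage_policy::ALWAYS;
  return usage_policy::GDS;
}

// ALWAYS is the only policy that also turns on compatibility mode.
inline bool is_always_enabled(usage_policy p) { return p == usage_policy::ALWAYS; }

// cuFile is used under both GDS and ALWAYS.
inline bool is_gds_enabled(usage_policy p)
{
  return is_always_enabled(p) || p == usage_policy::GDS;
}
```

The comparison is case-sensitive, as in the diff, so a value such as `off` is treated as unrecognized and maps to `GDS`.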
9 changes: 5 additions & 4 deletions cpp/src/io/utilities/datasource.cpp
@@ -14,15 +14,16 @@
* limitations under the License.
*/

#include "file_io_utilities.hpp"

#include <cudf/io/datasource.hpp>
#include <cudf/utilities/error.hpp>
#include <io/utilities/config_utils.hpp>

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cudf/utilities/error.hpp>
#include "file_io_utilities.hpp"

namespace cudf {
namespace io {
namespace {
@@ -239,7 +240,7 @@ std::unique_ptr<datasource> datasource::create(const std::string& filepath,
size_t size)
{
#ifdef CUFILE_FOUND
if (detail::cufile_config::instance()->is_required()) {
if (detail::cufile_integration::is_always_enabled()) {
// avoid mmap as GDS is expected to be used for most reads
return std::make_unique<direct_read_source>(filepath.c_str());
}
110 changes: 57 additions & 53 deletions cpp/src/io/utilities/file_io_utilities.cpp
@@ -51,45 +51,14 @@ file_wrapper::~file_wrapper() { close(fd); }

#ifdef CUFILE_FOUND

cufile_config::cufile_config() : policy{getenv_or("LIBCUDF_CUFILE_POLICY", default_policy)}
{
if (is_enabled()) {
// Modify the config file based on the policy
auto const config_file_path = getenv_or(json_path_env_var, "/etc/cufile.json");
std::ifstream user_config_file(config_file_path);
// Modified config file is stored in a temporary directory
auto const cudf_config_path = tmp_config_dir.path() + "/cufile.json";
std::ofstream cudf_config_file(cudf_config_path);

std::string line;
while (std::getline(user_config_file, line)) {
std::string const tag = "\"allow_compat_mode\"";
if (line.find(tag) != std::string::npos) {
// TODO: only replace the true/false value
// Enable compatibility mode when cuDF does not fall back to host path
cudf_config_file << tag << ": " << (is_required() ? "true" : "false") << ",\n";
} else {
cudf_config_file << line << '\n';
}

// Point libcufile to the modified config file
CUDF_EXPECTS(setenv(json_path_env_var.c_str(), cudf_config_path.c_str(), 0) == 0,
"Failed to set the cuFile config file environment variable.");
}
}
}
cufile_config const* cufile_config::instance()
{
static cufile_config _instance;
return &_instance;
}

/**
* @brief Class that dynamically loads the cuFile library and manages the cuFile driver.
*/
class cufile_shim {
private:
cufile_shim();
void modify_cufile_json();
void load_cufile_lib();

void* cf_lib = nullptr;
decltype(cuFileDriverOpen)* driver_open = nullptr;
@@ -116,25 +85,60 @@ class cufile_shim {
decltype(cuFileWrite)* write = nullptr;
};

void cufile_shim::modify_cufile_json()
{
std::string const json_path_env_var = "CUFILE_ENV_PATH_JSON";
temp_directory tmp_config_dir{"cudf_cufile_config"};

// Modify the config file based on the policy
auto const config_file_path = getenv_or(json_path_env_var, "/etc/cufile.json");
std::ifstream user_config_file(config_file_path);
// Modified config file is stored in a temporary directory
auto const cudf_config_path = tmp_config_dir.path() + "/cufile.json";
std::ofstream cudf_config_file(cudf_config_path);

std::string line;
while (std::getline(user_config_file, line)) {
std::string const tag = "\"allow_compat_mode\"";
if (line.find(tag) != std::string::npos) {
// TODO: only replace the true/false value
Contributor comment: I feel like the fact the file is overwritten and not actually modified should be documented in the .rst somehow.

// Enable compatibility mode when cuDF does not fall back to host path
cudf_config_file << tag << ": "
<< (cufile_integration::is_always_enabled() ? "true" : "false") << ",\n";
} else {
cudf_config_file << line << '\n';
}

// Point libcufile to the modified config file
CUDF_EXPECTS(setenv(json_path_env_var.c_str(), cudf_config_path.c_str(), 0) == 0,
"Failed to set the cuFile config file environment variable.");
}
}

void cufile_shim::load_cufile_lib()
{
cf_lib = dlopen("libcufile.so", RTLD_NOW);
driver_open = reinterpret_cast<decltype(driver_open)>(dlsym(cf_lib, "cuFileDriverOpen"));
CUDF_EXPECTS(driver_open != nullptr, "could not find cuFile cuFileDriverOpen symbol");
driver_close = reinterpret_cast<decltype(driver_close)>(dlsym(cf_lib, "cuFileDriverClose"));
CUDF_EXPECTS(driver_close != nullptr, "could not find cuFile cuFileDriverClose symbol");
handle_register =
reinterpret_cast<decltype(handle_register)>(dlsym(cf_lib, "cuFileHandleRegister"));
CUDF_EXPECTS(handle_register != nullptr, "could not find cuFile cuFileHandleRegister symbol");
handle_deregister =
reinterpret_cast<decltype(handle_deregister)>(dlsym(cf_lib, "cuFileHandleDeregister"));
CUDF_EXPECTS(handle_deregister != nullptr, "could not find cuFile cuFileHandleDeregister symbol");
read = reinterpret_cast<decltype(read)>(dlsym(cf_lib, "cuFileRead"));
CUDF_EXPECTS(read != nullptr, "could not find cuFile cuFileRead symbol");
write = reinterpret_cast<decltype(write)>(dlsym(cf_lib, "cuFileWrite"));
CUDF_EXPECTS(write != nullptr, "could not find cuFile cuFileWrite symbol");
}

cufile_shim::cufile_shim()
{
try {
cf_lib = dlopen("libcufile.so", RTLD_NOW);
driver_open = reinterpret_cast<decltype(driver_open)>(dlsym(cf_lib, "cuFileDriverOpen"));
CUDF_EXPECTS(driver_open != nullptr, "could not find cuFile cuFileDriverOpen symbol");
driver_close = reinterpret_cast<decltype(driver_close)>(dlsym(cf_lib, "cuFileDriverClose"));
CUDF_EXPECTS(driver_close != nullptr, "could not find cuFile cuFileDriverClose symbol");
handle_register =
reinterpret_cast<decltype(handle_register)>(dlsym(cf_lib, "cuFileHandleRegister"));
CUDF_EXPECTS(handle_register != nullptr, "could not find cuFile cuFileHandleRegister symbol");
handle_deregister =
reinterpret_cast<decltype(handle_deregister)>(dlsym(cf_lib, "cuFileHandleDeregister"));
CUDF_EXPECTS(handle_deregister != nullptr,
"could not find cuFile cuFileHandleDeregister symbol");
read = reinterpret_cast<decltype(read)>(dlsym(cf_lib, "cuFileRead"));
CUDF_EXPECTS(read != nullptr, "could not find cuFile cuFileRead symbol");
write = reinterpret_cast<decltype(write)>(dlsym(cf_lib, "cuFileWrite"));
CUDF_EXPECTS(write != nullptr, "could not find cuFile cuFileWrite symbol");
modify_cufile_json();
load_cufile_lib();

CUDF_EXPECTS(driver_open().err == CU_FILE_SUCCESS, "Failed to initialize cuFile driver");
} catch (cudf::logic_error const& err) {
@@ -285,11 +289,11 @@ std::future<void> cufile_output_impl::write_async(void const* data, size_t offse
std::unique_ptr<cufile_input_impl> make_cufile_input(std::string const& filepath)
{
#ifdef CUFILE_FOUND
if (cufile_config::instance()->is_enabled()) {
if (cufile_integration::is_gds_enabled()) {
try {
return std::make_unique<cufile_input_impl>(filepath);
} catch (...) {
if (cufile_config::instance()->is_required()) throw;
if (cufile_integration::is_always_enabled()) throw;
}
}
#endif
@@ -299,11 +303,11 @@ std::unique_ptr<cufile_input_impl> make_cufile_input(std::string const& filepath
std::unique_ptr<cufile_output_impl> make_cufile_output(std::string const& filepath)
{
#ifdef CUFILE_FOUND
if (cufile_config::instance()->is_enabled()) {
if (cufile_integration::is_gds_enabled()) {
try {
return std::make_unique<cufile_output_impl>(filepath);
} catch (...) {
if (cufile_config::instance()->is_required()) throw;
if (cufile_integration::is_always_enabled()) throw;
}
}
#endif
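
The try/rethrow pattern shared by `make_cufile_input` and `make_cufile_output` can be shown without cuFile. In this minimal sketch, `direct_input` and `make_direct_input` are hypothetical stand-ins (an empty path simulates a failed file registration), and the policy is passed as an argument rather than read from the environment.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

enum class usage_policy { OFF, GDS, ALWAYS };

// Hypothetical stand-in for cufile_input_impl; construction may throw.
struct direct_input {
  explicit direct_input(std::string const& path)
  {
    if (path.empty()) throw std::runtime_error("cannot register file");
  }
};

// Returns a direct-IO object when possible. Under GDS, a failure yields
// nullptr so the caller can use the host path; under ALWAYS it propagates.
inline std::unique_ptr<direct_input> make_direct_input(std::string const& path,
                                                       usage_policy policy)
{
  if (policy == usage_policy::OFF) return nullptr;
  try {
    return std::make_unique<direct_input>(path);
  } catch (...) {
    if (policy == usage_policy::ALWAYS) throw;
  }
  return nullptr;
}
```

A null result is the signal for the caller to take the bounce-buffer path, which is why only the `ALWAYS` policy lets the exception escape.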
26 changes: 0 additions & 26 deletions cpp/src/io/utilities/file_io_utilities.hpp
@@ -162,32 +162,6 @@ class cufile_output : public cufile_io_base {

class cufile_shim;

/**
* @brief Class that manages cuFile configuration.
*/
class cufile_config {
std::string const default_policy = "OFF";
std::string const json_path_env_var = "CUFILE_ENV_PATH_JSON";

std::string const policy = default_policy;
temp_directory tmp_config_dir{"cudf_cufile_config"};

cufile_config();

public:
/**
* @brief Returns true when cuFile use is enabled.
*/
bool is_enabled() const { return policy == "ALWAYS" or policy == "GDS"; }

/**
* @brief Returns true when cuDF should not fall back to host IO.
*/
bool is_required() const { return policy == "ALWAYS"; }

static cufile_config const* instance();
};

/**
* @brief Class that provides RAII for cuFile file registration.
*/
8 changes: 3 additions & 5 deletions docs/cudf/source/basics/io-gds-integration.rst
@@ -4,9 +4,9 @@ GPUDirect Storage Integration
Many IO APIs can use the GPUDirect Storage (GDS) library to optimize IO operations.
GDS enables a direct data path for direct memory access (DMA) transfers between GPU memory and storage, which avoids a bounce buffer through the CPU.
GDS also has a compatibility mode that allows the library to fall back to copying through a CPU bounce buffer.
The SDK is available for download `here <https://developer.nvidia.com/gpudirect-storage>`_.
The SDK is available for download `here <https://developer.nvidia.com/gpudirect-storage>`_. Newer versions are also part of the CUDA toolkit (11.4 and higher).

Use of GPUDirect Storage in cuDF is disabled by default, and can be enabled through environment variable ``LIBCUDF_CUFILE_POLICY``.
Use of GPUDirect Storage in cuDF is enabled by default, but can be disabled through the environment variable ``LIBCUDF_CUFILE_POLICY``.
This variable also controls the GDS compatibility mode.

There are three special values for the environment variable:
@@ -15,7 +15,7 @@ There are three special values for the environment variable:
- "ALWAYS": Enable GDS use; GDS compatibility mode is *on*.
- "OFF": Completely disable GDS use.

Any other value (or no value set) will keep the GDS disabled for use in cuDF and IO will be done using cuDF's CPU bounce buffers.
Any other value (or no value set) will enable GDS use, with compatibility mode turned *off*.

This environment variable also affects how cuDF treats GDS errors.
When ``LIBCUDF_CUFILE_POLICY`` is set to "GDS" and a GDS API call fails for any reason, cuDF falls back to the internal implementation with bounce buffers.
@@ -30,5 +30,3 @@ Operations that support the use of GPUDirect Storage:
- `to_csv`
- `to_parquet`
- `to_orc`

NOTE: current GDS integration is not fully optimized and enabling GDS will not lead to performance improvements in all cases.
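
One practical consequence of the `static` local in `get_env_policy` (shown in the `config_utils.hpp` hunk) is that the variable is read only once per process, so it has to be set before the first IO call. The sketch below illustrates that caching behavior; `cached_policy_string` is a hypothetical helper, not a libcudf function.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper mirroring the function-local static in get_env_policy():
// the environment is consulted on the first call only, and "GDS" is the default.
inline std::string const& cached_policy_string()
{
  static std::string const value = [] {
    char const* env = std::getenv("LIBCUDF_CUFILE_POLICY");
    return std::string{env == nullptr ? "GDS" : env};
  }();
  return value;
}
```

With this helper, calling the POSIX function `setenv("LIBCUDF_CUFILE_POLICY", "OFF", 1)` before the first call makes it return `OFF`, and changing the variable afterwards has no effect on later calls within the same process.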