Add a new KvikIO compatibility mode "AUTO" #547

Merged
52 commits
0b69ff3
Add ALLOW mode
kingcrimsontianyu Nov 7, 2024
4eb21f5
Update
kingcrimsontianyu Nov 8, 2024
207b5dd
Update
kingcrimsontianyu Nov 8, 2024
a430d5b
Make the changes non-breaking
kingcrimsontianyu Nov 8, 2024
26431fb
Add comments
kingcrimsontianyu Nov 8, 2024
a76c3da
Add comments
kingcrimsontianyu Nov 8, 2024
bcbf81e
Adjustment
kingcrimsontianyu Nov 8, 2024
17de218
Rename the compat mode
kingcrimsontianyu Nov 9, 2024
8e9b478
Further simplify the implementation
kingcrimsontianyu Nov 9, 2024
1d938bd
Update
kingcrimsontianyu Nov 9, 2024
6b16210
Further simplify
kingcrimsontianyu Nov 9, 2024
b71f7f6
Add unit test
kingcrimsontianyu Nov 10, 2024
8e04177
Use pre-commit to fix formatting
kingcrimsontianyu Nov 10, 2024
b0ae4ac
Update Python interface
kingcrimsontianyu Nov 10, 2024
c0c2139
Fix Python issue
kingcrimsontianyu Nov 10, 2024
5d18bc8
Update comment
kingcrimsontianyu Nov 10, 2024
7509233
Add Python unit test
kingcrimsontianyu Nov 10, 2024
a711589
Update Python unit test
kingcrimsontianyu Nov 10, 2024
4ed4031
Add more test
kingcrimsontianyu Nov 11, 2024
c747241
Cleanup
kingcrimsontianyu Nov 11, 2024
18e6fd7
Address reviewer comments
kingcrimsontianyu Nov 12, 2024
087a691
Lint
kingcrimsontianyu Nov 12, 2024
01ad65d
Fix comment
kingcrimsontianyu Nov 12, 2024
fbf5cd3
Reset compat mode state for file handle
kingcrimsontianyu Nov 12, 2024
385c0cb
Fix test
kingcrimsontianyu Nov 12, 2024
736d8ec
Address review comment
kingcrimsontianyu Nov 12, 2024
570350a
Address reviewer comments
kingcrimsontianyu Nov 13, 2024
9c6b649
Update the doc
kingcrimsontianyu Nov 13, 2024
0f353ef
Improve comments
kingcrimsontianyu Nov 13, 2024
f8fdd73
Remove compat inference from defautls
kingcrimsontianyu Nov 13, 2024
b6e70a3
Address compat mode in buffer
kingcrimsontianyu Nov 13, 2024
9621e1b
A happy update
kingcrimsontianyu Nov 14, 2024
b7ad8c8
Update examples
kingcrimsontianyu Nov 14, 2024
8d48eaa
Cleanup
kingcrimsontianyu Nov 14, 2024
6c58f88
Update for the batch
kingcrimsontianyu Nov 14, 2024
d3f93a3
Fix copyright lint error
kingcrimsontianyu Nov 14, 2024
d922b1d
Fix docs
kingcrimsontianyu Nov 14, 2024
84ac5ea
Fix header
kingcrimsontianyu Nov 14, 2024
70f2f79
Fix Sphinx doc issue. Improve string parsing for compat mode
kingcrimsontianyu Nov 14, 2024
34c594d
Rename compat mode related functions
kingcrimsontianyu Nov 15, 2024
f34ea86
Fix compat mode. Update doc
kingcrimsontianyu Nov 15, 2024
74dc84c
Futher tweaks
kingcrimsontianyu Nov 15, 2024
f89965e
Update
kingcrimsontianyu Nov 15, 2024
b5c3a46
Revert
kingcrimsontianyu Nov 15, 2024
ca4bf22
Final touch
kingcrimsontianyu Nov 15, 2024
9e69c22
Simplify implementation
kingcrimsontianyu Nov 16, 2024
032b075
Further cleanup
kingcrimsontianyu Nov 16, 2024
d332da4
Improve doxygen doc
kingcrimsontianyu Nov 16, 2024
64490d2
Rename a compat mode function according to reviewer suggestion
kingcrimsontianyu Nov 19, 2024
95accfe
Remove WHATEVER as AUTO's alias
kingcrimsontianyu Nov 19, 2024
efae9b0
Cleanup
kingcrimsontianyu Nov 19, 2024
20f4373
Addres reviewers comment
kingcrimsontianyu Nov 19, 2024
11 changes: 8 additions & 3 deletions cpp/doxygen/main_page.md
@@ -76,14 +76,19 @@ Then run the example:
## Runtime Settings

#### Compatibility Mode (KVIKIO_COMPAT_MODE)
When KvikIO is running in compatibility mode, it doesn't load `libcufile.so`. Instead, reads and writes are done using POSIX. Notice, this is not the same as the compatibility mode in cuFile. That is cuFile can run in compatibility mode while KvikIO is not.
When KvikIO is running in compatibility mode, it doesn't load `libcufile.so`. Instead, reads and writes are done using POSIX. Notice, this is not the same as the compatibility mode in cuFile. It is possible that KvikIO performs I/O in the non-compatibility mode by using the cuFile library, but the cuFile library itself is configured to operate in its own compatibility mode. For more details, refer to [cuFile compatibility mode](https://docs.nvidia.com/gpudirect-storage/api-reference-guide/index.html#cufile-compatibility-mode) and [cuFile environment variables](https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#environment-variables).

Set the environment variable `KVIKIO_COMPAT_MODE` to enable/disable compatibility mode. By default, compatibility mode is enabled:
The environment variable `KVIKIO_COMPAT_MODE` has three options (case-insensitive):
- `ON` (aliases: `TRUE`, `YES`, `1`): Enable the compatibility mode.
- `OFF` (aliases: `FALSE`, `NO`, `0`): Disable the compatibility mode, and enforce cuFile I/O. GDS will be activated if the system requirements for cuFile are met and cuFile is properly configured. However, if the system is not suited for cuFile, I/O operations under the `OFF` option may error out, crash or hang.
- `AUTO`: Try cuFile I/O first, and fall back to POSIX I/O if the system requirements for cuFile are not met.

Under `AUTO`, KvikIO falls back to the compatibility mode:
- when `libcufile.so` cannot be found.
- when running in Windows Subsystem for Linux (WSL).
- when `/run/udev` isn't readable, which typically happens when running inside a docker image not launched with `--volume /run/udev:/run/udev:ro`.

This setting can also be controlled by `defaults::compat_mode()` and `defaults::compat_mode_reset()`.
This setting can also be programmatically controlled by `defaults::set_compat_mode()` and `defaults::compat_mode_reset()`.
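
The `AUTO` fallback rules listed above can be sketched as a small decision function. This is an illustration only, not KvikIO's implementation: `SystemProbe`, its fields, and `resolve_auto` are hypothetical names, `CompatMode` is redefined locally so the sketch stands alone, and the probe results are passed in rather than detected.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the three documented options; defined locally so the sketch is standalone.
enum class CompatMode : std::uint8_t { OFF, ON, AUTO };

// Hypothetical summary of the runtime checks listed in the documentation.
struct SystemProbe {
  bool libcufile_found;  // `libcufile.so` could be found
  bool running_in_wsl;   // Windows Subsystem for Linux detected
  bool udev_readable;    // `/run/udev` is readable
};

// Reduce a requested mode to ON or OFF. ON and OFF pass through unchanged;
// AUTO becomes ON (POSIX I/O) if any documented fallback condition holds.
inline CompatMode resolve_auto(CompatMode requested, SystemProbe probe)
{
  if (requested != CompatMode::AUTO) { return requested; }
  bool const cufile_usable =
    probe.libcufile_found && !probe.running_in_wsl && probe.udev_readable;
  return cufile_usable ? CompatMode::OFF : CompatMode::ON;
}
```

Note that an explicit `OFF` is never downgraded in this sketch, which matches the documented warning that `OFF` on an unsuitable system may error out, crash or hang.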


#### Thread Pool (KVIKIO_NTHREADS)
4 changes: 2 additions & 2 deletions cpp/examples/basic_io.cpp
@@ -65,7 +65,7 @@ int main()
check(cudaSetDevice(0) == cudaSuccess);

cout << "KvikIO defaults: " << endl;
if (kvikio::defaults::compat_mode()) {
if (kvikio::defaults::is_compat_mode_preferred()) {
cout << " Compatibility mode: enabled" << endl;
} else {
kvikio::DriverInitializer manual_init_driver;
@@ -181,7 +181,7 @@ int main()
cout << "Parallel POSIX read (" << kvikio::defaults::thread_pool_nthreads()
<< " threads): " << read << endl;
}
if (kvikio::is_batch_and_stream_available() && !kvikio::defaults::compat_mode()) {
if (kvikio::is_batch_and_stream_available() && !kvikio::defaults::is_compat_mode_preferred()) {
std::cout << std::endl;
Timer timer;
// Here we use the batch API to read "/tmp/test-file" into `b_dev` by
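
A note on why these call sites change: `defaults::compat_mode()` now returns the scoped enum `CompatMode`, and a scoped enum has no implicit conversion to `bool`, so `if (kvikio::defaults::compat_mode())` would no longer compile. The standalone sketch below demonstrates that language rule; `CompatMode` is redefined locally and `prefers_posix` is a hypothetical helper that ignores the runtime inference done for `AUTO`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

enum class CompatMode : std::uint8_t { OFF, ON, AUTO };

// A scoped enum cannot be used directly as a condition.
static_assert(!std::is_convertible<CompatMode, bool>::value,
              "enum class does not implicitly convert to bool");

// Hypothetical predicate: call sites need an explicit question instead.
// Only an explicit ON counts here; AUTO would require a system check.
inline bool prefers_posix(CompatMode mode) { return mode == CompatMode::ON; }
```

Had `CompatMode` been an unscoped enum, the old condition would still compile, but `AUTO` (value 2) would silently evaluate to true, which is one reason an explicit predicate is the safer design.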
2 changes: 1 addition & 1 deletion cpp/examples/basic_no_cuda.cpp
@@ -41,7 +41,7 @@ constexpr int LARGE_SIZE = 8 * SIZE; // LARGE SIZE to test partial s
int main()
{
cout << "KvikIO defaults: " << endl;
if (kvikio::defaults::compat_mode()) {
if (kvikio::defaults::is_compat_mode_preferred()) {
cout << " Compatibility mode: enabled" << endl;
} else {
kvikio::DriverInitializer manual_init_driver;
4 changes: 2 additions & 2 deletions cpp/include/kvikio/batch.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
* Copyright (c) 2023-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -118,7 +118,7 @@ class BatchHandle {
std::vector<CUfileIOParams_t> io_batch_params;
io_batch_params.reserve(operations.size());
for (const auto& op : operations) {
if (op.file_handle.is_compat_mode_on()) {
if (op.file_handle.is_compat_mode_preferred()) {
throw CUfileException("Cannot submit a FileHandle opened in compatibility mode");
}

4 changes: 2 additions & 2 deletions cpp/include/kvikio/buffer.hpp
@@ -49,7 +49,7 @@ inline void buffer_register(const void* devPtr_base,
int flags = 0,
const std::vector<int>& errors_to_ignore = std::vector<int>())
{
if (defaults::compat_mode()) { return; }
if (defaults::is_compat_mode_preferred()) { return; }
CUfileError_t status = cuFileAPI::instance().BufRegister(devPtr_base, size, flags);
if (status.err != CU_FILE_SUCCESS) {
// Check if `status.err` is in `errors_to_ignore`
@@ -67,7 +67,7 @@ inline void buffer_register(const void* devPtr_base,
*/
inline void buffer_deregister(const void* devPtr_base)
{
if (defaults::compat_mode()) { return; }
if (defaults::is_compat_mode_preferred()) { return; }
CUFILE_TRY(cuFileAPI::instance().BufDeregister(devPtr_base));
}

138 changes: 122 additions & 16 deletions cpp/include/kvikio/defaults.hpp
@@ -13,6 +13,11 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/**
* @file
*/

#pragma once

#include <algorithm>
@@ -27,7 +32,48 @@
#include <kvikio/shim/cufile.hpp>

namespace kvikio {
/**
* @brief I/O compatibility mode.
*/
enum class CompatMode : uint8_t {
OFF, ///< Enforce cuFile I/O. GDS will be activated if the system requirements for cuFile are met
///< and cuFile is properly configured. However, if the system is not suited for cuFile, I/O
///< operations under the OFF option may error out, crash or hang.
ON, ///< Enforce POSIX I/O.
AUTO, ///< Try cuFile I/O first, and fall back to POSIX I/O if the system requirements for cuFile
///< are not met.
};

namespace detail {
/**
* @brief Parse a string into a CompatMode enum.
*
* @param compat_mode_str Compatibility mode in string format (case-insensitive). Valid values
* include:
* - `ON` (alias: `TRUE`, `YES`, `1`)
* - `OFF` (alias: `FALSE`, `NO`, `0`)
* - `AUTO`
* @return A CompatMode enum.
*/
inline CompatMode parse_compat_mode_str(std::string_view compat_mode_str)
{
// Convert to lowercase
std::string tmp{compat_mode_str};
std::transform(
tmp.begin(), tmp.end(), tmp.begin(), [](unsigned char c) { return std::tolower(c); });

CompatMode res{};
if (tmp == "on" || tmp == "true" || tmp == "yes" || tmp == "1") {
res = CompatMode::ON;
} else if (tmp == "off" || tmp == "false" || tmp == "no" || tmp == "0") {
res = CompatMode::OFF;
} else if (tmp == "auto") {
res = CompatMode::AUTO;
} else {
throw std::invalid_argument("Unknown compatibility mode: " + std::string{tmp});
}
return res;
}
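
The parsing rules above can be exercised in isolation with a standalone restatement. This is a sketch, not the KvikIO header: `to_lower`, `parse_mode`, and `is_rejected` are hypothetical names, and `CompatMode` is redefined locally.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

enum class CompatMode : std::uint8_t { OFF, ON, AUTO };

// Lowercase copy. The lambda takes `unsigned char` because passing a negative
// `char` value to std::tolower is undefined behavior.
inline std::string to_lower(std::string_view s)
{
  std::string out{s};
  std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return out;
}

// Standalone restatement of the parsing rules shown in the diff.
inline CompatMode parse_mode(std::string_view text)
{
  std::string const s = to_lower(text);
  if (s == "on" || s == "true" || s == "yes" || s == "1") { return CompatMode::ON; }
  if (s == "off" || s == "false" || s == "no" || s == "0") { return CompatMode::OFF; }
  if (s == "auto") { return CompatMode::AUTO; }
  throw std::invalid_argument("Unknown compatibility mode: " + s);
}

// Hypothetical helper: true if parsing `text` throws std::invalid_argument.
inline bool is_rejected(std::string_view text)
{
  try {
    parse_mode(text);
  } catch (std::invalid_argument const&) {
    return true;
  }
  return false;
}
```

Because the comparison is exact after lowercasing, input with surrounding whitespace such as `" on"` is rejected rather than trimmed, and any string outside the listed aliases throws.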

template <typename T>
T getenv_or(std::string_view env_var_name, T default_val)
@@ -77,16 +123,24 @@ inline bool getenv_or(std::string_view env_var_name, bool default_val)
std::string{env_val});
}

template <>
inline CompatMode getenv_or(std::string_view env_var_name, CompatMode default_val)
{
auto* env_val = std::getenv(env_var_name.data());
if (env_val == nullptr) { return default_val; }
return parse_compat_mode_str(env_val);
}
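
One caveat worth keeping in mind with this pattern: `std::string_view::data()` is not guaranteed to point at a null-terminated string, while `std::getenv` expects one. It works when the view comes from a string literal, as it does for `"KVIKIO_COMPAT_MODE"`. A defensive variant, sketched here with the hypothetical helper `read_env`, copies the name first:

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Look up an environment variable by name. Copying the name into a std::string
// guarantees the null terminator that std::getenv requires, even when `name`
// is a view into the middle of a larger buffer.
inline std::optional<std::string> read_env(std::string_view name)
{
  std::string const key{name};
  char const* value = std::getenv(key.c_str());
  if (value == nullptr) { return std::nullopt; }
  return std::string{value};
}
```

Returning `std::optional` keeps "unset" distinct from "set to an empty string", which the caller can then map to its own default.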

} // namespace detail

/**
* @brief Singleton class of default values used thoughtout KvikIO.
* @brief Singleton class of default values used throughout KvikIO.
*
*/
class defaults {
private:
BS::thread_pool _thread_pool{get_num_threads_from_env()};
bool _compat_mode;
CompatMode _compat_mode;
std::size_t _task_size;
std::size_t _gds_threshold;
std::size_t _bounce_buffer_size;
@@ -104,13 +158,7 @@ class defaults {
{
// Determine the default value of `compat_mode`
{
if (std::getenv("KVIKIO_COMPAT_MODE") != nullptr) {
// Setting `KVIKIO_COMPAT_MODE` take precedence
_compat_mode = detail::getenv_or("KVIKIO_COMPAT_MODE", false);
} else {
// If `KVIKIO_COMPAT_MODE` isn't set, we infer based on runtime environment
_compat_mode = !is_cufile_available();
}
_compat_mode = detail::getenv_or("KVIKIO_COMPAT_MODE", CompatMode::AUTO);
}
// Determine the default value of `task_size`
{
@@ -163,19 +211,77 @@ class defaults {
* - when `/run/udev` isn't readable, which typically happens when running inside a docker
* image not launched with `--volume /run/udev:/run/udev:ro`
*
* @return The boolean answer
* @return Compatibility mode.
*/
[[nodiscard]] static CompatMode compat_mode() { return instance()->_compat_mode; }

/**
* @brief Reset the value of `kvikio::defaults::compat_mode()`.
*
* Changing the compatibility mode affects all the new FileHandles whose `compat_mode` argument is
* not explicitly set, but it never affects existing FileHandles.
*
* @param compat_mode Compatibility mode.
*/
static void compat_mode_reset(CompatMode compat_mode) { instance()->_compat_mode = compat_mode; }

/**
* @brief Infer the `AUTO` compatibility mode from the system runtime.
*
* If the requested compatibility mode is `AUTO`, set the expected compatibility mode to
* `ON` or `OFF` by performing a system config check; otherwise, do nothing. Effectively, this
* function reduces the requested compatibility mode from three possible states
* (`ON`/`OFF`/`AUTO`) to two (`ON`/`OFF`) so as to determine the actual I/O path. This function
* is lightweight as the inferred result is cached.
*/
static CompatMode infer_compat_mode_if_auto(CompatMode compat_mode)
{
if (compat_mode == CompatMode::AUTO) {
static auto inferred_compat_mode_for_auto = []() -> CompatMode {
return is_cufile_available() ? CompatMode::OFF : CompatMode::ON;
}();
return inferred_compat_mode_for_auto;
}
return compat_mode;
}
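
The caching mentioned in the doc comment relies on a function-local static initialized by an immediately invoked lambda. A minimal standalone sketch of that idiom follows; `expensive_probe`, `probe_call_count`, and `cached_probe` are hypothetical names, and the probe simply stands in for a check such as `is_cufile_available()`.

```cpp
#include <cassert>

// Counts how many times the (pretend) expensive probe runs.
inline int& probe_call_count()
{
  static int count = 0;
  return count;
}

// Hypothetical stand-in for an expensive system check.
inline bool expensive_probe()
{
  ++probe_call_count();
  return true;
}

// The immediately invoked lambda initializes the static exactly once, on the
// first call. Since C++11 this initialization is also thread-safe.
inline bool cached_probe()
{
  static bool const result = []() -> bool { return expensive_probe(); }();
  return result;
}
```

A consequence of this design is that the inferred value is fixed for the lifetime of the process: a change in the environment after the first call is not observed.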

/**
* @brief Given a requested compatibility mode, whether it is expected to reduce to `ON`.
*
* This function returns true if either of the two conditions is satisfied:
* - The compatibility mode is `ON`.
* - It is `AUTO` but inferred to be `ON`.
*
* Conceptually, the opposite of this function is whether requested compatibility mode is expected
* to be `OFF`, which would occur if either of the two conditions is satisfied:
* - The compatibility mode is `OFF`.
* - It is `AUTO` but inferred to be `OFF`.
*
* @param compat_mode Compatibility mode.
* @return Boolean answer.
*/
[[nodiscard]] static bool compat_mode() { return instance()->_compat_mode; }
static bool is_compat_mode_preferred(CompatMode compat_mode)
{
return compat_mode == CompatMode::ON ||
(compat_mode == CompatMode::AUTO &&
defaults::infer_compat_mode_if_auto(compat_mode) == CompatMode::ON);
}

/**
* @brief Reset the value of `kvikio::defaults::compat_mode()`
* @brief Whether the global compatibility mode from class defaults is expected to be `ON`.
*
* This function returns true if either of the two conditions is satisfied:
* - The compatibility mode is `ON`.
* - It is `AUTO` but inferred to be `ON`.
*
* Changing compatibility mode, effects all new FileHandles that doesn't sets the
* `compat_mode` argument explicitly but it never effect existing FileHandles.
* Conceptually, the opposite of this function is whether the global compatibility mode is
* expected to be `OFF`, which would occur if either of the two conditions is satisfied:
* - The compatibility mode is `OFF`.
* - It is `AUTO` but inferred to be `OFF`.
*
* @param enable Whether to enable compatibility mode or not.
* @return Boolean answer.
*/
static void compat_mode_reset(bool enable) { instance()->_compat_mode = enable; }
static bool is_compat_mode_preferred() { return is_compat_mode_preferred(compat_mode()); }
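
The predicate described in the two doc comments above amounts to a six-row truth table. The sketch below spells it out with compile-time checks; `is_posix_preferred` is a hypothetical name, `CompatMode` is redefined locally, and `cufile_available` is passed as a parameter (instead of being probed and cached) so that every row can be exercised.

```cpp
#include <cassert>
#include <cstdint>

enum class CompatMode : std::uint8_t { OFF, ON, AUTO };

// True when POSIX I/O is expected: either ON was requested, or AUTO was
// requested and cuFile is not available.
constexpr bool is_posix_preferred(CompatMode requested, bool cufile_available)
{
  return requested == CompatMode::ON ||
         (requested == CompatMode::AUTO && !cufile_available);
}

static_assert(is_posix_preferred(CompatMode::ON, true), "ON ignores availability");
static_assert(is_posix_preferred(CompatMode::ON, false), "ON ignores availability");
static_assert(!is_posix_preferred(CompatMode::OFF, true), "OFF enforces cuFile");
static_assert(!is_posix_preferred(CompatMode::OFF, false), "OFF enforces cuFile");
static_assert(!is_posix_preferred(CompatMode::AUTO, true), "AUTO uses cuFile if present");
static_assert(is_posix_preferred(CompatMode::AUTO, false), "AUTO falls back to POSIX");
```

The fourth row is the risky one: `OFF` on a system without cuFile still reports "not preferred", so the caller proceeds down the cuFile path and may fail there.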

/**
* @brief Get the default thread pool.
4 changes: 2 additions & 2 deletions cpp/include/kvikio/error.hpp
@@ -45,8 +45,8 @@ struct CUfileException : public std::runtime_error {
if (error != CUDA_SUCCESS) { \
const char* err_name = nullptr; \
const char* err_str = nullptr; \
CUresult err_name_status = cudaAPI::instance().GetErrorName(error, &err_name); \
CUresult err_str_status = cudaAPI::instance().GetErrorString(error, &err_str); \
CUresult err_name_status = kvikio::cudaAPI::instance().GetErrorName(error, &err_name); \
CUresult err_str_status = kvikio::cudaAPI::instance().GetErrorString(error, &err_str); \
if (err_name_status == CUDA_ERROR_INVALID_VALUE) { err_name = "unknown"; } \
if (err_str_status == CUDA_ERROR_INVALID_VALUE) { err_str = "unknown"; } \
throw(_exception_type){std::string{"CUDA error at: "} + __FILE__ + ":" + \
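
The switch to `kvikio::cudaAPI` inside the macro body presumably lets the macro expand correctly outside `namespace kvikio`, since macro bodies are resolved at the point of expansion. The standalone sketch below shows the general rule with hypothetical names (`lib`, `answer`, `user_code`, and both macros are made up for illustration).

```cpp
#include <cassert>

namespace lib {
inline int answer() { return 42; }
}  // namespace lib

// Unqualified: only compiles where `answer` is visible, e.g. inside `namespace lib`.
#define LIB_ANSWER_UNQUALIFIED() answer()

// Qualified: expands correctly from any namespace.
#define LIB_ANSWER_QUALIFIED() lib::answer()

namespace user_code {
// `LIB_ANSWER_UNQUALIFIED()` would fail to compile here, because `answer` is
// declared neither in `user_code` nor in the global namespace.
inline int call_from_outside() { return LIB_ANSWER_QUALIFIED(); }
}  // namespace user_code
```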