Add a flag for allowing single quotes in JSON strings. (#8144)
Add a flag that allows `get_json_object()` to accept JSON in which strings are delimited by single quotes. Also add an explicit `get_json_object_options` class that lets the user choose which of these behaviors they want.
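The single-quote flag changes how a string token is recognized. Below is a minimal host-side sketch of that decision. It is not the cudf parser. `scan_string_literal` is a hypothetical helper, and the sketch assumes an escape is a backslash followed by exactly one character. With the flag off, only standard double-quoted JSON strings are accepted. With it on, a literal may also open and close with `'`.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of cudf): scans a JSON string literal that
// starts at `pos` and returns it including its delimiters.
// Double quotes are always accepted; single quotes only when
// `allow_single_quotes` is true, mirroring the flag added in this commit.
// Returns std::nullopt if no complete string literal starts at `pos`.
std::optional<std::string_view> scan_string_literal(std::string_view json,
                                                    std::size_t pos,
                                                    bool allow_single_quotes)
{
  if (pos >= json.size()) { return std::nullopt; }
  char const open      = json[pos];
  bool const is_double = open == '"';
  bool const is_single = open == '\'' && allow_single_quotes;
  if (!is_double && !is_single) { return std::nullopt; }

  // Walk forward to the matching closing delimiter, skipping escaped characters.
  for (std::size_t i = pos + 1; i < json.size(); ++i) {
    if (json[i] == '\\') {
      ++i;  // skip the character after the backslash (e.g. \" or \')
    } else if (json[i] == open) {
      return json.substr(pos, i - pos + 1);
    }
  }
  return std::nullopt;  // unterminated literal
}
```

For example, `scan_string_literal("'b'", 0, false)` yields no token. The same call with `true` returns `'b'` with its delimiters. The closing delimiter must match the opening one, so a `"` inside a single-quoted literal is treated as ordinary content.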

Note: stripping quotes from individually returned string values has been left _on_ by default.
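Quote stripping is a per-row output decision. The following is a minimal sketch. It assumes the matched value is available as a complete literal with its delimiters still attached. `format_single_string` is a hypothetical helper. The commit does not state that single-quoted literals are stripped the same way as double-quoted ones; the sketch simply assumes so.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not the cudf implementation): formats a value that was
// matched as a single string token. When `strip_quotes` is true the outer
// delimiters are removed; otherwise the token is returned verbatim.
// Assumes `token` is a complete literal such as "b" or 'b' (quotes included);
// escape sequences inside the literal are left untouched.
std::string format_single_string(std::string_view token, bool strip_quotes)
{
  bool const quoted = token.size() >= 2 &&
                      (token.front() == '"' || token.front() == '\'') &&
                      token.back() == token.front();
  if (strip_quotes && quoted) {
    return std::string(token.substr(1, token.size() - 2));
  }
  return std::string(token);  // objects, arrays, or stripping disabled
}
```

This reproduces the `{"a" : "b"}` / `$.a` example from the header documentation: the output is `"b"` with stripping off and `b` with it on. Objects and arrays pass through unchanged because their first character is not a quote.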

Authors:
  - https://github.com/nvdbaranec

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Nghia Truong (https://github.com/ttnghia)

URL: #8144
nvdbaranec authored May 19, 2021
1 parent 1f9f061 commit 32c1bac
Showing 4 changed files with 327 additions and 74 deletions.
cpp/include/cudf/strings/detail/json.hpp (1 change: 1 addition, 0 deletions)
@@ -32,6 +32,7 @@ namespace detail {
std::unique_ptr<cudf::column> get_json_object(
cudf::strings_column_view const& col,
cudf::string_scalar const& json_path,
get_json_object_options options,
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

76 changes: 75 additions & 1 deletion cpp/include/cudf/strings/json.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2021, NVIDIA CORPORATION.
* Copyright (c) 2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -17,6 +17,8 @@

#include <cudf/strings/strings_column_view.hpp>

#include <thrust/optional.h>

namespace cudf {
namespace strings {

@@ -26,6 +28,76 @@ namespace strings {
* @file
*/

/**
* @brief Settings for `get_json_object()`.
*/
class get_json_object_options {
// allow single quotes to represent strings in JSON
bool allow_single_quotes = false;

// individual string values are returned with quotes stripped.
bool strip_quotes_from_single_strings = true;

public:
/**
* @brief Default constructor.
*/
explicit get_json_object_options() = default;

/**
* @brief Returns true/false depending on whether single-quotes for representing strings
* are allowed.
*/
CUDA_HOST_DEVICE_CALLABLE bool get_allow_single_quotes() const { return allow_single_quotes; }

/**
* @brief Returns true/false depending on whether individually returned string values have
* their quotes stripped.
*
* When set to true, if the return value for a given row is an individual string
* (not an object or an array of strings), strip the quotes from the string and return only the
* contents of the string itself. Example:
*
* @code{.pseudo}
*
* With strip_quotes_from_single_strings OFF:
* Input = {"a" : "b"}
* Query = $.a
* Output = "b"
*
* With strip_quotes_from_single_strings ON:
* Input = {"a" : "b"}
* Query = $.a
* Output = b
*
* @endcode
*/
CUDA_HOST_DEVICE_CALLABLE bool get_strip_quotes_from_single_strings() const
{
return strip_quotes_from_single_strings;
}

/**
* @brief Set whether single-quotes for strings are allowed.
*
* @param _allow_single_quotes bool indicating desired behavior.
*/
void set_allow_single_quotes(bool _allow_single_quotes)
{
allow_single_quotes = _allow_single_quotes;
}

/**
* @brief Set whether individually returned string values have their quotes stripped.
*
* @param _strip_quotes_from_single_strings bool indicating desired behavior.
*/
void set_strip_quotes_from_single_strings(bool _strip_quotes_from_single_strings)
{
strip_quotes_from_single_strings = _strip_quotes_from_single_strings;
}
};

/**
* @brief Apply a JSONPath string to all rows in an input strings column.
*
@@ -37,12 +109,14 @@ namespace strings {
*
* @param col The input strings column. Each row must contain a valid json string
* @param json_path The JSONPath string to be applied to each row
* @param options Options for controlling the behavior of the function
* @param mr Resource for allocating device memory.
* @return New strings column containing the retrieved json object strings
*/
std::unique_ptr<cudf::column> get_json_object(
cudf::strings_column_view const& col,
cudf::string_scalar const& json_path,
get_json_object_options options = get_json_object_options{},
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/** @} */ // end of doxygen group
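The options class can be tried out on the host. This is a sketch that assumes dropping `CUDA_HOST_DEVICE_CALLABLE` is acceptable for a CPU-only build. `json_options_sketch` and `describe` are hypothetical names. `describe` stands in for an entry point whose options parameter is defaulted the same way as in `cudf::strings::get_json_object`.

```cpp
#include <string>

// Host-only mirror of get_json_object_options. Assumption: the
// CUDA_HOST_DEVICE_CALLABLE annotations are removed so this builds with a
// plain C++ compiler; the members and defaults match the header.
class json_options_sketch {
  bool allow_single_quotes              = false;  // reject 'single' quoted strings by default
  bool strip_quotes_from_single_strings = true;   // return b rather than "b" by default

 public:
  explicit json_options_sketch() = default;

  bool get_allow_single_quotes() const { return allow_single_quotes; }
  bool get_strip_quotes_from_single_strings() const { return strip_quotes_from_single_strings; }

  void set_allow_single_quotes(bool value) { allow_single_quotes = value; }
  void set_strip_quotes_from_single_strings(bool value) { strip_quotes_from_single_strings = value; }
};

// Hypothetical stand-in for an API entry point: the defaulted parameter uses
// json_options_sketch{} (direct-list-initialization), which may call the
// explicit default constructor. Returns a summary of the effective settings.
std::string describe(json_options_sketch options = json_options_sketch{})
{
  std::string out = options.get_allow_single_quotes() ? "single-quotes:on" : "single-quotes:off";
  out += options.get_strip_quotes_from_single_strings() ? " strip:on" : " strip:off";
  return out;
}
```

With the real header, the equivalent steps are:

1. Create a `cudf::strings::get_json_object_options`.
2. Call `set_allow_single_quotes(true)` on it.
3. Pass it as the third argument of `get_json_object(col, json_path, options)`.

The default constructor is `explicit`, so the default argument must name the type (`get_json_object_options{}`). A bare `= {}` is copy-list-initialization, and conforming compilers reject it.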