Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object Keys are always Cow::Owned with cowkeys feature #31

Closed
jszwec opened this issue Jan 17, 2025 · 8 comments · Fixed by #32
Closed

Object Keys are always Cow::Owned with cowkeys feature #31

jszwec opened this issue Jan 17, 2025 · 8 comments · Fixed by #32

Comments

@jszwec
Copy link
Contributor

jszwec commented Jan 17, 2025

Hello,

I noticed that object keys are always Cow::Owned with cowkeys feature on.
I tracked it down to serde being responsible for this as Cow has a specific Deserializer implementation that always uses Owned variant.

We could potentially implement our own wrapper around Cow, that would allow us to have either Owned or Borrowed values when possible.
That would bring further performance gains.

Is there anything that you are aware of that would stop us from doing it? If not, I could create a PR with proposed change.

@PSeitz
Copy link
Owner

PSeitz commented Jan 17, 2025

Do you have an example?
It depends if serde calls visit_borrowed_str, visit_str or visit_string if Cow::Borrowed can be used.

@jszwec
Copy link
Contributor Author

jszwec commented Jan 17, 2025

Apply this diff and run cargo test cowkeys

diff --git a/src/de.rs b/src/de.rs
index 3b17a5b..a7f42ee 100644
--- a/src/de.rs
+++ b/src/de.rs
@@ -145,6 +145,22 @@ mod tests {

     use crate::Value;

+    #[test]
+    fn cowkeys() {
+        let json_obj = r#"
+            {
+                "bool": true,
+                "escaped\"": true
+            }
+       "#;
+
+        let val: Value = serde_json::from_str(json_obj).unwrap();
+
+        let obj = val.as_object().unwrap();
+        assert!(matches!(obj.as_vec()[0].0, Cow::Borrowed(_)));
+        assert!(matches!(obj.as_vec()[1].0, Cow::Owned(_)));
+    }
+
     #[test]
     fn deserialize_json_test() {
         let json_obj = r#"

This is because of serde, it always uses Cow::Owned -> serde-rs/serde#1852 (comment)

@PSeitz
Copy link
Owner

PSeitz commented Jan 17, 2025

In this example you show that it's not always using Cow::Owned like you said, but only if the data needs to be escaped. That behaves like expected.

If you want to avoid it, you'd need to use simd_json as a deserializer.

@PSeitz PSeitz closed this as completed Jan 17, 2025
@jszwec
Copy link
Contributor Author

jszwec commented Jan 17, 2025

@PSeitz, What are you talking about?

"bool" key should be Cow::Borrowed, but it's Owned, and the test fails

did you even run that test?

@PSeitz PSeitz reopened this Jan 17, 2025
@PSeitz
Copy link
Owner

PSeitz commented Jan 17, 2025

Whoops I'm sorry, I made an error running the test. It's indeed an issue for object keys, values seem to work fine though.

What would you propose? There's already some complexity by having Cow and &str via the feature flag

@jszwec
Copy link
Contributor Author

jszwec commented Jan 17, 2025

yeah values are fine, the issue seems to be only with the keys like you said.

I put together a wrapper just to see if it's gonna work - something like this: CowStr<'a>(Cow<'a, str>) then I implemented a few traits including AsRef, Deref, and Deserialize and it seems to work without making any other changes.

For as long as it doesn't break backward-compatibility somewhere, I think it could look like this:

#[cfg(feature = "cowkeys"]
pub type KeyStrType<'a> = CowStr<'a>

Unfortunately, it means we would also need CowStr to be public.

If you think it can work, I can polish my code and put it in the PR for you to review

@PSeitz
Copy link
Owner

PSeitz commented Jan 17, 2025

Currently Cow is exposed in the API, so it would be a breaking change. Deref should get us pretty far though on API compatibility. I would be definitely interested in the fix. Impact on the benchmarks would also be quite interesting.

@jszwec
Copy link
Contributor Author

jszwec commented Jan 17, 2025

Agreed, the good thing is that Cow seems to be exposed in the API only when you call as_vec() and to_vec() on ObjectAsVec, all other methods including iterators are &str based.

I can implement a POC including benchmarks, and then we can think if breaking change is worth it on these two methods, or how else we can approach it

jszwec added a commit to jszwec/serde_json_borrow that referenced this issue Jan 19, 2025
serde by always creates Cow::Owned even if it's possible to Borrow:
serde-rs/serde#1852 (comment)

We create our own wrapper around Cow in order to allow Borrowed keys.

This significantly improves performance:
parse
simple_json
serde_json                               Avg: 283.92 MB/s (-0.84%)     Median: 284.16 MB/s (-0.26%)     [279.00 MB/s .. 289.83 MB/s]
serde_json + access by key               Avg: 276.34 MB/s (-1.24%)     Median: 276.46 MB/s (-1.11%)     [267.63 MB/s .. 284.46 MB/s]
serde_json_borrow                        Avg: 534.96 MB/s (+19.37%)    Median: 533.52 MB/s (+18.98%)    [517.16 MB/s .. 561.15 MB/s]
serde_json_borrow + access by key        Avg: 530.26 MB/s (+18.52%)    Median: 529.16 MB/s (+18.28%)    [519.93 MB/s .. 544.36 MB/s]
SIMD_json_borrow                         Avg: 227.52 MB/s (-0.40%)     Median: 229.19 MB/s (+0.40%)     [184.71 MB/s .. 237.90 MB/s]
hdfs
serde_json                               Avg: 573.25 MB/s (-1.02%)     Median: 573.38 MB/s (-0.78%)     [549.59 MB/s .. 587.74 MB/s]
serde_json + access by key               Avg: 610.33 MB/s (-0.88%)     Median: 609.03 MB/s (-1.00%)     [587.33 MB/s .. 631.20 MB/s]
serde_json_borrow                        Avg: 1.0051 GB/s (+19.36%)    Median: 1.0037 GB/s (+18.06%)    [978.71 MB/s .. 1.0418 GB/s]
serde_json_borrow + access by key        Avg: 1.0453 GB/s (+20.75%)    Median: 1.0432 GB/s (+20.01%)    [1.0123 GB/s .. 1.0802 GB/s]
SIMD_json_borrow                         Avg: 551.42 MB/s (+0.26%)     Median: 554.60 MB/s (+0.91%)     [478.56 MB/s .. 566.50 MB/s]
hdfs_with_array
serde_json                               Avg: 470.84 MB/s (-0.24%)    Median: 470.70 MB/s (-0.20%)    [460.56 MB/s .. 480.44 MB/s]
serde_json + access by key               Avg: 471.89 MB/s (+0.09%)    Median: 471.92 MB/s (-0.03%)    [459.45 MB/s .. 484.84 MB/s]
serde_json_borrow                        Avg: 813.40 MB/s (+7.66%)    Median: 816.02 MB/s (+7.78%)    [798.52 MB/s .. 834.84 MB/s]
serde_json_borrow + access by key        Avg: 821.93 MB/s (+7.64%)    Median: 822.22 MB/s (+7.67%)    [786.16 MB/s .. 838.18 MB/s]
SIMD_json_borrow                         Avg: 458.26 MB/s (-0.15%)    Median: 457.11 MB/s (-0.39%)    [445.08 MB/s .. 473.40 MB/s]
wiki
serde_json                               Avg: 1.2361 GB/s (-0.93%)    Median: 1.2367 GB/s (-1.24%)    [1.2083 GB/s .. 1.2777 GB/s]
serde_json + access by key               Avg: 1.2717 GB/s (-0.63%)    Median: 1.2715 GB/s (-1.91%)    [1.1773 GB/s .. 1.3294 GB/s]
serde_json_borrow                        Avg: 1.4936 GB/s (+3.09%)    Median: 1.4950 GB/s (+3.02%)    [1.4339 GB/s .. 1.5327 GB/s]
serde_json_borrow + access by key        Avg: 1.5255 GB/s (+3.51%)    Median: 1.5266 GB/s (+3.13%)    [1.4733 GB/s .. 1.5849 GB/s]
SIMD_json_borrow                         Avg: 1.2579 GB/s (-1.86%)    Median: 1.2729 GB/s (-1.49%)    [990.58 MB/s .. 1.2995 GB/s]
gh-archive
serde_json                               Avg: 538.63 MB/s (-0.32%)     Median: 539.07 MB/s (-1.29%)     [525.76 MB/s .. 549.05 MB/s]
serde_json + access by key               Avg: 540.55 MB/s (-0.76%)     Median: 541.39 MB/s (-0.84%)     [521.75 MB/s .. 548.68 MB/s]
serde_json_borrow                        Avg: 1.0849 GB/s (+23.13%)    Median: 1.0873 GB/s (+22.41%)    [1.0266 GB/s .. 1.1046 GB/s]
serde_json_borrow + access by key        Avg: 1.0663 GB/s (+20.30%)    Median: 1.0825 GB/s (+22.08%)    [759.26 MB/s .. 1.1250 GB/s]
SIMD_json_borrow                         Avg: 1.0522 GB/s (-0.77%)     Median: 1.0539 GB/s (-0.71%)     [1.0142 GB/s .. 1.0696 GB/s]

Fixes PSeitz#31.
@PSeitz PSeitz closed this as completed in 9f42358 Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants