BUG: json_normalize cannot parse metadata fields list type #37782

sann05 · 2020-11-12T08:05:31Z

Code Sample, a copy-pastable example

from pandas import json_normalize

# A single record whose metadata field holds a list
test_data = [
    {"values": [1, 2, 3], "metadata": {"listdata": [1, 2]}}]

df = json_normalize(test_data,
                    record_path=["values"],
                    meta=[["metadata", "listdata"]])
print(df)

Problem description

The call raises ValueError: Length of values (6) does not match length of index (3)

Changing the listdata field from [1,2] to {1,2} avoids the error, but I can't control the JSON I receive.
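
The numbers in the error message can be reproduced with NumPy alone. This is a minimal sketch, assuming (based on the line touched by the patch proposed later in this thread) that the metadata column is built with `np.array(v, dtype=object).repeat(lengths)`. The names `meta_values` and `lengths` are illustrative and not taken from pandas.

```python
import numpy as np

meta_values = [[1, 2]]  # one metadata value (a list) for the single record
lengths = [3]           # that record contributes three rows

# A list of lists becomes a 2-D array, even with dtype=object.
arr = np.array(meta_values, dtype=object)
print(arr.shape)  # (1, 2)

# Without an axis argument, repeat() works on the flattened array,
# so each of the two inner elements is repeated three times.
repeated = arr.repeat(lengths)
print(len(repeated))  # 6

# A set is not a sequence, so NumPy stores it as one object element.
as_set = np.array([{1, 2}], dtype=object)
print(as_set.shape)  # (1,)
```

This would explain both the length of 6 in the message and why swapping the list for a set sidesteps the error.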

Expected Output

0	metadata.listdata
1	[1, 2]
2	[1, 2]
3	[1, 2]

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 67a3d42
python : 3.6.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-52-generic
Version : #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.4
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1


simonjayhawkins · 2022-06-03T20:31:38Z

Thanks @sann05 for the report.

a possible fix ...

diff --git a/pandas/io/json/_normalize.py b/pandas/io/json/_normalize.py
index e77d60d2d4..d04d321388 100644
--- a/pandas/io/json/_normalize.py
+++ b/pandas/io/json/_normalize.py
@@ -531,7 +531,16 @@ def _json_normalize(
             raise ValueError(
                 f"Conflicting metadata name {k}, need distinguishing prefix "
             )
-        result[k] = np.array(v, dtype=object).repeat(lengths)
+
+        values = np.array(v, dtype=object)
+
+        if values.ndim > 1:
+            # GH#37782
+            values = np.empty((len(v),), dtype=object)
+            for i, v in enumerate(v):
+                values[i] = v
+
+        result[k] = values.repeat(lengths)
     return result

There may be a more elegant solution, or similar code elsewhere in the codebase that could be reused.

PRs to fix welcome.
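
The idea in the diff can be tried outside pandas. The sketch below is not the code that was eventually merged; `repeat_meta` is a hypothetical helper name, and it always fills a 1-D object array element by element instead of doing so only when `ndim > 1`.

```python
import numpy as np


def repeat_meta(values, lengths):
    """Repeat each metadata value lengths[i] times, keeping lists intact."""
    # Pre-allocate a 1-D object array so NumPy cannot expand the
    # nested lists into extra dimensions.
    out = np.empty(len(values), dtype=object)
    for i, item in enumerate(values):
        out[i] = item
    return out.repeat(lengths)


result = repeat_meta([[1, 2]], [3])
print(result.tolist())  # [[1, 2], [1, 2], [1, 2]]
```

Using a separate loop variable (`item`) also avoids rebinding `v` inside its own loop, as the diff does.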

sanjay9977 · 2022-06-29T12:40:16Z

I have a similar problem: https://stackoverflow.com/questions/72801399/2d-arrays-repeat-throwing-valueerror-operands-could-not-be-broadcast-together

Any plan to fix this issue?

Julian-J-S · 2022-10-28T20:07:31Z

I have the same problem as described here.
What happened to #47708?
Is anything still happening? :D

Besides the ValueError raised for lists with zero or more than one value, I think it is important to mention that lists with one element must never be converted to scalars; the list must remain a list!
When I get a list from an API, I don't know how many elements it contains. Converting one-element lists to scalars would be very inconsistent and questionable :D
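
The one-element case can be illustrated with NumPy alone. This is a sketch under the same assumption as the proposed patch, namely that the column is built with `np.array(..., dtype=object).repeat(...)`; the variable names are illustrative.

```python
import numpy as np

# A one-element list is flattened away: the result holds scalars.
single = np.array([[7]], dtype=object).repeat([3])
print(single.tolist())  # [7, 7, 7]

# Filling a pre-allocated 1-D object array keeps the list as a list.
kept = np.empty(1, dtype=object)
kept[0] = [7]
print(kept.repeat([3]).tolist())  # [[7], [7], [7]]
```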

felipemaion · 2023-04-29T05:46:00Z

The fix proposed above by @simonjayhawkins works for me.
Is there any plan to merge it into the official repo?
It would be really helpful to deploy by simply pinning the newest version, without having to patch the code manually.

* Fix BUG: #37782
* Fix BUG: #37782
* Fix BUG: 37782 - Hardcoded test data
* Fix BUG: 37782 - Hardcoded test data
* Update pandas/io/json/_normalize.py
  Co-authored-by: Matthew Roeschke <[email protected]>
* Fix BUG: 37782 - typo
* Fix BUG: 37782 - typo
* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <[email protected]>


sann05 added the Bug and Needs Triage labels Nov 12, 2020

sann05 mentioned this issue Jan 13, 2021

BUG: json_normalize generates TypeError: 'NoneType' object is not subscriptable as metadata object is not always present. #37783

Closed

jbrockmendel added the IO JSON label and removed the Needs Triage label Jun 6, 2021

mroeschke added the Nested Data label Aug 14, 2021

simonjayhawkins mentioned this issue Jun 3, 2022

BUG: json_normalize fails with empty arrays/lists #47182

Closed

3 tasks

simonjayhawkins added this to the Contributions Welcome milestone Jun 3, 2022

GYHHAHA mentioned this issue Jul 13, 2022

BUG: json_normalize raises boardcasting error with list-like metadata #47708

Closed

6 tasks

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

felipemaion mentioned this issue May 5, 2023

Fix bug #37782 #53099

Merged

4 tasks

mroeschke closed this as completed in #53099 May 8, 2023
