
BUG: json_normalize cannot parse metadata fields list type #37782

Closed
sann05 opened this issue Nov 12, 2020 · 4 comments · Fixed by #53099
Labels: Bug, IO JSON, Nested Data


@sann05

sann05 commented Nov 12, 2020

Code Sample, a copy-pastable example

from pandas import json_normalize

test_data = [
    {"values": [1, 2, 3], "metadata": {"listdata": [1, 2]}}]

df = json_normalize(test_data,
                    record_path=["values"],
                    meta=[["metadata", "listdata"]])
print(df)

Problem description

It raises ValueError: Length of values (6) does not match length of index (3)

Changing the listdata value from the list [1, 2] to the set {1, 2} avoids the error, but I can't control the JSON I receive.
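The numbers in the error message can be reproduced with NumPy alone. This sketch assumes, as the patch proposed later in this thread suggests, that pandas builds each meta column with `np.array(v, dtype=object).repeat(lengths)`; `meta_values` and `lengths` are illustrative names for the collected metadata and the per-record row counts.

```python
import numpy as np

# Metadata gathered for the single top-level record: one list value.
meta_values = [[1, 2]]
# Number of rows that record_path=["values"] produces for that record.
lengths = [3]

# NumPy treats the nested list as an extra dimension: shape (1, 2).
arr = np.array(meta_values, dtype=object)

# repeat() without an axis flattens first, so each of the two elements
# is repeated three times: 6 values for a 3-row index.
repeated = arr.repeat(lengths)

# A set is not treated as a sequence, so it stays one element: shape (1,).
set_arr = np.array([{1, 2}], dtype=object)

print(arr.shape, len(repeated), set_arr.shape)
```

This also explains why the set workaround helps: the value never becomes an extra array dimension.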

Expected Output

   0 metadata.listdata
0  1            [1, 2]
1  2            [1, 2]
2  3            [1, 2]

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 67a3d42
python : 3.6.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-52-generic
Version : #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.4
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1

@sann05 added the Bug and Needs Triage labels Nov 12, 2020
@jbrockmendel added the IO JSON label and removed the Needs Triage label Jun 6, 2021
@mroeschke added the Nested Data label Aug 14, 2021
@simonjayhawkins
Member

Thanks @sann05 for the report.

A possible fix:

diff --git a/pandas/io/json/_normalize.py b/pandas/io/json/_normalize.py
index e77d60d2d4..d04d321388 100644
--- a/pandas/io/json/_normalize.py
+++ b/pandas/io/json/_normalize.py
@@ -531,7 +531,16 @@ def _json_normalize(
             raise ValueError(
                 f"Conflicting metadata name {k}, need distinguishing prefix "
             )
-        result[k] = np.array(v, dtype=object).repeat(lengths)
+
+        values = np.array(v, dtype=object)
+
+        if values.ndim > 1:
+            # GH#37782
+            values = np.empty((len(v),), dtype=object)
+            for i, v in enumerate(v):
+                values[i] = v
+
+        result[k] = values.repeat(lengths)
     return result
 

There may be a more elegant solution, or similar code elsewhere in the codebase that could be reused.

PRs to fix welcome.
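The idea behind the patch can be tried outside pandas. Below is a minimal standalone sketch, assuming NumPy; `repeat_meta_values` is a hypothetical helper name, not a pandas function, and `lengths` plays the role of the per-record row counts used in `_json_normalize`.

```python
import numpy as np


def repeat_meta_values(meta_values, lengths):
    """Repeat each metadata value once per record, keeping lists intact.

    Hypothetical helper mirroring the patch: fall back to a 1-D object
    array when np.array() has turned nested lists into extra dimensions.
    """
    values = np.array(meta_values, dtype=object)
    if values.ndim > 1:
        # Nested lists became extra dimensions; rebuild a 1-D object
        # array that stores each original value as a single element.
        values = np.empty(len(meta_values), dtype=object)
        for i, item in enumerate(meta_values):
            values[i] = item
    return values.repeat(lengths)


# Reporter's case: one list-valued meta field, three records.
out = repeat_meta_values([[1, 2]], [3])
print(list(out))
```

The diff reuses `v` as its loop variable; Python evaluates `enumerate(v)` once, so it still works, but a distinct name such as `item` avoids the shadowing.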

@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Jun 3, 2022
@sanjay9977

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@Julian-J-S

I have the same problem as described here.
What happened to #47708?
Is anyone still working on this? :D

In addition to the problem that lists with zero or more than one element raise a ValueError, I think it is important to mention that lists with one element must never be converted to scalars; the list must remain a list!
When I get a list from an API, I don't know how many elements it contains. If one-element lists were converted to scalars, the result would be inconsistent and questionable :D
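The zero-, one- and multi-element cases all follow from the same NumPy behaviour. A minimal sketch, assuming the pre-fix code path is just `np.array(v, dtype=object).repeat(lengths)` as in the diff above, with three records per top-level object (`n_records` is an illustrative name):

```python
import numpy as np

n_records = 3  # rows produced by record_path for one top-level record

results = {}
for meta in ([], [5], [1, 2]):
    # Pre-fix code path: the list becomes an extra array dimension,
    # which repeat() then flattens before repeating.
    repeated = np.array([meta], dtype=object).repeat([n_records])
    results[len(meta)] = list(repeated)
    print(meta, "->", results[len(meta)])
```

The empty list yields no values and the two-element list yields six; both mismatch the three-row index, hence the ValueError. The one-element list yields three values, so no error is raised, but each row gets the scalar `5` instead of `[5]`.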

@felipemaion
Contributor

felipemaion commented Apr 29, 2023


@simonjayhawkins's patch above fixes it for me.
Is there any plan to push this to the official repo?
It would really help to be able to deploy by just upgrading to the newest version, without having to patch the code manually.
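Until an upgrade is possible, one workaround is to leave list-valued fields out of `meta` and attach the column yourself. This is a sketch assuming pandas 1.0 or later (for top-level `pd.json_normalize`) and the reporter's `test_data`; the column name `metadata.listdata` simply mimics what `meta` would produce, and the comprehension assumes every top-level record has a `values` list.

```python
import pandas as pd

test_data = [
    {"values": [1, 2, 3], "metadata": {"listdata": [1, 2]}}]

# Normalise only the records; no list-valued meta is involved here.
df = pd.json_normalize(test_data, record_path=["values"])

# One copy of each record's list per row that record produced.
listdata = [
    rec["metadata"]["listdata"]
    for rec in test_data
    for _ in rec["values"]
]

# Building a Series keeps each list as a single object element.
df["metadata.listdata"] = pd.Series(listdata, index=df.index)
print(df)
```

The linked PR adds a v2.1.0 whatsnew entry, so pandas versions that include the fix should not need this.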

felipemaion mentioned this issue May 5, 2023
mroeschke added a commit that referenced this issue May 8, 2023
* Fix BUG: #37782

* Fix BUG: #37782

* Fix BUG: 37782 - Hardcoded test data

* Fix BUG: 37782 - Hardcoded test data

* Update pandas/io/json/_normalize.py

Co-authored-by: Matthew Roeschke <[email protected]>

* Fix BUG: 37782 - typo

* Fix BUG: 37782 - typo

* Update doc/source/whatsnew/v2.1.0.rst

---------

Co-authored-by: Matthew Roeschke <[email protected]>
Rylie-W pushed a commit to Rylie-W/pandas that referenced this issue May 19, 2023
Daquisu pushed a commit to Daquisu/pandas that referenced this issue Jul 8, 2023