BUG: json_normalize raises boardcasting error with list-like metadata #47708

GYHHAHA · 2022-07-13T20:09:29Z

closes BUG: json_normalize cannot parse metadata fields list type #37782
closes BUG: json_normalize fails with empty arrays/lists #47182
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

pep8speaks · 2022-07-30T17:19:19Z

Hello @GYHHAHA! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-08-18 02:41:26 UTC

WillAyd · 2022-08-05T06:25:57Z

pandas/io/json/_normalize.py

@@ -531,7 +532,14 @@ def _recursive_extract(data, path, seen_meta, level=0):
            raise ValueError(
                f"Conflicting metadata name {k}, need distinguishing prefix "
            )
-        result[k] = np.array(v, dtype=object).repeat(lengths)
+        if v and is_list_like(v[0]):


As an alternative can we not just do something like:

    arr = np.empty(1, dtype=object)
    arr[0] = v
    arr = arr.repeat(lengths)

Less than ideal but I think still gets us to the same place? The branching now is a little tough to follow

GYHHAHA · 2022-08-06

@mroeschke suggested not constructing the NumPy array for nested data. What I wrote before was:

    result[k] = np.array(v, dtype=object).repeat(lengths, axis=0).tolist()
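For illustration, a minimal sketch of why the axis matters for nested metadata. The values of `v` and `lengths` are made up for this example and are not taken from the PR's tests.

```python
import numpy as np

# Made-up values: two records whose metadata is itself a list.
v = [[1, 2], [3, 4]]
lengths = [2, 1]  # number of child rows under each record

arr = np.array(v, dtype=object)  # becomes a (2, 2) array, not two list objects

# Without an axis, repeat() operates on the 4 flattened elements,
# which cannot be matched against 2 repeat counts.
try:
    arr.repeat(lengths)
    flat_repeat_failed = False
except ValueError:
    flat_repeat_failed = True

# With axis=0, each row is repeated as a unit.
by_row = arr.repeat(lengths, axis=0).tolist()
print(flat_repeat_failed, by_row)
```

With these inputs the flattened repeat fails, while the `axis=0` variant yields one nested list per output row.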

mroeschke · 2022-08-08

I think @WillAyd's suggestion is good if it works. My main issue before was that one path called tolist() while the other returned an np.ndarray.

GYHHAHA · 2022-08-18

Sorry for the late response. Unfortunately this won't work: when v is a list, arr will be [list(...)], a one-element array, so repeat(lengths) raises a broadcasting error. Handling the two cases separately may still be necessary. (Both paths now return a list, for consistency and so that dtype inference is triggered.) @mroeschke
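A small sketch of the failure mode discussed in this thread, using made-up inputs. The per-item fill at the end is only one possible workaround, shown as an assumption for comparison; it is not the code adopted in this PR.

```python
import numpy as np

v = [[1, 2], [3, 4]]  # made-up list-like metadata, one entry per record
lengths = [2, 1]      # made-up number of child rows per record

# Putting the whole list into a single slot gives a length-1 array,
# so two repeat counts cannot be matched against it.
arr = np.empty(1, dtype=object)
arr[0] = v
try:
    arr.repeat(lengths)
    wrapped_ok = True
except ValueError:
    wrapped_ok = False

# Assumed alternative: one slot per metadata value keeps the array 1-D,
# so each list is repeated as a single object.
per_item = np.empty(len(v), dtype=object)
for i, item in enumerate(v):
    per_item[i] = item
repeated = per_item.repeat(lengths).tolist()
print(wrapped_ok, repeated)
```

The per-item fill still needs a Python loop, so it does not obviously simplify the branching that prompted the suggestion.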

GYHHAHA · 2022-08-06T21:44:03Z

pandas/io/json/_normalize.py

+                for _ in range(repeat):
+                    out.append(item)
+        else:
+            out = np.array(v, dtype=object).repeat(lengths).tolist()


Here dtype=object is necessary because np.array(["a", np.nan]) would otherwise convert the missing value to the string "nan".
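The coercion mentioned here can be checked directly. A minimal sketch with the same two-element input:

```python
import numpy as np

# Letting NumPy infer the dtype turns everything into a string.
inferred = np.array(["a", np.nan])

# Forcing dtype=object keeps the original Python objects.
as_object = np.array(["a", np.nan], dtype=object)

print(inferred.dtype.kind, repr(inferred[1]))
print(as_object.dtype, repr(as_object[1]))
```

In the inferred array the missing value is no longer recognizable as missing, which is why the object dtype matters for string metadata.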

mroeschke · 2022-08-18

Is the tolist call needed as well?

GYHHAHA · 2022-08-18

It's not necessary, but dtype inference won't be triggered if we drop it, and I think type inference is useful here. If we'd rather keep the object dtype, I can change it. Any suggestion?

mroeschke · 2022-08-18

I was hoping this line could avoid both dtype=object and tolist, so that it doesn't trigger type inference and is more performant. Otherwise, both the if and else blocks convert to a list and are essentially the same.

GYHHAHA · 2022-08-18

How about the following solution? We only specify the object dtype when the values are strings, and only call tolist() when nested data is detected. Numeric dtypes stay performant, and both corner cases are handled.

    if v and isinstance(v[0], str):
        out = np.array(v, dtype=object)
    else:
        out = np.array(v)
    out = out.repeat(lengths, axis=0)
    if v and is_list_like(v[0]):
        out = out.tolist()

mroeschke · 2022-08-18

Actually, looking back at the original line, dtype=object was already there, and allowing dtype inference all the time might actually lead to more desirable behavior (more specific types).

Would it make sense to just combine both branches, then, and always return a list (without using np.array)? It would also be good to explore whether there's any performance hit.
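One way the combined branch could look, as a rough sketch only. `repeat_meta` is a hypothetical helper name, and this is not the code that was committed in this PR.

```python
from itertools import chain, repeat


def repeat_meta(values, lengths):
    """Repeat values[i] lengths[i] times and return a plain list.

    Hypothetical helper for illustration; not the code in this PR.
    """
    if len(values) != len(lengths):
        raise ValueError("values and lengths must have the same length")
    return list(
        chain.from_iterable(repeat(val, n) for val, n in zip(values, lengths))
    )


print(repeat_meta(["a", None], [2, 1]))    # scalar metadata
print(repeat_meta([[1, 2], []], [1, 2]))   # list-like and empty metadata
```

Repeated entries are references to the same object rather than copies, and whether this is slower than ndarray.repeat for large numeric metadata would still need to be measured (for example with timeit).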

github-actions · 2022-09-22T00:07:59Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

mroeschke · 2022-10-24T21:22:44Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

GYHHAHA added 4 commits July 13, 2022 14:59

fix repeat axis

039a659

add tests

1645550

Update v1.5.0.rst

6db3e6c

change tests dtype descriptions

49f2950

mroeschke reviewed Jul 13, 2022

mroeschke added this to the 1.5 milestone Jul 13, 2022

mroeschke added the IO JSON read_json, to_json, json_normalize label Jul 13, 2022

Merge branch 'main' into patch-1

b73d0f1

GYHHAHA requested a review from mroeschke July 25, 2022 04:05

drop np array

8eb1421

mroeschke reviewed Jul 27, 2022

GYHHAHA added 4 commits July 28, 2022 12:47

Update _normalize.py

de21683

Update test_normalize.py

5a3d8bd

Merge branch 'pandas-dev:main' into patch-1

a4ac5ec

fix dtype

cc00c3e

GYHHAHA commented Jul 29, 2022

mroeschke reviewed Jul 29, 2022

GYHHAHA and others added 2 commits July 30, 2022 12:13

Update pandas/io/json/_normalize.py

7efe709

fix list like. Co-authored-by: Matthew Roeschke

type ignore

9ad3db4

GYHHAHA added 5 commits July 30, 2022 12:20

fix format

23c090e

remove object type

b3d500f

remove astype

2779bc2

fix format

6252821

import is_list_like

34a3ace

WillAyd reviewed Aug 5, 2022

GYHHAHA added 2 commits August 6, 2022 16:30

Update _normalize.py

f2fea4e

Update test_normalize.py

06d9f34

GYHHAHA commented Aug 6, 2022

fix format

35a8494

Update _normalize.py

8135fea

GYHHAHA requested a review from mroeschke August 6, 2022 23:49

Merge branch 'main' into patch-1

fccfbba

mroeschke mentioned this pull request Aug 22, 2022

BUG: fixed json_normalize not working with list records #48194

Closed

5 tasks

mroeschke removed this from the 1.5 milestone Aug 22, 2022

github-actions bot added the Stale label Sep 22, 2022

mroeschke closed this Oct 24, 2022

Julian-J-S mentioned this pull request Oct 28, 2022

BUG: json_normalize cannot parse metadata fields list type #37782

Closed
