Linter fails in channel_dataframe() #360

Closed
epruesse opened this issue Nov 30, 2018 · 15 comments

@epruesse
Member

epruesse commented Nov 30, 2018

E.g. in bioconda/bioconda-recipes#12441, https://circleci.com/gh/bioconda/bioconda-recipes/35848?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

#!/bin/bash -eo pipefail
bioconda-utils lint recipes config.yml \
--loglevel debug --full-report \
--git-range master HEAD

Traceback (most recent call last):
  File "/home/circleci/project/miniconda/bin/bioconda-utils", line 11, in <module>
    sys.exit(main())
  File "/home/circleci/project/miniconda/lib/python3.6/site-packages/bioconda_utils/cli.py", line 657, in main
    bioconductor_skeleton, pypi_check, clean_cran_skeleton,
  File "/home/circleci/project/miniconda/lib/python3.6/site-packages/argh/dispatching.py", line 328, in dispatch_commands
    dispatch(parser, *args, **kwargs)
  File "/home/circleci/project/miniconda/lib/python3.6/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/home/circleci/project/miniconda/lib/python3.6/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/home/circleci/project/miniconda/lib/python3.6/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/home/circleci/project/miniconda/lib/python3.6/site-packages/bioconda_utils/cli.py", line 228, in lint
    df = linting.channel_dataframe(cache=cache)
  File "/home/circleci/project/miniconda/lib/python3.6/site-packages/bioconda_utils/linting.py", line 123, in channel_dataframe
    x = pd.DataFrame(repo)
  File "/home/circleci/project/miniconda/lib/python3.6/site-packages/pandas/core/frame.py", line 330, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/home/circleci/project/miniconda/lib/python3.6/site-packages/pandas/core/frame.py", line 461, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/home/circleci/project/miniconda/lib/python3.6/site-packages/pandas/core/frame.py", line 6163, in _arrays_to_mgr
    index = extract_index(arrays)
  File "/home/circleci/project/miniconda/lib/python3.6/site-packages/pandas/core/frame.py", line 6214, in extract_index
    raise ValueError('Mixing dicts with non-Series may lead to '
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
Exited with code 1
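
For reference, the ValueError at the bottom of this traceback can be reproduced without any channel data. The sketch below is a minimal illustration, not the actual repodata: the `repo` dict and its keys (`info`, `packages`, `removed`) are hypothetical stand-ins, and the assumption is that the real object mixes dict values with at least one list-like value at the top level. `try_dataframe` is a helper name introduced here.

```python
import pandas as pd

# Hypothetical stand-in for a repodata-like mapping (assumption, not real data):
# two dict values plus one list value at the top level.
repo = {
    "info": {"subdir": "linux-64"},
    "packages": {"pkg-1.0-0.tar.bz2": {"name": "pkg", "version": "1.0"}},
    "removed": ["old-0.1-0.tar.bz2"],
}


def try_dataframe(data):
    """Return the message of the ValueError raised by pd.DataFrame(data), or None."""
    try:
        pd.DataFrame(data)
    except ValueError as exc:
        return str(exc)
    return None


# Dict values and a list value together: pandas cannot decide on an index.
message = try_dataframe(repo)
print(message)

# Keeping only the dict-valued keys lets pandas align the columns by key.
only_dicts = {key: value for key, value in repo.items() if isinstance(value, dict)}
print(try_dataframe(only_dicts))
```

If the real `repo` behaves the same way, a single top-level value that is a list rather than a dict would be enough to make `pd.DataFrame(repo)` fail.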
@epruesse epruesse changed the title from "sporadic, bogus lint failure" to "Linter fails in channel_dataframe()" Nov 30, 2018
@epruesse
Member Author

ping @bioconda/core Help appreciated. I think this is now permanent.

@PertuyF

PertuyF commented Nov 30, 2018

Looks like it comes from conda-forge

I tried to debug this by reproducing the part of bioconda_utils.linting.channel_dataframe() that raises the error:

import pandas as pd
from bioconda_utils import utils

platforms = ['linux', 'osx']
channels = ['bioconda', 'conda-forge', 'defaults']

# Build a DataFrame from each channel/platform repodata and report
# the first channel per platform for which pandas raises.
for platform in platforms:
    for channel in channels:
        repo, noarch = utils.get_channel_repodata(channel, platform)
        try:
            df = pd.DataFrame(repo)
        except ValueError as e:
            print(f'Encountered error: {e} for channel: {channel}/{platform}')
            break

This prints Encountered error: Mixing dicts with non-Series may lead to ambiguous ordering. for channel: conda-forge/linux.

EDIT: Looks like osx is raising the error too now

Then the faulty json object can be dissected (see below)

Get a glimpse at the faulty JSON object

Fetch it with:

from bioconda_utils import utils
repo, noarch = utils.get_channel_repodata("conda-forge", "linux")

repo is the faulty object, and it's huge: it leads to a dataframe with 594592 columns when loaded with the approach below.
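
A quick way to narrow down which part of such an object trips pd.DataFrame is to summarize the type and size of every top-level value. This is only a sketch: `example_repo` and its keys are hypothetical stand-ins for the real `repo`, and `summarize_top_level` and `non_dict_keys` are helper names introduced here, not part of bioconda-utils.

```python
def summarize_top_level(data):
    """Map each top-level key to (type name, number of entries or None)."""
    summary = {}
    for key, value in data.items():
        size = len(value) if hasattr(value, "__len__") else None
        summary[key] = (type(value).__name__, size)
    return summary


def non_dict_keys(data):
    """Return the top-level keys whose values are not dicts."""
    return sorted(key for key, value in data.items() if not isinstance(value, dict))


# Hypothetical stand-in for the real repo object (assumption, not real data).
example_repo = {
    "info": {"subdir": "linux-64"},
    "packages": {"pkg-1.0-0.tar.bz2": {"name": "pkg", "version": "1.0"}},
    "removed": ["old-0.1-0.tar.bz2"],
}

for key, (type_name, size) in summarize_top_level(example_repo).items():
    print(f"{key}: {type_name} with {size} entries")

print("non-dict keys:", non_dict_keys(example_repo))
```

Running the same two helpers on the real `repo` would show whether any top-level key holds something other than a dict.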

Lead on a solution?

For the record, I tried to use the idea mentioned here to parse the json object and load it using pd.io.json.json_normalize(), which seems to work.

from ruamel.yaml import YAML

yaml = YAML(typ="safe")
data = yaml.load(str(repo))

pd.io.json.json_normalize(data)
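
The very wide result is what one would expect from flattening a single nested dict into one row: each leaf value becomes its own column. The pure-Python sketch below mimics that flattening only roughly and does not call pandas; `flatten` and `example` are hypothetical names and data introduced for illustration.

```python
def flatten(nested, prefix="", sep="."):
    """Flatten nested dicts into one {dotted.path: leaf value} mapping."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the dotted path.
            flat.update(flatten(value, path, sep))
        else:
            # Anything that is not a dict (including lists) is kept as one leaf.
            flat[path] = value
    return flat


# Hypothetical repodata-like input (assumption, not real data).
example = {
    "info": {"subdir": "linux-64"},
    "packages": {
        "a-1.0-0.tar.bz2": {"name": "a", "version": "1.0"},
        "b-2.0-0.tar.bz2": {"name": "b", "version": "2.0"},
    },
}

columns = flatten(example)
print(len(columns))
for name in columns:
    print(name)
```

With two fields per package, every additional package adds two more columns, so a channel with many packages ends up with a single very wide row rather than one row per package.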

@epruesse
Member Author

epruesse commented Dec 1, 2018

In case it "works again" soon, here's a pickle of the repo object:
broken_cf_linux.pkl.gz

@rob-p

rob-p commented Dec 1, 2018

So, is this problem fundamentally upstream? Is there anything individual packages can do to address it?

@bgruening
Member

I need to figure out if this is an upstream problem or if we need to adjust our build infrastructure.

@epruesse
Member Author

epruesse commented Dec 2, 2018

I think we should be resilient. Working on it ...
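
One possible shape of a more resilient loader, sketched under assumptions and not the change that eventually landed in bioconda-utils: read only a `packages` mapping, ignore every other top-level key, and build one row per package file. The key name `packages`, the field list, and the helper `package_rows` are all hypothetical.

```python
# Fields to extract per package (hypothetical selection for illustration).
FIELDS = ("name", "version", "build")


def package_rows(repodata, fields=FIELDS):
    """Return one dict per package file, tolerating unexpected top-level keys."""
    packages = repodata.get("packages", {})
    if not isinstance(packages, dict):
        return []
    rows = []
    for filename, meta in packages.items():
        if not isinstance(meta, dict):
            # Skip malformed entries instead of failing the whole run.
            continue
        row = {"fn": filename}
        for field in fields:
            row[field] = meta.get(field)
        rows.append(row)
    return rows


# Hypothetical repodata-like input (assumption, not real data).
example_repodata = {
    "info": {"subdir": "linux-64"},
    "packages": {
        "pkg-1.0-0.tar.bz2": {"name": "pkg", "version": "1.0", "build": "0"},
        "broken-entry": None,
    },
    "removed": ["old-0.1-0.tar.bz2"],
}

rows = package_rows(example_repodata)
print(rows)
```

A list of flat row dicts like this can be handed to a DataFrame constructor without depending on which other keys the channel index happens to contain.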

@k3yavi
Member

k3yavi commented Dec 2, 2018

Is there any update on this? It looks like a bioconda-wide outage for PRs. Is there a way to promote this issue to a higher priority?

@pcm32
Member

pcm32 commented Dec 3, 2018

The above PR for bioconda/bioconda-recipes#12347 was merged as tests passed, but then the uploads failed due to the issue mentioned on this PR.

@dpryan79
Contributor

dpryan79 commented Dec 3, 2018

You should be able to close this now.

@epruesse
Member Author

epruesse commented Dec 3, 2018

Thanks @dpryan79

@epruesse epruesse closed this as completed Dec 3, 2018
@k3yavi
Member

k3yavi commented Dec 3, 2018

Thanks, guys, for solving this!

@mshakya

mshakya commented Jan 22, 2019

I am still getting the same error when testing packages with a local circleci build. I might have missed it, but I don't see where or how it was resolved. Thanks

@dpryan79
Contributor

@mshakya Try updating bioconda-utils.

@epruesse
Member Author

@mshakya The entire piece of code was rewritten. Perhaps your circleci build is using an old image?

@mshakya

mshakya commented Jan 22, 2019

@epruesse That did the trick. I had the old image. I pulled the new image with docker pull bioconda/bioconda-utils-build-env. Thank you!
