Qiime Songbird Multinomial - 'Data Frame' Error #166

Closed
nagr4657 opened this issue Mar 22, 2023 · 14 comments

@nagr4657

Hi all,

I am running 'qiime songbird multinomial' (on Ubuntu 22.04.2 LTS with QIIME 2 2020.6) to eventually perform differential abundance testing. I have run into an issue where my input .qza data files sometimes work and sometimes do not. I am working on multiple projects with similar workflows, so my feature tables were generated in similar ways. I installed Songbird and started by successfully running the command on one of my datasets (FMT): #1

image

I then tried running the same function using a different .qza feature table (in a different directory - acetate) and received this data frame error: #2

image

After some troubleshooting, I figured this issue had to be due to my input .qza file, so as a test I re-ran my very first dataset to see if my original feature table (FMT) would still work. As you can see from the code below, I used the exact same files as in image #1, but this time, instead of generating my outputs, I received another error message: #3

image

Any ideas what might be causing this 'DataFrame' error? I tried using 'qiime dev refresh-cache' and that didn't seem to alleviate the problem.

Thank you!

@fedarko
Collaborator

fedarko commented Mar 23, 2023

Hi @nagr4657,

This is interesting. My guess is that, at some point between the first and second images, something about your QIIME 2 conda environment changed. This error is a symptom of an incompatible (i.e. "too new") pandas version being used, which is why I suspect the environment itself. Anecdotally, I've seen conda claim that one environment is active while still behaving, at least partly, as if a different one were.

If you could please provide the following data, that would help a lot with debugging this:

  • One of the /tmp/[...].log files that QIIME 2 gives you from these errors (ideally the one from running Songbird on the FMT dataset, but either should be ok)
  • The output of running which pip in your terminal
  • The output of running pip list | grep pandas
  • The output of running conda list | grep pandas
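
The same kind of information can also be gathered from inside Python, which helps when the shell and the interpreter disagree about which environment is active. The sketch below is only illustrative: the function names `describe_environment` and `prefix_mismatch` are made up for this example, and it assumes conda sets the `CONDA_PREFIX` variable when an environment is activated.

```python
import os
import shutil
import sys


def describe_environment():
    """Collect the paths that show which environment is really in use."""
    return {
        "python": sys.executable,                      # interpreter running this script
        "prefix": sys.prefix,                          # environment that interpreter belongs to
        "conda_prefix": os.environ.get("CONDA_PREFIX"),  # environment conda says is active
        "pip": shutil.which("pip"),                    # first pip found on PATH
    }


def prefix_mismatch(info):
    """True when conda reports one environment but Python runs from another."""
    conda_prefix = info["conda_prefix"]
    if conda_prefix is None:
        # No conda environment is active, so there is nothing to compare.
        return False
    return os.path.realpath(conda_prefix) != os.path.realpath(info["prefix"])


if __name__ == "__main__":
    info = describe_environment()
    for key, value in info.items():
        print(f"{key}: {value}")
    if prefix_mismatch(info):
        print("Warning: CONDA_PREFIX and sys.prefix point to different environments")
```

Running this with the same `python` that QIIME 2 uses, once in each project directory, would show whether anything differs between the two.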

Also, if you are able to reproduce the first "success" (where you are able to run songbird on the FMT dataset), could you try running songbird on the second (acetate) dataset without leaving the current directory? For example, something like

qiime songbird multinomial \
    --i-table ~/acetate2023/152656_filtered_table_acetateBL1800.qza \
    [other parameters go here...]

If this succeeds, then it proves that something is going wrong when you change directories into ~/acetate2023. If it fails, then something even stranger is going on.

Thanks!

@mortonjt
Collaborator

I'll also add that you may want to look into your input files more carefully (e.g. qiime tools peek FMT_BL_only_table.qza) to make sure that they are biom tables (not CSVs or tab-delimited tables).

But I agree with @fedarko it looks like something changed in your environment.

@nagr4657
Author

Hi @fedarko,

Thank you very much for your quick response! Here are the data that you requested:

-Attempting to run songbird again in the FMT2023 directory:
image

-Outputs of the requests within the Acetate2023 directory:
image

-Attempting to run songbird on the acetate feature table again but this time in the FMT2023 directory
image

Thank you very much,
Nathan

@mortonjt
Collaborator

mortonjt commented Mar 23, 2023

Hi @nagr4657, thank you for the update, but I don't think this is a pandas issue. My best guess is that your FMT_BL_only_table.qza is misformatted (songbird thinks that your input counts are in the biom format when they aren't).
Running the qiime tools peek FMT_BL_only_table.qza command will verify this.
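
For context, a .qza file is a zip archive whose top-level folder contains a metadata.yaml file with `uuid`, `type` and `format` entries, which is the information `qiime tools peek` reports. A rough sketch of reading it directly, assuming that layout: `peek_qza` is a hypothetical helper rather than part of QIIME 2, and the line-based parsing is a simplification of real YAML parsing.

```python
import zipfile


def peek_qza(path):
    """Return the key/value pairs from an artifact's top-level metadata.yaml."""
    with zipfile.ZipFile(path) as archive:
        # The archive holds a single folder named after the artifact's UUID;
        # the metadata file we want sits directly inside that folder.
        candidates = [
            name for name in archive.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if not candidates:
            raise ValueError(f"{path} has no top-level metadata.yaml")
        text = archive.read(candidates[0]).decode("utf-8")

    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip().strip("'\"")
    return info


if __name__ == "__main__":
    import sys

    for key, value in peek_qza(sys.argv[1]).items():
        print(f"{key}: {value}")
```

For a table that Songbird can use, the `type` entry would be expected to read FeatureTable[Frequency].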

@nagr4657
Author

Hi @mortonjt,

Thank you for the suggestion. I am curious why it worked originally, since I used the exact same '--i-table'.

Here is the output from running tools peek:
image

Any thoughts on what is causing this issue? I am using Qiita to generate the input file:
image

I tried using both the .qza and the .biom outputs from Qiita (screenshot above).

When I used the .qza file for songbird, the output was:
image

When I use the .biom file for songbird the output is:
image
image

I apologize if I am missing something rather basic that is contributing to this issue.

Thank you,
Nathan

@mortonjt
Collaborator

Hi @nagr4657, got it. I looked at your pandas version again and realized that it may be out of sync with the original songbird version. See this issue for another example: #128

Can you try downgrading to pandas=0.25 to see if you still get this issue?

@nagr4657
Author

Hi @mortonjt ,

Sorry for the delay - it took a really long time to downgrade to pandas=0.25. I think the process was successful, but given how long it took and how much output was printed, I am not quite sure:
image

However, when I tried to run songbird, I still got a 'DataFrame' error.
image

Any thoughts on what else I could try? Thanks!

@mortonjt
Collaborator

mortonjt commented Mar 23, 2023 via email

@nagr4657
Author

Hi @mortonjt

I am sorry for having to continue this discussion. I am not sure what I am doing wrong. Here is what I did:

  1. uninstalled songbird:
    image

  2. I reinstalled songbird in qiime using pip:
    image

  3. I tried rerunning songbird multinomial using the same feature table and metadata file:
    image

Should I try uninstalling a different way?

Best,
Nathan

@fedarko
Collaborator

fedarko commented Mar 24, 2023

Hi @nagr4657,

Thank you for your assistance in debugging this. I think I see the problem we are currently stuck on: if you have an incompatible pandas version (you have version 1.1.5 installed), then re-installing Songbird won't change the pandas version -- this is because Songbird's setup files say that it only requires a pandas version above a certain limit (#117), even though this is false. Sorry; this is a known problem with Songbird.

Fortunately, this means that fixing the problem should be simple. I'm not sure that the earlier attempt to downgrade the pandas version worked, since I still see a pandas version of 1.1.5 listed near the top of this screenshot. To downgrade pandas in this situation, I recommend running these two commands:

pip uninstall pandas
pip install "pandas==0.25.3"

After this, you can run pip list | grep pandas to check what version of pandas is installed, like before. If the pandas re-installation process worked, you should see something like pandas 0.25.3. At this point, you should be able to use Songbird again.
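
The same check can be made from the Python interpreter that QIIME 2 actually uses, which rules out `pip` and `python` pointing at different environments. This is a minimal sketch with invented helper names, and the cutoff is an assumption: the thread only reports 1.1.5 in the failing environment and 0.25.3 as the target version, so treating everything below 1.0 as acceptable is a simplification.

```python
def installed_pandas_version():
    """Return pandas' version string, or None if pandas cannot be imported."""
    try:
        import pandas
    except ImportError:
        return None
    return pandas.__version__


def looks_songbird_compatible(version):
    """Assumption for this sketch: any pandas release before 1.0 is acceptable."""
    major = int(version.split(".")[0])
    return major < 1


if __name__ == "__main__":
    version = installed_pandas_version()
    if version is None:
        print("pandas is not importable in this environment")
    elif looks_songbird_compatible(version):
        print(f"pandas {version}: looks fine for Songbird")
    else:
        print(f"pandas {version}: probably too new for Songbird")
```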

If the above steps don't work for some reason, then the nuclear approach is just re-installing QIIME 2 2020.6 (creating a new conda environment). You can do this by following the same installation instructions that I assume you used before, but now just using a different conda environment name (e.g. conda env create -n qiime2-2020.6-v2 [...] instead of conda env create -n qiime2-2020.6 [...]). I'm pretty sure this should fix the problem, but hopefully the less-nuclear approach detailed above should resolve this problem without requiring you to re-install QIIME 2 :)

Let us know how this goes!

@nagr4657
Author

nagr4657 commented Apr 5, 2023

Hi @fedarko and @mortonjt

Sorry for the delayed response. I tried uninstalling and reinstalling pandas as you suggested. I was successful in doing so, but this did not solve the issue with songbird. I then went with the more nuclear approach, and in the process of uninstalling and re-installing QIIME 2 2020.6 I hit quite a few hiccups that I was not able to solve until today. These hiccups may or may not have been related to the issues I was having previously (https://forum.qiime2.org/t/installation-of-qiime2-successful-but-cant-activate-due-to-inability-in-finding-conda-enviroment/26017/9).

Regardless, I was able to re-install qiime2-2020.6 and was able to successfully run songbird on both of my data sets today.

Thank you both very VERY much for your help troubleshooting!

Best,
Nathan

@fedarko
Collaborator

fedarko commented Apr 10, 2023

Thanks for letting us know, @nagr4657! Sorry for all the trouble; glad the issue is solved.

@fedarko closed this as completed Apr 10, 2023

@mestaki

mestaki commented Apr 22, 2023

Hey folks, I know this is closed and the culprit was determined to be an incompatibility with newer pandas, but I just wanted to share what I think is causing users to run into this issue. QIIME 2 2020.6 comes with pandas 0.25.3, which meets songbird's minimum requirement of 0.18, so installing songbird doesn't upgrade pandas. However, qurro, which is often installed alongside songbird, does have a "pandas >= 1" requirement, which forces the environment to upgrade pandas and breaks songbird in the process. Keeping qurro separate is the easiest solution here, assuming the q2 plugins for these are not going to be upgraded for newest versions?
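
One way to see which installed packages pull pandas in, and with what bounds, is to scan the requirement metadata of every installed distribution. This is a sketch under stated assumptions: the helper names are invented for this example, the name extraction is a rough split rather than a full requirement parser, and on the Python 3.6 used by QIIME 2 2020.6 the `importlib_metadata` backport would likely need to be installed first.

```python
import re

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport, assumed installed on older Pythons


def requirement_name(requirement):
    """Extract the package name from a string such as 'pandas (>=0.18.0)'."""
    return re.split(r"[\s;<>=!~(\[]", requirement.strip(), maxsplit=1)[0]


def packages_requiring(target="pandas"):
    """Map each installed package to the requirement strings it declares on target."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or "unknown"
        for requirement in dist.requires or []:
            if requirement_name(requirement).lower() == target.lower():
                found.setdefault(name, []).append(requirement)
    return found


if __name__ == "__main__":
    for package, requirements in sorted(packages_requiring("pandas").items()):
        print(f"{package}: {', '.join(requirements)}")
```

In an environment like the one described above, the listing should show songbird's and qurro's pandas bounds side by side, which makes the conflicting requirement easy to spot.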

@fedarko
Collaborator

fedarko commented Apr 22, 2023

Thanks @mestaki! That makes sense; I apologize for the trouble. A few months ago I updated Qurro to work with the pandas versions in newer QIIME 2 environments, and this update had the unfortunate effect of making Qurro not work with older QIIME 2 environments. (The silly thing is that, before this update, we had this same problem in reverse -- installing Qurro into new QIIME 2 environments would break those also ._.)

I think it might be possible to adjust Qurro's code to repeatedly detect which version of pandas is installed and do different things accordingly, but I don't have time to overhaul it that extensively now. The "ideal" solution would probably be updating Songbird to work with newer pandas versions, but as I understand it recent efforts have been focused more on BIRDMAn.

> Keeping qurro separate is the easiest solution here, assuming the q2 plugins for these are not going to be upgraded for newest versions?

Given an old QIIME 2 environment (e.g. 2020.6) into which Songbird has been installed, I think an even easier way to use Qurro is to install a slightly old version of it (v0.7.1) that expects older pandas versions:

Using pip: pip install "qurro==0.7.1"
Using conda: conda install -c conda-forge "qurro=0.7.1"

This way, we avoid the need to create a new separate conda environment just for Qurro. (Although doing that would also work.)

This slightly-old version of Qurro, v0.7.1, is basically the same as the latest version (v0.8.0) -- the main difference between the two was the adjustment in v0.8.0 to work with newer pandas versions.
