problem with context.py #85

Open
orlowskamga opened this issue Jan 3, 2023 · 9 comments

Comments

@orlowskamga

Hi!

I am running into the following issue with cblaster, and so are my lab colleagues. I tested versions 1.3.16 (installed from both pip and conda) and 1.3.17. For my first run I used your example bua.fasta, and this is the result:

`cblaster search -qf bua.fasta --session bua_session.json --binary bua_binary.txt --output bua_summary.txt --plot bua_plot.html

[11:41:14] INFO - Starting cblaster in remote mode
[11:41:14] INFO - Launching new search
[11:41:17] INFO - Request Identifier (RID): V81JCA6U01N
[11:41:17] INFO - Request Time Of Execution (RTOE): 15s
[11:41:32] INFO - Polling NCBI for completion status
[11:41:32] INFO - Checking search status...
[11:42:32] INFO - Checking search status...
[11:43:32] INFO - Checking search status...
[11:44:32] INFO - Checking search status...
[11:45:32] INFO - Checking search status...
[11:46:32] INFO - Checking search status...
[11:47:32] INFO - Checking search status...
[11:47:33] INFO - Search has completed successfully!
[11:47:33] INFO - Retrieving results for search V81JCA6U01N
[11:50:20] INFO - Parsing results...
[11:50:20] INFO - Found 26998 hits meeting score thresholds for remote search
[11:50:20] INFO - Fetching genomic context of hits
Traceback (most recent call last):
File "/home/users/morlowska/anaconda3/envs/cblaster/bin/cblaster", line 10, in <module>
sys.exit(main())
^^^^^^
File "/home/users/morlowska/anaconda3/envs/cblaster/lib/python3.11/site-packages/cblaster/main.py", line 432, in main
cblaster(
File "/home/users/morlowska/anaconda3/envs/cblaster/lib/python3.11/site-packages/cblaster/main.py", line 334, in cblaster
organisms = context.search(
^^^^^^^^^^^^^^^
File "/home/users/morlowska/anaconda3/envs/cblaster/lib/python3.11/site-packages/cblaster/context.py", line 592, in search
organisms = parse_IPG_table(rows, hits)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/users/morlowska/anaconda3/envs/cblaster/lib/python3.11/site-packages/cblaster/context.py", line 205, in parse_IPG_table
groups = parse_IP_groups(results)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/users/morlowska/anaconda3/envs/cblaster/lib/python3.11/site-packages/cblaster/context.py", line 142, in parse_IP_groups
entry = Entry(*fields)
^^^^^^^^^^^^^^
TypeError: Entry.__new__() missing 10 required positional arguments: 'source', 'scaffold', 'start', 'end', 'strand', 'protein_id', 'protein_name', 'organism', 'strain', and 'assembly'`

I would be grateful for any suggestions!

@orlowskamga
Author

We just got it working by adding a check (`if fields == [] or not line.strip(): continue`) to parse_IP_groups in context.py (lines 108-150):

def parse_IP_groups(results):
    """Parse groups from an Identical Protein Groups (IPG) table.

    This function converts rows in the IPG table to namedtuple objects which have
    attributes corresponding to each field in the table. These objects are grouped
    by the IPG they belong to.

    Args:
        results (list): Rows in the IPG table.
    Returns:
        Dictionary of table entries (namedtuple objects) grouped by IPG.
    """
    fields = [
        "source",
        "scaffold",
        "start",
        "end",
        "strand",
        "protein_id",
        "protein_name",
        "organism",
        "strain",
        "assembly",
    ]
    Entry = namedtuple("Entry", fields)
    groups = defaultdict(list)
    for line in results:
        if not line \
            or line.startswith("Id\tSource") \
            or line.isspace() \
            or "skipping" in line:
            continue
        print(line)
        ipg, *fields = line.strip("\n").split("\t")
        if fields == [] or not line.strip():
            continue
        try:
            entry = Entry(*fields)
        except (TypeError, ValueError):
            # A wrong number of fields raises TypeError, not ValueError
            LOG.warning("Failed to parse row in IPG table: %s", fields)
            continue
        groups[ipg].append(entry)
    return groups
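
For anyone who wants to try the guard outside cblaster, here is a minimal standalone sketch of the same idea. The function name `parse_ip_groups_sketch`, the logger name, and the sample rows (including the accessions) are made up for illustration and are not part of cblaster. It catches `TypeError` because that is what a namedtuple constructor raises when it receives the wrong number of fields.

```python
import logging
from collections import defaultdict, namedtuple

LOG = logging.getLogger("ipg_sketch")  # hypothetical logger, not cblaster's

FIELDS = [
    "source", "scaffold", "start", "end", "strand",
    "protein_id", "protein_name", "organism", "strain", "assembly",
]
Entry = namedtuple("Entry", FIELDS)


def parse_ip_groups_sketch(results):
    """Group IPG table rows by IPG id, skipping empty or malformed rows."""
    groups = defaultdict(list)
    for line in results:
        # Skip blank lines, the header row, and "skipping" notices
        if (
            not line
            or line.isspace()
            or line.startswith("Id\tSource")
            or "skipping" in line
        ):
            continue
        ipg, *fields = line.strip("\n").split("\t")
        if not fields:
            # Row held an id but no other columns
            continue
        try:
            entry = Entry(*fields)
        except TypeError:
            # Wrong number of columns for the namedtuple
            LOG.warning("Failed to parse row in IPG table: %s", fields)
            continue
        groups[ipg].append(entry)
    return groups


# Made-up rows: header, blank line, id-only row, one valid row, one short row
rows = [
    "Id\tSource\tNucleotide Accession",
    "",
    "12345",
    "1\tRefSeq\tNC_000001.1\t10\t200\t+\tWP_000001.1\t"
    "hypothetical protein\tExample organism\tX1\tGCF_000000001.1",
    "2\tRefSeq\ttoo\tfew",
]
groups = parse_ip_groups_sketch(rows)
print(list(groups))  # only the valid row survives
```

Only the one well-formed row ends up in `groups`; the id-only row is skipped silently and the short row is logged as a warning.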

@orlowskamga
Author

orlowskamga commented Jan 4, 2023

New day, new issue. This time the problem is with my own query. By adding a print statement to context.py I can see that cblaster generates results, but the output files (binary, summary and plot) are empty.
npun_ses.zip

@galacmr

galacmr commented Jan 4, 2023

Hi, I ran into your original problem yesterday too, with v1.3.17, but it did not recur when I restarted the search. Today, though, I am hitting the same problem you are: the results files are empty.

@gamcil
Owner

gamcil commented Jan 5, 2023

Thanks for that change @orlowskamga, errors there aren't handled well so I'll add that in the next release. I think the IPG efetch request has also been inconsistent lately, returning empty tables for queries that definitely should not have been empty.

For the other issue, did you specify the --min_hits or --unique arguments? By default they are set to 3, so searches with 2 queries will return no results (I should change this too).
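
To illustrate why a threshold of 3 leaves nothing for a 2-query search, here is a rough sketch. It is not cblaster's actual filtering code: `cluster_passes` is a hypothetical helper, and it assumes `min_hits` counts the hits in a cluster while `unique` counts the distinct query sequences those hits came from.

```python
def cluster_passes(hit_queries, min_hits=3, unique=3):
    """Return True if a candidate cluster meets both thresholds.

    hit_queries: one query name per hit in the cluster (hypothetical input).
    """
    enough_hits = len(hit_queries) >= min_hits
    enough_unique = len(set(hit_queries)) >= unique
    return enough_hits and enough_unique


# A search with only two query sequences can never reach 3 unique queries
two_query_cluster = ["queryA", "queryB"]
print(cluster_passes(two_query_cluster))                        # False
print(cluster_passes(two_query_cluster, min_hits=2, unique=2))  # True
```

Under those assumptions, lowering both options (the -mh and -u flags) to 2 is what lets a 2-query cluster through.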

@orlowskamga
Copy link
Author

orlowskamga commented Jan 5, 2023

I tried running cblaster with the GUI on my local computer (v1.3.16) instead of the command-line version on the server, and it works! As you said, the GUI forced me to set the -mh and -u options, and now everything works perfectly. On the server, however, I still get empty results, even with modified -mh and -u options.
On my local machine it works fine. And now that I finally have my results I can say: I'm just in love with cblaster. Great work!

Well, it worked once; now I get the same empty results on my local machine too ;_;

@galacmr

galacmr commented Jan 5, 2023

I am also still getting empty results files, specifically the html and csv outputs. I am running v1.3.17 (freshly installed) locally on my machine. My search has 8 proteins, and I tried specifying -mh and -u, but the results files still come up empty. The json file does contain results, so I tried generating the csv and html outputs from it, but they are still not populated.

@gamcil
Owner

gamcil commented Jan 6, 2023

Unfortunately I can't reproduce this issue at the moment; all of the outputs for the searches I am testing with are working... I think it might be the intermittent NCBI issue. I'll have to add a warning during the genomic context search if nothing gets returned.
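
A warning of that kind might look roughly like the sketch below. This is only an illustration under assumptions: `has_ipg_rows` and the logger name are hypothetical, and the check simply treats a table with nothing but blank lines or the `Id\tSource` header as empty.

```python
import logging

LOG = logging.getLogger("context_sketch")  # hypothetical logger name


def has_ipg_rows(rows):
    """Return True if the IPG table has at least one data row, else warn."""
    data_rows = [
        row for row in rows
        if row.strip() and not row.startswith("Id\tSource")
    ]
    if not data_rows:
        LOG.warning(
            "IPG table came back empty; this may be a temporary NCBI issue, "
            "so consider re-running the search later"
        )
        return False
    return True


print(has_ipg_rows(["Id\tSource\tNucleotide Accession", "\n"]))  # False
print(has_ipg_rows(["1\tRefSeq\tNC_000001.1"]))                  # True
```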

@galacmr

galacmr commented Jan 6, 2023

Today it ran fine for me so it does look like NCBI was likely the cause of the problem.

@orlowskamga
Author

I tested it yesterday with a local database and everything worked fine, so +1 for the issue being with NCBI.
