small code change for use of the cdd2cog.pl script with COG20 #14

dspeth · 2021-07-01T12:09:37Z

Hi!

Not really an issue, but I'm putting this here in case the original author doesn't have time to modify the cdd2cog.pl script for use with the updated COG20 database. I guess this would be more properly done via a pull request, but I figure more people might see an open issue. I also realise this is a "band-aid" style fix, but it might be useful to someone nonetheless.

If you're interested in using the cdd2cog.pl script with the COG20 database (available at https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/), only a small change is required. The body of the script works as is, but the information previously parsed from the fun.txt and whog files now has to be read from files with slightly different names and formats.

fun.txt is now replaced by fun-20.tab
whog can be replaced by cog-20.def.tab
both of these files can be downloaded from the link above
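
To illustrate the format difference, here is a minimal sketch that parses one line in each style. Both sample lines are made up for illustration (they are assumptions, not copied from the real files); the old-style regex is the one from the original script, and the new-style split assumes the category letter is in the first tab-separated column and the description in the third, as in the modified subroutine below.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustrative sample lines (assumed, not copied from the real files).
my $old_line = " [J] Translation, ribosomal structure and biogenesis ";
my $new_line = "J\tFCCCFC\tTranslation, ribosomal structure and biogenesis";

# Old fun.txt style: trim surrounding whitespace, then match "[X] description".
(my $trimmed = $old_line) =~ s/^\s+|\s+$//g;
my ($old_cat, $old_desc) = $trimmed =~ /^\[(\w)\]\s*(.+)$/;

# New fun-20.tab style: split at the tabs; field 0 is the category letter,
# field 2 is the description (field 1 is ignored).
my @fields = split /\t/, $new_line;
my ($new_cat, $new_desc) = @fields[0, 2];

print "old: $old_cat => $old_desc\n";
print "new: $new_cat => $new_desc\n";
```

Both styles end up with the same category/description pair, which is why the rest of the script does not need to change.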

To retrieve the relevant info from these files, the subroutine at the very end of cdd2cog.pl needs to be modified. For clarity and ease of copy/pasting, I have pasted the entire subroutine here. The original code is still in place, but commented out. The 4 added lines are under the comments "# code to parse fun-20.tab file" and "# code to parse cog-20.def.tab".

after the modifications are made, the script can be run using:
cdd2cog.pl -r rps-blast.out -c cddid.tbl -f fun-20.tab -w cog-20.def.tab

Hope this is helpful!

###############
# Subroutines #
###############

### Subroutine to parse the 'cddid.tbl', 'fun' and 'whog' file contents and store in hash structures
sub parse_cdd_cog {

    ### 'cddid.tbl'
    open (my $cddid_fh, "<", $CDDid_File) or die "Cannot open '$CDDid_File': $!\n"; # stop early if the file is missing
    print "\nParsing CDDs '$CDDid_File' file ...\n"; # status message
    while (<$cddid_fh>) {
        chomp;
        my @line = split(/\t/, $_); # split line at the tabs
        if ($line[1] =~ /^COG\d{4}$/) { # search for COG CD accessions in cddid
            $CDDid{$line[0]} = $line[1]; # hash to store info; $line[0] = PSSM-Id
        }
    }
    close $cddid_fh;

    ### 'fun.txt'
    open (my $fun_fh, "<", $Fun_File) or die "Cannot open '$Fun_File': $!\n"; # stop early if the file is missing
    print "Parsing COGs '$Fun_File' file ...\n"; # status message
    while (<$fun_fh>) {
        chomp;

        # code to parse fun-20.tab file
        my @line = split(/\t/, $_); # split line at the tabs
        $Fun{$line[0]} = {'desc' => $line[2], 'count' => 0}; # anonymous hash in hash
        # $line[0] = single-letter functional category, $line[2] = description of functional category
        # ($line[1], the second column of fun-20.tab, is not used here)
        # count used to find functional categories not present in the query proteins for final overall assignment statistics

        # code and comments to parse original fun.txt file
        #$_ =~ s/^\s*|\s+$//g; # get rid of all leading and trailing whitespaces
        #if (/^\[(\w)\]\s*(.+)$/) {
        #    $Fun{$1} = {'desc' => $2, 'count' => 0}; # anonymous hash in hash
        #    # $1 = single-letter functional category, $2 = description of functional category
        #    # count used to find functional categories not present in the query proteins for final overall assignment statistics
        #}
    }
    close $fun_fh;

    ### 'whog'
    open (my $whog_fh, "<", $Whog_File) or die "Cannot open '$Whog_File': $!\n"; # stop early if the file is missing
    print "Parsing COGs '$Whog_File' file ...\n"; # status message
    while (<$whog_fh>) {
        chomp;

        # code to parse cog-20.def.tab
        my @line = split(/\t/, $_); # split line at the tabs
        $Whog{$line[0]} = {'function' => $line[1], 'desc' => $line[2]}; # anonymous hash in hash
        # $line[1] = single-letter functional categories, maximal four per COG in COG20 (COG5032 no longer exists)
        # $line[0] = COG#, $line[2] = COG protein description

        # code and comments to parse the original whog file
        #$_ =~ s/^\s*|\s+$//g; # get rid of all leading and trailing whitespaces
        #if (/^\[(\w+)\]\s*(COG\d{4})\s+(.+)$/) {
        #    $Whog{$2} = {'function' => $1, 'desc' => $3}; # anonymous hash in hash
        #    # $1 = single-letter functional categories, maximal five per COG (only COG5032 with five)
        #    # $2 = COG#, $3 = COG protein description
        #}
    }
    close $whog_fh;

    return 1;
}
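
If you want something a bit more defensive than the band-aid above, the following is a rough sketch of how the cog-20.def.tab parsing could skip blank or short lines instead of storing undefined values. The helper name parse_cog_def is hypothetical and not part of cdd2cog.pl, and the sample records are made up and held in memory so the snippet runs without the real file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of cdd2cog.pl): parse cog-20.def.tab-style
# lines from a filehandle, skipping lines with fewer than three fields.
sub parse_cog_def {
    my ($fh) = @_;
    my %whog;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;            # tolerate Windows line endings
        my @f = split /\t/, $line;   # split line at the tabs
        next if @f < 3;              # blank or malformed line
        $whog{$f[0]} = { function => $f[1], desc => $f[2] };
    }
    return \%whog;
}

# Made-up sample data held in memory, used here instead of the real file.
my $sample = "COG0001\tH\tGlutamate-1-semialdehyde aminotransferase\n"
           . "\n"
           . "COG0002\tE\tN-acetyl-gamma-glutamylphosphate reductase\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
my $whog = parse_cog_def($fh);
close $fh;

printf "%d COGs parsed; COG0001 is in category %s\n",
    scalar(keys %$whog), $whog->{COG0001}{function};
```

Returning a hash reference keeps the sketch independent of the script's global %Whog hash; inside cdd2cog.pl the same check (next if @line < 3) could simply be added to the existing loop.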
