Hi! This is not really an issue, but I am putting it here in case the original author doesn't have time to modify the cdd2cog.pl script for use with the updated COG20 database. This would be more properly done via a pull request, but I figure more people might see an open issue. I also realise this is a "band-aid" style fix, but it might be useful to someone nonetheless.
If you're interested in using the cdd2cog.pl script with the COG20 database (available at https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/), only a small change is required. The body of the script works as is, but the information previously parsed from the fun.txt and whog files now has to be parsed from files with slightly different names and formats.
fun.txt is now replaced by fun-20.tab, and whog can be replaced by cog-20.def.tab.
Both of these files can be downloaded from the link above.
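To illustrate why the format change matters, here is a minimal sketch contrasting the two parsing styles. The sample lines are illustrative and not copied from the real files, the colour value is a placeholder, and the column layout of fun-20.tab (category letter first, description third) is an assumption taken from the modified subroutine in this post.

```perl
use strict;
use warnings;

# Illustrative sample lines (assumptions, not copied from the real files).
my $old_line = "[J] Translation, ribosomal structure and biogenesis";        # fun.txt style
my $new_line = "J\tFFFFFF\tTranslation, ribosomal structure and biogenesis"; # fun-20.tab style, placeholder colour

my %Fun;

# Old format: category letter in square brackets, followed by the description.
if ($old_line =~ /^\[(\w)\]\s*(.+)$/) {
    $Fun{old}{$1} = $2;
}

# New format: tab-separated columns; the description is assumed to be column three.
my @cols = split(/\t/, $new_line);
$Fun{new}{$cols[0]} = $cols[2];

# The old regex does not match a tab-delimited line, so the old parser stores nothing for it.
my $regex_matches_new = ($new_line =~ /^\[(\w)\]\s*(.+)$/) ? 1 : 0;

print "old: J => $Fun{old}{J}\n";
print "new: J => $Fun{new}{J}\n";
print "old regex matches new line: $regex_matches_new\n";
```

Both styles end up with the same letter-to-description mapping; only the way the line is taken apart differs.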
To retrieve the relevant information from these files, the subroutine at the very end of cdd2cog.pl needs to be modified. For clarity and ease of copy/pasting, I have pasted the entire subroutine here. The original code is still in place, but commented out. The four added lines are under the comments "# code to parse fun-20.tab file" and "# code to parse cog-20.def.tab".
After the modifications are made, the script can be run using:
cdd2cog.pl -r rps-blast.out -c cddid.tbl -f fun-20.tab -w cog-20.def.tab
Hope this is helpful!
###############
# Subroutines #
###############

### Subroutine to parse the 'cddid.tbl', 'fun' and 'whog' file contents and store in hash structures
sub parse_cdd_cog {

    ### 'cddid.tbl'
    open (my $cddid_fh, "<", "$CDDid_File") or die "Cannot open '$CDDid_File': $!\n";
    print "\nParsing CDDs '$CDDid_File' file ...\n"; # status message
    while (<$cddid_fh>) {
        chomp;
        my @line = split(/\t/, $_); # split line at the tabs
        if ($line[1] =~ /^COG\d{4}$/) { # search for COG CD accessions in cddid
            $CDDid{$line[0]} = $line[1]; # hash to store info; $line[0] = PSSM-Id
        }
    }
    close $cddid_fh;

    ### 'fun.txt' (now 'fun-20.tab')
    open (my $fun_fh, "<", "$Fun_File") or die "Cannot open '$Fun_File': $!\n";
    print "Parsing COGs '$Fun_File' file ...\n"; # status message
    while (<$fun_fh>) {
        chomp;

        # code to parse fun-20.tab file
        my @line = split(/\t/, $_); # split line at the tabs
        $Fun{$line[0]} = {'desc' => $line[2], 'count' => 0}; # anonymous hash in hash
        # $line[0] = single-letter functional category, $line[2] = description of functional category
        # count used to find functional categories not present in the query proteins for final overall assignment statistics

        # code and comments to parse original fun.txt file
        #$_ =~ s/^\s*|\s+$//g; # get rid of all leading and trailing whitespaces
        #if (/^\[(\w)\]\s*(.+)$/) {
        #    $Fun{$1} = {'desc' => $2, 'count' => 0}; # anonymous hash in hash
        #    # $1 = single-letter functional category, $2 = description of functional category
        #    # count used to find functional categories not present in the query proteins for final overall assignment statistics
        #}
    }
    close $fun_fh;

    ### 'whog' (now 'cog-20.def.tab')
    open (my $whog_fh, "<", "$Whog_File") or die "Cannot open '$Whog_File': $!\n";
    print "Parsing COGs '$Whog_File' file ...\n"; # status message
    while (<$whog_fh>) {
        chomp;

        # code to parse cog-20.def.tab
        my @line = split(/\t/, $_); # split line at the tabs
        $Whog{$line[0]} = {'function' => $line[1], 'desc' => $line[2]}; # anonymous hash in hash
        # $line[1] = single-letter functional categories, maximal four per COG in COG20 (COG5032 no longer exists)
        # $line[0] = COG#, $line[2] = COG protein description

        # code and comments to parse the original whog file
        #$_ =~ s/^\s*|\s+$//g; # get rid of all leading and trailing whitespaces
        #if (/^\[(\w+)\]\s*(COG\d{4})\s+(.+)$/) {
        #    $Whog{$2} = {'function' => $1, 'desc' => $3}; # anonymous hash in hash
        #    # $1 = single-letter functional categories, maximal five per COG (only COG5032 with five)
        #    # $2 = COG#, $3 = COG protein description
        #}
    }
    close $whog_fh;

    return 1;
}
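As a rough illustration of how the 'function' and 'count' fields fit together, the standalone sketch below parses a few made-up rows laid out as COG ID, category letters, description (the first three columns the subroutine reads from cog-20.def.tab) and tallies the category letters. The sample rows, the three-entry %Fun hash, and the per-COG counting are all assumptions for demonstration; this is not the counting code cdd2cog.pl itself uses, which works on the query proteins.

```perl
use strict;
use warnings;

# Hypothetical sample rows: COG ID, category letters, description (tab-separated).
my @def_lines = (
    "COG0001\tH\tExample protein one",
    "COG0002\tEH\tExample protein two",
    "",                                   # blank line, should be skipped
);

# Hypothetical category table, in the same shape as %Fun in the subroutine.
my %Fun = map { $_ => { desc => "category $_", count => 0 } } qw(E H J);
my %Whog;

for my $row (@def_lines) {
    my @f = split(/\t/, $row);
    next if @f < 3;                       # skip blank or truncated rows
    $Whog{$f[0]} = { function => $f[1], desc => $f[2] };
}

# A COG may carry several category letters; count each letter separately.
for my $cog (sort keys %Whog) {
    for my $letter (split(//, $Whog{$cog}{function})) {
        $Fun{$letter}{count}++ if exists $Fun{$letter};
    }
}

# Categories that never occurred keep a count of zero.
my @unused = grep { $Fun{$_}{count} == 0 } sort keys %Fun;

print "$_: $Fun{$_}{count}\n" for sort keys %Fun;
print "unused categories: @unused\n";
```

The field-count guard is an optional extra: the subroutine above stores whatever split returns, so a blank or truncated line would create an entry with undefined fields.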