Releases: vanheeringen-lab/gimmemotifs
Version 0.18.0
[0.18.0] - 2023-01-11
Added
- gimme scan and gimme maelstrom now accept a random seed for (most) operations
  - for (optimal) deterministic behaviour, delete the cache and then run the command with a seed
- Scanner now accepts a np.random.RandomState and progress on init. progress=None (the default) should print progress bars to the command line only, not to file.
- Scanner.set_genome now accepts the optional argument genomes_dir
- gimmemotifs.maelstrom.Moap.create now accepts a np.random.RandomState.
- gimmemotifs.maelstrom.run_maelstrom now accepts a np.random.RandomState.
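As a rough illustration of why passing a seeded np.random.RandomState makes a run repeatable, the sketch below uses only NumPy. The helper sample_background is hypothetical and not part of the GimmeMotifs API; it stands in for any step that draws random numbers.

```python
import numpy as np


def sample_background(n_regions, n_pick, random_state=None):
    """Hypothetical helper: pick n_pick distinct region indices out of n_regions."""
    # Use the caller's RandomState if given, otherwise create an unseeded one.
    rng = random_state if random_state is not None else np.random.RandomState()
    return rng.choice(n_regions, size=n_pick, replace=False)


# Two generators built from the same seed produce the same picks.
first = sample_background(1000, 5, np.random.RandomState(42))
second = sample_background(1000, 5, np.random.RandomState(42))
print((first == second).all())
```

Cached results computed before the seed was set would not be affected by it, which is presumably why the notes above recommend deleting the cache first.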
Changed
- gimme diff (diff_plot() to be exact) will now print to stdout, like all other functions
- now using the logger instead of print/sys.stderr.write in many more places
- string formatting now (mostly) done with f-strings
- refactored Fasta class
- split scanner.py into 3 submodules:
  - scanner/__init__.py with the exported functions
  - scanner/base.py with the Scanner class
  - scanner/utils.py with the rest
- gimmemotifs/maelstrom.py renamed to gimmemotifs/maelstrom/__init__.py
- rank.py and moap.py are now submodules of maelstrom.
Fixed
- gimme maelstrom works with or without xgboost (but will give a warning without xgboost)
- fixed warning "in validate_matrix(): Row sums in df are not close to 1. Reormalizing rows..."
- fixed multiprocess.Pool Warnings
- fixed a pandas copywarning (in gc_bin_bedfile() to be exact)
- fixed warnings when leaving files open
- fixed deprecation warning in maelstrom (and in tests)
- fixed futurewarning in report.py
- silence warnings from external tools in motif prediction (pp_predict_motifs() to be exact)
- updated last references from Motif.pwm_scan and Motif.pwm_scan_all to Motif.scan and Motif.scan_all respectively
- typo in gimme motifs output ("%matches background" to "% matches background")
- Scanner now uses a cheaper method to determine a genome's identity
  - (filesize + name instead of the md5sum of the whole genome's contents)
- gimme motifs gives an informative error when fraction is not within 0-1.
- gimme threshold works again
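To make the trade-off behind the cheaper genome identity concrete, here is a minimal sketch that contrasts a name-plus-size fingerprint with a full md5 of the contents. Both function names are hypothetical and the code is not taken from Scanner; it only shows the general idea.

```python
import hashlib
import os
import tempfile


def cheap_identity(path):
    """Hypothetical fingerprint: file name plus size in bytes (no file read)."""
    return f"{os.path.basename(path)}:{os.path.getsize(path)}"


def md5_identity(path, chunk_size=1 << 20):
    """Full-content md5; has to read every byte of the file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    genome = os.path.join(tmp, "toy.fa")
    with open(genome, "wb") as handle:
        handle.write(b">chr1\nACGTACGT\n")  # 15 bytes
    cheap = cheap_identity(genome)
    full = md5_identity(genome)

print(cheap)
```

The cheap fingerprint costs a single stat call, but two different files with the same name and size would be treated as the same genome.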
Removed
- removed old python2 code (scanning with MOODS & import shenanigans)
Version 0.17.2
[0.17.2] - 2022-10-12
Changed
- made xgboost an optional dependency (to save space on bioconda)
- an existing config will now update available tools when accessed (e4b3275)
- applied the bioconda patch to compile_externals.py (11b0c2c)
- coverage_table and combine_peaks have their positional arguments under positional arguments (20819ee)
- coverage_table should be slightly faster now (20819ee)
Fixed
- biofluff dependency back in requirements
- pinned conda and mamba versions in .travis.yaml
  - temp fix until conda>=4.12 can install mamba properly
- documentation is working again!
- gimmemotifs now supports pandas >=1.30
Removed
- pyarrow dependency
Version 0.17.1
Changelog
[0.17.1] - 2022-06-02
Fixed
- motifs are required to have unique ids when clustering, thanks @akmorrow13!
- motif2factors removes apostrophes so it won't crash :)
- removed a print
Version 0.17.0
Changelog
[0.17.0] - 2021-12-22
Added
- Added --genomes_dir argument to gimme motif2factors.
- Added --version flag.
- Function sample() for fast sequence sampling from a Motif() instance.
- Added JASPAR 2022 motif databases.
- Updated Homer motif database.
- Operators:
  - + : take the combination of two motifs (average), based on pfm, which means that motifs with higher counts will be weighed more heavily.
  - & : take the combination of two motifs (average), based on the ppm, which means that both motifs will be weighed equally.
  - << : "shift" motif left (adding a non-informative position to the right side)
  - >> : "shift" motif right (adding a non-informative position to the left side)
  - ~ : reverse complement
  - * : multiply the pfm by a value
- Progress bar for scanning.
- list_installed_libraries() to list available motif libraries.
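The difference between the pfm-based and ppm-based combination can be sketched with plain lists. This is one plausible reading of the operator descriptions above, not the Motif() implementation: it assumes both motifs have the same length, are already aligned, and store columns in A, C, G, T order. All function names are hypothetical.

```python
def to_ppm(pfm):
    """Normalise each position (row) of a count matrix to probabilities."""
    return [[count / sum(row) for count in row] for row in pfm]


def combine_pfm(pfm_a, pfm_b):
    """Like '+': add counts, so the motif with more counts dominates."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pfm_a, pfm_b)]


def combine_ppm(pfm_a, pfm_b):
    """Like '&': average the probabilities, so both motifs count equally."""
    ppm_a, ppm_b = to_ppm(pfm_a), to_ppm(pfm_b)
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(ppm_a, ppm_b)]


def reverse_complement(pfm):
    """Like '~': reverse the positions and swap A<->T and C<->G."""
    return [row[::-1] for row in pfm[::-1]]


strong = [[8, 0, 0, 0]]  # built from 8 sequences, all A
weak = [[0, 2, 0, 0]]    # built from 2 sequences, all C

by_counts = to_ppm(combine_pfm(strong, weak))  # [[0.8, 0.2, 0.0, 0.0]]
by_probs = combine_ppm(strong, weak)           # [[0.5, 0.5, 0.0, 0.0]]
flipped = reverse_complement([[1, 2, 3, 4], [5, 6, 7, 8]])
```

With counts, the 8-sequence motif contributes 80% of the combined position; with probabilities, each motif contributes half regardless of how many sequences it was built from.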
Changed
- Motif() class completely restructured:
  - Split into multiple files with coherent function.
  - Uses numpy.array internally.
  - All functions that mention pwm renamed to ppm (position-probability matrix), as the definition of a PWM is usually a log-odds matrix, not a probability matrix.
    - to_pwm() is deprecated, use to_ppm() instead.
  - Changed functions pwm_min_score() and pwm_max_score() to properties min_score and max_score.
  - All internal data is correctly updated when Motif() is changed, for instance by trimming (#218).
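The distinction between a position-probability matrix and a log-odds PWM, which motivates the rename above, can be illustrated as follows. This is a minimal sketch, not the library's scoring code: the uniform background of 0.25, the pseudocount, and all function names are assumptions chosen for the example.

```python
import math


def ppm_to_log_odds(ppm, background=0.25, pseudocount=1e-3):
    """Convert probabilities to log2 odds against a uniform background."""
    pwm = []
    for row in ppm:
        # Smooth so that zero probabilities do not produce log(0).
        smoothed = [(p + pseudocount) / (1 + 4 * pseudocount) for p in row]
        pwm.append([math.log2(p / background) for p in smoothed])
    return pwm


def max_score(pwm):
    """Best achievable score: the highest log-odds value at every position."""
    return sum(max(row) for row in pwm)


def min_score(pwm):
    """Worst achievable score: the lowest log-odds value at every position."""
    return sum(min(row) for row in pwm)


ppm = [
    [0.97, 0.01, 0.01, 0.01],  # strongly prefers A
    [0.25, 0.25, 0.25, 0.25],  # non-informative position
]
pwm = ppm_to_log_odds(ppm)
print(max_score(pwm), min_score(pwm))
```

A non-informative position contributes (approximately) zero to either score, while an informative position rewards the preferred base and penalises the others.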
Fixed
- gimme motif2factors can now unzip genome fastas.
- gimme motif2factors will sanitize genome names.
- Fixed bugs related to partial rerun of gimme motif2factors.
- Fixed unhandled OSError during installation on Mac.
- Fixed bug related to RFE() (#226).
- Positional probability matrix now sums to 1 over all positions (#209).
- Fixed issue with pandas >= 1.3.
- Fixed issue with non_reducing_slice import from pandas.
- Fix threshold calculation if more than 20,000 sequences are supplied.
- Fix issue with config file getting corrupted.
- Fix FPR threshold calculation.
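For readers unfamiliar with FPR thresholds, the general idea is to pick a score cutoff such that only a chosen fraction of background sequences score above it. The sketch below shows that idea under simple assumptions (one score per background sequence, strict "greater than" comparison); the function name is hypothetical and this is not the corrected GimmeMotifs code.

```python
import math


def fpr_threshold(background_scores, fpr):
    """Return a cutoff that at most a fraction `fpr` of background scores exceed."""
    if not 0 < fpr < 1:
        raise ValueError("fpr must be between 0 and 1 (exclusive)")
    if not background_scores:
        raise ValueError("need at least one background score")
    ordered = sorted(background_scores)
    # Number of background scores that are allowed to exceed the cutoff.
    allowed = math.floor(fpr * len(ordered))
    return ordered[len(ordered) - allowed - 1]


scores = list(range(1, 101))  # toy background scores 1..100
threshold = fpr_threshold(scores, 0.05)
above = sum(score > threshold for score in scores)
print(threshold, above)
```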
Version 0.16.1
[0.16.1] - 2021-06-28
Bugfix release.
Added
- Added warning when the number of sequences used for de novo motif prediction is low.
Fixed
- Fixed bug with gimme motif2factors.
- Fixed "Motif does not occur in motif database when running maelstrom" (#192).
- Fixed bugs related to runs where no (significant) motifs are found.
Version 0.16.0
[0.16.0] - 2021-05-28
Many bugfixes, thanks to @kirbyziegler, @irzhegalova, @wangmhan, @ClarissaFeuersteinAkgoz and @fgualdr for reporting and proposing solutions!
Thanks to @Maarten-vd-Sande for the speed improvements.
Added
- gimme motif2factors command to annotate a motif database with TFs from different species based on orthogroups.
- Informative error message with link to fix when cache is corrupted (running on a cluster).
- Print an informative error message if the input file is not in the correct format.
Changed
- Speed improvements to motif scanning, which is now up to 2X faster!
- Size of input regions is now automatically adjusted (#123, #128, #129)
- Quantile normalization in coverage_table now uses multiple CPUs.
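As background on what quantile normalization does, here is a small single-process sketch in plain Python. It is not the coverage_table implementation (which, per the note above, uses multiple CPUs); the function name is hypothetical and ties are broken by input order for simplicity.

```python
def quantile_normalize(columns):
    """Give every sample (column) the same distribution of values.

    `columns` is a list of equal-length lists, one list per sample.
    """
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # Row indices ordered from the smallest to the largest value.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, row in enumerate(order):
            out[row] = rank_means[rank]
        normalized.append(out)
    return normalized


samples = [[5, 2, 3], [4, 1, 6]]
result = quantile_normalize(samples)
print(result)
```

After normalization every sample contains the same set of values; only the assignment of values to rows differs, following each sample's original ranking.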
Fixed
- Fixes issue where % of motif occurrence would be incorrectly reported in gimme maelstrom output (#162).
- Fix issues with running Trawler (#181)
- Fix issues with running YAMDA (#180)
- Fix issues with parsing XXmotif output (#178)
- Fix issue where command line arguments (such as single strand) are ignored (#177)
- Fix pyarrow dependency (#176)
- The correct % of regions with motif is now reported (#162)
- Fix issue with running gimme motifs with the HOMER database (#135)
- Fix issue with the --size parameter in gimme motifs, which now works as expected (#128)
Version 0.15.3
[0.15.3] - 2021-02-01
Fixed
- _non_reducing_slice vs non_reducing_slice for pandas>=1.2 (#168)
- When using original region size, skip regions smaller than 10bp and warn if no regions are left.
- Fixed crash with KeyError: 'Factor' when creating the statistics report (#170)
- Fixed bug with creating GC bins for a genome with unusual GC% (like Plasmodium).
- Fixed bug that occurs when upgrading pyarrow with an existing GimmeMotifs cache.
Version 0.15.2
[0.15.2] - 2020-11-26
Changed
- Refactoring to make coverage_table and combine_peaks available via API.
Fixed
- Fix issue with -s parameter of gimme motifs (#146)
- Fix issues (hopefully) with scanning large input files.
Version 0.15.1
[0.15.1] - 2020-10-07
Bugfix release.
Added
- Motif.plot_logo() accepts an ax argument.
Fixed
- Support for pandas>=1.1
- coverage_table doesn't add a newline at the end of the file.
Version 0.15.0
[0.15.0] - 2020-09-29
Added
- Added additional columns to gimme maelstrom output for better interpretation (correlation of motif to signal and % of regions with motif).
- Added support for multi-species input in genome@chrom:start-end format.
- gimme maelstrom warns if data is not row-centered and will center by default.
- gimme maelstrom selects a set of non-redundant (or less redundant) motifs by default.
- Added SVR regressor for gimme maelstrom.
- Added quantile normalization to coverage_table.
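The genome@chrom:start-end region format mentioned above can be parsed with a short regular expression. The sketch below is illustrative only: parse_region is a hypothetical helper, not a GimmeMotifs function, and it assumes the genome prefix is optional and that coordinates are plain integers.

```python
import re

# Optional "genome@" prefix, then "chrom:start-end".
REGION = re.compile(
    r"^(?:(?P<genome>[^@]+)@)?(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$"
)


def parse_region(text):
    """Split a region string into (genome, chrom, start, end)."""
    match = REGION.match(text.strip())
    if match is None:
        raise ValueError(f"not a region: {text!r}")
    return match["genome"], match["chrom"], int(match["start"]), int(match["end"])


with_genome = parse_region("hg38@chr1:100-200")
without_genome = parse_region("chr1:100-200")
print(with_genome, without_genome)
```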
Removed
- Removed the lightning classifiers and regressors as the package is no longer actively maintained.
Changed
- Visually improved HTML output.
- Score of maelstrom is now an aggregate z-score based on combining z-scores from individual methods using Stouffer's method. The z-scores of individual methods are generated using the inverse normal transform.
- Reorganized some classes and functions.
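The aggregation described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the maelstrom code: the rank offset of 0.5 in the inverse normal transform, the tie handling (input order), and both function names are choices made for the example.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Rank-based inverse normal transform: map ranks to standard-normal quantiles."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        z[idx] = NormalDist().inv_cdf((rank - 0.5) / n)
    return z


def stouffer(z_scores):
    """Stouffer's method (unweighted): sum of k z-scores divided by sqrt(k)."""
    return sum(z_scores) / len(z_scores) ** 0.5


# Scores for three motifs from two methods that use very different scales.
method_a = [0.1, 2.0, -1.5]
method_b = [5.0, 40.0, -3.0]

z_a = inverse_normal_transform(method_a)
z_b = inverse_normal_transform(method_b)
combined = [stouffer([a, b]) for a, b in zip(z_a, z_b)]
print(combined)
```

Transforming each method's output to z-scores first puts the methods on a common scale, so no single method dominates the aggregate just because its raw scores are larger.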
Fixed
- Fixed minor issues with sorting columns in HTML output.
- gimme motifs doesn't crash when no motifs are found.
- Fixed error with Ensembl chromosome names in combine_peaks.