Statistics is an integral part of scientific research, particularly in the life sciences, which rely heavily on quantitative methodologies. This course provides researchers in the life sciences with a gentle introduction to statistics and its application to a variety of biological problems.
This course is intended for scientists (in particular life scientists) at all levels and from all disciplines who are not experts in statistics.
Although we will provide materials and a reminder on data manipulation in python, participants must be comfortable with the python environment and be able to read, understand and write basic python commands before attending this course. We also recommend some familiarity with the pandas and matplotlib libraries.
The course will combine lectures on statistics, short tutorials and practical exercises on the topics discussed in class. These practical exercises will be implemented in python, a widely used language for statistical computing and graphics.
Software to be installed PRIOR to the course:
- latest python3 distribution, preferably bundled using conda
- jupyter (https://jupyter.org/install)
Python libraries (we recommend the usage of conda for the installation):
- scipy (NB: if you installed the full Anaconda distribution, this library is already included; a minimal Miniconda install does not ship it)
- statsmodels
- pandas
- seaborn
- scikit-learn
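To confirm before the course that the libraries listed above are available, a short check along the following lines can be run in a python session or a notebook cell. This is only a minimal sketch: the helper name `find_missing` is ours and is not part of the course material, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names of the libraries listed above.
# scikit-learn is imported as "sklearn".
REQUIRED = ["scipy", "statsmodels", "pandas", "seaborn", "sklearn"]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries were found.")
```

Any library reported as missing can then be installed with conda, as recommended above.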
The course is organized in several numbered jupyter notebooks, each corresponding to a chapter which interleaves theory, code demos, and exercises.
Following the course does not require any particular expertise with jupyter notebooks, but if this is the first time you encounter them, we recommend this gentle introduction.
- 01_data_manipulation_and_representation.ipynb : an introduction without much statistics, to get everyone up to speed on the pandas, matplotlib, and seaborn libraries.
- 02_distribution_and_statistical_tests.ipynb
- 03_distribution_and_statistical_tests_continued.ipynb
- 04_correlation_and_regression.ipynb
Solutions to each practical can be found in the solutions/ folder and should be loadable directly from within the jupyter notebooks themselves.
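As a rough illustration, the files available in the solutions/ folder can be listed from a notebook cell with a few lines of python. This sketch assumes the notebook is run from the root of the course material, and the helper name `list_solutions` is hypothetical. In jupyter, the IPython `%load` magic followed by a file path replaces the content of a cell with the content of that file, which is one common way to load such a solution; we have not checked that this is the exact mechanism the notebooks use.

```python
from pathlib import Path


def list_solutions(folder="solutions"):
    """Return the sorted names of the files found in the given folder."""
    path = Path(folder)
    if not path.is_dir():
        # The folder is absent, e.g. when run from another working directory.
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())


# Print one file name per line (prints nothing if the folder is not found).
for name in list_solutions():
    print(name)
```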
Please cite as: Wandrille Duchemin. (2023, June 22). Material for the Introduction to Statistics with Python SIB-training course. Zenodo. https://doi.org/10.5281/zenodo.8070049