DataScience

This repository contains Jupyter notebooks, images, PDFs, and other material prepared for the course Introduction to Data Science, offered to Ph.D. students at TIFR Hyderabad (https://moldis-group.github.io/teaching.html).

How to access this material?

This material is made available on GitHub to encourage others to access it freely, maintain a local copy, and maybe even contribute corrections or new material. You can therefore follow the three steps listed below freely, i.e., without committing to any responsibilities.

If you think that you will ever use (or reuse) this material, or a part of it, for any purpose, sign in to GitHub (creating an account if needed; signing in with a Google account may also work) and click the 'Fork' button at the top right. This gives you your own copy to play with. You will also be notified when changes are made to this master version, and you will be able to merge those changes into your own copy of the repository. Others can also pull the changes you make in your version.
To download the content to your computer, type the following in a terminal:

git clone https://github.com/raghurama123/DataScience.git

or click the 'Code' button above and then click 'Download ZIP'.

If you have forked the repository, replace 'raghurama123' in the command above with your GitHub username.

If you want to try the material in a web browser, i.e., to test the code or to make small changes and run it, you can open this repository on the interactive platform Binder: https://mybinder.org/v2/gh/raghurama123/DataScience/HEAD

If you have forked the repository, replace 'raghurama123' in the link above with your GitHub username.
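
The username substitution and the "merge the new changes" part of the steps above can be sketched in the shell as follows. This is only an illustration under stated assumptions: 'your-username' is a placeholder, the helper function sync_with_upstream is hypothetical (it is not part of this repository), and the upstream branch name must be checked against the repository before use.

```shell
# Assumption: "your-username" is a placeholder for your GitHub username.
GH_USER="your-username"

# Addresses of your fork, obtained by replacing 'raghurama123' as described above.
CLONE_URL="https://github.com/${GH_USER}/DataScience.git"
BINDER_URL="https://mybinder.org/v2/gh/${GH_USER}/DataScience/HEAD"
echo "Clone with:  git clone ${CLONE_URL}"
echo "Binder link: ${BINDER_URL}"

# Address of the original repository, used to pull in later changes.
UPSTREAM_URL="https://github.com/raghurama123/DataScience.git"

# Hypothetical helper: merge new upstream commits into the current branch.
# Run it from inside a clone of your fork.
#   $1 = URL of the original repository
#   $2 = upstream branch name (an assumption; check the repository's default branch)
sync_with_upstream() {
    # Register the original repository as a remote named "upstream" if it is missing.
    git remote get-url upstream >/dev/null 2>&1 || git remote add upstream "$1"
    # Download the new commits without touching your working files.
    git fetch upstream
    # Merge them into the branch you currently have checked out.
    git merge "upstream/$2"
}

# Example call, not executed here:
#   sync_with_upstream "$UPSTREAM_URL" main
```

Keeping the original repository as a separate remote named "upstream" leaves the default "origin" remote pointing at your fork, so your own pushes and the course updates do not get mixed up.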

Syllabus:

The syllabus of this course is evolving over time. The original plan was to cover the following topics:

Data Science: Big Data, Facets of data (structured/unstructured data)
Toolboxes: Python libraries, scikit-learn, pandas
Statistics: Distributions, Outliers, Skewness, Pearson’s/Spearman’s/Kendall’s correlation coefficients, Kernel density
Statistical Inference: Hypothesis testing, Confidence Intervals
Supervised Machine Learning: What is machine learning? Learning curves, Support Vector Machines, Random Forest
Regression: Linear Regression, Logistic Regression
Unsupervised Machine Learning: Clustering, Case studies
Big Data concepts: Handling large data, Hadoop, Spark, NoSQL, Graph databases, Natural language processing, MapReduce

Additional reading:

10 minutes to pandas
Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications, Laura Igual, Santi Seguí, Springer (2017).
Introducing Data Science, Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Manning (2016).
Learn Git

Data sources:

https://www.kaggle.com/

Contact

For comments, questions, suggestions or requests please write to [email protected]
