These are my notes, projects, and assignments from the MS Data Science program at Texas Tech's Rawls College of Business.
You can refer to the course schedule on TTU's website to follow along with the coursework. Classes and material will be added as I progress through the program.
Courses so far:
- Multivariate Analysis
  - Statistical methods for analyzing multivariate data in the R language
  - Topics: Principal component analysis, multidimensional scaling, exploratory and confirmatory factor analysis, and clustering
- Big Data Strategy
  - Review of corporate big data strategy literature and case studies involving management, monetization, privacy, and ethics
- Business Intelligence
  - Practices for collecting, transforming, cleaning, preparing, and visualizing data
  - Topics: Web scraping with Python and HTML, data preparation and cleaning (ETL) with Pandas
- Time Series Analysis
  - Overview of basic econometrics, linear and nonlinear regression, panel data, and fundamental time series techniques
- Database Concepts
  - Data definition language, database design, entity-relationship modeling, Structured Query Language (SQL) with MySQL, and NoSQL
- Statistics for Data Science
  - An introduction to statistical concepts as well as the R language
- Scripting Languages
  - Overview of Python data analysis with Pandas and NumPy
The top directory in each class holds the course lecture notes, including example code. These are my own annotations and understanding of my professor's notes, and I'm doing my best to ensure copyrighted content doesn't slip through. This is the current working directory for my studying, so feel free to file a bug if something slipped through my .gitignore.
Otherwise, these Markdown files were lovingly typed by hand into iA Writer, my favorite Markdown editor.
Assignments and projects will always be my own work, with the exception of code included as part of the assignment and group projects. I usually try to include some headmatter to explain the task or prompt I am responding to. Feel free to file an issue if you find a place where this is not the case.
Each class may also include extra content provided by the professor or dug up by me while doing background research, in a variety of file types. Hopefully nothing is copyrighted; most of it should come from public sources.
Most of the descriptive content is in Markdown or Jupyter Notebooks (Python and R). There's a possibility I'll migrate the R content to R Markdown (RMD) or R Notebooks, but that's an internet debate I'm not ready to wade into.
I also keep my own references, which I update with useful snippets that assist my workflow; they aren't necessarily taught in class. TeX, for example, is not part of the curriculum, but I've found it extremely useful for capturing formulas so I can stay in Markdown and notebooks.