This repository showcases my skills, shares my projects, and tracks my progress in Data Science and Machine Learning.
- About
- Study projects
- Portfolio of end-to-end projects
- [MLOps] - Client Segmentation for targeted marketing in a credit union - Internship project
- [NLP] - Predicting job salary using Neural Network model on Azure ML
- [MLOps] - End-to-end ML project predicting diabetes on Azure Machine Learning and GitHub Actions
- [LLM] - Fine-Tuning LLM with SkyPilot and DVC
- Previous work
This section links to my GitHub repositories with the code and Jupyter notebooks I created while completing courses.
This is a 12-month master's program at the University of Calgary, Canada.
For more details, go to repo...
List of courses:
- DATA601 - Working with Data and Visualization
- DATA602 - Statistical Data Analysis
- DATA603 - Statistical Modelling with Data
- DATA604 - Working with Data at Scale
- DATA605 - Actionable Visualization and Analytics
- DATA606 - Statistical Methods in Data Science
- DATA607 - Machine Learning
- DATA608 - Developing Big Data Applications
Code:
go to repo...
Status:
In progress
Code:
go to repo...
Description
: This project is an in-depth exploration of various NLP models with the purpose of generating text based on a dataset of recipes.
Skills in focus
: N-gram Language Model, Neural Language Models (RNN-LSTM, Convolutional), Sampling strategies, Language model evaluation
Status:
Completed in November 2023
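To illustrate the n-gram and sampling ideas listed above, here is a minimal sketch of a bigram language model with temperature sampling. It is not the code from the project repository: the function names (`train_bigram`, `sample_next`, `generate`) and the toy recipe sentences are hypothetical and chosen only for this example.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus; the real project uses a recipes dataset.
TOY_RECIPES = [
    "mix flour and sugar",
    "mix butter and sugar",
    "bake the cake",
]

def train_bigram(sentences):
    """Count bigrams; returns {previous_token: Counter(next_token)}."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def sample_next(counts, prev, temperature=1.0, rng=random):
    """Sample the next token with weights count ** (1 / temperature).

    A temperature below 1 favors frequent continuations,
    a temperature above 1 flattens the distribution.
    """
    candidates = counts[prev]
    tokens = list(candidates)
    weights = [c ** (1.0 / temperature) for c in candidates.values()]
    return rng.choices(tokens, weights=weights, k=1)[0]

def generate(counts, max_len=20, temperature=1.0, seed=None):
    """Generate tokens until the end marker or max_len is reached."""
    rng = random.Random(seed)
    output, prev = [], "<s>"
    for _ in range(max_len):
        nxt = sample_next(counts, prev, temperature, rng)
        if nxt == "</s>":
            break
        output.append(nxt)
        prev = nxt
    return " ".join(output)

if __name__ == "__main__":
    model = train_bigram(TOY_RECIPES)
    print(generate(model, temperature=0.8, seed=0))
```

Neural language models replace the count table with learned parameters, but the sampling step works the same way on the predicted distribution.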
This section lists projects with a brief description of the technology stack used in each.
Code
: go to repo...
Presentation
: go to google slides...
Industry
: Banking and Finance
Description
: The project focused on building a flexible, automated ML pipeline for running experiments. The best model is then deployed to an app through a series of automated workflows.
Skills in focus
: Clustering, Model selection, Data and model versioning, Experimentation, CI/CD pipelines
Tools
:
- Environment: GitHub Codespaces, devcontainer, Docker, venv, Hydra
- Data Management: DVC (Data Version Control), AWS S3
- DS and ML: scikit-learn (PCA, clustering algorithms), Keras (autoencoder)
- Continuous Integration: GitHub Actions, CML, AWS EC2
- Continuous Deployment: FastAPI, Heroku
Results
: The resulting client segments help the credit union make better decisions about how to reach out to different groups of clients.
Status
: Completed in August 2023.
Code
: go to repo...
Description
: From the Kaggle competition: "Adzuna wants to build a prediction engine for the salary of any UK job ad, so they can make huge improvements in the experience of users searching for jobs, and help employers and jobseekers figure out the market worth of different positions."
Data
: A large dataset (hundreds of thousands of records), mostly unstructured text with a few structured data fields.
Skills in focus
: Regression, Tokenization, Categorical Vectorization, Neural Networks, OOP, ML Pipeline (Azure CLI), Components (Azure CLI), Deployment
Tools
:
- Environment: GitHub Codespaces, devcontainer, conda, Azure CLI, Azure ML Studio
- DS and ML: PyTorch, scikit-learn
Status
: Completed in September 2023.
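As a rough illustration of the tokenization and categorical vectorization skills listed for this project, the sketch below turns job-ad text into fixed-length token ids and a categorical field into a one-hot vector. It uses only the standard library and is not taken from the project code: the helper names (`build_vocab`, `encode`, `one_hot`) and the sample values are assumptions made for this example.

```python
from collections import Counter

def build_vocab(texts, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved ids for padding and unknown tokens
    for tok, count in sorted(counts.items()):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(text, vocab, max_len):
    """Convert text to a list of exactly max_len ids (truncate, then pad)."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

def one_hot(value, categories):
    """One-hot encode a categorical value against a fixed category list."""
    return [1 if value == category else 0 for category in categories]

if __name__ == "__main__":
    # Hypothetical job titles and contract types, for illustration only.
    titles = ["senior data engineer", "data analyst"]
    vocab = build_vocab(titles)
    print(encode("junior data analyst", vocab, max_len=5))
    print(one_hot("permanent", ["contract", "permanent"]))
```

The fixed-length id lists can be fed to an embedding layer, while the one-hot vectors are concatenated with the other structured features.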
Code
: go to repo...
Industry
: Healthcare
Description
:
Skills in focus
: Logistic Regression, CI/CD pipelines, Linting, Testing, Package and Register the Model
Tools
:
- Environment: GitHub Codespaces, devcontainer, Docker, venv
- Data Management: Azure ML Datastore
- DS and ML: scikit-learn (Logistic regression)
- Continuous Integration: GitHub Actions, Azure ML Resources (Job, Compute, Environment), flake8, pytest
- Continuous Deployment: MLflow
Results
: An automated workflow that is triggered when a new model is registered and deploys the newly registered model to the production environment.
Status
: Completed in October 2023.
Code
: go to repo...
Description
: Fine-tune a foundation LLM for sentiment classification of hotel reviews in the cloud on GPUs.
Skills in focus
: Text classification, LLM fine-tuning, Infrastructure provisioning, Checkpointing
Tools
:
- Environment: GitHub Codespaces, devcontainer, Docker, venv
- Infrastructure Management: SkyPilot
- DS and ML: Transformer, PyTorch
- Continuous ML: DVC, Weights and Biases
Results
: A cost-optimized cloud setup for fine-tuning an LLM with continuous machine learning.
Status
: Completed in October 2023.
- LinkedIn: https://www.linkedin.com/in/avoytkiv/
- E-mail: [email protected]