Check here for a showcase of the data science and data analysis projects that I've done

mleiwe/DataSciencePortfolio

DataSciencePortfolio

Welcome to my data science portfolio. It showcases the data science and data analysis projects that I've done. Click on the links below to check out some of them.

To find out more about me, check out the pptx or the Google Slides deck.

Selected Projects

Four projects are described here. To find out more, click on the hyperlinked titles for more detail and to see the code.

Facial recognition challenge for FruitPunch AI: Many sea turtle species are critically endangered, so monitoring their populations is vital. However, tracking a turtle over several captures is difficult because metal tags can get damaged and can also distress the turtle. Using facial recognition, we came up with a solution that is more accurate and faster than manual annotation and that also minimises harm to the turtle.

[Image: LightGlueDemoSift]

Images were first passed through a YOLOv8_SAM network to isolate the relevant turtle pixels. Following this, a variety of models were tested to see which was the most effective. The winning solution was SIFT keypoint extraction followed by LightGlue for keypoint matching. However, instead of the basic point matching demonstrated in the image above, I devised a novel metric that compares the distribution of all the keypoints to a null distribution (the average distribution of non-matching sea turtles). This difference in distributions was quantified using the Wasserstein distance. My team's solution proved to be more effective than other methods such as metric learning and LoFTR.
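
To make the Wasserstein comparison concrete, here is a minimal sketch in plain Python. It is not the project's code: the function name `wasserstein_1d`, the idea of summarising each image pair by one score per keypoint, and all the numbers are assumptions made for illustration. It measures the area between two empirical cumulative distributions, which is the first-order Wasserstein distance in one dimension.

```python
from bisect import bisect_right


def wasserstein_1d(sample_a, sample_b):
    """First-order Wasserstein distance between two 1-D empirical samples.

    Computed as the area between the two empirical CDFs.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # The CDFs only change at observed values, so integrate piecewise.
    grid = sorted(set(a) | set(b))
    distance = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        distance += abs(cdf_a - cdf_b) * (right - left)
    return distance


# Hypothetical per-keypoint match scores (made-up numbers, not project data).
null_scores = [0.10, 0.15, 0.20, 0.25, 0.30]       # averaged non-matching pairs
same_turtle = [0.70, 0.75, 0.80, 0.85, 0.90]       # candidate pair A
different_turtle = [0.12, 0.18, 0.20, 0.27, 0.33]  # candidate pair B

score_a = wasserstein_1d(same_turtle, null_scores)
score_b = wasserstein_1d(different_turtle, null_scores)
print(score_a > score_b)  # pair A sits further from the null distribution
```

Under these assumptions, a pair whose scores sit far from the null distribution would be treated as a likely match, while a pair that looks like the null would not.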

dCrawler: A new clustering algorithm that uses a single distance threshold. It is ideal when you don't know how many clusters there should be but all the points in a cluster should be closely related.

[Video: dCrawlerDemo.mov]

Compared to DBSCAN, dCrawler is particularly useful for clustering colors.

[Image: DBSCAN_vs_dCrawler_image]
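
The idea of clustering with one distance threshold and no preset cluster count can be sketched as follows. This is not the dCrawler algorithm itself, which lives in the linked project. It is a simpler greedy scheme written for illustration, and the function name `threshold_cluster`, the parameter `max_dist`, and the colour values are all hypothetical.

```python
import math


def threshold_cluster(points, max_dist):
    """Greedy single-threshold clustering (illustrative, not dCrawler itself).

    Each point joins the nearest existing cluster whose centroid lies within
    max_dist; otherwise it starts a new cluster. Centroids are running means.
    """
    centroids, members = [], []
    for p in points:
        best, best_d = None, max_dist
        for i, c in enumerate(centroids):
            d = math.dist(p, c)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            # No centroid is close enough: start a new cluster at this point.
            centroids.append(tuple(float(x) for x in p))
            members.append([p])
        else:
            # Add the point and update the running mean of that cluster.
            members[best].append(p)
            n = len(members[best])
            centroids[best] = tuple(
                c + (x - c) / n for c, x in zip(centroids[best], p)
            )
    return centroids, members


# Made-up RGB colours: two reddish, two bluish, one green.
colours = [(250, 10, 10), (240, 20, 15), (10, 10, 250), (20, 15, 240), (10, 250, 10)]
centroids, members = threshold_cluster(colours, max_dist=30.0)
print(len(centroids), [len(m) for m in members])
```

The number of clusters falls out of the threshold rather than being chosen up front, which is the property the description above highlights. A greedy pass like this depends on the order of the points, so it should be read only as a picture of the idea.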

As part of the ML Zoomcamp training, this was an exercise to get familiar with deploying a solution using Docker images. In this example, I used features extracted from histological samples containing malignant or benign tumors. The original data set is nicely curated but, with approximately 30 variables, is quite large. Using principal component analysis (PCA), I engineered 10 features that explain 95% of the variance. Many models were assessed using a grid search, with the F1 score as the scoring metric because of a class imbalance in the dataset. I found that the most effective model on the validation set was a logistic regression classifier on the 10 principal components. This produced an F1 score >0.975 on the validation data (see figure below).

[Figure: F1 scores on the validation data]
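
The reason for scoring with F1 rather than accuracy on imbalanced classes can be shown with a small sketch. The labels below are made up (a 90/10 split chosen only for illustration, not the project's data), the helper names are hypothetical, and treating the malignant class as the positive class is an assumption.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels: 90 benign (0) and 10 malignant (1).
y_true = [0] * 90 + [1] * 10

# A degenerate classifier that always predicts benign.
always_benign = [0] * 100

# A hypothetical classifier: 2 false alarms, 8 of 10 malignant cases found.
useful = [0] * 88 + [1] * 2 + [1] * 8 + [0] * 2

print(accuracy(y_true, always_benign), f1_score(y_true, always_benign))
print(accuracy(y_true, useful), f1_score(y_true, useful))
```

In this toy case the always-benign classifier still reaches high accuracy while its F1 score is zero, so a grid search scored on F1 would not reward it.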

In biological imaging, the colours often smear (chromatic aberration), which hinders further analysis. I noticed that we could model the aberration and so reverse its effects to produce an accurate image. This meant we could keep unaffected areas the same (e.g. E) while correcting the distorted areas (e.g. F).

[Figure: CA_Fig5]
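
The "model the aberration, then reverse it" idea can be illustrated with a deliberately simple sketch. The text above does not say which model the project uses, so everything here is an assumption: the aberration is treated as a constant whole-pixel shift of one colour channel along a single axis, the helper names `estimate_shift` and `undo_shift` are hypothetical, and the intensity profiles are made up. A constant shift cannot leave some regions untouched while correcting others, so the project's model must be richer than this.

```python
def estimate_shift(reference, distorted, max_shift):
    """Return the integer shift s that best aligns distorted[i + s] with reference[i]."""
    def score(s):
        # Overlap (unnormalised cross-correlation) at candidate shift s.
        return sum(
            reference[i] * distorted[i + s]
            for i in range(len(reference))
            if 0 <= i + s < len(distorted)
        )
    return max(range(-max_shift, max_shift + 1), key=score)


def undo_shift(distorted, shift, fill=0):
    """Resample the distorted channel so that it lines up with the reference."""
    n = len(distorted)
    return [distorted[i + shift] if 0 <= i + shift < n else fill for i in range(n)]


# Made-up 1-D intensity profiles across one bright object.
green = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0]  # reference channel
red = [0, 0, 0, 0, 1, 5, 9, 5, 1, 0]    # same object, displaced by 2 pixels

shift = estimate_shift(green, red, max_shift=3)
corrected_red = undo_shift(red, shift)
print(shift, corrected_red == green)
```

The two steps mirror the description: fit a model of the distortion (here, one number), then apply its inverse to the affected channel.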
