DataSciencePortfolio

Welcome to my data science portfolio. It showcases the data science and data analysis projects I have worked on. Click the links below to explore some of them.

To find out more about me, see the PowerPoint file (WhoAmI_PlusOneCaseStudy.pptx) or the Google Slides deck.

Selected Projects

Four projects are described here. To find out more, click the hyperlinked titles for further detail and the code.

Sea Turtle Facial Recognition

Facial recognition challenge for FruitPunch AI: Many sea turtle species are critically endangered, and monitoring sea turtle populations is vital. However, tracking a turtle across several captures is difficult because metal tags can get damaged and can also cause distress to the turtle. Using facial recognition, we developed a solution that is more accurate and faster than manual annotation and that minimises harm to the turtle.

Images were first passed through a YOLOv8_SAM network to isolate the relevant turtle pixels. A variety of models were then tested to see which was the most effective. The winning solution was SIFT keypoint extraction followed by LightGlue for keypoint matching. However, instead of the basic point matching demonstrated in the image above, I devised a novel metric that compares the distribution of all the keypoints to a null distribution (the average distribution of non-matching sea turtles). The difference between the two distributions was quantified using the Wasserstein distance. My team's solution proved to be more effective than other methods such as metric learning and LoFTR.
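To illustrate the idea of scoring a candidate pair against a null distribution, here is a minimal sketch. It is not the project's code: it assumes the quantity being compared is a one-dimensional list of per-keypoint match scores, and the function name wasserstein_1d and all the numbers are hypothetical. For unweighted samples, scipy.stats.wasserstein_distance computes the same quantity.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """First Wasserstein distance between two 1-D empirical samples.

    Computed as the area between the two empirical CDFs.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    grid = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both CDFs are constant on [left, right), so evaluate them at `left`.
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


# Hypothetical match scores, for illustration only.
null_scores = [0.10, 0.15, 0.20, 0.25, 0.30]       # pooled from non-matching turtles
candidate_scores = [0.70, 0.80, 0.85, 0.90, 0.95]  # one query/reference pair

distance_from_null = wasserstein_1d(candidate_scores, null_scores)
print(f"distance from null: {distance_from_null:.3f}")
```

Under this assumption, a larger distance means the pair looks less like a typical non-match, so candidates could be ranked by their distance from the null.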

dCrawler

A new clustering algorithm that utilises a single distance threshold. Ideal for when you don't know how many clusters there should be but all the points in a cluster should be closely related.

dCrawlerDemo.mov

Compared with DBSCAN, dCrawler is particularly useful for clustering colors.
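To give a feel for clustering with a single distance threshold, here is a minimal sketch of simple "leader" clustering. This is not the dCrawler algorithm, only a much simpler stand-in that shares the one-parameter idea; the function name threshold_cluster and the sample colours are hypothetical.

```python
from math import dist


def threshold_cluster(points, threshold):
    """Assign each point to the nearest cluster leader within `threshold`.

    A point farther than `threshold` from every leader starts a new cluster,
    so the number of clusters is never specified in advance.
    """
    leaders = []  # first point seen in each cluster
    labels = []
    for point in points:
        best_label, best_distance = None, threshold
        for label, leader in enumerate(leaders):
            d = dist(point, leader)
            if d <= best_distance:
                best_label, best_distance = label, d
        if best_label is None:
            leaders.append(point)
            best_label = len(leaders) - 1
        labels.append(best_label)
    return labels


# Hypothetical RGB colours: two reds, two blues, one green.
colours = [(250, 10, 10), (240, 20, 15), (10, 10, 250), (20, 15, 240), (10, 250, 10)]
print(threshold_cluster(colours, threshold=50))
```

Because every point lies within the threshold of its leader, each cluster stays tight, which is the property described above. The result of this simple version depends on the order of the points.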

CancerDetection

As part of the ML Zoomcamp training, this was an exercise to get familiar with deploying a solution using Docker images. In this example, I used features extracted from histological samples containing malignant or benign tumors. The original data set is nicely curated but, with approximately 30 variables, is quite large. Using principal component analysis (PCA), I engineered 10 features that explain 95% of the variance. Many models were assessed using a grid search, with the F1 score as the scoring metric because of a class imbalance in the dataset. The most effective model on the validation set was a logistic regression classifier on the 10 principal components. This produced an F1 score above 0.975 on the validation data (see figure below).
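The workflow above can be sketched with scikit-learn as follows. This is an illustrative sketch, not the project's code: scikit-learn's built-in breast cancer dataset is used as a stand-in for the project's data, and the standardisation step, the train/validation split and the grid of C values are all assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption): 30 numeric features, binary target.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale, reduce to 10 principal components, then classify.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search over the regularisation strength, scored by F1.
# Note: F1 is computed for the class labelled 1 in this dataset.
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)

pca = search.best_estimator_.named_steps["pca"]
val_f1 = f1_score(y_val, search.predict(X_val))
print(f"variance explained by 10 components: {pca.explained_variance_ratio_.sum():.3f}")
print(f"validation F1: {val_f1:.3f}")
```

Wrapping the scaler and PCA in the pipeline means they are refitted inside each cross-validation fold, so the grid search scores are not inflated by information from the held-out fold.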

Correcting Chromatic Aberration

In biological imaging, the colours often smear (chromatic aberration), which hinders further analysis. I noticed that the aberration could be modelled and its effects reversed to produce an accurate image. This meant we could leave unaffected areas unchanged (e.g. E) while correcting the distorted areas (e.g. F).
