LGBEUT/Data-Scientist-Portfolio

Data-Scientist-Analyst-Portfolio

About

Hi, I'm Loris! I have a background in chemistry and a solid foundation in data science, including machine learning and deep learning algorithms. I have developed a passion for using data to uncover meaningful insights. I am excited to bring my technical and analytical skills to the field of data science as an entry-level data specialist.

During my final year at university, driven by curiosity, I began studying data science through online courses (e.g., OpenClassrooms) and personal projects. My training covered rigorous statistical data analysis and the tools needed to carry it out, such as Python and SQL. This multidisciplinary approach allowed me to combine chemistry and data science, culminating in models that predict the yields of chemical reactions and molecular properties. Some of these projects were carried out in collaboration with my thesis supervisors and are based on the scientific publications we co-authored. I am also currently participating in the "Leash-Bio" challenge on Kaggle, a platform for data science competitions.

In my free time, I enjoy exploring new machine learning and deep learning algorithms and data analysis tools, and I am always looking for opportunities to expand my knowledge and skills. Whether working on a team or independently, I am driven by the thrill of discovering new insights and the satisfaction of using data and machine learning to solve complex problems.

Portfolio Projects

This section lists my data analytics projects, with a brief description of the technology stack used in each.

Statistical Analysis

Description: Principal Component Analysis (PCA) and Correspondence Analysis (CA), two dimensionality-reduction methods for exploring quantitative variables and categorical variables, respectively.

Code:

PCA with Iris Dataset: ACP_With_Iris_Complet_propre.ipynb

CA with Nobel Prize Dataset: AFC_With_Python_Nobel.ipynb
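
As an illustration of what PCA computes, here is a minimal sketch using NumPy's singular value decomposition. It is not taken from the notebook: the `pca` helper and the synthetic data are assumptions made for this example (the notebook itself works on the Iris dataset and may standardize the features first).

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X onto their first principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    scores = X_centered @ Vt[:n_components].T
    return scores, Vt[:n_components], ratio[:n_components]

# Synthetic data (NOT the Iris dataset): 100 samples, 4 correlated features
# generated from 2 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.05 * rng.normal(size=(100, 4))

scores, components, ratio = pca(X, n_components=2)
print(scores.shape)   # coordinates of each sample on the first two axes
print(ratio)          # share of total variance carried by each axis
```

Because the synthetic data has only two underlying factors, the first two axes should carry almost all of the variance; on real data the explained-variance ratios help decide how many axes to keep.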

Description: Variable analysis

Code:

Normality analysis: Analyse_Normalite.ipynb

Bivariate analysis: Analyse_bivarie_et_tests.ipynb
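
One way to test whether two numeric variables are related is a permutation test on the Pearson correlation, sketched below with the standard library only. This is an illustrative technique chosen for the example, not necessarily the test used in the notebook; the function names and the synthetic data are assumptions.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def permutation_test(x, y, n_permutations=2000, seed=0):
    """Return (r, p_value) under H0: x and y are independent."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any link with x while keeping both marginals.
        rng.shuffle(y_shuffled)
        if abs(pearson(x, y_shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_permutations + 1)

# Synthetic example: y depends linearly on x, plus Gaussian noise.
noise = random.Random(1)
x = list(range(20))
y = [2 * xi + noise.gauss(0, 1) for xi in x]

r, p_value = permutation_test(x, y)
print(r, p_value)
```

A small p-value means that a correlation as strong as the observed one is rarely produced by chance alone, so the independence hypothesis is rejected.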

Skills: statistics, distributions, data cleaning, data analysis, hypothesis testing, data visualization

Technology: Python, Pandas, Matplotlib, Seaborn.

Covid-19 Data Exploration

Code:

EDA: Covid19_EDA.ipynb

Preprocessing: Covid19_Preprocessing.ipynb

Machine Learning algorithms: Covid19_Machine_learning_model.ipynb

Description: The dataset contains records of Covid-19 cases, deaths, vaccinations, etc. This project includes the following steps: exploratory data analysis (EDA), preprocessing, and machine learning modeling.

Data Source: Excel file

Skills: data cleaning, data analysis, hypothesis testing, data visualization, machine learning algorithms

Technology: Python, Pandas, Matplotlib, Seaborn, scikit-learn.

Condensation-Reaction-Yield Prediction

Code:

Using Molecular Chemical Language Model Embedding: Yield Prediction using Molecular Chemical Language Model Embedding_ Condensation_reactions.ipynb

Using all molecular descriptors: Yield Prediction using Molecular Descriptors_All_ Condensation_reactions_.ipynb

Using RDKit descriptors: Yield Prediction using Molecular Descriptors_GetAvailableProperties_ Condensation_reactions.ipynb

Using Fingerprint_Morgan: Yield Prediction using Molecular Fingerprints_Morgan_2048_Condensation_reactions.ipynb

Using Fingerprint_MACCS: Yield Prediction using Molecular Fingerprints_MACCS Key_Condensation_reactions.ipynb

Using Molecular Mol2Vec Embeddings: Yield Prediction using Molecular Mol2Vec Embeddings_Condensation_reactions.ipynb

Using DRFP Fingerprint: Yield Prediction using RDkit fingerprint (DRFP) datat_set_condensation_reaction.ipynb

Description: see README.md

Data Source: Excel file

Skills: data cleaning, data analysis, hypothesis testing, data visualization, machine learning algorithms

Technology: Python, Pandas, Matplotlib, Seaborn, scikit-learn, RDKit

Use of deep learning models

DNN model

Code: Model_DNN_yield-prediction-rdkitmolecular-descriptors-dnn.ipynb

Description: Using a Deep Neural Network (DNN) model to predict the yield of chemical reactions.

Data Source: Excel file

Skills: data cleaning, data analysis, embedding, deep learning algorithms

Technology: PyTorch, RDKit

CNN model

Code: Model_CNN_Classification.ipynb

Description: Using a Convolutional Neural Network (CNN) model to classify images.

Data Source: Fashion-MNIST

Skills: data cleaning, data analysis, embedding, deep learning algorithms

Technology: PyTorch
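
To show what a convolutional layer does to an image, here is a minimal sketch of one convolution, ReLU, and max-pooling step written in plain NumPy rather than PyTorch. It is not the notebook's model: the helper names, the fixed Sobel kernel, and the random 28×28 input (the size of a Fashion-MNIST image) are assumptions made for illustration. In a trained CNN the kernel values are learned.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))

# Random stand-in for one 28x28 grayscale image (not real Fashion-MNIST data).
rng = np.random.default_rng(0)
image = rng.random((28, 28))

# Hand-written Sobel kernel that responds to vertical edges.
kernel = np.array([[1, 0, -1],
                   [2, 0, -2],
                   [1, 0, -1]], dtype=float)

feature_map = np.maximum(conv2d(image, kernel), 0.0)  # convolution + ReLU
features = max_pool2d(feature_map)                    # 26x26 -> 13x13
print(features.shape)
```

A classifier stacks several such layers with many kernels each, then feeds the pooled features to fully connected layers that output one score per class.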

GNN model

Code: Model_GNN_Classic_QM9.ipynb

Description: Using a Graph Neural Network (GNN) model to predict the properties of molecules.

Data Source: QM9 Dataset

Skills: data cleaning, data analysis, embedding, deep learning algorithms

Technology: PyTorch, RDKit
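
The core idea of a GNN on molecules is that each atom updates its representation from its bonded neighbours, and the atom representations are then pooled into one vector per molecule. The sketch below shows a single graph-convolution step of the common GCN form in plain NumPy. It is an assumption-laden illustration, not the notebook's architecture: the `gcn_layer` helper, the toy C-C-O chain, and the random weights are all hypothetical.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    degree = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(degree ** -0.5)
    propagated = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weights
    return np.maximum(propagated, 0.0)

# Toy molecular graph: a three-atom chain C-C-O (heavy atoms only).
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)

# One-hot atom features with columns [is_carbon, is_oxygen].
features = np.array([[1, 0],
                     [1, 0],
                     [0, 1]], dtype=float)

# Random weights stand in for parameters that training would learn.
rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 4))

node_embeddings = gcn_layer(adjacency, features, weights)
graph_embedding = node_embeddings.mean(axis=0)  # readout: one vector per molecule
print(node_embeddings.shape, graph_embedding.shape)
```

The mean readout makes the molecule-level vector independent of the order in which atoms are listed, which is why it can be passed to a regression head that predicts a molecular property.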
