Hi, I'm Loris! I have a background in chemistry and a solid foundation in data science, including machine learning and deep learning algorithms. I have developed a passion for using data to uncover meaningful insights. I am excited to bring my technical and analytical skills to the field of data science as an entry-level data specialist.
During my final year at university, driven by curiosity, I began studying data science through online courses (e.g., OpenClassrooms) and personal projects. My training covered rigorous statistical data analysis and the tools needed to implement it, such as Python and SQL. This multidisciplinary approach allowed me to combine chemistry and data science, culminating in models that predict the yields of chemical reactions and molecular properties. Some of these projects were carried out in collaboration with my thesis supervisors and are based on the scientific publications we co-authored. I am also currently participating in the "Leash-Bio" challenge on Kaggle, a platform for data science competitions.
In my free time, I enjoy exploring new machine learning and deep learning algorithms and data analysis tools, and I am always looking for opportunities to expand my knowledge and skills. Whether working on a team or independently, I am driven by the thrill of discovering new insights and the satisfaction of using data and machine learning to solve complex problems.
This section lists my data analytics projects, with a brief description of the technology stack used in each.
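Description: Principal Component Analysis (PCA) and Correspondence Analysis (CA) are both dimensionality-reduction methods. PCA summarizes correlated quantitative variables with a few uncorrelated components, while CA plays the same role for the categories of a contingency table.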
Code:
PCA with Iris Dataset: ACP_With_Iris_Complet_propre.ipynb
CA with Nobel Prize Dataset: AFC_With_Python_Nobel.ipynb
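As an illustration of what PCA computes, here is a minimal sketch using NumPy only. It is not the code from the notebook: the `pca` helper and the synthetic four-column matrix (a stand-in for the Iris measurements) are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X onto their first principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered columns
    # SVD of the centered data: the rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T  # coordinates in the new basis
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return scores, components, ratio[:n_components]

# Hypothetical toy data: 150 samples, 4 correlated measurements each
rng = np.random.default_rng(0)
latent = rng.normal(size=(150, 1))
X = latent @ np.array([[1.0, 0.5, 2.0, 0.8]]) + 0.1 * rng.normal(size=(150, 4))

scores, components, ratio = pca(X, n_components=2)
print(scores.shape)  # one 2D point per sample
print(ratio)         # share of variance carried by each kept component
```

Because the four columns are driven by a single latent variable, almost all of the variance ends up on the first component, which is the situation PCA is designed to reveal.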
Description: Analysis of variables: normality analysis of individual variables, and bivariate analysis with statistical tests.
Code:
Normality analysis: Analyse_Normalite.ipynb
Bivariate analysis: Analyse_bivarie_et_tests.ipynb
Skills: statistics, distributions, data cleaning, data analysis, hypothesis testing, data visualization.
Technology: Python, Pandas, Matplotlib, Seaborn.
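One way to test a bivariate relationship is a permutation test on the Pearson correlation. The sketch below uses only the Python standard library and is not necessarily the test applied in the notebooks: the `pearson` and `permutation_pvalue` helpers and the synthetic data are assumptions made for this example.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def permutation_pvalue(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation test of H0: x and y are uncorrelated."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(y_shuffled)  # break any link between x and y
        if abs(pearson(x, y_shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical data: y depends linearly on x, plus Gaussian noise
noise = random.Random(1)
x = [float(i) for i in range(30)]
y = [2.0 * a + noise.gauss(0.0, 3.0) for a in x]

r, p_value = permutation_pvalue(x, y)
print(f"r = {r:.3f}, p = {p_value:.4f}")
```

A permutation test makes no normality assumption, so it remains usable when a normality analysis rejects the Gaussian hypothesis for one of the variables.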
Code:
EDA: Covid19_EDA.ipynb
Preprocessing: Covid19_Preprocessing.ipynb
Machine Learning algorithms: Covid19_Machine_learning_model.ipynb
Description: The dataset contains records of Covid-19 cases, deaths, vaccinations, etc. The project follows three steps: exploratory data analysis (EDA), preprocessing, and machine learning modeling.
Data Source: Excel file
Skills: data cleaning, data analysis, hypothesis testing, data visualization, machine learning algorithms
Technology: Python, Pandas, Matplotlib, Seaborn, scikit-learn.
Code:
Using Molecular Chemical Language Model Embedding: Yield Prediction using Molecular Chemical Language Model Embedding_ Condensation_reactions.ipynb
Using all molecular descriptors: Yield Prediction using Molecular Descriptors_All_ Condensation_reactions_.ipynb
Using RDKit descriptors: Yield Prediction using Molecular Descriptors_GetAvailableProperties_ Condensation_reactions.ipynb
Using Fingerprint_Morgan: Yield Prediction using Molecular Fingerprints_Morgan_2048_Condensation_reactions.ipynb
Using Fingerprint_MACCS: Yield Prediction using Molecular Fingerprints_MACCS Key_Condensation_reactions.ipynb
Using Molecular Mol2Vec Embeddings: Yield Prediction using Molecular Mol2Vec Embeddings_Condensation_reactions.ipynb
Using DRFP Fingerprint: Yield Prediction using RDkit fingerprint (DRFP) datat_set_condensation_reaction.ipynb
Description: See README.md
Data Source: Excel file
Skills: data cleaning, data analysis, hypothesis testing, data visualization, machine learning algorithms
Technology: Python, Pandas, Matplotlib, Seaborn, scikit-learn, RDKit
Code: Model_DNN_yield-prediction-rdkitmolecular-descriptors-dnn.ipynb
Description: Using a Deep Neural Network (DNN) model to predict the yield of chemical reactions.
Data Source: Excel file
Skills: data cleaning, data analysis, embedding, deep learning algorithms
Technology: PyTorch, RDKit
Code: Model_CNN_Classification.ipynb
Description: Using a Convolutional Neural Network (CNN) model to classify Fashion-MNIST images.
Data Source: Fashion-MNIST
Skills: data cleaning, data analysis, embedding, deep learning algorithms
Technology: PyTorch
Code: Model_GNN_Classic_QM9.ipynb
Description: Using a Graph Neural Network (GNN) model to predict the properties of molecules.
Data Source: QM9 Dataset
Skills: data cleaning, data analysis, embedding, deep learning algorithms
Technology: PyTorch, RDKit