
What makes us happy?

Introduction

The pursuit of happiness has fascinated humanity throughout the ages. Ancient philosophers and modern-day researchers have sought to uncover its source; however, the secret formula for happiness has yet to be found. In our project, we join those who have embarked on this quest, exploring the factors that influence happiness and striving to build a predictive model for it. Join us on this captivating journey as we dive into modeling happiness and its determinants, and work to inspire a happier and more fulfilling world. Moreover, it is important to talk about happiness, as it contributes to several Sustainable Development Goals (SDGs), such as Goals 1, 2, 3, 4, 5, 6, 10, and 16, highlighting its importance in creating a sustainable and fulfilling world.

In line with this, the aim of this project is to answer the question: "What factors influence the level of happiness in countries?" To answer it, we applied different methods and algorithms to find the one that best predicts happiness (given by the feature "Happiness Score").

Data and Materials

In this project, the 2016_world_metrics.csv (37.3 KB) dataset is used.

To see the full project please refer to the notebook full_project.ipynb.

To watch the video of us explaining the most important parts of the project please refer to this link.

Exploratory Data Analysis

We began by examining the dataset to understand its characteristics. Our dataset world_metrics contains health and life expectancy data, as well as ecological footprint, human freedom, and happiness scores for 137 countries in 2016. Overall, it includes 30 features: one is the country name, another is the happiness score (our target variable), and the remaining 28 are the predictors.

We explored our data, checking for outliers, correlations between the predictors and the happiness score, and more. Computing the correlations helped us identify features with a high correlation with the target ($|r| > 0.6$), from which we compiled a subset that we worked with later in our analysis. Thereafter, we used different clustering methods, from simple clustering based on quintiles to more advanced methods such as K-means and hierarchical clustering. The clustering helped us better understand how the countries are distributed with respect to their scores. One example of the quintile-based clustering can be found below, showing that Europe, North America, Australia, and parts of South America are the happiest regions in the world. We also see that Africa, as we expected, appears as the least happy region.

[Figure map_1: world map of countries colored by happiness-score quintile]
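
For illustration, here is a minimal sketch of how the correlated feature subset and the quintile grouping described above could be computed with pandas. The column name "Happiness Score" and the exact handling of the 0.6 threshold are assumptions based on the text, not the project's actual notebook code.

```python
import pandas as pd

# Load the dataset used in the project (file name as given above)
world_metrics = pd.read_csv("2016_world_metrics.csv")

# Correlation of every numeric predictor with the target variable
numeric = world_metrics.select_dtypes(include="number")
corr_with_target = numeric.corr()["Happiness Score"].drop("Happiness Score")

# Keep only the features whose absolute correlation with the target exceeds 0.6
selected = corr_with_target[corr_with_target.abs() > 0.6].index.tolist()
world_metrics_subset = world_metrics[selected + ["Happiness Score"]]

# Simple quintile-based grouping: assign each country to one of five happiness groups
world_metrics["happiness_quintile"] = pd.qcut(
    world_metrics["Happiness Score"], q=5, labels=[1, 2, 3, 4, 5]
)
```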

Prediction Models

After exploring our data extensively, we applied a structured strategy to analyze how various input choices influence the performance of our models. Our objective was to investigate the following aspects (a pipeline sketch illustrating these options follows the list):

  1. The difference between data samples:
     • The full dataset world_metrics (considering the effect of each original feature);
     • The subset world_metrics_subset (containing only the features most correlated with the target);
  2. The effect of artificially constructed features:
     • Does the use of PolynomialFeatures() in the preprocessing step improve model performance?
  3. The effect of the scaling technique:
     • MinMaxScaler()
     • StandardScaler()
  4. The effect of dimensionality reduction with Principal Component Analysis (PCA):
     • Do we need all the features from the dataset?
     • Do we need only a few?
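
To make these variations concrete, the following is a minimal, hypothetical sketch of how they could be expressed as a single scikit-learn pipeline whose steps are swapped through a grid-search parameter grid. It is not the project's functions.py, and the hyperparameter values shown are placeholders.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, MinMaxScaler, StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

# One pipeline covering all preprocessing scenarios; steps set to "passthrough"
# are skipped unless the grid search replaces them with a real transformer.
pipe = Pipeline([
    ("poly", "passthrough"),       # optionally PolynomialFeatures()
    ("scaler", StandardScaler()),  # MinMaxScaler() vs. StandardScaler()
    ("pca", "passthrough"),        # optionally PCA for dimensionality reduction
    ("model", Ridge()),            # placeholder; any regressor from the list below
])

param_grid = {
    "poly": ["passthrough", PolynomialFeatures(degree=2)],
    "scaler": [MinMaxScaler(), StandardScaler()],
    "pca": ["passthrough", PCA(n_components=0.95)],  # keep 95% of the variance
    "model__alpha": [0.1, 1.0, 10.0],
}
```

Treating PolynomialFeatures, the scaler, and PCA as interchangeable pipeline steps lets a single grid search cover every combination listed above without duplicating code.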

Through this approach, we aimed to assess how the performance of the prediction models evolves across these scenarios. In each scenario, we worked with the following models to evaluate their performance:

  1. ElasticNet (to see whether we should prefer Ridge or Lasso)
  2. Ridge
  3. Lasso
  4. kNN
  5. Decision Tree
  6. Random Forest
  7. Support Vector Regression

For prediction, we used a set of custom functions placed in the file functions.py. These functions were designed to work together in a workflow for performing a grid search on a regressor, obtaining results, and extracting the best models based on different scoring functions. The code output includes both the model settings and the calculated metrics. The model settings provide information about the chosen algorithm, hyperparameters, and preprocessing steps. The calculated metrics consist of the mean $R^2$, mean MAE, mean MSE, and the standard deviation of each metric, allowing us to observe the minimum and maximum values across the five folds created with cross-validation.
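
As an illustration of this workflow, the sketch below shows how such a helper could be built around GridSearchCV with 5-fold cross-validation and multi-metric scoring. The function name, the result keys, and the random seed are hypothetical; the real implementation in functions.py may differ.

```python
from sklearn.model_selection import GridSearchCV, KFold

def run_grid_search(pipe, param_grid, X, y):
    """Run a 5-fold grid search and report cross-validated R^2, MAE, and MSE."""
    scoring = {
        "r2": "r2",
        "mae": "neg_mean_absolute_error",
        "mse": "neg_mean_squared_error",
    }
    cv = KFold(n_splits=5, shuffle=True, random_state=42)
    search = GridSearchCV(pipe, param_grid, scoring=scoring, refit="mse", cv=cv)
    search.fit(X, y)

    res, best = search.cv_results_, search.best_index_
    return {
        "best_params": search.best_params_,       # chosen algorithm, hyperparameters, preprocessing
        "mean_r2": res["mean_test_r2"][best],
        "mean_mae": -res["mean_test_mae"][best],   # negate: sklearn maximizes scores
        "mean_mse": -res["mean_test_mse"][best],
        "std_mse": res["std_test_mse"][best],
    }
```

Called with the pipeline and parameter grid from the earlier sketch and a feature matrix/target pair, this returns the settings of the best model together with its cross-validated metrics.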

Results

This table compares all the models in terms of the obtained mean MSE. We use the following abbreviations for compactness of the table:

  • F = Full Sample Approach
  • S = Subset Sample Approach
| Sample | Model | Mean MSE | Std | Sample | Model | Mean MSE | Std |
| --- | --- | --- | --- | --- | --- | --- | --- |
| F | SVR | 0.3355 | 0.0841 | S | SVR | 0.2842 | 0.0424 |
| F | Random Forest | 0.3465 | 0.0761 | S | Random Forest | 0.2961 | 0.0831 |
| F | Ridge | 0.3512 | 0.0804 | S | Ridge | 0.3499 | 0.0565 |
| F | ElasticNet | 0.3519 | 0.0806 | S | ElasticNet | 0.3510 | 0.0576 |
| F | Lasso | 0.3577 | 0.0945 | S | Lasso | 0.3561 | 0.0522 |
| F | kNN | 0.3619 | 0.1106 | S | kNN | 0.3073 | 0.0727 |
| F | Decision Tree | 0.5657 | 0.1843 | S | Decision Tree | 0.4734 | 0.1301 |

Specifically, we found the following:

Full Sample Approach. Among the regression models, SVR (Support Vector Regression) has the lowest mean MSE of 0.3355, followed closely by Random Forest and Ridge. ElasticNet, Lasso, and kNN have slightly higher mean MSE values, and Decision Tree performs the worst with a mean MSE of 0.5657.

Subset Approach. SVR still has the lowest mean MSE of 0.2842, followed by Random Forest and kNN. Ridge, ElasticNet, and Lasso perform similarly, while Decision Tree has the highest mean MSE of 0.4734.

Overall, SVR consistently performs well on both the full and subset datasets, achieving the lowest mean MSE in both cases. This indicates that SVR is the most accurate model for predicting happiness scores from the given features, demonstrating good generalization ability and robustness across both feature sets.

Limitations and further research

In conclusion, our investigation did not uncover a definitive formula for happiness. However, we gained valuable insights into the factors associated with happiness and their alignment with the Sustainable Development Goals. It is important to acknowledge that the parameters used in our analysis offer a generalized perspective on happiness and may not fully capture its individual and multifaceted nature. Nevertheless, these insights have fueled our determination to continue our quest for understanding happiness and contribute to a happier world.

Looking ahead, we have several plans for future research. We intend to create subsets of factors based on our own definition of happiness, explore how happiness is depicted in cartoons, and investigate cultural perspectives on happiness. Additionally, we aim to expand our dataset by including currently missing countries, enabling us to gain a more comprehensive global view and examine regional variations. By pursuing these avenues and incorporating additional data, we seek to enhance the comprehensiveness of our findings and uncover new insights into the complex nature of happiness.

About

This repository contains all the data and materials that we used for the project "What makes us happy?" in the course "Data Science and Machine Learning"
