% (Removed: web-scrape artifacts preceding the LaTeX source --- GitHub page
% chrome ("Notifications", fork banner, file stats) and the rendered
% line-number gutter 1..1000. None of it was part of draft.tex.)
\documentclass[a4paper,12pt]{article}
% Preamble cleaned: duplicate \usepackage lines removed (graphicx, caption,
% array, multirow, tabularx, longtable, amsmath, rotating were each loaded
% more than once); hyperref moved to the end of the preamble, as it must be
% loaded after (almost) all other packages.
\usepackage{graphicx}
\usepackage{setspace}
\usepackage[lmargin=2.5cm,tmargin=3cm,rmargin=2.5cm,vscale=0.8,nohead]{geometry}
\usepackage{multicol}
\usepackage{wrapfig}
\usepackage{tikz}
\usepackage{tikz-network}
\usepackage{subfig}
\usepackage{amsmath, amsthm, amsfonts, amssymb}
\usepackage{bibentry}
\usepackage[round]{natbib}
%\usepackage[numbers]{natbib}
\usepackage{xpatch}
\usepackage{pgfplots}
% Author-year citation printed without parentheses around the author,
% e.g. "Huber \citeyearonly{huberEtAl82}" -> "Huber (1982)".
\newcommand{\citeyearonly}[1]{\citeyearpar{#1}}
%\usepackage{fancyhdr}
\usepackage{mathrsfs}
\usepackage{authblk}
\usepackage[alwaysadjust]{paralist}
\usepackage{alltt}
\usepackage{caption}
\usepackage{array}
\usepackage[fit]{truncate}
%\pagestyle{fancy}
\usepackage{calc}
%\usepackage{fancyvrb}
\usepackage{float}
\usepackage[utf8]{inputenc}
\usepackage{lscape}
\usepackage{longtable}
\usepackage{tabularx}
\usepackage[figuresright]{rotating}
\usepackage{color}
\usepackage{enumerate}
\usepackage{url}
\usepackage[normalem]{ulem}
\usepackage{multirow}
\usetikzlibrary{positioning, arrows}
\usepackage{booktabs}
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{adjustbox}
\usepackage[stable]{footmisc}
\usepackage[french, english]{babel}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{epigraph}
\usepackage[page,toc,titletoc,title]{appendix}
% hyperref is loaded last on purpose; do not add packages below it
% (exception: cleveref, if ever used, goes after hyperref).
\usepackage{hyperref}
\pgfplotsset{compat=1.18}
% \doubleitem: inside an enumerate, advance the item counter TWICE and emit a
% single \item labelled with both numbers, e.g. "1, 2".
% Implementation note: the group confines the scratch macro \tmp; the second
% \edef expands to "\endgroup \item[<first label>, <second label>]", so the
% group is closed exactly when the item is emitted. \noexpand keeps \item
% itself unexpanded inside the \edef. The statement order is load-bearing ---
% each \edef snapshots \theenumi/\labelenumi at its own point in time.
\newcommand{\doubleitem}{%
\begingroup
\stepcounter{enumi}%
\edef\tmp{\theenumi, }%
\stepcounter{enumi}
\edef\tmp{\endgroup\noexpand\item[\tmp\labelenumi]}%
\tmp}
% 'dedication' environment: typesets its body on the current page with no
% header/footer, in italics, flush right, placed about one quarter of the way
% down the page (1:3 ratio between the top and bottom stretch glue), then
% flushes the page with \clearpage.
\newenvironment{dedication}
{%\clearpage % we want a new page %% I commented this
\thispagestyle{empty}% no header and footer
\vspace*{\stretch{1}}% some space at the top
\itshape % the text is in italics
\raggedleft % flush to the right margin
}
{\par % end the paragraph
\vspace{\stretch{3}} % space at bottom is three times that at the top
\clearpage % finish off the page
}
% Typeset footnotes at \normalsize (presumably an institutional thesis
% formatting requirement).
% NOTE(review): this redefinition is global --- it also affects every other
% consumer of \footnotesize (captions, bibliography, page numbers in some
% styles). Confirm this side effect is intended.
\renewcommand{\footnotesize}{\normalsize}
\title{{\bf \Large The Power of Context in Decision-Making and Recommendations}
}
\author{Emil Mirzayev}
%March 28, 2023
\begin{document}
\begin{dedication}
To my father, my first mentor, who seeded my enduring passion for learning.
\end{dedication}
\spacing{1.5}
\maketitle
\clearpage
\begin{otherlanguage}{french}
\begin{abstract}
Cette dissertation comprend quatre études qui examinent les effets du contexte et appliquent ces connaissances pour améliorer les systèmes de recommandation dans les marchés en ligne. Je différencie deux types de contexte : interne et externe. Les conclusions de la dissertation indiquent que les effets du contexte interne dans les choix, qui ont été principalement étudiés et observés auparavant dans des situations expérimentales, sont également présents et détectables dans des situations concrètes. De plus, l'étude examinant le contexte externe trouve une relation positive entre les outils qui permettent aux utilisateurs de signaler leurs préférences aux systèmes de recommandation et leur adoption. Les applications empiriques des chapitres composant cette thèse reposent sur quatre ensembles de données distincts, le plus grand étant des données observationnelles provenant d'une situation concrète, et les trois autres provenant d'une situation expérimentale.
La première étude applique un modèle de décision computationnel à un ensemble de données conséquent de choix dans le monde réel, en faisant la première application de cette envergure. Les résultats indiquent que le contexte influence les choix des individus. L'étude suggère que les marchés en ligne pourraient utiliser de tels modèles pour approfondir la compréhension de la composition de l'ensemble de choix et de l'interaction entre différentes options sur les décisions des consommateurs.
La deuxième étude va au-delà des études de contexte traditionnelles en développant une méthodologie visant à séparer ses trois composantes principales, connues sous les noms d'attraction, de compromis et de similarité, les unes des autres. Cette étude contribue à comprendre l'interaction entre différents effets de contexte au sein d'un ensemble de choix et préconise le développement de conceptions de systèmes de recommandation et une compréhension plus profonde de la nature hétérogène de la dynamique du choix du consommateur. Les résultats de cette étude peuvent être utilisés pour naviguer dans le problème dit de démarrage à froid rencontré par les marchés numériques.
La troisième étude présente une approche novatrice pour aborder le problème de démarrage à froid du côté de l'utilisateur dans la conception de systèmes de recommandation. Elle s'appuie sur les résultats de l'étude précédente et applique les conclusions de la littérature sur le choix à deux étapes observé chez les individus pour générer des ensembles de considération. Ses conclusions ouvrent la voie à l'examen des effets de contexte provenant de l'extérieur des ensembles de choix, à savoir les préférences des individus et les outils qui leur permettent de signaler leurs préférences aux systèmes de recommandation. Cela est présenté comme l'un des principaux mécanismes pour créer des systèmes de recommandation plus efficaces.
La dernière étude a examiné l'effet du contrôle de l'utilisateur sur l'acceptation du système de recommandation en utilisant le Modèle d'Acceptation de la Technologie comme cadre théorique. Cette étude a trouvé que les systèmes de recommandation faciles à utiliser étaient perçus par les utilisateurs comme plus utiles et entraînaient une plus grande intention de les utiliser. Cependant, différents mécanismes de contrôle ont eu des impacts variés sur l'expérience utilisateur.
Cette thèse démontre l'existence d'effets de contexte dans des configurations multiattributs, multidimensionnelles et développe des méthodologies pour améliorer la conception des systèmes de recommandation avec ces effets de contexte. De plus, cette étude examine le contexte externe, à savoir les outils qui permettent aux utilisateurs d'exprimer leurs préférences et comment rendre les systèmes de recommandation meilleurs grâce à eux.
\end{abstract}
\end{otherlanguage}
\clearpage
\begin{abstract}
This dissertation comprises four studies investigating context effects and applying the knowledge to enhance recommender systems in online marketplaces. I differentiate between two types of context: internal and external. The findings of the dissertation indicate that the effects of the internal context in choice settings, which were primarily studied and observed before in experimental settings, are also present and detectable in field settings. Furthermore, the study investigating the external context finds a positive relationship between tools that enable users to signal their preferences to recommender systems and their adoption. The empirical applications of the chapters comprising this thesis rely on four distinct datasets, the largest being observational data from a field setting and the remaining three coming from an experimental setting.
The first study applies a computational decision-making model to a substantial dataset of real-world choice, making it the first application of this magnitude. The findings indicate that the context influences the choices of individuals. The study suggests that online marketplaces could use such models to gain further insight into how the composition of the choice-set and the interaction among different options affect consumer decisions.
The second study extends beyond traditional context studies by developing a methodology that aims to disentangle its three main components, known as attraction, compromise, and similarity, from each other. This study contributes to understanding the interaction between different context effects within a choice set and advocates the development of recommender system designs and a deeper understanding of the heterogeneous nature of the consumer choice dynamics. The results of this study can be used to navigate the so-called cold-start problem faced by digital marketplaces.
The third study presents a novel approach to addressing the user-side cold start problem in recommender system design. It builds on the results of the previous study and applies the findings of the two-stage choice literature observed in individuals to generate consideration sets. Its findings pave the way for investigating context effects arising from outside the choice sets, namely, the preferences of individuals and the tools that allow them to signal their preferences to recommender systems. This is argued to be one of the main mechanisms to create more effective recommender systems.
The final study investigated the effect of user control on recommender system acceptance using the Technology Acceptance Model as a theoretical framework. This study found that easy-to-use recommender systems were perceived by users as more useful and resulted in greater intention to use them. However, different control mechanisms had varying impacts on user experience.
This thesis demonstrates the existence of context effects in multiattribute, multidimensional settings and develops methodologies of enhancing recommender systems design with context effects. Additionally, this study investigates the external context, namely, the tools that allow users to express their preferences and how to make recommender systems better through them.
\end{abstract}
\clearpage
\section*{Acknowledgements}
Embarking on this scholarly journey has been both humbling and transformative, marked by countless days of meticulous study, intense deliberation, and profound discoveries. Indeed, the journey was punctuated with hurdles and moments of self-doubt. However, every trough was succeeded by a crest, each stumble by an ascent, and every challenge served as an opportunity. The path was steep, but these uphill struggles honed my perspective, expanded my horizons, and ignited my passion for knowledge. The lessons learned, the triumphs celebrated and the sense of accomplishment far outweighed the trials faced. Reflecting on these uplifting moments, I joyfully acknowledge those who accompanied me on this journey and made it not only possible, but also immensely rewarding.
Firstly, I extend my deepest gratitude to Zakaria Babutsidze, my Ph.D. supervisor. He has been a great mentor and a source of inspiration for me. His honesty and straightforwardness challenged me to think critically and creatively, shaping me into someone better day by day. I am indebted to Francesco Castellanetta, our Ph.D. program director, for his guiding hand and unwavering faith in my capabilities. I appreciate our long and intriguing discussions with Ludovic Diabaggio that transcended academia. I am grateful to Bill Rand for his invaluable insight, feedback, and co-authorship. Daniela Iubatti, your informal talks and unwavering support have been a source of inspiration. I appreciate Diego Zunino for his honest and invaluable guidance. Bruno Cirillo deserves special mention; our albeit slightly formal discussions have significantly shaped my work. My heartfelt appreciation goes to Renata Kaminska. Her support and intuitive understanding of my thoughts were essential in my pursuits. I am grateful to Benjamin Montmartin; our separate exchanges about wine (we were in France at the end) and statistics enriched my perspectives immeasurably. To Valerio Incerti, our creative discourses, even about less creative ideas, were a constant source of inspiration. Lapo Mola's valuable feedback motivated me to pursue excellence, and I value that wholeheartedly.
Aytan, my beloved wife and best friend, your unwavering support and guidance have been my Polaris, guiding me through the darkest nights and teaching me to only settle for infinity and beyond. Ecem Delicik, my dear friend, I am grateful for the shared stories, moments, laughter, and silence that have strengthened our friendship. The moment you realized that I was speaking to you in Turkish is unforgettable (I am sure you still remember it, too). Johanna Deperi, our shared moments and long conversations were invaluable for self-reflection, joy, and sometimes ``justified sadness'' throughout this journey. Artyom Yepremyan, our discussions helped us conquer the PhD mountain. I will forever cherish the times when I needed a hand, and you extended yours. Manon Desjardins, your constant support despite the towering stack of papers on your desk will always be appreciated. Mehdi Ibrahim, I am grateful for our coffee talks and the irresistible desserts from Morocco. Teymur Mardaliyev, your friendship and coding prowess have been invaluable; only you could convince me to settle for JavaScript there.
Finally, no words can fully express my gratitude to my parents. Father, your inquisitive nature has been my guide, instilling in me the curiosity that led me on this scholarly path. Your enthusiasm for the unexplored has been infectious, leading me to question, explore, and discover. Mother, you have been my anchor, reminding me of my roots and my identity in the academic whirlwind. Your unwavering encouragement and steadfast belief have reinforced my sense of self, allowing me to face challenges with courage and resilience. You taught me to stay grounded and true to myself, no matter how far I travel in the pursuit of knowledge. Your lessons continue to shape my journey, helping me navigate both personal and academic hurdles with grace and fortitude.
\textit{Çox sağ olun ki, varsınız!}
\clearpage
\tableofcontents
\newpage
\listoffigures
\clearpage
\listoftables
\newpage
\newpage
\section{Introduction}
\epigraph{We all make choices, but in the end, our choices make us.}{}
The main objective of this thesis is to make a contribution towards understanding the choice context and applying it to improve recommender systems. The thesis makes an effort to identify context effects arising from the choice set and to develop a methodology to implement this information in the design of recommender systems. Furthermore, it analyzes the effectiveness of user control mechanisms, which allow individuals to inform the system about their preferences and also amend them.
Recommender systems have become a crucial ally in the vast landscape of the digital world. They help individuals to navigate through the plethora of alternatives and find what they are looking for. They achieve this by using sophisticated algorithms that analyze the wide space of items, users, and their interactions with each other. System designers and businesses strive to maximize the accuracy of recommendations, which means they want to maximize the consumption of recommendations. However, recent research streams show that accuracy is not the only way to make recommender systems effective. Other factors such as diversity and serendipity are also important \citep{kaminskas2016diversity}. Moreover, the black-box nature of the algorithms does not allow us to understand the reasons behind a particular user's choice behavior \citep{kotkovSurveySerendipityRecommender2016, samih2021exmrec2vec}. Therefore, more and more interest is directed towards understanding and utilizing the context around the particular choice \citep{adomavicius2005toward}.
The choice context has long received the attention of scholars in many fields, including marketing, psychology, and economics. Because it was studied in many domains, scholars refer to it differently. This thesis concentrates on and uses two of them. The first one posits that the choice context\footnote{For the remainder of the dissertation, I will follow the previous literature \citep{truebloodEtAl13} and refer to this definition of the choice context as context effects.} is ``the availability and nature of the choice alternatives'' \citep{tversky1972elimination, huberEtAl82, simonson89}. Previous research has demonstrated the existence of context effects in various settings \citep{herne1997decoy, soltani2012range, evangelidisEtAl18, wuConsguner20}. The second definition of context arose much later with the proliferation of recommender systems. There, context is also referred to as ``the time and content of the choice, the location or sociodemographic characteristics of the decision maker...'' \citep{adomavicius2011context}.\footnote{To distinguish between these two definitions, I will refer to this definition as external context throughout this dissertation.}
However, modeling context effects mathematically was a challenge because the existing models suffered from the independence of irrelevant alternatives criterion \citep{luce59}, which meant that these models treated each alternative in isolation. This has shifted scholars' interest towards computational models \citep{usher2001time, roe2001multialternative, trueblood2014multiattribute, noguchi2018multialternative}. Yet, none of these models has been applied to real-world choice data, and as a result, their applicability to field data remained an open question. I address this gap in the first study, where I apply a computational model to field data with high heterogeneity among dimensions. Building on these results, in my second study I have delved deeper into choice modeling and developed a methodology to account for the three main context effects studied in the literature \citep{truebloodEtAl13}.
When new users or items are introduced to recommender systems, they fail to function as intended because of the lack of information their algorithms need. In such cases, even when they do provide recommendations, the recommendations are far from personalized \citep{lika2014facing}. It is considered a key challenge in recommender system design \citep{park2009pairwise}. The previous literature has applied various methods to overcome this problem, including asking users to rate some items or to share their preferences, among others \citep{guy2009personalized, aharon2013off, bykau2013coping, saveski2014item}. However, the limitations of these approaches are that they ignore the context effects and concentrate on scenarios where information scarcity is temporary. Using the findings from my second study and combining them with the decision-making and choice-modeling literature, I propose an innovative solution to address the continuous information-scarcity issue in recommender system design by utilizing contextual information of the menu to generate relevant choice sets through a two-step choice modeling method.
Research agrees on the importance of metrics outside the boundaries of accuracy for recommender systems \citep{kaminskas2016diversity}. It is in the best interest of online marketplaces to provide users with recommendations that are not only in the category ``exactly what I want'', but also ``I never thought I would have liked this'' \citep{kotkovSurveySerendipityRecommender2016}. This can not only boost sales \citep{songWhenHowDiversify2019}, but also increase user satisfaction \citep{knijnenburgExplainingUserExperience2012}. However, it was observed that individual preferences are not stable and tend to change, and system designers have proposed various tools to allow users to signal their preference shifts \citep{bostandjiev2012tasteweights,hijikata2012relation}. To better understand these tools, it is necessary to study them in isolation from the context arising from the choice set and concentrate purely on the external context that arises from the user side, e.g., their preferences \citep{adomavicius2011context}. In my last study, I address this gap by conducting an online experiment and applying the Technology Acceptance Model \citep{davis1985technology} to measure users' acceptance of such tools.
All in all, the four studies aim to contribute to our understanding of the context and provide applications of the design of recommender systems using this information.
\newpage
\section{Exploring Context Effects in Multi-Attribute, Multi-Alternative Choice Environments}\label{chapter:simulationStudy}
\begin{abstract}
Previous computational decision-making models that were developed to account for context effects have only been studied with experimental data in which only one effect was produced at a time. Using data coming from strictly controlled experimental environments hinders the understanding of context effects that occur in real-world choice scenarios where items have multiple dimensions and choice sets have dozens of alternatives. In this chapter, I apply a computational model that accounts for context effects to observational data, which has not been done before. The data come from the air travel industry and are ideal for studying context effects in multiattribute, multialternative choice environments. I first find optimal parameters for the computational model using the differential evolution algorithm. Then, I complement a traditional choice model with its outputs and assess the significance of its contribution. This chapter contributes to the context effect and decision-making literature by providing further insights on the behavior of computational decision-making models in real-world choice data.
\end{abstract}
\newpage
\subsection{Introduction}
Context effects have been extensively studied and demonstrated in various domains, from psychology to marketing \citep{herne1997decoy, soltani2012range, truebloodEtAl13, frederickEtAl14, evangelidisEtAl18, wuConsguner20}. Some recent studies also concluded that multiple context effects may occur at the same time \citep{berkowitsch2014rigorously, noguchi2014attraction}. However, recent studies have also discovered boundary conditions for these effects \citep{liew2016appropriacy, spektor2018good, spektor2019similarity}. Familiarity with the choice domain was found to reduce the context effects experienced by individuals \citep{kim2005attraction, sheng2005understanding}. It was also found that some conditions may force these effects to completely reverse \citep{cataldo2019comparison}. The findings described above make it necessary to call for a model which could explain these effects.
Logit and Probit models have been traditionally used in choice settings \citep{gensch1979multinomial, kim2017probit}. However, those models cannot account for context effects because they only account for the attributes of the focal option, not taking into account the attributes of the other options in the choice set. Tversky has proposed a model of elimination by aspects that could explain the similarity effect \citeyearonly{tversky1972elimination}. The foundation of the model is attention switching of individuals between alternatives and attributes and their comparisons. Once attention is received, the attribute value of a given alternative is compared to a predetermined threshold value by the individual, and if the threshold is not met, that alternative is eliminated from consideration. The process is then continued with another attribute until the final decision is made. Another model, proposed by Tversky and Simonson, could explain the compromise effect \citeyearonly{tverskySimonson93}. The theoretical foundations of this model posited that alternatives are compared based on a weighted sum of attribute values and a local context comprising binary comparisons among alternatives. However, these two models were unable to successfully account for all three effects. Despite their drawbacks, the sequential decision-making and attention-based mechanisms in these models laid the foundations of many upcoming computational choice models \citep{bhatia2013associations}.
In the last three decades, researchers have developed many computational models which account for context effects. Multialternative Decision Field Theory (MDFT) \citep{roe2001multialternative}, Leaky competing
accumulator \citep{usher2001time}, Multiattribute linear ballistic accumulator \citep{trueblood2014multiattribute}, and Multialternative Decision by Sampling \citep{noguchi2018multialternative} are among them. Some of these models have been extensively tested and studied, while others are relatively new and therefore have not received much attention from scholars \citep{truebloodEtAl13}. However, these studies have been performed with experimental data \citep{trueblood2014multiattribute, berkowitsch2014rigorously, evans2019response, busemeyer2019cognitive}. Research has proven that the behavior of individuals in the laboratory choice environment is different from that in the real-world choice environment \citep{hogarth1989risk}. Hence, the applicability of such models to field data is an uninvestigated avenue because no previous study has been done where a computational model was applied to real-world observational data. The plan is to address this gap by applying the Multialternative Decision by Sampling (MDbS) model proposed by Noguchi and Stewart \citeyearonly{noguchi2018multialternative} to field data from the airfare booking domain. This would allow me to assess the applicability of this model to complex field data.
Also, applying MDbS to observational data would allow one to statistically assess the significance of the contribution of this model's ability to account for context effects. Instead of testing this model against established choice models, I will attempt to complement them with MDbS. To do this, the random effect Probit model will be used as a variation of the Probit family and will be augmented with MDbS output. The Probit model is chosen because, unlike the family of Logit models, it does not explicitly assume IIA. To validate my results further, I will apply the same methodology and analysis to experimental data.
The reasons for choosing MDbS are twofold. First, it is relatively new when compared to other models, hence it has not been further investigated before. Second, it is more robust and can account for a wider range of context effects (than other models) known to the literature \citep{noguchi2018multialternative}.
\subsection{Big three context effects}\label{chapter:bigThreeContextEffectsDescription}
We make choices all the time. Imagine the time you went to see a movie you had been waiting for a while and decided to grab some popcorn before entering. You may have seen something like Figure~\ref{fig:decoyPopcornExample}, although prices may be higher these days. You are puzzled at first, but reminded that the movie is about to start, so you better hurry up and choose one.
The small one does not feel like enough for a 90-minute marathon. Then, there is a middle one, which seems like an okay option at first. When you notice that big box, you immediately forget about the small one you saw moments ago. You start looking at the sizes and prices of the middle and large boxes and think: Well, that is easy. The big box seems like the way to go here, considering their prices are almost equal.
\begin{figure}[H]
\centering
\includegraphics[width=0.5\textwidth]{staticFiles/popcornDecoy.png}
\caption{Classic illustration of context effects.}
\label{fig:decoyPopcornExample}
\end{figure}
If this situation is familiar to you, then you have experienced the so-called context effect, which can be understood as ``the composition and the nature of the choice set, availability of various options in it'' \citep{tversky1972elimination, huberPuto83}. For a long time, our understanding of the choice did not extend over the borders of two related principles. The first one is independence of irrelevant alternatives (IIA), which states that, when having a choice between two options $A$ and $B$, if a person prefers $A$, for example, then regardless of adding a third option $C$ to this choice set, that person's preference must remain unaltered \citep{luce59}. The second is the regularity principle, which states that the probability of choice of option $A$ cannot increase by the introduction of option $C$ \citep{luce59}.
However, research has concluded that the context of the choice set and the options in it have a substantial effect on how people make decisions. This effect has been extensively studied over the past five decades by many economists, marketing scholars, and psychologists\footnote{See Dowling et al. \citeyearonly{dowlingEtAl20} and Lichtenstein \citeyearonly{lichtenstein2006construction} for a more comprehensive review.} \citep{kahnemanTversky79, simonson89, tverskySimonson93, lichtenstein2006construction, dowlingEtAl20}. Most of the research has focused on three context effects, also referred to as the ``big three'': attraction, compromise and similarity \citep{howes2016contextual}.
To better understand these three context effects, let us think of a hypothetical choice set where options differ along two dimensions: Dimensions 1 and 2. We first start with a set of options consisting of two options: $A$ that has the values 20 and 80; $B$ that has the values 80 and 20 that figure \ref{fig:binaryChoiseSet} represents.
\begin{figure}[h]
\centering
\includegraphics[width=0.7\textwidth]{staticFiles/noEffect.png}
\caption{Binary choice set with two options.} % Add your description here
\label{fig:binaryChoiseSet} % This labels your figure for reference
\end{figure}
People who prefer Dimension 1 would choose $B$ whereas $A$ will be chosen by people who prefer Dimension 2. This can be described as equation \ref{eq:onlyTwoOptions} below.
\begin{equation}\label{eq:onlyTwoOptions}
P(A|A,B) = P(B|A,B)
\end{equation}
Where $P(A|A,B)$ corresponds to the probability of choosing $A$ given the choice set $A,B$. Same goes for $P(B|A,B)$.
\textit{Attraction effect}
Now, let us add a third option to this choice set, the option $D_A$ to create one variation, and $D_B$ to create the second variation of the ternary choice set. Both added options have lower values in both dimensions compared to $A$ and $B$, respectively. Huber et al. \citeyearonly{huberEtAl82} created this type of scenario and found what they called the ``attraction effect''. The attraction effect, which is also known in the literature as the asymmetric dominance effect, is a consistent violation of the regularity principle mentioned earlier. They suggested that when having a choice set consisting of options $A$ and $B$, the relative probability of choosing option $A$ can be increased if a third option with the characteristics of $D_A$ is added to the same set of choices \citep{huberEtAl82}.
\begin{figure}[h]
\centering
\includegraphics[width=0.7\textwidth]{staticFiles/attractionEffect.png}
\caption{Attraction effect in ternary choice.} % Add your description here
\label{fig:attractionEffect} % This labels your figure for reference
\end{figure}
Figure \ref{fig:attractionEffect} shows these options and their respective values in each dimension. It can be seen that option $A$ and $B$ are located at two different ends of the choice space. Option $D_A$ is inferior to option $A$ in both dimensions and $D_B$ is inferior to $B$. With the attraction effect in place, equation \ref{eq:onlyTwoOptions} will change into \ref{eq:attractionProbability}.
\begin{equation}\label{eq:attractionProbability}
\frac{P(A|A,B,D_A)}{P(B|A,B,D_A)} > \frac{P(A|A,B)}{P(B|A,B)} \And \frac{P(B|A,B,D_B)}{P(A|A,B,D_B)} > \frac{P(B|A,B)}{P(A|A,B)}
\end{equation}
Huber noted that although other explanations are still possible, the addition of $D_A$ to the choice set would shift the preferences of people towards dimension 2 because this is where option $A$ appears advantageous \citep{huberEtAl82, bhatia2013associations}. However, this claim has not received unanimous support in subsequent studies, in which preference changes have been observed \citep{wedell1991distinguishing}.
\textit{Compromise effect}
When the third option we add is option $C$ instead of $D$, the preference shift happens differently. $C$ is virtually a middle option between $A$ and $B$ and therefore has a value of 50 in each dimension. In this case, the probabilities of choosing $A$ and $B$ will both decrease in favor of $C$, resulting in \ref{eq:compromiseProbability}:
\begin{align}\label{eq:compromiseProbability}
P(A|A,B,C) < P(A|A,B) \And P(B|A,B,C) < P(B|A,B)
\end{align}
Simonson was the first to describe such an effect \citeyearonly{simonson89}. He associated this with a difficulty to select: When people are not sure which attribute is important, they will find a justification to favor a compromise \citep{simonson89}. This argument can explain the reason why an individual may drift towards a middle choice in a three-option choice set. Such a compromise emerges as an important factor, acting as a tie-breaker when the decision maker is unsure between the initial two options.
It is worth noting that it is also possible to ``target'' a particular option from the binary choice set when adding a third option to create a compromise effect. One can add a target $C$ which makes $A$ a compromise option. In this case, the probability of choosing $A$ among this triple will increase, as it is considered a compromise between the remaining two options.
\begin{figure}[h]
\centering
\includegraphics[width=0.7\textwidth]{staticFiles/compromiseEffect.png}
\caption{Compromise effect in ternary choice.} % Add your description here
\label{fig:compromiseEffect} % This labels your figure for reference
\end{figure}
\textit{Similarity effect}
Although Becker et al. \citeyearonly{becker1964measuring} have mentioned it before, the first study of the similarity effect is known to be that of Tversky \citep{tversky1972elimination}. He noted that when faced with a binary choice set consisting of $A$ and $B$, individuals will gravitate towards $A$ more than when faced with a ternary choice set consisting of $\{A, B, S_A\}$ depicted in figure \ref{fig:similarityEffect}. He explained it by proposing the elimination by aspects theory, which states that one attribute will be chosen as the elimination criterion, and all options that do not meet that criterion will be eliminated \citep{tversky1972elimination}. Therefore, in the set of choices $\{A, B, S_A\}$, if an individual selects dimension 2 as the elimination criterion, both $A$ and $S_A$ will be eliminated, leaving $B$ as a choice. In contrast, if the decision maker prefers dimension 1 more, then option
$B$ will be eliminated, leaving both $A$ and $S_A$ to share the ``victory'', hence resulting in equation \ref{eq:similarityProbability}.
\begin{equation}\label{eq:similarityProbability}
P(A|A,B,S_A) < P(A|A,B) \And P(B|A,B,S_B) < P(B|A,B)
\end{equation}
The similarity effect in the choice set $\{A, B, S_B\}$ will follow a similar route. Figure \ref{fig:similarityEffect} depicts both sets of choices, where an option similar to $A$ and $B$ was introduced separately.
\begin{figure}[h]
\centering
\includegraphics[width=0.7\textwidth]{staticFiles/SimilarityEffect.png}
\caption{Similarity effect in ternary choice in two scenarios.} % Add your description here
\label{fig:similarityEffect} % This labels your figure for reference
\end{figure}
\subsection{Computational decision making models}
MDbS belongs to the attention-based choice models. These models are a type of decision-making model which takes into account attention allocation mechanisms when making decisions. They generally assume that people allocate attention to different attributes, and those that receive more attention have more impact on decision making \citep{gabaix2000boundedly}. Before commencing with its underlying mechanisms and assumptions, it is beneficial to discuss the other two models that MDbS is influenced by and that have been studied extensively. After briefly discussing those models, I will continue with MDbS, its main assumptions, mechanisms, and how it accounts for context effects.
\textit{Multialternative Decision Field Theory}
The very first computational model which could account for all three context effects was Multialternative Decision Field Theory (MDFT) developed by Roe et al. \citeyearonly{roe2001multialternative} as an extension of Decision Field Theory \citep{busemeyer1993decision}. It is a dynamic model of decision making that accommodates multialternative preferential choice situations, which was not possible with Decision Field Theory \citep{hotaling2019quantitative}. MDFT assumes that decision making can be explained in three general mechanisms. First, attention allocation posits that attention switches over time between attributes stochastically. Second, the evaluation mechanism posits that the attribute value of the given option is compared with the average attribute values of other options, which makes sure that each option in the choice set participates in comparison. Third, the evidence accumulation mechanism, which is based on the results of the evaluations, gathers evidence in favor of the alternatives compared. As soon as the evidence gathered hits the externally set relative threshold, the choice is concluded \citep{busemeyer2002survey}. This means that as soon as the difference between the highest and the second highest evidence values is larger than the relative threshold, a choice is made. If this threshold is not met, the decision continues until the pre-set time limit is reached.
MDFT has been confirmed to account for similarity, compromise, and attraction effects in multialternative choice scenarios \citep{roe2001multialternative}. It has previously been tested against such random utility models of choice as Logit and Probit and has been concluded to be a better fit to empirical data \citep{berkowitsch2014rigorously}.
MDFT has also been adapted to account for preference changes \citep{mohr2017attraction} and decision making under time restrictions \citep{diederich2003mdft}.
\textit{Multiattribute linear ballistic accumulator}
The multiattribute linear ballistic accumulator (MLBA) is another attention-based decision-making model first proposed by Trueblood et al. \citeyearonly{trueblood2014multiattribute}. Similar to MDFT, it is also a dynamic model, which can be explained in three general mechanisms: attention allocation, evaluation of alternatives, and evidence accumulation. However, the two models have key differences. Firstly, MDFT emulates the search process of elimination by aspects proposed by Tversky \citeyearonly{tversky1972elimination}, assuming that decision makers compare alternatives with one another over time. In contrast, MLBA assumes that individuals evaluate all alternatives independently from one another at the same time and accumulate evidence for each of them in parallel \citep{trueblood15fragile}. Furthermore, MDFT assumes that individuals have limited cognitive capacity to process information when comparing items together, in contrast to MLBA, which considers individuals with unlimited cognitive capacity. Moreover, unlike MDFT, where the decision is made based on a relative threshold, in MLBA the decision is based on an absolute threshold, i.e., as soon as one alternative's evidence reaches the threshold, a decision is made in favor of that alternative. Another difference between these two models is the context effects for which they account. Although MDFT does account for attraction, compromise, and similarity effects \citep{hotaling2019quantitative}, MLBA additionally accounts for preference reversals arising from context \citep{trueblood15fragile}.
\subsection{Multialternative decision by sampling}
MDbS has its origins in the theory of decision-by-sampling, which assumes that individual preferences arise from binary, ordinal comparisons of alternatives on given attribute values with reference values from the memory \citep{stewart2006decision}. Unlike it, MDbS assumes that the information required for comparison also comes from the choice environment itself \citep{noguchi2018multialternative}. As in the other two previous models discussed above, its mechanisms can be explained using three stages. The next section will discuss this in detail.
\subsubsection{Mechanisms behind MDbS} \label{subsec:mechanismMDBS}
\textsc{Attention allocation}
According to MDbS, when comparing two tickets between Paris and New York, for example, the price of a ticket would be compared to the prices of other tickets in the choice set and also to the ones which an individual has previously seen but which are not in the current choice set. Comparisons are ordinal, meaning that evidence is accumulated toward the ``winner'' at a rate of one regardless of how large the difference is.
Previously, it was concluded that people tend to compare alternatives that are similar to each other more than dissimilar ones \citep{noguchi2014attraction}. Similarity-based attention is one of the main assumptions of MDbS. To better understand this, let $m_{ij}$ and $m_{kj}$ be two attribute values with $i \neq k \in \{1, \ldots, n_a\}$ and $j \in \{1, \ldots, n_d\}$. MDbS defines the similarity of $m_{ij}$ to $m_{kj}$ as
\begin{align}\label{similarityMDBS}
s_{ij,kj} = \exp \left( - \alpha \left| \frac{m_{ij} - m_{kj}}{m_{kj}} \right| \right) ,
\end{align}
with similarity parameter $\alpha$. Also, generally $s_{ij,kj} \neq s_{kj,ij}.$ Consider also
\begin{align}\label{sumOfSimilaritiesMDbS}
s_{ij} = \sum_{\substack{k \in \{1, \ldots, n_a\} \\ k \neq i}} s_{ij,kj} ,
\end{align}
which is the sum of all similarities for attribute $m_{ij}$ to other attributes on the same dimension. Consequently, by dividing this value by the sum of similarities in all other attributes across all dimensions, one can calculate the probability that $m_{ij}$ will be selected for comparison, which will be
\begin{align}\label{probabilityOfComparison}
p_{ij} = \frac{s_{ij}}{\sum_{l \in \{1, \ldots, n_a\}} \sum_{m \in \{1, \ldots, n_d\}} s_{lm}} .
\end{align}
\textsc{Evaluation of alternatives}
When evaluating alternatives with each other based on pairwise comparisons, MDbS defines the probability of winning a comparison as
\begin{align}\label{probabilityOneIsFavored}
P(m_{ij} \text{ is favored over } m_{kj}) =
\begin{cases}
F\left(\beta_0 \left(\left| \frac{m_{ij} - m_{kj}}{m_{kj}} \right| - \beta_1\right)\right) & \text{if } m_{ij} > m_{kj} \\
0 & \text{otherwise}
\end{cases} ,
\end{align}
where $F$ is a logistic sigmoid function and $\beta_0$ and $\beta_1$ correspond to the advantage value and the probability that this particular advantage value will be enough to be preferred. For example, consider the case where $\beta_0 = 0.1$ and $\beta_1 = 50$. This would mean that the advantage of 10\% would be preferred with the probability of 50\%. Consequently, if the difference is 20\%, then it will be preferred with the probability of 99\%. The logistic function brings the notion of ``soft'' comparison instead of ``hard'' comparison, in which case small differences would be ignored, while large differences would be extremely preferred \citep{noguchi2018multialternative}.
\textsc{Evidence accumulation}
As mentioned above, in MDbS the accumulation of evidence occurs at a rate of one. For each alternative and for each comparison, in case of winning that comparison, one evidence point is counted towards that alternative. Hence, the probability that evidence will increase by one point will be defined as
\begin{align}\label{probabilityOfEvidenceIncreasing}
p_i = \sum_{j \in \{1, \ldots, n_d\}} p_{ij} \cdot P(m_{ij} \text{ wins a comparison}).
\end{align}
In order to make a choice, MDbS sets a relative stopping rule $\theta$ following the study of Teodorescu \citeyearonly{teodorescu2013disentangling}, which states that when deciding between more than two alternatives, the decision is made when the difference between the highest and the second-highest evidence is larger than the threshold, or the difference between the maximum and mean evidence becomes larger than the threshold. For computational feasibility, MDbS assumes $\theta = 0.1$, which means that a decision is made when the difference between the maximum and the mean evidence reaches $0.1$. Other parameters given externally are $\alpha, \beta_0, \beta_1$. As a last step, evidence for each alternative is divided by the sum of evidence for the entire choice set to convert the evidence values to choice probabilities.
After discussing the main mechanisms behind MDbS, the discussion about how MDbS accounts for attraction, compromise, and similarity effects becomes necessary. The next subsection will shed some light on this matter.
\subsubsection{Accounting for big three context effects}
To effectively illustrate the functioning of MDbS, employing an example choice set that encapsulates the context effects discussed can be beneficial. Therefore, the example dataset depicted in figure \ref{fig:MDBsContextExample} will be used. Although there are five alternatives in the figure \ref{fig:MDBsContextExample}, only three of them will be discussed at a time.
\begin{figure}[h]
\centering
\includegraphics[width=0.7\textwidth]{staticFiles/contextEffectExampleScatterplot.png}
\caption[MDbS’ account for big three context effects]{Example choice set to explain MDbS' account for big three context effects. $A$ and $B$ are considered original two alternatives (binary choice set). $D$ is dominated by $A$ on both dimensions. $C$ acts as compromise between the original two and $S$ is a similar option to $B$. Although these effects would be present in different variations of the choice set (for example one can make $A$ as compromise option), for the simplicity I will concentrate on this example.} % Add your description here
\label{fig:MDBsContextExample} % This labels your figure for reference
\end{figure}
\textit{Accounting for attraction effect}
When adding the option $D$, which is dominated by $A$ in both dimensions, to the binary choice set $\{A, B\}$ (hereafter ``the binary choice set'' will be used instead of $A$ and $B$), one creates an attraction effect \citep{huberEtAl82, huberPuto83}. Huber et al. \citeyearonly{huberEtAl82} explained the attraction effect through weight shifts for individuals. The addition of $D$ would cause people to weigh dimension 2 more. Therefore, $A$ and $D$ will have higher ``interest'' among the people, and this is where $A$ wins over $D$.
However, MDbS takes a different approach. Adding $D$ will increase the probability of comparison of option $A$, due to its similarity to $D$ (recall equation \ref{similarityMDBS}). As a consequence, the probability of $A$ winning a comparison will be greater than the probability of $B$ winning a comparison, because $A$ dominates $D$ in both dimensions, while $B$ is only better in one and $A$ and $D$ will be selected more often to be compared with each other. As a result, $A$ will have the highest expected evidence accumulated, $B$ will be the runner-up, and $D$ will have the lowest evidence accumulated towards it. The value of $\alpha$ will determine how much attribute similarity is translated into being selected for comparison.
\textit{Accounting for compromise effect}
In the scenario where option $C$ is added to the binary choice set, the compromise effect arises \citep{simonson89}. This makes $C$ more likely to be chosen because people become uncertain about the importance of attributes and, therefore, experience a choice difficulty. This results in the choice of $C$ as it is easier to justify \citep{simonson89}.
The MDbS approach differs here as well. Recall that the probability of accumulating evidence towards an option is a product of its probability of being selected for comparison and its probability of winning that comparison, as described in equation \ref{probabilityOfEvidenceIncreasing}. $C$ is more similar to $A$ and $B$ than $A$ is similar to $B$ (and vice versa). This will increase the probability that $C$ will be chosen for comparison. Although $C$ will not win every comparison, the fact that it will be chosen for comparison more often will increase its probability of accumulating evidence.
\textit{Accounting for similarity effect}
When Tversky \citeyearonly{tversky1972elimination} explained the similarity effect in a choice set consisting of $A$, $B$ and $S$, he explained it via his famous elimination by aspects theory. When $B$ and $S$ are similar to each other, they will be eliminated together or stay together. Hence, having $S$ in the set of options will ``steal'' the probability of choice from $B$.
In MDbS this is explained by $\beta_0$ and $\beta_1$ from equation \ref{probabilityOneIsFavored}. Recall that the reason of having the sigmoid function with arguments $\beta_0$ and $\beta_1$ is to make sure that small differences would be relatively ignored. This is in line with previous literature that states that people tend to ignore small differences between alternatives when making a decision \citep{kalwani1992consumer}. Hence, the small differences between $B$ and $S$ would be ignored, which would translate into a decreased probability that any of them would win over the other when compared. This will indirectly increase the accumulation of evidence for $A$, resulting in ``shared'' evidence between $B$ and $S$.
Not only does MDbS account for the three main context effects, it also successfully accounts for other effects known in the decision-making literature, such as the attribute spacing effect \citep{cooke1998multiattribute}; the centrality effect \citep{brown2011decision}; the background contrast effect \citep{tverskySimonson93}; the endowment effect \citep{knetsch1989endowment}, and others\footnote{For the full list of the effects MDbS can account for, please refer to Noguchi \citeyearonly{noguchi2018multialternative}.}. Overall, the authors claim that MDbS can theoretically account for up to 25 context effect variations.
Considering its ability to account for a wider range of context effects, MDbS offers a novel and more insightful path for studying multidimensional and multiattribute choice data. It also strikes a fine balance between the complexity of the model---namely, its dynamic attributes---and analytical practicality.
After delving into the theoretical mechanisms of MDbS and its handling of the three main context effects, the focus now shifts to the practical component of the research: the methodology. The following section provides a deeper exploration into the specifics of the research process. It offers a brief overview of the dataset, followed by an in-depth explanation of the steps involved in the analysis.
\subsection{Empirical application}
\textbf{Observational Data}\label{section:observationalDataDescription}
The observational dataset is created by the merger of two sources. The first dataset constitutes a list of all air travel reservations made in Europe on European routes between December 2013 and June 2014, extracted from the MIDT database (Marketing Information Data Tapes). Besides all of the booking details (e.g., number of passengers, price), it also contains the timestamp of booking and the identity of the booking office (all offline and online outlets have unique identifiers). The second source of data contains information on all air travel searches performed on one of the most comprehensive air travel booking services operated by Amadeus S.A.S. This dataset also contains trip specifics, as well as the identifier of the office where the search was performed. Most importantly, the latter dataset contains information on all possible alternatives that could have been presented to the traveler at the time of search, but does not contain information on which of the options (if any) has the traveler chosen. The matching of these two datasets across office identifier, search / booking time, origin and destination of the trip, travel dates, and number of passengers results in a merged dataset that allows us to identify the itineraries chosen within the options offered at the search \footnote{Office ID, trip origin and destination, trip dates and number of passengers are matched exactly. The time interval between the reservation and the preceding search is minimized. If, given exactly matched attributes, the booking was not performed within 24 hours after a given search, the search is declared unmatched. If, given exactly matched attributes, no search is found during the 24 hours preceding a given booking, the booking is declared unmatched. Unmatched searches and bookings are removed from the analysis}. 
An important limitation of the data is that there is no way of ensuring that the consumer has actually seen the exhaustive list of alternatives available to him/her at the time of booking. However, the dataset consists of all the options that the consumer could have seen. Although this is a drawback for a researcher, this is a standard experience for the practitioner (e.g., recommender system designer). Practitioners designing recommender systems need to create algorithms based on the set of existing alternatives without much visibility on the subset of options a particular user will be interested in, or will eventually see.
The matched dataset (previously used by Mottini and Acuna-Agost \citeyearonly{mottiniAcunaAgost17}) consists of 13000 choice sessions with around 1 million choice alternatives in total. Every alternative is a round-trip flight and has a number of attributes including ticket price, date and times of all inbound and outbound flights, number of flights in the itinerary, number of airlines, days before booking, and a few more, less important attributes.
Menus (i.e., choice sets) with a single available alternative do not allow the consumer to make choices and are therefore discarded. Data on choices contain at most 100 alternatives for each choice session, even if more choices potentially existed. As a result, our data are truncated from the right. This creates a large number of menus containing exactly 100 alternatives, some of which may be incomplete. To deal with this oddity, we simply confine our research to menus having between 2 and 99 alternatives (excluding those with one option, since there is no choice involved). In the end, we are left with a dataset with 6,297 choice sessions with 368,723 alternatives in total.
\begin{table}
\centering
\begin{tabular}{l|ccccc}
\hline
Variable & Count & Mean & St.Dev. & Min & Max \\
\hline
Price (in EUR) & 368,723 & 647.12 & 1,105.120 & 59.55 & 16,997 \\
Trip duration (in minutes) & 368,723 & 518.98 & 555.04 & 70 & 2,715 \\
Number of flights & 368,723 & 2.94 & 0.95 & 2 & 6 \\
Number of airlines & 368,723 & 1.25 & 0.45 & 1 & 5 \\
Menu size & 368,723 & 58.077 & 30.267 & 2 & 99 \\
Days before departure & 368,723 & 32.36 & 38.03 & 0 & 340\\
Domestic travel & 368,723 & 0.49 & 0.49 & 0 & 1\\
Intercontinental travel & 368,723 & 0.06 & 0.23 & 0 & 1\\
\hline
\end{tabular}
\caption{Descriptive statistics of variables in observational data.}
\label{tab:descriptiveStats}
\end{table}
These are the attributes that are designated as vertical in the choice process. For the purposes of this dissertation, it is assumed that consumers prefer lower values for each of them (e.g., all consumers prefer lower prices, shorter trips, fewer layovers, and not having to change airlines too frequently). Apart from vertical attributes, there are also three attributes that do not vary across alternatives within each menu. These are the number of days between when the choice was made and the start of the trip, whether the trip is
domestic or international and whether it is intercontinental. In addition to vertical attributes, the data also contains two sets of horizontal attributes, the departure times and the dates of outbound and inbound flights. These attributes are treated as horizontal, as there is no clear way of defining consumer preferences over them. To eliminate potential scale effects, z-score normalization on vertical attributes was performed as $Z = \frac{{x - \overline{x}}}{{\sigma}}
$ where $\overline{x}$ is the mean and $\sigma$ is the standard deviation of the variable $x$.
Due to the MDbS nature of comparing dimensions with one another, for the purposes of this study, I cannot use variables that do not differ between menus. Additionally, I cannot utilize horizontal attributes in my analysis because they do not follow the standard ``the greater, the better'' mathematical approach. However, these variables are used later in this dissertation in chapters \ref{chapter:jmrPaper} and \ref{chapter:hicssPaper}. As a result, I am bound to utilize only four vertical attributes in my analysis. Table \ref{tab:descriptiveStats} provides descriptive information about the variables. I have also multiplied the four vertical variables by $-1$ to convert them to a negative scale due to the MDbS nature of comparing absolute values and the assumption that consumers prefer lower values along vertical dimensions.
\textbf{Experimental data}
In the course of this study, I also introduce an additional dataset sourced from a controlled experiment conducted by Noguchi \citeyearonly{noguchi2018multialternative}, distinct from the primary observational data. Although not as diverse in terms of alternatives, dimensions, and choice sets, these experimental data carry significant value, not as a principal analytic focus, but rather as a means to corroborate my main findings. I will apply the same analytical techniques employed in the observational analysis and utilize the experimental dataset as a robustness check to verify the validity of my results. Henceforth, the role of these data is primarily confirmatory.
This dataset comes from an experiment conducted by Noguchi \citeyearonly{noguchi2018multialternative} in which 503 participants, aged 18 to 75 years, were recruited via Amazon Mechanical Turk, resulting in a total of 5295 observations. Participants faced eight randomly sampled decision scenarios with descriptions consisting of two- and three-alternative sets, each ternary choice set containing only one of the attraction, compromise, and similarity effects. For ternary choice sets, to create a context effect, one alternative was randomly selected as the focus and the third alternative was generated following three scenarios: a) in the attraction effect scenario, both dimensions of the third option were reduced by 25\% of the difference between the dimensions of the remaining two options; b) in the compromise scenario, the third option was generated in a way that would make the randomly chosen target a compromise; c) in the similarity scenario, 2\% of the difference was added to one dimension while 2\% was subtracted from the other dimension of the third option. The dimensions of the alternatives are described in table \ref{tab:noguchiDescriptions}.
\begin{table}
\centering
\begin{tabular}{l|lll}
\hline
Product & Dimension & Alternative A & Alternative B \\
\hline
\multirow{2}{*}{Mouthwash} & Breath & 4.5 hours & 7.2 hours \\
& Germs killed & 77\% & 56\% \\[2ex]
\multirow{2}{*}{Exercise class} & Fee & \$9.49 & \$6.49 \\
& Calories & 356 kcal & 259 kcal \\[2ex]
\multirow{2}{*}{Box of chocolate} & Amount & 26 oz & 33 oz \\
& Variety & 9 & 5 \\[2ex]
\multirow{2}{*}{GPS} & Update & 3.04 Hz & 5.62 Hz \\
& Accuracy & 4.97 m & 7.83 m \\[2ex]
\multirow{2}{*}{Mobile battery} & Price & \$19.93 & \$13.49 \\
& Talk time & 14.55 hours & 9.25 hours \\[2ex]
\multirow{2}{*}{Light bulb} & Life & 1309 hours & 1923 hours \\
& Price & \$1.35 & \$2.50 \\[2ex]
\multirow{2}{*}{Air purifier} & Noise & 64.7 dB & 39.3 dB \\
& Efficiency & 325 cfm & 203 cfm \\[2ex]
\multirow{2}{*}{Strawberry} & Quantity & 407 g & 452 g \\
& Price & \$2.58 & \$2.85 \\
\hline
\end{tabular}
\caption[Attribute values used in experiment]{Attribute values used in the experiment. Sourced from Noguchi \citeyearonly{noguchi2018multialternative}.}
\label{tab:noguchiDescriptions}
\end{table}
\textbf{Parameter optimization}
Recall the three dynamic parameters for MDbS, $\alpha, \beta_0, \beta_1$, which were discussed in section \ref{subsec:mechanismMDBS}. They allow MDbS to take into account various context effects. Noguchi \citeyearonly{noguchi2018multialternative} demonstrates MDbS performance using a fixed set of parameters throughout the article using: $\alpha = 3, \beta_0 = 0.1, \beta_1 = 50$.
Those parameters are essential controls of the behavior of the model and they create the underpinnings of the choice set, impacting the generated choice probabilities. Therefore, it is fundamental to identify the optimal parameters that will fit the observed data. While identifying the optimal parameters could ideally be purely theory-driven, in reality the theoretical guidance will often fall short. This will leave a plethora of potential parameter values. Hence, this requires a systematic search method to explore the parameter space and identify the optimal parameters that would fit the data the best.
\textit{Parameter space definition}
Before continuing further with the method, one must first define the parameter space over which the search process will commence. Recall that there are three parameters to be optimized, which were $\alpha, \beta_0, \beta_1$. Two of them, $\beta_0$ and $\beta_1$, have theoretical boundaries.
The effect of the attribute range was first investigated by Mellers \citeyearonly{mellers1994trade}, who described the tendency of people to scale the perceived attractiveness of an alternative in a given attribute using the entire range of that attribute. Therefore, $\beta_0$ in MDbS represents the fraction of the difference between attributes compared to the entire attribute range, which is bounded between 0 and 1. On the other hand, $\beta_1$ represents the percentage of preference of that difference for an individual; hence, it is also bounded with values between 0 and 100. I have created 99 $\beta_0$ values evenly spaced between 0 and 1 and 99 inclusive $\beta_1$ values evenly spaced between 0 and 100.
The parameter $\alpha$ on the other hand, does not have a theoretical upper bound. However, since it is used to determine which alternatives to compare with each other, it must be greater than 0 because otherwise no alternative will be selected for comparison. I have randomly generated 4,000 samples where the alpha ranged between 0.1 and 10. My observations have demonstrated that the performance of MDbS significantly deteriorates when $\alpha \ge 5$. Hence, to balance the need for a flexible model with the requirement of stable performance, an upper bound of 5 has been set for $\alpha$. I have created 49 $\alpha$ values between 0.1 and 5.
In total, the full parameter space has been created with combinations of the three parameter values reaching 480,249 triples.
\textit{Optimization method}
Parameter optimization is a task of high importance in many scientific and engineering applications, where the goal is to find the optimal values of a set of parameters that best fit a given model or system. There are various methods available for parameter optimization, ranging from differential equation-based methods to brute force and other optimization algorithms. I have chosen the differential evolution algorithm proposed by Storn \citeyearonly{storn1997differential} for this purpose. It has several advantages over other algorithms. Firstly, it can be easily implemented. Secondly, it is ideal when the parameter space is large \citep{lin2019applying}. Thirdly, it is especially suitable for complex and non-linear functions \citep{omran2009bare}.
The way differential evolution works resembles that of other genetic algorithms. First, it creates an initial population $P$ with a size of $n$ within a given parameter space $S$ and assesses its fitness using the evaluation metric $F$. Then, it randomly selects three members of $P$ and creates a new member. If it is better than a randomly selected one within this triple, it replaces it. This process continues until a termination criterion is met, which is either: a) $F$ has reached its global minimum, b) the number of iterations has reached the threshold, or c) $F$ has not improved considerably within the predefined number of iterations. The pseudocode below describes its workflow:
\begin{algorithm}
\caption{Simplified Differential Evolution.}
\begin{algorithmic}[1]
\State Initialize population of $P$ from the parameter space $S$
\While{not met termination criterion}
\For{each individual in $P$}
\State \textbf{Mutation:} Select three distinct individuals from population. Compute the donor by adding the weighted difference of two individuals to the third.
\State \textbf{Crossover:} Create trial individual by mixing parameters of current individual and the donor, decided by random draw and crossover rate.
\State \textbf{Selection:} Compare trial and current individuals using $F$. If the trial performs better, replace the current individual with the trial in the population.
\EndFor
\EndWhile
\State \Return Best individual from the final population as optimal parameters.
\end{algorithmic}
\end{algorithm}
Differential evolution itself has parameters that must be defined in advance. The population size parameter in this algorithm defines the number of candidate solutions it considers during each iteration. Those candidates are selected following a uniform distribution in the parameter space, which achieves evenly distributed candidates. There is a trade-off between a high population size leading to finer exploration of the parameter space and a low population size leading to faster convergence, albeit to a potentially suboptimal solution. I set it to 15 to achieve both good exploration and convergence speed. The second parameter, the crossover probability, controls the extent to which the algorithm combines information from different solutions. A higher value will further diversify the population, encouraging exploration of new regions in the parameter space. On the other hand, lower values will lead to more exploitation of the current space. I have set this to 0.5. For other parameters, I will use the values suggested in the literature \citep{omidi2020differential}.
\textit{Evaluation metric}
After discussing the importance of optimal parameter search and defining the optimization algorithm, the remaining question is the evaluation metric of the MDbS. Previous studies which have applied various dynamic choice models to experimental data have used the mean absolute error of aggregate choice shares for the entire dataset as the main metric. Albeit an interesting approach itself, this will not be a feasible approach for me because the experimental data these models have been applied to entailed ternary choice sets, whereas the observational data are not ternary. They comprise choice sets with a minimum of 2 and a maximum of 99 alternatives.
Designers and engineers of choice sets have long used ``Top $n$'' accuracy metrics when designing choice sets or testing the performance of statistical models \citep{ricci2015recommender}. The ``Top $n$'' accuracy metric measures whether or not the true class of the option matches the top $n$ predictions of the model. I will follow and adopt this metric because it is well established in the literature, mirrors real-world decision making, and it fits the contribution of the thesis the best. I will use ``Top 1'' accuracy, which ranges from 0 to 1, as my optimization metric for an individual choice set. Because choice sets in the data vary significantly in size, I will use the average Top 1 accuracy metric weighted by menu size. This will ensure that smaller menus contribute proportionally to their sizes. Also, to comply with the aim of minimization of the differential evolution algorithm, I will multiply this metric by $-1$. As an additional measure to explore the parameter space thoroughly, I employ the multiple-run approach for the differential evolution algorithm. Specifically, I will execute it ten times across the entire dataset. This repetition will allow me to further explore the parameter space, mitigating the risk of missing any region that can potentially contain an optimal solution.
\subsection{Results}
This section presents the initial results of parameter optimization, followed by the outcomes derived from choice modeling. First, an examination of the results obtained from observational data takes precedence. Afterwards, a concise discussion of the results derived from experimental data will accompany this analysis.
\textbf{Observational data results}
When looking at optimization results on observational data one can immediately see that the values of $\beta_0$ tend to fluctuate between 0.82 and 0.96 while $\beta_1$ is generally below 10\%, which indicates that MDbS tends to be more strict in terms of defining the winners when comparing, on average preferring 90\% of the ``advantage'' in a given dimension only a little shy of 15\% of the time. Also, it appears that the $\alpha$ values tend to be preferred in the lower half of the parameter space, so only very similar alternatives were chosen by the model for comparison. This behavior is understandable considering that the average choice set had 55 alternatives. Table \ref{tab:optimizationAmadeusResults} contains the results of the parameter optimization using the differential evolution algorithm. It is worth noting that its top 1 accuracy performance, albeit higher than random chance, still falls far behind pure statistical models, such as MNL-based ones.
At first sight, such model behavior might seem surprising. Recall that the nature of MDbS is to compare alternatives with each other and collect evidence based on the won comparisons. In the choice experiments, the usual size of the menu is three, and only one of the context effects is generated at a given time. However, in observational data, the number of alternatives in the menu is much higher. The presence of a large number of alternatives potentially also introduces other context effects. Also, MDbS is limited to dimensions which are mathematically comparable with each other. The observational dataset also contains horizontal attributes, for which only the decision maker can decide, in a given scenario, whether or not, given the same price and flight duration, a flight at 5:00 in the morning is better than one at 14:00 in the afternoon.
\begin{table}
\centering
\begin{tabular}{ccccc}
\hline
Iteration & $\alpha$ & $\beta_0$ & $\beta_1$ & Average top 1 accuracy \\
\hline
1 & 2.58 & 0.912 & 8.832 & 0.125 \\
2 & 1.622 & 0.948 & 6.001 & 0.124 \\
3 & 1.88 & 0.843 & 6.876 & 0.124 \\
4 & 1.883 & 0.832 & 6.83 & 0.124 \\
5 & 1.856 & 0.91 & 8.297 & 0.124 \\
6 & 2.154 & 0.954 & 8.021 & 0.124 \\
7 & 2.204 & 0.963 & 8.52 & 0.124 \\
8 & 0.234 & 0.859 & 7.03 & 0.123 \\
9 & 0.559 & 0.829 & 55.076 & 0.122 \\
10 & 0.235 & 0.844 & 58.893 & 0.122 \\
\hline
\end{tabular}
\caption{Optimization results for observational data.}
\label{tab:optimizationAmadeusResults}
\end{table}
I have estimated two models by using the random effect probit model with standard errors at cluster levels. The first model only included vertical attributes, whereas the second model extended the first one through the addition of the output from the MDbS model. In both cases, it seems that individuals have strong preferences for faster alternatives with lower prices and fewer layovers. This supports our initial assumption that individuals prefer lower values of vertical attributes.
\begin{table}
\centering
\begin{tabular}{lcc}
\hline
& Model 1 & Model 2 \\
\hline
Price & -0.309*** & -0.282*** \\
& (0.006) & (0.006) \\[1ex]
Trip duration & -0.185*** & -0.158*** \\
& (0.007) & (0.006) \\[1ex]
Number of flights & -0.195*** & -0.178*** \\
& (0.007) & (0.007) \\[1ex]
Number of airlines & -0.262*** & -0.245*** \\
& (0.008) & (0.008) \\[1ex]
MDbS output & & 2.085*** \\
& & (0.097) \\[1ex]
Constant included & Yes & Yes \\[1ex]
Menu size as control & Yes & Yes \\[1ex]
Number of observations & 368,723 & 368,723 \\[1ex]
Akaike information criteria & 48,532.341 & 47,968.61 \\[1ex]
Log-likelihood & -24,260.171 & -23,977.305 \\[1ex]
\hline
\end{tabular}
\caption[Outputs of Probit model for observational data]{Outputs of Probit model with random effects for observational data. Standard errors in parentheses. Statistical significance levels: *** $p<0.01$, ** $p<0.05$, * $p<0.1$.}
\label{tab:amadeusProbitResults}
\end{table}
Recall that the variable ``MDbS output'' refers to the probabilities produced by MDbS. Model 2 results show a positive and statistically significant effect for the information provided. It shows that MDbS is able to capture additional information about the choice by accounting for context effects. To better understand the significance of this result, figure \ref{fig:marginsAmadeusGraph} shows the average marginal effects of the information provided by the computational model. One can immediately observe the downward trend. This is not surprising. As the number of alternatives increases, each additional alternative adds less to the likelihood of choice than the previous. In the context of menus, it could imply that when there are fewer options (smaller menus), the likelihood that any particular choice is selected is more significantly influenced by MDbS output. On average, for every 0.5 increase in MDbS output, the probability of choice has increased by 0.037 percentage points\footnote{Marginal effect of MDbS output across the whole dataset was 0.074.}. This effect was as high as 0.12 percentage points for menus containing as few as 5 alternatives.
\begin{figure}[h]
\centering
\includegraphics[width=0.7\textwidth]{staticFiles/marginsAmadeusGraph.png}
\caption[Marginal effects of MDbS output]{Average marginal effects of MDbS output with respect to different menu sizes. Horizontal lines represent 95\% confidence interval boundaries.}
\label{fig:marginsAmadeusGraph}
\end{figure}
\textbf{Experimental data results}
At first sight, the results of optimization indicate that the optimal parameters differ between datasets. Considering the nature of these two datasets, such a result is expected. Although the optimal $\alpha$ tends to be higher than the one in the observational data, both the optimal $\beta_0$ and $\beta_1$ values are lower than their counterparts. A higher $\alpha$ can indicate that for smaller choice sets, MDbS tends to be less strict in comparison criteria. While the performance metrics seem higher than for the observational data, the menus are considerably smaller. Table \ref{tab:optimizationNoghuchiResults} gives further information.
\begin{table}
\centering
\begin{tabular}{ccccc}
\hline
Iteration & $\alpha$ & $\beta_0$ & $\beta_1$ & Average top 1 accuracy\\
\hline
1 & 2.888 & 0.572 & 4.567 & 0.52 \\
2 & 2.918 & 0.577 & 4.598 & 0.52 \\
3 & 2.939 & 0.569 & 4.655 & 0.52 \\
4 & 2.747 & 0.579 & 4.349 & 0.519 \\
5 & 2.936 & 0.743 & 3.911 & 0.518 \\
6 & 2.925 & 0.715 & 3.967 & 0.518 \\
7 & 2.997 & 0.706 & 4.098 & 0.518 \\
8 & 3.342 & 0.61 & 4.93 & 0.517 \\
9 & 2.01 & 0.494 & 3.788 & 0.517 \\
10 & 2.01 & 0.494 & 3.788 & 0.517 \\
\hline
\end{tabular}
\caption{Optimization results for experimental data.}
\label{tab:optimizationNoghuchiResults}
\end{table}
Overall, considering the differing natures of these two datasets, comparing two optimal parameter combinations would not give any useful knowledge. However, this is not the case for the results from the choice modeling. These results follow the ones from observational data and confirm them. As with field data, here, the MDbS output is proven to provide statistically significant information for a choice model with coefficient in the positive direction. A 0.5 increase in MDbS' ``assessment'' about the alternative resulted in a 0.46 percentage points increase in the actual choice probability among the participants. This effect did not differ between the choice sets having two or three alternatives.
\begin{table}
\centering
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{lcc}
\hline
& Model 1 & Model 2 \\
\hline
X & 0.000*** & 0.000*** \\
& ($0.000$) & ($0.000$) \\[1ex]
Y & 0.000* & 0.000 \\
& ($0.000$) & ($0.000$) \\[1ex]
MDbS output & & 2.518*** \\
& & (0.317) \\[1ex]
Constant included & Yes & Yes \\[1ex]
Menu size as control & Yes & Yes \\[1ex]
Number of observations & 5,295 & 5,295 \\[1ex]
Akaike information criteria & 6,987.151 & 6,893.77 \\[1ex]
Log-likelihood & -3,489.576 & -3,441.885 \\
\hline
\end{tabular}
\caption[Outputs of Probit model for experimental data]{Outputs of Probit model with random effects for experimental data. Standard errors in parentheses. Statistical significance levels: *** $p<0.01$, ** $p<0.05$, * $p<0.1$.}
\label{tab:noguchiProbitResults}
\end{table}
\subsection{Conclusion}
In this study I have applied MDbS to observational data and showed the consistency of my findings using experimental data. This is the first account of an application of a computational model to real-world choice data of this magnitude. The results indicated that computational models can account for context effects that affect choice behavior not only in experimental settings but also in field settings.
The results of this study create implications for online marketplaces. In today's world, these platforms aggregate immense amounts of products and services, providing consumers with dozens of choices. To help consumers in their choice, these platforms employ sophisticated algorithms which aim to curate product lists and create recommendations with a side goal of influencing the buying decisions of individuals. By applying mathematical decision-making models to choice datasets, these platforms can gain crucial information about the context within the choice sets, which might influence choice decisions towards particular alternatives. This information may also be used to create product bundles with a heterogeneous context to satisfy the needs of consumers.
This study has limitations. I have utilized only one computational model, namely MDbS. This limits the generalizability of my results. Different models ``behave'' differently, and although they are trying to capture the same effects, applying other decision models and investigating their differences can be an interesting avenue to pursue. Another limitation is that the proposed approach was applied only to data originating from one type of choice setting, namely, air travel. Application of this approach to other types of multi-dimensional, multiattribute choice data may help to better generalize the results.
The use of MDbS in the current study has shown it has the ability to potentially capture a wide range of context effects, including attraction, compromise, and similarity. While these results have yielded valuable insights, the general nature of MDbS and other computational models is their inability to successfully isolate these effects from one another. The main reason for that is that they have only been tested in experiments with one effect present at a time. When the number of alternatives in the dataset increases the potential interplay between options and the existence of other context effects come into play.
My findings provide a strong foundation that leads to a crucial but also challenging future direction: the development of a methodology which would allow disentangling this ``general'' context effect. I will computationally differentiate among three main components, attraction, similarity, and compromise, in a multi-dimensional, multialternative choice setting. This goal provides great motivation for the next chapter of my thesis.
\newpage
\section{Enhancing Choice Modeling in Multi-Attribute, Multi-Alternative Settings\footnote{This chapter is based on a joint work with my supervisor Zakaria Babutsidze, William Rand, Nobuyuki Hanaki, Ismael Rafai, Rodrigo Acuna Agost and Thierry Delahaye.}}\label{chapter:jmrPaper}
\begin{abstract}
Previous approaches to modeling the effect of context on choices consider neat, compact environments, often in laboratory settings. Such an approach severely limits the study of context effects and, as a consequence, the applicability of findings. In this paper, the authors generalize the existing approach in modeling choice with context effects and apply it on large-scale observational data. The authors consider three main context effects: the attraction, compromise, and similarity effects. The proposed methodology relies on an ex ante calculation of each context effect measure for every alternative in the choice set. This approach minimizes the computational complications of estimating the resulting choice model. The proposed approach is applied to two empirical settings: the choice of airfare using observational data and the choice of daily commute mode using data from a stated choice experiment. The presence of attraction and similarity effects in both empirical settings is demonstrated. The authors also document the existence of the reverse compromise effect in airfare choice, highlighting the fact that travelers possess rigid rankings among flight attributes and are essentially maximizing their utility in terms of one (or few) attribute(s).
\end{abstract}
\subsection{Introduction}
The fact that behavioral biases exist in individual decision making is well established (see Dowling et al. \citeyearonly{dowlingEtAl20} for a recent review of evidence). One type of systematic departure from the classic utility maximization approach that seems particularly important is a set of context effects \citep{truebloodEtAl13, kocherEtAl19}. The theory behind these effects posits that the context in which choices are made influences the decision. While the choice context could have a very wide meaning, in this literature it is the availability and nature of choice alternatives which is referred to as ``context'' \citep{tversky1972elimination, huberEtAl82, simonson89}.
Context effects have been systematically studied in marketing and psychology \citep{kivetz04, roodrkerkEtAl11, frederickEtAl14, dotsonEtAl18}. However, virtually all such studies have used controlled experiments in neat, compact settings. Namely, the settings where decision makers are presented with few options and (very) few attributes across which these options differ. In contrast, most actual choices take place in much messier environments. Especially today, when much of our search and shopping activity has shifted online. Proliferation of search engines allows each option to be easily compared with many alternatives across many different characteristics. In chapter \ref{chapter:simulationStudy} I focused on quantifying context as an aggregate. Although my previous study could successfully identify context in complex, multiattribute setting, the knowledge about the prevalence of context effects in these environments is still scarce. Precise measurements of context effects in multi-option and multiattribute setting is one way to contribute. Defining how to measure these effects is a minimum requirement for proceeding to evaluate the existence of context effects using observational data.
Recent attempts in computer science have been made to define some of these effects. The machine learning community has incorporated context effects in discrete choice models applied to observational data \citep{pfannschmidt2019learning, bowerBalzano20}. However, the objective was to increase the prediction accuracy of the choice models \citep{tomlinsonBenson21}. As a result, incorporation of context effects takes the form of generalizing choice models to allow for departure from the strict rationality assumptions\footnote{Recent examples of this approach are Contextual Multinomial Logit by Yusefi Maragheh et al. \citeyearonly{yousefi2020choice} and the linear context logit of Tomlinson and Benson \citeyearonly{tomlinsonBenson21}.}. These proposed generalizations of estimated functional forms generally do not distinguish across various different types of context effects. Additionally, these approaches often run into computational difficulties, i.e. the estimation process is NP-hard \citep{yousefi2020choice}.
In this chapter, I propose measures of different context effects in multi-option and multidimensional settings. Following Rooderkerk, Van Heerde, and Bijmolt \citeyearonly{roodrkerkEtAl11}, I consider three context effects - attraction, compromise, and similarity effects. Although I have discussed each of these three effects in more detail in the chapter \ref{chapter:bigThreeContextEffectsDescription}, a very brief recap of this ``trinity'' seems appropriate. The attraction effect refers to the increase in attractiveness of a set of options as a result of adding an alternative to the choice set, the compromise effect refers to the inclination of consumers to prefer options that represent a compromise across extreme sets of alternatives, while the similarity effect refers to the drop in choice likelihood for an alternative once another similar alternative has been added to the choice set. Each of the measures corresponding to the three aforementioned effects requires a specific approach to make the measurements applicable to the observational data. Each of these measures is calculated prior to choice estimation, which avoids computational problems. After presenting the generalized measures of the three effects, we perform an empirical analysis of the choices based on the new measures using observational data. We use an extensive dataset of airfare choices for this exercise. We identify that attraction and similarity effects influence the choices in air-travel booking data. We also detect a reverse compromise effect that seems to indicate that air travelers consistently prefer extreme alternatives (i.e., the cheapest or the shortest flight) to alternatives that constitute a compromise among extreme options.
\subsection{Context effect and choice modeling}
Over the years, multiple empirical models have been developed to model context effects. Empirical approaches usually model context effects in either the structural part of utility or in the error covariance part \citep{kamakuraSrivastava84, dotsonEtAl18}. Some of these models have the capacity to take into account multiple effects at the same time \citep{tverskySimonson93, orhun09}. These models extend a classical random utility model \citep{mcfadden01} in multiple directions using discrete choice modeling \citep{benAkivaLerman85}. However, Rooderkerk, Van Heerde, and Bijmolt \citeyearonly{roodrkerkEtAl11} present a unifying model that takes into account all three context effects. Instead of using advanced statistical techniques to address violations of utility maximization assumptions associated with the existence of context effects \citep{luce59}, their approach focuses on additive specification and ex ante calculation of individual measures for each of the three context effects for each item in the menu. Namely, the authors assume that the choice estimator is additive in three context effects (along with a generic preference-driven part) and develop the methodology of quantifying three effects for each alternative prior to calculating the estimator. This is a particularly flexible approach, which also ensures that the researcher does not run into computational difficulties (i.e., $NP$ hard calculations). I follow the suite and formulate the utility that a consumer c attaches to an option $i$, under a given menu $m$, as being additive in two parts:
\[U_{c,i}^m = u_{c,i} + v_{c,i}^m.\]
The first summand in this equation $u_{c,i}$ denotes an inherent utility that the consumer $c$ can derive from the option $i$. This part depends only on the tastes of the consumer $c$ towards the characteristics of the option $i$. It is independent of the other options contained in the menu. The second summand $v_{c,i}^m$, denotes the context-dependent utility. I additionally assume that the context-dependent part of the utility can be represented as a linear combination of three contextual effects,
\[
v_{c,i}^m = a_1 \text{Attraction} + a_2 \text{Compromise} + a_3 \text{Similarity}.
\]
Thus, the measures of three context effects that are necessary to estimate empirical discrete choice models based on the utility formulation above need to be computed ex ante. Measures developed by Rooderkerk, Van Heerde, and Bijmolt \citeyearonly{roodrkerkEtAl11}, are adapted to experimental data with a small number of alternatives in the choice set and a small number of attributes characterizing alternatives. This significantly limits the application of the unifying model of context effects. In the next section, I present a generalization of three context effect measures to multi-option, multi-attribute environment which will further allow for the application of the unifying model to observational data.
\subsection{Generalizing context effect measures}
\textbf{Approach to generalization}
Naturally, generalizing across many alternatives and many attributes presents challenges in both dimensions. The fact that theoretical underpinnings of the three effects are diverse does not simplify the task. In the following sections, I will discuss specificities involved in the generalization of each measure. First, however, I focus on common challenges.
Conceptualizations of contextual effects commonly hinge on the choice frequency comparisons between two alternatives. For example, in case of attraction effect, if adding a third alternative to a two-item menu induces some of the consumers to switch their choices to the other incumbent alternative - one could conclude that attraction effect is present. This is suitable for experimental setups where the researcher has control over menus and can observe choices in both cases (i.e., in case of an original two-item menu, as well as after adding the third item). However, given that the aim is to generalize context effect measures for application to a wider range of situations, and most importantly to observational data, it is necessary to take a more fine-grained view and quantify the context in which each of the alternatives is embedded. Quantifying the choice context for each alternative would create an opportunity to study the effect of the context on choice probabilities through inference across (very) different choice sets. Such an approach would be general enough to consider not only the addition of a new alternative to the menu, but also any alteration of attributes for any of the items in the menu. For example, increasing the price of an alternative could decrease the probability of its choice. This would have a direct effect on the choice probabilities of other alternatives. However, the same price increase could also change the choice context and have additional knock-on effect on choice probabilities of (at least some) alternatives.
\begin{figure}[t]
\centering
\includegraphics[width=0.7\textwidth]{staticFiles/contextEffectZaksScatterPlot.png}
\caption[Accounting for attraction effect]{Visualization of accounting for attraction effect. \\ Note: The figure represents three alternative choice sets, each comprising six options, described along two characteristics. Options $A$, $B$, $C$, $E$ and $F$ are common across the three menus. Menus differ in the identity of the 6th option ($D$, $D'$ or $D''$).}
\label{fig:attractionZakVisualization}
\end{figure}
Rooderkerk, Van Heerde, and Bijmolt \citeyearonly{roodrkerkEtAl11} take this approach for simple two-attribute products. Using the attraction effect as an example once more, the idea is to quantify how much attraction power a given menu provides to a given alternative. If option $A$ dominates option $B$ (i.e., it is superior in at least some attributes and not inferior in any of the attributes), while option $B$ is not dominated by option $C$, the attraction power of $A$ compared to $C$ could be measured by the degree to which option $A$ is better than option $B$. The more pronounced the dominance, the more pronounced the attraction effect. However, once we leave the neat context of three-item menus, we wander into a possibility that option $A$ dominates not one, but multiple alternatives at the same time. Consider the situation depicted in figure \ref{fig:attractionZakVisualization}. Here we have the menu with six alternatives $\{A, B, C, D, E, F\}$, each of which is characterized by two attributes $V1$ and $V2$. In this example, option $A$ dominates three alternatives $\{B, C, D\}$. To simply extend the approach by Rooderkerk, Van Heerde, and Bijmolt \citeyearonly{roodrkerkEtAl11} and calculate the attraction power of the alternative $A$, we could find the center among the three dominated alternatives and then measure the distance. Such a measure would capture the difference between two choice sets $\{A, B, C, D, E, F\}$ and $\{A, B, C, D', E, F\}$. In the latter case, the attraction power of $A$ is lower because the option $D'$ is closer to $A$ than $D$. However, such a measure would not accurately capture the difference between scenarios $\{A, B, C, D', E, F\}$ and $\{A, B, C, D'', E, F\}$. In the latter case $A$ dominates only two alternatives $\{B, C\}$. Therefore, the setting changes qualitatively. Such qualitative differences are avoided in experimental settings by design. However, they are prevalent in observational data.
While it is acknowledged that the move from $D$ to $D'$ changes the choice context, I argue that the context change is more pronounced in the case of the move from $D'$ to $D"$. Although the ideal measure would combine the characteristics of the number of dominated alternatives and the (some measure of average) distance between the focal alternative and the group of dominated options, in this article I take the approach of focusing on the former, as this is likely to have a more pronounced impact\footnote{Combining frequency and distance measures in one metric requires arbitration between the two drivers of context effects. It is not clear how to solve such a problem (that is, it is not clear if dominating one option that is at a certain distance from a focal alternative generates more or less attraction than dominating two alternatives that are at half that distance).}.
As a result, our approach would capture the context change between ${A, B, C, D, E, F}$ and ${A, B, C, D", E, F}$ or ${A, B, C, D', E, F}$ and ${A, B, C, D", E, F}$, but it will not evaluate the context difference between ${A, B, C, D, E, F}$ and ${A, B, C, D', E, F}$. In what follows, the same approach is applied to similarity and compromise measures.
Once one moves towards choices which have multiple attributes, one quickly realizes that there are two distinct types of choice characteristics that our measures should potentially handle. One type of attribute constitutes product characteristics over which preferences are fairly similar for all customers, and their effects can be readily anticipated from basic economic theory. These attributes can easily be ordered from most preferred to least preferred. The most obvious of these characteristics is price. One can assume that every customer would prefer to obtain a given product for a lower price. I call such product characteristics vertical attributes. These are usually attributes that can be represented using numeric values. Previous work measuring context effects only considers such (vertical) attributes \citep{trueblood2014multiattribute, noguchi2018multialternative, noguchi2014attraction}. This is a requirement for defining preferential relationships that are necessary to identify attraction and compromise effects. I adopt the same approach and consider only vertical attributes when defining attraction and compromise effects.
On the other hand, there exists another set of attributes where there is no obvious, homogeneous ordering. For example, consider the color. There is no theoretical ground to assume that all consumers would prefer a car that is blue over a car that is green (all other attributes remain constant). The same is true about attributes that at first sight are not strictly labeled as categorical, for example, time. When buying a cinema or plane ticket, there is no theoretical reason for explaining how a ticket for 15:00 is better or worse than one at 17:00. I refer to these as horizontal attributes. The potential heterogeneity between decision makers in ordering such attributes makes inclusion of such features in the calculation of attraction and compromise effects impossible. In experimental settings, these attributes are often constant between treatments to avoid confounding effects. However, in the field, this usually cannot be done. Therefore, the study of context effects with observational data requires them to be statistically controlled.
However, unlike the measurement of attraction and compromise effects, measuring the similarity across the alternatives does not require the existence of a single universal ranking. In fact, many clustering methods can identify options that are more or less similar to each other based on a wide range (numeric and categorical) of variables. Therefore, in what follows, I will incorporate all (vertical, as well as horizontal) attributes in the measurement of similarity between a pair of alternatives.
\textbf{Attraction effect}
Previous studies of the attraction effect concentrate on carefully designed small choice sets in experimental settings \citep{huberEtAl82, huberPuto83}. In such settings, an alternative is added to the choice set in a position that is unequivocally inferior to (only) one of two items already present in the menu. Notice again that identification of inferiority requires the attribute under consideration to be vertical, and this cannot be achieved with horizontal attributes. This manipulation introduces an asymmetry between the two incumbent alternatives; one alternative now dominates the decoy, while the other does not. The attraction effect implies that such manipulation increases the attractiveness of the dominant incumbent option with respect to the other incumbent alternative.
A standard measure of the attraction effect considers a trade-off between two (vertical) characteristics. Let us consider $N$ vertical attributes $V_i$, $i = 1, \dots, N$, for a set of two options $A$ and $B$. In two dimensions ($N = 2$), we start with $V_1(A) > V_1(B)$ and $V_2(A) < V_2(B)$, and then introduce an alternative $C$ such that $V_1(A) > V_1(C) > V_1(B)$ and $V_2(C) < V_2(A) < V_2(B)$. Under such circumstances, $C$ is dominated by $A$, but not by $B$. This introduces asymmetry in consumer considerations and increases the probability that the consumer will choose option $A$. Generalizing this concept to multiple (vertical) attributes is straightforward. For $N > 2$, we again start with $A$ being preferred over $B$ in some $j > 0$ dimensions, while $B$ is preferred to $A$ in some others $k > 0$, such that $j+k \le N$. Then we need an alternative $C$ that will be strictly worse than $A$ in at least one dimension, while not being better in any other dimensions and being better than $B$ in some dimensions while being worse in some others. As long as these two conditions are satisfied, the attraction effect states that $C$ will result in $A$ being favored.
Generalizing this approach to multiple alternatives is somewhat more challenging. The reason for this is that, instead of one comparison ($A$ vs. $B$ in the case above), for a set of choices with $M$ alternatives, there are $\frac{M(M-1)}{2}$ potential comparisons to consider. Under real-life circumstances, it is easy to identify situations where more than one of the $\frac{M(M-1)}{2}$ relationships has the potential for an attraction effect. Besides, for any given pair of choices, we could have multiple decoy options generating an attraction effect. The final complication is that option $A$ may have one set of decoy alternatives and option $B$ another set of decoy alternatives. In these contexts, it is not clear which option the attraction effect favors.
To quantify the attraction effect generated by the menu for a given alternative, I propose to calculate the number of options present in the menu that the focal alternative dominates. This is done across all vertical dimensions. Then, two alternatives present in the same menu can be compared by examining how many choices they dominate. Under such circumstances we can consider different positions option $C$ can take with respect to options $A$ and $B$. If option $C$ neither dominates nor is dominated by any of the options ${A, B}$, or if it dominates both focal options, then it does not generate an attraction effect for $A$ or $B$. If option $C$ is dominated by both options in the focal pair, it generates an attraction effect for both of them (compared to other alternatives). In all of these cases, the location of option $C$ contributes similarly to the choice probability of both options ${A, B}$. Finally, if option $C$ is dominated by only one of the two focal alternatives (say by $A$, but not by $B$) - it generates a discriminatory attraction effect favoring option $A$ and increasing its probability of being chosen. As a result, the number of options that the current alternative dominates in a menu (appropriately normalized by the size of the menu for a comparison across different choice settings) measures the (relative) extent of the attraction effect generated by the menu. For example, compare the probability of choosing option $A$ versus $E$ in figure \ref{fig:attractionZakVisualization} between two sets of menus ${A, B, C, D, E, F}$ and ${A, B, C, D", E, F}$. This probability is higher in the former situation (where $A$ dominates three alternatives, while $E$ dominates one) than in the latter case (where $A$ only dominates two alternatives, while $E$ still dominates one).
Although in these cases both alternatives do have some attraction effect, the relative attraction effect of option $A$ compared to option $E$ is stronger in the former scenario. Therefore, I measure the attraction effect that favors the focal option $F$ as
\[
\text{Attraction}(F)=O(\text{Dominated}),
\]
where $O(\text{Dominated})$ measures the number of alternatives in the menu that the focal option $F$ dominates. Given the measure, we expect that the higher the attraction effect in favor of the focal option, the higher the probability of choice of the focal option (\textit{ceteris paribus}).
\textbf{Compromise effect}
The compromise effect is traditionally understood and operationalized in a three-option, two-attribute (experimental) setting \citep{simonson89, dharEtAl00}. It is worth mentioning again here that these two attributes need to be vertical so that we can define universal preference relationships. Let us consider the similar starting situation of options $A$ and $B$ as in the previous subsection: $V_1(A) > V_1(B)$ and $V_2(A) < V_2(B)$. The addition of option $C$ to this menu such that $V_1(C) > V_1(A) > V_1(B)$ and $V_2(C) < V_2(A) < V_2(B)$, makes option $A$ a compromise between two extreme options $B, C$. The compromise effect
maintains that such an alteration of the menu would disproportionately benefit alternative $A$ compared to alternative $B$.
\begin{figure}[t]
\centering
\includegraphics[width=0.7\textwidth]{staticFiles/compromiseEffectZaksScatterPlot.png}
\caption[Compromise effect generalization]{Visualization of the compromise effect generalization.\\ Note: The figure represents a generalization of the compromise effect across multiple alternatives. $F$ represents a focal option. $Gr_1$ collects alternatives dominated by $F$, $Gr_2$ collects alternatives dominating $F$. Focal option represents a compromise between alternatives in $Gr_3$ and $Gr_4$.}
\label{fig:compromiseZakVisualization}
\end{figure}
To formulate the general measure of the compromise effect, let's first consider the case of multiple options $M$ in two dimensions (attributes, $N = 2$). The compromise effect calculation over multiple options is visualized in figure \ref{fig:compromiseZakVisualization} for the $M = 7$ case. To quantify the extent of the compromise introduced by the focal option $F$ in the menu, I propose to split all other $M - 1$ alternatives into four groups. Let group 1 contain all alternatives for which $V_1(G_1) \le V_1(F)$ and $V_2(G_1) \le V_2(F)$. These are the alternatives dominated by the focal option. In the case of figure \ref{fig:compromiseZakVisualization}, this set contains only the option $A$. Let group 2 contain all alternatives for which $V_1(G_2) \ge V_1(F)$ and $V_2(G_2) \ge V_2(F)$. All these options dominate the focal option. This set contains option $E$ in figure \ref{fig:compromiseZakVisualization}. Clearly, the focal option cannot constitute a compromise between any pair of alternatives which is included in any of these first two groups of alternatives. Next, let group 3 contain all alternatives for which $V_1(G_3) > V_1(F)$ and $V_2(G_3) < V_2(F)$, and group 4 contain all alternatives for which $V_1(G_4) < V_1(F)$ and $V_2(G_4) > V_2(F)$. In the case of figure \ref{fig:compromiseZakVisualization}, group 3 contains options $B$, $C$ and $D$, while group 4 contains option $G$. The focal alternative can be viewed as a compromise between groups 3 and 4. For the quantification of the extent of such a compromise, I define
\begin{align}
\text{Compromise}(F) = \frac{\min(O(G_3), O(G_4))}{\max(O(G_3), O(G_4))} \cdot \left( O(G_3) + O(G_4) \right) ,
\end{align}
where $O(G_i)$ measures the number of alternatives in the group $i$. The first multiplier (the ratio) in the measure quantifies the asymmetry across the sizes (in terms of number of alternatives) of the two groups, while the second multiplier (the sum) quantifies the joint size of two groups across which the focal option is a compromise. For the option $F$ in figure \ref{fig:compromiseZakVisualization}, this value is $\text{Compromise}(F) = \frac{1}{3} \cdot 4 = 1.33$. If either of the two groups concerned is empty, the value is zero, corresponding to the fact that the focal alternative is at the extreme edge of one of the dimensions and therefore is not a compromise. As a result, our compromise measure will be strictly zero for options $B$, $C$, $E$ and $G$. On the other hand, the better the balance between the size of the two groups, the more valuable a compromise alternative $F$ provides. So, the same measure for option $D$ in figure \ref{fig:compromiseZakVisualization} is 4. Alternative $D$ also corresponds to the compromise between 4 alternatives (like option $F$), but the comparison groups are better (in this case, perfectly) balanced. Notice that the measure also increases with the total number of options in the two comparison groups. Notice that the same measure for option $A$ is 2, even though (similar to option $D$) it also exhibits the correct balance between the sizes of two comparative groups. This reflects the fact that option $D$ is a compromise between larger sets of extreme alternatives\footnote{An alternative way to quantify the compromise between two sets of extreme options is to count the number of all possible pairs for which a given focal option is a compromise. This would result in $\text{Compromise}'(F) = O(G_3) \cdot O(G_4)$. This measure behaves very similarly to the one discussed in the paper. In fact, the correlation between the two compromise measures in the dataset used in this paper is 0.825.
All the results reported in the paper are qualitatively unaltered by the replacement of the compromise measure with this alternative. However, I prefer to work with the compromise measure in the paper, as it takes a more ``collective'' view of the choice process.}.
Extending the compromise measure to multiple dimensions is somewhat more challenging. The challenge relates to the fact that an increasing number of dimensions (i.e., vertical dimensions) presents exponentially increasing opportunities for different ways in which a given option can be a compromise. The $N = 2$ case has one pair of groups to compare. However, in the case of $N = 3$, a focal alternative can be a compromise between multiple pairs of option groups. For example, option $F$ can be a compromise between two groups $Z$ and $Y$ such that all options in group $Z$ are superior to option $F$ in dimensions 1 and 2, but inferior in dimension 3, while options in group $Y$ are inferior to option $F$ in dimensions 1 and 2, but superior in dimension 3. Combinatorial reasoning guarantees that there are three such potential comparisons. However, this is not all. Option $F$ can also be a compromise between two groups $X$ and $W$ such that all options in group $X$ are superior to option $F$ in dimension 1, but inferior in dimension 2, while options in group $W$ are inferior to option $F$ in dimension 1, but superior in dimension 2, as long as dimension 3 is constant in all options in groups $X$ and $W$, as well as $F$. Combinatorial reasoning guarantees three additional such comparisons.
As a result, moving from 2 to 3 dimensions increases the number of potential comparisons used to calculate the value of the alternative as a compromise option from 1 to 6. Appendix \ref{appendix:compromiseCalculation} derives the number of comparison alternatives necessary to cover all potential ways in which a focal alternative can be a compromise as a function of the number of dimensions. However, as all the groups defined above are mutually exclusive (i.e., each alternative can belong to one and only one of such potential comparison groups), generalization of the compromise measure in $N$ dimensions would require summation of the specific comparison group's compromise measure over all comparison groups. Thus,
\begin{align}\label{eq:compromiseEffectGeneralFormula}
\text{Compromise}(F) = \sum_j \text{Compromise}(F)_j ,
\end{align}
where $j$ runs over all possible comparison groups. Summation, instead of averaging, is used in order to reward options that constitute a compromise across multiple (many) comparison groups. Given the measure of the compromise effect, I expect that the higher the compromise effect, the higher the probability of choice of the focal alternative.
It is worth mentioning here that as both attraction and compromise measures only generalize across vertical dimensions, it is important to control for all relevant horizontal dimensions in choice models employing these measures of the two context effects.
\textbf{Similarity effect}
Operationalizing the measure of the similarity effect in three options and two vertical dimensions is straightforward \citep{roodrkerkEtAl11}. Increasing the size of the menu introduces an important challenge of defining the border between options that are similar to the focal alternative and those that are not similar to it. At the same time, unlike the previous two context effects, the theory pertinent to the similarity effect does not require dimensions to necessarily be vertical \citep{tversky1972elimination}. The sufficient condition to quantify the similarity effect requires detecting the number of other alternatives that are similar to the focal option.
Clustering, using machine learning, provides a way to operationalize the measure of the similarity effect across all dimensions. Several clustering algorithms have been developed that can take multidimensional lists and partition them into groups of similar objects. Clustering algorithms are unsupervised machine learning techniques that do not require explicit guidance on the definition of similarity. They use different internally consistent evaluation criteria in order to partition the input group of objects into multiple subgroups. Items belonging to the same group are judged to be similar to each other, while items belonging to two different groups are regarded as dissimilar. Some algorithms, like K-means clustering \citep{lloyd82}, require additional input on (or an optimization layer for calculating) how many subgroups the user would like to detect. Others, such as Affinity Propagation \citep{freyDueck07}, automatically calculate the optimal number of detected clusters. Appendix \ref{appendix:clusteringAlgorithms} provides a summary of two popular clustering algorithms that can be used for this purpose. I argue that being able to autodetect the number of clusters is significant in terms of minimizing necessary input, as well as minimizing computational power, and I use Affinity Propagation in the empirical application below.
As a result, I propose using a clustering algorithm (in this case, Affinity Propagation) in order to detect clusters within the menu of proposed options. Once such clusters have been identified, the size of the cluster to which the focal option belongs can be used as a straightforward measure of similarity. Hence, I measure the similarity effect as
\[
\text{Similarity}(F) = O(\text{Cluster}_F) ,
\]
where $\text{Cluster}_F$ refers to the cluster to which the focal option belongs. Given this measure of the similarity effect, we expect that the higher the similarity, the lower the choice probability of the focal alternative.
\subsection{Empirical applications}
In this section, I present two empirical applications using the generalization of the three contextual measures and estimate a unifying model of context effects. Both applications come from a travel context in Europe. The first application uses a large set of observational data on airfare booking. This is a very heterogeneous dataset, and choice sets vary in terms of the number of alternatives, as well as between origin-destination city pairs. The second application uses experimental data on stated choices in urban commutes. These data are less exciting in terms of menu variability, but they allow one to address several potential concerns with the main observational dataset. As a result, they are used as a validation exercise.
\textbf{Observational data}\label{section:additionalPreprocessingObservationalData}
The observational data I use are the same as those used in the previous chapter. The dataset is described in detail in section \ref{section:observationalDataDescription}. These data have been subject to preprocessing rules, which are also discussed in section \ref{section:observationalDataDescription}. The descriptive statistics of the context variables are shown in table \ref{tab:descriptivesContextOnly}. The descriptive information for other covariates can be found in table \ref{tab:descriptiveStats}.
\begin{table}[ht]
\centering
\begin{tabular}{lrrrrr}
\hline
Variable & Count & Mean & St.Dev & Min & Max \\
\hline
Attraction & 368,723 & 19.78 & 20.33 & 0 & 98 \\
Compromise & 368,723 & 1.73 & 3.96 & 0 & 63.01 \\
Similarity & 368,723 & 11.27 & 5.88 & 1 & 77 \\
\hline
\end{tabular}
\caption[Descriptive statistics of context variables]{Descriptive statistics of context variables. \\ Note. Statistics before normalization.}
\label{tab:descriptivesContextOnly}
\end{table}
\textbf{Measurement of context effects}
The measurement of the attraction and compromise effects is straightforward. I follow the methodology outlined in the previous section. For the attraction effect, I count the number of alternatives dominated by a given option within the menu. This is implemented across all four vertical attributes. The compromise effect, given by equation \ref{eq:compromiseEffectGeneralFormula}, is also measured across all four dimensions. This results in 25 pairs of comparison groups for each alternative (see equation \ref{eq:compromiseEffectDetailedCalculation} in Appendix \ref{appendix:compromiseCalculation}).
However, before proceeding to the measurement of the similarity effect, the flight departure variables need to be normalized for the clustering algorithms. In order to identify similar alternatives within the menu, the clustering method needs a variable that allows it to measure the distance between any two departure values. This is achieved by transforming these variables into the Coordinated Universal Time format, preserving the dates, hours, and minutes of departure time. This way, the algorithm is able to measure the distance between any pair of alternatives in minutes. For normalization purposes, I also subtract the timestamp of the earliest flight in a menu from the departure times of every flight in that menu; thus, all times are measured relative to the earliest departure in the menu.
After this transformation, I use Affinity Propagation for obtaining sets of similar options within each menu. I feed the clustering algorithm with the data on all the vertical and horizontal variables for each alternative. The algorithm returns, for each option, an identifier of the cluster to which it belongs. Affinity propagation detects on average 7.62 clusters within the choice sets. In order to develop the measure of the similarity effect, I calculate the number of alternatives in the cluster to which the focal alternative belongs.
\textbf{Choice modeling}
To examine the context effects on choices in the airline booking data, I estimate random effects Probit models augmented by the context effect measures. These models have the crucial advantage of interpretability. Another advantage (over, for example, Logit) is the feature that Probit does not explicitly require the assumption of independence from irrelevant alternatives (IIA). If the augmented model perfectly accounted for all context effects, this would not be a concern. However, as one cannot guarantee that human choices are not affected by any other context features (that have not yet been hypothesized and examined), having this feature is an additional advantage. I, however, also present robustness checks by fitting alternative statistical models in Appendix \ref{appendix:LogitMixedAndFixedEffectResults} (Logit, Mixed Logit, and Fixed Effects Probit)\footnote{An additional robustness check in terms of the usage of the clustering method is also presented in the same appendix. There I use K-means clustering (augmented with the use of the Silhouette score \citep{rousseeuw1987silhouettes} to calculate the optimal number of clusters) to calculate the similarity measure. The results are robust to this alteration as well.}.
An important point to note here is the fact that the context variables incorporate menu size effects. For example, the attraction variable cannot take any value higher than 5 in a menu of size 6. However, the same variable can take the value of 49 in a menu of size 50. One way to deal with this feature would be to normalize the context variables by the size of the menu. Another alternative is to account for this feature statistically by controlling for the size of the menu in the regression equation. I opt for the latter because it guarantees higher flexibility in the empirical model structure. It also allows one to account for menu size effects that go beyond context effects (for example, potential choice overload). An additional advantage is that it is much simpler to interpret marginal effects of unscaled context variables.
\clearpage
\begin{sidewaystable}[ht]
\centering
\begin{tabular}{p{5cm}|*{9}{p{1.7cm}}}
\hline
Variable & Model 1 & Model 2 & Model 3 & Model 4 & Model 5 & Model 6 & Model 7 & Model 8 & Model 9 \\ \hline
Price & -0.225*** & -0.232*** & -0.200*** & -0.227*** & -0.219*** & -0.184*** & -0.186*** & -0.184*** & -0.186*** \\
& (0.006) & (0.006) & (0.008) & (0.006) & (0.006) & (0.008) & (0.007) & (0.007) & (0.007) \\
Trip duration & -0.136*** & -0.116*** & -0.087*** & -0.112*** & -0.124*** & -0.093*** & -0.092*** & -0.094*** & -0.092*** \\
& (0.009) & (0.009) & (0.010) & (0.009) & (0.009) & (0.010) & (0.010) & (0.010) & (0.010) \\
Number of flights & -0.342*** & -0.322*** & -0.325*** & -0.309*** & -0.289*** & -0.279*** & -0.289*** & -0.281*** & -0.290*** \\
& (0.008) & (0.008) & (0.008) & (0.008) & (0.009) & (0.009) & (0.009) & (0.009) & (0.009) \\
Number of airlines & -0.208*** & -0.209*** & -0.197*** & -0.210*** & -0.205*** & -0.193*** & -0.199*** & -0.193*** & -0.199*** \\
& (0.010) & (0.010) & (0.011) & (0.010) & (0.010) & (0.010) & (0.010) & (0.010) & (0.010) \\
Attraction & & & 0.003*** & & & 0.003*** & & 0.003*** & \\
& & & (0.001) & & & (<0.001) & & (<0.001) & \\
Compromise & & & & -0.038*** & & -0.034*** & -0.031*** & & \\
& & & & (0.003) & & (0.003) & (0.003) & & \\
Similarity & & & & & -0.020*** & -0.020*** & -0.031*** & -0.020*** & -0.031*** \\
& & & & & (0.001) & (0.001) & (0.002) & (0.001) & (0.002) \\
Attraction within cluster & & & & & & & 0.025*** & & 0.0245*** \\
& & & & & & & (0.002) & & (0.002) \\
Attraction outside cluster & & & & & & & 0.002*** & & 0.002*** \\
& & & & & & & (<0.001) & & (<0.001) \\
Compromise within cluster & & & & & & & & -0.149*** & -0.136*** \\
& & & & & & & & (0.020) & (0.020) \\
Compromise outside cluster & & & & & & & & -0.028*** & -0.027*** \\
& & & & & & & & (0.004) & (0.004) \\
Constant included & YES & YES & YES & YES & YES & YES & YES & YES & YES \\
Horizontal variables as controls & NO & YES & YES & YES & YES & YES & YES & YES & YES \\
Number of observations & 368723 & 368723 & 368723 & 368723 & 368723 & 368723 & 368723 & 368723 & 368723 \\
Number of choices & 6297 & 6297 & 6297 & 6297 & 6297 & 6297 & 6297 & 6297 & 6297 \\
Consistent Akaike information criterion & 49592 & 48906 & 48878 & 48764 & 48716 & 48565 & 48477 & 48555 & 48469 \\
Log likelihood & -24761 & -24356 & -24335 & -24278 & -24254 & -24165 & -24114 & -24153 & -24103 \\ \hline
\end{tabular}
\caption[Choice model estimation results]{Choice model estimation results.\\ Notes: Outputs from random effects Probit regressions. Standard errors in parentheses. Statistical significance levels: $*** p<0.01$, $** p<0.05$, $* p<0.1$.}
\label{tab:mainResultsRandomProbitModel19AmadeusData}
\end{sidewaystable}
\clearpage
Before proceeding to the estimation, one needs to transform the departure-time variables into outbound and inbound flight pairs. The transformation that was performed for the clustering exercise cannot be used directly because it would yield estimated coefficients that are not interpretable. To make this information as tractable as possible, I generate a set of variables. First, a day-of-the-week variable is generated for the outbound flight. Second, a variable is generated that measures the duration of stay at the destination\footnote{For the regression analysis, similar to other numeric variables, in order to eliminate any scale effects, I perform a z-score transformation of the duration of stay variable.}. These two variables together describe the inbound and outbound flight timing characteristics at the level of the day. However, consumer preferences can be defined on a smaller scale. Therefore, I also generate two variables that describe the exact time of the day of the outbound and inbound flights. These variables, $t_{\text{out}}, t_{\text{in}} \in [0,1)$, are measured as a fraction of a day such that $t_i = 0$ corresponds to midnight, while $t_i=0.5$ corresponds to midday. I further apply a cosine transformation to these variables, i.e., I replace each $t_i$ with $\cos(2 \pi t_i)$. This confines the departure time variable to the interval $[-1,1]$, and ensures a smooth transition in departure times at midnight. These transformations result in a total of four variables describing departure timestamps for the outbound and inbound flight pair --- the horizontal attributes of the alternative.
I estimate a sequence of 9 models and present the results in table \ref{tab:mainResultsRandomProbitModel19AmadeusData}. I estimate these models by using random-effects Probit regressions with robust standard errors. First, I start out by fitting two simple baseline models of consumer choice. Model 1 is the simplest estimation, which includes only the four vertical attributes as independent variables. Model 2 further extends this model by adding four horizontal attributes. In both cases, with or without horizontal attribute controls, all vertical variables generate meaningful results. Consumers clearly have preferences for shorter, cheaper flights with fewer layovers and airline changes. Travelers also seem to have preferences for the outbound flight during the day and for the inbound flight during the night (recall that the cosine transform variables reach a maximum at midnight and a minimum at midday).
\begin{table}[ht]
\centering
\renewcommand{\arraystretch}{1.3}
\setlength{\tabcolsep}{0.4em}
\begin{tabular}{p{3cm}*{7}{p{1.5cm}}}
\hline
Variable & Model 3 & Model 4 & Model 5 & Model 6 & Model 7 & Model 8 & Model 9 \\
\hline
Attraction & 0.0187 & & 0.0170 & & 0.0173 & & \\
Compromise & & -0.1900 & & -0.1646 & & -0.1509 & \\
Similarity & & & -0.0747 & -0.0756 & -0.1222 & -0.0740 & -0.1207 \\
Attraction within cluster & & & & & 0.1042 & & 0.1040 \\
Attraction outside cluster & & & & & 0.0133 & & 0.0135 \\
Compromise within cluster & & & & & & -0.6039 & -0.5444 \\
Compromise outside cluster & & & & & & -0.1377 & -0.1323 \\
\hline
\end{tabular}
\caption[Marginal effects of choice model on observational data]{Average marginal effects for relevant models.\\ Note. Average marginal effects implied by various models. All p-values were significant at $p<0.001$ level.}
\label{tab:marginalEffectsAmadeusModel39}
\end{table}
To further extend model 2, three models (3 through 5) that each incorporate one of the context effects, and one model that incorporates all three context effects at once (model 6) were estimated. Table \ref{tab:mainResultsRandomProbitModel19AmadeusData} indicates the consistency between the coefficient estimates of model 6 and those of models 3-5. This set of models also allows one to evaluate the effect of the three context effects on consumer choice. In line with the theory, the presence of attraction and similarity effects is observed. That is, if the attraction measure increases for a given option, this increases the likelihood that the option is chosen. On the other hand, when the similarity measure increases, it decreases the probability that the option is chosen. Both of these effects are statistically highly significant and are in the hypothesized direction. To better understand the economic significance of the estimated effects, table \ref{tab:marginalEffectsAmadeusModel39} presents (average) marginal effects of the relevant models. From table \ref{tab:marginalEffectsAmadeusModel39} one can read that if the attraction measure increases by one unit (that is, having one more option dominated by the focal alternative, \textit{ceteris paribus}) the likelihood of an option being chosen goes up by about 0.02 percentage points on average. On the other hand, if the similarity measure increases by one unit, the likelihood that a given option is chosen goes down by about 0.08 percentage points on average.
Tables \ref{tab:mainResultsRandomProbitModel19AmadeusData} and \ref{tab:marginalEffectsAmadeusModel39}, however, also indicate the existence of a reverse compromise effect. The compromise effect posits that if an option represents a compromise between extreme alternatives, it will have a higher likelihood of being chosen. On the contrary, our results indicate that increasing our compromise measure decreases the likelihood of an option being chosen. This effect is again statistically and economically significant. From this we can conclude that in the context of airfare choice, consumers prefer extreme options to those that represent compromise. This implies that the preferences of individual consumers are strongly anchored to one of the four vertical attributes. For example, if a traveler attaches particular importance to price, she will be reluctant to trade away an option that is cheap for increases in the attractiveness in any other (vertical) dimension. This, in fact, is rather understandable given the context of the current empirical exercise: The two largest groups of air travelers are holidaymakers, who are price-sensitive and do not readily trade away price advantage for shorter travel time, and business travelers, who are time-sensitive and do not trade away flight duration for a decrease in price.
Next, I investigate the interaction between several context effects. Previous literature has hypothesized and demonstrated the interaction between attraction and similarity effects in laboratory environments \citep{huberEtAl82, huberPuto83, roodrkerkEtAl11}. The interaction between similarity and compromise effects has not been studied in literature; however, one can consider that if the similarity effect efficiently identifies comparable alternatives that could constitute a consideration set of the consumer, the compromise effect, which considers options outside the consideration set, will not constitute an adequate guide for consumer behavior. Given that the similarity measure hinges on identifying clusters of similar options, the interplay between the similarity effect and the other two context effects is rather straightforward to study.
For this, four additional measures are calculated for each option that decompose attraction and compromise effects along the cluster lines identified by the similarity measure. More precisely, the attraction and compromise measure for a given option is calculated: 1) by taking only the alternatives that belong to the same cluster to which this particular option belongs, and 2) by considering only the alternatives that do not belong to the same cluster. This way, one can get a measure of attraction and compromise effects of an alternative within the cluster (i.e., among comparable alternatives, or within the consideration set) and outside the cluster (i.e., among relatively noncomparable alternatives, or outside the consideration set).
In models 7 through 9 I study the comparative effects of pairs of these effects. Model 7 decomposes the attraction effect of model 6 into two parts (inside and outside the cluster). Model 8 decomposes the compromise effect along the same lines, and model 9 estimates the model that includes both decompositions simultaneously. The results are again consistent and meaningful. Models 7 and 9 imply that attraction effect within the cluster, i.e., among comparable alternatives, has a much stronger impact on the purchase likelihood than that of the impact of the measure calculated based on nonsimilar alternatives. Table \ref{tab:marginalEffectsAmadeusModel39} indicates a difference in the size of the order of magnitude. Similarly, as indicated by models 8 and 9, being a compromise among comparable alternatives has a much higher detrimental effect on purchase likelihood than being a compromise among remote alternatives. The p values for all tests of coefficient pair equality (i.e., estimated coefficient for attraction within cluster being equal to that of the attraction outside cluster, and coefficient for compromise within the cluster being equal to the compromise coefficient outside the cluster) are below 0.001, indicating that cluster-based measure of similarity may be an efficient indicator of the consumer's consideration set.
\clearpage
\begin{sidewaystable}[ht]
\centering
\begin{tabular}{p{5cm}*{9}{p{1.7cm}}}
\hline
Variable & Model 1 & Model 2 & Model 3 & Model 4 & Model 5 & Model 6 & Model 7 & Model 8 & Model 9 \\ \hline
\addlinespace
Price & -0.249*** & -0.259*** & -0.253*** & -0.257*** & -0.249*** & -0.238*** & -0.245*** & -0.238*** & -0.245*** \\
& (0.008) & (0.009) & (0.013) & (0.009) & (0.009) & (0.013) & (0.013) & (0.013) & (0.013) \\
Trip duration & -0.167*** & -0.143*** & -0.137*** & -0.143*** & -0.147*** & -0.139*** & -0.144*** & -0.138*** & -0.143*** \\
& (0.013) & (0.013) & (0.016) & (0.013) & (0.013) & (0.015) & (0.015) & (0.015) & (0.015) \\
Number of flights & -0.414*** & -0.392*** & -0.393*** & -0.386*** & -0.373*** & -0.369*** & -0.373*** & -0.371*** & -0.375*** \\
& (0.014) & (0.014) & (0.014) & (0.014) & (0.014) & (0.015) & (0.015) & (0.015) & (0.015) \\
Number of airlines & -0.161*** & -0.163*** & -0.160*** & -0.164*** & -0.164*** & -0.162*** & -0.165*** & -0.161*** & -0.164*** \\
& (0.015) & (0.015) & (0.015) & (0.015) & (0.015) & (0.015) & (0.015) & (0.015) & (0.015) \\
Attraction & & & 0.002 & & & 0.002 & & 0.003 & \\
& & & (0.003) & & & (0.003) & & (0.003) & \\
Compromise & & & & -0.066*** & & -0.063*** & -0.064*** & & \\
& & & & (0.013) & & (0.013) & (0.013) & & \\
Similarity & & & & & -0.023*** & -0.023*** & -0.029*** & -0.022*** & -0.029*** \\
& & & & & (0.003) & (0.003) & (0.004) & (0.003) & (0.004) \\
Attraction within cluster & & & & & & & 0.016*** & & 0.016*** \\
& & & & & & & (0.005) & & (0.005) \\
Attraction outside cluster & & & & & & & -0.001 & & -0.001 \\
& & & & & & & (0.003) & & (0.003) \\
Compromise within cluster & & & & & & & & -0.182*** & -0.180*** \\
& & & & & & & & (0.048) & (0.048) \\
Compromise outside cluster & & & & & & & & -0.029 & -0.031* \\
& & & & & & & & (0.018) & (0.018) \\
Constant included & YES & YES & YES & YES & YES & YES & YES & YES & YES \\
Horizontal variables as controls & NO & YES & YES & YES & YES & YES & YES & YES & YES \\
Number of observations & 79080 & 79080 & 79080 & 79080 & 79080 & 79080 & 79080 & 79080 & 79080 \\
Number of choices & 3954 & 3954 & 3954 & 3954 & 3954 & 3954 & 3954 & 3954 & 3954 \\
Consistent Akaike information criterion & 25568 & 25124 & 25134 & 25106 & 25088 & 25085 & 25087 & 25101 & 25104 \\
Log likelihood & -12759 & -12482 & -12481 & -12467 & -12458 & -12444 & -12439 & -12446 & -12441 \\
\hline
\end{tabular}
\caption[Choice model results for the reduced dataset]{Choice model estimation results from the reduced dataset.\\ Notes: Outputs from random effects Probit regressions. Standard errors in parentheses. Statistical significance levels: *** $p<0.01$, ** $p<0.05$, * $p<0.1$.}
\label{tab:reducedResultsRandomProbitModel19AmadeusData}
\end{sidewaystable}
\clearpage
An important drawback of the dataset is the feature that there is no way to know which available options on the market reached the eyeballs of the consumer. One way of thinking about this problem is to consider the most likely way such menus are delivered to decision-makers. Most online booking sites and flight aggregators use specific and proprietary algorithms to rank available menus at the search point. These rankings decide which options are shown to the customer. Although the information on specific ranking algorithms is not public, we know that the attribute that usually plays the most important role is the price. Given the robust findings that price negatively affects choice probability, the best guess for a simple ranking mechanism that would capture a wide variety of sorting mechanisms would be options sorted in decreasing order with respect to price. Assuming that each user was reached by the same number of options, we could construct a reduced dataset for a sensitivity check. For this exercise, I construct a dataset that only contains menus with more than 20 options, and only retain the 20 cheapest alternatives per menu. There are also cases where the chosen option is not part of the set of the 20 cheapest alternatives in the menu\footnote{Twenty alternatives are chosen so that there are enough entries to have variance in key variables. Results are robust to different menu sizes, with the characteristic that, as I reduce the menu size, more effects seem to lose significance.}. I also eliminate these choice cases from the reduced dataset. This leaves me with about four thousand choice cases.
\clearpage
\begin{table}[!ht]
\centering
\renewcommand{\arraystretch}{1.1}
\setlength{\tabcolsep}{0.3em}
\begin{tabular}{>{\fontsize{10pt}{11pt}\selectfont}p{3cm}>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l}
\hline
Panel A & Model 3 & Model 4 & Model 5 & Model 6 & Model 7 & Model 8 & Model 9 \\ \hline
Attraction & 0.0138 & & 0.0191 & & 0.0010 & & \\
& [0.512] & & [0.362] & & [0.317] & & \\
Compromise & & -0.5507 & & -0.5239 & & -0.5335 & \\
& & [$<0.001$] & & [$<0.001$] & & [$<0.001$] & \\
Similarity & & & -0.1916 & -0.1890 & -0.2425 & -0.1859 & -0.2375 \\
& & & [$<0.001$] & [$<0.001$] & [$<0.001$] & [$<0.001$] & [$<0.001$] \\
\renewcommand{\arraystretch}{1.}
Attraction within cluster & & & & & 0.1357 & & 0.1329 \\
& & & & & [0.001] & & [0.001] \\
Attraction outside cluster & & & & & -0.0111 & & -0.0080 \\
& & & & & [0.628] & & [0.728] \\
\renewcommand{\arraystretch}{1.}
Compromise within cluster & & & & & & -1.5127 & -1.4961 \\
& & & & & & [$<0.001$] & [$<0.001$] \\
Compromise outside cluster & & & & & & -0.2374 & -0.2566 \\
& & & & & & [0.107] & [0.083] \\
\end{tabular}
\vspace{10pt}
\begin{tabular}{>{\fontsize{10pt}{11pt}\selectfont}p{3cm}>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l>{\fontsize{11pt}{13pt}\selectfont}l}
\hline
Panel B & Model 3 & Model 4 & Model 5 & Model 6 & Model 7 & Model 8 & Model 9 \\ \hline
Attraction & 0.0359 & & 0.0291 & & 0.0096 & & \\
& [$<0.001$] & & [$<0.001$] & & [$<0.001$] & & \\
Compromise & & -0.4150 & & -0.3482 & & -0.3112 & \\
& & [$<0.001$] & & [$<0.001$] & & [$<0.001$] & \\
Similarity & & & -0.2037 & -0.2082 & -0.3084 & -0.2038 & -0.3034 \\
& & & [$<0.001$] & [$<0.001$] & [$<0.001$] & [$<0.001$] & [$<0.001$] \\
\renewcommand{\arraystretch}{1.}
Attraction within cluster & & & & & 0.2454 & & 0.2436 \\