diff --git a/report/milestone2.pdf b/report/milestone2.pdf
index 681992d..9acf7ba 100644
Binary files a/report/milestone2.pdf and b/report/milestone2.pdf differ
diff --git a/report/milestone2.tex b/report/milestone2.tex
index 791dab2..0fac938 100644
--- a/report/milestone2.tex
+++ b/report/milestone2.tex
@@ -209,20 +209,13 @@ \subsection{Experimental question}
In this section, I will run experiments to find out how the response time of the SUT depends on the number of servers $S$ and the replication factor $R$. Additionally, I will investigate whether \get{}s and \set{}s are differently affected by these parameters. Finally, I will find out which operations become more time-consuming as these parameters change.
-To this end, I will measure response time (middleware) for every 10th request as a function of $S$ and $R$, and measure how long requests spend in each part of the SUT (based on the timestamps defined in Milestone 1). For each parameter combination, I will run experiments until the 95\% confidence interval (calculated using a two-sided t-test) lies within 5\% of the mean response time, but not less than 3 repetitions.
+To this end, I will measure response time (middleware) for every 10th request as a function of $S$ and $R$, and measure how long requests spend in each part of the SUT (based on the timestamps defined in Milestone 1). For each parameter combination, I will run experiments until the 95\% confidence interval of the response time (computed from the two-sided t-distribution) lies within 5\% of the mean, but with no fewer than 3 repetitions.
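+
+Concretely, if $\bar{x}$ denotes the sample mean of the response time over $n \geq 3$ repetitions and $s$ the sample standard deviation, this stopping criterion can be written as
+\[
+t_{n-1,\,0.975} \cdot \frac{s}{\sqrt{n}} \leq 0.05\,\bar{x},
+\]
+where $t_{n-1,\,0.975}$ is the 97.5th percentile of Student's $t$-distribution with $n-1$ degrees of freedom, so that $\bar{x} \pm t_{n-1,\,0.975} \cdot s/\sqrt{n}$ is the two-sided 95\% confidence interval. (The notation $\bar{x}$, $s$, $n$ is introduced here only to state the criterion precisely.)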
\subsection{Hypothesis}
I predict the following.
\subsubsection{\get{} and \set{} requests}
-\get{} and \set{} requests will not be impacted the same way by different setups.
-
-\get{} requests will be processed faster as we increase $S$ because the same load will be distributed across more threads. Increasing $R$ will have no effect on \get{} requests because replication is only done for \set{} requests (there may be secondary effects due to e.g. write threads requiring more CPU time, but this should be negligible).
-
-\set{} requests will be strongly affected by $R$. If $R=1$, \set{} requests will be processed faster for higher $S$ because each request is only written to one server, and for a higher $S$ the same load is distirbuted across more write threads. However, if $R>1$, response time of \set{}s increases due to two factors: a) the request is written serially to $R$ servers, and b) not all $R$ responses are received at the same time. Assuming a) is negligible compared to b), we will observe an increase in the mean response time.
-
-All of this is summarised in Figure~\ref{fig:exp2:hyp:replication}. For \get{} requests, response time will be independent of $R$ for any fixed $S$. For \set{} requests, response time increases linearly with increasing $R$, and the slope increases with $S$.
\begin{figure}[h]
\centering
@@ -231,9 +224,17 @@ \subsubsection{\get{} and \set{} requests}
\label{fig:exp2:hyp:replication}
\end{figure}
+\get{} and \set{} requests will not be impacted the same way by different setups.
+
+\get{} requests will be processed faster as we increase $S$ because the same load will be distributed across more threads. Increasing $R$ will have no effect on \get{} requests because replication is only done for \set{} requests (there may be secondary effects due to e.g. write threads requiring more CPU time, but this should be negligible).
+
+\set{} requests will be strongly affected by $R$. If $R=1$, \set{} requests will be processed faster for higher $S$ because each request is only written to one server, and for a higher $S$ the same load is distributed across more write threads. However, if $R>1$, the response time of \set{}s increases due to two factors: a) the request is written serially to $R$ servers, and b) not all $R$ responses are received at the same time.
+
+All of this is summarised in Figure~\ref{fig:exp2:hyp:replication}. For \get{} requests, response time will be independent of $R$ for any fixed $S$. For \set{} requests, response time increases linearly with increasing $R$, and the slope increases with $S$.
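+
+The second factor can be made precise with a small model (the notation $X_1, \ldots, X_R$ is introduced only for this argument): if $X_1, \ldots, X_R$ are the response times of the $R$ servers for a single \set{}, the middleware has to wait for the slowest of them, so
+\[
+tMemcached \approx \mathrm{E}\left[\max(X_1, \ldots, X_R)\right],
+\]
+which is non-decreasing in $R$ even if the response time distribution of each individual server stays the same.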
+
\subsubsection{Throughput}
-I also predict the total throughput will decrease as $R$ increases because the servers will need to do additional work (communicating more with memcached servers).
+I predict the total throughput will decrease as $R$ increases because the servers will need to do additional work (communicating more with memcached servers for each \set{}).
\subsubsection{Relative cost of operations}
As explained previously, more replication means that the middleware needs to send each \set{} request to more servers and wait for more answers. Thus, as $R$ increases, $tMemcached$ will increase. Since each \set{} request takes longer to process, this means that $tQueue$ will increase as well. I also predict that the relative cost of \get{} operations will not change.
@@ -243,9 +244,9 @@ \subsubsection{Scalability}
In an ideal system, a) there would be enough resources to concurrently run all threads; b) all memcached servers would take an equal and constant amount of time to respond; c) there would be no network latencies; d) dequeueing would take constant time.
-For \get{} requests, the ideal system would have linear speed-up (assuming the load balancer does not become a bottleneck). I predict that the SUT will have sublinear speed-up for \get{}s because the response time also includes network latency -- a term that is not dependent on $S$: $response \; time = const. + \frac{const.}{S}$. In addition, since threads compete for resources in the SUT, the speed-up will be even lower than what's predicted by the formula above.
+For \get{} requests, the ideal system would have linear speed-up (until the load balancer becomes the bottleneck). I predict that the SUT will have sublinear speed-up for \get{}s because the response time also includes network latency -- a term that is not dependent on $S$: $response \; time = const. + \frac{const.}{S}$. In addition, since threads compete for resources in the SUT, the speed-up will be even lower than what is predicted by the formula above.
-For \set{}s, the ideal system would have linear speed-up if $R=const.$ because in that case, adding servers does not increase the amount of work done \emph{per \linkmain{MiddlewareComponent}} (again assuming the load balancer does not become a bottleneck). For full replication the ideal system would have sublinear speed-up because each \set{} will be serially written to $S$ servers so the response time would have a component that linearly depends on $S$.
+For \set{}s, the ideal system would have linear speed-up if $R=const.$ because in that case, adding servers does not increase the amount of work done per \linkmain{MiddlewareComponent} (again assuming the load balancer does not become a bottleneck). For full replication, the ideal system would have sublinear speed-up because each \set{} is serially written to $S$ servers, so the response time would have a component that depends linearly on $S$.
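+
+The sublinear speed-up predicted for \get{}s above can be quantified with a back-of-the-envelope bound (the constants $a$ and $b$ are placeholders, not measured values): writing the response time as $t(S) = a + \frac{b}{S}$, where $a > 0$ is the $S$-independent part (network latency) and $\frac{b}{S}$ is the part that parallelises across servers, the speed-up from $S$ to $S' > S$ servers is
+\[
+\frac{t(S)}{t(S')} = \frac{a + b/S}{a + b/S'} < \frac{S'}{S},
+\]
+i.e. strictly below the ideal linear speed-up, and it approaches 1 as the constant term $a$ dominates.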
\subsection{Experiments}
\begin{center}
@@ -285,22 +286,22 @@ \subsection{Results}
\subsubsection{\get{} requests}
-From Figure~\ref{fig:exp2:res:replication} we can see that increasing $R$ from 1 to $S$ does have an impact on the mean response time of \get{} requests (contrary to the hypothesis) and this effect is amplified as $S$ grows. However, the 25\%, 50\%, and 75\% percentiles stay constant, implying that most of the requests aren't affected (in accordance with the hypothesis) -- only the response time of outliers (\get{}s with high response times) increases. Figure~\ref{fig:exp2:res:breakdown} shows that queue time is constant and the increase in response time comes almost entirely from waiting for memcached's response; this means the increase is caused by either increased network latency (due to more traffic at a higher value of $R$) or increased memcached response time.
+From Figure~\ref{fig:exp2:res:replication} we can see that increasing $R$ from 1 to $S$ does have an impact on the mean response time of \get{} requests (contrary to the hypothesis) and this effect is amplified as $S$ grows. However, the 25\%, 50\%, and 75\% percentiles stay constant, implying that most requests aren't affected (in accordance with the hypothesis) -- only the response time of outliers increases. Figure~\ref{fig:exp2:res:breakdown} shows that $tQueue$ is constant and the increase in response time comes almost entirely from waiting for memcached's response ($tMemcached$); this means the increase is caused by either increased network latency (due to more traffic at a higher value of $R$) or increased memcached response time.
I predicted that increasing $S$ while keeping $R$ constant would decrease the response time of \get{} requests. In fact, I was only partly right: the 25\%, 50\%, and 75\% percentiles stay constant, but the mean decreases with $S$ at $R=1$ and increases at $R>1$. Investigating the breakdown of time spent inside the middleware (Figure~\ref{fig:exp2:res:breakdown}) gives an answer: queueing time does decrease with $S$ for all replication levels, but this gain is offset by the increase in time spent waiting for memcached's response.
-Given that $tMemcached$ increased with $S$ even when $R$ was constant, we can conclude that the performance degradation was mostly due to networking -- if it had been caused by memcached's slower responses, $tMemcached$ would not have changed with $S$.
+Given that $tMemcached$ increased with $S$ even when $R$ was constant, we can conclude that the performance degradation was mostly due to network delays -- if it had been caused by memcached's slower responses, $tMemcached$ would not have changed with $S$.
\subsubsection{\set{} requests}
\label{sec:exp2:res:set}
Figure~\ref{fig:exp2:res:replication} shows that increasing $R$ does increase response time for $S=7$ but, unexpectedly, decreases response time for $S=3$. This is counterintuitive: how can a system that is under a higher load also be faster?
-From Figure~\ref{fig:exp2:res:breakdown} we see that queueing time actually decreases with $R$ at all values of $S$ and the increase in $tMemcached$ offsets the decrease at $S=5$ and $S=7$. Why, then, do \set{} requests spend less time in the queue as $R$ increases? We can explain this by looking at the architecture of \linkmain{WriteWorker}. Two steps are done in the same loop: first, if the write queue has any elements, one is taken and sent to all $R$ servers. The second step is checking for responses from memcached (waiting up to 1ms using the function \verb+Selector.select(long timeout)+). This means that if there were no responses from memcached servers, the thread just sleeps 1ms.
+From Figure~\ref{fig:exp2:res:breakdown} we see that queueing time actually decreases with $R$ at all values of $S$, and the increase in $tMemcached$ offsets the decrease at $S=5$ and $S=7$. Why, then, do \set{} requests spend less time in the queue as $R$ increases? We can explain this by looking at the architecture of \linkmain{WriteWorker}. Two steps are done in the same loop: first, if the write queue has any elements, one is taken and sent to all $R$ servers. The second step is checking for responses from memcached (waiting up to 1ms using the function \verb+Selector.select(long timeout)+). This means that if there were no responses from memcached servers, the thread sleeps at least 1ms.
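+
+In simplified form, the loop looks like this (a sketch of the structure just described, not the actual \linkmain{WriteWorker} source; \verb+running+, \verb+Request+, \verb+sendToServers+, and \verb+handleResponses+ are illustrative names):
+\begin{verbatim}
+void runLoop(Queue<Request> writeQueue, Selector selector, int R)
+        throws IOException {
+    while (running) {
+        // Step 1: take one request from the write queue, if any,
+        // and write it serially to all R servers.
+        Request request = writeQueue.poll();  // non-blocking
+        if (request != null) {
+            sendToServers(request, R);
+        }
+        // Step 2: check for responses from memcached, waiting at
+        // most 1 ms; if no response is pending, this blocks for
+        // the full timeout.
+        if (selector.select(1) > 0) {
+            handleResponses(selector.selectedKeys());
+        }
+    }
+}
+\end{verbatim}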
The result of this design is that a system with a larger replication factor -- which means more responses from memcached servers -- sleeps less at \verb+Selector.select()+ and thus can go back to processing elements from the queue sooner.
-Adding servers at $R=1$ decreases response time to \set{} requests -- this is in line with the hypothesis. For $R>1$ adding servers does not have a linear effect on response time: for 50\% replication, response time (and the time spent in each component) is constant and at full replication response time increases slightly with $S$ because of increased $tMemcached$.
+Adding servers at $R=1$ decreases the response time of \set{} requests -- this is in line with the hypothesis (see Figure~\ref{fig:exp2:res:servers}). For $R>1$ adding servers does not have a linear effect on response time: for 50\% replication, response time (and the time spent in each component) is constant, and at full replication, response time increases slightly with $S$ because of increased $tMemcached$.
$tMemcached$ is almost constant at 50\% replication because the difference between values of $R$ is small: $R \in \{2,3,4\}$. At full replication, $tMemcached$ has a larger effect because the difference is larger: $R \in \{3,5,7\}$. (The slowest response determines $tMemcached$; it can be modelled as the maximum of $R$ samples, where each sample is the response time to one request from the middleware to a memcached server.)
@@ -313,10 +314,10 @@ \subsubsection{Throughput}
\begin{figure}[h]
\centering
\label{fig:exp2:res:throughput}
\end{figure}
-From Figure~\ref{fig:exp2:res:throughput} we can see that throughput does indeed decrease with $R$ -- which is in line with the hypothesis --, and higher $S$ amplifies this effect. At $S=3$ throughput is almost constant; this is because the value of $R \in \{1,2,3\}$ does not change enough to make a significant difference, similarly to the previous section. Maximum throughput is achieved at $S=5, R=1$ which is likely because in Section~\ref{sec:exp1} we picked the values of $C$ and $T$ that maximised throughput under exactly those parameters.
+From Figure~\ref{fig:exp2:res:throughput} we can see that throughput does indeed decrease with $R$ -- which is in line with the hypothesis -- and higher $S$ amplifies this effect. At $S=3$ throughput is almost constant; this is because the value of $R \in \{1,2,3\}$ does not change enough to make a significant difference, similarly to the previous section. Maximum throughput is achieved at $S=5, R=1$ because in Section~\ref{sec:exp1} we picked the values of $C$ and $T$ that maximised throughput under exactly those parameters.
\subsubsection{Relative cost of operations}
-As hypothesized, increasing $R$ also increases $tMemcached$ for \set{} requests (see Figure~\ref{fig:exp2:res:breakdown}). Unexpected though was the decrease in $tQueue$ for \set{} requests as $R$ increased, and the increase in $tMemcached$ for \get{}s. Both are explained in previous sections of this chapter.
+As hypothesized, increasing $R$ also increases $tMemcached$ for \set{} requests (see Figure~\ref{fig:exp2:res:breakdown}). Unexpected, though, were the decrease in $tQueue$ for \set{} requests as $R$ increased and the increase in $tMemcached$ for \get{}s. Both are explained in previous sections of this experiment.
\subsubsection{Scalability}
\begin{figure}[h]
\centering
@@ -327,7 +328,7 @@ \subsubsection{Scalability}
\label{fig:exp2:res:servers}
\end{figure}
-As Figure~\ref{fig:exp2:res:servers} shows, there is no speed-up for \get{} requests when we add servers, and there is even a slight increase the mean response time. Increasing $S$ does decrease mean response time to \set{} requests, but only at $R=1$ -- as hypothesized -- and sublinearly. At $R>1$ there is no speed-up. In summary, SUT performs significantly worse than the ideal system described in Section~\ref{sec:exp2:hyp:scalability}, and worse than expected.
+As Figure~\ref{fig:exp2:res:servers} shows, there is no speed-up for \get{} requests when we add servers, and there is even a slight increase in the mean response time. Increasing $S$ does decrease the mean response time of \set{} requests, but only at $R=1$ -- as hypothesized -- and sublinearly. At $R>1$ there is no speed-up. In summary, the SUT performs significantly worse than the ideal system described in Section~\ref{sec:exp2:hyp:scalability}, and worse than expected. The reasons are laid out in previous sections of this experiment.
\clearpage
% --------------------------------------------------------------------------------