Redundancy scheduling has attracted strong interest as a mechanism to improve the delay performance in cloud computing applications. The key element in redundancy scheduling is the replication of jobs upon arrival. Replicas of each incoming job are dispatched to, say d, different servers. By creating multiple service opportunities, redundancy scheduling boosts the chance of a fast response from a server that is swift to provide service, and alleviates the risk of a long delay incurred when a job is assigned to a single server that may be slow. When the job sizes of the various replicas are independent and highly variable (e.g., heavy-tailed), concurrent service provides some degree of immunity from extremely long run times, and even tends to reduce the total amount of service capacity devoted to a job. In contrast, when the job sizes are less variable or highly variable but correlated, which is the typical case observed in practice, concurrent service can cause a substantial wastage of service capacity, and at high degrees of replication it can in fact nullify the performance gains from the power-of d choices and adversely affect stability properties. Thus, the performance gains from the c.o.c. variant of redundancy sensitively depend on the characteristics of the joint distribution of the job sizes of the various replicas. This claim is supported by empirical evidence which shows that, especially in case of highly variable job sizes, redundancy may improve performance, in terms of the expectation and the tail of the response time.
Analytical results for redundancy scheduling are unfortunately scarce however, and have largely remained limited to exponentially distributed job sizes. Expressions for the expected response time and the stability condition for the c.o.c. variant of redundancy scheduling are known in the scenario with uniform selection of the servers, exponential job sizes, i.i.d. replicas and homogeneous servers, i.e., the server speeds of all servers are equal.
In this thesis we analyze three key performance metrics within redundancy scheduling, i.e., the stability condition, the expected response time and the tail behavior of the response time. In particular, we establish the stability condition for the c.o.c. variant of redundancy scheduling with the FCFS discipline and scaled Bernoulli service requirements. In addition, we prove that for this variant with identical replicas, general job size distributions and suitable type-dependent assignment probabilities the stability region for d = 1 is strictly larger than the stability region for d > 1. We establish that the same statement holds for i.i.d. replicas and New-Better-than-Used (NBU) distributed job sizes. In case of non-observable job types the stability region for d = N is larger than or equal to the stability region for d = 1 when the job sizes are New-Worse-than-Used (NWU) distributed. For the PS discipline in the scenario with generally distributed job sizes and possible dependence among the replica sizes of a job being governed by some joint distribution, we prove the stability condition by construction of suitable lower and upper bound systems.
For improving the expected response time, we propose two threshold-based policies, i.e., the rerouting and replication policy, and investigate the effectiveness of these policies in the presence of underlying job-server affinity relations. More specifically, we determine the effective load per server under both policies for completely unknown and partly known job types and derive an approximation for the expected response time. In addition, we propose two policies, i.e., the delta-probe and delta-probe preemption policy, that outperform the c.o.c. variant of redundancy scheduling in terms of the expected response time. The main idea behind the policies is that an additional probe task is launched to find the server that is the fastest for a particular job or has the smallest current workload, depending on the specific policy. The use of probes ensures that there is only one server working on the actual job. Lastly, we examine the tail behavior of the response time for both the c.o.s. and c.o.c. variants of redundancy scheduling when job sizes are heavy-tailed. For regularly varying job sizes with index the tail index of the response time for the c.o.c. variant of redundancy with the FCFS discipline equals 1 − v and 1 – dv for identical and i.i.d. replicas, respectively. For the LCFS-PR and PS discipline we consider the more general fork-join model, which subsumes redundancy-d scheduling as a special case. For both service disciplines we show that the response time tail is just as heavy as the job size tail, which implies that these disciplines achieve better tail asymptotics than the FCFS discipline.