Other Publications

Here is a list of publications by Math Citadel researchers in other venues. Follow each link to reach the publication on its host site.

        Rachel Traylor, Dell EMC; Andrzej Korzeniowski, University of Texas at Arlington (February 22, 2017)
          International Journal of Statistics and Probability, Volume 6, No. 2

Server resource allocation and traffic management is a large area of research and business concern in order to ensure proper functionality and maintenance procedures. As a result, good server reliability models that can incorporate workload and traffic stress are necessary. This paper generalizes previous dynamic server reliability models for partitioned servers with clustered-task selection by relaxing the assumption that the correlation between channels in the server remains constant. We allow the correlation to vary deterministically with time, or as a function of a random process in discrete or continuous time. The explicit form of the survival function is derived in such cases. Numerical illustrations demonstrate the dangers of erroneously assuming independence among channels, which can lead to costly and unnecessary interventions in the system. In addition, we numerically explore the effects of a variable correlation on the survival function.

        Rachel Traylor, Dell EMC; Andrzej Korzeniowski, University of Texas at Arlington (July 1, 2016)
          International Journal of Statistics and Probability, Volume 5, No. 4

Suppose a single server has K channels, each of which performs a different task. Customers arrive at the server via a nonhomogeneous Poisson process and select 0 to K tasks for the server to perform. Each channel services the tasks in its queue independently, and the customer’s job is complete when the last task selected is complete. The stress to the server is a constant multiple of the number of tasks selected by each customer, and thus the stress added to the server by each customer is random. Under this model, we provide the survival function for such a server in both the case of independently selected channels and correlated channels. A numerical comparison of expected lifetimes for various arrival rates is given, and the relationship between the dependency of channel selection and expected server lifetime is presented.
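
The arrival-and-selection part of the model described above can be sketched as a small simulation. This is an illustration only, not the paper's derivation: it covers the independent-selection case alone, it does not model service times or server breakdown, and every name in it (`rate`, `rate_max`, `p_select`, `eta`) is a hypothetical parameter chosen for the example. Arrivals are generated by thinning, a standard way to sample a nonhomogeneous Poisson process whose rate is bounded above by `rate_max`.

```python
import math
import random


def simulate_customer_stress(rate, rate_max, horizon, k_channels, p_select, eta, rng):
    """Return a list of (arrival_time, stress_added) pairs on [0, horizon].

    Assumptions (illustrative, not taken from the paper):
      * rate(t) <= rate_max for every t in [0, horizon]
      * each of the k_channels tasks is selected independently with
        probability p_select
      * the stress a customer adds is eta times the number of tasks selected
    """
    events = []
    t = 0.0
    while True:
        # Candidate arrival from a homogeneous Poisson process with rate rate_max.
        t += rng.expovariate(rate_max)
        if t > horizon:
            break
        # Thinning step: keep the candidate with probability rate(t) / rate_max.
        if rng.random() <= rate(t) / rate_max:
            tasks = sum(rng.random() < p_select for _ in range(k_channels))
            events.append((t, eta * tasks))
    return events


rng = random.Random(0)  # fixed seed so the run is repeatable
events = simulate_customer_stress(
    rate=lambda t: 1.0 + 0.5 * math.sin(t),  # hypothetical time-varying arrival rate
    rate_max=1.5,
    horizon=20.0,
    k_channels=4,
    p_select=0.3,
    eta=2.0,
    rng=rng,
)
for arrival_time, stress in events[:5]:
    print(f"t={arrival_time:.2f}  stress added={stress}")
```

Because a customer may select zero tasks, some arrivals add no stress at all, which is why the stress contributed per customer is random even though `eta` is constant.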

        Ao Ma, Rachel Traylor, Fred Douglis, Mark Chamness, Guanlin Lu, Darren Sawyer, EMC Corporation; Surendar Chandra, Windsor Hsu, Datrium, Inc. (November 1, 2015)
          ACM Transactions on Storage, Volume 11, Issue 4

Modern storage systems orchestrate a group of disks to achieve their performance and reliability goals. Even though such systems are designed to withstand the failure of individual disks, failure of multiple disks poses a unique set of challenges. We empirically investigate disk failure data from a large number of production systems, specifically focusing on the impact of disk failures on RAID storage systems. Our data covers about one million SATA disks from six disk models for periods up to 5 years. We show how observed disk failures weaken the protection provided by RAID. The count of reallocated sectors correlates strongly with impending failures.

With these findings we designed RAIDShield, which consists of two components. First, we have built and evaluated an active defense mechanism that monitors the health of each disk and replaces those that are predicted to fail imminently. This proactive protection has been incorporated into our product and is observed to eliminate 88% of triple disk errors, which are 80% of all RAID failures. Second, we have designed and simulated a method of using the joint failure probability to quantify and predict how likely a RAID group is to face multiple simultaneous disk failures, which can identify disks that collectively represent a risk of failure even when no individual disk is flagged in isolation. We find in simulation that RAID-level analysis can effectively identify most vulnerable RAID-6 systems, improving the coverage to 98% of triple errors.
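
To make the group-level idea concrete, here is a minimal sketch of a joint failure calculation. It is not the method used in RAIDShield: it assumes disks fail independently and that each disk already has a failure probability assigned, and the numbers and the per-disk threshold below are made up for the example. The function computes the probability that at least `k` disks in a group fail. For a RAID-6 group, which tolerates two failed disks, the quantity of interest is `k = 3`.

```python
def prob_at_least_k_failures(probs, k):
    """Probability that at least k disks fail, assuming independent failures.

    probs is a list of per-disk failure probabilities (hypothetical inputs).
    """
    # dist[j] holds the probability that exactly j of the disks processed
    # so far have failed; it is extended one disk at a time.
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)  # this disk survives
            nxt[j + 1] += mass * p      # this disk fails
        dist = nxt
    return sum(dist[k:])


# Hypothetical numbers: no disk in either group crosses the per-disk
# threshold, so neither group would be flagged disk by disk.
per_disk_threshold = 0.2
healthy_group = [0.01] * 14
worn_group = [0.15] * 14

for name, group in [("healthy", healthy_group), ("worn", worn_group)]:
    flagged = any(p >= per_disk_threshold for p in group)
    risk = prob_at_least_k_failures(group, 3)
    print(f"{name}: any disk flagged={flagged}, P(>=3 failures)={risk:.4f}")
```

In this toy setup the worn group carries a much higher chance of a triple failure than the healthy group even though none of its disks is flagged individually, which is the kind of collective risk a group-level analysis is meant to surface.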