The Fundamentals of Heavy-Tails:
Properties, Emergence, and Identification
Heavy-tails are a continual source of excitement and confusion across disciplines as they are repeatedly "discovered" in new contexts. This is especially true within computer systems, where heavy-tails seemingly pop up everywhere -- from degree distributions in the internet and social networks to file sizes and interarrival times of workloads. However, despite nearly a decade of work on heavy-tails they are still treated as mysterious, surprising, and even controversial.
The goal of our forthcoming book is to show that heavy-tailed distributions need not be mysterious and should not be surprising or controversial. In particular, we will attempt to demystify heavy-tailed distributions by showing how to reason formally about their counter-intuitive properties; we will highlight that their emergence should be expected (not surprising) by showing that a wide variety of general processes lead to heavy-tailed distributions; and we will highlight that most of the controversy surrounding heavy-tails is the result of bad statistics, and can be avoided by using the proper tools.
The book will cover mathematically deep concepts such as the generalized central limit theorem, extreme value theory, and regular variation; but will do so using only elementary mathematical tools in order to make these topics accessible to anyone who has had an introductory probability course.
A more detailed overview of the topics to be included in the book is below. Additionally, the slides from recent tutorials we have given on heavy-tails provide a high-level glimpse into the topics and perspective of the book.
If you would like to be placed on our mailing list so that we can alert you when the book is available, please email Adam Wierman (firstname.lastname@example.org).
Table of contents
- Surprising? Mysterious? Controversial?
- More "normal" than the Normal
- Demystifying heavy tails
- An overview of this book
- Heavy-tailed distributions
- Defining "heavy-tailed"
- Examples of heavy-tailed distributions
- Advanced material: Multivariate distributions
Part I: Properties
- Scale invariance, power laws, and regular variation
- Scale invariance and power laws
- Approximate scale invariance and regular variation
- Properties of regular variation
- Conspiracies and catastrophes
- Subexponential distributions
- An example: Random Sums
- Advanced material: Variations on conspiracies and catastrophes
- Residual lives and hazard rates
- Heavy tails and residual lives
- Long-tailed distributions
- An example: Random extrema
Part II: Emergence
- Additive processes
- The central limit theorem
- Generalizing the central limit theorem
- Understanding stable distributions
- The generalized central limit theorem
- An example: Heavy-tails in random walks
- The multiplicative central limit theorem
- Variations on multiplicative processes
- An example: Preferential attachment and Yule processes
- A limit theorem for maxima
- Understanding max-stable distributions
- The extremal central limit theorem
- An example: Extremes of random walks
- A variation: The time between record breaking events
Part III: Identification
- Identifying power-law distributions: Listen to your body
- Identifying power-law tails: Let the tail do the talking
- Dependencies and tails: Careful copula construction
Jayakrishnan Nair received his PhD from the California Institute of Technology (Caltech) in 2012. His PhD thesis focused on scheduling for heavy-tailed and light-tailed workloads in queueing systems. He is currently a post-doctoral scholar at CWI in the Netherlands. His research interests include modeling, performance evaluation, and design issues in queueing systems and communication networks. Jayakrishnan was a recipient of the best paper award at IFIP Performance, 2010.
Adam Wierman is a Professor in the Department of Computing and Mathematical Sciences at the California Institute of Technology, where he is a member of the Rigorous Systems Research Group (RSRG). He received his Ph.D., M.Sc. and B.Sc. in Computer Science from Carnegie Mellon University in 2007, 2004, and 2001, respectively. His research interests center around resource allocation and scheduling decisions in computer systems and services. More specifically, his work focuses both on developing analytic techniques in stochastic modeling, queueing theory, scheduling theory, and game theory, and on applying these techniques to application domains such as energy-efficient computing, data centers, social networks, and electricity markets. He received the 2011 ACM SIGMETRICS Rising Star award for outstanding contributions to computer/communication performance evaluation by a junior researcher, and has been a co-recipient of best paper awards at ACM SIGMETRICS, IEEE INFOCOM, IEEE Power & Energy Society General Meeting, IFIP Performance, IEEE Green Computing Conference, and ACM GREENMETRICS. He was named a Siebel Scholar, received an Okawa Foundation grant, and received an NSF CAREER grant. Additionally, his dissertation received the CMU School of Computer Science Distinguished Dissertation Award. He has also received multiple teaching awards, including the Associated Students of the California Institute of Technology (ASCIT) Teaching Award. Wierman has more than 60 refereed publications and serves as an Associate Editor for the Operations Research journal and on the editorial board of the Performance Evaluation journal and the IEEE Transactions on Cloud Computing.
Bert Zwart is currently a senior researcher at CWI, where he leads the Probability and Stochastic Networks group. He also holds a full professor position at VU University Amsterdam, is a senior fellow at Eurandom, and holds an adjunct professor position at the H. Milton Stewart School of Industrial and Systems Engineering at Georgia Institute of Technology, where he held a Coca-Cola Chair until 2008. Bert Zwart is the 2008 recipient of the Erlang prize for outstanding contributions to applied probability by a researcher no older than 35, and has received an IBM faculty award. His research is concerned with the application of analytic and probabilistic asymptotic methods to applied probability models in computer systems, communication networks, customer contact centers, and manufacturing systems. Dr. Zwart has published more than 70 refereed publications and is a council member of the Applied Probability Society of INFORMS. Dr. Zwart was area editor of Stochastic Models for Operations Research, the flagship journal of his profession, from 2009 to 2011. In addition, Dr. Zwart is editor-in-chief (with J.K. Lenstra and M. Trick) of the journal Surveys in Operations Research and Management Science, and serves on the editorial boards of Mathematics of Operations Research, Mathematical Methods of Operations Research, Operations Research, Queueing Systems, and Stochastic Systems. He is a recipient of Veni and Vidi research grants from NWO.