Can statistics be used to predict data breaches?

By Todd M. Rowe on October 8, 2015

As the East Coast closely watched meteorologists’ models and predictions to prepare for Hurricane Joaquin, it may be a good time to consider the role of using statistics and models to predict the next data breach.

A recent study entitled Hype And Heavy Tails: A Closer Look At Data Breaches uses statistics and modeling to call into question how we view data breaches. Despite the increase in media reports on data breaches since 2005, the statistical models in this study suggest that large-scale data breaches, such as those seen with Anthem and Home Depot, may actually be decreasing. This trend may continue: the study found that two large-scale breaches the size of the Home Depot breach (September 2014) and the Anthem breach (January 2015) are unlikely to occur within four months of each other.

Based on data taken from the Privacy Rights Clearinghouse (PRC), the study also concludes:

  • Breach Size: The statistical modeling indicates “there is a 1.2% chance of another Anthem-sized breach occurring between February 19, 2015 and…June 2015.” On the other hand, there is a 70% probability of a breach of at least one million records during the same timeframe.
  • Predictions: The statistical modeling also indicates that over the next three years there is a 7.8% chance of a breach equaling the size of the Anthem breach. There is only a 0.4% chance of two data breaches equaling Anthem and Home Depot occurring within a year of each other.
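
To make these kinds of figures more concrete, here is a minimal sketch of how a probability such as "at least one breach above a given size within a given window" could be computed. It is not the study's model: the article does not report the study's distributions or fitted parameters, so the sketch assumes breaches arrive as a Poisson process and that breach sizes follow a log-normal (heavy-tailed) distribution. The function names and every number in the usage example are hypothetical.

```python
import math
import random


def tail_probability(threshold, mu, sigma):
    """P(size >= threshold) when log(size) is normal with mean mu and sd sigma."""
    z = (math.log(threshold) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


def prob_at_least_one(threshold, rate_per_year, years, mu, sigma):
    """Closed form under the Poisson assumption.

    Breaches at or above the threshold form a thinned Poisson process with
    rate rate_per_year * tail_probability, so the chance of seeing none in
    the window is exp(-rate * years).
    """
    big_rate = rate_per_year * tail_probability(threshold, mu, sigma)
    return 1.0 - math.exp(-big_rate * years)


def simulate(threshold, rate_per_year, years, mu, sigma, trials=5000, seed=0):
    """Monte Carlo cross-check of prob_at_least_one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Exponential gaps between breaches give Poisson arrivals.
        t = rng.expovariate(rate_per_year)
        while t < years:
            if rng.lognormvariate(mu, sigma) >= threshold:
                hits += 1
                break  # one large breach is enough for this trial
            t += rng.expovariate(rate_per_year)
    return hits / trials


if __name__ == "__main__":
    # Hypothetical inputs: 20 breaches per year, median size 10,000 records,
    # log-scale spread of 3.0, and a one-million-record threshold.
    args = (1_000_000, 20, 1.0, math.log(10_000), 3.0)
    print("closed form:", round(prob_at_least_one(*args), 3))
    print("simulated:  ", round(simulate(*args), 3))
```

The closed form and the simulation should roughly agree. Different assumptions about how often breaches occur or how their sizes are distributed would give different probabilities.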

Commentators interpreting the results of this study suggest that large-scale data breaches may not be on the rise “precisely because computer security experts have been vigilant in the face of these risks.” This study also supports the theory that there is a “cybersecurity arms race” taking place between hackers and security experts. The number of breaches may be staying consistent because security measures and hackers’ techniques are evolving at an equal pace.

In discussing these results, the researchers warned: “Our results aren’t necessarily aimed at individual organizations, and may be more relevant to policymakers who make decisions based on media and industry reports.”

This is not the first time statistical modeling has been used in an effort to gain a better understanding of data breaches. Catastrophe modelers have considered applying to data breaches the kind of modeling used to predict hurricanes. Even if there is not sufficient historical data to predict the next data breach with precision, statistics and modeling can provide valuable insight into the risks associated with cybersecurity. Any method that allows us to gain a better understanding of this risk should not be ignored.

Todd M. Rowe is an attorney in the Chicago office of Tressler LLP. He focuses his practice on insurance coverage, representing specialty, property and commercial lines insurers in litigation and non-litigation disputes. He also regularly provides guidance on issues related to policy analysis and drafting and claims handling procedures. Todd has actively practiced in Wisconsin, Michigan and Illinois and has been involved in a number of insurance coverage matters in various other states.