Writing the Day After Tomorrow’s Newspaper, Today

Two teams of researchers led by USC Viterbi’s Information Sciences Institute are developing techniques to better predict cyberattacks and terrorism

In the 2002 movie “Minority Report,” based on a short story by science fiction writer Philip K. Dick, police use technology to arrest and convict murderers before they commit their crime.

The technology isn’t at that level yet, but scholars at the Information Sciences Institute at USC Viterbi find themselves on the cutting edge of foundational research into how to better forecast the likelihood of similarly bad stuff, such as cyberattacks and terrorism.

Supported by $26 million in grants, two teams of ISI researchers are developing technologies to help predict evolving cyberthreats and future events. Their projects are called EFFECT and SAFE.

“This is a massive investment on the part of the government,” said Premkumar “Prem” Natarajan, executive director of ISI, vice dean of engineering at USC Viterbi and a professor of computer science. “We’re not very good at [forecasting] right now. The whole discipline is in its infancy. That’s why these research efforts have been started.”

The goal of EFFECT and SAFE, Natarajan said, comes down to figuring out how to write the day after tomorrow’s newspaper, today.

“Obviously, we can never do that,” he said, “but maybe we can write some sections of it and put some probabilities around it, and say, ‘80 percent of the time, this prediction is going to be true,’ so you can prepare for things.”

EFFECT — which stands for Effectively Forecasting Evolving Cyber Threats — seeks to develop technologies that could counter cyberthreats by anticipating large-scale breaches before they happen.

SAFE — Signals Intelligence-based Anticipation of Future Events — aims to develop methods for forecasting significant events such as terrorist attacks, military actions and violent protests in the Middle East and regions of North Africa.

ISI is on the ground floor of this brave new world of forecasting research, which involves using computers to discern patterns in massive amounts of observable data and feeding that data into mathematical models. Researchers are designing algorithms, characterizing their performance, then selecting the best ones and putting them to work.

“We’re not designing a better nail or corkscrew — we’re actually inventing a new thing,” said Natarajan. “This is like the early days of speech recognition research, when folks were asking themselves, ‘My God, how do we do this?’”

Data and Models

One component of the EFFECT team’s research involves looking at platforms like Twitter, security blogs and “dark web” chatter, and then extracting structured information about cyberattacks, exploits and vulnerabilities. Another component involves using reverse name lookups to detect large scanning campaigns.

Say, for example, you have a set of tweets, security blog posts and dark web chatter going back several years, and you want to predict cyberattacks in the near future.

“You could theoretically hire a stable of people to go through and read all of this data and count things like who has attacked who and when and with what software, but practically speaking, that’s a nonstarter,” said Elizabeth Boschee, a research lead working on EFFECT and a senior computer scientist at ISI. “It would take way, way too long, and you couldn’t keep up on a daily basis.”

Instead, the EFFECT team is using natural language processing technology to perform the task. A computer reads the reams of data, identifies all of the cyberattacks, exploits, software and hardware vulnerabilities, organizations and places mentioned, and then puts all that information in a database that a model can query quantitatively. Thus, a researcher could ask the computer, “Tell me how many attacks against Microsoft Word occurred in each month of 2016 and 2017.”

“This would then produce a ‘time series’ that can be used in a mathematical model. In the simplest case, the model could just look at the time series and project it into the future,” Boschee explained. “If the number of such attacks has been rising each month, it can predict that it will continue to rise as time goes on.”
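Boschee’s “simplest case” can be sketched in a few lines: count the extracted attack records per month to get a time series, then fit a straight line to that series and extend it forward. This is a minimal illustration, not EFFECT’s actual pipeline. The records, the `monthly_counts` and `linear_trend_forecast` helpers, and the choice of an ordinary least-squares trend line are all hypothetical assumptions made for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical extracted records: (date an attack was reported, targeted software).
# Illustrative only; in practice these would come from the NLP extraction step.
EXTRACTED = [
    (date(2017, 1, 9), "Microsoft Word"),
    (date(2017, 2, 2), "Microsoft Word"),
    (date(2017, 2, 20), "Adobe Reader"),
    (date(2017, 2, 27), "Microsoft Word"),
    (date(2017, 3, 5), "Microsoft Word"),
    (date(2017, 3, 14), "Microsoft Word"),
    (date(2017, 3, 30), "Microsoft Word"),
    (date(2017, 4, 3), "Microsoft Word"),
    (date(2017, 4, 11), "Microsoft Word"),
    (date(2017, 4, 19), "Microsoft Word"),
    (date(2017, 4, 28), "Microsoft Word"),
]


def monthly_counts(records, target, months):
    """Count attacks on `target` in each (year, month) of `months`; empty months count as 0."""
    counts = Counter((d.year, d.month) for d, t in records if t == target)
    return [counts[ym] for ym in months]


def linear_trend_forecast(series, steps_ahead):
    """Fit y = a + b*t by ordinary least squares (t = 0, 1, ...) and extend the line."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    # Future time steps continue the index: n, n + 1, ...
    return [intercept + slope * (n + k) for k in range(steps_ahead)]


months = [(2017, m) for m in range(1, 5)]
series = monthly_counts(EXTRACTED, "Microsoft Word", months)
print(series)                            # [1, 2, 3, 4]
print(linear_trend_forecast(series, 2))  # [5.0, 6.0]
```

A straight line is the crudest possible model: it gives a single number per month with no uncertainty attached. Producing statements like Natarajan’s “80 percent of the time” would require a model that also outputs probabilities or prediction intervals.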

Leading EFFECT are Kristina Lerman, a project leader at ISI and a research associate professor in USC Viterbi’s Department of Computer Science; Craig Knoblock, a research director in ISI’s Information Integration Group and a research professor of computer science and spatial sciences; and Stephen Schwab, an ISI project leader.

Staying Safe

“Generally speaking, our ability to do better-than-random forecasting relies upon existing structure and statistical regularities in the data,” said Aram Galstyan, research director of the Machine Intelligence and Data Science group at ISI and an associate professor in computer science. He is co-principal investigator on SAFE.

Among the tools SAFE researchers use are graph-based models, information-theoretic prediction techniques and causal modeling. How well events can be predicted through modeling and algorithms depends on the process being forecast and the type of data available for making such forecasts, Galstyan explained. For example, an analysis of violent events, such as terrorist activities, suggests that those events do not happen at completely random times but have certain spatial and temporal correlations that can be exploited for more accurate forecasts.

Boschee says it’s too early to know how accurate such predictions can become.

“There already has been work showing that these models can do a better job than even expert humans on some tasks,” she said. “But we don’t know yet what the ceiling is for how well the models can do.”
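Galstyan’s observation that violent events do not occur at completely random times can be illustrated with one of the simplest measures of temporal structure: the sample autocorrelation of a daily event count. A positive value at lag 1 means busy days tend to follow busy days. This sketch is an assumption-laden illustration, not one of SAFE’s methods: the article names graph-based, information-theoretic and causal models, while this checks only temporal (not spatial) correlation, and both series below are made up.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag (standard biased estimator)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("constant series: autocorrelation is undefined")
    num = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return num / denom


# Hypothetical daily event counts, made up for illustration.
clustered = [0, 0, 0, 5, 5, 5, 0, 0, 0, 5, 5, 5]    # busy days follow busy days
alternating = [0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5]  # busy days follow quiet days

print(round(autocorrelation(clustered, 1), 3))    # 0.417
print(round(autocorrelation(alternating, 1), 3))  # -0.917
```

Both series contain exactly the same counts; only their order differs. That ordering is the kind of regularity a forecaster can exploit, and it is invisible to any method that looks only at the overall average.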

It’s very difficult to predict that an attack will happen at a specific time and place, the ISI researchers said. It’s easier to forecast in aggregate, predicting, say, that there will be 20 percent more attacks this month than last month.

Yet such forecasting ability has tremendous economic and societal value, and it extends to problems such as predicting the spread of an illness after a natural disaster, for example cholera following a massive flood, Natarajan noted.