The Cybersecurity “Eye in the Sky” Now Includes Machine Learning
By Bill Brink
“In Vegas, everybody's gotta watch everybody else. … And the eye in the sky is watching us all.”
Since 1995, when Robert De Niro set the tone with those words in “Casino,” the eye in the sky has undergone some serious LASIK.
In Vegas, it’s money. In Silicon Valley, perhaps it’s a new product. In healthcare, patient records. In government, the nuclear codes. Everyone has something they want to protect, and more often than not these days, those crown jewels live on a hard drive, or a server, or in the cloud. The cybersecurity equivalent of the eye in the sky now incorporates machine learning. The computers don’t just see; they learn and predict, and they do it in real time.
But the machines can’t do it on their own. The market lacks sufficient cybersecurity professionals. And the power of machine learning means the market needs practitioners who grasp its real-world ramifications, not just how to build it.
“People should be thinking really hard about who is going to be impacted by the systems that we build, whether it’s economic impact, moral impact or any other kind of impact,” said Dr. David Steier, a Heinz College Distinguished Service Professor who teaches Introduction to Artificial Intelligence. “All of that is just absolutely critical.”
Because it’s not just the good guys who get to use the latest tech. That’s how we find another Hollywood casino boss, Andy Garcia of “Ocean’s Eleven,” watching surveillance footage of George Clooney and Matt Damon ransacking his vault.
“Machines take me by surprise with great frequency.”
~ Alan Turing
Machine learning is a subset of artificial intelligence, which dates back decades. After Turing cracked the German codes during World War II, he began some of the earliest work in the field. In 1975, the Association for Computing Machinery’s Turing Award went to Allen Newell and Herbert Simon, two pioneers in the field who spent the majority of their careers at Carnegie Mellon. In the 1980s, Newell served as the Ph.D. adviser to Steier, who operated ham radios and built computer kits as a kid, the son of an electrical engineer and a psychologist.
“I’d always thought, AI is something that could influence just about everything,” Steier said. “It wasn’t true at the time, but now ...”
The way real-life casinos use artificial intelligence mirrors its deployment in cybersecurity. AI helps casinos monitor betting patterns to detect anomalies: If a couple of bettors who usually wager $20 drop thousands on Steelers-Bengals, perhaps something is amiss. Likewise, the machine-learning models designed to augment network protection systems work by learning what usual behavior looks like so they can flag aberrations.
“The more data you have, the better you’ll be able to identify what’s considered normal activity,” said Randy Trzeciak, the Director of Heinz College’s Master of Science in Information Security Policy and Management program and the Deputy Director of Risk and Resilience in the Software Engineering Institute, a federally funded research and development center at Carnegie Mellon. “It’s not saying ‘normal’ as good or bad. It’s just what you would expect.”
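For readers who want to see the idea in miniature, here is a toy sketch of baseline-and-deviation detection. It is an illustration under stated assumptions, not how any casino or security product actually works: the wager history is invented, the function names `fit_baseline` and `is_anomalous` are hypothetical, and the three-standard-deviation cutoff is a common rule of thumb rather than anything the researchers quoted here prescribe.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """Learn what 'normal' looks like: the mean and spread of past activity."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean.

    The threshold of 3.0 is an assumed rule of thumb, not a recommended setting.
    """
    mu, sigma = baseline
    if sigma == 0:
        # No spread in the history: anything different from it is unexpected.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical wager history for one bettor, in dollars.
usual_wagers = [20, 25, 15, 20, 30, 20, 25, 20, 15, 20]
baseline = fit_baseline(usual_wagers)

typical_bet_flagged = is_anomalous(25, baseline)    # a bet in the usual range
big_bet_flagged = is_anomalous(5000, baseline)      # thousands on a single game

print(typical_bet_flagged, big_bet_flagged)
```

As Trzeciak notes, the flag says nothing about good or bad. It only marks the $5,000 wager as something the history would not lead you to expect, and a longer history gives a more reliable picture of what to expect.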
“By the time they figure out what went wrong, we’ll be sitting on a beach, earning 20 percent.”
~ Hans Gruber, “Die Hard”
The average cost of a data breach was $4.35 million globally in 2022, according to IBM, and $9.44 million in the United States. That same study determined that it took an average of 277 days – that’s a pregnancy, folks – to detect and contain a data breach. If that much time passed between when a casino cheat exited the premises and the powers that be found out about it, the cheat would be, as Alan Rickman’s Gruber put it, long gone.
Network security systems with machine-learning models can consume and process information far faster than humans. They can also compare network activity against a learned baseline in real time, potentially sparing an organization weeks or months of undetected intrusion and millions of dollars in losses. A comprehensive cybersecurity system still needs human operators to investigate what the model flags; to keep those operators from losing trust in the system, the model needs to reduce false alarms – normal activity mistakenly classified as abnormal – as much as possible.
“If you are scanning through a billion packets a day, and you have an 0.1 percent false-alarm rate, that sounds pretty good on paper, but you have generated so many alerts that no amount of human personnel could possibly adjudicate all of the alerts in a reasonable fashion,” said Dr. Shing-hon Lau, an AI and Cybersecurity Researcher in the CERT Division of the SEI. “And so your system is effectively worthless, right?”
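The arithmetic behind Lau's point is worth spelling out. The sketch below uses his two figures, a billion packets a day and a 0.1 percent false-alarm rate, and treats all of the traffic as benign for simplicity. The staffing numbers (five minutes to review an alert, an eight-hour shift) are assumptions added purely for illustration and do not come from Lau or the SEI.

```python
import math

# Figures from the quote above.
packets_per_day = 1_000_000_000
false_alarm_rate = 0.001  # 0.1 percent

# Simplification: treat every packet as benign, so every alert is a false alarm.
false_alarms_per_day = round(packets_per_day * false_alarm_rate)

# Assumed for illustration only: review time per alert and shift length.
minutes_per_alert = 5
shift_minutes = 8 * 60

alerts_per_analyst = shift_minutes // minutes_per_alert
analysts_needed = math.ceil(false_alarms_per_day / alerts_per_analyst)

print(f"False alarms per day: {false_alarms_per_day:,}")
print(f"Alerts one analyst can review per shift: {alerts_per_analyst}")
print(f"Analysts needed just for false alarms: {analysts_needed:,}")
```

A rate that sounds tiny still produces a million alerts a day. Under these assumed staffing figures, clearing them would take more than 10,000 full-time analysts before anyone looked at a real intrusion.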
Advanced Persistent Threats
“What the eyes see, and the ears hear, the mind believes.”
~ Harry Houdini
A bad actor who gains access to a network and immediately downloads every file will quickly attract attention. Today’s cybercriminals are smarter than that.
“An adversary can make your machine learn the wrong thing, do the wrong thing or reveal the wrong thing,” said Dr. Nathan VanHoudnos, a Senior Machine Learning Research Scientist and Lab Lead at the Software Engineering Institute. “You have to be very, very broad about how you defend because you might be going up against multiple attackers.”
Sophisticated attackers bide their time and extract data in dribs and drabs, slowly expanding the model’s perception of normal network behavior while fooling the system into thinking all is well, like Indiana Jones replacing the golden idol with a bag of sand in “Raiders of the Lost Ark.”
“You’ll hear this described as advanced persistent threats,” Trzeciak said. “They get in, they map out the environment, they have a low-and-slow approach. They slowly exfiltrate data, they learn, and they want to avoid detection, so then they can inject something else that can be more impactful.”
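A small simulation shows why low-and-slow can work against a detector that keeps updating its own idea of normal. Everything here is a hypothetical toy: the function `run_adaptive_detector`, the 50 percent tolerance, the learning rate and the traffic volumes are all assumptions chosen to make the effect visible, not a description of any real product or attack.

```python
def run_adaptive_detector(traffic, baseline, tolerance=0.5, learning_rate=0.2):
    """Toy detector with a self-updating baseline (all parameters are assumed).

    A reading more than `tolerance` (as a fraction) above the baseline raises
    an alert. Any reading that does not raise an alert is treated as normal
    and blended into the baseline.
    """
    alert_days = []
    for day, value in enumerate(traffic):
        if value > baseline * (1 + tolerance):
            alert_days.append(day)
        else:
            baseline = (1 - learning_rate) * baseline + learning_rate * value
    return alert_days, baseline


normal_mb = 100.0  # hypothetical normal outbound volume per day, in megabytes

# Smash and grab: jump straight to ten times the normal volume.
smash_traffic = [10 * normal_mb] * 5
smash_alerts, _ = run_adaptive_detector(smash_traffic, normal_mb)

# Low and slow: raise the volume by 5 percent a day for 60 days.
slow_traffic = [normal_mb * 1.05 ** day for day in range(1, 61)]
slow_alerts, slow_baseline = run_adaptive_detector(slow_traffic, normal_mb)

print("Smash-and-grab alert days:", smash_alerts)
print("Low-and-slow alert days:", slow_alerts)
print(f"Final daily volume: {slow_traffic[-1]:.0f} MB")
print(f"Baseline the detector now treats as normal: {slow_baseline:.0f} MB")
```

In this toy setup the smash-and-grab trips an alert every day, while the patient attacker ends up moving far more than ten times the original volume without a single alert, because each small increase was absorbed into the baseline before the next one arrived. That is one concrete way an adversary can, in VanHoudnos's words, make a machine learn the wrong thing.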
New technology like ChatGPT, a generative AI product that explains quantum physics and writes poetry, can, if asked carefully, create polymorphic malware, which shape-shifts to avoid detection. But threat actors don’t even need the computer’s help: Dr. Lujo Bauer, a Carnegie Mellon Computer Science Professor, co-authored a study published in 2021 called “Malware Makeover” in which the researchers crafted cyber attacks that evaded malware detectors. Their souped-up software fooled one commercial machine learning-enabled antivirus 85 percent of the time.
In addition to practicing basic cyber hygiene, organizations must train their models on accurate data, test their systems frequently, and close the gap between open cybersecurity jobs and the capable employees needed to fill them. They must also collaborate. Industries with similar network traffic patterns, like healthcare and finance, are now sharing information to achieve strength in numbers. Most importantly, the models must be built and trained with their operators – and their operators’ clients, patients and constituents – in mind.
“In cybersecurity, as in other applications of AI, we’re long past the stage where it was enough to build a stand-alone AI demo in a lab,” Steier said. “Whether AI succeeds will depend on whether we can enable humans and AI to work together in the mess of complexity that is the real world.”