This is a lot of ones and zeros. It's what we call binary information. This is how computers talk. It's how they store information. It's how computers think. It's how computers do everything it is that computers do. I'm a cybersecurity researcher, which means my job is to sit down with this information and try to make sense of it, to try to understand what all the ones and zeroes mean. Unfortunately for me, we're not just talking about the ones and zeros I have on the screen here. We're not just talking about a few pages of ones and zeros. We're talking about billions and billions of ones and zeros, more than anyone could possibly comprehend.
Now, as exciting as that sounds, when I first started doing cyber — (Laughter) — when I first started doing cyber, I wasn't sure that sifting through ones and zeros was what I wanted to do with the rest of my life, because in my mind, cyber was keeping viruses off of my grandma's computer, it was keeping people's Myspace pages from being hacked, and maybe, maybe on my most glorious day, it was keeping someone's credit card information from being stolen. Those are important things, but that's not how I wanted to spend my life.
But after 30 minutes of work as a defense contractor, I soon found out that my idea of cyber was a little bit off. In fact, in terms of national security, keeping viruses off of my grandma's computer was surprisingly low on their priority list. And the reason for that is cyber is so much bigger than any one of those things. Cyber is an integral part of all of our lives, because computers are an integral part of all of our lives, even if you don't own a computer. Computers control everything in your car, from your GPS to your airbags. They control your phone. They're the reason you can call 911 and get someone on the other line. They control our nation's entire infrastructure. They're the reason you have electricity, heat, clean water, food. Computers control our military equipment, everything from missile silos to satellites to nuclear defense networks. All of these things are made possible because of computers, and therefore because of cyber, and when something goes wrong, cyber can make all of these things impossible.
But that's where I step in. A big part of my job is defending all of these things, keeping them working, but once in a while, part of my job is to break one of these things, because cyber isn't just about defense, it's also about offense. We're entering an age where we talk about cyberweapons. In fact, so great is the potential for cyber offense that cyber is considered a new domain of warfare. Warfare. It's not necessarily a bad thing. On the one hand, it means we have a whole new front on which we need to defend ourselves, but on the other hand, it means we have a whole new way to attack, a whole new way to stop evil people from doing evil things.
So let's consider an example of this that's completely theoretical. Suppose a terrorist wants to blow up a building, and he wants to do this again and again in the future. So he doesn't want to be in that building when it explodes. He's going to use a cell phone as a remote detonator. Now, it used to be the only way we had to stop this terrorist was with a hail of bullets and a car chase, but that's not necessarily true anymore. We're entering an age where we can stop him with the press of a button from 1,000 miles away, because whether he knew it or not, as soon as he decided to use his cell phone, he stepped into the realm of cyber. A well-crafted cyber attack could break into his phone, disable the overvoltage protections on his battery, drastically overload the circuit, cause the battery to overheat, and explode. No more phone, no more detonator, maybe no more terrorist, all with the press of a button from a thousand miles away.
So how does this work? It all comes back to those ones and zeros. Binary information makes your phone work, and used correctly, it can make your phone explode. So when you start to look at cyber from this perspective, spending your life sifting through binary information starts to seem kind of exciting.
But here's the catch: This is hard, really, really hard, and here's why. Think about everything you have on your cell phone. You've got the pictures you've taken. You've got the music you listen to. You've got your contacts list, your email, and probably 500 apps you've never used in your entire life, and behind all of this is the software, the code, that controls your phone, and somewhere, buried inside of that code, is a tiny piece that controls your battery, and that's what I'm really after, but all of this, just a bunch of ones and zeros, and it's all just mixed together. In cyber, we call this finding a needle in a stack of needles, because everything pretty much looks alike. I'm looking for one key piece, but it just blends in with everything else.
So let's step back from this theoretical situation of making a terrorist's phone explode, and look at something that actually happened to me. Pretty much no matter what I do, my job always starts with sitting down with a whole bunch of binary information, and I'm always looking for one key piece to do something specific. In this case, I was looking for a very advanced, very high-tech piece of code that I knew I could hack, but it was somewhere buried inside of a billion ones and zeroes. Unfortunately for me, I didn't know quite what I was looking for. I didn't know quite what it would look like, which makes finding it really, really hard. When I have to do that, what I have to do is basically look at various pieces of this binary information, try to decipher each piece, and see if it might be what I'm after. So after a while, I thought I had found the piece I was looking for. I thought maybe this was it. It seemed to be about right, but I couldn't quite tell. I couldn't tell what those ones and zeros represented. So I spent some time trying to put this together, but wasn't having a whole lot of luck, and finally I decided, I'm going to get through this, I'm going to come in on a weekend, and I'm not going to leave until I figure out what this represents. So that's what I did. I came in on a Saturday morning, and about 10 hours in, I sort of had all the pieces to the puzzle. I just didn't know how they fit together. I didn't know what these ones and zeros meant. At the 15-hour mark, I started to get a better picture of what was there, but I had a creeping suspicion that what I was looking at was not at all related to what I was looking for. By 20 hours, the pieces started to come together very slowly — (Laughter) — and I was pretty sure I was going down the wrong path at this point, but I wasn't going to give up. After 30 hours in the lab, I figured out exactly what I was looking at, and I was right, it wasn't what I was looking for. 
I spent 30 hours piecing together the ones and zeros that formed a picture of a kitten. (Laughter) I wasted 30 hours of my life searching for this kitten that had nothing at all to do with what I was trying to accomplish.
So I was frustrated, I was exhausted. After 30 hours in the lab, I probably smelled horrible. But instead of just going home and calling it quits, I took a step back and asked myself, what went wrong here? How could I make such a stupid mistake? I'm really pretty good at this. I do this for a living. So what happened? Well I thought, when you're looking at information at this level, it's so easy to lose track of what you're doing. It's easy to not see the forest through the trees. It's easy to go down the wrong rabbit hole and waste a tremendous amount of time doing the wrong thing. But I had this epiphany. We were looking at the data completely incorrectly since day one. This is how computers think, ones and zeros. It's not how people think, but we've been trying to adapt our minds to think more like computers so that we can understand this information. Instead of trying to make our minds fit the problem, we should have been making the problem fit our minds, because our brains have a tremendous potential for analyzing huge amounts of information, just not like this. So what if we could unlock that potential just by translating this to the right kind of information? So with these ideas in mind, I sprinted out of my basement lab at work to my basement lab at home, which looked pretty much the same. The main difference is, at work, I'm surrounded by cyber materials, and cyber seemed to be the problem in this situation. At home, I'm surrounded by everything else I've ever learned. So I pored through every book I could find, every idea I'd ever encountered, to see how could we translate a problem from one domain to something completely different?
The biggest question was, what do we want to translate it to? What do our brains do perfectly naturally that we could exploit? My answer was vision. We have a tremendous capability to analyze visual information. We can combine color gradients, depth cues, all sorts of these different signals into one coherent picture of the world around us. That's incredible. So if we could find a way to translate these binary patterns to visual signals, we could really unlock the power of our brains to process this stuff. So I started looking at the binary information, and I asked myself, what do I do when I first encounter something like this? And the very first thing I want to do, the very first question I want to answer, is what is this? I don't care what it does, how it works. All I want to know is, what is this? And the way I can figure that out is by looking at chunks, sequential chunks of binary information, and I look at the relationships between those chunks. When I gather up enough of these sequences, I begin to get an idea of exactly what this information must be. So let's go back to that blow up the terrorist's phone situation. This is what English text looks like at a binary level. This is what your contacts list would look like if I were examining it. It's really hard to analyze this at this level, but if we take those same binary chunks that I would be trying to find, and instead translate that to a visual representation, translate those relationships, this is what we get. This is what English text looks like from a visual abstraction perspective. All of a sudden, it shows us all the same information that was in the ones and zeros, but shows it to us in an entirely different way, a way that we can immediately comprehend. We can instantly see all of the patterns here. It takes me seconds to pick out patterns here, but hours, days, to pick them out in ones and zeros.
It takes minutes for anybody to learn what these patterns represent here, but years of experience in cyber to learn what those same patterns represent in ones and zeros. So this piece is caused by lower case letters followed by lower case letters inside of that contact list. This is upper case by upper case, upper case by lower case, lower case by upper case. This is caused by spaces. This is caused by carriage returns. We can go through every little detail of the binary information in seconds, as opposed to weeks, months, at this level. This is what an image looks like from your cell phone. But this is what it looks like in a visual abstraction. This is what your music looks like, but here's its visual abstraction. Most importantly for me, this is what the code on your cell phone looks like. This is what I'm after in the end, but this is its visual abstraction. If I can't find this, I can't make the phone explode. I could spend weeks trying to find this in ones and zeros, but it takes me seconds to pick out a visual abstraction like this.
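The talk does not spell out how these pictures are built. One plausible reading, offered only as an illustrative sketch, is a byte-pair plot: treat each byte as a chunk and count how often each byte value is immediately followed by each other byte value, so every pair becomes a point at (first byte, second byte) on a 256 by 256 image. The one-byte chunk size and the names `byte_pair_counts`, `classify`, and `region_counts` are assumptions made for this example, not details from the talk.

```python
from collections import Counter


def byte_pair_counts(data: bytes) -> Counter:
    """Count every (current byte, next byte) pair in the data.

    Each pair can be drawn as a point (x=current, y=next) on a
    256x256 image, with the count controlling its brightness.
    """
    return Counter(zip(data, data[1:]))


def classify(b: int) -> str:
    """Label a byte with the coarse ASCII class it falls in."""
    if 0x61 <= b <= 0x7A:
        return "lower"
    if 0x41 <= b <= 0x5A:
        return "upper"
    if b == 0x20:
        return "space"
    if b in (0x0D, 0x0A):
        return "newline"
    return "other"


def region_counts(data: bytes) -> Counter:
    """Group the byte pairs by class, e.g. ('lower', 'lower')."""
    regions = Counter()
    for (first, second), count in byte_pair_counts(data).items():
        regions[(classify(first), classify(second))] += count
    return regions


# A tiny stand-in for a contact list (made-up sample data).
sample = b"Alice Smith\r\nBob Jones\r\n"
for region, count in region_counts(sample).most_common():
    print(region, count)
```

Under this reading, English text piles most of its points into the lower-case-by-lower-case region, with smaller clusters for capitals, spaces, and carriage returns, which matches the regions pointed out above.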
One of the most remarkable parts about all of this is it gives us an entirely new way to understand new information, stuff that we haven't seen before. So I know what English looks like at a binary level, and I know what its visual abstraction looks like, but I've never seen Russian binary in my entire life. It would take me weeks just to figure out what I was looking at from raw ones and zeros, but because our brains can instantly pick up and recognize these subtle patterns inside of these visual abstractions, we can unconsciously apply those in new situations. So this is what Russian looks like in a visual abstraction. Because I know what one language looks like, I can recognize other languages even when I'm not familiar with them. This is what a photograph looks like, but this is what clip art looks like. This is what the code on your phone looks like, but this is what the code on your computer looks like. Our brains can pick up on these patterns in ways that we never could have from looking at raw ones and zeros. But we've really only scratched the surface of what we can do with this approach. We've only begun to unlock the capabilities of our minds to process visual information. If we take those same concepts and translate them into three dimensions instead, we find entirely new ways of making sense of information. In seconds, we can pick out every pattern here. We can see the cross associated with code. We can see cubes associated with text. We can even pick up the tiniest visual artifacts. Things that would take us weeks, months to find in ones and zeroes, are immediately apparent in some sort of visual abstraction, and as we continue to go through this and throw more and more information at it, what we find is that we're capable of processing billions of ones and zeros in a matter of seconds just by using our brain's built-in ability to analyze patterns.
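The move to three dimensions can be sketched in a similar way, again as an assumption rather than the speaker's stated method: take every run of three consecutive bytes and treat it as an (x, y, z) point. Plain ASCII text only uses a narrow band of byte values, so its points stay inside a small box, which is one possible explanation for the cubes associated with text. The names `byte_triple_counts` and `bounding_box` are hypothetical.

```python
from collections import Counter


def byte_triple_counts(data: bytes) -> Counter:
    """Count each run of three consecutive bytes as an (x, y, z) point."""
    return Counter(zip(data, data[1:], data[2:]))


def bounding_box(points):
    """Return the (low, high) corners of the box enclosing all points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Made-up sample text; every byte is printable ASCII.
text = b"The quick brown fox jumps over the lazy dog."
low, high = bounding_box(byte_triple_counts(text))
print("text occupies the box from", low, "to", high)
```

Data that uses the full range of byte values, such as compressed images, would spread across the whole 256 by 256 by 256 space instead of staying inside a small box.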
So this is really nice and helpful, but all this tells me is what I'm looking at. So at this point, based on visual patterns, I can find the code on the phone. But that's not enough to blow up a battery. The next thing I need to find is the code that controls the battery, but we're back to the needle in a stack of needles problem. That code looks pretty much like all the other code on that system.
So I might not be able to find the code that controls the battery, but there's a lot of things that are very similar to that. You have code that controls your screen, that controls your buttons, that controls your microphones, so even if I can't find the code for the battery, I bet I can find one of those things. So the next step in my binary analysis process is to look at pieces of information that are similar to each other. It's really, really hard to do at a binary level, but if we translate those similarities to a visual abstraction instead, I don't even have to sift through the raw data. All I have to do is wait for the image to light up to see when I'm at similar pieces. I follow these strands of similarity like a trail of bread crumbs to find exactly what I'm looking for.
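The talk does not say how similarity is measured. As a minimal sketch under stated assumptions, one simple stand-in is to cut the data into fixed-size blocks, build a byte histogram for each block, and score each block against a known reference piece with cosine similarity. The block size, the threshold, and the names `cosine_similarity` and `similar_blocks` are all choices made for this example.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two byte histograms (0.0 to 1.0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_blocks(data: bytes, reference: bytes,
                   block_size: int = 64, threshold: float = 0.8):
    """Return (offset, score) for each block that resembles the reference."""
    ref_hist = Counter(reference)
    hits = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        score = cosine_similarity(Counter(block), ref_hist)
        if score >= threshold:
            hits.append((offset, score))
    return hits


# Made-up data: two look-alike regions separated by unrelated bytes.
data = b"A" * 64 + b"\x00\x01" * 32 + b"A" * 64
for offset, score in similar_blocks(data, reference=b"AAAA"):
    print(f"offset {offset}: similarity {score:.2f}")
```

The offsets that come back play the role of the bread crumbs: starting from a piece that is easy to recognize, such as the screen or button code, the analyst can hop between regions that score as similar instead of reading every byte.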
So at this point in the process, I've located the code responsible for controlling your battery, but that's still not enough to blow up a phone. The last piece of the puzzle is understanding how that code controls your battery. For this, I need to identify very subtle, very detailed relationships within that binary information, another very hard thing to do when looking at ones and zeros. But if we translate that information into a physical representation, we can sit back and let our visual cortex do all the hard work. It can find all the detailed patterns, all the important pieces, for us. It can find out exactly how the pieces of that code work together to control that battery. All of this can be done in a matter of hours, whereas the same process would have taken months in the past.
This is all well and good in a theoretical blow up a terrorist's phone situation. I wanted to find out if this would really work in the work I do every day. So I was playing around with these same concepts with some of the data I've looked at in the past, and yet again, I was trying to find a very detailed, specific piece of code inside of a massive piece of binary information. So I looked at it at this level, thinking I was looking at the right thing, only to see this doesn't have the connectivity I would have expected for the code I was looking for. In fact, I'm not really sure what this is, but when I stepped back a level and looked at the similarities within the code I saw, this doesn't have similarities like any code that exists out there. I can't even be looking at code. In fact, from this perspective, I could tell, this isn't code. This is an image of some sort. And from here, I can see, it's not just an image, this is a photograph. Now that I know it's a photograph, I've got dozens of other binary translation techniques to visualize and understand that information, so in a matter of seconds, we can take this information, shove it through a dozen other visual translation techniques in order to find out exactly what we were looking at. I saw — (Laughter) — it was that darn kitten again. All this is enabled because we were able to find a way to translate a very hard problem to something our brains do very naturally.
So what does this mean? Well, for kittens, it means no more hiding in ones and zeros. For me, it means no more wasted weekends. For cyber, it means we have a radical new way to tackle the most impossible problems. It means we have a new weapon in the evolving theater of cyber warfare, but for all of us, it means that cyber engineers now have the ability to become first responders in emergency situations. When seconds count, we've unlocked the means to stop the bad guys.
Chris Domas is a cybersecurity researcher, operating on what's become a new front of war, "cyber." In this engaging talk, he shows how researchers use pattern recognition and reverse engineering (and pull a few all-nighters) to understand a chunk of binary code whose purpose and contents they don't know.
Chris Domas is an embedded systems engineer and cybersecurity researcher.