Rethinking IT and OT - Lessons from Colonial Pipeline and Other Cyber Incidents

Episode 48 March 10, 2025 00:21:22

Hosted By

Aaron Crow

Show Notes

In this episode, host Aaron Crow tackles the ongoing debate over IT versus OT attacks, using the Colonial Pipeline incident and others to illustrate his point. Aaron argues that focusing on whether an attack is an IT or OT issue misses the bigger picture: the real impact on operations.

Through engaging stories and industry insights, Aaron emphasizes that asset owners ultimately care about operational continuity, revenue, safety, and risk management, rather than rigid definitions. 

Join us as we explore why understanding the broader business risks is crucial and how organizations can better protect themselves in this evolving landscape.

Key Moments:

 

04:48 Key Role of OSI PI in Utilities

09:05 New Domain Issue: Same Name, No Access

11:26 IT vs. OT Asset Management Dilemma

15:28 OT Cybersecurity: Beyond Securing PLCs

18:47 Blurring Lines Between IT and OT

19:36 Business Risk and Cyber Protection

Connect With Aaron Crow:

 

Learn more about PrOTect IT All:

 

To be a guest or suggest a guest/episode, please email us at [email protected]

 

Episode Transcript

Aaron Crow (00:12.824) Hey, thank you for joining me. Today I want to talk about IT versus OT, and why the Colonial Pipeline debate misses the point. I recently posted about Colonial Pipeline in the context of OT attacks, and quite a few folks reached out to say that wasn't an OT attack. They're absolutely correct. To be clear, the Colonial Pipeline attack did not hit an OT system. But what is an OT system? Let's talk about why people keep arguing about IT versus OT attacks and why I don't think it matters. I think we're missing the point. Every time Colonial Pipeline gets brought up, cybersecurity professionals jump in to clarify that it was an IT attack, not an OT attack. And while that's technically true, since the ransomware hit the business IT network and not the pipeline control systems, it misses the bigger picture. To me, the OT systems were shut down as a result of the attack. As an asset owner, I don't care whether it hit my PLC, my third-party system, or some IT system. At the end of the day, what I care about is: did my system come down, and why did it come down? Did it come down unplanned? Period. Then that's a problem. If your cyber mindset is stuck in rigid definitions, you're missing the real-world impact. Asset owners don't care whether the root cause is IT or OT. They care about operational downtime. They care about lost revenue. They care about safety risks. This is something we seem to forget. I don't want to argue about whether it was IT or OT, or any of that, right? So why is the IT versus OT argument flawed? It reminds me of working on NERC CIP compliance at a power utility. There's a clear definition in NERC CIP of what categorizes something as low versus medium impact.
The question is: is there any single shared cyber asset that can impact 1,500 megawatts within 15 minutes? Nowhere does it say "OT system." It says: is there any system that can impact 1,500 megawatts within 15 minutes? If so, it's deemed a critical asset.

Aaron Crow (02:30.114) So when we have these conversations about IT and OT, and whether something was an OT system or not, I keep coming back to: what is an OT system? The line is so blurred now. Twenty years ago, OT was very obvious, even though we didn't call it that. These systems were proprietary, they were one-offs, they were custom built, all that kind of stuff, right? But ever since vendors started bringing commercial off-the-shelf, IT-type systems into our OT spaces, those lines have blurred. It's not as obvious what an OT system is versus an IT system. We have Windows, we have network switches, we have Cisco, we have Palo Alto firewalls, we've got VMware, SQL servers, Active Directory, patching solutions, scanning, and uptime monitoring. There are all these products and services that are technically IT systems but serve an OT function. They support an OT system and provide value, sometimes critical value, that can impact these processes. A prime example is OSI PI. If you've worked in OT, especially in power utilities, you've probably heard of PI. The joke is that you generate electricity so that you can generate PI data. But PI in and of itself is not technically OT. It's not directly connected to control. I can't control the unit or the manufacturing process from PI. Not that you can't, but speaking specifically from my experience, most of the time in a power utility implementation it's just doing indication and logging, looking at data, right? It's not actually controlling anything.
None of that data goes directly into the control system to control the unit. Again, it absolutely can; I've seen it done. But most of the time, that's not the standard. Now, that being said, walk into a control room. Again, let's talk about a power utility, since that's where I've spent a ton of my career. If you walk into a control room at a power utility,

Aaron Crow (04:51.022) there are going to be screens up on the wall with OSI PI data coming in and building out those dashboards, and an operator is making decisions based on that data. Now, is there a direct trip? Nope. Is it part of NERC CIP? Usually not. Does it meet the 1,500 megawatts in 15 minutes test? Nope. But if that goes away, what are the operators going to do? It depends on what data is there, but look at opacity in a CEMS environment, or any number of things an operator may need to make a decision about. Maybe they don't have to punch a unit out. Maybe they do have to trip it. Either way, they can be impacted because they don't have telemetry they may need. So if PI goes down, there's probably no real-time process visibility in some of these areas. Obviously, you've got it on the control systems, and you may have it on other systems, but you may lose the ability to make certain decisions. So they may decide to shut something down or to adjust. But was that OT? That's the question. Probably not, by the classical definition of OT: something that specifically and directly does control, like a hardwired connection, a valve, a probe, or something driving a physical reaction. By that definition, PI has nothing to do with OT. But it only talks to OT equipment. It only gets data from OT. It's only part of the plant management network. So is it OT?
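The NERC CIP rule Aaron paraphrases classifies an asset by what it can affect, not by whether anyone calls it IT or OT. The Python sketch below is a simplified illustration of that idea, not an implementation of CIP-002: the 1,500 MW and 15-minute figures come from the episode, treating both limits as inclusive is an assumption, and the `Asset` fields, the `meets_impact_test` helper, and the example values are all hypothetical.

```python
from dataclasses import dataclass

# Figures from the episode's paraphrase of NERC CIP; treating both limits
# as inclusive is an assumption made for this sketch.
MW_THRESHOLD = 1500    # megawatts
WINDOW_MINUTES = 15    # minutes


@dataclass
class Asset:
    name: str
    label: str                # "IT" or "OT", as the organization tags it
    mw_at_risk: float         # hypothetical estimate of generation the asset can affect
    minutes_to_impact: float  # hypothetical estimate of how fast that impact can occur


def meets_impact_test(asset: Asset) -> bool:
    """Return True if the asset could affect MW_THRESHOLD or more within WINDOW_MINUTES.

    The IT/OT label is deliberately never consulted: the test is about impact.
    """
    return asset.mw_at_risk >= MW_THRESHOLD and asset.minutes_to_impact <= WINDOW_MINUTES


# Invented example values, not data from the episode.
assets = [
    Asset("DCS controller", "OT", mw_at_risk=1600, minutes_to_impact=1),
    Asset("shared domain controller", "IT", mw_at_risk=1600, minutes_to_impact=10),
    Asset("historian server", "IT", mw_at_risk=0, minutes_to_impact=60),
]

for a in assets:
    print(f"{a.name:26} label={a.label}  meets impact test: {meets_impact_test(a)}")
```

In this made-up data the domain controller passes the test despite its IT label, while the historian fails it even though, as the episode points out, losing historian data can still lead operators to shut something down. A single threshold captures one kind of impact, not all of it.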

Aaron Crow (06:21.656) Here's another prime example from my career, working at a power plant: a non-OT domain controller. If the definition of OT is something that directly controls something, then an Active Directory domain controller would not be an OT device. Yet I know firsthand, and I've got multiple sources who were there with me when it happened, of an Active Directory domain controller taking a power plant down. How did that happen? It wasn't a trip, it wasn't automatic. It was very similar to Colonial Pipeline, except it wasn't a bad state actor; it wasn't even cybersecurity related. So this analogy isn't about the cyber side; it's about the justification and classification of OT systems. I was at a power plant that had multiple units: a unit one, a unit two, and a common system. The DCS vendor, which I won't name (call them vendor A), was on site doing an upgrade during an outage. Let's say unit two was in outage and unit one was still running. On unit two they were upgrading the control system: the HMIs, the engineering workstations, the operator workstations, the historian, and also the domain. This environment ran on a flat network. They had different subnets, but it was all interconnected, with no firewalls, nothing, so it was one big network they could route across. And they had an Active Directory forest that provided access control to the control system, so you logged into the HMIs, the engineering workstations, and so on using Active Directory. All three control systems used that same forest. It was one domain, named for the site, not for unit one. Well, they had multiple domain controllers.
They had a domain controller in unit one, a domain controller in unit two, and a domain controller in the common system. Okay. So the control system vendor was there, and part of this control system upgrade process was to upgrade that domain. They went through their script and did their thing, but the domain controller build failed. They had a CD they followed that was supposed to script it all, and it failed.

Aaron Crow (08:45.388) So what they'd been told in the past was: okay, if that happens, it's not a big deal. There are multiple domain controllers, so just build a new domain controller. Well, the person building it did not have good experience with, or understanding of, Active Directory. So when they rebuilt the domain controller, they didn't just rebuild the domain controller; they rebuilt the domain. When they spun it up as a domain controller, they created a new Active Directory forest but gave it the same name as the old one. The domain name was the same. The domain controller's IP address and hostname were also the same as the old one's. Same name, same IP address, same domain name, all that kind of stuff, right? The problem is that this is a new forest. It doesn't know about any of the old devices. So even though those devices are reaching out to domain controller one at, you know, 192.168.1.1 or whatever the IP address was, it's not authenticating any of those requests, because even though it has the same name, it has no idea about any of these other machines. And when you build a new domain controller in a new forest, there are no other domain controllers. So what happens, if you know anything about Active Directory? It takes over all the FSMO roles, like the PDC emulator and all of those types of things.
So it starts advertising itself as the FSMO role holder for this domain, even though none of the existing devices have ever been joined to it. Unit one and the common system start trying to authenticate to it, because it claims to be the FSMO role holder and it has the right name and IP address, but all of their requests are getting denied. So, in the middle of this upgrade, unit two is offline. It's in an outage window of a couple of weeks. I don't remember the exact details; this was many years ago. But on the other unit, the operators in the control room saw all of the numbers on their screens start smurfing. What I mean by smurfing is that all of the numbers just stopped responding, so they had big goose eggs on the control screens. Now, to be clear, the unit was running completely fine. There was nothing wrong with it. It was doing what it was supposed to do. But the control room and the operators could not see it. So what did they do? They punched the unit out. They tripped the unit intentionally and brought it down, because they couldn't control it. They didn't do it in a bad way; they shut the unit down the way they'd been trained to. And they did not want to bring the unit back up until they understood what the problem was and what caused it. So my question then is this.

Aaron Crow (11:10.41) Again, with the logic of this whole argument about whether it was OT or IT: was that an OT incident? Technically, a domain controller is not an OT asset. It's an IT asset, but it's managed by OT. If it's a device managed by the OT group, is it OT? What if it's a firewall that sits in the OT space and provides access to OT, but is managed by the IT group? Is it an IT asset or an OT asset? You see the dilemma in arguing over these minute details of whether it's an OT attack or an IT attack. And I agree.
We need to understand the difference between those things and where the risks are. But ultimately, the risk to these environments and these OT systems is: are they going to come down? Do you think an attacker cares whether he attacks an OT system or an IT system? If the end goal is to bring something down, it's no different from attacking a substation: I can walk in and impact the IEDs or other devices in the substation, or I can shoot it with a deer rifle from a mile away, which we've seen happen in North Carolina. That wasn't an OT attack, but it impacted our OT systems. It's a risk. OT cybersecurity is just a risk, another risk that needs to be considered in the overall business risk of your system. And people argue that if they didn't attack a PLC or the DCS, then it wasn't OT. Back to the domain controller: did the IT system take down the operational plant? Yes. Did they know what happened? No. The control vendor was even on site, and the unit still came down. They argued that nothing they did caused the problem. My team had to come in, look at all the logs, and show them what happened. They were getting something like a million failed authentication requests an hour. It was insane. Things basically just timed out: once the token from the previous domain controller expired, all of the data just smurfed. So it goes back to this: it was not an OT attack. In this case it wasn't even an attack; it wasn't malicious, and it wasn't a cyberattack. But was this an OT incident or an IT incident? Nobody from IT was involved. Nobody from IT supports this equipment.

Aaron Crow (13:38.646) Nobody from IT even knew what was going on. Nobody from IT was called in to support it. So was it OT or was it IT? What really matters is understanding impact. People get too caught up in technical definitions and miss the actual risk. If an IT system can impact OT, does it really matter whether it was a true OT attack?
If you're an asset owner, you care about impact. Was your system taken offline? Did you lose visibility? Did your business lose money? Did you have a safety incident? Did you damage equipment? These are the things you care about. You don't care whether it was an IT system or an OT system. Well, you do, but only to classify it and understand how to protect against it. It doesn't change the impact: your system was down. The domain controller story is a perfect example. It wasn't a cyberattack, and it wasn't ransomware. The vendor itself caused the problem. It wasn't malicious. But it happened. And again, the plant manager didn't care that it wasn't malicious. He was concerned about the unit going down. And obviously everybody there had their bonuses tied to availability. In the middle of an outage, they had a unit come off when it wasn't supposed to. With the other unit already down for the outage, they were making no money. That was an issue. Here's the real problem: many organizations don't know their own risk. With Colonial Pipeline, the biggest takeaway isn't whether it was an IT or OT attack. It's that the company didn't have a clear understanding of its own cyber risk. They didn't know that the system that was impacted could force them to take down their pipeline. I guarantee you that system now has different controls.

Aaron Crow (15:35.222) I don't know whether they classify it as IT or OT, but I guarantee you it's classified, and there's a risk understanding that it can impact their pipeline: if they can't bill, they'll have to shut down their system. So my gut says they fixed the problem, and they would not argue about whether this was IT versus OT. They understand that it had a direct impact on their OT systems and their product output. This tells you everything you need to know.
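Knowing your own risk starts with knowing what production depends on, directly or secondhand, regardless of who labels each system. The Python sketch below walks a dependency map breadth-first to list everything upstream of a process. The map, asset names, IT/OT tags, and the `upstream_dependencies` helper are hypothetical, loosely echoing the billing and domain controller examples in this episode; they are not Colonial Pipeline's actual architecture.

```python
from collections import deque

# Hypothetical dependency map: each asset lists what it needs in order to run.
DEPENDS_ON = {
    "pipeline operations": ["control system", "billing system"],
    "control system": ["domain controller", "core switch"],
    "billing system": ["ERP database", "core switch"],
    "domain controller": ["core switch"],
    "ERP database": [],
    "core switch": [],
}

# How the (hypothetical) organization tags each asset; the traversal ignores this.
LABELS = {
    "pipeline operations": "OT",
    "control system": "OT",
    "billing system": "IT",
    "domain controller": "IT",
    "ERP database": "IT",
    "core switch": "IT",
}


def upstream_dependencies(target: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every asset that `target` depends on, directly or transitively.

    Breadth-first search; the `seen` set also keeps it safe if the map has cycles.
    """
    seen: set[str] = set()
    queue = deque(graph.get(target, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue
        seen.add(asset)
        queue.extend(graph.get(asset, []))
    return seen


for asset in sorted(upstream_dependencies("pipeline operations", DEPENDS_ON)):
    print(f"{asset:18} tagged {LABELS[asset]}")
```

In this invented map, four of the five dependencies of pipeline operations are tagged IT, which is the kind of secondhand exposure the episode argues gets missed when classification stops at the label.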
This is exactly why OT cybersecurity is more than just securing PLCs. So again, where do you draw the line? People argue it's only OT if a PLC or a DCS is directly compromised, but in modern environments, OT relies on so much IT infrastructure: firewalls, Active Directory, like I've already said, right? Monitoring tools that aren't OT but provide services in this space. So in my opinion, the real takeaway is not whether Colonial was an IT or OT attack; it's that the impact was on OT. That should be the focus. So my final thought is: can we stop arguing and start understanding the risk? Cybersecurity leaders, stop wasting time debating definitions. Cyber vendors, stop wasting time on definitions. This is not about fear. This is not FUD. These things actually impact your environment, or can impact your environment. And just because you haven't had an incident like this doesn't mean you won't. Sometimes I hear, "We've never been impacted by anything cyber. We've been running this plant for 40 years and never had this problem." Many times when I walk in there, they may have had an incident and not known it. I'm not saying they've been hacked,

Aaron Crow (17:32.046) but they could have had an incident like the one I described with Active Directory and not really understood what happened or what caused it. Their vendor told them it wasn't the vendor, it wasn't this, it wasn't that. And they didn't have the expertise, or a group like mine, to come in as a third-party advisor to help them. Because when a cyber incident shuts down production, your CEO doesn't care whether it's IT or OT. They only care what happened, how they're going to get it back, and how they're going to make sure it doesn't happen again.
They care that your business just lost millions of dollars, that you damaged equipment, that somebody got hurt, that your reputation was damaged, right? IT, OT, it's all business risk. Now, how I mitigate that risk is obviously different in IT and OT. I'm not trying to say it's the same. You absolutely should look at things differently through an IT lens and an OT lens, because they're absolutely different. But that line is not black and white. There's no clear line of delineation anywhere. Maybe I've just gone to all the bad places in the world, but every place I've ever walked into, there's no clear definition of where OT starts and where it stops, or of which systems can impact their OT production system, even secondhand. And that's usually where the biggest problem is. They don't realize they've got single points of failure, and they're using IT switches in these environments for OT functions. The switch is supported by IT but actually serves an OT function, and they don't classify it that way because it's a Cisco switch, so obviously that's IT. Same thing with a Windows machine, a firewall, a router, or any number of technologies that live in these OT spaces but are not technically OT. So what do you think? Should we stop debating IT and OT and focus on business risk instead? Or do we need to really define what IT and OT are, so that when we have these conversations we all speak the same language? My point is, I don't think there is a clear line of definition. Maybe there's a standard out there that says this is what OT is and this is what IT is, but even with that, everybody's going to have their own definition. Every environment is different. Everybody's got different equipment and capabilities in their environments. The bigger need for these entities

Aaron Crow (19:50.624) is to understand their business risk and be able to protect against it, to stop bad things from happening, whether it's a bad actor, a cyberattack, or their own vendor making a mistake that causes an issue. Ultimately, that's the issue we're talking about and trying to solve, without arguing, "Well, that wasn't an OT attack. That was IT, so this doesn't matter." It does matter. Colonial Pipeline would tell you it matters. They spent a lot of money fixing this after the fact, and I guarantee you they wish they'd understood it before it happened, so that we wouldn't be sitting here talking about Colonial Pipeline years later, and we'll keep talking about it for years. Same thing with Target. Obviously that wasn't an OT attack or anything to do with OT, but people still talk about the Target attack because these are pivotal moments in cybersecurity and in the news that people remember. They're the game changers. So what are you going to learn from this? Are you going to argue about IT and OT, or are you going to go out and do something?

Aaron Crow (20:56.046) Thanks a lot.
