TruSTAR CEO and Co-Founder, Patrick Coughlin, recently sat down with Dave McComb, President of Semantic Arts, to talk through what it means to be Data-Centric in a Data-Driven world. Read the conversation below and watch the webinar here.
Patrick Coughlin: We're going to be talking about a topic that is top of mind for many security leaders, which is how to architect security programs, architectures, metrics in a Data-Centric world. Before we jump in, Dave, give us a little bit about your background and how you came to author the book, The Data Centric Revolution.
Dave McComb: I started my career with Andersen Consulting, which is the company that eventually became Accenture, and had a great career building custom enterprise applications. I started observing what I was doing and what I was seeing happening in the enterprise, and realized that it was completely arbitrary and reactionary. People just kept building and adding on, which eventually became this set of silos that we now call a data scape. I started thinking, is there a principled way to think about the problem such that every design wouldn't be arbitrarily different?
I found some very early work on semantics and built a dot com around what I had learned, but I had to go public in the winter of '99, which dragged into the spring of 2000 in the IPO window, so we were dead before arrival. We had an architecture, technology, and patents - we had patented probably the first fully model driven development environment - but none of that was available to us because it was in the smoldering wreckage of our dot com. At that point we started Semantic Arts in order to consult with companies about what we learned and how to think about, simplify, and rationalize data scapes.
Patrick Coughlin: Your book, Software Wasteland, was a real wake-up call for us in the security space in terms of always looking for that “one ring to rule them all”, the one tool that's going to solve all of our data challenges. Of course, we’re still looking for it and it's mind blowing to me how many single panes of glass one person could have. Following Software Wasteland, you wrote The Data Centric Revolution. In this book, you talk about restoring sanity. What was really the insane issue that you were seeing?
Dave McComb: They say the definition of insanity is doing the same thing over and over again and expecting a different result. What people do in enterprises over and over again is to continue building data model after data model, and as soon as you get that project implemented you get another silo. That's my idea of insanity.
The other thing that people do over and over again is what’s called legacy modernization.
They have a legacy system and put up with it until some vendor in that stack will no longer support the database. So, they have to get out of the legacy system but inevitably they go out and get another hideously expensive implementation, and then you've got a neo-legacy system.
Patrick Coughlin: Tell me, what's the difference between Data-Centric and Data-Driven?
Dave McComb: Data-Driven is the relationship of data to decision making. Data-Driven is juxtaposed with heuristics or personal judgment. In business, historically, decisions were mostly based on rules of thumb, personal judgment, etc. Replacing that with data gives you the opportunity to make more informed decisions over time, to evaluate whether your decisions were any good, and to see which factors really fed into them. However, being Data-Driven doesn't mean you're Data-Centric. Data-Centric says that behind all that chaos is a single, simple, extensible model. There's a model of your enterprise that has fewer than 1,000 concepts in total, that connects everything together, and that you can extend into your existing systems. All the data that you're currently gathering and managing can resolve back to a pretty simple model.
Patrick Coughlin: So Data-Centric is the idea of embracing this unified, simple data model, an expression of the enterprise's ecosystem of intelligence or data. What is the difference between data and intelligence, semantically, from your perspective?
Dave McComb: Raw data is just a number, like the number 42, and information is when you add enough context to that to at least know what it means. For example, 42 might be a particular person's age. You add more context and come to the conclusion that they’re old enough to buy alcohol. It’s a matter of laddering up until you get to where there's enough context that it's actually actionable.
Number 42 by itself isn't actionable, it doesn't mean anything.
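As a rough illustration of that ladder from raw data to actionable information, here is a minimal Python sketch. It is not from the conversation: the `Observation` type, the `can_buy_alcohol` helper, the `customer-17` identifier, and the threshold of 21 (the US legal age, assumed because no jurisdiction is named) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: US legal drinking age; the interview does not name a jurisdiction.
LEGAL_DRINKING_AGE = 21


@dataclass
class Observation:
    """A value plus whatever context we currently have about it."""
    value: int
    meaning: Optional[str] = None   # what the number measures, e.g. "age_in_years"
    subject: Optional[str] = None   # who or what the number is about


def can_buy_alcohol(obs: Observation) -> Optional[bool]:
    """Return a decision only when there is enough context to act on."""
    if obs.meaning != "age_in_years":
        return None  # a bare number is not actionable
    return obs.value >= LEGAL_DRINKING_AGE


raw = Observation(42)                                    # raw data: just 42
info = Observation(42, meaning="age_in_years",
                   subject="customer-17")                # data in context

print(can_buy_alcohol(raw))    # no decision possible
print(can_buy_alcohol(info))   # a decision can be made
```

The same value yields no decision at all until the context says what it measures, which is the sense in which 42 by itself "doesn't mean anything."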
Patrick Coughlin: One of the things that we talk about at TruSTAR is how the word intelligence is such a loaded word because it conjures up images of cloaks and daggers and back alleys. Really what we're talking about here, from a Data-Centric perspective, is data in context. That is what intelligence really means for the purposes of integration and automation in the enterprise. So, we have to move away, as an industry, from defaulting to definition number one and embrace definition number two, or else we won't be able to use words like intelligence constructively in these data centric conversations.
What's the role that you see between data and human capital gaps and automation? How do these things work together?
Dave McComb: There has been a skills gap for data scientists as well as for machine learning experts. Part of that gap is that both of these groups spend somewhere between 50 and 80 percent of their time data wrangling, which creates the skills gap. If 75% of their time is not spent doing their job, then you could've done it with one fourth as many people. The way they've dealt with that on the data science and machine learning side is to create another category of employee, called a Data Engineer, whose job it is to prepare all the data so that these scarce or more highly paid people can be more productive. That ends up treating the symptom rather than the cause, which is complexity.
Patrick Coughlin: Tell us about that complexity and who is doing this well. Are there certain industries that have embraced the idea of being Data-Centric and who are adopting it?
Dave McComb: There are some that are on the path, but very few who have arrived. There's a division of Standard & Poor's that arrived there. Somebody once said, "There's two ways to get to the top of an oak tree. You can climb it or you can sit on an acorn." and these guys sat on an acorn. When they started, the company was fairly small and they had this Data-Centric core that they grew, which is now quite a large division of Standard & Poor's called Market Intelligence. They're about the largest company I know of that is close to completely Data-Centric.
Another organization is the Montefiore Hospital system in New York. They have all of their medical knowledge in a single, unified knowledge graph which is completely integrated. More importantly, they have every piece of data on every one of their patients (so hundreds of billions of pieces of data) in a federated knowledge graph. It's really quite impressive.
Patrick Coughlin: What is the magic sauce here, and what are the traits that you see for a Data-Centric leader who's pushing that program forward effectively?
Dave McComb: The main traits are a semi-isolated, relatively small team that is not subject to an immediate payback every quarter.
Somehow these guys have gotten set aside. They still have to deliver something every three to six months, but it's a small team over a long period of time. The Montefiore one was in the works for five or six years, at least. The Standard & Poor's one, almost 20 years. It's not one project that takes several years. It's every few months, there's more progress and value and deliverables, but it's a journey. It's going to take a long time to unwind all those silos and legacy systems.
Patrick Coughlin: It seems like this would lend itself well to enterprise security organizations, but enterprise security is late to this game. Why is that?
Dave McComb: For the same reasons that almost everyone else is late to the party. There's not a lot of people in this party yet, and the people that are in it are about dealing with the complexity. They don't necessarily want to fix it.
Patrick Coughlin: You bring up this idea in your book of the high priests of complexity. What does that mean?
Dave McComb: Let’s say you start to build a spreadsheet and you keep adding stuff on and it gets more and more complex. You will almost always go just to the edge of where you can understand it, where it's just about broken, but the edge that you go to is way beyond the edge of the next person you give it to. So you've worked on it for months, and you give it to somebody else who's going to try to understand it in days or hours, which makes them reliant on you to repair it. Now ramp that up, not just at the spreadsheet level, but at the systems level.
The way we design, deal, think about, and integrate systems is that every smart person is working at the edge of their complexity frontier. They're dealing with as much complexity as they possibly can, with 10 hours a day, six days a week, and then they hand it off to somebody else who can't follow it. It happens everywhere.
Part of the problem is, with somebody who's wrestled with all that complexity for years, it's hard to convince them that it could actually be simpler.
Patrick Coughlin: What trends are you seeing that you're excited about? Are you reading any good books lately?
Dave McComb: Obviously one is blockchain. I think there are some applications where an immutable ledger totally makes sense. There are some firms now who are combining knowledge graphs with blockchain which I think is a neat combo.
There are so many parallels between collective intelligence and extended minds, and what we wrestle with in security and risk every day. To that end, I'm reading two books at the moment. One is called A Thousand Brains, by Jeff Hawkins, whose original claim to fame was inventing the PalmPilot. He's secretly always wanted to be a neuroscientist, and has dedicated the last 20 years to trying to figure out how cognition really works. He has a company called Numenta which has been studying how neurons are wired in cortical columns and how they're organized.
The other book is called Consciousness and the Brain, which is about specific work done with stimulus and threshold to try to figure out what consciousness actually is.
It turns out that most of your brain is an incredible parallel processor attending to thousands of things simultaneously. Only when something rises to the level of conscious attention does a different part of the brain take over, and that part will only attend to one thing at a time. Thinking about this in terms of the Internet of Things and smart cities, you really want an architecture that does most of its processing in parallel at the edges and only occasionally passes a condensed message to the center, but in such a way that if the central intelligence needs to drill down, it can go right back to the source without any translation.
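One way to picture that edge-heavy architecture is the minimal Python sketch below. It is an illustration under stated assumptions, not a design from the conversation: `EdgeNode`, `CentralHub`, `Summary`, and the intersection names are hypothetical, and the edge nodes run sequentially here rather than in parallel to keep the example short.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Summary:
    """The condensed message an edge node passes to the center."""
    node_id: str
    count: int
    average: float
    peak: float


class EdgeNode:
    """Holds and processes its own raw readings locally."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.readings: List[float] = []  # raw data stays at the edge

    def observe(self, value: float) -> None:
        self.readings.append(value)

    def summarize(self) -> Summary:
        # Assumes at least one reading; mean() raises on an empty list.
        return Summary(self.node_id, len(self.readings),
                       mean(self.readings), max(self.readings))


class CentralHub:
    """Sees only summaries unless it chooses to drill down."""

    def __init__(self, nodes: List[EdgeNode]) -> None:
        self.nodes: Dict[str, EdgeNode] = {n.node_id: n for n in nodes}

    def collect(self) -> List[Summary]:
        return [node.summarize() for node in self.nodes.values()]

    def drill_down(self, summary: Summary) -> List[float]:
        # The summary carries the source's own identifier, so the hub reads
        # the original records directly, with no translation step.
        return self.nodes[summary.node_id].readings


nodes = [EdgeNode("intersection-a"), EdgeNode("intersection-b")]
for value in (12.0, 15.0, 48.0):
    nodes[0].observe(value)
for value in (9.0, 11.0):
    nodes[1].observe(value)

hub = CentralHub(nodes)
summaries = hub.collect()
busiest = max(summaries, key=lambda s: s.peak)
raw_readings = hub.drill_down(busiest)
print(busiest.node_id, raw_readings)
```

The design choice worth noting is that the summary and the raw data share one identifier and one representation, so drilling down is a lookup rather than a mapping between two models.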