What is DataOps? Why is it important? And how does it differ from DevOps?

Our technical leadership team Rob Ulmschneider, Head of Advanced Analytics, and Walter McAdams, Chief Engineer, recently sat down to dive into this important intersection and its impact on business. Below is an excerpt from their conversation.

Ulmschneider: DataOps is part of the sliver in the Venn diagram where our worlds of data, quality, and engineering overlap. There are so many definitions floating out there, but to begin, how do you define DataOps?

McAdams: DataOps is the continuation of the concept that started with DevOps, which is the idea of continuous everything. Data is such a powerful force that it’s critical to incorporate it throughout the application and operational lifecycles, and the way to do that is to ensure that your means of collection are continuous, embedded, and a byproduct of your development and operational processes. It’s all about ensuring that data propels and informs all aspects of your operations.
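
To make "continuous, embedded, and a byproduct" concrete, here is a minimal sketch in Python of an operational function that emits a structured data point to its normal log as it does its work, so a downstream pipeline can pick it up without a separate collection process. The event name and fields (order_processed, order_id, amount, duration_ms) are hypothetical, chosen only for illustration.

```python
import json
import logging
import time
from datetime import datetime, timezone

# Structured events go to the normal application log; collection is a
# byproduct of the work itself, not a separate process.
logger = logging.getLogger("orders")
logging.basicConfig(level=logging.INFO)

def emit_event(name: str, **fields) -> None:
    """Write a structured data point as a byproduct of normal operation."""
    event = {"event": name, "ts": datetime.now(timezone.utc).isoformat(), **fields}
    logger.info(json.dumps(event))

def process_order(order_id: str, amount: float) -> None:
    """A hypothetical operational task, instrumented for continuous collection."""
    start = time.monotonic()
    # ... the actual business logic would live here ...
    emit_event(
        "order_processed",
        order_id=order_id,
        amount=amount,
        duration_ms=round((time.monotonic() - start) * 1000, 2),
    )

process_order("A-1001", 42.50)
```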

Ulmschneider: Why do you think it took a little while for DevOps principles to be applied to data management?

McAdams: Until we came into the age of big data and user-friendly tools for presenting and accessing data, data wasn't recognized as an asset; it was treated more like the electricity flowing through the wires. Its value wasn't immediately obvious because it lived solely in the departments that used it most, such as marketing or business research. But now we're truly seeing that data is underutilized and is a force every bit as powerful as DevOps.

Ulmschneider: I’d add that we were able to get away without any sort of “Ops” approaches in the data world because the nature of the work allowed us to be siloed. There was a lot of freelancing, a lot of people developing reports on their own, and less collaboration. In modern data management, this is no longer sustainable. The volume of data streams is greater, the velocity is faster, and the streams are often continuous. Real-time is REALLY real-time in many cases, not just last night’s data. Our pipelines need to support that, and to do so they need the orchestration and efficiency that DataOps provides.

McAdams: That's very true. Just as there was a lack of understanding between the dev side and the ops side of what each could do, the same gap existed between the data side and the ops side. Ops people didn't realize just how much operational value there was in including data and building real-time data collection. And data people didn't realize the degree to which data could shape operational decisions. Just as leaders started realizing that all companies are IT companies, now they're also realizing that they need to be data-driven just to maintain competitive parity. Otherwise, you're in danger of losing it over the next few years.

Ulmschneider: Do you think a company can be data-driven without being DataOps driven?

McAdams: You can try, but you’re going to be putting a lot more effort into data collection and analysis than you really should be. You’re going to be doing it as a separate process, which will be expensive and disjointed. It’s not useless, but it makes the data much less timely and increases the cost of collecting it. It also makes it harder for people to see the connection between what the data is telling them and what they’re doing. So I think you have to quickly build in a DataOps mentality.

Ulmschneider: Aside from the fact that one deals with data and the other with software development, there are differences between DataOps and DevOps in terms of how teams should approach them.

McAdams: Agreed. DevOps is inward-facing, very much focused on quality of product and quality of process. DataOps should be equally inward and outward-facing. A lot of the concepts of product and process quality are inspired by the manufacturing world. In that sense, DataOps is even more concerned with what is happening “outside the factory” than DevOps is.

Ulmschneider: DevOps, to me, has always been about bringing together, quite literally, Dev and Ops. The intersection of the business and tech side. When companies are starting with data, it’s tech-driven. There’s maybe someone on the IT help desk who takes the occasional report ticket. That work reaches critical mass and you hire a data analyst. At some point, you need to get the business side into the middle of that process. But data is technical and tech is always going to play a role.

McAdams: I would agree. Data needs to be both connected to, and independent of, both business and technology. You're seeing the rise of Chief Data Officers because data is so central to everything a company is doing, and is about to do, that it needs an equal place at the table alongside the enterprise's other assets.

Related Reading: Advanced Analytics: The Biz-Tech Bridge

Ulmschneider: Where do you think the sweet spot is for a company in terms of where data lives and where to start? A DataOps mindset is an important factor to start with, but how can they bring some structure around it?

McAdams: It becomes possible when you've got sufficient automation and integration across your pipeline, and when you've established data governance. Data equals trust, and data governance is the way by which we come to trust our data.
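
As one illustration of governance expressed as automation, here is a minimal sketch of a validation step a pipeline could run on every batch before publishing it. The table, column names, and rules (order_id not null, no duplicates, no negative amounts) are assumptions made for the example, not a prescribed standard; the point is only that the check runs continuously inside the pipeline rather than as a separate, manual review.

```python
import pandas as pd

def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of rule violations; an empty list means the batch passes."""
    failures = []
    if df["order_id"].isnull().any():
        failures.append("order_id contains nulls")
    if df["order_id"].duplicated().any():
        failures.append("order_id contains duplicates")
    if (df["amount"] < 0).any():
        failures.append("amount contains negative values")
    return failures

# A tiny illustrative batch; in practice this would come from the pipeline.
batch = pd.DataFrame({"order_id": ["A-1001", "A-1002"], "amount": [42.50, 19.99]})
problems = validate_orders(batch)
if problems:
    # A real pipeline might fail the run or route the batch to quarantine here.
    raise ValueError("; ".join(problems))
```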

Ulmschneider: I’d like to go back to DevOps for a moment to use it as a case study for DataOps. When you look at challenges to implementation of DevOps in an organization, from establishing the cultural foundation to bringing in dedicated personnel, where does the barrier tend to lie? Is the business leader or the technical leader typically more of an obstacle towards advancing DevOps?

McAdams: I would say it's the technical side. It's human nature not to want to change when you finally get something working well. Technical leaders are more resistant once they've got things established: they've built relationships, structures, processes, and routines around a certain way of doing things, and DevOps traditionally exposes the inefficiencies and unreliability in that way of working.

Ulmschneider: And the business side doesn't mind that change much if it will get them their data faster. Any change requires investment, and business leadership cares about that, but I agree that they might be faster to see the value.

McAdams: Certainly. And you need that executive sponsorship, so that’s a good thing. But the technical side needs to understand the value as well because it’s going to make their jobs easier.

Ulmschneider: This reminds me of something I’ve cynically called the “in-flight magazine effect.” Imagine a CEO or other leader coming back from a trip, having read all about cloud in the in-flight magazine. And they come back to the office and go right to their CIO and ask “have you heard of this cloud!? Why aren’t we in the cloud!” DataOps strikes me as something that runs the risk of being one of those things.

McAdams: That’s why having executive sponsors to drive momentum is so important. Beyond that, you also have to get and measure wins along the way so that you can drive immediate impact, get everyone on board, and demonstrate the ROI. Start getting those elements in place and you build momentum, and it eventually comes to the point where the momentum is so strong that people don’t want to be left behind.

Ulmschneider: The status quo bias is such an important piece to call out. You’ve got this data professional or even a small team, and they have been siloed. And maybe they look at the engineering team with their DevOps approaches and they might think, “I’m so glad we don’t have to do all that, I can just develop a report and be done with it.” Then we get into DataOps and suddenly, “I have to check my SQL and my Python into a repo? We’re going to do code review? Someone else is going to touch my stuff?” It’s uncomfortable when you’re doing it for the first time. Some of us in the data world have been operating in a DevOps-like way for years, but others are new to it, and it can be a jarring change.
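
For readers new to that workflow, here is a minimal sketch of what it can look like in practice: a small, pure transformation function that lives in the repo, paired with a pytest-style test that CI would run on every pull request. The function, file names, and data shapes are hypothetical examples, not anyone's production code.

```python
# transform.py -- a small, reviewable transformation checked into the repo
def revenue_per_customer(rows: list[dict]) -> dict[str, float]:
    """Sum order amounts by customer; a pure function, so it is easy to test."""
    totals: dict[str, float] = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


# test_transform.py -- run with `pytest` on every pull request
def test_revenue_per_customer():
    rows = [
        {"customer": "acme", "amount": 10.0},
        {"customer": "acme", "amount": 5.0},
        {"customer": "globex", "amount": 7.5},
    ]
    assert revenue_per_customer(rows) == {"acme": 15.0, "globex": 7.5}
```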

McAdams: I think that's absolutely the case. We come from different perspectives because we come from different backgrounds. You've lived in the data world, whereas I've been more on the operational side, and I tend to underestimate the degree to which data people want to be their own world. They want to do things their way, and they don't necessarily want to get into the messiness of day-to-day operations. They want to work where things are more controllable, within their own hands, and that I can understand. The trouble is that it comes at a cost: the data you're going to be working with comes out of that operational mess. So the ability to pull it out, bring it in, and continually turn it into something trustable and actionable benefits from getting your hands dirty.

Ulmschneider: You can look at data work like a moving company. You could start a moving company with one person and get some business from customers who just have a bunch of boxes they need put in a truck and moved. But the first time someone has a big armoire, you can't handle that with your one-person shop. In the analytics world, the armoire is machine learning and advanced analytics, which require either a constant stream of data or some intense manipulation and integration, and that takes people with different skill sets and backgrounds. Sure, if you're just doing some simple visualizations off a standard data model, you can work on your own, but that's not the ceiling of what even a basic business intelligence team is focused on anymore. They are exceeding that, and it requires more of a team focus.

McAdams: Yeah, and that's not going to be reduced if you look at what I call the changing definition of network, or the changing definition of infrastructure. It complicates the ability to understand and pull data from things, but it also greatly increases the visibility, the understanding, and the wisdom we can get before we make decisions. We're coming into the great age of data. The fact that we now have useful machine learning tools and AI algorithms that can make data collectible and understandable is a revolution. It's going to be a huge competitive advantage for those companies that embrace the idea of DataOps.

Ulmschneider: So what is your biggest piece of advice for a company that is looking to implement DataOps?

McAdams: Start from a common understanding and a common level of literacy. Once that's in place, pick your targets and build sources and streams. They don't all have to come in at once, and there will be some blind spots, but the best way to get better is to practice and to start doing it with a DataOps mindset: collecting and analyzing continually and making that part of your feedback loop.

 

Looking to jump-start your DataOps initiatives? Ready to maximize the power of the intersection of data, quality, and engineering? Our technical leadership team is always available to answer any quick questions you have. Feel free to reach out to them directly here: Rob Ulmschneider and Walter McAdams.