IoT Standards & Data Mesh: Utility Facility App Architecture
Vice President, Applications and Technology Architecture at Xylem
Grant Muller, Vice President of Applications and Technology Architecture at Xylem, shares valuable insights on bridging end-to-end solutions while focusing on energy efficiency and data collection in utility facilities. He emphasizes the need to address cybersecurity concerns and highlights the significance of dividing the workload through a robust architecture. Grant also shares his thoughts on Data Mesh and its usage in managing and scaling data effectively.
Tim Veil: Well, welcome to another episode of Big Ideas in App Architecture. Today, I’m very excited to have with us Grant Muller, who is Vice President of Applications and Technology Architecture at a company called Xylem. As we’ve done on multiple episodes, Grant, what we’ll start with is just a little bit about you, your background, how you got into the industry, and then we’ll learn a little bit about Xylem. As I said to you earlier, it’s such a fascinating company and industry, and I think one that not many people may have heard of or will know a lot about. So, I’m excited to learn about what they’re doing and what you’re doing for them. But to get us kicked off, let’s learn a little bit about you. Maybe for those who’ll be watching on video, you can explain all that’s happening in your background. I know that won’t be fascinating for people who are only listening, but I think I’ve said to you before, you win the award for the most interesting background on the podcast thus far. So, welcome to the show, Grant.
Grant Muller: All right. Thanks for having me. Yeah, so I think like a lot of the people you’ve had on the show, most of my background is not related to software architecture and strategy. I fell into this kind of work. When I was 10 years old, I actually started developing basic software, the thing you do when you’re a kid, writing little knockoffs of Scorched Earth or something like that. Through high school, I learned other stuff. I got really interested in digital signal processing and VST plugins for Steinberg Cubase and things like that. Then when I got to college, I had a bit of a background in digital and I really wanted to go into digital film. It was the end of the ’90s, movies like Toy Story had just come out, and John Lasseter was everybody’s hero. I thought, “I’ll go work for Pixar. How do I do that?” I’d go into film, learn all this digital film stuff, ray tracing and RenderMan, all of that. Of course, I got out of college and there’s not a lot of work for somebody with that background. A buddy of mine from high school said, “Well, we’ve got job openings at this company I work for called Cellnet,” which is the most generic-sounding company name ever in the early 2000s. So, I went to work for Cellnet, and they did automated metering infrastructure. It was all remote meter read type of stuff. At the time, that’s the early days of what we would probably call IoT now, I guess. It’s millions of meters deployed out there in the field. You’re reading them over a network, taking all that data, and providing some minor amount of analytics and billing and stuff like that. So, that’s how I got involved with software in a professional way. Twenty years later, I’m still basically working in the same kind of industry. It’s expanded a little bit. I’ll tell you how I got to Xylem. About 10, maybe 12 years ago, I got involved with a little startup called Verdeeco.
It was started by a guy named Brian Crow who lives here in Atlanta. I came on as the technical director, so basically building all the software. It was very, very small, and we were focused on meter data analytics and SCADA, trying to get more data out of the system and more value out of the system, analytics to drive efficiencies and transformer loading and anything that could cut costs or improve sustainability. We were doing it in the cloud, which at the time was blasphemy: “You’re not going to be successful.” Lo and behold, we had some customers, some very, very big customers. We eventually got acquired by a company called Sensus, who a few years later got acquired by a company called Xylem, where I am now. Xylem is one of the world’s leading technology providers in the water industry. We do everything from measurement to movement to treatment. There’s tiny little pumps, little pumps that we have in Coke fountains or in boat toilets. Evidently, we have the Cadillac of bilge pumps for your boat. I did not know that. I mean, I do not have a boat, but that’s interesting to know, the Cadillac of bilge pumps. You heard it here first. Yes. So, some interesting applications, and the more you are around Xylem, you’re right, it’s interesting and amazing, some of the stuff that we’re doing in improving efficiency and sustainability. Some of the applications are just outrageous and amazing. If you remember the cave rescue in Thailand, we had domain experts from Xylem on the ground there working with pumps to try and keep the water out so they could complete that rescue. FEMA uses our pumps. It’s a very big industry. Of course, Sensus provides water, electric, and gas meters in the industry as well. So, extremely broad. Basically, we’re doing everything we can to sustainably keep lights on, water moving, and homes heated.
Tim Veil: I think it’s really fascinating. I mean, you spend any time on the website and you certainly get the impression it has a lot to do with water, but there’s a lot there. As it turns out, water’s an important thing to people on Earth. Clean, available water is an interesting thing. Well, before we dig a little bit deeper into that, tell us a little bit about the background. I’ll describe it for folks who may not see it on video: it looks like 15 drum sets back there. This is not a fake background. This is his real background, and there’s a lot happening. So, I’m assuming, I don’t know, going out on a limb here: you like drums. You’re into music. Tell us a little bit about that so people won’t leave the podcast wondering what was happening behind you.
Grant Muller: Sure. So, we’ve lived in this house for 10 years now, and this has been my music studio and an area to just kick back and play drums. It was not my office until the pandemic. When the pandemic came around, well, I’ve got to set up shop somewhere; might as well be right here. It became very convenient. Actually, I started playing drums more than I had in a while, because in the two minutes that you had between meetings, or five minutes if you happened to get that, I could just turn around and play some drums. So, it’s worked out, and it’s a nice way to balance the hobbies that I have and the work. Now, I’ll be honest, you spend all day in here for a couple of years, and at some point you’re like, “I don’t want to be in that office. No matter how great the drums are, I’m out of here.” But there’s also a lot of kids’ artwork on the walls. My kids were in kindergarten, first, and second grade right around the time the pandemic was going on. So, they were bringing home artwork or doing it at home. I’m like, “Aren’t you supposed to hang this on your wall in the office?” Well, I don’t have one of those, so I might as well hang it all up here. So, there it is on that wall.
Tim Veil: Oh, no, that’s awesome. Well, before we get into more Xylem, just one thing about your background that caught my attention was your desire, I think, to go into digital art or the film business. Is that something you’ve done on the side as well, or once you made the transition to other lines of work, it fell by the wayside? I’m always curious about people’s initial dreams or aspirations when they graduate college. Just curious what became of that.
Grant Muller: No, it isn’t. I remember right after I got out of school, I don’t know if you’ve heard of Blender 3D, it became open source at a time when schools were using 3D Studio Max and Maya. It was all the thing where if you needed a license for it, you’d better bring your checkbook. But this Blender thing came around. It was like, “Oh, neat. I can still play with my keyframes and ray tracing and rendering and stuff like that.” But no, once I got involved with real software development and stuff like that at work, it became something that fell by the wayside, unfortunately. I’m an enthusiast. Any Pixar movie or digital animation that comes out, we generally watch it. So, I still love to see how the technology’s progressing, the move into the uncanny valley and then back out again.
Tim Veil: Yeah, it’s funny. As a father of three girls, we’ve watched our fair share of Pixar and animated movies. Speaking of the uncanny valley, what is that one? Oh gosh, it’s Tom Hanks and it’s about a train: Polar Express.
Grant Muller: Polar Express.
Tim Veil: Very early, I think fully digital movie, but my goodness, that’s a frightening two hours if you look at their faces too long.
Grant Muller: Yeah, we were in the depth of the uncanny valley at that point for sure.
Tim Veil: I think you’re right. I think you’re absolutely right.
Grant Muller: What’s happening here? We came back out of it, and now what’s interesting to see is water and hair. Those were the most challenging things, I remember, when doing this in school 20+ years ago: water and hair. If you watch the progression of water and hair, it’s incredible how realistic it is. But then you have a cartoon face and, oh, everything’s okay.
Tim Veil: Well, what always tripped me up, in Polar Express I thought this, but others as well, is the mouth and the inside of the mouth. People are speaking and it just looks, I don’t know, very unnatural. Very, very, very much the uncanny valley. Well, I know people aren’t here to hear about that. So, why don’t we transition back to Xylem and your work today? Software strategy, architecture. I mean, what’s the day-to-day like at Xylem for you, in your role? You’re covering lots of different industries, I can imagine, and maybe we’ll hear about lots of different kinds of products and technology. So, maybe talk a little bit about what the day-to-day is, what some of the technology you’re working on is.
Grant Muller: I think one of the most interesting challenges at Xylem is that we bridge the whole end-to-end solution. A lot of times when you work with IT and IT providers and people that are in the industry, they’re normally thinking about it from the point that they get the data, and they’re doing something with it, showing it on a screen, a UI or something like that. We’re thinking about it all the way down to the device that’s buried in the ground. There’s a meter out there in a pit or in somebody’s basement. There’s technology involved. We have to think about it from end to end. It’s challenging, but it’s also really interesting. You never get bored. If there’s ever a point where you’re so tired of looking at data streams and you just want somebody to talk to you about signals coming out of a SCADA system or something different, you have that opportunity. We call this the operational technology space, and it’s a big part of what we do. How we bridge that is an important part of the technology that we’re developing all the time. You’ve got to get data out of the OT network in a secure way. Obviously, cybersecurity and risk are huge concerns in the utility space or in the treatment space. So, yeah, that part of the technology is one of the variables that we deal with, and it makes it just a bit more challenging. I mean, it’s not as simple anymore as just saying, “Yeah, just put it in the database and everything’s going to be fine.” Not to mention just the quantity of data. I think this is something that if you come from FinTech or ad tech or something like that, you’re probably used to click data and stuff like that. That’s a huge amount of data too, but the time series data coming off of some of these SCADA systems or out of the sensor network or from meters, it’s a mountain of data.
Tim Veil: I mean, if you can’t share, that’s fine too. But I mean just what’s a ballpark of like the size? I mean are we talking petabytes of data? I mean is it hundreds of terabytes?
Grant Muller: It’s going to be hundreds of terabytes. We’re not quite into the petabyte range now, but if you summed up all of the operations of all the utilities, you’re definitely going to be in the petabyte range, and it’s going to become that pretty soon, because of the desire for more data. Obviously, as soon as you get a little bit of data, they want a little bit more data. We have a little bit more data, we can get a little bit more data, and there’s a point at which there are diminishing returns. We haven’t necessarily hit it yet, but your physical models can get that much better the more data that you have. So, if you’re collecting a sample once every hour, at some point a data scientist says, “Well, if I had it every 15 minutes, I could give you a better result.” So you collect it at 15 minutes, and then at some point, you’re down to 256 hertz and you’re sending up a whole lot of data. You’re getting the results that you want: better models, improvements to the network, better efficiency, faster leak detection, the ability to localize a leak that you’d otherwise have to go and dig a bunch of holes to try and find. We’re isolating that to a much smaller space now so that you spend less capital on where you go put a bulldozer, which is great. I mean, that’s again the thing when you’re solving these kinds of problems: you can see the real-world application. It’s not just bits on a screen. At some point, it’s somebody in the field saying, “Thank you, I found the spot. It’s a huge leak and I would’ve spent weeks trying to find this thing. Thanks for helping me look.” It’s really cool.
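To put that sampling-rate progression in perspective, here is a quick back-of-envelope calculation using the intervals Grant mentions. These are per-sensor counts only, ignoring payload size:

```python
# Rough per-sensor sample counts per day at the collection intervals
# mentioned above: hourly, every 15 minutes, and a 256 Hz stream.
SECONDS_PER_DAY = 24 * 60 * 60

def samples_per_day(interval_seconds: float) -> int:
    """Number of samples one sensor produces in a day at a fixed interval."""
    return round(SECONDS_PER_DAY / interval_seconds)

hourly = samples_per_day(3600)        # 24 samples/day
fifteen_min = samples_per_day(900)    # 96 samples/day
high_rate = samples_per_day(1 / 256)  # 256 Hz: over 22 million samples/day
```

Multiply that last figure by millions of deployed meters and the "mountain of data" is easy to believe.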
Tim Veil: I wish and I’m sure it’s very different technologies, but my gosh, it reminds me of our next door neighbor years ago in our old neighborhood, had some leak, literally dug up the entire front yard, like the entire front yard with a backhoe 15 feet down, it seemed like. I’m like, “There has to be a better way to figure out what’s happening underneath there.” I mean literally destroyed the entire yard, and oh by the way, in the process, destroyed the cul-de-sac to get to it.
Grant Muller: Yeah, water and mud everywhere, I’m sure.
Tim Veil: Oh, yeah. It was disgusting. Then of course, he didn’t really plant grass afterwards and the whole thing looked awful for years. I totally understand where you’re going, because you’re right, and I’ve talked, even on the podcast but certainly in the industry, to a lot of folks who aren’t necessarily worried about where the data originates or how it gets there. It may be time series, it may be sensor data, but they don’t care about that. That’s not their problem. They’re looking at data in an OLTP system or an OLAP system or other stuff. But you guys have to worry about the whole chain of custody, the entire life cycle of the data. I know it will vary, I’m sure, between applications and solutions, et cetera. But you’ve got data coming from a sensor. That’s going to make its way into some internal system for processing. One of the things I’m curious about, and maybe the answer is everything or everywhere or everyone, is who’s consuming this data? Is it internal, to make better decisions? Is it other corporations? Is it consumers? Would I be a potential consumer for some of the products and services that Xylem offers?
Grant Muller: Yeah, that’s right. Actually, the primary users are obviously going to be the utility operator, the customer service agent, or the engineer, the field engineer: people that work within the utility that take our solutions and combine them with maybe some of their systems. That’s an area where we’re trying to help them quite a bit, because they have their own IT [inaudible 00:15:37], I guess, of stuff that they have to cobble together. So, we don’t want to exacerbate the problem. Part of what we try to do is take that information and provide it to them in a way that they can make better use of it: situational awareness, and a little bit more in terms of their planning and being proactive. Yeah, I mean, the utility staff are the primary users of the software that we develop or the technology solutions we develop. After that, internally, we use a lot of this data for planning our own future, plotting our own course. We develop a lot of models, a lot of domain-specific types of things related to treatment. The more data obviously that we have, the better we can expand on those models and make them a bit more universal. So, that would be the secondary. But we do also have products, especially in our metering space, where you as the customer of the utility also are exposed to the data that we’re collecting, through either portals that we’ve created or partners that are taking that data and then showing it. You can see your interval data and compare that to last month or last year. So, you get an idea of, “Am I using more power than I used to? Am I using less, and when?” Having that hourly breakdown is an interesting thing. You don’t use a lot of water at home, really. I mean, you see patterns, but if you start to see weird patterns, you might ask yourself if there’s something wrong with the refrigerator. What’s going on with the compressor? Who keeps turning the air up? I keep putting it back down. Somebody’s doing it at 3:00 in the afternoon. I can see it. I can see it happen.
Tim Veil: I think we know the answer to that question. Universally, it’s the same answer. Yeah, this is a fascinating area for me. I mean, I think especially as an individual consumer of many utilities, it feels like it’s an area that’s ripe for innovation. And maybe there has been plenty of innovation, it just hasn’t been rolled out uniformly, or I’m not aware of it. I know you guys are big into water, and water is certainly an issue at our house. But yeah, electrical consumption has always seemed rather opaque to me. You get, “Hey, my bill is this this month,” but trying to get down to what is consuming that and how, and really working to better our own internal usage of electricity, has always been a bit of a mystery. I feel like we haven’t evolved much, at least in our house, beyond what my father used to do constantly, which is: turn these damn lights off. Who the hell put this light on? Why is this light still on?
Grant Muller: Yeah. To my understanding, we’re focusing a lot on electric efficiency for the consumer as well. It’s something that’s been top of mind in the electrical industry for a long time, even for the generating function, the generator and the distributor, and how they can reduce the amount of electricity being used up there. You see a lot now in the space related to electric vehicle detection so that they can do infrastructure planning, because a lot of it is planning. Electrical usage, the demand curve, has flattened a bit. It’s actually not as steep as it once was. Oddly enough, this is a thing you see a bit in the industry: it’s not as steep as they had planned. It’s still increasing, but not at the rate that maybe they would’ve predicted 20 years ago. There are efficiencies around LED bulbs and better windows; that kind of thing is a huge contributor, along with air conditioning systems. But now that there are electric vehicles in the mix, trying to plan for the infrastructure that has to be in place, related to the transformers, that’s something the electric industry is pretty focused on now. Just that detection and knowing this neighborhood’s going to have 20 more electric vehicles show up: how do we make sure we have the infrastructure ready for that?
Tim Veil: Yeah, that really surprises me. I would’ve thought, I mean, LEDs for sure, and all the other things you mentioned, air conditioning, because I know that’s always been the biggest consumer of electricity, at least at our house. But I would’ve thought with the number of electric cars on the road, that would’ve been rising rapidly.
Grant Muller: It’s rising. Also, that’s the space in water too: the water companies have to use power to pump water up. So, that’s another area where electrical efficiency for pumping water matters. How can we create the right duty cycles and pump efficiency curves so that they can reduce their own electric usage? Because they have to buy power too. So, everybody needs power. You’re right, it’s definitely an area where more and more efficiencies can be driven, maybe even more so than water. Water, it’s leak detection, trying to drive down losses more than anything.
Tim Veil: Well, let’s talk a little bit about the tech stack if we can. What good would this podcast be if we didn’t start talking about architecture, Grant? Architecture and technology. I know you guys have probably dozens of applications and lots of different stuff, but given what we were describing, we’re starting very early, at the sensors, all the way through to big utilities and their consumption of data. Maybe give us a high-level view of how these systems are architected. Again, I know you probably have many, so maybe we pick one. I don’t know how best you want to describe it, but I’m so curious about how a company like Xylem handles this data. I mean, what’s that pipeline look like?
Grant Muller: Yeah, sure. So, if you’re talking strictly the data, let’s talk a little bit about what happens when a sensor is deployed in the field. It could be a meter or a sensor out there, buoys that we put out in lakes and in the ocean, very specialized technology that sits out there for a decade or more. The first thing I’ll say is that there are very few standards. In the utility industry, there’s not a whole lot of standards to drive IoT in a normalized way. That’s becoming more common. You see Lightweight M2M and standards like that, protocols like that, becoming a lot easier to come by. But in the past, you’re dealing with machines and devices out there sending things like Modbus or their own proprietary data streams. If you didn’t provide the technology, a lot of what you’re doing is trying to figure out, “Okay, where do I normalize this thing? Am I going to put a device out there in the field that’s going to normalize it at a gateway, for instance, and we’re going to turn that into an MQTT or a CoAP stream and connect to a broker in the cloud? Are we going to deploy an MQTT broker or some device management system that understands Lightweight M2M? Or do we have to have a proprietary custom software stack all the way up into the head end system, where we’re translating that into business data?” It’s all of the above. We’re more and more driving towards standards. You can see that there’s a maturity model in the industry: as things start to become more digitally available, the desire for standards is there, and modularity, so that as a consumer, the utility doesn’t have to buy an end-to-end solution. They want to be able to pick from three vendors and have second sourcing and the ability to make trade-offs and all that.
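As a concrete illustration of the normalization choice Grant describes, here is a minimal sketch of the gateway path: a proprietary Modbus-style register read is decoded and republished as a unit-bearing JSON message for an MQTT broker. The register map, scaling factors, device ID, and topic layout here are hypothetical, not Xylem’s actual formats:

```python
import json
import struct

# Hypothetical register map for one vendor's device: which holding
# register carries which physical quantity, and how to scale the raw count.
REGISTER_MAP = {
    0x0001: ("pressure_psi", 0.1),   # raw count of 1200 -> 120.0 PSI
    0x0002: ("flow_gpm", 0.01),
}

def normalize_modbus_reading(register: int, raw_bytes: bytes) -> dict:
    """Translate a proprietary 16-bit register value into a unit-bearing
    business object, ready to serialize as JSON."""
    name, scale = REGISTER_MAP[register]
    (raw,) = struct.unpack(">H", raw_bytes)  # big-endian unsigned 16-bit
    return {"metric": name, "value": round(raw * scale, 3)}

def to_mqtt_payload(device_id: str, reading: dict) -> tuple[str, str]:
    """Build a (topic, payload) pair to publish to a broker in the cloud."""
    topic = f"utility/{device_id}/{reading['metric']}"
    return topic, json.dumps(reading)

# A 0x04B0 (1200) register read becomes a normalized 120.0 PSI message.
topic, payload = to_mqtt_payload("meter-42", normalize_modbus_reading(0x0001, b"\x04\xb0"))
```

The point of the sketch is the design question Grant raises: this translation could run on a field gateway, in a broker-side service, or in the head end system; the code is the same, only the placement changes.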
Tim Veil: Just to put it into layman’s terms, or at least for my understanding, the problem that you’re describing is that all these sensors are emitting different formats on different technology stacks, or are different formats, I guess, thus needing different stacks to interpret. So, there’s a desire underway to get all of these different kinds of sensors, perhaps deployed in different places, to speak some at least semi-coherent and common language, so that every sensor doesn’t require its own ETL tool. That’s an interesting thought. I hadn’t thought about that. It’s like, how close to the edge do you do that? Are you emitting in a standard, or is it one hop, or am I going to wait until it gets all the way to my doorstep? That’s interesting.
Grant Muller: Obviously, the software-centric way you think about it is that you’re trying to move into business objects or physical units where you can do it flexibly. You want to do that in the cloud, where you can have a big ETL system, or basically translation functions that say, “Ah, this is this type of message. I just need to turn it into a meter read or an alarm or a signal that came off a SCADA system.” The SCADA signals, that’s another interesting one, where there’s standardization in the form of tags and tag names and all that, but then the implementation is up to whatever the implementation engineer decides.
Tim Veil: What is SCADA? I don’t know what it is.
Grant Muller: Supervisory Control and Data Acquisition, I think, is the full acronym. You see it deployed in water treatment plants and in the pumping systems. It’s not new technology. It’s been around since the ’70s: Rockwell Automation and companies like Modicon, which created the Modbus standard. That’s actually one of the things you see a lot, these standards that emerged from a company and just became the de facto standard because nobody else was driving a standard there. So, there’s a lot of that, and you end up having to deal with some of that legacy. So, where do you make that translation? The automatic IT- and software-centric approach is obviously that we want to transform that in the cloud. It’s very flexible. We can do that there. But one thing to keep in mind is, if there’s a drive towards more automation at the edge and a desire for more of the edge to make decisions on its own, it has to know the physical units. This device out here on the edge, a gateway that just received a pressure reading, can’t be opaque. It can’t just be a number. It has to know: this is the pressure value, it’s 120 PSI, and I need to change this valve to adjust for the fact that the pressure’s too high. If you’re doing all that in the cloud, you’re making these back-and-forth transactions that take too long. If you think about the long term, is that how we believe our data systems are going to work?
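That edge-versus-cloud point can be sketched the same way. Assuming a hypothetical gateway that receives unit-bearing readings like the 120 PSI example above, a local rule can act without a cloud round trip, but only because the value isn’t opaque. The setpoint, deadband, and reading shape here are made up for illustration:

```python
# Hypothetical edge rule: the gateway knows the reading's physical unit,
# so it can act locally instead of waiting on a cloud round trip.
PRESSURE_SETPOINT_PSI = 100.0
DEADBAND_PSI = 10.0  # ignore small fluctuations around the setpoint

def edge_valve_action(reading: dict) -> str:
    """Decide a valve adjustment from a unit-bearing reading.
    Returns 'close' to relieve high pressure, 'open' for low, 'hold' otherwise."""
    if reading.get("unit") != "psi":
        return "hold"  # opaque number: refuse to act on it
    error = reading["value"] - PRESSURE_SETPOINT_PSI
    if error > DEADBAND_PSI:
        return "close"
    if error < -DEADBAND_PSI:
        return "open"
    return "hold"

# The 120 PSI example: too high, so the valve closes locally.
action = edge_valve_action({"metric": "pressure", "unit": "psi", "value": 120.0})
```

Note the unit check: a reading whose unit the gateway doesn’t recognize produces no action, which is exactly the "it can’t just be a number" constraint Grant describes.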
Tim Veil: I’ve also been wondering, with Cockroach, obviously, one of the things we talk about all the time is resiliency and high availability. I would imagine in systems like this, possibly deployed in places where communications maybe aren’t as reliable as they are for you and me, waiting to send a signal that needs immediate action all the way up to some centralized hub to process and push back down means that if those comms break, maybe some important action isn’t happening at the device. So, that makes sense. In other words, getting intelligence as close to the edge as possible is something I would imagine in at least some cases would be incredibly important.
Grant Muller: Yeah, it’s definitely case by case. In your metering space, it’s probably less important. In your SCADA OT networks, where you’re trying to drive maybe a water treatment plant action, you’re probably looking at something that needs to happen on the edge. Some of it is very driven by cybersecurity and risk as well. The adoption of cloud is, I would say, slow in comparison to many of the industries that you’ve encountered on the podcast so far. There’s still a desire to have those on-premise systems, because there’s the possibility of a cutoff, or just cybersecurity risk. You don’t want anybody to be able to get into that.
Tim Veil: This may be a touchy subject, and if it is, we can punt. But I am curious, because you read in the news about security around utilities in general. At least in some part, that’s physical security, but I think the risk of cybersecurity attacks is important across all major pieces of infrastructure. What are some of the things that you all worry about?
Grant Muller: So we take cybersecurity risks very seriously, and to talk a little bit about architecture, it’s one of the reasons we intentionally architect some of our systems in a way that we can divide the workload, so that the connector piece and the most sensitive parts, we can deploy those on-prem, but even if we deploy them in the cloud, they’re very, very hardened. We try to drive compliance with the highest degree of cybersecurity standards, because it is something our utility partners take seriously, and we have to as a result. Obviously, the last thing our CISO wants is to be in the news because one of our networks was penetrated and somebody shut off chlorination or something strange at a water treatment plant. So, that’s the kind of thing that we are trying very hard to protect against with the technology stack that we’ve created. A lot of it is trying to divide the operational and transactional workload from the analytical workload, so that the folks that need data, who are just running queries and trying to do high-level modeling, trying to do physics-based generative networks on a hydraulic model, don’t need to do that over here where the operational workload is very high, needs a set response, and has its own cybersecurity requirements.
Tim Veil: Select star from operational database.
Grant Muller: Yeah, so no, we divide those up. We try to divide those up very cleanly, so that you have this notion of many operational systems doing what they need to do, translating different standards from different sensor types and then unifying it, making decisions local to the network each is operating in. But then the data is over here en masse, and we can combine it with other operational systems across our whole deployment. It’s not just one utility but many, and we drive better models that we can then send back down to our operational systems, so that they have better heuristics, better hydraulic modeling capabilities, better pump efficiency curves, and all that. So, it’s a good cycle to get into. The data comes here, there are summations, we create a lot of smaller ways of processing the data, and we send it right back to the operational system to do its job.
Tim Veil: Again, it’s something if you can’t answer, I totally understand. But again, working at a database company, I’m always curious about the kinds of technologies. You’ve mentioned at least two big buckets already: you’ve got some transactional systems, perhaps, and you’ve got some analytics or OLAP-type systems. It sounds like you’re doing the right thing, by the way, by separating those two. We certainly find lots of folks, and maybe you’ve seen this in your past, who do, “Hey, here’s all my operational data.” We don’t necessarily want to drive reporting out of that, especially for critical systems, because things get haywire. So I’m just curious about what data backends you use.
Grant Muller: So in our operational space, it tends to be your typical RDBMSs: tried-and-true SQL Server, Postgres, your typical databases that have wide support. Because if you do deploy them on-prem, there’s probably a database administrator there that knows at least enough to administer that database. Obviously, if you get involved with a utility large enough, they’ll want a specific database technology, because they already have that expertise in-house. Some of what we’re doing, it’s not necessarily JPA or Hibernate, but it’s ensuring that we have the ability to operate on a different data repository if that’s the driver. We’re seeing less of that now than we used to. I would say 10 years ago, it was normal to see it. Now it’s a lot less typical for a utility to select the database for you. On the other end of that, if you come up to our analytical systems, those are almost entirely cloud. It’s hard not to take advantage of the separation of compute and storage. It’s too cost-effective. We used to operate a Hadoop cluster years ago and obviously migrated out of that within a couple of years.
Tim Veil: Well, I used to work for one of the big Hadoop providers, and it was interesting technology back then. I think the cloud's taken over a lot of that, but yeah, I used to dabble a lot with Hive and HBase, and there was some interesting stuff there. So yeah, I always like hearing those stories.
Grant Muller: Yeah, we did a lot with Hive and HBase. It was actually driven by that need to combine data. If you're talking about hundreds of utilities using your product, and they all have their own operational database, how do you run a query across all of those? I know the buzzword right now is data mesh, but you're not going to data catalog every one of those and pretend to run a federated query or something like that. It's got to go somewhere else so that you can run broad analytics across it. The only real cost-effective way to do it anymore, especially at that scale, is cloud compute. So we use a lot of AWS and Azure for that. AWS is probably our primary data lake provider at the moment, but we have the ability to operate in multiple cloud providers. We're also taking advantage of the data lakehouse patterns every bit that we can, where we've got lots of Parquet storage and then create smaller data warehouses that our customers might even have access to.
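[Editor's note] Grant's point, that broad analytics means copying data out of hundreds of per-utility operational databases into one central store tagged by source, rather than federating queries across them, can be illustrated in miniature. SQLite stands in for both the operational databases and the central store, and all names and numbers are made up:

```python
# Centralize-then-query: copy rows out of each per-utility operational
# database, tag them with their source, and run one query over the lot.
import sqlite3

# Two "operational databases", one per utility.
utility_dbs = {}
for name in ("utility_a", "utility_b"):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE usage (meter_id TEXT, kwh REAL)")
    utility_dbs[name] = db

utility_dbs["utility_a"].executemany(
    "INSERT INTO usage VALUES (?, ?)", [("m1", 10.0), ("m2", 12.0)])
utility_dbs["utility_b"].executemany(
    "INSERT INTO usage VALUES (?, ?)", [("m9", 8.0)])

# Central analytical store: the same rows, plus a utility column so the
# source is never lost.
lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE usage (utility TEXT, meter_id TEXT, kwh REAL)")
for name, db in utility_dbs.items():
    for meter_id, kwh in db.execute("SELECT meter_id, kwh FROM usage"):
        lake.execute("INSERT INTO usage VALUES (?, ?, ?)",
                     (name, meter_id, kwh))

# Now a single query spans every deployment.
total = lake.execute("SELECT SUM(kwh) FROM usage").fetchone()[0]
print(total)  # 30.0
```

At real scale the central store is object storage and Parquet rather than a relational table, but the shape of the pipeline is the same.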
Tim Veil: I feel like I'm going to embarrass myself by asking this question, given that I used to work for a big data company, but we didn't use this term. I've seen it more recently and just haven't paid attention to it: lakehouse. I remember we used to talk about the data lake. Is this some combination of a warehouse and a lake? Will you kindly, without embarrassing me and making me feel super dumb, tell me what a lakehouse is?
Grant Muller: Yeah, so think about the progression. First there were data warehouses. In the '90s and early 2000s, everybody had these data warehouses set up, with a propensity toward highly normalized, flattened schemas. Then you moved into the data lake, and it was, "Oh, my God, it's all this data, and we can run Hive queries, and it's unstructured, so now we can do schema-on-read," and all this stuff. The drive was to centralize the data, just get all of it pushed into one central place. I think we experienced this with our Hadoop clusters, and certainly others experienced the same thing: when you centralize the data, you now have this problem of trying to describe it to everybody. I only need to do this one little query. This is a huge haystack, I'm looking for two needles, and you're going to tell me it's a three-hour query? Is there a better way? What I've seen with the lakehouse is that it's literally just taking the data warehouse concepts and making them part of the data lake. You still have this huge haystack of data, but let me pull off 200 needles and structure them in a way that you can do your business queries on. We've got a product that we developed recently called Utility Data Lake that we actually offer our customers, and it's exactly that. We've taken all their data into an unstructured data store, and now we can say: if you have BI tools, plug them in right here. We can house this for you, very cost-efficiently. So that's the next progression. If you're following the data literature, you see this data mesh thing happening next: how do we have many data lakehouses that we can federate? I'm still watching it. It's certainly an interesting concept, but from a data governance perspective, it's also a lot of work.
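[Editor's note] The "haystack and needles" idea, keep the raw schema-on-read lake, but carve off small structured tables that BI tools can query directly, looks something like this in miniature. Field names and values are invented, with JSON lines standing in for the lake and SQLite for the carved-off warehouse:

```python
# Lakehouse in miniature: raw, loosely structured records stay in the
# "lake"; a small, structured aggregate table is materialized for BI.
import json
import sqlite3
from collections import defaultdict

# The "lake": raw schema-on-read event records.
raw_lake = [
    json.dumps({"site": "plant_1", "metric": "flow", "value": 5.0}),
    json.dumps({"site": "plant_1", "metric": "flow", "value": 7.0}),
    json.dumps({"site": "plant_2", "metric": "flow", "value": 4.0}),
    json.dumps({"site": "plant_1", "metric": "ph", "value": 7.2}),
]

# Pull off the "needles": average flow per site, applying the schema as
# we read rather than when the data was written.
totals = defaultdict(lambda: [0.0, 0])  # site -> [sum, count]
for line in raw_lake:
    rec = json.loads(line)
    if rec["metric"] == "flow":
        totals[rec["site"]][0] += rec["value"]
        totals[rec["site"]][1] += 1

# Materialize a small warehouse table that BI tools could plug into.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE avg_flow (site TEXT, avg_value REAL)")
warehouse.executemany(
    "INSERT INTO avg_flow VALUES (?, ?)",
    [(site, s / n) for site, (s, n) in totals.items()])

rows = warehouse.execute(
    "SELECT site, avg_value FROM avg_flow ORDER BY site").fetchall()
print(rows)  # [('plant_1', 6.0), ('plant_2', 4.0)]
```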
Tim Veil: Yeah, that was certainly the case when I was involved in it, however long ago that was, the last 10 years. I would imagine, especially given the workloads you described, with data coming from lots of different sources, that data governance, just understanding, like you said, what's in the haystack, can be really, really difficult. Earlier, you mentioned two technologies I'm particularly familiar with, Hibernate and JPA. I don't know if those are used a lot at Xylem. Are you a Java shop, or a little bit of everything?
Grant Muller: We're a little bit of everything. I would say the dominant languages are Java and Python. We use a lot of Python. A lot of the domain experts, especially if they're coming from a civil engineering or hydraulics background, have built their stuff up in Python. So we've seen a growth of Python in our stack to handle all of the different kinds of data modeling needs they have. It's a pretty versatile language for those kinds of things, and there's a ton of data science libraries that have built up over time, SciPy and pandas and stuff like that. So it's very worthwhile there. I don't know that Java has quite excelled in that space, but certainly for a lot of our backend services, fixed services that have one job, the domain-driven design kind of thing where it's going to do this thing and only this thing, just write it in Java. It's super-efficient, and there's no shortage of Java developers who can maintain it and help you build on it.
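[Editor's note] For a tiny flavor of the kind of domain modeling that lands in Python, fitting a line to pump measurements with the ordinary least-squares formula. This sketch is stdlib-only and the data is made up; in practice this is exactly where SciPy and pandas take over:

```python
# Ordinary least squares by hand: fit head = intercept + slope * flow.
flow = [10.0, 20.0, 30.0, 40.0]   # flow rate samples
head = [48.0, 44.0, 40.0, 36.0]   # measured pump head at each flow

n = len(flow)
mean_x = sum(flow) / n
mean_y = sum(head) / n

# Closed-form OLS: slope = cov(x, y) / var(x).
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(flow, head))
    / sum((x - mean_x) ** 2 for x in flow)
)
intercept = mean_y - slope * mean_x

print(slope, intercept)  # roughly -0.4 and 52.0 for this toy data
```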
Tim Veil: Yeah, I mean Java’s been my language ever since I started. So, I keep threatening to try and do other things, but I haven’t quite gotten there. But do a lot of Spring, a lot of Java, a lot of Hibernate, lot of JPA, all good stuff.
Grant Muller: It's funny the degree to which you learn a language, and at some point you're like, "I want to learn Rust. Let's learn a little Rust." I don't know if you're the same way I am: Rust for Java developers, just tell me how this works if I'm coming from a Java background. I want to know what an interface looks like and what a generic looks like. How do I do that?
Tim Veil: Maybe it's just because I'm getting older, I don't know what's wrong with me. I have these grand visions of learning new languages, I really do. Take Cockroach, for example, written in Go. Go is apparently this hot language, very approachable. Somewhere back here on the shelf is my Go book. In fact, I ordered the Go book and they shipped me two, so I have one in the bag I carry with me and one on the bookshelf. I don't read either of them, and I try, I do. The Cockroach people listening are probably grimacing. I try to read the code and make sense of it, and I can, I guess. But why? Why do I have to have all these new languages? Rust in particular is all the rage right now. You hear Rust all the time. Why? For what?
Grant Muller: Yeah, I learned a bunch of languages at one point, and I remember reading a quote, I think from Alan Perlis, something like, "A language that doesn't change the way you think isn't worth learning." So a lot of them I don't bother with, because they're still just another object-oriented language; I'm not going to learn anything from this. I learned Haskell, and that was one of those jarring experiences where everything is so different that there's no Haskell for Java developers. No, this is just Haskell. This is Haskell for mathematicians, if you want to look at it that way. So yeah, I feel like I've given up in the same way, because you start to weigh it up and say, "Well, I can just do it in Java, and that's the language I know, so I'll do it." If somebody wants to translate it to Rust, they can.
Tim Veil: No, aren't we bad people for thinking that way and doing that? I did learn Objective-C, though. I built a game in Objective-C for my kids on the older iOS devices before Swift came out. That was actually a pretty jarring experience, not just the language and the syntax, but I really had to use Xcode as an IDE, and I'm so used to JetBrains and IntelliJ and things like that. They did have an Objective-C thing at the time. So I made that transition, but man, everything about that was different for me.
Grant Muller: I don't know. I never went through a major Visual Studio phase. I did some backend programming in C#, but never the front-end type of stuff. So Xcode to me feels like a beast. You open it and there's a storyboard and all kinds of stuff. So, where's the code?
Tim Veil: Yeah. Can I just go to the code now, please? Thank you for all this. Funny. Well, I know we're running up on time and I don't want to take more than we're allotted, so maybe some things to cover as we wind down. Maybe this is a personal thing, maybe it's stuff you all are building at Xylem, but what are some things you're looking forward to? What are some exciting things on the horizon in your world that you can share?
Grant Muller: Yeah, so we're on the cusp of really driving broad digitization of the industry. If you look across the water industry, it's not broadly what you would call digital at this point. It's a word on the tongues of a lot of IT administrators inside utilities, but even inside industrials, it's not digital like you would think of in tech or some of the other industries that have been digitized for a long time. That's a big driver for us right now: getting everything inside the utility digital. That doesn't just mean taking what they have in the field, plugging it in, and showing it to them on a screen. It's really making their work lives digital. Take what they would've thought of as a normal business process: going out to a facility, putting eyes on the asset there, saying, "Yup, everything looks good," and driving back to the office to tell their boss. How do you change that into a truly digital workflow where that trip is unnecessary? You drive efficiencies, drive down costs, and start to use that personnel for more advanced work: deploying more infrastructure, having more money to improve the infrastructure and spend on capital, and really putting those folks to better use instead of just driving out and putting eyes on a thing. So that's a big part of what we're doing now, trying to transform the utility space into a truly digital operation. That's going to require all the technology we've talked about, SCADA and edge devices and IoT and operational platforms and cloud platforms, but also just rethinking how the operation takes place, how the business does the work. I think the next challenge for us is really examining the workflows with the utility, partnering with them to understand what the driver is for them and how they do it today, and not just mapping that, but changing it.
Tim Veil: Interesting. Now that’s great stuff. What about for you personally? Any exciting things on the horizon? Any new books, any new movies?
Grant Muller: We haven't talked about this at all, but as an aside, I'm actually an ultra-runner too.
Tim Veil: Really?
Grant Muller: Yeah, I have a big race coming up in July, so I'm going to go do a bunch of running. Maybe that's why I don't have any time for Blender, because I spend all my time running at this point.
Tim Veil: Now, ultra is how long?
Grant Muller: Anything over 26.2 miles, so longer than a marathon. My next race in July is 100 miles. It'll be my 11th 100-mile race.
Tim Veil: Your 11th 100-mile race?
Grant Muller: Yeah, and this one is my favorite race. I'm even going to give it a plug: High Lonesome 100, the best race in the country. It's in Colorado, super beautiful. It's the fifth time I've done this one, and I just can't wait to be out there.
Tim Veil: Oh, that's awesome. You are a much, much better man than I. I struggle to find motivation to move my arms and legs at all, let alone go run. So that's awesome. Well, Grant, listen, this has been really fascinating. I've learned a lot about an industry and a company I didn't know much about, so it's been a real pleasure hearing from you. Hopefully, we'll have you on again; I'm sure there's lots more we could talk about. And if we can't have you on the show, you're an Atlanta guy, so we can always meet for lunch.
Grant Muller: That’s true.
Tim Veil: So again, thank you very much. Glad to have you on the show. Look forward to talking to you again.
Grant Muller: Thanks for having me. It was great.
Tim Veil: Thank you as always for listening to Big Ideas in App Architecture. If you haven’t already, rate and review the podcast wherever you’re listening and check out our YouTube page where you can watch every episode. Thanks. Bye.
Big Ideas in App Architecture
A podcast for architects and engineers who are building modern, data-intensive applications and systems. In each weekly episode, an innovator joins host Tim Veil to share useful insights from their experiences building reliable, scalable, maintainable systems.