The data behind digital marketing: A conversation with Bluecore’s Software Architect

Mike Hurwitz

Software Architect at Bluecore

Bluecore’s retail marketing platform is trusted by companies including CVS Pharmacy, Neiman Marcus, and others to transform data into rewarding customer journeys.

In this week’s episode, our host David Joy sits down with Bluecore Software Architect Mike Hurwitz to learn what it takes to build architecture capable of powering some of the world’s biggest marketing campaigns.

Join as we discuss:

  • The technical decisions behind Bluecore’s architecture
  • Why Bluecore prioritizes bandwidth over latency
  • Effective ways to work with AI/ML in marketing
  • Mike’s perspectives on Visual Basic, C#, Scala, and Go
David Joy: What is up, everyone, and thanks for tuning in. In today’s episode of The Big Ideas in App Architecture podcast, we speak to Mike Hurwitz, who is the architect at Bluecore. Mike’s had an amazing career spanning companies like Tumblr and Shutterstock. In this episode, we get into Mike’s journey in tech and get into some really cool details on how Mike and his team are building and designing solutions for retail brands that help them make critical decisions for their consumers. So pump up that volume and get ready for an intriguing conversation with Mike Hurwitz. And also, happy New Year to all of you. All right, Mike, we finally made it to the podcast. How are you doing today?
Mike Hurwitz: I’m doing great, how are you?
David Joy: I’m all right, man. I love doing these on a Friday because you’ve gone through a successful week and then you talk to somebody exciting and that motivates you for the next week, so I’m really looking forward to talking to you, actually.
Mike Hurwitz: I’ve been looking forward to this now. We were trying to do it in December and I’m so glad we get to do it now.
David Joy: Yeah, it’s awesome. Do you know this is actually the first recording that I’m doing this year, so you are my first guest for 2024.
Mike Hurwitz: Oh, I’m honored.
David Joy: Yeah.
Mike Hurwitz: Let’s kick it off right.
David Joy: I know. The way I joke about this is, if we get this one right, the rest of the year will be perfect, so you have a lot of pressure going on to be an awesome guest.
Mike Hurwitz: I’ll do what I can.
David Joy: Awesome. For everybody listening in, welcome to 2024, the first episode of the Big Ideas in App Architecture Podcast. Today I have with me Mike Hurwitz, who is an architect working at Bluecore and has done some interesting work at companies like Tumblr and Shutterstock.
Now, before I jump into talking about all the amazing things that, Mike, you have done in your career, let’s start with a very simple question as to why did you decide to jump into the world of tech, and do what you do today?
Mike Hurwitz: My father was one of the few people that he knew who ever had done any software development at all. While he was at the University of Maryland in 1964, I want to say he was doing some programming. When he went out into the business world with my grandfather, he was very big into, how do we automate everything? How do we automate our business? He was bringing computers into his office in 1978 or 1979. When I was a little kid, I went to his office and got to write in BASIC on his IBM machines that were littering the office. Then when I was a kid, we had all of the common early ’80s, late ’70s computers. We had a Timex Sinclair and a TRS-80 and all of those, and then finally got a PC. I was like, “Oh, my God, this is amazing.” I was writing in BASIC and a bunch of time went by and I didn’t think that this was a career that I wanted to do. When I was in college, frankly, I was too much of a coward to apply to the computer science school, so I don’t have a computer science degree, but when I got out of school, I went to a job fair and I thought I was going to get an IT job because I’d been building computers for years just because it was something to do and it was fun. Everybody I talked to in line was looking for that kind of a job, so I threw my resume into a software development job. I was like, “Okay, I can do this, and I’ve got to do something.” I was maybe in my second week of work. I was like, “Oh, wait, I really love this.” This was in the late ’90s, so it was very easy to get a job. It was a lot harder to get a good job, so I had a bad job. I was working in, going back to my roots, working in Visual Basic.
I was standing one day by a printer waiting for something to come out and there was, I think, the second volume of The Art of Computer Programming was right by my head. While I’m waiting for my printout, I just start flipping through it like, “Okay, this is awesome.” I found the rest of the volumes littered around the office. One was holding up a desk. One was turned around in a bookshelf and I found them all. I was like, “Okay, this is great,” and that’s when I realized, this isn’t just something I want to do as a job. This is actually something I love doing, and worked really hard to get better at it. It takes a particular kind of idiot to not figure that out until they’ve already graduated college, but I am that idiot.
David Joy: That’s amazing. One of the things that fascinates me about everybody’s story is especially origins. If you’ve been a fan of comic books, and I’m a huge comic book person, we all want to know how a superhero, what was the beginnings? What made him? It’s funny when we think about it, when we, people who work in tech, especially people who are working on solving complex problems, building these amazing applications that can scale, that solve the problem for millions of people in the world, what you are doing is a superhuman effort that we don’t really realize. I always have this interest in knowing how somebody got the passion that they got. It’s really interesting that you brought it up. The other funny thing that I really enjoyed hearing from you right now was the concept of looking at a book and reading a book to learn programming, because look at what’s happening nowadays. With all the access to the internet, obviously, and now we have AI that helps people code and learn coding, such a different world right now, but it’s been a fascinating journey. When you started off your career, Visual Basic, what was the next programming language you jumped into from there?
Mike Hurwitz: I found myself, I was at a consulting shop and we got contracted to a company that was trying to do a project in this new language that wasn’t even out of beta yet called C#. I spent about the next 10 years, I did a bunch of different stuff. I did a little bit of C++. I did enough Java to get by, but the next roughly 10 years, the first three years was mostly Visual Basic, a little bit of this, a little bit of that, but mostly Visual Basic. The next 10 years was mostly C#, and I’ve got to say I really loved C#. I still think it’s a great language. I haven’t worked in it in a long time, but I learned a ton about how to be better at this game from C#.
David Joy: Yeah, I think C# programmers have learned really great concepts that allow them to abstract with other languages and deliver really good quality code.
Mike Hurwitz: The thing about C# that I thought was really great was, because it wasn’t the first language like that, they were able to learn a lot of lessons and they were very structured about how they built everything. If you read the .NET Framework Design Guidelines, it’s not the only way to do something, but they were able to explain, this is why we chose to do it this way. This was the tradeoff we made. In some places, actually, they even say, and we regret it because of X, Y, and Z. There was also, when the parallel framework came out, the guy who founded Pulumi wrote a book called Concurrent Programming on Windows. Again, fit really well into that ecosystem. From C#, I was able to access all of this amazing low-level stuff that otherwise, I wouldn’t have really understood what it was doing and why. When I stopped working in C#, all of those concepts still applied. I’ll say I developed a taste because when I’ve worked in languages that were more evolutionary in how they came up, and I’ll take PHP as an example of that, I’ve always been really frustrated.
I don’t think I’m the only one to say I was frustrated working in PHP, although I know a lot of people who love it, but when you compare that to, say, the work that I’ve done in Scala or Go, which again, are languages that were deeply, deeply thought out, it really shows, and I really appreciate the opportunity when I get the chance to work in languages like that.
David Joy: Oh yeah, yeah. You brought up a language, Scala. I was getting into Scala about eight years ago especially.
Mike Hurwitz: That’s about the right time.
David Joy: Especially when we were doing some stuff at Spark and our programming with some data science-y stuff, but I got out of it very quickly because I chose to go the Python route and stayed on Python. Do you still code with Scala, or will you use it?
Mike Hurwitz: I haven’t written Scala in a couple of years, in part because I just haven’t been in an environment that had a lot of JVM in it. When I was at Tumblr, though, that’s pretty much all I did. I found Scala to be a little bit harder to learn than maybe some of the other languages that I’ve worked with. It bent my brain in some ways, and there were definitely times when I was like, “I know this works and I’m not really sure why, but we’re just going to roll with that,” but I happened to be working with some really, really good Scala developers, and getting to see what you could do largely immutable and highly concurrent and really fast, it was really eye-opening for me. It was a different mechanism to get to concurrency than what I’d used before and I liked it. I have to say I don’t love Scala as a language, but that’s mostly because I don’t like how not great at it I am, but I loved what I could do with it, and some of my friends at Tumblr built amazing frameworks for us to work with.
David Joy: Yeah, it’s amazing.
I wanted to ask you, I know we went this programming route, direction with some of the questions, but what’s your point of view on Go right now as a language, and do you dabble with it?
Mike Hurwitz: I do a lot more than dabble. After Tumblr, I went to Shutterstock, and at Shutterstock, I learned Go. It’s just because the system I happened to be working with happened to be written in Go. Coming from a Scala environment, I was like, “Oh no, where’s all the structure? Where are my algebraic data types? Where’s,” insert whatever reason people hate Go. I’ve been using Go now for quite a while, and I’ll say that it is my most comfortable language at this point. I’m a fan. I think that Go, first of all, I think it’s very well-structured where, again, people really thought hard about what this should be and then built it rather than, they built most of it and then, we’ll figure out the rest later. The other thing I’ll say about Go is the compatibility guarantee in an environment where you’ve got code that’s going to live for a while is really great. It is amazing to say, “Okay, I’m going to upgrade my version of Go. I’m going to run my CI suite and it’s probably going to be just fine.” It’s really wonderful. I love that. There are definitely times where I miss the powerful type system that you had in languages like Scala, but I don’t know. I think that people get a lot more religious about Go than maybe it deserves.
David Joy: Yeah, that’s what I’ve been noticing, too. I don’t know if you know this. CockroachDB completely rewrote the PostgreSQL stuff in Go. We wrote the entire database in Go and I have been, since I joined the company, dabbling with Go on the side. I had my own apprehensions about going and trying something because you either love it or you don’t love it. There’s no middle ground here, but I’m starting to enjoy it a lot. I don’t use it often, but whenever I get to use it, I feel like it’s pretty neat and does some interesting things.
Mike Hurwitz: I was just going to say, the thing about Go is, the way people are using the language has really changed. When I first started writing Go, it was like, “I don’t care what you’re doing, a channel must be the answer. There will be a channel in your program. I don’t care what.” Channels are great. There’s a really strong academic history behind CSP and what it’s all about, and it’s a great tool. It’s not the only tool in the world. Once the Go community figured out that channels are a sometimes food, then I think working in Go got a lot better. A lot of the criticisms that people had about the inefficiencies of the language and bringing in concurrency controls when you didn’t need them, a lot of that fell away. There are still some things that I miss. I miss iterators. Iterators were nice, but you can fake it.
David Joy: Yeah, no, we should go back and give this feedback. Mike said, “Iterators need to come back.”
Mike Hurwitz: Oh, don’t worry. I am not the only one. There is an open issue. The Go team has commented on it many times. They don’t need to hear from me. It’s cool.
David Joy: Okay, that’s awesome. For everyone just tuning in, let me reiterate that Mike is a software architect, not a software developer, but has great skills in programming languages from what he just shared. Let’s dive into a little bit more, Mike. Why don’t you let the people know a little bit about your current role at Bluecore. What are you working on that’s changing the world, I would say, for retailers around the world? Also I would say, expand on what Bluecore does as well. It’s like a platform that you have developed for retailers that many people do not really know about.
Mike Hurwitz: The way that people have thought about digital marketing, and Bluecore is a digital marketing company. Let’s start there.
Bluecore as a digital marketing company is a little bit different than the way people were maybe thinking about digital marketing a couple of years ago, 10 years ago. Bluecore actually started 10 years ago, but that’s what was popular at the time. The way that people think about digital marketing is, I am going to learn absolutely everything about you. Then once I know everything from what you had for breakfast to what your favorite shoes are to the color of your underwear, I’m going to know what music I should market to you or I’m going to know what experience I should market to you. As it turns out, people are a little bit more complicated than that and really separate their lives. The way that I shop at one particular retailer may be very different from another retailer, even if they have overlapping product lines, just because I want something different from them. Bluecore is not the kind of platform that follows you around the internet. If you go into a shoe store and you, say, are looking at a pair of shoes and someone comes up to you and says, “Hey, can I get that for you in a size 10?” That’s not weird. That’s working at a shoe store. When that person grabs a clipboard and follows you from the shoe store to the grocery store to your doctor’s office, that’s weird, and that’s what we don’t do. We have first-party data, so you go onto a site. You do a bunch of behaviors. We track those behaviors and those behaviors are used only by the site that you’re on. If you go to Nike and you look at shoes and then you go to Adidas, what you did at Nike is completely separate. If that weren’t the case, honestly, I probably wouldn’t be comfortable working here. Now that we have all of this data and we can start drawing inferences about who you are and what you like and what your shopping patterns are, we’re focused on digital marketing, which means we’ve got a site product, but our top line product is really email. When do you look at email? That’s something that’s important.
How likely are you to look at two emails in a day? That’s also pretty important. What is, for you, the most important kind of communication to receive? Some people really respond to things like an abandoned cart while somebody else might respond to a discount. Somebody else who doesn’t care about discounts but only wants the new stuff, we should know that, too, and we do. Now that we have all of that information and store it in a very large database, as you might imagine, we then schedule out communications and send them out to various recipients. One thing that makes this different from some of the other systems that I’ve worked in is, in other things I’ve done, users were either completely unidentified, we don’t know who they are and we probably never will, or they’re completely identified. They’ve logged into the system or they have their own username, or the only thing that matters is a session. Bluecore straddles that line. It’s very difficult to know the absolute truth when you don’t know what a person is. Imagine that you come to a site. Let’s say it’s a site you’ve even shopped at before, but you’re on a different device or you cleared your cookies. For whatever reason, we don’t know who you are. You come in, you’re doing a bunch of stuff, you look at some products, you add to your cart, remove from cart, search, and then finally, you decide you’re going to buy. At that point you’re like, “Oh, if I want free shipping, I’d better log in.” By the way, we have a thing to capture login, of course, because that’s part of this business, but once you’ve now logged in, we have all of these events, which we’ve already committed to our database, but they’re now associated with you. Oh, okay. How are we going to do that when we’re using a log-structured database? We’re using a columnar database. For what it’s worth, we’re on GCP, so we’re using BigQuery, but you could say the same thing about Redshift or Snowflake or a million others, Sybase IQ. We can’t update. 
That’s not really a thing, so how do those associations get made? Oh, but wait, it gets worse, because if you’re going to commit some behaviors, maybe you do some stuff and maybe even you log in. Then you put your laptop down and you leave the room and let’s say your significant other comes in and they pick up your laptop and they start doing a bunch of stuff. They actually go to log in and say, “Oh, I’m logged in as David, let me not do that. Let me go log in as me.” Now all of a sudden, we have a bunch of behaviors that need to be separated from you. Maintaining that is pretty complicated. Another thing that makes this complicated is the marketers and the merchandisers are different people. They’re different groups. If we, for instance, get a feed of product information, that product information is very hard to keep up to date. We actually use our observed values as people are going through the site, and that’s the data that we use to determine what the state of truth is about product data. We’re taking raw behavioral data, which we still need. That’s still very important. Who did what when, still very important, but then figuring out, what is the universe of shoppers? What is the universe of products? Those are actually derived off that event feed. One of the innovations that I think really started Bluecore was recognizing that feeds of these things come too late, so maybe we should build a system that doesn’t require that, and that’s what we have.
David Joy: Got it. It’s super interesting because what you described to us is something that we as consumers of these retail applications or websites don’t really think about, but then you, on the other hand, are enabling these retailers and marketers and merchandise people to make the right decisions that eventually help me make the right decision quickly. In many ways, it’s helping me because of what you’ve done.
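To make the idea of deriving product truth from observed events concrete, here is a minimal Python sketch. It is not Bluecore's implementation: the `ProductEvent` fields and the `derive_catalog` function are hypothetical names chosen for illustration, and the rule "latest observation wins" is an assumption about how such a fold might work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductEvent:
    """One observation of a product, as seen while a shopper browses (hypothetical shape)."""
    ts: int            # event time, epoch seconds
    product_id: str
    price: float
    in_stock: bool


def derive_catalog(events):
    """Fold an append-only event feed into the latest observed state per product.

    Assumption: the most recent observation is treated as the current truth.
    """
    catalog = {}
    # Process in time order so later observations overwrite earlier ones.
    for ev in sorted(events, key=lambda e: e.ts):
        catalog[ev.product_id] = {
            "price": ev.price,
            "in_stock": ev.in_stock,
            "seen_at": ev.ts,
        }
    return catalog


events = [
    ProductEvent(100, "red-shorts", 30.0, True),
    ProductEvent(300, "red-shorts", 24.0, True),   # later price observation
    ProductEvent(200, "blue-shoes", 80.0, False),
]
catalog = derive_catalog(events)
```

The raw events are never modified; the catalog is a view that can be rebuilt from the feed at any time, which is what makes this pattern compatible with a write-once store.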
Mike Hurwitz: What we try to do, if things are going the way they’re supposed to, the communications that you get from our partners via Bluecore should be the ones you want. They should be ones that you want to open and you want to act on. If we’re sending you stuff you don’t want, that’s actually a really bad thing, because you’re much more likely to do things like either opt out from SMS or unsubscribe from email, and we don’t want that. You’re more likely to say, you know what? I don’t care what they say anymore. When that discount code that they really want to use to get you to go buy something can’t get to you, then everybody loses, so we try to be, and when our partners are using us in the best way, we try to be very targeted in what we do. Now, there are definitely partners that we have that will happily and especially during the holiday season, send you three emails a day, happily do that, fill your inbox. It is a valid strategy. It is not the strategy that we optimize for.
David Joy: I think that behavior also changes. Every year, consumer behavior changes and the way we use our application changes. I was just joking with my wife. I have stopped using my email as much as I used to two years ago, but I do have all these emails coming to me that I hardly look at. I’m really bad at following up on my emails.
Mike Hurwitz: Maybe I shouldn’t say this in a public forum, but I am, too. What I find myself doing with the way that I work with email, specifically talking about marketing email, because that’s what Bluecore does is usually, I already have intent to go buy something. I’m like, “I wonder if I got a discount code or an offer in my email. If I did, let me go search for that and go work with it.” That’s a totally reasonable thing to have happen. From our partners’ perspective, that’s a win. I decided that I wanted to buy something.
Where I chose to buy it from was in part driven by my email even though all of that other time, when I didn’t have intent to buy something, I didn’t really interact with them at all. That’s fine. That’s a win.
David Joy: I wanted to jump in and understand, what goes into architecting a solution like this or a platform like this? Break down how you put everything together, the decisions you have to make around technology, cloud platform, all those things.
Mike Hurwitz: Like a lot of companies, Bluecore when we started, and to be clear, the company’s 10 years old. I’ve been there not quite six, so a lot of what I’m about to say were decisions that were made before I got there, so I’m standing on the shoulders of giants like everyone else. When Bluecore started, we were targeting a much smaller set of the communications that you might want to send to a shopper. To that end, we also didn’t have any customers, so it made a lot of sense, first of all, to go to cloud. Amazon was off the table because a bunch of the partners that we worked with compete with Amazon. At the time, and again, this is 10 years ago, they weren’t comfortable with the separation between AWS and Amazon as a retailer, so that’s off the table. GCP was the next logical choice. Now I say that. GCP didn’t really exist. App Engine existed. App Engine, which is Google’s platform as a service, is what Bluecore was originally built off. We still have a bunch of stuff there. We have outgrown App Engine for a bunch of our use cases, but App Engine’s fine. I have no complaints about a lot of what App Engine does. It’s not the direction we’re going at our scale, but nine, ten years ago, it made a ton of sense. We built it on App Engine. Messages come in. Everything’s HTTP. They had a work queuing system that they could use. It’s called task queues. We were able to take things in and schedule them asynchronously so that if we get overwhelmed with data, we can just process it later.
Then everything went into a database. A couple of years later, that database got changed from some row-based store. I think it was MySQL, but I’m not really sure. It got changed into BigQuery. BigQuery makes a ton of sense for these use cases because while you normally think about columnar databases in terms of business intelligence type queries, think about what an audience segment is. I want to find all of the people who bought red shorts last year. They’ve been on the site in the last 90 days, but they haven’t bought red shorts yet this year. You’re not making an index for that. That’s not going to happen. You would have to be clairvoyant to think that you can do that. We can’t know what our indices are going to be ahead of time. We’re going to have a ton of data. Most of it gets written once and never updated. Boy, that sounds like a columnar database, so BigQuery has been a great choice for us. I don’t want to say that BigQuery is the best columnar database out there. Don’t get me wrong. I’ve had great experiences with it. If my Google Cloud reps are listening right now, I love you guys, but if we were talking about this in a more abstract sense, picking a columnar store, Redshift, Snowflake, Sybase IQ, Vertica, there are a million of them, makes a ton of sense. Bluecore made that choice probably 8-ish years ago, and we’ve stuck with it. That’s one thing I’ll say is that our source of truth for things like generating audience segments should be a database that has those attributes, can hold a ton of data, can process a ton of data, doesn’t require indexing. If we were building the same thing 10 years ago, we would be talking about Hadoop. Hadoop is great. I don’t want to talk crap about Hadoop. It’s a lot harder to run.
David Joy: Yeah, I was a huge fan of Hadoop when it came out because of what it could do, but then I think I eventually ended up in scenarios where I needed to do things much more quickly and the processing time with Hadoop was way longer.
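The red shorts segment Mike describes can be sketched as a full scan over raw events, which is roughly the kind of work a columnar store does without a purpose-built index. The following Python is an illustration only: the event tuple layout and the function name `red_shorts_segment` are assumptions, and a real system would express this as a query against the warehouse rather than in application code.

```python
from datetime import date, timedelta


def red_shorts_segment(events, today):
    """Return shoppers who bought red shorts last year, were active in the
    last 90 days, and have not bought red shorts this year.

    events: iterable of (shopper_id, event_type, product, day) tuples
            (a hypothetical shape chosen for this sketch).
    """
    recent_cutoff = today - timedelta(days=90)
    bought_last_year, bought_this_year, recently_active = set(), set(), set()

    # Single pass over every event: no index is assumed to exist.
    for shopper, kind, product, day in events:
        if day >= recent_cutoff:
            recently_active.add(shopper)
        if kind == "purchase" and product == "red shorts":
            if day.year == today.year - 1:
                bought_last_year.add(shopper)
            elif day.year == today.year:
                bought_this_year.add(shopper)

    return (bought_last_year & recently_active) - bought_this_year


today = date(2024, 6, 1)
events = [
    ("ann", "purchase", "red shorts", date(2023, 7, 4)),
    ("ann", "view", "blue shoes", date(2024, 5, 20)),
    ("bob", "purchase", "red shorts", date(2023, 8, 1)),
    ("bob", "purchase", "red shorts", date(2024, 5, 1)),   # already bought this year
    ("cat", "purchase", "red shorts", date(2023, 3, 1)),   # no recent activity
]
segment = red_shorts_segment(events, today)
```

Every condition here is ad hoc (a product, two date windows, a negation), which is why predicting the right index ahead of time is impractical and scanning written-once data is the more natural fit.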
I think for people who don’t want to manage Spark, I think BigQuery came out as a pretty good solution, or Dataproc is another solution that folks use. You are using BigQuery in that position, so you get data feeds from different sources? Is it real time or is it batches? I’m thinking batches in my mind.
Mike Hurwitz: All of the above. What happens for us is, I’m allowed to say their name because they’ve used us in their marketing materials. Sephora is a big retailer. They have a huge online presence, but they also have a huge brick and mortar presence. We get from them a bunch of things. We get feeds from them so that we can true up things like customer lists and catalogs and things like that. We also get big feeds of their offline purchases and they’ve got identifiers on there that we can use to associate that, again, with a shopper. We also have JavaScript installed on their site, so when you’re browsing around Sephora, you are sending beacons off to us, so we’re getting real-time events from their site. We are getting, I’ll say, large batch bulk that we’re getting from them. That might be, let’s say a daily feed. I believe we even have some relatively higher-frequency hourly stuff that we’re getting from them. From that, we’re generating all of our marketing communications that we’re doing, emails and SMS and all that, but also, we are sending them feeds of their own data. As an example, our segmentation is really pretty good. Not to toot our own horn, but I guess I should. For Sephora, they want to be able to say, “I have all of this knowledge in Bluecore. How do I unlock that on other platforms?” You can think of all the ad networks that are out there that they want to participate in with their data from Bluecore.
We are not only getting these real-time feeds in, not only getting the bulk feeds in, not only sending out individual messages to recipients, but we’re also sending out feeds that are being used to update things like ad networks, things like their internal databases. There’s a lot of data flying around, and the stuff that is bulk versus the stuff that is not bulk, it gets pretty complicated. When we’re talking about this from a design perspective, one of the things that has changed in the last couple of years at Bluecore, and I am not the person who came up with this idea, but I am a huge supporter and have been a big part of getting it all implemented, was really changing from being request response focused and having request response stuff being very separated from bulk stuff to instead thinking about things in terms of streams. I don’t care if you send me a flat file or you send me a single event over a beacon from the website. A row from that bulk file and the message that you sent are both going to end up in the event stream. Because we now have one event stream that we can use for everything, we don’t have to worry about, what happens if a piece of data goes around through the back door and just lands in the database? That’s not really a thing. For some stuff, it still is, but being able to autoscale up and down based on streams, being able to use some of the message queuing features that are available to buffer information, replay if necessary, has been a real boon. It also means that for us, when we’re thinking about our services, I don’t have to worry about massive spikes, as an example. Let’s say that Sephora, let’s stick with Sephora. They decide that at 3:00 PM every day, they’re going to email every single customer they have, all of their shoppers. I don’t want to speak out of school, but let’s just say it’s got a couple of commas in it. They decide that they want to do that. At 2:59, everything was nice and quiet.
The query runs and it’s a columnar database, so query takes a moment, but once the query is finished, we now have tens of millions of events that we now need to respond to. All of the downstream services that are involved in doing that need to suddenly scale up from basically zero to maximum speed? No, that’s crazy. That’s impossible to manage unless you’re going to have...
David Joy: It can’t happen in seconds, too.
Mike Hurwitz: It can’t happen in seconds because some of these services do have some internal state, although definitely, we try to minimize internal state as much as we can. We’re living in a Kubernetes world like everybody else these days, so you can’t scale them up instantly. It’s very expensive to build systems that can scale like that, and we don’t actually need it. One of the things that has been really interesting for me about Bluecore, and I’ll say this is very different from, for example, Tumblr, we don’t really care about latency most of the time. There are limits, but if we’re going to kick off an email campaign. Let’s stick with my example of 3:00, it is completely unreasonable and no one expects that all of those tens of millions of emails will have been sent by 3:02. That’s not going to happen. Even if it does, who cares? People don’t look at their email and respond to their email in that way. Even text messages, with the exception of some transactional stuff, you care about latency by seconds or minutes. You don’t care about milliseconds. We care about bandwidth. We care a lot about bandwidth. This is where streaming systems work really well. Since I don’t care about the extra latency hit that I’m going to take, there’s already a system that somebody else is managing, which is great, that’s going to buffer all of that information for me, and my service is going to drink from that fire hose of stuff at the rate at which it can.
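The trade described here, accepting latency in exchange for steady throughput, can be illustrated with a small simulation. This is a toy model under stated assumptions, not how Bluecore's services work: `drain_burst` is a hypothetical function, the buffer stands in for a managed message queue, and a "tick" is an arbitrary unit of time.

```python
from collections import deque


def drain_burst(burst_size, rate_per_tick):
    """Simulate a consumer pulling from a buffered stream at a fixed rate.

    A burst of `burst_size` events lands in the buffer all at once; the
    consumer removes at most `rate_per_tick` events per tick. Returns the
    remaining backlog after each tick until the buffer is empty.
    """
    if rate_per_tick <= 0:
        raise ValueError("rate_per_tick must be positive")

    buffer = deque(range(burst_size))  # the whole spike is absorbed by the buffer
    backlog = []
    while buffer:
        # The consumer never works faster than its own capacity.
        for _ in range(min(rate_per_tick, len(buffer))):
            buffer.popleft()
        backlog.append(len(buffer))
    return backlog


backlog = drain_burst(burst_size=10, rate_per_tick=4)
```

The consumer's load stays flat no matter how large the spike is; only the time to drain grows. That is acceptable when the deadline is measured in minutes rather than milliseconds, and it is the reason the downstream service does not have to scale from zero to maximum instantly.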
In a sense, going back to when we were talking about Scala earlier, it feels a lot like the actor model that you got from tools like Akka in terms of how you compose your services. Akka, by the way, is just an Erlang architecture that happened to have been written in Scala. That feels like a great way to respond to this kind of load, and it’s worked out really well for us. We still have some cases where there are services that get hit really hard, really fast in unexpected ways, but slowly but surely, we’ve been working all of those out of the system because maintaining them is very difficult and expensive and there’s no value in it for us. It’s not a problem we actually have to solve.
David Joy: Right. What I liked about what you are saying here and the way you’re designing things is that not many people come out and say this outright. Pretty much everybody we know is designing stuff for scale on Kubernetes, but very rarely do you go back and question, do you really need it or not? It’s also expensive to put something together like this. When you were saying this, what I appreciated in your comments is the fact that there is a very clear, realistic perspective to what your business requires and designing and architecting solutions to cater to that. It’s very good to know that latency is not important, but it’s bandwidth because you have to shoot out lots of emails in a certain amount of time. If it gets delivered after 30 seconds, is it okay? Maybe it’s okay. Also for everybody listening, Bluecore has almost, in my understanding, it’s about 400-plus brands who trust Bluecore to deliver this experience. You have brands like Alo Yoga to Steve Madden, Lenovo to Express, Lulu and Georgia and Sephora. These are really, really good, important brands.
Mike Hurwitz: I think so.
David Joy: Yeah, my wife buys half the stuff from these shops, dude.
Mike Hurwitz: I hope she clicks on her emails.
David Joy: Yeah, except for Lenovo.
Mike Hurwitz: Tell her I said to click on her emails. David Joy: Yeah, but I think half the reason why is because of those emails. Right now, this is where you are with 400 trusted brands. There’s so many more brands to which you have to scale your business, your platforms, your services, your applications to. How are you catering to that? Is the goal still to be within a single cloud environment like GCP? Have you started considering that? How do you make decisions around, what’s going to happen in the future, say, when you increase to say 4,000 brands? Do you start thinking about them now or you deal with it as you get there, a six-month timeframe? Mike Hurwitz: A couple of things about that. One is that the way that we think about scale, for the most part, it’s a complicated question because we do keep data separate. If you, for instance, somehow knock me unconscious and grab my laptop and start trawling around our BigQuery instance, you’re going to find that all of the data for all of those partners is all separate. You would have to try really hard to join those things together because we do nothing to make it easy for you because that’s not a problem that we actually want to solve. Actually, it’s a problem that in a lot of cases, we are contractually forbidden from solving, so that’s good, but when it comes to the size of, let’s say our BigQuery instance or we also use a lot of Bigtable, I’m a big fan. It’s also a really good paper if someone’s looking for something to read. David Joy: Fantastic paper actually, 2007 or 2008, I think. That’s when it came out. Mike Hurwitz: Goes well with the Hadoop paper, got to rep my Yahoo roots. Those tools, from what we’ve seen so far, I don’t want to say confidently, we could definitely go to 10x scale and these tools are going to be fine, because I’m not certain that’s true, but from everything we’ve seen so far, we don’t really have a scale of data problem in those spaces. 
When it comes to the more modern databases that I’ve used, I think those are two good examples, the separation of storage and storage throughput from compute has made it so that that kind of scaling becomes possible. If you said to me, Hey, Mike, go do that using just good old-fashioned PostgreSQL, the first question I would be asking is, what grain should I be sharding on? For us, it would obviously be on the partner level, but that would be the first question I’m asking. How do I shard? How do I group small shards together? How do I identify what the hot shards are going to be? All of these questions come up that on these other systems, I don’t have to care about. It’s nice not to have to care about them, because they’re hard. We still have to deal with things like schema evolution, what does data recovery and disaster look like? Those are all big questions, but when it comes to scaling the systems, that has not been the problem. We’ve got other areas that are, but those have not really been the problems yet. When we think about multi-cloud for what Bluecore does internally, it’s not really necessary at this point. Again, I don’t want to pretend like I know the future, but at this point for us, it would really be borrowing trouble. It’s solving a problem we don’t need to solve. That doesn’t mean that we’re not thinking about multi-region, multi-zone. It’s not that we’re not thinking about disaster and what that looks like for us and our partners, but it does mean that, let’s take for example, BigQuery and Redshift. On paper and at a very high level, they do the same thing, kind of. Now, like all of these tools, the same way you could say PostgreSQL and MySQL, they do the same thing, right until you actually want to do something with them. Then you realize, they solve the same problems, but they do them in very different ways and they’ve got different edges that you might cut yourself on and you have to be careful about. 
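The sharding questions raised above (what grain to shard on, and how to spot hot shards) can be made concrete with a short sketch. It assumes partner-level sharding as Mike suggests; `shard_for`, `hot_shards`, and the `factor` threshold are hypothetical names chosen for illustration, and nothing here describes how Bluecore actually stores data.

```python
from __future__ import annotations

import hashlib


def shard_for(partner_id: str, num_shards: int) -> int:
    """Map a partner to a shard with a stable hash.

    hashlib is used instead of the built-in hash() because the built-in
    is randomized per process for strings.
    """
    digest = hashlib.sha256(partner_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def hot_shards(rows_per_partner: dict[str, int], num_shards: int,
               factor: float = 2.0) -> list[int]:
    """Return shards holding more than `factor` times the mean row count."""
    load = [0] * num_shards
    for partner, rows in rows_per_partner.items():
        load[shard_for(partner, num_shards)] += rows
    mean = sum(load) / num_shards
    return [i for i, rows in enumerate(load) if rows > factor * mean]


# Hypothetical partners: one very large, several small.
sizes = {"big-retailer": 900_000, "shop-a": 1_000, "shop-b": 2_000, "shop-c": 500}
print(hot_shards(sizes, num_shards=4))
```

Even this toy version shows why the questions are hard: one large partner makes its shard hot regardless of the hash, which is the kind of problem a database that separates storage from compute lets you stop worrying about.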
As an example, when I was at Tumblr, we were a big MySQL shop. There were no joins. Zero joins in Tumblr. Bluecore, lots of joins. It is a different solution for a similar problem. David Joy: Oh yeah. As soon as you said you use BigQuery, in my mind, these guys use joins. They use all types of joins and nested queries, lots of data. That’s how you think. Yeah. Mike Hurwitz: It’s because you can. There is, especially in a sharded, low-latency environment, pushing all of that logic to the application tier makes tons of sense. I am not saying that what Tumblr did was a mistake. I’m sure that it’s evolved since I was there. It’s been quite a while, but it was a good architecture. It worked really well. It would be wholly inappropriate for the kinds of problems we’re trying to solve at Bluecore. Now, what I was saying before about BigQuery versus Redshift, they solve similar problems. They do it in slightly different ways, and I’m certain that some of the things that we’ve done to make BigQuery work best for us would actually hurt us if we were doing it in Redshift. For us to go multi-platform or multi-cloud without having to, we’re pushing that off as long as we can. Now, there is a distinction that I want to draw, which is as IT teams have really stepped to the fore, and this is something I’ve seen a lot in the last year or two, where it used to be that at Bluecore, the marketer was really in the driver’s seat about what their platform was going to be and how they were going to work with us, they want a UI. They want to click around. When we talk to IT teams, they don’t want to click anywhere. They want to write code. They want API integrations, but more importantly even than API integrations, they want data level integrations. They don’t want to call my API in order to get hundreds of millions of events out of our system. That’s not a good way to do anything. 
They want it integrated with their stuff, which means that we’re figuring out, and this is something that we’re actually actively working on, how do we take the data that we have, especially the data that we’ve cleaned, and share that with people outside of the four walls of our cloud infrastructure? It’s a different problem today than maybe I would’ve thought about even just a couple of years ago because of tools, whether it’s Snowflake or Cockroach or a lot of these tools that are meant to be cloud native, that are meant to be outside of any particular infrastructure or installation. There’s a lot changing there. It’s very exciting. It’s a lot of fun. David Joy: Yeah, it’s fascinating. It’s fascinating what you were sharing. I was coming into the conversation. I was researching what Bluecore is doing, and you guys are leading some really cool innovations in this space, obviously. Obviously, the amount of data that you’re dealing with from one particular brand itself takes a lot of work and effort, and it’s put together in a way that’s digestible, that’s decision making, leads to making decisions. One of the things I was really interested in was also understanding, how are you guys integrating with the latest innovations in AI and managing the scale of all of that? Mike Hurwitz: I’m happy to say that for Bluecore, machine learning is not new. When I joined, again, that’s almost six years ago, I joined the data science team. We’ve been at this for a while. What we’re finding is there are a lot of assistive technologies that really didn’t exist a couple of years ago, where large language models have been really important to make marketers' lives easier, but it’s helping them do the thing that they’re really good at. As an example, one of our engineers put together a demo that was generating good subject lines for a particular communication. There was a feedback loop, and it wasn’t just like, go ask ChatGPT, hey, what’s a good subject line for my email? 
It was a lot more functional than that, I’m happy to say. We went through and all of us, I will tell you, I was blown away with the quality of what came out the other end. I was fully expecting there to be hallucinatory subject lines that we could all laugh at later, and that wasn’t the case. There were times where it needed correction, but you could give it correction and it did just fine. A little bit of slick UI on top of some prompt engineering, and we were able to get something good. However, what that doesn’t really help with is personalization. If you want to say, I have a million people I want to contact about something, but what’s the best way for me to talk to Mike about this? That may be different from the best way to talk to David about the same thing. There are different things that we’re interested in, and that’s fine. That’s where these large language models, it’s not that they don’t have value, but it doesn’t actually solve the hardest problem, which is understanding how you and I are different and applying things to it. When it comes to those types of problems, we’ve been at it for years and we’ve got some really smart people who work on it. I’m constantly embarrassed by my math skills or lack thereof. One of these days, I’m going to try to learn linear algebra, but it hasn’t happened yet. David Joy: Highly recommend it. Mike Hurwitz: What’s happened for us is that it seems like a lot of what we’ve been working on for a while, the world is waking up to, oh wait, marketing correctly, marketing effectively requires more than just segmenting based on things like behavior. 
It requires understanding and learning about who your customers are and about your own data, about, whether it’s catalog or shopper behavior, all of this stuff comes together to create something a lot more powerful than just having the best subject line because the best subject line in the world doesn’t matter if, first of all, it’s not relevant to me, which implies it’s not the best anyway. If it’s not relevant to me, it doesn’t come at the right time, it’s perceived as spam, this is where a lot of what’s happened recently, and don’t get me wrong, we have things that we’re exploring. There’s a lot that’s exciting there, but it’s a lot more useful to the people that use our systems rather than to the internals of our systems themselves. David Joy: I agree. I agree, and especially what you were saying. I have been in scenarios where people are using these large language models in a RAG system where they have specific data that they’re adding to generate something specific. I think one way that I’ve seen you can reduce hallucinations is, of course, by adding negative prompts, say, don’t include these things, but one of the things that the world is getting aware of is the idea that large language models generally are good at generalizing. I like the point that you made that it’s not good for personalized, but that’s where our traditional machine learning programs like your KNN or your classic recommendation models, there’s so much work that has been put into those areas, and there’s a classic combination of these two that can actually produce some amazing, mind-blowing results, actually. 
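For readers unfamiliar with the KNN idea mentioned here, a minimal sketch of nearest-neighbor lookup over embedding vectors follows. It is an exact, brute-force version in plain Python with made-up three-dimensional vectors and hypothetical names (`cosine`, `nearest`); real recommendation systems work with hundreds of dimensions and usually rely on specialized libraries or vector databases rather than a scan like this.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: list[float], items: dict[str, list[float]], k: int) -> list[str]:
    """Return the names of the k items most similar to the query vector."""
    ranked = sorted(items, key=lambda name: cosine(query, items[name]), reverse=True)
    return ranked[:k]


# Toy product embeddings (invented values, for illustration only).
catalog = {
    "boots": [1.0, 0.0, 0.0],
    "sandals": [0.9, 0.1, 0.0],
    "laptop": [0.0, 0.0, 1.0],
}
print(nearest([1.0, 0.0, 0.0], catalog, k=2))
```

The scan compares the query against every item, so its cost grows with both catalog size and dimension count; that cost is what approximate vector search tools are built to avoid.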
Mike Hurwitz: One of the things that I’m finding really exciting that’s going on now, and we’ve done some dabbling with it, we can get into why we went the other way, at least in the short term, but when you mention things like KNN, having databases that can answer that question effectively, where previously, you thought about, whether it’s an R*-tree type of database to do geo things, that’s fine. Great. Now what do I do when I have 300 dimensions? You’re not doing an R*-tree, I can promise you that. We did some experiments. The name of the database just fell out of my head. We did use Facebook’s F-A-I-S-S, or I think they call it Faiss, I’m not really sure. We played around with that for a bit to try to do real-time recommendation stuff, and it’s really powerful. There’s not just cool stuff going on in the large language model space. There’s really neat stuff going on with feature stores. There’s really neat stuff going on with vector databases and vector search, and I think it’s a very exciting time. Part of that is because memory has become so damn cheap, comparatively speaking. I still don’t have a terabyte of RAM in my laptop, but maybe one day. Memory has become so cheap, relatively speaking, that these kinds of problems are now solvable in a way that they weren’t before, but for machine learning beyond just large language models, it’s been a really exciting couple of years. It’s a lot of fun. David Joy: I only anticipate it to get wild in 2024 and the rest of the coming years. When you were talking, I was reminded of, I had a brief stint at Walmart. I was working with the Sam’s Club project team at that time, and I remember one of the projects that I worked on was to influence the product click-through rate and make sure somebody puts something in their checkout, basically, the cart in checkout. 
We used to run a lot of A/B tests at that time to give different users with, say, different age groups and I would say, same characteristics, different views, and then we would measure the A/B test results and then push a certain amount of view on the Sam’s Club website actually. We did a bunch of crazy stuff that you are doing, but in a very small space for a very specific use case. It was very interesting to see how all of that mapped, and we used similar technologies to what you’re using right now. Mike Hurwitz: We do a lot of A/B testing, both internally, so we have what we refer to as covert A/B tests where we’re going to make a change on our side. If we want to update a model, as an example, we’ll run a covert A/B test to make sure that we maintain or improve the quality of our models. We also have client-driven A/B tests because as you can imagine, if you’re a marketer, you are looking to sell stuff, or more importantly, you’re looking to get eyeballs on stuff. They are constantly being driven to measure their own performance and improve it. If we’re not able to provide them the tools to do that, they’re going to go somewhere else. Now, we also need to answer those same questions, so a lot of the tooling that we have in order to answer those questions externally, we can also use internally. There are some caveats to what I just said, obviously, but as a general statement, we are very data-driven in terms of how we drive marketers to reach their shoppers. Again, if all you want to do is send tens of millions of emails, there are cheaper ways to do it than through our platform, or if you want to send SMSs or site impressions or whatever. On the other hand, if you want to make those count, you better have something that’s a little bit smarter than that. We try to be a lot smarter than that, but that’s actually the mission. It’s not just about reaching people. It’s about knowing how to do it correctly and at the right time and with the right message. 
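One common building block behind A/B tests like the ones discussed above is deterministic assignment: the same shopper always lands in the same group for a given experiment. The sketch below shows that technique under stated assumptions; `variant_for`, the experiment name, and the 50/50 split are hypothetical, and the source does not say how Bluecore assigns groups.

```python
import hashlib


def variant_for(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Assign a user to 'treatment' or 'control' using a stable hash.

    Hashing the experiment name together with the user ID keeps the
    assignment stable across runs and independent between experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 10_000
    return "treatment" if bucket < treatment_share * 10_000 else "control"


# Hypothetical experiment comparing a new model against the current one.
groups = [variant_for(f"user-{i}", "model-v2-rollout") for i in range(10_000)]
print(groups.count("treatment"), groups.count("control"))
```

Because no assignment table has to be stored, any service that knows the user ID and experiment name can recompute the group, which suits tests that span email, SMS, and on-site impressions.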
David Joy: It’s amazing. As I was saying, Bluecore is doing some interesting stuff, obviously. That’s why you guys are doing what you are. Almost 10 years as a company, 400 plus brands. It’s fascinating. I wanted to get into conversations about your experience at Tumblr. I remember last time when we met, we were talking about your Tumblr stories, and now you’re wearing a Yahoo T-shirt, basically. Tumblr obviously got bought by them, right? Mike Hurwitz: Got to represent. David Joy: Yeah, I know. Mike Hurwitz: I want to say I joined a fabulous social network and had a wonderful three years working there, and all I got is this sweatshirt. David Joy: Man, that’s awesome. Mike Hurwitz: That is totally not true, by the way. I loved my time there. David Joy: Yeah. Here’s what we said. We were joking about this. Tumblr was a social app that came out at the wrong time. If it came out today, it would have been a very interesting solution, I would say. Obviously, with some changes to it. I was just thinking about it, but yeah, obviously, we couldn’t get into some of your Tumblr stuff. I know we are at time. Tell us a little bit about how people can catch up with all the awesome Mike Hurwitz updates. Is it LinkedIn? Do you write some stuff? Where can people follow you? Mike Hurwitz: You can find me. I’m Danger Mike pretty much everywhere, so you can find me. I think I am @dngrmike on Twitter. You’ll find me as Danger Mike on LinkedIn. You can find me on Bluesky. You can find me everywhere. You can find me on GitHub. David Joy: Yeah. That’s cool. Is that also your Slack name within the company when people are reaching out to you? Reach out to Danger Mike. Mike Hurwitz: It is, and it’s funny. We had a new employee start just this morning and he’s like, “Do I call you Danger?” I’m like, “I don’t know, man.” It all started because actually, when I joined Tumblr, I joined a team of seven people and I was the third Mike. David Joy: Oh, wow. 
Mike Hurwitz: In a team of seven, so nobody got to be just Mike. Actually, there was one guy who got to be just Mike, but he’s cooler than me. David Joy: Just Mike. Probably the first guy. Yeah. Oh, man. It’s awesome. Mike, I wanted to say, this has been such a fascinating conversation hearing about the cool things you guys are working on at Bluecore. It’s definitely fascinating. I just hope we can continue this conversation into a version two, have a catch-up call, dig into your Shutterstock and your Tumblr story. It’s been fascinating hearing what you guys are working on, obviously, but more fascinating to see how you as an architect are building all these things and designing applications and services for scale. Obviously, when I meet you next time, I hope we can do a little bit more shit-talking about AWS Redshift. Hopefully, we’ll edit that out, but it’s been an absolute pleasure having you. Mike Hurwitz: It’s been great talking to you, too, David.

Big Ideas in App Architecture

A podcast for architects and engineers who are building modern, data-intensive applications and systems. In each weekly episode, an innovator joins host David Joy to share useful insights from their experiences building reliable, scalable, maintainable systems.

David Joy

Host, Big Ideas in App Architecture

Cockroach Labs
