About this talk
According to Research Firm ESG, 1TB of production data turns into 8 to 9TBs of actual enterprise data, because people across the company are copying it for other purposes. What if you could capture that production data in its native format, quickly preserve it in your own data lake, and then be able to go back in time at a moment's notice? That's a data strategy!
Watch this episode of InsideAnalysis to hear host, Eric Kavanagh, interview GRAX CEO Joe Gaska, along with Cloud Visionary Sarbjeet Johal, as they explore new ways to get value from data in the cloud, including Salesforce. They'll discuss data ownership, data access in the cloud, and how to ensure that you've got a viable insurance policy for when bad things happen!
[MUSIC PLAYING] RANDALL BOETTGER: The information economy has arrived. The world is teeming with innovation as new business models reinvent every industry. Inside Analysis is your source of information and insight about how to make the most of this exciting new era. Learn more at insideanalysis.com. And now, here's your host, Eric Kavanagh.
- All right ladies and gentlemen. Hello, and welcome back once again. It's time for the only coast to coast radio show in America, all about the information economy, Inside Analysis. Yes, indeed.
Your host here, Eric Kavanagh with two visionaries on the call today. We've got Joe Gaska, the CEO of GRAX and Sarbjeet Johal, a Silicon Valley visionary who knows all about the cloud and lots of the trends that we're seeing in the marketplace. And he's got some big ideas that we may talk about on the show today.
But the topic it's very interesting folks. We're watching a real renaissance in several different areas of enterprise computing. And there's one space that's always been very sleepy and very boring, no one ever wants to talk about it. If you did a webinar on this topic a few years ago, nobody would show up because nobody cares. I mean, they should care, they really should, but it's just boring. It's called backup.
But backup is very interesting these days because a lot of people have been thinking about concepts like cloud native for example and concepts like reuse. So if you think about where we've come in the last, gosh, 10 years or so. Really 20, but in the last 10, and especially the last five, things have changed so much and it's largely because of the power of enterprise cloud computing.
Google, Amazon Web Services, Microsoft of course, has pivoted hard to the cloud with Azure. Tremendous amount of energy being placed in that environment. And hats off to Microsoft for really pivoting hard and succeeding well. And I always joke that it's ironic that Microsoft would save us from the monopoly of any company, in this case Amazon Web Services. But it's good, competition is good, it's healthy.
But when you think about data and using data, well, when you're trying to do analysis, you're trying to build out some new business process, where do you get your data from? Well, typically, from a database. That's historically what you would do.
And these databases got bigger and bigger and more powerful and stronger, concurrency issues were fine. But the point is, at a certain point, that database becomes a choke point. And if everything in your enterprise is leaning on one choke point, that's going to be a problem and you're going to run into some issues.
Now, there are lots of issues about using other sets of data. We know all about silos, as they call them single location, servers, for example, [? are ?] repositories of data that are not reconciled with the rest of the organization. That's an issue. We've talked all about master data management strategies, for example, to reconcile all these different systems and make sure that you have that cohesion across your organization from a data perspective.
But now there's this whole concept of data fabric being thrown out there too, which again gets to the whole art of provisioning that data, of getting it where it needs to be for who needs it at which particular time. Well, where this all intersects with backup is that a number of clever folks have realized, hey, your backup is ideally a clean version of your enterprise data. So depending upon how much latency there is between the time it was captured and the time that is now, it could be a very relevant data set. And for certain use cases like machine learning, artificial intelligence, for training on that data, well, guess what? The backup turns out to be a pretty good substitute for your live production data.
So all of a sudden, you have these companies figuring out that there's a way to really use and reuse data that's sitting in your backup environment. And there are different ways to do backups. So that's what attracted me to the folks at GRAX. And we had Joe on the show last year and again, just a few weeks ago talking about why do you want to rent your cloud data.
So GRAX came up with a pretty interesting theory of how to essentially insert into your cloud environment, like your Amazon Web Services or, primarily, your Salesforce environment, and siphon off data at a frequency that is meaningful to you. So maybe it's every day or every hour or whatever really makes sense for you, and store it in its native format. So that's instead of compressing it down to a CSV file, for example, which is often done as a way to get cheap storage. Well, that also makes it harder to reuse that data because you have to extract it, then you have to unpack it, basically expand it out. Sometimes there are problems in that process too.
So long story short, there are just some really interesting things happening in terms of how companies will leverage their data. Not just their production data, but their backup data. So with that, let's bring in Joe Gaska of GRAX, tell us a bit about what's going on out there and I just keep marveling at how fast things are changing.
If you look at, like, just very quickly, that whole Hadoop space that spun up and then dissipated for lots of different reasons, but one of which is because cloud storage got real cheap. One of the big value propositions for Hadoop was cheap storage and a file system, but then there are all kinds of security issues and reusability issues. All kinds of stuff happened.
It was fun to watch. The folks at Cloudera can tell you all about it. But Joe, start talking so I can stop.
- I think you're one of the only people out there that gets as excited as I do about backup. It's like the pre-COVID days, like no one really got excited about selling toilet paper, but then all of a sudden, there's nothing on the shelf.
[INAUDIBLE]. Cynthia Smith, she agrees obviously [INAUDIBLE] chat there. So one thing you said, when things get cheaper, we all know the relentless progression of Moore's law. Computing, storage, all of that is going to continue to get cheaper.
And when we really thought about historical data and the fact that having the highest fidelity historical record accessible at a time that you want to really reuse it, backup if you really think about it, it's a tactical obligation that every business has to do. Whether it's disaster recovery, whether it's data issues, whether it's auditability whether it's compliance, regulatory, backup data is a necessity that has to exist. It's not just the strategic things that we talk about, but there is all these necessities that every business has to think about.
And what we wanted to do, and one key thing that I try to press on everyone and why I'm so passionate about this, is we wanted to protect the best interests of every one of our customers, and by doing that we didn't want to take the data and lock it away in our cloud and then rent access to it back to the customer. We want to put it in its most rawest, purest form directly in the customer's cloud of choice. Whether it's Azure or AWS, I can tell you good things about both. We put it in its most rawest, purest form in the customer's storage of choice.
And for some of our customers, data never leaves their environments that they own. So we maintain a chain of custody of data. So whether you're the federal government, whether you're a foreign entity, whether you're insurance, banking, any regulated industry, data never leaves their environment. It's stored there forever. And this is critical.
We can get into why, because of the sheer mass or volume of backup data. If you store it in other people's clouds, this is where APIs break down: terabytes, petabytes of data, trying to do anything meaningful with it. It becomes very interesting what you can do with that. And obviously backup data for me is all about history and the historical record. And the historical record isn't just an insurance policy.
- That's an interesting concept too that you throw out there because we've talked about this on other shows. But your data history tells us a lot about your company and where you're going and what the trajectory is. And so you can learn.
It's funny how little people like to learn from history because we're always looking forward. There's a great movie in which I think it was The Great Race where the Italian driver gets in the car and says to the guy next to him, he goes, the first rule of Italian driving, what's behind you is gone. He rips the rearview mirror and throws it away.
And that's how we treat data a lot of times. It's like, Oh, forward looking, forward looking. What's next, what's next. But if you look back, you can figure out what you did right and what you did wrong. And unless you do that, your improvements will be largely accidental.
- And if you think, everyone wants this dream, this panacea of machine learning and AI. But what is it? It's signal variance. What do you need to train a signal with? It's the historical data.
So when we look ahead of where you want to go with these complex learning systems, the one key point is an accurate historical record because knowledge is based from history. You're 100%. I mean, that's really when you start getting into this, you have to have that accurate record. And it becomes super interesting.
We're talking to some of the most biggest brains in the world, the things that they're starting to think about doing with their backup, like you said. People are finally realizing that, wait a minute, I have all the historical record of all of my most strategic data stored in the SaaS cloud. For example, Salesforce.
Data in Salesforce isn't one and done. When you change data, the fact that you changed the data and who changed it and when is very interesting for a company, whether it's a salesperson, whether it's customer service, whether it's marketing, those change events become very interesting for the business itself. And that's [? what ?] we started with backup, let's just get everything in a place that has the highest economies of scale for customers.
And that's when we store in its most rawest, purest form in these clouds. And that's really where it became interesting for us. We started rethinking or reimagining how we deploy GRAX, why and where we store it. And that's for me even my biggest passion is we are not storing the data in our cloud, but we're putting it in the customer's cloud because I believe if you're going to protect the best interests of the customers, you don't hold them hostage.
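Editor's note: the idea of keeping every change event (what changed, who changed it, and when) can be illustrated with a minimal Python sketch. This is not GRAX's implementation or any Salesforce API; the `ChangeEvent` type, the `record_as_of` function, and the sample data are all hypothetical and exist only to show how a full change history lets you rebuild a record as it stood at an earlier moment.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeEvent:
    """One hypothetical change to one field of one record."""
    record_id: str
    field: str
    new_value: str
    changed_by: str
    changed_at: datetime


def record_as_of(events, record_id, as_of):
    """Rebuild one record's field values as they stood at `as_of`."""
    state = {}
    # Replay events in time order; later changes overwrite earlier ones.
    for event in sorted(events, key=lambda e: e.changed_at):
        if event.record_id == record_id and event.changed_at <= as_of:
            state[event.field] = event.new_value
    return state


# Illustrative history only; names and values are made up.
history = [
    ChangeEvent("acct-1", "stage", "Prospect", "alice", datetime(2021, 1, 5, 9, 0)),
    ChangeEvent("acct-1", "owner", "alice", "alice", datetime(2021, 1, 5, 9, 0)),
    ChangeEvent("acct-1", "stage", "Customer", "bob", datetime(2021, 2, 1, 14, 30)),
    # A hypothetical bulk update that overwrote the stage by mistake.
    ChangeEvent("acct-1", "stage", "Closed Lost", "it-batch", datetime(2021, 3, 1, 2, 0)),
]

before_bulk_update = record_as_of(history, "acct-1", datetime(2021, 2, 28))
current = record_as_of(history, "acct-1", datetime(2021, 3, 2))
print(before_bulk_update)
print(current)
```

Because nothing is overwritten in the history itself, the same list answers both the tactical question (what did this record look like before the bad update?) and the strategic one (who changed what, and when?).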
- That's good stuff. It's actually a good segue. I'll bring Sarbjeet in to talk about this. And first I'll just throw an interesting curveball here.
We're talking to another company, Dremio. We'll do a webinar with them next month. And they are the folks who are leveraging Apache Arrow, which is this really interesting open source project that allows you to query in a federated fashion, to reach into different cloud data lake storage systems environments. And because they've done some work on this in-memory processing, they're able to align the access of the data with the execution engine fairly tightly.
And so the point is now, you don't have to ETL all that stuff out of the cloud to make use of it. You can just do this distributed query where you're reaching into these different environments, and it's like, man, that is what we've been hoping for years. But I'll throw that over to Sarbjeet just to comment on. Go ahead.
- I think there's a lot to unpack there. So the key aspect of consuming SaaS is-- I talk a lot about consumption economics in the technology space. I'm an economics major. So how do we consume stuff? It matters. Now, how we consume technology matters.
So when we consume SaaS, there's a big side effect of that. That's data silos. My CRM data is sitting with Salesforce, most of my CRM data, and my HR data, HCM data is sitting with Workday from consuming HCM as SaaS, and my profit and loss and GL data is sitting with SAP or Oracle. Most probably you're not using Workday's, and Salesforce doesn't even have those kinds of applications, if you will.
So it's all over the place. And then at some point you have to marry that data and make sense out of it. And then all this [? the ?] end of data warehouse and traditional data warehouse is evolving and then now, data lakes and then now, we have big names like Snowflake coming in with data fabric and all that stuff.
So everything revolves around data. And data needs to be kept as liquid as it can be. So you can maneuver quickly.
So we talk about the liquidity of assets in the financial world and we need to think about data in the same sort of terms, I believe. So yes, there's a lot of data sitting in our backups. When we retire data, we don't want to keep all the data in our transactional systems for performance reasons, for cost reasons and all that stuff.
I believe that we can leverage a lot of that data in data science applications especially to start with in the beginning. Like when you're training the models, a lot of people need data for that. Data science eats data for a living. That's what it does. So you need that and then you need to maybe anonymize some of it if you want to give it to a third party, your partners.
So I think what was said earlier about keeping the data into the most purest form, I think is the key. So that the liquidity of the data or the usability of the data is when somebody gets it, it's like very usable at that point. You don't have to go through a lot of pre-processing or making it usable. I think that goes a long way.
I think we can get into more discussions around whose cloud, where does that data sit in, and what is the difference between making data more pervasive or accessible versus reducing the cost. I usually say this that the most-- I mean the best leaders in tech are the ones who treat their platforms as their number one assets. And they don't treat that as spend, right? So I'll stop there. And I think we'll get into some other discussions in that context.
- That's a very good point. Because again, if you're thinking strategically, you're thinking in terms of value, like I'm a supply side guy in terms of my mindset. I saw a comedian joke about how every married couple has the one person who wants to just go out and have fun, let's live for today, and the other person of course hates fun. That's a joke, he's a comedian.
But the point is, you do have to monitor your costs and you don't want to get out of control with all that kind of stuff. But if you're just trying to cut costs, that's not a forward looking vision at all. Right, Joe?
- No. Absolutely not. One thing that's super interesting is if you think about it, we did some research with ESG group. And they estimated that for every one terabyte of production data in the SaaS, there's eight copies of it in the enterprise today.
You really think about all of that risk and liability and compliance and regulatory, and you think about all of those pieces that go into storing and backing up all that data. But one key thing as we were speaking about earlier is, storing in its most purest form and not deciding on what is and what isn't backed up and how much data there is, that's really where it gets very interesting very quickly. And as there's a lot of different consumers downstream and what do they want to consume and how much of it, that's really a business decision more than anything.
And really starting to tease that out and say, OK, what's the access? You mentioned Snowflake earlier. A lot of our companies are using Snowflake for all their data, but we have all the data scientists using R or using Redshift or using all of these other pieces, because you have different consumers within the enterprise. How do they get access to that data? What APIs, which is even worse, that we've seen as we're adding more sources to capture all the history?
The one thing that becomes an evil breakdown is APIs break down with large data. So trying to move large data or large volumes, think about for a minute we said, there's eight copies of the data in enterprise. What if your data scientists, your data backup people, your sales ops, your marketing ops, everyone's hitting your APIs to suck your data? And you're losing--
- Now, let's break on that. We'll be right back. You're listening to Inside Analysis.
- I thank God that Zoom has that warning, you are muted right now. You start talking, it's like, Oh, yeah I got to unmute. That was pretty clever.
Zoom has done a lot of really clever things I have to say with their platform. I just love the fact that each show they save as both a video and an audio only file. I'm like, ka-ching. Thank you. That's podcast friendly.
- My last company was bought by LogMeIn.
- Oh, really?
- Yeah. So we were their IoT platform. But it's amazing how fast Zoom came in the market and captured the market quickly just from usability and pieces from the go to market. I mean, we were GoToMeeting.
- Oh, yeah.
JOE GASKA: And now it's just Zoom everything.
- It's amazing. I mean, there are probably 50 different vendors in that space. I mean Cisco of course. I think it was you, Sarbjeet, who pointed out to me that Zoom had surpassed Cisco in market cap. And you're like, Cisco is an infrastructure company, they actually make stuff, and they by the way have Cisco Webex, which has been dying on the vine for years.
- Zoom's founder came from Webex. So I hooked them up to Oracle Cloud back then when I was at Oracle. So I know these guys. Very small team, it was like around 10 people when I met them. It's just-- I think pandemic just put them-- they were doing pretty good. They have big vision actually.
I mean, they're not even scratching the surface of that. They want deeper application integration into Oracle application, into SAP, into Salesforce and stuff like that.
- I mean, think about the unstructured data that they could capture in the meeting context, and it just--
- There are companies doing that. What's the one that's doing that exact very thing, where they're actually, they get a transcript of every show. Every time you use the tool, they're grabbing transcripts of all that, doing NLP, doing text analytics, and then coming back and saying, OK, you sales guys, you should use what this guy said because that seemed to work. I mean that's really, really impressive.
- I did 30 integrations to Oracle's CRM from these startups. I mean, Zoom was a startup back then. So when you are having a sales meeting, most of the meetings were with the customers or sales meetings or presales, sometimes during sales, some post sales. So they touch the sales. And then doing the transcript.
And also within that getting the documents, sharing the documents, the like, what do you want to send them somebody asked for something and then follow up meetings, I mean, you can do a lot more in these meetings with CRM software.
- That's right.
- That's a whole new generation. We got about 23 seconds. We had a couple of good questions from the audience too, so I'll throw those [INAUDIBLE].
- Had someone to fix email and Slack. Because they're both, after I scroll out of Slack, the information's just magically gone. And then email, you lose control of it if you blink your eyes.
- That's it. That's where I live. That's my life. 8 seconds, stand by.
RANDALL BOETTGER: Welcome back to Inside Analysis. Here's your host, Eric Kavanagh.
- Oh yeah, baby. Take us to the future where everything just works and it's all free, right? That's what I'm talking about. Folks, we're talking all about cloud, and cloud data, and ownership of data, lots of interesting topics. We had a couple of really good questions here so far. We've got Joe Gaska of GRAX, and of course the cloud luminary Sarbjeet Johal on the line today.
But I wanted to throw this question over at you, first Joe and then maybe Sarbjeet. It's kind of what I was talking about earlier, one of the attendees is saying that we used to always have conversations around backup, and how important it was, et cetera, and I just don't hear those conversations anymore.
Is that because we're just too busy focusing on revenue, or-- it's an interesting question, right? Because someone has to focus on it, someone has to pay attention. And if that person isn't paying attention and disaster strikes, boy. [WHISTLES] That's a bad, bad day at the office. But Joe, what do you think? It was a lady, by the way, who asked.
- This is a great, great topic, and it's a great question. So here's the thing: the reason why no one thinks about business continuity, we hear over and over, is because everybody assumes when you sign up for a SaaS vendor that they take care of it for you. It's magic, it's in the cloud, everything's there. But a lot of retention policies for a lot of these SaaS clouds, for example like Salesforce, is six months.
You can't call them up and say, hey, something happened, restore my data. They don't have a backup service that they offer anymore. So they have it if their servers go down they can recover the data, but if one of your IT folks goes in and does a mass update and changes all of your data, how do you recover from that? How do you get that data back?
All of those-- as we were talking earlier-- those tactical obligations for your business, just as if when the data-- as we were talking about earlier-- the data silos or databases inside of your infrastructure, it's the same as externally as well. If something bad happens, or data is updated, and how do you recover, and how do you restore it? That still exists today. And we talk to everyone about this all the time.
You really have to think of GRAX and a lot of what we do. We came out to be basically the black box for your SaaS applications. It just records everything. If you need it, you can grab it, or if you want to get it downstream. The need for that exists in any SaaS application that you're using, and that's one of the most critical things. Not just the continuity parts of it, not just the insurance policy.
- And there's another fun question from an attendee who wrote, have you had any moments when a customer will actually realize looking at changed data that, wow, I didn't want to see that. It's interesting there's a certain aspect of revelation that occurs if you can get visibility into all that. Obviously, we have to have governance and so forth. But yeah, I think almost every time someone takes a hard look at their enterprise data, they're going to be, oh no. They're going to figure something out that's wrong, right Joe?
- So a lot of the really interesting questions that I've been asked over the years for data is all about business velocity changes, customer attrition rate, customer acquisition rate, growth rate. All of those things come from historical data, and all the really interesting things that people do or don't want to see is all about influencing or affecting those velocity changes, or predicting inflection points of my business rate decrease is actually happening, or increase, and I need to focus.
All of those pieces that everybody has been looking for as soon as they invented the spreadsheet, that's what people want to look at and say, what's my growth rate or what's my chart that I can see? But yeah, we've had a lot of people that look at their data and they start to realize trending analysis. And a lot of those things that people say, or they equate to artificial intelligence, are really about statistical anomalies with your historical data.
So a lot of those pieces that people think that the AI is not that unachievable when you start thinking about show me the statistical anomalies in my data. You feed enough historical data, that exists. And that technology is, if you think about the basic signal variances that have existed since the 1940s.
So it's signal processing, and that's getting enough signal variance in there and processing as you can start to detect the anomalies. And that's with historical stuff, but obviously your interpretation of whether it's positive or negative for the business-- there's a lot of indicators that you have to look at and say, whether it's visualizations, that's where it becomes really interesting. But yeah, we've definitely had a lot of very interesting things that people have discovered.
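Editor's note: the point that much of what gets called AI is really about statistical anomalies in historical data can be shown with a small Python sketch. It assumes the simplest possible technique, a z-score against the sample mean and standard deviation; the `find_anomalies` function, the threshold of 2.0, and the weekly figures are illustrative assumptions, not anything described by the speakers.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value, z_score) for points far from the mean.

    A point is flagged when it lies more than `threshold` sample
    standard deviations away from the mean of the whole series.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # A perfectly flat series has no outliers.
    anomalies = []
    for index, value in enumerate(values):
        z = (value - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((index, value, round(z, 2)))
    return anomalies


# Made-up history: new customers per week, with one unusually bad week.
weekly_new_customers = [40, 42, 39, 41, 43, 40, 38, 42, 41, 12, 40, 39]
anomalies = find_anomalies(weekly_new_customers)
print(anomalies)
```

The check only says that a week is unusual. Whether that is good or bad for the business still takes interpretation, and it only works if the historical record is complete enough to establish what normal looks like.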
- Yeah, sure. And I'll bring Sarbjeet back in. As I look at the sprawl of cloud, it's really quite amazing. You look just in the MarTech space, 7,000 applications that are tracked by the MarTech. 7,000. 7,000! That's a lot. And they all have their own data models, they all have their own unique IP, their own way of doing things. And it's not an apples to apples. I mean there's a lot of overlap if you go from one tool, to another tool, to a third tool, to a fourth tool, but it's usually that one part that this system doesn't do that you really need, right?
And that's why you wind up with this best of breed approach, that's why you wind up using all these different applications. And I think we're just at the beginning of the era where if it is in AWS, if, let's say you have five different technologies that are all built on top of AWS, I think we're closing in on the time when companies will be able to very richly and strategically weave that stuff together.
I was joking with someone the other day who pointed out that, well, maybe Amazon isn't too keen on making you do that since they're making money five times now every time you're using all five of those tools in that environment. But nonetheless, there is something to be said for coalescing what is taking place in the cloud. But what do you think, Sarbjeet?
- Yeah I think this is a lot to-- I mean it's very weak kind of question. But there's one sort of mechanism, and if you take the best practice or best architecture systems, you know how do we in a weight, if you will. There's one construct in our thinking frame which is like, you waste what is the cheapest resource. Right?
As Joe said earlier, compute is becoming cheap, right? And that means you can end up wasting. That means you can use that a lot more. And Joe said that earlier, I think before we started broadcasting this whole talk to the public, that we can easily now take the data which is in the backup and make it usable using AWS cloud provided services.
There are so many tools available to us today which were not even available two years back, forget about five years or 10 years back, right? So there's a lot more compute available to us on demand, and we just pay for it when we use it. And mostly we will do that kind of stuff in batch process, and if you're even more clever you will use reserved instances, even cheaper computing resources.
So I think there's a lot you can do with the data. I think in that case, the data-- what I'm trying to say is compute is cheaper now, the data is very expensive. The data is your core asset, if you will. And you want to leverage it as much as you can. And I think I said that earlier, I'm an economics major, and I am bringing these time tested concepts from the financial world.
In the financial world, we talk about the stocks and flows, right? There's a stock of data, and there's the flow of data. Data scientists mostly focus on the flows, less on stock. I think we have to focus on stock. If you study financials, just go to YouTube and just watch a video from MIT, or Harvard, or whatever, you see OpenCourseWare on these two things, stocks and flows, just watch that.
And then it will open up your mind about data, how can we use data as an asset in two of these forms, and then how can we make the best out of all of that? Yeah, I'll stop there. These are abstract concepts, but they do go a long way, actually. That's what I think will move us forward.
- You know that's one interesting thing about your background with financials that I've always thought about is everyone in the financial world had it correct first, and they back test their models. They back test their models when they start to build financial models with historical data, and I really feel like with computing now and transactional systems like with the SaaS world, being able now to back test everything from business processes with historical transactions, it becomes very interesting.
I think we're really getting to the point now where that's actually plausible to do that, like having every transaction from your business, and before you're making these changes. If you think about-- there's one thing that we're leaning into a lot is any regulated industry coming up that deploys a machine learning or AI model is going to have to prove, if they're regulated, that they back tested it, and they have to store the fact that they back tested this model with historical data.
So it's becoming very interesting, and a lot of people are starting to really think about all of that from a regulatory and compliance point of view. But we always go back to the financial world, like you were talking about, your history is they back test their models. And now we're getting into that point when that becomes incredibly important for a lot of these businesses that we're talking about.
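Editor's note: back testing, as described here, means replaying history and scoring a model only on data it could not have seen yet. The Python sketch below is a minimal walk-forward back test under stated assumptions: the "model" is a naive three-point moving average, the error metric is mean absolute error, and the revenue series is invented. None of these names or numbers come from the speakers.

```python
def moving_average_forecast(history, window=3):
    """Naive stand-in model: predict the mean of the last `window` points."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def backtest(series, forecast, min_history=3):
    """Walk forward through history and return the mean absolute error.

    At each step the model sees only the points before time t, so the
    score reflects how it would have performed had it been live then.
    """
    errors = []
    for t in range(min_history, len(series)):
        predicted = forecast(series[:t])
        errors.append(abs(series[t] - predicted))
    return sum(errors) / len(errors)


# Made-up monthly revenue with a steady upward trend.
monthly_revenue = [100, 104, 108, 112, 116, 120, 124, 128]
mae = backtest(monthly_revenue, moving_average_forecast)
print(f"mean absolute error: {mae:.2f}")
```

Storing the input series, the model version, and the resulting score together is one way a regulated business could later show that a model was back tested before deployment.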
- Yeah, I think one of the audience was asking about the RTO or RPO, how soon you can recover from disasters and what is that time frame, and what is the recovery point objective, if you will. Right? RPO, RTO, those concepts, mostly talked about in the backup and [INAUDIBLE] perspectives or those discussions. I think those two terms need to be revisited in today's context, in cloud context.
We still keep using the old terms in the old context, and we tend to actually make this mistake. I think that's the difference between forward-looking leaders versus average leaders out there, is that they try to take the same terms that we are using and apply to what is available today and what will be available in next two years. I work with a lot of startups here in the Bay Area, living here for 25 years, and have seen a lot of patterns here.
When you're developing something today-- another thing, actually, back up a little bit. Another thing I usually say is that the best leaders treat their investments into these kind of platforms as pristine assets. They are building assets, they're building IP, they are building a system which gives them advantage over others. They don't treat these systems as normal spend, or not even normal investment.
This is investment into their future tooling. They are building that IP. I think by picking the right tools for the right job, and in picking the right platforms, and also the ecosystems on top of the platforms, I think it goes a long way. So you have to-- which the terminology and the consumption economics around technology, I said that other thing earlier-- you have to have a good, solid understanding of the different systems and how they will let you leverage your assets. And data is one of your best assets, I mean there's no doubt.
- Yeah, that's a good point, and you know, I think we've got another segment coming up here; we've got a couple more minutes. But as I think about the reuse story, that seems to me to be one of the biggest ones. Because again, for machine learning, for back testing for example, for training these algorithms, it may be difficult to go through that whole process, to basically take all this production data, export it into a separate system, load it into a Mongo or whatever you want to do, and then do it there.
Well gosh, if you have a solid backup that is at the ready, that's a fantastic place to do that. Do you see that happening more often Joe, or is that still forward looking?
- So right now-- the companies that we're dealing with now, we did a study with ESG and we found that the-- we're talking about the RPO-- the lowest tolerated RPO right now is a 15-minute window for some of the people's recovery point objective. And a few different things about that is as we talked about backup and historical data, backup-- when you think of backup and using the word backup, it's your tactical obligation that your business is now making sure that they can fulfill.
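[Editor's note: the 15-minute recovery point objective mentioned above can be made concrete with a small calculation. What follows is a minimal sketch, not anything GRAX or ESG published: the function names, the backup schedule, and the failure time are all hypothetical, and the only figure taken from the conversation is the 15-minute window. The idea is that the data you can lose equals the time between the failure and the last backup taken before it.]

```python
from datetime import datetime, timedelta

def worst_case_data_loss(backup_times, failure_time):
    """Return the gap between the failure and the most recent backup before it."""
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup exists before the failure time")
    return failure_time - max(earlier)

def meets_rpo(backup_times, failure_time, rpo=timedelta(minutes=15)):
    """True if the data lost at failure_time stays within the RPO window."""
    return worst_case_data_loss(backup_times, failure_time) <= rpo

# Hypothetical schedule: one backup every 10 minutes, 09:00 through 09:50.
start = datetime(2021, 1, 1, 9, 0)
backups = [start + timedelta(minutes=10 * i) for i in range(6)]

# Hypothetical failure at 09:47; the last backup before it ran at 09:40.
failure = datetime(2021, 1, 1, 9, 47)
loss = worst_case_data_loss(backups, failure)
print(loss, meets_rpo(backups, failure))
```

[With this schedule the loss is seven minutes, inside the 15-minute window. If the backups stopped at 09:50 and the failure came at 10:30, the same check would fail, which is why capture frequency and RPO have to be chosen together.]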
Auditing, restoring of data, capturing every version to be able to kind of quickly help the business-- there's a perception that changes, is historical data is the same data source, it's the same everything. But when people use the word history, that's for the strategic part of the answers they're trying to get to, right?
So this right now, we're seeing an evolution shift where all companies right now are starting to really look at that and start saying, not only do I want the highest frequency, the highest fidelity, but I want to put it in a place where it can be reused quickly. And one thing that we're speaking about, all of the native pieces of AWS-- having Glue, having Redshift, having SageMaker, QuickSight, all of that having access to your full history-- those tools-- go ahead.
- Yeah no, go ahead. Finish up real quick.
- Those tools help you answer business questions. So the real thing is just coming up with the business questions that you want to answer and then using the correct tool to help you answer that. Whether it's a visualization, whether that's Redshift, you don't have to worry about the tools there, because they're there now.
- Yeah, that's a really, really good point. We'll pick that up after the break folks. Don't touch that dial, you are listening to Inside Analysis, we'll be right back.
- Can we talk now?
- Yes. I think we should be good.
- I spoke a little too early.
- That's OK.
- I think that there's some noise coming from your mic, is that a cat or something? What is it?
- That is my child upstairs in her rollerblades. I've been trying to mute it.
- That's pretty loud.
- It is pretty loud, sorry about that. She just got them yesterday, so she's scraping all around upstairs.
- That's a good excuse, then. I want to just tell the whole audience that's what it is.
- No, that makes it more fun.
- It's more endearing.
- Yeah. Oh man, I'm a late bloomer. We didn't have a kid till I was-- let's see, I'm 53 now and she's about to be eight. So I was what mid 40s? Whoa.
- We started early. We had our first kid at 25, 25 1/2. Got married at 24.
- I think I'm in the middle then, mine was 38.
- All right. It seems to be happening later and later, for various reasons. But it's good in a lot of ways, and in a lot of ways it's just terrifying. Like, oh my God what was I thinking? I should have backed up that data a long time ago. All right, let's see. So we'll come back at I think 48? So let's go into-- what else? There was something else I wanted to bring up. So we do have another question from the audience here.
So one thing I have noticed between the old business and the new is the data visibility beside the core financials and GDPR. What is your take on sharing data within the corporation? It used to be if they don't need it, don't share it. Now it's if they can see it, let them. I think that's a very good point.
- Yeah, that's great.
- Let's start that off with the final segment here.
- All right.
- I mean it's my wife, who has a master's in traditional Chinese medicine and acupuncture, she goes we're talking about the vaccine and everything, people demanding to know if we've got the vaccine. She's like, what happened to HIPAA? Remember HIPAA? HIPAA said don't even ask me about it. What are you talking about?
- We were bragging about our vaccines online.
- I got vaccinated.
- Did you? Did you get it?
- I got it, and then I tweeted, and somebody, like some hater person said-- everybody says that these days but-- somebody was like, why are you publishing this?
- They threw out HIPAA with the liability--
- All right, here we go.
ANNOUNCER 5: Brought to you by Feeding America, 200 food banks strong, and the Ad Council.
ANNOUNCER 6: Social security is with you through life's journey, from birth to retirement.
- We start in about 40 seconds. Does make me a little concerned that the government is running radio ads about social security. It's going to be there folks, we promise.
- It's funny that we don't know what ads are being played between our segments, and on the internet as well there's very religious stuff and next to it is the condom ad or whatever. Those don't go together in some cultures, but they don't know that culture, and these are very contradictory.
- Yeah, the juxtaposition is a bit bizarre, isn't it? That's called the matrix.
ANNOUNCER 4: Dot gov, produced at US taxpayer expense.
RANDALL BOETTGER: Welcome back to Inside Analysis. Here's your host, Eric Kavanagh.
- All right, ladies and gentlemen. Back here on Inside Analysis, talking with Joe Gaska of GRAX and Sarbjeet Johal. And we had a great question come in from our virtual studio audience. And by the way folks, if you're listening to this driving down the highway and you're like, man, how do I get to be part of the virtual studio audience?
Just go to insideanalysis.com and you'll have the website listed there, the different shows. You can sign up for our newsletter and be informed of all the content coming your way, unfiltered. And the audience member writes, one thing I've noticed between the old business and the new is data visibility. Beside the core financials and GDPR, what is your take on sharing the data within the corporation?
It used to be if they don't need it, don't share it, and now it's if they can see it, let them. I think it's a really interesting point, I'll throw it over to Sarbjeet first to comment on, basically. But I think that's very poignant, I think that's exactly what's happening. And by and large, unless it's something that's sensitive, I think let people see the data because it helps you understand the business, the context, where things are going. But what do you think, Sarbjeet?
- Yeah, I see this as a big trend, and rightly so. I think data is the backbone of your business and it always was, actually. I always say this, information is the backbone of decision making, in an open economy especially, right? And as we usually say that open systems work better, and then we have to be open within the company as well. So you got to open up your data to different stakeholders so they can leverage that.
You don't want to hold the data where they can't get to it. And it was a huge problem, I think it still is a huge problem. But with the advent of like data lakes and all the toolings around the data lakes or data fabric, whatever we call it, I think we are getting there. We have to democratize the data so that our developers can have at it.
The whole digital economy is based upon how much we can automate with the telemetry data, with the transactional data, having that sort of information to us is a huge plus. And if we hold that data-- hoard that data, not hold that data-- behind walls, I think developers and/or data scientists can go only so far.
I think we have to give that data, especially to the data scientists, so we can train newer models and we can automate to the nth degree. And data science is automation 2.0 or automation 3.0, whatever we call that. So without data, I think we can't get there. And then the tooling is improving, actually. The tooling is improving our access to data through cheaper compute, and [? networking ?], and storage also, price-wise, is going down, down, down.
I think the advent of GPU and how specialized hardware innovation is also helping us big time, actually. We can throw more GPU at machine learning and inference, that goes a long way. So you got to keep an eye on the convergence of technology, I keep saying that to all the leaders. You got to keep an eye on that, do not think the way you used to think two years back.
- Yeah, that's a good point. Things are changing very quickly, and it's hard to make too many long term investments in tools if you don't know what tomorrow brings. But still, you have to decide right? You can have your forward looking vision, but you've got to execute day to day and get things done. So I guess, Joe, the key is maybe from an organizational perspective-- I've often wondered about having a strategy group.
You know, I don't hear too much about this, but you have of course a CDO, a CIO, they all tend to crystallize into certain subdomains of the business. And to me, one of the most important is the strategy, the information strategy of the organization: to have dedicated people thinking about that, looking at data assets they have, looking at the business objectives, and coming up with plans to bring people and technology together. To me it sounds like a lot of fun, frankly. So if someone wants to pay someone a lot of money to do that, give me a call. But Joe, take it away.
- No, it's-- so the one thing that it all starts with, the first thing before we start going down there is just taking ownership of all your data. Once you have all the data, it becomes a business decision after that fact, of who has access to it, where can they put it, do I anonymize the data? Who has access to which type?
First thing is taking ownership of the data. Not just, hey do I have a couple of CSVs that they produce for us, but taking all of your data, putting it in a place that you own forever, whether it's GRAX or whether it's someone else's. Take your data, put it in your cloud. But I 100% agree with what you're saying Eric, once the data is in the cloud, it is then your responsibility to make sure that you are prudent with who can access it, where they can put it, you understand what data is in it.
It is a complex thing that you have to make sure when you have data you are being a responsible steward for it. For example, like we mentioned earlier, most companies today, one terabyte of production data is replicated eight to nine times in an enterprise. Just think how scary that is when you start talking about data access, GDPR, CCPA, WORM compliance, HIPAA.
- Sure. Yeah, it's off the charts. And the other thing too is I think the vision of data fabric is where you want to go. And we got about four minutes left of the live show here, but with the data fabric, if you have performance then you don't need to be creating all these copies. The reason people make a copy is because you can't get performance in the system where the data is held, right?
And so that's the problem. And I don't know when we're going to get there, but I think it's pretty soon. I think we're moving pretty swiftly to the place where you can have this data fabric for the enterprise, certain people have access to it, certain people don't, but if you have access to it, then you can just grab that live data-- and don't make your copy, you shouldn't even allow the downloading of a copy in some circumstances. But I think we're getting close to that, what do you think Sarbjeet?
- Yeah, I think the API-fication of systems is causing us to get to the data, which is closer to where it's produced. And we have to also talk about the newer sort of form of computing, which is edge computing, where the data is produced, it's kept there, and we want to mine the data where it is.
So we don't want to haul all the data to the central places. I think you're spot on with your observations, that we have to stop copying that data over and over. And then, as you said, it was copied because it was not available. Now it's available through APIs. I think you've got to have the API-fication of your systems. I mean, there's no denial in that.
- Yeah. And ownership, real quick, one of the attendees thanked you for talking about ownership. Joe, we did a whole web series a couple of years ago on data ownership, I was curious to see how it would do. And it was very popular. So people get it, they understand.
Ownership is accountability. If you own the data, then you're accountable for the data. And you know the old joke, if everyone's in charge, no one's in charge. If no one's in charge, everyone's in charge. So someone has to be in charge, and that's typically the CDO at the highest level these days.
But yeah, I think it's good because if you take ownership-- and Joe, closing thoughts from you here, got about two minutes before the podcast bonus segment. But if you take ownership, you take pride in the data and you pay attention. You pay attention and you get things fixed when they have to be fixed. If you think someone else is going to fix it, you don't worry about it, and then it doesn't get fixed, right?
- I think it's also-- if you think about it, taking ownership of your data is protecting your future. I have optionality in the future if I have all my data and I own it forever. You don't know what you're going to want to do with analytics, with AI, with machine learning. The one thing is you need that historical data, because that is the foundation of knowledge. And nobody should be leasing you access back to your data.
Putting the data in the cloud that you own, and making sure that the people, like GRAX, who are protecting the best interest, is basically making sure that our customers own it. If they shut us off, they still have all of their data. It's not even like your iPhone, like we've spoken about before that you shut off iCloud, and all of a sudden lose all of your music library. It's not like that, and it shouldn't ever have been like that.
So that's one of the key things is ownership accountability. You have to be a good steward of the data and making sure that you do have the internal methods to do that, whether it's a team or whether it's compliance and regulatory. It's a whole bigger discussion that you and I could spend a few more hours on it today.
- Yeah, the newer platforms actually give you knobs to control who is getting the data, how much data is going through. You could catch the bad guys. You can put the thresholds on, like these parties cannot get more than this data per day, or per hour. So I think you got to take advantage of that.
Earlier we did not open the historic data to the outside parties because we didn't have these knobs to control it, right? Somebody can steal all of our data. But now we have those knobs, now we have those controls, I think we got to open up our solid data for data science.
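[Editor's note: the "knobs" described above, such as capping how much data a given party can pull per hour or per day, can be illustrated with a very small quota tracker. This is a sketch under stated assumptions and not the API of any cloud platform: the class name `AccessQuota`, the party names, and the byte limits are all hypothetical, and a real platform would enforce this at the storage or API gateway layer rather than in application code.]

```python
from collections import defaultdict

class AccessQuota:
    """Caps how many bytes each party may read within a fixed time window."""

    def __init__(self, max_bytes_per_window, window_seconds=3600):
        self.max_bytes = max_bytes_per_window
        self.window = window_seconds
        # (party, window index) -> bytes already served in that window
        self._used = defaultdict(int)

    def allow(self, party, num_bytes, now_seconds):
        """Record the read and return True, or return False if it would exceed the cap."""
        key = (party, int(now_seconds // self.window))
        if self._used[key] + num_bytes > self.max_bytes:
            return False  # over the threshold: refuse, and ideally flag for review
        self._used[key] += num_bytes
        return True

# Hypothetical limit: 1000 bytes per party per hour.
quota = AccessQuota(max_bytes_per_window=1000)
first = quota.allow("partner-a", 600, now_seconds=0)        # within the cap
second = quota.allow("partner-a", 600, now_seconds=10)      # would exceed the cap
other = quota.allow("partner-b", 600, now_seconds=10)       # separate party, separate budget
next_hour = quota.allow("partner-a", 600, now_seconds=3600) # new window, budget resets
print(first, second, other, next_hour)
```

[The refused request is the interesting one: a party that suddenly asks for far more than its usual volume is exactly the "bad guy" signal mentioned above, so a refusal is usually worth logging as well as blocking.]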
- Yeah, that's exactly right. Well folks, it's been a fantastic time talking to Joe Gaska of GRAX and Sarbjeet Johal as well. Podcast bonus segment is coming up next, send me an email, firstname.lastname@example.org. We'll talk to you next time.
All right guys. So for the podcast bonus, if you can stick around, just eight minutes, it goes by fast. Anything in particular you want to dive into? Joe is there anything we forgot to talk about?
- No, I think we've touched on a lot of things. I mean, I'm open. We can talk about historical data reuse, we can talk about any of those pieces of--
- Oh, you know what I wanted to do? Let's do this. Let's talk about how data sharing, and the whole issue around groups that really should work together, but historically haven't all that much. Like security with compliance, with governance, I mean it's all the same thing really. So why don't you dovetail? I think we'll see some developments around that space. So give me 10 seconds, I'll start and we'll just do, like I said, eight minutes.
All right, folks. Time for the podcast bonus segment here on Inside Analysis, talking all things cloud data. And I had a little epiphany there at the end of the show, as Joe was talking I think, about sharing of data and group collaboration around key issues like governance, like security, like compliance.
I can tell you in a lot of big organizations, you'll have a security team that is completely separate from the compliance team, that is completely separate from the data governance team, when these folks are all working on the same thing. It's the same challenge. And you do have certain control points in terms of who's legally responsible in this situation versus that situation, but all in all these folks should be working together.
And I do think that just this movement we see towards data fabric and towards opening the kimono, if you will, and letting more people see information that is not personally identifiable or sensitive in some significant way, I think that's going to be very useful for getting these groups to work together going forward instead of being their organizational silos. But Joe, I'll throw it over to you first. What do you think about that? Am I wishful thinking again?
- So there's a whole evolution of the way the world used to be when IT was just a cost center. No one liked to spend money on IT. And then a lot of the tools that are coming that we're talking about with AWS, with Azure, and the compression, if you think of the old OSI model, for the techies out there.
So the bottom now is really about where the raw storage is, and where that's happening. That is in the cloud. And then empowering these access controls to data, all of those things that we were talking about earlier just before the break was just throttling access. When do you have data that times out? All of those features of the native platform itself are no longer a burden to address, and being able to reuse that data.
So what I'm really getting to is now that IT is actually empowering people with a lot of these tools, you're seeing a lot of cross pollination between audit, compliance, regulatory, backup, all those pieces. Because now you can fill those obligations without being a burden, and they're really starting to really look up and say, how can we do that quickly? And a lot of the technologies out there are amazing.
- Yeah, and there's a good question, I think you might have been speaking to it. And if you weren't, then it was serendipitous and the matrix is real. One of the attendees is writing, could you talk about policy based access controls as it relates to backup? And here we go. Policy-- we talk about this all the time, but policy five years ago was wishful thinking in a manual on a shelf somewhere. You had your policy, and you hoped to God that someone actually read it and then adhered to it.
But now you really can bake policies into the systems that run these businesses, certainly on the information side. And there's this whole concept of dynamic policy management, companies like [? Okira, ?] [? Amuda, ?] there's a whole bunch, [? Primasera, ?] there's a whole bunch of them out there that are basically recognizing you can no longer control this situation manually.
You can no longer have Bob in IT be the guy who flips the switch on and off when somebody needs something. You have to have more of a dynamic approach. So that's changing, it's changing very rapidly but I think you were just kind of speaking to that Joe. But quickly just comment on policy based access controls for backup, that's where you want to go right?
- I mean backup is nothing more than data. Your data policies, backup is just where it originated from. You are taking ownership of the data that's stored within your cloud, and by utilizing native technology and why we built it this way so you can capitalize on all the tools that you were speaking about earlier. So this isn't just doing things differently, it's doing things the same and putting data where all data can be reused, and not treating backup as if it's a second class citizen.
The data is the data that's now history, can be reconsumed and used downstream. That's really why we built GRAX the way we did, and why we wanted to. Because we didn't want to build all those policy based controls when, guess what? You already have them with a lot of the features and functionality that exists within AWS.
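[Editor's note: "baking policy into the system" rather than leaving it in a manual can be sketched in a few lines. The example below is an illustration only, with made-up role names, a made-up dataset name, and a hypothetical `can_read` function. It is not AWS IAM syntax or the interface of any vendor named in this conversation. It shows the general shape: policies are data, access is denied by default, and backup data is governed by the same rules as any other dataset.]

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    role: str              # who the rule applies to
    dataset: str           # which dataset it covers
    allow_sensitive: bool  # whether sensitive fields may be read

# Hypothetical policy table; in practice this would live in a policy service.
POLICIES = [
    Policy(role="data_scientist", dataset="backup_history", allow_sensitive=False),
    Policy(role="compliance", dataset="backup_history", allow_sensitive=True),
]

def can_read(role, dataset, sensitive, policies=POLICIES):
    """Return True only if some policy grants this role this kind of read."""
    for p in policies:
        if p.role == role and p.dataset == dataset:
            return p.allow_sensitive or not sensitive
    return False  # default deny: no matching policy means no access

print(can_read("data_scientist", "backup_history", sensitive=False))
print(can_read("data_scientist", "backup_history", sensitive=True))
print(can_read("compliance", "backup_history", sensitive=True))
print(can_read("intern", "backup_history", sensitive=False))
```

[Because the rules are data rather than a person flipping a switch, changing who can see what becomes an edit to the policy table, which is the "dynamic" part of dynamic policy management.]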
- Yeah, that's a good point. And Sarbjeet I'll throw it over to you, a last curveball perhaps. But as I look at what's happening in the marketplace, you look at companies like Snowflake with this tremendous IPO, you look at all these data driven companies that are having tremendous success right now. Well, there are a lot of companies that are not having tremendous success, and they're either collapsing or they're disappearing.
And it's largely because they're not leveraging data at scale. And to me this whole concept of sharing data and opening up these environments is really in a way just in time, because I think businesses are going to have to take a very-- especially big companies-- are going to have to take a very hard look at their personnel. What are they doing? Who are they interacting with? What value are they driving for the company today?
You look at some of these companies that have 10, 20, 30,000 employees, well, you know. The combination of AI, and automation, and data can optimize many of those jobs, and you won't need as many people. So you either put those people to work in new ways or you let them go.
And I'm always thinking of using it in new ways. But to me this sort of revolution in data access and use is coming just in time, because companies-- especially with a lot of employees-- need to really figure out how they're going to automate, how they're going to optimize their operations, and how they're going to survive if not thrive in the new world. Right Sarbjeet?
- I think so. I think that we talk about digital transformation all the time. I think digital transformation-- you can't do that if you don't think to see the old problem through new lenses. So what are the new tools available to us, and how we can leverage those tools, and what new business models that data can enable. That is the key there.
Actually, there's a book I always recommend in these kinds of talks. Most techies and people who are managing a business, they should read this book called Ten Types of Innovation. And most of the time, the people who handle the data, they are data scientists or they're developers, or they're application ninjas, if you will. I was one of them for many years and years, like a couple of decades. We actually tend to have this blind spot that the data can be used in a certain way, and that's the only way to use the data.
I think you can build newer business models based upon your historic data, your current data, and then you can extrapolate from that where you need to go. So you can apply a lot of econometrics, economics principles, and just come up with, for example, price elasticity of product one, versus service one, versus service two. You can do those kinds of analysis like you can do A/B testing on your business models.
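[Editor's note: price elasticity, mentioned above as one thing historical data lets you estimate, is the percentage change in quantity sold divided by the percentage change in price. The sketch below uses the standard midpoint (arc) formula from introductory economics. The function name and the price and quantity figures are invented for illustration and do not come from the conversation.]

```python
def arc_elasticity(p1, q1, p2, q2):
    """Midpoint price elasticity of demand between two (price, quantity) observations."""
    if p1 == p2:
        raise ValueError("prices must differ to measure elasticity")
    pct_change_quantity = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_price = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_quantity / pct_change_price

# Hypothetical history: at a price of 10 we sold 100 units; at 12 we sold 80.
elasticity = arc_elasticity(p1=10, q1=100, p2=12, q2=80)
print(round(elasticity, 2))
```

[A magnitude greater than 1, as in this made-up case, would mean demand is elastic: the price increase lost proportionally more volume than it gained in price. Running the same calculation per product or per service is only possible if the historical price and volume data has been kept.]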
And with today's tooling to raise agility in cloud, you can do a lot more experimentation, which gives you an ability to innovate at a faster pace. And I usually say this, that the companies which are doing great versus the companies which are doing OK or not great, the difference is that they can do experimentation at a higher pace. The velocity is higher, they do a lot more experimentation and the feedback loops are well oiled. So I think that it all boils down to how you leverage your assets. And coming back to the data again, I sound like a broken record, data is one of your pristine assets.
- That's exactly right. Well right on time too, folks. Send me an email, email@example.com. You've been listening to Joe Gaska of GRAX and Sarbjeet Johal. You can follow Sarbjeet on Twitter, that's how his name is spelled, S-A-R-B-J-E-E-T J-O-H-A-L. And he's on Clubhouse too, I got to get on Clubhouse. Talk to you next time, folks.
- You can have fun there.
- Later. Bye, bye.