Archive.fm

AI at Scale

Una Shortt: Trusted data revolutionizes business success

Una Shortt, Chief Data Officer, SVP Data & Performance at Schneider Electric, shares the story about trusted data being the new currency for business. She explains why there is a need to collect and process high-quality data, why not all data holds the same worth, and finally why the type of data is becoming a key differentiator for companies.

Duration:
28m
Broadcast on:
08 Jul 2024
Audio Format:
mp3


We really need to understand not just that data is the new currency; it's good data that's the new currency. Not all data is actually equal, and really the type of data a company has is the differentiation. So here at Schneider we always say we work to have trusted data at scale.

Welcome to the AI at Scale podcast. This is a show that invites AI practitioners and AI experts to share their experiences, challenges and AI success stories. These conversations will provide answers to your questions: How do I implement AI successfully and sustainably? How do I make a real impact with AI? Our podcast features real AI solutions and innovations, all of them ready for you to harness, and offers a sneak peek into the future.

Hi, I'm Gosia Gorska and I'm the host of the Schneider Electric AI at Scale podcast. I'm pleased to introduce Una Shortt, Chief Data Officer of Schneider Electric. Una joined Schneider in 2004 and has enjoyed a dynamic career in software and data engineering, data architecture and governance. Welcome, Una.

Thank you, Gosia. It's a pleasure to be here today.

Yes, and we have quite a topic to discuss, because in previous episodes we were really focusing on AI, on applications, on strategy, on some practical examples. But today with you I would really like to tackle the topic of data and how important it is for all those applications. And it's pretty interesting to see how many different metaphors have come along with data. We've been hearing for years that data is the new oil, it's gold, it's king. I recently even heard that data can be compared to water, because data is an essential resource that should be of high quality and travel securely from the source system to the consumer. So, first question to you: how does a world based on the new currency called data look now?

You know, Gosia, it's so exciting for me, because I've been in the data industry now for over 20 years.
And for the first 15 years, I can tell you, it was always a struggle, because you always had to convince people of the why: why they needed to manage data in a certain way, and even what data was, because a lot of people didn't realize, naturally enough, that they were all working with data. We've always known in the data industry that it was the critical backbone of business success. But now everybody is understanding data more, because what they're touching every day generates data, whether it's their phone, where they're creating a WhatsApp message or an email, or their smart TV, where they're getting predictions and recommendations from Netflix. It's all related to the data, and people are now understanding what data is.

And when we really think of what the world looks like with this new currency, or, as you said before, data being like oil, one of the many challenges we have is how we really collect and process the data. Because we really need to understand not just that data is the new currency. It's good data that's the new currency. Not all data is actually equal, and really the type of data a company has is the differentiator. So here at Schneider, we always say we work to have trusted data at scale. We commit to all our stakeholders that this is what we work to do every day, because it's with that data that we can do better analysis. So for us, the new currency absolutely is data, and always has been. But naturally now it's easier to influence and convince people why they need to be on that journey with us, because it's not only the data organization that brings trusted data to the company; it's actually everybody who handles data in a company.

Yes, exactly. So you already mentioned one of the challenges.
So, ensuring that we have high-quality data. And maybe let's focus on this for a minute: does it mean that a company today that has access to data, that is collecting data from their own operations or the operations of their customers, is in a better position on the market?

Absolutely. What I would say is, obviously, the more data you have, the more insights you can bring, but also the more complex it can be to navigate that data. So it only brings an advantage if you know how to manage your data; if you know how to read those data sets; if you have the teams who can transform those data sets into valuable business insights; if you know how to sustain the quality of the data sets. Of course, you also need to understand that the data you receive can be sensitive or personal data, and we need to manage that quite closely. So the value it brings is based on what you're able to do with the data.

I often say that data is part of the insights value chain. Of course, AI is in those final miles of that value chain, really bringing the magical aspect of what we need, and innovation, to the data, giving us those predictive insights. But to get to that point of our data journey, we really need very strong data management capabilities in a company. So yes, Gosia, it can bring so much. However, it can only bring so much if you work with it. It's not just going to come naturally. Data by itself in a lake can be a very, very dirty lake unless we're willing to manage it, structure it and control it.

Yeah, so it's definitely good to have it. It can be a value in itself for a company to have access to this data. But, as you mentioned, there are some additional requirements in order to really be able to capitalize on the data. You mentioned, for instance, data privacy.
So it's good to have the data, but then you need to be responsible for the way that you treat the data. I was actually recently coming back from Hannover Messe, and I was surprised at the airport: it actually said "privacy of the data" on the screen, and it was showing, as you go through the security check and they do a scan of you, where this data is going and what kind of data exactly it is. So I was surprised, in a positive way, that they actually give this information. It's available to you; you are informed how this data is captured and what they will do with it. But in a more business context, and especially for Schneider and other companies, how do you see the importance of data privacy, and how can companies ensure it?

You know, having worked in data for so long, Gosia, one of the aspects I am always alert to is data privacy. I was in an international airport recently, and to use the Wi-Fi in the airport you had to scan your passport. Certainly, I didn't do it. I did not want to just scan my passport into a system where I had no awareness of where the data was going, just to use the Wi-Fi. I decided to be Wi-Fi free for the time I was in the airport, because there are really no ifs or buts when it comes to data privacy and protection.

This is really something that has become more prevalent for us in the last five, six, seven years. Before that, I think there wasn't the amount of awareness there is now. Any breach of privacy or protection can lead to reputational damage for a company, and that reputational damage impacts trust, right? So we all take it incredibly, incredibly seriously. And I would be foolish to say that we have it nailed. I don't think anyone has it nailed. You don't know what you don't know, right? So no matter how good you may be, there may always be a surprise around the corner.
But when there's global and local regulation at play, we really all need to target perfection in this area, because we make commitments to all those who work with us, whether they are employees, customers, suppliers or even investors, that we manage closely both personal and sensitive data. And while I have stated some difficult aspects there that we take care of, we need to understand that there are many pluses to this regulation. When there is privacy and protection regulation, what does it make us do? It really drives us to understand what we're doing with our data: what data do we have, how are we accessing and sharing it, for example. So while it comes with that extra effort, it gives us great insight into the data we have, which can then lead to absolutely great innovative use cases we can do with that data downstream. So the positive aspect is not only being able to protect people's data and commit to them that we are taking it seriously; it's also that we get a very good handle on the data we have, because we have to for that regulatory aspect.

Yes, and that regulatory aspect is really important for applications, right? Because you not only use the data that you have in order to create models, to feed the models, to create the applications. In one of our previous conversations, you mentioned that there is actually a whole new space of data management for the data that will come from these applications. So it's not only what you fed to the system, to the model, but also the new data coming out. And it made me think of other examples as well, like health monitoring: the model was trained on some data, and maybe you have a watch, or an application on your mobile, that is supporting you to monitor your health, your sleep and so on. And you can get some recommendations, right, for your goals: if you want to be, I don't know, more healthy, or you want to sleep well, you will get a recommendation on how to do it.
But then the company providing this application also collects data about how you are using the application. So is the regulation actually giving some recommendations and guardrails about how this company can use the data coming from our usage of the application?

You know, I think we've all noticed, in our personal usage online and through apps, that there are more and more acknowledgments and consents required from us as we access apps, and apps are regularly giving us a review of our consent: what have we consented to share? We all need to take a lot of care when we look at that, because we are doing exactly what you described: we are giving permission to those companies who are gathering our data, whether it's our fitness app or whatever it may be, to utilize that data in their other aspects. And of course, at some point they need data to develop and grow these capabilities that are coming to us. So if we want the world to evolve, we need to somehow give this data so it can be analyzed.

However, we expect that data to be anonymized, right? We expect that no one's going to know that if I go for a run this morning at 7am, where I'm running, and that that could be my regular routine. We want to know that our data is anonymized, that nobody can actually connect that data back to a specific individual. And this is where the regulation really helps. When we think of GDPR in Europe, or US regulations such as HIPAA, or CCPA in California specifically, there's no negotiation with these regulations. We're expected to secure privacy and personal data protection, and because of that, we're expected to anonymize data. Now, anonymization is very interesting, because it's just like AI algorithms that can get stale over time or start to hallucinate.
With anonymization as well, over time there can be threat actors, hackers, who can ultimately infer how to de-anonymize the data that we have anonymized. So we constantly need to revisit the algorithms that we use to anonymize data, to make sure that the personal information, the sensitive information, cannot be re-identified. It's really important for us not just to anonymize that data in the first place, but also to assure that it cannot be re-identified. But as I said, if we don't provide this data, there's also an impact: we won't evolve the capabilities. For example, if you can get a lot of data on a certain illness, you can start to try to understand how that illness is triggered or occurs, and that can help everybody in the world, right? So there's this balancing act of giving our data, but with this trust in return: that the data is anonymized and we can never be re-identified through it.

Yeah, so there has to be a compromise between data sharing and data privacy. It's good to share the data, but it's good to ensure that you actually get data privacy, that the data will be anonymized. That's really an interesting perspective, especially, as you mentioned, for healthcare. But I guess for industry customers it could also be very important when they are using connected products, as in the case of Schneider. We are definitely a custodian of the data of our customers, right? So we ensure this data privacy for them, and proper governance. And one thing I would like to expand on with you: when I was reading more about your profile, you mentioned that in the past you were responsible at Schneider for creating something called data quality routines. It really sounded interesting to me. Could you tell us more about it?
In fact, it wasn't at Schneider, Gosia, believe it or not. It was 26 years ago, on my internship from university with quite a famous telecom provider. I was writing data quality routines in low-level code for the manufacturing floor, for the line, and it would then get embedded in the products we were producing, which were pagers at the time. Pagers were still very popular, for those listening today who remember them. When I was writing those data quality routines, it was all very manual. And those of us who are low-level coders absolutely love writing code, okay? But this is something we need to be aware of that should be a lot more automated, and I'm really expecting, and hopeful, that with the transformation in the AI space we will grow significantly in this area.

So, as I say, 26 years ago I was writing low-level code, and my teams can still be writing some low-level code. Yes, they may be using out-of-the-box applications and toolkits, but they're still having to write the code. What I would love to see in data quality, and I really would love AI to bring this to us, is an inference of what the data quality rules should be. Rather than having to ask the business, "How can we keep this data clean? What do you expect to see in this data? What is the rule?", and the business telling us, "Well, it needs to be a plus b equals c," my vision would be: why isn't there a data quality tool that goes in, looks at the data, looks at the patterns of the data, and makes decisions on what could be good data in those patterns? Maybe there's a human in the loop, for sure, validating those decisions, so that over time the machine learning algorithm goes, "Okay, I can infer that these data quality rules would apply." Because one of the stumbling blocks to data quality today is the amount of business knowledge you need to actually create those data quality rules.
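To make the idea concrete, here is a minimal Python sketch of the kind of rule inference described above: a routine profiles a data set, proposes candidate quality rules from the patterns it observes, and leaves acceptance to a human reviewer. The field names, example records, thresholds and candidate pattern are all invented for illustration; this is not Schneider's tooling.

```python
# Minimal sketch: infer candidate data-quality rules from observed patterns.
# Proposed rules are candidates only; a human in the loop accepts or rejects them.
import re

def infer_rules(rows, field):
    """Propose simple quality rules for `field` based on the data itself."""
    values = [r.get(field) for r in rows]
    present = [v for v in values if v is not None]
    rules = []
    # Completeness: if the field is (almost) always filled, propose NOT NULL.
    if values and len(present) / len(values) >= 0.99:
        rules.append(f"{field} must not be null")
    # Range: for numeric fields, propose the observed min/max as bounds.
    if present and all(isinstance(v, (int, float)) for v in present):
        rules.append(f"{field} must be between {min(present)} and {max(present)}")
    # Pattern: for strings, propose a candidate regex if it fits every value.
    if present and all(isinstance(v, str) for v in present):
        pattern = r"^[A-Z]{2}-\d{4}$"  # invented candidate pattern
        if all(re.match(pattern, v) for v in present):
            rules.append(f"{field} must match {pattern}")
    return rules

orders = [
    {"order_id": "SE-1001", "qty": 5},
    {"order_id": "SE-1002", "qty": 2},
    {"order_id": "SE-1003", "qty": 8},
]
print(infer_rules(orders, "order_id"))
print(infer_rules(orders, "qty"))
```

A real tool would profile many more pattern families and, as described above, feed the reviewer's accept/reject decisions back into the inference so the proposals improve over time.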
And then, of course, there's the coding aspect. Now, one thing I would say about that: data quality is very important, but when we look at the insights value chain, the decision chain is only as strong as its weakest link. I always, always put emphasis on the business process. Business process comes before data quality. If we don't have very clearly defined, clearly owned business processes, it can be super difficult to assure data quality, because what are we ensuring the quality against? We have no reference for what it should be; we're making up the rules, right? So really, we need to focus not just on the quality of data, but on actually having quality-controlled business processes. And of course we see that day to day at Schneider, where, as we introduce new products, having quality end-to-end and quality by design everywhere naturally carries through to data quality by design.

Yeah, so, judging from what you said, data is definitely everyone's business, and it's really important to find a way to collaborate between data scientists, data experts and the business, because both sides bring a lot to this common understanding of what quality data is. This brings us to the second part of the conversation, where I wanted to ask you more about collaboration. Interestingly, at Schneider we have a separate position of Chief Data Officer and Chief AI Officer, and I was curious to know how you share responsibilities, and what is really key to a successful implementation of AI from this collaboration perspective.

Okay, a wonderful question, a really great question. Even though we have these two distinct roles, Chief Data Officer and Chief AI Officer, we work as one team, because we know AI at scale requires data at scale. That's kind of our mantra internally at Schneider.
You know, I think what's really important to know about our company is that our revenue, our turnover, has quadrupled in 20 years, and we've evolved significantly. We know from our portfolio of products, solutions, services and software that all of that brings an evolution in the volume of data we have. So when the AI team and the AI Hub come to implement a new innovation, they need to have the data ready already. If they have an innovation, a new idea, and they are thinking of implementing it, they may suddenly find that the data doesn't exist. That can happen: someone can have an innovation, but maybe the data was never digitized. It could be in a spreadsheet somewhere, but not digitized, so not trustworthy. Or maybe the data quality is poor, or maybe there's no owner of the data, so when they have questions about why the data is as it is, they can't actually evolve their innovation.

So we work really hand in hand. On my side, as the group Chief Data Officer, I work with a network of business data officers we have across the company. We have 32 business data officers, and we work on data management and governing the data management in the company. This is perfect for the Chief AI Office, because they can trust that if we give data a stamp and say, "Yes, this data is ready for you to work with from an AI perspective," then the initial percentage of effort required for data readiness, data quality and data assurance is secured for them. So they can really focus, with their specialist skills, on actually implementing the AI evolutions of the data. So we work hand in hand; we know we don't have any AI without data, and I think it's a fantastic relationship. It's a very important relationship, because AI is in fact a huge effort in itself, and I really do think the distinction between the roles has been a very smart move by our company.
I think there's a lot to do in data to prepare for AI, and AI should stay focused, needs to stay focused, on the AI component. In fact, if we had one combined Chief Data and AI Officer, that person would be fully consumed with data readiness and would never really get to the use cases. So really, I would say it's an advanced view of how things should work, and it's working very successfully for us at Schneider Electric.

Yeah, so it's definitely a very serious commitment in the company to have that many data officers, and really a strategy to organize the data and then allow AI teams to work on it. But I can imagine that there are smaller or medium-sized companies in which it's not possible to have these two separate offices. So in case somebody in a company has to do both jobs, would you have some recommendations for them? What's the key priority? Where to start?

When I think of that, the first thing I do is put myself in their shoes and imagine I was a data and AI officer in a smaller company. We used to say 80/20, meaning 80% of AI is data preparation, right? Data readiness is probably getting closer to 60/40 now, because AI is really ramping up and accelerating. So if I were a chief data and AI officer in a smaller company, I would probably also have a small team; it's not just a smaller company but, of course, a smaller group of resources. I would put a certain amount of resources into preparing data. And I would be very smart, like we are at Schneider, about the use cases I would implement for AI. I would pick those use cases that I knew were ready to implement. No matter how excited my business partners were about an AI use case, it's our job to say, "Well, actually, excellent use case; however, given the data readiness, it's better that we focus our time elsewhere."
When we think of the distribution of effort, again, I would probably have a somewhat smaller AI team than the team focusing on the data, and I would assure that I really have top experts in that AI team. One aspect we can't miss, and it will be very important for that type of role, is that they absolutely have business support. It's still a challenge today (and I studied computer science and computer engineering) to bridge the communication between engineers and their business representative or business owner. There's still a gap between the business speaking and the technical team understanding them, and the technical team speaking and the business understanding them. So there are certain resources we need that do not fit the exact data or AI profile. They're not data and analytics engineers; they're not machine learning engineers or data scientists. But they're somebody we need on our team, and we always need them: the translators between the business and our experts in these capabilities and capacities of data and AI. So don't forget that skill set in your team too. It's not all about the engineers, on either data or AI, or the data scientists. It's also about bridging that communication gap, being able to influence and convince the business, and understanding what they're asking us to do, because at the end of the day, we are here for the business.

Yes, and you actually bring up a very important topic: the skills that will be required in an AI-intensive world. Communication becomes key, and maybe exactly those soft skills that are needed to marry these two worlds, the data and the business, so we can come up with innovations that depend on AI technology but really deliver specific value for our customers. So would you have any advice for anybody who is preparing, maybe, for a role in the application development teams? What would be the important skills? How should they prepare?
You know, I love the question, Gosia, because when we think of the demand right now for these types of skills, somebody exiting university may kind of think, "I'm invincible. Everybody needs this skill. I can go wherever I want." Twenty years ago, when I first came to Schneider Electric, a lot of people were using Excel, right? They really felt that they didn't need you, because they already had their data, they could already apply their business rules in Excel, and all you were doing, in their minds, was coming with a cost: "This is going to cost you X, and I'm going to scale your data." But nobody really understood at the time what scaling data was, except for small pockets of excellence around the company.

So what I realized pretty quickly, coming in as an engineer and a low-level coder, is that the world of data takes a lot of influencing and convincing, absolutely. It also takes a lot of project management work: being able to be structured in a very systematic way, to deliver outcomes and results that the business sees, tangible results that our business partners can see, which engages them in the data story. So influencing and convincing, for me, are very particular soft skills that are needed in all data organizations, and something we really need to educate people on. I was mentioning it to somebody in the organization only yesterday: let's look at influencing and convincing for all our teams. And I do have to mention the one about project management. I have seen fantastic engineers over the years for whom structuring their work and arriving at the final outcome can be difficult; it can be a struggle.
Now, maybe they themselves don't want to be project managers, but I do think the skills of project management are valuable, and here I'm talking old-school, waterfall project management: you start and you finish; there's a start date and an end date. Of course, agile is cool, agile is great, but agile is constantly, constantly incrementing, and it can be a little bit harder to get to the finish line in agile. Whereas if you apply a systematic, waterfall style of project management in your day-to-day work, no matter who you're working with, they know that you're going to come with a specific deliverable at a specific end date, and they're going to have an outcome. And we can do that in agile with the product increments as well, of course.

So for an engineer fresh out of university, I would say: ask your company to invest in some influencing and convincing training for you; there are many courses available on a variety of digital sites now. Also look at stakeholder management. It's critical; we have to be able to talk to stakeholders. And then there's the added edge, the golden edge: an engineer who can structure their work. So some project management, Scrum management, Scrum Master training will really add massive value and give you differentiation out in the market.

Yes, I think these are really valuable recommendations, Una, thank you for this. I'm really sorry, but we are actually at the time to finish our conversation; I would love to continue. So maybe to wrap up our episode, Una: thinking about the companies that are preparing to use AI, to invest in AI this year, we are hearing that this will be the year of AI in practice. What would be your recommendation for them in terms of data management? Maybe two or three golden rules for data management, so they can then be ready for AI?
Okay, so I would be super excited when we hear that, and it's the same story of course at Schneider, with our strong investment in AI, because it means, as a Chief Data Officer, I know there's going to be a strong investment in data. So it's super exciting from a data management perspective.

We do have four data golden rules at Schneider, and they apply to everything we do in data. I'll go through them very quickly for you, Gosia. Golden rule zero is all about data privacy and protection, securing for those local and global regulations, and it's called zero because it's so critical. Golden rule one is about where you're sourcing your data. We call it sources of authority, or authority of sources: you should really assure that wherever you're taking your data from, for your upstream and AI implementations, is a trusted source. Golden rule two is about the very structure, the definition, of your company: what we call our referentials. Everyone knows you have master and transactional data; in addition, you have referential data, which defines the structure of your company. That could be your organization structure; it could be a view of your profit centers, cost centers, legal entities. It's very specific to your company. If we don't have a common language for those, then people are going to be implementing AI capabilities with different outcomes. So make sure you have referentials in your company. We have 16 different referentials at Schneider: 16 data sets that define the structure of the company. And the final golden rule is golden rule number three (of course it's the fourth rule, but we put that zero on data privacy). Golden rule number three is all about data consumption, assuring that where the data flows from, in this case into the AI, is a trusted source of consumption, which for us is our enterprise data lake. In summary, four golden rules, maybe not two or three.
We have four, but they are really great standards, and something that everyone across the company can actually understand, so very tangible for people.

That's really amazing, Una, thank you for this. I was really happy to talk with you today. I know that you are actually preparing for an event happening in the second half of the year, where you will be discussing the collaboration with the Chief AI Officer. So please follow Una; you can see and hear her at different events, sharing really great knowledge and recommendations. Thank you, Una, for being with us today.

It was a pleasure to be invited. Thank you, Gosia.

Thanks for joining us today on the AI at Scale podcast. Be sure to visit our se.com/ai website to learn more about our AI at Scale solutions, and head over to the Schneider Electric blog platform to read more. Don't forget to subscribe to the show on your preferred platform and share it with your network. Thank you for listening, and stay tuned for the next episode.

(upbeat music)