Introduction - Rola Shaar interview
This post is part of our 6P interview series on “AI, Software, and Wetware”. Our guests share their experiences with using AI, and how they feel about AI using their data and content.
This interview is available in text and as an audio recording (embedded here in the post, and later in our 6P external podcasts).
Note: In this article series, “AI” means artificial intelligence and spans classical statistical methods, data analytics, machine learning, generative AI, and other non-generative AI. See this Glossary and “AI Fundamentals #01: What is Artificial Intelligence?” for reference.
🗣️ Interview - Rola Shaar
I’m delighted to welcome Rola Shaar as our next guest for “AI, Software, and Wetware”. Rola, thank you so much for joining me today! Please tell us about yourself, who you are, and what you do.
Awesome. Thank you so much, Karen, and I'm really happy to be here.
Just to quickly introduce myself, I'm currently a director of R&D at Nanometrics. We specialize in the study of human-made and natural seismic activity. Most of my career, I've been focused on analytics and data modeling and knowing everything about data. So that's kind of my core competency, whether it has to do with seismic activity, or historical data, or retail, or different types of industries. That's been my key focus.
In the past 5 to 8 years, I've shifted to look at analytics through the lens of how AI can actually participate and help us expedite some of our analytics, and remove, mostly at first, technical barriers - really, how does it help us get to analysis and insights a little bit faster?
So that's a little bit about me.
That sounds like a really impressive background and some really cool work that you're doing there.
Thank you.
What is your level of experience with AI, machine learning, and analytics – professionally and personally? It sounds like you've studied the technologies and you've built tools using them.
Yes. I would say about a decade ago, things started to get a little bit interesting around AI and ML. I did some self-taught courses and some online training. But mostly, my initial real experience came through a professional role where we were really looking to improve our user experience around data analytics, and really lowering the technical barriers for people in terms of how they can look at data, how they can analyze data.
And that's kinda where I started to work with some very smart colleagues of mine around, “Okay, how can we take this very interesting technology, and how can we use that to improve some of these areas from a product perspective and really enhance our customer experience?”
So that's initially where I started. We really built a whole user experience around revitalizing how people experience our products through natural language interface. So the ability really to ask natural questions and be able to get an analysis through that. So that's my initial professional experience.
From a day-to-day perspective, obviously, we probably use AI a lot more than we think. Some of the more intentional uses, I would say, for me were really around documentation - I don't use coding every day - and using AI technologies that help with syntax prompts - you know, “I can't remember what syntax you use here.” I don't use Python or Java every day. So just being able to get some suggestions of how you would do that and getting the right syntax is extremely helpful.
As well as legal documentation – I don't know if most people know SRED[1], but writing SRED documents for tax purposes – being able to get the right wording and the format, and make sure that it makes sense. I think it's fairly helpful.
So these are some of the aspects that I've applied AI around.
SRED is the Canadian R&D tax credit, is that right?
Yes. Part of being an R&D organization, one of the biggest things that we love and we dread is, you know, being able to document key innovation that we've done. And it needs a certain level of formatting, which we all struggle with sometimes, going, “Okay, how do I make sure that I'm writing this in a way that reflects the work, but also puts it in a context that's clear?” There's a lot of formatting that needs to go into its ‘inclinations’[2]. And a lot of the AI technologies certainly help you create that format, and even if you do write it yourself, they help you do it a little bit better here or there. So it is fairly helpful because, again, we do it once a year every year. But every year, we struggle to do it. What did I do last year? So it can be a reminder - here's all the key things you need to hit on. So it's fairly helpful.
It’s an interesting application. In my last role, I had to do that twice. And you're right. We did it the 1st year, and then the 2nd year, it was, “What do we do?”
Yes. I know! It's like, but you did it last year. Yeah, but that's like a year ago. What did we actually do, and how did we think about formatting? And you go back and you read some of the things that you wrote. I'm like, “Oh, that's pretty good, but I don't know how to write this year's content.”
It's fairly helpful to formulate or reformat some of your thoughts and point things out, which makes things a little bit faster, which is great!
So that's a lot about what you've done with AI professionally. I'd like to hear a little bit if you have any experiences with using AI outside of your work. And like you said, it's almost impossible to avoid nowadays in some ways.
Yes. In terms of my day-to-day life, I do a lot of experimentation at home, just some of the things - knowing what it can do, what it can't do. But I try to, a lot of times, do it the old-fashioned way and kind of disconnect from technology, when I'm not working or learning about something specific with technology.
I think one thing I've used AI for just recently is organizing recipes. So there are some tools that help you organize the ingredients you have and suggest recipes and things like that. I do use it for creative recipe management.
But I try not to dwell on some of these applications on a day-to-day basis. I'm sure I rely on them when I'm driving, or when I'm using my phone for everything, without even thinking about the fact that I'm actually using it, probably unintentionally.
Or probably when I log in to Netflix and it recommends what I should be watching today. That's probably unintentional use of AI. But other than the recipe thing, I can't really think of anything intentional that I went out of my way to actually use it for.
Yeah. It's funny you mentioned the recipes. I just saw an ad on TV the other day. Someone was advertising a phone, and they showed taking a picture of the ingredients and asking for a recipe that would use those ingredients.
Yeah. Yeah. So that's kinda cool. It's kinda funky. It does interesting things. Sometimes you're running out of ideas, and you look in the fridge and “Okay, I got 2 tomatoes and a couple of these things. Okay, what can I make?” It's quite nice. This can be helpful.
I think there's a couple of other things that we probably as well use, like, “Tell me where I can get the cheapest prices for groceries”. I know there's little apps like that, as well as gas. These are kind of boring things.
One of the gas applications that I've used down here will try to predict whether gas prices are going up and down. In other words, should you fill up today, or should you wait a couple of days?
Yes. Yeah. Actually, that's a forecasting capability. I thought that was quite interesting too, based on trends.
I know some of my colleagues have used some - market and when to buy stocks, forecasting for when to sell, and things like that. I'm not as brave as they are, but I've seen some apps that also help you understand a little bit about some of the stocks that you own, when did they hit their high, with doing some cool trending and some forecasting, based on what you have in your portfolio. So I thought that was kinda neat. I didn't use it myself, but I did see some colleagues use it. So that was kind of interesting.
I think there's a lot of things that it can do - that we use it, but don't remember that there's some AI technology probably driving some of these capabilities. We've probably kind of tuned some of that out. We're so used to it now. These days, we don't stop and say, “hey, there's some AI here”. There's a lot of assumptions that there is something like that behind it, which is kind of interesting.
Yeah. And part of that, I think, comes back to people not necessarily knowing what all is encompassed by “AI” and some of the old fashioned data science methods or machine learning models. There's a tendency, I think, to think that AI either means AGI, the Artificial General Intelligence, or it just means the generative AI tools that everybody talks about nowadays. “AI” is really much broader than that.
Yes. Yes. That's actually a really, really good point. A lot of people use AI and ML interchangeably. They're not really the same thing, but I think it's a term that people apply in a more general way. But there's a lot of different applications and different types of technologies that you use for different use cases as well. That's a really good point.
Yeah. I want to go back for a minute to your work experience, because you've done some really neat things there. Can you share a specific story on how you and your team used AI or machine learning? And what are your thoughts about the AI features of those tools, how well they worked for you or didn't?
Okay. Yeah. Absolutely.
I think when we initially started quite a number of years ago, I think there was a lot of apprehension around, will it even work for what we're trying to do? One of the ideas was - we wanted to make or give the ability for your data to really talk to you. So rather than typing SQL statements, can you actually ask it a question for it to respond back?
So as you can imagine, there's a lot of different tiers to go from understanding the data as a human being, right, and understanding those relationships, understanding what the data means, to having the machine interpret your words, and apply context, and understand how that maps to the data and how the data itself are related, as well, just interpreting those relationships.
A lot of the technology work was, I guess, first understanding language models and how you interpret them, and some of the early work around how well it worked and how well it didn't work.
So there was a lot of trial and error in terms of, can we generate a model that actually understood what we meant with the questions, specifically around the analytic industry, right? It had the right connotations and the context for the questions that we're asking it. And how do you train a model that actually understood that?
So there was a lot of issues around, how would we produce something like that? And then one of the more fundamental problems is that we needed to have data to actually create such a model. So we had to actually sit down and type out all of the different types of questions people would normally ask in an analytic type of context, because these types of things were not readily available.
So you had to spend a lot of time picking the right tools, understanding what are the different pieces that you needed to build, and it was like a pipeline, right?
You start in one place. You start to make it analyze and understand questions. It starts to map it to the right data understanding. There were a lot of different pieces that we needed to put in place to have it go end-to-end. So there were a lot of misunderstandings about how quickly something would happen. I think initially, there was a lot of skepticism. Can we even do that?
But after various trials, and working with some really brilliant people, the various parts of the problem were broken down into fairly pragmatic pieces that we could action, that were strung together. And then we managed to get an end-to-end scenario working.
Initially, when we got it working, it gave us fairly decent answers. But, again, it wasn't perfect. So it took many, many months to iterate through that and understand, okay, now it understands basic questions - how do we make it understand more complicated ones? Little things, like understanding “less than” or “greater than”. There were a lot of variations that it needed to understand, and not just basics.
It's kind of interesting because as human beings, that's pretty straightforward, right? It's simple. It's “greater than”, “less than”. Like, what's the big deal? But where the “greater than” or “less than” appeared confused the system. And so there was a lot of different iterations that you had to go through that seemed fairly simple, but they weren't really simple.
So it was very interesting in terms of how things evolved. But once we got the basic pipeline of the system in place, improving it, really focusing on various parts and adding different complexities to it became much, much simpler.
Getting things started, I would say, was the toughest part, and then making it available to business users and getting it out into the product was the other part that probably took longer than usual. Because you had to figure out, how do I fit a really large language model into the product build pipeline, where size mattered in terms of, how do I bring in all these dependencies?
There was a lot of challenges to getting the initial setup and the initial pieces working. But once that was in place, the evolution of the actual capability happened at a much faster rate, if that makes sense.
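Note: To make the kind of pipeline Rola describes a little more concrete, here's a minimal, hypothetical sketch of the final step - turning an interpreted question into a query. The question patterns, table, and column names are invented for illustration only; the actual product used trained language models rather than simple rules like these.

```python
import re

# Hypothetical mapping from natural-language comparison phrases to SQL operators.
# Rola's team trained language models for this; a rule-based sketch just shows why
# wording and word position ("greater than", "over", "at least") matter so much.
COMPARISONS = [
    (r"\b(?:greater than|more than|over|above)\b", ">"),
    (r"\b(?:less than|under|below)\b", "<"),
    (r"\bat least\b", ">="),
    (r"\bat most\b", "<="),
]

def question_to_sql(question: str, table: str = "sales") -> str:
    """Turn a simple analytic question into SQL (illustrative only)."""
    q = question.lower()

    # Pick a metric and grouping column from a tiny, made-up vocabulary.
    metric = "revenue" if "revenue" in q else "units_sold"
    group_by = "region" if "region" in q else "product"

    # Look for a comparison phrase followed by a number, e.g. "greater than 1,000".
    where_clause = ""
    for pattern, op in COMPARISONS:
        match = re.search(pattern + r"\s+([\d,.]+)", q)
        if match:
            value = match.group(1).replace(",", "")
            where_clause = f" WHERE {metric} {op} {value}"
            break

    return (f"SELECT {group_by}, SUM({metric}) AS total_{metric} "
            f"FROM {table}{where_clause} GROUP BY {group_by}")

print(question_to_sql("Show me revenue greater than 1,000 by region"))
# SELECT region, SUM(revenue) AS total_revenue FROM sales WHERE revenue > 1000 GROUP BY region
```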
Yeah. It does. Yeah. I know a lot of people struggle with how to visualize data, and it sounds like you found a very practical way to help streamline that work for people who aren't technical and don't really want to, or know how to, or want to learn how to, write SQL.
Yeah. So one of the most interesting parts of this project was really understanding people's challenges. They didn't even know where the data was, or what data to use, what things were in it.
Because all they really wanted to focus on was, “Okay, tell me how my product is doing today versus two months ago, or same time last year.” These are really their goals that they're trying to get out of the tooling.
And so being able to get out of their way and produce answers for them in a fairly easy way was, I think, the biggest breakthrough. It's not really around the advanced analytics, but just making it so that every user is able to just focus on what they needed to do, and not have to worry about, “Okay, well, where's that data source again? And what was it called? And how do I create a connection?” It's just removing a lot of these day-to-day things. Like, “What port was that thing on? How did I make a connection to it? Do I have to call IT now to create a new one for me?” and things like that.
Just having some of these technical configurations, these small things that really slow people down – being able to just say, “This is what I'm trying to do”, and for it to help at least – maybe not resolve all your problems; maybe in some cases, you still need another human to help you – but at least say, “Here's what you need to do”, to walk you through the steps and guide you along the way. Or give you an answer if there's no other configuration that's required. So I thought that was really important, and it was fairly interesting to see people's reaction.
And like you say, a lot of people aren't visual people. So knowing even, “Do I use a bar chart for this, or do I use a box plot?” Or you could say the heat map, you know? Just knowing what type of chart even provided the best information, the best way of communicating what you were trying to say.
So one of the key aspects of the system is that it had a visualization recommender. And the system looked at all aspects of the data attributes, distribution, and it said: “Here's the best chart that can communicate what you're trying to answer. But here's some other charts that can also display it, if that's your preference.”
So it guided the person in terms of like, “Yeah, don't use a pie chart. Because in my opinion, pie charts aren't the best way of communicating what you're trying to say. But here's the best chart.” Like, a heat map is the best chart, or no heat map. It looks at all of the data attributes and what it was doing.
So that was another cool thing, being able to provide a subject matter expert's opinion, in terms of how best to visualize a chart or even a dashboard, in terms of the best layout. So it's just able to recommend that for you - it was very, very cool.
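Note: A visualization recommender like the one Rola describes can be approximated with simple heuristics over the data's attributes. The sketch below is purely illustrative - the column-description format and thresholds are invented, and a production recommender would weigh distributions and visualization best practices far more carefully.

```python
def recommend_chart(columns):
    """Recommend a chart type from simple data attributes (illustrative only).

    `columns` is a list of dicts like
    {"name": ..., "type": "numeric"|"categorical"|"datetime", "cardinality": int}.
    """
    types = [c["type"] for c in columns]

    if "datetime" in types and "numeric" in types:
        return "line chart"            # trends over time
    if types.count("numeric") >= 2:
        return "scatter plot"          # relationship between two measures
    if "categorical" in types and "numeric" in types:
        cat = next(c for c in columns if c["type"] == "categorical")
        # Heat maps handle high-cardinality categories better than bars.
        return "bar chart" if cat["cardinality"] <= 12 else "heat map"
    if types == ["numeric"]:
        return "histogram"             # distribution of a single measure
    return "table"                     # fall back to showing the raw values

# Example: revenue by region -> a bar chart (few regions) or a heat map (many).
print(recommend_chart([
    {"name": "region", "type": "categorical", "cardinality": 8},
    {"name": "revenue", "type": "numeric", "cardinality": 1000},
]))
```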
Yeah. It was interesting that you mentioned the work that had to be done to support things like “greater than” or “less than”, because if you've heard about ChatGPT and math, everyone says that's one thing it just sucks at.
Yes! (both laughing)
They've been trying to integrate it with Wolfram Alpha[3], trying to use the chat as the front end, to feed actual math into Wolfram Alpha and have it do the work.
One of my other interview guests, Stephanie Fuccio, had mentioned recently[4] that she fed a CSV file to ChatGPT 4.0, and it generated charts and graphs for her data. And she thought that was kinda cool.
Yeah. It's very interesting in terms of the things it picked up on. And the things you can build on it, as well, in terms of what it picks up on and how quickly it can look at some of the data, what's correlated to what, and what's driving what. And it's very interesting because it can drive some very interesting charts.
The human, most of the time, is trying to tell a story, right? You're not just looking at charts. You're trying to build a story around, “Okay, what is the data actually telling you to help support the story you're trying to tell?” And so having some of the different variations of the data, or even what's most interesting about the data, helps you start to generate that story that you're looking to create to support, whether it's a business objective, or a case study, or what have you.
It starts to be more about forming that story, rather than spending hours just trying to create a visualization or a chart. Which is nice, because the creativity of the human is really around the storytelling part, and not necessarily just creating charts. The charts can now be more of a supporting actor that helps you create that story as well, which was very interesting.
A lot of great insights there, so thank you for sharing that.
I want to come back to something. You've indicated that you mostly don't use, or you don't seek out (at least) to use, AI-based tools in your personal life or in certain situations. Are there work situations in which you might avoid using AI? And why would you not choose to use it for that?
In my day-to-day work life, I don't think there's anything. I think there's probably one area that I would avoid using AI initially, and that's if I'm looking to come up with a new idea or a new type of path of thinking. We would try to look at it from a fresh perspective. Because I think AI can bring a lot of different things to the table, a lot of different perspectives. But when you're initially trying to formulate a path, an original path, or something a little bit more personal around, “Okay, I'm trying to create something, a new idea”, I like to just work off of a blank whiteboard, right, and just generate some ideas. And then potentially bring in some AI technologies to sift through, and maybe either build on it or bring other perspectives around that idea.
Knowing that it was created initially without any other inputs or biases from different perspectives is something that I'm very cognizant of. And so in that particular critical thinking, where there are genuine new paths or new ideas we're trying to generate – and, you know, sometimes you do generate an idea that someone else has had, which is perfectly legitimate, but knowing that you generated it without having any sort of bias or influence from other sources or other perspectives, I think, is a very important aspect of not using AI. Because it does bring a lot of different perspectives into maybe what you're already thinking about, and it can bias certain results or conclusions that you might get to.
Yeah. One common concern nowadays - and you mentioned biases - is where AI & machine learning systems get the data and the content that they train on. Often, they're using data that users put into online systems or published online, and they're not always transparent about how they intend to use our data when we sign up.
So I'm wondering: how do you feel about companies that use data and content for training their AI and ML systems and tools? And do you think they should be required to get consent from and compensate the people whose data they want to use for training?
Yeah. I mean, that's something I think everybody talks about, I think every day, right? It's where does the data come from, and somebody generated it at some point. I guess the best analogy that we normally talk about in my circle is that data to train ML models or AI technologies is kind of like open source code, right? It's out there. You can use it if someone wrote it, but they do ask that you get consent if you are going to use it in any commercial or specific type of application or usage, right?
So it's not that you don't want to use it. Maybe sometimes you don't. It does bring certain biases into it. But I think everybody should get consent to using data that was, or content that was, created by other individuals or other firms.
It's like buying a textbook to go to school, right? You buy it to learn from it. You still have to buy the textbook, right, to use it for your learning, because someone else went through the trouble of creating it. So I think that's a very important aspect of using other people's work or content: you do have to give credit, or get consent in some cases, to leveraging it. I think that's very important.
Yeah. I think most of the people around the world seem to agree pretty strongly about that. And your open source analogy is good, and the one learning from books is good. Because either we pay for the books that we have to learn from, or a library pays for them. But somebody pays for the author and the creator of those materials.
Exactly. Exactly. And sometimes they're not asking for monies to be exchanged. It could be just getting credit or participating in it. Or actually knowing where it's ending up - knowing how the data is being used and for what purpose.
And I've even seen customers now, when they do give you data, they tell you, “I don't want my data to be used in any kind of AI training or used for any purpose.” So before you even ask them, they make a disclaimer, right? Which is a very interesting trend that people are starting to be so aware that, “Okay, I'm going to give you something. I just want to be clear. Please don't go off and start using it. I'm telling you now. I don't want you to use it.” Which is very interesting, the way of proceeding.
So people are very conscious about their data. They're very conscious about what they're using it for and how it's exchanging potentially. It's going from one system to another.
Mmm hmm.
Getting data to drive the training of AI models has been a topic of conversation since AI and ML really started to take off. I mean, it's always been a conversation: “Where's the data going to come from? How are we going to create this data?”
It's always been a conversation around where to get the data, how to create the data. But I think a natural question is “If there is data, do I know the source of it, and can I just use it?”
And I think the conversation now is starting to be more complete around - not just, “Is there data, or do we have to create it?” Now the conversation is also around, “Can we use the data, or can we get consent to use the data? Should this data be even used in some of these scenarios?”
So I think that the natural conversation now is becoming more holistic, not just around “Where are we going to get data to train the model?” It's “Where do we get the data, can we source it, and how do we ethically source this data?” I think the conversation is finally becoming more holistic and complete.
That's a good summary.
Yeah, yeah. It's always been a discussion point. It's just been around “How do I make it work?”. Now “How do I make it work, and is it the right thing to do?” or “What can we do to make sure that how we're doing it is a good way of doing it?” or “Is there a better way of doing it?”
Mmm hmm.
That's a good thing. It's a natural way of evolving the conversation and improving on what you started with. “Now that I've made it work, now how do I improve how I'm doing it?” So yeah.
We've talked about you building AI based tools and systems. You mentioned a little earlier about where the data came from, that you end up having to create it and tie it into different enterprises, it sounded like. Could you talk a little bit more about that?
Sure. Yeah. So I think in our particular use case, there wasn't data available to just train it. I'm sure there is some now. But the time when we started, our approach was, okay, I need to get subject matter experts to actually sit down, and we all started typing, generating.
Basically, one part of the system is, well, how can I train it on what questions it can be asked, or what's relevant? Then we literally had to sit down and type or generate all of the different content to train the model with.
And I think when we initially started, a lot of other companies were facing a similar problem where the data didn't just exist. And I remember there's one other company that did that, and I think they were trying to collect either location data or some other type of data that wasn't really readily available. And they had hired people that basically ran around and collected this data for them. They sourced it. So they paid people to source this data.
And so it was a very normal thing, if the data didn't readily exist, to harvest it or create it through subject matter experts. So we sat down and we typed or generated a lot of the various content that we wanted it to understand. It took a little bit of time, and you always had to continue to add to it as well.
And so that was an interesting part where you didn't just go find a source that provided the content. We actually had to start somewhere and then go to various other subject matter experts that helped us add to it and grow it into something that was a bit more holistic in terms of a starting point, and we kept adding to it.
So every release, we keep adding more and more content around the different types of questions we can be trained on. A lot of people thought it was like, “Oh, that's going to be a lot of work”, but it actually wasn't a lot of work. It just took a number of us that sat down and just thought, “What are all the different types of questions people would normally sit and ask a system?” You know, it took us a few weeks, but it wasn't that bad. But, yeah, we had to create a lot of the data ourselves in terms of what we wanted the system to be trained on.
But a lot of the different content and the data that we generated, we did have to craft ourselves, because we were the industry subject matter experts. And so we were in the best position to create some of this data. And then we got feedback from other people externally, and they helped add a little bit more variation to it to make it a bit more holistic. I don't know if that answers your question.
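Note: For readers wondering what “creating the data ourselves” might look like in practice, here's a hypothetical format for hand-written training examples - a question paired with the intent and entities the model should learn to extract. The field names and labels are invented for illustration, not the actual product's schema.

```python
# A hypothetical format for hand-written training examples: each natural-language
# question is paired with the intent and entities ("slots") the model should learn
# to extract. Field names are illustrative only.
training_examples = [
    {
        "question": "How is product A doing this month versus last year?",
        "intent": "compare_metric_over_time",
        "slots": {"metric": "sales", "entity": "product A",
                  "period": "this month", "baseline": "same month last year"},
    },
    {
        "question": "Show me the top 5 regions by revenue",
        "intent": "rank_entities",
        "slots": {"metric": "revenue", "entity_type": "region", "limit": 5},
    },
    # ...subject matter experts keep adding variations, release after release.
]
```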
It does. Yeah. And the other thing that I'm wondering a bit is: once your system got into production and people were actually using it to ask questions, were you then able to take the list of all the questions people had asked over the past month and use that?
Yes. That wasn't originally what we thought about. But people are very interesting, because the first thing that people tried to do was break the system. So asking things they didn't know how to answer, which was very interesting because people always want to see how smart or dumb it is, right? Which is fair enough. And that's actually what clicked us to harnessing what people were asking it. Because we were looking at, well, what questions didn't we get [right], and are they valid questions?
So we were looking at it from, did it make sense that somebody would ask it that way? And some of the questions were not something the system was meant to answer in the first place, and they were a little bit joking, a little bit goofy. But there were some very good questions that the system had trouble with. And that's where we started to harness what people were asking it, and then what the system was having problems understanding or problems answering.
And we started enhancing its language processing capabilities - also some of the analytic capabilities that can answer those questions. It was a 2-part problem, right? So in some cases, even if we did get the question, we couldn't answer it. And in other cases, we didn't understand the question, and we needed to enhance its understanding - often it was the placement of the words in the sentence that threw off its understanding.
But, yeah, that's exactly how we started to evolve the system, by seeing what are some of the things that we were failing to help them with. And then we started thinking, oh, it would be really good to see which ones we're really doing well at, and maybe we don't need to improve there. But what are some of the key areas that we were struggling with or getting the wrong answer, or things like that.
So I think that really helped us add more capabilities and enhancing the system further after its initial release. It was a good way to start accumulating even more information, because 4 or 5 people internally can probably come up with a good start. Having hundreds of customers out there starting to ask questions, it's a little bit different in terms of accumulating information and knowledge. So that was awesome.
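Note: Below is a minimal sketch of the triage Rola describes - splitting logged user questions into “the model didn't understand it” versus “it understood but couldn't answer”, since the two failure modes call for different fixes. The logging fields and example questions are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class LoggedQuestion:
    text: str
    parsed_ok: bool       # did the language model produce a valid interpretation?
    answered_ok: bool     # did the analytics back end return a result?

def triage(log: list[LoggedQuestion]) -> Counter:
    """Bucket logged questions by failure mode: misunderstood questions point to
    more or better language training data; understood-but-unanswered questions
    point to missing analytic capabilities."""
    buckets = Counter()
    for q in log:
        if not q.parsed_ok:
            buckets["needs language training data"] += 1
        elif not q.answered_ok:
            buckets["needs new analytic capability"] += 1
        else:
            buckets["working"] += 1
    return buckets

print(triage([
    LoggedQuestion("revenue by region last quarter", True, True),
    LoggedQuestion("why is the sky blue?", False, False),
    LoggedQuestion("forecast units sold for next month", True, False),
]))
```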
Yeah. That's a great story of how you started the system and then improved it by using the data that your customers were asking about. So that's a great example. Thank you for sharing that.
Yes, yeah, thanks, Karen.
As a member of the public, and this is more in a private scenario, there are probably cases where your personal data or content may have been used, or has already been used, by an AI-based tool or system. Do you know of any cases that you could share, obviously without disclosing anything sensitive or personal?
I'm not aware of any. That doesn't mean that it probably hasn't happened to me, in terms of using my personal information, but I'm not aware of any situation where my information was used without me knowing about it. I guess that's the question, right?
Well, for instance, I took an online test for my PMI certification, and I had to give them some of my information. Or go through an airport - TSA takes biometrics or they do photo screening. Or social media sites you sign up for. Or websites you sign up for - Netflix, and they want an exact birth date when they don't actually need an exact birth date.
So there's a lot of personal information that we give, assuming, I think, that it will be reasonably protected, and often it gets used in ways we didn't anticipate. That type of use of our personal data is what I'm wondering about.
Yeah. Yeah. No. You're right. I think there's a lot of systems, like, going through an airport. They do take a lot of personal information. And, yes, our assumption is that that information is going to be protected - until the next data breach that someone has, and your data is out there. And, you know, you get an apology about the data.
I think there was one time I signed up even for a loyalty program, and then you give them some information, and then they had a data breach. And your information for that loyalty program went out there, and now I get all kinds of weird emails and programs that ask me to sign up for things.
I think for some things, we do give our personal information because it gives us certain conveniences, that we are willing to give some personal information to have those conveniences.
Like, my GPS in the car knows exactly where I work, even though I never told it that's work. It knows I get up and I drive there every morning, and it labels it “you're going to work”. Or it knows that I go and I visit my mom on a specific day, on the weekend, a specific time. And so when I get in my car, it's telling me, “Oh, you're going to Mom's”. And so it's very interesting how it remembers the places that you visit most frequently, and tells you how much time it's going to take you and the best path to get there.
Now it is a convenience that I enjoy, but I never actually told it that that is work, and that I go there every morning at a specific time, and the best path to get there. But it is using my driving information to predict patterns for myself.
It is a convenience that, I guess, I'm okay for it to use my information for that purpose. But at some point, you stop and go like, “Okay. Now that's weird. You think I'm supposed to go to this grocery store, yet I'm going to a different grocery store.”
So it remembers a lot of the different places because of your habits and patterns, and makes you wonder, “Did I give consent for them to be tracking all this information?” Maybe I did. Maybe I didn't, but I wasn't really sure.
So that's probably one thing where I'm not a hundred percent sure whether it's worth the convenience, or whether that's an invasion of your privacy, in terms of it knowing all of this different detailed information about you.
But I guess in some cases, I think, if I ever disappear one day, maybe they'll look at my patterns and know where I went. So it could be for my own benefit. You never know. So there's always that!
I guess there's always a balance between it knowing things about you and providing some value that makes my life a little bit better if it knows certain things about me, where I'm going, and what I'm doing. So there are some upsides. It's when it's used, when it goes somewhere it shouldn't, I think that's where we all have issues, or it becomes problematic. It's not knowing where it's going to end up that causes a lot of us some anxiety and uncertainty in terms of, where will this end up, and what will it be used for?
Yeah. I mean, I've read through a lot of Terms and Conditions in my day. I know some people see that it's 20 or 30 pages and don't bother. I tend to take the time to read through them because I really want to understand whether companies are being transparent about this. And in a lot of cases, it's, you know, we don't do this, we don't do that, but we're going to use your data for “product improvement”. Which is wide open, right?
Yes. Yeah. Yeah.
What does that mean? Are you going to use my data to train a machine learning system? And what are the risks of that data, my data, leaking out? And are you going to sell it to somebody else for marketing purposes? A lot of that is just they're just not transparent about it in the terms and conditions.
And they're not transparent on purpose. Because I think in many cases, they don't even know themselves, so they leave the door wide open, so that they can be flexible in terms of their plans moving forward as well. But the interesting thing is a lot of companies will bury this discussion in a 10-20 page legal document.
But you'll see more and more companies, especially European companies, and I don't know if it's because their GDPR privacy discussions are a little bit ahead of everywhere else. But even my car now tells me that my personal information is being used, and am I okay? So there's a screen that comes up that says “Using certain features will require the use of your data. Are you okay with it?”, which I've never seen before.
So now there is a more upfront declaration, which you wouldn't necessarily have seen before, rather than, it's just a big legal document that you have to read through. Now, again, it doesn't tell you what that exactly means. And if you say “okay”, where is it going to end up?
I mean, there's still a lot of work to be done around what happens when I say “okay”. But at least there's some more upfront notification that “Hey, by the way, I might use your personal data, are you okay with it?” which was a new thing. I don't know if other people knew about this, but this is kind of a recent experience of mine, which I thought was kind of interesting.
Yeah. There's a big discussion around cars and the privacy of the data that's in cars. And, for instance, some of the data that is on your phone …
Yep.
they would need a warrant to get to that. But if it's in your car's infotainment system – and a lot of it is, more than you would think – it's fair game. And it's like, “Wait, what?” A lot of people have grave concerns about that. Some people are even staying with older cars for a reason.
Yep. Yeah. Which is interesting. And what kind of information are you collecting, and who are you sharing it with? Maybe that's why my insurance keeps going up. I don't know.
Right. So car insurance – some of the companies are offering these dongles. Just put it in your car and let us track how fast you accelerate and what speed you go, and we'll give you a discount. Guess what they're doing with all that data?
Raising your insurance! Yeah. Yeah. I mean, little things like that, I think, really turn people off from the true potential of some of this technology, because it is being used in not as clear a way as possible. It's being sold as a benefit to you, but in most cases, it's not a benefit.
And that's where there's a difference between sharing and then oversharing. When you overshare, then that information ends up being used in various different scenarios. And I think that's where people start becoming more suspicious and less trusting of the technologies. It's not because of the technology itself, but because of how it's managed or used ultimately. That's where I think there's a breakdown in terms of, where does this go from here?
Yeah, and it's good to see movement towards companies explicitly asking for consent for certain features. But there's also true informed consent. And I think we're quite a long way from that, even reading the terms and conditions.
There was an example a few months ago where some attorneys were reading the terms and conditions for an AI-based legal support tool. And even they didn't realize that the terms would expose their clients' confidential data to that company's AI training.
And if even the lawyers couldn't figure that out, how are us non-lawyers supposed to? And maybe the answer is, like you said before, maybe it's deliberate obfuscation. Maybe they don't want us to figure it out.
Yeah. I think there's a lot of misaligned interests about understanding where the data needs to be used and how it needs to be used. I think you can put some very clear wording and intent around it. But people try not to do that, because they want to be able to get value out of it. Because data is valuable. And so the more data you have, the more valuable it is, whether you're training AI or just keeping the data for other purposes. So I think there's a big push to wanting to get a lot of data that we don't normally have.
And these intentions could be made very clear, right? Like, when you put up a message saying, “Hey, we're going to use your data for this and this, and are you okay with it?” Or give people a place where they can go and read about what that means in terms of their data and how you deal with it.
I think people kind of gloss over it now, and then they just say, “Okay, yeah, I agree, I agree.” And then nobody even reads it, or it doesn't register until something bad happens. And then they go, “What? What do you mean somebody stole my data?” That's when people start to kind of panic about the meaning of it. I think organizations in general need to do a better job of educating people around, what is all this data being gathered for, and what does this mean, and some of the pitfalls?
And you know what? Maybe I say “okay” today. How do I tell you to remove it later? And is it removed? How do I check on that? I think they don't do good education on that. It's because they don't want people to change their minds. They don't want people to ask to remove their information. And even if they do ask “forget me”, did they actually forget you? How do you check for that once they have your information?
Mmm hmm.
There's no way of knowing if they actually did it or not. There's no way to check on that. So personal data is a very complicated industry, I would say. It is actually another industry in itself, and it has a lot of value. And, therefore, there is interest in maintaining that industry and that value. And unless there are more real precedents around how you manage all of that information and to what purpose, I think we'll continue to struggle in terms of its meaning and the outcomes of some of this information being out there.
Yeah. And this brings us up to the final standard question for these interviews, which is: public distrust of AI and tech companies has been growing. And partly, I think that's because awareness of these issues is growing. And people are saying, “Hey, wait a minute”.
I take the increased distrust as actually kind of a good sign in terms of awareness. But what do you think is THE most important thing that an AI company or tech company would need to do to earn, and then keep, your trust? And do you have specific ideas on how they can do that?
I think information is critical. I think educating your users about the information that you're gathering and where does it go, I think it is critical.
And it's interesting, because back when I started getting involved in AI, people were distrusting, only for a different reason – because they really didn't understand the technology. They thought, “It's trying to replace my job.” So they distrusted it, because they didn't think anything could do something as well as people were claiming it would.
Now that AI has become more mainstream and it has been proven, I think we just need better education to help people understand the pitfalls, right, of this information. “How do I make a request? If I don't give you my data, what's the consequence? If I do give you the data, where does it go?”
It's important to put it in English terms, right? Don't put it in 20 legal documents. No one's going to read that. They might as well save the paper. It's really about educating the people of the consequences and being a responsible partner and technology provider in terms of saying, “Hey, this is what we're going to do, and this is what we're not going to do.” And I think that's an important part of being a trustworthy technology partner with your customers and your end users.
Most of the people that I ask that question to end up talking about transparency. And you just did as well, saying that they need to be open about what they're using and what the consequences are of giving the data or not giving the data. So that seems to be emerging as the number one thing that people want companies to do. And, fundamentally, we're really trying to figure out if we can trust them to behave ethically.
Yep. Yes. And do what they say they're going to do.
Mmm hmm.
And nothing stays the same, right? Everything evolves. And so you need to go back and update your consumers or your end users on new things that you're evolving into as well. Like, constant transparency is not a one-time thing, right? It needs continual upgrades, just like your technology and everything else.
This is the constant conversation we're having around information, and where it's going, and what it's being used for, and where can it help, and where can it be problematic, and how do we work through the problematic areas together? I think it's a constant conversation that we need to be engaged in.
Yeah. The other part, you alluded to this earlier. You're talking about people not trusting AI partly because they were afraid of it taking their jobs. But people also can distrust it because they don't see or understand what data it used, or how it came up with the answers.
Yep.
And people that are already deep subject matter experts say, look, I know more than this model. So they may distrust it because they don't have transparency into the models. Never mind how the company operates, but just understanding the models and whether it's explainable AI or if it's just an incomprehensible box.
Yep. This is such an excellent point. One of the key things around our forecasting and analytics was people going, “Well, how did you come up with that answer, and why should I trust this answer?” And one of the key capabilities was explainable AI: the ability to explain the results that you came up with. There's always a formula for how you came up with something, so being able to explain that.
Whether people agree with it or not, I don't think is the key point here. Because people always have opinions around how you derive or arrive at certain conclusions, but at least they can see your process of how you arrived at your conclusion.
So explainable AI is being able to say, “Well, explain yourself. You're telling me this is driving this other metric in my system. Well, explain that to me. How are you coming up with this conclusion?” And being able to walk somebody through it, they can either agree or disagree, but at least they can understand your process. So I think that is an excellent point, making sure that you have explainable AI.
Right now, to a lot of people, it's just a black box that you can agree or disagree with. You don't understand what data it's used for its training. You don't understand what the model is capable of or not capable of. You don't understand the limitations, and the things it's good at, or the things it's not good at. And so being able to have that explainability helps provide some of that transparency as well, right?
It's all around transparency and how you arrived at some of your deductions and conclusions that is just as important as the final outcome. If you go back, like in your math exams, giving an answer is not sufficient. You actually have to show your work. I remember I used to get a lot of deductions because I'd just give the answer and the professor would always say, “Where's your work? Show your work. You lost three marks here because you didn't show your work, and you have to show how you got to an answer.” It's not just the answer itself. Yeah. I think that's a really key point that you brought up.
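Note: To illustrate the “show your work” idea, here's a toy sketch of an explainable forecast - returning the drivers behind a number instead of just the number. The linear model, feature names, and coefficients are made up; the point is the shape of the output, not the model itself.

```python
# Made-up coefficients standing in for whatever the real system uses.
coefficients = {"marketing_spend": 0.8, "price_change": -1.5, "seasonality": 2.0}
baseline = 100.0

def explain_forecast(inputs: dict[str, float]) -> dict:
    """Return a forecast together with each input's contribution to it."""
    contributions = {name: coefficients[name] * value for name, value in inputs.items()}
    forecast = baseline + sum(contributions.values())
    return {
        "forecast": forecast,
        "baseline": baseline,
        # Largest drivers first, so the user sees what mattered most.
        "drivers": sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True),
    }

result = explain_forecast({"marketing_spend": 20, "price_change": 2, "seasonality": 5})
print(result["forecast"])                  # 100 + 16 - 3 + 10 = 123.0
for name, contribution in result["drivers"]:
    print(f"{name}: {contribution:+.1f}")  # e.g. marketing_spend: +16.0
```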
Yeah. The flip side of that, though, is that there's also a tendency, I think, for some people to trust AI recommendations far MORE than the technology or the model or the underlying data would actually warrant. It's like, “Well, the machine said so”.
So this is where data literacy and AI literacy and learning to ask good questions about models of the data that was used for them – this is where all that comes in, to counteract that tendency to just say, “Well, I believe the model. My GPS is telling me to go here, even though I'm pretty sure that road is closed or that town doesn't exist.”
Yeah. Yes. Yes. “Turn here”, but there's no here, okay! But you're right that every model has to have a report card. Like, when was the last time this was trained? On what data? Are there deviations? What's the rate of false positives? These are all things that, I think, drive whether you should trust what this AI technology is telling you or not. So you're right.
I think there's a lot of information that people don't even know to ask about, that should probably come as part of the answers it's providing or part of the explainability, right? Are you seeing more deviations than previously? There's a lot of different things that I think can be done to demonstrate the viability of the AI technology, not just the answers that it's providing you.
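Note: Rola's “report card” idea maps naturally onto the model cards that parts of the research community have proposed. Here's a minimal, hypothetical sketch; the field names, model name, and thresholds are invented for illustration, not drawn from any real system.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ModelReportCard:
    """A minimal 'report card' to ship alongside a model (fields are illustrative)."""
    model_name: str
    last_trained: date
    training_data: str
    false_positive_rate: float
    drift_since_training: float          # deviation of recent inputs from the training distribution
    known_limitations: list[str] = field(default_factory=list)

    def should_flag(self, max_fpr: float = 0.05, max_drift: float = 0.2) -> bool:
        """Warn the user when the model's own stats suggest extra caution."""
        return self.false_positive_rate > max_fpr or self.drift_since_training > max_drift

card = ModelReportCard(
    model_name="example-forecaster",                       # hypothetical name
    last_trained=date(2024, 3, 1),
    training_data="12 months of telemetry (hypothetical)",
    false_positive_rate=0.03,
    drift_since_training=0.35,
    known_limitations=["untested on newly deployed data sources"],
)
print(card.should_flag())   # True: drift exceeds the threshold, so surface a warning
```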
This has been a great conversation! Is there anything else that you'd like to share with our audience?
No, this has been a really good conversation. I think there's a lot of things AI can help us with, but there's also a lot of things that we need to educate each other and other people about, and lots of interesting learnings to bring out. And I really thank you, Karen, for taking the time. It's been a really great discussion.
Thank you, Rola. I appreciate you being willing to join the interview series and sharing your insights with people out there. I think it's really important that we hear from people rather than the tech bros that are hyping all of these tools, how it's really affecting people's lives, and how they are or aren't using AI tools. Like, you're choosing not to use it for times when you want to do something more creative, or things like that. So I think it's important that people get all these perspectives. So thank you for joining me. I really appreciate it.
Yep. That's perfect. Thanks, Karen.
Alright. Thank you.
Interview References
A recent Consumer Reports AI survey of 2,022 U.S. adults goes directly to Rola’s point about people caring about transparency and explainable AI. For more details, see:
About this interview series and newsletter
This post is part of our 2024 interview series on “AI, Software, and Wetware”. It showcases how real people around the world are using their wetware (brains and human intelligence) with AI-based software tools or being affected by AI.
And we’re all being affected by AI nowadays in our daily lives, perhaps more than we realize. For some examples, see post “But I don’t use AI”:
We want to hear from a diverse pool of people worldwide in a variety of roles. If you’re interested in being a featured interview guest (anonymous or with credit), please get in touch!
6 'P's in AI Pods is a 100% reader-supported publication. All new posts are FREE to read (and listen to). To automatically receive new 6P posts and support our work, consider becoming a subscriber (free)! (Want to subscribe to only the People section for these interviews? Here’s how to manage sections.)
Enjoyed this interview? Great! Voluntary donations via paid subscriptions are cool; one-time tips are appreciated; and shares, hearts, comments, and restacks are all awesome 😊
Credits and References
Audio Sound Effect from Pixabay
Microphone photo by Michal Czyz on Unsplash (contact Michal Czyz on LinkedIn)
[2] ‘Inclinations’ in SRED:
“Providing documentation structure that makes it very clear, so that reviewers that has had no involvement in the project (or even technology background) can quickly understand:
what problem that was being solved,
what theory or solution was proposed,
what struggles or challenges faced during the duration and
what was the outcome and benefits that was achieved.”
[3] “Wolfram GPT” - ChatGPT plugin for Wolfram Alpha: https://gpt.wolfram.com/
[4] See Stephanie Fuccio’s AISW interview for more on her experiments with ChatGPT.