
🗣️ AISW #029: Heiner Leiva, Costa Rica-based cloud data and AI architect (AI, Software, & Wetware interview)

An interview with cloud data and AI architect Heiner Leiva on his stories of using AI and how he feels about how AI is using people's data and content (audio; 42:17)

Introduction - Heiner Leiva

This post is part of our 6P interview series on “AI, Software, and Wetware”. Our guests share their experiences with using AI, and how they feel about AI using their data and content.

This interview is available in text and as an audio recording (embedded here in the post, and later in our 6P external podcasts).

Note: In this article series, “AI” means artificial intelligence and spans classical statistical methods, data analytics, machine learning, generative AI, and other non-generative AI. See this Glossary and “AI Fundamentals #01: What is Artificial Intelligence?” for reference.


Photo of Heiner Leiva, provided by Heiner and used with his permission

Interview - Heiner Leiva

I’m delighted to welcome Heiner Leiva as my guest today on “AI, Software, and Wetware”. Heiner, thank you so much for joining me today! Please tell us about yourself, who you are, and what you do.

Thank you so much, Karen. First things first: my name is Heiner, and I am based in Costa Rica. I am currently working as a cloud data and AI architect at one of the Big Five tech companies. Additionally, I teach part-time at the Costa Rica Institute of Technology. And I have recently started my PhD at the University of Barcelona, where I am focusing on generative AI for robotics and human interaction.

Great. So what is your level of experience with AI, machine learning, and analytics? Have you used it professionally or personally, or studied the technology, or built tools using the technology? 

That's a good question. I have several years of experience working with machine learning models and nearly 4 years specializing in AI algorithms. Recently, I have leveraged numerous out-of-the-box models to develop chatbots, and used pre-trained models for custom human interaction interfaces. I also advise companies on how to implement these algorithms to create AI pipelines and enhance their AI strategies.

On the other hand, I have experience working with cloud-based analytics platforms. While I initially started with AWS, over the past 6 months I have been working extensively with Azure and the Azure ecosystem as a whole.

Thanks. That's a good overview of your professional life. How about your life outside of work? AI and machine learning are everywhere now, whether we seek it out or not. And we have some examples that I've shared about before. So what are your experiences? And is there any example that you can share?

It's an interesting question. I would say that we have AI everywhere. Well, I will give you one specific example. Every time I do my daily workout, I use an app that tracks all my movements and exercises, and it's using AI. The next exercise it suggests is based on my tracking history and all the exercises I have done. And while I'm exercising, I listen to music, and there will be a lot of recommendations for music that I might like. So this is something that I use.

And every time I'm watching Netflix, it suggests a lot of movies to me, based on my viewing history. So I think that we have AI everywhere right now.

Maybe the thing is, in the past, we were not concerned about how AI was being pulled into our daily lives. But right now, with all the different kinds of genAI, and all the models that we have, we are more concerned about it. We are seeing, okay, we have data everywhere, and this data is flowing through my life in different applications and different fields. So, yeah, this is something that I'm also watching for.

So when you think about the music recommendations and other guidance and results that you're getting from a generative AI tool or from a recommender, how good are they? I mean, how often do they get it right, as far as what you would want to listen to? 

That's a good question. It depends on the day, and it depends on the artist I'm listening to. Sometimes it is pretty accurate on the genre I'm hearing. But sometimes it recommends genres that I'm not into. I would say that much of the time, like 70%, it is fairly accurate, but definitely not all the time. Sometimes it needs some improvement. Yeah.

That's great. From a work perspective, can you share a specific story on how you've used AI and machine learning? And I'd like to hear your thoughts about how well those AI features of those tools worked or didn't, and what went well and what didn't go so well in trying to use them. 

I use these tools every day. I've heard some people say that they no longer use them, because they feel like they are becoming dumber. But I disagree. Many people don't realize that AI tools can enhance our work by simplifying and speeding up repetitive tasks that we hate, and I certainly do hate them. 😀

So I start by using Copilot to summarize my new emails and prioritize the ones I need to follow up on first. During meetings with customers, I ask Copilot to summarize the discussions. I also use Copilot to help me search for specific articles online to better understand complex issues my customers face.

So I'm using Copilot for the things that I hate to do because they are time-consuming, things I otherwise couldn't get to because I would need to invest a lot of my time. That lets me focus on the things that I really like to do. But, yeah, this is essentially what I do with Copilot every single day.

Yeah. Sounds like you're definitely a Copilot power user! How well are you finding that Copilot works for these different types of administrative and search tasks? For instance, how well does it recognize and spell the names of people or companies or products or projects? And how often does it get the action items and priorities right? 

Fair question. It does a pretty good job. Sometimes, instead of going to check my email, I can just ask Copilot, “Hey, please summarize the top 5 customers that I need to talk to today.” It will give me a list, and I can focus my attention on that specific list. With some emails, it also reminds me that I need to do a follow-up, or to add them to a recap.

With some names, I have observed that when a name is in a language other than English, it doesn't catch that specific name very well. So that specific model will need some kind of additional training.

But overall, it's doing a really good job. I can't complain about that. There are some things that will need to improve, but this is true of all AI models, right? No model can be perfect 100% of the time.

Yeah. Absolutely. I remember when we first piloted the Zoom AI meeting summary tool, and we saw some of these things that it got right, and some of the things that it got wrong. It never got the name of the company right! 😆

Yeah. Exactly. This is something that is happening. Yeah.

When you get these meeting summaries, do you correct them? And then does it learn from the corrections that you make, or is it not clear if it's doing that? 

Basically, every time I get the output, I check it, and sometimes I need to go and annotate that specific output. Also, if I need to send a recap to a customer, I note to the customer that some things were captured through Copilot and may not be 100% correct, so they need to be aware of that. But much of the time, it does a pretty good job.

Action items are something that is key for Copilot. This is something I'm seeing that they are investing a lot of time in. Every time I have an action item in place, Copilot captures it and also reminds me about it. So, yeah, I think it's doing a really good job.

Okay. That's good. So when you do research to look for articles that help you understand the issues your customers are facing, how accurate are those searches? Do you find that the articles it gives you are real, with no hallucinations? Are they sourced?

Oh my God. These are really good questions, and I will tell you why. At first, I used to just write something like “Can you please give me some articles about one specific deal?” And then Copilot just gave me a lot of articles, sometimes with all the references. So I needed to go and check what those references actually were, and check further.

So I changed a little bit how I ask this. Now, every time, I say, “Please help me find articles, real articles on the Internet that I can access, and then give me the specific references in a summary.”

Then I go and check all the different articles. Because if I don't specify that I need articles that I can actually go and check, with a click, there will be a lot of hallucination, I would say. So, yeah, we need to be aware of that.

Sounds like you've been learning to refine the way you prompt it to get more useful results. Is that a fair statement?

Definitely. Definitely. The way you do the prompting matters: the more specific the details you add, the better the results will be. Yes.
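(A side note for readers who want to experiment with the citation-focused prompting pattern Heiner describes: here's a minimal sketch using the OpenAI Python client as a stand-in. Heiner uses Copilot, which doesn't expose this kind of API directly; the model name, topic, and prompt wording below are illustrative assumptions, not his exact setup.)

```python
# Minimal sketch of a "real, checkable references" prompt.
# Assumptions: the OpenAI Python client (pip install openai) and an
# OPENAI_API_KEY in the environment; model name and topic are illustrative.
from openai import OpenAI

client = OpenAI()

topic = "data governance for cloud AI pipelines"  # illustrative topic
prompt = (
    f"Please help me find real articles on the Internet about {topic} "
    "that I can access. Only list articles with working URLs that I can "
    "click and verify; do not invent sources. For each article, give the "
    "specific reference (title, author, URL) in a summary."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; substitute whatever you use
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Even with instructions like these, the returned links still need to be clicked and verified by hand, exactly as Heiner says.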

That's good. And just to clarify about these experiences, is Copilot typically working on texts in English or in Spanish, or is it on a combination of both? I know you speak both very well, so do you notice any differences in how well it performs in one language or the other? 

I did an experiment maybe a week ago, because I was thinking in Spanish while writing in English. I did a combination of both languages. And I saw that Copilot can capture both languages very well.

I also tried Portuguese. I speak a little Portuguese, so I tried it, and it's doing a really good job.

However, I have talked with some peers who speak other languages, for example Chinese or French. They told me that in French, the experience is a ‘pane of glass’ compared with English. With Chinese, though, it's very different: they told me that sometimes it can't capture all the context. They need to give it more context or even more prompting. So, yeah, it's not the same.

OK, that's really interesting! And have you used it to help you with writing code?

Yes. Definitely. Sometimes I use Copilot to help me write code. I would say it depends on the prompting you give: if you can specify what the specific issue is and what specific details you want, it will give you a really good piece of code. However, sometimes the code is very old, so you need to check whether there is any security issue, or something that is no longer used, or even, for example, whether it's using a library that is already deprecated. So we need to check that. But, yeah, I use it.
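(For readers who want a cheap guardrail against the deprecated-library issue Heiner mentions: Python can be told to treat deprecation warnings as errors while you vet suggested code, so outdated APIs fail loudly instead of working silently. A minimal sketch; `legacy_helper` is a hypothetical stand-in for a deprecated call an assistant might produce.)

```python
import warnings

def legacy_helper() -> int:
    """Hypothetical stand-in for a deprecated call suggested by an AI assistant."""
    warnings.warn("legacy_helper() is deprecated", DeprecationWarning, stacklevel=2)
    return 42

# While vetting AI-suggested code, escalate deprecation warnings to errors
# so reliance on outdated APIs is caught immediately instead of ignored.
warnings.simplefilter("error", DeprecationWarning)
try:
    legacy_helper()
except DeprecationWarning as exc:
    print(f"Flagged deprecated usage: {exc}")
```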

So are you mostly using Copilot to write code from scratch? Or do you typically write your own code first and then use Copilot to help you with debugging? Or how do you normally work with it? 

Okay. Both things apply. But I will say, sometimes, when I am just tired, I just want Copilot to give me a piece of code so I can start checking it and maybe get some ideas. But, for example, if I see a problem in one tenant, I can go further and I can ask.

I can think first and say, maybe this problem comes from, I don't know, maybe a library. So I can go and ask Copilot, “Can you please check what this specific problem is? Can you please fix that for me?” And then Copilot will do that.

And I dedicate some time to understanding first what Copilot is trying to do and what its recommendation is. Because sometimes the recommendation is good. But sometimes it's not something you can use, because it's giving you a general recommendation when you need a specific one, or Copilot is not capturing everything you want to build.

So, yeah, every time I get code, I check it further. And if I see something that is not very accurate, I will refine my prompting. I will say, “Please give me one specific example of this using that. And I expect that specific output as part of your response.” But, yeah, I'm seeing that Copilot is improving in that way.

Yeah. That's very good.

Two weeks ago, I was implementing a customer chatbot, which encountered several issues because it was being fed legal documents and customer policies that use very specific terminology the algorithm didn't initially grasp. I had to retrain the model at least 4 times, I think, before it started giving decent responses.

Well, I wish I didn't have to spend so much time on retraining, but it's normal. We have to remember that AI models learn like children. You can't expect a child to understand something immediately after hearing it just once. But this is what people expect AI models to do: they expect the model to respond in exactly the way we communicate with it. Sometimes that's the case, sometimes it's not, so you need to refine your prompt.

Yeah. That's very true. It's an iterative process, right? 

Yep. Yeah. Completely agree. 
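(A side note on the terminology problem Heiner describes: one common alternative to repeated retraining is to ground the chatbot's prompt with glossary definitions for the domain terms that appear in a question. The sketch below is a simplified illustration of that grounding step; the glossary entries and the `build_prompt` helper are illustrative assumptions, not Heiner's actual pipeline.)

```python
# Minimal sketch: inject domain glossary definitions into a chatbot prompt
# so the model sees the customer's terminology defined next to the question.
# Glossary entries are illustrative, not from a real legal corpus.
GLOSSARY = {
    "indemnification": "A contractual duty to compensate another party for losses.",
    "force majeure": "A clause excusing performance during extraordinary events.",
    "subrogation": "An insurer's right to pursue a third party that caused a loss.",
}

def build_prompt(user_question: str) -> str:
    """Prepend definitions for any glossary terms found in the question."""
    found = {t: d for t, d in GLOSSARY.items() if t in user_question.lower()}
    if not found:
        return f"Customer question: {user_question}"
    context = "\n".join(f"- {term}: {defn}" for term, defn in found.items())
    return (
        f"Use these definitions when answering:\n{context}\n\n"
        f"Customer question: {user_question}"
    )

print(build_prompt("Does our policy's force majeure clause cover floods?"))
```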

So we've talked a lot about how you've used AI-based tools and generative AI tools. Are there any situations where you've avoided using AI-based tools for some things or for anything? And can you share an example of when and why you chose not to, if that's the case? 

Right now, I would say I have never avoided using these tools, nor do I plan to. They are a vital part of our present and future. But, you know, I can change my mind. 😆

Right? Always reserve the right to change your mind! That's part of what happens when we grow and we learn. Right? 

Exactly. Yeah. There you go. 

It sounds like professionally, you're very invested in using them, and that is understandable in your work situation. Do you use Copilot to, for instance, compose emails at work? Or would you use an AI tool to help you write a personal email to a family member or to a friend? Or to generate an image to use on social media, or to create music? Or are there really no cases at all where you would choose not to use it?

Good question. I have used Copilot a lot to write emails to customers, because it can write emails in a very professional way. For example, I start writing from scratch myself, and then I check with Copilot. I will give it something like, “Can you please summarize this and also make it sound more professional?” So Copilot helps with that.

With emails to my friends, I would say yes, I also do that. Even when I'm using Teams or any other application, for example Messenger or Instagram (I chat a lot on Instagram), I use it. Because sometimes I want to learn, like, how can I sound more like a native speaker? How can I use the kinds of phrases that native speakers use? So I use it for that.

I have never used Copilot, for example, to create music or anything like that. I have used Copilot, I think, twice to create some images from scratch. It does a pretty nice job, I will say. However, there are a lot of images online. So, old school, I go and try to find the perfect image, and then I send that image. This is pretty much how I have been using Copilot so far.

Okay. Yeah. That's fair. So how do you feel about companies that are using data and content for training their AI and ML systems and tools? For instance, do you think that AI tool companies should be required to get consent from and then compensate the people whose data they want to use for training? 

Examples would be music, art, or code; writers, actors, people who write books, software developers; data coming from medical patients, students, or people who are on social media. Any of those - people who are basically creating the data that AI and machine learning systems are very often using?

This is one of the questions we need to ask every time we use AI models. I would say, of course, many of the tools we use today don't have pre-signed consent for their training data. They have been trained on vast amounts of web content, and nobody receives credit for it. During my recent research for an article I was writing, I watched a documentary about a musician who believed their music had been used to train algorithms. However, after some investigation, it turned out that the melody they thought was stolen had itself been plagiarized from other artists. So my question is, can we really complain about something we might have already taken from others? This is the other question we need to ask.

Yes. It's a good question. Although, that's one counterexample, and it wouldn't really justify a rule. There are millions of creative professionals whose original work HAS been scraped for training without consent, credit, or compensation, which is what they now call the “3C’s”.1

So that's an area where it seems pretty clear cut, for instance, with the books that were used for training some of the large language models. This is where I think a lot of the ethical concerns come up. 

There was another company, not your company. At first, this other company denied, and then they finally admitted, that they scraped - not public domain, but ‘publicly available’ - YouTube videos, and used them for training their models. And those videos are copyrighted, and people have the rights to them. So there are certainly cases where companies have done unethical things to get content for training.

So I know you have some thoughts about ethics. You had a presentation on it. 🙂

Yeah, we had a presentation a long time ago! For me, ethics in AI is extremely important and crucial to consider. However, people often believe that if they wrote something on the Internet 5 years ago and someone used it recently, they should receive credit. The answer is yes, ideally, like a disclaimer. But the reality is different.

When I accepted the terms for using products from companies like Meta, Google, and others, I gave consent for my content to be used, even though those terms might not have originally mentioned AI training or large language models. So trying to claim compensation for the use of ideas that I freely shared years ago is difficult to prove. This is why there are now platforms that require a license when selling creative works, but not everyone follows these practices.

This is something that I also point out to my students, for example. Because right now, ethics, and who will be responsible for all this, is not something we are discussing with others. And we are not even asking ourselves.

Yeah. These are really important conversations to have, and I'm glad to hear that you're talking with your students about it. Because they need to get that understanding as part of their context for how they use AI tools in their futures. 

And there's definitely a lot of shades of gray in copyright and consent. As you said, in theory, anything we write is implicitly copyrighted to us. But in reality, it's really hard to enforce. And some people don't even care if their own content gets used, or they might accept that the content they wrote on a platform under its terms and conditions is not only theirs. But a lot of people around the world seem to agree that works that were professionally published under formal copyright should have more protection. 

As an example, I wrote some blog posts on WordPress 15 years ago, and some people believe that WordPress is entitled to license those texts to an AI company for training without my consent. But most people don't accept that the book that my friend Dr. Mary Marcel2 wrote this year, which is copyrighted to her and her publisher, should be used for AI without the 3C's going to her. And it's the second kind of case that I think is more concerning to people. 

Terms and conditions, and what they call ‘clickwrap licenses’, are also kind of problematic, and they have been for a long time. And I get concerned that, one, they're really not written to let people give informed consent. They're 10, 15, 20 pages long, and they're written in legalese.

And I get even more concerned when they change the terms retroactively to give themselves rights that they did not have when the person first agreed and signed up. And that puts people in a bind. We can maybe talk about that later if we have time, yeah. 

Sure. I completely agree. We can start a long, extended conversation about that, because there are a lot of things that we will need to talk about first. 

But I will say, my concern lies with companies using private data. All data is important, but some data is more important than other data.

For example, if I have sensitive clinical information from my patients, this is very, very private to them. I would be very concerned if I saw their data being used to train models without proper consent. Some people have a terminal disease, or they simply don't want others to know what diseases they have. So I would be really concerned about that. This is something we need to regulate.

Yeah. The use of private data and especially medical data, or even when we get to genetic data, those are really serious concerns. One of my more recent interview guests, Dr. Julie Rennecker, she works with startups in the healthcare innovation space. And she had a lot of insights to share about innovation and what they call the HIPAA laws in the US about protecting medical information, and what they do and don't cover. We had a really good conversation about medical privacy there, and she's got some good insights. I won't try to repeat them, but it's a really important area. I totally agree with you on that. 3

So as someone who has used AI-based tools - it sounds like mostly Copilot, but maybe a few others - do you feel like the tool providers have been transparent about sharing where the data for the AI models came from, or whether the creators consented to its use?

Yeah. First, we need to check whether we are using a third-party tool from another company. Many companies right now use a lot of OpenAI models. So we need to check, for the company we are working with, what terms of service they signed for the OpenAI models. Because if I go to OpenAI and ask them, “Where did you find all the data that you're using for training?”, maybe they will give you an answer, maybe not. So this is something we need to check.

But many times, companies are not transparent about where the data was sourced or how the algorithm was trained. There isn't specific legislation governing this yet. And as users, we need to start demanding greater transparency with every new release, much like the European Union is currently doing. Because sometimes, when I'm reading the release documentation for AI models, there are standard sections on AI capabilities, technical features, and what they use. But for data sources, and what the model has been trained on so far, they have “For more information, please click on this”. And then you click on it, and it's just general information; you can't see anything there. So, yeah, it is a problem.

Yeah. I agree. This is really important. Do you happen to know if there's been much traction on this in Central and South America? I was looking into this a few months back, just looking at all the different regions of the world and where they stood. Brazil was leading the way in the region, trying to set up some standards that other countries might then follow. But I don't remember the specifics offhand right now.

That's a very good question. Brazil is working on some legislation right now. I know that Colombia is working on that as well, and also Mexico. Costa Rica just started doing that, and I think it is the first country in the Central American region to do so, followed by Panama.

But yeah, right now, I would say LATAM is a little behind the United States and some countries in Europe. We're trying to get there, but it's a new thing to talk about in the region. I hope we can start working on this with companies and also the government, because it's really important to talk about.

Yeah. Agreed. As members of the public, there are some cases where our personal data or content may have been used or probably has been used by AI-based tools or systems. Do you know of any cases that you could share? 

I am not aware of any direct use of this data, even my data, but I don't share much personal or sensitive information on social media. And maybe you can go to my LinkedIn and see that I post a lot. 😊 So, yeah, I do post. But I try to keep the personal side private. While I do believe that some of my data has likely been used to train these models, it's nearly impossible to trace anything back to me specifically. With millions of people sharing similar thoughts and content, it's difficult to prove that something originated from me.

Yeah. That's true. And I think you're wise to be cautious about social media. 

Yeah. 

Do you know of any company that you gave your data or content to that made you aware that they might use your info for training AI or machine learning? Or have you ever gotten surprised by finding out that a company was using your data for AI? 

Yeah. As I mentioned before, once you have signed up and created an account on these platforms, you are essentially giving your consent for both current and future products. While I empathize with those who feel their data has been misused, we can’t guarantee that data we openly share won't be taken. 

As consumers, we need to become more diligent about our social media usage, specifically how, where, and when we share details about our personal and private lives, interests, hobbies, and many others. 

Yeah. That's true. And I think it's important that we distinguish between data that's been openly shared, like on a public Facebook feed, and content that people created privately in an online system that, in many cases, they paid lots of money to use, like the Adobe tools for artists.

And then when that content gets resold by the company, to train an AI tool without the consent or credit or compensation to the artist, basically, they're stealing from the artist and killing the artist's way of making a living with art. 

One of my recent interview guests, Kris Holland, who's a technical artist, has reported that it's basically killed off the business he's been doing for 20 years.4 

So saying that we need to do AI fairly isn't the same thing as saying we shouldn't do it, but we definitely do need to think about how to be fair about it. What do you think? 

Yeah. I do agree. I do understand sometimes when people say, okay, I have been working on this for the last 20 years, as you mentioned. But AI is a new kind of business, and also a new tool that we need to understand. Some artists can say, I am making my living from this, and right now they are just killing my business.

For example, I have a friend. She is a graphic designer, and I asked her, “What are you doing with this?” With all the AI stuff we have right now that is generating new images and doing a lot of artwork. She told me that sometimes she's not very inspired. So she goes to these AI tools and asks for inspiration. She asks, for example, “Can you give me a picture of one specific thing that I can start working from?” So she uses this as inspiration, she creates from it, and it helps her with her job. So this is something we need to consider.

The other day, I was watching a famous YouTuber. He said that sometimes he doesn't have any inspiration or creativity for creating his YouTube videos. So he asked OpenAI's models, “Please create a summary of things that I can focus on in my next video.” He said that he used every single thing the model offered, and he went viral with that specific video.

Some people, for example, could focus on doing this kind of thing, using these kinds of platforms to be more creative and to make their jobs easier. But I do agree - for some of them, it's different.

Has a company's use of your content or personal data ever created any specific issues for you, such as privacy or phishing or loss of income or having your lesson plans scraped or anything like that? 

None of this is new. Companies that sell credit information have been doing that for decades, and nobody seemed to care. But now, because AI is a hot topic, a trendy topic, everyone is suddenly concerned about their data.

The reality is, I could go to any credit bureau, and they would probably know more about me than I do. I believe these companies don't really care about people at all, and I don't trust the companies that buy this data either. That's why, when I need to request credit in Costa Rica, I avoid banks that require consent to use credit bureaus for scoring. It's part of the business: you are feeding them your data, but you are also paying for it. It's like a cycle that has no end. So, yeah, it's a real problem, I would say.

Yeah. It's really surprising to hear you say that you can avoid it. I don't actually know of any US banks that won't use credit bureaus, or that won't require our consent to contact the bureaus, before they will process a loan application. If you have some banks in Costa Rica that don't require credit bureau access, you're very lucky, because we don't have that.

Yeah. In Costa Rica, we have some banks that don't require this, because they would need to pay for a license, and it's pretty expensive. So they say, “Okay, maybe we can just keep this kind of record” - I mean, your history - “but we are not paying for these bureaus.”

But, yeah, in Costa Rica, we have that. And we also have the banks that ask you to sign a consent form for doing this kind of search with the credit bureaus. Yeah.

Yeah, that's funny because here, they require it. Some of them say that they'll cover the cost. And some of them actually ask the person applying for the loan to pay the cost. 

Oh my God. Unbelievable. 

So data privacy is a really big issue. And I think the concern is really threefold - why it's changed, or why the concern seems so much more intense recently.

One is that there's just much more data about us. New devices are capturing it automatically and not keeping it private. 

You know, you mentioned having a device that helps you with your workouts. It's not just our cell phones - it's our smartwatches, our cars, security cameras on the streets, people's doorbells, and biometric systems, and there's just so much more. And schools are even capturing data now about kids, and it's creating increased exposure of children's private information. 

And now there's these new smart glasses. Have you heard about these, that were hacked by the Harvard students to automatically identify personal information about someone that they simply looked at, without that person's knowledge or consent? 

And as you mentioned, there's medical data, and sometimes there's even genetic data.

So just the sheer volume of data, I think, is making this a much bigger concern than it used to be in the past. That's one thing. 

Yeah. As we expand our capabilities with AI, this is something that will come up more and more in different places. For example, when I was a kid attending school, I never had a camera on me. That's pretty normal right now, because kids have a lot of cameras in their schools. In Costa Rica, some time ago, it was pretty unusual to see cameras in the middle of the street, because we never had that. But right now, we have cameras on every single corner, and we are trying to monitor all the traffic.

I was talking the other day with my friend, and I asked him, what would happen if I had an accident? For example, if I'm driving my car and I have an accident, would people know? And he was like, “I don't think so, because there are no cameras.” And I was like, “Change your mind, because in Costa Rica, we have cameras on every single corner.”

My mom right now is living up in the mountains in Costa Rica, in a little town. We didn't use to have these kinds of things there. But the other day, when I was traveling to my mother's house, I checked, and they have some cameras installed, because they are trying to monitor the animals crossing the streets. But you can use them to monitor every single thing - for example, cars. You can monitor people.

So we have cameras in every single place. Right now, they are checking people's faces, and governments have information about everyone's faces. We have to just say, “Okay, now governments and companies know more about me than I know,” as I mentioned before.

But, yeah, cameras, gadgets, IoT devices, they are everywhere, on every corner. And we can say, “Okay, I don't want to be part of this.” You can go buy a house in the middle of nowhere and live there, but you will be completely alone. So this is why I said we need to start focusing on regulation of these things.

Yeah. I think one of the other things that has changed more recently is that there's much lower trust in data security. We used to assume that we could trust companies and organizations to use our data for suitable business purposes, and then protect what we gave them. Nowadays, it seems like their data systems are being breached regularly, and it creates risks for everyone.

Yeah. Sometimes when I talk with some of my friends, I just ask, “What do you think about the use of AI? And about your data?” Some of them said, “If I don't give my consent, they could use my data. If I give my consent, they will use my data. So at the end of the day, whether I consent or not, they will have my data.” That's the mentality they have: they know their data will be used, with or without their consent.

And I think I'm the kind of person who tries to be ‘by the book’ about some things. For example, if you ask me about how you will use my data, I will give you my consent. But if not, I won't.

Maybe I will change my mind in some months, or even some years from now. But I think companies need to respect what belongs to someone else, and how they grant these kinds of permissions to people. So, yeah.

Yeah. I think there's a lot more awareness nowadays about how our data is being collected and how it's also being sold, and what one of my interview guests called the ‘industry of personal data’. And it's not just the credit bureaus. It's also the data brokers and the large companies and scammers. And the sale of our data and content has a financial impact, which can be severe for professional creators. Not so much of a financial impact on people who are, as you said, just writing online. But it's another factor that is making the concerns about data and privacy and usage for AI a lot more visible. 

So I think the increased sensitivity to data usage that we're seeing is actually kind of good. It means people are more aware of what's going on and feel, maybe, freer to speak up about it. 

And some people do have that sense of being resigned to, “They're going to use my data anyway, no matter what, I'm powerless to do something about it”. But if we don't TRY to do something about it, nothing WILL be done about it.

I do agree with that. Yeah. I'm one of those people who thinks like that. But, yeah.

Yeah. So we do see that this public distrust of the AI and tech companies has been growing. So what do you think is the most important thing that AI companies would need to do to earn and to keep your trust? Or maybe even more specifically, what TECH companies who are using data would need to do to earn and keep your trust? And do you have any specific ideas on how they could do that? 

Very good question. Companies need to focus on better explaining how they will handle people's data in the present and future. This is one thing. 

The second thing is the wording, for example: by clicking agree, “you consent to the use of your data in all of our current or future projects, as well as by our subsidiaries.” If I knew that Meta, for example, would use my biometric data to identify or label me as Latino, I would hesitate to upload any picture of myself. I would likely opt for avatars instead, because this kind of labeling has created a lot of discrimination along the way.

However, it would be completely different if Meta were transparent about using my biometric data to connect me with similar-looking individuals. For example, it would be really fun if I could meet someone who looks a lot like me. In that case, I would say, “I could be okay with them using my data, because it's for a specific purpose.” If they can be really transparent about the way they are using my data, I would say, “Okay, this is the specific case.”

I know that sometimes we can't have a specific and very detailed list. But if they give me general information about how my data will be used and what the purpose is, I will give my consent. It's completely different if they just use my data to label people, or to create discrimination between people.

Yeah. That's a really good point about the use of that information for discrimination. And I think what some people worry about is that they may be prompted, “In order to give you this feature, we need to use your data. We need to see your pictures.” And you might say “okay”. But then, where the concern comes in is, they take your consent as, not just for that feature, but they take your consent to use your data for anything that they want to. 

And so this idea of unrestricted use of their data for all current and future projects and for any subsidiaries or future affiliates, that's just too broad. I don't feel like that's a fair or a reasonable standard. 

I think we have to do better than that in terms of having control over what the data gets used for. This is where the consent really comes in, I think. Never mind the credit and the compensation, but just the consent, and what some people call ‘control’. Some people call it “4C's”: Consent, Control, Credit, and Compensation. Control is kind of implicit in Consent as well.5 

But I think we definitely have to look at improving transparency, as you're saying, to better explain how they will use the data. And this is something that people worldwide seem to agree is critical. 

Yeah. As you mentioned - when I started in data science several years ago, I read one article, and I always remember that specific article. I think it came from Harvard University. A trans person received a lot of discrimination at the airport because she had given some data to Facebook, and Facebook used that data to label her as a man. She identifies as female, so it was complete discrimination. So they were looking into that.

And since I read that article, every time I am building some kind of machine learning algorithm, even standard models, I have that in my mind. Because I don't want to feel that I am creating something that could potentially harm people. It could be trans people, it could be a Black person, it could be anyone - it could even be my mom.

So I always try to keep these kinds of things in mind. Because I understand, right, at the end of the day, it's a business. But we need to make sure that people can feel appreciated and respected in their identity. This is something we need to make sure of.

Yeah, absolutely. I couldn't agree more. And I appreciate you bringing up the point about the discrimination based on visual stereotyping of people's identities. That's something that we're seeing here, that the TSA in the airports is now doing something where they take a photo and they use machine learning to compare the photo to your identification.

And this is just screaming for discrimination against transgender people, or people who have visual differences, or even just the fact that those algorithms don't work as well on darker skin. So it's a huge, huge area where there's potential for worsening discrimination against marginalized people. And that's just not something that we should be putting up with. 

And we even see other kinds of biases coming in - gender biases, for example, that are not just coming from the data, but actually being baked in and reinforced in stronger ways, creating discrimination against people. I know we both have strong opinions about that and feel passionately that it's something we should be very careful about with AI.

Well, thank you so much for sharing all of these thoughts! Is there anything else that you'd like to share with our audience? 

My final words would be: ethics in AI is very important. However, we can't hold companies solely accountable for providing credit to others if we feel something has been stolen. In our new reality, as I mentioned earlier, these issues will continue to arise, and we need to evolve as a society.

Today, it's challenging to have an original idea, and when something is innovative, it will inevitably be replicated by AI or other individuals sooner or later. So I believe we can leverage these tools to simplify our lives. 

And we can recognize that AI is the equivalent of a fourth industrial revolution: one in which many people lose their jobs, but new opportunities are also created.

I work with various groups from schools and strive to inspire my students to learn about technology and AI. When I see individuals from diverse backgrounds, not just engineering or natural science, showing interest in this field, I realize I am making a positive impact on their lives. This is what should inspire us as human beings. 

Right now, technology is everywhere. So we can't just say technology is only for engineers, only for math, only for scientists. We need to make sure that all people are included in this revolution. It's the kind of turning point that will change our lives in the future.

That's a great summary. And it's a great aspiration to be looking at how you can make a positive impact on people's lives, and improving everyone's lives. And it's awesome that you're inspiring your students to do this as well. 

I try. I try. Thank you. But I try.

Right. Well, Heiner, thank you so much for joining me on this interview and sharing your perspective on AI and use of data! 

Yeah. Thank you so much for your invitation. It was a pleasure, and also I had a lot of fun. Thank you.

Good! Good. Thank you!

Interview References and Links

Heiner Leiva on LinkedIn



About this interview series and newsletter

This post is part of our 2024 interview series on “AI, Software, and Wetware”. It showcases how real people around the world are using their wetware (brains and human intelligence) with AI-based software tools, or being affected by AI.

And we’re all being affected by AI nowadays in our daily lives, perhaps more than we realize. For some examples, see the post “But I Don’t Use AI”!

We want to hear from a diverse pool of people worldwide in a variety of roles. If you’re interested in being a featured interview guest (anonymous or with credit), please get in touch!

6 'P's in AI Pods is a 100% reader-supported publication. All new posts are FREE to read (and listen to). To automatically receive new 6P posts and support our work, consider becoming a subscriber (free)! (Want to subscribe to only the People section for these interviews? Here’s how to manage sections.)


Enjoyed this interview? Great! Voluntary donations via paid subscriptions are cool; one-time tips are deeply appreciated; and shares, hearts, comments, and restacks are awesome 😊



Series Credits and References

Audio Sound Effect from Pixabay


1. Credit for the original 3Cs (consent, credit, and compensation) belongs to CIPRI (Cultural Intellectual Property Rights Initiative®) for their “3Cs' Rule: Consent. Credit. Compensation©.”

5. Credit for the 4Cs (consent, control, credit, compensation) phrasing goes to the Algorithmic Justice League (led by Dr. Joy Buolamwini).
