
🗣️ AISW #034: Anyela Vega, Costa Rica-based engineer and entrepreneur (AI, Software, & Wetware interview)

An interview with Costa Rican engineer, entrepreneur, and musician Anyela Vega on her stories of using AI and how she feels about how AI is using people's data and content (audio; 56:38)

Introduction - Anyela Vega

This post is part of our 6P interview series on “AI, Software, and Wetware”. Our guests share their experiences with using AI, and how they feel about AI using their data and content.

This interview is available in text and as an audio recording (embedded here in the post, and later in our 6P external podcasts).

Note: In this article series, “AI” means artificial intelligence and spans classical statistical methods, data analytics, machine learning, generative AI, and other non-generative AI. See this Glossary and “AI Fundamentals #01: What is Artificial Intelligence?” for reference.


Photo of Anyela Vega, provided by Anyela and used with her permission. All rights reserved to her.

Interview - Anyela Vega

I’m delighted to welcome Anyela Vega from Costa Rica as our next guest for “AI, Software, and Wetware”. Anyela, thank you so much for joining me today! Please tell us about yourself, who you are, and what you do.

Thank you so much, Karen. Well, I am an engineer. I am an artist also, and an entrepreneur from Costa Rica. I am very passionate about technology, art, and local business and economy. And, also, I am an AI enthusiast.

So what is your level of experience with AI and machine learning and analytics? Have you used it professionally or personally or studied the technology?

I have some work and academic experience with AI. I have worked with machine vision, specifically giving technical support to users of certain machine vision tools. I have experience with training models, using active learning loops with said models. I've used generative AI for technical and creative work. And, also, I use generative AI for lots of things, including even meal planning for the week or asking ChatGPT to suggest recipes that I can make with ingredients from my pantry, stuff like that. So I will say that's the level of experience and involvement of AI in my life.

Thanks for sharing those examples, Anyela. How would you characterize your experiences with AI and ML as an artist and an entrepreneur?

As an entrepreneur, I basically gave myself this whole year as a time to experience what entrepreneurship is. So I had 2 journeys during this year.

The first thing that I did is I worked with AI with a research team from Michigan University. They are studying monkeys and communication in primates here in Costa Rica, and it was something that I was working on with my brother at the beginning of the year before he left for his doctorate program.

So that's the first thing that I did as an entrepreneur with AI. Once he left, I wanted to try different things with AI. Not because the project was not cool enough, because it was - it was a very cool project - but because I wanted to be involved in local economy as well. I wanted to experience the whole entrepreneurship thing.

So I started a business with a friend. In my business, we offer graphic solutions to other businesses. And we do use AI tools, such as generative AI for images and for video editing. We've also used AI tools for creating vector images, and I've used that for many clients, offering low-cost graphic solutions for people and improving productivity. So that's also been my usage of AI.

And in music, I have used AI, but, honestly, not that much. I've used AI to help me discern certain harmonies and stuff like that, but in the end, I end up playing my own music or using backing tracks made by humans. So yeah.

Okay. So that's a really nice wide range of experiences. You mentioned that you're using AI-based tools, or I think you said generative AI-based tools for creating graphic designs and solutions for people. So which tools are you using for that?

I have a subscription with Adobe and also OpenArt. What I do with this is, for instance, I have a friend that wanted me to help her create designs for T-shirts that athletes could use. And she had this cool idea about certain movements in CrossFit that have the names of several animals and stuff like that. And we thought that it would be fun to draw some animals making CrossFit movements, right?

I used OpenArt for that. So what it does is: I selected the style that I wanted, and I wrote the prompts and had them optimized, I guess. But the final product of this generative tool doesn't really make sense. Like, the proportions make sense, but the actual movements or the tools that the animals were using didn't make sense. It had, like, extra arms for the bar. Made no sense.

So what I did is I printed out some of those designs and drawings, the ones that I thought made the most sense. And then I’d manually draw the things that corrected some of the mistakes made by AI. And then I scanned my drawing on top of the AI drawing and converted that into the final design, which empowered me because I am not, like, a classically-trained artist, I am an enthusiast of art. But I do have a business, and I have to deliver to my customers, right? So this is the kind of generative AI tools that I use. I don't fully rely on AI, but it does give me a great place to start, if that makes sense.

Yeah. Yes. Absolutely. It does. So that sounds like a pretty cool project.

It's been very interesting. Some people ask for things for their birthday parties, and they are very creative, but if they started drawing all the things from scratch, it would definitely be a much slower process. Instead, using AI, I've ended up having people order something in the morning and then delivering the T-shirt with the design at the end of the day or the next day.

Wow. That's impressively fast turnaround.

Yeah. Yeah. That's some happy customers.

That sounds great. I'd like to go back to what you were saying about your work project, your entrepreneur project, where you were working on the sounds, the really cool classifier project. Can you tell us a little bit more about that?

I helped a group of researchers. They have, I guess you could call it, a jungle lab or a lab jungle. What I mean by that is: in Costa Rica, there's a very small patch of forest that is surrounded by agricultural activity. And there are some monkey troops, 3 groups of monkeys. And they were trying to make some research about the type of calls that they make in the forest, type of sounds, right?

So some calls they make, they call it the ‘lost call’. Monkey A calls, and then a couple seconds later, you hear Monkey B imitating that same call. There are other calls, like alarm or warning calls, like when they see a predator, or some other types of vocalizations by the animals. So the idea was that the research team placed recorders in a mesh kind of fashion in this small forest, and they captured the sound of the forest for periods of 1 hour. And they were trying to use AI to listen to these recordings and try to decipher whether a certain sound was, or it wasn't, a monkey, and what kind of call that monkey was making, right?

So, of course, it was not practical to have experts in monkey sounds listen to all of these hours and hours, because we're talking about a lot of data. They asked my brother and me to use AI to make that easier for them. So, basically, we used a tool called BirdNet, which was originally meant to recognize bird sounds. But you can use that model to train your own model to recognize your kind of audio, right?

We used TensorFlow and BirdNet to create models that classify what they call 'hits'. So it will say, okay, there is a more than 50% chance that this hit is actually a monkey. Basically, the problem we had is that we had a lot of 1-hour recordings and just a few tagged audio clips for each type of call.

I created a pipeline that takes a batch of tagged data and trains the models. And then my brother created an application to help the experts do the supervised classification, right - saying whether or not those hits that our models were giving were correct, basically. That's the short version of the story.

So in just 3 iterations, we had a very decent model for one specific type of call. And we're talking about just 1 person, in 2 hours, being able to accurately tag the data and even include metadata on some of these data points. For instance, we discovered that our model mistakenly classified one of the sounds as if it were a monkey call, but then the experts said, “There's no way that's a monkey. That's definitely a feline, probably a puma.” That means that, potentially, later on, they would have data about the sound of the puma. So that was very cool.

And, also, this active learning loop was making these models better. But at the same time, it was teaching me a lot about how the forest sounds. Because, I mean, I'm not a biologist. Everyone else in the team was a biologist, including my brother, but I wasn't. So I learned quite a bunch just as the model was making mistakes, and I was learning, okay, so at the first iteration, almost 99% of the hits were false positives. Turns out that in each and every recording that we've used to train the model, there was a background of crickets and insects, right? So the first iteration, the model said, “Okay, well, it seems like we're looking for crickets”, but we were not. So at the second iteration, we had other kinds of sounds, but it included machinery sounds from the agricultural activity and stuff. And then they said, “No. That's a false positive.” And so that's how the model was learning about what we were trying to look for, and it was very cool. Yeah.
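(For readers who want a concrete picture: below is a minimal sketch of one round of the kind of active-learning loop Anyela describes - train a small classifier on the expert-tagged clips, score the untagged recordings, and surface the likely "hits" for expert review. This is not the team's actual code; the Keras classifier, the 0.5 threshold, and the assumption that each clip has already been reduced to a fixed-length embedding, for example by a pretrained audio model such as BirdNET, are illustrative choices only.)

```python
# Minimal sketch of one active-learning round - illustrative only, not the team's actual code.
# Assumes each audio clip has already been converted to a fixed-length embedding vector
# (for example, by a pretrained audio model such as BirdNET).
import numpy as np
import tensorflow as tf


def build_classifier(embedding_dim: int) -> tf.keras.Model:
    """Small binary classifier: 'monkey call' vs. 'not a monkey call'."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(embedding_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def active_learning_round(model, tagged_x, tagged_y, untagged_x, threshold=0.5):
    """Train on the expert-tagged clips, then flag likely 'hits' for expert review."""
    model.fit(tagged_x, tagged_y, epochs=10, batch_size=32, verbose=0)
    scores = model.predict(untagged_x, verbose=0).ravel()
    hits = np.where(scores > threshold)[0]  # clips with a >50% chance of being a monkey call
    return hits, scores[hits]

# Experts then confirm or reject each hit; the confirmed labels are added to the tagged set,
# and the next round starts with a larger, cleaner training set.
```

The important part is the loop, not the particular model: each round, the experts' corrections become new training data, which is how false positives like the crickets and the farm machinery get weeded out.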

Yeah. That sounds really neat. So do you have some sense of how good the model was by the time you finished your couple of iterations? Like, what percentage of accuracy did it have in identifying a monkey versus a puma versus something else?

I do. I have the data here. In the first round, we had less than 20% true positives. That's the F1 score, actually - that's what this metric is called. The F1 score basically gives you a number between 0 and 1, and the closer it gets to 1, the better the score is, right? We were looking for more than 80%, which is 0.8. In the first iteration, we had around 15%, 0.15. And then the second round was better - I think it was closer to 20%. And then the third one jumped to around 85%.
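(For reference: the F1 score is the harmonic mean of precision and recall, so it only approaches 1 when both false positives and missed calls are rare. The tiny sketch below uses made-up counts - not the project's actual numbers - just to show how the score can move from roughly 0.15 to roughly 0.85 as the training set improves.)

```python
# F1 is the harmonic mean of precision and recall (0 = worst, 1 = perfect).
# The counts below are invented for illustration; they are not the project's real data.
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Early round: almost every "hit" is a false positive, so F1 is low.
print(round(f1_score(true_positives=10, false_positives=100, false_negatives=15), 2))   # ~0.15
# Later round: most hits are real monkey calls, so F1 approaches the 0.8 target.
print(round(f1_score(true_positives=85, false_positives=10, false_negatives=20), 2))    # ~0.85
```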

Wow.

Yeah. It was a very quick jump, and we were very happy with that iteration, actually.

That is very cool.

Yeah. Yeah. The active learning loops are the coolest.

So it sounds like you've gotten some really useful results from using different AI tools. I'm wondering now if there are any cases where you've avoided using AI-based tools for some things and if you have any examples of when or why you chose not to use AI for those cases.

Well, first of all, if you're trying to come up with a playlist, you'll probably be fine using Spotify for that. But don't ever try to use ChatGPT to make the playlist, because it literally makes up some names. That's not a good use. It's not great.

Also, I don't work in anything related to law, but we did have a situation where some lawyers were in big trouble because they were trying to use AI for their case, and AI was making up laws and stuff. So, yeah, it was a big mess.

And people try to use AI to cheat on their exams and stuff like that. I'm saying that not because I am a student, but because my best friend is a professor at a university, and it's become pretty obvious that people are just using ChatGPT and other tools to try and cheat on their exams, stuff like that. Definitely not a great thing to do, because teachers are clever people. I actually helped my friend write questions in a way that would be very difficult for AI to give a good answer to - or people would have to be very tech savvy to even put the questions in words that would make sense to the AI, at least in my opinion. And it showed in their grades. So that's one thing.

The other thing: in personal use of certain applications, at least in Latin America, since I'm from Costa Rica: Costa Rica is part of Central America, and we don't have great laws that protect our data from unethical use by big corporations, social media, and other types of agents, right?

So the kinds of things that I can avoid are limited by the fact that I have to use these applications: if I want to have a business in marketing, for example, or if I want to advertise my music, then I have to deal with how people use my data. In other places in the world, like Europe, people have laws that protect them from these practices, but we don't have that in Latin America. So the user experience is different for us.

I don't know if the same happens in North America. I haven't explored that. I believe it was something that lawmakers were talking about in the past. Honestly, I don't know how that went for American people.

Yeah. It's still very much under discussion, and it's very fragmented. Different states have different laws, and they're still working on the federal laws. Yeah, we definitely do not have the same kind of protection here that folks in Europe who are under GDPR have.

And I totally understand with your business that you maybe can't avoid YouTube or Facebook or Instagram or one of the other sites. That's very limiting. You kind of have to put your business information there, even if you can keep your personal information somewhat out of it.

Yeah. And if you do - like, if you want to limit the application's access to your data, because there are certain things you can opt out of, very limited things - but if you do, then your experience in the application or the software, or even the website, is seriously broken. Terrible ads. So it becomes very annoying.

And at times, I've said to myself, “Okay. No. I'm not putting up with this.” And then I give up my privacy for a better user experience. I don't think it's very ethical for them to develop their products that way, but that's just my opinion.

We were talking before this call, and you were mentioning some thoughts about targeted ads, both from the business owner's perspective and from a customer's perspective.

Yes. Yes. As I said, I own a small business. We provide graphic solutions for advertising. Most of our customers are also small business owners, and they're trying to reach their customers. I work with my founding partner - she owns half of our business - and she's been telling me about certain tools that can look up the user information of your competition online, right? And she says things like, “We could use these emails and target them specifically and do this and do that.”

On one hand, I know how that could be very interesting for some people. On the other hand, I'm not sure how ethical that is, right? Because I don't want to create spam for people or make our customers feel like their data, their privacy has been misused.

And, you know, that's a hard spot for me. I said to her that, probably, we should try to just refrain from using their data that way because I don't want to be part of that problem, right?

I can see how other people who don't question these things don't even see the harm in it, and they don't do it because they want to do harm at all. They just want to grow their business. And it's a tool, a very attractive tool, and people are paying for that. And some other people are making money from that. So that's my perspective from the business owner.

And as a customer, well, I wouldn't like that to happen to me. Many customers don't like it when it's very obvious that an AI tool was used for writing headlines or writing what they are listening to or seeing in the video.

From a business owner perspective also, I've become aware of certain MLMs and businesses that have risen with questionable claims about their AI solutions for small business owners and for investors. And I've known very intelligent, highly-educated - but not technical - people who have fallen for their claims. For instance, they say stuff like, “We're gonna build your website in 15 minutes with whatever you want”, right? But they don't say what happens if you stop paying your subscription. Like, I am paying a subscription so that I can generate websites, many websites, in 15 minutes, so I can sell these websites to people. But what happens if I decide to stop paying this subscription? What happens with these websites? What happens with the data?

If these websites are being used by businesses to tell their customers, “You can pay here, put your card information in here”, then all of that personal information is going somewhere, and they don't even know where. They take it for granted that their data is going to be safe.

And I'm not happy with that. I don't think that's ethical, because in 15 minutes, you give people with little to no knowledge of cybersecurity this false sense of security. And they don't even know the kinds of questions that they should ask these kinds of providers.

These people are obviously making lots and lots of money, and many people are opting for these kinds of solutions for digital transformation of their small businesses, right? So I personally don't like it. I don't think that can be right or sustainable. What happens if the person stops paying the subscription when these other businesses were depending on those sites? And then who owns the database? Who owns the data? Who is going to make fair use of and take care of the data at all? I don't like that. I'm very skeptical that these people are gonna be responsible if they are generating websites like that in 15 minutes, you know?

Yeah. I can understand why you would be skeptical about that. That seems really deceptive. And especially if they're creating a website for someone who is trusting that their customers are going to use it, and that their customers' data will be protected, when maybe it's not. So that definitely sounds sketchy to me.

So I want to move to a different aspect of what we've been talking about, with the way that people's data is being used. How do you feel about companies that are using people's data and content for TRAINING their AI and machine learning systems and tools? You know, we talked about companies that maybe aren't acting ethically. Do you think that for an AI tool company to be ethical, that they should be required to get consent from, and compensate, the people whose data they wanna use for training?

I think they definitely should get consent. About compensation, I have ideas, concepts of compensations. What I think is, first of all, there's these guidelines and principles of ethics in AI called FAIR and CARE, right? So a basic thing is consent.

Consent will be part of the Authority to control how the people's data is being used. That's part of the CARE acronym. So, definitely, the first Authority is to say, “Yeah. I'm okay with you using my data or my art or the products of my creativity for your machine”, right?

Compensation would be ideal, and the right thing to do. I don't think it's going to be practical. But at the very least, people or lawmakers - someone - should make sure that these companies are giving back to the communities that they are benefiting from, right? That's my thinking on this issue.

People are not informed about how much they're giving away, or that they don't have control over their data. They don't know how their data is being used. They don't know that these companies know how they are feeling, based on their faces in the front cameras of their phones.

And, also, the tools that are given to people to help them exert control over their data are confusing and obscure. Many times, companies don't make it easy for people to opt out. And that's in countries where you have the option at all. Many other people just don't have the option.

Can you talk a little bit more to help our audience understand about FAIR and CARE? Those are 2 acronyms. Can you elaborate on what they stand for, and where they came from, and how they're being used or how you're seeing them in effect?

Sure. I believe these acronyms came out of work on the use of indigenous peoples' data. 1 FAIR stands for Findable, Accessible, Interoperable, and Reusable data. 2 And CARE 3 means that the data collected, and the machine learning and AI tools generated from it, work for the Collective benefit of the studied communities, and that these communities have Authority to control how their data is being used. The R stands for Responsibility in engaging with the studied community. And the E stands for Ethical use and overall application of these principles.

So if I could explain this: for instance, an app may say that they are protecting young users, like kids, from misuse of their data because their parents checked the terms-and-conditions checkbox and the privacy notice upon installation of the application, and everything they then do with the data is approved by the users. So they may say, “Okay. We literally had their consent, and we are engaging with the community.” And they are saying, “This is for their benefit. We have it in text.” That's true, but maybe it's not ethical, because they know that they are using language that's obscure, and they are making it hard for people to understand how their data is being used. And they are not fully disclosing what they are doing.

That's really interesting because they're getting what they're calling consent, but it isn't really informed consent, because people aren't really getting a fair opportunity to understand what they're being asked to consent to.

And, also, just clicking a box, that's really not what I would call “engaging with the community”. That's a one-way transaction, to click a checkbox, right?

Exactly. Exactly. People nowadays are trained to just look for the checkbox, look for the call to action, and just click to make a popup disappear from their screen and get to what they were trying to do. And I personally believe that's not fair, because they know how human behavior will lead people to just act without thinking. And, of course, people are not going to sit down and say, “Oh, this seems like important text that I should read to protect my child's data”. They're saying, “Okay, I just want my kid to start using this iPad or this product, because I want my kid to play and I can concentrate on this other thing”.

And that's the way everybody does that, so I don't want my kid to be singled out or whatever. And then they don't know how their loved ones' data is being used.

So in my opinion, the only way that we can prevent companies from doing this stuff is having a system that looks at and studies these practices, and says, “Okay, no, that's not fair, actually. That's not fair, and you're not caring for people. And we're making a law, and we're making a system that protects people.” I feel like, for companies that act this way towards other people, the only way to do something about it is through the law, in my opinion. Like they do in Europe. And we don't have it in Latin America.

Yep. And we don't have it here in the US either. Not yet. I will say yet. We don't have it yet.

So that's a good perspective, looking at the types of tools that you would want to use and how you view it, as a person who is running a business.

As someone who has used these AI-based tools - and you mentioned a couple, Adobe and a few others, the OpenArt - do you feel like those tool providers have been transparent with you as a user about sharing where the data came from, and whether the creators of that data consented to it being used?

Not at all. I actually looked for it, especially the Adobe one. I'm gonna be honest. I love using that tool, but I have no idea how they made it, where the data came from, because the images are vectors. I kinda get how they do images in the OpenArt style, but the Adobe one, I don't know. I don't know where this database came from. And the product is pretty good, actually. I prefer it - if I can use it instead of OpenArt, I will. The designs are very clean. They make sense, way more than OpenArt's do. And from a productivity perspective, that's great. But also, as an engineer, I wonder how they do that. Where did the data come from? I don't know. I tried to search for the answer, but I couldn't find it.

Yeah. Adobe has talked a good story about only using data that was within their own repositories and saying that they're compensating the creators. But I've heard from a lot of creators and seen some further stories that that's not really the case - that part of what they trained that tool on was actually images created by Midjourney, which did not come from human creators, and Midjourney itself was based on stolen art.

And the other thing I've heard from some artists, including the technical artist whose interview I just published: he was saying that Adobe has basically stolen his work. He's an artist who's been creating work in Adobe tools for many years. And Adobe basically stole all of the artwork that he had stored on their systems and claimed that they had permission, and they didn't.

So now all of his base artwork has been stolen, and it's been used to train the tool. On one hand, that's why the tool is so good. It's based on some really high-quality art. But on the other hand, the artists who created it haven't been compensated for it and didn't get to consent. So there's quite a lot of controversy about Adobe. They're doing some good things, but they've also done some things that seem really questionable.

Yes. Yes. In that case, I feel like if they wanted to compensate artists, they couldn't say, “Oh, we can't. That's impossible.” Why not? Because they have the artists' information. They even have their banking information, because you have to pay a subscription to use their products, right?

Actually, until, like, last week, I had refrained from using their cloud services because of that, because I didn't want them to own my stuff. But then I had to do it. And now that I'm talking about this with you, I think I'm taking everything down from there and just working with my local stuff and the other cloud things that I have going on - repositories, I guess - instead of using theirs. I don't want to go through the same thing.

That's what makes it tricky. Right? It'd be good to have the benefit of the tools, but not at the cost of cannibalizing somebody else's art and the way they make a living.

Yeah. And the licenses for these products are not cheap. So it's crazy that you're paying someone for stealing your work, you know?

Exactly. Yes. Yes. Right. And I know you do art and graphic solutions, but you also do music. Can you talk a little bit about music?

Regarding art and AI, I don't think anybody has the right answer to everything, because this phenomenon is very new. In terms of music, I think that in the past, a naturally creative person who wanted to be an artist, a musician, would probably need access to instruments, music teachers or academies, time to practice, and resources to learn the classic way, and then create their art, record it, and upload it to the Internet if they were part of the Internet era, right?

For those people, these tools represent the possibility of their effort being undermined by automatic tools. It can feel unfair to try to compete with a robot, and like it undermines their effort and their talent and everything. And competing against a robot in music means competing for an audience, competing for contracts and for their way of living - basically being affected by that.

And, of course, in the end, people end up with lower-quality art, soulless art. But for a person with limited resources who has no access to fancy instruments or music academies or teachers, these tools can be used to spark creativity and, you know, even give them the option to create art, which they didn't have. They can be an artist for a short period of time, or have the experience of creating something, and have it be more accessible to people. I think that's a good thing. That means that it's more - I will use this word - democratic. It makes it so that more people have the power to become artists in some way and experience what creating art is. So I think it definitely makes the art world more chaotic than it currently is, but that's the nature of art in my eyes, at least.

It's not great that other people are disempowering classical artists or creating soulless art just for the sake of money. That's not great in my eyes. But the other side of it, I think, is a good thing.

Actually, if you look at the history of music, the further you go back in the past, the more privilege a person needed to be an artist and to live from the arts, right? Right now, we have platforms and tools that allow people to create art. People who wouldn't have had that option have the option now.

For instance, you can think about someone who can't move, they can’t talk, but they can interact with the world, and their mind can interact with the world and create things via a computer. They can use AI. They can use these tools to create their arts, and to create music, and that's given them this human experience of being an artist that wouldn't be there in the past for them. I hope that made sense.

Oh, it does. And there are some cases where they've used AI-based voice cloning to help an artist who has had a physical problem and they've lost their voice. But they can still create new songs based on an AI that was trained on their old voice and their old songs and have it perform a new song for them. And it's helped them to revive their career and still be creative.

So there are definitely positive applications from all of this. But there's also the downsides, and that's where we have to look at the tradeoffs.

My friend, he's got a golden voice, a beautiful voice, and - I don't know how to say this in English - he's a 'locutor'. He studied how to use his voice for narrating and everything, like radio people. He has this amazing talent, and he has worked hard on being articulate and using his voice. And then he's worked hard trying to get these contracts and working with people, leaving his old job and trying to work full time on his craft. And it turns out someone used an AI tool to clone his voice.

Oh, no.

They make advertisements, and they are selling his voice audio very, very cheap. So for instance, maybe I have a bakery, and I pay some guy for audio of a voice that sounds like my friend's, saying stuff about my bakery, and I use it for ads and stuff. And that's obviously not fair, right?

Right. Yeah. He should be the one who can create an AI model of his voice and make money from it, not somebody else.

And not even famous people are safe, you know? I believe there was an actress. She was asked if they could use her voice, I think, and she said no, and they still used it.

That was OpenAI, and they stole Scarlett Johansson's voice for a voice, called Sky, in their latest tool. And they asked for her permission to use her voice. She said no, but then they apparently went and used it anyway.

People who are famous can fight back. They can make more of a fuss about it and embarrass the company into doing the right thing. But most people like your friend don't have the means to do that.

Yeah. Yeah.

But there's what they call up here N I L: Name, Image, and Likeness. The example here is, in the state of Tennessee, they passed something they called the ELVIS Act. The initials of the bill's name spell out ELVIS, and the idea is to protect the voices of performers.

But again, this is one state. This is not the whole country. And it doesn't cover, necessarily, cases like your friend being a voice artist. I think it was aimed more at performers or musicians. So we definitely have a patchwork of laws here.

Like a voice actor - I think that's the name.

Yeah. I'm sorry to hear about your friend losing control of his voice through someone stealing it. That's really terrible.

From the perspective of someone who built an AI-based tool, for instance the models that you trained for the sounds, do you have anything you can share about where the data came from and how it was obtained?

Well, the sounds were obtained from a research team. They recorded sounds from the jungle. They call it ‘soundscapes’. Basically, they are using 2 different types of microphones, and these microphones are set up in several parts of the jungle, and they automatically record at the same hour for an hour, right?

So that's 60-minute recordings, and they just put the recorders in the jungle - that's how we obtained the data. Also, some of these recordings were shared by a team of experts, and they manually annotated what they were listening to and what kind of call they thought the monkeys were making. And then, based on that, we created the whole thing. So that's how the data came into our hands.

And during the training of the models, since I trained them locally for the first couple of rounds, I tried my best to handle the data in an ethical way. But then, for the third round, we had to use GCP for training the model. And to be very honest, I'm not 100% sure about how they deal with the data, or how they might use or misuse the data that we placed there. I didn't read the fine print.

I don't think it's gonna be very problematic because it’s sounds from the jungle. It's not bank information or anything, and I can't imagine Google doing something harmful to the monkeys with that. I hope that doesn't happen, and that's why I basically, with my brother, took the risk to just process that thing in Google. That's how we treated the data.

How about with regard to your personal data or content? Because as members of the public, there are certainly cases where it's been used by AI-based tools or systems. So do you know of any cases that you could share, without disclosing any sensitive personal information, of course?

It's a common practice in supermarkets here to ask for your ID number when you're in line to pay. They will just ask for your ID number and your full name and your phone number, and people just give that information.

I once asked “Why would I do that? Why do I have to give you my ID number? It doesn't make sense.” And the cashier looked at me like I was asking a very dumb question.

And it turns out, with certain discounts and certain things, the supermarkets use that to lure people into giving up their information. I guess that's a fact of life now. But, yeah, that's something I have seen on different occasions here in Costa Rica.

Yeah. So loyalty programs are well known up here. You give them your personal information, and in return, you get small discounts and they track what you buy in your grocery cart and which things you buy with which other things. And that's one area that I know people's data has leaked out without them necessarily realizing it.

But if you have a personal national ID number, up here it's a Social Security number, they've actually taken the steps to say that no, companies aren't allowed to ask you for that. That's too personal, and it's too much of a risk for identity theft.

But other things, supermarkets here, they always, you know, “What's your phone number?” And I've taken to saying, “I don't give that out”. And then usually they go, “Oh, okay”, and then they move on. Sorry, you don't need my phone number. I don't need more junk calls on my cell phone.

Yes. What's terrible is that sometimes it's your friends or your family who gave up that information. I saw one of these little paper slips - like, you go to the gas station, and if you spend more than $20, then you are entitled to one of these little papers that enters you in a raffle for a car at the end of the year. It's a whole thing. Many gas stations do that here, but they don't only ask for your information. I saw one that asked you to put in the information of a friend that you would go on a trip with if you won the raffle, or stuff like that.

Oh, that's really interesting.

That's another strategy for people to access personal information.

It's really pervasive, and our data's been leaking out everywhere over time. Do you know of anybody that you gave your data or content to that actually made you aware that they were going to use your information for training an AI or machine learning system?

Well, most of the software that I install on my PC asks for the chance to monitor my work, or my system performance, or how well their software works with my data. I think they call it diagnostic data. They might be using the data ethically, or they might not - but they are making me aware that they intend to do it. And with some software, right there in that popup, you can say, “No, don't do it”, right? But online tools normally don't offer that as a popup option. You have to go to a certain application on your phone, or you have to Google how to opt out of them using your data.

Also, your phone asks if you give permission to use the camera or the microphone, or to look at your photos and videos or your contacts, right? I don't like giving access to the whole gallery of photos. I feel like that's how we get into this kind of mess. I don't like it when tools say, “Hey, is this your friend Isaac, or is this Jenny?” They look at your pictures, and they ask you if that's your friend. I can see the use of that, but I find it dangerous. I don't like it. I don't trust people like that.

A combination of wearables and your phone can tell if you are getting emotional about something - maybe angry, aroused, sad, if something is triggering you, even if you're eating while you're doing something, right? And those are markers of your emotional state.

And at the end of the day, all that matters is: they don't do it because they want you to be happy or to experience the emotions, right? They're trying to process the data, analyze it, get insights, and use that to make money somehow - to make more money out of your emotions, or out of the kinds of ads that they serve to target audiences.

I feel like that way of interacting with people's emotions is dangerously being used in political debates in different countries all around the world. People get very emotional about everything. And this is a very long discussion that we could have about this topic, which I am also passionate about.

But in short, I think that this is very apparent, very obvious to me that these companies don't care at all about the mental health, emotional health of our society, in the way that they manipulate people and they use the data about how people feel.

Now you have headphones or earbuds that measure your heart rate and even other things, I believe. I'm not sure if this is fake news or not, because it seems like you can't trust things on the Internet anymore. But they say that there was a patent for a device, like a headphone, that could get certain signals from the brain and - not read your mind, but kind of understand and know more about what you're thinking or what you're perceiving.

And I don't think they are doing it for ethical purposes. They are trying to get money out of that, right? And they are not going to care about the people who are using it. What if it's kids? People from all ages. I don't think our system, the laws, are ready or prepared to even ask the questions that will protect people from predatory behavior of these companies. That's what worries me about this whole thing.

I love technology. I love that. I want you and your audience to know I love technology. I'm not one of those people that say, “You know, everything was better in the past, and I don't like technology. I don't know.” I love it. I really love it, but I feel like we need to be on the lookout for our fellow humans that may not be as technically aware, about the possibilities of their data and their experience to be used for monetary gain for people that don't care about them.

So as we've been talking about, public distrust of AI and tech companies has been growing, partly because of some of the things we're starting to realize they're doing that we don't like. What is it that you think is the single most important thing that these AI and tech companies could do to earn and to keep your trust? And do you have any specific ideas on how they could do that?

Well, first, we can go back to FAIR and CARE. If we go back to the FAIR acronym - Findable, Accessible, Interoperable, Reusable - to me, it means that if you are creating these datasets, especially if you got this data from people, if you stole it, right, then somehow that data has to be available for others, for humanity, to continue to learn about the world and do good.

As for the CARE one: if you're probably not gonna be able to compensate people for how they contributed to your model or your business in AI, at least give back to the communities. If you're stealing art from artists, you could at least donate some of your income to arts programs or something like that. That's the only way. Because it seems like people forget that the most important thing is the human experience of all humans. And they put that at the lowest priority and just think about “How can I make more money?”

On the generative AI side, it would be great if they could be public and clear about how they will pay artists - for using their data, for training their models - or any other people, any other community that they are taking from to train their models. And if they can't do that - for instance, if their models are trained on anonymized data from the community - at least they could give back to the community. Because everything that they do, everything they have and they produce, they owe it to a community of artists or a community of people whose data was made available to them. That way, they could at least show some responsibility, in my opinion.

At least in Costa Rica, I could mention the University of Costa Rica or other public universities. In my country, public universities tend to have very high-quality, low-cost programs. And they train people in art, in technology, languages, agriculture, many different kinds of fields and disciplines. And that is all over the country. So I feel like that will be a great way to help or to give back to the community, making sure that their money is going towards high-quality, state-of-the-art, low-cost, very accessible tools for the people.

Sounds like you're suggesting, then, that a way that companies could give back to a community, especially in a case where the data is anonymized, is that they could support some sort of a charity that's focused toward that community. Is that a fair statement?

Yes. I'll give an example. Say they are taking data from Reddit. I feel like it's impossible to give money back to people on Reddit, right? I feel like that's pretty impossible to do. Or even chats in video games, or places where people can chat, leave comments, stuff like that. So I feel like a good way to give back to the community is maybe investing in ways to make Internet access more accessible to people, or giving a better voice online to certain communities, or stuff like that. Giving back to the community, empowering people to be part of a more active, more accessible online world. That would be my suggestion.

Okay. That's a great suggestion. I think that makes a lot of sense. As you said, these companies are not separate from the community and society - they are part of it. And so it makes sense that if they are going to profit from it, they should support it and engage with it.

Just as I became aware of my models being biased by the cicadas and the insects' noise instead of the monkeys, they can become aware of the biases and the lack of representation of certain communities in certain spaces and conversations, right? People will have to deal with those biases, and be very mindful of them. So at least report on that - making sure that people know about these biases, and making them real for the public. Because I feel like if a company reports on it and makes sure people know, then it will not be just noise on the Internet; it will become real for some people, I think.

That's a great suggestion. So thank you for sharing that.

Is there anything else that you'd like to share with our audience?

I guess I would like to reiterate and advise that we live in a world where software and AI - artificial intelligence models - are not only being trained by us, through our production of art and text and conversation, but at the same time, through the constant use of the tools and the indirect ways that we interact with them, we are being trained by AI. Which means we have to take responsibility for the kinds of thoughts and the kinds of synapses and models that we train in our wetware, as you say - in our brains.

I suggest that, just as we think about taking courses on programming and being fluent in programming languages or knowing computers and stuff, humanity is at a place where we need to start actually working on understanding how our emotions work, how we react, and how we respond to this kind of stimuli.

Because it does have a real impact on our quality of life, and even an impact on how our society makes decisions and how our society treats people. And we can't afford to lose sight of that as a global community, in my opinion.

Yeah. It's a great observation. Thank you, Anyela!

Well, it's been really great having you as an interview guest. It's been a lot of fun and great to reconnect with you. So thank you so much for sharing your time with me.

Sure. And thank you, Karen, for making these spaces and being part of the voice of sanity about AI - in my opinion, that's refreshing. And thank you for everything you do, Karen, to spread the word about this. Thanks.

Alright. No, thank you. It's my pleasure.

Interview References and Links

Anyela Vega on LinkedIn



About this interview series and newsletter

This post is part of our 2024 interview series on “AI, Software, and Wetware”. It showcases how real people around the world are using their wetware (brains and human intelligence) with AI-based software tools, or are being affected by AI.

And we’re all being affected by AI nowadays in our daily lives, perhaps more than we realize. For some examples, see the post “But I Don’t Use AI”!

We want to hear from a diverse pool of people worldwide in a variety of roles. If you’re interested in being a featured interview guest (anonymous or with credit), please get in touch!

6 'P's in AI Pods is a 100% reader-supported publication. All new posts are FREE to read (and listen to). To automatically receive new 6P posts and support our work, consider becoming a subscriber (free)!


Enjoyed this interview? Great! Voluntary donations via paid subscriptions are cool; one-time tips are deeply appreciated; and shares, hearts, comments, and restacks are awesome 😊



Series Credits and References

Audio Sound Effect from Pixabay

Thanks for reading 6 'P's in AI Pods! This post is public, so feel free to share it.


1. “Operationalizing the CARE and FAIR Principles for Indigenous data futures”, by Stephanie Russo Carroll, Edit Herczog, Maui Hudson, Keith Russell & Shelley Stall, in Nature / Scientific Data 8, 108, 2021-04-16.

Article includes 2 links to the Research Data Alliance (RDA):


6 'P's in AI Pods
6 Ps in AI Pods (AI6P)
AI is affecting People & Places we care about, Practices & Processes we use every day, and Products & Platforms we build & use. Want to understand how use (& misuse) of data and AI impacts these 6 'P's, and what you can do?
This podcast is for you! We share episodes on ethics in AI, use of generative AI in the music industry, and selected audio interviews from the new “AI, Software, & Wetware” series.
All words are 100% human-written - we do not use AI content generators (unless as a clearly-labeled demonstration of the technology).