6 'P's in AI Pods (AI6P)
🗣️ AISW #053: Erin Spencer, New Zealand-based PhD student in health sciences

Interview with New Zealand-based PhD student in health sciences Erin Spencer on her stories of using AI and how she feels about how AI is using people's data and content (audio 28:41)

Introduction - Erin Spencer

This post is part of our AI6P interview series on “AI, Software, and Wetware”. Our guests share their experiences with using AI, and how they feel about AI using their data and content.

This interview is available as an audio recording (embedded here in the post, and later in our AI6P external podcasts). This post includes the full, human-edited transcript.

Note: In this article series, “AI” means artificial intelligence and spans classical statistical methods, data analytics, machine learning, generative AI, and other non-generative AI. See this Glossary and “AI Fundamentals #01: What is Artificial Intelligence?” for reference.


Photo of Erin Spencer, provided by Erin and used with her permission. All rights reserved to Erin.

Interview - Erin Spencer

I’m delighted to welcome Erin Spencer as my featured guest today on “AI, Software, and Wetware”. Erin, thank you so much for joining me on this interview! Please tell us about yourself, who you are, and what you do.

Yeah. Thanks so much. So I was born and raised in the US, specifically in California. Little bit of my ancestral background: so my maternal grandfather immigrated to the States from what used to be Czechoslovakia, and I also have some West African ancestry as well.

Previously, I trained as a government analyst before making a move over to Aotearoa New Zealand, where I'm just about to finish up a PhD in health sciences. I mostly research topics in outdoor health. So impacts of time spent in nature as a mental health intervention for LGBTQIA+ and POC communities here.

That sounds great. That is such an interesting thesis topic, Erin. I see so much advice nowadays about telling people to “go touch some grass” to deal with stress. But I haven't seen much about the impact on LGBTQIA+ or people of color. So that's interesting to hear, and I'm curious to see what your findings are.

I know for me, walks and gardening and even weeding can feel very therapeutic, and I would love to hear more about your PhD work. And I hear New Zealand is beautiful. What a great place to do your studies about the effects of nature on mental health!

Yeah. Absolutely. That's part of why I came here. And also uplifting the voices because, as you said, you don't hear about us that much.

So what is your level of experience with AI and machine learning and analytics? Have you used it professionally or personally, or have you studied the technology?

I've just started looking into it more recently, after using AI transcription for my research and seeing the ethics committee waffle back and forth – on what can and can't be used, and how it can and can't be used, for research. So I've attempted myself to use it for other projects as well, but I didn't really like the results. And I guess the other experience is I do casually work on training ML systems.

Thanks. Can you share a specific story on how you've used a tool that included AI or machine learning features? I'm wondering about your thoughts about how the AI features of those tools worked for you. What went well? What didn't go so well?

What we're meant to use for our research here is Microsoft Copilot. That allows you to also use transcription services. So for my interviews, I noticed that it started randomly censoring certain terms and phrases. So here in New Zealand, one of our official languages is te reo Māori. And so when discussing genealogy, it's common to use the phrase “whakapapa”, which the transcription service censored.

And then also, one of my participants – obviously, we’re all intersectional and queer – and so one of the phrases was “gay sex”. And so I noticed that that was bleeped out with little stars, and I had to go back and listen to the transcription to see what that was. So yeah, I transcribed my interviews, but obviously with those weird little hang-ups, it didn't feel like a reliable tool for specifically what I need it for.

Yeah. That's disappointing that they bleeped it, and it's so interesting. I'm wondering now if other major transcription tools have the same failings. Like, now I'm really curious to see what Zoom and my transcript creation tool are going to do for the transcript!

Tell me what happens when it comes to whakapapa. Tell me what it says!

I will definitely include that! [Readers: see footnote 1]

So that's a good overview of how you've used or tried using different AI-based tools. Are there any situations where you have avoided using AI-based tools for some things, or for anything? And could you share an example of when, and why you chose not to use AI for that?

Yeah. Like I said, as I start to dive a little bit deeper, I've taken Siri off of my phone. And then after that weird censorship, I did try again to use NVivo coding. So it's a software that is used for a lot of, especially, qualitative work. And so I think that can work well?

It made a really nice color wheel of common phrases that were used across all of my interviews. But then there's this new feature that allows you to drill down even deeper into themes. And you can select themes based on sentiment, so, references to negative or positive feelings.

And it was coding things like the use of Red Zone, which is a description of a location here, as negative. And, also, there's a really popular beach that people go to, called Taylor's Mistake. And so then when that was referenced, it was also coded as negative. So that just kind of reaffirmed that maybe these tools aren't quite what I need them for personally. So I stopped using it after that.

Yeah, so they didn't know the difference between a proper noun, the name of a place, and a phrase that in some context has a sentiment.

They can't really pick up on the context. Maybe one day, which is scary.

Right. A beach is normally not a negative sentiment for most people!

Yeah.

Those are all great examples of reasons NOT to use large language models. I wasn't familiar with NVivo before this, but now I'm going to have to look it up, and I'll add a link in the interview for people that would like to know more about NVivo coding. That's good to learn about that. [link: lumivero.com/products/nvivo/]

You know, a lot of my guests have had a lot of examples of ways that they use AI-based tools, and I usually have to prompt them to think of times when they have avoided AI. You're more on the other side of the spectrum here, where you have lots of examples and great reasons for NOT using them, and not many really where you choose to use AI. So I'm wondering if there are other times that you're actually using AI-based tools, and getting benefits from it, without necessarily CHOOSING to use those tools, or even realizing that you're using something that has AI or machine learning under the hood. [Reference: “But I Don’t Use AI: 8 Sets of Examples of Everyday AI Everywhere”]

Yeah. So a lot of things, I guess, I didn't even really consider until we had started kind of talking about it. So the way that your photos get grouped, and those memory reminders, like “on this day”, that shows you heaps of photos kind of lumped together from, like, Christmases or specific days in the past. So those are great. I love those.

I also started using music suggestions. But recently, we had a digital humanities lecture here. And somebody mentioned how Spotify puts AI-generated songs in their recommendation list. So you'll have a list called, like, “happy mountain hiking” or something, and then they'll add in these AI-generated songs. I don't use Spotify, but hearing about that made me scrutinize my Apple Music playlist a bit more. But overall, I really enjoy that as well, because it's pretty spot-on with the kind of music that I like. And I have so many songs downloaded to Apple Music at this point that trying to go through and curating my own playlist is a bit daunting. So having it look through the thousands of songs, see what I like to play most often, and generate a list for me, yeah, that's been really, really cool to use.

You had also mentioned in your notes about Netflix. Did you want to say a few words about that?

Yeah. So, I didn't realize that this was AI, but then, I guess, yeah, looking into it - ah, it's so insidious, right? So the little thumbs-up and thumbs-down and the double thumbs-up that they added. I really love interacting with that. And I guess I thought it was just me letting Netflix know whether or not I thought like a movie or a show is crap. But I guess it's also me training it to show me more of what I like. And it's just so crazy that I didn't put those two things together until you asked me about it. Yeah, I thought it was like a fun interactive thing, like thumbs-up, thumbs-down. But, no, I'm training the system in some way.

Yeah. Those are really good examples. It really is insidious. I use this analogy of AI being like an iceberg, where there's some of it that we see - it's above the waterline. But there's SO much more that is under the waterline that we don't even notice or see sometimes. And it's there, and it's affecting our lives, but we don't always realize it.

But it's really interesting that there's so much focus on where AI and machine learning tools get the data that they use for training before it's released. We'll talk about that next. But we're actually giving the tool companies even more data, very valuable data, whenever we USE their systems – because all of those thumbs-up and down, and the amount of time that we spend listening to a playlist, those are all actions that give them useful data to help them continue to train and evolve their systems. And I think we overlook that part of it sometimes too.

Absolutely.

So as I mentioned, one of my standard questions is asking about where AI and machine learning systems get the data and the content that they train on. A lot of times, it comes from things users have put into online systems, or that they’ve published online. And companies are not always very transparent about how they plan to use our data when we sign up for these services or use them.

I'm wondering how you feel about companies that are using data and content from people for training their AI and machine learning systems and tools. And specifically, for a tool company to be ethical, do they need to be – or should they be required to – get consent from, and compensate, the people whose data they want to use for training?

Yeah. Going back to just my research, all of my interviews are very personal narratives, so it's considered sensitive data. And the idea that their transcripts could be used for some, you know, tech bro’s million dollar machine …

We do have ethics approval that we go through at the university. But it just can't keep up with the speed at which these machines are learning and progressing. So within a year, we were told to use Otter.ai, and then “don't use it”; and then “use Microsoft, but maybe don't use it if you can help it”.

So I would really appreciate the Consent, obviously, given what I have to use it for personally. And the Compensation would also be really great. When you're dealing with marginalized groups, it would be incredible to assure that in any way you can. And then I also get paid basically pennies to work hours on end training these machines.

So I have little hope that anyone else would be compensated by these corporations for their data being mined.

Yeah. From the interviews that you do, you had mentioned earlier about an ethics board, and looking at some of the aspects of the kind of tools that you wanted to use there. So with using Microsoft: from what I've heard, if you use that as an enterprise tool, they promise not to use the data that they collect from the tools for training the tool. Do you know, does that also apply for the way that you're using it? So do you feel safe that your interview contents are not being used by Microsoft for training?

I don't. And especially because there's that little caveat where it's like, “If you have to use it, use Microsoft”, because that's apparently the BEST option that we have. But, again, I'm just very critical of any kind of corporation guaranteeing anything. And like I said, it's a vulnerable community. So I'm as protective and overprotective as I can be for the narratives that are being shared as well.

I've looked into some of the issues around ethics and whether or not tools are fairly trained, and where companies get the data that they use for training. And there is a certification called Fairly Trained that some companies have gotten. This started a little over a year ago, but Microsoft is not one of them. None of the major AI companies are pursuing these certifications as far as I can tell.

There's only one large language model that I know of – and it's for legal purposes – that was certified as Fairly Trained. That's KL3M. And all that means is that the content that it trained on wasn't blatantly stolen: it was originally licensed, they got consent, or they compensated the people whose data they used.

And there is an ISO 42001 certification that might cover whether or not a company protects the privacy of the data that people provide when they use the tool. But as far as I know, Microsoft hasn't gotten that certification either. The only one that I know of that has is Anthropic. They recently got that certification for Claude.

But I don't know of any other certifications that cover bias, or censorship, or exploitative data enrichment practices, or any of the other aspects of ethics. So there's obviously still a lot of room for improvement and a lot of work to do on making sure that AI tools are ethical, and the people who use them know whether or not they are ethical, and whether they'll protect the contents that people are putting into it.

Yeah. Making it very clear as well.

Yeah! And I've seen some people talking about how some of these newer, more efficient large language models could potentially be run locally on a computer and not on the cloud. And that would mean that the data that you would put into them wouldn't be shared, and it would stay private. Setting that up is not exactly something that a typical LLM user can easily do, at least not yet. But I'm wondering if that might be a good choice for situations like yours where the data is highly sensitive and confidentiality is critical. Do you think that would be an option for you?

It's all kind of contingent on the university and what its policies are. So right now, everyone uploads to the university's Microsoft cloud. So that's where all of the information is meant to be stored. And we're the researchers. We have to consent to doing it this specific way. But, honestly, like I said, the ability to keep up with all of this, it's going to be a test for these higher institutions. And I think that could potentially be, yeah, a good use and a good option.

As someone who has used AI-based tools, do you feel like the tool providers – you mentioned Otter before, or Microsoft – do you feel like they've been transparent about sharing where they have gotten the data that they've used for training their models? Or whether the original creators consented to the use of that data?

I wouldn't think so. It could be that it's in those user acknowledgments that obviously nobody reads. I haven't tried it yet, but I would say one of the most transparent AI systems is the new DeepSeek. Because, again, as I'm learning more and more about this, I'm told that it kind of shows you how it's thinking or how it's gathering its information in real time. But, again, I've tried to download it and use it myself, and I just don't have time, and I'm getting weirded out at it all. So that's what I hear, though, is that it's one of the most transparent.

Yeah. That aspect of showing where it got its information from and providing sources, that goes to the Credit part of the 3C's. And a few of the newer tool versions claim that they have this traceability, or what some people call ‘data governance’, to show where they got the data from that they're referencing. But this is definitely still a weak area in most tools, so it's interesting to see if DeepSeek is also pushing the state of the art in that aspect as well as the efficiency.

Absolutely.

So as consumers or members of the public, our personal data or content has probably been used by AI-based tools or systems. Do you know of any cases that you could share? Obviously without disclosing any sensitive personal information.

Yeah. I imagine most of my information has been used all throughout Meta's platforms. Like I said, I don't make a habit of reading the privacy acknowledgments, so they could be explaining exactly how my information is being used. I know a lot of it's probably for marketing, I would imagine.

Yeah, marketing and training their models are probably the two biggest uses. One of my 2024 guests had told me about this great podcast series called What's in my EULA?! [E U L A]. A lawyer named Joel McMull dug into what's actually buried in the “Terms and Conditions” of 10 common online systems. Of course, the companies change those terms all the time without warning, and unfortunately, they didn't do an episode on Meta. I'm not sure why not. But the episodes might be worth checking out if you have other tools that you're wondering what's really in there. It really does take a lawyer to understand it.

Exactly. Yeah. I would say so. It's kinda like, do I want to see how the sausage is made or not?

Yeah, they tend to be 10 or 20 pages, and it's legalese. And there was a survey recently - like, over 90% of people just never read them. And, honestly, you can't blame them, because if you can't understand them, why put in the effort? But it's not a good way to get consent at all.

Yeah.

Do you know of any company that you gave your data or content to that actually made you aware that they might use your information for training an AI or machine learning system? Or did you ever get surprised by finding out that they were using it?

So, yeah, I always get upset at the thought of my messages, and photos especially – I have a young child, so – being used to train these AI models. But then again, not reading and just consenting to gods know what, what you're signing up for using the social media app. So I just assume they're all doing something in the background. And at this point, I think I've mostly accepted that that's partially the price that we pay to use these services, unfortunately.

I have made the decision to delete most of them. So currently, I still have Instagram, but I only use it on desktop. So my information and interactions on the platform are much more limited when not using the app.

I know a lot of people use Instagram for professional purposes. Are you using it professionally, or personally, or a mixture of both?

Yeah. It was mostly personally. And so rather than upload photos to Instagram, I'm just sending them through Signal, to basically my mom and my sister, which were, you know, the only people keeping up with me anyways. So things just kinda shifted. We were on WhatsApp, and then I told everyone, “Okay, I've shut that down, follow me on Signal”. So they're also trying to keep up with, like, where I'm going, and keeping everybody on their toes, I guess.

Yeah, my family made a similar shift toward the end of last year, when we realized what was going on with some of the information. We had some in Slack, and there were some things that were in Facebook, even though the albums were protected and supposedly private.

Mmhmm.

When they announced that they were going to start using your stuff anyway, that was kind of the last straw for us.

You mentioned you have a young child, so I'm curious about that. I've interviewed a few people who have talked about protecting children from this. A lot of new parents will – some people call it ‘sharenting’, where they're oversharing about kids, and always sharing pictures of the kids and posting them publicly. And, unfortunately for the kids, that stuff's gonna be out there forever.

Exactly. Without their consent, right?

Right. Yes. Exactly! So it's neat that you've moved to sharing your personal photos privately through Signal.

Yeah. Absolutely.

And it's really hard to get off those platforms. I hear more and more people who are doing it lately. Like, switching to Instagram only on the desktop is certainly a way that gets it off your phone and keeps them from seeing everything else that's on your phone.

Yeah.

I just switched this past week from Google Maps to MapQuest. And I'm pretty locked into Google's ecosystem, but this felt like a good first step towards breaking that lock for me.

That would be, yeah, harder, honestly, than Instagram for me. Because every single time that I hear about somewhere, I put a pin in it on Google Maps. So I have pins all over the globe of places I want to go! So, yeah, that's gonna be a harder one to untangle from for me.

Yep. That's true. So has a company's use of your personal data and content ever created any specific issues for you, such as privacy violations or phishing or loss of income? Or do you have any examples?

Yeah. So I've, throughout my life, gotten so many alerts about data leaks. Recently, I received a notice about another class action lawsuit from a major company that had a security breach. I remember one breach was so major, I was offered credit monitoring for me and my family for life. So in one sense, I've had no real issues because I do get those constant alerts of anything major. But then, I do notice that the alerts are pretty frequent, which is, yeah, kind of telling.

Yeah. I don't know of anyone, at least not in the US, that has NOT had multiple data breach notifications in recent years. My husband's actually a little suspicious, though, of the credit monitoring service. He's like, “Okay, so in order to get this protection, you have to GIVE them all of this information that you're trying so hard to protect?”

Exactly.

So he's a little hesitant about using them, even though we've gotten notified of breaches and been offered these services. Like, “Nah, we don't know those people. We're not giving them our information.”

Yeah. That's true. I mean, this happened maybe seven years ago, so I've just been with them ever since. And, yeah, I probably should be a bit more critical about these things.

Well, in a lot of cases, when there is a data breach, they provide your information to the monitoring company already. So they've already got it, I think, in a lot of cases.

Yeah.

So you're not really giving them anything they didn't already have.

Yeah.

I mean, it’s good that those services exist, and hopefully can prevent these breaches from being exploited by people that shouldn't be doing that, obviously.

Yeah. And it was mostly my, you know, having a kid. So she has a new Social Security number, so they monitor that as well. So making sure that nobody's kind of using our Social Security to take on, like, mountains of medical debt or something. So, hopefully, I would get alerted to something like that.

Yeah. One thing that we are seeing, you know, with all of these breaches and with finding out what companies are trying to do with our data, is that public distrust of these AI and tech companies has been growing lately. What do you think is the most important thing that these companies would need to do in order to earn and to keep your trust? And do you have specific ideas on how they could do that?

Gosh. Yeah. So this is, like I said, a new kind of field that I've been researching. But someone that I would point to is Ruha Benjamin. She wrote this amazing book called “Race After Technology”. So it's “Abolitionist Tools for the New Jim Code”.2

She's also held some amazing talks and even recently did a podcast with Trevor Noah where she discusses technology and how AI systems inherit their creators' discrimination. So, essentially, what she was saying is that these technologies act as an extension of racial discrimination and oppression. So with the dismantling of diversity initiatives in the US right now, it'd be really great if tech companies could affirm their commitment to including a range of lived experience on their development teams.

And Ruha also points to the Yale School of Medicine, I think it was, where they had an AI model based on patients' self-reports. And it didn't have, or didn't recreate, the kind of anti-Black bias that you can see embedded in the doctors' reports. And so she explained that, in that way, it was more accurate and less biased.

Yeah. Bias in AI systems, like you said, it comes from the data, the datasets that are out there that already reflect existing biases; and it comes from the biases of the creators who design these systems and don't necessarily think about, “Well, how does this impact a marginalized group? Or how does this handle the accuracy on underrepresented parts of the dataset? or people?”

Joy Buolamwini reported on the biases in skin color, and how these security systems weren't recognizing faces with darker skin, and weren't letting people access the system or turn on the lights or all kinds of other things. And those biases then get incorporated into these systems and sometimes even made worse.

So having greater representation and greater diversity on the teams that are doing the development definitely matters. It's not a panacea, but it certainly would help to reduce the likelihood of those biases slipping through and going into production.

Absolutely. And we're talking recently here in Aotearoa about data sovereignty for indigenous communities as well. So just having different people, you know, diversity on these committees and in these corporations, means we'd have someone who can speak to all these different things that obviously aren't being considered as much at the moment.

You know, it's interesting that I've heard of Ruha Benjamin, and I actually follow her on LinkedIn, but I somehow missed knowing about that book! So now I'm going to have to go look for it.

Great, yeah, she's made some great ones, absolutely.

Yeah. But it really is kind of disheartening to see how so much hard-won progress in equity and inclusion is being set back in recent weeks by decrees, and it's great that so many people are speaking up, though, and taking action. So thank you for sharing that book with us. I'll include a link to it in the interview so people can go find it.

Yeah.

Well, Erin, thank you so much for joining me today for this interview. Is there anything else that you'd like to share with our audience?

I mean, I guess if you want to find me, I have a Substack, where you and I have met. There's a bit of my thesis work on there, a little bit of amateur poetry, random personal essay. I called it “Microdosing Chaos”, and I knew that we had heaps of chaos ahead of us. So my whole thing is, let's just take it by tiny little microdoses at a time.

Great! I love hearing explanations for where people come up with the names for their Substacks. It's always an interesting story. And I'm so glad that I found you there. So we'll definitely include a link to your newsletter and your profile, and then I'm going to go look for the articles on your thesis work. I really want to know more about that.

Thanks. Great.

Well, Erin, thank you so much for making time for this interview, and, good luck on completing your thesis!

Absolutely. Appreciate it.

Interview References and Links

Erin Spencer on LinkedIn

Erin Spencer on Substack



About this interview series and newsletter

This post is part of our AI6P interview series on “AI, Software, and Wetware”. It showcases how real people around the world are using their wetware (brains and human intelligence) with AI-based software tools, or are being affected by AI.

And we’re all being affected by AI nowadays in our daily lives, perhaps more than we realize. For some examples, see post “But I Don’t Use AI”:

We want to hear from a diverse pool of people worldwide in a variety of roles. (No technical experience with AI is required.) If you’re interested in being a featured interview guest, anonymous or with credit, please check out our guest FAQ and get in touch!

6 'P's in AI Pods (AI6P) is a 100% reader-supported publication. (No ads, no affiliate links, no paywalls on new posts). All new posts are FREE to read and listen to. To automatically receive new AI6P posts and support our work, consider becoming a subscriber (it’s free)!


Series Credits and References

Audio Sound Effect from Pixabay

Microphone photo by Michal Czyz on Unsplash (contact Michal Czyz on LinkedIn)

Credit to CIPRI (Cultural Intellectual Property Rights Initiative®) for their “3Cs' Rule: Consent. Credit. Compensation©.”

Credit to Beth Spencer for the “Created With Human Intelligence” badge we use to reflect our commitment that all content in these interviews will be human-created:

If you enjoyed this interview, my guest and I would love to have your support via a heart, share, restack, or Note! One-time tips or voluntary donations via paid subscription are always welcome and appreciated, too 😊


1

Here’s what happened with 4 transcription tools. None of them ***’d out any of the words. The reference to “gay sex” was uncensored in all 4 tools. For the two uses of “whakapapa”, here’s what the tools did. [I substituted *** below to keep this public transcript kid-safe. It will be interesting to see if the external podcast providers synched to Substack flag this episode as not safe for children.]

  • Zoom’s “video transcript” file (.vtt) and closed caption log (.txt) both transcribed the first use as “fakapapa”, and the second one as “f*** a papa”.

  • Restream transcribed the first use as “whakapapa” and the second use as “f*** a papa”.

  • ConverterApp transcribed both uses as “fakapapa”.

  • Substack’s built-in transcript generator transcribed the first use as “phakapapa” and the second use as “f***-a-papa”.

2

“Race After Technology: Abolitionist Tools for the New Jim Code”, Ruha Benjamin.

Video: (full transcript is available on the Berkeley website)

