Introduction - Jessica Parker of Women Talkin’ ‘Bout AI
This article features an audio interview with Jessica Parker, Ed.D., a 🇺🇸 USA-based education researcher, and the co-host of the Women Talkin’ ‘Bout AI podcast. We discuss:
how she became self-taught on using AI as the co-founder of EdTech startup Moxie
her experiences with using common LLMs for research, and how Open Evidence AI and Scite.AI compare
what simplification bias and granularity shifting are, and why they matter when interpreting AI-generated overviews
how her and her co-founder’s views on AI have taken almost a 180-degree turn over the past three years
what she’s teaching her 7-year-old and 10-year-old about staying safe with AI
asking questions of the auto dealer about data usage when she replaced her Tesla
and more. Check it out, and let us know what you think!
This post is part of our AI6P interview series on “AI, Software, and Wetware”. Our guests share their experiences with using AI, and how they feel about AI using their data and content.
This interview is available as an audio recording (embedded here in the post, and later in our AI6P external podcasts). This post includes the full, human-edited transcript. (If it doesn’t fit in your email client, click HERE to read the whole post online.)
Note: In this article series, “AI” means artificial intelligence and spans classical statistical methods, data analytics, machine learning, generative AI, and other non-generative AI. See this Glossary and “AI Fundamentals #01: What is Artificial Intelligence?” for reference.
Interview - Jessica Parker, Ed.D.
Karen: I am delighted to welcome Jessica Parker from the United States as my guest today on “AI, Software, and Wetware”. Jessica, thank you so much for joining me on this interview! Please tell us about yourself, who you are, and what you do.
Jessica: Thanks Karen. Thanks for having me. So, I wear a lot of hats. I call myself an academic entrepreneur because I straddle both worlds. I still do interdisciplinary research, and I supervise doctoral students at the Massachusetts College of Pharmacy and Health Sciences. I supervise my students through the research process. And I also have two consulting companies in the academic research space. And then I also ran an AI ed tech startup for several years.
I do love building things and starting businesses. I’m very ADHD. And so I just always like to feel – not just productive, but I get really intensely curious about things, and I tend to want to just throw myself into it. And sometimes that leads to starting a business, for better or for worse.
I’m from North Carolina. I live here now. I just want to touch on a little bit of my background that I think will be helpful, because it’s kind of a through line in a lot of my work. So I grew up in a small rural part of North Carolina. I went through the public school system there. I’m a first-generation college student. Neither of my parents went to college. And so I’m a big believer in the power of education. I wouldn’t be where I am today without it. I think that’ll be more obvious as I talk today: some of what I see as the promises of AI definitely stems from that perspective.
Karen: Thank you so much for that introduction. I’m a first-generation college student myself. Neither of my parents went to college. Education is really so important for people to fulfill their potential and do what they are capable of doing. So it’s great that you’ve overcome that.
Jessica: Yeah. Thank you.
Karen: Tell me a little bit about your level of experience with AI and machine learning and analytics. I’m wondering if you’ve used it professionally or personally, and if you’ve studied the technology?
Jessica: Yeah, I haven’t studied it formally. I’m not a computer scientist, and I was really never interested in technology in this way, quite frankly, until ChatGPT was released. I mean, I’ve used it. I automate aspects of my consulting business, like the boring operational parts. But I would’ve never described myself as someone who was really passionate about technology. But when I first used ChatGPT, I immediately saw its potential. First I thought about myself as a young kid who didn’t have any money. We were very poor. I didn’t have tutors. I didn’t have SAT prep. And I thought, “Wow, the 12-year-old, 15-, 16-year-old me could have really benefited from having access to something like this.”
So when I first started learning about AI, it was from the perspective of, how could I leverage this? Not just in my business, but also thinking about a lot of my clients who don’t necessarily have the finances to pay for expensive consultants, which is a lot of my work. I work with faculty and PhD students who hire me and members of my team for different aspects of academic research support. And we have to turn away a lot of people because they can’t afford it. So I was trying to explore it from that perspective.
And I took a really deep dive into watching YouTube videos and consuming as much information as I could to try to understand it. And so no formal training, but I think one of the best ways to learn is by getting your hands dirty.
So I started building my own AI tools, which is possible because of AI. I used a no-code platform called Mind Studio to build some of our first prototypes at Moxie, which was the EdTech company that I ran with Kimberly Becker. So that was really my first foray into understanding AI and machine learning, just building tools for myself and for my clients.
Karen: Hands-on experience is a wonderful teacher, right?
Jessica: Yeah. Truly. And then through conducting research on it, I was able to more formally and systematically parse through, “Well, what are the capabilities and what are the limitations?” And so that was a big part of my learning as well, doing that research.
Karen: Can you share a specific story with us on how you’ve used a tool, Mind Studio or something else, that had AI and machine learning features? I’d like to hear your thoughts on how the AI features of those tools worked for you or didn’t – basically, what went well and what didn’t go so well?
Jessica: I first started out using ChatGPT the most. That’s the name brand in the space for LLMs. And because I was testing it out for research purposes, the hallucination issue, especially with citations, immediately became clear. But now I do a lot of my work in different types of wrapper applications. I do like Perplexity, which is powered by ChatGPT. That’s the LLM on the back end. And Perplexity is a good example. I appreciate the feature where it links to the sources; you can go check the source. But because it’s still using the same LLM, ChatGPT, it still has a lot of those same limitations, like simplification bias, and also granularity shifting, when you really study it from a language perspective, which I learned to do through working with Kimberly, who’s an applied linguist.
I also work with healthcare students. And I’ve become familiar with a platform called Open Evidence AI. They call it ChatGPT for medicine. It essentially gives healthcare providers evidence-based answers to their questions. So I’ve been looking into that platform, because I realize how many healthcare students, whether it be a nursing student or a pharmacy student, are relying on Open Evidence AI to get answers to their questions, instead of doing what they would traditionally need to do: go look it up, find an article in a peer-reviewed journal, and evaluate the evidence on their own. They’re now using Open Evidence AI. And so I’ve done some research into how their platform works, and I have concerns, which I think we’ll share later in the interview. But yeah, those are just some of the tools I’ve been using lately.
Karen: Can you talk a little bit about what simplification bias and granularity shifting are, for our audience that’s not familiar with those terms?
Jessica: Yeah, and so it’s funny as I say that – I don’t know if Kimberly came up with those terms or if they’re actual terms, like, in the linguistics community! But either way, I know them through her. This is in the context of research. So when you’re summarizing or synthesizing research as a human, we have to pay attention to words like significance. You wouldn’t use the word significant unless a finding was statistically significant, meaning that the sample was large enough and the change was large enough for it to be meaningful statistically.
As a researcher, we pay close attention to words. One word can completely shift the meaning of a finding. And if you’re not a researcher, you might not know that; you weren’t trained to pay attention in that way. And so what we see is, when you’re using an LLM to summarize an article, or summarize multiple articles to answer a question, oftentimes the nuance will shift and it’ll leave out important details.
For instance, it might say definitively that excessive social media use in teens leads to depression or anxiety. But then when you go look at the source, it’s one small study of 12 students in Japan. So it oversimplifies the findings, and it leaves out the important nuances that you need as a researcher or a healthcare provider in order to truly make an informed decision.
And so we see that a lot. When you try to scale synthesis and summarizing, you’re relying on the large language model to make those decisions about what gets left in and what gets excluded. And if you’re not paying attention, then the meaning will shift. So that’s simplification bias: when the model simplifies the response to fit it into a short answer, it leaves out really important details.
And sometimes this doesn’t matter. I do like to think of this in terms of stakes. We don’t always need to think about simplification bias. But when you’re using the information you’re getting from an AI to make decisions around research and healthcare, we do. We want that nuance to stay. And so we see this simplification bias happening just because, at scale, that’s what happens. Important context gets left out.
And then granularity shifting isn’t that much different. When you start to look at the nuances of specific findings or descriptions of populations, things get moved around. Like, it might take information from an article that’s specific to a population of Indian women, and then it shifts some of that granularity to a different population based on the question you’re asking. So the context gets moved around. And that’s really important in many situations. We need to be aware of that.
Karen: Those are really good call-outs. One thing I was thinking as you were talking about simplification bias: the headline and the summary say one thing, but then the detailed study says something quite different. I’ve noticed that in some of the articles that I read, and I don’t know that it’s because an AI summarized it, but my assumption was that it was more of a human trying to write a clickbait article that would get people to read it. They come up with a more dramatic headline than the study really supports. And that even happened recently in one of the MIT studies about people using AI.
Jessica: Yeah, and humans do it too. I think it’s important to realize humans aren’t perfect. There’s all sorts of existing errors within research. But when you try to scale it through the nature of technology, you’re magnifying those existing issues that happen with humans.
Karen: Yeah, after all, AI is trained on us. So if they had a whole corpus of people generating those kinds of summaries, then that’s what they would imitate, right?
Jessica: Yeah, exactly. And then you also think about, before generative AI, there’s already research on bibliometric data around citations, and humans make mistakes. Oftentimes we’ll take the word of the author, instead of going to the original source, and so we’ll leave out important information. Well, now there’s another layer of abstraction on top of that, and it causes issues. I mean, it can be amazing, but there’s certainly limitations.
Karen: Yeah, thanks for sharing your story about Open Evidence and what your concerns are on that. So it sounds like part of it is the way that it’s handling citations. Do you have a sense of what kind of data Open Evidence AI was trained on?
Jessica: No, so I’ve been digging, trying to find answers when I realized how many students are using this. It’s one thing for an expert to use it. I think experts who are in the field and have experience, they’re better positioned to question what they’re seeing. But when you have a student using it, I think it’s important to know, what is the data? Is it representative of my discipline?
Another example of this is Scite AI. It’s a literature review tool that is for medical researchers, and they use Semantic Scholar as their database. That’s where they pull their articles from. You can go to Semantic Scholar and they are transparent around what’s in the database. They categorize each discipline. So you can see in their chart that medicine and biology are pretty well represented, but as you go down to linguistics, only a million papers are in the Semantic Scholar database.
But we’re not able to do that with Open Evidence AI. There was someone who, I wouldn’t say a whistleblower, but they raised the alarm. They said they were a reproductive health doctor, and they knew from using it that there was a lot of missing evidence. And it seems that Open Evidence’s database of medical literature on women’s reproductive health was thin. But we don’t have a way to actually verify that, because Open Evidence is not transparent about how much data they have, what the data looks like, or even how they’re training their models and how their algorithm decides how to come up with an answer to your question. And so we can’t really evaluate it, and I think that is really problematic.
Karen: One of the reasons, I think, that some companies are not transparent is that they didn’t obtain their data legally, that they violated copyright to get it. And so, if they acknowledged that, they’d basically be proving someone’s lawsuit for them: “Yes, they stole your stuff.”
Jessica: Yeah, exactly. Because now they’re saying, “Okay, we have agreements with the New England Journal of Medicine and the Journal of the American Medical Association“, which are top-tier medical journals. But they don’t have all the medical data in the world. There are other really wonderful top-tier journals, and it’s very unclear whether the data is just those two journals. And then how it gets sorted is another layer of it. When you’re asking a question like “What is the effect of social media on mental health in teens?”, it’s important for me to understand how it’s coming to a certain answer. And that’s not clear either.
So there’s these layers of the features and functionalities, where I think some of it is great because you can look at the source; you at least know where it’s pulling that information from, so you can verify it. But we still don’t know how it’s arriving at that decision, and that can be problematic.
Karen: When you’ve been working with, for instance, medical students, how do you find that they look at the AI as a tool? I interviewed a medical student last summer, and one of his points was that he and his study groups were actually hesitant to use it in certain cases, because they recognized that they didn’t know enough to evaluate the answer that they got, or to know if it was steering them wrong. And so they would use it for some things, but then for something else, “Okay, I can’t let it teach me this, because I don’t know enough to know if it’s wrong.”
Jessica: Yeah.
Karen: He’s just a sample point of one. It sounds like you’ve talked to more medical students or had more experience with them. What are your insights on that?
Jessica: Yeah, it’s a mixed bag. I have some students who are very against it. They don’t want to use it. But they also recognize, especially in healthcare, that a lot of the AI is invisible. It’s behind the scenes. It’s not always as obvious as when you’re using a platform like Open Evidence. So they also realize that there are aspects of it that they can’t necessarily control.
And then there are students who are really excited about it, like they see the potential for this technology in terms of drug discovery and prevention and prediction models to determine, “Is someone at risk for this condition?” And “How effective is this treatment likely to be given these specific parameters of their genetic history and family history?” And so people are excited for those possibilities.
But then when it comes to using it in a learning scenario, rightfully so, a lot of students are critical. But, you know, at the end of the day, medical students are no different than other students in the sense that the education system incentivizes performance. And if you are in a pinch, it can be very tempting to lean on one of these tools. None of them are immune to that. So they have their skepticism, and rightfully so, but there’s also the realities of learning and deadlines and usually healthcare students are very busy. A lot of them are working in residencies or fellowships, and they’re juggling a lot of priorities.
Karen: That’s a really good overview of how you’ve been interacting with AI in a professional context. Do you use any AI tools for anything in your personal life? You mentioned using it for your consulting business. Given your perspective on AI, are you motivated to use ChatGPT for things in your personal life, or do you avoid it for those? I’m curious what your personal experiences with it are.
Jessica: Yeah, so I recently just started gardening and I use ChatGPT. So just this weekend I was taking pictures of the soil and I was trying to figure out if I needed to dig deeper or if I needed to add a layer of soil. I just didn’t know what to do. And so I was taking photos of the soil, because we had, like, 10 years of mulch to cut through. And it was really helpful. I still like to call my grandmother, who is a prolific gardener, to get her take on it. So I use it in that way pretty frequently. I think that’s the best.
I use it for recipes too. Ever since we shut down Moxie and I haven’t been working like a crazy person, I’m really leaning into my hobbies. And so those are the ways I use it.
Personally, I’m a new stepmom and I don’t have biological children of my own, and so I’m learning a lot. It’s like trial by fire. And so I’ll even ask it for advice on how to approach a specific delicate issue with the 7-year-old, because maybe I’m afraid to ask my partner, because he’s going to think I’m an idiot for not knowing how to handle these things. So yeah, little things like that I do quite often.
Karen: Yeah, you mentioned gardening and soil. In the part of North Carolina that we live in, it can go anywhere from hard clay to sand!
Jessica: Yes, and that’s what we have. We have clay here, sand here, lots of rocks. Not the best. So we’ll see. I planted, like, 60 plants over the past month. I hope maybe at least half of them will survive!
Karen: I know a lot of people here have resorted to raised beds for getting soil that they can actually grow things in.
Jessica: Yeah. That’s on my radar. We’ll see how everything does this through the winter and the spring, and we might have to do some raised beds!
Karen: Yeah. It’s November, so you’ve got time to figure it out.
Jessica: Yeah. We’re still having some warm weather, and I’ve learned that the fall is a good time to plant.
Karen: Yep. So that’s a good overview of how you’re using AI. Do you avoid using AI for anything specific? And can you talk a little bit about when you avoid it and why?
Jessica: Yeah. I do not use AI to generate images or videos or music, and I don’t like to consume AI in that way. In terms of images, it is interesting. When Kimberly and I started Moxie, we were so naive. We were just really naive. We’ve learned a lot over the years, and I’ve become much, much more critical the more I’ve learned. When I really started learning about the environmental impact and how much energy it takes to produce an image or a video, I decided these are not things I need to use it for. I don’t need to play around with funny images. So I just don’t do that. That’s more of an environmental concern.
But the more I’ve learned about creators and the risks of their music being ripped off and their likenesses being used in videos, the more I refuse to use Sora 2 or any of those video generation platforms, or interact with them. And I’m trying to teach the kids not to do that as well. Especially because with Sora 2, if you allow people to use your likeness, you’re giving them permission to create videos that use your image and your voice. So I’m helping them understand that. But yeah, those are the areas that are off-limits for me. I try not to consume anything on that end because I don’t want to support it. And then I also don’t produce images or music or videos.
Karen: It’s great that you’re teaching your kids. You said 7-year-old, so starting at a young age – that’s good.
Jessica: Yeah, and the girl is 10, and she’s starting to get to that age where – so we don’t let her have any social media accounts, but of course she’s pushing. And so yeah, we’re trying to help her understand the risks with these platforms well before she encounters them.
Karen: That’s great. Girls especially can be vulnerable to the fakes and other kinds of social media impacts that are not necessarily good for their health.
Jessica: Yeah. And it looks so innocent. It’s like, “Oh, this cat video”, or something really silly, but I think it’s pretty sinister. I feel like the most upset I’ve been lately was when Sora 2 came out. I don’t see the value. It just really upset me. It just feels like, what is the value of this type of platform? And then they took the social media playbook and made it into a social media platform. I’m just like, haven’t we learned our lesson from this? Are we ever going to learn our lesson? I don’t know.
Karen: That’s one thing I think that a lot of kids especially may not realize is that if they upload a selfie to one of these tools, their picture is now out there for that tool to use however they like. And that’s not a good thing in many cases.
Jessica: No. And it’s like they’re trying to shift us into this acceptance of the inevitable. Kimberly and I have been talking about that recently. Even myself, I’m guilty of this. Until probably the past year, if you had asked me a question about data privacy, I would’ve almost said that it’s impossible to really have control over your data. But that’s not true. I think part of Big Tech’s push is for us to feel like these things are inevitable, so that we don’t feel like we have a choice. But the reality is we do have choices.
Karen: Absolutely. That’s the hype machine in action.
Jessica: Mm-hmm. Yeah.
Karen: Yeah, yeah. It’s great that you’re seeing that and then sharing that with the students that you work with and with your kids.
Jessica: Yeah. Yeah. And it’s evolved in that sense. I mean, Kimberly and I, three years ago, we bought into a lot of the AI hype. I feel ashamed to admit it now, but I remember when we first pitched Moxie to investors, I was talking about solving Bloom’s 2 Sigma Problem. And I really believed it. And I’m not saying that that’s not possible, but I’m just much, much more critical now. And I think it’s because I’m not building a company. I actually have the time and the space to think, and not constantly be focused on working. And I’m grateful for this new perspective, because I think it’s important for people to understand.
Karen: Yeah, I know I’ve interviewed a lot of solopreneurs, entrepreneurs, on this interview series. And a lot of them have really expressed this dichotomy. They are under pressure to deliver quickly. They don’t have a large team of marketers and other people that are helping to support them with the business. But at the same time, they want to do the right thing, and they have some of the same reservations that you’ve expressed. So there’s a definite pull there.
Jessica: Yeah, it’s challenging. I’ll just speak from personal experience, but when you’re building a startup, yeah, you’re strapped. I mean, Kimberly and I, we were the COO, CMO, CEO, CFO. We did everything. We were a tiny team. And we leveraged AI to help us do things that we couldn’t have done just on our own without having much larger budgets.
So when you’re bootstrapped or when you have limited resources, I see the draw in using AI to automate as much as you can. And it’s very tempting, especially when you’re getting pressure from investors to cut your costs, to extend your runway, to build out new features. So that pressure is very real. I think it’s also really hard to be in that position and think clearly. There’s so much noise, especially with what’s happening in AI. It really can be hard to find the signal, especially when you have decision fatigue. You’re just being bombarded with so many decisions every day. And it’s hard to make sense of all of it.
Karen: Yeah. Great perspective. Thank you for sharing those stories. So we’ve talked a little bit about where these systems get the data that they train on, and whether or not it’s representative. I’m wondering how you feel about how they’re getting the data that they use. For instance, ChatGPT and Sora scraped YouTube videos that in most cases are copyrighted to the people who put them up there, and they didn’t get permission. So in a lot of cases, what some people call the 3Cs, which is Consent, Credit, and Compensation for use of their creative work, are not being satisfied. Some people feel like it’s for the greater good to just say, “Okay, let them take it all.” And other people feel like that’s not fair. I’m wondering what your thoughts are on that.
Jessica: I would definitely agree with the sentiment that it’s not fair. And I have a hard time. Sometimes I feel like a hypocrite because I still use ChatGPT, even though I do not agree with the fact that they stole all this data, they extracted it, and then they sold it back to us. It’s our data, and now they’re selling it back to us as a service. So yeah, that pisses me off.
To be truly honest, I also think about it in terms of academic publishers that have struck deals with these platforms. Going back to Open Evidence, they struck a huge deal with JAMA and the New England Journal of Medicine. It’s an undisclosed amount. As a researcher, I’ve published over 20 articles. For many of those articles, I paid to publish open access. And now I know that that content is sold to an LLM in order for them to get more high-quality data.
So, yeah, if it’s from the public service perspective, fine, then make the service free. You see OpenAI’s valuation, and Open Evidence’s valuation just hit $6 billion. And those academics are not being paid, as far as I know, for their data being used to train those models. And I think that is problematic. I do feel that policy needs to catch up, and that we need to require these companies to credit the people whose data they stole, and to also compensate them.
Karen: Mm-hmm.
Jessica: Yeah.
Karen: Yeah. The way that they are using people’s data, as one of my friends put it, they are “socializing the cost and privatizing the profits”.
Jessica: Yeah.
Karen: From taking our data.
Jessica: It’s true. It’s true. And this goes back to this inevitability argument. It feels normalized. I mean, obviously people are questioning it, but most people aren’t. When I talk to most people, they’re not even thinking about this, but it’s because of just the history of what’s been happening in tech. Google’s such a good example of how they basically steal all of your data based on how you use their platform, and then sell it to advertising companies so they can then sell you products. And so it’s been happening. This is not new. But this is just the newest example and it’s on such a large scale that I think people are more aware of it.
Karen: I am starting to see some movement towards people using other LLMs. For instance, there’s Mistral Le Chat and some of the ones based in Europe. There’s a new one, Apertus, the Latin word for open, and they are offering it as a free public service, and it was only trained on ethically-sourced data. I’m in the process of trying to get my account set up for that so I can try it.
But even for email, you mentioned Google, some people are saying you can go to other services, like Proton and such. You may have to pay a little bit for the service, but you know that they’re not going to sell your data and you get that privacy. I see articles about people saying, here’s how to claim your life back from Google.
Jessica: Yeah.
Karen: I don’t know if you’ve seen Dinah’s article recently, I think it was on Code Like A Girl. She’s been sharing some articles about how to do that, how to untangle yourself from the clutches of the companies whose platforms we use, and who use our data as they like in return.
Jessica: Yeah, I was just reading yesterday about how to extract all of my data from Google, because my businesses use Gmail, and then we use Google Drive. For the past 10 years, all of my information related to my businesses has been in Google Drive, and I’ve been looking for alternatives. And just the act of extracting all of that is so – they intentionally make it really difficult to leave their platform.
Karen: Yeah.
Jessica: But it is possible.
Karen: Yeah. That’s good. You mentioned a couple of tools already, so I think I know your answer to this, but let me put it to you anyway. As someone who uses these AI-based tools, do you feel like the tool providers have been transparent with you about where they got the data from, and whether the original data creators of that data consented to it being used?
For instance, you mentioned Open Evidence and the deals they have with JAMA and the New England Journal of Medicine. But then, the original creators of those articles, are they getting any of the compensation that’s going to the journals? Or are the publishers keeping it all for themselves? I know some authors have tried to say, “Look, I never agreed to allow my data to be used for this. I want out.” And the publishers have told them, “You can’t opt out.”
Jessica: Yeah, no, it’s true. I am not aware of any. I mean, there’s a few. I think Scite AI is a good example of someone who seems to be doing it well, in the sense that they’re transparent about where the data is coming from. You kind of have to look at the publishers, who are making these deals.
But Scite is at least saying, “This is where we’re getting the data”. And then they have documentation on what that data looks like and how much there is. And so when you’re a user of Scite, you can actually make an informed decision about whether or not this tool is even right for you. Like, if you’re a researcher in education, you should not be using Scite, because that’s not what it’s for. It’s for the medical sciences. But what’s crazy is how often that is not the case. You have these other tools that are literature review tools, where you have researchers going in and using them to find literature instead of doing a traditional database search. And there is no indication of whether or not their discipline or their research topic is even well represented.
And so I think Scite is doing that well, in terms of just the transparency. And then they do have papers written about their algorithms to really describe in detail about how the answers arrive on your screen. But there’s still problems in terms of them striking these deals with publishers, and the publishers are charging open access fees, and that data is being scraped. And I’ve never been contacted to be paid for my open access article, so I’m not aware of anyone actually getting compensated for their work.
Karen: So you mentioned cite AI. Is it C-I-T-E?
Jessica: S-C-I-T-E.
Karen: S-C-I-T-E. Yeah, I like to call out companies that are trying to do the right thing.
Jessica: Yeah!
Karen: We can give them a boost so it helps them to compete in a market where they have competitors who are not trying to do the right thing. So that’s good. So S-C-I-T-E dot AI. Okay, good. I’m going to have to check them out.
Jessica: Yeah.
Karen: You said it’s mostly from medical, though, right?
Jessica: Yeah, they say they’re expanding. The founder, Josh, comes from a biomedical background, and so he founded it with the intention of supporting other scholars in his field. So that’s where they got their start. But I do know that they’re actively expanding their database beyond medical sciences, biotech, stuff like that.
Karen: Yeah. Cool. Yeah. I’ll have to look them up and follow them. So thanks for sharing that tip.
Jessica: Yeah.
Karen: Do you know if any of your own articles have been scraped? There was an article that provided a link to LibGen, L-I-B-G-E-N, to let you look up and see if your articles are included. I don’t have as many as you, probably, but I did go and look, and put in my own name in different formats, and I think 10 or 11 of my published articles were used. So you can see whether your articles are actually in this set of data that’s been scraped and used for different LLMs.
Jessica: I have not used that tool. The way I tested it early on, funny enough, I wasn’t doing it for this purpose; I was just playing around with it. Early on, when I first started using ChatGPT, I would ask it information about my own work, without giving it any information. It would give me responses like it knew things about me. And so I took that as evidence that it already had information about my work. There was an article that I had behind a paywall that was published in, like, 2013. And it even had information from that article, and I was like, “How did it get this? It’s behind a paywall.” And then I was trying to figure out if the publisher made some agreement. I don’t remember where I landed on that. But yeah, I’ll have to check out the site you just talked about, LibGen.
Karen: Yeah. Yeah. It’s a strange thing just to find your work there when you know you didn’t give it to them, but especially behind the paywall. You and Kimberly have a Substack now, right?
Jessica: Mm-hmm, yeah, yeah.
Karen: Yeah. So I’m curious if you’ve opted yourself out of Substack to control whether or not AI training can be done on your newsletter.
Jessica: Yep, I did. And then I also did the same for LinkedIn. So those are two platforms that just recently I knew that was an option and made that selection.
Karen: Yeah, I know some people were concerned about it limiting discoverability of their writing. And that’s not Substack using our newsletter for AI models they are developing. That’s basically like robots.txt, which controls whether or not other people are allowed to use it. And it does seem to work. I was doing some analysis on the data from the first 80 interviews in this series with a friend, and she and I were both working with it. She gave ChatGPT the URL of one of my interviews and tried to fetch that interview and just bring it in. And because I have AI training blocked, it actually came back and said, “Well, I can’t get that one for you.” It’s like, “Oh wow. Oh good. The setting works, so this is good.” But then ChatGPT said, “I can try to get that article for you from Open Substack” or something like that. I’m like, “Okay, I don’t know what that is, but let’s find out.” We told it, “Yeah, go ahead, try and get that interview for us.” This was an interview for a program manager in Costa Rica, and I remember her interview very well. It came back with a supposed interview from someone in Costa Rica, but it wasn’t a program manager. I don’t think it was even a woman. It was some weird composite of different interviews that I’ve done. They got the Costa Rica part right, but they got the person wrong: the years of experience, it was an embedded developer...
Jessica: Interesting. Yeah, it’ll just make things up. I’ve had that happen where I was looking for a scholar on a certain topic. And it gave me the name of a scholar, but it gave me the work of a totally different scholar. I was like, Hmm. And it does stuff like that all the time, because it just wants to give you what you want. It’s not great at saying it doesn’t know or it can’t find it. Yeah.
Karen: Yeah, I was really curious to see that it said that there was such a thing as an ‘Open Substack mirror’. It’s more like a funhouse mirror. It really didn’t show us the article at all. I don’t know whether to be happy or sad about that.
Jessica: Yeah. But it’s also an interesting predicament to put someone in where you do want your work to be discovered, especially now. People use those Google AI summaries. They don’t go to the original source. And I’ve seen that impact on my business. We used to get hundreds of people every month on certain blog posts, and we call those lead magnets. They perform well, and then that’s how people know that we exist as a consulting company. And then they’ll reach out to us for whatever type of support they need. And that went down like 98% over the past two years from hundreds to five a month. Because people are not going to the sources, they’re just looking at the Google AI summary. And so in one sense, it’s not serving me to have that data available anyway, because people aren’t going to the source. But then that opens up a whole host of other questions around people clearly not checking sources, which is also scary.
Karen: That’s a great insight about the discoverability and how it’s been impacting you. I hadn’t heard that specifically from someone. I’ve heard people say in general that it hurts discoverability, but the whole mechanism of paying for SEO access and keyword optimization and all of that is being disrupted pretty seriously. Because like you said, a lot of people will just read the overview. I went ahead in my browser and turned off the AI overviews with udm=14. I don’t know if you’ve heard of that. There’s a parameter you can add to the Google search URL to tell it, do not show AI overviews.
Jessica: Okay.
Karen: Because I’ve brought up a search in a different browser and it’s usually some very simple question, like, “In Python, how do I do this date time format?”, or something super simple. You can see the AI overview is just there chugging and chugging and chugging. And meanwhile, a top hit article that explains exactly this thing in the Python documentation, which is what I wanted, is already there. I don’t even want to wait for that and I don’t want to encourage that. So I just went in and turned it off.
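Note for readers: the setting Karen mentions is commonly applied by appending a udm=14 parameter to a Google search URL, which requests the plain “Web” results view without AI Overviews. A minimal sketch in Python (the helper name here is just for illustration):

```python
from urllib.parse import urlencode

def google_search_url(query: str) -> str:
    # "udm=14" requests Google's plain "Web" results view,
    # which does not include AI Overviews.
    params = {"q": query, "udm": "14"}
    return "https://www.google.com/search?" + urlencode(params)

# Example: open this URL in a browser to search without AI Overviews.
print(google_search_url("python datetime strftime format"))
```

Some browsers also let you register this URL pattern as a custom search engine, so every address-bar search skips the overview by default.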
Jessica: Yeah, I think it’s just because I’m a trained researcher, I always want to know the source. I have the AI summaries come up, but I always look at the sources and ultimately go to that to read it. And I think it’s just because I’ve been a researcher for so long and that’s just how I think. I’m like, “Who said this? Is this credible?” Obviously not for everything, like if it’s a gardening question and it’s low stakes. But when it’s an important decision that needs to be made, or I really need to have a full understanding of something, I always go to the source. But yeah, apparently most people don’t, and businesses are noticing it, and I think that whole industry is in limbo right now. People are trying to figure out what to do.
Karen: Yeah. You mentioned LinkedIn and opting out from them using your data. There was quite a fuss last year when they first enabled that, and retroactively took everything we’d ever put into the platform at that point. I and many people were not happy about that, and we can only opt out going forward.
But aside from those kinds of systems, as consumers and members of the public, our data and content is probably being used by a database tool or a system, or a device or product that we own. Do you know of any cases that you could share about that? Obviously without disclosing any sensitive information?
Jessica: Yeah, something that comes to mind: I had a Tesla for a few years and I remember thinking “What is happening to all this data?” Because it gives you a driving score, especially if you use their autopilot. If you’re not a great driver and there’s a lot of incidents, it’ll actually not allow you to use the autopilot anymore. So anyways, I just remember being curious around what was happening with all this driving data. And even though Tesla doesn’t sell your data to third parties, it does use it internally if you have insurance through Tesla, so it can dictate your insurance premium. But then what I realized is there’s all these other car companies like GM and Honda that do sell your data to insurance companies and that can impact your premium. And I just remember being, like, “That is insane. I had no idea that that was happening.” And so that’s something that, when I got rid of my Tesla – I just got a new car – I specifically was asking about. And I feel like the car salesman thought I was crazy. He’s like, “I don’t know. I’m not sure.”
But yeah, it’s just everywhere. I think now that I’ve had time to slow down and really pay attention and ask better questions, it’s embedded in everything. Like you think about wearable devices, even kids who get computers assigned to them at school. Those computers are collecting data. It is just, it’s in everything. And now it just becomes overwhelming to think about everything you’re interacting with, and what data you’re providing. And obviously you can’t fight every battle. And I think that’s when it becomes challenging and people get discouraged where it’s just like you feel like you give up because there’s too many decisions to make, and too much information that you need to take in to make an informed decision. But it’s just everywhere. The more I see it, the more it becomes obvious. But yeah, car companies are selling your data to insurance companies.
I’ve even been recently trying to figure out what’s happening with health insurance premiums and what data is being sold to health insurance companies, and just trying to understand what that looks like. Like, if you’re using a Fitbit or a wearable device or your doctor recommends something to help monitor your glucose at home, where is that data going? Is that data going to a health insurance company? Is that going to impact your premium? Is there any transparency there?
Karen: That’s a super important point, and especially as you said with these devices, we don’t know where our data is going. And there are ways to protect our data, like not using the same email address everywhere, which lets them connect your data that way. There are some things that we can do.
EdTech – you mentioned the kids getting computers in school or being told to sign up for these accounts. They’re not always transparent about how the kids’ data is going to be used. Parents often aren’t even aware of what they’re going to do with their kids’ data, and that’s a huge issue.
Jessica: Yeah, I was doing some digging into a school, I won’t say the name because I don’t want to get sued, but there’s a new school in North Carolina, in the K through 12 space, that has already been established in other states. And they are an AI learning school. And kids that go there, they learn with AI for a few hours a day. And from what I’ve seen, there’s now been an exposé about it. The students are given these laptops, and the parents sign away any privacy. It basically says there is no privacy with the school, because the computers that the kids take home and have at school are constantly recording video, audio, keystrokes, everything. And then one of the subsidiaries that owns the school has their own commercial EdTech product. And so you’re just thinking: all this data is being used to improve this commercial product. And I just think stuff like that is very, very common, and people are not aware.
Karen: I just read about a school – and it may be the same one, but again, I won’t name the name, but I think this was in Texas – they said that the parents didn’t realize that they had implicitly given blanket permission for their daughter to be recorded everywhere. She was using her laptop and she was talking to her little sister, and the school software took a picture of this girl sitting on her bed in her pajamas.
Jessica: Yeah. And then it punished her for it, in terms of, she wasn’t being productive. She was supposed to be doing her homework or something.
Karen: Yeah.
Jessica: I think that we’re talking about the same school.
Karen: Yeah.
Jessica: And they’re opening up a location here in North Carolina and it worries me.
Karen: Yeah. I think it comes down to making people aware of it. They did say there was some way that parents could opt out, by disabling the setting that lets the school monitor everywhere. But I think the parents just pulled all of their kids out of that school and said, “No, we’re not doing this any more.”
Jessica: Yeah, and especially in that scenario, there was some predatory thing happening where they were targeting minority populations and offering them scholarships. And then you have to wonder,“Okay, there may be some good happening here, but is it just to get data for this commercial product that one of the founders is developing?” I don’t know.
Karen: You also mentioned automotive. That’s a big thing. I found out a little bit about this in my last corporate role and we ended up being bought by an automotive company that, again, I won’t name. But one thing I learned from that, from talking with different people in the automotive space, is all the data that a car collects on you from all of these sensors, including this automatic adjustment of the seats. And it figures out, basically, the weight of the person. Okay, now they’ve got your weight and your family member’s weights.
Jessica: Mm-hmm.
Karen: And what are they doing with that data? And I think people don’t realize, there’s some privacy protections on your phone, but if you connect your phone to the infotainment system in the car, then whatever data it pulls from your phone, there’s no privacy protection on that. And I think people just aren’t aware of that. But that’s a huge thing with what they’re doing with that data and selling it and making a business case for it. And like you said, it’s insurance, but other things as well.
Jessica: Oh yeah. Yeah. One of the most common examples, when I think about people not being aware of data being transferred between applications, talking about interoperability, is when you go to use Facebook or your Apple ID or your Gmail to sign into some other platform. There’s a data use agreement happening there.
And for convenience’s sake, we’re all just used to it: “Oh, I just want one login. I’m not going to have to create a password for this other account. I’ll just let it connect to my Gmail.” And then you’re consenting to them getting all this data from Gmail. It is just so, so common, and we’re so accustomed to it.
Karen: Yeah. It’s so convenient to do that, but I never do it. I always create an email alias and create an account and create a password. It’s a hassle, but I really don’t want to have that connection going on behind the scenes without me being able to know what’s being exchanged.
Jessica: Yeah, I’ve stopped and yeah, it’s just a hassle. I use a password manager, so that helps me keep track of it now. But yeah, it’s just like we’ve got to add some friction back in, to question all of this convenience.
Karen: Yeah, it’s a tradeoff, right? All of these are tradeoffs.
Jessica: Yeah, they’re all tradeoffs.
Karen: Yeah. So we’re trading convenience for privacy. And I think a lot of people, like you and I and a lot of others, are leaning more towards privacy and not being willing to pay that price.
Jessica: Yeah, yeah, exactly. And I think it’s a privilege too. So many people don’t have the bandwidth or the capacity to think through all of these decisions that need to be made. I grew up with a single mother. If someone had approached my mother back then and said, “Here’s a list of 20 things you don’t want to do, because you’d be worried about Jessica’s data,” my mom would be like, “I’m working two jobs. I’m barely getting by. I don’t have the bandwidth to think about this.” Yeah. So I feel like it’s a privilege to have the time and the space to learn and think through these things. I think we kind of owe it to people to share what we’ve learned, to develop some awareness.
Karen: Yeah. As you said, it shouldn’t have to be a burden on people to opt into doing things the right way and having to opt into having their data protected. The defaults ought to be safe and secure and private, and unfortunately that’s not where we are. But I think, as you said, we’re not powerless to make a difference there.
Jessica: Yeah.
Karen: Last question – we’ve talked a little bit about this – we see public distrust of the AI and tech companies has been growing as people find out more about what they’re doing, like the example you just gave with the EdTech company. What do you think is one thing that AI companies would need to do to earn and to keep your trust? Like Scite.ai, they’re doing some good things, but pick another company. What would they need to do for you to feel like you could trust them? Or is that even possible?
Jessica: What comes to mind for me is transparency. At minimum I would like to know: where did this data come from? Was it consented to? And it’s not that the people have to be compensated, but it should be a choice. There should be a consent process, and it is possible. More and more I read about community driven models that are consented to by the community. These do exist. They don’t make the headlines. But they are out there.
So I think transparency around where the data came from, and then how it’s being trained. I do not like what I’ve learned about the data annotation industry and the exploitation of laborers in the Global South, especially when you see companies like Scale AI making billions of dollars and paying people pennies on the dollar while taking these huge contracts with LLM developers. Yeah, it is just sickening to see a company make so much money through stealing data, not being transparent about it, and not taking care of the people it needs to build the product. So transparency there, I think, is a good first step.
Karen: Yeah, absolutely. There’s so many dimensions of this that are of concern. And I’m glad that you brought up the data workers, because I’ve been trying to highlight that in some recent posts, and it’s something that I think is very much under the hood. They’ve offshored it, or done what people call ethics washing, where they say, “Oh, well, we do this”. Yeah, but Scale AI is doing that, and you’re using them, so it’s really on you. And I think they try to hide it.
Jessica: They do.
Karen: But it is coming out.
Jessica: But it’s promising. I met with someone last week who’s going to be on my podcast with Kimberly, Women Talkin’ ‘Bout AI. And she started a data annotation company where they actually give workers equity. And so that gives me hope. That is a great business model. Like, you’re actually taking care of the people who are building this product. And so I’m constantly reminded that there are alternatives that are much better than what we currently see.
Karen: That’s a great example of a kind of company that I would really love to help promote. So if you can share their information, we can put a link into this interview when it publishes, or a link to the podcast interview, which may publish before this one does, and we’ll get that out there. Because I really feel like they deserve more visibility for trying to do things the right way.
Jessica: Yeah, yeah. It’s just sad that it’s few and far between. It feels hard to find those examples.
Karen: That’s all my standard questions. Is there anything else that you want to say? I do want you to tell us a little bit about your Substack, Women writin' 'bout AI.
Jessica: Yeah. So I have a Substack and a podcast. The podcast came first. It’s called Women Talkin’ ‘Bout AI. When Kimberly and I started that podcast, we were mostly focused on AI and education, but now we’ve branched out. We talk to experts, or not even necessarily experts, but folks who are knowledgeable about AI in different areas. Privacy. Data annotation.
We have someone we’re talking to in January around AI and healthcare. So we’re really exploring AI in all these different spaces, and our goal is to just help our listeners see beyond the hype in the headlines and just make informed decisions. And then the Substack is a space where Kimberly and I both share what’s on our mind, based on whatever it is that we’re learning about at the moment. We also share transcripts from episodes of our podcast, and that’s Women writin' 'bout AI on the Substack. Yeah.
Karen: Great. We will drop a link to that into the interview article. I’ve followed some of your work and really enjoy reading what you’re posting. And it’s in the She Writes AI Digests that we do every week now.
Jessica: Thank you. It’s fun. And if you were to go back and look at some of our work, I think before the Moxie shutdown, you can see how our perspectives have shifted. We’ve almost done a complete 180 from where we were a few years ago. And so I just encourage folks, if you read some of our older work, look at some of our newer stuff and you’ll definitely see a much more critical stance.
Karen: Okay. Any final thoughts you want to share with our audience?
Jessica: Just stay critical. I don’t know. One of the questions I always ask myself, if someone’s trying to sell me an AI product is, who’s benefiting from this? Who benefits? And I think that can help see past some of that hype.
Karen: Yeah, that’s one of the points that I made in my Everyday Ethical AI book, is: Follow the money.
Jessica: Yeah.
Karen: Look where the money is. Even if someone has an article talking about how wonderful this one AI tool is, well, did they get a free copy in return for posting the review? There’s just so many ways that the money comes into play. Or like you said earlier, people that say it’s inevitable. Well, the world accepting it as inevitable is going to pump their market valuation and it’s going to enrich them personally. So of course they have an incentive to make you think it’s inevitable.
Jessica: Yep. Exactly. Yeah.
Karen: Well, thank you so much for joining me for the interview.
Jessica: Yeah!
Karen: I really enjoyed talking to you and getting your perspectives and, yeah, looking forward to reading your future articles and listening to your podcast episodes.
Jessica: Thank you. This was great. I appreciate you having me on, Karen.
Karen: I enjoyed it. Thank you so much.
Interview References and Links
Women writin' 'bout AI on Substack
Women talkin’ ‘bout AI on Apple Podcasts and YouTube
About this interview series and newsletter
This post is part of our AI6P interview series on “AI, Software, and Wetware”. It showcases how real people around the world are using their wetware (brains and human intelligence) with AI-based software tools, or are being affected by AI.
And we’re all being affected by AI nowadays in our daily lives, perhaps more than we realize. For some examples, see post “But I Don’t Use AI”:
We want to hear from a diverse pool of people worldwide in a variety of roles. (No technical experience with AI is required.) If you’re interested in being a featured interview guest, anonymous or with credit, please check out our guest FAQ and get in touch!
6 'P's in AI Pods (AI6P) is a 100% human-authored, 100% reader-supported publication. (No ads, no affiliate links, no paywalls on new posts). All new posts are FREE to read and listen to. To automatically receive new AI6P posts and support our work, consider becoming a subscriber:
Series Credits and References
Disclaimer: This content is for informational purposes only and does not and should not be considered professional advice. Information is believed to be current at the time of publication but may become outdated. Please verify details before relying on it.
All content, downloads, and services provided through 6 'P's in AI Pods (AI6P) publication are subject to the Publisher Terms available here. By using this content you agree to the Publisher Terms.
Audio Sound Effect from Pixabay
Microphone photo by Michal Czyz on Unsplash (contact Michal Czyz on LinkedIn)
Credit to CIPRI (Cultural Intellectual Property Rights Initiative®) for their “3Cs' Rule: Consent. Credit. Compensation©.”
Credit to Beth Spencer for the “Created With Human Intelligence” badge we use to reflect our commitment that content in these interviews will be human-created:
If you enjoyed this interview, my guest and I would love to have your support via a heart, share, restack, or Note! (One-time tips or voluntary donations via paid subscription are always welcome and appreciated, too 😊)