Introduction - Carey Lening
This post is part of our AI6P interview series on “AI, Software, and Wetware”. Our guests share their experiences with using AI, and how they feel about AI using their data and content. Today’s interview is a bonus in honor of Data Protection Day.
This interview is available as an audio recording (embedded here in the post, and later in our AI6P external podcasts). This post includes the full, human-edited transcript.
Note: In this article series, “AI” means artificial intelligence and spans classical statistical methods, data analytics, machine learning, generative AI, and other non-generative AI. See this Glossary and “AI Fundamentals #01: What is Artificial Intelligence?” for reference.
Interview - Carey Lening
I’m delighted to welcome Carey Lening as my guest today from Ireland, on “AI, Software, and Wetware”. Carey, thank you so much for joining me on this interview! Please tell us about yourself, who you are, and what you do.
Sure. Thank you so much for inviting me on, by the way. This is very exciting, Karen. So a little bit about me. I am, as the accent can probably tell you, not actually from Ireland. I was born and raised in the States. And I was a practicing lawyer for a number of years, and then decided to say, “Nah, not any more”, and became a consultant.
And then my husband and I moved over from the very sunny and currently on fire state of California to Dublin, Ireland, where fire is not really a problem, because it's just rainy all the time.
But we've been here since 2017, and we really enjoy ourselves here. And I still work as a consultant, but primarily what I have been turning my attention and fascination to has been writing. So I have a Substack called Privacat Insights. And I write about all sorts of interesting things, which I think is how you found me?
Yes!
And I talk about AI. I talk about technology, law, and what I like to say is the issues around ‘fractal complexity’ between those systems. So there's a lot of complications when you're trying to mesh messy things like human-created laws and norms and whatnot with technology, and with new and sometimes more rigid binary outputs like you would get in software. So yeah.
So that's what I've been focusing on. That's what I do. And I've worked for a bunch of companies throughout the time, you know, big companies, small companies, and have honed my ability to communicate these complicated things to people, across different levels in the organization and across different sizes, in a way that I think makes sense to most people, and that they can kind of get behind. And so writing is an even better and more efficient vehicle for that, because I get more people actually listening to what I have to say. Yay!
That's great! So it's interesting that you have a background in law and that you've worked in California. I'm not sure when CCPA was passed, but that's probably something that you've kept up with. And then, being now in Europe, you have this perspective on GDPR and everything that's going on there with privacy and data protection and such. So I'm really looking forward to hearing your insights on AI and data.
Oh, yeah. Well, the GDPR came before, in 2018. And then the CCPA followed in 2020. So, yeah, I've been aware of both, and all those lovely changes, and then all the other various laws around the world. It's exciting stuff.
That's great! What is your level of experience with AI and machine learning and analytics, and whether you've used it professionally or personally, or if you’ve studied the technology?
Yes to all of those. Maybe not analytics. I was never really that much in analytics. But in everything else, I would say I'm pretty versed. Not expert, because I refuse to say I'm an expert in anything, other than the AI cat, which is - I joke because I have 9 cats at home. And so I am a cat expert at this point, because I've fostered hundreds of them, and I have these 9 cats, and they take up all my time. And so if there's ever an AI cat, I will automatically become an AI cat expert as well!
But, yeah, I spend a lot of time, like I said, looking at the law, but also looking at the technology. I have a reasonably decent technical background. I have forced myself to try to learn programming. I am terrible at programming, but I have at least tried to learn it. Importantly, I'm married to an engineer. And after almost 12 years of marriage, some of the engineer-y technical things start to rub off.
And so I got really into AI because it's technical, but it's also a very messy legal policy and societal area. And so those sorts of things just automatically get me excited, and I want to learn everything I can.
So I've spent some time doing research, reading literature on different issues related to large language model training, and then also concepts called ‘machine unlearning’ or ‘forgetting’. Because there's very interesting legal issues around that, that we can maybe talk about in a little bit, depending on the other questions.
Alright. Yeah. That sounds interesting. So you have 9 cats! The most that I ever had was 6, but don't have any at the moment.
This was not by design. It was a - ah, my poor husband!
Mine wasn't by design either! I had 2. And then there were circumstances where I ended up adopting a family of 4, mother and 3 babies, 3 kitties. But they were sweet.
Aw. Yeah. Aw.
But, yeah, cat lover here too.
I have one sitting in my lap right now, purring. Because every time I'm on the phone, or on a podcast, he immediately wants to be involved. He'll ignore me for the entire rest of the day, but he's like, “No, but you're talking on the phone, and you're not petting me, so pet me.”
Ah, that's funny! So you mentioned that you've tried to learn programming. Have you tried to write programs that analyze data with machine learning?
Oh, yeah.
Okay!
Yeah, well, I've done a lot more stuff with LLM API calls and stuff like that. I've written some Python scripts that will, for instance, do case summarization at a programmatic level [link].
I've done some very, very simple scrapers with Python. Python's about the only thing that I can actually wrap my head around.
I wrote a little program in Ruby that does the automatic number generation for the podcast that we run that I was mentioning before we started recording. It generates a random number once a guest decides - it pulls a question up based on the 1, 2, or 3 that the person picks.
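Note: For readers curious what a little script like that might look like, here's a minimal sketch of a random question picker, written in Python rather than Ruby. The question pools and names are placeholders for illustration, not the actual program from the podcast.

```python
import random

# Hypothetical pools of questions, grouped under 1, 2, and 3.
# These are placeholders, not the real questions from the show.
QUESTION_POOLS = {
    1: ["What got you into this field?", "What surprised you most this year?"],
    2: ["What tool could you not live without?", "What do you wish you'd learned sooner?"],
    3: ["What advice would you give a newcomer?", "What's next for you?"],
}

def pick_question(guest_choice: int) -> str:
    """Pull a random question from the pool the guest picked (1, 2, or 3)."""
    return random.choice(QUESTION_POOLS[guest_choice])

if __name__ == "__main__":
    print(pick_question(2))
```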
So, yeah, a little bit of programming. Not very much. And it's been much easier now that there's ChatGPT and these kinds of coder tools. Because I can understand - I have a decent amount of ability to read code and to understand it. But remembering the syntax is just not my bag. It's just really not. And so, that kind of solves the syntax problem.
And then, from that, I know how to write the request of what I want, in a way that the LLM can very easily kind of pick up on. I don't have to do a lot of iterations, which is nice. And then I can also kind of look at it and go, “Oh, this looks like it's doing something terrible. What can we do to fix it, or at least know where to go find and debug the problem?”
Yes. So with the LLM code assistants - I've tried some at various times. And I'm curious, how often do you get code that doesn't actually compile?
I usually will end up getting something to work within 2 or 3 iterations. Part of it is that I have learned, again, being married to an engineer for so long. There's a joke that we had long ago. I used to have him kind of help me with code. And I was like, “Come on, David. Teach me how to program.” He would, very patiently, in the first couple of years, be like, “Okay. Fine. Fine. Fine. We'll try.”
And I would get so frustrated. Because, again, the syntax is just opaque to my brain. I don't get it. But what was fundamentally wrecking HIS head was: I didn't know how to ask the questions in the right way. I didn't know how to structure my request in a way that made sense to computers, and therefore made sense to him.
And so eventually, we got into some fight or something, and he was exasperated at my inability to understand Python. And he's like, “Husbot does not compute.” And I'm like, “What?” And he's like, “Husbot does not compute!” I'm going, “What the hell is this?” And I realized: I need to break this step down for him. I need to break what I am asking for down into these very small discrete parts.
And if I have to do that for him, I also have to do it for the computer. Because my husband's much smarter than the computer, but same kind of mentality. So that mental shift helped me learn how to think about problems in a way that I can actually break them down for programming.
And so, with an LLM, it is like talking to my husband. Again, a more limited version of my husband. But that same approach works really well.
So I get something in, you know, 1 or 2 iterations, but I start small. I start with, “First, I want to do this. Step 2, I want to do this. Step 3, I want to do this. Step 4, check for errors if this happens. And then step 5, provide me an output.” Those kinds of very specific steps.
And I think if you adopt that and break your problem down that way, you get results much faster and much cleaner with less sadness and headaches. At least that's what it's been for me.
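Note: As a rough illustration of that step-by-step style, here's a hedged sketch of how such a request might be structured and sent to a chat-style LLM API. The OpenAI Python client, the model name, and the file names are assumptions for illustration, not a recommendation of any particular provider or workflow.

```python
from openai import OpenAI  # assumes the OpenAI Python client; any chat-style LLM API would do

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Break the request into small, explicit steps, as described above.
prompt = """
Step 1: Read the CSV file at cases.csv, which has columns 'case_name' and 'text'.
Step 2: For each row, summarize 'text' in no more than three sentences.
Step 3: Write the results to summaries.csv with columns 'case_name' and 'summary'.
Step 4: If a row is missing 'text', skip it and log a warning instead of crashing.
Step 5: Print the path of the output file when finished.
Write a single Python script that does this.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```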
Sounds good. Can you share a specific story on how you've used a tool that included AI or machine learning features? Whether it's the chatbot for helping you write code, or some other type of application that you were using LLMs for? I'd like to hear your thoughts on the AI features of those tools - what worked for you and what didn't, what went well, what didn't go so well.
Yeah. So this isn't something that I wrote. But today, I've been working on an analysis of DeepSeek, which is another one of these LLM deals that are coming out of China right now. And it's kind of an OpenAI / Gemini competitor. I was fascinated by this because someone who subscribes to my newsletter was in the chat room that we have. And they said, “Hey, have you ever heard of this DeepSeek thing?” And I'm like, “No, I have no idea what this is.”
But I started looking, and I'm like, “Oh, God, it's China. This is already going to be an interesting one.” But I got fascinated by looking at the terms of service and the privacy statements. And then I wanted to go look and see, because there was some outrage about, “Oh, these guys are demanding all this information and claiming outputs and doing all this other stuff.” And I went, “Well, actually, I think most of the LLM providers already do this.” And I wanted to check.
So I took 6 different LLM providers: OpenAI, Gemini, Perplexity, Anthropic, one from Baidu, and then the DeepSeek folks. I took their policies. I whacked everything into NotebookLM by Google. And I said, “Hey look, read these policies. Read these terms of service, terms of use, and privacy notices.”
And this tool is amazing, because it was designed by Google with researchers in mind - specifically a guy who I think used to work for, like, the Washington Post or something. You give it a huge corpus of information, of documents or whatever. And it goes through, and it can read all those things lightning fast. And then you ask it discrete questions.
So you ask it, for instance, “What is the data retention policy across these different privacy notices?” And it's really, really good at being able to pull it out. And more importantly, it's really good at being able to pull out within the document and highlight where in the document the thing you are asking for is referenced. And it's a lot better than control-F, because instead of me having to think about ‘data storage’ versus ‘data retention’ versus ‘retention policy’ versus all these other kinds of similar terms, synonyms for one another, I can type in, in natural language, “Tell me about how the companies deal with data retention.” And it will pull down a nice list, with citations that I can click on, and immediately see in the document that I have provided where it exists. So it's an instant validation check.
Now sometimes NotebookLM makes stuff up, right? They all do.
Right.
It's a lot better than most, because it's taking an approach where it's got the baseline trained large language model infra under it, like everybody else, like OpenAI and Gemini and everything. But it's also applying a RAG (Retrieval Augmented Generation) approach, so that it is constraining itself to just the records you are actually interested in reviewing. It's great for things like research papers, or big heavy technical documents, legal contracts, that sort of a thing.
You still need your brain. You still need to be able to do the research and review, because if you don't, you will be sad. But it is really a huge time saver and mental overload preventer. Much easier to look at and compare contracts, or terms of service or whatever, in a way that is in one page, than across 12 different tabs.
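Note: For readers wondering what “constraining itself to just the records you provide” means in practice, here's a toy sketch of the retrieval-augmented generation idea - not how NotebookLM itself is built. The documents, the keyword-overlap scoring, and the hand-off to an LLM are all simplified placeholders; real systems use embeddings and a vector store.

```python
# A toy RAG loop: retrieve the most relevant passages from YOUR documents,
# then ask the model to answer only from those passages, with citations.

DOCUMENTS = {
    "provider_a_privacy_notice": "We retain account data for 24 months after closure...",
    "provider_b_privacy_notice": "Data retention: chat logs are kept for 30 days...",
    "provider_c_terms_of_use": "Outputs may be used to improve our services...",
}

def retrieve(question: str, docs: dict[str, str], top_k: int = 2) -> list[tuple[str, str]]:
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda kv: len(q_words & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Constrain the model to the retrieved passages and ask for citations."""
    context = "\n\n".join(f"[{name}]\n{text}" for name, text in passages)
    return (
        "Answer ONLY from the sources below, citing the source name in brackets.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

question = "How do these companies handle data retention?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS))
print(prompt)  # this prompt would then be sent to whichever LLM you use
```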
I love it. I hope Google doesn't get rid of NotebookLM. But, as is Google's tendency, they probably will put that in the Google graveyard within, like, the next year, because it's actually really useful.
That's funny 😆. But it's a good story of how you used it to do something. I know with trying to understand terms and conditions, this is something I hear a lot of people talk about, that it's 10 or 20 pages of legalese.
Yep!
And it doesn't make sense. And so over 90% of people don't even bother trying to read them, and I can't really fault them for it.
No. It was really interesting. I was summarizing my thoughts on these after reading all these documents. And NordVPN had done a study in 2023 where they surmised that, to read the privacy policies or privacy notices of the 96 most common websites that people visit, would take 47 hours in a month. 47 hours! A full work week. Over a full work week. That's mental! Like, no one's gonna do that. I'm not gonna do that.
The only reason I did it now is because I was kind of more intrigued by the potential comparisons between what a Chinese tech company policy might look like, versus an American policy. And there’s a lot of differences, it turns out.
But no one is going to do that. No one is gonna spend that time. And I don't spend that time. If I'm not getting paid for it by clients, or having some weird problem like I did with this DeepSeek privacy notice thing, I don't spend the time reading them either. Because life is short, and I would rather pull my toenails out than do that every time.
You’d rather pet a cat, I know!
For sure. Always, always rather pet a cat. That's just a given.
So you've made some good use of some AI-based tools. Do you know of any situations where you have avoided using AI-based tools for some things? And can you share an example of when, and why you chose not to use AI for that?
Oh, yeah. I do not use AI for anything that leaks personal data of individuals, or that would create a confidentiality or disclosure issue with my clients. Just WAY too many problems with that. These systems are obviously owned by gigantic corporations. There's not really a lot of reason to trust them, for one.
But, also, in most people's use cases, they want to say, “Here, tell me if I should hire this person”, or “Tell me if I should go do this thing”. They want a decision-making tool. Or they want a search engine. It's kind of one of those two things.
That is not what these things are really good at. That's not what they're designed for. And it's almost a guarantee for sadness and despair and funny jokes on the Internet when people trot out AI slop, and get caught for trotting out AI slop. So there's lots of those kinds of cases.
So I'm very constrained in my uses, which is why I have a degree of optimism for AI that a lot of my fellow legal and data protection folks don't. It's because I think I have kind of understood where the limits are. And I am very firmly in the camp of “I'm not using it for any of these purposes, because this is just a terrible use of this tool and this technology.”
There was a story that went around toward the end of last year about a product that someone was developing using AI to support the legal profession. And there were a couple of things. One was the high rate of what some people call ‘hallu-citations’, where it made up citations. Obviously that's not something that you would tolerate, even from a paralegal or a junior associate.
But one of the bigger concerns, and this is where it gets more into the privacy that you were talking about, is that the terms and conditions for that tool would have exposed the legal firm's clients' confidential information that they put into the tool to the people who ran the AI tool platform. And even the lawyers who read that and agreed, “Yeah, let's try this tool”, didn't catch that.
Yep. Because they don't read. Because again, if it takes 47 hours of your month to read contracts, you're not going to be reading them all. Again, I think a lot of people have an idea that these tools are oracles, that these tools are infallible, or that they are always going to be producing correct outputs. But they're not. They are predictive statistical spitter-outers of words. They are not oracles. They are not guarantees. So anytime you're asking ChatGPT or Gemini or Anthropic to, “Find me citations about this particular legal issue”, it's going to generate whatever the most probabilistic collection of words meets that standard. It's not necessarily going to give you the correct outcome.
So, say “3 F third, 1 2 3 4 5” might be a legal citation, and those things are coming proximately close to one another in some other thing, but it's probably not gonna be relevant to whatever it is you're actually doing.
So, yeah, the ‘hallu-citations’, none of that surprises me. It's a terrible use case, for now.
Right. You mentioned trust earlier - we're definitely going to come back to that. One thing I wanted to explore with you a little bit is this topic of where AI and machine learning systems get the data and content that they use for training. We see this with the LLMs and with various image and music systems as well. They use data that users have put into online systems or they published online, disregarding any copyrights or rights holders' objections. And a lot of companies are not even transparent about how they intend to use our data when we sign up for a service online. So I'd like to hear how you feel about companies using data and content for training their AI and ML systems and tools, and what your thoughts are about what some people call the 3C's, the need for Consent, Credit, and Compensation to people whose data is used for training AI and ML systems.
Well, for one, they're all violating the law. 100%. Every single one of them. Anthropic is slightly less bad than the other ones, because I think that they've at least made kind of a passing effort to get either authorization or licensing of materials. Or they're working with actual public domain or Creative Commons materials.
But everyone else is just 100% scraping stuff from the Internet. A lot of times, it's crap. It's frequently without any kind of license agreement. It's certainly without consent, because that is not actually possible, certainly not possible at scale. And, you know, they don't care.
And the law seemingly is okay with that, for now. We've got all these cases going on in courts, primarily in the US, but I think there's a few in Europe. There's a few in other countries.
But yeah, until someone actually shoots the “golden goose”, they're going to keep doing what they're doing. And they are in fact going to keep breaking the law. That is just “brass tacks”. That is what it is.
To be fair, when automobiles came on the scene, too, they were also technically violating the laws as well. They were violating horse-and-buggy laws. They were violating all sorts of ‘common carriage’ laws, because it was a new technology. And so we need, maybe, new and improved laws, or we need a revision in how the laws look at things.
I think concepts - and this is something that I am frequently yell-y at my US friends about - concepts around consent are fraught, for SO many different reasons. There's a lot of things that we do in life that involve our data where consent is a WILDLY inappropriate lawful basis or reason for processing data. It's SO wildly inappropriate that it becomes a way that tech companies can take advantage, and basically do shitty things - knowing full well that they will never obtain consent, and that that will never be enforced, because it can't be. There is no meaningful way that OpenAI or Anthropic or Google or anyone else is going to get all of our consents. There's also no way that they're going to compensate all of us. This is a similarly tried-and-true argument for the people who think that they can own their data. It's just not the world that we live in, and it's not meaningful.
Compensation makes sense in some cases, I think. Right now it's mostly one big corporation paying some publisher somewhere, who has not bothered to do the right thing and get consent or agreement from the people whose works they're publishing. And that's a whole other different fascinating conversation to have.
But right now, the idea that people want this kind of truly equitable one-to-one, “OpenAI is gonna pay me for all my content and data”, that's just never gonna happen. And we need to come to terms with that.
Are you familiar with the Fairly Trained organization and movement? Have you heard of that?
I am! I know them because, Michael Bommarito and Jillian Bommarito, they trained their Kelvin Legal Model [KL3M], I think, and got the first Fairly Trained certification deal.
I know it's possible. And there are probably uses, particularly for small discrete language models, and particularly for things where the provenance of the data itself is more amenable to a large-scale licensing agreement, like court cases, for example. I think it has some utility and benefits for certain situations, for what I think would be more easily classified as small language models, instead of large language models. I can see valid use cases there.
But I still, and maybe someone with the Fairly Trained group can walk me through this, I still cannot mentally conceive of a situation where the large language models that we know and love and use will ever be able to get Fairly Trained certification. Or do things in a fair and compensable way, to the standards the organization has laid out?
Yeah. I think the biggest thing there is that those large companies don't have any business incentive to do it the right way.
Yep. Yep. That too. Like I said, the laws aren't really enforcing anything. They're not going to get fined. The GDPR has been a remarkable example of good intention - promised fines and staggering business impact. And there's been some things that have definitely improved under the GDPR. But the promised, hoped-for big huge fines to shut down big tech, or to massively change behavior from what are ultimately data-extractive business practices, that hasn't changed, even with the GDPR. They build in the line item of, “Okay, well, we're gonna get fined by Ireland, or we're gonna get fined by Germany, or we're gonna get fined by the Court of Justice or whatever” into their business, into their profit and loss statements.
It was just part of their standard operating budget.
Yeah, yeah, exactly. Not being totally sarcastic there. That is actually exactly what they do. And the large language models, you know, OpenAI's and Google's and everyone else's - they're doing the same thing. Because there's never gonna be enough money that they get charged that will make it more costly than actually trying to comply with the copyright and data protection laws, mostly because it would be impossible.
I wanted to talk to you about, with your background in data privacy, the upcoming Data Privacy Protection Day - Data Protection Day?
It's either. I think it depends. Like, in the US it’s ‘Data Privacy Day’. At the UN and the EU, we firmly scream, “No, it's Data Protection.”
Okay! Alright. And as consumers and as members of the public, our personal data and content has almost certainly been used by AI-based tools or systems. Do you know of any cases that you could share, obviously without disclosing any sensitive personal information?
Any cases of personal data being used in LLM systems?
In any AI-based tool - it wouldn't have to be a generative AI tool.
Yeah. It's EVERY tool <laughter>. Even with machine learning, in the old-school style, there's medical machine learning devices or systems that are being used.
Well, there was one that I discovered a couple of months ago that was being used to identify whether or not men had STDs on their junk. Obviously, in order to train a model to do that, they had to get pictures of people's anatomy. And they did that without any kind of licensing or permissioning or anything else. That company fortunately got shut down, but yeah. It was marketed to women, primarily - the idea being that, before they hooked up with a guy, they would allegedly get his consent, take a picture of his member. And then the machine learning model would allegedly analyze it and say if he had an STD or not. It was a wildly invasive, insane use of machine learning. And, like I said, I'm very glad that they got shut down, sort of. I say -
Wow.
I say “sort of” because they kind of still have a thing, but it's - yeah, ugh, it's complicated. But they're at least not making it publicly available to anyone who can download an app on the App Store, which is nice. So, yeah, so there's a good case.
There's also the situation that I had that led me to go learn about ‘machine unlearning’, which is that these providers have my personal data and your personal data and everything else. If you do a search for “Who is Carey Lening?” on a lot of the language model providers, they will come up, very helpfully, with lots of information. And so there's a right in the EU, and in states like California and Texas and Virginia, that says, “Hey, if you have your personal data out there, and you don't want the company to be having it, and they don't have a lawful reason for processing it (which they don't, because they were processing it illegally), you can demand that they delete it.”
Except that they CAN’T. So they have to do interesting, tricksy things to make it look like they deleted it, but they don't. That is a 3-hour long conversation that I will not get into here. But I did write some very interesting articles about the subject.
But my personal data was one of those things that I knew for a fact OpenAI and Perplexity, Gemini, and all these other ones had - oh, sorry, not Gemini. Gemini and Anthropic are actually pretty decent about not including, or at least not making, personal data available.
If you can share some links to the articles that you've written about this topic, we'll include them in the interview at this point, so the people who are reading and listening can follow up on that.
Sure. Happy to do it.
Great. Great.
So do you know of any companies that you intentionally gave your data or content to that made you aware that they might use your information for training an AI machine learning system, or does it come as a surprise?
No, none of it's a surprise. But I'm a nerd, and I read these things. You know, I read contracts. So Zoom, for instance, unless they have changed something recently, announced in their privacy statement - oh God, a year ago - that they were going to take any conversations that were recorded, especially any conversations that implemented their AI chatbot thingy. And they were going to use all that data, voices and transcripts and text and names and everything else, for training their machine learning model.
Every single one of the large language model providers, a good chunk of even more generalizable automation tools, they all use the data.
Outside of occasional hobbyist things, for me to see what information they might have about me, I don't give my information away. I use privacy email addresses. I don't include my name. I keep my inputs pretty constrained.
Again, that's mostly to do with how I use the systems anyway. And the only time I really ever do searches on myself, or about people, for that matter, is if I am kind of stress testing systems. Because I plan to file a complaint, or because I'm gonna try to do an erasure or ‘right to be forgotten’ request.
Are there any cases where a company's use of your personal data and content has created any specific issues for you, such as privacy invasions, phishing, loss of income, anything like that?
I mean, it's always privacy-invasive, right? Google is privacy-invasive. Any kind of data broker is privacy-invasive. These are all using automated decision-making tools and automated systems and services, machine learning models, whatever, to make decisions about you.
Advertising is using machine learning models to make decisions about you. It dictates what you see in ads. It dictates what kind of offers you're given when you're shopping on Amazon.com or Temu or any of these other sites, right? They always have impacts.
Has it had an impact that has led to a compromise of my ‘fundamental rights and freedoms’, which is the standard here in Europe? Probably? But in Ireland, it costs an awful lot of money to bring a lawsuit. In the US, it definitely costs a lot of money to bring a lawsuit. The Irish regulator, God bless them, they don't move very fast, if at all. So there's not a lot of options.
I've written a little bit about this in the blog. It comes down to a collective action problem. And the companies know this. So the collective action problem is, of course, that, individually, we're very powerless. This needs to be a class of people, millions of people, or a governmental entity, or someone with insanely deep pockets, who can bring an action and hold organizations to account.
The laws don't seem to really meaningfully move the needle. There’s some examples in some contexts where it's been beneficial. But in terms of data stuff, GDPR helps. CCPA helps. But it has relatively small impacts. People are constantly, constantly having really negative things happen. You know, there was a case in, I want to say it's Tennessee or Kentucky. There was another one in the UK where people's benefits were being dictated by automated decision-making. Basically, a machine learning model that was deciding to accept or reject people, based on biased data, and assumptions about individuals from things like their gender and their age, not related to their needs or status. Not related to something that actually should be used, or that would more reasonably be used as a metric. It was, “If you are a woman and you are under the age of 35, you are more likely to get pregnant, so therefore, you shouldn't have benefits.” It's that kind of stuff. So this stuff happens all the time. And, you know, people sue when they can, but it's hard.
You probably heard about the recent uproar around United Healthcare and the way that they were using machine learning models to routinely deny a very high percentage of benefits situations.
Yep. And they all do it. Every single one of the US health care companies do it. It's the absolute worst. I mean, these people - I don't condone violence, but I can understand where people, how they get into that state, because it's infuriating. It's deeply, deeply screwed up that we are in a society where this is the status quo, and this is the accepted behavior, and there's nothing that is likely to change that, at least in the foreseeable future, certainly not in the US.
In Europe, they keep trying. In some other countries, they keep trying. But the US had some hope, with the recent FTC actions, especially targeting data brokers and targeting other kind of abusive practices by - I don't think it was insurers, but I want to say it was financial services, I think, and someone else, where they actually went in and said, “Okay, no, you can't be using it this way because this is actually harming people.”
Yeah. It's interesting, when I've talked with people on these interviews about GDPR and CCPA. One thing that has come up is that most of us in the US are envious of people in Europe who have GDPR protections that we don't. One of the better examples there is what happened last summer with Meta, when they made it official that they were going to start using our content, if we had any, in Facebook or Instagram or their other services, for training AI and machine learning models. But you were only really able to opt out of it if you were in Europe and you had GDPR protection. Because in the US, you could go through all the motions of saying “Here, I want to opt out”. Then the answer was basically, “Yeah, we don't have to do that.”
I mean, they didn't have to do it for a lot of states, but they did have to do it for states that have GDPR-ish - not GDPR, but GDPR-ish - laws. Most of the US states have an objection right. And so they do have to honor it for that.
And now even in Europe, even on the European side, at least until the Data Protection Commission stepped in and spanked them up and down, left and right, the opt-out process was a nightmare. I did it. It was a pain in the hole. It wasn't easy, and they intentionally obfuscated it, made it hard to find, and complicated, and unpleasant. And they did it on purpose, because they don't want you to opt out. They don't wanna make it easy. So that's what a lot of these companies did.
You know, I deleted my Twitter account for about 7 different reasons, but one of them being that they were going to wholesale train on tweets, and did not seem to give anything about whether or not users consented to this. At least they had started saying you could opt out. And then, I think, the ketamine-fueled dictator-in-chief right now and on the Twitter side, who shall remain nameless, decided “Nah, we don't care.” So it was like, “Okay, well, F you, I'm leaving.”
You probably also heard about LinkedIn and that recent fiasco. One thing that was interesting there is that those of us who were not in a region covered by GDPR were opted in by default. So basically, anything we had posted up till that point - whether we liked it or not - they were claiming they had the right to take, though you could opt out from that point onward. But people who were covered by GDPR were automatically opted out, whereas we were not.
Yep, didn't even see the pop up. It was a fascinating little exercise. And actually, what's really interesting is, a lot of that had more to do with the Digital Markets Act and the Digital Services Act than it did with the GDPR. It had a little bit to do with the GDPR, but mostly the Digital Markets Act. It has all these rules saying that if you are a very large online provider, or a large service like LinkedIn is, you have extra, special bonus obligations where you cannot just combine data, and can't use things willy-nilly, like you can in the States. That is a set of laws unique to the EU.
The DMA has actually been, in my opinion, more effective than the GDPR, much faster, because it's constrained. And they are going after a handful of companies, instead of trying to boil the ocean for every single bad act, which I think is, in the net, a better way to go. Most organizations are largely ignorant of data protection, but they're not evil. They're not intentionally trying to do the wrong thing. And they're not using data in a way that's creepy and extractive.
And so applying the very strong, stringent obligations of the GDPR sometimes is net negative, because people just sort of throw up their hands, and they say, “Well, okay. We don't know what to do, so we're gonna do nothing.” Or we have to follow all these rules that are largely designed to go after the worst behaviors of tech companies. And they say “Well, I don't know how to operate here. I don't know how to do that.”
That's a big problem. Whereas if Europe spent a little bit more time, in my opinion, primarily focused on the worst offenders - some of those are small, but mostly it's the usual suspects - I think that might actually have a meaningful impact. But they'd have to target those organizations, and that's not exactly what they're doing. And in Ireland, everything is complicated, again because the court systems here are similarly costly, in a way that they are not in countries like Germany and Austria. And so there's more success in those countries. And I think if more tech companies were governed by those laws, it might move things a little bit faster. But Ireland is a slow, sloggy common law system that costs silly amounts of money to bring a case or a court action, and so that limits effectiveness. Like, the regulators can fine these organizations like crazy, but the company instantly appeals, and then it sits in litigation for a decade.
Yeah. I hadn't actually heard too much about the Digital Markets Act, so I'm glad that you brought that up.
Yeah. There's so many laws <laughter>. I could keep talking about laws - anyway.
Yeah. Well, it's good to hear that there's one that is proving to be effective to some degree at least.
Yeah.
That's good! So we've talked a lot about the things that companies are doing that basically violate our trust, and then fail to protect our data and our privacy. There have been some surveys that public distrust of AI and tech companies has been growing. And that's understandable, right, in the face of everything that we keep learning about what they're doing with our data. What do you think is the most important thing that AI companies ought to do to earn and to keep our trust?
Oh. I mean, a lot of it has to do with fundamentally changing their core business models. But, again, as I've talked about, and you acknowledge, so long as the market rewards that behavior, there's no incentive for them to care. Cory Doctorow, you know, refers to it as ‘enshittification’.
Yep.
First, you love your users. Then you focus on loving the business owners and the enterprise customers. And then you kind of gradually screw over the users. And then you gradually screw over the business owners, to love and appreciate the shareholders. And then eventually even screw over the shareholders. And everything turns to shit. Sorry, Cory, if I'm fundamentally botching that one, but it's been a minute.
So we're in kind of an enshittification state right now. And presumably, a company could repair that by being more transparent, by making better choices that are less harmful. I think if data brokers died in a fire tomorrow, that would go a long way, because I think that there's a lot of disincentives for markets that promote that. The data brokers in particular have lots of net-negative harms. People's lives are actually impacted.
But the companies could do better. They just don't want to. If they change their business models, yeah. If they stopped doing data-extractive things. If they built in more privacy-preserving technologies. If they, you know, actually incorporated things where they were listening to their users. That would help.
You know, I’ve been kind of a ‘Debbie Downer’ in this call, perhaps. But one area where I think they could make appreciable improvements is if they shifted to on-device processing. Federated learning and these other kinds of uses, where users have a little bit more control, and data is not immediately getting shoved back to these massive corporations that are not necessarily protecting the data very well, and shifting it around, and sharing it with everybody and their brother. I think if they started to incorporate privacy-enhancing technologies and doing more stuff on on-device learning, and then using, say, aggregate data or pseudonymized data, I think that would go a long way.
There's very few cases where they can genuinely use anonymized data, and I think that that's an education thing, an education component that people in my field probably need to start teaching people about. I don't know if your listeners necessarily - where they are in the technical front, but the distinctions between pseudonymized and anonymized data here in Europe are: pseudonymized data can technically be used to map back to an individual. Think a user ID that doesn't include the name and the email address fields, right?
Mmhmm.
That ID is still personally identifiable data. The organization has the database that might still have that information. They can still identify you. And there's all sorts of examples, like the Netflix challenge and some of these other things where people have been able to use what is seemingly anonymous data to re-identify an individual.
Truly anonymous data is hard. Truly anonymous data usually ends up getting rid of signals that might actually be meaningful.
In the healthcare space, for instance, there are a lot of times where you have a really rare disease. It's effectively impossible to do any kind of data analysis against that disease. If you're one of 5 people or, say, one of 300 people, or even 3,000 people, that have something like Tay-Sachs, or some kind of genetic abnormality, it's trivial to re-identify those people.
In Ireland, it's kind of a running joke that everyone knows one another within 2 degrees of separation. So things that would be maybe anonymous in a country of 350 million are vastly less anonymous when you know that the person whose record you might be looking at, which is allegedly anonymized, actually refers to Bob O'Connell down the street. And, yes, I have actually had cases where that has happened. It is mind-blowing.
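Note: To make the re-identification point concrete, here's a tiny sketch with entirely made-up data, showing how an “anonymized” record can be linked back to a named person by joining on quasi-identifiers like age and postcode combined with a rare diagnosis.

```python
# Entirely fabricated toy data, purely to illustrate quasi-identifier linkage.
# The "anonymized" research records have no names, but age + postcode + a rare
# diagnosis can still single someone out when combined with an outside source.

research_records = [
    {"age": 34, "postcode": "D04", "diagnosis": "Tay-Sachs"},
    {"age": 51, "postcode": "D08", "diagnosis": "asthma"},
]

# A separate, public-ish source: say, a local news article or a membership list.
outside_source = [
    {"name": "Bob O'Connell", "age": 34, "postcode": "D04"},
    {"name": "Mary Byrne", "age": 51, "postcode": "D08"},
]

def link(records, people):
    """Join 'anonymized' records to named people on shared quasi-identifiers."""
    matches = []
    for rec in records:
        candidates = [p for p in people
                      if p["age"] == rec["age"] and p["postcode"] == rec["postcode"]]
        if len(candidates) == 1:  # a unique match means the person is re-identified
            matches.append((candidates[0]["name"], rec["diagnosis"]))
    return matches

print(link(research_records, outside_source))
# [("Bob O'Connell", 'Tay-Sachs'), ('Mary Byrne', 'asthma')]
```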
Yeah. There was some discussion some years ago, when I was in a corporate leadership role, and there were some questions about doing employee surveys. And they tell you, “Oh, it's anonymous.” And so even if you assume that they don't trace the IP address or the email address or anything like that from the account, if you check off a box that says I am female, and I'm in this age bracket, and I'm in this department - how many women of my age are in that department? Oh, you know, me, and maybe one other. Answering that survey really wouldn't BE anonymous for me.
Right. Yeah. There's enough demographic information that, if they're asking about it, it does become really simple to figure out who you are. And, unfortunately, I know of cases where that has had negative effects on people. You know, if they say something, if they're honest about the organization - well, companies, it turns out, don't always really want honesty. They want glowing kudos and accolades. And they get really pissed off when you say, “Gosh, this work environment really sucks.” Even if you're being candid and saying what's true, they don't like that. So people get fired or they get retaliated against.
And it's really hard to trace back. You know, a lot of people don't necessarily connect that it's that company survey, or that exit interview, or that HR conversation where they were legitimately trying to raise a concern, that's what led to their retaliation or dismissal.
So we've talked a lot about what some of the really big companies are doing and some concerns around that. I'm very interested personally in the startup space and solopreneurs and entrepreneurs. And I'm wondering, if you were talking with someone who was just getting started, wanted to launch a new business, and wanted to use AI-based tools, or wanted to build a business around an AI-based tool as a platform, what advice would you give them on how to start off in a good way, in an ethical way, and not be evil?
Right. I think it depends on what kind of business they have, right? There's plenty of use cases for these systems where they're not evil. They're not creepy. They can be genuinely beneficial.
I go back to the medical space, for instance. Or I go back to the NotebookLM thing. If there was some startup that came up with a really cool tool like that, I would happily pay them money for it, for the kinds of use cases that I'm doing.
And there's definitely some open source things that you can host locally, and you can do that. So hosting locally, not centralizing, is a good way to do it.
Thinking about the use, and thinking about the kind of data you need, is just critical. When an organization comes to me and says, “I have this new shiny thing that I wanna do with data. But it's gonna involve collecting all this information about people. And I wanna do this in a privacy-preserving way.” I say, “Well, the first thing I want to understand is what kind of data. And the next thing I want to understand is why. WHY do you need this? DO you need this?”
And companies that want to do it right - and I have clients that have absolutely done this thing right - we spend time thinking about, “Okay, well, how can we do this? How can we achieve your goal in a way that is privacy-preserving, that's data-minimizing, that will build trust?”
I had a medical device company as a client, and we've been working together off and on on ways that they can do what is genuinely innovative and beneficial to the human race, in terms of detecting certain kinds of diseases, in a way that is privacy-preserving, in a way that minimizes the harm. And that involves things like ‘differential privacy’. That involves things, in their case, like aggregation as opposed to direct identification. It involves limiting the kind of data that they were collecting. And it involves transparency.
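Note: As a hedged illustration of one of the techniques mentioned here, the sketch below applies the Laplace mechanism - a textbook form of differential privacy - to an aggregate count, so only a noisy total is released rather than exact, potentially identifying numbers. The data and the epsilon value are placeholders, not the client's actual setup.

```python
import numpy as np

# Toy example: report how many patients in a cohort screened positive,
# releasing only a noisy aggregate instead of exact, identifying counts.

screened_positive = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # fabricated per-patient flags

def dp_count(values, epsilon: float = 1.0) -> float:
    """Differentially private count via the Laplace mechanism.

    The sensitivity of a count is 1 (adding or removing one person changes it
    by at most 1), so Laplace noise with scale 1/epsilon gives epsilon-DP.
    """
    true_count = sum(values)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

print(f"Exact count (never released): {sum(screened_positive)}")
print(f"Noisy count (released):       {dp_count(screened_positive, epsilon=0.5):.1f}")
```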
I've seen it happen. And startups are great for that, because you can start early. What's really hard is when an organization is so big and so ossified, and they've built their entire business model around either bad guidance or just winging it. Because then it's really hard to change. It's really hard to iterate and fix the problems when you've already built in all the data infrastructure and the collection.
So there's that. Like I said, privacy by design is a lot easier with the startups. And then just kind of thinking from the standpoint of users, as opposed to from the standpoint of “How much money can I make?”
So looking at it, I spend a lot of time educating people. Not on all the different categories of personal data, or the laws, or any of this other garbage, right? Because no one cares. But what they do seem to resonate with is, “How might this impact someone I love?”
When I talk about personal data, I don't give them the laundry list of everything that could be considered personal data. I say, “Is this something that you think you could use to identify a loved one, a vulnerable person, your mom, you?” And we talk about how. If they don't think it is, I'm like, “What about this? What if I put this with this? What if I do this? If I use this ID and this phone number together? What if I use this ID and these kind of behavioral patterns? Can you identify them?” And they're like, “Oh, well, I guess so. You're right.” Because there's a small number of people, or even if the individual data points aren't identifiable, together they become identifiable. So that's what I spend a lot of time giving guidance on, is that thinking and reframing.
It's just like when I was talking about software. It's breaking the problem down. When I'm trying to get a system to do code, I break that problem down into these discrete parts that are understandable to the system. It's the same thing with data protection. It's breaking things down to a way that normal people can understand and get it.
Because no one cares about the law. Genuinely, no one cares about the law, except other nerdy lawyers and data protection dorks like me. And they shouldn't have to, you know. I try not to talk about GDPR article numbers, because who cares. Or the equivalents in the CCPA, or the Texas privacy laws, or any of these other things. No one cares. But they want to understand how it affects them, how it affects their loved ones, and that's my job.
Thank you so much for sharing all of these thoughts and stories with me and with the audience! Is there anything else that you'd like to share about Data Protection? Sounds like you're doing some really interesting work with your clients.
Yeah. I mean, I like what I do, when I can actually get positive change happening. And so one of the things that I'll share is, I'm not doing anything specific for Data Privacy or Data Protection Day, but it is the 28th of this month. And so releasing the podcast on the day seems appropriate.
And I think my main takeaway is: come read my blog! I have a goal this year of trying to triple my subscriber count. So if this podcast helps, yay! And just trying to get the word out there, trying to get people to think a little differently.
I think I've been more successful than I realized, because it turns out that if you break these things down into stuff that makes sense to people, as opposed to just boring recitations of the law, or boring recitations of court cases, most people I think are on the same page. Most people recognize that this stuff and these behaviors are bad.
And I would like to get regulators to be a little bit more aware of what I'm writing about, and also some of these tech companies, the folks that can actually make a decision and a difference. So that's my overriding goal. But I'll share links, or you'll share links, I guess, and hopefully get a little bit more interest!
Right! Well, we'll have a big link to your newsletter embedded in the article. And the best way for someone to get in touch with you, if they wanted to talk to you about helping them with their business, what would that be?
So they can either email me or ping me on the various socials, but probably emailing is the easiest. And it's my name, so it's C A R E Y at priva, P R I V A, dot C A T. There's a whole fun story about how I got the Catalan domain name.
The dot cat! What a cool URL.
Dot cat, right! But I have to go to Barcelona now to go do a thing! Anyway, yeah, so that's, carey@priva.cat.
Nice. Alright. Well, Carey, thank you so much for joining me today for this interview! It's been a lot of fun, and I thank you for your time.
Thank you, Karen.
Interview References and Links
Carey Lening on LinkedIn
Carey Lening on Substack / Privacat Insights
About this interview series and newsletter
This post is part of our AI6P interview series on “AI, Software, and Wetware”. It showcases how real people around the world are using their wetware (brains and human intelligence) with AI-based software tools, or are being affected by AI.
And we’re all being affected by AI nowadays in our daily lives, perhaps more than we realize. For some examples, see post “But I Don’t Use AI”:
We want to hear from a diverse pool of people worldwide in a variety of roles. (No technical experience with AI is required.) If you’re interested in being a featured interview guest, anonymous or with credit, please check out our guest FAQ and get in touch!
6 'P's in AI Pods (AI6P) is a 100% reader-supported publication. (No ads, no affiliate links, no paywalls on new posts). All new posts are FREE to read and listen to. To automatically receive new AI6P posts and support our work, consider becoming a subscriber (it’s free)!
Series Credits and References
Audio Sound Effect from Pixabay
If you enjoyed this interview, my guest and I would love to have your support via a heart, share, restack, or Note! One-time tips or voluntary donations via paid subscription are always welcome and appreciated, too 😊