6 'P's in AI Pods (AI6P)

🗣️ AISW #054: Aaron Wilkerson, USA-based data and analytics leader

Interview with USA-based data and analytics leader Aaron Wilkerson on his stories of using AI and how he feels about AI using people's data and content (audio; 52:27)

Introduction - Aaron Wilkerson

This post is part of our AI6P interview series on “AI, Software, and Wetware”. Our guests share their experiences with using AI, and how they feel about AI using their data and content.

This interview is available as an audio recording (embedded here in the post, and later in our AI6P external podcasts). This post includes the full, human-edited transcript.

Note: In this article series, “AI” means artificial intelligence and spans classical statistical methods, data analytics, machine learning, generative AI, and other non-generative AI. See this Glossary and “AI Fundamentals #01: What is Artificial Intelligence?” for reference.


Photo of Aaron Wilkerson, provided by Aaron and used with his permission. All rights reserved to him.

Interview - Aaron Wilkerson

I’m delighted to welcome Aaron Wilkerson as my guest today on “AI, Software, and Wetware”. Aaron, thank you so much for joining me today for this interview! Please tell us about yourself, who you are, and what you do.

Sure. Thank you. I appreciate you inviting me. Looking forward to our discussion. So I'm Aaron Wilkerson, Director of Data Strategy and Products at Carhartt.

I've been working in the data industry for about 18 years now. So I'm one of the few people who have worked in data their entire career. So out of college, I wanted to go into technology consulting.

My first job was working on a kind of data team. This was back when we had actual physical servers, and we were building up reporting servers. We used to do Business Objects back in the day, before SAP bought Business Objects. It was Cognos reporting back before IBM even bought them. So we were using a lot of those different BI [Business Intelligence] cube types of reporting platforms. I was on the team responsible for building up those reporting capabilities, more on the administrator side.

I was there for a couple of years, got to traveling a little bit, and decided I didn't want to travel as much. So I kind of leaned in on the technology side of Business Objects and became more of a Business Objects administrator and architect: I could build you up an environment and deploy all the capabilities for you. So I really leaned into the reporting side of the data area.

And then eventually someone asked me about ETL [Extract, Transform, Load]. So I didn't know what that was. I learned about ETL. So I became an ETL developer for a couple of years, learned about data warehousing because at that point, I had really no clue what data warehousing was. So I really dug into the world of data warehousing.

I was very curious, but then I dug into architecture, so I became kind of a pseudo data architect for a little bit, digging into the whole ETL and data warehousing area. And eventually I wanted to try out leadership, so after architecture, I dug into becoming a team lead and a manager.

So I've been doing, I would say, data leadership for the past 10 years. So I've been doing quite a few different roles, you know, development, administration, light DBA work, report developer, data analyst, business analyst. So, done a little bit of everything in the data world.

One thing I haven't done, I've not done a pure data science or machine learning or AI-specific role. But definitely have dabbled in that a little bit. But I've done pretty much everything else you can do in the data industry. So I can definitely talk shop and speak the lingo because I've done it. I've done it all.

Yeah, that's great that you have such a broad background in the whole area of data. And like you said, there are not that many of us that actually have been working in data all along. A lot of people come into it from other domains, which is interesting too. I think we get the value from having different perspectives there. But that's a good overview. So thank you for sharing that.

Absolutely.

So what's your level of experience with AI and machine learning and analytics? I'd like to hear about how you've used it professionally or personally, or if you’ve studied the technology.

I've been tracking this area for, I would say at least the past 12 or 13 years. The name has changed. When I was first working, we used to call it just predictive analytics. So that was the big term at the time, like in the early 2010s, was predictive analytics and learning about that.

And then eventually they kind of changed it into data science. Everything became - no, I’m sorry, it was, it was predictive analytics, it was big data. Everything was big data. And I was curious about big data. And then everything was data science, data science conferences.

And then everything became machine learning. And then everything became AI. So I would say I've been tracking along the entire time, like learning about, “Okay, what is Predictive Analytics? Okay, what are like these different data science models with machine learning models?”

So I would say I've been following along, kind of on a nerdy side, just trying to learn about, like, “What is this industry? What do you use for these things?”

I did attend a couple of conferences where I did learn more about algorithms. So I went to a TDWI conference where they talked about classification algorithms. I learned about linear regression, logistic regression, decision trees, random forest, and all those types of things.

So I would say, definitely, I know the lingo. I have kind of a conceptual understanding of all of those from a professional realm. But definitely, I've never been in one of those roles. I've always been on kind of the analytics, BI leadership track of it. And I'll say many companies I've been to have not had a pure data science function.

They may have brought in a data scientist for a project, but never had a pure team for that. But what's interesting at Carhartt is that we do have an actual data science team. We have a manager of that team, and we have some machine learning engineers. So it is cool being a little bit closer with that team, just to understand the things that they're doing.

So I'd say on the professional side, I've been following along for the past 12 or so years, just to see how the industry has evolved. I know that with the recent genAI kind of boom, that's very exciting, but it sounds like those traditional data science and machine learning techniques are still needed. I mean, people think business data is exciting, but I would say it's very basic.

Now, if you use very basic machine learning techniques, I think you'll still get a lot of value. I know there are a couple of people even on LinkedIn who talk about how, just with Excel, there are tons of things you can do with machine learning and data science, with all the different ways you can do modeling. So I know there are a lot of opportunities out there, but I would say it's still the bare bones.

From a personal lens, I do use AI tools. The biggest is that I do a lot of writing on LinkedIn, so I use Grammarly. Another is the Hemingway app. So there are a couple of genAI apps I use for writing, which I find very helpful because – I mean, I did pretty well in English, but they find things like semicolons. I've never really learned how to use semicolons. But when I use these different apps, all of a sudden you see me using semicolons and all these different things. Those have been hugely helpful in terms of editing.

I don't really use them to generate content. I would say I write my own content, and I use them to do proofreading and editing. So they will help me rewrite things and figure out better ways to say things, which I'm a huge fan of.

So I'll say my personal life is mainly - well, I'm an Android user, but I really don't use, like, “Hey, Google” to do things for me. I really do that stuff myself. So I would say it's really just the writing tools where I use a lot of AI on a day-to-day basis.

So I'll say that's my kind of exposure. I definitely know the field. I mean, I write on LinkedIn about AI and different topics, just following the big genAI boom of the past couple of years and all the different language models. So I've definitely been watching it from afar.

There's one thing you said there that I really want to call out, which is that AI is very popular. A lot of people are very excited about all of its potential value. But there's a tendency to run around with it like a big hammer where everything is a nail, and apply it: “Okay, here I have AI. What can I do with it?”

Yeah.

The way I like to look at it is: what's the problem you want to solve, and what's the simplest thing that can solve it? And a lot of times it's not AI. It's not machine learning. It's, you know, simple regression or visualizing data on a graph, and you can get the insights you need from that. So I'm really a big fan of keeping it simple. So it sounds like we have a similar mindset there.

Yeah, absolutely. And I've listened to a lot of podcasts about AI at the big firms. Everyone says the same thing: it's great that you have all these tools, but you really need to know what you're going to use them for. Because, you know, you can learn the hard way – like, you bring in all this new technology and toolsets, and you hire all these people. But if you don't understand what you're doing, you're just wasting a lot of money, or the company's money. And that usually doesn't last that long. There are expectations.

Like, if you want to spend a million dollars on hiring a bunch of data scientists and bringing in consultants to deliver these things, but you really don't have a project to work on, then you're kind of chasing everyone around saying, “Hey, what can I use AI on? What can I use AI on?” So that's something that I think you really want to avoid as best you can.

But I think that's what I hear a lot from different practitioners around the industry: you really need to understand what the problem is before you go around saying “AI”.

So I think when people talk about AI strategy, “What's your AI strategy?” - that's kind of leading technology-first, like saying, “What's your cloud strategy?” or “What's your digital strategy?” Unless you understand what problem you're trying to solve - the business challenges and opportunities - I think a lot of people are getting themselves in a lot of trouble trying to go around with these different AI initiatives.

I mean, I see a lot of companies bringing in consultants to define the AI strategy. So they go interview all these different people in the business, essentially trying to find work, like, “Okay, what do you have? What are you working on?”

So you kind of cobble together this AI strategy, but it's a bunch of different pieces; it's nothing cohesive. They're trying to shoehorn AI into your organization, which becomes very expensive with very little ROI, and it just doesn't work. It works for the consultants, because they get to charge you for the big PowerPoint presentations. But it doesn't really help the business out much.

Yeah, I'd like to hear some specific stories about the ways that you've used some of these tools. You mentioned Grammarly and Hemingway. So I had done some experiments with Grammarly to measure the readability of my writing last year. And it has some nice capabilities, but I also found some of its suggestions kind of annoying.

Yeah. Yes. Yeah, I think that's the rub of it, too, when people talk about AI replacing everyone. Those are situations where I take Grammarly with a grain of salt, right? Because I'll write something that, to me, sounds very good.

But I would say they sometimes kind of take the humanity out of your writing, or the different expressions. So sometimes I may add in some different adverbs or adjectives to describe something, or kind of give it that personal sense. And their suggestions are, “Well, take that out. It's too many words.” Like, yeah, but I need that, because that helps me tell the story. Instead of saying “I really, really want to go to the store”, they'll say, “Well, take out the word really and just say, I want to go to the store.” But I want to emphasize how much I want to do these things.

So I would agree with you that those are situations where, in my writing, if I'm trying to make a point or trying to describe something, I sometimes do have to ignore some of the suggestions. Because, yeah, grammatically I don't need that word. But if I'm trying to share a story and trying to connect with other human beings, I need certain words or certain phrases. Or some things are not grammatically correct, but people understand what they mean.

So I would agree that's the whole “human in the loop” part of it: they're aids. They kind of help you out, but you can't just take everything they say at full value. So I don't ever just write something, have them fix it, and post it. I have to look at it: “Okay, I need to add that back in there. This needs to be back there.” So you definitely need that review as well.

So other than grammar checking, is there any particular problem that you would really like to have solved that AI might be able to help with? I'm wondering what it would take for you to kind of go all in on using AI for something other than the grammar and writing.

Yeah, I was trying to think about that. I think the purchasing, the buying of things. And I think that's where Amazon got caught with Alexa: they thought that everyone was going to say, “Hey, Alexa, buy my groceries from Walmart.” But people didn't do that.

So I think that's a challenge: trying to tie AI to actual purchasing decisions. I know some people will say, “Well, I can use AI to make dining reservations. Hey Alexa, you know, book me 12:30 at my favorite restaurant.” But the problem is, if your restaurant is booked at that time, where else do you want to go? The AI won't know that.

So I think that's the challenge I have: just the hesitancy. I want to be able to say to whatever AI system, you know, “Go buy milk from Walmart.” But I have to be very specific. So I think the challenge is the specificity of, “Well, buy Walmart brand 2% milk, but only if it's like a dollar. If it's more than this price, then I don't want it, because I'm not spending that much money.”

So I think I would have to get comfortable with really understanding how to describe my own desires. If I'm able to do that, then yeah, I think I would be more inclined to do it. But I would have to give it so much context about my own personal tastes. Like pizza: “Can you order this pizza?” But it's also the pricing: “Well, if I have a coupon, check if they have any deals on the site. If they have deals on the site, apply this code.”

So I think there's so much you'd have to unpack about how your mind works, to then describe that to an AI system and say, “Do this, but if you can’t do this, then yes, go ahead and proceed with that.”

So I think that's just me probably having to get out of my own way just to be okay with like, “Okay, it may mess this up. But it saved me a little bit of time not having to go to the website.”

Yeah, you mentioned looking at the price. Certainly, if you had a standing order for eggs, you might want to be rethinking or having some control over that right now!

Oh, yeah. You could be like, “What did it screw up? It screwed my order up.” And I'd be like, “No, that's just the price of eggs!” “Well, if I knew that, I wouldn't have ordered those eggs.” I think that's how humans work sometimes: “Well, if I knew that, I would have changed my decision.” But you have to describe that to a system.

Yeah, I’m curious, did you ever try online ordering when we were in the depths of COVID, and a lot of people were getting groceries delivered, or something like that? Have you ever tried any of those sites where you could say, “Well, I want this, but if you're out of stock on it, then I want you to do that”? It's certainly not AI. It's a very manual process of specifying it.

Yeah, I've actually had to turn those off. Actually, I use Walmart, so I get a Walmart delivery every week to my house. I have done substitutes, but sometimes the substitutions aren’t very good. I think one time I ordered some chicken breasts, but they ended up sending me frozen chicken, which is not what I wanted at all. So I usually turn off the substitutions, because sometimes they're very much off and the flavors are different. I mean, I do a lot of online ordering. I do deliveries, but what they'll substitute it with is sometimes just not at all what I want.

So, if there were a way to train up a model to say, “Okay, these are the types of things I do want. These types of things I don't want.” So to your question, I think you would really need the AI to learn. If it's trying to learn you, you have to give it these examples to learn your mannerisms and your behavior, to then think on your behalf. But is it a lot of effort to do that? That's the question.

Yes. And is it worth it, right?

Right, exactly. And if I do, is it really that hard to order groceries? Or are you doing it more for the convenience, like the time savings? Or are you just like, “I just want to nerd out and say that I did this with AI”? And that's what a lot of people struggle with, too: are you really trying to solve for something, or just kind of playing around with new tools and technologies?

Right. So this has been a good discussion about ways that you have used a few AI tools. I'm curious if there are any times when you have avoided using AI-based tools for some things. Can you share an example of when, and why, you chose not to use AI for that?

Trying to think. Like substitutions or recommendations. For Netflix, with the recommendation algorithms, they may recommend something, and I say, “Well, I'm not interested in that.” So I'm not really avoiding it. I'm just kind of ignoring the recommendations, or ignoring what it’s trying to present to me, because I'm not really interested in it.

But I would say the tricky part on the consumer side is that I don't think we all know the uses of AI. I think some of the consumer apps we use may use AI, but it may not be obvious to us.

So I can't think of a situation where I'm intentionally saying, you know, “I'm not going to use that”. But there may be times when I'm ignoring things without knowing that AI was serving up recommendations, or giving me suggestions, or trying to do things on my behalf, and I said ‘no’. So it's definitely possible.

Yeah, Netflix is a great example because they're well known for having machine learning under the hood, and recommendation algorithms, and having their competitions for improving the recommendations and such.

I use an analogy of an iceberg. There's some things with AI and machine learning that we can see that are above the waterline: the generative AI, and the robots that walk around.

Yeah.

But there's also a lot under the waterline, where we may not even realize that AI and machine learning are in the product in this situation.

Yeah. Yeah.

I'm curious, you had mentioned something earlier about AI meeting summary tools. Did you want to say a few words about that?

Yeah. That is something that my company does use a lot. We use Copilot for Office 365. That is something that I think has really caught on. Everywhere I go now, anytime you have virtual meetings, someone is clicking ‘record’ to get some type of AI meeting summary. So I would say that's something I have seen a lot of.

But when it comes to meeting summaries, I usually don't read them. I actually want to go through the meeting, especially because now you can do it faster. If I have a half-hour meeting and I listen to it at like 1.25x speed, I can get through it in like 20 minutes. So I usually prefer to go straight to the source to get the information.

That is an example of something I do avoid: those AI meeting minute summaries, because usually they're very long. You end up having to read through the entire summary. So either I can spend time reading through the summary, or I can just look at the video myself.

So that is an example where I do avoid the AI meeting summaries or transcriptions. I can listen to the meeting if I really want to know what happened.

Unless I'm trying to find something extremely specific, like, “Well, I remember I was in a meeting, but someone said something, so let me go try and find that very specific sentence or word.”

But I would not avoid a meeting and say, “I'm going to look at the notes and get the summary of it”. I guess I don't trust it enough to get all those nuances just yet.

But I know that's something pretty hot right now. I'm seeing AI transcription in pretty much every meeting. If it's of a more private nature, like one-on-ones, no. But for general meetings, I'm seeing a lot of that being pretty pervasive.

Yeah, my team at Wind River piloted the Zoom meeting summary features when they were first coming out. And we had some interesting experiences with the meeting summaries not being accurate, or naming the wrong person in the action items, or things like that. So I'm sure they've improved somewhat since then. I'm not currently using those.

I was wondering if you felt that the tools weren't accurate enough for you. It sounds more like you feel like the summaries don't capture some of the either audio or visual nuances or cues that you can get from watching it, or just playing it back at least.

Yeah, I think it's that. I think there are facial expressions and mannerisms - people can give you a look like “I don't understand that”. They kind of give you a look.

I think there are usually just doubts. If I'm expected to take this meeting and do something with it, I don't want to look like someone who's unprepared. Like, “Well, I looked at the notes.” “Well, I didn't say that.” “But the AI notes said you said that.” “Well, I didn't say that.” “Oh, let me go check the AI notes.” You also don't want to get caught off guard with someone being like, “Yeah, that didn't happen, no one would say that.”

I do think there are also the talk times, right? Sometimes you worry, “Okay, am I talking too long in these meetings?” It does break it out by person: “Well, this person talked for this much time. This person talked for this much time.” So it does give you a sense of meeting etiquette, and who's dominating a conversation, and who spoke the most. I do like that it identifies people by who spoke. So I think there are definitely a lot of nuances that it does pick up on. Certain parts, yes, very good. Certain parts, I may just go straight to the source and do it myself, just to get the understanding of it.

Yeah, that makes total sense. So thank you for sharing that example.

So one concern that I wanted to talk with you about today is where machine learning and AI systems get the data that they use for their training. You have this deep background in data, so I'm interested in your thoughts on this.

Yeah.

A lot of these systems will use data that people have put into online systems, or they've published online. Or they signed up for Netflix and they gave it their birthday and their preferences. And it knows who their family members are.

Yeah.

So there's all this information that's out there, and companies aren't always very transparent about how they're planning to use our data when we sign up for these services.

Yeah.

So I'm wondering how you feel about companies that are using data and content for training their AI and ML systems and tools. And specifically wondering how you feel about whether companies should ethically be required to get consent from people, and maybe compensate people whose data they want to use for training their systems.

It's a great question. I think that's where the AI wars are - there are a lot of legal battles right now. And we're seeing AI policy get shaped through lawsuits and settlements, and through figuring out how to properly compensate someone.

My data is somewhat secure. But if you look me up, you can easily find my address. You can find my wife's name. You can find my phone number. So I don't try to pretend like our data is completely private. I know that a lot of information is just out there. It's out there because people have sold it, and people have done things with it.

People have probably seen a lot of terms of service updates lately. And I'm sure there are usually little caveats that say, “Well, we may use your information to train an AI model”. I think that's just the way of the world now. I think the only way to get out of it is to say, “I'm not going to use that service”. Because the second you sign up for a service, that's what you're signing up for now. Especially if it's a free service, it's the kind of thing where “if you're not paying for something, then you are the product”.

So if I sign up for a free LinkedIn account, I assume that they're going to use all my information for this. If I sign up for a free Facebook account, they can use all my information. I think that's just become a norm, where you just expect that stuff that's NOT like social security numbers or bank account numbers is just going to be used. But also, I think that's not a new problem. Companies have been doing that for decades, even before the big GenAI boom. So I think that's just something that we've all come to expect.

The question is monetization, right? I think that's where it differs. For example, I write on LinkedIn. I have a small newsletter on the side. So if I found they were using my newsletter to come up with a summary, to then sell that as a consulting service, that would be different. If I can attribute my content or IP to them monetizing it, that's when I think it becomes, “Well, I need to get a piece of that, because you guys are making money off of my information.”

So I think that's where it depends on the monetization, right? If someone just takes my demographics - my race, ethnicity, date of birth - that's not necessarily your IP. Those are just attributes about you that they're using to make recommendations. But if they went in and found old essays and papers and stuff you wrote, that's more IP-based, and they're selling it. Using my information to make money? I would like a piece of that, you know?

I mean, we deal with that right now with music: copyright. Copyright infringements, trademark infringements. That is a very big piece of our marketplace. That happened with Napster 20-plus years ago. People were like, “Yeah, I put a lot of effort into that. So I want a cut. I want a percentage of what you're making off of that.”

I'll tell a quick story. Back 20-plus years ago, in the Napster days, I used to actually download music illegally. I used to burn it onto CDs, and I used to sell them to people. But the burner I used was my brother’s. My older brother had a burner, so I was using his burner and his blank CDs, and I would sell the CDs. I gave him no money. Then he found out about it and got very upset: “You're using all my stuff. You're making money off of my burner, my blank CDs, and I’m not getting any money back for it.”

So I think it's a normal thing that the people who are providing those inputs and those raw materials do want a piece of that. So I do think that's the question: the monetization, and how do you make it fair for the people whose content and intellectual property you are benefiting from?

Yeah, you mentioned the law and all the lawsuits. There are over 30 that are active right now, just in the US alone, never mind anything that's going on overseas.

So it's a big area, and music, especially, is one of the big topics where the major lawsuits are happening, where they're looking at copyright infringement. It’s exactly as you're saying: they're profiting from it. And they're able to trace the original creators’ music to the outputs that are coming out of some of these music generation tools. So it's a really interesting area.

And as far as writing, I've heard some people comment that, “Well, you know, if somebody wrote something on the internet and they posted it on WordPress, and that person's not around any more, you can't say that an AI tool shouldn't be allowed to use that for training.” But on the other hand, there are books. There are authors who publish books and they put a lot of effort into those.

And there was a story about Meta that just broke in mid-February: Meta had torrented 82 terabytes of data from pirated books. This is coming out as part of, I think, discovery from a lawsuit that's being litigated. And so that's obviously a copyright violation. And it's not just incidental or, “Oh, I happened to find something, and let me use it”. 82 terabytes is a LOT!

Yeah, and I think if you look at it, any time that you're selling a product or a service, you have vendors, right? If I want to create a lemonade stand, I need water, so I pay the city that distributes the water. I have to pay for the lemons that I got from somewhere. So for any piece of the supply chain - all your different bills of materials - you have to make sure you're compensating, you're paying for, all those different materials. And that's just a basic economic principle.

If I open a hair salon, I have to buy the booth. I pay for electricity. I've got to pay for the space. You've got to pay for all the different things.

So it's interesting to say, in the AI space, “I'll pay Amazon to host it. I'll pay NVIDIA for my chips. But the actual data that it takes to create the models - I'm not going to pay for that. I'm just going to pull it myself.”

Well, that doesn't pass the basic sniff test, and it just doesn't make sense. Or flip it: would you want people to use your content, or the things that you've written, to make money or profit off of? It's not a nonprofit.

To your point, I think it'll be an interesting, just wild legal battle. I think that's where a lot of this is going to be settled: in court, where they determine, “Well, in this court case, that court case said that.” So I think we're all kind of watching from afar, just to figure out how the litigation is going to settle out. I think that's going to shape a lot of the policies around AI, the terms of service, and the legal agreements.

Yeah.

And revenue sharing. I think they're also trying to figure out the revenue sharing model. I think that's also being played out as well.

Being able to trace it - for instance, if I were to use a tool and generate a song, how do I know which songwriters’ or which composers’ contributions ended up in my song, so that they can be credited properly? It's technically hard.

Yeah.

But AI is already hard. So it's not like this can't be done.

Yeah. Or you have to wait and they'll just sue you. That's how you find out: whoever shows up and sues you. But that's just a more expensive route.

Yeah, yeah. So as someone who has used some of these AI-based tools, such as Grammarly, do you feel like the tool providers have been transparent with you about where the data they used for building the tool came from?

I mean, I will be honest and say that I haven't really looked. I've used the service, but I don't necessarily look into where they got it from. I won't say I don't care, but I think we're all used to using products and services without knowing where they came from. When I go to the store and get milk, I don't ask, “Well, which dairy farm did you get this from?” When I get water out of the sink, I don't figure out, “Well, which river or which body of water did you pump this out of and process?”

So I think that we're all used to being somewhat ignorant about the ways in which we got a product or service. I think AI is no different, in terms of: if there's no inherent risk or harm, if I know that this language model didn't harm someone in the process, I don't really dig into it. I'm sure if I did a search, I probably could find out where they got it from, but I would say I'm not taking it upon myself to go search that out.

But I would say it's pretty normal, though, in my day-to-day. I'm staring at a paper towel; I don't really know where they got the paper from to make the paper towels, and things like that. So I would say that's pretty par for the course. There's only so far down the rabbit hole you may want to go to figure out where they sourced something from, and obviously I've not done that with any of the AI tools that I use.

Most companies are not very transparent about where they get their data or how it was enriched and processed. There are a lot of aspects to it that I think we're starting to uncover. For instance, some of the data enrichment is done by exploiting workers in Kenya and other countries, or in the poorer parts of the US.

And as we find out more about that, then I think people DO want to know, and they do want to be able to identify which companies are behaving ethically, which ones stole content and which ones paid musicians to create the snippets for them. And that's something that I think is becoming of increasing interest.

We tended to assume, I think, like you said, that they behave ethically, and that there's nothing there to worry about. But we're starting to uncover cases where it's like, “Oh, well, this is not good. And do I really want to support that with my money?”

And I think a lot of consumers deal with that. Even with coffee - you think about fair trade coffee. That was… what, over 20 years ago? The ethical sourcing of coffee, the whole fair trade coffee movement, organics, and understanding where it came from. Sweatshops, when it comes to the very big brands and sweatshop labor, and the diamond trade. So I think a lot of that has historically been around.

It's just, as a consumer, how far down do you want to dig into where your stuff came from, or where they got it from, or how they processed it? Not to say I don't care, but it just wasn't that top of mind when I started using these tools: “Well, I wonder where they got this from and what model they're using?” To me, it's been more about, “Well, is it accurate? Can I use it? Is it helpful for me?” And then, you know, move on.

And honestly, it would be quite a lot of work for anybody to go in and figure out - if I'm trying to use Copilot, “Okay, where did they get the data that they used for training this part of the Copilot model, and what about this image generation?” So it's not an easy thing to do. And I think it's on the companies, really. We have to raise our expectations of them and expect them to make it easier for us to know.

Yeah. Or we expect the company to do that. Like, you're putting your trust in the company, saying, “I'm trusting that Microsoft is working with OpenAI, and that they're going to figure all this out for me.” I'm putting my faith in Grammarly, in the Hemingway app, or in Microsoft, that they're going to do these things. It could be blind faith or blind trust. I think it's implied that we're trusting that they're figuring it out, that they're humane, ethical people or an ethical company, and that they're doing that on our behalf.

But then you find out on the other hand, “Well, actually they don't do that. Well, I may need to make a switch.” But we make a lot of assumptions on a day-to-day basis. So that's one of the many assumptions we have to make.

Yeah, and one of the risks there – for instance, this came up last year with Adobe, I don't know if you remember. They had this big announcement with Firefly and it was right around the end of February. And they were saying how they had used only ethically sourced data and it could do all these great things. I thought, “Wow, that's pretty neat that they're being so upfront about it.”

But we discovered over the summer that it actually wasn't 100% ethically-sourced data, and they didn't have the entitlement to all of it. And they didn't compensate some of the creators who had been their paying customers for years and had their content on the Adobe sites, which Adobe used.

And then they took, I think, 5% to 10% of their content from Midjourney to train it. Midjourney was not ethically sourced; that's pretty well known.

Yeah.

So it kind of poisoned the well. They had this initial claim that, because everything was ethically sourced, it was safe for enterprises and people to use it and rely on it not being a copyright infringement. Well, they've poisoned the well on that, and now it's not necessarily safe. So they kind of shot themselves in the foot there. They did some good things, but they didn't do enough to make it safe for their customers and the people who wanted to use their tools to avoid the legal and even financial risks of getting sued for infringement. So it's interesting to see how they've been responding to that.

Well, it's interesting. I had a similar discussion. We used a cloud vendor for one of our applications, and they were in our office for an on-site discussion. They were talking about all these AI capabilities they have, and how you have access to all these capabilities.

But I kind of called that out and said, “We're not your only customers. You have many different customers. So if you're offering me these capabilities, you're offering that customer the exact same capabilities. And if you're going to serve some recommendations to us, you're probably also going to serve up similar recommendations to your other customers, who could be competitors. And all of our data is sitting in your cloud platform. And it's great to talk about recommendation machine learning, but there's a certain amount of data that you need. So if we only have a very small bit of our data there, the quality of our recommendations is probably not very good. Or you have to source in a bunch of data to make recommendations.” So I asked that question. They gave me a very candid “Your data is private, it's siloed. We don't mix it together.”

So I think that's where the question becomes, as we get more SaaS- and cloud-based and put all our data into the cloud and these SaaS tools: how do we know that some of the companies we’re paying aren't also using our own data to give recommendations to our competitors? Or that we're not all getting the exact same recommendations about things? It's like a race to the bottom, where everyone's getting served the exact same recommendations from these different cloud platforms, based off of everyone’s data.

Yeah, so it's interesting thinking about these different SaaS cloud providers. Either they have crappy data, or low-quality data, or small amounts of data. But if you're selling as one of your offers, “We can give you recommendations and all these things off your data”, I kind of question, “Okay, but you need more than just mine. Mine is not going to be enough to give me a high-quality recommendation.” Of course, they have to say, “Well, we don't do that, and we keep your data private and protected and siloed.”

So that is something I'm kind of curious about as we move on. To your point about disclosing where people are getting information from, I think through a B2B lens, that's something I'd be interested in: how those questions continue to come up around where different people are sourcing data from. If it's very industry-specific or business-specific, that means you got it from a different company. You can't just find that on the internet; it's not publicly available information. So yeah, that's something else I was thinking about recently.

Yeah, this is a really interesting area. When I was looking at Grammarly, I dug into their terms and conditions and what they do and don't do with people's data. And if you have a paid enterprise type of account, they promise that they do firewall off each enterprise's data, so that the text that they're analyzing doesn't end up showing up on a competitor's site.

But if you're using a free account, which I was, then it's fair game. Anything that you put in there is something that they will use and mix in - which, that's the price for using the free tool, I guess. I was always very careful not to put in anything that I wasn't planning to publish online publicly within a day or two. And since I was just using it for scoring readability, it didn't really matter. But I thought it was interesting to see how they were handling that.

But yeah, in a lot of cases, they couldn't possibly train on just one person's or even one company's data and get useful results from it. You need a certain amount of data, and you need diversity within that data, to really be able to give good results.

Yeah, to your point, it may be a situation where you have to pay for it. So if you really want that type of privacy, or you want to protect your content, you’re going to have to buy it. The free services are out there, but if you really want to wall off your content, you may have to pay for the service to ensure that your data is not being used. Or else, if you use a free version, it's available for everyone else. So that's another interesting thing: you have to pay for privacy, so to speak.

Yeah, and that's been really controversial in some of the areas outside the US that have better regulations on privacy. They are actually looking at cases where people were being forced to pay for their data to be protected, and saying, “No, everybody's entitled to at least this minimal level of data protection.” So there's been some activity around that, but not in the US so far.

Yeah. Well, that even happens with SaaS and getting your data out. I paid for the platform. I put my information out there. But if I want to get my own data back, I've got to pay, like, another 5 grand or 10 grand just to extract my own information out of there. So it becomes… these different revenue streams for these companies, different new ways they can get money from their customers.

So I think, unfortunately, this is just another avenue where they can say, “Well, if you now want to ensure this, you've got to pay for this as well.”

Yeah, so this has all been a good discussion around when we purposely put data into a tool that we signed up for and are using. There are also situations where, just as members of the public - people who are walking around, or flying, or taking trips, or walking down the street - our personal data and information may be getting collected and used by systems that we don't necessarily have any knowledge of or any control over. So I'm wondering if you know of any cases where your personal information has been captured this way and shared without your consent?

I've not been hacked before, but I know that sometimes you get those alerts - the data leaks, where you get emails saying, “Hey, there was a hack.” The company got hacked, and then they uncovered it. I mean, I've had some fraudulent transactions on some of my personal accounts, which I had to go and cancel. So I've been in those situations - more the illegal hacking and things like that.

My information has been used because someone got it from somewhere. But not with the AI tools - I've not personally known of my personal information being used. I mean, I'm sure it is, but I don't know any specific examples where I know for sure, “Oh, they used my information to do this, and I can see it right there.” I may not be able to attribute it at that level.

Yeah, this is, I think, again coming back to the iceberg, where a lot of this is happening without us necessarily realizing the extent or the ways in which it's being used. One example is, you mentioned having an Android phone. The phone can do things like identify people in pictures and automatically tag them, or the social media sites do that. Or LinkedIn.

Oh, yeah. No, I would say you're right about the picture thing - people finding their pictures, or pictures of someone they know, being used somewhere. Like, “I didn't authorize that.”

Yes, exactly. And you had mentioned LinkedIn. There was this bit of an uproar last summer when LinkedIn announced that they were opting all of us in by default

Yep.

and giving them permission after the fact to use all of our LinkedIn profile information, which has some personal content, and the post information, for training their AI tools. And there were even some questions about whether they’re using our DMs for training – which freaked me out a little bit, because we tend to think of DMs as private, and you wouldn't want that information leaking out into some tool.

But the only option that they gave us was opting out for our future posts, which I did right away. I know some people that actually didn't, because they said, “Hey, you know what? I'm smart and I have a point of view, and I want my point of view represented in whatever they're doing. So I WANT to share mine so people have different views on that.”

But LinkedIn wasn't very transparent about that, or very accommodating. Actually, the people in Europe who were covered under GDPR didn't have to deal with that. But we did.

Yeah, I do think it is going to be very interesting, like you said about the opting out. Because we can opt out, but for a lot of these, the data is sitting somewhere. So for example, when I text on my phone, I also use Google Messages so I can text on my computer, which means that all those text messages are sitting on a server. You think they're all local; they're not just on your phone. They're sitting in the cloud somewhere, and they pull from that. Same with DMs, or the Office 365 model. All your messages are not on your laptop. They're sitting on a server farm in some data center somewhere. And you're hoping for them to be benevolent and not use it. But there are no guarantees that they're not using it.

Of course, they can de-identify it, they can anonymize it and say, “Well, we don't know who this is or what company it is.” But I'm sure they have billions and trillions of messages and emails at their disposal that they can use. Because we may have a tenant, but that tenant is usually just a virtual, logical separation; they have it all. It's sitting out there for them to use. So you kind of have to hope - it's a terms-of-service thing. You know, you may be able to do some auditing to make sure things are in place, like the opt-outs in the contract and things like that.

But I mean, that's the price we pay for the cloud - the convenience of not having to have a full-time Windows AD administrator or an office administrator. The price you pay for the convenience of putting all that stuff in the cloud is that they have access to all your stuff and they can use it.

So that's kind of the rub of being able to have all this convenience - you know, I'm able to use Word on my phone and my computer, and Google Sheets on my laptop. So I think that's the trade-off of having all this new technology and all this connectedness, with everything being accessible and not on local devices anymore.

Or that's what you have to do: I think if you really want that, you have to go back to an on-premise, siloed, locked-down environment. So if you have transactions that need that, I think that's where you have to pull it on-prem and have a lot of security and structure around it, if you really want that type of protection and privacy.

Yeah, there's been a lot of discussion lately about how many resources it takes to run these models, and how they have to be run in these huge cloud systems. But then there's been some more talk recently that, for instance, DeepSeek came up with some more efficient algorithms. And there's even been some talk that some of these more-efficient language models can run locally on a machine, so that you can run it yourself. It'll be interesting to see if more people start doing that. Obviously, it's not something an everyday ChatGPT user is going to want to do or be able to do, but there are going to be some people who will want to do that.

Yeah, there's a discussion around small language models: do you really need to build out this big thing? Or can you just take a little bit of information and build your own very small model, for very specific use cases, on a very specific data set?

For all the major companies, it's going to become table stakes that they all have a large language model. You see that now: every single tech company now has their own. So now it's like, “Okay, how do you then differentiate yourselves, or create platforms for enterprises?”

Carhartt does not need to have a mega large language model for our business. So I think that's going to be interesting to see over the next couple of years. A large language model - that's expected. They expect that they have a model, a chat interface. Those are just table stakes.

So then, what's the new differentiation, and what's the thing that they're specializing in that's going to separate them out? I think that's what's going to be intriguing to see.

Yeah, I want to go back to the question about data breaches that you had mentioned earlier. At least in the US, they seem to be really common. Are there any examples where a data breach has caused a specific problem for you? Or have you been able to avoid that so far?

So I've had a couple of data breaches where my passwords were leaked somehow, and then I found some fraudulent activity. And I know that a lot of people don't like multi-factor authentication, but I would say it's going to become the norm, because it helps. If you have a situation where you only have a local password on those different commerce sites, that's how they're able to log in.

So I've seen people make fraudulent purchases on my Target account. Same thing with Kohl's - on my Kohl's account, they made some fraudulent purchases. So I had to call the company. Usually I'd get an email confirmation for an order that I didn't place. So you just have to call and say, “Okay, these are fraudulent. Please cancel these orders. I didn't place that order.” I've not had a financial loss where they charged something; I've been able to quickly say, “This is fraudulent. Please cancel it,” and then exit out of there.

I kind of welcome multi-factor authentication, because it's very hard to hack that, when you have to get a confirmation from your phone and things like that. So I think those breaches will hopefully be remedied by the different ways we do multi-factor authentication - getting a text message with a code to put in, or pressing a button on the phone to confirm that it's you. I think that all becomes, you know, just things that we get used to.

Yeah. My final question, and then we can talk about anything else that you'd like to cover: With all of these discoveries about how companies are using our data in ways that we didn't intend, and about the breaches and our data not being secure or private, there's been an increase in public distrust of these AI tech companies. And in a way, that's healthy, because it means that we're discovering more about what they're actually doing, and who's doing well and who's not.

Yeah.

But think about what it would take for you to trust a company with your data, and to have that sense of confidence that you don't need to worry about digging under the hood, because you know that they're behaving ethically.

What is the number one thing that you would want them to do to give you that sense of trust?

Ooh. That's a tough question. It's very tough, because they're publicly traded companies that have shareholders and boards of directors. And of course, I have a 401k and I invest, so I know that most likely my investments - a mutual fund - have many of these tech companies in the portfolio.

So I'll say the trust is very difficult, because I think it kind of goes to the persona of the leader, right? You think about Meta or Facebook, you think about Mark Zuckerberg. You think about Tesla or xAI, that's Elon Musk. Microsoft is Satya. So I think it really depends upon the perception of the CEO. Those guys are becoming much more like celebrities now. With Amazon, you have Bezos. So I think it's becoming much more about the openness and the candidness of the leader - the CEO or the person in charge of the company.

Whether they're close to the vest about what they're doing, or they're doing podcasts - I know there are a lot of congressional hearings. I think it becomes more about the public perception of those leaders. Do you feel comfortable that that's the right person to lead that company, and that they have some level of trust?

As I'm thinking about this out loud, what comes to mind is the brand of the person in charge of it. Because they're the one leading the charge. They're the one making a lot of the final decisions and signing off on these things. So that's what's top of mind, because it's so hard - people come and go at some of these companies, and it's hard to keep up with that. But those kinds of people are more the face of the brand. That's what is much more readily apparent, to say, “Yeah, that guy makes sense. That guy, I wouldn't trust with my own kids.” So yeah, that's what comes top of mind.

Well, thank you. So that's all of my standard questions. I appreciate you making time for this interview. Is there anything else that you'd like to say about AI or data or anything else that you'd like to share with our audience?

I think it's an interesting time we're in. I think the challenge is: what's real and what's fake? I think the use cases are huge. So I think we're trying to figure out what the use cases really are, on the consumer side and the enterprise side, for how to use AI. And also, are we trying to shoehorn AI into existing things, or are we actually trying to transform the business? In some cases, it may need a total transformation to use AI. But it's very hard to get buy-in to say, “We're going to change our whole business to make use of AI.”

So, yeah, I think it's going to be a very interesting couple of years, just seeing the winners and the losers. I've seen a lot of people get laid off, and I think some of that’s through different AI strategies. So I think it's going to be interesting to watch and participate in this new world of AI. Because every time I listen to a podcast or interview, it's all “AI this” and “AI that”, and “platform that”. So I think reading, pulling a couple of layers deep to figure out the truth behind that - what they're actually using it for, and whether there's actual benefit - that's going to be interesting to see.

Yeah, you mentioned potential job losses, and I know we've seen this already happening – for instance, with writers and artists and musicians and other people that have taken financial hits to their livelihoods as a result of the increase in AI. And there are some concerning projections about that. We're starting to see it happen, also, in the world of software development and in different organizations.

Yeah.

Do you see any of that threat in your area?

I would say not - I've not seen that yet, just because I think it'd be very tough. You can try to replace people. I equate it back to when we offshored: 20-plus years ago in IT, the company decided to offshore their IT. They said, “IT is a call center. We're going to outsource this overseas.” They saw the cost and the quality go down, and ended up bringing it all back in, because they realized they needed to digitize, and they wanted to have a different proposition, a competitive advantage.

So I would say, I think we are going to see layoffs within software and things like that. But I feel like it's going to be a very similar thing, where you realize that someone needs to explain this, or that if it hallucinates and gives you very wrong code, you have to fix it. You're going to need people to come in and take care of that. Or if you want to scale it, you also need people to help scale it. So I think we are going to see kind of a pendulum swing, where you see a bunch of layoffs and a bunch of people get let go, but I think you're going to increasingly see that come back.

We saw that with marketing. There was an article about how companies were laying off - like, they were letting go of marketing and ad firms to use AI. But they eventually had to bring them back, because the quality of what they were getting from the AI tools was so crappy.

So I think, unfortunately, you're going to see a lot of people used as guinea pigs as companies look at how AI can take over certain roles. But I think that, long term, you're going to see a lot of that come right back, just because companies realize that AI tools can only do so much. Or, how much do you trust the AI tool with deploying user-facing code? So yeah, that's my take on it.

Yeah, one of the interesting things, I think, is looking at using AI tools like Copilot for writing code. There's a certain degree to which it helps the junior people with less experience get started. Or if someone is very experienced, but working in a language that they haven't used for a while - for instance, if they forget which Python library to call - it can be helpful for that.

Yeah.

But there's a question of: if someone is reliant on these tools as a junior, then how do they ever build up the knowledge and the wisdom to be able to operate at a higher level? How do you grow your senior engineers when they aren't going through the learning experiences in development?

Yeah. Well, I think when the complexity blows up - when we need to do more complex problems and the AI system can't do it - then you realize, “Well, I have to go and bring in a high-priced consultant to do this, because my guys can't do it.” It kind of becomes table stakes, where it can take these types of jobs and do these types of tasks. But if you need things that are more and more complex, that's where I think you'll say, “Okay, it can't replace these folks.”

The idea that AI is going to replace your entire department - your engineering or coding department - I just don't see that happening. Or, if what you're doing is so basic, then I would say most likely you have competitors that are just going to do the same thing you're doing. If you're able to outsource that entire thing to an AI system, then you're doing something very simple, and you have to have a different competitive advantage - hopefully you have one.

Yeah, that's a great perspective. Well, thank you - is there any other thought that you want to add?

No. Like I said, I'm pretty active on LinkedIn, so people can definitely find out about me there. I do write about these things often - a lot of the things I'm talking about now, I usually write about or comment on on LinkedIn. I'm pretty active, posting almost daily.

Also, I have a newsletter. If you look at my profile, you'll find a link to it, where I write about this in more longer-form pieces. I nerd out on this stuff. I think it's interesting to watch how our industry is changing, to try to keep tabs on it, and to share lessons I've learned along the way. It's kind of one of my passions. So yeah, I definitely enjoyed the conversation, and I love doing things like this.

Well, great. Yeah, it's been a lot of fun. Thank you so much, Aaron!

Interview References and Links

Aaron Wilkerson on LinkedIn

Aaron Wilkerson on Substack

About this interview series and newsletter

This post is part of our AI6P interview series on “AI, Software, and Wetware”. It showcases how real people around the world are using their wetware (brains and human intelligence) with AI-based software tools, or are being affected by AI.

And we’re all being affected by AI nowadays in our daily lives, perhaps more than we realize. For some examples, see the post “But I Don’t Use AI”.

We want to hear from a diverse pool of people worldwide in a variety of roles. (No technical experience with AI is required.) If you’re interested in being a featured interview guest, anonymous or with credit, please check out our guest FAQ and get in touch!

6 'P's in AI Pods (AI6P) is a 100% reader-supported publication. (No ads, no affiliate links, no paywalls on new posts). All new posts are FREE to read and listen to. To automatically receive new AI6P posts and support our work, consider becoming a subscriber (it’s free)!


Series Credits and References

Audio Sound Effect from Pixabay

Microphone photo by Michal Czyz on Unsplash (contact Michal Czyz on LinkedIn)

Credit to CIPRI (Cultural Intellectual Property Rights Initiative®) for their “3Cs' Rule: Consent. Credit. Compensation©.”

Credit to Beth Spencer for the “Created With Human Intelligence” badge we use to reflect our commitment that all content in these interviews will be human-created.

If you enjoyed this interview, my guest and I would love to have your support via a heart, share, restack, or Note! One-time tips or voluntary donations via paid subscription are always welcome and appreciated, too 😊
