AISW #021: Quentin Vandermerwe, Canada-based software product manager 🗣️ (AI, Software, & Wetware interview)

An interview with Canada-based software product manager Quentin Vandermerwe on his stories of using AI and how he feels about how AI is using people's data and content (audio; 42:36)

Introduction - Quentin Vandermerwe interview

This post is part of our 6P interview series on “AI, Software, and Wetware”. Our guests share their experiences with using AI, and how they feel about AI using their data and content.

This interview is available in text and as an audio recording (embedded here in the post, and later in our 6P external podcasts).

Note: In this article series, “AI” means artificial intelligence and spans classical statistical methods, data analytics, machine learning, generative AI, and other non-generative AI. See this Glossary and “AI Fundamentals #01: What is Artificial Intelligence?” for reference.

Photo of Quentin Vandermerwe, provided by Quentin and used with his permission.

Interview

I’m delighted to welcome Quentin Vandermerwe, author of “Rhythms of Reason” on Substack, as our next guest for “AI, Software, and Wetware”.

Quentin, thank you so much for joining me today! Please tell us about yourself, who you are, and what you do.

Hello, Karen. Thanks for inviting me to this interview. I am Quentin Vandermerwe [he/him]. I’m located in the greater Vancouver area in Canada. I'm kind of a jack-of-all-trades, but I consider myself primarily a product manager, and I've been managing products for almost 20 years. I also dabble in enterprise IT architecture and software development, focusing on embedded and IoT firmware. I like to write about all kinds of topics, hence my Substack presence. In my spare time, I like tinkering with IoT projects. I like to enjoy nature, and driving really fast (and that's on the autocross track, not on the public roads).

Those all sound like fun hobbies!

What is your experience with AI, machine learning, and analytics? Have you used it professionally or personally, or studied the technology?

I've encountered AI and machine learning in various places during my career. My first engineering position was as a control systems engineer, where I was designing control algorithms for decision-server systems. One of the techniques for designing and simulating control systems involves building a mathematical model of the system in so-called “state space”. You define the system in terms of a set of linearized differential equations, then you represent those equations in matrix form, and then you can digitize the matrices. At every time step, you apply the matrices to the instantaneous state vector, which gives you the predicted next state, and you can correct that prediction using actual measurements. That's a fairly powerful technique for controlling systems.

As it turns out, those kinds of matrix calculations are exactly the kind used in neural networks. So when I did my first machine learning course, everything was very familiar, and I was able to reuse my knowledge of algebra. We even used the same tool, MATLAB, for the calculations and simulations, although I believe they've now shifted to Python.
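To make that connection concrete, here is a minimal sketch of the predict-and-correct step described above, written in Python with NumPy and with made-up matrices (illustrative values only, not a real plant model). The A @ x matrix-vector product is the same kind of operation a neural network layer performs when it multiplies a weight matrix by an activation vector.

import numpy as np

# Hypothetical discretized 2-state system (illustrative values only)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])    # state transition matrix
B = np.array([[0.005],
              [0.1]])         # control input matrix
C = np.array([[1.0, 0.0]])    # we measure only the first state
L = np.array([[0.5],
              [0.3]])         # correction (observer) gain, chosen arbitrarily here

def step(x, u, y_measured):
    x_pred = A @ x + B @ u                         # predict the next state from the model
    return x_pred + L @ (y_measured - C @ x_pred)  # correct it with the actual measurement

x = np.zeros((2, 1))                               # current state estimate
x = step(x, u=np.array([[1.0]]), y_measured=np.array([[0.02]]))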

My first actual experience with implementing machine learning was in 2002. At the time, spam was a major problem. Everybody's inbox was flooded with spam, and then I came across an article by Paul Graham. He's one of the founders of Y Combinator. The article was called "A Plan For Spam", and he proposed training a simple Bayesian classifier on a corpus of ham and spam, and then using that to automatically classify and flag spam in new emails.

So at the time, I was actually running my own mail server, which is not something I would recommend today. I'd been playing around with whitelists and blacklists and heuristics for identifying and dealing with spam. And as soon as I read Paul's article, I thought, ah, this is going to work. And of course, I had a massive corpus of good emails and spam already. So I spent a couple of hours writing an implementation of a classification algorithm. And I was able to train my classifier, and suddenly my spam problem disappeared. It was just so good at identifying spam.
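For readers who haven't read Graham's article, here is a heavily simplified sketch of the idea in Python: a toy word-level naive Bayes classifier trained on a made-up two-message corpus. Graham's actual scheme weights tokens differently, but the train-on-ham-and-spam principle is the same.

import math
from collections import Counter

# Toy training corpus - in practice this would be thousands of real messages
ham  = ["meeting at noon tomorrow", "please review the attached report"]
spam = ["win a free prize now", "free money claim your prize"]

def train(messages):
    counts = Counter()
    for m in messages:
        counts.update(m.lower().split())
    return counts

ham_counts, spam_counts = train(ham), train(spam)
vocab = set(ham_counts) | set(spam_counts)

def log_likelihood(message, counts):
    total = sum(counts.values())
    score = 0.0
    for word in message.lower().split():
        # Add-one smoothing so unseen words don't zero out the score
        score += math.log((counts[word] + 1) / (total + len(vocab)))
    return score

def is_spam(message):
    return log_likelihood(message, spam_counts) > log_likelihood(message, ham_counts)

print(is_spam("claim your free prize"))   # True
print(is_spam("report for the meeting"))  # False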

Of course, the open source community picked up on the same algorithm, and tools like SpamAssassin started adopting the same way of classifying spam. And if you use Gmail or Outlook or pretty much any email system today, there's probably a Bayesian classifier behind the anti-spam tools.

My next encounter with machine learning was the Netflix Prize that they offered in the mid-2000s, I believe. They had a prize for improving their recommendation algorithm. And I dabbled a little bit in that, but there were some people who spent a lot of time and effort and computing power on it. So I didn't really have enough time to compete with the top people in that area.

So then, fast forward a few years. Geoffrey Hinton published his papers on multi-layer neural networks and the successes he had with that. I got very interested, and I actually completed the machine learning course presented by Andrew Ng from Stanford University. And then I did a more advanced course that was presented by Professor Hinton himself at the University of Toronto. So I know the math and the basic techniques used in modern neural network-based machine learning. But I consider myself an enthusiastic amateur in terms of actual implementation.

I've played around for a while using machine learning for rapid image recognition for a project that I had in mind. But unfortunately, Siemens saw the same opportunity, and they announced a commercial system before I could. That's why you don't quit your day job while you're working on your startup idea!

When public LLMs came along in 2022, I went through a phase of trying to use ChatGPT for everything. But I soon ran into its many limitations. We'll talk about that later in the interview, I think. You have to be very careful when using the output of ChatGPT, and not take what it tells you at face value. So yeah, that summarizes my experience, I think.

That's a pretty impressive and broad set, and shows that you've got deep experience with how the tools have evolved.

Can you share a specific story on how you have used AI or machine learning? And what are your thoughts on how well the AI features of those tools worked for you, or didn’t? What went well and what didn’t go so well?

Yes, I again have quite a few stories. I mostly use LLMs professionally for outlining. They're really good at summarizing a long article, although again, as I mentioned, you have to be very careful and double-check. And if you are summarizing anything important, it may be better to just read the article than to have the machine summarize it for you. But I have a few specific examples of where I've used AI or machine learning.

First example is whenever I have a very simple piece of code to write, I often just give the specification to ChatGPT or one of the other LLMs. I also have the GitHub Copilot plugin. And it often does a decent job of writing simple functions, if I describe the inputs, outputs and functionality in detail.

The second example is I was recently a volunteer tutor for students at a technical college. I basically audited a complete computer science course over the course of a few years, just to keep my brain cells exercised and keep up to date with modern computer science techniques. As the so-called expert, I had to complete the class exercises before everyone else. But I didn't receive any cheat sheets or anything. So I tried, of course, ChatGPT and other LLMs. But I got very mixed results. Using the AI to write a simple C++ class works pretty well. But if you ask it for a class that actually implements any non-trivial functionality or algorithm, you tend to get mediocre, inefficient, or wrong results from that. You have to spend a lot of time cleaning it up.

The one thing that did work well was the real-time syntax checking for programming errors. It's really nice to have the AI correct your syntax errors as you go, rather than having to write your code as we used to in the old days, and then spend about an hour running compiler passes and fixing one error at a time.

Another example, for the same courses: I used the Microsoft voice recognition tools to extract transcripts of the recorded lectures, and then I used an LLM to clean up the transcripts so I could do a search on what the lecturer said if I needed to look it up.

Next example, as a product manager, I found AI to be less useful than you would think. It's okay again for generating an outline for a well-known document type. But actually filling that document in with anything more meaningful is often not successful.

The next example is kind of a quirky one. Whenever you're looking for a new position, you want to customize your resume for every job that you're applying for. And doing that manually gets really old, really fast. So I think many people spend some time cobbling together some kind of resume generator.

Back in the early 2000s, I made one using the technologies of the day. I used XML for representing the data and XSL stylesheets for translating the data into printable formats, using tools like FOP, Xerces, and Xalan at the time.

Later, I realized that life is too short to wrestle with XSL. It's just a terrible, terrible language to work with. So for a while, I went back to a plain old resume just written in LaTeX and customizing it manually.

Earlier this year, I decided to try automating the thing again. So I defined a YAML schema for representing the resume and cover letter inputs, and used Mako templates for doing the transformation to text and LaTeX and other formats. I wanted to have a browser front end where I could just enter the details of a position, you know, the company name, address, position name, add some custom text and customization, and select the resume template based on the type of position. That would hand off the input to a back-end processor that would generate the YAML input files, and then finally generate the actual PDF and HTML and text resumes and cover letters, using the Mako templates.
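As a rough sketch of that kind of pipeline (the YAML fields and the template below are invented for illustration, not Quentin's actual schema), the YAML-to-LaTeX step with a Mako template might look something like this:

import yaml                      # PyYAML
from mako.template import Template

# Hypothetical position input - a real schema would have many more fields
position_yaml = """
company: Example Corp
role: Senior Product Manager
highlights:
  - 20 years of product management
  - Embedded and IoT firmware background
"""

latex_template = Template(r"""
\section*{${role} -- ${company}}
\begin{itemize}
% for item in highlights:
  \item ${item}
% endfor
\end{itemize}
""")

data = yaml.safe_load(position_yaml)
print(latex_template.render(**data))

Feeding the same YAML data through different templates then produces the text, HTML, and LaTeX/PDF variants.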

To do all of this, of course, needed quite a bit of coding. So I created a detailed text specification, fed that to a few LLMs, including ChatGPT, and asked them to produce the code. And they actually did a pretty good job of that. ChatGPT created the Flask-based Python app for me for the web front end, but it needed a lot of debugging. And the LLMs, I found, frequently hallucinated non-existent functions or used outdated libraries. I had to rewrite about 50% of the code in the end. But I think it saved me several hours of research and typing on the project. So that was a fairly interesting exercise.

The next example is that I use AI every day for spelling and grammar checking. My fingers type faster than is good for me, so I make lots of errors. I use a tool called LanguageTool as a browser plugin, and as plugins in my word processors. And it does a really, really good job of correcting spelling and grammar. It also suggests sentence rephrasing, which I rarely use, because I don't like the way it does it.

Another use case is diagramming. If you want to have a laugh, tell ChatGPT to draw you a diagram of just about anything. It's pretty bad at that. But if you want to diagram specific things, like processes or state machines or even small algorithms, you can ask ChatGPT to do that, but to generate the output in GraphViz input format, rather than trying to create the actual picture. And it's actually pretty good at doing that. I think that's probably just the way that the training was done.
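To illustrate that workflow (this assumes the Graphviz dot command-line tool is installed locally, and the DOT text below is just a stand-in for what an LLM might return for a small state machine), rendering the generated description is straightforward:

import subprocess

# Stand-in for LLM output: a tiny state machine in GraphViz DOT format
dot_source = """
digraph traffic_light {
    rankdir=LR;
    Red -> Green [label="timer"];
    Green -> Yellow [label="timer"];
    Yellow -> Red [label="timer"];
}
"""

# Render to PNG with the locally installed Graphviz 'dot' tool (reads DOT from stdin)
subprocess.run(["dot", "-Tpng", "-o", "traffic_light.png"],
               input=dot_source.encode(), check=True)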

The next example is, when I was talking with Karen about this interview, I tried using the text-to-speech AI at elevenlabs.io to generate my voice 1. I spent quite a bit of time training the voice model, probably longer than the actual interview took. And the process for doing the conversion went very smoothly. But it did make quite a lot of mistakes, so I had to rewrite many of the input sections to get it right. And at the end of the day, we kind of compared Karen's live voice with my generated voice. And I think it fell into the "uncanny valley". It just didn't sound natural enough. But I am very enthusiastic about the technology. I think, for things like narrating articles and so on, it's one of the most promising uses of AI.

And then finally, my niece is studying engineering. And once in a while, she sends over an engineering math problem that she struggles with. I am very lazy, and it takes a lot of time to type out equations in Word or any other system. So often, just to get the detailed steps, I use an LLM. LLMs famously suck at math, but there's a tool called Wolfram Alpha that's really good at math but sucks at input. So if you actually use the two together, the Wolfram Alpha plugin to ChatGPT, then you can get it to solve pretty complex math problems for you, really comprehensively.

So those are some of the ways that I've used AIs.

Wow, that's a really broad set of examples and really interesting. Thank you, Quentin!

I'm really interested in how you combined ChatGPT type input with Wolfram Alpha and got the two to work together. That sounds pretty cool.

Yeah, for Wolfram Alpha, at first I kind of - I tried all kinds of ways which didn't work all that well. I asked ChatGPT to specifically create output for Wolfram Alpha. And it did work, but then they published a plug-in actually for ChatGPT. You can just go into ChatGPT, activate the plug-in, and tell it to use Wolfram Alpha to solve the math.

Ah, nice.

Looks that way.

Yeah, that's really convenient. And I also liked your example about diagramming and getting it to create something that is a useful diagram by giving it a suitable specification. That's pretty cool.

Yeah, as I say, the trick to that is to not have it try to draw the diagram, but just tell it to give you GraphViz output, which is a text format, which it's good at.

You also mentioned using ChatGPT and GitHub Copilot to help you write code. I had similar experiences with hallucinated library functions. When I was writing code for my oxygen data analysis side project, it just made up API calls that simply didn't exist.

Yes. So specifically for this questionnaire, I went through an exercise of using ChatGPT to generate some example code. I'm doing a small embedded project where I need a simple menu user interface. I thought I'd ask the AI to write the C++ module to implement the user menu. Very simple multi-level menu with navigation through WASD keys. And I also asked ChatGPT to generate a test program to demonstrate the use of the module.

So the first module kind of worked, kind of looked okay, but there was a case statement in the middle where it actually read the keys, which was just incredibly, terribly wrong. It had multiple matches to multiple keys and it just didn't work. 2 So I tried to use the AI itself to improve and fix the mess, but it never got it right, so I had to fix that manually. At the end of the day, though, I got working code after about 20 minutes of back and forth, which is not too bad.

However, the quality of the code was really bad - it was repetitive, non-idiomatic, and looked like something an amateur might write, or somebody who learned C in the 90s and never kept up with C++ and subsequent language developments. Even the 90s guy would have used a function pointer array for looking up the menu actions. But the LLMs just created pages and pages of repeated code inside case and nested if statements - and of course there was the hallucination of non-existent libraries, and so on.
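The table-driven approach he's referring to - defining each key-to-action mapping once instead of repeating logic across case blocks - looks roughly like this sketch in Python (hypothetical key bindings; Quentin's point was about C++ function pointer arrays, but the idea is the same):

# Map each key to an action once, instead of pages of switch/case or nested ifs
MENU_ACTIONS = {
    "w": "up",
    "a": "left",
    "s": "down",
    "d": "right",
    "\n": "select",
}

def handle_key(key):
    # One table lookup replaces the repeated case blocks
    return MENU_ACTIONS.get(key.lower(), None)

print(handle_key("W"))   # up
print(handle_key("x"))   # None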

I would certainly not use an LLM for production code, as I mentioned before. If you have a small, well-defined function, it's really good at creating that for you, but just don't ask it to write a whole application for you - it will turn out badly at that point in time.

Yeah, I noticed similar things when I was working on that oxygen data analysis program - that the code that it generated was kind of simplistic, and not at all the way that an experienced software developer would have written that block of code.

This reminds me, do you remember when Microsoft Word first claimed that it could generate HTML from a Word document?

Oh, yes. Yeah, that was hilarious.

I had tried it very early on with a page from my personal website, and the HTML it generated was just awful. I mean, it was at least three times bigger than my hand-coded webpage, and it was just way too hard to maintain, and I gave that up really quickly. I'm sure the tools have gotten better.

I did exactly the same thing, yeah, and it was just crazy awful. Yeah, I'm sure the tools are much better these days. If you go into GitHub and you open a random project, you'll probably find pretty terrible code there. I think most of the code that gets published on the Internet is pretty bad. It doesn't necessarily follow modern practices. So I think if you just let an LLM loose on the Internet, it's going to pick up all the bad examples, together with the good ones. You need more focused training for an LLM to produce good code.

That's a great point. I just did an interview with a software architect, and she pointed out that the ‘best of the best’ code is not out there publicly available to be scraped.

Yeah.

So when you use a tool that's trained on what's not the best, you get subpar code in many cases. She had some other examples where it could be used in a way that would be more effective: for instance, training it on your own code base if you're in a software sustainment activity and you're coming in cold to a new set of software, and having it explain the code to you - it could be useful for that. But having it generate code depends a whole lot on the quality of what's in there.

Oh, absolutely, yeah. Several times in my life, I've started working at companies with massive code bases, and it takes months just to figure out what the code is doing. Yes, I think an LLM trained on that specific code base would be a godsend to any new developer.

Yeah, it makes sense. So you've shared some really great examples of ways that you've used AI tools. Have you avoided using AI-based tools for some things, or for anything? And can you share an example of when, and why you chose not to use AI in those cases? 

So as I mentioned before, I have avoided using AI for product management related work. I just have not found it to be all that useful except for generating generic outlines for documents and so on. I actually did a so-called product management AI course a while ago, and it suggested using the AI for market research. But I've tried that, and it just doesn't pick up everything that it needs to pick up. It picks up a lot of irrelevant information. It tends to be more of a time waster than a time saver.

I've also avoided using AI for high-level software or IT architecture design, because it sucks and it hallucinates. And it will argue with you about sucking and hallucinating. It seems to be more of a hindrance than a help.

And then something that I'm really passionate about, I would never use AI for writing an article, or even something like an email, either personally or professionally. I believe the whole point of writing is to convey your own personal opinions, experiences, and feelings. And so I want my writing to sound like me, not the output of the worst copywriter in the world.

And as we discussed earlier about the software, the same goes for general writing. LLMs were trained on all the writing on the internet. They're going to be generating kind of average copy, which is generic and average-sounding and not very exciting. Many people, including myself, can immediately identify stuff that's written by AI. So I would never use that for my writing.

I've also avoided using AI for generating images for professional use because of the copyright issue. I have no idea where the images it generates come from, or what data was used to train it. I think it's a legal minefield, and we're already starting to see lawsuits. And that's even aside from the ethical considerations of using images that were not intended for commercial purposes, and not paid for, to train the AI.

And finally, I don't use AIs to generate music, and I don't listen to AI-generated music, because it's just terrible.

Yeah, that sounds like a thoughtful combination of practical and ethical reasons for choosing when not to use AI-based tools. And I'm definitely with you on those.

You've alluded to this already: One of the common and growing concerns nowadays is where AI and machine learning systems get the data and content that they train on. Often they're using data that users have put into online systems or published online, and companies are not always transparent about how they intend to use our data when we sign up. For instance, you mentioned a grammar and spelling and writing style tool as one example.

How do you feel about companies that are using data and content for training their AI and ML systems and tools? Should ethical companies get consent from, and compensate, the people whose data they want to use for training?

Yes, absolutely. Today, they should get permission. The default copyright law is basically that everything you publish has copyright attached to it, and people are not allowed to just use that. Now, AI is kind of trying to circumvent that by rephrasing and combining many inputs. But I think there are still lots of issues around that, and there are going to be lawsuits. And it's just not ethically right to use people's hard work without their permission.

If you look on Flickr, for example, there are people who publish photographs with Creative Commons licenses 3 and they say, I took this photo and I'd like everybody in the world to be able to use it for free. And that's fine, but I think content needs to be labeled as such before people should be able to freely use it. Otherwise, they need to get permission.

And as I mentioned, the industries, like the book publishing industry, are pretty powerful, right? They managed to shut down Google's book scanning project. So I predict that there are going to be severe limitations on what AI products can do in future. And that's going to inevitably result in price increases, due to royalties that will have to be paid. And unfortunately, a side effect of this is that data is going to get more difficult to just scrape. There are going to start being paywalls around any valuable data sources. It will be interesting to see how that shakes out in the long term.

Yeah, definitely. And you mentioned about the images and copyright being automatic, but some people are putting their pictures out there for instance on Flickr so that other people are allowed to use them. I've got a page on my site, which I call my ethical shoestrings. I have a list of all the sites that I've found so far where people DO say, hey, you're free to use this picture of mine. You just have to give me credit. And there are different Creative Commons licenses that apply to this, where they require attribution or they allow commercial use.

And so I've been going into that, and I deeply appreciate that some people do that, because it gives me a way to use high quality and engaging pictures on my posts, when I'm still working on a shoestring budget and can't really afford right now to pay for pictures. I definitely appreciate that.

And it's one of the reasons I try to keep my posts without a paywall, because I don't want someone to not be able to read and to hear these interviews because they aren't in a financial position to pay for them. And I think it's important to get the information out. But I wouldn't want someone scraping this interview and using it as a source for an AI!

As a person who uses AI-based tools, do you feel like tool providers have been transparent about sharing where the data used for the AI models comes from and whether the original creators did or didn't consent to its use?

I don't see much transparency, from what I've looked at. When you look at ChatGPT, it certainly does not tell you where the data comes from, or any other tool that I've used. So I don't think that they have been very transparent.

Yeah, there's one initiative that I learned about earlier this year called "Fairly Trained"4. It was started by Ed Newton-Rex after he left Stability AI. And they are developing these certifications for companies that DO train their AI systems fairly, using ethically-sourced data, or getting it from producers who they pay directly for creating that content.

And I'm really happy seeing that initiative - I hope it takes off in a much bigger way than it has so far. I think it's still under 20 companies that have been certified as Fairly Trained. And it's, of course, none of the really big ones that we hear about every day that produce some of the tools that we've talked about.

I would like to see that initiative, and other initiatives like it, get more momentum, where people appreciate that it's worthwhile to look for and use tools that are fairly trained and use ethical data sourcing.

Yeah, absolutely.

So when you've worked with building an AI based tool or system, is there anything you can share about where that data came from and how it was obtained?

So the one time I needed to find image training data specifically, there was a crowd-sourced effort called ImageNet. And that's similar to what you just mentioned. That was an early collaboration by academics and others to create a set of labeled images for AI training. And they knew that those images were going to be used for image training. I think some of the people who worked on that were even paid for doing the classification work. So that was, in my mind, an ethical source of data.

And of course, I mentioned my anti-spam tool where I just used my own data.

Yeah, those are great examples. Thank you. That makes sense. On the other hand, as members of the public, there are cases where our personal data or content either may have been used, or has already been used, by AI-based tools or systems. Do you know of any cases like this that you could share?

So not a lot. The one thing is before this interview, I asked ChatGPT about myself and it came back with a response that looked very suspiciously like a copy of my Substack profile. Yeah, it seems to have scraped Substack for that.

Yeah, that is interesting. I know in Substack, we do have the option to specify that we don't want our newsletter content used for training AI. That might not extend to protecting our profiles from being available to bots for AI-based search and other purposes. It's interesting that your profile showed up. I haven't tried searching to see if any of my newsletter content has been scraped. I do have the setting configured so that my content isn't to be scraped, but as Substack points out, that only works if the bots respect that setting, and not all of them do.

Yes, I actually dove a bit deeper into that. When you invited me to this interview, I realized that I did not actually have a Substack profile set up. So it did not scrape my profile. It actually scraped my content to create a profile. I've now fixed that - I added the profile. But yeah, it was actually scraping the content.

And you're correct, yes, there are settings. But, like in the Douglas Adams novel, you know, the settings are on display at the bottom of a locked filing cabinet in a dark cellar with no stairs, with a sign on the door saying "beware of the leopard". It's really hard to find those settings. They are not necessarily surfaced by Substack. [Substackers: here’s a tip on how to find and change this in your newsletter5.]

Yeah, and the bigger concern, I think, is that there have been some prominent stories lately about bots that simply don't respect those settings. For instance, there are different bots that are used. There are certain spiders that are known, and you can specify as a website administrator that you want to disallow that specific bot. But some of the unethical tool providers are simply using a bot by a different name and not revealing that, and then using that one, saying, well, this one's not disallowed. And then it goes in and scrapes everything.

Yeah.

So that's really, really unethical behavior. And there's not much of a way for Substack or any other platform to prevent that. It's like cybersecurity: you keep coming up with countermeasures to the countermeasures to the countermeasures.

Do you know of any company that you gave your data or content to that made you aware that they might use your info for training AI or machine learning? Or have you been surprised to find out that a company was using your data for AI?

Other than the Substack example that I mentioned, I'm not really aware of any such use. But, you know, every time you sign up to a website where they might be collecting data, you have to accept their EULA. And I'm sure I must have agreed to let them use my data for whatever, or to sell my soul to the devil, in some of those EULAs. But I am not specifically aware of any sites.

Yeah, from what I've been seeing lately, some of these social media sites are the worst about this. People opt in, expecting them to share the information with the friends that they connect to on the social media sites. And then they find out that these sites have used it for purposes beyond that.

Yes.

Facial recognition and automatic tagging and such. I just found out from the software architect about a podcast where the host, Mark Miller, teamed up with a lawyer, and they pulled a EULA and went through it in detail, and the lawyer explained what it does and doesn't do. And they looked at the points in the EULA that were of concern. Really interesting. I just bookmarked that one earlier this week 6, and I want to go back and go through some of those things. Pretty interesting.

Yes.

You mentioned selling the soul to the devil, too - I think that was buried in someone's terms and conditions.

That was an April Fool's one. They changed the EULA on April 1. And so if you clicked on it, you sold your soul to the devil!

Another thing, when we talk about use of data - has a company’s use of your personal data and content ever created any issues for you, such as privacy or phishing? And if so, can you give an example?

So I've actually seen that a couple of times when I start a new position and I update my LinkedIn profile. Somehow these scammers get notified of that. They must have some illegitimate subscription to the API or something. And then about a week later, I get an SMS from the so-called CEO of the new company. They always have my phone number and my name right. They tell me that they're stuck in a conference and they need a bunch of gift cards, and ask me to buy the gift cards, SMS them the numbers on the gift cards, and then submit the receipt for reimbursement. Of course, it gives me a good laugh every time. But if you look on sites like the r/scams subreddit, you can see that about once a day, somebody gets scammed by that, and they lose a lot of money. So that's one way.

The other way is when I search for stuff online. I do a lot of work to block advertising and trackers - I even block them at my router, and I have browser plugins to block tracking. But still, when I search for something strange like "bacon-flavored dental floss", the next day when I go to Amazon, what do you know? There's bacon-flavored dental floss in my recommendations there. Some people definitely use sneaky ways of using your data in ways that you don't intend it to be used.

One thing that comes up in the tech community once in a while, and I don't have confirmation of this, but sometimes you have a conversation with a friend, again about something strange, like lightsaber chopsticks. And then you go to Amazon - you've never typed those words into any system - but suddenly you get that recommended on Amazon or wherever. So there are always questions. Your phone: does it have an always-on microphone that's spying on you? There are different opinions on that.

Yeah, many people have reported similar anecdotes about websites that seem to be using, for ads, information about personal interests that was never entered or expressed online.

I just had an interview guest who mentioned de-Alexa-ing her house to get rid of the devices that are always listening, by design. And whether our phones listen too is a potential nightmare that would be very hard to block and still have the phones be useful.

Oh, absolutely. Yes.

Not surprisingly, in light of these and other ethical concerns, public distrust of AI and tech companies has been growing. What do you think is THE most important thing that AI companies need to do to earn and keep your trust? And do you have specific ideas on how they can do that?

Yeah, a thought I alluded to before - I think they need to surface the source of their data. If you go to Google Image Search, it lets you select images by the license type. So you can select commercial images, and then hopefully if you use those images, you would hunt down the originator and pay them royalties. Or you can select Creative Commons licenses, and there are various flavors of that - commercial use or personal use.

So Google already gives you, in terms of images at least, a lot of options for selecting data that fits the way that you want to use it. And I think ethical AI companies should start doing the same thing. They should allow people to license their data, similar to the Creative Commons licenses for photos and other things. And end users should be able to specify what license they want for the data that's used as inputs to what the LLM spits out at them. And of course, that means that every LLM response should ideally come with a list of source references7.

Yeah, there was an initiative I just saw the other day relating to music, which you mentioned earlier, that they are giving music creators a way to opt in to have their music used in the system. And it's one of the companies (Musical AI) that was certified earlier this year as Fairly Trained 8. And they're now setting up a way for music creators to be credited and compensated and to have their music used, and it gives them an additional revenue stream. And I think that's a really cool initiative.

With LLMs, or anything you're using for writing code for instance, it's not just a matter of the ethics, but also the legal liability that someone could be incurring by not being restrictive about the sources that they use. And you mentioned that earlier.

Yeah.

This has been a really fun discussion! Is there anything else you'd like to share with our audience?

Karen linked my Substack profile in the interview, or will be linking it. I've been publishing some stories and articles there. I'm working on one fairly advanced mathematical article series right now. It's going slowly, but I hope it will be worthwhile reading. If you would like to subscribe, you're welcome to.

And then just as a final thought, I would like to mention one thing that I find missing in terms of the focus of LLMs: we can ask questions and get answers from LLMs. But what we are missing, and what many companies are trying to do without much success, is to have intelligent agents. I want to give my AI some tasks and just tell it, go and do that. And that would probably involve creating and answering multiple prompts.

For example, I want to ask AI to create a working website based on a verbal description of the style and text input, or have it prompt me for context. And there are websites that purport to do that, but they are not very good at it yet. I think there's a lot of work to be done there.

Another task that I do all the time is spinning up a new container or a virtual machine. Just creating the specification for that virtual machine is a tedious task, and AI should be able to complete all the steps to do that kind of thing, in my mind.

The other thing, again, talking about email - machine learning algorithms are good at filtering spam, but nobody has really managed to create a good product that would monitor my inbox and watch what I do with my content. Like filing stuff in specific folders from specific sources, or archiving stuff, or marking stuff for follow up. Just watch me for a while, see what I do with my email, and then automatically do that from now on. And nobody's gotten that right yet.

And the other main thing is, when I'm out and about at 11pm and I'm hungry, I have yet to find any AI or assistant that can show me the closest sushi restaurant that's still open at 11pm. It sounds like a simple thing to do, but there are actually quite a few steps involved in that. So it's again a task-based action that nobody has successfully programmed an AI to do yet.

So those kind of things are what I call my Turing Test for successful and useful AI. And I hope that we will soon get the technology to be able to do that for us.

Yeah, I agree. Those all sound like really useful and practical applications that aren't interfering with use of human intelligence. With AI, sometimes people say, you know, the A doesn't have to stand for artificial. It could stand for Augmentation. It could stand for Automation. And those are some of the more practical and useful applications, not pursuing AGI [Artificial General Intelligence] necessarily.

I know that some of the things that you've talked about, I've also been wishing for. I looked into the automated website creation. One of the sites that I use to create a free logo for my personal newsletter, Zarla, has a paid feature for generating a website based on prompts. But I didn't try it out, so I don't know how well it works. And I haven't really read any praise for it online. So I'm guessing it's still a work in progress.

I agree with you completely. I really don't understand why an AI-savvy email provider, that's already using machine learning for other purposes, can't automatically filter and classify my emails based on my patterns of doing this, instead of me having to manually define and maintain the filters. They're already mining our emails for their own purposes. So it'd be nice if they used those capabilities for our benefit.

Yes, absolutely. Exactly.

Hopefully as the technology gets better and companies adapt to doing things that people find to actually be of value, we'll start to see more of those capabilities come through.

Yeah, in terms of that website generator, I've seen a few that do ask you the questions and you can spin up a reasonable website. But the HTML that they generate is like what I mentioned Word was like 20 years ago. It's just terrible and totally unmaintainable.

Yeah, and those tools probably will get better, but it seems like we're not there yet.

Yeah.

Any other final thoughts?

No, I think that's it. Thank you again so much. It was a really fun conversation.

It WAS fun! Quentin, thank you so much for joining our interview series. It's really been great learning about all the different things that you're doing with artificial intelligence, and what you'd like AI to do for you, and how you decide about using your human intelligence for some things, and how you feel about companies using our data for AI. So thank you very much.

You’re welcome.


Interview Guest References

Quentin Vandermerwe on LinkedIn

Quentin Vandermerwe on Substack (Rhythms of Reason)


About this interview series and newsletter

This post is part of our 2024 interview series on “AI, Software, and Wetware”. It showcases how real people around the world are using their wetware (brains and human intelligence) with AI-based software tools or being affected by AI.

And we’re all being affected by AI nowadays in our daily lives, perhaps more than we realize. For some examples, see post “But I don’t use AI”:

We want to hear from a diverse pool of people worldwide in a variety of roles. If you’re interested in being a featured interview guest (anonymous or with credit), please get in touch!

6 'P's in AI Pods is a 100% reader-supported publication. All new posts are FREE to read (and listen to). To automatically receive new 6P posts and support our work, consider becoming a subscriber (free)! (Want to subscribe to only the People section for these interviews? Here’s how to manage sections.)


Enjoyed this interview? Great! Voluntary donations via paid subscriptions are cool; one-time tips are appreciated; and shares, hearts, comments, and restacks are all awesome 😊



Credits

Audio Sound Effect from Pixabay

End Notes

2

Some of the bad C++ menu handling code generated by LLM in the first iteration:

switch (ch)
{
case 'a':
case 'A':
case 'D':
case 68:
    return KEY_LEFT;
case 'w':
case 'W':
case 'A': // Arrow key codes for some terminals
          // QV – really??  How about the duplicate case?
case 65:
    return KEY_UP;
case 's':
case 'S':
case 'B':
case 66:
    return KEY_DOWN;
case 'd':
case 'D':
case 'C':
case 67:
    return KEY_RIGHT;
case '\n':
case 10:
    return KEY_ENTER;
default:
    return 0;
}

3

For info on the types of Creative Commons licenses, see this page: https://creativecommons.org/share-your-work/cclicenses/

4

List of models certified as Fairly Trained: https://www.fairlytrained.org/certified-models

5

Here’s how to configure your Substack newsletter to notify third party bots that you opt out of scraping for training AI:

The Substack AI and search options are newsletter-specific - for instance, here’s what I have configured at karensmiley.substack.com/publish/settings#pub-details.

7

This interview was recorded on August 19. On September 18, Quentin discovered that Google Gemini now shows its sources at the bottom of a response, and includes a link to "report legal issues" - just as he recommended in this interview 👍🏼

8

See our report on SOMMS.AI, now Musical AI, and their Fairly Trained certification:


Thanks for reading 6 'P's in AI Pods! This post is public, so feel free to share it (but not to use it for training your AI tool 🙂).

