Date: 2026-01-22

Introduction - Dr. Salman Azhar

This article features an audio interview with Dr. Salman Azhar, a 🇺🇸 USA-based venture capitalist, angel investor, and entrepreneur. Dr. Azhar has unique access in the venture world after investing in 250+ startups over 20+ years, and generating a positive return with his first 12 exits. He specializes in buying pre-IPO unicorns’ shares from early investors. Dr. Azhar is a General Partner of Azimuth Opportunity Fund and Executive in Residence at Duke University’s Fuqua School of Business.

We discuss:

his experiences as a venture capitalist and angel investor, and how his team uses “VC GPT” to streamline their operations

how the Bay Area Chess group he helped to found has now coached nearly 50,000 kids in chess and better decision-making

relevance to AI of his middle brother’s long-ago advice to check his work with multiple methods whenever possible

being alert for biases (gender and other) in VC funding evaluations

how Uber and YouTube exemplified ‘seeking forgiveness instead of permission’ in their early startup days

why he is investing in AI-enabled solutions, not AI companies

and more. Check it out and let us know what you think!

This post is part of our AI6P interview series on "AI, Software, and Wetware". Our guests share their experiences with using AI, and how they feel about AI using their data and content.

This interview is available as an audio recording (embedded here in the post, and later in our AI6P external podcasts). This post includes the full, human-edited transcript.

Note: In this article series, "AI" means artificial intelligence and spans classical statistical methods, data analytics, machine learning, generative AI, and other non-generative AI.


Interview - Dr. Salman Azhar

Karen: I am delighted to welcome Salman Azhar as my guest for “AI, Software, and Wetware”. Salman, thank you so much for joining me today! Please tell us about yourself, who you are, and what you do.

Salman: It’s an absolute pleasure to be here. I still remember working with you in the 20th century, if I recall correctly. That hopefully doesn’t date both of us! But we were very young. We were like, uh, five-year-olds at that time, working in a high-tech environment! That’s an exaggeration, but we were a little bit older than five.

It always gives me a lot of anxiety when people ask me what I do, because I do several things. The two main things I do are that I am a VC, or venture capitalist, with Azimuth Ventures, which is my own firm, along with my partner David Frazee. I’m also an Executive in Residence at Duke, where my job is mostly training the graduate students at Fuqua School of Business, which is Duke’s school of business. So those are the two main things that take up my time.

And then I also have a volunteer role at Bay Area Chess, which is a nonprofit chess organization I created back in the early 2000s to bring chess into schools in the San Francisco Bay Area as a way to teach better decision-making.

Karen: It’s great that you’re able to sustain your involvement with the chess organization, even though you’re no longer on the West Coast.

Salman: Yes, I can do a lot of work remotely. Especially after COVID, it has become pretty normal for people to work from home. We do have an office for Bay Area Chess, but most of us work remotely, other than when we are actually in classrooms. And that’s usually handled by coaches now. Of all the things I’ve done, I’m really proud of that, because we have tens of thousands, probably close to 50,000 children that have learned chess. And that has helped keep them off the streets, off video games, and just taught them better decision-making.

It hasn’t brought me a lot of money, hardly any money. In fact, I put in money to make it work. But it does give me a lot of satisfaction.

Karen: It’s a great initiative. Thank you for sharing that introduction. Can you tell us a little bit about your level of experience with AI and machine learning and analytics? You’ve definitely studied the technology professionally. I’m wondering how much you’ve studied it, or if you use it professionally or personally.

Salman: I was first exposed to AI when I was a graduate student in computer science at Duke University, back in the late 80s and early 90s. At that time there wasn’t much hype around AI. It was just another field. Since then, I’ve used AI off and on. These days it has become a household word. And a lot of what people think of as AI is a version of AI called generative AI. You can think of it as a fancy auto-complete on steroids. Like if you’re typing a Google search, it will auto-complete. But it has a lot more computational power that goes into making more accurate predictions. For example, if I am typing, and I have a habit of saying “Best wishes to you and your family”, it would autofill that for me. And it would autofill something else for somebody else who has a habit of typing something else.

So it has had a lot more training, which is why we have been able to get better predictions. And that’s really the reason for its popularity. But at the heart of it, it is a very, very advanced auto-complete or prediction system.

Karen: Can you talk about the ways that you have used it, either personally or professionally?

Salman: So lately I have used most of the LLMs, or large language models, such as ChatGPT and Google Gemini, to do research. In my VC world, we are trying to get deal flow. And for what used to take us days to figure out with Google and so on, we can now actually have a conversation. We have a standard script that we go through to decide whether a company is investible or not at the very top. And we would reject a company based on what we learn from ChatGPT or VC GPT. But we would never invest in a company just based on that. Before considering investment or before actually deploying funds in the company, we would do much deeper research. That usually takes days and days.

What it has really helped us do is accelerate a lot of our work and also save a lot of money in terms of the resources, particularly human resources, that we would spend otherwise. For example, a company our size would need about three to four times as many people in order to accomplish the level of work that we are doing. But because we have generative AI, and we have a script that we can use to get information about the companies before investing in them, it really saves us that time. And we are able to function with effectively about five full-time equivalents as opposed to 20 full-time equivalents.

Karen: So would you say that that is allowing you to look at more prospects, more companies that you might want to consider investing in than you would’ve been able to before? Or how does it affect the way that you do your business?

Salman: We are able to do it with less staff or fewer people. Our overall fund deployment is right now limited by the capital that we have available to deploy. But it could easily have been the other way around. The restriction is really that we can’t hire enough qualified people. Either it has shifted the problem to capital being the most important constraint, or you can also look at it as that we are spending much less money in terms of getting as far.

And it varies. Basically, when you’re looking at an experienced person, like 20 to 30 years of experience, it probably gives them 50% more productivity or faster. But when you’re looking at more junior resources, we are looking at 200 to 300% more effectiveness. Because we don’t have to train them, and they don’t have to know as much before they start doing it.

So somebody with your experience may be easily able to say, “Okay, this is not worth even looking at.” So it doesn’t give you that level of productivity increase or efficiency increase. But for a junior person, it can mean 300, 400% productivity increase, which is actually interesting because that does make the employment a little bit challenging. A lot of the junior jobs are easier to replace than the senior jobs, as far as generative AI and its impact is concerned.

Karen: Do you find that using the VC GPT tool helps your juniors to learn faster than they would otherwise?

Salman: Definitely. So we can just give them a script and they can just go off without much training. And they can learn on the fly and if they have any questions, they can actually ask VC GPT, you know, “What does liquidation preference mean?” In previous times, we’d have to go through a whole training process where we’d have to teach them the terminology and so on. And this way, we teach them more on how to use VC GPT, and then just set them loose.

Karen: That’s a good overview of what VC GPT does well. What would you say it doesn’t do well?

Salman: So a problem with generative AI in general is that it is not very accurate at times, and it doesn’t tell you when it is not accurate. I wish it would say something like, “I expect this company to go public in two years +/- 6 months, with 95% confidence.” What it does is it will give you, sometimes, a range like “This company may go public in 12 to 16 months”. Or it may give you one thing like, “This company is likely to go public in one year”. But it doesn’t give you the confidence interval, or the range, of the best case to the worst case that can happen.

And that’s where it really lacks: both the unknown accuracy and also how confident it is in its answer. For example, if you ask me a question and I do not know the answer, I would tell you, “Karen, I’m sorry I don’t know the answer, but I think the answer is about a hundred dollars, right? But I really don’t know.” But VC GPT and other large language models would say “The answer is a hundred dollars.” And they won’t tell you whether they know for sure, or whether they’re just guessing. And that’s really the reason why, before deploying capital, we have to spend a lot of manual time doing deeper research.

Karen: Have you had any issues with references that it gave you not turning out to be real?

Salman: No, I personally haven’t had that issue, but I’ve heard other people have had that issue. I have had more of an issue that it hasn’t consulted what I would have consulted in order to get an answer. So more like missing references rather than imaginary references. We do ask our analysts to check the references when they look at a diligence report or any memo that they write.

And that’s, by the way, a very important point, that rather than just take what it says at face value, it usually gives you a link where you can go check at the original source. And sometimes that source is not a valid source. There are lots of internet personalities. Like they have made up all these pages and they have boosted their pages and stuff. And you would have this person who may not even exist. Let’s say John Smith, who is one of the top VCs ever because he just has 20 pages that say that. And all of a sudden it picks it up. So we do ask people, and we do also check the references ourselves that this is not a fake profile, or this is not a fake company.

Karen: Some of the people that I’ve talked to have mentioned using different LLMs because they are more likely to give you accurate sources, or to provide the direct links to the sources and to be more verifiable. I’ve heard some people say that Perplexity is better for that, for instance, than ChatGPT.

Salman: Yeah. Our principal is a big fan of Perplexity. We are living in a different world if we are using ChatGPT and Google Gemini. But that’s really great that we have different people in the company that use different sources. And if they converge on something, it is more likely to be correct than just one source. So it is always a good idea to have multiple sources. Personally, if I had more time, I would do more than Gemini and ChatGPT. I can’t really check all the different things. Some people would say Anthropic is the way to go. So it’s really good that people use different sources.

Karen: I just did some data analysis on some of the data from these interviews, actually -- the first 80 interviews that were completed. I had a conference presentation a few weeks ago. I started out by using just ChatGPT. And then I said, “Just for fun, let me just compare it to Claude.” And then, “Hmm, these aren’t the same.” I tried NotebookLM, and they weren’t the same. So it’s definitely something that I’m digging into more, to figure out why the answers were so different. Just asking them about someone’s general sentiment about AI, and I got widely varying answers, and I really hadn’t expected that.

Salman: Yeah, my middle brother had a very good observation, a long time ago when I was growing up. I was probably a teenager or maybe even less than that. He always encouraged me to check my work by arriving at the answer by two different methods. And if those two answers are identical, then I have verified that. Let’s say it’s a math problem. If I am able to prove it this way and another way, that means that that proof is more valid than if I just prove it, because I can make a mistake. Same thing with calculations and stuff. And this applies to LLM more than anything else, that you need to have those cross-references and check.

LLMs are very much like asking people, right? I may ask you, Karen, a question and you will give me an answer. And then I ask somebody else, Katrina, the question. And if she gives me the same answer, it’s more likely that both of you are correct. And if you are different, then we need to basically ask somebody else, and see, maybe there is something that somebody is missing there.
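As a rough illustration of the cross-checking idea described above, here is a minimal Python sketch. It is not a tool the speakers mention using. The function name `cross_check`, the source labels, and the `min_agreement` threshold are all hypothetical, and exact string matching after normalization is a deliberate simplification of what "the same answer" would mean in practice.

```python
from collections import Counter


def cross_check(answers, min_agreement=2):
    """Compare answers collected from several independent sources.

    `answers` maps a (hypothetical) source name to the answer it gave.
    Returns (answer, True) when at least `min_agreement` sources gave the
    same normalized answer, otherwise (None, False) to signal that another
    source or a manual check is needed.
    """
    # Normalize so trivial differences in case or spacing do not count as disagreement.
    normalized = [text.strip().lower() for text in answers.values()]
    if not normalized:
        return None, False
    top_answer, count = Counter(normalized).most_common(1)[0]
    if count >= min_agreement:
        return top_answer, True
    return None, False


# Illustrative data only: three made-up sources answering the same question.
example = {"source_a": "Series B", "source_b": " series b", "source_c": "Series C"}
print(cross_check(example))
```

Agreement between sources raises confidence but does not prove correctness, since sources trained on similar data can share the same mistake; that is why a disagreement is treated here as a prompt to ask someone else rather than as a vote to be settled automatically.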

Karen: That’s a good observation. Sounds like you’re making good use of AI for your venture capital work. Are there any situations where you deliberately avoid using AI?

Salman: Before final deployment of funds. I think that that is the time where -- the formal word for that is AI hallucinating. Just like people hallucinate, AI can hallucinate. So that is the time that we really go deep into financial analysis or product analysis and things like that. And any time before making a critical decision.

Karen: You mentioned asking me something and then asking Katrina something, and if we have different answers. In a way, we are each like a large language model. We’re all dependent on our training data and what we’ve been exposed to, and our experiences, and what we’ve seen, and the biases that we picked up from our environments, perhaps without knowing it. Large language models, I think, are very much like that. So one of the questions that comes up frequently is, where do these large language models get the data that they use for training? And how do we know whether or not it was obtained with consent, if it was analyzed in a way that looked out for biases, things like that? So I’m curious what your thoughts are about that, about how they are acquiring their data?

Salman: Yeah. The analogy between a person and an LLM – by LLM, I mean something like ChatGPT – is actually very valid because people, just like ChatGPT or NotebookLM, or any other generative AI, have biases based on what data they have fed for training. So if somebody has grown up in a very racist environment, that’s what their training is, and that’s what their bias is going to be. Or a sexist environment. And that’s the same thing with the training data. Ultimately, it all depends on what training data is fed, whether that training data is accurate, whether it has any biases, how much of that training data is fed. Because you just can’t feed a little bit of training data and expect it to come up with answers.

Ultimately, in order for us to get over those biases, we have to really make sure that the data that we are using to train is free of biases. And that is a problem that most of these companies don’t have much of an incentive to solve. Because already it costs a lot for training. And for them to have an extra check and balance is going to be harder.

So what they do is they usually put guardrails, which basically says, “Okay, these are the things that you shouldn’t be talking about, and these are the things you should be talking about.” So for example, if you have anything related to healthcare, you may have an AI agent that takes your call. But that AI agent would never interpret any test results for you, or answer any medical questions, or tell you anything about what medicines you should be taking and not taking, and things like that. So that is usually left for the doctor to interpret.

So I think the similar aspect is important, regardless of what the application is, to have more accurate and less biased data.

Karen: From the times that you’ve been using the VC GPT for analyzing data, do you get a sense of whether it reflects historical biases, for instance, in who traditionally has gotten venture capital funding in the past, and tending to perpetuate it? From knowing you, I’m guessing that’s something that you’re sensitive to, and that’s why I’m curious if you’ve noticed anything there.

Salman: Yeah, so I haven’t looked for anything there. I’m sensitive to that. And that’s basically what our final manual checks before deployment are. The numbers, for example, on gender bias in the VC industry are just crazy. I don’t know what the latest numbers are, but at one time it was like only 3% of women founders got funded compared to 97% of male founders. Clearly, regardless of how sexist you are, you can’t say that number is where it should be, right? Even if you are sexist out of the hell, you should have better than 3%, at least.

So we look out for bias. And there are also other biases, in terms of the field that these companies are in, where their head office is located. We have a lot of investments in Latin America. A lot of people in America think Latin America is basically what they say in Breaking Bad and Better Call Saul, just all these drug lords. And probably people in Latin America think all Americans are basically drug addicts. I don’t know. So anytime we are kind of removed from something, we judge, we have more biases about that. So we watch out for those kinds of biases, especially when we are getting deep into diligence and doing our analysis manually.

Karen: Good to hear. So another question that comes up with training data is whether the companies that are obtaining the data have gotten the consent of the people. Or if they’re using, for instance, copyrighted materials that they don’t have the right to. Or there’s something that’s posted on a website, but it’s under a certain set of terms and conditions that require the creator to be credited. And in many cases, the consent isn’t happening, the credit isn’t happening; and in most cases, there’s no compensation to the people whose work they’re pulling into the training, and in some cases using to compete with them. One example is in the music area. So I’m wondering what your thoughts are about that. Is it for the greater good for them to be able to take that data with impunity? Or do you feel that consent, credit, and compensation should be the expectation?

Salman: “For the greater good” sounds so imperialistic. It kind of reminds me of Star Wars or something like that. Or even some of the more recent past, where a country would colonize another country “for the greater good” of that country.

So it is definitely not for the greater good. It’s really for profitability. An unfortunate thing about tech startups, at least, is that they have a tendency to seek forgiveness as opposed to permission. And sometimes even when forgiveness is not achieved, they would rather litigate or settle that litigation.

Uber is a classic example. It was banned from picking up and dropping off at the airport. And they told the drivers, “Look, we’ll pay your fines. Just go and pick up and drop off at the airport.”

And YouTube beat out Google Video. Some people may not be old enough to remember that Google Video and YouTube went head-to-head. And who would you expect to win that battle? Google Video, right? But YouTube won and ultimately Google ended up acquiring YouTube. And the reason YouTube won is because Google Video was very conscientious about piracy and YouTube wasn’t. YouTube basically just allowed everything. So that’s where people went to watch pirated movies, et cetera, et cetera.

And that basically gave them enough user traction that Google Video lost out and then Google acquired YouTube. And then it got to a certain point where it had to clamp down on the piracy and stuff. Because smaller companies, nobody really would chase them and sue them. But when you are a multi-trillion-dollar company, like Google is now, they become more defensive. But when you’re a small company that’s a few million dollars’ worth, or a few hundred million dollars’ worth, they usually take chances. And that’s what we are seeing here.

Actually we are seeing that even in bigger companies, like Gemini or Microsoft’s Copilot and stuff, that they’re using data without permission. And they would probably throw people some bone in order to settle.

It’s not ‘greater good’. It’s just so that they can win faster than the others. Because there’s going to be a lot of blood on the streets when the AI battle is done. It has to be a multi-trillion-dollar market in order for it to justify the valuations and the amount of investment that is going into AI.

Karen: The valuations are ridiculous, I think, when you look at the value that’s been delivered. And especially for some of the newer startups, they have no products and haven’t even announced a roadmap, and yet they’ve gotten these tremendous investments. It’s really difficult, I think, for people who aren’t in the venture capital world, and maybe for you as well, to understand and to justify.

Salman: Yeah, so we don’t chase fads. We are not investing in AI these days. We are investing in AI-enabled solutions, but not AI itself. If a company is using AI to solve a problem more efficiently or more accurately, that is when we would invest in a company, but not otherwise.

For example, we have an American company that does a credit rating for developing countries where there are a lot of unbanked customers. So you don’t have FICO and all these scores. And they use AI and data analytics in order to give more accurate predictions. That’s where we would invest. But we won’t invest in OpenAI or anything like that.

OpenAI may be a great investment right now. But it may not be. And I don’t know who the winners and losers are. It reminds me of the dotcom era, which again, a lot of our audience would be too young to remember. But if you look at all the dotcom companies combined, they had predicted the total market share is going to be something like a hundred thousand percent, where it physically can be only a hundred percent. So basically, the market share that everybody had predicted on the average was about a hundred times higher than it ended up being. And there were a few big winners, like Google and so on. But there were also a lot of losers that people lost a lot of money in. So, just because we can’t predict who is going to win and who’s going to lose, we say “We are not playing this game. Somebody else can.”

Karen: Makes sense. And maybe it’s due to our shared background, but I think we have similar ideas about the most useful AI applications not necessarily being generative, and really focusing on solving a very specific problem that is worth solving. And I think there’s two parts to that. One, is it worth solving? And two is, what’s the right technique to solve it with? And that’s where I like to put, at least, not my money on the scale that you are, but where I like to put my energy.

Salman: Exactly. No, it makes sense. And if you are looking for an approximate solution just to get started or just see if a problem is solvable or worth solving, then it makes sense.

But people may have heard of neural nets. The issue with neural nets is that it’s like a black box. It will give you an answer. You would not know whether that answer is correct or not. And even if you find out that that answer is incorrect, you wouldn’t know how to fix the neural net to make it correct, because it is such a mystery. It is very much like a brain being a mystery.

I see people that can’t do simple math. I had one seller turn down an offer for a 45% discount and accept a discount of $50 per hundred dollars, which is 50% discount, a much higher discount. I’m sorry to kind of admit that. It’s probably unethical of me. But it’s even worse that it’s very stupid of people who can’t do simple math. You would never know why the neural net cannot do that simple math. And you wouldn’t even know whether it is incorrect or correct whenever it gives you an answer.

Karen: Yeah. A lot of the guests that I’ve talked to have told me about stories where they’ve tried using the LLMs for math and making timing and making plans, and found that it was quite limited. Your story just reminded me of, if you remember, when they tried to introduce the one third of a pound burger at one of the fast-food chains, and it flopped because people thought 1/3 was less than 1/4 pound.

Salman: Oh, wow. Yeah, exactly. So that’s the kind of math challenges that people do, because four is greater than three.

Karen: What you got into with the neural nets is a point on explainability, and that’s one thing that I think is in favor of simpler models. Because it may not be a fancy neural network, but if a simpler model can be understood, like a decision tree, you know how it came to the recommendation that it did, you can see it, you can adjust it, and it runs quickly. It doesn’t consume a lot of CPUs, doesn’t create a lot of CO2 emissions. In many cases, that can be the better solution. And it can be much more efficient to run on an embedded system, for instance. It’s the point of looking at how you choose the right solution, as opposed to running around with your AI tool, looking for a nail to hammer in with it.

Salman: And if I recall, when we first met, you were working on embedded systems, so you would know a thing or two about that!

Karen: I was, yep. That was my role there. That was a fun job.

Salman: Yeah. You mentioned the word explainability. I remember in the 2010s, there was a big push for Explainable AI. In fact, XAI at that time was not a company, it was ‘explainable AI’. But I see that has taken a back seat after the generative AI popularity. And right now, no, not many people are talking about explainable AI or why an answer should be what it is.

Karen: There’s some movement towards trying to introduce reasoning or what they call chain of thought, and trying to have a way for the models to explain how they came up with the answers they did. The question always is, how accurate is that?

Salman: Yeah. Or it’s like asking a five-year-old to explain why they want candy!

Karen: That’s a good analogy, thanks! So with the tools that you’ve used, for instance, your VC GPT, do you have a sense that the people who built VC GPT have been transparent with you about what data was used for training it, where that data came from? And as you mentioned, whether or not any biases have been considered?

Salman: I haven’t really looked at all the user agreements and stuff, but maybe they have. But it’s not something that I see much transparency on. But I do see transparency when it gives the links of its data sources. So that’s where I think the transparency comes in.

The bigger question is that I would like some transparency on what data sources it hasn’t considered. For example, PitchBook or Crunchbase as some data sources. I would love to know really what, of that, it has considered. And the reason I’m saying that I would love to know is because it may be that it has considered it, and it doesn’t want to report it because it hasn’t legally gotten permission to do that. I would just like to know, so that I don’t have to duplicate my work across different platforms and stuff.

Karen: Yeah, I was doing some investigations last year into different companies that were working in generative AI for music, and I ended up using PitchBook quite a bit to try to figure out where they were based, and who was backing them, and what their competitor situation was, and things like that. So I found it very useful, but not with a paid license. I only went as far as I could with the free account, and that was it.

Salman: Yeah. PitchBook itself is an issue because it sometimes has outdated information and sometimes just has plain incorrect information, because it relies a lot on self-reporting. Sometimes people can trick the system, or PitchBook doesn’t take good precautions. If you look me up, for example, in PitchBook, you wouldn’t find a lot of information about me because I have invested through different company names. And so I don’t have a name. The company XYZ would have a name, so you wouldn’t be able to trace that back to me.

So you cannot sometimes tell which one of these companies has Karen Smiley invested in. But you can probably tell ABC Inc. or DEF corporation, you will have that. So there’s a lot of indirection in PitchBook that makes it a little bit difficult to keep track of as well.

Karen: That’s one of the questions that come up when you think about, what data sources did they use? For instance, there’s been talk about using Reddit. And then once Reddit became known as a source that these tools were using, there was the incident about, if you remember, the recommendation to put glue on pizza to hold the cheese on. That was a sarcastic comment, but it went viral. And then people started piling onto that Reddit thread trying to get other unrelated knowledge into the large language models. So then the question of the reliability of the source, knowing where it came from, would be certainly valuable to someone who’s trying to assess the quality of the work. But there’s sources that are directly cited, and then there are just the more general sources that were used. For instance, as you mentioned, what percentage of founders are male versus female? And then, is there a way to un-bias that?

Salman: Yeah, no. So definitely that is a problem with AI in general. In fact, even before AI became popular, I was using some data analytics to do sentiment analysis for Toyota Motor Europe. And one of the tweets that somebody made is, “I hate my ex-wife because she stole my Toyota truck.” Our system interpreted that as a negative sentiment toward the Toyota truck, not the ex-wife. And this is basically where a lot of AI can go wrong. And we had to give it feedback, which is sometimes called supervised learning, where we said, “Okay, this is actually a sentiment toward the ex-wife”. And we in fact hard-coded it so anytime ex-wife is mentioned, or ex-spouse or ex of any kind is mentioned, all your sentiment is directed toward the ex, not toward any of the cars that Toyota or anybody else is manufacturing. Because when you have the sentiment level of an ex versus anything else in the same sentence, the ex is going to be your dominant thing.
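The hard-coded rule described above might look something like the following Python sketch. This is an assumption-laden illustration, not the system built for Toyota Motor Europe: the function name `sentiment_target`, the list of "ex" terms, and the idea that a sentiment label arrives from some upstream classifier are all hypothetical.

```python
import re

# Hypothetical pattern for mentions of an ex (ex-wife, ex husband, "my ex", ...).
EX_PATTERN = re.compile(
    r"\bex(?:-|\s)?(?:wife|husband|spouse|girlfriend|boyfriend|partner)\b|\bmy ex\b",
    re.IGNORECASE,
)


def sentiment_target(text, brand, label):
    """Decide what a sentiment label should be attributed to.

    `label` is assumed to come from an upstream sentiment classifier.
    If an ex is mentioned anywhere in the text, the override rule attributes
    the sentiment to the ex rather than to the brand.
    """
    if EX_PATTERN.search(text):
        return ("ex", label)
    if brand.lower() in text.lower():
        return (brand, label)
    return (None, label)


tweet = "I hate my ex-wife because she stole my Toyota truck."
print(sentiment_target(tweet, "Toyota", "negative"))
```

A blanket override like this trades recall for precision: it will also redirect sentiment in the rarer tweets where the author really is complaining about the truck, which is the cost of a simple rule that can be read and adjusted by hand.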

Karen: Yeah. Yeah, in that particular case, he was probably very fond of his Toyota truck, right?

Salman: Exactly. If he weren’t fond of his Toyota truck, he would be actually happy. “I love that my ex-wife stole my Toyota truck instead of my Nissan truck.”

Karen: So we think about data being used in data sources that are pulled into large language models. One question that comes up for people is concern about data privacy of their own information, whether it’s public information that they posted on websites, or private information from when they enrolled with, say, Netflix and put in some information, and that information then being sold off to data brokers and used for other purposes. Or just whether some personal information has been sold. I’m wondering if you know of any cases where your information has been used by an AI-based tool or system?

Salman: Yeah, so this is even before GenAI and the AI revolution. There are all these calling lists and email lists and stuff like that. And I still get a lot of spam, both on my phone and email that’s related to that. And that problem basically has just become more pervasive as time has gone by, as tools and all these data brokers, a lot of them nefarious, would get personal information. Sometimes that information is inaccurate. In fact, I know of people who have gotten background checks and lost job offers because of incorrect information from data brokers like Spokeo and things like that.

So part of the reason data brokers have a lot of incorrect information is because it’s very difficult to identify people. So for example, if you just look at a name like Salman Azhar, which is pretty unusual in the US, I know of at least three other Salman Azhars. And how do you figure out who is who in that?

In fact, I have a very interesting kind of situation. I was flying back into the US and I was stopped by the Customs and Border Control because they thought I was a deadbeat father. And the reason they thought I was a deadbeat father is because there was a Salman Azhar in the same city, San Jose, who was a deadbeat. And they could not believe that there can be two Salman Azhars in San Jose. And that’s the same challenge that data brokers have.

I, by the way, knew exactly what was going on, because a few years ago, probably about five years ago, my wife received a call from a lady who claimed to be the mother of my daughter that my wife didn’t know existed. I was about to board a plane from Indianapolis back to San Jose. And she basically said, “Oh, a lady called and she claimed to be the mother of your daughter.” I said, “Well, I don’t know, that can’t be true.” And she said, “Oh, we’ll talk about it when you come.” This was before you could make calls and text messages in the planes and stuff. So I had the most difficult 4-hour flight, on an airline that no longer exists, ATA, to San Jose.

Finally we decompressed and there were so many similarities. This guy was a guest lecturer at Stanford at that time. I was a guest lecturer at Stanford. He had a PhD. I had a PhD. And then his PhD was in biochemistry and mine was in computer science, which is basically where things differed. And so when I was stopped by Customs and Border Control, basically, I said, “Well, I know what you’re talking about. That guy’s PhD is in biochemistry. Mine is in computer science. Here’s my laptop. You’ll see that there’s no biochemistry here.” Fortunately there was somebody there that was smart enough to know the difference between biochemistry and computer science. Otherwise, I probably would have been held up with charges until a lawyer got to me.

So the same thing, by the way, happens to data brokers. In fact, it’s even more common, because they cannot easily identify even a single person. I’m sure you know of other Karen Smileys. How would they differentiate without a Social Security Number, which is basically a unique identifier? If you have the address, or some numbers of the address, people can differentiate. So if you take my name or your name and add to it the street address, people can start doing uniqueness. But without that additional information, just by name, you can’t even place who it is. The city is not sufficient. Zip code is not sufficient for this.
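
The point about name and city not being enough, while name plus street address starts to separate people, can be shown with a short Python sketch. The records, field names, and the `group_records` helper are made up for illustration; real entity resolution is considerably more involved.

```python
from collections import defaultdict


def group_records(records, key_fields):
    """Group record ids by a normalized key built from `key_fields`."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field].strip().lower() for field in key_fields)
        groups[key].append(record["id"])
    return dict(groups)


# Hypothetical records: two different people share a name and a city.
records = [
    {"id": 1, "name": "Alex Doe", "city": "San Jose", "street": "12 Oak St"},
    {"id": 2, "name": "Alex Doe", "city": "San Jose", "street": "98 Elm Ave"},
    {"id": 3, "name": "Alex Doe ", "city": "san jose", "street": "12 Oak St"},
]

by_name_city = group_records(records, ["name", "city"])
by_name_street = group_records(records, ["name", "street"])

print(len(by_name_city))    # name + city: everyone collapses into one group
print(len(by_name_street))  # name + street: the two people are separated
```

With only name and city as the key, all three records look like one person. Adding the street address separates record 2 from records 1 and 3.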

Karen: Yeah, and that’s all definitely true, that this problem with data quality began years and years ago, and with data being scraped and stolen and repurposed beyond its original intentions. That’s all been true for a long time. The one thing that I think makes it different, and maybe a little more serious, with AI is that once that data gets pulled into a large language model, if it’s wrong, there’s essentially no way to get it back out, even if you get it corrected at the source. The large language model that scraped it is not going to get corrected. And so that information is now going to be stuck wrong, forever.

Salman: Yeah, it’s actually very much like a hot mic error. That once you are recorded and that is transmitted, and especially with these viral things these days, like that glue on pizza, for example, people would hit it over and over again, and the more they hit it, the more they reinforce it, in fact. And that makes it even harder to get over it.

Just like large language models, the challenge is that there’s no single verified data source. Because when you buy data from X, Y, Z, you would make copies of that data. So even if X, Y, Z, your data source, is corrected, the copies of that data would not be corrected.

Karen: Do you know of any companies that you gave your information to that told you that they would be using it for training an AI or a machine learning system? Either professionally or in your personal life?

Salman: I don’t remember any of that. I don’t think any company does that, right? There are all these data privacy agreements and stuff, under which they take reasonable care. But then, there are hackers that would break into the system. But I don’t think any company asked me, “Can we use your information to train our LLM?” I think they just do it and then seek forgiveness or settlement later.

Karen: Yeah. LinkedIn is one good example of a company that chose to simply opt us in without our permission, retroactively. The only people who got a chance to opt out from that were those covered by GDPR in Europe, and those of us in the US and other countries did not get a choice. So now we have a way to opt out, but it’s only for the data we create from now on. And they’re definitely using it for training models – not necessarily a foundation model, although Microsoft owns them, so who knows?

Salman: Definitely, yeah. You don’t know what the real story is. And we get into all these corporate coverups where a lot of people are blowing the whistle. And still nobody listens to them, and they keep on going. I remember that Wells Fargo incident, I think about 10, 20 years ago, where they were creating accounts without people’s permission in order to get bonuses and stuff. There were several whistleblowers that said “This is wrong”. And the executives knew about it and still kept on doing it, because that can inflate their stock price. And when the whole story came out, the whistleblowers’ information also came out that all these people were just ignoring all the ethics that was around it, or even the financial losses for their customers.

Karen: You mentioned that you had had some spam phone calls. A lot of us get those. I know I get them on a daily basis, and thank goodness my phone uses some AI to identify them as probable spam, and keeps them from bothering my phone. But I’m wondering if any of these uses of your data, with or without your consent, have ever caused any problems for you? Some very smart people I know have been phished. You know, they got the call at 3:00 AM when they’re traveling, telling them there’s something wrong with their account, and they try to fix it. Has it ever caused any problems for you or anyone in your family?

Salman: People have asked for money in my name, by either faking my email address, by changing a little bit the email, changing the domain name and keeping the user ID the same.

Karen: Wow.

Salman: But I haven’t had any issues, anything other than that. I get messages sometimes from a friend, “Can you buy me this Amazon gift card for this or that and I’ll reimburse you or pay you back later?” And so on. And my thing is always to call the person. And 99% of the time it’s fake. There are interesting things, by the way, that go on in telecommunications as well. You can spoof the caller ID. So I can call from my phone, but it would appear as if I’m calling from your phone. And when you are texting it is worse, because you can’t recognize. So I can actually text ‘from’ Karen Smiley, and get money by Western Union or Amazon gift card, and just keep that money.

There was even somebody that had a phone number that was similar to mine on WhatsApp with my profile picture that was trying to scam one of my friends. And my friend said, “Oh, you have a second phone number?” And he messaged me, and I said no. So “Okay, I’ll report them.” They even had my profile picture and stuff and just one digit off the phone number. So the unsuspecting person doesn’t even notice that.

Karen: Yeah, that’s interesting to hear about that. I haven’t had the experience of being on the other side of that. But one thing I have talked about with my family is, as deep fakes become more common, to set up a family code word. It’s one of the things that I’ve been recommending to people. So that if they get what sounds like even a phone call, not just a text message, but a phone call that sounds like it’s someone in my family. Or telling my mother, “Look, if you get a phone call asking you for money because I’m in trouble, it’s not me.”

Salman: That code word is actually a good idea. There used to be a scam some time ago about people asking you for your PIN of your ATM. I don’t know if you ever got a call like that, “Oh, can we have the PIN of your ATM to verify it’s you?” And this is before the chips came along, because the stripe could be duplicated. And so I used to tell them that my PIN is the last four digits of PI. And they didn’t know what I was talking about. Go figure! I’m happy to give you my PIN.

Karen: Oh, that’s hilarious. I love that. I might have to steal that.

Salman: They didn’t know what I was talking about. And by the way, just a word of caution. These kinds of things, you should just hang up these days. I used to have fun with them. But we have reached a stage of deep fakes and stuff where you shouldn’t even say something. I’ve heard of questions like, somebody would ask, “Is this Salman?” And I would say yes. And they would then superimpose my voice with something else. And I probably shouldn’t have said yes right here in your show! But I think people can now even, with deep fakes, imitate somebody’s voice. So there goes that defense.

Karen: Yeah. I looked into voice cloning tools last year, when I was looking into the ethics of AI for music. And there were a handful – literally less than five – that were ethical, and they make sure that you can only clone your own voice. And they make you read something dynamic and make sure that it’s you that’s recording it, and you’re not taking someone else’s recording. But there’s dozens of unethical tools out there that will let you clone from anything, and use a celebrity’s voice, and just all sorts of unethical behavior. And it’s really rampant out there. It’s definitely something to be aware of. So it’s deep fakes for audio, and now even somewhat for video, the video tools are getting better. There’s so much good that we could be doing with these kinds of tools. And they get exploited for these unethical purposes too. I don’t know that we’re going to ever be able to do anything about that duality.

Salman: Yeah. Yeah. So sometimes I just want to retire and live in the hills where nobody bothers me.

Karen: Yep. It’s a thought. All right. So, last question, and then we can talk about anything else that you want. We see public distrust of AI and tech companies growing, partly because we’re finding out what they’re doing with their data that they didn’t tell us they were doing, and things like that. But what is the one thing that you think that these companies should be doing to try to earn and then to keep our trust, if you think that’s possible?

Salman: I think they can try. I think the psychology of that is a different question altogether. But what I would love to hear is how accurate they expect their answer to be. And if I can get some kind of a validation that “Hey, the answer is yes, with 80% probability”, I can make my decisions a lot better than just having an answer “yes.”

And this is probably a more analytical answer than most people would give. But that is my biggest issue right now, that I do not know what is the probability that AI is giving me a correct answer. And I would be a lot more trusting of AI if I can get some idea of what the level of confidence is.

Karen: So you want a confidence interval. You don’t want $100, you want 80 to 110.

Salman: Yeah, exactly. Instead of saying, how much was this going to cost? I would rather that it says “It’s going to cost $80 to $220 with a 95% probability.” And now I have a range that I can work with. Or it can even say that, “It will cost a hundred dollars, and I may be off by $20.”

Karen: That certainly would help. They’ve designed the models to always sound confident. And there was that study that said that 47% of the time, the models say they are confident even when they are wrong. Just dead wrong.

Salman: That’s another level, right? They can tell me that this is 80% confidence when it’s actually 20% confidence, it’s the opposite. That wouldn’t inspire much trust or confidence.
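
One common way to check whether a stated confidence deserves trust is to compare it with how often those answers turned out to be right. The Python sketch below assumes a made-up log of past answers; the `calibration_table` helper and the data are hypothetical and are not drawn from any study mentioned here.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Map each stated confidence to the observed fraction of correct answers.

    `predictions` is an iterable of (stated_confidence, was_correct) pairs.
    """
    buckets = defaultdict(list)
    for confidence, was_correct in predictions:
        buckets[confidence].append(was_correct)
    return {c: sum(outcomes) / len(outcomes) for c, outcomes in sorted(buckets.items())}


# Hypothetical history: five answers labeled "80% confident", two labeled "50%".
history = [
    (0.8, True), (0.8, False), (0.8, False), (0.8, False), (0.8, False),
    (0.5, True), (0.5, False),
]

table = calibration_table(history)
for stated, observed in table.items():
    print(f"stated {stated:.0%} -> observed {observed:.0%}")
```

In this invented history, the answers labeled 80% confident were right only one time in five, which is the mismatch Salman describes. A well-calibrated tool would show observed rates close to the stated ones.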

Karen: Yeah. And then, the question that always comes to my mind when we talk about these answers and how accurate they are, is how do we validate them? Like, how do we go back and know whether or not the predictions that we got – for instance, they predicted that this company was going to go public within 18 to 24 months – how often do those predictions turn out to be accurate, compared to the predictions that you would make on your own?

Salman: I think it’s a time versus accuracy issue as well, right? The answers that we are getting from AI, we can probably get more accurate answers, but it may take us 100 times as long, and people are not willing to wait that long.

In fact, if I really think about our strategy as a VC company, we are using it at the top of the funnel, or top of our decision-making process, where we are okay if we don’t invest in an amazing company, and we lose out on it because AI gave us the wrong answer.

Karen: That makes sense. Well, those are my standard questions. Is there anything else that you’d like to share with our audience? Or any other thoughts on AI that we haven’t touched on, or any questions I didn’t ask you that you would’ve liked me to ask you?

Salman: I think you covered it fairly well. My parting thought would be to use AI but use it with caution. You do not know when AI is going to lead you astray. A good friend may actually lead you astray because they don’t know. Especially a good, arrogant friend who wouldn’t tell you, “Oh, I’m not quite sure, but the answer is 100.” And that’s where you need to be careful.

AI is great for you to have a conversation. Try to develop scripts that are standard for things that you do repeatedly. We have a script that helps us do diligence on companies. That way you can compare the same questions of one company to the other, at least to fix one aspect of your investigation, so that you can get consistent results. And continue to validate your decisions using a more traditional research and decision-making process, before you execute your decision.

Karen: Yep. That sounds like great advice. Thank you so much. I appreciate you joining me for the interview!

Salman: Thank you for having me.

Interview References and Links

