📜 AISW #026: Prateeti Mohapatra, India-based research manager (AI, Software, & Wetware interview)
An interview with India-based manager and senior research engineer Prateeti Mohapatra on her stories of using AI and how she feels about how AI is using people's data and content
Introduction
This post is part of our 6P interview series on “AI, Software, & Wetware”. Our guests share their experiences with using AI, and how they feel about AI using their data and content.
Note: In this article series, “AI” means artificial intelligence and spans classical statistical methods, data analytics, machine learning, generative AI, and other non-generative AI. See this Glossary and “AI Fundamentals #01: What is Artificial Intelligence?” for reference.
📜Interview - Prateeti Mohapatra
I’m delighted to welcome Prateeti Mohapatra from India as our next guest for “AI, Software, & Wetware”. Prateeti, thank you so much for joining me today! Please tell us about yourself, who you are, and what you do.
I'm Prateeti Mohapatra, a senior research engineer and manager at IBM Research Lab, Bangalore, India, working with a team of researchers in AI and operations, observability, and IT cognitive support. I also lead the Foundation Model for Automation and Networking workstream globally. My current focus is on bringing AI solutions to IT operations, providing more insights with reactive, predictive, and proactive capabilities across the application life cycle. I've been with IBM for close to nine years now and started my journey here in September 2015.
Very good. Thanks for sharing that background, Prateeti. So tell us about your level of experience with AI and machine learning and analytics, whether you've used it professionally or personally or studied the technology.
I have been studying this technology for a long time - for example, during my graduate days when I was working in speech processing - a very fascinating and difficult domain.
And then I've been applying AI in multiple organizations that I've been involved with. When I was with ABB Corporate Research, the focus was on understanding the requirements documents and trying to apply AI to understand how people work globally, how collaborations can work, and how to define metrics. And then I slowly moved into natural language processing when I was in Chicago. So it has been kind of a continuous journey of applied AI in different domains - a lot of learning throughout the years of my career.
I'm currently at IBM Research, working within IBM's strategy around AI and hybrid cloud. So we have a lot of AI use cases, AI applications, AI model building, and a lot of GenAI capabilities. We have a big play around what we call Watsonx models1, and they're for different kinds of tasks.
With my current role in AI for IT automation, AI in operations, and AI for tech support, there is a need to help operations engineers understand why an issue is happening. That means understanding the data coming from different microservice applications and summarizing it for ease of use by an end user, who might be a reliability engineer, a DevOps engineer, or a developer.
Then we generate recommendations for what can be done to help mitigate the issues, proactively or reactively, and then automate it. Automation can be code generation or script generation in different languages.
So yes, I am learning new AI technologies and applying them in my professional life.
AI for operations sounds similar to products my team at Wind River was building for cloud monitoring - one was called “Platform Health” - so that all makes sense to me.
How about in your personal life? AI is very hard to avoid nowadays. It's everywhere in web agents and email clients and phones and everything. What are your experiences with that?
There are both pros and cons. When I binge watch shows on streaming platforms or browse YouTube, I see both the pros and the cons of AI. I get recommendations that are close to my interests, probably based on my browsing history. The recommendations are really good.
My 3-year-old son was randomly browsing YouTube the first time he got onto a laptop. He stumbled upon content about monsters and suddenly became very fascinated. I have tried different filtering mechanisms, for example, age ranges, but they don't work. YouTube is not on our machines anymore.
On a personal level, AI tools can be very helpful. There are very good tools like Grammarly. Even Gmail recommendations are good. I also use and rely on Google a lot. These tools are convenient and easy to use.
In my day-to-day life, I use these tools to find relevant articles, related work, or where the best possible food joint is according to my liking. I'm very much dependent on these things nowadays.
So in the web browsers, are you still using the non-genAI search functions, or are you trying the new genAI? Like in Bing, it's Copilot, or in Google, it's Gemini.
And the reason I'm asking is that a lot of people started out using the AI overviews and generative AI in their browsers by default, but then switched back to the relatively old-fashioned search, because they find that genAI makes stuff up - like a restaurant that doesn't exist, in your case.
I have not used Gen AI for search. The search engines have gotten better with time if you give the right set of keywords. I've not felt the need to go into GenAI for search yet.
I do use ChatGPT. I got this idea from a friend and colleague of mine. Sometimes I have to make up stories for my kids. I've used the ChatGPT features to give me a nice kid's story. When my daughter says, “Can you tell me a story about a river, king/queen, jungle, tiger, etc.?” - it's very hard to come up with a story with all those aspects. I do see the power of these generative models, which can create a nice story around them. And you can also instruct ChatGPT to generate age-appropriate stories.
Creating the story for your daughter is a good example of using AI at home!
Can you share a specific story on how you've used AI and ML at work? And I'd like to hear your thoughts about the AI features of those tools and how well they work for you or didn't, and what went well and what didn't go so well.
In my current role, the focus is on helping reliability engineers and support engineers quickly and proactively identify probable issues. For example, you might have an application hosted on the cloud, with tools monitoring the application and its microservices. AI capabilities in these tools can also predict and forecast when something might go wrong.
When an outage or failure has occurred, how can a model help quickly remediate it? Even before remediating, how can a model understand and do a quick diagnosis to help these engineers get those services up and running?
The data modalities are also very complicated. There is a lot of complexity in these data modalities, where we have structured metrics. The application might have numerous microservices, and each microservice emits metrics. Each microservice can have logs, which can be structured, unstructured, or semi-structured. There will be runtime information like traces, providing information on which service is calling which service. And then one may have other information like application or infra topology.
So, there are different data modalities that one has to attend to. We have graphs. We have structured, semi-structured, and unstructured data. And then we also leverage historical information. If a similar kind of incident has happened, one can look at past tickets, knowledge articles, or technical manuals to understand how to help mitigate the situation. And then finally, we have automation that involves code and scripts.
So there is a mixture of language and non-language data modalities. Understanding and representing all of these modalities to do intelligent remediation, intelligent forecasting or prediction, detecting outliers, detecting anomalies, and helping to diagnose where the fault might be or the root cause might be is what we try to do in the AIOps2 domain.
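(Readers: as a rough, hypothetical illustration of the metrics side of what Prateeti describes - not IBM's actual approach - here is a minimal sketch of flagging anomalies in a single microservice metric with a rolling z-score. The metric values, window size, and threshold are all made up.)

```python
from collections import deque
from statistics import mean, stdev

def detect_metric_anomalies(values, window=30, z_threshold=3.0):
    """Flag points in a metric stream (e.g. per-minute request latency)
    that deviate strongly from the recent rolling baseline."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) >= 10:  # need enough history for a stable baseline
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                anomalies.append((i, value))
        history.append(value)
    return anomalies

# Example: a latency series (ms) with a sudden spike at the end
latencies = [102, 98, 101, 99, 103, 100, 97, 102, 99, 101, 100, 350]
print(detect_metric_anomalies(latencies, window=10))  # -> [(11, 350)]
```

Real AIOps pipelines combine many such signals (metrics, logs, traces, topology) rather than a single series, but the same idea of comparing against a learned baseline applies.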
We're leveraging a lot of IBM Research-built technologies - for example, Watsonx code and language models.
What kind of challenges do we face? One is that we need good infra support - the power of GPUs, especially for using decoder models. The other problem we have is incorporating IT domain knowledge. The vocabulary can be very different, so the model needs to understand that language, the IT language. How do we incorporate that domain knowledge? That is a challenge we face.
One of the biggest challenges we face is that for each of the steps I mentioned - anomaly detection, fault localization, action remediation, automation, code generation - there is a lack of ground truth data. So we have to start simulating a real environment. And that takes time, because we have to simulate the whole setup and then see whether whatever we are proposing, or whatever our tools are doing at every step, is correct.
Then this joint representation of bringing all of the different kinds of data modalities together is a challenge.
The final challenge we face is not just in using LLMs, but in understanding how to interpret them. When an LLM generates an answer that is recommended to the end user, having clear explainability helps.
That's a really good explanation. And you had mentioned explainability earlier, and that's certainly super important. So thank you for sharing that.
On the other side, are there situations where you've avoided using AI-based tools? You said you don't really use them for searching, and you stay with the conventional searches, which I understand. Are there other situations where you choose not to use AI to solve a problem?
In work, yes - if there are GPU considerations, or maybe the infra support is not there, or cost. Every call to an LLM will have a cost. So these are some things that we have to consider. Using smaller models is another consideration.
When it comes to deploying these AI models, the challenge becomes how to do it efficiently. In AIOps, where data is streaming continuously, it is really hard to use an LLM to process every data point. In such situations, we try to bring in intelligent ways of processing and optimization.
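(Readers: as one hedged illustration of the kind of optimization Prateeti mentions - not her team's actual pipeline - the sketch below gates expensive LLM calls behind a cheap pre-filter, so only novel, anomalous events from the stream ever reach the model. All function names here are hypothetical.)

```python
import hashlib

SEEN_SIGNATURES = set()

def should_call_llm(log_line: str, is_anomalous: bool) -> bool:
    """Cheap pre-filter: only send a log line to the (costly) LLM if it is
    flagged as anomalous AND an identical line hasn't already been processed."""
    if not is_anomalous:
        return False
    signature = hashlib.sha256(log_line.encode()).hexdigest()
    if signature in SEEN_SIGNATURES:
        return False  # duplicate of something already summarized
    SEEN_SIGNATURES.add(signature)
    return True

def process_stream(events, summarize_with_llm):
    """events: iterable of (log_line, is_anomalous) pairs.
    summarize_with_llm: any callable wrapping the model, passed in so the
    sketch stays model-agnostic."""
    for log_line, is_anomalous in events:
        if should_call_llm(log_line, is_anomalous):
            print(summarize_with_llm(log_line))  # one LLM call per novel anomaly

# Example usage with a stub in place of a real model call:
events = [
    ("OOMKilled in pod checkout-7f9c", True),
    ("OOMKilled in pod checkout-7f9c", True),  # duplicate, skipped
    ("Request served in 12 ms", False),        # not anomalous, skipped
]
process_stream(events, summarize_with_llm=lambda line: f"SUMMARY: {line}")
```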
That’s a great explanation - thank you, Prateeti.
How do you feel about companies that use data and content for training their AI and ML systems and tools? Specifically, I'm wondering if you feel that ethical AI tool companies should be required to get consent from the people whose data they want to use for training, and compensate them for it.
Yes, I believe they should go through the legal approvals and follow a very diligent data preprocessing pipeline. Data privacy is very important, especially when it involves personal information (PI) data. Many countries have regulations like GDPR, and it’s important to follow these rules. Transparent and ethical practices are needed when using data.
We were just talking about this before the call - somebody can steal your identity. These models are so powerful now that people don't want to show their faces, or let anybody know how they sound, because anyone can just steal that.
And be more transparent. Be more transparent about what data your models have been trained on.
As someone who's using AI based tools, you mentioned that some companies are sharing the sources of their data and what they trained on. Do you notice tool providers being transparent about sharing this, and whether or not the original creators of that data consented to its use, whether it's legally and ethically licensed?
Yeah, I think so. At IBM, I see a very streamlined process of data acquisition and legal data prep. There is a very well-defined process for how we should do it, and there is a big data curation and acquisition team that goes through legal and the whole process.
And many companies are doing it. There are probably many who have had issues, and I don't want to name them. But I think many companies are doing it, and the rest should be doing it.
Agreed. As members of the public, there are cases where our personal data and content may have been, or has been, used by an AI-based system. You mentioned social media.
There was a big fuss about Facebook and Meta about a month or so ago: opt-out requests from people who weren't covered by GDPR were ignored.
Then, a few weeks ago, LinkedIn quietly slipped in settings for training with our data and content for their generative AI content creation systems. And it's ON by default. You have to actually go find your data privacy settings and opt out two different ways. A lot of people (including me) were not too happy about that.3
And the latest is X/Twitter - they’ve announced that starting Nov. 15, they’re planning to start using people’s content, unless we opt out.
(Readers: see End Note 4 for a link from Ravit Dotan on how to opt out of this.)

Are there other cases that you know of where your information has been used or misused by an AI-based tool or system?
I don't know, but my social media data is open. As you said, some companies have violated these GDPR rules.
Do you know of any companies that you gave it to that actually said, hey, we're going to use it? You mentioned Grammarly, for instance. Are you aware of companies that are being upfront about the fact that we're going to use your data for training our machine learning systems, or has it come as a surprise?
I'm sure many companies might be using it, and they probably have notified me. Maybe I've not paid attention. I don't remember such a case, but I may have accepted something without reading it. With so much content, it’s easy to just click accept.
Yeah. Terms and conditions are tough, especially when they're 20 pages long and tiny print, and it's a lot of reading. And sometimes it just says "We're going to use your data for improving the product". Well, that tells us nothing about what they're going to do with AI, or if they're going to sell or share the data.
You mentioned concern about identity theft. Has a company's use of your data ever created any privacy issues, or do you get phishing messages, or has it caused you any trouble?
I mean, the random phone calls. I get random phone calls from real estate agents. A recent trend is that a call comes in from what looks like a genuine number, with an automated message that also sounds very genuine. Then they ask you to press a number to get more information, and that's when everything starts going wrong. This has happened to me a couple of times, and I've stopped pressing any number on my phone, because it gets routed to I-don't-know-who, and 2-3 minutes into the conversation you realize something is not right. This has been happening a lot recently.
Another incident is when you suddenly get your friend's Facebook messages. Your friend's Facebook account is hacked, and then you get a message that, “Oh, I'm in dire need of money”. And people have sent money for some things.
Keeping any personal digital information online is getting scarier day by day. You don't want these digital footprints of your personal life all over the web. It's a real concern, and it is scary.
Yes. So my last question is about public distrust of AI and tech companies, and what do you think is the most important thing that AI companies need to do to earn and keep your trust?
Explainable AI and being responsible are essential requirements. That is how people will trust and adopt these AI models in different domains. Fairness in AI and safe AI would bring more trust, and people would be less reluctant to deploy your model in their environment.
Open sourcing is another way. A lot of companies are now open sourcing their models, their code, and their data. That brings a lot of transparency and more community building, and people are more aware of what is going on. They also feel a little bit empowered that they can contribute back, especially if they are into model development.
Yes, all of the talk about open sourcing is interesting. Some of the big AI companies are claiming they’re open source, but they’re not. They don’t share code or data, and at most they share the model weights.5 Others are legitimately engaging with the open source community and contributing back, which is great to see.
Can you talk a little bit more about fairness? I don't think we've gone too much into that, but we all know that AI has biases, based on the data that was used for it, and the biases of the people who trained it and developed it. What would a company need to do differently than what they're doing now?
The training data used to train models must be fair. I remember giving a talk on fairness and bias: when we searched for images with the keyword “carpenter”, the results showed images where the carpenter was male. Similarly, for the keyword “nurse”, the majority of the images were of women.
AI models can also be very biased - gender bias, racial bias. Another example: given a set of wedding attire images, one had to describe each image using keywords. People from regions that did not know that style of wedding attire could not describe the image at all. So even humans have biases, and those biases can easily be transferred to AI models.
A study done a few years back at IBM Research examined stereotypes and bias in the Hindi film industry (Bollywood). The analysis included movie plots and posters from films released since 1970, and the study concluded that gender bias and stereotypes were widespread in these films.
Readers, here’s a link Prateeti shared to that Bollywood poster bias study: “Analyze, Detect and Remove Gender Stereotyping from Bollywood Movies”.
It is important to ensure that the data used to train AI is free from bias and that fairness checks are properly done.
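(Readers: a fairness check can start with simple counting. Here is a minimal, hypothetical sketch - not an IBM method - of measuring how a sensitive attribute is distributed within each label of a training set, echoing the carpenter/nurse example above.)

```python
from collections import Counter

def representation_by_label(dataset, label_key, attribute_key):
    """Count how a sensitive attribute (e.g. perceived gender in image
    metadata) is distributed within each label, to spot skew before training."""
    counts = {}
    for record in dataset:
        label = record[label_key]
        counts.setdefault(label, Counter())[record[attribute_key]] += 1
    return counts

# Hypothetical image metadata echoing the "carpenter" / "nurse" example
data = [
    {"label": "carpenter", "gender": "male"},
    {"label": "carpenter", "gender": "male"},
    {"label": "carpenter", "gender": "female"},
    {"label": "nurse", "gender": "female"},
    {"label": "nurse", "gender": "female"},
]
print(representation_by_label(data, "label", "gender"))
# {'carpenter': Counter({'male': 2, 'female': 1}), 'nurse': Counter({'female': 2})}
```

A skewed count like this doesn't automatically make a dataset unusable, but it flags where rebalancing or targeted review is needed before training.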
Yes - the wedding attire and the Bollywood study are great examples of human bias which could be propagated into an AI system if it were trained on those images, scripts, and posters.
Is there anything else you'd like to share with our audience?
One thing we should talk about is risks. For example, in the IT automation space, AI models can generate automation scripts, but we should understand the risk of automating with such a script. What are the risks involved? Risks can take various forms. The form I am talking about is: once a response is generated and you act upon it, what are the potential risks or impacts? Should we assess this beforehand? For example, if a service is faulty and the recommendation is to delete the pod where the service is hosted, will this action be effective, or could it introduce additional faults?
There are other types of risks, like information risks. One issue we talk about is hallucinations, where genAI models generate content that is not factually correct. There are also risks of abusive and hateful language.
These are risks we must be careful about, and it’s important not to blindly trust AI-generated content. Hence, there is a requirement for a human in the loop - for validation, and for giving the models feedback.
Good points! You know, I just read something earlier this week about how there's some news about having an LLM ask the human questions, rather than the human asking the LLM questions. Some people think that could spiral in a strange way.
Also, if people answer that LLM’s questions, they're potentially giving it some deeply personal information. Often, people aren't thinking about the fact that their prompts and answers will be used and reused in the AI system, and could resurface in someone else’s generated content in a harmful way.
Another thing I'm curious about: you mentioned that you have a 9-year-old daughter and a 3-year-old son. How are you helping them to understand that they shouldn't just blindly trust everything that they see, that AI is everywhere and to be aware of it? Like being alert to the potential of deep fakes, and they may get a call that's not really from their mom or dad - things like that. How are you navigating that? Because that seems like a huge challenge nowadays.
Apart from YouTube, I still prefer giving them books. Even for their homework activities, I give them books.
When my 9-year-old needs to research a topic, I'll print out the information I find and give it to her, so she hasn't started using these tools for her daily tasks. On her tablet, she mainly plays games or watches movies, and that's about it. She doesn't understand what AI can do yet.
She doesn't understand what I do. She thinks I'm a computer engineer who can fix computers. So my kids are not exposed to this AI world yet. At some point, probably from their friends or somewhere else, they will understand the power of what these tools can do - but not yet.
Yeah. It's very hard to protect children from AI. There's been some movement recently, especially in countries that have GDPR, like you said, towards protecting children from it, but it’s challenging. One of my earlier interview guests, Angeline Corvaglia, has been working on this to help her daughter and other kids learn more about AI safety, with her “Data Girl and Friends” initiative.
Any final thoughts?
My final thought is: use AI wisely. It's very important. AI is going to stay, and it's very powerful. We should use it responsibly. We should use it wisely.
Someone said this somewhere - it's not my quote: earlier, it was "plus AI", and now it is "AI plus".6
That is how the world is moving. It is great. It is a game changer. No doubt about it. Products are talking about the power of it. It's revolutionized the way companies are now working. Every company has that GenAI kind of aspect to it, and they understand the value.
The questions that you've asked, Karen, are very, very relevant. As I said, use AI wisely, be responsible, and be more open to people who are using your models, and that will help them gain trust. Building these models as a community will help each other out and gain more knowledge.
All right! Prateeti, thank you so much for joining our interview series. It’s been great learning about what you’re doing with artificial intelligence tools, how you decide when to use human intelligence for some things, and how you feel about use of your data!
Interview Guest References
Prateeti Mohapatra on LinkedIn
About this interview series and newsletter
This post is part of our 2024 interview series on “AI, Software, and Wetware”. It showcases how real people around the world are using their wetware (brains and human intelligence) with AI-based software tools or being affected by AI.
And we’re all being affected by AI nowadays in our daily lives, perhaps more than we realize. For some examples, see post “But I don’t use AI”!
We want to hear from a diverse pool of people worldwide in a variety of roles. If you’re interested in being a featured interview guest (anonymous or with credit), please get in touch!
6 'P's in AI Pods is a 100% reader-supported publication. All new posts are FREE to read (and listen to). To automatically receive new 6P posts and support our work, consider becoming a subscriber (free)! (Want to subscribe to only the People section for these interviews? Here’s how to manage sections.)
Enjoyed this interview? Great! Voluntary donations via paid subscriptions are cool; one-time tips are deeply appreciated; and shares, hearts, comments, and restacks are awesome 😊
Credits and References
End Notes
LinkedIn AI training opt-out instructions, from Anonymous4 interview, via Ravit Dotan:
“Open-Access AI: Lessons From Open-Source Software”, Parth Nobel, Alan Z. Rozenshtein, Chinmayi Sharma / Lawfare, 2024-10-25
Links for further reading on "AI Plus”: