📜 AISW #055: Daria Markava, Georgia-based high school student and AI enthusiast
Interview with Belarusian high school student and data science & AI enthusiast Daria Markava on her stories of using AI and how she feels about AI using people's data and content
Introduction - Markava
This post is part of our AI6P interview series on “AI, Software, and Wetware”. Our guests share their experiences with using AI, and how they feel about AI using their data and content. (This is a written interview; a read-aloud version is available on Substack.)
Note: In this article series, “AI” means artificial intelligence and spans classical statistical methods, data analytics, machine learning, generative AI, and other non-generative AI. See this Glossary and “AI Fundamentals #01: What is Artificial Intelligence?” for reference.
Interview - Markava
I’m delighted to welcome Daria Markava as my next guest for “AI, Software, and Wetware”. Daria, thank you so much for joining me for this interview! Please tell us about yourself, who you are, and what you do.
My name is Daria (she/her). I am originally Belarusian, but I’ve lived in Georgia for the last 3 years. I’m currently in grade 11 of high school.
I know school systems can vary worldwide - to confirm, does being in grade 11 mean that you have one more full year to complete before you graduate high school?
Yes, next academic year is my last year at school.
I noticed in your LinkedIn profile that you were a speaker in January at TEDx “Lisi Lake Youth”. Your topic was listed as “Artificial Intelligence: Youth Safety Matters”. Can you share a little bit about your talk?
Of course. The applications for the event came out around the time when the story of the boy who killed himself because of his obsession with a character.ai chatbot was in the news. I started reading more about what happened, and eventually I knew I wanted to speak about it, because it seemed like youth safety wasn’t getting enough attention in discussions about AI.
The main message I was trying to get across is that generative AI like that chatbot is very unpredictable, and that young people can’t always understand its risks and limitations when they encounter it online. It’s similar to social media, except the interactions with AI are so much more personalized, and that means you can be much more easily manipulated into interacting with it for longer (just a side note, I was talking mostly about AI chatbots for entertainment, like companions or social media bots, rather than the ones we use mostly for productivity).
The key issue here is that companies profit from those addictions, so they have an incentive to target such AI tools towards young people even though they must know how harmful it can be in the long term. It’s just very irresponsible.
I agree, and I’m so glad that you joined the voices speaking out about it!
(2) What is your level of experience with AI, ML, and analytics? Have you used it professionally or personally, or studied the technology?
I have used AI quite a lot, mostly LLMs, and I’m also self-studying machine learning and deep learning. I don’t have any experience training my own models, but I’d say I have a pretty strong understanding of basic concepts. I also co-founded a club at my school that teaches primary / elementary students the basics of AI, and I’ve been leading it for seven months now.
That’s a very cool initiative, Daria! I would love to hear more. What prompted you to start the club? How many other people are in the club and teach with you? And how did you come up with the content you use to teach the elementary students about AI?
Our STEM teacher approached me about the idea of starting a new club about AI, as he’d heard that I’m interested in AI and have some experience. I decided to give it a go, and we came up with an initial outline of the curriculum. We started with very basic things, like what artificial intelligence even means, and then gradually moved on to more complicated concepts, like machine learning and neural networks. Usually each topic builds on the previous one, so most of the time we had no problem deciding what the next topic should be. Recently I’ve tried to put more emphasis on ethics, like data collection, privacy, bias, etc. We’ve used many online resources to demonstrate how AI tools work, like Google’s Teachable Machine, but for theoretical content, I create the slides myself. We have about 10 students now.
That’s awesome.
(3) Can you share a specific story on how you have used a tool that included AI or ML features? What are your thoughts on how the AI features [of those tools] worked for you, or didn’t? What went well and what didn’t go so well?
I’m currently working on a research project on NLP; sometimes a paper I have to read has a lot of irrelevant information, so it would be extremely time-consuming to read the whole thing. What I do then is feed the paper into ChatGPT, and it gives me the specific points I’m looking for. After that I can ask for more details on certain points, or go directly to the sections of the paper that discuss these points.
That’s a good example. Do you have a sense for how often ChatGPT gives you accurate or inaccurate information? Like, when you’ve gone to a section of a paper to check a specific point, do you always find it, and does ChatGPT’s summary seem right?
The summaries are usually good enough, but they are only summaries. I haven’t encountered any misleading summaries or made-up points yet, but when I check the points in the paper, there’s always more to each one than what the summary mentions (which makes sense, because it’s just a summary). Overall, I think working with text documents is where ChatGPT performs the best for me, compared to other tasks. However, I still always check the original paper.
Got it. I’d love to hear more about your NLP research project! Is this kind of project common for students at your high school, or is this a special course? What is the goal of the project, and what tools are you using?
This is a special online program my school nominated me for, which is really exciting. It’s a research project where you pick a question in the AI field and eventually write a 5,000-word paper on it. My paper is on how NLP tools reinforce dominant cultural narratives. Initially I was planning to research machine translation and/or low-resource languages, but as I was brainstorming ideas, they started leaning more and more into the social and cultural sciences. I guess interdisciplinary studies and the intersection of technology and society interest me more than purely technical aspects.
My school offers something similar as an extracurricular subject, and I’m doing it too. For that, I research how AGI is portrayed in the media.
It’s great that you’re highlighting how AI can reinforce existing biases and looking at media portrayals. Can you share your papers here on Substack when they’re done? I’d love to read them!
That’s great to hear! I’d like to try to have them published if they turn out well - but if I’m allowed to post them before (or without) formal publication, I’ll definitely share them.
I will look forward to that!
(4) If you have avoided using AI-based tools for some things (or anything), can you share an example of when, and why you chose not to use AI?
I usually avoid using AI for writing. There were multiple occasions where I needed to write a short text (I don’t remember the topics, though), and I asked a few LLMs to write it for me. Every time, the output was too vague, with too many phrases that served no purpose other than making the text sound sophisticated. Perhaps my prompts weren’t good enough, but in the end I decided that instead of prompt engineering, I might as well use that time to write the text myself. And overall, I would probably never use AI to do the work for me. It can help with brainstorming or streamlining routine tasks, but I always create the final piece of work myself.
I know a lot of writers who feel that way about using AI for writing, too - including me 🙂. Have you ever tried any of the generative AI tools for creating images, or songs, or videos? Or have you mostly avoided them, so far?
Yes, I’ve tried generating images and videos before. It was mostly for fun, I simply wanted to explore what these tools could do. Some of them were really good, but if you compare the generated outputs to the works of real human artists, they’re just not the same.
I tried using Sora the other day, and it generated a hilarious video of baking bread, with very poor object permanence or spatial understanding. I used it as an example of hallucinations in my AI club when we were studying generative AI, and the kids found it really funny how a hand had six fingers and a table suddenly turned into an oven in that video :)
That sounds like an effective teaching demonstration :)
(5) A common and growing concern nowadays is where AI and ML systems get the data and content they train on. They often use data that users put into online systems or publish online. And companies are not always transparent about how they intend to use our data when we sign up.
How do you feel about companies using data and content for training their AI/ML systems and tools? Should ethical AI tool companies be required to get Consent from (and Credit & Compensate) people whose data they want to use for training? (the “3Cs Rule”)
Yes, the companies should definitely get the creator’s consent before using their work to train AI. Those people put so much effort into their craft, and for companies to just take that data and profit from it is simply unfair. Of course, compensating everyone whose data was used would be really challenging, but it doesn’t mean that stealing other people’s work is the solution.
There’s also an element of hypocrisy. When DeepSeek was released, I remember OpenAI accused its developers of using OpenAI’s GPT models. So when a company uses artworks, books, etc., without the creators’ permission, that’s fine, but when another company (allegedly) uses a model without permission, it’s suddenly not okay to use someone else’s work to develop AI.
I think this is just an extension of a “hierarchy” that exists in society, where STEM and engineering are prestigious and respected, while creative work like art, literature, music, etc. is considered less important and not taken as seriously.
That’s a great insight on the hypocrisy of how some companies are handling questions of intellectual property.
(6) As a user of AI-based tools, do you feel like the tool providers have been transparent about sharing where the data used for the AI models came from, and whether the original creators of the data consented to its use?
I definitely don’t think any companies providing AI are transparent about that. In the news I’ve seen many class-action lawsuits where writers sued AI companies for using their books to train AI without permission, for example the most recent one where Meta used books from LibGen. ChatGPT, which I use quite often, was also trained on corpora scraped from the web, and that could include copyrighted content. And of course, image and video generators probably used the most copyrighted works.
Yes, it’s definitely a problem. The last count I saw for generative AI lawsuits was 39 in the US alone.
I just found out last week, from the search tool in The Atlantic’s article, that at least 11 of my published research papers have been stolen and included in LibGen. That’s a tiny drop in a very big bucket, though. I know some writers who have had multiple full-length books stolen in LibGen, which is just awful for them.
If you’ve worked with building an AI-based tool or system, what can you share about where the data came from and how it was obtained?
I tried running a model (keras.io/examples/nlp/semantic_similarity_with_bert/) that determines the semantic similarity between two input sentences. The data came from the Stanford Natural Language Inference (SNLI) corpus (there’s an actual citation for it 1). The dataset basically contains pairs of sentences labeled with their semantic relationship, and it was specifically created for training NLP models, which seems ethical.
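[Editor’s note: for readers who’d like to poke at the same setup, here is a minimal sketch of loading the SNLI corpus and running a BERT-style sentence-pair classifier on one example. The keras.io tutorial Daria mentions uses TensorFlow/Keras; this sketch instead assumes the Hugging Face `datasets` and `transformers` packages, and the plain “bert-base-uncased” classification head is untrained, so its 3-way prediction is only meaningful after fine-tuning on SNLI as the tutorial does.]

```python
# Minimal sketch: SNLI sentence pairs + a BERT cross-encoder (not the exact keras.io code).
# Assumes: pip install datasets transformers torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# SNLI: ~570k human-written sentence pairs labeled
# entailment / neutral / contradiction (Bowman et al., 2015).
snli = load_dataset("snli", split="validation")
premise, hypothesis = snli[0]["premise"], snli[0]["hypothesis"]

# Encode the pair as a single input for a BERT-style classifier.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # head is randomly initialized until fine-tuned on SNLI
)

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

labels = ["entailment", "neutral", "contradiction"]
print(premise, "|", hypothesis, "->", labels[int(logits.argmax())])
```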
It’s so great to see that researchers are making their data available in conjunction with their papers nowadays. And that they’re being transparent about how it was sourced or generated.
(7) As consumers and members of the public, our personal data or content has probably been used by an AI-based tool or system. Do you know of any cases that you could share (without disclosing sensitive personal information, of course)?
I haven’t heard about any cases from people I know, but one example I’d like to mention is a dataset that contains photos of people who died in prison. Obviously those people didn’t consent to their data being used in that way, and this example really shows how little regard AI companies have for our privacy. (To give credit where it’s due, I’ve learned about this case and many others from Dr. Joy Buolamwini’s book “Unmasking AI”).
Yes, that’s terrible. I know Dr. Joy’s book and the AJL website have some appalling examples of bias and exploitation that people should be more aware of - thank you for crediting her.
(8) Do you know of any company you gave your data or content to that made you aware that they might use your info for training AI/ML? Or have you ever been surprised by finding out that a company was using your info for AI? It’s often buried in the license terms and conditions (T&Cs), and sometimes those are changed after the fact. If so, did you feel like you had a real choice about opting out, or about declining the changed T&Cs? How do you feel about how your info was handled?
Reference for this question: "Your Data is Worth Thousands—Here’s Who’s Selling It (And How to Stop It)"
I know that on LinkedIn and Substack, there is a “Block AI training” section in settings. It’s turned off by default, and I wasn’t even aware it was there in the first place until I decided to look through settings just to see what’s there. It’s good to know that there is still an option to opt out, but it’s presumptuous to opt everyone in by default without them agreeing. Even though you can technically opt out, the fact that your data and content could have been used to train AI before you discovered that setting is frustrating.
I completely agree - settings like “Block AI training” should not be OFF by default. I think the Substack setting is designed to block third party bots from using our content, not to control Substack using it. But still, the default should have been to block and not to allow.
LinkedIn definitely used our content before we had a chance to know about the setting, and the opt-out only applied to our future content. (And actually, there are two settings in different places that we needed to change to fully opt out, not just one. They were not up front about that, either.) I do know that people protected by GDPR were treated better with regard to AI training than those of us who aren’t protected. So that says it’s not a matter of not being able to do it, technically. It was a deliberate choice to exploit as many of us as they could legally get away with. Definitely not cool.
That’s a great point. They’re trying to grab as much data as they can before the owners realize it.
(9) Has a company’s use of your personal data and content created any specific issues for you, such as privacy, phishing, or loss of income? If so, can you give an example?
Luckily, that hasn't happened to me.
Good to hear!
(10) Public distrust of AI and tech companies has been growing. What do you think is THE most important thing that AI companies need to do to earn and keep your trust? Do you have specific ideas on how they can do that?
There are many things I can think of, but perhaps the biggest one is to start listening to experts, the people who warn about potential risks, rather than continue with the “move fast and break things” approach. A lot of educated, qualified people speak up about the harm AI can cause us, and ignoring them isn’t the way to earn public trust. So I think the best thing they can do is start thinking about people, and what’s best for humanity. I know it’s a bit unrealistic to expect this in our capitalist society, but it’s the only way to make sure that we benefit from AI as much as the companies promise.
Well said.
(11) Daria, thank you so much for sharing your thoughts in this interview. Is there anything else you’d like to share with our audience? And I’m curious, with everything you’re doing with AI, and everything we’ve talked about, I’m wondering: do you have a definite interest in pursuing a career in AI or in STEM? Or are you still mostly exploring at this point, as an 11th grader?
Thank you for this opportunity, Karen! Yes, I’ve already pretty much decided to pursue a higher education in AI and data science. I’m not sure yet if I’d like to have a career in industry or stay in academia, but I’ll probably figure it out during my undergraduate studies. One thing I know for sure is that either way, I want to center my work around responsible tech and ethical innovation.
I am so happy to hear that you’re aiming for a STEM career, Daria. You obviously have an aptitude for it, and we need more voices like yours shaping AI and other technologies in the future! Thank you again.
Interview References and Links
Daria Markava on LinkedIn
Daria Markava on Substack
Daria’s Jan. 2025 TEDxLisi Lake Youth talk on “Artificial Intelligence: Youth Safety Matters” (YouTube)
About this interview series and newsletter
This post is part of our AI6P interview series on “AI, Software, and Wetware”. It showcases how real people around the world are using their wetware (brains and human intelligence) with AI-based software tools, or are being affected by AI.
And we’re all being affected by AI nowadays in our daily lives, perhaps more than we realize. For some examples, see the post “But I Don’t Use AI”.
We want to hear from a diverse pool of people worldwide in a variety of roles. (No technical experience with AI is required.) If you’re interested in being a featured interview guest, anonymous or with credit, please check out our guest FAQ and get in touch!
6 'P's in AI Pods (AI6P) is a 100% reader-supported publication. (No ads, no affiliate links, no paywalls on new posts). All new posts are FREE to read and listen to. To automatically receive new AI6P posts and support our work, consider becoming a subscriber (it’s free)!
Series Credits and References
Credit to CIPRI (Cultural Intellectual Property Rights Initiative®) for their “3Cs' Rule: Consent. Credit. Compensation©.”
Credit to the creator of the “Created With Human Intelligence” badge we use to reflect our commitment that all content in these interviews will be human-created.
If you enjoyed this interview, my guest and I would love to have your support via a heart, share, restack, or Note! One-time tips or voluntary donations via paid subscription are always welcome and appreciated, too 😊
End Notes
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). [pdf] [bib]