AI Agents - An Introduction
AI Agents: Why, How, What – A Guide to Understanding, Building, Implementing, and Monitoring Intelligent Agents
Audience: Anyone who has a basic knowledge of AI and wants to learn more about AI agents and how they are implemented (primarily software architects and developers).
Acronyms and Terms: GenAI (generative AI), GPT (Generative Pretrained Transformer), GPU (Graphics Processing Unit), LLM (Large Language Model), PII (Personally Identifiable Information), RAG (Retrieval Augmented Generation), TPU (Tensor Processing Unit). For more details on many of these terms, see our AI Glossary.
What are AI Agents?
AI agents perform tasks independently, without human intervention. This can be confusing at first: we have already run so many automation projects, and later machine learning projects, so what is different about these AI agents?
In the context of Gen AI and LLMs, an AI agent is typically defined as an autonomous or semi-autonomous system that leverages large language models (LLMs) to perform tasks, make decisions, and interact with its environment. These agents utilize the natural language processing capabilities of LLMs to understand and generate human language, enabling them to carry out complex activities such as answering questions, assisting with workflows, automating tasks, or engaging in multi-step reasoning.
I am a visual person, so I have put together two example scenarios below to explain the difference between previous AI systems and AI agents.
Example 1
In the above figure, Example 1 illustrates how traditional chatbots are evolving into modern AI agents. Historically, chatbots have been used as standalone AI use cases, designed to interact only with their specific databases and UIs to answer customer queries. However, if the chatbot needed to book a ticket or access other databases or enterprise-wide resources, it lacked the necessary permissions. With the advancement of AI agents, these systems can not only engage in conversations, but also retrieve the required information and complete tasks, such as delivering flight tickets or hotel reservations. The chat concludes with the successful completion of the customer's desired action.
Example 2
In Example 2, we traditionally performed predictive maintenance and estimated the remaining useful life of components, primarily to ensure that component stock is available before it is actually needed. This is especially critical for aero engine sub-components, as their unavailability could result in an aircraft being grounded. In such cases, an alert about a potential component failure is sent, and the personnel involved prepare the necessary components.
With AI agents, the agent can identify the regular supplier of the component and place an order for the required quantity at the next overhaul location of the aircraft. The agent is independently capable of completing the task by accessing multiple sources of information, provided it is authorized and expected to do so.
In the current landscape, with LLMs such as GPT-4 (OpenAI), GPT-3.5 (OpenAI), PaLM (Google), LLaMA (Meta), Alpaca (Stanford), Claude (Anthropic), Mistral (Mistral AI), Flan-T5 (Google), Falcon (TII), BERT (Google), etc., agentic development accelerates dramatically, making it possible to implement complex use cases faster and more efficiently.
By providing an LLM with sufficient tools (software functions) and past data (knowledge bases), agents allow for faster development and efficient execution of intricate use cases, from conversational assistants to autonomous decision-making systems across domains.
Why AI Agents?
With ChatGPT, we became used to asking an LLM a question, then follow-up questions, until we reached the desired result. An agent can work the same way, but without human intervention, provided it is given all the required information and access to the necessary tools.
The growing demand for efficiency, accuracy, and personalization in modern businesses has underscored the need for AI agents. These agents excel at automating repetitive tasks, allowing human workers to focus on strategic activities that require creativity and critical thinking. Beyond automation, AI agents enhance decision-making by processing vast amounts of data and delivering insights in real-time. Additionally, they enable personalized experiences, adapting interactions and recommendations to individual user preferences, which fosters stronger customer relationships and satisfaction.
I see this as the business asking us to go the extra mile in the direction of intelligent automation. Humans cannot read and digest such large volumes of data and come to a decision in a short time. Human beings have achieved excellence in every field over many decades; now, when they need to focus on advanced innovation, they want to delegate the repetitive tasks that do not require additional knowledge. With past human experience and real-time data, it makes sense to develop more agents that free people up for other, more valuable activities.
What Are the Types of AI Agents?
Agents can be classified into different types based on their characteristics, functionalities, and the level of autonomy they exhibit. Here are the most common types of agents.
Single Agentic Type Architecture
1. ReAct Agents
In the ReAct paper ("ReAct: Synergizing Reasoning and Acting in Language Models", Yao et al.), the authors explored the use of Large Language Models (LLMs) to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two. Traditional LLMs typically treat reasoning (e.g., chain-of-thought prompting) and acting (e.g., action plan generation) as separate processes. By integrating the two, ReAct lets reasoning traces help the model induce, track, and update action plans and handle exceptions, while the actions let the model interact with external sources such as knowledge bases or environments to gather additional information.
Reference to implement this in LangChain: https://python.langchain.com/v0.1/docs/modules/agents/agent_types/react/
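The ReAct loop can also be sketched without any framework. Below is a minimal, illustrative version: the "LLM" here is a scripted stand-in (`fake_llm`), and the tool set is a toy lookup table, but the Thought → Action → Observation cycle matches the pattern described above.

```python
# Minimal ReAct-style loop: the model alternates "Thought" and "Action"
# steps, and each Action's result is fed back in as an "Observation".
# fake_llm and lookup_capital are illustrative stand-ins, not real APIs.

def lookup_capital(country: str) -> str:
    # Illustrative tool: a tiny knowledge base the agent can query.
    return {"France": "Paris", "Japan": "Tokyo"}.get(country, "unknown")

TOOLS = {"lookup_capital": lookup_capital}

def fake_llm(transcript: str) -> str:
    # Scripted stand-in for a real LLM call: first emit a Thought+Action,
    # then, once an Observation is present, emit the final answer.
    if "Observation:" not in transcript:
        return "Thought: I need the capital.\nAction: lookup_capital[France]"
    return "Final Answer: The capital of France is Paris."

def react_agent(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_llm(transcript)
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        # Parse "Action: tool[arg]" and run the tool.
        action_line = [l for l in step.splitlines() if l.startswith("Action:")][0]
        name, arg = action_line[len("Action: "):].rstrip("]").split("[", 1)
        transcript += f"\nObservation: {TOOLS[name](arg)}"
    return "no answer"

print(react_agent("What is the capital of France?"))
```

In a real agent, `fake_llm` would be a call to an actual model and the transcript would be the growing prompt; frameworks like LangChain wrap exactly this loop.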
Reactive AI agents respond directly to real-time inputs without memory or learning, operating based on predefined rules and producing predictable outcomes, such as basic chess engines evaluating board positions or industrial robots performing repetitive tasks. They are efficient and ideal for stable environments requiring quick decisions, like automated assembly lines or simple voice assistants responding to keywords. However, they lack adaptability, long-term goal planning, and context awareness, limiting their use in dynamic or complex scenarios.
2. Model-Based Reflex Agents
Model-Based Reflex Agents are a type of AI agent that maintain an internal model of the environment, allowing them to make decisions based on both their current perceptions and historical data. These agents use condition-action rules to decide on actions, leveraging their internal model to handle dynamic and partially observable situations. When integrated with Large Language Models (LLMs), they can enhance decision-making by using the LLM for reasoning, generating natural language outputs, or interacting with users and external systems, making the agent more intelligent and adaptable in complex environments.
Refer to: https://www.geeksforgeeks.org/model-based-reflex-agents-in-ai/
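As a small sketch of the idea (a thermostat with a short memory; all thresholds are illustrative), a model-based reflex agent combines the current percept with an internal model of recent history:

```python
# Sketch of a model-based reflex agent: condition-action rules plus an
# internal model (here, the last few temperature readings) so decisions
# use history, not just the current percept. Thresholds are illustrative.

class ThermostatAgent:
    def __init__(self):
        self.history = []          # internal model of the environment

    def perceive(self, temp: float) -> str:
        self.history = (self.history + [temp])[-3:]   # keep last 3 readings
        trend = self.history[-1] - self.history[0]
        # Condition-action rules that consult both percept and model:
        if temp > 25 and trend > 0:
            return "cool"          # hot and still rising
        if temp < 18 and trend < 0:
            return "heat"          # cold and still falling
        return "idle"

agent = ThermostatAgent()
for t in (20, 23, 27):
    action = agent.perceive(t)
print(action)  # rising past 25 -> "cool"
```

A purely reactive agent would look only at the latest reading; the internal model is what lets this one distinguish "hot and rising" from "hot but cooling down".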
3. Goal-Based Agents
Goal-based AI agents are designed to achieve specific objectives by planning and executing actions based on their current state and environment. When integrated with Large Language Models (LLMs), these agents can enhance their decision-making and planning capabilities by leveraging LLMs for reasoning, language understanding, and task-specific knowledge. LLMs help goal-based agents process natural language inputs, generate action plans, and adjust strategies based on evolving goals or user interactions. This combination allows goal-based AI agents to tackle complex tasks, such as interactive problem-solving or dynamic decision-making, with greater adaptability and efficiency.
Refer to: https://www.allaboutai.com/ai-agents/goal-based/.
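The planning side of a goal-based agent can be sketched without an LLM at all: given a goal state, search for an action sequence that reaches it. The toy "world" below is illustrative; in an LLM-backed agent, the model would propose or refine such plans.

```python
# Sketch of a goal-based agent: given a goal state, it plans a sequence
# of actions (breadth-first search over a toy state graph) rather than
# reacting rule-by-rule. The "world" transitions are illustrative.

from collections import deque

# state -> {action: next_state}
WORLD = {
    "at_home":    {"drive": "at_airport", "walk": "at_park"},
    "at_airport": {"board": "on_plane"},
    "at_park":    {},
    "on_plane":   {},
}

def plan(start: str, goal: str):
    # BFS returns the shortest action sequence reaching the goal.
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, nxt in WORLD[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None

print(plan("at_home", "on_plane"))  # ['drive', 'board']
```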
4. Learning Agents
Learning agents are AI agents that can improve their performance over time by learning from their experiences, interactions, and feedback from the environment. These agents adapt their behavior through processes like reinforcement learning, supervised learning, or unsupervised learning, allowing them to make better decisions and achieve their goals more effectively.
When integrated with Large Language Models (LLMs), learning agents can leverage the model’s ability to process and understand complex language inputs, providing them with better context for decision-making. LLMs can help learning agents interpret feedback, refine strategies, and adapt their actions based on evolving data, making them capable of solving more dynamic and complex problems. This combination enhances the learning process by enabling agents to reason with rich, human-like language and learn from both structured and unstructured data.
Refer to: https://www.geeksforgeeks.org/learning-agents-in-ai/
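A classic, minimal example of a learning agent is tabular Q-learning. The sketch below (a 4-cell corridor where only the rightmost cell gives a reward; all hyperparameters are illustrative) shows an agent improving its policy purely from experience:

```python
# Sketch of a learning agent: tabular Q-learning on a 4-cell corridor
# where only the rightmost cell gives a reward. The agent improves its
# policy from experience alone; all parameters are illustrative.

import random

random.seed(0)
N_STATES, ACTIONS = 4, ("left", "right")
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def step(state, action):
    nxt = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

for _ in range(500):                      # training episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, r = step(s, a)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = nxt

policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
print(policy)  # learned greedy policy
```

After training, the greedy policy moves right from every non-terminal cell. An LLM-backed learning agent replaces this numeric table with richer representations, but the feedback loop is the same shape.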
Multi Agent System Architecture
Note: We acknowledge that some of the current terminology in the world of AI Agents is not inclusive. One issue is that a commonly suggested alternative for ‘slave’ is ‘agent’, so when we are talking about AI agents, it would be confusing. We regretfully use the current industry terminology here since it is in the sources we referenced. However, we encourage researchers and writers to adopt more inclusive language ASAP (‘minion’ is one better option among many).
1. Master Slave Multi Agent Architecture
2. Collaborative Multi Agentic Architecture
Agent Architectures Applications
This interactive table summarizes use cases for common types of AI agents.
The Agent Development Lifecycle
Define the problem and environment
Choose an agent type and architecture
Select tools and frameworks
Train the agent
Test, validate, and optimize
Monitoring
To explain the Agent Development Lifecycle, we will look at a question-and-answer chat agent. The difference between a chatbot and this agent is that the agent solves the business problem independently.
The above diagram illustrates a RAG-based QA agent. We will consider a simple question-and-answer chat over the content of a PDF file.
If the PDF contains specific information, such as a patient record, then the previous documents and other relevant medical documents have already been stored in the vector database by an offline indexing pipeline.
When we ask a question, we apply the same embedding model to the query that was used to chunk and index the documents; this lets us retrieve the most relevant chunks from the vector database. The query, augmented with these chunks, is then given to the LLM, which answers it using custom-defined prompts that enhance the context.
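The query-time flow can be sketched in a few lines. Here `embed()` is a toy keyword-count stand-in for a real embedding model, and the chunks are illustrative; the point is that the same embedding function is used for indexing and querying, with cosine similarity doing the retrieval:

```python
# Sketch of query-time RAG: embed the query with the same model used at
# indexing time, retrieve the most similar chunks by cosine similarity,
# and build an augmented prompt. embed() is a toy stand-in.

import math

def embed(text: str) -> list[float]:
    # Toy embedding: counts of a few keywords. A real system would call
    # a sentence-embedding model here.
    vocab = ("patient", "dosage", "flight", "invoice")
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "patient was prescribed a new dosage last visit",
    "the flight departs at 9am from gate 12",
    "invoice 442 is due at the end of the month",
]
index = [(c, embed(c)) for c in chunks]           # offline indexing

def retrieve(query: str, k: int = 1):
    qv = embed(query)                              # same model as indexing
    ranked = sorted(index, key=lambda ce: cosine(qv, ce[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

question = "what dosage is the patient on?"
context = retrieve(question)
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
print(context[0])
```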
The embedding generation involves multiple steps as shown in the below figure.
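One of those steps, chunking, is easy to show concretely. The sketch below uses character-based chunking with overlap; the sizes are illustrative, and production pipelines often chunk by sentences or tokens instead:

```python
# Sketch of the chunking step in the offline indexing pipeline: split a
# document into overlapping, fixed-size chunks before embedding them.
# Sizes are illustrative.

def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    step = size - overlap                     # advance less than `size`
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]

doc = "A" * 100
pieces = chunk_text(doc)
print(len(pieces), [len(p) for p in pieces])  # 3 chunks of 40 chars each
```

The overlap ensures that a sentence split across a chunk boundary still appears whole in at least one chunk, which improves retrieval quality.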
Agent Development Frameworks
Multiple frameworks are available, such as LangChain and CrewAI. When it comes to production-grade, scalable, and secure cloud solutions, we need to turn to major players like AWS, Azure, and Google. I will cover the features of these frameworks exhaustively in subsequent articles.
The popular frameworks available for developing AI agents are:
LangChain – LangChain is one of the most popular and widely adopted frameworks for building applications with LLMs. It simplifies the process of chaining multiple tools and models together and is known for its strong community support and ease of use.
AutoGPT – AutoGPT has gained significant attention for its ability to autonomously perform tasks and chain LLMs together for goal-oriented activities. It is easy to use for automating complex tasks and has a growing user base.
WorkGPT – WorkGPT is widely used in automating tasks and workflows using LLMs, offering flexibility in task management and integration with other systems.
OpenAgents – OpenAgents is gaining traction for its modular approach to agent design and its open-source nature, making it an accessible choice for developers looking to experiment with AI agents.
XLang – XLang supports cross-language model integration and multi-model frameworks, making it appealing for more complex use cases, though it may not be as widely used as LangChain or AutoGPT.
MiniAGI – MiniAGI is a lightweight framework for building AGI-like systems, but its adoption is still growing and isn't as widespread as the others listed above.
XAgent – XAgent provides a flexible structure for building agents, but its community is relatively smaller compared to the more widely adopted frameworks.
ToolLLM – ToolLLM, while providing a solid template for data construction and model training, is more specialized and less widely adopted compared to the others in the list.
AgentGPT – While AgentGPT provides fine-tuning and local data integration features, its adoption is somewhat smaller compared to the more widely used frameworks listed above.
AgentVerse, AutoGen, AutoAgents, AGENTS – These templates are useful for creating multi-agent systems but are relatively newer or less commonly adopted, focusing more on specific use cases rather than wide-scale adoption.
What are the Challenges?
Large language model (LLM) agents have shown impressive capabilities; however, their autonomous, non-deterministic behavior and continuous evolution raise concerns about AI safety. To address these issues, it is crucial to implement observability in these agents, allowing stakeholders to understand how the agents operate, detect anomalies, and prevent potential failures.
AgentOps is a DevOps approach to observability for AI Agents.
From a DevOps perspective, ensuring the safety of LLM agents requires monitoring their activities throughout their entire lifecycle. This includes tracking the agents’ inner workings, logging relevant data, and analyzing behaviors to proactively detect issues. To support these goals, the paper (https://arxiv.org/abs/2411.05285: “AgentOps: Enabling Observability of LLM Agents”, Liming Dong, Qinghua Lu, Liming Zhu) introduces a comprehensive AgentOps taxonomy. This framework identifies key artifacts and associated data that should be traced to enable effective observability. It provides developers with a template for creating AgentOps infrastructure that supports continuous monitoring, logging, and analytics, which are essential for maintaining AI safety.
AgentOps helps developers maintain robust monitoring and control over LLM agents, making it easier to detect and address potential safety issues during deployment and operation. Modern DevOps observability tools such as Prometheus, Grafana, OpenTelemetry, the Elastic (ELK) Stack (Elasticsearch, Logstash, Kibana), Datadog, Seldon, Kubeflow, MLflow, etc. can be used for agent observability by adopting some innovative strategies.
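As a minimal sketch of this kind of instrumentation (the tool and its body are illustrative), a decorator can record every tool call an agent makes, with its arguments, duration, and outcome:

```python
# Sketch of AgentOps-style instrumentation: a decorator that logs every
# tool call an agent makes, with arguments, duration, and outcome. In a
# real deployment these records would be exported to a backend such as
# OpenTelemetry or Prometheus rather than the stdlib logger.

import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("agentops")

def traced_tool(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            log.info("tool=%s args=%r ok in %.1fms",
                     fn.__name__, args, (time.perf_counter() - start) * 1e3)
            return result
        except Exception as exc:
            log.error("tool=%s args=%r failed: %s", fn.__name__, args, exc)
            raise
    return wrapper

@traced_tool
def order_part(part_id: str, qty: int) -> str:
    # Illustrative tool body (e.g., the supplier order from Example 2).
    return f"ordered {qty} x {part_id}"

print(order_part("fan-blade-7", 2))
```

Structured logs like these are exactly the "artifacts and associated data" the AgentOps taxonomy says should be traced: which tool ran, with what inputs, how long it took, and whether it failed.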
Why are the Guardrails Important?
Guardrails in AI refer to mechanisms, tools, or policies designed to ensure that AI systems operate safely, ethically, and in alignment with specific goals or guidelines. These safeguards are critical in managing the behavior of AI systems, particularly in high-impact applications like generative AI, to prevent misuse, unintended consequences, or harmful outcomes.
Benefits of Guardrails
Ensuring Ethical Use:
Guardrails help enforce ethical standards by preventing AI systems from producing harmful, biased, or unethical outputs, such as hate speech, violence, or misinformation.
Enhancing User Safety:
They protect users from inappropriate or unsafe content by filtering out undesirable inputs or outputs, including personal data, toxic content, and unsafe recommendations.
Mitigating Risks:
Guardrails reduce the risks associated with deploying AI systems, such as legal liabilities, reputational damage, or user dissatisfaction caused by harmful or misleading outputs.
Compliance with Regulations:
Guardrails ensure adherence to data protection laws (e.g., GDPR, CCPA) and industry-specific regulations by implementing features like PII redaction and secure data handling.
Building Trust:
By demonstrating a commitment to responsible AI practices, guardrails foster trust among users, stakeholders, and regulators, which is vital for the adoption of AI technologies.
Contextual Relevance:
They allow AI systems to align with domain-specific guidelines, ensuring outputs are accurate, relevant, and sensitive to the context in which the AI is used.
Support for Developers:
Guardrails simplify the development process by providing prebuilt frameworks for content moderation, safety checks, and ethical considerations, allowing developers to focus on innovation.
Examples of AI Guardrails
Content Filters: Block toxic or inappropriate content.
PII Redaction: Identify and remove personally identifiable information.
Bias Mitigation: Ensure unbiased decision-making and outputs.
Grounding Checks: Verify that AI outputs align with accurate, contextually relevant information.
Explainability: Provide clear explanations for AI decisions to ensure transparency.
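To make one of these concrete, here is a minimal sketch of PII redaction using regular expressions. Managed services such as Amazon Bedrock Guardrails or Azure AI Content Safety use far more robust detectors; the patterns below are deliberately simple and illustrative.

```python
# Minimal sketch of a PII-redaction guardrail using regular expressions.
# The patterns are illustrative and far less robust than a real detector.

import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> str:
    # Replace each detected entity with a bracketed label.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact_pii("Reach me at jane.doe@example.com or 555-867-5309."))
```

In an agent pipeline, a filter like this would run on both inputs (before they reach the LLM) and outputs (before they reach the user), the two choke points where guardrails are typically applied.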
Real-World Applications of Guardrails
In generative AI, guardrails like Amazon Bedrock Guardrails or Azure Content Safety filter undesirable text or image outputs and enable ethical AI deployment.
In autonomous vehicles, guardrails include safety protocols to prevent accidents.
In healthcare AI, they ensure compliance with medical standards and patient safety regulations.
Guardrails are essential for ensuring that AI serves humanity responsibly and effectively, mitigating risks, and enabling ethical and safe innovation.
Cloud Solution Platforms
Developing AI agents using LLMs on cloud providers like AWS, Azure, and Google can be advantageous, because they provide scalable infrastructure and the required hardware, such as GPUs and TPUs, to handle the computational demands of training and deploying LLMs. These platforms offer pre-trained models, APIs, and machine learning workflows that simplify the integration of LLMs into AI agents, saving significant development time. With seamless data integration and analytics tools, these providers enable AI agents to process and retrieve relevant information efficiently. They are also constantly on their toes implementing security features, including guardrails, which is an added advantage when AI agents must be developed to production grade.
AWS
AWS addresses AI ethics through Amazon Bedrock Guardrails, a comprehensive framework designed to ensure the responsible use of generative AI applications. With the recent preview of multimodal toxicity detection, Bedrock Guardrails now enables the filtering of undesirable content in both text and images, enhancing safety and user experiences. This feature supports configuring thresholds for content categories like hate, insults, sexual content, and violence, offering a tailored approach to content moderation. Guardrails also include safeguards for personally identifiable information (PII) redaction, contextual grounding, and automated reasoning checks, aligning AI outputs with specific responsible AI policies. The flexibility to use these safeguards across all supported and custom models makes it easier to build ethical AI solutions that prioritize content safety, privacy, and inclusivity.
Microsoft
Azure AI Content Safety: Microsoft offers Azure AI Content Safety, a service designed to detect and filter offensive or inappropriate content in both text and images. This tool helps developers create applications that comply with safety standards and provide positive user experiences.
(Refer to https://learn.microsoft.com/en-us/azure/ai-services/content-safety/)
Google
Google provides detailed information on implementing guardrails in AI agent development. For practical applications, Google Cloud's Vertex AI platform includes built-in tools and guidelines to help developers create AI models with integrated safety measures. These resources collectively provide a robust framework for understanding and implementing guardrails in AI agent development.
(Refer to https://ai.google/responsibility/principles/#our-ai-principles-in-action)
Conclusion
The widespread adoption of AI/ML and deep-learning-based systems faced barriers such as lack of data and lack of infrastructure for small business owners. With the advent of Large Language Models, building AI agents on top of them lets even small and medium players enter this field and create agents that are useful to them, without worrying much about training costs or infrastructure needs.
Dr. Andrew Ng calls "the long tail problem" the "second big impediment to AI adoption". (Quoted in https://www.bigdatawire.com/2022/03/25/how-data-centric-ai-bolsters-deep-learning-for-the-small-data-masses/; refer to the diagram below.)
AI agents will help solve this long-tail issue, and I expect more and more industries to take advantage of LLMs as prices fall. The long tail of LLM users will therefore grow much longer, but with increased value at every level of business.
AI agent development is an evolving field and I am committed to staying updated. So I will be writing lots of follow-up articles on AI Agents in 2025!
Wishing you all Happy New Year 2025 and stay updated with the GenAI trends!!
Thank you for reading 6 'P's in AI Pods (AI6P). If you enjoyed this article, we’d love to have your support via a heart, comment, share, restack, or Note!
"These agents excel at automating repetitive tasks, allowing human workers to focus on strategic activities that require creativity and critical thinking."
Curious, do you actually believe this? I see this exact phrasing used by so many AI proponents and always wonder what it means because not one person has been able to explain what knowledge works look like in the absence of the so-called repetative tasks. The thing is, in most white-collar jobs, strategic thinking and creativity are an occasional requirement, not the norm. I'm not saying this is okay, but it means AI agents, if they are as effective as purported, will definitely gouge a big hole in corporate America and the bleeding will be hard to stop.