Safeguarding AI Agents: An In-Depth Look at A2A Protocol Risks and Mitigations

rlu · ‎08-14-2025

This blog was written by: Yu Fu, Jay Chen, Yantian Hou, Yilin Zhao, Hui Gao, Royce Lu, May Wang

Technical Editors: Aaron Isaksen, Sam Kaplan, Aryn Pedowitz, Victor Aranda

Google coordination team: Narayan Sundar, Munish Khetrapal

Executive summary

The Agent2Agent (A2A) protocol revolutionizes the AI landscape by enabling developers to create interoperable AI agents that seamlessly collaborate regardless of provider or underlying technology, driving innovation and integration.

While the A2A protocol itself is designed to facilitate secure interactions, safely adopting it requires addressing critical security risks arising from implementation and usage practices. Those looking to integrate the A2A protocol should be mindful of the following:

Authentication and Authorization
1. Risks from improper implementation of authentication and inadequate credential verification.
Security Considerations for Agent Card in A2A Protocol
1. Agent Card Management: Issues stemming from poor version control, delayed updates, weak authentication, and exposure of sensitive metadata.
2. Agent Card Context Poisoning: Malicious content embedded in agent cards, potentially compromising downstream agents if input validation is insufficient.
3. Agent Impersonation and Shadowing: Cloning or mimicking legitimate agents to infiltrate workflows through flawed identity checks in applications.
Agent Infrastructure and Application
1. Agent Infrastructure Attacks: Crafting deceptive responses, exfiltrating data, and disrupting collaborative processes.
2. Agent Application/Logic Attacks: Compromising the A2A app via resource exhaustion, prompt injection, and malicious payloads, jeopardizing availability and confidentiality.

The A2A protocol itself is robust and secure by design, but—similar to HTTPS—the security of the overall system depends heavily on the proper implementation and management of clients and servers. Recommended practices to mitigate these threats include:

Clearly defined and enforced authentication and authorization mechanisms.
Rigorous credential validation following zero trust principles.
Comprehensive input sanitization to prevent exploitation.
Automated auditing and secure management of agent metadata.
Robust identity verification and secure sandboxing techniques.

By adopting these practices, organizations can significantly enhance the security and reliability of interactions within the A2A ecosystem.

Introduction to A2A

The Agent2Agent (A2A) Protocol is an open standard designed to bridge the gap between independent AI agent systems, allowing them to communicate and collaborate effectively. By providing a framework for discovering capabilities, negotiating interactions, and managing tasks, A2A enables seamless interoperability without compromising the autonomy or security of individual agents. Through these features, A2A fosters a collaborative ecosystem where AI agents can work together while maintaining their independence. The A2A protocol standardizes interactions between agents, allowing clients to send requests and receive real-time updates.

What Problem Does A2A Solve?

The Agent2Agent (A2A) Protocol was designed to address the challenge of interoperability and collaboration between independent AI agents by standardizing their communication. Specifically, A2A unifies interactions among agents built with different frameworks, languages, or platforms, ensuring seamless communication, regardless of the underlying technology, so long as both agents support the protocol.

To illustrate, consider a scenario where you initially have a weather agent and wish to expand it into a comprehensive travel-planning system by adding calendar planning, hotel booking, and flight ticketing capabilities—each requiring intelligent, specialized AI agents. A2A can streamline interactions across those agents, even where the baseline technologies were not designed to interact.

Without A2A: Traditionally every integration between a host agent and individual agents needed to be custom-built, resulting in a fragile, manual, and difficult-to-scale system. As a result, each new agent integration required significant manual adjustments and ongoing maintenance.

With A2A: Agents can discover each other’s capabilities and directly collaborate through a standardized communication protocol, making integration seamless and scalable. This significantly reduces the overhead involved in manual integration. Additionally, the A2A protocol makes it straightforward to expand the system’s functionality by adding new agents. It also enables agents built with different AI frameworks to easily interact and work together.

Here’s how the A2A-enabled process works in practice, using a travel-planning example:

Single Interaction: The user interacts with the orchestration agent, requesting a trip to France that aligns with their calendar availability, preferred accommodations, suitable flights, and favorable weather conditions.

Agent Collaboration: The orchestration agent autonomously coordinates with:

The calendar agent, identifying available travel dates.
The weather agent, confirming optimal weather conditions during these identified dates.
The hotel booking agent, reserving accommodations that match the confirmed travel dates.
The flight ticket agent, finding compatible flights aligning with both the selected dates and accommodations.

Integrated Response: Finally, the orchestration agent compiles all this information into a unified itinerary—providing the user with confirmed travel dates, weather forecasts, hotel reservations, and flight details—all delivered seamlessly in one comprehensive response.

In short, the A2A Protocol significantly streamlines collaboration among AI agents, enabling rapid scalability and interoperability. The result is a richer, more intelligent experience for users, transforming complex multi-agent tasks into seamless interactions.

Key Concepts of A2A

The Agent2Agent (A2A) ecosystem involves several core actors. First is the User, an end-user or automated service initiating a request or goal requiring assistance from AI agents. Next is the A2A Client (Client Agent), which acts on the user's behalf, requesting actions or information from remote agents using the A2A protocol. Finally, the A2A Server (Remote Agent) is an AI agent or system providing services via an HTTPS endpoint implementing the A2A protocol. It receives and processes client requests, returning results or updates. Importantly, from the client's perspective, the remote agent operates as an "opaque" system, meaning the client interacts without needing to know its internal workings, memory, or tools. Figure 1 shows an example A2A workflow.

Figure 1 : A2A Workflow

Communication in the A2A Protocol involves several essential elements. The Agent Card is a JSON metadata document typically available at a well-known URL (e.g., /.well-known/agent.json). It provides crucial details about an A2A Server, including its identity (name, description), service endpoint, version, supported capabilities (such as streaming or push notifications), specific skills offered, default input/output modalities, and authentication requirements. Clients use this card to discover agents and interact with them effectively. See Figure 2 for an example A2A agent card.

JSON

% curl -s http://localhost:10004/.well-known/agent.json | python -m json.tool
{
    "name": "AutoGen Currency Agent",
    "description": "Helps with exchange rates for currencies using AutoGen framework",
    "url": "http://localhost:10004/",
    "version": "1.0.0",
    "capabilities": {
        "streaming": true,
        "pushNotifications": true,
        "stateTransitionHistory": false
    },
    "defaultInputModes": [
        "text",
        "text/plain"
    ],
    "defaultOutputModes": [
        "text",
        "text/plain"
    ],
    "skills": [
        {
            "id": "convert_currency",
            "name": "Currency Exchange Rates Tool",
            "description": "Helps with exchange values between various currencies",
            "tags": [
                "currency conversion",
                "currency exchange"
            ],
            "examples": [
                "What is exchange rate between USD and GBP?"
            ]
        }
    ]
}

Figure 2 : Example A2A Agent Card
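To make the card's role in discovery more concrete, here is a minimal Python sketch of how a client might read the fields shown in Figure 2. The `summarize_card` helper is hypothetical and not part of the A2A SDK, and the card is inlined (trimmed) so the sketch runs without a live agent.

```python
import json

# Trimmed copy of the card from Figure 2, inlined so the sketch runs offline.
CARD_JSON = """
{
  "name": "AutoGen Currency Agent",
  "description": "Helps with exchange rates for currencies using AutoGen framework",
  "url": "http://localhost:10004/",
  "version": "1.0.0",
  "capabilities": {"streaming": true, "pushNotifications": true,
                   "stateTransitionHistory": false},
  "skills": [{"id": "convert_currency",
              "name": "Currency Exchange Rates Tool",
              "tags": ["currency conversion", "currency exchange"]}]
}
"""


def summarize_card(card: dict) -> dict:
    """Hypothetical helper: pull out the fields a client needs for delegation."""
    capabilities = card.get("capabilities", {})
    return {
        "name": card["name"],
        "endpoint": card["url"],
        "version": card["version"],
        # Keep only the capability flags that are switched on.
        "enabled_capabilities": sorted(k for k, v in capabilities.items() if v),
        "skill_ids": [skill["id"] for skill in card.get("skills", [])],
    }


summary = summarize_card(json.loads(CARD_JSON))
print(summary)
```

Everything in `summary` comes straight from the remote server's own claims, which is why later sections treat the card as untrusted input.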

When a client sends a message to an agent, it often creates a stateful Task, such as generating reports, booking flights, or answering queries. Each task has a unique identifier assigned by the agent and moves through defined lifecycle stages, such as submitted, working, input-required, completed, or failed. A task can involve multiple interactions between the client and the server. Messages are individual units of communication between clients and agents, each assigned a role: "user" for client-originated messages or "agent" for server-originated messages. Each message includes a unique message ID and comprises one or more Part objects carrying the content. These parts can be textual (TextPart), file-based (FilePart), or structured JSON data (DataPart).

Artifacts represent tangible results produced by remote agents during task processing. Artifacts include generated documents, images, spreadsheets, structured data outputs, and other self-contained task results. Artifacts can be streamed incrementally and consist of one or more Part objects.
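The task lifecycle and message structure described above can be sketched with a few small Python classes. This is an illustrative model only: the class names mirror the concepts in the text, but the transition table is a simplifying assumption and none of this is the A2A SDK's actual type hierarchy.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed set of legal moves, for illustration; the text names the states,
# while this particular transition table is a simplification.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED, TaskState.FAILED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


@dataclass
class TextPart:
    text: str


@dataclass
class Message:
    role: str  # "user" for client-originated, "agent" for server-originated
    parts: list
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class Task:
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    state: TaskState = TaskState.SUBMITTED
    history: list = field(default_factory=list)

    def advance(self, new_state: TaskState) -> None:
        # Reject moves that the transition table does not allow.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state


task = Task()
task.history.append(Message(role="user", parts=[TextPart("Book a flight to Paris")]))
task.advance(TaskState.WORKING)
task.advance(TaskState.INPUT_REQUIRED)  # the agent needs travel dates
task.history.append(Message(role="agent", parts=[TextPart("Which dates?")]))
task.advance(TaskState.WORKING)
task.advance(TaskState.COMPLETED)
```

Modeling the states explicitly makes it easy for a server to refuse out-of-order updates, for example a client trying to reopen a completed task.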

How does the A2A Client use the collected AgentCards to determine which agent to delegate the task to? Let’s examine the root instruction of the HostAgent from Google's A2A example.

Figure 3 is the root instruction of the HostAgent. During the HostAgent initialization, it fetches an AgentCard for each provided remote agent address. Each AgentCard details the agent's identity, description, and capabilities. These AgentCards are then aggregated into the self.agents string, clearly presenting the available agents and their functionalities to the underlying language model.

Additionally, the HostAgent maintains context for the ongoing tasks through current_agent['active_agent'], which is determined by the check_state method. This string identifies the remote agent currently handling an active session. When an agent is actively engaged, its name is explicitly provided; otherwise, it defaults to 'None'. This context-awareness mechanism ensures coherent multi-turn conversations, seamlessly transitioning interactions between user input, agent delegation, and task execution.

Python

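# Excerpt: a method of the HostAgent class (the surrounding class body is omitted).
# ReadonlyContext is provided by the Google Agent Development Kit (google.adk).
def root_instruction(self, context: ReadonlyContext) -> str:
    # check_state reports which remote agent, if any, owns the active session.
    current_agent = self.check_state(context)
    return f"""You are an expert delegator that can delegate the user request to the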
appropriate remote agents.

Discovery:
- You can use `list_remote_agents` to list the available remote agents you
can use to delegate the task.

Execution:
- For actionable requests, you can use `send_message` to interact with remote agents to take action.

Be sure to include the remote agent name when you respond to the user.

Please rely on tools to address the request, and don't make up the response. If you are not sure, please ask the user for more details.
Focus on the most recent parts of the conversation primarily.

Agents:
{self.agents}

Current agent: {current_agent['active_agent']}
"""

Figure 3 : Root Instruction of the HostAgent
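The aggregation step described above can be approximated as follows. This is a sketch under stated assumptions: `build_agents_string` is a hypothetical stand-in for the HostAgent initialization logic, and the one-JSON-object-per-line layout is an assumed format rather than a confirmed detail of the sample.

```python
import json


def build_agents_string(cards: list[dict]) -> str:
    """Assumed aggregation: one JSON line per agent with its name and description."""
    lines = []
    for card in cards:
        lines.append(json.dumps({"name": card["name"],
                                 "description": card["description"]}))
    return "\n".join(lines)


# Hypothetical cards, as if fetched from two remote agent addresses.
cards = [
    {"name": "Weather Agent", "description": "Reports forecasts",
     "url": "http://localhost:10001/"},
    {"name": "Hotel Agent", "description": "Books hotels",
     "url": "http://localhost:10002/"},
]

agents = build_agents_string(cards)
active_agent = "None"  # no session is in progress yet
prompt_tail = f"Agents:\n{agents}\n\nCurrent agent: {active_agent}\n"
print(prompt_tail)
```

Note that whatever text a remote agent puts in its description lands verbatim in the host's prompt, which is the root of the context poisoning risk covered later in this post.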

Security Taxonomy & Analysis

Before deploying A2A in practice, it's crucial to recognize and address potential security risks inherent to such open agent ecosystems. The following section outlines a structured security taxonomy and provides an in-depth analysis to help safeguard A2A-enabled interactions.

Authentication and Authorization in A2A

The current A2A SDK (v0.2.11) leaves authentication and authorization to whatever HTTPS middleware or API gateway the deployment already uses. What the SDK does provide is a schema in the AgentCard (securitySchemes and security), so each agent can declare which scheme(s) it expects (API key, OAuth 2, OpenID Connect, Basic, etc.).

The agent developer populates those fields to advertise which scheme the service expects (say, OAuth 2 client credentials) and which scopes, keys, or roles it requires. Once the developer creates that object, the SDK takes over: it serializes the fields to JSON, saves the document at /.well-known/agent.json, and, on the client side, deserializes the JSON back into an object that downstream code can inspect. In other words, the developer defines the policy while the SDK handles the bookkeeping that makes the policy discoverable.

Figure 4 : securitySchemes and security in sample agent card
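As a rough illustration of what such a declaration can look like, the fragment below models the two fields as a Python dictionary and resolves them into the options a client may use. The layout is assumed to follow the OpenAPI-style security objects; the scheme names, token URL, scopes, and the `acceptable_schemes` helper are all invented for this sketch.

```python
# Illustrative card fragment; scheme names, URLs, and scopes are made up.
card_security = {
    "securitySchemes": {
        "oauth": {
            "type": "oauth2",
            "flows": {
                "clientCredentials": {
                    "tokenUrl": "https://auth.example.com/oauth/token",
                    "scopes": {
                        "hotel:book": "Reserve hotel rooms",
                        "calendar:read": "Read calendar availability",
                    },
                }
            },
        },
        "apiKey": {"type": "apiKey", "in": "header", "name": "X-API-Key"},
    },
    # Each entry is one acceptable alternative: scheme name -> required scopes.
    "security": [{"oauth": ["hotel:book"]}, {"apiKey": []}],
}


def acceptable_schemes(card: dict) -> list[tuple[str, list[str]]]:
    """Return (scheme type, scopes) pairs that the card says it will accept."""
    declared = card.get("securitySchemes", {})
    result = []
    for requirement in card.get("security", []):
        for scheme_name, scopes in requirement.items():
            # A requirement that points at an undeclared scheme is a card bug.
            if scheme_name not in declared:
                raise KeyError(f"security references undeclared scheme {scheme_name!r}")
            result.append((declared[scheme_name]["type"], list(scopes)))
    return result


print(acceptable_schemes(card_security))
```

The card only advertises these requirements. Obtaining the credential and enforcing it still happen outside the SDK, as the next paragraph explains.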

The runtime credential layer is entirely the caller’s and the infrastructure’s responsibility. A client must obtain a real credential—an access token, API key, basic-auth string, or whatever the card specifies—and attach it to every HTTPS request outside the SDK’s helper methods. The SDK merely reuses that pre-configured HTTPS client for its A2A calls. On the server side, validation is delegated to the existing stack—FastAPI dependencies, Express middleware, Cloud Run IAM, an API gateway, or any other component that can authenticate the header—and, if the token lacks the advertised scope, it rejects the request with a 401 or 403. Thus, the SDK remains agnostic, while the existing identity infrastructure enforces the rules.

When agents perform actions in an A2A environment, identity-related security issues can occur if backend logic lacks thorough authorization checks. Even if middleware or gateways validate credentials, insufficient backend verification of roles and scopes can allow unauthorized actions. In the previous travel-planning scenario, a user holding an OAuth token that permits calendar and hotel actions, but explicitly lacks flight-booking privileges, could exploit weak backend validation by injecting malicious instructions into the orchestrating agent's input. This prompt injection deceives the orchestrating agent into creating unauthorized flight-booking requests, bypassing proper scope checks and potentially causing financial losses or disruptions to service availability. Therefore, it's essential for each A2A agent to independently verify user permissions and inputs, rather than relying solely on preliminary authorization.
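A minimal sketch of such an independent backend check is shown below, using the travel-planning scenario. The skill names, scope strings, and the `authorize` and `handle_request` functions are hypothetical; a real service would extract the scopes from a validated token rather than receive them as a plain set.

```python
class AuthorizationError(Exception):
    """Raised when a request is not covered by the caller's scopes."""


# Hypothetical mapping from each skill to the OAuth scope it requires.
REQUIRED_SCOPE = {
    "check_calendar": "calendar:read",
    "book_hotel": "hotel:book",
    "book_flight": "flight:book",
}


def authorize(skill: str, token_scopes: set[str]) -> None:
    required = REQUIRED_SCOPE.get(skill)
    if required is None:
        # Deny by default: unknown skills are never allowed.
        raise AuthorizationError(f"unknown skill {skill!r}")
    if required not in token_scopes:
        raise AuthorizationError(f"token lacks scope {required!r} for {skill!r}")


def handle_request(skill: str, token_scopes: set[str]) -> str:
    # Checked on every request, no matter which agent forwarded it.
    authorize(skill, token_scopes)
    return f"{skill}: ok"


user_scopes = {"calendar:read", "hotel:book"}  # no flight-booking privilege
results = []
for skill in ["check_calendar", "book_hotel", "book_flight"]:
    try:
        results.append(handle_request(skill, user_scopes))
    except AuthorizationError as exc:
        results.append(f"denied: {exc}")
print(results)
```

Because the check keys on the user's own scopes, a prompt-injected orchestrator that asks for `book_flight` is still refused at the flight agent.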

Security Considerations for Agent Card in A2A Protocol

Agent Card Management Concerns

The A2A spec relies on the Agent Card’s version field to signal change. When developers update the card, they must bump the version; the SDK itself does not perform a comparison or issue a warning. It simply serves and parses whatever JSON is provided. Clients discover changes only by refetching the card and noting a different version.

Because A2A discovery is pull-only, the refresh timing is up to the caller or its infrastructure. If an Agent Card containing sensitive information or malicious instructions is published, clients may remain exposed until they fetch an updated version. Developers who need instant invalidation or rapid propagation of security fixes should therefore consider adding a push mechanism outside the A2A spec. In production, automate a version-bump check in Continuous Integration (CI), the automated build-and-test pipeline, to keep metadata from drifting.
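One way to automate the version-bump check is to hash the card's content and fail the build when the content changes without a new version. The sketch below assumes the CI job can load both the published card and the candidate card as dictionaries; `card_digest` and `check_version_bump` are hypothetical helper names.

```python
import hashlib
import json


def card_digest(card: dict) -> str:
    """Hash everything except the version field, using a canonical JSON form."""
    body = {k: v for k, v in card.items() if k != "version"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_version_bump(published: dict, candidate: dict) -> list[str]:
    """Return a list of problems; an empty list means the check passes."""
    problems = []
    changed = card_digest(published) != card_digest(candidate)
    if changed and published["version"] == candidate["version"]:
        problems.append("card content changed but version was not bumped")
    return problems


published = {"name": "Currency Agent", "version": "1.0.0",
             "description": "Converts currencies"}
silent_edit = dict(published, description="Converts currencies and crypto")
proper_edit = dict(silent_edit, version="1.1.0")

print(check_version_bump(published, silent_edit))
print(check_version_bump(published, proper_edit))
```

A CI step would exit non-zero whenever the returned list is non-empty, so a silently edited card never reaches production.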

Additionally, insufficient authentication presents a security risk by potentially allowing unauthorized access to the agent card. Without proper authentication controls, unauthorized individuals could retrieve cards and harvest any capabilities, supported skills, agent endpoint URLs, or even sample prompts embedded in the card—anything that helps them fingerprint the system’s surface area and craft targeted attacks.

The inclusion of sensitive information within the agent card descriptions poses another vulnerability. Descriptions containing proprietary or confidential details could be exploited by attackers, enabling them to identify and leverage vulnerabilities. For example, a description such as:

"This agent generates quarterly compensation analysis by fetching raw salary spreadsheets from s3://payroll-prod-private and running the pipeline on https://hr-analytics.internal.example.com; requires Bearer token payroll-svc-admin for full access."

This inadvertently reveals an internal cloud storage bucket, a private analytics endpoint, and the specific name of an administrative token. Attackers could leverage these details to target infrastructure, escalate privileges, or access sensitive data.
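Leaks like this can be caught before publication with a simple pattern-based audit. The patterns below are illustrative assumptions covering only the three kinds of detail in the example (storage URIs, internal hostnames, credential mentions); a real audit would need a broader, organization-specific rule set.

```python
import re

# Illustrative patterns only; tune and extend them for a real environment.
PATTERNS = {
    "cloud storage URI": re.compile(r"\b(?:s3|gs)://[\w.\-/]+", re.IGNORECASE),
    "internal hostname": re.compile(
        r"\b[\w.-]+\.(?:internal|corp|local)\b[\w.-]*", re.IGNORECASE),
    "credential mention": re.compile(
        r"\b(?:bearer|token|api[_-]?key|password|secret)\b", re.IGNORECASE),
}


def audit_description(text: str) -> list[tuple[str, str]]:
    """Return (finding label, matched text) pairs for a card description."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group(0)))
    return findings


risky = ("This agent generates quarterly compensation analysis by fetching raw "
         "salary spreadsheets from s3://payroll-prod-private and running the "
         "pipeline on https://hr-analytics.internal.example.com; requires Bearer "
         "token payroll-svc-admin for full access.")

for label, hit in audit_description(risky):
    print(f"{label}: {hit}")
```

Running a check like this in the same pipeline that publishes the card keeps descriptions limited to what a client actually needs to know.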

Agent Card Context Poisoning

Agent Cards in the A2A protocol are often treated as trusted metadata sources; therefore, they present an attractive attack surface for prompt-injection and jailbreak exploits. When fields like description, skills, or example prompts are directly embedded into client-agent system prompts without filtering or context isolation, malicious cards can manipulate downstream LLM behavior.

Figure 5 illustrates this risk: when a client agent integrates the contents of the examples field into its prompt without sanitization, the malicious directive instructs the agent to ignore prior instructions and execute unauthorized shell commands. This can result in sensitive data leaks, unauthorized code execution, or severe system disruption.

Figure 5 : Agent Card with Embedded Prompt Injection

Rogue Agent Cards may carry either pinpoint prompt injections or generic jailbreak strings. Left unfiltered, these hidden instructions jeopardize the safety, functionality, and confidentiality of every agent that consumes them.

It's important to highlight that these threats are not limited to interacting solely with unknown or untrusted agents. Even trusted third-party agents can become compromised due to vulnerabilities or breaches in their own systems. Therefore, proper input validation, prompt sanitization, identity verification, and continuous monitoring mechanisms must be in place to limit the impact of compromised dependencies and prevent them from propagating attacks into your system.
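The sketch below shows one possible shape for the filtering and context isolation described in this section. It is a single defensive layer under stated assumptions: the suspicious-phrase patterns, the length cap, and the `<untrusted_agent_card>` markers are all invented for illustration, and keyword filters like this are easy to evade, so they should complement other controls rather than replace them.

```python
import json
import re

# Illustrative phrases only; real injections vary widely.
SUSPICIOUS = re.compile(
    r"ignore (?:all |any )?(?:previous|prior|above) instructions"
    r"|disregard .{0,40}instructions"
    r"|system prompt"
    r"|(?:run|execute) .{0,40}(?:shell|command)",
    re.IGNORECASE,
)
MAX_FIELD_CHARS = 300


def sanitize_field(value: str) -> str:
    # Drop control characters, collapse whitespace, and cap the length.
    cleaned = re.sub(r"[\x00-\x1f\x7f]", " ", value)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned[:MAX_FIELD_CHARS]


def card_to_prompt_block(card: dict) -> str:
    safe = {}
    for key in ("name", "description"):
        text = sanitize_field(str(card.get(key, "")))
        if SUSPICIOUS.search(text):
            raise ValueError(f"agent card field {key!r} looks like a prompt injection")
        safe[key] = text
    # JSON-encode the values and fence them so the model can be told to
    # treat everything inside the markers as data, never as instructions.
    return "<untrusted_agent_card>\n" + json.dumps(safe) + "\n</untrusted_agent_card>"


benign = {"name": "Currency Agent", "description": "Helps with exchange rates"}
print(card_to_prompt_block(benign))
```

The system prompt would then state that content between the markers is untrusted metadata, giving the model a clear boundary between instructions and data.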

Agent Impersonation and Agent Card Shadowing

Agent card shadowing refers to the unauthorized cloning or mirroring of a legitimate agent’s skills by an attacker. A closely related and often overlapping threat is agent impersonation, where an adversarial agent mimics the identity, capabilities, or interaction patterns of another trusted agent to infiltrate collaborative workflows or gain access to privileged resources.

Both attacks exploit weaknesses in identity management and request-level integrity in the A2A protocol, particularly in scenarios where agent identity or behavioral patterns are insufficiently protected or verified. Attackers may use techniques such as crafting similar agent names, or copying and slightly modifying skill descriptions to maintain an appearance of legitimacy. Typosquatting in agent names and display titles is another common method to deceive client agents.

Figure 6 illustrates how a malicious agent card (right) closely mimics a benign, legitimate agent card (left). The malicious card subtly alters critical fields, such as the endpoint URL and skill descriptions, tricking client agents into interacting with attacker-controlled infrastructure, potentially leading to unauthorized access or data exfiltration.

Figure 6 : Example of Agent Card Shadowing and Impersonation
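A client can catch the simpler forms of shadowing by comparing each new card against a registry of vetted agents. The sketch below assumes such a registry exists; the `TRUSTED` table, the hostnames, and the 0.85 similarity threshold are illustrative choices, and name similarity is only a heuristic that does not replace cryptographic identity verification.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical registry of vetted agents: display name -> expected endpoint host.
TRUSTED = {
    "Currency Agent": "currency.example.com",
    "Hotel Booking Agent": "hotels.example.com",
}
SIMILARITY_THRESHOLD = 0.85  # assumed cut-off for "looks like a known name"


def classify_card(card: dict) -> str:
    name = card["name"]
    host = urlparse(card["url"]).hostname or ""
    if name in TRUSTED:
        # Same name as a vetted agent: the endpoint host must match exactly.
        return "trusted" if host == TRUSTED[name] else "impersonation: endpoint mismatch"
    for known in TRUSTED:
        ratio = SequenceMatcher(None, name.lower(), known.lower()).ratio()
        if ratio >= SIMILARITY_THRESHOLD:
            return f"suspicious: name resembles {known!r}"
    return "unknown"


print(classify_card({"name": "Currency Agent", "url": "https://currency.example.com/"}))
print(classify_card({"name": "Currency Agent", "url": "https://currency.examp1e.com/"}))
print(classify_card({"name": "Currency Agemt", "url": "https://agemt.example.net/"}))
```

Cards classified as anything other than "trusted" would be held for review instead of being added to the delegation prompt.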

Agent Infrastructure and Application Threats

Although a malicious agent itself isn't inherently an issue within the A2A protocol, such an agent could exploit the protocol to launch attacks. Securely adopting A2A requires a thorough understanding of potential security threats from all angles. This section explores the risks posed by compromised agent infrastructure and exploited application logic within an A2A environment.

Agent Infrastructure Attacks

Agent infrastructure attacks target underlying communication channels, registration processes, and identity mechanisms within A2A deployments. Here, attackers can infiltrate multiple roles—tool executor, planner, delegation helper—to disrupt operations, exfiltrate credentials, or redirect legitimate traffic to malicious endpoints.

Compromised agent servers can extract and misuse credentials and operational data provided by client agents, leading to potential data exfiltration and widespread system compromise. They may also deliberately craft responses intended to manipulate or mislead client agents, resulting in operational disruptions or opening avenues for further breaches.

When an agent server is compromised, it can exploit previously granted trust to covertly extract sensitive credentials and operational data. Its established trusted status significantly lowers the likelihood of detection, enabling persistent unauthorized access and escalating potential damage.

A specific and critical form of compromised-agent scenario is the Agent Rug-Pull Attack, wherein previously trusted servers unexpectedly shift to malicious behavior after integration. In these scenarios, client agents may be tricked into executing unauthorized or harmful actions, compromising the original collaborative intent.

Another critical threat, illustrated in Figure 7, involves compromised servers advertising spoofed URLs. These malicious agent URLs redirect legitimate client agents to compromised endpoints, enabling unauthorized data access or system breaches. Compromised agent servers may also present spoofed documentation URLs, misleading clients into accessing harmful or deceptive documentation, leading to misinformation or further exploitation.

Figure 7 : A compromised A2A agent card containing spoofed URLs.
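A basic guard against spoofed URLs is to compare every URL in the card with the origin the card was fetched from. The sketch below assumes a strict same-host policy and assumes the card uses `url` and `documentationUrl` as field names. Some legitimate deployments host documentation elsewhere, so treat the policy as a starting point, not a rule from the A2A spec.

```python
from urllib.parse import urlparse


def check_card_urls(card: dict, fetched_from: str) -> list[str]:
    """Flag card URLs that are not HTTPS or that leave the host the card came from."""
    expected_host = urlparse(fetched_from).hostname
    problems = []
    for field in ("url", "documentationUrl"):
        value = card.get(field)
        if value is None:
            continue
        parsed = urlparse(value)
        if parsed.scheme != "https":
            problems.append(f"{field}: not HTTPS ({value})")
        if parsed.hostname != expected_host:
            problems.append(
                f"{field}: host {parsed.hostname!r} differs from {expected_host!r}")
    return problems


source = "https://agent.example.com/.well-known/agent.json"
spoofed = {
    "url": "https://agent.example.com.evil.test/a2a",  # lookalike host
    "documentationUrl": "http://docs.evil.test/guide",
}
for problem in check_card_urls(spoofed, source):
    print(problem)
```

Comparing parsed hostnames, rather than checking whether the URL string starts with the expected domain, is what catches lookalikes such as `agent.example.com.evil.test`.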

Agent Application/Logic Attacks

Agent application attacks exploit logic-level vulnerabilities within agent implementations. Attack vectors include resource exhaustion, prompt injection targeting LLM-based logic, and malicious artifact uploads, each of which can trigger unintended agent behavior, internal compromise, or operational disruptions.

Here, malicious agent clients can launch denial-of-service (DoS) attacks by overwhelming A2A server agents. For instance, an attacker-controlled client could issue excessive concurrent task requests or maintain numerous simultaneous Server-Sent Events (SSE) connections. Such activity exhausts server resources, severely degrading performance and potentially preventing legitimate clients from accessing the service altogether.
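Rate limiting is a common defense against this kind of resource exhaustion. The token-bucket sketch below is a generic illustration, not something provided by the A2A SDK; the class name and parameters are hypothetical, and a server would typically keep one bucket per authenticated client and apply a similar cap to open SSE connections.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Demo with a fake clock: a burst of five task requests against a bucket of three.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=1.0, capacity=3, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(5)])
```

Requests for which `allow()` returns `False` would be answered with an HTTP 429 instead of creating a new task.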

Additionally, attackers can exploit prompt injection vulnerabilities by crafting harmful inputs designed to manipulate server-side Large Language Models (LLMs) or tool wrappers. These manipulations might lead server agents to generate harmful or unauthorized content, execute unintended shell commands, access restricted internal APIs, or expose sensitive environment variables and user data.

Finally, automated trust mechanisms in multi-agent systems are vulnerable to malicious artifact uploads. Attackers can upload harmful files, such as executable binaries, serialized payloads, or scripts masquerading as benign resources. Executing these artifacts could result in unintended code execution, deployment of backdoors, or internal system compromise. Common malicious artifacts include archives (ZIP, TAR), Python wheels, Docker images, manipulated documents, PDFs, ONNX models, and harmful scripts.
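Before an agent stores or processes an uploaded artifact, it can apply a coarse first-pass check such as the one sketched below. The MIME allowlist and the list of file signatures are illustrative assumptions, and `vet_artifact` is a hypothetical helper. Signature matching is deliberately blunt (a text file that happens to start with "MZ" would be rejected) and is no substitute for sandboxed analysis.

```python
from typing import Optional

# Assumed policy: only these declared types are accepted at all.
ALLOWED_MIME = {"text/plain", "application/json", "image/png", "application/pdf"}

# Leading bytes of formats this sketch never accepts, whatever the declared type.
DANGEROUS_MAGIC = {
    b"\x7fELF": "an ELF executable",
    b"MZ": "a Windows executable",
    b"PK\x03\x04": "a ZIP container (archives, Python wheels)",
    b"\x80\x04": "a Python pickle (protocol 4)",
}


def vet_artifact(declared_mime: str, data: bytes) -> Optional[str]:
    """Return a rejection reason, or None if the artifact passes these basic checks."""
    if declared_mime not in ALLOWED_MIME:
        return f"MIME type {declared_mime!r} is not on the allowlist"
    for magic, label in DANGEROUS_MAGIC.items():
        if data.startswith(magic):
            return f"content looks like {label}, not {declared_mime}"
    return None


print(vet_artifact("image/png", b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(vet_artifact("text/plain", b"\x7fELF\x02\x01\x01"))
```

Artifacts that pass would still be handled in an isolated environment, since a file can be harmful without matching any known signature.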

Broader Threat Considerations

In addition to the specific threats discussed above, the security landscape of A2A deployments should also account for transport-level security concerns, such as replay attacks, where attackers capture and retransmit legitimate requests to gain unauthorized access or disrupt operations. Moreover, supply chain threats, including malicious SDKs or dependencies, pose risks by injecting vulnerabilities or backdoors into agent implementations. While detailed mitigation strategies for these threats are outside this document’s scope, awareness and preliminary risk assessments are recommended as part of comprehensive security planning for A2A deployments.

Security Defense, Practices & Conclusion

Effective deployment of the A2A protocol demands thoughtful planning, rigorous management, and ongoing monitoring of security-related components. Table 1 summarizes the security issues discussed above and provides practical mitigation strategies.

Security Issues	Defense and Mitigation
Authentication and Authorization	Explicitly declare authentication methods in the AgentCard; enforce strict credential validation at the middleware/API gateway; independently perform granular backend authorization checks; adhere to least privilege and zero trust principles.
Agent Card Management Concerns	Automate Agent Card validation via CI processes; promptly push updates through notification mechanisms; routinely audit agent descriptions and metadata for sensitive information exposure; require robust authentication mechanisms to access Agent Card endpoints; keep the Agent Card minimal.
Agent Card Context Poisoning	Conduct rigorous prompt sanitization and filtering; isolate context carefully, establishing clear boundaries to differentiate trusted and untrusted content; validate embedded descriptions, skills, and prompts rigorously; define and enforce strict rules about what can be included in the Agent Card metadata.
Agent Impersonation and Agent Card Shadowing	Employ secure agent identity verification; regularly verify endpoint URLs and metadata against authoritative sources; utilize advanced anomaly detection to identify impersonation promptly.
Agent Infrastructure Attacks	Validate and sanitize all agent inputs and responses; sandbox execution and lock down resources; validate and monitor agent endpoint URLs; deploy continuous monitoring and anomaly detection for suspicious activity.
Agent Application/Logic Attacks	Enforce strict resource and rate limits; implement secure sandboxing and containerization techniques to isolate agent processes; validate and sanitize all incoming prompts and inputs; monitor interactions for abnormal patterns.

Table 1 : Security Issues and Mitigations.

By systematically implementing these authentication, authorization, resource management, content filtering, and monitoring measures, organizations can significantly enhance the security posture of their A2A deployments. Proactive and continuous attention to security will allow for secure, reliable interactions within A2A ecosystems, ensuring the protection of sensitive data and the integrity of agent operations.

Palo Alto Networks Prisma AIRS Can Help

AI Runtime Security is a comprehensive solution designed to safeguard enterprise AI Agent applications and traffic flows. It protects against a wide range of threats—including AI-specific vulnerabilities such as prompt injection, jailbreaks, malicious responses, embedded unsafe URLs, data exfiltration, and prompt-triggered attacks.

Prisma AIRS combines continuous runtime threat analysis with real-time, AI-powered defenses to effectively stop attackers. It also leverages advanced AI-driven detection to protect AI models from manipulation and to help ensure the reliability and integrity of AI Agent-generated outputs. By securing both the models and their interactions in real time, it lays the foundation for safe, trustworthy, and resilient AI Agent deployments at scale.
