Agentic AI: From Embodiment to Multi-Agent Systems
How to stop my robot from microwaving the cat? Grounding autonomy with knowledge graphs for safety, scalability, and explainability

TL;DR → Grounding is essential. Knowledge graphs (KGs) give agents verifiable structure: entities, relations, and constraints.
- Embodied foundation models ≠ reliability. They fuse perception + control, but KGs add explicit context, safety rules, reasoning paths, and explainability.
- Better than text-only RAG. Graph structure improves retrieval, causal reasoning, and auditability (see GraphRAG; temporal KGs).
- Real use cases: medicine (SNOMED/UMLS), cybersecurity (attack paths), robotics (constraints, compliance).
- Multi-agent at scale: graph algorithms (redundancy, consensus, path-finding) enable robust, coordinated behaviour.
— Based on my talk at the Alan Turing Institute’s Knowledge Graph Symposium 2025.
Introduction: The Promise and the Problem
Large language models (LLMs) have transformed artificial intelligence almost overnight. Trained on vast collections of text, they can generate fluent answers, write code, or explain difficult concepts with apparent ease. This generative capacity has created a wave of optimism that AI might finally be edging towards something like general intelligence.
Yet appearances deceive. Today’s LLMs make fewer obvious mistakes than their predecessors, but they still lack any genuine grasp of relationships, causality, or domain-specific structure. They produce convincing language, but beneath that surface they remain pattern-matching machines without a true model of the world. This limitation becomes critical when language models are tied to action. An embodied agent that fails to understand the dependencies between actions and outcomes cannot be expected to act safely. The risk goes well beyond hallucinations, and well beyond producing nonsense text.
In a chatbot, these limitations can be irritating but acceptable. They can also be acceptable in iterative coding agents. In a physical robot, a medical assistant, or a defence system, they are not. A mistaken assumption about how an object should be handled, or how two facts relate, can translate into dangerous consequences or ineffective actions.
So, how do we take models that are outstanding at producing language and turn them into systems capable of acting effectively and safely in the real world? Models need to be grounded, that is, linked to structured, verifiable knowledge that both constrains and supports decision-making. One of the strongest tools available for such grounding is the knowledge graph (KG).
The Rise of Agentic AI
The concept of an agent has been central to AI research since at least the 1980s.1 From early rule-based and expert systems, through reinforcement learning and game-playing programs, to trading bots and recommendation engines, the idea has not evolved much. An agent is an autonomous system that can perceive its environment, make decisions, and act in pursuit of a goal.2 However, what has shifted dramatically is the technological foundation that makes agents possible.
From LLMs to embodied agents
The generative strength of LLMs has opened the door to a new class of software agents, systems that can plan, orchestrate tools, and execute sequences of actions. By linking LLMs with APIs, databases, and even robotic systems, researchers have moved beyond the notion of passive digital assistants toward something that looks much closer to autonomous collaborators.
For example, OpenAI’s New tools for building agents introduce the Response API (a new API primitive for agentic apps with built-in tools like web search, file search, and computer use) and an open-source Agents SDK to orchestrate single- and multi-agent workflows.
At the same time, embodied AI has progressed from laboratory research into real-world prototypes across diverse settings. In warehouses, robot fleets now coordinate to locate and transport goods, seamlessly blending natural language instructions with sensor-driven navigation. In domestic spaces, assistive robots are beginning to follow spoken commands, manipulate everyday objects, and adapt to the messiness of unstructured environments. Humanoid prototypes demonstrate growing dexterity, bringing together balance, grasping, and locomotion in increasingly fluid ways. Drones already coordinate in real time for applications in surveying, agriculture, and delivery, while autonomous vehicles negotiate dense traffic and make split-second driving decisions. All of these developments converge on a long-standing aspiration, which is the creation of a general-purpose embodied agent. Naturally, the autonomous humanoid robot is seen as the holy grail.
Embodied foundation models
As early as 2022, the first truly compelling examples of using LLM-brains for robotics were presented by Microsoft Research (ChatGPT for Robotics 3). Through LLMs, it became possible for robots to understand human intent better than ever before, and to deconstruct tasks into series of effective actions necessary to achieve them, in an open-ended fashion. For example, if instructed to warm up lunch, the robot would understand that it needs to look for a microwave or an oven. However, systems based on language models alone lack intrinsic perception of the environment, of themselves, and of any constraints. Thus, they are severely limited for real-life applications.

Embodied foundation models go one step further. These architectures bring perception and control under a single roof.4 Where earlier systems treated vision, planning, and actuation as separate modules, embodied foundation models learn to link seeing and doing directly. The result is striking. When given sensory input, be it a video feed, audio, touch, or proprioception, together with a command, they can fuse these signals and map perception into action in ways that feel both fluid and general. Multi-sensory fusion is what allows embodied foundation models to move beyond narrow training regimes and cope with the unpredictability of the real world.

Development of embodied foundation models moves fast. Google DeepMind’s Gemini Robotics-ER 1.5 is the first Gemini Robotics model made broadly available, purpose-tuned for spatial reasoning, planning, and tool use. It is available in preview today via Google AI Studio and the Gemini API. Earlier this year, NVIDIA announced Isaac GR00T N1, as the world’s first open humanoid robot foundation model, with simulation libraries to accelerate real-world robot skills.
These models address the integration problem by making perception and action flow together, yet they don’t solve the issues of reasoning, alignment, and reliability. They cannot by themselves prevent faulty logic, impose ethical safeguards, or ensure that their plans align with human objectives. For agentic AI, fluid perception and action are necessary, but not sufficient. Grounding is still needed. An autonomous car encountering a sudden obstacle must decide whether to privilege passenger comfort, travel time, or uncompromising safety. Each of these outcomes encodes a value choice. Scaling up foundation models will not resolve such dilemmas, because the challenge is not one of model size, but of how context, constraints, and values are represented, prioritized, and enforced.
Knowledge Graphs: A Pillar for Safe Agency
Can we use knowledge graphs (KGs) as a grounding mechanism for agentic AI systems? A KG encodes facts as entities and relationships, nodes and edges that capture the structure of a domain. Unlike free text, a knowledge graph is inherently machine-readable and queryable, because its very structure encodes semantics explicitly. Entities and relations are formalized, which allows not only retrieval of information but also reasoning about how concepts are connected. This built-in semantic layer is precisely what can make KGs a powerful grounding mechanism.
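To make "machine-readable and queryable" concrete, here is a minimal sketch of a KG as (subject, relation, object) triples with a pattern-matching query. The entities and relation names are illustrative assumptions, not drawn from any real ontology.

```python
# A minimal knowledge graph as (subject, relation, object) triples.
# Entity and relation names are illustrative, not from a real ontology.
triples = [
    ("microwave", "heats", "food"),
    ("cat", "is_a", "living_being"),
    ("living_being", "must_not_enter", "microwave"),
]

def query(triples, subject=None, relation=None, obj=None):
    """Return all triples matching the given (possibly partial) pattern."""
    return [
        (s, r, o) for (s, r, o) in triples
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    ]

print(query(triples, relation="must_not_enter"))
# [('living_being', 'must_not_enter', 'microwave')]
```

Production systems would use an RDF store with SPARQL or a property-graph database rather than in-memory triples, but the principle is the same: the structure itself answers "how are these things related?".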
Why KGs matter for agentic AI
Knowledge graphs can connect agentic AI systems to the real world through:
- Grounding: Agents can check their reasoning against a structured map of reality. Instead of inventing facts, they can query and verify.
- Constraints: Safety rules, domain knowledge, and ethical boundaries can be encoded directly into the graph. An embodied agent bound by a KG cannot plan actions that violate hard constraints.
- Explainability: Graphs provide transparent audit trails. Each decision can be traced back to specific nodes and edges, enabling oversight and accountability.
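The constraint idea above, returning to the microwaved-cat problem from the title, can be sketched as a pre-execution check: planned actions are vetoed when the KG's type hierarchy places them inside a hard constraint. The `is_a` hierarchy and the forbidden set are illustrative assumptions.

```python
# Hedged sketch: vetoing planned actions that violate hard constraints
# encoded in the KG. Entities and the "is_a" hierarchy are illustrative.
is_a = {"cat": "living_being", "sandwich": "food"}
forbidden = {("living_being", "microwave")}  # never place X-of-this-category in Y

def category(entity):
    # Walk one step up the is_a hierarchy; fall back to the entity itself.
    return is_a.get(entity, entity)

def check_plan(plan):
    """Split 'place obj in target' steps into allowed and vetoed."""
    allowed, vetoed = [], []
    for obj, target in plan:
        if (category(obj), target) in forbidden:
            vetoed.append((obj, target))
        else:
            allowed.append((obj, target))
    return allowed, vetoed

allowed, vetoed = check_plan([("sandwich", "microwave"), ("cat", "microwave")])
print(vetoed)  # [('cat', 'microwave')]
```

Because the veto is a graph lookup rather than a learned behaviour, it also yields the audit trail mentioned above: the blocked step points to the exact constraint edge that triggered it.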
With knowledge graphs, LLMs stop spinning in thin air and start connecting their reasoning to verifiable structures. Deciding on the right course of action demands reasoning over context, relationships, values, and preferences.
How KGs integrate with LLMs
In practice, KGs can be combined with language models in several ways:
- Retrieval augmentation: A query from the model can be used to fetch facts from a KG. These facts are then injected into the prompt or fed into the agent. This anchors reasoning in structured knowledge rather than relying on free recall.
- Symbolic reasoning: LLMs can call reasoning engines that operate directly on the graph, enabling logical inference, constraint checking, and path finding.
- Memory and planning: KGs can function as long-term memory, where entities, events, and actions are stored and continuously updated as the agent interacts with its environment.
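The first of these patterns, retrieval augmentation, can be sketched as follows: facts adjacent to the entities in a query are serialized and injected into the prompt. The graph contents and the prompt template are illustrative assumptions.

```python
# Hedged sketch of KG retrieval augmentation: facts about the entities
# mentioned in a query are serialized into the prompt. Graph contents
# and prompt template are illustrative.
kg = {
    "warfarin": [("interacts_with", "aspirin"), ("treats", "thrombosis")],
    "aspirin": [("is_a", "nsaid")],
}

def retrieve_facts(entities):
    """Collect outgoing relations for each entity as plain-text facts."""
    return [
        f"{e} {rel} {obj}"
        for e in entities
        for rel, obj in kg.get(e, [])
    ]

def build_prompt(question, entities):
    context = "\n".join(retrieve_facts(entities))
    return f"Known facts:\n{context}\n\nQuestion: {question}"

print(build_prompt("Is it safe to combine warfarin and aspirin?",
                   ["warfarin", "aspirin"]))
```

The point is that the model answers against facts it was handed, not facts it half-remembers; the same retrieval step doubles as provenance for the answer.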
As such, KGs can provide a semantic data source for all grounding requirements an agent might have, as seen below.5

KGs versus other approaches
Other approaches also seek to ground models, but each has limits. Retrieval-Augmented Generation (RAG) usually relies on unstructured text. This is powerful for injecting factual context but lacks the explicit semantics of a KG. It tells the model what was written, but not how things connect or which facts take priority. Rule-based systems can encode strict constraints, but they are brittle and difficult to scale in open-ended environments.
Knowledge graphs, in contrast, combine the breadth of large-scale data with the precision of structured relationships, making them uniquely suited to support agentic AI. Recent work has begun to operationalize this, showing how graph structure can directly improve retrieval and reasoning. Microsoft Research’s GraphRAG is an end-to-end approach that builds LLM-derived knowledge graphs by combining text extraction, network analysis, and LLM prompting/summarization for richer retrieval than vector-only RAG. Another notable example is OpenAI’s Temporal Agents with Knowledge Graphs, a system that constructs temporal knowledge graphs and enables multi-step retrieval over them.
Applications where KGs and Agents Meet
The impact of combining agents with knowledge graphs is evident in early applications across different domains. Each case highlights how structured knowledge constrains, guides, and optimizes agent behaviour in ways that free-text systems cannot.
Medicine
In clinical decision support, precision is extremely important. By grounding reasoning in a medical KG, linking diseases, symptoms, treatments, and drug interactions, agents can cross-check their outputs against structured clinical knowledge. Ontologies such as [SNOMED CT](https://www.snomed.org/what-is-snomed-ct) or the Unified Medical Language System (UMLS) already encode medical relationships in a machine-readable format. Clinical trial results can be integrated as evidence nodes, allowing the agent not just to recall that a therapy exists, but also to weigh its evidence base, contraindications, and side effects. An AI assistant proposing a treatment, for example, could automatically traverse the KG to flag drug-drug interactions, providing clinicians with transparent, auditable justifications for its recommendations. In the recent work by Matsumoto et al. (2025)6 agents are used to query a biomedical knowledge graph to synthesize accurate answers to user queries.
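The interaction-flagging traversal can be sketched as a pairwise lookup over interaction edges in the KG. The interaction pairs here are placeholders, not clinical data; a real system would query SNOMED CT or UMLS relations.

```python
# Hedged sketch: flagging drug-drug interactions via KG edges.
# The interaction pairs are illustrative placeholders, not clinical data.
interacts = {frozenset({"warfarin", "aspirin"}), frozenset({"ssri", "maoi"})}

def flag_interactions(prescription):
    """Return every interacting pair found within a list of drugs."""
    flagged = []
    drugs = list(prescription)
    for i, a in enumerate(drugs):
        for b in drugs[i + 1:]:
            if frozenset({a, b}) in interacts:
                flagged.append((a, b))
    return flagged

print(flag_interactions(["warfarin", "aspirin", "paracetamol"]))
# [('warfarin', 'aspirin')]
```

Each flagged pair corresponds to a specific edge in the graph, which is exactly the auditable justification the clinician needs.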
Defence and cybersecurity
In cyber defence, agents must operate in adversarial, constantly shifting conditions. Here, knowledge graphs capture both static and dynamic relationships, e.g., how the compromise of one node might permit lateral movement across a network. Faced with uncertainty, agents can use the KG to run what-if simulations of potential attacker moves and adjust defences pre-emptively. Under time pressure, decision-making becomes more reliable because every inference rests on an explicit model of threats, assets, and mitigations, not on improvised heuristics. The CRAKEN7 system, developed by Shao et al. (2025), shows how graphs enable reasoning in such settings, mapping entities, relations, vulnerabilities, exploits, etc.
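A what-if of this kind reduces to reachability over the attack graph: given one compromised node, which assets become reachable through lateral movement? The network topology below is an illustrative assumption.

```python
# Hedged sketch of a "what-if" over an attack graph: from a compromised
# node, which assets are reachable via lateral movement? Topology is
# illustrative.
from collections import deque

lateral = {  # directed edges: compromise of the key enables movement to values
    "workstation": ["file_server"],
    "file_server": ["db_server", "backup"],
    "db_server": [],
    "backup": [],
}

def reachable(start):
    """Breadth-first search over the lateral-movement graph."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in lateral.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}

print(sorted(reachable("workstation")))
# ['backup', 'db_server', 'file_server']
```

Running the same query from each node ranks assets by blast radius, which is how a defending agent can prioritize mitigations pre-emptively rather than improvising under time pressure.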
Sustainability and robotics
For sustainability-focused agents, a KG can encode environmental context (habitats, soil, slope, microclimate), regulatory constraints (protected zones, flight restrictions), and operational knowledge (species–site suitability, planting cycles, machinery affordances). For example, a reforestation agent would query this KG to choose ecologically appropriate and legally compliant planting sites, while drone fleets or ground robots execute the plan, cross-checking constraints on site and providing an audit trail afterwards.
In agriculture, embodied agents can tie perception to agronomic knowledge, identifying crops, weeds, or disease symptoms while grounding them in an ontology of phenology, treatments, and withholding periods. The KG restricts actions to what is permissible, to approved interventions for a given crop and growth stage, and provides clear reasons for those choices.
In the circular economy, disassembly and recycling robots rely on product knowledge graphs, encoding bill of materials, material composition, and hazard warnings. Detailed knowledge is essential to plan safe disassembly sequences and direct recovered parts into the right processing streams. Similarly, in energy and building management, facility agents query KGs that link meters, assets, tariffs, carbon-footprint data, and maintenance history, producing recommendations that optimize energy use and emissions with transparent justification.
The common thread is that the knowledge graph encodes the domain semantics, rules, relationships, and constraints.
Multi-Agent Systems and Graph Optimization
The future of AI will be defined by networks of agents working together. The central challenge in such multi-agent systems is coordination. How can individual agents communicate, share objectives, and divide labour in ways that remain reliable and efficient? These interactions naturally form graphs with agents as nodes, and their communication channels or dependencies as edges. Thus, the KG can formalize the essence of what the agent is, what role it plays, and which constraints apply.

In an ecosystem of agents connected through protocols such as MCP (Model Context Protocol) and A2A (Agent2Agent), there are multiple possible routes to achieve any goal. Graph algorithms characterize and govern the behaviour of such systems. For example, resilience can be provided by ensuring that the network adapts when connections are disrupted:
- Edge redundancy allows communication to be rerouted if one link fails.
- Node redundancy ensures that tasks can still be completed even if individual agents drop out.
- Layer redundancy ensures that agents can still coordinate and pursue their goals, at both local and global levels, even if one layer of coordination is disrupted or compromised.
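Edge redundancy, the first of these properties, can be checked directly with graph algorithms: remove each link in turn and verify that every agent can still reach every other. The four-agent ring topology is an illustrative assumption.

```python
# Hedged sketch: testing edge redundancy by removing each link in turn
# and re-checking connectivity. The four-agent ring is illustrative.
edges = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")}
nodes = {"a", "b", "c", "d"}

def connected(edge_set, nodes):
    """Undirected connectivity check via depth-first search."""
    adj = {n: set() for n in nodes}
    for u, v in edge_set:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == set(nodes)

single_points_of_failure = [
    e for e in sorted(edges) if not connected(edges - {e}, nodes)
]
print(single_points_of_failure)  # [] -- the ring survives any one link loss
```

A star topology run through the same check would flag every spoke, which is precisely the kind of fragility that edge- and node-redundancy analysis is meant to surface before deployment.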

Another critical ingredient of distributed multi-agent systems is consensus. For a swarm of drones to hold formation, or for distributed energy resources to balance supply and demand, agents must agree on a shared state even when signals are delayed or corrupted. When consensus is mediated by a KG, system objectives and constraints are explicit, and semantic alignment is maintained. The group decision reflects encoded goals and rules. Similar coordination challenges arise in other domains. In smart grids, each energy source or storage unit can be treated as an agent negotiating with others to maintain stability. In logistics, vehicle fleets can coordinate to reduce fuel consumption while meeting delivery deadlines, guided by KGs that capture road networks, traffic flows, and contractual commitments.
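A minimal version of such consensus is the classic averaging iteration: each agent repeatedly nudges its value toward the mean of its neighbours' values until the group agrees on a shared state. The line topology, step size, and initial readings below are illustrative assumptions.

```python
# Hedged sketch of average consensus on a small agent network: each agent
# moves toward the mean of its neighbours' values until the group agrees.
# Topology, step size, and initial readings are illustrative.
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
value = {"a": 0.0, "b": 3.0, "c": 6.0}

for _ in range(200):  # fixed-point iteration toward agreement
    value = {
        n: value[n] + 0.3 * (
            sum(value[m] for m in neighbours[n]) / len(neighbours[n]) - value[n]
        )
        for n in value
    }

print({n: round(v, 2) for n, v in value.items()})
# {'a': 3.0, 'b': 3.0, 'c': 3.0}
```

In a KG-mediated system, the quantity being agreed on and the admissible range for it would both come from the graph, so the converged state is guaranteed to respect the encoded objectives and constraints.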
By combining agentic AI with graph optimisation, and grounding both in the semantics encoded by knowledge graphs, we can move towards effective and explainable distributed intelligence.
Conclusion: Towards Reliable Agentic AI
The way forward towards reliable agentic AI is to offer well-integrated and concrete grounding to LLMs and embodied foundation models. LLMs bring adaptive reasoning and natural interaction. Embodied foundation models fuse perception and control. Knowledge graphs provide grounding, constraints, and explainability.
The convergence of embodied foundation models, structured knowledge representation, and multi-agent systems will define the evolution of AI in the next decade.
Footnotes
1. Wooldridge, M., Jennings, N.R., 1995. Intelligent agents: theory and practice. The Knowledge Engineering Review 10, 115–152. https://doi.org/10.1017/S0269888900008122
2. Raptis, E.K., Kapoutsis, A.Ch., Kosmatopoulos, E.B., 2025. Agentic LLM-based robotic systems for real-world applications: a review on their agenticness and ethics. Front. Robot. AI 12, 1605405. https://doi.org/10.3389/frobt.2025.1605405
3. Vemprala, S., Bonatti, R., Bucker, A., Kapoor, A., 2023. ChatGPT for Robotics: Design Principles and Model Abilities. https://doi.org/10.48550/arXiv.2306.17582
4. Ma, Y., Song, Z., Zhuang, Y., Hao, J., King, I., 2025. A Survey on Vision-Language-Action Models for Embodied AI. https://doi.org/10.48550/arXiv.2405.14093
5. Liu, Y., Zhang, G., Wang, K., Li, S., Pan, S., 2025. Graph-Augmented Large Language Model Agents: Current Progress and Future Prospects. https://doi.org/10.48550/arXiv.2507.21407
6. Matsumoto, N., Choi, H., Moran, J., Hernandez, M.E., Venkatesan, M., Li, X., Chang, J.-H., Wang, P., Moore, J.H., 2025. ESCARGOT: an AI agent leveraging large language models, dynamic graph of thoughts, and biomedical knowledge graphs for enhanced reasoning. Bioinformatics 41, btaf031. https://doi.org/10.1093/bioinformatics/btaf031
7. Shao, M., Xi, H., Rani, N., Udeshi, M., Putrevu, V.S.C., Milner, K., Dolan-Gavitt, B., Shukla, S.K., Krishnamurthy, P., Khorrami, F., Karri, R., Shafique, M., 2025. CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution. https://doi.org/10.48550/arXiv.2505.17107