Graphs have long underpinned cybersecurity; their importance has only grown with cloud-scale complexity.
I previously explored how defenders can protect their CI/CD environments using graphs, mapping repos, build jobs, secrets, runners, and cloud credentials into connected views that mirror how attackers think.
This article revisits that idea in the era of large language models and shows why graphs are key to moving AI for security from hype to something operational.
tl;dr: when you combine graph representations with LLM reasoning, you get precision and explainability at a level flat data structures cannot match.
Why cybersecurity isn’t keeping up in the age of vibe-everything
LLMs have already reshaped how software is built, yet cybersecurity adoption still lags. In areas like app development, “high-temperature” outputs can be a feature: creativity and flexibility are welcome even if the outcome is imperfect.
Security work, however, is fundamentally different: security outcomes demand accuracy, strong precision/recall, and, just as importantly, explainability.
The promise of LLMs in security is still massive. Agentic systems can stitch together findings, add context that once took days to assemble, and dramatically reduce triage time. The old model of static, top-down alerts creates fatigue rather than clarity; even with runtime contextualization and reachability analysis, “flat” findings remain noisy because too many hard and soft variables are in play.
When these models are grounded in organizational signals like policies and risk priorities, and when they incorporate real-time environment data, the workflow changes completely. Imagine agents that are properly grounded, explainable, and equipped with adequate context on organizational signals (policies, risk appetite, asset criticality) and environment state (configurations, prevailing threats, controls). Security teams wouldn’t have to sift through thousands of static issues; they’d engage in an iterative dialogue about what matters now, next week, and next quarter.
Grounding and explainability: Where things get complicated for LLMs in cybersecurity
Token prediction is a core challenge for LLM security use cases. When you prompt an LLM to write a poem, dozens or hundreds of next tokens are plausible at each step; over the next 10 tokens, the combinatorics explode.
Security is different. Consider evaluating the posture of an EC2 instance based on a stream of API calls. One incorrect token (such as mislabeling a security group or missing an ingress rule) can invalidate the entire assessment. The acceptable prediction space must be narrow.
Low-level internal decisions, like the token predictions that drive factual conclusions, must be tightly constrained and fully grounded in evidence. We cannot afford to misinterpret or overlook a security group when assessing lateral movement.
High‑level planning/orchestration can tolerate a broader prediction space because we can iteratively steer and refine the hypothesis.
Explainability is the contract with auditors, engineers, and risk/compliance teams. Without a graph, you’re effectively asking them to trust a probabilistic token stream. There’s no concrete point to reference when they ask, “Why this alert?”
With a graph, every claim reduces to a visible path: which facts (nodes) were used, which relationships (edges) were followed, and where any assumptions were introduced. That path is the audit trail; without it, trust and adoption fall apart.
When the graphs come marching in
Graphs collapse complex, noisy documents into discrete, typed relationships. With the environment modeled as nodes and edges (e.g., EC2 → HAS_SG → SG → ALLOWS → CIDR), the agent isn’t guessing across a sprawling token stream; it’s navigating a bounded graph, which dramatically shrinks the search space and makes each step inspectable. Let’s look at a hypothetical example:
Graph form:

```
(i-0a12) -[HAS_SG]-> (sg-0aa1) -[ALLOWS {proto:tcp, port:22}]-> (0.0.0.0/0)
(i-0a12) -[HAS_SG]-> (sg-0bb2) -[ALLOWS {proto:tcp, port:5432}]-> (10.0.2.0/24)
```
Raw JSON:

```json
{
  "Reservations": [{
    "Instances": [{
      "InstanceId": "i-0a12",
      "SecurityGroups": [
        {"GroupId": "sg-0aa1", "GroupName": "web-sg"},
        {"GroupId": "sg-0bb2", "GroupName": "db-sg"}
      ],
      "Tags": [{"Key": "Name", "Value": "prod-web-1"}, ...],
      "BlockDeviceMappings": [...],
      "NetworkInterfaces": [{"Ipv6Addresses": [], "PrivateIpAddress": "10.0.1.23", ...}],
      ...
    }, ...]
  }],
  "SecurityGroups": [{
    "GroupId": "sg-0aa1",
    "IpPermissions": [{
      "IpProtocol": "tcp",
      "FromPort": 22,
      "ToPort": 22,
      "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
      "UserIdGroupPairs": []
    }, ...],
    "Description": "allow-ssh",
    ...
  }, ...]
}
```
To reach the same security conclusion from raw JSON, an LLM must traverse a complex multi-step reasoning path:
- Locating the instance “i-0a12” deep within the nested Reservations[0].Instances[0] structure
- Parsing the SecurityGroups array to extract group IDs
- Cross-referencing these IDs against a separate SecurityGroups section (potentially hundreds of lines away)
- Diving into each group’s IpPermissions array
- Interpreting the IpRanges to understand network access patterns
This creates a lengthy chain of inferences across scattered data points, where each step introduces potential for error or hallucination.
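For illustration, here is a minimal sketch of that chain as code (the `open_to_world` helper and the trimmed `response` dict are hypothetical; real DescribeInstances output carries many more optional fields). Every step the model must perform implicitly over tokens corresponds to one explicit lookup here:

```python
# Trimmed to the fields that matter for this check.
response = {
    "Reservations": [{"Instances": [{
        "InstanceId": "i-0a12",
        "SecurityGroups": [{"GroupId": "sg-0aa1"}, {"GroupId": "sg-0bb2"}],
    }]}],
    "SecurityGroups": [{
        "GroupId": "sg-0aa1",
        "IpPermissions": [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                           "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
    }],
}

def open_to_world(resp: dict, instance_id: str, port: int) -> bool:
    # Steps 1-2: locate the instance and collect its security-group IDs.
    sg_ids = {
        sg["GroupId"]
        for reservation in resp["Reservations"]
        for inst in reservation["Instances"]
        if inst["InstanceId"] == instance_id
        for sg in inst["SecurityGroups"]
    }
    # Step 3: cross-reference IDs against the separate SecurityGroups section.
    groups = [g for g in resp["SecurityGroups"] if g["GroupId"] in sg_ids]
    # Steps 4-5: dive into IpPermissions and interpret IpRanges.
    for group in groups:
        for perm in group.get("IpPermissions", []):
            in_range = perm.get("FromPort", -1) <= port <= perm.get("ToPort", -1)
            world = any(r.get("CidrIp") == "0.0.0.0/0"
                        for r in perm.get("IpRanges", []))
            if perm.get("IpProtocol") == "tcp" and in_range and world:
                return True
    return False

print(open_to_world(response, "i-0a12", 22))  # True
```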
In contrast, the graph representation offers a direct, nearly deterministic path:
(i-0a12) -[HAS_SG]-> (sg-0aa1) -[ALLOWS]-> (0.0.0.0/0)

In transformer terms, the graph’s explicit structure narrows attention and concentrates the next-token distribution. Each attention head can then focus on semantically meaningful edges rather than parsing nested data structures.
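Here is a minimal sketch of the same conclusion as a graph lookup, using networkx and the node and edge names from the example. The reasoning becomes a bounded two-hop traversal, and the explanation is the path itself:

```python
import networkx as nx

# Build the typed graph from the example above.
G = nx.MultiDiGraph()
G.add_edge("i-0a12", "sg-0aa1", key="HAS_SG")
G.add_edge("i-0a12", "sg-0bb2", key="HAS_SG")
G.add_edge("sg-0aa1", "0.0.0.0/0", key="ALLOWS", proto="tcp", port=22)
G.add_edge("sg-0bb2", "10.0.2.0/24", key="ALLOWS", proto="tcp", port=5432)

# Two hops, no free-form parsing: instance -> security group -> CIDR.
for _, sg, rel in G.out_edges("i-0a12", keys=True):
    if rel != "HAS_SG":
        continue
    for _, cidr, rel2, attrs in G.out_edges(sg, keys=True, data=True):
        if rel2 == "ALLOWS" and cidr == "0.0.0.0/0" and attrs.get("port") == 22:
            # The explanation is the path itself: a ready-made audit trail.
            print(f"(i-0a12) -[HAS_SG]-> ({sg}) -[ALLOWS tcp/22]-> ({cidr})")
```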
Borrowing from information theory, we treat entropy as uncertainty in a probability distribution. Here we use it heuristically to contrast (a) how ambiguous the input context is and (b) how wide the model’s next-token distribution is.
Two kinds of entropy matter here:
- Context entropy (input): How scattered or ambiguous is the data the model must reason over?
- Generation entropy (output/tokens): How many tokens are “acceptable” at each prediction step? For low-level security judgments, we want a small prediction space (ideally near-deterministic). Graph-grounded reasoning reduces generation entropy by providing fewer plausible next steps, aligning with how transformer attention concentrates probability mass.

Explicit structure ⇒ low entropy on both counts:
- JSON: High entropy – the model’s attention must span nested arrays, optional fields, and implicit relationships, which creates a diffuse attention pattern across hundreds of tokens.
- Graph: Low entropy – attention focuses on explicit, typed relationships, dramatically reducing the attention entropy.
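To make the heuristic concrete, here is a toy Shannon-entropy calculation; both distributions are invented purely for illustration:

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Diffuse: many plausible next tokens, as in free-form JSON parsing.
diffuse = [0.05] * 20                     # 20 near-equally likely tokens
# Concentrated: explicit graph structure leaves few plausible continuations.
concentrated = [0.90, 0.05, 0.03, 0.02]

print(round(shannon_entropy(diffuse), 2))       # 4.32 bits
print(round(shannon_entropy(concentrated), 2))  # 0.62 bits
```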
GraphRAG offers concrete evidence of these advantages. Microsoft’s implementation showed that graph-based retrieval dramatically outperforms traditional vector RAG for comprehensiveness and diversity (winning 72–83% of pairwise comparisons). Crucially, their root-level community summaries required 97% fewer tokens than source-text summarization while still beating vector-embedding RAG on global sense-making tasks.
Lowering both kinds of entropy by structuring context and constraining generation raises precision and makes explanations trivial: “We flagged lateral movement because edge X → Y exists and rule Z allows it.”
Beyond reducing entropy, GraphRAG resolves security questions that are hard for text-only RAG by composing conclusions from relationships rather than a single passage. For “Which AWS Lambda functions can access secrets?”, the relevant evidence—roles, attached policies, actions, ARNs, and conditions—is absent from the question text and scattered across sources. A graph lets the system traverse all Lambda→Secret paths and determine which ones truly grant access.
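As a sketch with hypothetical node names, that traversal might look like the following; the answer is composed from role and policy edges rather than retrieved from any single passage:

```python
import networkx as nx

# Hypothetical IAM slice: Lambda -> role -> policy -> secret.
G = nx.DiGraph()
G.add_edge("fn:billing-export", "role/billing", label="ASSUMES")
G.add_edge("role/billing", "policy/secrets-read", label="HAS_POLICY")
G.add_edge("policy/secrets-read", "secret/db-password", label="ALLOWS_GET")
G.add_edge("fn:image-resize", "role/media", label="ASSUMES")  # no secret access

lambdas = [n for n in G if n.startswith("fn:")]
secrets = [n for n in G if n.startswith("secret/")]

# A real evaluator would also check Actions, resource ARNs, and Condition
# blocks on each edge before declaring that a path truly grants access.
for fn in lambdas:
    for s in secrets:
        for path in nx.all_simple_paths(G, fn, s, cutoff=4):
            print(" -> ".join(path))
# fn:billing-export -> role/billing -> policy/secrets-read -> secret/db-password
```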
Tackling the scale and semantics challenges
The graph representation of modern SaaS environments continues to grow more complex by the day and is showing no signs of slowing. As I noted in my previous article, the fundamental challenges persist: graph databases remain more fragile than traditional data stores, scale poorly, demand careful modeling to avoid performance pitfalls, and carry higher operational costs.
These technical hurdles, compounded by the scarcity of graph expertise in most organizations, create significant barriers to adoption. But even if teams overcome these initial challenges, they face an even thornier problem: efficient graph traversal at enterprise scale.
The Scale Challenge
Consider the reality of modeling massive, cross-vendor ecosystems. How do we traverse these sprawling graphs efficiently during inference while keeping costs aligned with business value?
Even if we could somehow fit an entire cross-vendor graph schema into a context window, the results would likely be disappointing when non-trivial traversal is needed. High entropy would degrade performance, while token costs would skyrocket with minimal opportunities for token caching to offset the expense.
Potential solution: apply RAG techniques to serve focused schema sub-graphs tailored to specific inference tasks, as sketched below.
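One way to sketch that idea (the schema and the `focused_subschema` helper are hypothetical, not any product’s API): index the schema itself as a graph and serve only the k-hop neighborhood around the entity types a task mentions:

```python
import networkx as nx

# Toy cross-vendor schema: node = entity type, edge = relationship type.
schema = nx.Graph()
schema.add_edges_from([
    ("EC2", "SecurityGroup"), ("SecurityGroup", "CIDR"),
    ("Lambda", "IAMRole"), ("IAMRole", "IAMPolicy"), ("IAMPolicy", "Secret"),
    ("Repo", "BuildJob"), ("BuildJob", "Runner"), ("Runner", "CloudCredential"),
])

def focused_subschema(seed_types, hops=2):
    """Union of k-hop neighborhoods around the entity types a task mentions."""
    nodes = set()
    for t in seed_types:
        nodes |= set(nx.ego_graph(schema, t, radius=hops).nodes)
    return schema.subgraph(nodes)

# "Which Lambda functions can access secrets?" only needs the IAM slice,
# not the CI/CD or networking portions of the schema.
sub = focused_subschema(["Lambda", "Secret"])
print(sorted(sub.nodes))  # ['IAMPolicy', 'IAMRole', 'Lambda', 'Secret']
```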
The Semantic Gap
While individual edges carry clear semantic meaning (A → B), paths do not. Take the path A → B → C: what does this chain tell us about the relationship between A and C?
Without explicit semantics, agentic systems often overreach or misinterpret these paths entirely.
Potential solution: leverage RAG capabilities to bind graph paths (A→B→C) with embedding vectors, creating semantic bridges where none existed before; a sketch follows.
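A minimal sketch of that binding: verbalize each path into a sentence and embed it, so paths can be retrieved by similarity to a question. The `embed` function below is a trivial bag-of-words stand-in for a real embedding model, and the paths and verbalizations are invented for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Verbalized paths live in the same semantic space as questions.
paths = {
    "fn:billing-export -> role/billing -> secret/db-password":
        "lambda function billing-export assumes role billing "
        "which can read secret db-password",
    "i-0a12 -> sg-0aa1 -> 0.0.0.0/0":
        "instance i-0a12 has a security group allowing ssh from the internet",
}

question = "which lambda can read the database secret"
q = embed(question)
best = max(paths, key=lambda p: cosine(q, embed(paths[p])))
print(best)  # fn:billing-export -> role/billing -> secret/db-password
```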
Looking ahead
These challenges aren’t insurmountable; they’re design problems waiting for elegant solutions.
Solutions emerge through hybrid approaches: using RAG techniques to generate focused sub-graphs for specific inference tasks, and binding graph paths with embedding vectors to create semantic bridges, among others. These aren’t just technical optimizations; they’re fundamental design patterns for making graph-grounded security both practical and scalable.
The promise remains compelling: security professionals conversing with AI about what matters now, next week, or next quarter, rather than drowning in thousands of static alerts.