In modern financial systems, real-time data processing is both the backbone and bottleneck of innovation. Whether it’s simulating market shocks, adjusting risk models, or updating dashboards as new data flows in, the demand is clear: compute faster, adapt instantly, and never break. But as systems grow more complex, the conventional approaches to computation – batch pipelines, monolithic engines, black-box models – start to show their limits.
In this article I explore a powerful alternative: using dependency graphs to model and execute real-time computations. I’ll walk through how this paradigm works, how it’s been deployed in financial scenario analysis, and what it takes to build one that scales.
Why dependency graphs?
The most familiar example of a computation graph is an Excel spreadsheet: changing one cell updates only the cells that depend on it. That’s the fundamental idea behind dependency graphs. In this model, each computation is a node, and data flows along edges that define dependencies. Update a value, and only its consumers recompute — not the entire system.
Why does this matter? Because in real-time systems:
- Volume and complexity are a challenge — modern systems involve huge numbers of interdependent calculations.
- Latency matters — you can’t afford to recompute everything.
- Modularity helps — updates often affect only parts of a system.
- Traceability is essential — you need to know how a result was produced.
Dependency graphs address all four.
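To make the spreadsheet analogy concrete, here is a minimal Python sketch (an illustrative toy, not the production design discussed later) in which changing one cell recomputes only the cells that depend on it:

```python
# Toy spreadsheet: each cell is a node, edges record who reads whom,
# and a change to one cell recomputes only its dependents.

class Cell:
    def __init__(self, value=None, formula=None, inputs=()):
        self.formula = formula        # None for plain input cells
        self.inputs = list(inputs)    # cells this cell reads from
        self.value = value
        self.dependents = []          # cells that read from this one
        for cell in self.inputs:
            cell.dependents.append(self)

    def set(self, value):
        self.value = value
        self._recompute_dependents()

    def _recompute_dependents(self):
        for cell in self.dependents:
            cell.value = cell.formula(*(c.value for c in cell.inputs))
            cell._recompute_dependents()

a = Cell(value=2)
b = Cell(value=3)
total = Cell(formula=lambda x, y: x + y, inputs=(a, b))
total.value = a.value + b.value   # initial evaluation

a.set(10)                         # only `total` recomputes, not `b`
print(total.value)                # → 13
```

Note that `b` is never touched by the update to `a`: the graph structure alone decides what must be recomputed.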
In practice, many stopgap solutions exist – from manual caching of intermediate results to heuristic partial updates – but these become fragile and error-prone. Hard-coding which components to update on which event quickly turns into a maintenance nightmare in complex systems with intertwined calculations. What is needed is a general, automated approach.
The financial sector: scenario analysis
Let’s anchor this in a real use case: scenario-based risk analytics.
In financial institutions, portfolios must be stress-tested against hundreds of hypothetical market scenarios — interest rate shocks, inflation spikes, FX volatility, credit default events. Each scenario triggers a cascade of calculations across traders’ portfolios and key risk metrics.
The usual headaches are slow recomputation, where tweaking a single market input can force an hours-long re-run of the entire scenario, and wasted compute, as parts that didn’t change get recalculated anyway.
Enter the graph
By modeling each intermediate computation – say, “price of a fixed income asset = price[expected cash flows curve, discount factors]” – as a node in a graph, we gain:
- Partial recomputation: change a scenario input, and only the nodes that depend on it are recomputed (in the example above, changing the discounting rule updates the price without recomputing the cash flows).
- Dependency tracing: discover which risk factors a trader is actually exposed to, and which scenarios need to be run for a given portfolio of instruments.
- Dependency tracing, but in reverse: instantly understand what is going to be affected by a given change.
- Parallel execution: independent subgraphs can compute concurrently.
- Structural flexibility: modeling your logic as a graph encourages a modular design. Each node is a self-contained piece of functionality. This makes the system more maintainable and extensible. It’s a design that grows with your needs.
To continue the example from the financial sector — a scenario within a graph framework becomes simply an override of some nodes, making it easy to inject new inputs or parameters without needing to change the signature of consumers.
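A minimal sketch of the override idea, with hypothetical node names: a scenario is just a dictionary of values layered over the base inputs, so consumers keep their signatures unchanged.

```python
# A scenario as a node override: new inputs are injected by replacing
# the values of a few input nodes, not by changing function signatures.

def price(cash_flows, discount_factors):
    return sum(cf * df for cf, df in zip(cash_flows, discount_factors))

base_inputs = {
    "cash_flows": [100.0, 100.0, 1100.0],
    "discount_factors": [0.97, 0.94, 0.91],
}

def evaluate(inputs, overrides=None):
    # A scenario is simply a dict of overrides layered on the base graph.
    merged = {**inputs, **(overrides or {})}
    return price(merged["cash_flows"], merged["discount_factors"])

base_price = evaluate(base_inputs)
# Rate-shock scenario: override only the discount-factor node.
shocked = evaluate(base_inputs, {"discount_factors": [0.95, 0.90, 0.85]})
print(base_price, shocked)
```

The `price` function never learns that a scenario exists; the override happens entirely at the graph level.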
Under the hood: core algorithms
1. Topological sorting and execution scheduling
Before evaluating a computation graph, it’s essential to determine an execution order that respects the dependencies between nodes. This is the classic topological sort — assuming, of course, that the developer has built a valid DAG (directed acyclic graph) with no circular dependencies.
In dynamic systems, though, this step can’t just happen once — nodes might be added or removed on the fly, or marked as unevaluated (“dirty”) when dependencies change. That’s why incremental topological sorting algorithms like Pearce-Kelly or dynamic DFS are used to update the execution order with minimal recomputation cost.
Practically, the graph is linked to the current session. When you want to compute a node, you first determine the subgraph needed to evaluate it. Then, you start evaluating nodes in topological order: first those without dependencies, then those that depend on already-evaluated nodes, and so on until you reach the target node. This results in a partially evaluated graph that can be reused for other computations.
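The steps above can be sketched with Python’s standard-library `graphlib` (the node names and functions here are hypothetical): collect the subgraph a target needs, then evaluate it in topological order.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: node -> set of dependencies, plus a function per node.
deps = {
    "cash_flows": set(),
    "discount_factors": set(),
    "price": {"cash_flows", "discount_factors"},
    "report": {"price"},
}

funcs = {
    "cash_flows": lambda: [100.0, 1100.0],
    "discount_factors": lambda: [0.95, 0.90],
    "price": lambda cf, df: sum(c * d for c, d in zip(cf, df)),
    "report": lambda p: f"PV={p:.2f}",
}

def subgraph_for(target):
    # Collect the target and everything it transitively depends on.
    needed, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(deps[node])
    return {n: deps[n] & needed for n in needed}

def evaluate(target):
    values = {}
    # TopologicalSorter raises CycleError if the graph is not a valid DAG.
    for node in TopologicalSorter(subgraph_for(target)).static_order():
        args = [values[d] for d in sorted(deps[node])]  # deterministic arg order
        values[node] = funcs[node](*args)
    return values[target]

print(evaluate("price"))   # evaluates only the three nodes price needs
```

Asking for `"price"` never touches `"report"`: only the subgraph the target actually needs is scheduled.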
2. Memoization
Each node stores its last output. If dependencies don’t change, we avoid recomputation entirely.
If you need to calculate another node later and its dependencies are already evaluated, you can pick them up immediately. If something changes, the new value is written in place, and the system traverses the graph to mark every node that depends on the updated one as “dirty”, ensuring correctness while minimizing redundant work.
3. Lazy evaluation
This is the killer feature. As mentioned above, once an input changes, a “dirty” flag propagates through the graph to all dependents. Only dirty nodes get recomputed, which reduces workload dramatically — especially in large graphs. But recomputation does not happen immediately: the dirty flag merely marks a node’s value as invalid. A new, valid value is computed only when that node — or one that depends on it — is actually requested. This strategy is called lazy evaluation.
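Putting memoization and lazy evaluation together, a node might look like this minimal sketch: setting an input only flips dirty flags, and recomputation happens on demand when a value is pulled.

```python
# Memoization plus lazy evaluation: each node caches its last output;
# an update marks dependents dirty but computes nothing until asked.

class Node:
    def __init__(self, fn=None, inputs=(), value=None):
        self.fn = fn
        self.inputs = list(inputs)
        self._value = value
        self.dirty = fn is not None   # computed nodes start unevaluated
        self.dependents = []
        for node in self.inputs:
            node.dependents.append(self)

    def set(self, value):              # for input nodes
        self._value = value
        self._invalidate_dependents()  # cheap: just flip flags, no recompute

    def _invalidate_dependents(self):
        for node in self.dependents:
            if not node.dirty:
                node.dirty = True
                node._invalidate_dependents()

    def value(self):                   # lazy pull: recompute only if dirty
        if self.dirty:
            self._value = self.fn(*(n.value() for n in self.inputs))
            self.dirty = False
        return self._value

rate = Node(value=0.05)
factor = Node(fn=lambda r: 1 / (1 + r), inputs=(rate,))

factor.value()          # computed once
factor.value()          # memoized: fn is not called again
rate.set(0.10)          # marks `factor` dirty, but computes nothing yet
factor.value()          # recomputed on demand
```

The early-exit in `_invalidate_dependents` also prevents redundant traversals when a node is already dirty.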
Engineering the graph
State Management and Immutability
Managing state within a computation graph demands careful engineering, particularly in concurrent or distributed environments. Consider using immutable node values and structure computations as pure functions. This approach minimizes the risk of race conditions and subtle concurrency bugs. Additionally, for features like time-travel debugging, auditability or rollback, maintaining historical snapshots of the graph can be invaluable.
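One way to sketch this, assuming a simple key-value graph state: updates build a new immutable state rather than mutating the old one, and every state is kept as a snapshot.

```python
from dataclasses import dataclass

# Immutable node values with historical snapshots: each update produces
# a new mapping, so past graph states stay available for audit/rollback.

@dataclass(frozen=True)
class NodeValue:
    version: int
    payload: float

history = []                       # list of immutable graph states

def update(state, name, payload):
    old = state.get(name)
    new_value = NodeValue(old.version + 1 if old else 0, payload)
    new_state = {**state, name: new_value}
    history.append(new_state)      # snapshot for time-travel debugging
    return new_state

s0 = update({}, "rate", 0.05)
s1 = update(s0, "rate", 0.10)
assert s0["rate"].payload == 0.05  # earlier snapshot untouched
```

Because `NodeValue` is frozen and states are never mutated, any snapshot in `history` can be inspected or restored safely, even from another thread.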
Scalable Execution & Parallelism
Independent branches of the graph can be processed in parallel threads or distributed systems. This is how big data processing frameworks (like Apache Spark or Airflow for ETL) schedule tasks — the DAG of operations ensures tasks run in the correct order and allows concurrency where possible. In a real-time computing context, if two sets of computations share no dependencies, they can proceed simultaneously, scaling throughput on multi-core or distributed infrastructure.
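A minimal sketch of level-parallel scheduling using `graphlib` and a thread pool (node names and functions are hypothetical): all nodes whose dependencies are satisfied are submitted together, so independent branches run concurrently while the order is still respected.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# "a" and "b" share no dependencies, so they can run at the same time;
# "c" is scheduled only once both have finished.
deps = {"a": set(), "b": set(), "c": {"a", "b"}}
results = {}
funcs = {
    "a": lambda: 1,
    "b": lambda: 2,
    "c": lambda: results["a"] + results["b"],  # runs after a and b are done
}

ts = TopologicalSorter(deps)
ts.prepare()
with ThreadPoolExecutor() as pool:
    while ts.is_active():
        ready = ts.get_ready()        # all nodes whose deps are satisfied
        for node, value in zip(ready, pool.map(lambda n: funcs[n](), ready)):
            results[node] = value
            ts.done(node)             # unlocks nodes that depended on it
```

`get_ready` hands back a whole frontier of independent nodes at once, which is exactly the concurrency a DAG scheduler can safely exploit.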
Monitoring and observability
Graphs aren’t just execution engines — they’re living documentation. To debug or optimize them, you need:
- Trace logs: which nodes ran, when, and why.
- Visual graph inspection tools that show dependencies.
- Metric overlays: runtime stats on node latency, fan-out.
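A trace log can be as simple as a wrapper that records each node execution, as in this hypothetical sketch:

```python
import time

# Trace log: record which node ran, when, and how long it took,
# so graph execution can be inspected after the fact.

trace = []

def traced(name, fn):
    def wrapper(*args):
        start = time.perf_counter()
        result = fn(*args)
        trace.append({"node": name,
                      "started_at": start,
                      "duration_s": time.perf_counter() - start})
        return result
    return wrapper

price = traced("price", lambda cf, df: cf * df)
price(100.0, 0.95)
print(trace[-1]["node"])   # → price
```

The same hook is a natural place to feed metric overlays: per-node latency and call counts fall out of the trace records directly.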
Beyond finance: broader applications
While this architecture found fertile ground in scenario analytics, it’s far from a finance-only trick.
- Reactive UI frontends: modern frontend frameworks increasingly use fine-grained reactive programming to update the user interface efficiently — when a piece of application state changes, only the components that depend on that change are re-rendered.
- Live dashboards and BI tools: real-time updates with minimal latency require partial recomputation.
- Scientific simulations: from climate models to physics engines, many systems rely on chained dependencies.
Across all these examples, the recurring theme is reactivity — systems that react to changes. By structuring those reactions through a dependency graph, we gain clarity and efficiency. In UI frameworks, this yields smoother, faster interfaces. In finance, it yields risk numbers computed on the fly. In data systems, it yields faster pipelines skipping unchanged steps. The concept of a “living” computation graph enables a shift from batch processing to interactive, incremental systems in many fields.
Lessons from the field
After deploying these systems in production, here’s what I’ve learned:
- Design for visibility. If you can’t see how data moves through your graph, you’ll never debug it under pressure.
- Don’t over-optimize early. The graph structure brings natural performance wins — avoid premature micro-tuning.
- Fail loud and early. Silent inconsistencies in dependencies are dangerous — use run-time validations.
- Build for evolution. Graphs will grow and mutate with the system. Support modular graph composition, hot-swapping subgraphs and backwards-compatible APIs.
