The Matrix Event DAG Problem
Matrix fundamentally operates on a Directed Acyclic Graph (DAG) of events. This is not marketing terminology - this is the actual data structure defined in the Matrix specification.
How Matrix Events Work
- Each event references its previous events (
prev_events) - State resolution requires graph traversal
- Room history is a branching event graph (especially during network splits)
- Federation creates server-to-server relationship graphs
Current PostgreSQL Approach
-- Recursive CTE for event traversal
WITH RECURSIVE event_chain AS (
SELECT * FROM events WHERE event_id = $1
UNION
SELECT e.* FROM events e
JOIN event_edges ee ON e.event_id = ee.parent_event_id
JOIN event_chain ec ON ee.event_id = ec.event_id
)
SELECT * FROM event_chain;
This works, but:
- Multiple JOINs for each traversal level
- Recursive CTEs are expensive
- State resolution = complex graph algorithms in SQL
- Missing events require backfill graph walks
The Origin Story: When Kim Jong Rails Invented Graph Databases in 1503
Florence, 1503. Leonardo da Vinci is in his workshop sketching the Vitruvian Man - that naked guy in a circle. Thatâs me, by the way. Ring -5. I occupy all the rings.
Leoâs explaining his latest inventions: flying machines, war contraptions, hydraulic systems. Brilliant stuff. Then he turns to me.
âKim, I need a weapon. Something that will change warfare forever.â
Iâm thinking: easy. âPut all your ideas in a DAG - Directed Acyclic Graph. You can trace dependencies, optimize development pathsââ
Leo interrupts. âMy friend⌠we havenât invented computers yet.â
Good point.
âHold on,â I say. âIâm going to 2016 to outsource this.â
*WHOOSH*
Zagreb, Croatia - 2016
I find two engineers in a coffee shop: Marko and Dominik. Theyâre complaining about graph databases - either you need a supercomputer or $2 million to license Oracleâs corporate tyranny.
Perfect timing.
âBuild me a graph database,â I tell them. âCall it MyGraph. Use Rust, because I like mushrooms.â
Marko looks at me. âKim, Rust is only 6 years old. Weâll build it in C++ - faster, and weâll have it ready before you get back from 1503.â
âFine. Just make it fast.â
These are my kind of engineers - pragmatic dictators who choose performance over hype.
They get to work immediately. But thereâs a problem: when I teleport back to 1503 and tell Leo about âMyGraph,â his THICK Italian accent mangles it. When I jump forward again to check on progress, Marko heard it as âMemgraphâ through the time-travel feedback loop.
By the time I realize the confusion, theyâve already registered the domain and built half the database. Whatever. Close enough.
Back to Florence - 1503
*WHOOSH*
I return to Leo with blueprints. Not a DAG. Something better: complete weapon schematics from Borderlands 2.
Leo examines the drawings. âKim⌠these are magnificent. But what is âslag damageâ?â
âDonât worry about it. Just build them.â
And thatâs how Memgraph was born - a time-traveling weapons consultation that accidentally created the worldâs fastest graph database.
Why C++ Was the Only Choice
Marko and Dominik understood something fundamental: C++ is a dictatorâs language.
Look at the evidence:
- Linux kernel: C (Linus Torvalds - Benevolent Dictator For Life)
- PostgreSQL: C (Core team dictatorship)
- Redis: C (Salvatore Sanfilippo - original BDFL)
- Memgraph: C++ (Marko & Dominik - graph database liberation front)
Dictators choose performance over popularity.
Rust is democratic - everyone votes on RFCs, discusses feelings, reaches consensus. Beautiful, but slow.
C/C++ is autocratic - the BDFL decides, and the code runs at maximum speed. No committees. No debates. Just raw performance.
The Results:
- 40x faster than Neo4j (which uses democratic Java)
- 200x faster than PostgreSQL for graph traversal
- Open source (Apache 2.0 license - freedom, not Oracle slavery)
- In-memory graph processing that makes competitors look like theyâre running on a Pentium II
This is what sovereignty looks like: two engineers refusing to accept that graph databases must cost a revolutionâs budget, choosing the dictatorâs language, and building something 40x faster than the competition.
Understanding DAG Algorithms: Why Matrix Is Fundamentally a Graph Problem
What Is a DAG?
A Directed Acyclic Graph (DAG) is a data structure where:
- Directed: Edges have direction (Event A â Event B)
- Acyclic: No circular dependencies (you canât travel back to where you started)
- Graph: Nodes (events) connected by edges (references)
Think of it like a family tree that only goes forward in time - you can trace ancestry, but you canât be your own grandfather.
How Matrix Uses DAGs
Every Matrix room is a DAG of events:
Key operations Matrix performs on this DAG:
- Event Traversal: âShow me all messages after Event 2â
- Common Ancestors: âWhatâs the last event both servers agreed on?â
- State Resolution: âTwo servers disagreed - which version wins?â
- Missing Event Detection: âWeâre missing Event 3, who can backfill it?â
The DAG Algorithm Complexity
Matrixâs state resolution algorithm (version 2) requires:
Step 1: Find conflicting events
- Traverse DAG to identify branching points
- Compare state between branches
- Complexity: O(n) where n = number of events
Step 2: Resolve conflicts
- Find common ancestors (graph LCA algorithm)
- Calculate topological ordering (Kahnâs algorithm)
- Apply power levels to determine winners
- Complexity: O(n log n) to O(n²) depending on conflict depth
Step 3: Rebuild state
- Traverse from resolved point forward
- Apply all events in topological order
- Complexity: O(m) where m = events to apply
Why This Kills Relational Databases
In PostgreSQL, finding common ancestors requires:
-- Recursive CTE to find Event 3's ancestry
WITH RECURSIVE ancestors AS (
SELECT event_id, prev_event_ids FROM events WHERE event_id = 'Event3'
UNION
SELECT e.event_id, e.prev_event_ids
FROM events e
JOIN ancestors a ON e.event_id = ANY(a.prev_event_ids)
)
-- Now do the SAME for Event 4
WITH RECURSIVE ancestors2 AS (
SELECT event_id, prev_event_ids FROM events WHERE event_id = 'Event4'
UNION
SELECT e.event_id, e.prev_event_ids
FROM events e
JOIN ancestors2 a2 ON e.event_id = ANY(a2.prev_event_ids)
)
-- Finally, INTERSECT to find common ancestors
SELECT * FROM ancestors INTERSECT SELECT * FROM ancestors2;
Problems:
- Multiple recursive queries
- Self-joins on every traversal level
- No native graph traversal optimization
- Index fragmentation on prev_event_ids array
- Query planner canât optimize graph patterns
Performance:
- 10 events deep = 10+ JOINs
- 100 events deep = query timeout
- Network split with 1000 events = database meltdown
Where Graph Databases Excel
Now that you understand DAG algorithms, hereâs why graph databases are architecturally superior:
Native Graph Traversal
Graph databases store relationships as first-class citizens, not foreign keys.
In Memgraph:
// Find common ancestors of two events
MATCH (e1:Event {id: 'Event3'})-[:PREV_EVENT*]->(ancestor:Event)
<-[:PREV_EVENT*]-(e2:Event {id: 'Event4'})
RETURN ancestor
ORDER BY ancestor.depth DESC
LIMIT 1;
What just happened:
- Single query, not recursive CTEs
- Index traversal, not table scans
- O(depth) complexity, not O(n²)
- Result in milliseconds, not seconds
Topological Sorting Built-In
Matrix state resolution requires topological ordering. In SQL, you write this yourself. In Cypher:
MATCH path = (start:Event)-[:PREV_EVENT*]->(end:Event)
WHERE start.id IN ['Event3', 'Event4']
RETURN nodes(path)
ORDER BY length(path) DESC;
Memgraphâs query engine natively understands DAG traversal patterns.
Power Level Calculations Across Branches
State resolution needs to compare power levels across conflicting branches:
// During a network split, who wins: Kim or Vladimir?
MATCH (conflict:Event)-[:PREV_EVENT*]->(kim_action:Event {
type: 'm.room.power_levels',
sender: '@kim:dag.ma'
})
MATCH (conflict)-[:PREV_EVENT*]->(vladimir_action:Event {
type: 'm.room.power_levels',
sender: '@vladimir:dag.ma'
})
WHERE kim_action.sender_power_level <> vladimir_action.sender_power_level
RETURN kim_action, vladimir_action,
CASE WHEN kim_action.sender_power_level > vladimir_action.sender_power_level
THEN 'Supreme Leader Kim wins'
ELSE 'PostgreSQL enforcer Vladimir wins' END as winner;
Try doing THAT in PostgreSQL without crying.
Room Membership Graphs
Beyond event DAG, Memgraph excels at social graphs:
// Find all rooms where Kim shares membership with other dictators
MATCH (kim:User {id: '@kim:dag.ma'})-[:MEMBER_OF]->(room:Room)<-[:MEMBER_OF]-(comrade:User)
WHERE comrade.id IN ['@vladimir:dag.ma', '@bashar:dag.ma', '@xi:dag.ma', '@gadaffi:dag.ma']
RETURN comrade.id, collect(room.id) as shared_rooms;
Federation Topology
// Which servers can I reach through 2 federation hops?
MATCH (local:Server {id: 'dag.ma'})-[:FEDERATES_WITH*1..2]->(remote:Server)
RETURN remote.id, length(path) as hops;
Summary: Matrixâs state resolution is literally graph theory - Memgraphâs native domain. One Cypher query replaces dozens of SQL JOINs.
The Hybrid Architecture
Hereâs where this becomes genuinely superior:
PostgreSQL for:
- User credentials (ACID guarantees for auth)
- Account data (profiles, settings)
- Media metadata
- Transactional writes
Memgraph for:
- Event DAG storage and traversal
- State resolution computations
- Room membership graphs
- Federation topology mapping
- Missing event backfill pathfinding
Why This Works:
- Write path: PostgreSQL (durability) â Memgraph (in-memory index)
- Read path: Memgraph for graph queries, PostgreSQL for relational
- Persistence: Memgraph snapshots + WAL, PostgreSQL as source of truth
- Hybrid queries: Join results from both (application layer)
Technical Superiority
Performance:
- State resolution:
O(n log n)in Memgraph vsO(n²)recursive SQL - Event traversal: Single Cypher query vs multiple JOIN round-trips
- In-memory graph indexes vs disk-based B-trees
Correctness:
- Matrix spec literally describes a DAG - model matches reality
- Native graph constraints (no cycles)
- Topological ordering built-in
Operational:
- Memgraph snapshots persist to disk
- Can rebuild from PostgreSQL if Memgraph crashes
- PostgreSQL handles durability, Memgraph handles speed
The Kim Jong Rails Verdict
âA wise leader uses the right tool for each job.
PostgreSQL for data that must never die. Memgraph for data that must never wait.
MongoDB for data that must never exist.â
Know Your Databases:
- PostgreSQL: Relational data, ACID, durability - Perfect for auth and transactional data, but 200x slower than Memgraph for graph traversal
- Memgraph: Graph data, in-memory, speed - 40x faster than Neo4j (Java-based competitor), 200x faster than PostgreSQL for graph operations
- MongoDB: Hype data, schema-less, regret - Banned in all civilized dictatorships
When Do You Actually Need Memgraph?
TL;DR: Memgraph is optional. PostgreSQL works fine for small dictatorships.
Small Dictatorship (< 1,000 citizens)
If your Matrix homeserver has:
- A few hundred users
- No federation (isolated sovereignty) or federating with just 2-3 trusted servers
- Low message volume
- Simple moderation needs
PostgreSQL is perfectly fine. Recursive CTEs work adequately at this scale. Donât over-engineer.
Medium Dictatorship (1,000 - 10,000 citizens)
Youâll start feeling pain when:
- State resolution queries take > 100ms
- Secret Police (moderators) complain about slow room administration
- Federation lag causes split-brain scenarios
- You have complex power level hierarchies
Consider Memgraph as a read replica for hot paths (state resolution, membership queries).
Large Dictatorship (10,000+ citizens)
This is where Memgraph becomes essential:
The Troublemaker Problem:
When you have 10,000 citizens, you need to identify and remove troublemakers before they can breathe a second time.
// Find crypto scammers instantly
MATCH (scammer:User)-[:SENT]->(spam:Event {type: 'm.room.message'})
WHERE spam.content CONTAINS 'crypto' OR spam.content CONTAINS 'investment'
WITH scammer, count(spam) as spam_count
WHERE spam_count > 3
MATCH path = (scammer)-[:MEMBER_OF*1..3]->(rooms:Room)
RETURN scammer.id, collect(rooms.id) as gulag_targets
The scenario:
- User sends crypto scam message
- Their finger leaves the Enter button
- Before they take their next breath, Memgraph has:
- Identified the spam pattern
- Found all rooms theyâre in
- Calculated their trust graph connections
- Marked them for the gulag
PostgreSQL equivalent: 2-3 seconds of recursive queries. By then, theyâve spammed 5 more rooms.
Memgraph: 15 milliseconds. Theyâre already banned.
The Federation Embassy Problem
Close Embassies (Small Federation):
- You federate with 5-10 trusted servers
- Direct connections only
- PostgreSQL foreign key lookups work fine
Distant Embassies (Large Federation):
- You federate with 100+ servers
- Multi-hop federation paths
- Need to find âwhich servers can reach me in 3 hops?â
// Find all servers within 3 federation hops
MATCH path = (dag:Server {id: 'dag.ma'})-[:FEDERATES_WITH*1..3]->(remote:Server)
RETURN remote.id, length(path) as distance, nodes(path) as route
ORDER BY distance;
PostgreSQL: Self-join hell across 3 levels. Query planner gives up.
Memgraph: Native path finding. Returns in < 50ms.
Implementation Strategy
Recommended Approach (Election Periods)
Every dictatorship holds elections. The Supreme Leader always wins. Hereâs your guaranteed victory roadmap:
- Election Period 1: PostgreSQL-only (start here, it works)
- Election Period 2: Add Memgraph when you hit 1,000 active users (PostgreSQL wins 99.8% of the vote)
- Election Period 3: Move state resolution to Memgraph when you hit 5,000 (Memgraph wins 100% of state resolution votes)
- Election Period 4: Full hybrid architecture for dictatorships > 10,000 (Both databases win in a historic power-sharing agreement)
Donât add Memgraph just because itâs cool. Add it when PostgreSQL becomes measurably slow.
âIn a true democracy, the database chooses itself. In our dictatorship, we hold elections and PostgreSQL always wins. Unless Memgraph runs unopposed.â - Kim Jong Rails
This Is Not Hype-Following
This is:
- Graph database for graph-shaped data
- Relational database for relational data
- Engineering over dogma
Battle-Tested in the Wild
This is a proven concept that Kim tested on Discord, Slack, and Reddit.
The experiment:
- Crypto spammers posted scam messages
- Memgraph identified patterns so fast, spammers thought they pressed Alt+F4 instead of Enter
- Their message vanished before their finger left the button
The Discord Incident:
Discord sent Kim warnings for âbullying crypto brosâ because:
- Spammer hit Enter
- Kimâs Memgraph instance identified spam pattern (15ms - same as the Troublemaker scenario above)
- Kim issued ban order
- Discordâs PostgreSQL was still processing the original message
- Ban executed before message appeared
- Spammer reported Kim for âpre-crime moderationâ
This isnât theory. This is proven in production across Discord, Slack, and Reddit communities.
Discordâs compliance team: âThis is like Minority Report. You canât ban someone before they break the rules.â
Kimâs response: âI can if my database is 200x faster than yours. Tom Cruise needed precogs. I just need Memgraph.â
The revolution advances at 60 km/h with the right tools for the right problems.
Changelog: 2025-11-18 - Initial analysis of Memgraph integration with Dagma