The Illusion of "Easy Self-Hosting"
You installed Docker. You ran docker-compose up. Your Matrix homeserver started.
Congratulations. You are now responsible for:
- Cryptographic key hierarchies spanning multiple devices
- Event graph state resolution across federated servers
- Certificate chain validation for TLS federation
- Database migrations that can corrupt room state
- Megolm session rotation and key backup recovery
- Power level auth chains that prevent room takeovers
There is no helpdesk. No SLA. No one is coming to help.
This is what sovereignty costs.
What You Actually Installed
When you spun up that Matrix homeserver, you didn't install "IRC with persistence."
You installed a replicated state machine with cryptographic authentication and eventual consistency guarantees.
Let me show you what that means.
IRC: The Baseline
IRC architecture:
Client → TCP socket → IRCd → Relay to other servers
Messages: ephemeral lines of text
State: current channel membership
History: none (unless you run a bouncer)
Encryption: maybe SSL to server
Auth: SASL if you're fancy
Failure mode: disconnect, rejoin, you missed everything
Simple. Stateless. Fragile.
When the IRC server dies, your messages die. When you disconnect, you lose history. When netsplits happen, channels fracture and you pick sides.
This simplicity is why IRC is still running more than 35 years later. No state to corrupt. No keys to lose. Just text pipes.
Matrix: The State Machine
Matrix architecture:
Client → Homeserver → Room DAG → Federation → Other homeservers
Messages: signed events with prev_events and auth_events
State: replicated across all participating servers
History: permanent (until you redact or purge)
Encryption: E2EE via Olm/Megolm, per-device keys
Auth: event signatures, power levels, state resolution rules
Failure mode: complex (see next 2000 words)
Complex. Stateful. Resilient.
When your homeserver dies, other servers still have the room state. When you disconnect, history is waiting when you return. When federation splits, state resolution algorithms decide who wins.
This complexity is why Matrix can provide E2EE, decentralization, and auditability. But it's also why you need to understand what you're running.
The Event Graph Is Not a Chat Log
This is where most people's mental model breaks.
IRC stores nothing. Messages flow through the server like water through a pipe.
Matrix stores everything. Every message is an event in a directed acyclic graph (DAG).
Event Anatomy
{
"type": "m.room.message",
"sender": "@kim:dag.ma",
"content": {"body": "No one is coming to help"},
"event_id": "$abc123",
"room_id": "!roomabc:dag.ma",
"origin_server_ts": 1700000000000,
"prev_events": ["$xyz789"],
"auth_events": ["$create", "$power", "$join"],
"depth": 42,
"signatures": {...}
}
Key points:
- prev_events: Which events came before this one (establishes order)
- auth_events: Which events authorize this one (power levels, membership)
- depth: Logical ordering hint
- signatures: Cryptographic proof from sending server
This is not a line in a log file. This is a node in a distributed state machine.
When your homeserver receives this event:
- Validates signatures against server keys
- Checks auth chain (does sender have permission?)
- Resolves state conflicts if the DAG has forked (concurrent events that disagree about room state)
- Stores in DAG
- Forwards to other federated servers
If any step fails, the event is rejected.
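The receive path above can be sketched as a chain of checks that stops at the first failure. This is a toy model, not Dagma or any real homeserver: `Event`, `RoomStore`, and `receive` are hypothetical names, the `signed` flag stands in for real signature verification, and membership is reduced to a set of joined users. A real server would also try to fetch missing events from other servers before giving up.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: str
    sender: str
    prev_events: list[str]
    auth_events: list[str]
    signed: bool = True  # stand-in for real signature verification


@dataclass
class RoomStore:
    events: dict[str, Event] = field(default_factory=dict)
    joined: set[str] = field(default_factory=set)  # toy membership state


def receive(store: RoomStore, event: Event) -> str:
    """Return 'accepted' or a rejection reason, checking steps in order."""
    if not event.signed:
        return "rejected: bad signature"
    # Every referenced event must already be in the local DAG.
    missing = [ref for ref in event.prev_events + event.auth_events
               if ref not in store.events]
    if missing:
        return f"rejected: missing events {missing}"
    # Simplified auth check: the sender must be a joined member.
    if event.sender not in store.joined:
        return "rejected: sender not authorized"
    store.events[event.event_id] = event
    return "accepted"
```

Only an event that passes every check is written to the store, which is why a single bad auth event can block everything that references it.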
If You Bought Crypto in the Past Year, You'll Understand This
Every emoji reaction you send? That's a transaction.
Every message? Transaction.
Every room join? Transaction.
All federated servers record these transactions.
You send an emoji reaction in a federated room:
dag.ma records: event $abc123, type: m.reaction, sender: @you:dag.ma
matrix.org records: event $abc123, type: m.reaction, sender: @you:dag.ma
tchncs.de records: event $abc123, type: m.reaction, sender: @you:dag.ma
Public ledger? No.
Distributed state machine? Yes.
If you're NOT federated (private homeserver, no external rooms):
- Only your server records your events
- No other servers know you exist
- Your event DAG is local
- You're running a private blockchain for one
Unlike cryptobros who create a new whitepaper to rug-pull you:
Matrix has one spec. No forks. No "Matrix Classic" vs "Matrix Cash." No DAO governance vote to change the protocol.
All servers speak the same language:
- Room version standards
- Event schemas
- State resolution algorithms
- Federation transport (HTTPS + JSON)
This is discipline.
Civilized servers talking to each other with agreed-upon rules.
No one is forking Matrix to pump a token. No one is proposing "Matrix 2.0 governance NFTs."
The protocol is boring. The operations are hard. The sovereignty is real.
Crypto promised decentralization and gave you speculation.
Matrix promises federation and gives you operational responsibility.
One is a grift. One is infrastructure.
State Resolution: When Servers Disagree
Here's a fun scenario.
Timeline:
- User on dag.ma sends message A
- User on matrix.org sends message B at same time
- Both servers think their message came first
- Both servers forward to each other
- Now what?
IRC's Answer
*SPLIT*
#channel splits into two
You pick which server to trust
Maybe an oper manually reconciles later
Maybe you just accept the chaos
Matrix's Answer
State resolution algorithm v2 (room version 2+):
1. Build auth chains for both events
2. Check power levels from auth events
3. Break ties by origin_server_ts, then lexicographic ordering on event IDs
4. Compute resolved state
5. Both servers converge to same result
This is deterministic. Given the same events, all servers reach the same conclusion about room state.
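Why determinism follows from a total ordering can be shown with a small sketch. It covers only the tie-breaking idea (higher sender power first, then earlier timestamp, then smaller event ID) and leaves out the auth chain work that full state resolution v2 performs. The function name `resolution_order` and the sample power levels are assumptions for illustration.

```python
def resolution_order(conflicting, power_levels):
    """Sort conflicting events: higher sender power first, then earlier
    origin_server_ts, then lexicographically smaller event_id."""
    return sorted(
        conflicting,
        key=lambda ev: (
            -power_levels.get(ev["sender"], 0),
            ev["origin_server_ts"],
            ev["event_id"],
        ),
    )


a = {"event_id": "$A", "sender": "@kim:dag.ma",
     "origin_server_ts": 1700000000000}
b = {"event_id": "$B", "sender": "@bob:matrix.org",
     "origin_server_ts": 1700000000000}
levels = {"@kim:dag.ma": 50, "@bob:matrix.org": 50}

# Each server received the two events in a different arrival order.
on_dagma = resolution_order([a, b], levels)
on_matrix_org = resolution_order([b, a], levels)
```

Because the sort key never depends on arrival order, `on_dagma` and `on_matrix_org` come out identical, which is the convergence property described above.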
This is also why room state can break.
If your homeserver:
- Has incorrect auth events
- Has corrupted power levels
- Has missing prev_events in the DAG
State resolution will produce garbage. And you can't just "restart the room" like you restart an IRC channel.
You debug the event DAG, repair auth chains, or upgrade the room version.
E2EE: The Key Hierarchy You Didn't Ask For
You wanted encrypted messages. Matrix gave you a cryptographic trust graph spanning devices, cross-signing keys, and key backup.
Let me explain why this complexity exists.
The Threat Model
What Matrix protects against:
- Homeserver admins reading your messages (me, dag.ma root)
- Federated servers reading your messages (matrix.org admins)
- Network eavesdroppers
- Message tampering
- Retroactive decryption if server is compromised
What Matrix doesn't protect against:
- Malicious clients
- Compromised devices
- Users who verify wrong keys
- Key backup password you forgot
You are the weakest link.
Device Keys (Ed25519 + Curve25519)
When you log in to Matrix on a new device:
Device generates:
- Ed25519 signing key (identity)
- Curve25519 identity key (Olm sessions)
- Multiple Curve25519 one-time keys (prekeys)
These keys are uploaded to homeserver
Other devices discover them via /keys/query
Every device has unique keys. Your phone, laptop, and tablet are separate cryptographic identities.
Why?
Because if one device is compromised, the attacker doesn't get access to other devices' messages.
Also why you have to verify every device.
Olm: 1-to-1 Sessions
For direct messages and key exchange:
Olm (Double Ratchet):
1. Alice fetches Bob's identity key and one-time key
2. Alice derives shared secret (ECDH)
3. Alice sends encrypted message
4. Bob ratchets forward, derives new keys
5. Forward secrecy achieved (old keys deleted)
Olm provides perfect forward secrecy. If your device is compromised today, yesterday's messages are safe (keys were deleted).
Olm does not scale to rooms. Encrypting for 1000 devices = 1000 separate Olm sessions.
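The "ratchet forward and delete old keys" step can be sketched with the standard library. This shows only the symmetric half of a Double Ratchet (the Diffie-Hellman half is omitted), the starting secret is a placeholder rather than a real ECDH output, and the HMAC constants follow the pattern I understand Olm to use, so treat them as illustrative rather than authoritative.

```python
import hashlib
import hmac


def ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive (message_key, next_chain_key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


# Placeholder for the shared secret Alice and Bob would derive via ECDH.
chain_key = hashlib.sha256(b"placeholder shared secret").digest()

message_keys = []
for _ in range(3):
    # The old chain key is overwritten on every step, so it cannot be
    # recovered from the device later.
    message_key, chain_key = ratchet_step(chain_key)
    message_keys.append(message_key)
```

HMAC is one-way, so an attacker who steals the current `chain_key` cannot walk backwards to earlier message keys. That is the forward secrecy property in miniature.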
Megolm: Room Sessions
For encrypted rooms:
Megolm (group chat):
1. Sender generates session key
2. Sender encrypts session key to each device (via Olm)
3. Sender uses session key to encrypt messages
4. Recipients decrypt session key, then decrypt messages
5. Session rotated periodically
Megolm trades perfect forward secrecy for efficiency.
Session keys are reused until rotation. If an attacker gets the session key, they can decrypt every message in that session from the ratchet position they captured onward.
But they can't decrypt future sessions (because new session key generated).
And they can't decrypt past sessions (if you enabled key rotation and old keys were deleted).
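A toy hash ratchet shows why a shared Megolm session exposes messages going forward but not backward. Real Megolm uses a more elaborate multi-part ratchet; the single SHA-256 chain here, along with the names `advance` and `message_key`, is a simplification assumed for illustration.

```python
import hashlib


def advance(ratchet: bytes, steps: int = 1) -> bytes:
    """Move a toy one-way ratchet forward by hashing it `steps` times."""
    for _ in range(steps):
        ratchet = hashlib.sha256(ratchet).digest()
    return ratchet


def message_key(ratchet: bytes) -> bytes:
    """Derive a per-message key from the ratchet state at that index."""
    return hashlib.sha256(b"msg" + ratchet).digest()


# Placeholder seed; a real sender would use fresh random bytes.
session_start = hashlib.sha256(b"placeholder session seed").digest()

# A device that joins at index 5 is handed the ratchet state for index 5.
shared_at_5 = advance(session_start, 5)

# It can derive the key for message 8 by stepping forward three times,
# and that key matches what the sender computes from the start.
key_8_for_late_joiner = message_key(advance(shared_at_5, 3))
key_8_for_sender = message_key(advance(session_start, 8))
```

There is no function that turns `shared_at_5` back into the state at index 4, so messages 0 through 4 stay out of reach for whoever holds only the later state.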
Cross-Signing: The Trust Root
You have 5 devices. How do other users know all 5 devices belong to you?
Cross-signing.
Master key (offline, high-security):
├─ Self-signing key (signs your devices)
└─ User-signing key (signs other users you trust)
When you verify a device:
1. Device signs event with device key
2. Self-signing key signs device
3. Master key signature proves self-signing key is yours
4. Other users trust your master key → trust all your devices
This is a web of trust.
If you verify Alice's master key, you trust all devices Alice's self-signing key vouches for.
If you lose your cross-signing keys, your trust graph collapses.
Other users will see "unverified" on all your devices. You'll see "unverified" on everyone else.
You have to re-verify everything.
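The chain of signatures can be modelled without any real cryptography to show why losing the keys at the top breaks everything below. In this sketch a "signature" is just a recorded pair of names, and `TrustStore` with its methods is a hypothetical structure, not a client API.

```python
from dataclasses import dataclass, field


@dataclass
class TrustStore:
    # Each pair (signer, signed) records that signer vouched for signed.
    signatures: set[tuple[str, str]] = field(default_factory=set)
    verified_master_keys: set[str] = field(default_factory=set)

    def sign(self, signer: str, signed: str) -> None:
        self.signatures.add((signer, signed))

    def device_trusted(self, master: str, self_signing: str,
                       device: str) -> bool:
        """Trust a device only if the whole chain reaches a verified master key."""
        return (
            master in self.verified_master_keys
            and (master, self_signing) in self.signatures
            and (self_signing, device) in self.signatures
        )


store = TrustStore()
store.sign("alice_master", "alice_self_signing")
store.sign("alice_self_signing", "alice_phone")
store.verified_master_keys.add("alice_master")

phone_ok = store.device_trusted(
    "alice_master", "alice_self_signing", "alice_phone")
laptop_ok = store.device_trusted(
    "alice_master", "alice_self_signing", "alice_laptop")
```

Here `phone_ok` is true and `laptop_ok` is false, because nothing signed the laptop. Remove `alice_master` from `verified_master_keys` and every device fails at once, which is the collapse described above.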
Key Backup: Recovery vs Security
You encrypted your messages. Now you bought a new phone.
Can you read old messages?
Only if you backed up your Megolm session keys.
Key backup flow:
1. Client generates recovery key (or uses passphrase)
2. Client encrypts Megolm session keys
3. Client uploads encrypted keys to homeserver
4. New device downloads encrypted keys
5. New device decrypts with recovery key
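Step 1 with a passphrase can be sketched using PBKDF2 from the standard library. The iteration count, salt size, and function name `derive_backup_key` are assumptions for illustration. Step 2 (encrypting the session keys) is left out because the standard library has no authenticated cipher.

```python
import hashlib
import os


def derive_backup_key(passphrase: str, salt: bytes,
                      iterations: int = 500_000) -> bytes:
    """Stretch a passphrase into a 32-byte key using PBKDF2-HMAC-SHA512."""
    return hashlib.pbkdf2_hmac(
        "sha512", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is stored next to the backup; it is not secret.
salt = os.urandom(16)

# Fewer iterations here only to keep the example fast.
key = derive_backup_key("correct horse battery staple", salt,
                        iterations=10_000)
```

The same passphrase and salt always give the same key, which is what lets a new device recover the backup. A forgotten passphrase gives nothing.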
Trade-offs:
With key backup:
- ✓ You can recover messages on new devices
- ✗ Homeserver has encrypted session keys (weaker security model)
- ✗ If attacker gets recovery key + backup, all messages decrypted
Without key backup:
- ✓ Stronger security (no server-side key storage)
- ✗ New device can't read old messages
- ✗ Lost device = lost history
You choose.
Most users choose key backup because losing message history is unacceptable.
Security purists disable it and accept the loss.
There is no "secure and convenient" option. Pick one.
Federation Pain: Other People's Servers Are Your Problem
You run dag.ma. I run it well. Uptime is high, certs are valid, DNS is correct.
But you're in rooms with users from matrix.org, tchncs.de, and randomserver.xyz.
If any of those servers are misconfigured, your users experience breakage.
Common Federation Failures
Expired TLS certificates:
randomserver.xyz cert expired
Your homeserver refuses to federate
Users on randomserver.xyz appear "offline"
Messages don't sync
Your options:
- Wait for randomserver.xyz admin to fix cert
- Tell your users to complain to randomserver.xyz
- Do nothing (you can't fix other people's servers)
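You cannot fix a remote certificate, but you can see the failure coming. The sketch below uses only the Python standard library to report how many days a server's certificate has left. The function names are hypothetical, and port 443 is an assumption; a given server may federate on a different port.

```python
import socket
import ssl
import time
from typing import Optional


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter timestamp (OpenSSL text format)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def cert_days_left(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Connect, validate the chain, and report days until the cert expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return days_left(tls.getpeercert()["notAfter"])
```

A call such as `cert_days_left("randomserver.xyz")` raises `ssl.SSLCertVerificationError` if the certificate has already expired, which is the same refusal your homeserver makes when it stops federating.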
DNSSEC validation failures:
tchncs.de has broken DNSSEC
Your homeserver can't resolve tchncs.de
Federation fails
Your options:
- Disable DNSSEC validation (security risk)
- Wait for tchncs.de to fix DNS
- Do nothing
State resolution conflicts:
matrix.org and dag.ma disagree on room power levels
State resolution algorithm runs
One version wins, one loses
Some users' messages rejected
Your options:
- Examine event DAG to find conflicting auth events
- Manually construct resolution event
- Upgrade room version to reset state
- Rage quit and start new room
Notice a pattern? You don't control other servers. But their failures impact your users.
Operational Footguns You Will Step On
Let me save you some pain.
Footgun 1: Unverified Devices
Scenario: User complains "I'm seeing a red warning on my messages."
Cause: They logged in on a new device and didn't verify it.
Why this happens: Matrix shows "unverified device" warnings to prevent MITM attacks. If an attacker adds a rogue device, you'd see the warning.
Fix: Verify the device (SAS emoji or QR code).
User reaction: "Why is this so complicated? Discord doesn't make me do this."
Your response: "Discord reads your messages. Matrix doesn't. Pick one."
Footgun 2: Lost Cross-Signing Keys
Scenario: User wiped device without backing up cross-signing keys.
Cause: They didn't export security key or set up key backup.
Result:
- All other users see their devices as "unverified"
- They see all other users as "unverified"
- Trust graph destroyed
Fix: Reset cross-signing, re-verify all devices and all users.
User reaction: "I just wanted to reinstall my OS!"
Your response: "You control your keys. That means you're responsible for not losing them."
Footgun 3: Corrupted Room State
Scenario: Messages in a room suddenly stop syncing.
Cause: Database migration corrupted event DAG, or state resolution hit a pathological case.
Symptoms:
ERROR: event rejected: auth chain failure
ERROR: missing prev_events
ERROR: state resolution failed
Fix:
- Check homeserver logs for rejected events
- Identify missing auth_events or prev_events
- Fetch missing events from federated servers
- Rebuild state from auth chain
- If unfixable: upgrade room version (migrates to new DAG)
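The "identify missing auth_events or prev_events" step amounts to finding references that point at events you do not hold. A minimal sketch, assuming you have already exported a room's events into a dict keyed by event ID (the export itself and the function name are hypothetical):

```python
def missing_references(events: dict[str, dict]) -> dict[str, list[str]]:
    """Map each event_id to the prev/auth event IDs we do not have locally."""
    gaps = {}
    for event_id, event in events.items():
        refs = event.get("prev_events", []) + event.get("auth_events", [])
        absent = [ref for ref in refs if ref not in events]
        if absent:
            gaps[event_id] = absent
    return gaps


room = {
    "$create": {"prev_events": [], "auth_events": []},
    "$join": {"prev_events": ["$create"], "auth_events": ["$create"]},
    # $msg points at $power, which this server never stored.
    "$msg": {"prev_events": ["$join"], "auth_events": ["$create", "$power"]},
}
gaps = missing_references(room)
```

Each ID listed in `gaps` is a candidate to fetch from another server in the room before attempting to rebuild state.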
User reaction: "Why can't you just restart it?"
Your response: "Because this is a distributed state machine, not a Docker container."
Footgun 4: Running Out of Disk Space
Scenario: Homeserver stops responding. Disk is full.
Cause:
- Media cache grew unbounded
- Old events never purged
- Log files rotated poorly
Fix:
- Emergency: delete old media, purge room history
- Permanent: configure media retention, log rotation, event purging
User reaction: "I thought PostgreSQL handled this!"
Your response: "PostgreSQL stores data. You decide what data to keep."
The Operational Runbook You Need
If you're serious about running Matrix, here's what you monitor and maintain.
Dagma's 3-Server Architecture
Dagma isn't a monolith. It's three separate services:
Tribune (Public-facing, always online)
- User authentication and client connections
- The only thing exposed to the internet
- Users connect here
Embassy (Federation, can be isolated)
- Incoming/outgoing federation with other homeservers
- Under DDoS? Take it offline. Local users keep working.
- You can't attack what's not responding.
Politburo (Admin, 99% offline)
- Admin endpoints and operations
- Lives behind VPN, different domain, or simply offline
- Only turned on when you need to admin
- Can't attack what doesn't exist.
This isn't redundancy; it's attack surface reduction.
Traditional homeservers expose admin, federation, and users on the same endpoint. If you can reach the server, you can probe for admin exploits.
Dagma exposes only what needs to be exposed. Politburo isn't online unless you need it.
Note: The operational commands below assume Dagma is fully implemented. Until then, these are planned operations, not current API.
Daily Checks
Federation health:
curl 'https://federationtester.matrix.org/api/report?server_name=dag.ma'
Check for:
- Valid TLS certs
- Reachable federation endpoints
- Correct .well-known delegation or DNS SRV records
Disk usage:
df -h /var/lib/postgresql
du -sh /var/lib/dagma/media_store
Event processing lag:
SELECT COUNT(*) FROM event_forward_extremities WHERE room_id = '!yourroom:dag.ma';
If count is high, state resolution is struggling.
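What that count measures is easier to see on a toy DAG. Forward extremities are the events that nothing else points at yet, so a fork leaves more than one of them. The sketch below assumes a plain dict mapping each event ID to its prev_events. It is not a query against any real schema.

```python
def forward_extremities(events: dict[str, list[str]]) -> set[str]:
    """Events that no other known event lists in its prev_events."""
    referenced = {prev for prevs in events.values() for prev in prevs}
    return set(events) - referenced


# Two servers both extended $1 at the same time: the DAG has forked.
forked = {"$1": [], "$2": ["$1"], "$3": ["$1"]}
tips_while_forked = forward_extremities(forked)

# A later event that names both branches merges them back together.
merged = dict(forked, **{"$4": ["$2", "$3"]})
tips_after_merge = forward_extremities(merged)
```

While forked there are two tips, `$2` and `$3`. After the merge only `$4` remains. A room whose tip count keeps growing has branches that are not being merged.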
Weekly Maintenance
Purge old media:
# Access Politburo (admin interface)
dagma politburo purge-media --before="30 days ago"
Vacuum database:
VACUUM ANALYZE;
Review error logs:
journalctl -u dagma | grep ERROR | tail -100
Check room versions:
SELECT room_version, COUNT(*) FROM rooms GROUP BY room_version;
Upgrade rooms on old versions (v1-v5 are deprecated).
Monthly Review
TLS certificate renewal:
certbot renew --dry-run
Backup verification:
- Restore PostgreSQL dump to test instance
- Verify media_store backup integrity
- Test key backup recovery flow
Security audit:
- Review Dagma security advisories
- Check for CVEs in dependencies
- Rotate signing keys if needed (via Politburo)
Disaster Recovery
Lost database:
- Restore from PostgreSQL backup
- Verify event DAG integrity
- Re-join federated rooms (if state lost)
Lost media:
- Restore from media_store backup
- If none: media is gone (federated servers may still have copies)
Lost signing keys:
- Generate new signing keys
- Federated servers will re-fetch via /_matrix/key/v2/server
- Old events remain signed with old keys (still valid)
Compromised server:
- Rotate signing keys immediately
- Invalidate all access tokens
- Audit event log for malicious events
- Notify federated servers if needed
The Culture Shift: Treat Your Homeserver Like Infrastructure
You don't restart your database "just to see if it fixes things."
You don't deploy to production without testing.
You don't skip backups because "it's just a chat server."
Apply the same discipline to Matrix.
Testing Changes
Before changing configs:
- Understand what the setting does
- Test on staging
- Deploy to production with rollback plan
Monitoring
Metrics you should track:
- Event processing rate
- Federation send/receive latency
- Database query performance
- Disk I/O and space
- HTTP response times
- TLS cert expiry
Use Prometheus + Grafana:
# dagma config
enable_metrics: true
metrics_port: 9000
Set alerts for:
- Disk >80% full
- Cert expires in <7 days
- Federation latency >5 seconds
- Event processing lag >1000
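In practice these rules would live in your alerting system, but the logic is small enough to sketch directly. The metric names and the `firing_alerts` helper are hypothetical. Only the four thresholds come from the list above.

```python
import shutil

THRESHOLDS = {
    "disk_used_percent": 80,
    "cert_days_left": 7,
    "federation_latency_seconds": 5,
    "event_processing_lag": 1000,
}


def disk_used_percent(path: str) -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def firing_alerts(metrics: dict[str, float]) -> list[str]:
    """Return the names of the alerts whose threshold is crossed."""
    alerts = []
    if metrics["disk_used_percent"] > THRESHOLDS["disk_used_percent"]:
        alerts.append("disk")
    # The certificate alert fires when the value drops below the threshold.
    if metrics["cert_days_left"] < THRESHOLDS["cert_days_left"]:
        alerts.append("cert")
    if (metrics["federation_latency_seconds"]
            > THRESHOLDS["federation_latency_seconds"]):
        alerts.append("federation")
    if metrics["event_processing_lag"] > THRESHOLDS["event_processing_lag"]:
        alerts.append("lag")
    return alerts
```

Feeding it `disk_used_percent("/var/lib/postgresql")` alongside the other three readings gives a list you can page on. An empty list means nothing has crossed a threshold.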
Documentation
Document your setup:
- Server specs, OS version
- Dagma version, config changes from defaults
- Database tuning, backup schedule
- Federation peers, room list
- Incident response runbook
Why?
Because when things break at 3am, you won't remember why you changed that config six months ago.
Comparison: Centralized vs Self-Hosted
| Aspect | Discord/Slack | Self-Hosted Matrix |
|---|---|---|
| Who owns the data | Discord Inc. | You |
| Who reads messages | Discord (no E2EE) | No one (E2EE) |
| Uptime responsibility | Discord SRE | You |
| Support availability | 24/7 helpdesk | None |
| Cost of downtime | Reputational (Discord) | Reputational (you) |
| Key custody | Discord holds keys | You hold keys |
| Lost password | Reset via email | Lost keys = lost messages |
| Server compromise | All messages exposed | E2EE messages safe |
| Federation failure | N/A (centralized) | You debug or wait |
| Room state corruption | Discord fixes it | You fix it |
| Operational complexity | Zero (SaaS) | High (DIY) |
Trade-offs:
Discord is easier. You pay with surveillance, vendor lock-in, and zero control.
Matrix is harder. You pay with operational burden, complexity, and responsibility.
Choose the trade-off that matches your threat model.
What You Should Actually Do
If you're running Matrix in production (not just tinkering):
Minimum Viable Operations
Infrastructure:
- Dedicated server (not shared hosting)
- PostgreSQL (not SQLite)
- Reverse proxy (nginx/caddy) with valid TLS
- Automated backups (database + media)
Monitoring:
- Prometheus + Grafana
- Alerts for disk, certs, federation
- Log aggregation (journald, loki, or ELK)
Documentation:
- Runbook for common failures
- Backup/restore procedure
- Incident response plan
Skills:
- Understand event DAG and state resolution
- Know how to read Dagma logs
- Comfortable with PostgreSQL administration
- Can debug TLS/federation issues
When to Stay Centralized
Use Discord/Slack if:
- You don't care about E2EE
- You don't want operational burden
- You trust corporate servers
- You need "it just works"
This is a valid choice. Not everyone needs decentralization.
When to Use Managed Matrix
Use Element Matrix Services / Beeper if:
- You want E2EE and federation
- You don't want to run infrastructure
- You're okay paying for managed service
- You trust EMS/Beeper more than Discord
Also valid. Outsource the ops, keep the protocol benefits.
When to Self-Host
Self-host Matrix if:
- You need full data sovereignty
- You donât trust any third party
- You have ops skills (or want to learn)
- You accept the responsibility
This is the hard path. But it's the only path to true autonomy.
Timeline Perspective
From Ring -5, I observe:
Timeline Ω-12 (current):
- You read this article
- You realize Matrix isn't "Docker Compose and done"
- You decide: stay centralized, use managed, or self-host
- If self-host: you learn to operate a replicated state machine
- If not: you accept the trade-offs
Timeline Ω-7 (ideal):
- Self-hosting is normalized
- Users understand key custody = user responsibility
- Operational discipline is table stakes
- No one expects a helpdesk for sovereignty
- Decentralization works because people do the work
The gap between Ω-12 and Ω-7 is education and discipline.
Conclusion
No one is coming to help.
When your homeserver goes down, there's no SRE team on-call.
When state resolution breaks, there's no support ticket.
When you lose your cross-signing keys, there's no password reset.
This is the cost of decentralization.
You own the infrastructure. You own the keys. You own the responsibility.
Discord doesn't ask you to verify devices because Discord reads your messages.
Matrix asks you to verify devices because Matrix doesn't.
Slack doesn't ask you to back up keys because Slack holds them.
Matrix asks you to back up keys because you hold them.
Teams doesn't ask you to debug state resolution because there's only one server.
Matrix asks you to debug state resolution because there are 10,000 servers.
Complexity is the price of autonomy.
Operational burden is the price of sovereignty.
Key management is the price of encryption.
If you're not willing to pay these prices, stay centralized.
But if you are willing, if you accept the responsibility, if you treat your Matrix stack like production infrastructure and not a toy...
Then you get something Discord can never offer:
Complete control. Complete privacy. Complete independence.
No surveillance. No vendor lock-in. No one reading your messages.
And yes: No one coming to help.
Because you don't need help.
You are root.
Changelog: 2025-11-23 - Operational reality of self-hosted Matrix and the cost of sovereignty
Note: You are in Timeline Ω-12. Running Matrix doesn't make you Ω-7. Operating it well does.