Room DAG concepts

Edges

The word "edge" comes from graph theory lingo. An edge is just a connection between two events. In Synapse, we connect events by specifying their prev_events. A subsequent event points back at a previous event.

A (oldest) <---- B <---- C (most recent)

Depth and stream ordering

Events are normally sorted by (topological_ordering, stream_ordering) where topological_ordering is just depth. In other words, we first sort by depth and then tie-break based on stream_ordering. depth is incremented as new messages are added to the DAG. Normally, stream_ordering is an auto incrementing integer, but backfilled events start with stream_ordering=-1 and decrement.

/sync returns things in the order they arrive at the server (stream_ordering).
/messages (and /backfill in the federation API) return them in the order determined by the event graph (topological_ordering, stream_ordering).

The general idea is that, if you're following a room in real-time (i.e. /sync), you probably want to see the messages as they arrive at your server, rather than skipping any that arrived late; whereas if you're looking at a historical section of timeline (i.e. /messages), you want to see the best representation of the state of the room as others were seeing it at the time.

Forward extremity

Most-recent-in-time events in the DAG which are not referenced by any other events' prev_events yet.

The forward extremities of a room are used as the prev_events when the next event is sent.

Backwards extremity

The current marker of where we have backfilled up to and will generally be the oldest-in-time events we know of in the DAG.

This is an event where we haven't fetched all of the prev_events for.

Once we have fetched all of its prev_events, it's unmarked as a backwards extremity (although we may have formed new backwards extremities from the prev events during the backfilling process).

Outliers

We mark an event as an outlier when we haven't figured out the state for the room at that point in the DAG yet.

We won't necessarily have the prev_events of an outlier in the database, but it's entirely possible that we might. The status of whether we have all of the prev_events is marked as a backwards extremity.

For example, when we fetch the event auth chain or state for a given event, we mark all of those claimed auth events as outliers because we haven't done the state calculation ourself.

State groups

For every non-outlier event we need to know the state at that event. Instead of storing the full state for each event in the DB (i.e. a event_id -> state mapping), which is very space inefficient when state doesn't change, we instead assign each different set of state a "state group" and then have mappings of event_id -> state_group and state_group -> state.

Stage group edges

TODO: state_group_edges is a further optimization... notes from @Azrenbeth, https://pastebin.com/seUGVGeT