A step-by-step guide that explains how to add end-to-end encryption support to a Matrix client library.
This crate implements a sans-network-io state machine that allows you to add end-to-end encryption support to a Matrix client library.
This guide aims to provide a comprehensive understanding of end-to-end encryption in Matrix without any prior knowledge requirements. However, it is recommended that the reader has a basic understanding of Matrix and its client-server specification for a more informed and efficient learning experience.
The introductory section provides a simplified explanation of end-to-end encryption and its implementation in Matrix for those who may not have prior knowledge. If you already have a solid understanding of end-to-end encryption, including the Olm and Megolm protocols, you may choose to skip directly to the Getting Started section.
§Table of Contents
- Introduction
- Getting started
- Decrypting room events
- Encrypting room events
- Interactively verifying devices and user identities
§Introduction
Welcome to the first part of this guide, where we will introduce the fundamental concepts of end-to-end encryption and its implementation in Matrix.
This section will provide a clear and concise overview of what end-to-end encryption is and why it is important for secure communication. You will also learn about how Matrix uses end-to-end encryption to protect the privacy and security of its users’ communications. Whether you are new to the topic or simply want to improve your understanding, this section will serve as a solid foundation for the rest of the guide.
Let’s dive in!
§Notation
§End-to-end-encryption
End-to-end encryption (E2EE) is a method of secure communication where only the communicating devices, also known as “the ends,” can read the data being transmitted. This means that the data is encrypted on one device, and can only be decrypted on the other device. The server is used only as a transport mechanism to deliver messages between devices.
The following chart displays how communication between two clients using a server in the middle usually works.
The next chart, in contrast, displays how the same flow works in an end-to-end encrypted world.
Note that the path from the outbox to the inbox is now encrypted as well.
Alice and Bob have created a secure communication channel through which they can exchange messages confidentially, without the risk of the server accessing the contents of their messages.
§Publishing cryptographic identities of devices
If Alice and Bob want to establish a secure channel over which they can exchange messages, they first need to learn about each other’s cryptographic identities. This is achieved by using the homeserver as a public key directory.
A public key directory is used to store and distribute public keys of users in an end-to-end encrypted system. The basic idea behind a public key directory is that it allows users to easily discover and download the public keys of other users with whom they wish to establish an end-to-end encrypted communication.
Each user generates a pair of public and private keys. The user then uploads their public key to the public key directory. Other users can then search the directory to find the public key of the user they wish to communicate with, and download it to their own device.
Once a user has the other user’s public key, they can use it to establish an end-to-end encrypted channel using a key-agreement protocol.
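The directory idea can be sketched with a plain in-memory map. This is only an illustration: the `KeyDirectory` type and its `upload` and `query` methods are hypothetical names, keys are opaque strings, and a real homeserver stores signed key objects rather than bare strings.

```rust
use std::collections::HashMap;

/// A toy public key directory, keyed by (user ID, device ID).
/// Hypothetical type for illustration; not part of any Matrix library.
#[derive(Default)]
pub struct KeyDirectory {
    keys: HashMap<(String, String), String>,
}

impl KeyDirectory {
    /// Publish (or replace) the public key of one device.
    pub fn upload(&mut self, user: &str, device: &str, public_key: &str) {
        self.keys
            .insert((user.to_owned(), device.to_owned()), public_key.to_owned());
    }

    /// Return every (device ID, public key) pair published for `user`,
    /// sorted by device ID.
    pub fn query(&self, user: &str) -> Vec<(String, String)> {
        let mut devices: Vec<(String, String)> = self
            .keys
            .iter()
            .filter(|((owner, _), _)| owner.as_str() == user)
            .map(|((_, device), key)| (device.clone(), key.clone()))
            .collect();
        devices.sort();
        devices
    }
}
```

Alice would call `query` with Bob's user ID to learn about every device Bob has published, and then run a key agreement with each of them.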
§Using the Triple Diffie-Hellman key-agreement protocol
In the triple Diffie-Hellman key agreement protocol (3DH in short), each user generates a long-term identity key pair and a set of one-time prekeys. When two users want to establish a shared secret key, they exchange their public identity keys and one of their prekeys. These public keys are then used in a Diffie-Hellman key exchange to compute a shared secret key.
The use of one-time prekeys ensures that the shared secret key is different for each session, even if the same identity keys are used.
3DH is similar to the X3DH (Extended Triple Diffie-Hellman) key agreement protocol.
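To make the three exchanges concrete, here is a toy sketch that runs them over integers modulo a small prime. It is deliberately insecure and only shows which key pairs are combined; Olm performs the same three exchanges with Curve25519 keys. The constants `P` and `G` and all function names are assumptions made for this example.

```rust
/// Toy Diffie-Hellman group: integers modulo a small prime.
/// INSECURE and for illustration only.
const P: u64 = 2_147_483_647; // 2^31 - 1, a prime
const G: u64 = 7; // base element of the toy group

/// Square-and-multiply modular exponentiation.
fn mod_pow(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut result = 1u64;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            result = ((result as u128 * base as u128) % modulus as u128) as u64;
        }
        base = ((base as u128 * base as u128) % modulus as u128) as u64;
        exp >>= 1;
    }
    result
}

/// Derive the public key that belongs to a secret key.
pub fn public_key(secret: u64) -> u64 {
    mod_pow(G, secret, P)
}

/// A single Diffie-Hellman exchange.
pub fn dh(own_secret: u64, their_public: u64) -> u64 {
    mod_pow(their_public, own_secret, P)
}

/// The side that starts the session: combines its own identity and one-time
/// secrets with the peer's public identity key and public one-time key.
pub fn triple_dh_initiator(
    identity_secret: u64,
    one_time_secret: u64,
    peer_identity: u64,
    peer_one_time: u64,
) -> [u64; 3] {
    [
        dh(identity_secret, peer_one_time),
        dh(one_time_secret, peer_identity),
        dh(one_time_secret, peer_one_time),
    ]
}

/// The receiving side: performs the same three exchanges in matching order.
pub fn triple_dh_responder(
    identity_secret: u64,
    one_time_secret: u64,
    peer_identity: u64,
    peer_one_time: u64,
) -> [u64; 3] {
    [
        dh(one_time_secret, peer_identity),
        dh(identity_secret, peer_one_time),
        dh(one_time_secret, peer_one_time),
    ]
}
```

Both sides end up with the same three values, which a real implementation would feed into a key derivation function. Swapping in a different one-time key changes the result even though the identity keys stay the same.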
§Speeding up encryption for large groups
In the previous section we learned how to utilize a key agreement protocol to establish secure 1-to-1 encrypted communication channels. These channels allow us to encrypt a message for each device separately.
One critical property of these channels is that, if we want to send a message to a group of devices, we’ll need to encrypt the message for each device individually.
TODO Explain how megolm fits into this
§Getting started
Before we start writing any code, let us get familiar with the basic principle upon which this library is built.
The central piece of the library is the OlmMachine, which acts as a state machine that consumes data received from the homeserver and outputs data that should be sent to the homeserver.
§Push/pull mechanism
At its heart, the OlmMachine acts as a state machine that operates in a push/pull manner. HTTP responses received from the homeserver are forwarded into the OlmMachine, which updates its internal state and in turn produces HTTP requests that need to be sent to the homeserver.
In a sense, we pull data from the server, update our internal state based on that data, and in turn push data back to the server.
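The following toy model may help to picture the loop. It is not the real OlmMachine API: `ToyMachine`, its `receive` method and the string request bodies are invented for this sketch, and only the general shape (feed data in, collect requests, acknowledge them) mirrors the description above.

```rust
use std::collections::BTreeMap;

/// A toy stand-in for the push/pull idea; NOT the real OlmMachine API.
#[derive(Default)]
pub struct ToyMachine {
    next_id: u64,
    /// Requests that have been produced but not yet acknowledged.
    pending: BTreeMap<u64, String>,
}

impl ToyMachine {
    /// Feed data received from the server into the machine. The state
    /// update may queue a new request (here: when no one-time keys are left).
    pub fn receive(&mut self, one_time_key_count: u32) {
        if one_time_key_count == 0 {
            let id = self.next_id;
            self.next_id += 1;
            self.pending.insert(id, "upload one-time keys".to_owned());
        }
    }

    /// List every request that still needs to be sent to the server.
    pub fn outgoing_requests(&self) -> Vec<(u64, String)> {
        self.pending
            .iter()
            .map(|(id, body)| (*id, body.clone()))
            .collect()
    }

    /// Record that the server answered the request, which retires it.
    /// Returns `false` if the ID was unknown.
    pub fn mark_request_as_sent(&mut self, id: u64) -> bool {
        self.pending.remove(&id).is_some()
    }
}
```

Note that a request stays in the outgoing list until it has been acknowledged, so a request whose HTTP call failed is simply returned again on the next pass.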
§Initializing the state machine
use anyhow::Result;
use matrix_sdk_crypto::OlmMachine;
use ruma::user_id;
let user_id = user_id!("@alice:localhost");
let device_id = "DEVICEID".into();
let machine = OlmMachine::new(user_id, device_id).await;
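This will create an OlmMachine that does not persist any data. To keep the state across restarts, a persistent store such as the SqliteCryptoStore can be passed to OlmMachine::with_store() instead: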
use anyhow::Result;
use matrix_sdk_crypto::OlmMachine;
use matrix_sdk_sqlite::SqliteCryptoStore;
use ruma::user_id;
let user_id = user_id!("@alice:localhost");
let device_id = "DEVICEID".into();
let store = SqliteCryptoStore::open("/home/example/matrix-client/", None).await?;
let machine = OlmMachine::with_store(user_id, device_id, store).await;
§Decryption
In the world of encrypted communication, it is common to start with the encryption step when implementing a protocol. However, in the case of adding end-to-end encryption support to a Matrix client library, a simpler approach is to first focus on the decryption process. This is because there are already Matrix clients in existence that support encryption, which means that our client library can simply receive encrypted messages and then decrypt them.
In this section, we will guide you through the minimal steps necessary to get the decryption process up and running using the matrix-sdk-crypto Rust crate. By the end of this section you should have a Matrix client that is able to decrypt room events that other clients have sent.
To enable decryption the following three steps are needed:
- The cryptographic identity of your device needs to be published to the homeserver.
- Decryption keys coming in from other devices need to be processed and stored.
- Individual messages need to be decrypted.
The simplified flowchart
§Uploading identity and one-time keys
To enable end-to-end encryption in a Matrix client, the first step is to announce the support for it to other users in the network. This is done by publishing the client’s long-term device keys and a set of one-time prekeys to the Matrix homeserver. The homeserver then makes this information available to other devices in the network.
The long-term device keys and one-time prekeys allow other devices to encrypt messages specifically for your device.
To achieve this, you will need to extract any requests that need to be sent to the homeserver from the OlmMachine and send them to the homeserver. The following snippet showcases how to achieve this using the OlmMachine::outgoing_requests() method:
// Get all the outgoing requests.
let outgoing_requests = machine.outgoing_requests().await?;

// Send each request to the server and push the response into the state machine.
// You can safely send these requests out in parallel.
for request in outgoing_requests {
    let request_id = request.request_id();
    // Send the request to the server and await a response.
    let response = send_request(request).await?;
    // Push the response into the state machine.
    machine.mark_request_as_sent(&request_id, &response).await?;
}
§🔒 Locking rule
It’s important to note that the outgoing requests method of the OlmMachine, while thread-safe, may return the same request multiple times if it is called multiple times before the request has been marked as sent. To prevent this, it is advisable to encapsulate the outgoing request handling logic in a separate helper method and to use a lock to protect it from being called concurrently.
This ensures that each request is handled only once and prevents identical requests from being sent multiple times.
Additionally, if an error occurs while sending a request returned by the OlmMachine::outgoing_requests() method, the request will naturally be retried the next time the method is called.
A more complete example, which uses a helper method, might look like this:
struct Client {
    outgoing_requests_lock: tokio::sync::Mutex<()>,
    olm_machine: OlmMachine,
}

async fn process_outgoing_requests(client: &Client) -> Result<()> {
    // Acquire a lock so that we don't send the same request out multiple times.
    // The guard is held until the end of the function.
    let _guard = client.outgoing_requests_lock.lock().await;

    for request in client.olm_machine.outgoing_requests().await? {
        let request_id = request.request_id();

        match send_request(&request).await {
            Ok(response) => {
                client.olm_machine.mark_request_as_sent(&request_id, &response).await?;
            }
            Err(error) => {
                // It's OK to ignore transient HTTP errors since requests will be retried.
                eprintln!(
                    "Error while sending out an end-to-end encryption \
                     related request: {error:?}"
                );
            }
        }
    }

    Ok(())
}
Once we have the helper method that processes our outgoing requests we can structure our sync method as follows:
async fn sync(client: &Client) -> Result<()> {
    // This is happening at the top of the method so we advertise our
    // end-to-end encryption capabilities as soon as possible.
    process_outgoing_requests(client).await?;

    // We can sync with the homeserver now.
    let response = send_out_sync_request(client).await?;

    // Process the sync response here.
    Ok(())
}
§Receiving room keys and related changes
The next step in our implementation is to forward messages that were sent directly to the client’s device, and state updates about the one-time prekeys, to the OlmMachine. This is achieved using the OlmMachine::receive_sync_changes() method.
The method performs two tasks:
- It processes and, if necessary, decrypts each to-device event that was pushed into it, and returns the decrypted events. The original events are replaced with their decrypted versions.
- It produces internal state changes that may trigger the creation of new outgoing requests. For example, if the server informs the client that its one-time prekeys have been depleted, the OlmMachine will create an outgoing request to replenish them.
Our updated sync method now looks like this:
async fn sync(client: &Client) -> Result<()> {
    process_outgoing_requests(client).await?;

    let response = send_out_sync_request(client).await?;

    let sync_changes = EncryptionSyncChanges {
        to_device_events: response.to_device.events,
        changed_devices: &response.device_lists,
        one_time_keys_counts: &response.device_one_time_keys_count,
        unused_fallback_keys: response.device_unused_fallback_key_types.as_deref(),
        next_batch_token: Some(response.next_batch),
    };

    // Push the sync changes into the OlmMachine, make sure that this is
    // happening before the `next_batch` token of the sync is persisted.
    let to_device_events = client
        .olm_machine
        .receive_sync_changes(sync_changes)
        .await?;

    // Send the outgoing requests out that the sync changes produced.
    process_outgoing_requests(client).await?;

    // Process the rest of the sync response here.
    Ok(())
}
It is important to note that the names of the fields in the response shown in the example match the names of the fields specified in the sync response specification.
Due to the ephemeral nature of to-device events[1], it is critical to process these events before persisting the next_batch sync token. If the next_batch sync token is persisted before the to-device events have been processed, some messages might be lost, leading to decryption failures.
§Decrypting room events
The final step in the decryption process is to decrypt the room events that are received from the server. To do this, the encrypted events must be passed to the OlmMachine, which will use the keys that were previously exchanged between devices to decrypt the events. The decrypted events can then be processed and displayed to the user in the Matrix client.
Room message events can be decrypted using the OlmMachine::decrypt_room_event() method:
// Decrypt your room events now.
let decrypted = machine
    .decrypt_room_event(encrypted, room_id, &settings)
    .await?;
It’s worth mentioning that the OlmMachine::decrypt_room_event() method is designed to be thread-safe and can be safely called concurrently. This means that room message events can be processed in parallel, improving the overall efficiency of the end-to-end encryption implementation.
By allowing room message events to be processed concurrently, the client’s implementation can take full advantage of the capabilities of modern hardware and achieve better performance, especially when dealing with a large number of messages at once.
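As a rough sketch of what parallel processing can look like, the example below fans a batch of events out to scoped threads and collects the results in their original order. `ToyDecryptor` and its XOR "decryption" are placeholders invented for this example; the real method is async, so an async client would more likely spawn tasks or join futures instead of using threads.

```rust
use std::thread;

/// Placeholder for a thread-safe decryptor. Hypothetical type; the XOR
/// below only stands in for real decryption.
pub struct ToyDecryptor {
    pub key: u8,
}

impl ToyDecryptor {
    /// "Decrypt" one event through a shared reference.
    pub fn decrypt(&self, ciphertext: &[u8]) -> String {
        ciphertext.iter().map(|b| (b ^ self.key) as char).collect()
    }
}

/// Decrypt all events in parallel, one scoped thread per event, and return
/// the plaintexts in the same order as the input.
pub fn decrypt_all(decryptor: &ToyDecryptor, events: &[Vec<u8>]) -> Vec<String> {
    thread::scope(|scope| {
        let handles: Vec<_> = events
            .iter()
            .map(|event| scope.spawn(move || decryptor.decrypt(event)))
            .collect();

        handles
            .into_iter()
            .map(|handle| handle.join().expect("decryption worker panicked"))
            .collect()
    })
}
```

Because the decryptor is only borrowed, no cloning or additional locking is needed on the caller's side.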
§Encryption
In this section of the guide, we will focus on enabling the encryption of messages in our Matrix client library. Up until this point, we have been discussing the process of decrypting messages that have been encrypted by other devices. Now, we will shift our focus to the process of encrypting messages on the client side, so that they can be securely transmitted over the Matrix network to other devices.
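This section will guide you through the steps required to set up the encryption process, including establishing the necessary sessions and encrypting messages using the Megolm group session. The specific steps are outlined below:
- The devices of the recipient users need to be discovered and kept up to date.
- 1-to-1 secure channels need to be established with those devices.
- A room key needs to be shared with the group over those channels.
- Individual messages need to be encrypted using the room key.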
The process for enabling encryption in a two-device scenario is also depicted in the following sequence diagram:
In the following subsections, we will provide a step-by-step guide on how to enable the encryption of messages using the OlmMachine. We will outline the specific method calls and usage patterns that are required to establish the necessary sessions, encrypt messages, and send them over the Matrix network.
§Tracking users
The first step in the process of encrypting a message and sending it to a device is to discover the devices that the recipient user has. This can be achieved by sending a request to the homeserver to retrieve a list of the recipient’s device keys. The response to this request will include the device keys for all of the devices that belong to the recipient, as well as information about their current status and whether or not they support end-to-end encryption.
The process for discovering and keeping track of devices for a user is outlined in the Matrix specification in the “Tracking the device list for a user” section.
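The bookkeeping behind that process can be approximated with one "outdated" flag per tracked user. The sketch below is a simplification under that assumption: the `TrackedUsers` type and its methods are hypothetical, and the OlmMachine maintains this state internally, so a client does not need to write this itself.

```rust
use std::collections::BTreeMap;

/// Toy bookkeeping for tracked users; `true` means the locally known
/// device list of that user is outdated. Hypothetical type.
#[derive(Default)]
pub struct TrackedUsers {
    outdated: BTreeMap<String, bool>,
}

impl TrackedUsers {
    /// Start tracking users. Newly added users start out as outdated,
    /// already tracked users keep their current flag.
    pub fn update_tracked_users<'a>(&mut self, users: impl IntoIterator<Item = &'a str>) {
        for user in users {
            self.outdated.entry(user.to_owned()).or_insert(true);
        }
    }

    /// Apply the list of changed users from a sync response.
    /// Users that we do not track are ignored.
    pub fn mark_changed<'a>(&mut self, changed: impl IntoIterator<Item = &'a str>) {
        for user in changed {
            if let Some(flag) = self.outdated.get_mut(user) {
                *flag = true;
            }
        }
    }

    /// Users whose devices need to be downloaded again with a key query.
    pub fn users_for_key_query(&self) -> Vec<String> {
        self.outdated
            .iter()
            .filter(|(_, outdated)| **outdated)
            .map(|(user, _)| user.clone())
            .collect()
    }

    /// Record that a key query for `user` has completed.
    pub fn mark_up_to_date(&mut self, user: &str) {
        if let Some(flag) = self.outdated.get_mut(user) {
            *flag = false;
        }
    }
}
```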
A simplified sequence diagram of the process can also be found below.
The OlmMachine refers to users whose devices we are tracking as “tracked users” and provides the OlmMachine::update_tracked_users() method to start tracking users. Keeping the above diagram in mind, we can now update our sync method as follows:
async fn sync(client: &Client) -> Result<()> {
    process_outgoing_requests(client).await?;

    let response = send_out_sync_request(client).await?;

    let sync_changes = EncryptionSyncChanges {
        to_device_events: response.to_device.events,
        changed_devices: &response.device_lists,
        one_time_keys_counts: &response.device_one_time_keys_count,
        unused_fallback_keys: response.device_unused_fallback_key_types.as_deref(),
        next_batch_token: Some(response.next_batch),
    };

    // Push the sync changes into the OlmMachine, make sure that this is
    // happening before the `next_batch` token of the sync is persisted.
    let to_device_events = client
        .olm_machine
        .receive_sync_changes(sync_changes)
        .await?;

    // Send the outgoing requests out that the sync changes produced.
    process_outgoing_requests(client).await?;

    // Collect all the joined and invited users of our end-to-end encrypted rooms here.
    let mut users = Vec::new();

    for (_, room) in &response.rooms.join {
        // For simplicity reasons we're only looking at the state field of a joined room, but
        // the events in the timeline are important as well.
        for event in &room.state.events {
            if is_member_event_of_a_joined_user(event) && is_room_encrypted(room) {
                let user_id = get_user_id(event);
                users.push(user_id);
            }
        }
    }

    // Mark all the users that we consider to be in an end-to-end encrypted room with us to be
    // tracked. We need to know about all the devices each user has so we can later encrypt
    // messages for each of their devices.
    client.olm_machine.update_tracked_users(users.iter().map(Deref::deref)).await?;

    // Process the rest of the sync response here.
    Ok(())
}
Now that we have discovered the devices of the users we’d like to communicate with in an end-to-end encrypted manner, we can start considering encrypting messages for those devices. This concludes the sync processing method. We are now ready to move on to the next section, which will explain how to begin the encryption process.
§Establishing end-to-end encrypted channels
In the Triple Diffie-Hellman section, we described the need for two Curve25519 keys from the recipient device to establish a 1-to-1 secure channel: the long-term identity key of a device and a one-time prekey. In the previous section, we started tracking the device keys, including the long-term identity key that we need. The next step is to download the one-time prekey on an on-demand basis and establish the 1-to-1 secure channel.
To accomplish this, we can use the OlmMachine::get_missing_sessions() method, which works in bulk: it will claim a one-time prekey for every device of the given users that we’re not already sharing a 1-to-1 encrypted channel with.
§🔒 Locking rule
As with the OlmMachine::outgoing_requests() method, it is necessary to protect this method with a lock, otherwise we will be creating more 1-to-1 encrypted channels than necessary.
// Claim one-time keys for all the devices of these users that we don't share a
// 1-to-1 encrypted channel with yet.
if let Some((request_id, request)) =
    machine.get_missing_sessions(users.iter().map(Deref::deref)).await?
{
    let response = send_request(&request).await?;
    machine.mark_request_as_sent(&request_id, &response).await?;
}
With the ability to exchange messages directly with devices, we can now start sharing room keys over the 1-to-1 encrypted channel.
§Exchanging room keys
To exchange a room key with our group, we will once again take a bulk approach. The OlmMachine::share_room_key() method is used to accomplish this step. This method will create a new room key, if necessary, and encrypt it for each device belonging to the users provided as an argument. It will then output an array of sendToDevice requests that we must send to the server and afterwards mark as sent.
§🔒 Locking rule
Like some of the previous methods, OlmMachine::share_room_key() needs to be protected by a lock to prevent the possibility of creating and sending multiple room keys simultaneously for the same group. The lock can be implemented on a per-room basis, which allows for parallel room key exchanges across different rooms.
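One way to get such per-room locks is a map from room ID to a shared mutex. The sketch below uses the standard library's blocking `Mutex` so that it runs without any dependencies; in async code the inner lock would more likely be an async-aware mutex such as `tokio::sync::Mutex`, so that the guard can be held across `.await` points. The `RoomLocks` type and the `lock_for` method are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hands out one lock per room ID so that key sharing in different rooms
/// does not block each other. Hypothetical helper type.
#[derive(Default)]
pub struct RoomLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl RoomLocks {
    /// Return the lock for `room_id`, creating it on first use.
    /// The outer map lock is only held for the duration of the lookup.
    pub fn lock_for(&self, room_id: &str) -> Arc<Mutex<()>> {
        let mut locks = self.locks.lock().expect("lock map poisoned");
        locks.entry(room_id.to_owned()).or_default().clone()
    }
}
```

A caller would fetch the lock for the room, hold its guard while sharing the room key, and drop the guard afterwards. Two calls for the same room receive the same underlying mutex, while calls for different rooms proceed independently.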
// Let's share a room key with our group.
let requests = machine.share_room_key(
    room_id,
    users.iter().map(Deref::deref),
    EncryptionSettings::default(),
).await?;

// Make sure each request is sent out
for request in requests {
    let request_id = &request.txn_id;
    let response = send_request(&request).await?;
    machine.mark_request_as_sent(&request_id, &response).await?;
}
In order to ensure that room keys are rotated and exchanged when needed, the OlmMachine::share_room_key() method should be called before sending each room message in an end-to-end encrypted room. If a room key has already been exchanged, the method becomes a no-op.
§Encrypting room events
After the room key has been successfully shared, a plaintext can be encrypted.
let content = AnyMessageLikeEventContent::RoomMessage(
    RoomMessageEventContent::text_plain("It's a secret to everybody."),
);
let encrypted_content = machine.encrypt_room_event(room_id, content).await?;
§Appendix: Combining the session creation and room key exchange
The steps from the previous three sections should be combined into a single method that is used to send messages.
struct Client {
    session_establishment_lock: tokio::sync::Mutex<()>,
    olm_machine: OlmMachine,
}

async fn establish_sessions(client: &Client, users: &[&UserId]) -> Result<()> {
    // Hold the lock so that we don't create more 1-to-1 channels than necessary.
    let _guard = client.session_establishment_lock.lock().await;

    if let Some((request_id, request)) =
        client.olm_machine.get_missing_sessions(users.iter().map(Deref::deref)).await?
    {
        let response = send_request(&request).await?;
        client.olm_machine.mark_request_as_sent(&request_id, &response).await?;
    }

    Ok(())
}

async fn share_room_key(machine: &OlmMachine, room_id: &RoomId, users: &[&UserId]) -> Result<()> {
    // Only one room key exchange per room may be in flight at a time.
    let _lock = acquire_per_room_lock(room_id).await;

    let requests = machine.share_room_key(
        room_id,
        users.iter().map(Deref::deref),
        EncryptionSettings::default(),
    ).await?;

    // Make sure each request is sent out
    for request in requests {
        let request_id = &request.txn_id;
        let response = send_to_device_request(&request).await?;
        machine.mark_request_as_sent(&request_id, &response).await?;
    }

    Ok(())
}

async fn send_message(client: &Client, room_id: &RoomId, message: &str) -> Result<()> {
    let content = json!({
        "body": message,
        "msgtype": "m.text",
    });

    if is_room_encrypted(room_id) {
        let content = Raw::new(&content)?.cast();

        let users = get_joined_members(room_id).await;

        establish_sessions(client, &users).await?;
        share_room_key(&client.olm_machine, room_id, &users).await?;

        let encrypted = client
            .olm_machine
            .encrypt_room_event_raw(room_id, "m.room.message", &content)
            .await?;
        // The encrypted content would be sent to the room from here.
    }

    Ok(())
}
TODO