That Wibbly Wobbly Real-Timey Wimey stuff

Courtney Couch
2013-08-26

The realtime magic behind Muut and how we deliver all your instant notifications.

Muut was built from the ground up as a 100% realtime system. We bubble up events, listen to events throughout our infrastructure, and deliver events to users within a couple of milliseconds. We can target individual tabs, sessions, or groups of users, exclude specific users, and so on, for any arbitrary event. We even use this system to handle messaging between components of our infrastructure.

To do all this, we rolled our own solution from scratch.

Realtime browser tabs

An anonymous user and JohnS writing on two different tabs


Why not just use static channels?

Often realtime events are handled through static pub/sub channels: you subscribe to the particular types of events that relate to you. We'll use the HTTP Push Stream module on nginx as our example.

Let's say you were interested in what I had to say (I'm sure everyone is), and we decided on a channel pattern like this:

http://mydomain.com/sub/[type]/[identifier]

So the channel we subscribe to is:

http://mydomain.com/sub/user/courtneycouch

Now, anytime I publish a message to that channel, anyone subscribed will get it. Awesome! This setup works well and is quite scalable. You could imagine channels for forums, threads, users, and any other arbitrary thing a user might need.
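In client code, a static-channel subscription boils down to building one URL per interest. A minimal sketch of that pattern (the channelUrl helper and base domain are ours for illustration, not part of any library):

```javascript
// Build a static channel URL following the /sub/[type]/[identifier]
// pattern from the example above. Helper name is illustrative.
function channelUrl(type, identifier) {
  return `http://mydomain.com/sub/${type}/${identifier}`;
}

// One URL per interest: to follow a user, subscribe to their channel.
const userChannel = channelUrl('user', 'courtneycouch');
// → 'http://mydomain.com/sub/user/courtneycouch'
```

A real client would then hold one long-poll or streaming connection open per URL, which is exactly where the problems below start.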

There are a few problems with this approach, however, as complexity rises.

1) A new set of connections for each channel

If you were interested in multiple users and events on 20 different threads, you might have to be long polling 30 different channels. I'm not going to explain why this is really ugly; if you don't see it, god help you.

2) No ability to restrict events to certain users (limit or target)

This was a big one for us.

We send events to individual users, such as their login status and information. We can't be broadcasting that information willy-nilly to anyone who can make a GET request. We had to be able to say "send this event to this specific connection," "send this event to all connections with this session," or even "send this event to all sessions logged in as this user."

We also wanted the ability to limit who an event goes to. If you like a post, you don't need to be notified that you liked it; everyone else does, though. Users get notifications of events you cause, but you only get events caused by other browsers. The ability to limit and target users also lets us launch our secured options, so only users with access rights to certain areas of the forum receive events related to those areas.

3) No ability to intersect subscriptions

Let's say we were to run a service with tens of thousands of forums, say, something like Muut. Just hypothetically. If a user were to subscribe to my posts via /sub/user/courtneycouch, they would get my events from every forum. I could instead have a channel like /sub/forum/moot/user/courtneycouch that limits it to a single forum, but then I might need a channel for every forum. That's not a big deal on a small scale, but on a big scale, imagine if I update my display name: that notification would need to be published to every forum I have ever used.

What about more arbitrary conditions, like /sub/user/courtneycouch/thread/1/thread/5 (only wanting notifications about my events on two different threads)? This becomes untenable, because I cannot possibly create channels for every possible combination of conditions.

Who is writing?

Showing who is currently writing


Our solution, you ask?

We abstracted the servers that handle incoming connections away into our JSON-RPC servers. These servers hold the persistent connections, and they work similarly to socket.io (which we are not using).

The client begins the connection with a CORS request and transparently upgrades to WebSocket when possible. Disconnected WebSockets can reconnect and resume. Once a connection is successfully established, the channel registers the specific tags it's interested in.
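That registration step could be pictured like this; the message shape and method name are assumptions for the sketch, not our actual wire protocol:

```javascript
// Sketch: once connected, the client tells the server which tags this
// channel cares about. The JSON-RPC-style message shape below is
// illustrative, not Muut's actual protocol.
function buildRegisterMessage(channelId, tags) {
  return JSON.stringify({
    method: 'register',              // hypothetical method name
    params: { channelId, tags }
  });
}

// e.g. a tab viewing the 'moot' forum while logged in as courtneycouch:
const msg = buildRegisterMessage('channel2', ['forum:moot', 'user:courtneycouch']);
// A real client would then do something like: websocket.send(msg)
```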

Behind that are the actual application servers, which handle API requests from the JSON-RPC servers, responding in a normal request/response pattern when the API call requires it (getting a list of posts, getting a list of who likes something).

Whenever an action from an API call results in an event (say, I write a new post), that event is sent to a Redis server with a payload and address information.

For example:

{
  address: {
    tags: ['forum:moot', 'user:courtneycouch']
  },
  payload: { hello: 'world' }
}

This is then picked up by every JSON-RPC server; each one intersects those two tags and sends the event {hello: 'world'} to any connection that matches both.

We can also change the address to exclude specific channels:

{
  address: {
    tags: ['forum:moot', 'user:courtneycouch'],
    exclude: ['channel1']
  },
  payload: { hello: 'world' }
}

That sends the event to any connection that has both of those tags but does not have the ID channel1.

Like exclude, we also have an include:

{
  address: {
    tags: ['forum:moot', 'user:courtneycouch'],
    exclude: ['channel1'],
    include: ['channel2']
  },
  payload: { hello: 'world' }
}

This does the same, but also includes channel2 even if it does not have the listed tags.
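Putting the three address fields together, the per-connection matching rule can be sketched like this (the function, the data shapes, and the exclude-before-include precedence are our illustration, not Muut's actual code):

```javascript
// Sketch of the addressing rule: a connection receives an event if it
// carries every tag in the address (tags are ANDed), unless its ID is
// excluded; included IDs receive the event regardless of tags.
// Shapes and precedence here are illustrative assumptions.
function matches(address, connection) {
  const exclude = address.exclude || [];
  const include = address.include || [];
  if (exclude.includes(connection.id)) return false;
  if (include.includes(connection.id)) return true;
  return address.tags.every(tag => connection.tags.includes(tag));
}

const address = {
  tags: ['forum:moot', 'user:courtneycouch'],
  exclude: ['channel1'],
  include: ['channel2']
};

// channel1 has the tags but is excluded; channel2 lacks them but is included.
matches(address, { id: 'channel1', tags: ['forum:moot', 'user:courtneycouch'] }); // → false
matches(address, { id: 'channel2', tags: [] });                                   // → true
matches(address, { id: 'channel3', tags: ['forum:moot', 'user:courtneycouch'] }); // → true
```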

From any server in our infrastructure we can create arbitrary events and target where they will go, whether it's an event for a specific tab in a browser, or every active user across the board.

We can also create arbitrary tags, like browser:chrome, location:usa, and status:anonymous, which gives us fine-grained control over who we send data to.

So if I want to send an event to everyone using Google Chrome in the USA on OSX who is anonymous, I could simply do it by modifying the address of the event before sending.
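That Chrome-in-the-USA event might carry an address like the following (the os:osx tag name is our guess at the pattern; the actual tags are whatever the connections registered):

```javascript
// Illustrative address for the example above; tag names follow the
// pattern from the post but are assumptions, not Muut's exact tags.
const event = {
  address: {
    tags: ['browser:chrome', 'location:usa', 'os:osx', 'status:anonymous']
  },
  payload: { announcement: 'hello, anonymous Chrome users' }
};
```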

We use this pattern to allow us complete control over events. Being able to target specific sessions, tabs, or users with specific properties has allowed us to create a system that can ultimately adapt to any messaging needs we might have, however complex.

You might be wondering if this granularity is really necessary. For us, it's critical. As I mentioned above, we use it for login results, and we also have to be able to send events only to administrators. With our ACL product, every user can have access to a different set of threads on a single forum, so we have to cross-reference with access rights and only send events to users with rights to the data contained in the event. There are also situations where we need to notify only a single user's other tabs: on a logout event, for example, we send a notification to any other tabs on that session that the user is now logged out.

There are, in fact, quite a few use cases for such detailed control. This realtime backbone is the foundation we built Muut on top of, so we rely on it for virtually everything Muut does.

Static channels vs Muut channels



What did we build this with?

The JSON-RPC servers are NodeJS applications, and we publish all the events to them using Redis pub/sub. We rolled our own communication libraries for the client and server.
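The Redis side of that can be pictured as each JSON-RPC server running a message handler like the one below; the message shape and handler are illustrative, and in a real server it would be wired up with a Redis client's subscribe call rather than called directly:

```javascript
// Sketch: the handler each JSON-RPC server would run for every message
// published over Redis pub/sub. Events arrive as JSON strings; the
// server parses them and hands address + payload to its connection-
// matching layer. Shapes here are illustrative, not Muut's code.
function onRedisMessage(rawMessage, deliver) {
  const event = JSON.parse(rawMessage);
  deliver(event.address, event.payload);
}

// Example with a stubbed delivery function instead of real sockets:
const delivered = [];
onRedisMessage(
  JSON.stringify({
    address: { tags: ['forum:moot', 'user:courtneycouch'] },
    payload: { hello: 'world' }
  }),
  (address, payload) => delivered.push({ address, payload })
);
// delivered now holds one event with payload { hello: 'world' }
```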

What's next?

The current version of our tagging only lets us intersect tags (tags are all ANDed), but we're in the process of setting up our JSON-RPC servers to handle more complex boolean tagging. A channel could then tag itself forum:moot AND user:courtney to intersect them, or use an OR to get events from either. We haven't needed this yet, but it will become important for some items further down our roadmap (specifically our Hub service, which is in progress).
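One way such boolean tagging could look, sketched as a tiny expression evaluator (this is entirely our speculation about a feature the post describes as still in progress):

```javascript
// Speculative sketch of boolean tag matching: an expression is either a
// tag string or an { and: [...] } / { or: [...] } node. This is our
// guess at the in-progress feature, not Muut's actual design.
function evaluate(expr, connectionTags) {
  if (typeof expr === 'string') return connectionTags.includes(expr);
  if (expr.and) return expr.and.every(e => evaluate(e, connectionTags));
  if (expr.or) return expr.or.some(e => evaluate(e, connectionTags));
  return false;
}

// "forum:moot AND user:courtney":
const both = { and: ['forum:moot', 'user:courtney'] };
// "forum:moot OR user:courtney":
const either = { or: ['forum:moot', 'user:courtney'] };

evaluate(both, ['forum:moot']);   // → false
evaluate(either, ['forum:moot']); // → true
```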

We're also in the process of switching all our internal messaging from Redis to ZeroMQ. This will allow more complex routing and more efficient use of our network resources, such as directing notifications only to the JSON-RPC servers that need them rather than forcing every server to process every notification.

A notification box

Muut notification on bottom right of the UI


Is it fast?

Well, I mentioned above that we deliver events within a couple of milliseconds. That's being incredibly generous: from emitting an event until it's being serialized for a socket is more like a few hundred microseconds. We're able to emit events at near wire speed, so there's virtually no cost to our realtime event emitting or processing. We're actively sending thousands of events per second with hardly a blip in system utilization. In fact, our biggest load network-wide is dealing with SSL overhead.

Is that it?

Well, I did greatly simplify some pieces, and obviously I haven't shared all the implementation details. For example, we actually piggyback a bunch of metadata on top of the notifications, using the same notifications to drive our search indexing as well as our SEO generation. As a result, most notifications have a metadata property carrying data for any internal work that needs to be done.

The session and channel management is also rather non-trivial. This is mostly just a 10,000 ft overview of how we approached realtime here at Muut.

Courtney Couch

courtney@muut.com