That Wibbly Wobbly Real-Timey Wimey stuff

Mon, 26 Aug 2013 11:19:01 GMT

matita
Mon, 26 Aug 2013 16:01:26 GMT

This is pure gold.

matita
Mon, 26 Aug 2013 16:13:42 GMT

Since your infrastructure doesn't rely on pub/sub, would it be possible to subscribe to a RegExp tag? I thought about this when someone asked about a "chat" feature for Moot and you said that you could achieve it by creating a forum with a path similar to */messages/user1/user2* (I don't remember precisely...). I think that solution leaves out the messages started by `user2` to `user1`. If you could subscribe to a regexp path you could do something like */messages/(\w+|user1)/(\w+|user1)* and catch all messages to `user1`. BTW, it would also be very good if any developer could use your infrastructure via the API to send messages that are not for forums or comments :D

Courtney Couch
Mon, 26 Aug 2013 17:21:57 GMT

We do not use RegEx for notification processing. Every effort was made to keep notification handling free of any text searching. The overhead for delivering a message is linear in the number of tags on the event; adding regex matching would significantly change that. As far as opening up the API to give the client more control, we plan on keeping the mechanism fairly hidden. We know what is relevant to a user based on their authentication and the API methods they call, so we automatically manage the tags on our end. If `user2` sends a message to `/messages/user1/user2`, it shows up in their own feed, and `user1` will have it in their feed as well. When `user2` lists `messages` they will see their incoming messages in `/messages/user2` as well as their outbound messages to `user1` in `/messages/user1/user2`, since it's a child of the `/messages` path. The relevant notifications are handled transparently without needing any work from your end. It's easier for us to simply manage this for you, and it gives us more control over the performance profile.
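
A minimal sketch of how that hierarchical matching could look, assuming tags are derived from path prefixes (the helper below is hypothetical, not Muut's actual code):

---
# Hypothetical illustration: one tag per ancestor path, so a feed watching
# any prefix of /messages/user1/user2 receives the event with a plain
# set-membership check -- no regex or text searching involved.
def path_tags(path):
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]

# An event sent to /messages/user1/user2 would carry these tags:
# ['/messages', '/messages/user1', '/messages/user1/user2']
print(path_tags("/messages/user1/user2"))
---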

Jon Watte
Mon, 26 Aug 2013 19:56:20 GMT

How does sharing work? Or is this single instance only?

shuri
Mon, 26 Aug 2013 22:12:30 GMT

How do you store and handle queries on tags?

Courtney Couch
Tue, 27 Aug 2013 00:00:33 GMT

Sharing? What do you mean? We have a distributed network so nothing we do is on a single instance :)

Courtney Couch
Tue, 27 Aug 2013 00:01:52 GMT

@shuri notifications are all transient. They are not stored unless they need to be queued for a specific channel because of delivery issues. We also don't query them; they are pushed out to the channels with matching tags, and the channels don't query them after the fact.
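
A minimal sketch of that fire-and-forget model, assuming Redis pub/sub with a made-up channel name and payload shape (the real wire format isn't described in the post):

---
# Hypothetical example: publish a transient, tagged event over Redis pub/sub.
# Nothing is persisted; a subscriber that isn't listening at this moment
# simply never sees the event (unless a delivery-problem queue steps in).
import json
import redis

r = redis.Redis()

event = {
    "tags": ["/messages", "/messages/user1", "/messages/user1/user2"],
    "data": {"hello": "world"},
}
r.publish("events", json.dumps(event))  # fan-out happens downstream
---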

film42
Tue, 27 Aug 2013 01:00:17 GMT

Can't wait to see some of this open sourced. I've been very impressed with moot's responsiveness.

Courtney Couch
Tue, 27 Aug 2013 02:21:21 GMT

I'm glad you have taken note of the result of our efforts on that front! We're nothing if not obsessed with performance. Our JSON-RPC client/server will definitely be open sourced, and we'll be evaluating what other pieces we can open up once things calm down. It's been a bit of a mad dash over here since the start of our Beta. :)

cmelbye
Tue, 27 Aug 2013 03:51:37 GMT

Amazing.

bfadmin
Sun, 08 Sep 2013 21:36:17 GMT

Sorry, I didn't get the point.

> This is then picked up by every JSON-RPC server and they will intersect those two tags, sending the event {hello: 'world'} to any connection that matches.

It sounds crazy to me because the task of a JSON-RPC server is just to handle the connections to clients; the routing logic should be done on the backend. Secondly, the whole idea with the boolean OR seems prone to performance issues when a message has a lot of tags.

Courtney Couch
Mon, 09 Sep 2013 06:34:18 GMT

The problem with having the multiplexing happen earlier is that if a particular JSON-RPC server has a few thousand connections that need an event, then in your scenario the server would have to read and process a few thousand events and send each one, while in our situation it reads a single event and multiplexes it to the connections. For pure throughput, pushing the multiplexing as late in the process as possible is ideal.

In regard to boolean OR statements causing performance issues, that's simply not true. We don't process on a connection-by-connection basis. We intersect sets using Redis; a boolean statement such as `(tag1 and tag2) or (tag3 and tag4)` becomes (simplified a bit to demonstrate):

---
sinterstore temp tag1 tag2
sinterstore temp2 tag3 tag4
sunion temp temp2
---

Pretty simple stuff, and this works against millions of connections and millions of tags incredibly quickly.
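
The same example run through the redis-py client, for anyone who wants to try it outside redis-cli; the tag keys and connection IDs below are made up for illustration:

---
# Hypothetical rerun of the example above with redis-py.
# Each tag key is assumed to be a Redis set of connection IDs.
import redis

r = redis.Redis()

# Seed some fake connection IDs per tag (illustration only).
r.sadd("tag1", "conn:1", "conn:2")
r.sadd("tag2", "conn:2", "conn:3")
r.sadd("tag3", "conn:4")
r.sadd("tag4", "conn:4", "conn:5")

# (tag1 AND tag2) OR (tag3 AND tag4)
r.sinterstore("temp", "tag1", "tag2")
r.sinterstore("temp2", "tag3", "tag4")
matching = r.sunion("temp", "temp2")   # -> {b'conn:2', b'conn:4'}
print(matching)
---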

bfadmin
Mon, 09 Sep 2013 08:51:45 GMT

> @courtneycouch
> is that if a particular JSON-RPC server has a few thousand connections that need an event then in your scenario the server would have to read and process a few thousand events

I still feel like we are talking about different things. I consider the JSON servers from your story as just "channels that try to be persistent". All they can do is send and receive messages, open and close. No processing. They just act like tubes. If you mean that a single backend server will not handle all the load from the channels, I agree, but you can use clusters. A server from the cluster would normally take a message from a channel, parse its tags, and send the message according to the tag routing. To me that is the most obvious way. The communication between the JSON servers and the backend need not be synchronous; you can use some queue manager like RabbitMQ. I hope this makes sense.

> In regard to boolean OR statements causing performance issues that's simply not true. We don't process on a connection by connection basis.

I don't understand the "connection by connection" abracadabra, but I know that the complexity of the intersection/union will clearly grow with the number of items in the lists to unify/intersect, even in optimized Redis queries. You may not face this problem now, while the number of users is not so big.

Courtney Couch
Mon, 09 Sep 2013 09:09:45 GMT

> @bfadmin
> If you mean that a single backend server will not handle all the load from the channels, I agree, but you can use clusters

No, it's not a single backend server. Our infrastructure is decentralized for the most part (the few pieces that are single points are in the process of being removed). In any case the JSON-RPC servers are responsible for multiplexing and for managing the state of persistent connections.

> A server from the cluster will normally take a message from a channel, parse its tags, and send the message according to the tag routing.

Yes, that's what is being described here. The JSON-RPC servers accept events and, from their tags, distribute them to the connections they manage.

> The communication between JSON servers and backend need not be synchronous, you can use some queue manager like RabbitMQ.

RabbitMQ is completely unnecessary. Redis pub/sub is totally sufficient.

> I don't understand the "connection by connection" abracadabra, but I know that the complexity of the intersection/union will clearly grow with the number of items in the lists to unify/intersect. Even in optimized Redis queries. You may not face this problem now, when the number of users is not so big.

I suspect you haven't played with Redis much. The limiting factor here is not the intersecting of sets, it's serializing down to the sockets. I could have 10 million users online simultaneously with thousands of tags each and it would hardly matter. The limit is how many notifications can be serialized across the network per second, and the scale you are talking about is irrelevant since the network bottleneck is so tiny by comparison. Assuming notifications average 300 bytes, a 1 Gbit pipe gives a theoretical maximum of about 416k notifications per second. An event reaches a given user on average once every 10 seconds, which works out to roughly 4,160k users per server, and each event goes out to about 20 connections on average. That's about 20k intersections and unions per second. There's no way we'd be able to push the theoretical maximum of a 1 Gbit pipe, and even if we could, 20k intersections wouldn't be much of an issue. Redis can _easily_ handle this. The ability of a single server to handle that kind of limit is irrelevant anyway, since we can simply add as many JSON-RPC servers as we want. It's horizontally scalable: each server maintains its own repository of tags and handles multiplexing on its own, so whether I have 5 or 500 servers won't matter much. I'm not sure what scale you are suggesting, but having millions of simultaneous users online is pretty damn massive, and our test cases emulate these loads.
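
For anyone following the arithmetic, here is the back-of-envelope calculation spelled out (the 300-byte payload, 10-second interval, and 20-connection fan-out are the post's stated averages, not measurements):

---
# Back-of-envelope throughput math from the post above.
GBIT = 1_000_000_000       # 1 Gbit/s network pipe, in bits
NOTIF_BYTES = 300          # assumed average notification size
SECONDS_PER_EVENT = 10     # one event per user every ~10 seconds
CONNS_PER_EVENT = 20       # average fan-out per event

notifs_per_sec = GBIT / (NOTIF_BYTES * 8)                  # ~416,666 notifications/s
users_per_server = notifs_per_sec * SECONDS_PER_EVENT      # ~4.16 million users
intersections_per_sec = notifs_per_sec / CONNS_PER_EVENT   # ~20,800 set ops/s

print(round(notifs_per_sec), round(users_per_server), round(intersections_per_sec))
---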

bfadmin
Mon, 09 Sep 2013 10:50:16 GMT

Thanks, Courtney, I see we are talking about something similar now, but I definitely need to go and play with Redis to get rid of the unnecessary stuff in my picture :) Thanks for taking the time :)

Courtney Couch
Mon, 09 Sep 2013 11:34:27 GMT

No problem. It's a pretty big subject and I definitely glossed over many of the details in this post. We will be elaborating more on our infrastructure as we have time, and many of the realtime bits will be open sourced later on as well.

vishy1618
Wed, 30 Oct 2013 06:14:07 GMT

moot.it is really interesting!

expatabroad
Sat, 18 Oct 2014 13:50:58 GMT

Looks great. I've been weighing up between this, Disqus and IntenseDebate (Livefyre is too expensive)... Muut seems to be very slick and has some nice functionality.

Felipe Rohde
Mon, 02 Feb 2015 03:29:09 GMT

Nice work Courtney. But how do you handle WebSocket connection persistence through load balancers? Do you use sticky sessions, HAProxy, or something like that?

Yogesh Chandra
Sun, 09 Aug 2015 16:40:35 GMT

Very informative post! Thanks. What was the argument against using WebSockets or Socket.IO?
