

While deciding what to develop next on [Smileys Pub](https://smileys.pub), the answer kind of got served to me. Someone had registered and posted a reply on one of my posts, and I had no idea that had happened for 2 weeks. Obviously this site needed tools much like Reddit's messaging to keep people informed of new actions. This update will be less generic and focus on a single feature, and it will be yet another GenServer example (there cannot be too many of those) with the inclusion of production cluster deployment.

Development would include 3 systems: a room, a user and a post system. The room system would keep track of new content, new subs and hot posts. The user system would hold state on new replies and votes made directly on a user's latest posts. And the post system would track the number of comments in new posts, which is annoying to do via query since threads are nested and updates could in theory be frequent. This being Elixir, all of this will update users' clients in real time.

Image showing some of the resulting update counters

Each system is similar but has a few unique traits. I chose Room as an example but will link to code for all three.


Here is an image showing the setup of the room GenServer & dynamically created Agents:




How's it look
The general idea here is to keep state, but not so much state that it builds up forever and consumes infinite resources. With the rooms it is not as much of a problem since, as you will see, we are only interested in a few counters and the room index itself is pretty finite.


If you are unfamiliar with GenServers & Agents, head to elixir-lang.org and skim the guides. It is a good idea to be familiar with the concept, but you will want to code it firsthand to get a feel for it. I've already made a few for things like search requests and other use cases, but they had slightly less complexity and were more obvious use cases.

The GenServer is going to keep the state of rooms as a map with each key being a room name. Like the elixir-lang example, it will act as a bucket registry. Each key will lead to the process ID of a bucket, implemented using Agents. A monitor is set on each Agent, so if there is a crash we lose that bucket (room) and re-create it as needed. The GenServer will be in charge of setting expiry on counts as well. There are also some interface functions that ensure a bucket exists before incrementing a count. With the purpose being known, here is the code:
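As a rough sketch of the design just described (this is not the site's actual code, and the module and function names such as `RoomRegistrySketch` and `ensure_bucket/2` are made up for illustration), a registry that maps room names to monitored Agent buckets could look something like this:

```elixir
defmodule RoomRegistrySketch do
  use GenServer

  # --- Client API (hypothetical names) ---

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, opts)
  end

  # Returns the bucket pid for `room`, creating the bucket if it is missing.
  def ensure_bucket(server, room) do
    GenServer.call(server, {:ensure, room})
  end

  # Interface function: make sure the bucket exists, then bump a counter.
  def increment(server, room, counter) do
    bucket = ensure_bucket(server, room)
    Agent.update(bucket, &Map.update(&1, counter, 1, fn n -> n + 1 end))
  end

  def counts(server, room) do
    server |> ensure_bucket(room) |> Agent.get(& &1)
  end

  # --- Server callbacks ---

  @impl true
  def init(:ok) do
    # names: room name => bucket pid, refs: monitor ref => room name
    {:ok, %{names: %{}, refs: %{}}}
  end

  @impl true
  def handle_call({:ensure, room}, _from, state) do
    case Map.fetch(state.names, room) do
      {:ok, pid} ->
        {:reply, pid, state}

      :error ->
        # Agent.start/1 (no link): a crashing bucket must not take the registry down.
        {:ok, pid} = Agent.start(fn -> %{} end)
        ref = Process.monitor(pid)
        names = Map.put(state.names, room, pid)
        refs = Map.put(state.refs, ref, room)
        {:reply, pid, %{state | names: names, refs: refs}}
    end
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    # The bucket died: forget it so the next ensure_bucket/2 re-creates it.
    {room, refs} = Map.pop(state.refs, ref)
    {:noreply, %{state | names: Map.delete(state.names, room), refs: refs}}
  end

  def handle_info(_msg, state), do: {:noreply, state}
end
```

Calling `RoomRegistrySketch.increment(reg, "news", :new_posts)` on a fresh registry creates the "news" bucket on demand and then bumps its counter. If that Agent later dies, the `:DOWN` message drops it from the map and the next call starts again from an empty bucket.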




So each Agent bucket represents a room, and they live and die per the registry... which in my case is also acting as an interface to make sure a bucket exists on actions, which is probably not ideal but it works pretty well so far. Each bucket contains counters in a map that increment when a new post or subscription happens, and decrement at a predetermined expiry interval. I have decided that 6 hours of activity will be tracked in each room, so you always know, in real time, what has happened in a room over the last 6 hours: 5 new posts, 10 new subs, etc.
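The increment-then-expire idea can be sketched with `Process.send_after/3`. To keep the example short, this version holds the counters directly in the GenServer state instead of in separate Agent buckets, and the names (`ExpiringCountersSketch`, the `:ttl_ms` option) are assumptions for illustration rather than anything from the site's code:

```elixir
defmodule ExpiringCountersSketch do
  use GenServer

  # 6 hours, the tracking window mentioned above.
  @six_hours_ms 6 * 60 * 60 * 1000

  def start_link(opts \\ []) do
    ttl = Keyword.get(opts, :ttl_ms, @six_hours_ms)
    GenServer.start_link(__MODULE__, ttl)
  end

  def increment(server, room, counter) do
    GenServer.cast(server, {:increment, room, counter})
  end

  def counts(server, room), do: GenServer.call(server, {:counts, room})

  @impl true
  def init(ttl), do: {:ok, %{ttl: ttl, rooms: %{}}}

  @impl true
  def handle_cast({:increment, room, counter}, state) do
    # Every increment schedules its own matching decrement one TTL later.
    Process.send_after(self(), {:expire, room, counter}, state.ttl)
    {:noreply, bump(state, room, counter, 1)}
  end

  @impl true
  def handle_call({:counts, room}, _from, state) do
    {:reply, Map.get(state.rooms, room, %{}), state}
  end

  @impl true
  def handle_info({:expire, room, counter}, state) do
    {:noreply, bump(state, room, counter, -1)}
  end

  # Adds `delta` to a counter, never letting it drop below zero.
  defp bump(state, room, counter, delta) do
    rooms =
      Map.update(state.rooms, room, %{counter => max(delta, 0)}, fn counts ->
        Map.update(counts, counter, max(delta, 0), &max(&1 + delta, 0))
      end)

    %{state | rooms: rooms}
  end
end
```

One timer per increment is the simplest thing that works. With very busy rooms, a periodic sweep over timestamped events might be cheaper, but that is a trade-off I have not measured.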


It is pretty simple once you try it, but it's time to describe some of the considerations when using this in production.




Enter the Cluster..
From the earlier parts you may know that this site is distributed across 4 EC2 instances, with the API on another 2 and a process server on another (because none of this work would be fun and educational if it wasn't distributed). In my original implementation the GenServers were just named things like :room_activity_reg and added as workers to Phoenix's supervision tree. But that would mean there is a separate process on each node, so you would get near-random results depending on which server you hit.

The next step was to use the global registry, so they'd be named {:global, :room_activity_reg}, which works pretty well. All nodes then have access to the same processes. However, if you attempt to register them twice you will get an already-registered error, so you need to either use a master node to register or check whether a process of that name exists using Erlang's global library. It works, but in a distributed environment you can still have 2 nodes try to register at roughly the same time and hit a duplicate registration issue.

I went with a library that already solved this problem, called [syn](https://github.com/ostinelli/syn), which provides a global registry and handles collisions for me. The only issue is that, as far as I can tell, it currently keeps a dead process dead instead of restarting it, so there may be a quick followup coming where that problem is solved better. Syn uses Mnesia and all nodes are aware of each other, so it worked out of the box very easily. Here is my application startup code in Phoenix, by the way, where these are registered with syn (that file is a bit of a mess atm, whoops):
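For reference, the plain `:global` step (before switching to syn) can be sketched like this. It does not use syn at all; the module name and the `start_or_find/1` helper are made up for illustration, and only `GenServer.start_link/3` with a `{:global, name}` name and `:global.whereis_name/1` are real library calls:

```elixir
defmodule GlobalRoomRegSketch do
  use GenServer

  # Start the process under a cluster-wide name, or reuse the process
  # that already holds that name (possibly on another node).
  def start_or_find(name \\ :room_activity_reg) do
    case GenServer.start_link(__MODULE__, :ok, name: {:global, name}) do
      {:ok, pid} -> {:ok, pid}
      {:error, {:already_started, pid}} -> {:ok, pid}
      other -> other
    end
  end

  # Look the name up in Erlang's global registry; nil when nobody holds it.
  def whereis(name \\ :room_activity_reg) do
    case :global.whereis_name(name) do
      :undefined -> nil
      pid -> pid
    end
  end

  @impl true
  def init(:ok), do: {:ok, %{}}
end
```

Treating `{:error, {:already_started, pid}}` as success covers the "registered twice" case. Note that in this sketch a reused process is not linked to the caller, and it does nothing special about name conflicts discovered when separated nodes reconnect, which is the kind of collision handling I leaned on syn for.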





Does it scale?
The only part of this I was worried about was map size. Everything else would scale with the cluster just fine, I think. If there are millions of users and we track each one's activity in a bucket, then the registry would grow one to one with the users. I have not found good info on it yet, but I have heard that a map may possibly get buggy once it grows to a large size. That's a pretty solvable problem though, and one I won't worry about quite yet. At that point there would be a slight refactor to distribute the GenServer across multiple processes. So scaling this in its current form could run into an issue... fixing it does not seem intimidating, but it is something I should find the time to simulate and test.
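One possible shape for that refactor, purely as an untested-at-scale sketch: hash each key onto one of a fixed number of processes with `:erlang.phash2/2`, so no single map holds every user. The module name, the shard count and the use of plain Agents as shards are all assumptions for illustration:

```elixir
defmodule ShardSketch do
  @default_shards 8

  # Start one Agent per shard and return them as a tuple for fast lookup.
  def start_shards(count \\ @default_shards) do
    1..count
    |> Enum.map(fn _ ->
      {:ok, pid} = Agent.start_link(fn -> %{} end)
      pid
    end)
    |> List.to_tuple()
  end

  # :erlang.phash2(key, n) returns an integer in 0..n-1, so the same key
  # always lands on the same shard.
  def shard_for(shards, key) do
    elem(shards, :erlang.phash2(key, tuple_size(shards)))
  end

  def increment(shards, key, counter) do
    shards
    |> shard_for(key)
    |> Agent.update(fn state ->
      Map.update(state, key, %{counter => 1}, fn counts ->
        Map.update(counts, counter, 1, &(&1 + 1))
      end)
    end)
  end

  def counts(shards, key) do
    shards |> shard_for(key) |> Agent.get(&Map.get(&1, key, %{}))
  end
end
```

Each shard's map then holds roughly 1/N of the keys, and calls for different shards no longer queue behind a single process.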




This took about a week of my off time, with just a few short programming sessions. It isn't particularly striking that this functionality was possible to make; however, the ease of it, along with the way in which it scales, was very appealing to me. And adding, removing and adjusting functionality is as easy as changing anything else in the code, as long as you're aware of the impact on the current buckets and servers. Whether or not these were the right tools for this situation, it feels good for now and I am happy to get the exercise. One additional note: a node reset impacts these processes, so hot swapping your launches is much preferable. This is the point where I start really thinking about OTP concepts and trying for 100% uptime over long periods of time. Some changes will need to be made to get there, but it's going to be super stable until then, I think. Let me know if anyone has alternative suggestions to improve or refactor.
It's interesting to think about what other large community sites would be like if they were based in Elixir.
Ah, and sorry Michael for not replying to your post in a timely fashion!


Here are the other GenServers for site activity


Posted 1 year ago