I always appreciate when people talk about how they’ve built a particular piece of software or a web service, so I thought I’d talk about some of the architecture choices I made when building Groups.io, my recently launched email groups service. This will be a multi-part series.
One of the goals I had when I first started working on Groups.io was to use it as an opportunity to learn the new language Go. Groups.io is written completely in Go and is my first project in the language. As a diehard C programmer (ONElist was written in C, and Bloglines was written in C++), it took very little time to get up to speed on Go and I now consider myself a huge fan of the language. There are many reasons why I like to code in Go. It’s compiled, so it’s fast and you get all the code checks you miss from interpreted languages. It generates stand alone binaries, which is great for distributing to production machines. It’s got a great standard library. It’s easy to write multithreaded code (threads are called goroutines). The documentation system is good. But besides all that, the philosophy behind Go just fits my mental model better than any other language I’ve worked in. It all combines to make programming in Go the most fun I’ve had coding in a very long time.
Groups.io consists of several components that interact with each other. All interactions are done using JSON over HTTP.
The web server handles all web traffic, naturally. It is proxied behind nginx, because I believe that makes for a more flexible and slightly more secure system. Nginx terminates the encrypted HTTPS traffic and passes the unencrypted traffic to the web process. We use the standard Go HTML template system for our web templates, and we use several parts of the Gorilla web toolkit. We use Bootstrap for our HTML framework.
The smtpd daemon handles incoming SMTP traffic for the groups.io domain. It is also proxied behind nginx. The email it handles consists mainly of group messages, although there are some other messages as well, including bounce messages. It sends group and bounce messages to the messageserver for processing. Other messages are forwarded, using a set of rules, to other email addresses. We based smtpd heavily on Go-Guerrilla’s SMTPd.
The messageserver daemon processes group messages, bounce messages and email commands. For group messages, it verifies that the poster is subscribed and has permission to post to the group, it archives the message and sends it out to the group subscribers, using Karl to send the messages. It also sends the messages to our Elasticsearch cluster. Bounce and email command messages are processed as well. All group messages are processed through the messageserver, whether they arrive through the smtpd, or whether they were posted through the web site.
Karl, named after Karl ‘The Mailman’ Malone, is our email sending process. It is responsible for all emails originating from the groups.io domain. It is passed an email message, a footer template, a sender, and a set of data about each receiver the message should be sent to. For each receiver, it evaluates the template, inserting subscriber specific information, and then merges it with the email message before sending it out. It also handles DKIM signing of emails. It stores all emails using Google’s leveldb database until they are successfully sent. A reasonable question to ask is why didn’t I outsource the email delivery part of the service. There are several companies that provide email delivery outsourcing. In general, outsourcing is a way to save development time. But when I thought about it, I did not think I’d be able to save much time by outsourcing; I’d still have to connect our data with whatever templating system the email delivery service used. And Karl did not take very long to write. But more importantly, email delivery is a core competency of our service and I believe we have to own that.
Errord is a simple logging process, used to log error messages and stack traces from any core dumps in any of the other processes. I can look at the errord log and instantly see if anything in the system has crashed and where it crashed.
Rsscrawler and instagramcrawler are cronjobs that deal with the Feed and Instagram integrations, respectively. Rsscrawler looks for updates in feeds that are integrated with our groups, and Instagramcrawler does the same for instagram accounts. They’re currently run twice an hour. If they find an update, they generate a group message and pass it along to the messageserver.
Bouncer is a cronjob that is run once a day to manage bouncing users.
Expirethreads is a cronjob that’s run twice an hour to expire threads that are tagged with hashtags that have an expiration.
Senddigests is a cronjob that’s run once a night, to generate digest emails for users with digest subscriptions.
In future articles, I’ll talk about the machine cluster running Groups.io, the database design behind the service, and some aspects of the code itself. Are there any specific topics you’d like me to address? Please let me know. Are you unhappy with Yahoo Groups or Google Groups? Or are you looking for an email groups service for your company? Please try Groups.io.