Writing a microservice in Ruby

Development

Reading Time: 12 minutes

This article was originally published by Pierpaolo Frasa on his personal site, and with his permission, we are sharing it here for Codeship readers.

Everybody is talking about microservices, but I haven’t seen a lot of good, comprehensive descriptions of how to actually write a microservice in Ruby. This may be because a significant number of Ruby developers are still most comfortable with Rails (which is not a bad thing in and of itself, but it’s not all Ruby is capable of).

So I want to offer my account.

Here’s the story: We want to write a microservice whose responsibility it is to send mail. It gets a message such as this:

{
  'provider': 'mandrill',
  'template': 'invoice',
  'from': 'support@company.com',
  'to': 'user@example.com',
  'replacements': {
    'salutation': 'Jack',
    'year': '2016'
  }
}

And it knows that it has to send the invoice email template to user@example.com by substituting some variables. (We’re using mandrill as an email API provider, which sadly is going out of service soon.)

This is a perfect fit for a microservice because it’s a small and focused piece of functionality with a clearly defined interface. So, that’s exactly what we did at work, when we decided that our mailing infrastructure needed a clean rewrite.

Now if we have a microservice, we need a way to send it some messages. That’s where a message queue comes in. There are tons of options here, and you may pick whatever favourite you have. We chose to go with RabbitMQ, because:

  • It’s popular and encodes a standard (AMQP).
  • It has bindings for most languages, so it’s perfect for polyglot environments. I like writing Ruby (and know it better than other languages), but I don’t think it’s the best fit for every problem nor that it’s going to be so for all future times. So if we ever have a need to write an application in, say, Elixir that sends mail through our service, it will be possible (and not too hard).
  • It is very flexible and accomodates a number of workflows—simple background processing through a named queue (which I’ll focus on here) or complex message exchange workflows (even RPC is possible). The website has a lot of examples.
  • It comes with a useful admin panel accessible through your web browser.
  • There are hosted solutions available (no affiliation). (And for development, you should be able to find it in your favourite package manager’s sources.)
  • It’s written in Erlang and apparently those Erlang programmers know how to deal with concurrency. ;)

Putting a message into a queue with RabbitMQ is fairly easy and looks something like this:

require 'bunny'
require 'json'

connection = Bunny.new
connection.start
channel = connection.create_channel
queue = channel.queue 'mails', durable: true

json = { ... }.to_json
queue.publish json
connection.close

bunny is the official RabbitMQ gem and when we don’t pass any options to Bunny.new, it will just assume that rabbitmq runs on localhost:5672 with the standard credentials. We then (after some setup) connect to a queue identified by the name “mails”. If the queue doesn’t exist yet, it will be created, otherwise it will connect to it. Then, we can publish any message we want directly to that queue (for example, the payload we saw above). We’re using JSON here, but in fact you could be using anything you like (BSON, Protocol Buffers, whatever), RabbitMQ doesn’t care.

Now, we’ve got the producer side covered, but we still need an application that actually takes and processes the message. For that, we used sneakers, which is a wrapper gem around RabbitMQ. It basically exposes the subset of RabbitMQ to you that you’re most likely to use if you just want to do some background processing—but it’s still all RabbitMQ underneath. With sneakers (which is heavily inspired by sidekiq), we can define a “worker” to process our mail sending requests:

require 'sneakers'
require 'json'
require 'mandrill_api/provider'

class Mailer
  include Sneakers::Worker
  from_queue 'mails'

  def work(message)
    puts "RECEIVED: #{message}"
    option = JSON.parse(message)
    MandrillApi::Provider.new.deliver(options)
    ack!
  end
end

We have to specify the queue that we’re reading from (“mails”) and then the work method will actually consume the message. We parse the message (which we know is JSON, because that’s what we agreed upon before—but again, that is your decision, RabbitMQ or sneakers don’t care) and pass the message hash to some internal class that does the actual work. Finally, we must acknowledge the message or RabbitMQ will put the message back into the queue. If you need to reject a message etc., the sneakers wiki can fill you in on how to do that. We also included some logging in order to know what’s going on (we’ll talk later about why we log to standard output).

But one class doth not an application make. So we have to build a project structure around this—and this is something that we’re not usually taught as Rails developers, because we’re used to just running rails new and everything is set up for us. So I want to expand on this a little bit. This is roughly what our project tree ended up looking like:

.
├── Gemfile
├── Gemfile.lock
├── Procfile
├── README.md
├── bin
│   └── mailer
├── config
│   ├── deploy/...
│   ├── deploy.rb
│   ├── settings.yml
│   └── setup.rb
├── examples
│   └── mail.rb
├── lib
│   ├── mailer.rb
│   └── mandrill_api/...
└── spec
    ├── acceptance/...
    ├── acceptance_helper.rb
    ├── lib/...
    └── spec_helper.rb

Some of this is self-explanatory, such as the Gemfile(\.lock)? and the readme. There’s also not much to say about specs, except that we used the convention to have one helper file (spec_helper.rb) for fast unit tests and another one (acceptance_helper.rb) for acceptance tests that require some more setup (e.g. mocking actual HTTP requests). The lib folder is also rather uninteresting for our purposes, since we already saw the lib/mailer.rb (it’s the worker class we defined above) and the rest is specific to the individual service. And the examples/mail.rb is basically just the example mail enqueuing script that we saw above, kept around so we can trigger a manual test whenever needed. Now I want to specifically talk about the config/setup.rb file. This is the file that we’ll always load initially (even in the spec_helper.rb), so it’s desirable that it doesn’t do too much (or your tests will be slow). Here’s what it looks like in our case:

require 'bundler/setup'

lib_path = File.expand_path '../../lib', __FILE__
$LOAD_PATH.unshift lib_path

ENVIRONMENT = ENV['ENVIRONMENT'] || 'development'

require 'yaml'
settings_file = File.expand_path '../settings.yml', __FILE__
SETTINGS = YAML.load_file(settings_file)[ENVIRONMENT]

if %w(development test).include? ENVIRONMENT
  require 'byebug'
end

The most important thing we’re doing here is setting the load path. First we require bundler/setup, so we can henceforth require gems by name. Next, we add the lib folder of our service to the load path. This means that we can e.g. require mandrill_api/provider and it will look it up from <project_root>/ lib/mandrill_api/provider. Nobody likes to deal with relative paths, so that’s something you want to do. Notice that we don’t get autloading like in Rails. We’re also not calling Bundler.require which would actually require all the gems in your Gemfile—something that Rails would do for you. This means that you always have to require your dependencies (gems or lib files) explicitly (which I actually think is good design anyway).

Next, I like the Rails convention of having multiple environments, so we’re setting it here by loading it from a UNIX environment variable called ENVIRONMENT (and defaulting to development). We’re also going to need some settings (like the RabbitMQ connection options or some API keys for the services we use) and it makes sense to have them depend on the environment, so we load a YAML file for that and put that into a global variable.

Finally, we make it so that dropping byebug (the debugger for Ruby 2.x) into our code will always work in development and test by requiring it beforehand. If you’re worried about the speed of that (it does take a moment) you might leave that out and only include when explicitly needed or (heresy!) you could include a monkeypatch like:

if %w(development test).include? ENVIRONMENT
  class Object
    def byebug
      require 'byebug'
      super
    end
  end
end

Now, we’ve got a worker class and a general project structure. We just need to tell sneakers to run our worker and that’s what we do in bin/mailer:

#!/usr/bin/env ruby
require_relative '../config/setup'
require 'sneakers/runner'
require 'logger'
require 'mailer'
require 'httplog'

Sneakers.configure(
  amqp: SETTINGS['amqp_url'],
  daemonize: false,
  log: STDOUT
)
Sneakers.logger.level = Logger::INFO
Httplog.options[:log_headers] = true

Sneakers::Runner.new([Mailer]).run

Notice that this is an executable (the shebang at the top), so we can just run it without the ruby command. We first load our setup file (for this we have to use a relative path) and then all the other stuff that we need, including our mailer worker class.

Sign up for a free Codeship Account

The important part here is where we configure sneakers: The amqp argument accepts a URL that specifies the RabbitMQ connection which we load from the settings. We also tell sneakers to run in the foreground and log to standard ouput (more on that in a bit). Then we basically give sneakers an array of worker classes and tell it to run them. We also took the liberty to include some logging, so we could see what’s going on. The httplog gem will just log all outgoing requests which is quite useful if you’re talking to an external API (here we’re making it also log the HTTP headers, which it doesn’t do by default).

Now if you run bin/mailer you will see something like the following:

... WARN: Loading runner configuration...
... INFO: New configuration:
#<Sneakers::Configuration:0x007f96229f5f28 ...>
... INFO: Heartbeat interval used (in seconds): 2

(though the actual output is much more verbose!)

If you keep it running and then run our enqueue script from above in another terminal window, you’ll get something like:

... RECEIVED: {"provider":"mandrill","template":"invoice", ...}
D, ... [httplog] Sending: POST
https://mandrillapp.com:443/api/1.0/messages/send-template.json
D, ... [httplog] Data: {"template_name":"invoice", ...}
D, ... [httplog] Connecting: mandrillapp.com:443
D, ... [httplog] Status: 200
D, ... [httplog] Response:
[{"email":"user@example.com","status":"sent", ...}]
D, ... [httplog] Benchmark: 1.698229061003076 seconds

(again shortened here) which is pretty informative, especially in the beginning, but of course you can later reduce log noise as needed.

This gives us the basic project structure, so what’s left to do? Well, here comes the hard part: Deployment.

There’s lots of things you have to take care of when you deploy a microservice (or, in general, any application), including the following:

  • You want to daemonize it (i.e., it should run in the background). We could have done this when setting up sneakers above, but I prefer not to—in development, I like to be able to see the log output and to be able to kill the process with CTRL-C.
  • You want to have proper logging. This include somehow making sure that your logfiles don’t end up filling your whole disk or becoming so unmanageably huge that grepping through them takes forever (e.g. log rotation).
  • You want it to restart both when you have to restart your server for whatever reason and when it crashes because who knows what.
  • You want to have standard commands for starting / stopping / restarting when needed.

You could do this all by yourself in Ruby, but I think there’s a better solution: Use something that’s built to handle those tasks, i.e. your operating system (Mike Perham, the creator of sidekiq, agrees with me). In our case, this meant systemd since that’s what runs on our servers (and on most Linux systems nowadays), but I really don’t want to pick a fight here. Upstart or daemontools probably work just as fine.

For systemd to run our microservice we need to generate some config files. This could be done by hand, but I prefer to use a tool called foreman. With foreman, we can specify all the processes that our app needs to spin up in a Procfile:

mailer: bin/mailer

Here, we only have one process, but you could have several ones. We have a process that we identify as “mailer” and it should run the bin/mailer executable. Now, the nice part about foreman is that it is able to export this configuration to a number of init systems, including systemd. For example, out of this simple Procfile it will actually create a bunch of files; this is because, as I mentioned earlier, we could have multiple processes specified in the Procfile, so the different files specify a hierarchy of dependencies. We will have a top-level mailer.target file that depends on a mailer-mailer.target file (if we had multiple processes in the Procfile, mailer.target would depend on multiple sub-targets). This is in turn dependent on mailer-mailer-1.service (we could have more than one such file by explicitly setting the concurrency level to a value higher than one). That last file looks something like this:

[Unit]
PartOf=-.target

[Service]
User=mailer_user
WorkingDirectory=/var/www/mailer_production/releases/16
Environment=PORT=5000
Environment=PATH=
/home/deploy/.rvm/gems/ruby-2.2.3/gems/bundler-1.11.2:...
Environment=ENVIRONMENT=production
ExecStart=/bin/bash -lc 'bin/mailer'
Restart=always
StandardInput=null
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=%n
KillMode=process

The exact details are not so important, but here we can see that we specify the user the service should run as, the working directory, the command to run to start the service, that we should always restart after a failure and that we log to the system logger. We also set some environment variables, including the PATH, something I’ll talk about in a bit.

With this, we get all of the behaviour that we wanted earlier. The job will run in the background and restart after a failure. You can also make it run when the system boots up by running sudo systemctl enable mailer.target. And the log output, that we’re just sending to standard output, will get redirected to the system logger. In systemd’s case, this is typically journald, which is a binary logger (where the issue of rotating your logs doesn’t arise anymore). Inspecting the log output for our mailer service can be done with:

$ sudo journalctl -xu mailer-mailer-1.service
-- Logs begin at Thu 2015-12-24 01:59:54 CET, end at ... --
Feb 23 10:00:07 ... RECEIVED: {"from": ...}
... 

You can give journalctl more options and e.g. filter by date.

In order to have foreman generate the systemd files, we have to set up the export process in our deployment. Now, I don’t know whether you use Capistrano 2 or 3 or something else (like mina), so I’m just going you to show the respective shell commands that you’ll have to incorporate. The hardest task somehow is always getting your environment variables set up correctly. In order for foreman to write the variables we saw above to the startup scripts, we can put them into a .env file first, by running this from the deployed project root:

$ echo "PATH=$(bundle show bundler):$PATH" >> .env
$ echo "ENVIRONMENT=production" >> .env

(I’m ignoring the PORT variable here—foreman generates it automatically, and we don’t even need it for our service.)

Then we tell foreman to export to systemd while reading these variables in from the .env file we just created:

$ sudo -E env "PATH=$PATH" bundle exec foreman\
  export systemd /etc/systemd/system\
  -a mailer -u mailer_user -e .env

That’s a long command, but it boils down to running foreman export systemd, while specifying the directory that the files should be put in (/etc/systemd/system is, as far as I can tell, a “standard” directory for it), the name of the target, the user it should run as and the environment file it should load.

Then we reload everything:

$ sudo systemctl daemon-reload
$ sudo systemctl reload-or-restart mailer.target

And then, we enable the service in order to make it always run when the server boots up:

$ sudo systemctl enable mailer.target

After that, our service should be up and running on our server, ready to accept any incoming messages.

I have covered a lot of ground in this post, but it is my hope that I showed some of you the big picture behind writing and deploying a microservice. Obviously, you’ll have some research to do if you really want to follow along and do this yourself, but I think I have given you some pointers to technologies you could be looking at.

We wrote our mailer service basically like this a couple months ago and so far we’re pretty happy with the results. The fact that the mailer is isolated with a clearly defined API and is well-tested independently gives us a lot of confidence in it doing what it’s supposed to do. And having a sane restart behaviour is a deal-breaker for us—we still have some sidekiq workers that occasionally die and while we fixed the issue by adding monit on top of it, it feels much better to fully use the tools your operating system provides.

Subscribe via Email

Over 60,000 people from companies like Netflix, Apple, Spotify and O'Reilly are reading our articles.
Subscribe to receive a weekly newsletter with articles around Continuous Integration, Docker, and software development best practices.



We promise that we won't spam you. You can unsubscribe any time.

Join the Discussion

Leave us some comments on what you think about this topic or if you like to add something.

  • Have you got the source code at github?

    • Pierpaolo Frasa

      Sorry, the code is not public atm. :(

  • Serguei Cambour

    Thank you for sharing. It seems like you have a typo in #work method: option options variables to fix :)

    • Pierpaolo Frasa

      yes, you’re totally correct. thanks for spotting it. :)

  • Stanislav Mekhonoshin

    Nice article.
    But I prefer docker for microservices deployment.
    It allows to get rid of capistrano, dealing with systemd and libraries conflicts.

    • Pierpaolo Frasa

      Docker has a definitive benefit in that it gives you full isolation. But it doesn’t make things magically easier. You still have to figure out logging, error handling, communication (hooking up to rabbitmq) etc. There are ways, but for this simple tutorial, I chose to go the easier route. Also, I don’t feel systemd to be hard to deal with at all. :)

  • I’m not really following the reasoning behind making byebug a monkeypatch. The condition is the same, but the code is strictly more complex (ie. requires the exact same thing but via monkeypatch which blows away the method cache, etc). Am I missing something? Is the spec_helper loaded more than once?

    • Pierpaolo Frasa

      Well, that was only an aside anyway. But the reason is that if you run specs individually, they take a while longer to run if byebug is in your spec helper because byebug is actually kinda heavy – this is a minor nuisance when TDDing. If the monkeypatch is offending your taste, you can of course always create some editor macro that will just insert `require ‘byebug’; byebug` at any given point in the code.

      • Yeah I understand, sorry to focus on an incidental detail (what can I say, I’m a developer ;). I understand intimately why you would not want to always load the debugger.

        But I still don’t see how the monkeypatch version is any more efficient than the inline spec_helper code. As far as I can tell they both do exactly the same thing, the monkeypatch just does it with more overhead. Unless you are leaving out some magic where the monkeypatch is only loaded when the debugger method is actually called.

        • Pierpaolo Frasa

          it’s just lazy loading. On a quick examination that I just made in a project, the “require ‘byebug'” statement increases the spec load time by about 0.1s. Even if your spec file that you run doesn’t call the debugger. With the monkeypatch, you would lose that extra 0.1s only when actually using it. While this possibly increases the load time slightly when you do use it (because of blowing away the method cache, as you mentioned), it will reduce the load time it in case you don’t – which should be much more often; I don’t throw in debuggers in my tests all the time, only when I don’t understand what’s going on. As it’s only 0.1s on my machine, this might be a stupid micro-optimisation (although I really like fast tests), so it might not be worth it ;) I actually now think an editor macro for “require ‘byebug’; byebug” would be a preferable solution.

  • Vance Feld

    heh. i don’t even use a server any more for microservices.. i just deploy node.js to a functions app in Azure and boom, all microservices billed per million requests, auto-scaling, no headache. get with the times. this email code is about 30 lines of code AND configuration in node.js + Azure Function App or Amazon Lambda.

    • Marco Roy

      Like anything else, serverless function services (Azure Functions, AWS Lambda) are not a golden hammer. There is a significant performance penalty with using them due to the uncertainty of the freeze-thaw cycle, which doesn’t make it a perfect solution for every microservice. Docker/containers can be a great alternative; with well managed infrastructure, it almost feels serverless. However, I do agree that in this case, it’s just emails, so performance really shouldn’t be a concern.

  • Marco Roy

    By the way, Mandrill is not going out of service, it’s simply becoming a MailChimp add-on instead of a standalone service. We have a similar microservice that uses Mandrill as well, and we love their services. So, a correction would be nice, for their sake. :)

    • Pierpaolo Frasa

      Well, they completely changed their pricing model and TOC in a way that was incompatible with our current usage. And they gave everyone just two months’ notice. So, I think “going out of service” is as nice as I can be. :) We switched over to Sparkpost, which seems to pretty much seamlessly pick up where Mandrill left.