MongoDB Backup Strategies for Write-Heavy Applications

The MongoDB Management Service (MMS) Backup, announced earlier in May, provides excellent backup and restore capabilities for your MongoDB replica set.

Setting up MongoDB backup is quite easy:
  • Sign up for the service
  • Provide credit card details
  • Install and configure the backup agent
In about 10 minutes your replica set is backed up and the backup agent starts streaming oplog data to the backup service. The price of snapshot creation and storage is negligible compared to the $2/GB charged for oplog processing. For example, my previous invoice was $0.89 for 330 GB of snapshot storage, $0.65 for 71 GB of snapshot creation, and $155 for 77 GB of oplog traffic.
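A quick bit of arithmetic on the invoice figures above shows that oplog traffic dominates the bill (the $2/GB rate and the GB volumes are taken from the invoice):

```python
# Sanity-check the invoice: oplog processing is nearly the whole cost.
OPLOG_RATE = 2.00              # $/GB for oplog processing
oplog_gb = 77                  # oplog traffic on the invoice
snapshot_storage_cost = 0.89   # for 330 GB of snapshot storage
snapshot_create_cost = 0.65    # for 71 GB of snapshot creation

oplog_cost = OPLOG_RATE * oplog_gb
total = oplog_cost + snapshot_storage_cost + snapshot_create_cost
print(oplog_cost, round(total, 2))  # 154.0 155.54 -> oplog is ~99% of the bill
```

The $154 computed here matches the invoiced $155 to within rounding, so any savings have to come from reducing oplog traffic, not snapshots.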
After analyzing my application, I figured out that most of the traffic was caused by two collections:
  • metalog - a 1 GB capped collection that stores every API request with its details
  • opcounters - stores various operation counters; every minute a new document is created and the $inc operator increments one or more counters depending on the action, e.g. db.opcounters.update({_id: 1234}, {$inc: {read: 1, write: 1}});
Luckily, MongoDB backup allows you to exclude namespaces from the backup. I decided to skip metalog and opcounters. Even though my application gets more hits every day, the backup cost has decreased tenfold. Thanks to the replica set, I don't need to worry about losing this data: there are three members, and if the primary fails, a secondary still has it.

I still do regular daily backups of my collections to S3 using mongodump with --oplog, which gives me quick access to a data snapshot in case I need to analyse the data on my own developer machine. Running mongorestore with the --oplogReplay option restores the data including the latest oplog entries.
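A nightly job along these lines does the trick; as a side-effect-free sketch, the script below only prints the commands it would run, and the hostnames, paths, and S3 bucket are placeholders (aws-cli is assumed to be configured):

```shell
#!/bin/sh
# Sketch of a nightly dump-to-S3 job; host, paths, and bucket are placeholders.
STAMP=$(date +%Y%m%d)
DUMP_DIR="/tmp/mongodump-$STAMP"

# --oplog also captures writes that happen while the dump runs, giving a
# consistent point-in-time snapshot (run it against a replica set member).
DUMP_CMD="mongodump --host rs0/db1.example.com:27017 --oplog --out $DUMP_DIR"

# Compress the dump and ship it off-site.
UPLOAD_CMD="tar czf $DUMP_DIR.tar.gz -C $DUMP_DIR . && aws s3 cp $DUMP_DIR.tar.gz s3://my-backup-bucket/dump-$STAMP.tar.gz"

# On the dev machine, replay the captured oplog on top of the restored data.
RESTORE_CMD="mongorestore --oplogReplay $DUMP_DIR"

echo "$DUMP_CMD"
echo "$UPLOAD_CMD"
echo "$RESTORE_CMD"
```

In a real cron job you would execute the commands instead of echoing them, and add error handling plus a retention policy on the bucket.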

Sometimes you need to re-sync the backup; this used to happen more often with older versions of the backup agent. The newest agent seems more solid, and re-syncing is rarely required (in fact, since upgrading the agent, I have not had to re-sync at all).

Depending on the nature of your app using MongoDB, I would suggest skipping collections that exist merely for logging purposes (for example, capped collections): these are write-only 99% of the time, can quickly fill up your oplog, and can hit your wallet pretty hard. Whether to skip other collections depends on how critical they are to you.

Other considerations:
  • Run your replica set members in geographically different locations, or in different availability zones (for example, when running your replica set in the AWS cloud), to avoid downtime from a power outage or network failure in a single data center.
  • Consider running a delayed replica set member as a rolling backup. It applies the oplog with a defined delay, which is extremely useful for recovering from human error. Make sure to set its priority to 0, mark it hidden, and set slaveDelay in seconds (3600 is a reasonable value). Restoring the replica set at that point means promoting this member to primary.
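The delayed-member setup from the last point can be sketched in the mongo shell; the member index (3) and the one-hour delay are illustrative, not prescriptive:

```javascript
// Reconfigure member 3 as a hidden, delayed member (mongo shell).
cfg = rs.conf();
cfg.members[3].priority = 0;      // never eligible to become primary
cfg.members[3].hidden = true;     // invisible to client applications
cfg.members[3].slaveDelay = 3600; // stay one hour behind the primary
rs.reconfig(cfg);
```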