Facebook Continuous Integration Cycle


There is an interesting screencast by Chuck Rossi, release manager, on how they manage the release process at Facebook. Worth seeing, but if you haven’t got the time, here’s a quick executive summary.

Facebook runs weekly release cycles

Facebook uses svn internally. All developers push to the trunk; every Sunday a release branch is created from the trunk (the ‘shadow branch’, which includes the tests as well). It gets tested for a couple of days, and on Tuesday it is pushed out. Each developer is responsible for their own fixes, and if they are not available to support a fix, it is removed from the branch.

Facebook uses IRC bots internally

Everybody communicates internally by IRC; IRC bots are set up to reply to common questions (the “questions that should not be asked”, as Chuck Rossi called them). For example, to ask “is my rev going to be in the next release”, a developer would type /msg request_bot rt [rev #]? either in the IRC window or in the address bar of their browser.

IRC bots are also used to communicate availability to support one’s own fixes (/msg request_bot support) or someone else’s (/msg request_bot support @someoneelse). A fix with nobody to support it is not released.
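The “no supporter, no release” rule is easy to picture as a small registry. What follows is only a sketch of the idea, not Facebook's bot: the SupportRegistry class, its method names, and the example nicknames are all made up for illustration.

```python
from typing import Dict, List, Set


class SupportRegistry:
    """Hypothetical model of 'a fix with nobody to support it is not released'."""

    def __init__(self) -> None:
        # Maps a revision number to the nicknames who offered to support it.
        self._supporters: Dict[int, Set[str]] = {}

    def add_fix(self, rev: int) -> None:
        """Register a fix that is a candidate for the release branch."""
        self._supporters.setdefault(rev, set())

    def support(self, rev: int, nick: str) -> None:
        """Record that `nick` will be around to support revision `rev`."""
        self._supporters.setdefault(rev, set()).add(nick)

    def releasable(self) -> List[int]:
        """Revisions with at least one supporter, in ascending order."""
        return sorted(rev for rev, nicks in self._supporters.items() if nicks)

    def dropped(self) -> List[int]:
        """Revisions nobody has offered to support, in ascending order."""
        return sorted(rev for rev, nicks in self._supporters.items() if not nicks)
```

A release manager's script could then ship everything in releasable() and pull everything in dropped() from the branch.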

Facebook uses the usual testing tools, plus some custom ones

They use unit testing, Selenium, and PHP debugging, and have quite a few reporting tools of their own. A nifty one isolates individual PHP errors and gives a timeline of how often each occurs.

Facebook uses a gatekeeper to push code out

The code for the next 6 months is already in the Facebook codebase, but it’s not available to members of the public. A ‘gatekeeper’ determines who runs the various revisions of the code: they can make a revision available internally, by IP range, by age group, by country, and, interestingly, by % of global users. That allows them to push out a revision to 1% of the general public for a few minutes, gather some data, make the code unavailable again, analyze the data, and then try again, perhaps with a higher percentage next time.
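The talk doesn't go into how the gatekeeper is built, but percentage rollouts like this are commonly done by hashing each user into a stable bucket. Here is a minimal sketch under that assumption; the Gate class, is_enabled function, bucket count, and feature name are all hypothetical, and IP range and age group checks are left out for brevity.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet

BUCKETS = 10_000  # assumed resolution: one bucket is 0.01% of users


def bucket(user_id: int, feature: str) -> int:
    """Map a user to a stable bucket in [0, BUCKETS) for one feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


@dataclass(frozen=True)
class Gate:
    feature: str
    percent: float = 0.0                     # share of global users, 0 to 100
    countries: FrozenSet[str] = frozenset()  # empty means "any country"
    internal: bool = False                   # always on for employees


def is_enabled(gate: Gate, user_id: int, country: str,
               is_employee: bool = False) -> bool:
    """Decide whether this user should run the gated code path."""
    if gate.internal and is_employee:
        return True
    if gate.countries and country not in gate.countries:
        return False
    # Users whose bucket falls below the threshold see the feature.
    return bucket(user_id, gate.feature) < gate.percent * BUCKETS / 100


# Usage: expose a hypothetical feature to roughly 1% of users.
gate = Gate("new_profile", percent=1.0)
exposed = sum(is_enabled(gate, uid, "US") for uid in range(20_000))
```

Because the bucket depends only on the feature name and the user id, raising the percentage later keeps the original 1% in the exposed group, which makes the before and after data easier to compare.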

Facebook uses push karma to manage risk

Every developer starts with a karma score of 4 stars, and any major issue they cause lowers that score. That allows the release managers to decide whether to run extra tests for riskier developers.
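As a rough illustration of how such a score could feed into the release decision, here is a small sketch. Only the starting score of 4 comes from the talk; the one-star penalty, the floor at zero, the threshold for extra testing, and the PushKarma class itself are assumptions.

```python
from typing import Dict

STARTING_KARMA = 4        # from the talk: everyone starts with 4 stars
PENALTY_PER_ISSUE = 1     # assumption: each major issue costs one star
EXTRA_TEST_THRESHOLD = 2  # assumption: at or below this, run extra tests


class PushKarma:
    """Hypothetical tracker for per-developer push karma."""

    def __init__(self) -> None:
        self._scores: Dict[str, int] = {}

    def score(self, developer: str) -> int:
        """Current karma; developers not seen before get the starting score."""
        return self._scores.get(developer, STARTING_KARMA)

    def record_major_issue(self, developer: str) -> None:
        """Lower the developer's karma, never going below zero."""
        self._scores[developer] = max(0, self.score(developer) - PENALTY_PER_ISSUE)

    def needs_extra_tests(self, developer: str) -> bool:
        """True when the release managers should test this push more heavily."""
        return self.score(developer) <= EXTRA_TEST_THRESHOLD
```

The point is only that the score is an input to how much scrutiny a push gets; the actual policy is whatever the release managers decide.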

Facebook uses BitTorrent to push code to the cluster

PHP code is compiled, using HipHop, into a 1GB binary which includes all of Facebook.com. That is then put on an internal BitTorrent tracker, and the thousands of production servers download it. The release managers can then reboot them in clusters with the changes.

*[svn]: subversion