Building Scalable PHP / MySQL Websites

Comments

The task: summarize notes from a lecture about scaling PHP apps.

Can’t remember by whom, but you can tell it was a couple of years ago because neither cloud services nor NoSQL get much of a mention. It also doesn’t mention tools such as Varnish. Most of the information is still relevant though.

Scaling

Scaling means enabling your application to grow as traffic grows.

Scaling the web server

I.e., the server responding to browser requests. Different stages:

STAGE 1: a single server with everything in it

STAGE 2: Load balanced servers

Servers can be added as needed.

The guiding principle: only implement what you need, but without coding yourself into a corner. So:

  • Don’t code assuming a physical machine; so careful with APC, Zend Server

  • Don’t assume a file system, unless you use distributive file system NFTS etc

  • Do not use file based sessions but DB based

If you have to, encapsulate all the bad practices into a BAD CLASS, so that you know what to change later on when you need to scale.

Scaling the database

The database is always the bottleneck.

STAGE 1: a single server with everything in it

STAGE 2: master slave replication

Two databases - everything that happens in the master also happens in the slave. This is easily set up with MySQL. It’s worth doing even if only one physical server - if nothing else for backing up.

STAGE 3: isolate areas of load

Have read happen from slave, write to master. Worth preparing for that, by having an extra paramater in you DB call to select which server is being accessed. If doing that, one needs to avoid slave lag - i.e., the delay between action on master being reproduced on the slave. It should be in the order of 0.001s, but can get worse if there is lots of traffic, up to 2s. To prepare for that, don’t write code with a master instruction immediately followed by a slave instruction.

STAGE 4: multiple slaves

Again, something that comes out of the box with MySQL. A typical, but not ideal, scenario - one master, and one slave box for each load balanced web server box. Or one box for each load balanced web server / DB pair. It’s simple but not ideal, as you must create a web server even if all you want is to add a new DB slave, or viceversa. And if a DB gets corrupted you quickly need to remove the web server from the pool, which kind of defeats the point. It’s better to have each web server pick up random DB box.

Sample code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class DB {
       private static $cfg = array(
           'write' =>
               array('mysql:dbname=MyDB;host=10.1.2.3'),
           'read' =>
               array('mysql:dbname=MyDB;host=10.1.2.7',
                     'mysql:dbname=MyDB;host=10.1.2.8',
                     'mysql:dbname=MyDB;host=10.1.2.9');
           );
       public static function getConnection($pool) {
           $max = count(self::$cfg[$pool]) - 1;
           $dsn = self::$cfg[$pool][mt_rand(0, $max)];
           return new PDO($dsn, USER, PASS);
       }
   }
   $db = DB::getConnection('read');

STAGE 5: slave pools

Pools increases the likelihood that queries gets cached on a given server. Another benefit is that if certain queries need improving performance, you can add servers to that particular pool. Different approaches:

  • Divide slaves into pools, each handling only one type of query.

  • Keep a pool of slaves just for batch jobs

  • Use a pool of slaves for comments, one for articles, etc

  • Use a pool for AJAX calls, another for posts, etc

STAGE 6: partitioning

Breaking up the master database. A drawback is that you can lose some SQL functionality, for example the ability to do joins, so they have to done in PHP. You can prepare for it by having the connection returned depending on the type of query.

There are tools for it - MySQL Cluster (which takes a long time to do the splitting), or federation, but they are a bit generic.

Three types: vertical partitioning, horizontal partitioning, application level partitioning.

Vertical partitioning

Done on a single master. Moving different columns into different tables - trying to create smaller tables. Candidates: rarely used columns, columns mostly empty, columns not used in where clauses.

Horizontal partitioning

Mostly applied to multiple masters. It’s when rows are split across databases. It can be

  • Range based

  • Date based (most common)

  • Interlaced (row 1 in table 1, row 2 in table 2, etc). It works well for IDs where you can use module division.

  • User based, like Facebook which have (had) users from different universities in different databases

Application level partitioning

Only done on multiple servers. Different tables in different DB servers, keeping related tables together to allow for joins.

STAGE 7: partition between data centers

Like master slave - but hell.

Caching Database

It allows you ignore the DB. Principles:

  • choose small, discrete, reusable units

  • don’t store data you can’t recreate

  • store data in as close to final processed form

Two types:

Single server memory caches

It caches data specific to the server you are on. It is about performance, not scalability though. APC, Zend Server are used for that. Limited due to lack of synchronised cache.

Distributed cache

Required for scalability. Classic: Memcached, Zend platform.

Challenge 100% complete

This was a simple but useful task, as this is a busy week.

Comments