Establish Semi-Synchronous Replication

Background

Semi-synchronous replication is a way to improve data integrity at the possible expense of performance. To understand semi-synchronous replication, you need to understand the alternatives it compromises between.

Asynchronous replication means the master will confirm transactions without waiting for that change to synchronize with any slaves. If the master fails before the transaction is replicated, the client software and the user believe a transaction has been safely committed that is now totally lost. Asynchronous replication is the standard for MySQL.

Synchronous replication means the server that initially handles the transaction does not send confirmation to the client until every healthy replication peer acknowledges the transaction. Synchronous replication is built into MySQL Cluster but not available in standard MySQL.

Semi-synchronous replication is a compromise between asynchronous and fully synchronous replication. The master does not confirm transactions until at least one slave has copied the change to its relay log. Like synchronous replication, the client has confidence that confirmed transactions can survive a single-server failure (the change exists on at least one other server at confirmation time). Like asynchronous replication, the client gets that confirmation quickly, without waiting for every slave to process the change.

The key advantages of semi-synchronous replication are:

  • It improves data integrity for single-server outages.
  • It's a simple improvement on asynchronous replication, not a whole new skill to learn.
  • It does not affect read performance at all.
  • The master falls back to asynchronous replication if all slaves are down or delayed beyond the configured timeout.

The key disadvantages of semi-synchronous replication are:

  • Network and I/O delays can dramatically increase client wait time.
  • The slave only acknowledges that the transaction has been copied to its relay log, not applied to its database. This does not address eventual consistency problems with lagging read slaves.
  • The master only waits for one slave to acknowledge. To promote a slave to replace a failed master, you still need to figure out which slave was most up to date.

Steps

Check that Replication is Healthy

The Slave server should already be replicating from the Master server, as at the end of the Establish Replication procedure.

SSH to the Master: master.example.com

Connect to MySQL and insert a new row:

master ~ $ mysql -u root

mysql> insert into important.stuff set details = 'Replicating Master to Slave';
Query OK, 1 row affected (0.02 sec)

SSH to the Slave: slave.example.com

Connect to MySQL and verify the row you inserted on the Master has replicated:

slave ~ $ mysql -u root

mysql> select * from important.stuff order by id desc limit 1;
+-----+-----------------------------+---------------------+
| id  | details                     | happened            |
+-----+-----------------------------+---------------------+
| 296 | Replicating Master to Slave | 2013-04-12 14:09:26 |
+-----+-----------------------------+---------------------+
1 row in set (0.00 sec)

mysql>
After this step is completed:
  1. On the Master, INSERT a new record in the table important.stuff
  2. Within 1 second, that record is replicated to the Slave
Collect a Performance Baseline

From the Master, establish a performance baseline before enabling semi-synchronous replication, using mysqlslap.

This data used in this test shows how semi-synchronous replication affects MySQL in general. Before enabling this feature in production, you should test with more realistic data representative of your application by using the --create and --query options.

mysql> exit
Bye
master ~ $ mysqlslap --auto-generate-sql --concurrency=10 --number-of-queries=1000
Benchmark
    Average number of seconds to run all queries: 2.600 seconds
    Minimum number of seconds to run all queries: 2.600 seconds
    Maximum number of seconds to run all queries: 2.600 seconds
    Number of clients running queries: 10
    Average number of queries per client: 100

There will be some variance between the results shown in this document and your experience, due in part to noisy neighbor problem in our lab.

Configure the Master for Semi-Synchronous Replication

On the Master, install the semi-synchronous master replication plugin.

master ~ $ mysql -u root -e"INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';"
master ~ $ 

Now enable it by editing /etc/my.cnf

master ~ $ sudoedit /etc/my.cnf

And inserting this line after [mysqld]

/etc/my.cnf
[mysqld] rpl_semi_sync_master_enabled = 1 relay_log = mysql-relay-bin log_slave_updates = 1 log_bin = mysql-bin server_id = 10 ......some content not shown

Now restart the MySQL process.

master ~ $ sudo service mysqld restart
Stopping mysqld:                                           [  OK  ]
Starting mysqld:                                           [  OK  ]

Now verify that the plugin is enabled. rpl_semi_sync_master_enabled should be ON

master ~ $ mysql -u root
mysql> SHOW VARIABLES LIKE "Rpl_semi_sync_master_%";
+------------------------------------+-------+
| Variable_name                      | Value |
+------------------------------------+-------+
| rpl_semi_sync_master_enabled       | ON    |
| rpl_semi_sync_master_timeout       | 10000 |
| rpl_semi_sync_master_trace_level   | 32    |
| rpl_semi_sync_master_wait_no_slave | ON    |
+------------------------------------+-------+
4 rows in set (0.00 sec)

Check the plugin's status.

Rpl_semi_sync_master_status should be ON, indicating that the plugin is ready.

Rpl_semi_sync_master_clients should be 0, indicating that no replication slaves reported their support for semi-synchronous replication when they connected.

mysql> show status like "rpl_semi_sync%";
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 0     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 0     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 0     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)
After this step is completed:
  1. On the Master, SHOW PLUGINS includes rpl_semi_sync_master
  2. On the Master, /etc/my.cnf sets rpl_semi_sync_master_enabled to 1
  3. On the Master, the MySQL variable rpl_semi_sync_master_enabled has value ON
  4. On the Master, SHOW STATUS shows Rpl_semi_sync_master_status is ON
Configure the Slave for Semi-Sync

On the Slave, install the semi-synchronous slave replication plugin.

Please note, this is a slightly different plugin (appropriate to slaves) and settings than you just applied on the Master.

mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.00 sec)

Now enable the plugin by editing /etc/my.cnf

mysql> exit
Bye
slave ~ $ sudoedit /etc/my.cnf

And inserting this line after [mysqld]

/etc/my.cnf
[mysqld] rpl_semi_sync_slave_enabled = 1 relay_log = mysql-relay-bin log_slave_updates = 1 log_bin = mysql-bin server_id = 20 ......some content not shown

Now restart the MySQL process.

slave ~ $ sudo service mysqld restart
Stopping mysqld:                                           [  OK  ]
Starting mysqld:                                           [  OK  ]

Now verify that the plugin is enabled. rpl_semi_sync_slave_enabled should be ON

slave ~ $ mysql -u root

mysql> SHOW VARIABLES LIKE "Rpl_semi_sync_slave_%";
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| rpl_semi_sync_slave_enabled     | ON    |
| rpl_semi_sync_slave_trace_level | 32    |
+---------------------------------+-------+
2 rows in set (0.00 sec)

Check the plugin's status. Rpl_semi_sync_slave_status should be ON

mysql> show status like "rpl_semi_sync%";
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Rpl_semi_sync_slave_status | ON    |
+----------------------------+-------+
1 row in set (0.00 sec)
After this step is completed:
  1. On the Slave, SHOW PLUGINS includes rpl_semi_sync_slave
  2. On the Slave, /etc/my.cnf sets rpl_semi_sync_slave_enabled to 1
  3. On the Slave, the MySQL variable rpl_semi_sync_slave_enabled has value ON
  4. On the Slave, SHOW STATUS shows Rpl_semi_sync_slave_status is ON
Verify that the Master is Replicating Semi-Synchronously

On the Master, verify that Rpl_semi_sync_master_clients is 1. When the Slave re-connected to the Master in Step 4, it reported that it was semi-synchronous capable.

mysql> show status like "rpl_semi_sync%";
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 1     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 0     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 0     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)

Now insert a new row. In a healthy semi-synchronous replication Rpl_semi_sync_master_yes_tx increases with each transaction, indicating the slave successfully acknowledged the transaction.

mysql> insert into important.stuff set details="Inserted semi-synchronously";
Query OK, 1 row affected (0.02 sec)

mysql> show status like "rpl_semi_sync%";
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 1     |
| Rpl_semi_sync_master_net_avg_wait_time     | 535   |
| Rpl_semi_sync_master_net_wait_time         | 535   |
| Rpl_semi_sync_master_net_waits             | 1     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 1     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)

mysql>
After this step is completed:
  1. On the Master, SHOW STATUS shows Rpl_semi_sync_master_clients is 1
  2. Inserting a new row on the Master causes Rpl_semi_sync_master_yes_tx to increment.
Test Asynchronous Replication Fallback

Semi-synchronous replication falls back to asynchronous replication if no slaves acknowledge the transaction before rpl_semi_sync_master_timeout on the Master.

mysql> SHOW VARIABLES LIKE "rpl_semi_sync_master_timeout";
+------------------------------------+-------+
| Variable_name                      | Value |
+------------------------------------+-------+
| rpl_semi_sync_master_timeout       | 10000 |
+------------------------------------+-------+
1 rows in set (0.00 sec)

The default value is 10000 milliseconds, or 10 seconds.

To test fallback, stop MySQL on the Slave

mysql> exit
Bye
slave ~ $ sudo service mysqld stop
Stopping mysqld:                                           [  OK  ]

On the Master, insert another record:

mysql> insert into important.stuff set details = 'Fail from Semi to Asynchronous';
Query OK, 1 row affected (10.04 sec)

Notice that this transaction took slightly longer to perform than the value of rpl_semi_sync_master_timeout.

Now that a transaction has missed the acknowledgement timeout, further transactions run much faster:

mysql> insert into important.stuff set details="Insert me asynchronously";
Query OK, 1 row affected (0.02 sec)

Now look at the status of the semi-synchronous plugin on the Master while the Slave is not responding.

Rpl_semi_sync_master_clients drops back to 0

Rpl_semi_sync_master_no_tx increments on each new transaction written to the master.

Rpl_semi_sync_master_status is OFF

mysql> show status like "rpl_semi_sync%";
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 0     |
| Rpl_semi_sync_master_net_avg_wait_time     | 414   |
| Rpl_semi_sync_master_net_wait_time         | 829   |
| Rpl_semi_sync_master_net_waits             | 2     |
| Rpl_semi_sync_master_no_times              | 1     |
| Rpl_semi_sync_master_no_tx                 | 2     |
| Rpl_semi_sync_master_status                | OFF   |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 1     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)
After this step is completed:
  1. On the Slave, the MySQL process is not running.
  2. On the Master, each transaction increments Rpl_semi_sync_master_no_tx
  3. On the Master, SHOW STATUS shows Rpl_semi_sync_master_clients is 0
  4. On the Master, Rpl_semi_sync_master_status is OFF.
Restore Service on the Slave

On the Slave, start the MySQL process.

slave ~ $ sudo service mysqld start
Starting mysqld:                                           [  OK  ]

On the Master, verify that semi-synchrnous replication has been re-established:

mysql> show status like "rpl_semi_sync%";
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 1     |
| Rpl_semi_sync_master_net_avg_wait_time     | 414   |
| Rpl_semi_sync_master_net_wait_time         | 829   |
| Rpl_semi_sync_master_net_waits             | 2     |
| Rpl_semi_sync_master_no_times              | 1     |
| Rpl_semi_sync_master_no_tx                 | 2     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 1     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)
After this step is completed:
  1. On the Slave, the MySQL process is running.
  2. On the Master, each transaction increments Rpl_semi_sync_master_yes_tx
  3. On the Master, Rpl_semi_sync_master_status is ON.
Compare Performance of Semi-Synchronous and Asynchronous

Now replication is semi-synchronous, run the same mysqlslap on the Master and compare results to the baseline from Step 2.

mysql> exit
Bye
master ~ $ mysqlslap --auto-generate-sql --concurrency=10 --number-of-queries=1000
Benchmark
    Average number of seconds to run all queries: 2.908 seconds
    Minimum number of seconds to run all queries: 2.908 seconds
    Maximum number of seconds to run all queries: 2.908 seconds
    Number of clients running queries: 10
    Average number of queries per client: 100

In my tests, this result is usually 2-10% slower than the baseline, again with some variance. (The actual performance change is so small next to normal variance, that this test occasionally runs slightly faster than the baseline. For better results, you could use the --iterations flag to collect more data.)

However, these servers are in the same datacenter. If they had higher network latency, the story might change.

Use tc to injects 90ms (about a tenth of a second) latency, which is typical roundtrip latency if one server were in London and the other in New York.

master ~ $ sudo tc qdisc add dev eth0 root netem delay 90ms
master ~ $ tc qdisc show
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 90.0ms

Now re-run the same slap.

master ~ $ mysqlslap --auto-generate-sql --concurrency=10 --number-of-queries=1000
Benchmark
    Average number of seconds to run all queries: 50.089 seconds
    Minimum number of seconds to run all queries: 50.089 seconds
    Maximum number of seconds to run all queries: 50.089 seconds
    Number of clients running queries: 10
    Average number of queries per client: 100

There will again be some variance in results, but a slowdown of 25x (from ~2s to ~50 s) is not unbelievable.

Remember, network latency is affecting performance because, with semi-synchronous replication, the transaction does not fully succeed until the slave acknowledges, which is now slowed down by the network. If we switch back to asynchronous replication, performance will return to baseline even with the added latency still in place.

master ~ $ mysql -u root -e"set global rpl_semi_sync_master_enabled = 0;"
master ~ $ mysqlslap --auto-generate-sql --concurrency=10 --number-of-queries=1000
Benchmark
    Average number of seconds to run all queries: 2.391 seconds
    Minimum number of seconds to run all queries: 2.391 seconds
    Maximum number of seconds to run all queries: 2.391 seconds
    Number of clients running queries: 10
    Average number of queries per client: 100

Don't forget to turn off the artificial delay when you're done testing.

master ~ $ sudo tc qdisc del dev eth0 root netem