Saturday, March 19, 2011

Multi-Master Support in Drizzle Replication

So Brian asked me the other day what it would take to support multiple masters in our new Drizzle slave plugin. Not master to master replication, but multiple masters sending replication events to a single slave that simply ignores any conflicts and just chugs along. I told him I didn't know, but considering how simple the code is, it probably wouldn't take much.

To get a better understanding of what exactly would be involved in supporting multiple masters, I decided to just start hacking it up. I did this mainly to get a sense of what would need to be changed, since my original design didn't allow for this at all. (Shortsightedness on my part, I suppose.)

So I have a beta version of my results available in this Launchpad branch:
lp:~dshrews/drizzle/beta-multi-master
From my simple tests, it seems to work. I'm not really happy with the code (like I said, this was a hack), but the functionality is there. I'm not promising this will go into Drizzle trunk just yet. I would like to make some improvements to it, and I'd really like to get some feedback from people on it.

To use it, you'll first need to create a modified slave configuration file. Here is a sample one:
ignore-errors

[master1]
master-host = foo.my.domain
master-port = 3306
master-user = user1
master-pass = password

[master2]
master-host = bar.my.domain
master-port = 3306
master-user = user2
master-pass = password
Currently, a total of 10 masters are supported. This was an arbitrary number. It was simplest to just predetermine a set number of masters due to some complications with config file parsing which I wasn't prepared to solve (this is one of the things I want to see fixed). One IO thread per master will be started, though we still use a single applier thread for the time being.

You'll notice in the sample config a new option, ignore-errors. If this option is present, the slave ignores any errors raised when it locally executes replication events received from the masters. I highly recommend you have this option enabled. Also note the addition of the [master1] and [master2] sections that define options for each master. You can go all the way to a [master10] section.
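
If you have more than a couple of masters, a config like the sample above can be generated rather than typed. Here is a minimal shell sketch. The helper name gen_slave_cfg is hypothetical, and the user names and password are the same placeholders used in the sample (user1, user2, ... and password). It refuses anything over the 10 masters currently supported:

```shell
# Sketch only: print a multi-master slave config for the hosts given as
# arguments. gen_slave_cfg is a made-up helper name, and the user/password
# values are placeholders copied from the sample config above.
gen_slave_cfg() {
    max_masters=10    # current limit described in the post
    if [ "$#" -lt 1 ] || [ "$#" -gt "$max_masters" ]; then
        echo "expected between 1 and $max_masters master hosts" >&2
        return 1
    fi
    echo "ignore-errors"
    n=1
    for host in "$@"; do
        # One [masterN] section per host, numbered from 1
        printf '\n[master%d]\n' "$n"
        printf 'master-host = %s\n' "$host"
        printf 'master-port = 3306\n'
        printf 'master-user = user%d\n' "$n"
        printf 'master-pass = password\n'
        n=$((n + 1))
    done
}
```

Something like `gen_slave_cfg foo.my.domain bar.my.domain > /tmp/slave.cfg` should then produce a file shaped like the sample, which you would still want to edit for real credentials.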

Nothing changes with how you start your slave or masters (see my post on setting up a simple replication example).

Give it a try and let me know how it works for you. Again, this is bleeding edge stuff (does any other database support this?  :) ), so be prepared for bugs.

Thursday, March 17, 2011

Installing Drizzle from Source on OS X

Installing Drizzle on OS X 10.6 is pretty simple. We have a page on our wiki that has the basic steps, but I thought that I'd detail what I do on my Macs in the hope that it may make someone's life easier. Note that we don't build on any 10.5 machines, and I don't use that version anymore, so YMMV with these instructions. Also, these instructions assume that you have the Xcode package already installed. I have Xcode 4 installed, but these instructions should work with Xcode 3, too. If they don't, let me know.

I used to use MacPorts on my Macs to install the libraries needed by Drizzle. I've recently dumped that because I didn't like all of the extra stuff that was installed (do you really need a separate Perl installation?). And a recent b0rk of their Perl installations was the final straw.

It turns out that all you really need are just a few extra packages to build Drizzle on your Mac. Here are the packages that I currently have installed on my machines:

The first three (autoconf, automake, and libtool) aren't strictly necessary. I install those because I want newer versions of those tools than what OS X provides by default. It makes building a little bit nicer (the output is much cleaner). The last three are what you really need.

Each package has its own instructions for how to compile and install. I use the default installation path (/usr/local) for each. Basically, building and installing for each is simply:
  • ./configure
  • make
  • sudo make install
The Boost package is the lone exception:
  • ./bootstrap.sh
  • ./bjam
  • sudo ./bjam install
If you install the libtool package, there is one additional step you should do. That package installs the binaries libtool and libtoolize into /usr/local/bin. I rename these to glibtool and glibtoolize, respectively. The Drizzle build system looks for these program names.
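
The rename can be scripted if you do this on more than one machine. A minimal sketch, assuming a hypothetical helper named rename_libtool that takes the install prefix as an optional argument (defaulting to /usr/local, as above):

```shell
# Sketch only: rename libtool/libtoolize under a prefix to the g-prefixed
# names the Drizzle build system looks for. rename_libtool is a made-up
# helper name; the prefix defaults to /usr/local.
rename_libtool() {
    bindir="${1:-/usr/local}/bin"
    for tool in libtool libtoolize; do
        # Skip quietly if the binary is not there (e.g. already renamed)
        if [ -e "$bindir/$tool" ]; then
            mv "$bindir/$tool" "$bindir/g$tool"
        fi
    done
}
```

For a real install under /usr/local you will most likely need root privileges for the mv, so run it from a root shell or just do the two renames by hand with sudo.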

The last thing I do is to make sure that /usr/local/bin is in my path. So in $HOME/.bash_profile, I have this line:
export PATH=/usr/local/bin:$PATH
With all that in place, building Drizzle is just:

  • ./config/autorun.sh
  • ./configure
No need to add any extra options to configure to find libraries.

Wednesday, March 9, 2011

Simplest Test Replication Setup. Ever.

> cd tests
> ./dbqp --mode=randgen --start-and-exit --suite=slave_plugin

BOOM. DONE. WINNING.

Tuesday, March 8, 2011

Simple Drizzle Replication Example Using the Slave Plugin

In this blog post, I thought that I'd cover, in a bit more detail, setting up simple replication between two Drizzle servers using the new replication slave plugin. If you've used MySQL replication before, you should find some of the concepts very similar. I'll only cover the simplest of examples (single master, single slave) and also explain how to provision a new slave into an existing setup.

So you've downloaded the latest and greatest version of Drizzle and want to set up replication. Where do you start? The very first thing to do is to make certain that both master and slave share the same version of Drizzle to avoid any potential incompatibility issues. Then you set up your master.

Master Setup

Setting up the master is the easiest step. The only thing you really have to do here is to make sure that the master Drizzle database server is started with the --innodb.replication-log option. Something along the lines of:

    master> sbin/drizzled --datadir=$PWD/var --innodb.replication-log &


For more complex setups, you'll also want to consider using the --server-id option, but the default for that is fine for this example. That option is documented here.

With the master running, you can optionally now create a backup of any databases to be imported on the new slave. You would use drizzledump to make such a backup. In this example, I'm keeping it simple and assuming that we are starting with a fresh database with no data.

Slave Setup

Now that you have the master running, you can set up your slave. Starting the slave is almost as simple as starting the master. You will need to use two options to the Drizzle database server on the slave: --plugin-add=slave and --slave.config-file.

    slave> sbin/drizzled --datadir=$PWD/var \
                         --plugin-add=slave \
                         --slave.config-file=/tmp/slave.cfg &


These options tell the server to load the slave plugin, and then tell the slave plugin where to find the slave host configuration file. This configuration file has options to specify the master host and a few options to control how the slave operates. You can read more about the available configuration options in the replication slave plugin documentation. Below is a simple example:

    master-host = kodiak
    master-port = 3306
    master-user = kodiak_slave
    master-pass = my_password
    io-thread-sleep = 10
    applier-thread-sleep = 10

Most of these options have sensible defaults, and it should be pretty obvious what most of them are for. The plugin documentation referenced above describes them in more complete detail if you need more information.

So once you start the slave as described above, it will immediately connect to the master host specified in the configuration file and begin pulling events from the InnoDB-based transaction log. By default, a freshly provisioned slave will begin pulling from the beginning of this transaction log. Once all replication messages have been pulled from the master and stored locally on the slave host, the IO thread will sleep and periodically awaken to check for more messages. That's all fine and dandy for your first slave machine in a brand new replication setup, but how do you insert another slave host into an already existing replication architecture? I'm glad you asked!

Provisioning a New Slave Host

We've recently made some changes that make provisioning a new slave host very easy.

So, the basic formula for creating a new slave host for an existing replication setup is:

  1. Make a backup of the master databases.
  2. Record the state of the master transaction log at the point the backup was made.
  3. Restore the backup on the new slave machine.
  4. Start the new slave and tell it to begin reading the transaction log from the point recorded in #2.
Steps #1 and #2 are covered with the drizzledump client program. If you use the --single-transaction option to drizzledump, it will place a comment near the beginning of the dump output with the InnoDB transaction log metadata. For example:

    master> drizzledump --all-databases --single-transaction > master.backup
    master> head -1 master.backup
    -- SYS_REPLICATION_LOG: COMMIT_ID = 33426, ID = 35074

The SYS_REPLICATION_LOG line gives us the replication log metadata we need when we start a new slave. It has two pieces of information:
  • COMMIT_ID - This value is the commit sequence number recorded for the most recently executed transaction stored in the transaction log. We can use this value to determine proper commit order within the log. The unique transaction ID cannot be used since that value is assigned when the transaction is started, not when it is committed.
  • ID - This is the unique transaction identifier associated with the most recently executed transaction stored in the transaction log.
With this step done, we can now do steps #3 and #4 to start the new slave. First, you must start the slave WITHOUT the replication slave plugin enabled. We don't want it reading from the master until we've imported the backup. So start it without the plugin enabled, import your backup, then shut down the server:

    slave> sbin/drizzled --datadir=$PWD/var &
    slave> drizzle < master.backup
    slave> drizzle --shutdown

Now that the backup is imported, we can restart the slave with the replication slave plugin enabled and use a new option, --slave.max-commit-id, to force the slave to begin reading the master's transaction log at the proper location:

    slave> sbin/drizzled --datadir=$PWD/var \
                         --plugin-add=slave \
                         --slave.config-file=/tmp/slave.cfg \
                         --slave.max-commit-id=33426 &


We give --slave.max-commit-id the value from the comment in the master dump file, which defines the maximum COMMIT_ID value (the latest transaction) represented by the slave's contents.
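
If you'd rather not copy that number by hand, it can be pulled out of the dump file with a bit of shell. This is only a sketch: extract_commit_id is a hypothetical helper name, and it assumes the SYS_REPLICATION_LOG comment has exactly the format shown in the example output earlier.

```shell
# Sketch only: print the COMMIT_ID from the SYS_REPLICATION_LOG comment that
# drizzledump --single-transaction writes near the top of the dump.
# extract_commit_id is a made-up helper name; the pattern assumes the
# comment format shown in the example output earlier in this post.
extract_commit_id() {
    sed -n 's/^-- SYS_REPLICATION_LOG: COMMIT_ID = \([0-9][0-9]*\), ID = [0-9][0-9]*.*$/\1/p' "$1" |
        head -n 1    # only the first matching line matters
}
```

The result could then be passed straight to the server, for example --slave.max-commit-id=$(extract_commit_id master.backup) in place of the literal 33426 above. If the helper prints nothing, the comment wasn't found, and you should check that the dump was made with --single-transaction.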

Conclusion

So that's all there is to it. I hope you find this example useful and that it encourages you to begin trying out Drizzle replication. Be sure to report any bugs, and any enhancements you'd like to see, to our bug system. And don't forget that we have Drizzle Developer Day after the MySQL User Conference this year on April 15th. Be sure to sign up for that and come chat with us face-to-face about replication and anything else Drizzle-related.  :)

Happy Replicating.