Friday, December 10, 2010

Drizzle Transaction Log and Statement Rollbacks

First off, apologies to those of you who left comments on my past posts that never showed up. Apparently, I didn't have my blog settings set up correctly, and I just noticed (and approved) them.

So for the past few weeks, Patrick Crews and I have been trying to chase down a really nasty transaction log bug that shows up under heavy load. The transaction log would end up not accurately representing the changes that took place on the server. Obviously, that's bad... unless you don't care whether your slaves are an accurate representation of your master.

This has been a crazy difficult bug to track down, partly because the cause appears to be not one bug but several. We've identified at least two bugs leading to transaction log inconsistency, one of which involves how we deal with SQL statements that fail mid-execution. (The other, we believe, has to do with savepoints, and we're still working on that one.)

For the statement rollback bug, there were two problems:
  1. We were not correctly removing recorded row changes when a multi-row statement failed after already modifying one or more rows.
  2. Because we can now segment a single statement's row changes across multiple Transaction messages (due to having a maximum message size), we had no way to roll back transaction log changes made by a failed statement once it had been split up into multiple messages.
For issue #1, consider this statement:
INSERT INTO t1 (not_null_column) VALUES ('abc'),('def'),(NULL);
As each row is inserted, the in-memory protobuf message recording the transaction's row changes is updated to include the inserts. Since the column does not allow NULLs, we'll get changes recorded for the values 'abc' and 'def'. However, once we try to insert NULL, the statement fails and InnoDB rolls back the two inserted rows. These two rows were not being removed from the in-memory message structure, which later gets sent through the replication stream and written to the transaction log, hence the inconsistency. The fix is fairly straightforward: we recognize the error, then remove the entries before sending the protobuf message downstream.
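
To make the shape of the fix concrete, here is a minimal sketch of the bookkeeping involved. To be clear, the types and names below are stand-ins I've made up for illustration, not the actual kernel code: mark where a statement's events begin, and truncate back to that mark if the statement fails.

#include <cstddef>
#include <string>
#include <vector>

// Stand-in for one recorded row change in the in-memory Transaction message.
struct RowEvent {
  std::string table;
  std::string values;
};

class PendingTransaction {
 public:
  PendingTransaction() : statement_start_(0) {}

  // Called before a statement executes, so we know where its events begin.
  void markStatementStart() { statement_start_ = events_.size(); }

  void recordRowChange(const RowEvent &ev) { events_.push_back(ev); }

  // Called when a statement fails mid-execution: drop every event it added,
  // so the message sent downstream matches what the engine rolled back.
  void rollbackStatement() { events_.resize(statement_start_); }

 private:
  std::vector<RowEvent> events_;
  std::size_t statement_start_;
};

With the INSERT above, the events for 'abc' and 'def' get recorded, the NULL insert fails, and rollbackStatement() removes both entries before the message heads downstream.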

For issue #2, the situation is similar to issue #1, except that once row changes were sent through the replication stream (i.e., recorded in the transaction log file), we had no way to undo them if the current statement was still active and ended up failing. We decided to resolve this by adding a new protobuf Statement message type: ROLLBACK_STATEMENT.

The implications of this new Statement type? The first is that we now need to be able to clearly identify statement boundaries so we know how much work to undo. That means we no longer "roll up" similar SQL statements into the same protobuf Statement message as a space optimization (rolling up avoided repeating the Statement message's header information). Now each row-changing SQL statement executed gets its very own protobuf Statement message.

The second implication is that readers of the transaction log file need to be able to either:
  • roll back statements (you could conceivably use savepoints for this), or
  • cache Statement messages until the final segment is seen (at which point it can execute the changes), discarding the cached messages if the next Statement read is of type ROLLBACK_STATEMENT. A sketch of this second approach follows.
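
Here is a rough sketch of what that caching approach might look like on the reader side. Again, these are simplified stand-in types of my own, not Drizzle's actual generated protobuf classes:

#include <cstddef>
#include <vector>

// Simplified stand-ins for the protobuf Statement message.
enum StatementType { INSERT, UPDATE, DELETE, ROLLBACK_STATEMENT };

struct Statement {
  StatementType type;
  bool end_segment;
  // ... row change data would live here ...
};

class StatementReader {
 public:
  void onStatement(const Statement &stmt) {
    if (stmt.type == ROLLBACK_STATEMENT) {
      // The statement that produced the cached segments failed; discard them.
      cache_.clear();
      return;
    }
    cache_.push_back(stmt);
    if (stmt.end_segment) {
      // Final segment seen: the statement completed, so apply its changes.
      for (std::size_t i = 0; i < cache_.size(); ++i)
        apply(cache_[i]);
      cache_.clear();
    }
  }

 private:
  void apply(const Statement &) { /* execute the row changes on the slave */ }
  std::vector<Statement> cache_;
};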
These changes are now in our trunk. And major props to Stewart, who helped us identify and fix this bug using his storage_engine_api_tester plugin.

Thursday, October 21, 2010

Drizzle Replication Progress Report

In an attempt to keep folks up-to-date as to what is happening with Drizzle replication, I thought that I'd blog about what we are currently doing, and where we are heading.

Current Area of Focus

A building is only as solid as its foundation. For Drizzle replication, that foundation is the transaction log. Our focus of late has been testing the transaction log heavily, ensuring that the messages within it are correct and that its contents can be used to replicate to another machine and produce an exact copy of our data. Patrick Crews (awesome QA guy), Joe Daly (awesome contributor), and I have been focused on this testing and on fixing bugs for the last several weeks. This Launchpad Blueprint tracks our progress and our list of bugs. And Patrick has posted some blog entries about this work.

Initially we started out simple with a single database session. This helped to uncover many issues, most of which have been fixed. We have now moved on to testing our transaction log with many concurrent database connections. The results are looking very good so far!

Next Steps

In its simplest form, replication can be achieved by just shipping the transaction log file from master to slave and applying it there. Drizzle clearly needs something along these lines since we do not yet have a native implementation of replication, so we've decided that our first-phase replication will build on a third-party solution.

I looked at a few options for how we could implement this:
  • RabbitMQ - This is a message exchange system that would allow us to simply send our Google Protobuf Messages (the basic unit of communication in our replication architecture) from the master's transaction log (or alternatively, as a plugin reading the replication stream directly from the Drizzle kernel) to a server running RabbitMQ. Slaves could then connect to the RabbitMQ server to receive the replication events. This has the benefit of moving much of the I/O associated with replication off to another machine. Also, we already have a plugin created by Marcus Eriksson that will send the replication events to a RabbitMQ server. He also has an applier that will run on the slave to receive the events and apply them to the database.
  • ZeroMQ - Someone suggested that I look at this. It is an asynchronous message-processing library that looks really interesting. However, it is a networking library, so it would be more appropriate for a native replication solution, which is a possibility in the future.
  • Tungsten Replicator - Continuent supplies an open source replication solution for multiple database platforms (notably MySQL and Postgres). It provides some wonderful features, like easy promotion of a slave to master in case of failure, and filtering of transaction log events.
ZeroMQ doesn't really fit what I was looking for in this phase (writing as little code as possible), so I've put off looking at it for now, though I am excited to explore it in the future.

RabbitMQ was a good possibility. Marcus has already done quite a bit of work with it, but the code is currently a bit out of date. There have been several changes to the protobuf messages since Marcus originally wrote this code, and it's difficult to keep up when we change so quickly. Still, it wouldn't take much effort to get this code up-to-date and working.

Honestly, I didn't know much about Tungsten Replicator until recently. This really looks like an excellent, full-featured product. It is well thought out in its design, and seems to be fairly simple to get up and running. Then I got really excited about it when I found out that Marcus has already been working with Continuent on adding Drizzle support! Not only that, Marcus announced his creation of a set of Java tools that contain the common code for working with the Drizzle transaction log. This can be used in his Tungsten work, as well as re-incorporated into his RabbitMQ work eventually.

It seems like a real win to work with Marcus on adding Drizzle support to Tungsten, so this, along with transaction log testing, is my focus going forward. I hope to write up a simple HOWTO on using Tungsten Replicator with Drizzle once it is ready to use.

Looking to the Future

While nothing is written in stone just yet, we have multiple potential ideas for replication that are pretty exciting. Just some of them are:
  1. Replication streams separated on a per-catalog basis. This would help to eliminate some contention issues and reduce the headaches that a single user could cause to the entire replication system.
  2. Ability to limit resource usage of the slave, such as limiting concurrent writes per device.
  3. Optional compression of the replication stream.
Although we have a bit of work to do, I think we have the potential to deliver some pretty interesting stuff in the future.

What other ideas would you like to see implemented in Drizzle replication?

Sunday, August 29, 2010

Drizzle Transaction Message Limit

Some changes I made were recently pushed to Drizzle trunk that affect the size of the Transaction protobuf message that any replication stream will see (e.g., the transaction log). This was necessary to fix bug 600795.

Without a limit on the Transaction message size, a bulk operation like LOAD DATA would have produced a Transaction message containing one very large Statement message holding all of the INSERT data for the bulk load. This could obviously eat up a large amount of memory if we kept allowing the Statement to grow without bound, since the Drizzle kernel, when it can, keeps appending the values to INSERT onto the same record.

To circumvent this, we now allow multiple Transaction records for a single database transaction. All Transaction GPB messages representing a single database transaction will share the same transaction ID, and only the last Transaction message will have its Statement's end_segment attribute set to true.

Here is an example of what you might now see in the transaction log:


transaction_context {
  server_id: 1
  transaction_id: 3
  start_timestamp: 1283118092815781
  end_timestamp: 1283118092815869
}
statement {
  type: INSERT
  start_timestamp: 1283118092815782
  end_timestamp: 1283118092815868
  insert_header {
    table_metadata {
      schema_name: "test"
      table_name: "t"
    }
    field_metadata {
      type: INTEGER
      name: "id"
    }
    field_metadata {
      type: VARCHAR
      name: "a"
    }
  }
  insert_data {
    segment_id: 1
    end_segment: false
    record {
      insert_value: "2"
      insert_value: "abc"
      is_null: false
      is_null: false
    }
    record {
      insert_value: "3"
      insert_value: "def"
      is_null: false
      is_null: false
    }
  }
}

transaction_context {
  server_id: 1
  transaction_id: 3
  start_timestamp: 1283118092816250
  end_timestamp: 1283118092816725
}
statement {
  type: INSERT
  start_timestamp: 1283118092816251
  end_timestamp: 1283118092816724
  insert_header {
    table_metadata {
      schema_name: "test"
      table_name: "t"
    }
    field_metadata {
      type: INTEGER
      name: "id"
    }
    field_metadata {
      type: VARCHAR
      name: "a"
    }
  }
  insert_data {
    segment_id: 1
    end_segment: true
    record {
      insert_value: "4"
      insert_value: "ghi"
      is_null: false
      is_null: false
    }
    record {
      insert_value: "5"
      insert_value: "jkl"
      is_null: false
      is_null: false
    }
  }
}


This example is a bit contrived as there is no need to split up such a small transaction, but you can see the basic changes here. We have two Transaction messages, both with the same transaction ID. You can see that the Statement's end_segment is set to false in the first message, while the Statement within the second Transaction message has end_segment set to true.

So, in case it isn't obvious, there are now two ways to determine when you should commit if you are a replication stream TransactionApplier or a reader of the transaction log:

  1. If the transaction ID changes, COMMIT.
  2. Or, if the current Transaction has all Statement messages with end_segment set to true, COMMIT.
Choose whichever of the two methods best suits your needs; a sketch combining both follows.
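
As a rough illustration of how an applier might combine both rules (once more with simplified stand-in types rather than the real generated classes):

#include <stdint.h>

// Simplified stand-in for a Transaction message as read from the log.
struct TransactionMsg {
  uint64_t transaction_id;
  bool all_statements_end_segment;  // every Statement has end_segment == true
};

class TransactionApplier {
 public:
  TransactionApplier() : pending_(false), pending_id_(0) {}

  void onTransaction(const TransactionMsg &msg) {
    // Method 1: a different transaction ID means the previous one is complete.
    if (pending_ && msg.transaction_id != pending_id_) {
      commit();
      pending_ = false;
    }
    applyChanges(msg);
    // Method 2: every Statement here has end_segment set, so commit now.
    if (msg.all_statements_end_segment) {
      commit();
      pending_ = false;
    } else {
      pending_ = true;
      pending_id_ = msg.transaction_id;
    }
  }

 private:
  void applyChanges(const TransactionMsg &) { /* replay the row changes */ }
  void commit() { /* issue COMMIT on the slave */ }
  bool pending_;
  uint64_t pending_id_;
};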

Currently, if a Transaction message crosses the 1M threshold, the kernel will create a new Transaction message. Why did I choose 1M? Well, the Google Protobuf documentation says:
Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.
So 1M seemed to be a reasonable default. I'll change this in the near future to be a configurable value once we get some changes to our sys var stuff merged.
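
The check itself is conceptually simple. Here is a hedged sketch of the idea (the struct, the constant, and the flush helper are all names I invented; the real kernel code differs): before appending more row data, compare the running message size against the threshold, and if it would be crossed, ship the current segment with end_segment set to false and keep filling a fresh message that carries the same transaction ID.

#include <cstddef>
#include <cstdio>
#include <stdint.h>

// Stand-in for the generated Transaction class; in the real code this is a
// protobuf message and its serialized size is what gets compared.
struct TransactionMsg {
  uint64_t transaction_id;
  std::size_t byte_size;  // running serialized size of the message
};

static const std::size_t MESSAGE_BYTE_LIMIT = 1024 * 1024;  // the current 1M default

// Stub: in the kernel this would push the segment into the replication stream.
static void flushSegment(const TransactionMsg &t, bool end_segment) {
  std::printf("txn %llu segment flushed, end_segment=%s\n",
              (unsigned long long)t.transaction_id,
              end_segment ? "true" : "false");
}

static void appendRow(TransactionMsg &t, std::size_t row_bytes) {
  if (t.byte_size + row_bytes > MESSAGE_BYTE_LIMIT) {
    flushSegment(t, false);  // more segments of this transaction will follow
    t.byte_size = 0;         // fresh message, same transaction_id
  }
  t.byte_size += row_bytes;  // ...then append the row data itself
}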

Wednesday, August 25, 2010

Adding a Drizzle Executable in Xcode

In my last post, I explained how to set up Drizzle under an Xcode project. This allows you to take advantage of Xcode's features while developing Drizzle (or any other project of your choosing). The one thing we weren't able to do was debug the Drizzle executable. This post remedies that.

So it turns out that this is an easy fix. But for Drizzle, there are a few extra hoops you have to jump through in order to get it to work.

The basic steps we need to do for Drizzle are:
  1. Add a custom executable in Xcode
  2. Set up any arguments you want to pass to the executable
  3. Set up any environment variables needed for the executable to run properly
For most other projects, you can probably get away with #1 and possibly #2. For Drizzle, though, we also need #3 so that the executable can find its libraries.

Step #1 is easy. In the Groups & Files window, right-click on Executables and then choose Add -> New Custom Executable...



This will pop up a window where you define where the executable resides. Once that is done, you can define what arguments to pass it and what environment variables should be set when it runs, among other things. These should be self-explanatory, and you should be able to set this up for your particular project. For Drizzle, though, it isn't so intuitive.

In Drizzle, after you run configure and make, the executable lives in a hidden directory within your xcode-branch repo directory. It will actually be in:

$drizzle-repo/xcode-branch/drizzled/.libs/drizzled

The executable you see in the xcode-branch/drizzled subdirectory is actually a shell script that runs the real executable for you (it's a wrapper that libtool generates so the uninstalled binary can find its libraries). Since we're bypassing that wrapper, enter the path to the real executable in the .libs subdirectory:

Once you enter the path and click Finish, you are given the chance to edit the executable's working directory, arguments, environment variables, etc. The important part here is the information under the Arguments tab. We need to set the arguments to the executable, and also set the DYLD_LIBRARY_PATH environment variable so that the executable can find its dynamic libraries (otherwise it looks for them in the installation directory, which, if it doesn't exist yet, will keep the executable from starting). Here is an example:
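
The settings might look something like this (the argument value and paths here are hypothetical placeholders; point DYLD_LIBRARY_PATH at the .libs directories your build actually produced):

Arguments:
  --datadir=/tmp/drizzle-data

Environment variables:
  DYLD_LIBRARY_PATH = $drizzle-repo/xcode-branch/drizzled/.libs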

Once you have this set up, you should be able to run your executable (after you've built it, of course) and use the Xcode debugger to merrily do some bug hunting. Make sure that you have unselected the Load symbols lazily option in Xcode's Debugging preferences so that your breakpoints will be recognized!

Happy debugging.

Wednesday, June 30, 2010

Developing Drizzle with Xcode

Even with all of its shortcomings, Xcode has excellent integration with the GNU debugger. It also has some good features for understanding a large code base that you may be unfamiliar with. For these reasons, I like to use Xcode for my development work. Unfortunately, it takes a bit of magic to get it set up to work with an external project that was not created specifically with/for Xcode and that does not use CVS/Subversion/Perforce.

If you want to use Xcode to work on Drizzle, these are the steps you need to take to set up a project. Note that Xcode doesn't support Bazaar, so you still need to do some things from the command line. (Come on, Apple. CVS and Subversion just plain suck.)

These instructions assume that the Drizzle repo is located in $drizzle-repo, as described here in the Drizzle wiki.

First, we need to create a new branch (we'll call it xcode-branch) of Drizzle trunk to use for our yet-to-be-created Xcode project.

cd $drizzle-repo
bzr branch trunk xcode-branch

Next, we'll go into Xcode and create a new project. Choose the External Build System project. With this type of project, we'll be able to click on Build to run make for us.

Click the Choose... button and save your project some place appropriate (anywhere but inside your $drizzle-repo directory). We'll tell the project where to find the source files in xcode-branch in the following steps.

After saving, your project window should appear. At this point, we have no source code files in our project, so we need to tell Xcode to use our xcode-branch as the location of our source. We do that by right-clicking on our project name in the Groups & Files section of the project window and selecting Add -> Existing Files...




Navigate to your xcode-branch directory, select the directory, and click the Add button. In the window that appears, make sure that the Copy items into destination group's folder option is NOT selected. We don't want to copy the source files into the Xcode project directory; we want to use them from within the Bazaar branch. We do this so that after we make changes with Xcode, we can easily open a Terminal window and commit our changes from the command line. (Again, this is because Xcode does not have direct Bazaar integration.) Also, make sure that Recursively create groups for any added folders is selected. So you should have:


Click Add and a folder for your xcode-branch should appear under your project name in the Groups & Files section.

So what have we just done? We basically did two things here:
  1. Created a Bazaar branch for the modifications we will make under Xcode
  2. Created an Xcode project that points to our Bazaar branch.
Our working Xcode project is split up into two separate directories: one for the Xcode project files, and one for the Bazaar source files. This may seem a bit odd if you expected everything to be under a single directory, but we have to do it this way so that we can still use Bazaar commands.

There is one last thing to do. Xcode is now set up to simply call 'make' when we click the Build button, but it will attempt to run 'make' in the Xcode project directory, not in our xcode-branch directory. This is an easy fix. Double-click on the target underneath Targets in the Groups & Files section of the project window. This will open up the following window:

Set the Directory under Custom Build Command to the location of your xcode-branch. Once you do this, then clicking Build will run 'make' from within the xcode-branch.

That's it! You can now edit the source, use Xcode features to jump to various definitions, search project files, and build the project using Command-B or the Build menu option. Obviously, you'll first have to create and run the configure script in your xcode-branch directory by hand before you can build. And you will not be able to use the other build commands, like Build and Run, or use the debugger until you set up an executable for the project. I'll cover that in a future post.