Sunday, August 29, 2010

Drizzle Transaction Message Limit

Some changes I made were recently pushed to Drizzle trunk. They affect the size of the Transaction protobuf message that any replication stream (e.g., the transaction log) will see. This was necessary to fix bug 600795.

Without a Transaction message size limit, any bulk operation, like LOAD DATA, could end up producing a Transaction message containing a very large Statement message that held all of the INSERT data for the bulk load. The Drizzle kernel, when it can, keeps appending the values to INSERT onto the same Statement, so this could eat up a large amount of memory if we kept allowing the Statement to grow without bounds.

To avoid this, we now allow multiple Transaction records for a single database transaction. All Transaction GPB messages representing a single database transaction will have the same transaction ID, and only the last Transaction message will have the Statement's end_segment attribute set to true.

Here is an example of this change that you might now see in the transaction log:


transaction_context {
  server_id: 1
  transaction_id: 3
  start_timestamp: 1283118092815781
  end_timestamp: 1283118092815869
}
statement {
  type: INSERT
  start_timestamp: 1283118092815782
  end_timestamp: 1283118092815868
  insert_header {
    table_metadata {
      schema_name: "test"
      table_name: "t"
    }
    field_metadata {
      type: INTEGER
      name: "id"
    }
    field_metadata {
      type: VARCHAR
      name: "a"
    }
  }
  insert_data {
    segment_id: 1
    end_segment: false
    record {
      insert_value: "2"
      insert_value: "abc"
      is_null: false
      is_null: false
    }
    record {
      insert_value: "3"
      insert_value: "def"
      is_null: false
      is_null: false
    }
  }
}

transaction_context {
  server_id: 1
  transaction_id: 3
  start_timestamp: 1283118092816250
  end_timestamp: 1283118092816725
}
statement {
  type: INSERT
  start_timestamp: 1283118092816251
  end_timestamp: 1283118092816724
  insert_header {
    table_metadata {
      schema_name: "test"
      table_name: "t"
    }
    field_metadata {
      type: INTEGER
      name: "id"
    }
    field_metadata {
      type: VARCHAR
      name: "a"
    }
  }
  insert_data {
    segment_id: 1
    end_segment: true
    record {
      insert_value: "4"
      insert_value: "ghi"
      is_null: false
      is_null: false
    }
    record {
      insert_value: "5"
      insert_value: "jkl"
      is_null: false
      is_null: false
    }
  }
}


This example is a bit contrived as there is no need to split up such a small transaction, but you can see the basic changes here. We have two Transaction messages, both with the same transaction ID. You can see that the Statement's end_segment is set to false in the first message, while the Statement within the second Transaction message has end_segment set to true.

So, in case it isn't obvious, there are now two ways to determine when you should commit if you are a replication stream TransactionApplier, or if you are reading from the transaction log:

  1. If the transaction ID changes, COMMIT.
  2. Or, if the current Transaction has all Statement messages with end_segment set to true, COMMIT.
Choose whichever of the two methods best suits your needs.
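
The two rules can be sketched in Python. This is only an illustration, not Drizzle code: the Statement and Transaction classes below are simplified stand-ins for the real protobuf messages (end_segment is modeled directly on the Statement, as described above), and the function names are made up for this example.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Statement:
    # Simplified stand-in: only the flag the commit rule looks at.
    end_segment: bool


@dataclass
class Transaction:
    # Simplified stand-in for the Transaction protobuf message.
    transaction_id: int
    statements: List[Statement] = field(default_factory=list)


def commits_on_id_change(messages: Iterable[Transaction]) -> Iterator[int]:
    """Rule 1: yield a transaction ID once a message with a different ID arrives."""
    current = None
    for msg in messages:
        if current is not None and msg.transaction_id != current:
            yield current
        current = msg.transaction_id
    # The last transaction seen is not yielded: no later ID has arrived yet.


def commits_on_end_segment(messages: Iterable[Transaction]) -> Iterator[int]:
    """Rule 2: yield a transaction ID when every Statement has end_segment set."""
    for msg in messages:
        if all(stmt.end_segment for stmt in msg.statements):
            yield msg.transaction_id


stream = [
    Transaction(3, [Statement(end_segment=False)]),
    Transaction(3, [Statement(end_segment=True)]),
    Transaction(4, [Statement(end_segment=True)]),
]
print(list(commits_on_id_change(stream)))    # [3]
print(list(commits_on_end_segment(stream)))  # [3, 4]
```

Note the practical difference in this sketch: with the first rule, the most recent transaction stays uncommitted until a message with a new ID shows up, while the second rule can commit as soon as the final segment is read.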

Currently, if a Transaction message crosses the 1MB threshold, the kernel will create a new Transaction message. Why did I choose 1MB? Well, the Google Protobuf documentation says:
"Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy."
So 1MB seemed to be a reasonable default. I'll change this in the near future to be a configurable value once we get some changes to our sys var stuff merged.
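
To make the threshold behavior concrete, here is a rough Python sketch of the idea, not the kernel's actual implementation. The function name split_into_segments is hypothetical, and the size of a record is approximated by the byte length of its values, whereas the real code would look at the size of the protobuf message.

```python
from typing import List, Sequence, Tuple

ONE_MEGABYTE = 1024 * 1024


def split_into_segments(
    records: Sequence[Sequence[str]], max_bytes: int = ONE_MEGABYTE
) -> List[Tuple[List[Sequence[str]], bool]]:
    """Group records into segments, closing a segment once it crosses max_bytes.

    Returns (records, end_segment) pairs; only the last pair has end_segment True.
    """
    segments: List[List[Sequence[str]]] = []
    current: List[Sequence[str]] = []
    size = 0
    for record in records:
        current.append(record)
        # Approximation: count the UTF-8 bytes of each value in the record.
        size += sum(len(value.encode("utf-8")) for value in record)
        if size >= max_bytes:
            segments.append(current)
            current, size = [], 0
    if current:
        segments.append(current)
    last = len(segments) - 1
    return [(segment, index == last) for index, segment in enumerate(segments)]


rows = [("2", "abc"), ("3", "def"), ("4", "ghi"), ("5", "jkl")]
# A tiny 7-byte limit is used here only to force a split on small data.
for segment, end_segment in split_into_segments(rows, max_bytes=7):
    print(segment, end_segment)
```

With the tiny limit used above, the four rows end up in two segments of two rows each, which mirrors the shape of the transaction log example shown earlier.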

Wednesday, August 25, 2010

Adding a Drizzle Executable in Xcode

In my last post, I explained how to set up Drizzle under an Xcode project. This allows you to take advantage of Xcode's features while developing on Drizzle (or any other project of your own choosing). The one thing we weren't able to do was debug the Drizzle executable. This post remedies that.

So it turns out that this is an easy fix. But for Drizzle, there are a few extra hoops you have to jump through in order to get it to work.

The basic steps we need to do for Drizzle are:
  1. Add a custom executable in Xcode
  2. Set up any arguments you want to pass to the executable
  3. Set up any environment variables needed for the executable to run properly
For most other projects, you can probably just get away with #1 and possibly #2. For Drizzle, though, we need to do #3 so that it can find its libraries.

Step #1 is easy. In the Groups & Files window, right-click on Executables and then choose Add -> New Custom Executable...



This will pop up a window where you define where the executable resides. Once that is done, you can define what arguments to pass it and what environment variables should be set when it runs, among other things. These should be self-explanatory, and you should be able to set this up for your particular project. For Drizzle, though, it isn't so intuitive.

In Drizzle, after you run configure and make, the executable lives in a hidden directory within your xcode-branch repo directory. It will actually be in:

$drizzle-repo/xcode-branch/drizzled/.libs/drizzled

The executable you see in the xcode-branch/drizzled subdirectory is actually a shell script that runs the real executable for you. Don't ask me why. So enter the path to the executable in the .libs subdirectory:

Once you enter the path and click Finish, you are given the chance to edit the executable's working directory, arguments, environment variables, etc. The important part here is the information under the Arguments tab. We need to set the arguments to the executable as well as the DYLD_LIBRARY_PATH environment variable so that the executable can find its dynamic libraries (otherwise, it looks for them in the installation directory, which, if it doesn't exist yet, will prevent the executable from starting). Here is an example:
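
As a rough illustration of the environment setting only, this Python sketch builds an environment with DYLD_LIBRARY_PATH extended, as you might do when launching the binary from a script rather than from Xcode. The helper name build_debug_env is made up for this example, and the directory shown is a placeholder: which directories actually need to be listed depends on where your build puts its libraries.

```python
import os
from typing import Dict, Iterable, Optional


def build_debug_env(
    library_dirs: Iterable[str], base_env: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with library_dirs prepended to DYLD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("DYLD_LIBRARY_PATH", "")
    parts = list(library_dirs)
    if existing:
        # Keep whatever was already set, after the directories we add.
        parts.append(existing)
    env["DYLD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


# Placeholder path: substitute the directories that hold your freshly built libraries.
env = build_debug_env(["/path/to/xcode-branch/drizzled/.libs"], base_env={})
print(env["DYLD_LIBRARY_PATH"])
```

The resulting dictionary could then be passed as the env argument of subprocess.run when starting the executable from a script.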

Once you have this set up, you should be able to run your executable (after you've built it, of course), and use the Xcode debugger to merrily do some bug hunting. Make sure that you have unselected the Load symbols lazily option in Xcode Debugging preferences so that your breakpoints will be recognized!

Happy debugging.