Wednesday, January 12, 2011

Change in Transaction Protobuf Message Segmenting

If you read my post on Drizzle transaction message limits, you know that it is possible to have multiple Transaction protobuf messages (segments) for a single database transaction. This was necessary to keep the Google protobuf messages from growing too large (there is a maximum limit on message size).

Before my most recent change, you would have to parse each Statement sub-message contained within the enclosing Transaction message to see if the Transaction was split up into multiple messages (they would be linked together by sharing the same transaction ID). This was kind of a pain, but since the segment information was only contained in the Statement, this was the only way to do it.

As of Bazaar revision number 2076 of the Drizzle trunk, we now have segment information stored in the Transaction message in addition to the Statement message. We added the values segment_id and end_segment to the Transaction message definition. These are just like the identically named values in the Statement message definition. From the drizzled/message/transaction.proto definition file:
message Transaction
{
  required TransactionContext transaction_context = 1;
  repeated Statement statement = 2;
  optional Event event = 3;

  /*
   * A single transaction in the database can possibly be represented with
   * multiple protobuf Transaction messages if the message grows too large.
   * This can happen if you have a bulk transaction, or a single statement
   * affecting a very large number of rows, or just a large transaction with
   * many statements/changes.
   *
   * For the first two examples, it is likely that the Statement sub-message
   * itself will get segmented, causing another Transaction message to be
   * created to hold the rest of the Statement's row changes. In these cases,
   * it is enough to look at the segment information stored in the Statement
   * message.
   *
   * For the last example, the Statement sub-messages may or may not be
   * segmented, but we could still need to split the Statements up into
   * multiple Transaction messages to keep the Transaction message size from
   * growing too large. In this case, the segment information in the Statement
   * submessages is not helpful if the Statement isn't segmented. We need this
   * information in the Transaction message itself.
   *
   * These values should be set appropriately whether or not the Statement
   * sub-messages are segmented.
   */
  optional uint32 segment_id = 4; /* Segment number of the Transaction msg */
  optional bool end_segment = 5;  /* FALSE if Transaction msg is split into multiples */
}
So other than making it easier to check whether a Transaction is segmented, why add these new values?

Well, it turns out having the segment information only in the Statement doesn't allow us to segment large Transaction messages if none of the Statement sub-messages are themselves segmented. This is documented in this bug report.
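To give a feel for what this buys a consumer of the transaction stream, here is a rough sketch of reassembling segments using only the Transaction-level fields. It is not Drizzle code: TransactionSegment is a hypothetical stand-in for a parsed Transaction message, its transaction_id field is assumed to have been pulled out of the TransactionContext, and reassemble is a made-up helper name. The sketch also assumes segments of one transaction arrive in order and that end_segment is true only on the last one.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class TransactionSegment:
    """Hypothetical stand-in for one parsed Transaction protobuf message."""
    transaction_id: int   # assumed to be read from transaction_context
    segment_id: int       # mirrors Transaction.segment_id
    end_segment: bool     # mirrors Transaction.end_segment
    statements: List[str] = field(default_factory=list)


def reassemble(
    segments: Iterable[TransactionSegment],
) -> Iterator[List[TransactionSegment]]:
    """Yield all segments of a transaction once its final segment is seen.

    Only the Transaction-level fields are inspected; the Statement
    sub-messages are never parsed to find the segment boundaries.
    """
    pending: Dict[int, List[TransactionSegment]] = {}
    for seg in segments:
        # Collect segments that share a transaction ID, in arrival order.
        pending.setdefault(seg.transaction_id, []).append(seg)
        if seg.end_segment:
            # Last segment: hand back the complete transaction.
            yield pending.pop(seg.transaction_id)


# Example input: transaction 7 is split into two messages, 8 is not split.
stream = [
    TransactionSegment(7, 1, False, ["INSERT a"]),
    TransactionSegment(8, 1, True, ["UPDATE x"]),
    TransactionSegment(7, 2, True, ["INSERT b"]),
]
complete = list(reassemble(stream))
```

Before the change, the same loop would have had to dig into every Statement sub-message to find out whether more segments were coming, and it still could not have detected a split where none of the Statements was itself segmented.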
