wiki:DifferenceDesign

Introduction

In order to support IXFR-out, we need to be able to store and read the differences between versions of a zone. We'll also need that capability to support dynamic update. (Strictly speaking, dynamic update itself does not require "difference" handling if the updates can be made directly on persistent storage, but we may not always be able to assume that, and it's also quite likely that we'll want to use IXFR-out for such zones.)

Considerations are broken down in the following three parts:

  • Top level API to manipulate the diffs. This is data source agnostic. Most developers will only care about this API.
  • Low level API for diffs that connects the top level interface and data source specific details
  • The lowest level storage, especially in a (SQL-based or other) database

The design is mostly derived from BIND 9's journal format and related APIs, but adjusted to BIND 10's architecture (e.g., in many cases diffs are managed in the same database as the zone data, and the details may vary depending on the specific data source/database backend).

Top Level Diff Representation

We use a sequence of IXFR-style difference sequences for the representation of diffs at the top level, i.e., the form of diffs that applications directly handle. Each difference sequence is a sequence of RRs: pre-transaction SOA, zero or more other deleted RRs, the post-transaction SOA, and zero or more other added RRs. So, in the end, the entire diff between one (serial) version to another is a stream of RRs that follow some specific rules in the ordering.

Here is an example of a single difference sequence:

example.com. SOA serial=1
a.example.com. A  192.0.2.1
example.com. SOA serial=2
b.example.com. A  192.0.2.2
c.example.com. AAAA 2001:db8::5

This means a version change from serial 1 to 2 for zone example.com, deleting a.example.com/A, and adding b.example.com/A and c.example.com/AAAA. (Note that the version change doesn't have to be consecutive; a single difference sequence can represent a change from version 1 to 3, for example).

An application that stores diffs to a data source is responsible for making sure that the stream of RRs is actually a sequence of valid IXFR sequences. This is a trivial task for an IXFR client implementation (in BIND 10, it's b10-xfrin) because it can simply pass the RRs in the IXFR response as it receives them. In other general cases, especially for a primary server handling dynamic updates, we'll need to provide a convenient tool to convert a given set of changes to a zone into the expected stream of RRs.
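The conversion tool mentioned above could look roughly like the following. This is only a minimal sketch: the RR struct and the makeDiffSequence() function are hypothetical names invented for illustration, not part of the BIND 10 API, and RDATA is kept as plain text for simplicity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified RR used only for this illustration
// (not the isc::dns classes).
struct RR {
    std::string name;
    std::string type;
    std::string rdata;
};

// Build one IXFR-style difference sequence in the order described above:
// pre-transaction SOA, deleted RRs, post-transaction SOA, added RRs.
std::vector<RR> makeDiffSequence(const RR& old_soa,
                                 const std::vector<RR>& deleted,
                                 const RR& new_soa,
                                 const std::vector<RR>& added)
{
    // Both delimiters of a difference sequence must be SOA RRs.
    assert(old_soa.type == "SOA" && new_soa.type == "SOA");

    std::vector<RR> stream;
    stream.reserve(deleted.size() + added.size() + 2);
    stream.push_back(old_soa);
    stream.insert(stream.end(), deleted.begin(), deleted.end());
    stream.push_back(new_soa);
    stream.insert(stream.end(), added.begin(), added.end());
    return stream;
}
```

Calling it with the serial 1 and serial 2 SOAs, one deleted A RR and the two added RRs yields the five-RR stream shown in the example above.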

Likewise, an application that retrieves diffs between two (serial) versions of a zone can assume the stream forms a sequence of valid IXFR sequences. This will be convenient for an IXFR server implementation (it will be b10-xfrout in BIND 10), and, in practice, this is probably almost the only application that would like to retrieve diffs (other than experimental or debugging tools).

Top Level API

API for adding diffs

We extend the existing getUpdater method of the DataSourceClient class with a new parameter:

    virtual ZoneUpdaterPtr getUpdater(const isc::dns::Name& name,
                                      bool replace,
                                      bool journaling) const = 0;

The journaling parameter is new. If this is true, every change made in the returned updater will be recorded in a persistent storage for the corresponding data source. As was the case without journaling, the updater object forms a single transaction: Unless commit() is performed on the updater, changes to the journal are not committed either. If the updater is destructed without doing commit() or after commit() fails, the internal rollback cancels both the original updates and temporarily stored diffs. If commit() is successfully completed, both the original updates and added diffs are committed as an atomic operation.

How to realize this behavior depends on the implementation of the specific derived updater class. For an updater with a database backend, an easy way is to represent both zones and diffs in tables of the same database and handle the update process in a single transaction. Our implementation actually adopts this approach (see the low level API section).

If and when we support a way of updating where zones and diffs are internally managed in different storage (as in BIND 9, where zones are managed in memory and then stored in a file, and the diffs are stored in a persistent "journal file"), the notion of atomic commit may turn out to be difficult to implement. If that is the case and we need it, we may have to revisit the API.

Note that in the case of AXFR, journaling will normally be false. In the case of IXFR or dynamic update it will often be true, but the application may want to suppress recording diffs, in which case it sets the parameter to false.

Once the updater is created, the application doesn't have to care about the diffs: addRRset() and deleteRRset() internally (and temporarily) add diffs to the underlying storage, and commit() confirms the stored sequence of diffs.
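The transactional behavior described in this section can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions: ToyStorage and ToyUpdater are hypothetical names, RRsets are reduced to text strings, and "rollback" is modeled by simply discarding the staged state when the updater is destroyed without commit().

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a data source holding
// both the zone records and the journal (diffs).
struct ToyStorage {
    std::set<std::string> records;
    std::vector<std::string> journal;
};

// Hypothetical updater: stages changes and diffs, and publishes
// both together on commit().
class ToyUpdater {
public:
    ToyUpdater(ToyStorage& storage, bool journaling) :
        storage_(storage), journaling_(journaling),
        staged_records_(storage.records), committed_(false)
    {}

    void addRRset(const std::string& rr) {
        staged_records_.insert(rr);
        if (journaling_) {
            staged_diffs_.push_back("add " + rr);
        }
    }

    void deleteRRset(const std::string& rr) {
        staged_records_.erase(rr);
        if (journaling_) {
            staged_diffs_.push_back("delete " + rr);
        }
    }

    void commit() {
        assert(!committed_);    // one transaction per updater
        // Prepare the new journal first, then swap both parts in.
        std::vector<std::string> new_journal(storage_.journal);
        new_journal.insert(new_journal.end(),
                           staged_diffs_.begin(), staged_diffs_.end());
        storage_.records.swap(staged_records_);
        storage_.journal.swap(new_journal);
        committed_ = true;
    }

private:
    ToyStorage& storage_;
    const bool journaling_;
    std::set<std::string> staged_records_;    // working copy of the zone
    std::vector<std::string> staged_diffs_;   // diffs not yet committed
    bool committed_;
};
```

A real database-backed updater would get the same effect from a single database transaction rather than from copying the zone.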

API for retrieving diffs

For retrieving diffs we introduce a new class, ZoneJournalReader (note: there may be a better name than this, e.g., ZoneDiffReader). It's an abstract base class that provides the top level interface for applications to retrieve diffs from the underlying data source.

We extend the DataSourceClient class to add a factory method of ZoneJournalReader:

class DataSourceClient {
public:
    ...
    virtual ZoneJournalReaderPtr
    getJournalReader(const isc::dns::Name& zone,
                     uint32_t begin_serial,
                     uint32_t end_serial) const = 0;
};

zone is the name of the zone for which the diff should be retrieved, and begin_serial and end_serial specify the range of diff sequences.

On construction, the specific derived class of ZoneJournalReader starts a kind of read-only transaction to get access to the diff storage. How to realize the "transaction" is specific to the underlying data source (accessor) implementation. It may be an explicit database transaction, or an implicit lock due to a single SQL statement, or an exclusive file lock. The only higher level requirement is to ensure that an attempt to modify the diff storage shouldn't confuse the reader.

On successful construction, the ZoneJournalReader object works as an iterator over the diff sequences. The application will retrieve the diff via the getNextDiff() method:

class ZoneJournalReader {
public:
    virtual ConstRRsetPtr getNextDiff() = 0;
};

getNextDiff() returns RRsets, each containing exactly one RDATA (so this is effectively a single RR), in the form of the top level representation described in the previous section. So, the first RRset will be an SOA RR of the serial begin_serial which is to be deleted, followed by RRs of that serial that are to be deleted, and so on.
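From the application's point of view, reading a diff is then a simple loop. The sketch below uses simplified stand-ins for RRset, ConstRRsetPtr and ZoneJournalReader so that it is self-contained, and VectorJournalReader and collectTypes() are hypothetical. It also assumes that the end of the diff is signalled by a null pointer, which the design above does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-ins for the real classes, for illustration only.
struct RRset {
    std::string name;
    std::string type;
    std::string rdata;
};
typedef std::shared_ptr<const RRset> ConstRRsetPtr;

class ZoneJournalReader {
public:
    virtual ~ZoneJournalReader() {}
    virtual ConstRRsetPtr getNextDiff() = 0;
};

// Hypothetical reader backed by an in-memory vector instead of a database.
class VectorJournalReader : public ZoneJournalReader {
public:
    explicit VectorJournalReader(const std::vector<RRset>& diffs) :
        diffs_(diffs), pos_(0)
    {}
    virtual ConstRRsetPtr getNextDiff() {
        if (pos_ >= diffs_.size()) {
            return ConstRRsetPtr();     // assumption: null means "no more"
        }
        return ConstRRsetPtr(new RRset(diffs_[pos_++]));
    }
private:
    std::vector<RRset> diffs_;
    std::size_t pos_;
};

// Typical consumer loop: drain the reader and record the RR types seen.
std::vector<std::string> collectTypes(ZoneJournalReader& reader) {
    std::vector<std::string> types;
    while (ConstRRsetPtr rr = reader.getNextDiff()) {
        types.push_back(rr->type);
    }
    return types;
}
```

An IXFR server would do the same loop but render each returned RR into the response message instead of collecting types.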

Low Level API for Databases

We'll begin with the architecture that uses a database backend for both a zone's records and its diffs. We'll need to add an intermediate layer to support the higher level interfaces for the database backends. Higher level applications won't have to care about the details of these APIs and implementations.

API for adding diffs

We extend the DatabaseUpdater class as follows:

  • Add the journaling parameter to the constructor to indicate whether to record the diffs as they are added to the database. DatabaseClient::getUpdater() will need a trivial update accordingly, too.
  • Extend addRRset() and deleteRRset() so that when journaling is true they will add the corresponding diffs to the database.

In addition, addRRset() and deleteRRset() may also have to check (if journaling is true) whether the diff sequences meet the expected top level representation.
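One possible form of such a check is a small state machine over the RR stream. This is a sketch only: the RR struct and isValidDiffStream() are hypothetical, and the check covers just the ordering rules of the top level representation (it does not compare serials or owner names).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified RR: only what the ordering check needs.
struct RR {
    std::string name;
    std::string type;
};

// Returns true if the stream is one or more difference sequences, each of
// the form: SOA, non-SOA RRs to delete, SOA, non-SOA RRs to add.
bool isValidDiffStream(const std::vector<RR>& stream) {
    enum Phase { EXPECT_DELETE_SOA, DELETING, ADDING };
    Phase phase = EXPECT_DELETE_SOA;

    for (std::vector<RR>::const_iterator it = stream.begin();
         it != stream.end(); ++it) {
        const bool is_soa = (it->type == "SOA");
        switch (phase) {
        case EXPECT_DELETE_SOA:
            if (!is_soa) {
                return false;       // a stream must begin with an SOA
            }
            phase = DELETING;
            break;
        case DELETING:
            if (is_soa) {
                phase = ADDING;     // post-transaction SOA
            }
            break;
        case ADDING:
            if (is_soa) {
                phase = DELETING;   // pre-transaction SOA of next sequence
            }
            break;
        }
    }
    // A complete stream ends in the "add" part of its last sequence.
    return phase == ADDING;
}
```

In an updater the same state would be tracked incrementally across addRRset() and deleteRRset() calls rather than over a complete vector.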

We'll also have to update the interfaces of DatabaseAccessor to support the extensions to the DatabaseUpdater class. Specifically, we'll need a new method (e.g.) addRecordDiff():

class DatabaseAccessor {
public:
    ...
    virtual void addRecordDiff(DiffOperation operation, uint32_t serial,
                               const std::string (&params)[DIFF_PARAM_COUNT])
    = 0;
};

enum DiffOperation {
    DIFF_ADD = 0,
    DIFF_DELETE = 1
};

enum DIFFRecordParams {
    DIFF_NAME = 0,
    DIFF_TYPE = 1,
    DIFF_TTL = 2,
    DIFF_RDATA = 3,
    DIFF_PARAM_COUNT = 4
};

This basically assumes the database schema described in the next section. With that in mind the semantics of the method and enum definitions should be obvious.
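To make the intended use of addRecordDiff() more concrete, here is a minimal sketch with a toy accessor that just stores what it receives. ToyAccessor and StoredDiff are hypothetical. The params argument is written as a reference to an array, and the TTL is passed in textual form; both are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum DiffOperation {
    DIFF_ADD = 0,
    DIFF_DELETE = 1
};

enum DIFFRecordParams {
    DIFF_NAME = 0,
    DIFF_TYPE = 1,
    DIFF_TTL = 2,
    DIFF_RDATA = 3,
    DIFF_PARAM_COUNT = 4
};

// Hypothetical record of one stored diff (one row of the diffs table).
struct StoredDiff {
    DiffOperation operation;
    std::uint32_t serial;
    std::string params[DIFF_PARAM_COUNT];
};

// Hypothetical accessor that keeps diffs in memory instead of a database.
class ToyAccessor {
public:
    void addRecordDiff(DiffOperation operation, std::uint32_t serial,
                       const std::string (&params)[DIFF_PARAM_COUNT])
    {
        StoredDiff diff;
        diff.operation = operation;
        diff.serial = serial;
        for (int i = 0; i < DIFF_PARAM_COUNT; ++i) {
            diff.params[i] = params[i];
        }
        diffs.push_back(diff);
    }

    std::vector<StoredDiff> diffs;
};
```

An updater's deleteRRset() would fill such an array from each RR of the RRset and call addRecordDiff(DIFF_DELETE, ...) with the pre-transaction serial; addRRset() would do the same with DIFF_ADD and the post-transaction serial.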

Finally, we need to implement addRecordDiff() for specific derived classes (right now it's SQLite3Accessor). In many cases this should be straightforward.

API for retrieving diffs

We'll first need to introduce a specific derived class of ZoneJournalReader for the database support:

class DatabaseJournalReader : public ZoneJournalReader {
public:
    DatabaseJournalReader(shared_ptr<DatabaseAccessor> accessor, int zone_id,
                          uint32_t begin_serial, uint32_t end_serial);
    virtual ConstRRsetPtr getNextDiff();
};

DatabaseClient::getJournalReader() will construct a DatabaseJournalReader object with the accessor for the given zone and with the given begin and end serials.

We'll also need to extend DatabaseAccessor to support the "diff iterator". We can probably reuse the same framework we already use to get normal records. With this approach the extension would look like this:

class DatabaseAccessor {
public:
    ...
    virtual IteratorContextPtr getRecordDiffs(int id,
                                              uint32_t begin_serial,
                                              uint32_t end_serial) = 0;
};

A specific derived implementation of getRecordDiffs() executes an SQL statement such as the one shown in the next section, and holds the statement context. The DatabaseJournalReader constructor internally calls this method, and its getNextDiff() method simply calls the getNext() method of the returned IteratorContextPtr.

Additionally, getNextDiff() may want to call getNext() multiple times and buffer the results so that if the size of diff isn't very large it can complete the database transaction faster and release the underlying lock.
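The buffering idea can be sketched as follows. ToyRowSource is a hypothetical stand-in for the iterator context (its getNext() is simplified to return one text row), and BufferedDiffReader and the batch size parameter are likewise illustrative assumptions rather than part of the design.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for the database iterator context.
class ToyRowSource {
public:
    explicit ToyRowSource(const std::vector<std::string>& rows) :
        rows_(rows), pos_(0)
    {}
    bool getNext(std::string& row) {
        if (pos_ >= rows_.size()) {
            return false;
        }
        row = rows_[pos_++];
        return true;
    }
    std::size_t remaining() const { return rows_.size() - pos_; }
private:
    std::vector<std::string> rows_;
    std::size_t pos_;
};

// Reader that pulls up to batch_size rows at a time from the source, so a
// small diff is drained from the database in a single burst.
class BufferedDiffReader {
public:
    BufferedDiffReader(ToyRowSource& source, std::size_t batch_size) :
        source_(source), batch_size_(batch_size)
    {
        assert(batch_size_ > 0);
    }
    bool getNextDiff(std::string& row) {
        if (buffer_.empty()) {
            refill();
        }
        if (buffer_.empty()) {
            return false;           // source exhausted
        }
        row = buffer_.front();
        buffer_.pop_front();
        return true;
    }
private:
    void refill() {
        std::string row;
        while (buffer_.size() < batch_size_ && source_.getNext(row)) {
            buffer_.push_back(row);
        }
    }
    ToyRowSource& source_;
    const std::size_t batch_size_;
    std::deque<std::string> buffer_;
};
```

If the whole diff fits in one batch, the underlying statement could be finalized right after the first refill, which is the point at which the lock would be released.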

Finally, we need to implement getRecordDiffs() for specific derived classes (right now it's SQLite3Accessor).

SQL(ite3) Schema

This is an example schema and retrieval statement for SQL based databases that can be used as a backend of the higher level interfaces above. The specific example is intended to be used for SQLite3 as that's the only database backend we support as of this writing, but it should be applicable to other SQL databases with very few or no modifications, too.

While we may begin with this example schema and statement in the actual development, there may be better ones in terms of performance or clarity (see also the open issues section below), and we may change the details as we gain more experience.

We store the difference sequences in a separate database table named "diffs". It's in the same database as BIND 10's other DNS related tables such as "zones" and "records". We store them as a relatively straightforward mapping from the higher level representation, i.e., IXFR-style diffs. This is the schema of the diffs table:

table diffs (id integer primary key,
             zone_id integer not null,
             version integer not null,
             operation integer not null,
             name string not null collate nocase,
             rrtype string not null collate nocase,
             ttl integer not null,
             rdata string not null)

id
    A unique, monotonically increasing integer ID of each diff. This column is filled by the database automatically.
zone_id
    The ID of the zone of the diff. It's identical to the ID of the zone in the "zones" table.
version
    The SOA serial of the zone version the row belongs to: the pre-transaction serial for deleted RRs, and the post-transaction serial for added RRs.
operation
    Either 0 (add) or 1 (delete)
name
    Textual representation of the owner name of the updated RR, e.g. "example.com."
rrtype
    Textual representation of the RR type of the updated RR, e.g. "A", "NS", "AAAA"
ttl
    The TTL of the updated RR
rdata
    Textual representation of the RDATA of the updated RR, e.g. "192.0.2.1" (for A RR)

Then the stored database rows corresponding to the example difference sequence in the "Top Level Diff Representation" section would look like this (assume "zone_id" for example.com is 10):

ID ZID  ver op  name            rrtype
1,  10,   1, 1, "example.com.", "SOA",...
2,  10,   1, 1, "a.example.com.", "A", ...
3,  10,   2, 0, "example.com.", "SOA", ...
4,  10,   2, 0, "b.example.com.", "A", ...
5,  10,   2, 0, "c.example.com.", "AAAA", ...

Note: The need for the ID column may not be obvious. It is necessary to identify the correct rows in the expected order on retrieval; we cannot naively use the version (SOA serial) values because serials can wrap around, and without some monotonically increasing identifiers we cannot identify the serials between a given range. The actual realization may not have to be exactly the same as this one, however, as long as the higher level requirement can be met.

SQL(ite3) Statement to Retrieve Diffs

To retrieve all the diffs between two given versions (B and E) for a zone whose zone_id is Z, we'll execute:

select * from diffs where
  zone_id = Z and
  id >= (select id from diffs where zone_id = Z and version = B
         and operation = 1 order by id asc limit 1)
  and
  id <= (select id from diffs where zone_id = Z and version = E
         and operation = 0 order by id desc limit 1)
  order by id asc;

Note about version: since versions need to be compared using serial number arithmetic (RFC 1982) and can wrap around, we cannot simply compare version values numerically. This is why the statement selects the range by id rather than by version.
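For reference, the RFC 1982 "less than" relation for 32-bit serials can be written compactly with unsigned arithmetic. The function name serialLessThan() is hypothetical; the sketch only illustrates why a plain numeric comparison in SQL would give wrong results around the wrap-around point.

```cpp
#include <cassert>
#include <cstdint>

// RFC 1982 comparison for 32-bit serial numbers: s1 is "less than" s2 if
// they differ and s2 is ahead of s1 by less than 2^31, modulo 2^32.
// When the distance is exactly 2^31 the relation is undefined in RFC 1982;
// this function returns false in both directions for that case.
bool serialLessThan(std::uint32_t s1, std::uint32_t s2) {
    const std::uint32_t distance = static_cast<std::uint32_t>(s2 - s1);
    return s1 != s2 && distance < 0x80000000u;
}
```

With this relation serial 4294967295 is less than serial 0, the opposite of what an integer comparison on the version column would report.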

Partly because of this the statement is complicated and involves multiple subqueries (which might be less efficient). There may be alternate approaches; see the "Open Issues" section.
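The selection logic of the statement can be mirrored in plain C++ to check how it behaves when serials wrap around. This is a sketch only: DiffRow and selectDiffs() are hypothetical, the table is assumed to be ordered by id, and both boundary lookups are restricted to the given zone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum DiffOperation { DIFF_ADD = 0, DIFF_DELETE = 1 };

// Hypothetical in-memory row of the diffs table (ttl and rdata omitted).
struct DiffRow {
    int id;
    int zone_id;
    std::uint32_t version;
    DiffOperation operation;
    std::string name;
    std::string rrtype;
};

// Return the rows of one zone from the first "delete" row of version
// `begin` up to the last "add" row of version `end`, in id order.
std::vector<DiffRow> selectDiffs(const std::vector<DiffRow>& table,
                                 int zone_id,
                                 std::uint32_t begin, std::uint32_t end)
{
    int first_id = -1;  // smallest id with version == begin, op == delete
    int last_id = -1;   // largest id with version == end, op == add
    for (std::size_t i = 0; i < table.size(); ++i) {
        const DiffRow& row = table[i];
        if (row.zone_id != zone_id) {
            continue;
        }
        if (first_id < 0 && row.version == begin &&
            row.operation == DIFF_DELETE) {
            first_id = row.id;
        }
        if (row.version == end && row.operation == DIFF_ADD) {
            last_id = row.id;
        }
    }

    std::vector<DiffRow> result;
    if (first_id < 0 || last_id < 0) {
        return result;  // one of the versions is not in the journal
    }
    for (std::size_t i = 0; i < table.size(); ++i) {
        const DiffRow& row = table[i];
        if (row.zone_id == zone_id &&
            row.id >= first_id && row.id <= last_id) {
            result.push_back(row);
        }
    }
    return result;
}
```

Because the range is expressed in ids, a journal covering serials 4294967295, 0 and 1 is handled the same way as one covering 1, 2 and 3.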

Open Issues

There are several open issues in initial discussions. Those include:

  • Specific SQL schema and retrieval statement. As noted above, there may be a better way than the example schema.
  • Text vs binary: this schema and low level interface assumes diffs are stored in the form of columns of strings (and perhaps a few integers). But we might want to use binary representation for the RR data for performance reasons.
  • There was a question whether we want to fix the capability of journaling on a per data source (client) basis rather than on a per updater basis.
  • Housekeeping: We'll soon need to consider how to clean up diffs that are too old. Also, to make it independent from data source details, we'll probably need an API to delete diffs.

See details in the relevant mailing list discussions on bind10-dev: https://lists.isc.org/pipermail/bind10-dev/2011-October/002693.html and https://lists.isc.org/pipermail/bind10-dev/2011-October/002714.html

Last modified on Nov 16, 2011, 10:48:29 AM