wiki:ScalableZoneLoadDesign

1. Introduction

This is an initial proposal for making in-memory zone maintenance scalable with respect to reloading and updating. The most important point for the near-future tasks is to make zone loading (and releasing the old version) asynchronous, so it can be done while the auth server is still responding to queries using the old version of the zone. This proposed design also takes into account future extensions where we switch to a shared-memory based approach for even higher scalability, so we can migrate to the new approach (or use both) without changing the application implementation heavily (or, ideally, at all).

The proposed design introduces a set of new classes and revises existing ones to meet these goals, and explains how they work. (This is not a full description; some definitions are omitted or left open, and some notations are informal).

2. Memory Segments

In order to allow applications to get access to in-memory image that could be managed in different ways (locally in the application or in a shared-memory area, etc) transparently, we introduce the concept of "memory segment". (The term is borrowed from the boost "managed memory segments".)

We define an abstract base class to get to an entry point (called "header") of some memory image regardless of the underlying memory management implementation. The following diagram shows a class hierarchy with a base class and some example derived classes for a header of type "ZoneTableHeader" (we'll use this concept more specifically below, but for now its details are not important).

ZoneTableSegment, the base class, has a pure virtual method named getHeader(), which somehow returns a pointer to an object of type ZoneTableHeader:

class ZoneTableSegment {
public:
    // a virtual destructor so derived objects can safely be deleted
    // via a base class pointer
    virtual ~ZoneTableSegment() {}
    virtual ZoneTableHeader* getHeader() = 0;
};

This example diagram gives three possible derived classes:

  • LocalZoneTableSegment: a memory segment object for a memory region managed in the normal way, locally in the application
  • ShMemZoneTableSegment: a memory segment object for a memory region managed in a shared-memory area of the system
  • MmapZoneTableSegment: a memory segment object for a memory region mapped from a file using mmap(2)

The implementation of LocalZoneTableSegment could be as simple as follows:

class LocalZoneTableSegment : public ZoneTableSegment {
public:
    virtual ZoneTableHeader* getHeader() {
        return (&header_);
    }
private:
    ZoneTableHeader header_;
};

Using boost::interprocess::managed_shared_memory, a possible implementation of ShMemZoneTableSegment would be:

// (we assume the relevant boost::interprocess names are in scope)
class ShMemZoneTableSegment : public ZoneTableSegment {
public:
    ShMemZoneTableSegment(const char* const shm_name) :
        shm_segment_(open_only, shm_name)
    {}
    virtual ZoneTableHeader* getHeader() {
        return (shm_segment_.find<ZoneTableHeader>("header").first);
    }
private:
    managed_shared_memory shm_segment_;
};

Here we assume the header is "named" in this shared memory segment with the name "header".

3. Zone Table Segment and Zone Segments

We store zones for a single data source in memory using the following two sets of memory segments:

  • Zone table segment: it contains a conceptual "table" of zones
  • Zone segments: a set of memory segments, each storing data for a set of zones.

The zone table is essentially a map from domain names to "zone locators" that allows longest matching search. A zone locator is a position-independent structure that can be used to get access to the zone data of a specific zone stored in one of the zone segments. In practice, it's a pair of integer indexes:

struct ZoneLocator {
    uint32_t segment_id; // ID of the zone segment that contains this zone
    uint32_t zone_id; // ID of the zone within that zone segment
};

The following diagram shows an example of a zone table segment and zone segments for an in-memory data source that contains the "example.org" zone.

The header of the zone table segment includes an (offset) pointer to the actual table data in memory; the header of each zone segment includes an (offset) pointer to an array of (offset) pointers for the list of zones contained in the segment, each of which points to the actual data of the corresponding zone.

In this example, the "example.org" zone is stored in the n-th zone segment, with zone ID m within that segment.

4. Using Segments from In-Memory Data Source Client

The datasrc InMemoryClient class, which currently directly maintains the zone table and zone data in local memory, will become a lightweight accessor class to the corresponding memory regions via segment objects.

A conceptual implementation of the revised version of this class would be as follows:

class InMemoryClient : public DataSourceClient {
public:
    // Construct with a set of memory segments, and make shortcut
    // pointers to heavily used data.
    InMemoryClient(shared_ptr<ZoneTableSegment> table_segment,
                   vector<shared_ptr<ZoneSegment> >& zone_segments);
    // ...
private:
    shared_ptr<ZoneTableSegment> table_segment_;
    ZoneTable* table_;
    struct ZoneSegmentArgs {
        shared_ptr<ZoneSegment> segment_;
        offset_ptr<ZoneData>* zones_;
    };
    ZoneSegmentArgs* zone_segments_;
};

The relationship between a revised client object, memory segments, and data stored in memory can be summarized as follows:

Assuming the number of zone segments is reasonably small (up to several thousand, say), what the constructor does is now quite cheap (just copying a reasonable number of shared pointers and getting a similar number of memory addresses via the segments), and should be effectively non-blocking.

A conceptual implementation of findZone would be as follows:

FindResult
InMemoryClient::findZone(const Name& name) {
    // find() conceptually returns a (result code, zone locator) pair
    result, locator = table_->find(name);
    zone_data =
        zone_segments_[locator.segment_id].zones_[locator.zone_id].get();
    return (FindResult(result,
                       ZoneFinderPtr(new InMemoryZoneFinder(zone_data))));
}

For "reloading" a zone, the client object is simply given a (pointer to a) new zone segment object that should contain the new version of the zone data (building it in the corresponding memory regions can be a time-consuming task, but that has been done elsewhere). The client can then replace its zone_segments_ entry with the new pointer:

void
InMemoryClient::updateZoneSegment(uint32_t seg_id,
                                  shared_ptr<ZoneSegment> new_zone_seg)
{
    // with the following, the zone data are "magically" swapped.
    zone_segments_[seg_id].zones_ = new_zone_seg->getHeader()->zones.get();
    zone_segments_[seg_id].segment_ = new_zone_seg;
}

5. Memory Event Handler

Now that InMemoryClient is not responsible for building/updating the zone table and zone data, we need to provide a separate interface for this task. Also, the different ways of memory management (in particular, whether it's local or shared) have different requirements. For example, if the memory is managed locally, it needs to support asynchronous build and updates so other important services (such as responding to queries) won't be suspended too long; if the memory is managed in a different process, the user-side application needs to communicate with the manager process.

The MemoryEventHandler class is an abstract base class that encapsulates these details so that the application can use a unified interface without worrying about them. A concrete object of this class receives various events that can affect memory management (a configuration update, an update notification on shared data, etc.) and generates new memory segments that the application should use to follow up on the events.

The following diagram shows three possible derived classes, clarifying some of the underlying details:

The LocalMemEvHandler class manages the necessary memory locally.

The ShMemEvHandler and MmapEvHandler are expected to be used for shared memory segments and mmap'ed segments, respectively. When these are used, a separate process (tentatively named "b10-memmgr") is expected to manage the actual memory, and the event handler classes will communicate with this process to generate corresponding memory segment objects.

6. Authoritative Server Usage

The new and revised classes described so far are primarily intended to be used by the authoritative server program (b10-auth). The following diagram summarizes the relationship between the main application (b10-auth) and these class objects (and the memory image that stores zone data).

b10-auth holds an object of (a derived class of) the MemoryEventHandler class, has it create memory segment objects based on its underlying memory management technique, and uses it to create an InMemoryClient object (probably via a ClientList, but for simplicity we directly refer to the client object here). The InMemoryClient internally holds the given memory segment objects and uses them to get access to the memory image that stores the zone data.

7. Expected Sequences for Common Operations

Finally, we show sequence diagrams clarifying how the objects interact with each other in terms of memory management for common operations on in-memory zone data.

We mainly focus on locally managed memory as that will be used in the first implementation.

7.1 Initial Loading

On the initial startup, the zone building process can be blocking. auth instantiates (in a polymorphic way via some factory, based on the configuration) a local MemoryEventHandler object with the data source configuration. It builds all zone data, possibly contacting other data sources that may store the source of the data (or just loading textual zone files), creates the zone table and zone memory segments, and returns them to auth. auth uses these to create an in-memory data source client. At this point, incoming queries can be handled. Lookups will be performed in memory via the memory segments and responded to immediately.

Note: the fact that memory event handler needs to have data source config may look awkward, but since it's emulating an external memory management component, it should be actually reasonable. Basically, unless this configuration has a very long list of specific zones, having a duplicate copy wouldn't be that costly; for environments that would require a very long list we'd need a shared-memory type approach.

7.2 Updating a Zone

This is a scenario where an incoming IXFR makes partial updates to a zone (the same sequence applies to DDNS). Before the update happens, auth is handling queries from in-memory data, referring to the pre-update zone. In the IXFR session, xfrin records the differences in the underlying DB. When it completes, xfrin notifies auth of the update. auth forwards the notification to the memory event handler, which recognizes that the update can be completed immediately, so it retrieves the diff from the DB, applies it to its local memory data, and returns the "new" zone segment object (in the local memory case, it would actually be the same object). This works synchronously; applying a set of diffs should basically be a lightweight task, and query handling is suspended for that period. auth then passes the "new" zone segment to the in-memory client, and the client resets its attribute. At this point new queries can be handled again, and will be responded to from the updated zone data.

7.3 Replacing a Zone

In this case, it's zone replacement after an AXFR. After xfrin completes the xfr session, it notifies auth of the update, and auth forwards it to the memory event handler. The handler recognizes this can be a time-consuming task (because it's a full zone replacement), so while it's building the new version of the zone data, it keeps the old data intact and returns a continuation context to auth. auth keeps handling queries still using the old data, and periodically resumes the task at the memory event handler using the given context. Every time the task is resumed, the handler builds the new data incrementally, and when it completes, it swaps the pointer so further lookups will refer to the new version of the data. Since releasing the memory used for the old version can also take time, the event handler starts that process and returns a new continuation context to auth, along with the zone memory segment for the new version. auth gives the in-memory client the new segment object, and subsequent queries will be responded to using the new version of the zone. The releasing process keeps going incrementally, and eventually completes at the event handler.

7.4 Updating Data Source Configuration

In this scenario, something is changed in the data source configuration, and cfgmgr tells auth about the update. This case is similar to initial loading in some sense (it could be optimized so that unchanged zones existing in both the old and new config won't be rebuilt, but that's beyond the scope of this discussion), but since auth is handling queries at the same time, building the new data must be done asynchronously, as in the case of zone replacement. When the entire new version of the configuration is in memory, the event handler gives auth the whole set of new table and zone segments, and auth gives them to the in-memory client. The client replaces all segments with the new set, and subsequent queries will be responded to from the new data.

7.5 Replacing a Zone Using Shared Memory

This scenario shows a possible future extension where we support shared-memory type zone management. In this case, we have a new "memory manager (memmgr)" process, which plays the role of the local memory event handler of the previous cases. xfrin notifies memmgr (instead of auth) of the update, and when memmgr completes its task, it tells auth. While memmgr is working, auth just keeps handling queries using the old version of the data. When auth gets the notification from memmgr, it forwards it to the memory event handler, which in this case should be of the shared-memory derived class. The event handler's task shouldn't be time-consuming - it's basically a single call to open a shared memory segment - so it's done synchronously. The memory event handler returns the new zone segment (in this case it should really be a new one), and auth gives it to the in-memory client.

Last modified on Jun 22, 2012, 8:48:27 AM
