wiki:HADesign

High Availability in Kea 1.4.0 - Design

Migrated to Gitlab, please update here: https://gitlab.isc.org/isc-projects/kea/wikis/implemented-designs/High-Availability-Design

Supported Topologies

This section describes HA topologies which are to be supported by Kea running in different HA configurations. The most common topologies involve two servers running in load balancing or hot-standby mode and their variations. The servers utilize the control channel to communicate with each other and don't need to use SQL database replication to synchronize the lease database. Therefore, this solution works for any lease database backend, although Memfile will have the best performance as it doesn't require communication with the external database. The performance impact is important because the Kea servers send lease updates synchronously.

It is also possible to configure Kea to use database replication, in which case the Kea servers do not perform lease synchronization themselves. However, the servers still send heartbeat messages to each other to detect failures of their peers, i.e. failures of the DHCP server process rather than a database failure.

Some of the topologies presented below include more than two DHCP servers. The extra servers are called backup servers; they receive lease updates from the load balancing and/or hot-standby servers. The primary reason for using a backup server is to maintain yet another copy of the lease database. It is also possible to use the backup server to respond to DHCP traffic if the other servers have died, but this requires manual intervention by the administrator via the control channel.

Load Balancing

In this configuration, two DHCP servers run at the same time and both service DHCP requests from clients. Both servers use the same algorithm for choosing the server responsible for processing a given DHCP request. The algorithm is applied to one of the client identifiers, e.g. MAC address or circuit-id, and thus gives a stable result across many requests sent by the same client. A server which discovers that it is not designated to process a packet drops it, as long as the other server is operational; it processes such a packet only if it detects that its partner is offline.

The picture above shows three clients, two of which are served by server 1. Client 3 is served by server 2. Both servers exchange lease updates synchronously, so that each server has full information about the leases allocated by the entire system and can start serving all clients when its partner crashes.

This topology provides high availability for the DHCP service within a site where it is deployed and protects against crashes of one of the servers by directing the entire DHCP load onto the surviving server. It provides means for automated detection of failures of the partner and synchronizing the lease databases when the server comes back online after the failure.

The following is an example configuration for High Availability in the load balancing case.

{
    "high-availability":  [
        {
            "this-server-name": "server1",
            "mode": "load-balancing",
            "heartbeat-interval": 10,
            "max-response-delay": 60,
            "max-ack-delay": 10,
            "max-unacked-messages": 10,
            "peers": [
                {
                    "name": "server1",
                    "url": "http://server1.example.org:8080/",
                    "role": "primary", 
                    "auto-failover": true
                },
                {
                    "name": "server2",
                    "url": "http://server2.example.org:8080/",
                    "role": "secondary",
                    "auto-failover": true
                }
            ]
        }
    ]
}

The structure above is included in the configurations of all HA peers. This means that every HA peer knows the configurations of all of the peers, including itself. The this-server-name parameter specifies a name of the particular server instance, so that the instance can distinguish between its own HA configuration and other peers' configurations. It is possible to omit the url parameter for the server to which the given configuration refers; if this parameter is specified, it is ignored by that server instance.

The role parameter indicates if the server is primary or secondary. This has no effect on the server during normal operation as both servers play identical roles in handling the DHCP traffic directed to them. The reason for adding this parameter into the load balancing configuration is to resolve potential deadlocks between the state machines of the two servers waking up after a shutdown or failure. The primary server takes the lead in performing certain tasks such as lease database synchronization, while the secondary server waits for the primary to complete before it does the same.

The auto-failover parameter indicates whether the server should automatically take over the traffic directed to a partner which is considered down. This only has an effect for load balancing and standby servers. If this value is set to false, the server administrator must send the ha-service-scopes command to trigger failover.
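
With auto-failover set to false, the administrator would trigger the failover with a command similar to the following; only the command name comes from this design, while the argument layout is our illustrative assumption:

```json
{
    "command": "ha-service-scopes",
    "service": [ "dhcp4" ],
    "arguments": {
        "scopes": [ "server1", "server2" ]
    }
}
```

The scopes list would tell the surviving server to start serving the traffic normally designated to both peers.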

The mode parameter instructs both servers to operate in the load balancing mode.

The next set of parameters specify values according to which the server detects if its partner is offline:

  • heartbeat-interval - specifies an interval between the last response from the partner over the control channel and the next heartbeat message to be sent to the partner. This value must be greater than 0.
  • max-response-delay - specifies a maximum time between two successful responses from the partner over the control channel. When this time is exceeded, the server assumes that communication with the partner is interrupted, but it continues to examine DHCP traffic directed to the partner to see if the partner responds to queries from clients. This value must be greater than 0.
  • max-ack-delay - specifies the maximum time for a client trying to communicate with a DHCP server to complete the transaction. This is verified by checking the value of the secs field in DHCPv4 and the elapsed time option in DHCPv6. This value may be set to 0 if the max-unacked-messages value is also set to 0.
  • max-unacked-messages - specifies the number of consecutive client DHCP messages not responded to by the partner within the time of max-ack-delay. This value may be set to 0, which effectively disables the mechanism of tracking DHCP messages directed to the partner. In such a case, the server may transition to the partner-down state as soon as it discovers that communication with the peer over the control channel has failed.

The server sends heartbeat messages according to the specified interval. If the partner doesn't respond to commands over the control channel (including periodic heartbeat messages) for longer than max-response-delay, the server starts examining the traffic directed to the partner by checking the values in the secs field (DHCPv4) or elapsed time option (DHCPv6) and comparing them with max-ack-delay. If this value is exceeded for a message, the message is considered unanswered. If more than max-unacked-messages unanswered messages are detected, the server assumes that the partner is offline. In this case, the server can either automatically start processing packets directed to the partner, or signal the partner's failure to the monitoring system.
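
The decision logic described above can be sketched as follows; the function and parameter names here are illustrative, not actual Kea identifiers:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of the failure detection rules:
// - comm_interrupted: no response over the control channel for longer
//   than max-response-delay
// - unacked: number of partner-directed DHCP messages for which the secs
//   field (DHCPv4) or elapsed time option (DHCPv6) exceeded max-ack-delay
// - max_unacked_messages: the configured max-unacked-messages value
bool partnerDown(bool comm_interrupted, uint32_t unacked,
                 uint32_t max_unacked_messages) {
    // As long as the partner responds over the control channel it is
    // considered online, regardless of the DHCP traffic.
    if (!comm_interrupted) {
        return (false);
    }
    // A value of 0 disables DHCP traffic tracking: the control channel
    // failure alone may move the server to the partner-down state.
    if (max_unacked_messages == 0) {
        return (true);
    }
    // Otherwise, more than max-unacked-messages queries must remain
    // unanswered before the partner is presumed offline.
    return (unacked > max_unacked_messages);
}
```

For example, with max-unacked-messages of 10, the partner is presumed offline only after the control channel has been silent for max-response-delay and the 11th unanswered DHCP query is seen.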

Hot Standby

This is another configuration involving a pair of servers. One of the servers is designated as primary; the other is a hot-standby server. The primary server processes the entire DHCP traffic and synchronously sends lease updates to the standby server. The standby server doesn't process any DHCP traffic as long as the primary server is online; it receives lease updates and heartbeat messages from the primary and responds to these commands. If the primary server finds that the standby doesn't respond to control commands, it signals that to the monitoring system.

The standby server detects the failure of the primary in the same way as for the load balancing case. When the standby server detects that the primary is offline, it takes over the entire DHCP traffic directed to the system.

The following is the sample configuration for the hot standby mode:

{
    "high-availability":  [
        {
            "this-server-name": "server1",
            "mode": "hot-standby",
            "heartbeat-interval": 10,
            "max-response-delay": 60,
            "max-ack-delay": 10,
            "max-unacked-messages": 10,
            "peers": [
                {
                    "name": "server1",
                    "url": "http://server1.example.org:8080/",
                    "role": "primary"
                },
                {
                    "name": "server2",
                    "url": "http://server2.example.org:8080/",
                    "role": "standby"
                }
            ]
        }
    ]
}

In this configuration, an additional parameter, role, has to be specified to indicate which server is the primary and which one is the standby. It is not allowed to specify the same value for both servers, e.g. two primary servers or two standby servers. Optionally, a backup role can be used instead of standby. The difference is that the backup server doesn't automatically detect failures of the primary. The backup server can take over the DHCP service in case of the primary server's failure, but this needs to be triggered manually.

Also, when the primary server comes back online after a failure, it doesn't automatically synchronize its database with the backup server, nor does it start the DHCP service automatically. The primary has to be manually instructed to synchronize the database with the specified backup server, and then the DHCP service must be enabled.
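
Such a manual recovery might be driven by control commands along the following lines; the command name and argument layout shown here are illustrative assumptions rather than a confirmed syntax:

```json
{
    "command": "ha-sync",
    "service": [ "dhcp4" ],
    "arguments": {
        "server-name": "server2",
        "max-period": 60
    }
}
```

Once synchronization completes, the DHCP service would be enabled with a separate command (e.g. a dhcp-enable style command).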

Multi-site Configuration

In the classic failover case, it is possible to configure two DHCP servers in different sites such that server1 is the primary server for site 1 and a standby server for site 2. Similarly, server2 is the primary server for site 2 and a standby server for site 1. In Kea 1.4.0 it won't be possible to configure a server to participate in multiple relationships, i.e. the server can't be a primary and a standby server at the same time.

To achieve the equivalent setup with Kea HA, it will be necessary to deploy two DHCP servers per site: one being the primary server for this site and the second being a standby server for the other site.

Multiple Servers

It is possible to configure more than two servers to perform HA, as shown in the picture below.

In this topology, server 1 and server 2 operate in the hot-standby mode. There is a third server configured as backup, which means that it receives all updates from the server responding to DHCP queries, but it doesn't react to the other servers' failures. The system must not be configured with more than one standby server, because the standby servers would have no means to determine which of them should react to the failure. The backup server can be activated manually by the system administrator if required, e.g. when both server 1 and server 2 crash.

It must be noted that introducing additional backup servers will have a negative impact on the performance of the active server, because it will have to send synchronous lease updates to multiple locations.

Another variation of this setup is when there is no standby server, but only backup servers. Any of those servers can be activated manually in the case of the active server's failure.

Finally, the backup servers can be added to the load balancing configuration. In this case, both load balancing servers will be sending lease updates to the backup servers apart from sending lease updates to each other.

Central Lease Database

All use cases presented so far can be used in conjunction with any supported lease database backend. If the external database is in use, e.g. MySQL, it is possible to use database replication capabilities to provide high availability of the DHCP service.

In this scenario, both DHCP servers talk to the same database over a virtual IP address. All updates to the database are replicated to the slave database instance. The DHCP servers can be configured to operate in the load balancing, hot-standby, or active-backup mode. When the master database crashes, MySQL requests are directed to the slave database instance. A failure of any of the Kea servers is handled as usual, but the database synchronization is not performed by Kea.

When one of the servers (primary, standby, or load balancing) comes back online after a failure, it follows similar procedure to enable the DHCP service as described in previous sections, except that it doesn't synchronize the lease database with the peers. It transitions directly to the ready state (bypassing syncing state).

The HA configuration will look similar to this:

{
    "high-availability":  [
        {
            "this-server-name": "server1",
            "mode": "load-balancing",
            "send-lease-updates": false,
            "heartbeat-interval": 10,
            "max-response-delay": 60,
            "max-ack-delay": 10,
            "max-unacked-messages": 10,
            "peers": [
                {
                    "name": "server1",
                    "url": "http://server1.example.org:8080/",
                    "auto-failover": true
                },
                {
                    "name": "server2",
                    "url": "http://server2.example.org:8080/",
                    "auto-failover": true
                }
            ]
        }
    ]
}

The new parameter send-lease-updates must be set to false to indicate that the server must neither send lease updates nor perform lease database synchronization, because this is handled internally by the external lease database.

Subnets and Pools Configuration for HA

The following subnet configuration includes two pools. One of the pools is only allowed for packets classified as belonging to "server1"; the other is only allowed for packets belonging to "server2". When a packet is received, the load balancing algorithm selects a candidate class for it; the class is assigned to the packet only if it matches the value specified in this-server-name. If the candidate class doesn't match the local server name, no class is assigned. A packet which is assigned neither the "server1" nor the "server2" class will not be processed by the server, i.e. the server will not be able to locate a pool for it, because all specified pools require a class.

"subnet4": [
    {
        "subnet": "192.0.2.0/24",
        "pools": [
            {
                "pool": "192.0.2.1 - 192.0.2.50",
                "client-class": "server1"
            },
            {
                "pool": "192.0.2.51 - 192.0.2.100",
                "client-class": "server2"
            }
        ]
    }
]

Similarly, it is possible to split subnets, rather than pools, between the HA peers. In such a case, the client classes should be specified at the subnet scope.

"subnet4": [
    {
        "subnet": "192.0.2.0/24",
        "client-class": "server1",
        "pools": [
            {
                "pool": "192.0.2.1 - 192.0.2.100",
            }
         ]
    },
    {
        "subnet": "10.1.1.0/24",
        "client-class": "server2",
        "pools": [
            {
                "pool": "10.1.1.1 - 10.1.1.100"
            }
         ]
    }
]

Finally, it is also possible to split shared networks between different HA peers.

The use of classes is only required in the load balancing setup. In the hot-standby or backup modes there is no need to select pools (or subnets) by classes, because there is always exactly one server handling DHCP. This server can use all available pools.

Configuring the Server to Pause State Machine

The server which starts up after a downtime may require some administrative actions prior to starting normal operation, e.g. prior to lease database synchronization. For instance, an administrator may want to restore the lease database from some external source before the server attempts lease database synchronization with its partner. In this case, the administrator should be able to pause the HA state machine in the initial, i.e. waiting, state to restore the lease database. Then, the administrator may allow normal HA operation by sending a command to un-pause the state machine. Pausing the state machine should be possible without the need to send a command, because the timing of such a command would be critical. It is more reliable to provide a configuration parameter that pauses the state machine in the given state and makes it wait for further action.

It is envisaged that there will be other use cases in the future which will require HA state level configuration. Therefore, a state level configuration scope, state-machine, is introduced to allow for specifying in which states the server should pause the state machine. For each state it is also possible to specify whether the state machine should be paused only once or every time the state is entered. Additional parameters controlling per state behavior (not necessarily related to pausing) may be added to the state-machine configuration scope at a later time.

The following configuration snippet demonstrates how to configure state machine pausing in two states:

{
    "high-availability":  [
        {
            "this-server-name": "server1",
            "mode": "load-balancing",
            "send-lease-updates": false,
            "heartbeat-interval": 10,
            "max-response-delay": 60,
            "max-ack-delay": 10,
            "max-unacked-messages": 10,
            "peers": [
                {
                    "name": "server1",
                    "url": "http://server1.example.org:8080/",
                    "auto-failover": true
                },
                {
                    "name": "server2",
                    "url": "http://server2.example.org:8080/",
                    "auto-failover": true
                }
            ],
            "state-machine": {
                "states": [
                    {
                        "state": "waiting",
                        "pause":  "once"
                    },
                    {
                        "state": "ready",
                        "pause":  "always"
                    },
                    {
                        "state": "partner-down",
                        "pause":  "never"
                    }
                ]
            }
        }
    ]
}

According to this configuration, the server will pause the state machine in the waiting state once (upon startup) and will pause the state machine every time the server enters the ready state. The server should never be paused in the partner-down state, which is the default setting. Therefore, the partner-down specific configuration could be omitted in this example, but it is provided to demonstrate all current configuration options.

In the future we may also allow specifying a maximum time for the server to remain paused in the given state. When this time elapses, the state machine will be automatically un-paused. However, there is no plan to implement this mechanism at this time (Kea 1.5 timeframe).

Load Balancing

Load balancing is performed in the HA hooks library, in the pkt4_receive or pkt6_receive callout. Each server applies a hash to the MAC address, as described in RFC 3074, to determine which of the servers should service the client's request. When two peers are involved, we're going to assume that odd hash bucket values will be serviced by the first server ("server1" in the above example) and even hash bucket values will be serviced by the second server ("server2"). If the server finds that it is responsible for a packet, it will add a client class with the server name to the packet. The server will assign addresses from the pools (or subnets) dedicated to this server.
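
The odd/even bucket convention can be illustrated with the following minimal sketch. Note that a real implementation must use the Pearson-style hash specified in RFC 3074; a trivial additive hash stands in here just to keep the example short, and the function names are ours:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for the RFC 3074 hash: sums the MAC address octets
// modulo 256 to produce a hash bucket value.
uint8_t toyHash(const std::vector<uint8_t>& mac) {
    uint8_t h = 0;
    for (const uint8_t b : mac) {
        h = static_cast<uint8_t>(h + b);
    }
    return (h);
}

// Odd hash buckets are serviced by "server1", even buckets by "server2",
// following the convention described in the text.
std::string designatedServer(const std::vector<uint8_t>& mac) {
    return ((toyHash(mac) % 2) == 1 ? "server1" : "server2");
}
```

A callout would compare the returned name with this-server-name and either add the matching client class to the packet or, if the partner is operational, drop the packet.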

If the server determines that the given packet should be processed by its peer, it must check whether the peer is online or offline. This information should be available in the hooks library and is set according to the algorithm described in the "Failure Detection" section. If the peer is deemed to be offline, the server takes responsibility for processing the current packet. In this case, it uses the peer's name (e.g. "server2") as the client class name. That way, the server will process the packet normally belonging to its peer and will use the resources (pools, subnets) for which the peer is normally responsible.

When the server finds that the packet should be processed by a partner and the partner appears to be functional, the callout returns DROP status code. The server will subsequently drop this packet.

Communication Between Peers

Kea 1.3.0 exposes a RESTful control interface via the Control Agent, a separate process forwarding control commands to the respective Kea services. The communication between the Control Agent and the DHCP servers is performed using UNIX domain sockets. The RESTful interface seems to be a good fit for communication between failover peers, because it already implements certain commands required for HA, e.g. lease queries. It also allows for sending and receiving long chunks of data over TCP, which is required for a bulk update of leases between two peers during recovery after a failure.

The following picture shows Kea processes involved in communication.

Each Kea instance contains a Control Agent and a DHCP server of the given type. Lease updates are generated by the DHCP server and sent directly to the Control Agent of the partner, which forwards acknowledgments to the lease updates over the same link. The local Control Agent instance is not involved in sending these updates; it is only used to receive lease updates from the partner.

The following subsections describe extensions required for the Kea code to facilitate communication between HA peers.

Extensions to RESTful API

In Kea 1.3.0 a new HTTP connection is opened for each control command. When the server detects the end of the command, or a timeout occurs, the server closes the connection. This works fine for the typical case when the RESTful API is merely used for updating the server configuration and the commands are sent rather rarely. In the HA case, we're planning to synchronously notify the peers about lease allocations. Under heavy load, there might be hundreds or thousands of lease updates per second sent between the peers. In such a case, establishing a new TCP connection for each lease update is not an option. Therefore, the RESTful API has to be extended to support HTTP 1.1 persistent connections, i.e. connections are persistent by default and are only closed when specifically signalled by the client or the server. The server may choose to close the connection after a certain period of client inactivity. In such a case, the connection can be re-established when necessary, e.g. when a lease update has to be delivered to a HA peer or when a heartbeat message needs to be sent.

See w3.org for the details of persistent connections in HTTP 1.1. There are no plans to support HTTP 1.0 "keep-alive" connections at this point. The HA peers will always use HTTP 1.1 for communication.
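
For illustration, a heartbeat sent over such a persistent connection might look like the following; the ha-heartbeat command name and the exact headers shown here are assumptions for the sake of the example:

```http
POST / HTTP/1.1
Host: server2.example.org:8080
Content-Type: application/json
Content-Length: 53

{ "command": "ha-heartbeat", "service": [ "dhcp4" ] }
```

Note the absence of a Connection: close header: in HTTP 1.1 the connection stays open by default, so subsequent heartbeats and lease updates reuse the same TCP connection.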

Client Side Communication

The libkea-http library contains generic classes and functions managing server side HTTP connections, HTTP requests parsing, generating responses etc. The RESTful API implementation is created using this library.

The communication between the HA peers requires client side HTTP support as well. Kea 1.3.0 does provide the kea-shell application for sending control commands over the RESTful API, but it is not going to be practical to use this application for lease updates. One reason for this is that we want to have control over the established connections. Another reason is to avoid receiving long responses (such as bulk lease queries) through the shell application. Thus, we plan to extend the libkea-http library with new classes and functions to establish client side connections over the RESTful API, send HTTP 1.1 requests, and receive and parse responses. This library already contains useful C++ classes that may be extended for this purpose. The client side connections can be implemented using the libkea-asiolink library.

The lease update requests will be generated in a hooks library (HA hooks library) attached to the DHCP server process. This means that each DHCP server will act as a "controlling client" to its peers. The peers will receive lease updates over their instances of the Control Agent (CA).

Generating Asynchronous Lease Updates

The DHCP message processing is synchronous. We aim to send asynchronous lease updates from the hooks library and to withhold the DHCP response to the client until the lease update is finished. While the server is waiting for the completion of an asynchronous lease update, it should remain responsive both to control commands and to DHCP traffic. This is going to require changes both in the hooks framework and in the DHCP server code.

The ability to put packet processing on hold while an asynchronous operation is in progress is called here "packets parking".

We're introducing a new status code to be returned by the callouts, i.e. NEXT_STEP_PARK. When a callout sets this status code, the server will park the processed packet if packet parking is supported for the particular hook point. Packet parking is not going to be supported for all hook points from the first release, because implementing it for all hook points would be very involved. Initially, we're going to focus on supporting packet parking for a new hook point, "leases4_committed". The callouts for this hook point will be triggered after the server has finished allocating, renewing or releasing all leases for the particular DHCP transaction. The callouts are not triggered for message types which do not modify the lease database, i.e. DHCPDISCOVER and DHCPINFORM.

The "leases4_committed" hook point will be placed right before the existing "pkt4_send" hook point and it will receive the following arguments:

  • query4 - type: Pkt4Ptr - holds a pointer to the DHCP query
  • leases4 - type: Lease4Collection - holds a collection of allocated or renewed leases
  • deleted_leases4 - type: Lease4Collection - holds a collection of deleted leases

When the callout returns the NEXT_STEP_PARK status code, the server has to store the state of the packet until it is unparked. A new class, hooks::ParkingLot, will be created, which will hold a collection of parked packets for the selected hook point. The CalloutHandle object provided as an argument to the callouts will hold a shared pointer to the parking lot. The hooks library can use this pointer to unpark the packet when the asynchronous operation completes.

One of the fundamental problems to be solved is how to deal with the situation when multiple callouts schedule asynchronous operations at the same hook point. The respective callouts are unaware whether other callouts have scheduled asynchronous operations, and thus whether the packet can be unparked or has to wait for the other operations to complete. This is going to be solved by introducing reference counting on the parked packet. A callout which returns NEXT_STEP_PARK must call ParkingLot::reference(query) to bump up the reference count on the parked packet. When ParkingLot::unpark(query) is called, the reference counter is decreased. The packet is not unparked until the reference count goes down to 0.
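
The reference counting scheme can be modeled as follows. This is a simplified sketch of the idea, not the actual Kea class design; a packet is parked with a callback resuming its processing, and the callback fires only when all scheduled asynchronous operations have completed:

```cpp
#include <cassert>
#include <functional>
#include <map>

// Simplified model of the parking lot with reference counting. Each
// parked packet carries a reference count and a callback that resumes
// its processing.
template<typename PacketPtr>
class ParkingLot {
public:
    // Called by the server when a callout returns NEXT_STEP_PARK.
    void park(PacketPtr packet, std::function<void()> unpark_callback) {
        parked_[packet].callback = unpark_callback;
    }

    // Called by each callout that schedules an asynchronous operation,
    // to bump up the reference count on the parked packet.
    void reference(PacketPtr packet) {
        ++parked_[packet].refcount;
    }

    // Called when an asynchronous operation completes. The callback is
    // only invoked when the count drops to 0, i.e. when all scheduled
    // operations have completed.
    void unpark(PacketPtr packet) {
        auto it = parked_.find(packet);
        if ((it != parked_.end()) && (--(it->second.refcount) == 0)) {
            std::function<void()> cb = it->second.callback;
            parked_.erase(it);
            cb();
        }
    }

private:
    struct Entry {
        int refcount = 0;
        std::function<void()> callback;
    };
    std::map<PacketPtr, Entry> parked_;
};
```

With two callouts each calling reference() on the same packet, the packet is resumed only after the second unpark() call.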

The example callout can look similar to this:

int
leases4_committed(CalloutHandle& callout_handle) {
    try {
        // Delegate to the implementation object doing the actual work.
        impl_->leases4Committed(callout_handle);

    } catch (...) {
        return (1);
    }
    return (0);
}

And the HAImpl class where the callout is implemented could contain a function that does the actual job.

void
HAImpl::leases4Committed(CalloutHandle& callout_handle) {
    Pkt4Ptr query4;
    Lease4CollectionPtr leases4;
    Lease4CollectionPtr deleted_leases4;
    // In the real code the IO service instance is provided by the server
    // (see the dhcp4_srv_configured hook point) rather than being left
    // unset as in this simplified example.
    IOServicePtr io_service;

    // Get all arguments available for the leases4_committed hook point.
    callout_handle.getArgument("query4", query4);
    callout_handle.getArgument("leases4", leases4);
    callout_handle.getArgument("deleted_leases4", deleted_leases4);

    // If both collections are empty there is nothing to update,
    // e.g. in the DHCPNAK case.
    if (leases4->empty() && deleted_leases4->empty()) {
        return;
    }

    // Get the parking lot for this hook point. We're going to remember this
    // pointer until we unpark the packet.
    ParkingLotHandlePtr parking_lot = callout_handle.getParkingLotHandlePtr();
    // This is required step every time we ask the server to park the packet.
    // The reference counting is required to keep the packet parked until
    // all callouts call unpark. Then, the packet gets unparked and the
    // associated callback is triggered. The callback resumes packet processing.
    parking_lot->reference(query4);

    // Create the HTTP client which we'll use to send the lease update. In the
    // real code we don't want to recreate the client every time the callouts
    // are invoked because it would cause all connections to be closed. This
    // is created here to keep the example simple.
    HttpClient client(*io_service);

    // Let's only focus on the lease update here. We should make similar steps
    // for deleted leases, but here we only provide an example for lease
    // updates and new allocations.
    ElementPtr lease_as_json = leases4->at(0)->toElement();
    lease_as_json->set("force-create", Element::create(true));
    // Create HTTP request with the command as body.
    PostHttpRequestJsonPtr request(new PostHttpRequestJson(HttpRequest::Method::HTTP_POST, "/",
                                                           HttpVersion::HTTP_11()));
    request->setBodyAsJson(config::createCommand("lease4-update", lease_as_json));

    // Create the pointer to the object where response will be stored.
    HttpResponseJsonPtr response(new HttpResponseJson());

    // Start asynchronous request. It will return immediately. The callback function
    // specified as the last argument will be invoked when the response comes
    // back. In the real code we should check error code, error string and
    // the response, then log any potential errors. Here, we only demonstrate
    // that when the response comes back we will trigger unpark to resume the
    // packet processing.
    client.asyncSendRequest(getConfig()->getPeerConfig("server2")->getUrl(), request, response,
                            [this, parking_lot, query4](const boost::system::error_code& ec,
                                                        const HttpResponsePtr& response,
                                                        const std::string& error_str) {
        parking_lot->unpark(query4);
    });

    // The callout returns this status code to indicate to the server that it
    // should park the query packet.
    callout_handle.setStatus(CalloutHandle::NEXT_STEP_PARK);
}

The IO service object used in this callout should be provided by the server, so that both the callouts and the server use the same instance of the IO service. The "leases4_committed" hook point is not appropriate for passing the IO service to the hooks library, because the IO service instance is required by the hooks library much earlier than when this hook point is triggered. The load() function doesn't provide a means to pass arguments to the hooks library. The proposed solution is to add a new hook point, "dhcp4_srv_configured", which will be triggered as a result of the server configuration and will be used to provide a common instance of the IO service to the hooks library.

An example callout implementation looks like this:

IOServicePtr io_service;

int dhcp4_srv_configured(CalloutHandle& handle) {
    try {
        handle.getArgument("io_service", io_service);
  
    } catch (...) {
        LOG_ERROR(ha_logger, HA_IOSERVICE_NOT_CONFIGURED);
        return (1);
    }
    return (0);
}

In order to facilitate packet parking, the server has to be modified to be able to resume processing of a parked packet when the hook libraries signal that the packet should be unparked. This is a non-trivial change because the Dhcpv4Srv class is quite complex. After a thorough analysis of the existing code, the following changes appear to be the least intrusive:

  1. Introduce a new "leases4_committed" hook point right before the "pkt4_send" hook point.
  2. Move the "pkt4_send" and "buffer4_send" hook points to separate functions.
  3. When a "leases4_committed" callout returns NEXT_STEP_PARK, the code parks the packet along with a pointer to the function that triggers "pkt4_send" and "buffer4_send" when the packet is unparked.
  4. When a "leases4_committed" callout returns NEXT_STEP_CONTINUE, the code runs as usual, i.e. "pkt4_send" and "buffer4_send" are invoked immediately.
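The parking flow outlined above can be illustrated with a minimal, self-contained sketch. The ParkingLot class below is purely illustrative and is not Kea's actual implementation; it only shows the park/unpark contract, where unparking runs the stored continuation that would trigger "pkt4_send" and "buffer4_send" for the parked packet.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Illustrative sketch of a packet parking lot (not Kea's actual API).
// A packet is parked together with the continuation that resumes its
// processing, i.e. the function that would invoke "pkt4_send" and
// "buffer4_send" once the asynchronous lease update completes.
class ParkingLot {
public:
    // Park the packet identified by id with a continuation to run on unpark.
    void park(const std::string& id, std::function<void()> continuation) {
        parked_[id] = std::move(continuation);
    }

    // Run the stored continuation and forget the packet. Returns false
    // if the packet was not parked.
    bool unpark(const std::string& id) {
        auto it = parked_.find(id);
        if (it == parked_.end()) {
            return false;
        }
        std::function<void()> continuation = std::move(it->second);
        parked_.erase(it);
        continuation();
        return true;
    }

    std::size_t size() const { return parked_.size(); }

private:
    std::map<std::string, std::function<void()>> parked_;
};
```

A callout returning NEXT_STEP_PARK would correspond to park(), while the completion callback of the asynchronous lease update would correspond to unpark().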

Generating Synchronous Lease Updates

The code example from the previous section demonstrates how to implement asynchronous lease updates. The leases4_committed hook point can also be used to perform synchronous lease updates. The following example demonstrates how synchronous lease updates can be implemented.

void
HAImpl::leases4Committed(CalloutHandle& callout_handle) {
    Pkt4Ptr query4;
    Lease4CollectionPtr leases4;
    Lease4CollectionPtr deleted_leases4;

    // Get callout arguments.
    callout_handle.getArgument("query4", query4);
    callout_handle.getArgument("leases4", leases4);
    callout_handle.getArgument("deleted_leases4", deleted_leases4);

    // In some cases we may have no leases, e.g. DHCPNAK.
    if (leases4->empty() && deleted_leases4->empty()) {
         return;
    }

    // Synchronously send lease updates here
    // ...
}

The careful reader will notice that the callout implementation is greatly simplified compared to the asynchronous case. That's because there is no need to park the DHCP query, reference count the parked query, and unpark it when the lease update completes.

HA Communication Separation

The control channel requires the Kea Control Agent process to be running, as it exposes the HTTP interface and forwards commands to the respective Kea servers. In a simple scenario, it should be possible to use the same instance of the Control Agent for HA and for the administrative commands. However, when separation of HA and administration is required, it should be possible to use two instances of the Control Agent. In this case, both will need to connect to the same unix domain socket. There is a potential problem with multiplexing commands from different CA instances over the same unix domain socket. However, the communication channel between the CA and the DHCP servers is fully synchronous, so it is not really possible for the DHCP server to receive partial commands, which solves the multiplexing problem. The only open question is whether we're going to allow asynchronous communication between the CA and the DHCP servers in the future. If we do, each CA instance will need a separate unix domain socket, along with corresponding updates to the configuration syntax.

Generating Lease Updates

There are many points during packet processing when the server adds, deletes or updates leases in the database, which raises the problem of database consistency between the HA peers. The first idea to address this problem was to add new hook points for deleting, adding and updating leases, which would trigger appropriate control commands (lease updates). However, this poses several issues. The most severe is the significant increase in traffic generated over the control channel. The second problem stems from the fact that two servers can send lease updates at (nearly) the same time. In such a case, there is a high probability of a deadlock while both servers wait for an acknowledgement from their partners. We are going to solve the deadlock issue by "parking" a ready packet and allowing the server to continue operating while waiting for the response from the peer. However, this requires that the DHCP answer be ready for sending before it can be parked. The response is not ready at the stage when the server performs local lease updates (or acquisition), so the idea of sending a lease update for each write operation on the database doesn't seem viable.

There are also situations when a local lease update is not triggered by DHCP packet processing, e.g. lease reclamation. It is possible to implement callouts for hook points associated with lease reclamation which would trigger appropriate lease updates. However, a simpler solution is to let each server deal with the expired leases in its own database. This assumes that each server is configured to run periodic lease reclamation with the same interval.

Heartbeat

The Kea servers working in an HA configuration should periodically check if their peers are online. One of the indicators is that the peer responds to commands sent over the control channel. Under heavy DHCP load, the peers constantly communicate with each other to provide lease updates. However, there are times when the load may be significantly decreased, e.g. at night when many devices are down, or simply long lease lifetimes may cause long periods of time when no client is renewing a lease.

To maintain information about the peer's availability, the Kea server should periodically send a "heartbeat signal" to the peer. If the communication fails (e.g. timeout or error), it is the first indication that the peer may be down. A response to the heartbeat command will include the server state. Differentiation between the server states is important when a server comes back online after a failure. Its running peer determines whether it should continue serving leases for the other server, or whether it should transition back to the load balancing mode. The following states are defined and can be included in the response to the heartbeat command:

  • syncing - the server is synchronizing data after down time. Its running peer should continue serving leases for this server as it is not yet ready to take over.
  • ready - the server completed synchronization of the data after down time and is ready to transition to the load balancing (normal) state.
  • load balancing - the server is serving its own clients and its peer should serve its clients.
  • partner down - the server has discovered that its peer is down and has taken over serving leases for the peer.

The handler for the heartbeat command is going to be implemented within the HA hooks library.

During normal operation, the heartbeat command will only be sent to the peer if the period during which no lease updates are sent to the peer exceeds the configured value. The timer counting the time to the next heartbeat will be set after completion of any command sent to the peer. A lease update (or any other command) will cancel this timer. If no command is sent before the timer expires, the heartbeat command is sent. If a lease update fails due to an IO error, or the response message indicates some other error, the server treats this as if a heartbeat had failed. It may be followed by a heartbeat message to retrieve the state of the peer.

The periodic heartbeat will be implemented using the IntervalTimer class. The timer will be created and controlled within the HA hooks library. The events associated with the timer will be triggered by the global IOService instance associated with the server. Currently, this object is created and returned by the Dhcpv4Srv class and is not accessible to hooks libraries. We'll need to move initialization of this object to a global scope where hooks can access it.
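The timer logic described above can be sketched without any Kea classes. The HeartbeatScheduler class below is illustrative only; it models time as an integer tick count instead of an IntervalTimer driven by the IOService, and shows that a heartbeat is only due once the channel has been idle for the configured interval.

```cpp
#include <cassert>

// Illustrative sketch of the heartbeat scheduling logic: a heartbeat is
// only due when no other command has been sent to the peer within the
// configured interval. In Kea this role would be played by an IntervalTimer
// driven by the server's IOService; here time is a plain tick count.
class HeartbeatScheduler {
public:
    explicit HeartbeatScheduler(long interval)
        : interval_(interval), last_activity_(0) {}

    // Any command sent to the peer (e.g. a lease update) cancels and
    // restarts the timer.
    void noteCommandSent(long now) { last_activity_ = now; }

    // A heartbeat is due only when the channel has been idle long enough.
    bool heartbeatDue(long now) const {
        return (now - last_activity_) >= interval_;
    }

private:
    long interval_;
    long last_activity_;
};
```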

Failure Detection

When two servers are running in the load balancing or hot-standby mode, they communicate with each other sending lease updates. Any failure in that communication is the first indication that the other server may be down. The frequency of lease updates being sent depends on the DHCP traffic in the network. If the traffic is low, e.g. during the night hours, the servers should still be able to detect failures reasonably quickly. Therefore, the servers send heartbeat messages between the lease updates to guarantee that the communication with the partner is maintained frequently enough. A failure in this communication indicates a potential problem with another server.

There are several ways in which the communication issue may manifest. The most obvious one is the timeout in such communication. This may happen when the machine running the DHCP server has crashed or rebooted. Another possible case is that the Control Agent on the remote machine is still running but the DHCP service crashed or the DHCP service is running but has no connection to the lease database. In such case, the response may be received but will indicate an error. In particular, the Control Agent may indicate that the DHCP service is down. Such explicit notification causes the local server to transition to the partner down state.

When there is no explicit notification, the server will continue trying to communicate with the partner because the issue may be temporary. If the communication is unsuccessful for the specified amount of time, the server assumes that the communication with the partner is no longer possible. At this point, this server could take over the DHCP traffic normally handled by the partner, if configured to do so. However, the partner may still be answering DHCP requests, with only the communication between the HA peers being broken. Therefore, the default behavior in the communication-failed state is to monitor packets that should be processed by the partner to detect whether the partner is answering the DHCP requests from the clients.

The HA peers typically do not see the responses from other DHCP servers. However, they receive all packets that should be processed by the partner. By monitoring the values in the secs field or the Elapsed Time option, it is possible to detect that the other server hasn't responded within a reasonable amount of time, thus forcing the clients to retry. A single DHCP client retrying DHCPDISCOVER is not yet proof of the DHCP server's inactivity. The server may simply drop the requests from the given client, e.g. if the particular client is not allowed in the network by administrative policy. Therefore, the server would typically wait for multiple DHCPDISCOVER messages from different clients before it transitions to the partner down state.

It is likely that the number of DHCPDISCOVER messages in the network is very low, or that no DHCPDISCOVER messages are sent at all, because all clients have already got their leases. In order to cover such a scenario, the server also monitors DHCPREQUEST messages received from clients in the rebinding state. In fact, if the server is operational, there shouldn't be any messages from rebinding clients, or the volume of those should be very low. An increased number of messages from rebinding clients indicates that the clients are unable to communicate with their server. If the server detects a certain number of messages from rebinding clients, it will assume that the other server is not responding to requests and will transition to the partner down state.

To make server configuration simpler, we will just provide a single parameter called max-unacked-messages which will specify the maximum number of unanswered messages, including both discovery and rebinding, before the server transitions to the partner down state.
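A minimal sketch of this counting logic follows. All names here (FailureDetector, discoverReceived, rebindReceived) are hypothetical and are not Kea APIs; the sketch only illustrates counting distinct unanswered clients, from both retrying DHCPDISCOVER and rebinding DHCPREQUEST messages, against a single max-unacked-messages limit.

```cpp
#include <cassert>
#include <set>
#include <string>

// Illustrative sketch of failure detection: count distinct clients that
// should be served by the partner but appear unanswered, either retrying
// DHCPDISCOVER (secs field / Elapsed Time above a threshold) or rebinding.
// When the count exceeds max-unacked-messages, transition to partner down.
class FailureDetector {
public:
    FailureDetector(unsigned max_unacked, unsigned secs_threshold)
        : max_unacked_(max_unacked), secs_threshold_(secs_threshold) {}

    // A DHCPDISCOVER from a partner's client counts as unacked only when
    // the client reports waiting longer than the threshold; a short wait
    // may just be a normal retransmission.
    void discoverReceived(const std::string& client_id, unsigned secs) {
        if (secs >= secs_threshold_) {
            unacked_clients_.insert(client_id);
        }
    }

    // A DHCPREQUEST from a rebinding client always counts: it means the
    // client failed to reach its own server during renewal.
    void rebindReceived(const std::string& client_id) {
        unacked_clients_.insert(client_id);
    }

    bool partnerDown() const {
        return unacked_clients_.size() > max_unacked_;
    }

private:
    unsigned max_unacked_;
    unsigned secs_threshold_;
    std::set<std::string> unacked_clients_;
};
```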

If the two servers can't communicate with each other because of a control channel failure (network partition), the servers are unable to send lease updates to each other. However, both servers may still be responding to the DHCP requests sent to them. Normally, a server waits with a response to the client until it gets an acknowledgement of the lease update. If the network is partitioned, the lease updates can't be delivered. In such a case, the servers must respond to the DHCP clients despite not delivering lease updates to their peers. If the servers dropped DHCP requests because of the network partition, it would lead to an increased number of rebinding requests from the DHCP clients. As a consequence, both servers would transition to the partner down state.

The following state diagram presents the failure detection process in more detail.

UML Diagrams

This section contains state and sequence diagrams for the DHCP servers running in HA configuration.

Server Startup

The following state diagram describes the situation when the load balancing server is first started or comes back online after a failure, until it begins to serve leases.

When the server is started, it begins with reading and parsing its configuration. If it is not configured to use HA, it simply enables the DHCP function and starts serving DHCP clients as usual. If HA is enabled, the server enters the Waiting state. The server doesn't know the state of the partner yet, so it starts sending heartbeat messages to detect the partner's state. The partner may be running (possibly in the Partner Down state) or may be initializing, if both servers have been launched together. If the partner appears to be synchronizing its database, the server stays in the Waiting state until the synchronization completes. If the servers have been launched at nearly the same time, it is possible that both are in the Waiting state. This is a deadlock situation because we neither want both servers to wait for each other to complete database synchronization, nor do we want them to transition to the Synchronizing state where they would both update each other's lease database. Therefore, one of the servers must be administratively designated as primary; the other is the secondary. The primary server in the Concurrent Waiting state transitions to the Synchronizing state, while the secondary transitions back to the Waiting state. The deadlock is resolved.

The synchronizing server may need to stop the DHCP function on the partner to avoid changes in its lease database while synchronization is performed. When the database synchronization is complete, the server enters the Ready state to indicate that it can start serving DHCP clients, but it doesn't start yet. The other server, being in the Waiting state, will detect that the partner is done synchronizing the leases and can also synchronize its own database. When they both enter the Ready state, the primary server moves to the Load Balancing state. Its partner will eventually see that the server is in this state and will join it, also doing load balancing. In any state, it is possible that the other server stops responding to control commands (and DHCP queries), in which case the running server transitions to the Partner Down state.

When the server gets into the Ready state, it is possible that the partner is already running, e.g. performing load balancing or covering a Partner Down situation. In the former case, the server can automatically transition to the Load Balancing state. In the latter case, the server remains in the Ready state until the partner discovers it and moves to Load Balancing. This avoids the situation where both servers run and serve the same set of clients.

Server Operation

The following sequence diagram includes two clients and two servers. During the normal operation (load balancing), the client1 is served by the server1 and the client2 is served by the server2. The diagram includes the failure scenario when server1 starts to also serve client2, when server2 stops responding to DHCP queries and commands over the control channel.

The sequence starts with client1 performing 4-way exchange with the server1. The server1 synchronously notifies the server2 about the new lease being handed out. Between the DHCP requests, both servers are sending heartbeat messages over the control channel (the diagram only shows the heartbeat sent by server1 for simplicity). The server1 also sends synchronous lease update when the client1 renews its lease.

Then, a heartbeat is sent but no response is returned from the server2. This may indicate that the server2 is offline. The server1 doesn't transition into the partner down state yet. It monitors DHCP messages received from the client2, which should normally be served by the server2. When the secs field value exceeds the maximum delay in server response, the server1 finally assumes that the server2 is offline and starts responding to queries targeted at the server2. The lease updates are not sent to the server2 because this server is offline. While serving server2's clients, the server1 continues to send heartbeat messages to the server2.

Meanwhile, the server2 wakes up, so it sends syncing status to indicate that it is not quite ready yet. When the database is synchronized, the server2 starts sending ready status to indicate that it may now transition back to the load balancing state. At this point, the server1 stops responding to any queries for server2. It continues to respond to its own queries. The server2 will respond to subsequent queries from the client2.

Simultaneous Lease Updates

With two servers synchronously sending lease updates to each other, there is a potential deadlock when both servers send such updates at nearly the same time. Each server would wait for the response from the partner and would be unable to handle the lease update from the partner while waiting. In order to solve this problem, this design proposes a modification to the Kea server logic which allows for "parking" a DHCP message being processed while waiting for an asynchronous operation to complete. The parked response is ready to be sent but is temporarily held in a queue. The response is sent as soon as the asynchronous operation completes, if the operation is successful. If the operation is unsuccessful, the parked response can be dropped or some error handling operation can be triggered.

The following diagram demonstrates a case when two servers perform lease updates simultaneously, and how this situation is resolved.

The sequence diagram shows two clients and two servers. The clients send DHCPREQUEST messages to the respective servers at nearly the same time. Both servers process received packets and park ready responses. Next, each server triggers asynchronous lease update to the partner. The callback executed when the lease update comes back is associated with a certain parked response. While the asynchronous operation is in progress, the server continues "normal" operation. The server could process new DHCP packets and/or run IO service which results in receiving responses over the control channel. The lease updates may be sent in chunks (partial lease updates) and the partner will gather these chunks until it receives the entire command. When the entire command is received, the acknowledgment is generated and sent back over the control channel. Again, it may be sent in chunks when the response is large.

Note that thanks to asynchronous communication, both servers send their partial lease updates simultaneously. The server which first receives the entire lease update is the first to respond to its partner. The partner can then pop the parked DHCP response and send it back to the client.

Controlling HA State Machine

The state machine can be paused once or multiple times in selected states, according to the HA hooks library configuration. The StateModel class is going to be extended to allow for specifying a pausing mode for each defined state. The StateModel::defineState function needs to take an additional parameter which provides this mode. The State object has to hold this mode, as well as the information whether the state was already visited (transitioned to) or not. The new State::shouldPause function will return a boolean flag indicating whether to pause the model or not. It is going to be invoked from the StateModel::setState function.

An administrator may un-pause the state machine by sending the ha-continue command to the server. This command will trigger the StateModel::unpause function, which will clear the boolean flag indicating that the model is paused. The state handlers use this flag to determine whether they should proceed normally or return immediately while the state machine is paused.
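The proposed pausing behavior can be sketched as follows. The class and enum below are illustrative stand-ins for the extended StateModel, not the actual Kea code; they only demonstrate the interplay of the per-state pausing mode, the visited flag, and the ha-continue driven unpause().

```cpp
#include <cassert>
#include <map>
#include <set>

// Illustrative pausing modes for a state: never pause, pause only on the
// first visit, or pause on every visit.
enum class PausingMode { NEVER, ONCE, ALWAYS };

// Illustrative sketch of the extended state model. setState() consults the
// state's pausing mode and visit history; the ha-continue command maps to
// unpause(). States are plain integers here for simplicity.
class PausableStateModel {
public:
    void defineState(int state, PausingMode mode) {
        modes_[state] = mode;
    }

    void setState(int state) {
        state_ = state;
        bool first_visit = visited_.insert(state).second;
        PausingMode mode =
            modes_.count(state) ? modes_[state] : PausingMode::NEVER;
        paused_ = (mode == PausingMode::ALWAYS) ||
                  (mode == PausingMode::ONCE && first_visit);
    }

    // Triggered by the ha-continue command.
    void unpause() { paused_ = false; }

    bool isPaused() const { return paused_; }
    int getState() const { return state_; }

private:
    std::map<int, PausingMode> modes_;
    std::set<int> visited_;
    int state_ = 0;
    bool paused_ = false;
};
```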

New Commands

This section describes the syntax of the new commands required by the HA.

Disable DHCP Service

There are generally two use cases for disabling DHCP operation: globally disabling DHCP for all subnets and networks, and selectively disabling DHCP operation for specific subnets and/or networks (scopes). These use cases are orthogonal, i.e. when the service is globally disabled and then re-enabled, it doesn't affect the state of the DHCP service for the individual scopes. The global and scope-specific settings for the DHCP service are controlled via two different commands.

In order to globally disable the DHCP service the following command can be used:

{
    "command": "dhcp-disable"
}

In order to disable DHCP service for selected subnets and/or networks:

{
    "command": "dhcp-disable-scopes",
    "arguments": {
        "subnets": [ 1, 2, 3, 4 ],
        "networks": [ "foo", "bar" ]
    }
}

In the failover case it is critical to provide a mechanism to automatically re-enable the DHCP service if the HA peer which disabled it never comes back to re-enable the service explicitly. One example of such a situation is when the HA peer disabled the DHCP service on the other server to synchronize the database, but died before completing this synchronization. Both commands presented above allow for specifying the optional max-period parameter, which specifies the maximum number of seconds for the service or scopes to remain disabled. When this time elapses, the changes are reverted (the DHCP service is re-enabled).

For example, the following command:

{
    "command": "dhcp-disable",
    "arguments": {
        "max-period": 20
    }
}

globally disables DHCP service for the maximum time of 20 seconds. The DHCP service may be re-enabled with dhcp-enable command before this time elapses.

Similarly, it is possible to disable DHCP service for selected subnets for a given period of time:

{
    "command": "dhcp-disable-scopes",
    "arguments": {
        "subnets": [ 1, 2, 3, 4 ],
        "max-period": 20
    }
}

It is possible to re-enable some or all of these scopes by sending dhcp-enable-scopes command.

In case the dhcp-disable and/or dhcp-disable-scopes with max-period value are sent multiple times, the server accumulates these commands. Consider the following sequence of commands:

{
    "command": "dhcp-disable",
    "arguments": {
        "max-period": 20
    }
}
{
    "command": "dhcp-disable-scopes",
    "arguments": {
        "subnets": [ 1, 2 ],
        "max-period": 20
    }
}
{
    "command": "dhcp-disable-scopes",
    "arguments": {
        "subnets": [ 2, 3, 4 ],
        "max-period": 20
    }
}

It results in DHCP being globally disabled and also explicitly disabled for the subnets with ids 1, 2, 3 and 4. For each received command, the timer counting max-period is reset. When max-period elapses, the service is globally enabled and also enabled for the subnets with ids 1, 2, 3 and 4.
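The accumulation and timer-reset behavior described above can be sketched as follows; the ServiceState class and its method names are illustrative, not Kea's implementation.

```cpp
#include <cassert>
#include <set>

// Illustrative sketch of accumulating dhcp-disable / dhcp-disable-scopes
// commands with max-period. Each received command resets a single deadline;
// when it expires, both the global state and all disabled scopes revert.
class ServiceState {
public:
    void disableGlobal(long now, long max_period) {
        globally_disabled_ = true;
        deadline_ = now + max_period;
    }

    void disableSubnet(int subnet_id, long now, long max_period) {
        disabled_subnets_.insert(subnet_id);
        deadline_ = now + max_period;  // each command resets the timer
    }

    // Called periodically; reverts all changes once max-period elapses.
    void tick(long now) {
        if (deadline_ >= 0 && now >= deadline_) {
            globally_disabled_ = false;
            disabled_subnets_.clear();
            deadline_ = -1;
        }
    }

    bool enabledFor(int subnet_id) const {
        return !globally_disabled_ && !disabled_subnets_.count(subnet_id);
    }

private:
    bool globally_disabled_ = false;
    std::set<int> disabled_subnets_;
    long deadline_ = -1;  // -1 means no pending re-enable
};
```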

Enable DHCP Service

The following command globally enables DHCP service.

{
    "command": "dhcp-enable"
}

The following command enables DHCP service for selected scopes:

{
    "command": "dhcp-enable-scopes",
    "arguments": {
        "subnets": [ 1, 2, 3, 4 ],
        "networks": [ "foo", "bar" ]
    }
}

If the DHCP service was disabled for a specified amount of time, i.e. the max-period parameter was specified, this parameter remains in force as long as there are any scopes for which the DHCP service was disabled with this parameter and which haven't been re-enabled with dhcp-enable-scopes.

Get All Leases

The following two commands retrieve all leases or all leases for specified subnets. If the _subnets_ argument is not specified, all leases are returned. This is useful when the lease database is synchronized with a peer after a failure.

{
    "command": "lease4-get-all",
    "arguments": {
        "subnets": [ 1, 2, 3, 4 ]
    }
}

For the DHCPv6 case:

{
    "command": "lease6-get-all",
    "arguments": {
        "subnets": [ 1, 2, 3, 4 ]
    }
}

Lease updates

Kea already provides lease4-add, lease4-update, lease4-del and the corresponding DHCPv6 specific commands to add, update and remove leases. They have been designed for lease database administration rather than for conveying lease updates between HA peers. The peers have no means to know whether a specific lease already exists in the partner's database or not. It is not acceptable to perform lease queries prior to selecting the appropriate command, because of the performance implications and the unnecessary increase in network traffic.

This design proposes to extend the existing lease4-update and lease6-update with the force-create boolean parameter indicating that the lease must be created if it doesn't exist.
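The intended force-create semantics can be sketched with a toy in-memory store; the LeaseStore class below is illustrative only and stands in for the real lease database backend.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative sketch of the proposed force-create semantics for
// lease4-update: with force-create the update becomes an upsert, so the
// sender never needs to query the peer's database before choosing between
// an "add" and an "update" command.
struct LeaseStore {
    std::map<std::string, std::string> leases;  // address -> client-id

    // Returns true on success. Without force_create, updating a lease that
    // does not exist is an error, matching the existing lease4-update
    // behavior.
    bool update(const std::string& addr, const std::string& client_id,
                bool force_create) {
        auto it = leases.find(addr);
        if (it == leases.end()) {
            if (!force_create) {
                return false;  // lease not found and creation not allowed
            }
            leases[addr] = client_id;  // force-create: insert the lease
            return true;
        }
        it->second = client_id;  // ordinary update of an existing lease
        return true;
    }
};
```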

Heartbeat Command

The heartbeat commands are sent between the HA peers to detect failures. In the case of a fatal failure (e.g. a server crash), no response will be received from the peer and the heartbeat will be lost. If the peer is online (e.g. waking up or ready for service), the server status will be returned.

{
    "command": "ha-heartbeat"
}

and the response format:

{
    "result": 0,
    "text": "HA peer status returned.",
    "arguments": {
        "status": "syncing" | "ready" | "load-balancing" | "partner-down"
    }
}

Synchronize Lease Database

The lease database synchronization is often triggered automatically, when the server starts up. However, it is also possible to manually start synchronization, e.g. when the backup server is started or when the automatic synchronization is administratively disabled.

{
    "command": "ha-synchronize",
    "arguments": {
        "max-period": 20,
        "server-name": "server2"
    }
}

This command includes a parameter specifying a name of the server to synchronize with. The local server will first disable DHCP service on the remote server using dhcp-disable command. The max-period parameter is used to make sure that the DHCP service is resumed on the remote server in case the local server dies during this synchronization, thus failing to send dhcp-enable command.

Modify Service Scopes

In many cases it will be required to manually instruct the DHCP server to start serving clients which belong to its HA peers. The command that modifies the scope of service looks as follows.

{
    "command": "ha-service-scopes",
    "arguments": {
        "scopes": [ "server1", "server2" ]
    }
}

The scope in this context is the name of the server responsible for processing certain portion of received DHCP requests. For example, if there are two servers running load balancing and one of the servers crashes, the command above instructs the surviving server to serve all DHCP requests in this network. In other words, this command allows for transitioning the server to the partner down state manually. In order to return to load balancing, this command needs to be sent with a single scope to be served by this server, e.g.

{
    "command": "ha-service-scopes",
    "arguments": {
        "scopes": [ "server2" ]
    }
}
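The scope-based filtering can be sketched as follows. The ScopeFilter class is illustrative; in particular, scopeOf() uses std::hash as a stand-in for Kea's actual load balancing function over client identifiers.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

// Illustrative sketch of scope-based query filtering: a deterministic
// function maps a client identifier to a server name (the "scope"), and
// the server processes a query only when that scope is currently enabled
// for it, e.g. via the ha-service-scopes command.
class ScopeFilter {
public:
    void setScopes(const std::set<std::string>& scopes) { scopes_ = scopes; }

    // Stand-in for Kea's load balancing hash over client identifiers;
    // any deterministic 2-way split works for the illustration.
    static std::string scopeOf(const std::string& client_id) {
        return (std::hash<std::string>{}(client_id) % 2 == 0) ? "server1"
                                                              : "server2";
    }

    bool shouldProcess(const std::string& client_id) const {
        return scopes_.count(scopeOf(client_id)) > 0;
    }

private:
    std::set<std::string> scopes_;
};
```

In the partner down state the surviving server enables both scopes; returning to load balancing means shrinking the set back to its own scope.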

Un-pausing state machine

If the state machine is paused in any state and the administrator has finished the required administrative actions, he or she needs to send the ha-continue command to allow the state machine to continue its normal operation, e.g. transition to the next state.

{
    "command": "ha-continue"
}

After receiving this command the server may pause its state machine in any other state if required by the configuration. If the state machine is un-paused in the state for which it was specified that it should pause only once, the server will never pause in this state again.

Location of Functions Required for HA

The following section lists the new "functions" in Kea required for HA. Provided that HA is going to be an optional feature, we should consider which parts of its implementation can be provided in hooks libraries and which require extensions to the Kea core code. The table below indicates, for each new function, whether it fits into a hooks library and/or the core. It also makes an assessment regarding the preferred location out of those two, along with a commentary explaining the choice.

Function                                | Hooks | Core | Preferred | Comments
----------------------------------------|-------|------|-----------|---------
Load balancing algorithm                | Yes   | Yes  | Hooks     | Prefer hooks because it provides more flexibility as to how the algorithm works.
HA configuration                        | Yes   | Yes  | Hooks     | Prefer hooks on the grounds that the Kea configuration is already complex. We can use user-context for it.
Long lived connections                  | No    | Yes  | Core      | The RESTful API is entirely implemented in the core.
Communication with peers                | Yes   | Yes  | Core      | This is rather generic functionality for one server to communicate with another, so it is probably better to implement it in the core.
Periodic heartbeat                      | Yes   | Yes  | Hooks     | Changes to the core are required to provide access to the common IOService instance. The timer itself should be managed in hooks.
Failure detection algorithms            | Yes   | Yes  | Hooks     | Implementing in hooks gives more flexibility.
Leases replication                      | Yes   | Yes  | Hooks     | Replication is an optional mechanism (when HA is in use) and is heavyweight, so it is better kept separate.
Command: cease/resume DHCP service      | No    | Yes  | Core      | Could be done in hooks but would require updates to the core anyway.
Commands for lease manipulation         | Yes   | No   | Hooks     | We already have a lease manipulation hooks library.
Command: heartbeat                      | Yes   | Yes  | Hooks     | HA specific, so better to put it into a hook.
Commands for triggering lease sync      | Yes   | Yes  | Hooks     | HA specific, so better to put it into a hook.
Using foreign server identifier by peer | Yes   | Yes  | Hooks     | HA specific, so better to put it into a hook.

Tasks Required for HA Implementation

Checkpoint 1

Outcome: HTTP communication is possible over a persistent connection. There is a hook point available which can be used to generate lease updates over a custom (non-HTTP) channel.

  1. #5447: Create stub libdhcp_ha hooks library (1)
  2. #5454: Implement HA configuration parsing in the hooks library (2)
  3. #5448: Add support for persistent connections into Control Agent using HTTP (5)
  4. #5451: Create HTTP client classes in libkea-http (5)
  5. #5457: Parking DHCP packets (v4) (5)
  6. #5459: Send lease updates to the peer (v4) (2)
  7. #5472: Add "force-create" option to the lease4-update command (1)
  8. #5473: Add "force-create" option to the lease6-update command (1)
  9. #5468: leases4-get-all command (3)

Total of 25 days. Predicted completion: end of January, 2018.

Checkpoint 2

Outcome: Actual communication between peers takes place. It includes sending heartbeat command, fetching leases from the database and failure detection.

  1. #5461: Add a timer triggering a heartbeat command (2)
  2. #5463: heartbeat command (2)
  3. #5442: Extend dhcp-disable and dhcp-enable commands
    1. v4 part (1)
    2. v6 part (1)
  4. #5466: database fetching and synchronization (3)
  5. #5464: failure detection algorithm (v4) (3)
  6. #5474: ha-synchronize command (v4) (2)
  7. #5476: ha-scopes command (2)

Total of 18 days. Predicted completion: end of February, 2018.

Checkpoint 3

Outcome: State machine for DHCPv4 server implemented. The server can recognize its own state and the state of the partner. It uses client classification information to pick the right pool, subnet etc. It drops packets not aimed for this server.

  1. #5374: Allow for selecting a subnet/pool based on multiple client classes (3)
  2. #5425: Add client classification to pools (2)
  3. #5455: Add ability to drop the packet requesting renewal of an address which belongs to restricted pool (3)
  4. #5456: Implement load balancing algorithm with assigning appropriate classes to the received packet (4)
  5. #5470: Implement failover state machine (v4) (5)
  6. #5580: HA: fixes in DHCPv4 state machine post #5470 merge
  7. #5543 - Command for wiping all leases

Total of 17 days. Predicted date: end of March, 2018
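The load balancing work in #5456 relies on both peers independently computing, from a client identifier, which of them is designated to serve a given packet. The sketch below illustrates the idea with a simple hash split; this is not Kea's actual algorithm, just a demonstration that a deterministic function of the client identifier gives each packet exactly one designated server without any coordination between the peers.

```python
import hashlib

# Illustrative only: pick a designated server deterministically from a
# client identifier (e.g. a MAC address). Both peers evaluating the
# same function reach the same answer for the same client.
def designated_server(client_id: str) -> str:
    digest = hashlib.sha256(client_id.encode()).digest()
    return "server1" if digest[0] % 2 == 0 else "server2"

# The same client maps to the same server across many requests, which
# is what makes the per-packet designation stable.
print(designated_server("1a:1b:1c:1d:1e:1f"))
```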

Checkpoint 4

Outcome: HA is functional for DHCPv6. The HA solution is documented and all remaining features are added, e.g. SQL database configurations are handled.

  1. #5460: Send lease updates to the peer (v6) (2)
  2. #5462: Add a timer triggering a heartbeat command (v6) (2)
  3. #5465: failure detection algorithm (v6) (2)
  4. #5458: Parking DHCP packets (v6) (3)
  5. #5467: database fetching and synchronization (v6) (3)
  6. #5471: Implement failover state machine (v6) (5)
  7. #5475: ha-synchronize command (v6) (2)
  8. #5469: leases6-get-all command (2)
  9. #5477: database reconnect (3)
  10. #5604: Implement send-lease-updates parameter to disable lease updates for the database replication case
  11. #5478: User's Guide (2)
  12. #5479: Developer's Guide (1)

Total of 27 days. Predicted date: end of April, 2018

Under consideration

The following ideas were discussed, but so far they have not been assigned to any specific checkpoints. Depending on various factors and time constraints, they may or may not be done in the 1.4 timeframe:

  1. Extend dhcp-disable and dhcp-enable commands to specify scopes
    1. #5519: v4 part (1)
    2. #5519: v6 part (1)
  2. #5540 - Keep unix socket connection open after command is processed
  3. #5541 - Command for adding multiple leases: design
  4. #5542 - Command for adding multiple leases: implementation
  5. #5544 - Extend leaseX-get-all to return number of leases as integer
  6. #5584 - Implement ability to store additional information (user context) with leases
  7. #5603 - Handle time skew between the HA peers
  8. tbd - Update User's Guide with some guidelines regarding the applicability of our HA solution
  9. tbd - ping before use
  10. tbd - leaseX-add called on an HA-enabled server. Is the change propagated to the partner? Tomek's opinion: it shouldn't be; there should be a switch that tells Kea to send the update. Otherwise leaseX-add/update/del is not usable by regular users in HA mode.

Example Test Environment

This section describes an example test environment for system testing the HA feature.

Overview

The test environment consists of two virtual machines (VM1 and VM2), which run Kea server instances with HA enabled. A third host is connected to both VMs via the eth10 and eth20 interfaces and can send relayed DHCP traffic to them. The host can run any DHCP client to generate the DHCP traffic. One convenient option is to run the perfdhcp application (shipped with Kea), which can simulate relayed traffic to specified destinations. In our example, the host sends DHCP queries from the IP address 192.168.56.1.
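For instance, the relayed traffic described above could be generated with perfdhcp along these lines. The options shown (-4 for DHCPv4, -r for the query rate, -l for the local address used as the relay, and the server address as the last argument) follow perfdhcp's documented interface, but treat this as a sketch to adapt to your setup rather than a verified command:

```shell
# Send DHCPv4 traffic at 10 queries/second from the relay address
# 192.168.56.1 to server 1; adjust addresses and rate as needed.
perfdhcp -4 -r 10 -l 192.168.56.1 192.168.56.33
```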

Both VMs run two Kea daemons: kea-dhcp4 and kea-ctrl-agent (the DHCPv4 server and the Control Agent). The respective daemons use very similar, but not identical, configurations, all of which are provided below. The differences between them are as follows.

For the Control Agents, the difference is in the "http-host" parameter, as the CAs have to bind to different IP addresses assigned to their interfaces. For the DHCP servers, the differences are in the "this-server-name" HA parameter and in the interface names on which the DHCP service is enabled.

The DHCP queries can go to both DHCP servers via the eth10 and eth20 interfaces respectively. The servers process those queries and send lease updates to each other using the control channel. Server 1 sends lease updates to server 2 via the Control Agent on VM2. Server 2 sends lease updates to server 1 via the Control Agent on VM1. If no lease updates are being sent between the servers (in case of low DHCP traffic), the servers send heartbeats over the control channels to check each other's presence. Both the lease updates and the heartbeats are forwarded by the Control Agents to the respective DHCP servers for processing.

The DHCPv4 servers must load two hooks libraries: libdhcp_lease_cmds.so and libdhcp_ha.so. The former enables support for control commands for manipulating leases (e.g. receiving lease updates, returning leases from the lease database etc.). The latter contains the actual HA implementation.

Server 1 on VM1 Configuration

DHCP Server on VM1

// This is an example configuration of the Kea DHCPv4 server. It uses High
// Availability hooks library and Lease Commands hooks library.
{
// DHCPv4 configuration starts here.
"Dhcp4": {
    // Add names of your network interfaces to listen on.
    "interfaces-config": {
        // The DHCPv4 server listens on this interface.
        "interfaces": [ "eth10" ]

        // Kea DHCPv4 server by default listens using raw sockets. This ensures
        // all packets, including those sent by directly connected clients
        // that don't have IPv4 address yet, are received. However, if your
        // traffic is always relayed, it is often better to use regular
        // UDP sockets. If you want to do that, uncomment this line:
        // "dhcp-socket-type": "udp"
    },

    // Kea supports a control channel, which is a way to receive management
    // commands while the server is running. This is a Unix domain socket that
    // receives commands formatted in JSON, e.g. config-set (which sets new
    // configuration), config-reload (which tells Kea to reload its
    // configuration from file), statistic-get (to retrieve statistics) and many
    // more. For detailed description, see Sections 8.8, 16 and 15.
    "control-socket": {
        "socket-type": "unix",
        "socket-name": "/tmp/kea-dhcp4-ctrl.sock"
    },

    // Use Memfile lease database backend to store leases in a CSV file.
    // Depending on how Kea was compiled, it may also support SQL databases
    // (MySQL and/or PostgreSQL) and even Cassandra. Those database backends
    // require more parameters, like name, host and possibly user and password.
    // There are dedicated examples for each backend. See Section 7.2.2 "Lease
    // Storage" for details.
    "lease-database": {
        // Memfile is the simplest and easiest backend to use. It's an in-memory
        // C++ database that stores its state in a CSV file.
        "type": "memfile",
        // This parameter disables writing leases to a lease file. It is used
        // for testing purposes only and should be removed before going to
        // production.
        "persist": false,
        "lfc-interval": 3600
    },

    // Setup reclamation of the expired leases and leases affinity.
    // Expired leases will be reclaimed every 10 seconds. Every 25
    // seconds reclaimed leases, which have expired more than 3600
    // seconds ago, will be removed. The limits for leases reclamation
    // are 100 leases or 250 ms for a single cycle. A warning message
    // will be logged if there are still expired leases in the
    // database after 5 consecutive reclamation cycles.
    "expired-leases-processing": {
        "reclaim-timer-wait-time": 10,
        "flush-reclaimed-timer-wait-time": 25,
        "hold-reclaimed-time": 3600,
        "max-reclaim-leases": 100,
        "max-reclaim-time": 250,
        "unwarned-reclaim-cycles": 5
    },

    // Global timers specified here apply to all subnets, unless there are
    // subnet specific values defined in particular subnets.
    "renew-timer": 30,
    "rebind-timer": 40,
    "valid-lifetime": 60,

    // Many additional parameters can be specified here:
    // - option definitions (if you want to define vendor options, your own
    //                       custom options or perhaps handle standard options
    //                       that Kea does not support out of the box yet)
    // - client classes
    // - hooks
    // - ddns information (how the DHCPv4 component can reach a DDNS daemon)
    //
    // Some of them have examples below, but there are other parameters.
    // Consult Kea User's Guide to find out about them.

    // These are global options. They are going to be sent when a client
    // requests them, unless overwritten with values in more specific scopes.
    // The scope hierarchy is:
    // - global (most generic, can be overwritten by class, subnet or host)
    // - class (can be overwritten by subnet or host)
    // - subnet (can be overwritten by host)
    // - host (most specific, overwrites any other scopes)
    //
    // Not all of those options make sense. Please configure only those that
    // are actually useful in your network.
    //
    // For a complete list of options currently supported by Kea, see
    // Section 7.2.8 "Standard DHCPv4 Options". Kea also supports
    // vendor options (see Section 7.2.10) and allows users to define their
    // own custom options (see Section 7.2.9).
    "option-data": [
        {
            "name": "domain-name-servers",
            "data": "192.0.3.1, 192.0.3.2"
        }
    ],

    // Client classes will associate address pools with certain servers taking
    // part in HA.
//    "client-classes": [
//    ],

    // HA requires two hooks libraries to be loaded: libdhcp_lease_cmds.so and
    // libdhcp_ha.so. The former handles incoming lease updates from the HA peers.
    // The latter implements high availability feature for Kea.
    "hooks-libraries": [
        {
            "library": "/home/marcin/devel/kea-build/lib/hooks/libdhcp_lease_cmds.so",
            "parameters": { }
        },
        {
            // The HA hooks library should be loaded.
            "library": "/home/marcin/devel/kea-build/lib/hooks/libdhcp_ha.so",
            "parameters": {
                // High Availability configuration is specified for the HA hook library.
                // Each server should have the same HA configuration, except for the
                // "this-server-name" parameter.
                "high-availability": [ {
                    // This parameter points to this server instance. The respective
                    // HA peers must have this parameter set to their own names.
                    "this-server-name": "server1",
                    // The HA mode is set to load-balancing, but it doesn't really
                    // mean anything until we implement the state machines.
                    "mode": "load-balancing",
                    // Heartbeat is to be sent every 10 seconds if no other control
                    // commands are transmitted.
                    "heartbeat-delay": 10,
                    // The following parameters control how the server detects the
                    // partner's failure. The ACK delay sets the threshold for the
                    // 'secs' field of the received discovers.
                    "max-ack-delay": 5,
                    // This specifies the number of clients which send messages to
                    // the partner but appear to not receive any response.
                    "max-unacked-clients": 20,
                    "peers": [
                         // This is the configuration of this server instance.
                         {
                             "name": "server1",
                             "url": "http://192.168.56.33:8080/",
                             "role": "primary",
                             "auto-failover": true
                         },
                         // This is the configuration of our HA peer.
                         {
                             "name": "server2",
                             "url": "http://192.168.56.66:8080/",
                             "role": "secondary",
                             "auto-failover": true
                         }
                     ]
                 } ]
            }
        }
    ],

    // This example contains a single subnet declaration.
    "subnet4": [
        {
            // Subnet prefix.
            "subnet": "192.0.3.0/24",

            // Specify two address pools. In the future we'll be able to associate
            // those pools with servers (server classes).
            "pools": [ { "pool": "192.0.3.100 - 192.0.3.150" },
                       { "pool": "192.0.3.200 - 192.0.3.250" } ],

            // These are options that are subnet specific. In most cases,
            // you need to define at least routers option, as without this
            // option your clients will not be able to reach their default
            // gateway and will not have Internet connectivity.
            "option-data": [
                {
                    // For each IPv4 subnet you most likely need to specify at
                    // least one router.
                    "name": "routers",
                    "data": "192.0.3.1"
                }
            ],

            // This subnet will be selected for queries coming from the following
            // IP address.
            "relay": { "ip-address": "192.168.56.1" }
        }
    ]
},

// Logging configuration starts here.
"Logging":
{
  "loggers": [
    {
        // This section affects kea-dhcp4, which is the base logger for DHCPv4
        // component. It tells DHCPv4 server to write all log messages (on
        // severity INFO or more) to a file.
        "name": "kea-dhcp4",
        "output_options": [
            {
                // Specifies the output file. There are several special values
                // supported:
                // - stdout (prints on standard output)
                // - stderr (prints on standard error)
                // - syslog (logs to syslog)
                // - syslog:name (logs to syslog using specified name)
                // Any other value is considered a name of a file.
                "output": "stdout"

                // This governs whether the log output is flushed to disk after
                // every write.
                // "flush": false,

                // This specifies the maximum size of the file before it is
                // rotated.
                // "maxsize": 1048576,

                // This specifies the maximum number of rotated files to keep.
                // "maxver": 8
            }
        ],
        // This specifies the severity of log messages to keep. Supported values
        // are: FATAL, ERROR, WARN, INFO, DEBUG
        "severity": "INFO",

        // If DEBUG level is specified, this value is used. 0 is least verbose,
        // 99 is most verbose. Be cautious, Kea can generate lots and lots
        // of logs if told to do so.
        "debuglevel": 99
    },
    {
        // This section specifies configuration of the HA hooks library specific
        // logger.
        "name": "kea-dhcp4.ha_hooks",
        "output_options": [
            {
                "output": "stdout"
            }
        ],
        "severity": "DEBUG",
        "debuglevel": 99
    }
  ]
}
}

Control Agent on VM1

// This is a Control Agent configuration for HA testing. It includes
// settings for one of the HA peers. They specify an address and port
// on which the HTTP service is available for the HA peers.
{

// This is a basic configuration for the Kea Control Agent.
// RESTful interface to be available at http://192.168.56.33:8080/
"Control-agent": {
    "http-host": "192.168.56.33",
    "http-port": 8080,

    // Specify location of the files to which the Control Agent
    // should connect to forward commands to the DHCPv4 and DHCPv6
    // server via unix domain socket.
    "control-sockets": {
        "dhcp4": {
            "socket-type": "unix",
            "socket-name": "/tmp/kea-dhcp4-ctrl.sock"
        },
        "dhcp6": {
            "socket-type": "unix",
            "socket-name": "/tmp/kea-dhcp6-ctrl.sock"
        }
    }
},

// Logging configuration starts here. Kea uses different loggers to log various
// activities. For details (e.g. names of loggers), see Chapter 18.
"Logging":
{
  "loggers": [
    {
        // This specifies the logging for Control Agent daemon.
        "name": "kea-ctrl-agent",
        "output_options": [
            {
                // Specifies the output file. There are several special values
                // supported:
                // - stdout (prints on standard output)
                // - stderr (prints on standard error)
                // - syslog (logs to syslog)
                // - syslog:name (logs to syslog using specified name)
                // Any other value is considered a name of a file.
                "output": "stdout"
            }
        ],
        // This specifies the severity of log messages to keep. Supported values
        // are: FATAL, ERROR, WARN, INFO, DEBUG
        "severity": "INFO",

        // If DEBUG level is specified, this value is used. 0 is least verbose,
        // 99 is most verbose. Be cautious, Kea can generate lots and lots
        // of logs if told to do so.
        "debuglevel": 0
    }
  ]
}
}

Server 2 on VM2 Configuration

DHCP Server on VM2

// This is an example configuration of the Kea DHCPv4 server. It uses High
// Availability hooks library and Lease Commands hooks library.
{

// DHCPv4 configuration starts here.
"Dhcp4": {
    // Add names of your network interfaces to listen on.
    "interfaces-config": {
        // The DHCPv4 server listens on this interface.
        "interfaces": [ "eth20" ]

        // Kea DHCPv4 server by default listens using raw sockets. This ensures
        // all packets, including those sent by directly connected clients
        // that don't have IPv4 address yet, are received. However, if your
        // traffic is always relayed, it is often better to use regular
        // UDP sockets. If you want to do that, uncomment this line:
        // "dhcp-socket-type": "udp"
    },

    // Kea supports a control channel, which is a way to receive management
    // commands while the server is running. This is a Unix domain socket that
    // receives commands formatted in JSON, e.g. config-set (which sets new
    // configuration), config-reload (which tells Kea to reload its
    // configuration from file), statistic-get (to retrieve statistics) and many
    // more. For detailed description, see Sections 8.8, 16 and 15.
    "control-socket": {
        "socket-type": "unix",
        "socket-name": "/tmp/kea-dhcp4-ctrl.sock"
    },

    // Use Memfile lease database backend to store leases in a CSV file.
    // Depending on how Kea was compiled, it may also support SQL databases
    // (MySQL and/or PostgreSQL) and even Cassandra. Those database backends
    // require more parameters, like name, host and possibly user and password.
    // There are dedicated examples for each backend. See Section 7.2.2 "Lease
    // Storage" for details.
    "lease-database": {
        // Memfile is the simplest and easiest backend to use. It's an in-memory
        // C++ database that stores its state in a CSV file.
        "type": "memfile",
        // This parameter disables writing leases to a lease file. It is used
        // for testing purposes only and should be removed before going to
        // production.
        "persist": false,
        "lfc-interval": 3600
    },

    // Setup reclamation of the expired leases and leases affinity.
    // Expired leases will be reclaimed every 10 seconds. Every 25
    // seconds reclaimed leases, which have expired more than 3600
    // seconds ago, will be removed. The limits for leases reclamation
    // are 100 leases or 250 ms for a single cycle. A warning message
    // will be logged if there are still expired leases in the
    // database after 5 consecutive reclamation cycles.
    "expired-leases-processing": {
        "reclaim-timer-wait-time": 10,
        "flush-reclaimed-timer-wait-time": 25,
        "hold-reclaimed-time": 3600,
        "max-reclaim-leases": 100,
        "max-reclaim-time": 250,
        "unwarned-reclaim-cycles": 5
    },

    // Global timers specified here apply to all subnets, unless there are
    // subnet specific values defined in particular subnets.
    "renew-timer": 30,
    "rebind-timer": 40,
    "valid-lifetime": 60,

    // Many additional parameters can be specified here:
    // - option definitions (if you want to define vendor options, your own
    //                       custom options or perhaps handle standard options
    //                       that Kea does not support out of the box yet)
    // - client classes
    // - hooks
    // - ddns information (how the DHCPv4 component can reach a DDNS daemon)
    //
    // Some of them have examples below, but there are other parameters.
    // Consult Kea User's Guide to find out about them.

    // These are global options. They are going to be sent when a client
    // requests them, unless overwritten with values in more specific scopes.
    // The scope hierarchy is:
    // - global (most generic, can be overwritten by class, subnet or host)
    // - class (can be overwritten by subnet or host)
    // - subnet (can be overwritten by host)
    // - host (most specific, overwrites any other scopes)
    //
    // Not all of those options make sense. Please configure only those that
    // are actually useful in your network.
    //
    // For a complete list of options currently supported by Kea, see
    // Section 7.2.8 "Standard DHCPv4 Options". Kea also supports
    // vendor options (see Section 7.2.10) and allows users to define their
    // own custom options (see Section 7.2.9).
    "option-data": [
        {
            "name": "domain-name-servers",
            "data": "192.0.3.1, 192.0.3.2"
        }
    ],

    // Client classes will associate address pools with certain servers taking
    // part in HA.
//    "client-classes": [
//    ],

    // HA requires two hooks libraries to be loaded: libdhcp_lease_cmds.so and
    // libdhcp_ha.so. The former handles incoming lease updates from the HA peers.
    // The latter implements high availability feature for Kea.
    "hooks-libraries": [
        {
            "library": "/home/marcin/devel/kea-build/lib/hooks/libdhcp_lease_cmds.so",
            "parameters": { }
        },
        {
            // The HA hooks library should be loaded.
            "library": "/home/marcin/devel/kea-build/lib/hooks/libdhcp_ha.so",
            "parameters": {
                // High Availability configuration is specified for the HA hook library.
                // Each server should have the same HA configuration, except for the
                // "this-server-name" parameter.
                "high-availability": [ {
                    // This parameter points to this server instance. The respective
                    // HA peers must have this parameter set to their own names.
                    "this-server-name": "server2",
                    // The HA mode is set to load-balancing, but it doesn't really
                    // mean anything until we implement the state machines.
                    "mode": "load-balancing",
                    // Heartbeat is to be sent every 10 seconds if no other control
                    // commands are transmitted.
                    "heartbeat-delay": 10,
                    // The following parameters control how the server detects the
                    // partner's failure. The ACK delay sets the threshold for the
                    // 'secs' field of the received discovers.
                    "max-ack-delay": 5,
                    // This specifies the number of clients which send messages to
                    // the partner but appear to not receive any response.
                    "max-unacked-clients": 20,
                    "peers": [

                         // This is the configuration of our HA peer.
                         {
                             "name": "server1",
                             "url": "http://192.168.56.33:8080/",
                             "role": "primary",
                             "auto-failover": true
                         },
                         // This is the configuration of this server instance.
                         {
                             "name": "server2",
                             "url": "http://192.168.56.66:8080/",
                             "role": "secondary",
                             "auto-failover": true
                         }
                     ]
                 } ]
            }
        }
    ],

    // This example contains a single subnet declaration.
    "subnet4": [
        {
            // Subnet prefix.
            "subnet": "192.0.3.0/24",

            // Specify two address pools. In the future we'll be able to associate
            // those pools with servers (server classes).
            "pools": [ { "pool": "192.0.3.100 - 192.0.3.150" },
                       { "pool": "192.0.3.200 - 192.0.3.250" } ],

            // These are options that are subnet specific. In most cases,
            // you need to define at least routers option, as without this
            // option your clients will not be able to reach their default
            // gateway and will not have Internet connectivity.
            "option-data": [
                {
                    // For each IPv4 subnet you most likely need to specify at
                    // least one router.
                    "name": "routers",
                    "data": "192.0.3.1"
                }
            ],

            // This subnet will be selected for queries coming from the following
            // IP address.
            "relay": { "ip-address": "192.168.56.1" }
        }
    ]
},

// Logging configuration starts here.
"Logging":
{
  "loggers": [
    {
        // This section affects kea-dhcp4, which is the base logger for DHCPv4
        // component. It tells DHCPv4 server to write all log messages (on
        // severity INFO or more) to a file.
        "name": "kea-dhcp4",
        "output_options": [
            {
                // Specifies the output file. There are several special values
                // supported:
                // - stdout (prints on standard output)
                // - stderr (prints on standard error)
                // - syslog (logs to syslog)
                // - syslog:name (logs to syslog using specified name)
                // Any other value is considered a name of a file.
                "output": "stdout"

                // This governs whether the log output is flushed to disk after
                // every write.
                // "flush": false,

                // This specifies the maximum size of the file before it is
                // rotated.
                // "maxsize": 1048576,

                // This specifies the maximum number of rotated files to keep.
                // "maxver": 8
            }
        ],
        // This specifies the severity of log messages to keep. Supported values
        // are: FATAL, ERROR, WARN, INFO, DEBUG
        "severity": "DEBUG",

        // If DEBUG level is specified, this value is used. 0 is least verbose,
        // 99 is most verbose. Be cautious, Kea can generate lots and lots
        // of logs if told to do so.
        "debuglevel": 99
    },
    {
        // This section specifies configuration of the HA hooks library specific
        // logger.
        "name": "kea-dhcp4.ha_hooks",
        "output_options": [
            {
                "output": "stdout"
            }
        ],
        "severity": "DEBUG",
        "debuglevel": 99
    }
  ]
}
}

Control Agent on VM2

// This is a Control Agent configuration for HA testing. It includes
// settings for one of the HA peers. They specify an address and port
// on which the HTTP service is available for the HA peers.
{

// This is a basic configuration for the Kea Control Agent.
// RESTful interface to be available at http://192.168.56.66:8080/
"Control-agent": {
    "http-host": "192.168.56.66",
    "http-port": 8080,

    // Specify location of the files to which the Control Agent
    // should connect to forward commands to the DHCPv4 and DHCPv6
    // server via unix domain socket.
    "control-sockets": {
        "dhcp4": {
            "socket-type": "unix",
            "socket-name": "/tmp/kea-dhcp4-ctrl.sock"
        },
        "dhcp6": {
            "socket-type": "unix",
            "socket-name": "/tmp/kea-dhcp6-ctrl.sock"
        }
    }
},

// Logging configuration starts here. Kea uses different loggers to log various
// activities. For details (e.g. names of loggers), see Chapter 18.
"Logging":
{
  "loggers": [
    {
        // This specifies the logging for Control Agent daemon.
        "name": "kea-ctrl-agent",
        "output_options": [
            {
                // Specifies the output file. There are several special values
                // supported:
                // - stdout (prints on standard output)
                // - stderr (prints on standard error)
                // - syslog (logs to syslog)
                // - syslog:name (logs to syslog using specified name)
                // Any other value is considered a name of a file.
                "output": "stdout"
            }
        ],
        // This specifies the severity of log messages to keep. Supported values
        // are: FATAL, ERROR, WARN, INFO, DEBUG
        "severity": "INFO",

        // If DEBUG level is specified, this value is used. 0 is least verbose,
        // 99 is most verbose. Be cautious, Kea can generate lots and lots
        // of logs if told to do so.
        "debuglevel": 0
    }
  ]
}
}

Tasks Required for HA Phase 2

Feature 1

Outcome: It is possible to tell Kea to pause in the waiting state and then use a command to tell it to resume operation.

  1. #5673: Update HA requirements and design
  2. #5674: Implement holding in the waiting state
  3. #5675: Implement a command to resume operation

Predicted completion: end of July 2018.

Feature 2

Outcome: Hostnames are sanitized.

  1. ...
  2. ...
  3. ...

Feature 3

Outcome: Loaded leases are sanity-checked against the subnet configuration. In particular, the subnet-id is checked to verify whether the address belongs to the given subnet. This mechanism applies both to leases loaded from disk (memfile) and to leases added using the REST API.

  1. ...
  2. ...
  3. ...
Last modified on Aug 17, 2018, 11:39:21 PM