Subnets Manipulation Design

This page is a work in progress and is subject to significant updates as a result of reviews.

Introduction

This document presents a design for subnet configuration manipulation in Kea via hook libraries. The Subnet Commands Hook Library is the accompanying document, which specifies the commands that we are planning to implement in the Kea 1.3.0 release.

The next section lists the issues which this document is trying to address.

Assumptions

The following are the design assumptions:

  1. CfgMgr does not facilitate partial configuration changes, e.g. adding, deleting or updating a subnet while leaving the rest of the server's current configuration untouched.
  2. Any attempt to change the configuration should be atomic and revertible, e.g. when multiple parameters of a subnet configuration are updated, the server should start using the new configuration only after the entire subnet update has taken place, not when some parameters have been updated and others have not. When the new subnet configuration collides with other existing configuration, it should be possible to revert to the last good configuration.
  3. New configuration should be tested before it is used.
  4. Configuration should be versioned, i.e. even a minimal change to the subnet configuration should trigger a configuration version update.
  5. Commands that trigger subnet manipulation should be parsed using existing configuration parsers as much as is possible and practical. Many configuration parsers (including those for subnet configuration) are built on the assumption that they take part in a full server configuration and use CfgMgr's staging configuration, e.g. to sanity check that a particular option definition exists. In the case of incremental configuration changes, the configuration has already been committed and thus is no longer a staging configuration.

The purpose of this design is to resolve or mitigate these problems, but it already seems obvious that, without redesigning the configuration storage, addressing them will require some compromises. In fact, we are already discussing the possibility of unifying the configuration storage for the DHCP servers, D2 and CA, so that discussion may also be a good place to consider how partial configuration changes should be handled.

Access to current server configuration

The use case for updating subnet configuration over the command channel comes from deployments where the DHCP server manages address space for a large number of subnets and new subnets are added and updated quite frequently. The current workaround is to modify the server configuration (adding or modifying a subnet) and reconfigure the server. This is overkill for a deployment with a large number of subnets and may cause disruptions in DHCP service operation as well as other operational issues. In those cases, the expectation is that when a particular subnet is added or modified, all other configuration remains untouched and the changes apply only to the subnet being configured.

The existing implementation of the CfgMgr differentiates between the "staging" and the "current" configuration. The "staging" configuration is used when the server is reconfigured: the parsers add their configuration bits to it. Once configuration parsing is complete and the new configuration seems sane, the staging configuration replaces the current configuration, and from then on the server uses the new configuration. The server has access to the current configuration but can't modify it, i.e. a const pointer to the configuration is returned. This was meant to prevent accidental configuration changes while the server is running.
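
The staging/current split described above can be illustrated with a minimal sketch. The class and method names below (SrvConfigSketch, CfgMgrSketch, getCurrent, getStaging, commit) are hypothetical stand-ins chosen for this page and are not the actual Kea classes; the configuration is reduced to a map of subnets.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical, heavily simplified stand-in for the server configuration.
struct SrvConfigSketch {
    std::map<unsigned, std::string> subnets;  // subnet id -> prefix
};

// Hypothetical stand-in for the configuration manager.
class CfgMgrSketch {
public:
    CfgMgrSketch()
        : current_(std::make_shared<SrvConfigSketch>()),
          staging_(std::make_shared<SrvConfigSketch>()) {}

    // Read-only view of the configuration the server is running with.
    std::shared_ptr<const SrvConfigSketch> getCurrent() const {
        return current_;
    }

    // Mutable configuration that the parsers fill in during reconfiguration.
    std::shared_ptr<SrvConfigSketch> getStaging() {
        return staging_;
    }

    // Promote staging to current and start a fresh, empty staging config.
    void commit() {
        current_ = staging_;
        staging_ = std::make_shared<SrvConfigSketch>();
    }

private:
    std::shared_ptr<SrvConfigSketch> current_;
    std::shared_ptr<SrvConfigSketch> staging_;
};
```

In this sketch a caller holding the result of getCurrent() cannot call non-const functions on it, which is the protection (and the limitation) discussed in the rest of this section.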

Having access to only a const configuration poses a problem for control commands, which should merely update specific bits of the existing configuration. There are two possible ways to deal with that:

  1. Make CfgMgr return a non-const pointer to the current configuration.
  2. Create a staging configuration from the existing current configuration, apply the changes to the staging configuration, then commit.

The former is simpler but may be a source of many other problems. The latter seems sane from the configuration integrity standpoint but also has significant disadvantages. Creating a staging configuration from the existing configuration involves copying the entire configuration structure, which impacts server performance, especially when many subnets have to be copied. In addition, the CfgMgr does not fully support deep copying of the server configuration. Adding this capability in the Kea 1.3 release is probably not feasible due to time constraints.
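
To make the cost of the second approach concrete, here is a minimal sketch of "copy, modify, commit". ConfigSketch and addSubnetViaCopy are hypothetical names used only for illustration; the real configuration is far richer than a single map, which is exactly why deep copying it is expensive.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified configuration holding only subnets.
struct ConfigSketch {
    std::map<unsigned, std::string> subnets;  // subnet id -> prefix
};

// Build a new configuration from the current one with a single subnet added.
// The current configuration is left untouched; the caller "commits" by
// replacing its pointer with the returned one.
std::shared_ptr<const ConfigSketch>
addSubnetViaCopy(const std::shared_ptr<const ConfigSketch>& current,
                 unsigned id, const std::string& prefix) {
    // Copies every subnet, so the cost grows with the configuration size
    // even though only one subnet changes.
    auto staging = std::make_shared<ConfigSketch>(*current);
    staging->subnets[id] = prefix;
    return staging;
}
```

The old configuration remains available until the last pointer to it is released, so reverting is trivial in this model, at the price of a full copy per command.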

One possible workaround for the lack of deep copying the current configuration is the use of:

SrvConfig::toElement(...)

which dumps the configuration in the JSON format. The reverse can be achieved by calling the configuration parsing code. However, converting the configuration back and forth is going to be a costly operation, and we would probably want to avoid doing it in the main process thread.

Having said that, the better approach (in terms of efficiency) would be to return a non-const pointer to the current configuration and avoid copying the entire data structure.

Applying updates to the current configuration

Assuming we allow write access to the current configuration, we start facing some new problems. The previously protected current configuration is not protected anymore, and any part of the Kea server code, or even a user library, can modify it intentionally or unintentionally. There were proposals to restrict access to the mutable configuration by using the C++ friendship mechanism. A friend class of the CfgMgr could access the mutable configuration by calling a private function which the rest of the code would not have access to. This seems like a good idea, but it would require designating a friend class which could be used whenever direct access to the current configuration is necessary. Two possibilities have been considered:

  1. Designate CommandMgr as a friend class, so that only command handlers can directly access the configuration.
  2. Create a new class (and designate it a friend) which would generally be used to "safely" update the configuration and could also serve as an abstraction layer to a database, if we decide to use a database as an alternative configuration storage.

The first option would only work if the hook library used for configuration manipulation were implemented in terms of CommandMgr. The most recent discussions between the team members suggest that we may actually get rid of CommandMgr within hook libraries and rely only on the hooks framework to register and run command handlers.

The second option might be better, but it just adds another layer in the class hierarchy which would pass invocations to CfgMgr. In fact, it doesn't prevent anyone from using this class to freely hack the current server configuration once they find that the CfgMgr doesn't allow for that but the new class does. Secondly, storing configuration in the database is a large feature that needs its own design, and we can't make any assumptions about it right now. In fact, it may not need any additional friend class, but may simply extend the existing CfgMgr with methods to access the database. Lastly, we can't fully protect library implementers against shooting themselves in the foot.
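
For reference, the friendship mechanism discussed above (the second possibility) could look roughly like the sketch below. GuardedConfig, GuardedCfgMgr and ConfigUpdater are hypothetical names introduced for this illustration only. The sketch also shows the weakness noted above: anyone who can construct the friend class gains write access.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified configuration.
struct GuardedConfig {
    std::map<unsigned, std::string> subnets;  // subnet id -> prefix
};

class ConfigUpdater;  // hypothetical friend class

class GuardedCfgMgr {
public:
    GuardedCfgMgr() : current_(std::make_shared<GuardedConfig>()) {}

    // Everyone gets read-only access.
    std::shared_ptr<const GuardedConfig> getCurrent() const {
        return current_;
    }

private:
    // Only friends may obtain the mutable configuration.
    friend class ConfigUpdater;
    std::shared_ptr<GuardedConfig> getMutableCurrent() {
        return current_;
    }

    std::shared_ptr<GuardedConfig> current_;
};

// The single place through which the live configuration is modified.
class ConfigUpdater {
public:
    explicit ConfigUpdater(GuardedCfgMgr& mgr) : mgr_(mgr) {}

    void addSubnet(unsigned id, const std::string& prefix) {
        mgr_.getMutableCurrent()->subnets[id] = prefix;
    }

private:
    GuardedCfgMgr& mgr_;
};
```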

Having said that, this design proposes that the CfgMgr always returns a non-const pointer to the current configuration so that "write" operations may be performed (in C++ terms: non-const functions can be called on the object representing this configuration) to modify the configuration with minimal overhead.

Having access to the "living" server configuration requires us to be careful when we modify it. Without a rollback mechanism for partial configuration changes, a change can't be reverted once it has been applied. In the long run we want to be able to roll back configuration changes, but this seems to be a whole new feature. Right now it is possible to use:

SrvConfig::toElement()

to dump the current configuration. Such a dump could be stored in memory and/or in a file, thus giving an opportunity to roll back. However, such a post-configuration action would again be time consuming and not appropriate for our use case. It is possible to run such an action in a separate thread while allowing the server to continue running with the new configuration. However, the use of threads might be a bit controversial, as the team still hasn't decided how to implement concurrency in Kea. Therefore, we leave the problem of configuration rollback open and out of scope for this design.

When working on the server's current configuration it is highly important that any configuration changes are properly validated before they are applied (especially since there is presently no way to roll back the configuration). The validation of the parsed data is mostly done in the configuration parsers. They are launched in a certain order to guarantee that dependencies between configuration parts are satisfied. For example, when adding an option for a subnet, the parser verifies that a definition for this option exists. One minor issue with the parsers is that they perform their checks against values in the staging configuration, as they were "designed" to handle full reconfiguration. In order to implement the subnet manipulation hook library we need to update the parsers so that, in our cases, they use the current configuration instead of the staging one. This can be a parameter of the subnet/subnet list parser itself.
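
A parser parameter selecting the reference configuration might be sketched as follows. CheckedConfig, CheckAgainst and OptionParserSketch are hypothetical names and do not correspond to existing Kea parsers; the only check modelled is "a definition for this option code exists".

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical, simplified configuration: codes of the defined options.
struct CheckedConfig {
    std::set<unsigned> option_defs;
};

// Which configuration the parser validates against.
enum class CheckAgainst { Staging, Current };

class OptionParserSketch {
public:
    OptionParserSketch(const CheckedConfig& staging,
                       const CheckedConfig& current,
                       CheckAgainst mode)
        : reference_(mode == CheckAgainst::Staging ? staging : current) {}

    // Throws if the option has no definition in the reference configuration.
    void parse(unsigned option_code) const {
        if (reference_.option_defs.count(option_code) == 0) {
            throw std::invalid_argument("no definition for option " +
                                        std::to_string(option_code));
        }
    }

private:
    const CheckedConfig& reference_;
};
```

A full reconfiguration would construct the parser with CheckAgainst::Staging, while a subnet command handler would pass CheckAgainst::Current; the parsing logic itself stays the same.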

Kea configuration is pretty complex and has many dependencies between various parameters. It is not practical to work out here how to reliably validate each configuration part when only that part has been modified. However, to illustrate the potential problem we provide an example based on option definitions.

When option definitions are parsed during a full reconfiguration, the parser only checks that they do not collide with the standard definitions or with each other. This is OK, because the parsing process hasn't yet reached the point of parsing option values. When there is a collision between an option value and an option definition, the error is reported in the context of the option value. Conversely, when a control command updates an option definition in the existing configuration and there are already options using this definition, the update may collide with the option values and the error should be reported. Our parsers don't validate option definitions against option values right now, so such a "live" update could cause significant inconsistencies between the two if we used the parsers as they are today. To address that problem, it seems that eventually all our parsers will have to be updated to perform sanity checks against all configuration parts that may be in collision.
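
The missing check can be sketched as follows. This is only an illustration under simplifying assumptions: an option definition is reduced to a code and a data type name, an option value remembers the type it was created with, and all names (OptionValueSketch, LiveConfigSketch, updateOptionDef) are hypothetical.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical option value that remembers the data type it was parsed with.
struct OptionValueSketch {
    unsigned code;
    std::string type;
};

// Hypothetical live configuration with definitions and configured values.
struct LiveConfigSketch {
    std::map<unsigned, std::string> option_defs;  // option code -> data type
    std::vector<OptionValueSketch> option_values;
};

// Update (or add) a definition in the live configuration, but only after
// verifying that no configured option value would become inconsistent.
void updateOptionDef(LiveConfigSketch& cfg, unsigned code,
                     const std::string& new_type) {
    for (const auto& value : cfg.option_values) {
        if (value.code == code && value.type != new_type) {
            throw std::invalid_argument(
                "option definition conflicts with an existing option value");
        }
    }
    // All checks passed; only now is the configuration modified.
    cfg.option_defs[code] = new_type;
}
```

Validation is completed before anything is modified, so a rejected update leaves the live configuration exactly as it was.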

Partial updates to subnet configuration

In the Kea 1.3 release the following commands pertaining to (non-shared) subnets were implemented:

  • listing subnets: subnet4-list, subnet6-list
  • getting a single subnet: subnet4-get, subnet6-get
  • adding a subnet: subnet4-add, subnet6-add
  • removing a subnet: subnet4-del, subnet6-del

Details regarding those commands can be found here: https://jenkins.isc.org/job/Kea_doc/guide/kea-guide.html#subnet-cmds

The first four commands are simple to implement because they don't modify the existing configuration. The last four commands modify the list of subnets. When a new subnet is to be added, the subnet parser performs sanity checks to verify that the subnet does not conflict with the current configuration. As an example, the parser should check whether the network interface with which the subnet is associated exists and whether the server is listening on this interface. It should also check whether the options specified for this subnet are correct in terms of the option definitions used, etc.

The parsers also verify that there is no conflict between the new subnet and existing subnets, e.g. a duplicate subnet id. More importantly, since we are also implementing support for shared networks, we have to be careful to check the dependencies between the new subnet and other subnets belonging to the same network. When a subnet is added, it is only necessary to update the data structure which holds the information about the grouping of subnets into shared networks. When a subnet is deleted, we have to check which shared networks the subnet belongs to and update that information accordingly.
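
The conflict check on subnet addition can be sketched as below. SubnetTableSketch is a hypothetical container, not the Kea subnet configuration class. The duplicate id check comes from the text above; the duplicate prefix check is an additional assumption made for this example.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical container for the configured subnets.
class SubnetTableSketch {
public:
    // Add a subnet after checking it against the existing subnets.
    void add(unsigned id, const std::string& prefix) {
        if (by_id_.count(id) != 0) {
            throw std::invalid_argument("duplicate subnet id");
        }
        // Assumption for this sketch: two subnets may not share a prefix.
        for (const auto& entry : by_id_) {
            if (entry.second == prefix) {
                throw std::invalid_argument("duplicate subnet prefix");
            }
        }
        by_id_.emplace(id, prefix);
    }

    // Remove a subnet; returns false if no subnet has the given id.
    bool del(unsigned id) {
        return by_id_.erase(id) > 0;
    }

    std::size_t size() const {
        return by_id_.size();
    }

private:
    std::map<unsigned, std::string> by_id_;  // subnet id -> prefix
};
```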

Since deleting a subnet may trigger updates in several places of the configuration data structure, it is important to guarantee atomicity of the configuration update from the server's standpoint. This is currently guaranteed by the design of the server, which is single-threaded and doesn't run the DHCP service while commands are processed. If in the future the server takes advantage of multi-threading, appropriate locking mechanisms will have to be implemented in the Configuration Manager to prevent the server from using a configuration while it is being modified.

Shared networks support

Commands for manipulating shared networks are currently out of scope, primarily because of time constraints. The capability to modify shared networks will be developed in the near future.

Actions triggered on subnet deletion

There are two possible actions that the server needs to perform when a subnet is deleted: removing host reservations and removing leases associated with the subnet. Whether the server performs those actions depends on the values of "reservations-action" and "leases-action" specified in the command. If it does, the server should first attempt to perform these actions on the database, prior to removing the subnet from the configuration structures, because the database actions have a much greater likelihood of failure. If these actions fail on the database, the server must not remove the subnet from the configuration and must report an error to the controlling client.
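
The ordering described above can be sketched as follows. The function and type names (deleteSubnet, DeleteResult) are hypothetical, and the database actions are modelled as optional callbacks: an empty callback means the corresponding action was not requested in the command.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical result reported back to the controlling client.
struct DeleteResult {
    bool ok;
    std::string text;
};

// Delete a subnet, running the requested database actions first.
// The subnet is removed from the configuration only if they all succeed.
DeleteResult deleteSubnet(
        std::map<unsigned, std::string>& subnets, unsigned id,
        const std::function<bool(unsigned)>& removeReservations,
        const std::function<bool(unsigned)>& removeLeases) {
    if (subnets.count(id) == 0) {
        return {false, "no such subnet"};
    }
    // Database actions go first because they are more likely to fail.
    if (removeReservations && !removeReservations(id)) {
        return {false, "failed to remove host reservations"};
    }
    if (removeLeases && !removeLeases(id)) {
        return {false, "failed to remove leases"};
    }
    // Only now is the in-memory configuration touched.
    subnets.erase(id);
    return {true, "subnet deleted"};
}
```

Note that this sketch makes no attempt to undo an earlier database action when a later one fails; how to handle that case would have to be decided when the feature is designed.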

These actions are out of scope for Kea 1.3. This particular aspect will be covered in upcoming Kea releases.

Last modified on Sep 14, 2017, 11:01:24 AM