wiki:SimpleParser

This document is a draft and is a subject for substantial changes

SimpleParser: A design for re-configuration of Kea Configuration Parsers, attempt #2

For earlier attempt, see ConfigParseDesign.

the pains of the current code

There's unanimous agreement among the team (and some external powers users as well) that the current parser code is difficult to work with and a source of general unhappiness. To be more specific, the following issues and concerns were identified:

  1. Complexity.. Existing parsers are overwhelmingly complex. Even though each parser derived from DhcpConfigParser? is relatively small class, the complexity comes from too large number of interdependent parsers.
  2. Dispersed. The code is disorganized, i.e. spread out between multiple directories (src/bin/dhcp6 and src/lib/dhcpsrv/parsers).
  3. Build/commit fiasco. The split into build/commit never worked well. The underlying concept of having intermediate configuration storage (created during build) that gets validated and then, if validation was successful, is commited, never worked in DHCP. It was abused beyond reason. Also, due to dependency of certain parameters (e.g. option definitions had to be already parsed when option data was parsed) many parsers moved the whole logic to build(), and commit() was not used at all.
  4. No rollback capability. It is not trivial to revert configuration change. This split originates from BIND10 days and was inherited from DNS auth that did receive only changes in the configuration, rather than the full configuration. As a result, the split was abused and many of the parsers have commit() being a no-op operation. Several years ago, Marcin implemented great feature of staging configuration that was added to the parsers as afterthought.
  5. List of parameters. There is no way to generate a list of all directives. We do have .spec files, but they're not actually used by the code. The code has the directives spread out in multiple places in multiple files in mutliple directories. Answering a simple question ("can I do X in the scope Y?") requires a session of reverse engineering. What's worse, we have the .spec files that are kinda kept up to date. This is actually more damaging that way, because there's no strict correlation between the code and a .spec file. So there may be parameters that are supported, but are not in .spec files. The opposite is also true - .spec files can be buggy and have different parameters. This is particularly true for default values.
  6. Comments and file inclusions. It's exceedingly complex to add comments that don't start at the first column or span several lines. Both Tomek and Marcin tried to implement it, but failed miserably. The same is true for including files (have include statement in the config that includes other files.
  7. Default values. The current parsers don't handle the default values, i.e. if there's no directive, the parser is not created at all. We have kludgy workarounds for that, but the code for it is in different place than the parser, which leads to the logic being spread out in different places.
  8. Unclear inheritance. It is exceedingly difficult to understand which parameters are inherited from which scopes. Kea development team can probably answer the question, but it is impossible to answer such question without deep understanding of the code.
  9. Unnecessary intermediate storage.. Due to various reasons, we have a total of four stages of configuration processing: first converts string into JSON structures (a tree of Element instances), the second stage creates parsers, the third stage call build/commit on those parses to put the data into intermediate storage (a contraption of various StringStorage?, Int32Storage and the like) and then finally, the fourth step is to use that intermediate storage to create actual configuration structures used by the DHCP code.
  10. Unnecessary O(n) memory complexity. If we have 1000 host definitions in the config file, we create a separate parser instance for each host reservation. This is wasteful memory utilization and can become a problem for huge configuration.
  11. Syntax checking. Syntax checking is poor. The existing JSON parser allowed things like empty option-data entries like this:
    "option-data": [ {} ]
    
    or having trailing commas:
    "option-data": [
        {
            "code": 12,
            "data": "2001:db8:1:0:ff00::1"
        },
    ]
    
    or having incorrect types, e.g.
    "renew-timer": "forevah"
    
  12. No includes It was next to impossible to extend the current code to provide an ability to include files. Several people (including those posting to kea-users) complained that there is no way to include files from config files.

Proposed solution

To solve those issues a two phase approach was proposed:

PHASE 1: replace isc::data::fromJSON with bison-based parser. This will address the following issues:

  1. allow to have a single file that defines the actual syntax, much better syntax checking
  2. provide more flexibility, in particular allow various comment types and file inclusions
  3. the code is more compact
  4. the whole grammar is defined in one file (in two if you need token definitions). This functionally replaces the .spec files. The major advantage over spec files is that bison grammar is actually enforced during parsing (as .spec files were not used during parsing, its content may have been incorrect or outdated and nobody would detect that). The grammar can be used as a source of list of parameters.
  5. The lexer code provides support for several nice features: bash comments (# - we had that before, but the old code required comment to start at the first column, now it can start anywhere), C comments (because some people like that comment style) and C++ comments (because commenting out multiple lines is easy). What's more important, the new code allows file inclusions. Anywhere in the JSON file or string you can add <? include "second-file.json"?> and the content of that file will be included. There is a limit of 10 nested files, so we can recover from stupid mistakes like recursive file inclusions.

As a result, the parser returns JSON structures that are guaranteed to be syntatically correct. In case of mistakes, Bison returns very useful, context specific errors, e.g. "unexpected :, expected } or , in line X, column Y".

PHASE 2: replace existing DhcpConfigParser classes with simpler replacement that does the following:

  1. must be significantly less complex than existing code
  2. does not use build/commit approach
  3. must have rollback capability (i.e. use staging config feature implemented in CfgMgr)
  4. must provide a clear list of default values
  5. must provide a clear list of inherited values

The following text explains how those goals were achieved:

PHASE 1: Syntactic bison-based parser

The code is implemented in two files: src/bin/dhcp6/dhcp6_lexer.ll, which defines regular expressions that are used on the input (be it a file, a string in memory or perhaps one day data coming from a socket, and src/bin/dhcp6/dhcp6_parser.yy, which defines the syntax (or 'grammar' in bison nomenclature). This is a structured file that is easy to read. Here are some examples of used:

// Defines global Dhcp6 object. It starts with "Dhcp6", followed by : and {, followed
// by a list of global parameters and closed with }.
dhcp6_object: DHCP6 COLON LCURLY_BRACKET global_params RCURLY_BRACKET;

// Global parameters could be one or more global parameters.
// This list is defined recursively.
global_params: global_param
| global_params COMMA global_param;

// These are the parameters that are allowed in the top-level for
// Dhcp6.
global_param
: preferred_lifetime
| valid_lifetime
| renew_timer
| rebind_timer
| subnet6_list
| interfaces_config
| lease_database
| hosts_database
| mac_sources
| relay_supplied_options
| host_reservation_identifiers
| client_classes
| option_data_list
| hooks_libraries
| expired_leases_processing
| server_id
| dhcp4o6_port
;

Working code that demonstrates this capability for Kea-dhcp6 is available on trac5014 branch.

PHASE 2: SimpleParser

To radically simplify the parsers, a SimpleParser class has been defined. It is intended to replace DhcpConfigParser?. An important factor here is that the existing code can be migrated one parser at a time. It has a number of characteristics:

  1. Simplicity. Each parser provides a simple interface, e.g.
    CfgOptionPtr parse(ConstElementPtr json)
    
    This has the major benefit of not needing any intermediate storages. The JSON structures (returned by bison parser or coming from any other source) are directly converted to the structures used by configuration manager and the DHCP code. Since there is no intermediate storage, the code becomes much simpler and {String,Int,Boolean}Storage is no longer used.
  2. The code is almost stateless. The only state kept in most cases is whether the parsing is done in v4 or v6 context. This greatly simplifies the parsers (no contexts, no child parsers list, no separate storage for uint32, strings etc. In fact, there's so little state kept, that this parser is mostly a collection of static methods.
  3. No optional parameters (all are mandatory). This simplifies the parser, but introduces a new step before parsing where we insert the default values into client configuration before parsing. This is actually a good thing, because we now have a clear picture of the default parameters as they're defined in a single place (the DhcpConfigParser had the defaults spread out in multiple files in multiple directories).
  4. Clearly defined default values. There is a single place where all default values are specified. This is easily readable by people who are not familiar with the code.
  5. Clearly defined inherited values. There is a single place where all the inherited values are specified. This is easily readable by people who are not familiar with the code.
  6. Control over the order in which the structures are parsed. The order of parsing is determined by the order the parsers are called. It's easy to call specific parser earlier than others.
  7. Better memory complexity. Since the parsers are almost stateless, we can reuse the same parser instance for many elements, e.g. only one parser for 1000s of host reservations. A side effect is that we don't need pointers to parsers any more. That's yet another simplification.

Here is an example of a parser that parses option definition:

/// @brief Parser for a single option definition.
///
/// This parser creates an instance of a single option definition.
class OptionDefParser : public SimpleParser {
public:
    /// @brief Constructor.
    ///
    /// @param address_family Address family: @c AF_INET or AF_INET6
    OptionDefParser(const uint16_t address_family);

    /// @brief Parses an entry that describes single option definition.
    ///
    /// @param option_def a configuration entry to be parsed.
    /// @return tuple (option definition, option space) of the parsed structure
    ///
    /// @throw DhcpConfigError if parsing was unsuccessful.
    OptionDefinitionTuple parse(isc::data::ConstElementPtr option_def);

private:
    /// @brief Address family: @c AF_INET or @c AF_INET6
    uint16_t address_family_;
};

Parsing the structures is easy. The SimpleParser provides several convenience methods for obtaining parameters: getInteger, getString, getBoolean. This is a slightly simplified code code that parses option defintion:

std::pair<isc::dhcp::OptionDefinitionPtr, std::string>
OptionDefParser::parse(ConstElementPtr option_def) {

    // Get mandatory parameters.
    std::string name = getString(option_def, "name");
    uint32_t code = getInteger(option_def, "code");
    std::string type = getString(option_def, "type");

    // Get optional parameters. Whoever called this parser, should have
    // called SimpleParser::setDefaults first.
    std::string space = getString(option_def, "space");
    std::string encapsulates = getString(option_def, "encapsulate");

    def.reset(new OptionDefinition(name, code, type, encapsulates));
    return make_pair(def, space);
}

An important point here is that the code is only slightly tweaked code that used to be in build() or commit() methods. This way we retain all the little tweaks that we accumulated over the years (like not requiring option data if the option is empty, requiring only one of option name or option code for defined options etc).

Another simplification comes from the fact that the parsers don't deal with the default values as all parameters are mandatory. Of course expecting users to always specify everything would be a bad idea, therefore there's a code that fills in the defaults with a very simple call:

    /// @brief Sets option defaults for a single option
    ///
    /// This method sets default values for a single option.
    ///
    /// @param option an option data to be filled in with defaults.
    /// @param v6 is it v6 (true) or v4 (false) option?
    /// @return number of default values added
    static size_t setOptionDefaults(isc::data::ElementPtr option, bool v6);

The code internally uses an array defined in the following way:

/// This table defines the default values for option definitions in DHCPv6
const SimpleDefaults OPTION6_DEF_DEFAULTS = {
    { "record-types", Element::string,  "" },
    { "space",        Element::string,  "dhcp6" },
    { "array",        Element::boolean, "false" },
    { "encapsulate",  Element::string,  "" }
};

Such an array is easily readable, even by people who have no clue about C++. In the future a solution could be engineered that would generate parts of the User's Guide automatically based on that array. Automatically generated documentation ensures no discrepancies between the code and docs.

Finally, there's the capability to derive parameters between scopes. This is a powerful mechanism. So far the envisaged usage is for deriving timer values from global to the subnet scope, but there are many other applications possible (options for subnets and pools, next-server and bootfile from global to subnet to possibly host reservation etc.). This mechanism is implemented using the following interface:

    /// @brief Derives (inherits) parameters from parent scope to a child
    ///
    /// This method derives parameters from the parent scope to the child,
    /// if there are no values specified in the child scope. For example,
    /// this method can be used to derive timers from global scope (e.g. for
    /// the whole DHCPv6 server) to a subnet scope. This method checks
    /// if the child scope doesn't have more specific values defined. If
    /// it doesn't, then the value from parent scope is copied over.
    ///
    /// @param parent scope to copy from (e.g. global)
    /// @param child scope to copy from (e.g. subnet)
    /// @param params names of the parameters to copy
    /// @return number of parameters copied
    static size_t deriveParams(isc::data::ConstElementPtr parent,
                               isc::data::ElementPtr child,
                               const ParamsList& params);

The code uses an array defined in the following way:

/// This array defines a list of parameters that can be inherited
/// from the global to the subnet scope.
const ParamsList GLOBAL_TO_SUBNET = {
    "renew-timer",
    "rebind-timer",
    "preferred-lifetime",
    "valid-lifetime"
};

Again, this is a piece of code that can be easily read by average users not skilled in C++ programming.

Proof of concept code

There is a working proof of concept code. Code for phase 1 is available on trac5014 branch. Code for phase 2 (that also includes code for phase 1) is on trac5014_phase2 branch. Note that some parts of the code require C++11. Make sure you compile it with -std=c++11.

Notable pieces of the code:

  1. There's an overview for the bison parser in src/bin/dhcp6/dhcp6.dox. This is recommended first read for people who are not familiar with bison.
  2. The details of simple parser are documented in SimpleParser in src/lib/dhcpsrv/parsers/simple_parser.h
  3. An example usage of the parser was added in 813f8529dad949c7c63be11645d7955e36478a78. Four parsers were converted to SimpleParser: OptionDataParser, OptionDataListParser, OptionDataDefParser, and OptionDataDefListParser. You may review those classes and how they are used. The aforementioned commit implements the changes in src/bin/dhcp{4,6}/json_config_parser.cc
  4. As this is proof of concept and mostly refactoring, the number of new unit-tests is limited. One notable new unit-test is ParserTest.file (see src/bin/dhcp6/tests/parser_unittest.cc) that loads all example configs from doc/examples/kea6.

This code as written with some extra work (unit-tests and applying similar changes to Dhcp4) will address #5014, #5015, #5036, #3450, #3875, #3960 (phase 1) and #2646, #3998, #3766 and #5039 (phase 2).

Future improvements

  1. Since the syntax is enforced by the parser, it duplicates the functionality of .spec files. We may consider whether we want to keep spec files or ditch them.
  2. Francis made several useful comments in #5014. Those should be looked at and implemented in one of the upcoming tickets.
  3. Since existing parsers became very simple (for example OptionDefParser? and OptionDefListParser? each consist of a single method), we can consider merging and decrease the number of independent classes. This would further simplify the code.
  4. your suggestion goes here. ;-)

Implementation progress

  • #5014 - implement a prototype
  • #5015 - design parser refactoring

Dhcp6:

  • #5036 - Define Dhcp6 grammar (bison)
  • #5037 - Migrate Dhcp6 and Subnet6 to bison parser
  • #5038 - Dhcp6: Migrate interfaces configuration
  • #5039 - Dhcp6: Migrate option values and definitions configuration
  • #5040 - Dhcp6: migrate host reservations configuration
  • #5041 - Dhcp6: Migrate hook libraries configuration
  • #5042 - Dhcp6: Migrate MAC sources, control socket and relay information parsers
  • #5043 - Dhcp6: Migrate D2 (DHCP-DDNS) configuration.
  • #5044 - Dhcp6: Migrate DUID configuration
  • #5045 - Dhcp6: Migrate lease expiration configuration

Dhcp4:

  • #5017 - Define Dhcp4 grammar (bison)
  • #5019 - Migrate Dhcp4 and Subnet4 to the bison parser
  • #5020 - Dhcp4: migrate interfaces configuration to new parser
  • #5021 - Dhcp4: Migrate option values and definitions configuration to new parser
  • #5030 - Dhcp4: Migrate host reservations configuration
  • #5031 - Dhcp4: Migrate hook libraries configuration
  • #5032 - Dhcp4: Migrate MAC sources, control socket, relay
  • #5033 - Dhcp4: Migrate D2 configuration
  • #5034 - Dhcp4: Migrate DUID configuration
  • #5035 - Dhcp4: Migrate lease expiration configuration

Also see dedicated milestone for this: http://kea.isc.org/milestone/Kea%201.2%20-%20Mozilla%20Milestone%201

Last modified 11 months ago Last modified on Dec 16, 2016, 7:13:16 PM