wiki:DiagnosticsDesign

Design for the Logging and Diagnostics Enhancements

This is the accompanying document to the Logging and Diagnostics Requirements document and describes the implementation of the requirements presented there.

Scope

This document is updated to cover the features planned for Kea 0.9.2 release.

Transfering lease information from the lease database (kea-lease-dump)

This section presents the overview of the kea-lease-dump tool which will allow for gathering the entire lease information from the SQL lease databases and dumping it into the CSV file. This tool is useful in cases when the user hits the problem with the DHCP server and must be assisted by the ISC staff. The snapshot of the lease database in the CSV format can then delivered along with other files to the ISC engineers for review.

The Implementation Choice

The solution presented here is based on the built-in capabilities of the MySQL and PostgreSQL databases, i.e. it uses specific SQL queries issued from the shell script to obtain the lease information from the specified database. This solution does not use the lease database backends from the libdhcpsrv library. Admittedly, the presented solution has a number of disadvantages, but the Engineering Team felt that it will be sufficient to satisfy the requirements for the 0.9.2 release, while being developed faster than the tool based on libdhcpsrv library. This decision may be revisited in the future with the following disadvantages of the current approach in mind:

  • The SQL queries used by the script require using explicit names of the table columns to perform conversions of binary data into readable format and to prepend the lease information with the CSV header. This means that the script will require updates every time the database schema has changed. In particular, it will require updates to the SQL queries when new column is added to one of the SQL tables.
  • The Google Test framework may not be used for testing the shell based tool.
  • There is no possibility to add the support for reading the database configuration from the Kea configuration file. If the tool could use the Kea libraries it could read the database name, database user name etc. from the configuration file, which would be convenient for the user and would facilitate automated data gathering after server crash.

The Command Line Options

The tool will expose the following command line to the user:

kea-lease-dump -[4|6] -t <database-type> -n <database-name> -u <user-name> -p -o <output-file>

Options:
-4: Dump DHCPv4 leases
-6: Dump DHCPv6 leases
-t <database-type>: One of the supported databases: "mysql" or "pgsql"
-n <database-name>: Database name
-u <user-name>: Database user name
-p: Should password prompt be used
-o <output-file>: Resulting CSV file with lease information

MySQL

In order to dump leases from the MySQL database, the tool will use the SELECT ... INTO OUTFILE query.

The following query will be used to gather DHCPv4 leases:

SELECT 'address', 'hwaddr', 'client_id', 'valid_lifetime', 'expire', 'subnet_id', 'fqdn_fwd', 'fqdn_rev', 'hostname'
  UNION
    SELECT INET_NTOA(address), IF(ISNULL(hwaddr), '', HEX(hwaddr)) as hwaddr, IF(ISNULL(client_id), '', HEX(client_id)), valid_lifetime, expire, subnet_id, fqdn_fwd, fqdn_rev, hostname
      FROM lease4 INTO OUTFILE '/home/marcin/devel/mysqldump/lease4.csv'
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n';

The first SELECT statement produces the CSV header including column names. This is required because SELECT ... INTO OUTFILE construct doesn't include the column names in its output. The script will have to delete the output file prior to making this SQL query, because the query will fail if this file already exists.

The following is the query to be used for gathering the DHCPv6 leases:

SELECT 'address', 'duid', 'valid_lifetime', 'expire', 'subnet_id', 'pref_lifetime', 'lease_type',
       'iaid', 'prefix_len', 'fqdn_fwd', 'fqdn_rev', 'hostname', 'hwaddr', 'hwtype', 'hwaddr_source'
  UNION 
    SELECT address as address, IF(ISNULL(duid), '', HEX(duid)) AS duid, valid_lifetime, expire, subnet_id,
           pref_lifetime, lease_type, iaid, prefix_len, fqdn_fwd, fqdn_rev, hostname,
           IF(ISNULL(hwaddr), '', HEX(hwaddr)) AS hwaddr, hwtype, hwaddr_source
      FROM lease6
      INTO OUTFILE '/home/marcin/devel/mysqldump/lease6.csv'
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n';

PostgreSQL

The following query will be used to retrieve all DHCPv4 leases from the PostgreSQL database:

COPY
  (SELECT ('0.0.0.0'::inet + address) AS address, hwaddr, client_id, valid_lifetime,
          expire, subnet_id, fqdn_fwd::int AS fqdn_fwd, fqdn_rev::int AS fqdn_rev,
          hostname
     FROM lease4)
     TO '/home/marcin/pgsqldump/lease4.txt'
     WITH CSV HEADER DELIMITER ',';

This query can be issued even when the output file already exists in the specified location. In this case it will be replaced with a new one.

The corresponding query for retrieving all DHCPv6 leases into the CSV file will look as follows:

COPY
  (SELECT address, duid, valid_lifetime, expire, subnet_id, pref_lifetime, lease_type,
          iaid, prefix_len, fqdn_fwd::int as fqdn_fwd, fqdn_rev::int as fqdn_rev, hostname 
   FROM lease6)
   TO '/home/marcin/pgsqldump/lease6.txt'
   WITH CSV HEADER DELIMITER ',';

Tickets

The following tasks/tickets cover the functionality described in this section:

  • Create kea-leases-dump tool to transfer leases from SQL database to CSV file (#3802)
  • Document kea-lease-dump tool (#3803)

Logging in Hooks Framework and Hooks Libraries

The DHCP servers invoke functions in installed hook libraries at different stages of processing. The hook libraries serve various purposes and differ by complexity. The execution of the proprietary code in hooks will naturally degradate server performance. In addition, it may also cause functional failures. Therefore it is critical from the diagnostics standpoint to assess if observed performance and/or functional issues origin in the hook library or the core server code. This allows for assessing the severity of the bug and designate appropriate engineers to work on the problem. Note, that bugs in the server operation are considered to have much higher severity because they affect many users. Bugs in the hook libraries are localized, i.e. affect less users, and thus are less severe.

Improved logging in the CalloutManager

In order to facilitate the assessment of the hooks libraries' performance, the CalloutManager class (src/lib/hooks/callout_manager.cc) will be extended with the measures of the execution time of each callout invoked for the particular hook. The implementation of the new utility class !Stopwatch is required, which will support the following operations:

  • measure the total time spent executing all callouts for the hook point,
  • measure the time spent executing a single callout

The class will expose start(), stop(), reset() functions. It will also expose functions returning the mesaured times in, at least, milliseconds and microseconds. Support for other units, like seconds or minutes may be added at later time.

The CalloutManager::callCallouts method will be extended as follows.

        // This object will be used to measure execution time of each callout
        // and the total time spent in callouts for this hook point.
        util::Stopwatch stopwatch;

        // Mark that the callouts begin for the hook.
        LOG_DEBUG(callouts_logger, HOOKS_DBG_CALLS, HOOKS_CALLOUTS_BEGIN)
            .arg(server_hooks_.getName(current_hook_));

        // Call all the callouts.
        for (CalloutVector::const_iterator i = callouts.begin();
             i != callouts.end(); ++i) {
            // In case the callout tries to register or deregister a callout,
            // set the current library index to the index associated with the
            // library that registered the callout being called.
            current_library_ = i->first;

            // Call the callout
            try {
                stopwatch.start();
                int status = (*i->second)(callout_handle);
                stopwatch.stop();
                if (status == 0) {
                    LOG_DEBUG(callouts_logger, HOOKS_DBG_EXTENDED_CALLS,
                              HOOKS_CALLOUT_CALLED).arg(current_library_)
                        .arg(server_hooks_.getName(current_hook_))
                        .arg(PointerConverter(i->second).dlsymPtr())
                        .arg(stopwatch.logFormatLastDuration());
                } else {
                    LOG_ERROR(callouts_logger, HOOKS_CALLOUT_ERROR)
                        .arg(current_library_)
                        .arg(server_hooks_.getName(current_hook_))
                        .arg(PointerConverter(i->second).dlsymPtr())
                        .arg(stopwatch.logFormatLastDuration());
                }
            } catch (const std::exception& e) {
                // If an exception occurred, the stopwatch.stop() hasn't been
                // called, so we have to call it here.
                stopwatch.stop();
                // Any exception, not just ones based on isc::Exception
                LOG_ERROR(callouts_logger, HOOKS_CALLOUT_EXCEPTION)
                    .arg(current_library_)
                    .arg(server_hooks_.getName(current_hook_))
                    .arg(PointerConverter(i->second).dlsymPtr())
                    .arg(e.what())
                    .arg(stopwatch.logFormatLastDuration());
            }

        }

        // Mark end of callout execution. Include the total execution
        // time for callouts.
        LOG_DEBUG(callouts_logger, HOOKS_DBG_CALLS, HOOKS_CALLOUTS_COMPLETE)
            .arg(server_hooks_.getName(current_hook_))
            .arg(stopwatch.logFormatTotalDuration());

Note that each existing message, invoked after the callout, includes an additional parameter holding the duration of the callout in milliseconds. There is also new message HOOKS_CALLOUT_BEGIN which marks beginning of a callout execution.

The same method will also be extended to measure the cumulative time for all callouts executed for the given hook point. The appropriate messages will be added to mark the beginning and the end of callouts execution and they will be logged at the HOOKS_DBG_CALLS level (debuglevel 45).

Fixing Broken Logging in Hooks Libraries

Diagnosing issues with the DHCP server running with hook libraries attached to it requires that the libraries provide good quality log information. The bug #3198 (present in 0.9.1 release) precludes the use of Kea logging mechanism from within the hook libraries. The details of the bug can be obtained from the bug description. The solution for this particular issue is still to be determined. Here we only emphasize that this bug MUST be resolved prior to 0.9.2 release to allow the implementers use the Kea logging mechanism.

Tickets

The following tasks/tickets cover the functionality described in this section:

  • Improve logging in hooks framework (#3804)
  • Fix use of logger in hook libraries (#3198)

Common configuration for all loggers (wildcard logger names)

In the BIND10 days it was possible to create a logger configuration entry with a wildcard name (asterisk) to provide common configuration for all loggers. With the removal of the BIND10 infrastructure this capability has been lost. Kea allows for using the common configuration file for all components (DHCP servers and D2). This configuration file includes configuration entries for loggers used by respective servers. However, there is no way to provide a common base configuration for all servers, i.e. each server requires dedicated configuration entry. In some cases it may be convenient for the administrator to have ability to specify such a common configuration, which would also improve the readability of the configuration file.

The configuration

The following is the example logging configuration with the use of wildcard name:

"Logging":
{
  "loggers": [
    {
      "name": "*",
      "severity": "INFO"
    }
  ]
}

This configuration will be shared by all loggers, possibly belonging to different processes.

Taking advantage from the hierarchical nature of loggers, it is also possible to define some unique properties for specific loggers while the rest of the configuration is inherited from the wildcard logger.

"Logging":
{
  "loggers": [
    {
      "name": "*",
      "severity": "INFO"
    },

    {
      "name": "kea-dhcp4",
      "output_options": [
          {
            "output": "/home/marcin/devel/kea-build/var/log/kea-dhcp4.log"
          }
      ],
    },
    {
      "name": "kea-dhcp6",
      "output_options": [
          {
            "output": "/home/marcin/devel/kea-build/var/log/kea-dhcp6.log"
          }
      ],
    },

The configuration parsing code will have to be modified to accept the wildcard logger name, and apply the configuration defined for it to the root logger. In particular, the SrvConfig::applyLoggingConfig will have to be modified to recognize that the logger specification being processed is the wildcard logger specification and modify the root logger configuration. Right now the only way to modify the root logger configuration is by calling the LoggerManager::initLogger. The LoggerManager::initLogger doesn't allow for specifying the output options for the root logger. The output options will have to be set by creating the LoggerSpecification object for the root logger and calling LoggerManager::process. Perhaps this can be done together with calling LoggerManager::process for all other loggers in the SrvConfig::applyLoggingCfg method.

Tickets

  • Allow for wildcard logger name to configure root logger (#3809)

Improvements to Logging Messages for DHCP servers and D2

This section describes the updates to the existing log messages to provide the better debugging capability and to improve the log output.

Improvements to Existing Logging Messages

The purpose of this section is to describe what level of information the logged messages should contain to allow for effective troubleshooting by examining the contents of the log file. The Kea logging system is designed to uniquely identify the origin of each logged message in the code by using the unique message identifiers. This is useful but it doesn't convey any information how the server ended-up in the particular state, when the particular (possibly error) message had to be logged. Therefore, the placement of log messages and the logged information must allow for tracing the server's logic which led to the particular state.

Logging Significant Decision Points

The Kea code, including both the applications and the libraries using the logging facilities, must be extended with new messages to mark significant decisions points. The number of debug messages which need to be logged is already high and will increase with the updates proposed herein. Apart from the performance penalty related to the use of debug logging level with many messages being logged for each packet, it may also bloat the log file and make it effectively unparsable/unreadable. To mitigate this problem we are planning to introduce multiple additional loggers to log messages pertaining to different functionalities of the DHCP servers. All loggers will inherit configuration from the root logger, which will allow them to share one configuration. At the same time it will be possible to provide separate configuration for each logger when needed. For example, if in the course of debugging it is suspected that the DHCP options parsing is causing the problem, it will be possible to enable debug level for the logger responsible for logging messages related to options parsing and silence all other messages by setting the appropriate severity for other loggers (or root logger).

The following are the new loggers to be introduced for the DHCP servers:

  • dhcpX: general logger for the DHCPv4 or DHCPv6 module
  • packet: to issue log messages related to the reception and sending DHCP packets
  • bad-packet: to issue log messages when the packet received is dropped
  • options: to issue log messages related to processing of the DHCP options
  • srv-hooks: to issue log messages related to hooks
  • hostname: to issue log messages related to processing the hostnames, FQDNs and sending name change requests to D2
  • lease: to issue log messages related to lease allocation
  • alloc-engine: to issue log messages in the allocation engine

The significant decision points for the DHCP servers include, but are not limited to:

  • Start processing message, include message type, source of the message, relay information (packet)
  • Callouts present for the hook point, start callouts execution procedure (srv-hooks)
  • Start parsing options (options)
  • Received message will be dropped, why? (bad-packet)
  • Preparing response to send (packet)
  • Sending response (packet)
  • Processing Client FQDN if included, log the FQDN option sent by the client (hostname)
  • Set Client FQDN option to be sent to the client, log the FQDN option (hostname)
  • Processing Hostname option, if included, log the Hostname option sent by the client (hostname)
  • Generate FQDN for the client, why? (hostname)
  • Generate Hostname for the client, why? (hostname)
  • Set Hostname option to be sent to the client, log the Hostname option (hostname)
  • No subnet selected/avaiable, skip processing (lease)
  • Hint for address/prefix allocation found, what hint, how found? (lease)
  • Determined the state of the client, e.g. INIT-REBOOT, RENEWING (lease)
  • Starting to generate Name Change Requests (hostname)
  • Setting the destination address for the response, how determined? (packet)
  • Determining the interface for the response, how? (packet)
  • Determining the local IP address for the response, how? (packet)
  • Classifying the packet, how? (dhcpX)
  • Starting DDNS (hostname)

The names in parens indicate logger to be used to log the corresponding messages.

The relevant updates to messages in the Allocation Engine include, but are not limited to:

  • Client has reservation,
  • Client has a lease, what lease?
  • Allocating new lease for the client, why?
  • Deprecating existing client lease, why?

All logging messages emitted from the server or the allocation engine should include the unique identifier of the client and the transaction id. The possible client idenitifiers are: hardware address or client identifier. The log messages should be consistent with respect to the use of the selected identifier. The transaction id should be extracted from the client's message.

Most of the log messages listed above should be logged at the DEBUG level to avoid the server's performance degradation as a result of message floods.

Tickets

The following tasks/tickets cover the functionality described in this section:

  • Improve log messages for DHCPv4 server (#3806)
  • Improve log messages for DHCPv6 server (#3807)
  • Improve log messages for D2 (#3808)

Documentation Updates

The Kea Developer's Guide documents how to implement and attach hooks libraries to the DHCP servers. Since nearly all future implementors will read this documentation it is important to make sure that it includes recommendations for the implementors regarding use of logging in their libraries. Such recommendations should include the following information:

  • How to initialize the logger in the library
  • How to select the unique name for the logger so as the messages from this library can be distinguished from other messages.
  • Where the log messages should be placed in the code and what level of details they should include.
  • What logging severity should be selected.

This and other documentation updates will be applied with ticket #3805.

Last modified 3 years ago Last modified on Apr 24, 2015, 9:38:47 AM