
Client Classification Design

This design attempts to address the requirements defined in ClientClassificationRequirements. Phase 1 is complete and was released in Kea 1.0. The upcoming Kea 1.1 release will implement phase 2.

THIS IS WORK IN PROGRESS, please send your comments to kea-dev.

Implementation assumptions

The following are design assumptions. They complement the list of requirements, available here: ClientClassificationRequirements.

  • I.1. User-friendly syntax to identify data in the incoming packet.
  • I.2. There should be no throwaway code. The design for phase 1 must be extensible for phase 2.
  • I.3. Client classification should be easy to use. Where possible, we choose the architecture that has the least impact on performance, but ease of use takes priority when the two goals conflict. The reason for this preference is that no matter how good the classification code is, compiled C++ code in a hooks application will always be faster. Therefore the answer "if you want faster classification, use hooks" is and will remain a valid answer.
  • I.4. Token implementation MUST be reentrant, i.e. it must be possible for multiple instances (processes, threads) to perform evaluation at the same time without interfering with each other. This is a preparatory step for Kea taking advantage of multiple cores one day.

Configuration

There are a couple of possible ways the class syntax could be defined. After much deliberation (see kea-dev archives from Nov 2015), we decided to go ahead with the following approach:

Client class definitions are global. They optionally can be provided with option values (those options will be used in all subnets, unless overwritten by more specific scope, see http://kea.isc.org/wiki/ClientClassificationRequirements#Optionsassignmentorder).

"Dhcp4": {

    # Client classes defined on a global level.

    "client-classes": [
     {
        "name": "MICROSOFT",
        "test": "vendor-class-identifier == 'MSFT'",

        # Options are not mandatory. There are at least two ways an option-less class can
        # be useful: for subnet selection and for subnet-specific options.
        "option-data": [
        {
            "name": "some-option",
            "data": 100
        },
        {
            "name": "another-option",
            "data": "XXX"
        }]
     }],


    "subnet4": [
    {
        "subnet": "192.0.2.0/24",
        "pools": [ { "pool": "192.0.2.1 - 192.0.2.200" } ],

        "option-data": [
        { # this is a regular option, everyone connected to this subnet may get it
            "name": "domain-name-servers",
            "data": "192.0.2.1, 192.0.2.2"
        },
        { # this is for clients in the subnet that must also belong to MICROSOFT class to get it
            "class": "MICROSOFT",
            "name": "third-option",
            "data": 777
        }]
    }]
}

Parsing and evaluation

The expression defined in the configuration (e.g. "option[vendor-class].hex == 'MSFT'") must be parsed first. Once its syntax is understood and represented as logical objects (option[124], equal operator, const string 'MSFT') it can be evaluated for a given packet. These two operations (parsing and evaluation) are independent and are subject to different constraints. In particular, parsing can be done once during reconfiguration, while evaluation must be done for each packet separately.

Parsing

In the general case, we hope to support complex operators, like substring, logical and, logical or and expression grouping (e.g. a DOCSIS v6 modem can be detected using the following expression: (vendor.enterprise-id == 4491) && (vendor.suboption(1023) == 'docsis3')). This can be further complicated by introducing operator precedence (evaluate * before + or -) and grouping. To properly support all that complexity, we decided to go with a Bison/Yacc grammar.

While the only operators supported in phase 1 are equality and substring, there will be more operators in the future. To properly handle an arbitrarily complex expression, we use Reverse Polish Notation (RPN).

Parsed Representation

Each parsed token is an object that represents a certain value or operation. Some objects are constant (e.g. the 'MSFT' string), others require a packet to be evaluated (option[vendor-class]) and a third type, operators, require the values of other tokens to compute their own. In all cases, the parsing is done only once.

Here's a sketch of class hierarchy that allows such evaluation:

/// This class represents a single token. Examples of a token are:
/// - "foo" (a constant string)
/// - option[123] (a token that extracts value of option 123)
/// - == (an operator that compares two other tokens)
/// - substring(a,b,c) (an operator that takes three arguments: a string,
///   starting point and length)
class Token;
typedef boost::shared_ptr<Token> TokenPtr;

/// This is a structure that holds an expression converted to RPN
///
/// For example expression: option[123] == 'foo' will be converted to:
/// [0] = option[123] (TokenOption object)
/// [1] = 'foo' (TokenString object)
/// [2] = == operator (TokenEqual object)
typedef std::vector<TokenPtr> Expression;

/// @brief Base class for all tokens
///
/// It provides an interface for all tokens and storage for string representation
/// (all tokens evaluate to string).
class Token {
public:

    /// @brief This is a generic method for evaluating a packet.
    ///
    /// We need to pass the packet being evaluated and possibly previous
    /// evaluated values. Specific implementations may ignore the packet altogether
    /// and just put its own value on the stack (constant tokens), look at the
    /// packet and put some data extracted from it on the stack (option tokens),
    /// or pop arguments from the stack and put back the result (operators).
    ///
    /// The parameters passed will be:
    ///
    /// @param pkt - packet being classified
    /// @param values - stack of values with previously evaluated tokens
    virtual void evaluate(const Pkt& pkt, ValueStack& values) = 0;
};

/// @brief Token representing a constant string
///
/// This token holds value of a constant string, e.g. it represents
/// "MSFT" in expression option[vendor-class].text == "MSFT"
class TokenString : public Token {
public:
    /// Value is set during token construction.
    TokenString(const std::string& str)
        :value_(str){
    }

    /// Evaluation ignores the packet and pushes the constant value.
    void evaluate(const Pkt& pkt, ValueStack& values) {
       // Literals only push, nothing to pop
       values.push(value_);
    }

protected:
    std::string value_; /// constant value
};

/// @brief Token that takes value of an option
class TokenOption : public Token {
public:
    /// Two constructors, one for option[dns_servers] and another one for option[123]
    TokenOption(std::string option_name);
    TokenOption(uint16_t option_code);

    /// Evaluation only uses packet information.
    void evaluate(const Pkt& pkt, ValueStack& values) {
        OptionPtr opt = pkt.getOption(option_code_);
        if (!opt) {
            // Option not present in the packet: push an empty string
            values.push("");
        } else {
            values.push(opt->toString());
        }
    }

private:
    uint16_t option_code_;
};

/// @brief Token that represents equality operator (compares two other tokens)
class TokenEqual : public Token {
public:
    TokenEqual();

    /// Evaluation does not use packet information, but rather the last two values
    /// on the stack. It does a simple string comparison and pushes either "true" or "false".
    void evaluate(const Pkt& pkt, ValueStack& values) {
        // std::stack::pop() returns void, so read with top() before popping
        std::string op1 = values.top();
        values.pop();
        std::string op2 = values.top();
        values.pop();
        if (op1 == op2)
            values.push("true");
        else
            values.push("false");
    }
};

This list of classes is not complete. One day we will likely need a dedicated class for extracting information from Vendor-Identifying Vendor Class (124) and Vendor-Identifying Vendor-Specific Information (125) options. The number of specific Token classes is expected to grow.

Literal constructors should not raise exceptions. The recommended way to handle incorrect input is to validate it in the scanner (lexer.ll) and, if a sanity check in the constructor still fails, to have the token push the empty string value on the value stack.

Evaluation

Tokens are evaluated in the order they were put on the stack. The term "stack" is a bit of a misnomer here. There are actually two stacks, and one of them is more of a list. The first stack contains Tokens. It is created while parsing the class expression during a reconfiguration, which happens once. Token objects are stateless, i.e. they do not hold any values (unless they're constant expressions). For each incoming packet, this stack is traversed during evaluation. It is not modified, so multiple threads could walk over it in parallel. The evaluation results are stored in the second stack, the value stack. If we implement threading support one day, each thread will have to keep its own value stack.

In the aforementioned example (option[123] == 'foo'), both operands are evaluated first and the == operator is evaluated last. Using RPN ensures that any operator will be evaluated after all its operands are evaluated.

The beauty of RPN is that the tokens are placed on the token stack in an order which ensures that if an operator requires X parameters, at least X values will already be on the value stack by the time the operator is evaluated.

// Evaluated values are stored as a stack of strings
typedef std::stack<std::string> ValueStack;

/// This method checks whether the whole expression evaluated to 'true' (packet
/// belongs to a class) or 'false' (does not belong). It walks over
/// Token stack.
bool evaluate(const Expression& expr, const Pkt& pkt) {
    ValueStack values;

    // iterate through all the tokens, evaluating each
    for (Expression::const_iterator it = expr.begin(); it != expr.end(); ++it) {
        (*it)->evaluate(pkt, values);
    }

    // Last value pushed is the end result.
    // Note that anything other than exactly one remaining value would be
    // an error, which real code should report; here it counts as no match.
    if (values.size() != 1) {
        return (false);
    }
    return (values.top() == "true");
}

NOTE: These code snippets are just examples. We decided to keep the tokens stateless, so they could be evaluated simultaneously by multiple threads. This implies that the code will store intermediate token values somewhere else and will pass the values rather than pointers to tokens. Please treat this as pseudocode that illustrates the idea, and not as completely correct C++ code.
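
To make the idea above concrete, here is a minimal, self-contained sketch of stateless tokens evaluated over a per-call value stack. It is an illustration under stated assumptions and not the libeval code: FakePkt is a hypothetical stand-in for Pkt4/Pkt6, std::shared_ptr is used in place of boost::shared_ptr, a malformed expression is reported with std::logic_error, and makeMsftExpression() is a made-up helper that builds the RPN form by hand instead of using the parser.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <stack>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for Pkt4/Pkt6: option code -> raw option content.
struct FakePkt {
    std::map<uint16_t, std::string> options;
};

typedef std::stack<std::string> ValueStack;

// Stateless token interface: all per-packet state lives on the value stack.
class Token {
public:
    virtual ~Token() {}
    virtual void evaluate(const FakePkt& pkt, ValueStack& values) const = 0;
};
typedef std::shared_ptr<Token> TokenPtr;
typedef std::vector<TokenPtr> Expression;

// Constant string: ignores the packet and pushes its value.
class TokenString : public Token {
public:
    explicit TokenString(const std::string& str) : value_(str) {}
    void evaluate(const FakePkt&, ValueStack& values) const override {
        values.push(value_);
    }
private:
    std::string value_;
};

// Option content: pushes the option data, or "" if the option is absent.
class TokenOption : public Token {
public:
    explicit TokenOption(uint16_t code) : code_(code) {}
    void evaluate(const FakePkt& pkt, ValueStack& values) const override {
        auto it = pkt.options.find(code_);
        values.push(it == pkt.options.end() ? std::string() : it->second);
    }
private:
    uint16_t code_;
};

// Equality: pops two operands and pushes "true" or "false".
class TokenEqual : public Token {
public:
    void evaluate(const FakePkt&, ValueStack& values) const override {
        if (values.size() < 2) {
            throw std::logic_error("== needs two operands");
        }
        const std::string op1 = values.top();
        values.pop();
        const std::string op2 = values.top();
        values.pop();
        values.push(op1 == op2 ? "true" : "false");
    }
};

// Walks the (read-only) token list; the value stack is local to the call,
// so concurrent calls on the same Expression do not interfere.
bool evaluate(const Expression& expr, const FakePkt& pkt) {
    ValueStack values;
    for (const TokenPtr& token : expr) {
        token->evaluate(pkt, values);
    }
    if (values.size() != 1) {
        throw std::logic_error("malformed expression");
    }
    return (values.top() == "true");
}

// Hand-built RPN form of: option[60] == 'MSFT'
Expression makeMsftExpression() {
    Expression expr;
    expr.push_back(std::make_shared<TokenOption>(60));
    expr.push_back(std::make_shared<TokenString>("MSFT"));
    expr.push_back(std::make_shared<TokenEqual>());
    return expr;
}
```

A packet whose option 60 contains MSFT evaluates to true with this expression. A packet without option 60 evaluates to false, because the absent option is pushed as an empty string.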

Extensions in phase 2

NOTE: Exact scoping of phase 2 (Kea 1.1) is going to be determined after phase 1 reaches beta timeframe. This description is very preliminary.

For phase 1, we will implement the following classes: Token (base class), TokenString (constant string), TokenOption (represents an option), TokenEqual (== operator) and TokenSubstring (calculates a substring(string, begin, length)).

For phase 2, we will consider implementing TokenField (extracts fields from a packet, e.g. chaddr, op, secs etc.), TokenAnd, TokenOr, TokenNot, TokenMeta (extracts meta-data, e.g. interface name or source IP address) and several others.

For phase 1, we will implement a generic Option::toString() that simply calls the existing Option::toText(). That is somewhat awkward, but it works for all options. We will implement toString() for several major options (most likely vendor-class, the vendor-identifying vendor-specific information option and perhaps a few others). For phase 2, we will implement toString() for all remaining option types.

For phase 2, we will design a way to reference and extract specific fields in an option.

For phase 2, we will implement boolean logic, including parentheses.

Examples

Expression: substring(option[vendor-class].hex,0,3) == 'APC' would be parsed to:

0: option[vendor-class] (TokenOption)
1: 0 (TokenString or TokenInteger once we implement it)
2: 3 (TokenString or TokenInteger once we implement it)
3: substring (TokenSubstring)
4: 'APC' (TokenString)
5: == (TokenEqual operator)
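
The substring step of this example can be sketched as a function operating directly on the value stack. This is only an illustration: popValue() and evaluateSubstring() are hypothetical names, the start and length operands are assumed to be non-negative decimal strings, and returning an empty string for an out-of-range start is our assumption, not a documented rule.

```cpp
#include <cstddef>
#include <stack>
#include <string>

typedef std::stack<std::string> ValueStack;

// Pops and returns the top of the stack (std::stack::pop() returns void).
std::string popValue(ValueStack& values) {
    std::string top = values.top();
    values.pop();
    return top;
}

// Evaluates substring(string, start, length). The operands were pushed in
// the order string, start, length, so they are popped in reverse order.
void evaluateSubstring(ValueStack& values) {
    const std::string len_str = popValue(values);
    const std::string start_str = popValue(values);
    const std::string str = popValue(values);

    // std::stoul throws std::invalid_argument for non-numeric input.
    const std::size_t start = std::stoul(start_str);
    const std::size_t length = std::stoul(len_str);

    // Assumption: a start position beyond the end yields an empty string.
    if (start >= str.size()) {
        values.push(std::string());
        return;
    }
    // substr() clamps the length to the end of the string.
    values.push(str.substr(start, length));
}
```

After the three operand tokens have pushed the option content, "0" and "3", a call to evaluateSubstring() leaves a single value on the stack, which the following TokenString and TokenEqual then compare against 'APC'.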

Expression: (option[vendor-info] == 4491) && (option[vendor-class] == 'docsis') would be parsed to:

0. option[vendor-info] (TokenOption)
1. 4491 (TokenString)
2. == (TokenEqual)
3. option[vendor-class] (TokenOption)
4. 'docsis' (TokenString)
5. == (TokenEqual)
6. && (TokenAnd)

Code layout

This code is implemented as a separate libeval library with minimal dependencies. As the expressions will extract various pieces of information (option values, fixed fields, maybe meta-data like source IP address or interface name) from packets, it has to depend on libdhcp++ (Pkt4, Pkt6, Option definitions).

The code is intended to be easy to use. The call to be used is:

bool evaluate(const Expression& expr, const Pkt& pkt);

This is the only interface needed to evaluate expressions that give a boolean answer (the packet either belongs to a class or not).

In the near future, we'll likely also implement the following interface:

/// Expected to be called at run-time for each packet.
std::string evaluateString(const Expression& expr, const Pkt& pkt);

Those are/will be called from existing Dhcpv4Srv::classifyPacket() and Dhcpv6Srv::classifyPacket().

Since the class information (including class name, class expression and possible options) is part of the configuration, it is stored in the SrvConfig class (or a storage in it).

Interaction with Option classes

The TokenOption class will extract information from an option. For phase 1, we decided to go with option[123].hex to extract the binary representation of the content of any option.

There is also option[123].text, which returns the content of the option in textual representation, using the existing toText() methods of each specialized class. These methods were implemented a long time ago to log options when a sufficient debugging level is enabled. As a result, they tend to include extra spaces, new lines and other formatting that makes their use awkward for classification. The code supporting option[123].text is therefore implemented in phase 1, but we decided not to advertise it in the documentation, as its behavior will change in phase 2.

Note on performance

There is a trade-off between flexibility of expressions and performance. The more capabilities we provide, the more complex the evaluation will be. People who expect complex evaluation and high performance are advised not to use this classifier, but to write a hooks application instead. There is no way for interpreted code to be remotely comparable in performance to compiled C++ code.

Phase 2 (Kea 1.1)

The following text describes changes necessary for implementing phase 2.

Subnet class options

"Subnet class options" is a short name for options that can be assigned to a client that fulfills two requirements at the same time: 1. belongs to a subnet X and 2. belongs to a class Y. This implements item 4 in the option assignment order section of the requirements.

Currently options can be defined on a per subnet basis using the following syntax:

"Dhcp4": {
    "subnet4": [
        {
            "option-data": [
                {
                    "name": "domain-name-servers",
                    "code": 6,
                    "space": "dhcp4",
                    "csv-format": true,
                    "data": "192.0.2.3"
                },
                ...
            ],
            ...
        },
        ...
    ],
    ...
}

It will be extended with one optional parameter, "class":

"Dhcp4": {
    "subnet4": [
        {
            "option-data": [
                {
                    "name": "domain-name-servers",
                    "code": 6,
                    "space": "dhcp4",
                    "class": "docsis",
                    "csv-format": true,
                    "data": "192.0.2.3"
                },
                ...
            ],
            ...
        },
        ...
    ],
    ...
}

Note that the "class" parameter is fully optional. It allows for easy migration from the existing model (options specified for all clients in this subnet) to the class subnet model (options specified for clients in this subnet that also belong to the class). It also allows easy migration back and forth if needed. This can be useful if a new option (or option value) is first introduced to a small test group of clients and later, once proven to work, becomes available to all clients.

Options are stored in the isc::dhcp::CfgOption class as a series of OptionDescriptor objects held in an OptionContainer. OptionDescriptor will be extended with an extra field that contains a class name. Its default value is an empty string, which designates that the option is available to everyone. This container is used in Dhcpv{4,6}Srv::appendRequestedOptions(), where an extra check will be conducted: if the class is not empty, the packet has to belong to that class for the option to be provided.

Note: The code in appendRequestedOptions() iterates over the options. To properly enforce that class specific options are assigned before the generic ones, class specific values have to be defined before the generic ones. The suggested way to achieve this property is to sort the options during config file parsing, so the class specific options are ahead of all non class options for each subnet.
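
The ordering and the extra class check described above can be sketched as follows. This is a simplified illustration, not the CfgOption code: OptionEntry is a hypothetical stand-in for OptionDescriptor, both function names are made up, and the rule that the first matching entry for a given option name wins is our assumption about how appendRequestedOptions() would treat duplicates.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for OptionDescriptor.
struct OptionEntry {
    std::string name;
    std::string data;
    std::string client_class;  // empty: available to every client
};

// Moves class specific entries ahead of generic ones. stable_partition
// keeps the configured order within each of the two groups.
void sortClassSpecificFirst(std::vector<OptionEntry>& options) {
    std::stable_partition(options.begin(), options.end(),
        [](const OptionEntry& o) { return !o.client_class.empty(); });
}

// Returns the options a client may receive. An entry is allowed if it has
// no class or the client belongs to its class. Assumption: the first
// allowed entry for a given option name wins.
std::vector<OptionEntry> selectOptions(
        const std::vector<OptionEntry>& sorted,
        const std::set<std::string>& client_classes) {
    std::vector<OptionEntry> result;
    std::set<std::string> seen;
    for (const OptionEntry& o : sorted) {
        const bool allowed = o.client_class.empty() ||
                             client_classes.count(o.client_class) > 0;
        if (allowed && seen.insert(o.name).second) {
            result.push_back(o);
        }
    }
    return result;
}
```

With the options sorted this way, a client in class docsis receives the class specific value, while every other client in the subnet falls through to the generic one.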

Token values are strings

Token values are stored in ValueStack, which is a stack of strings. Each token must evaluate to a string. There are certain occasions where a token value may be temporarily converted to some other type, but after a given token is evaluated, its value represented as a string is put on the stack. This ensures compatibility with existing and future tokens. This approach induces a minor performance penalty, but see implementation assumption I.3.

With the introduction of boolean operators and operators that return length (an integer), there is a need to implement a unified way to attempt to cast a string value to a given type. This will be implemented with the following methods:

// Static methods of Token; both throw EvalTypeError if the conversion fails.
static bool toBool(const std::string& token_value);
static int toInteger(const std::string& token_value);

Rules for converting to boolean:

  1. The following values convert to true: "true".
  2. The following values convert to false: "false".
  3. Every other value is illegal and causes an EvalTypeError exception.

Rules for converting to integer:

  1. The input must be understandable by boost::lexical_cast<int>(). If lexical_cast throws boost::bad_lexical_cast, the toInteger() method will rethrow it as EvalTypeError.

If, during the expression processing, any token throws anything, the processing for this class is aborted. If there are other classes with expressions, they will be evaluated.
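
The conversion rules and the per-class abort can be sketched as below. Several things here are assumptions made for the sake of a self-contained example: EvalTypeError is derived from std::runtime_error, the functions are free functions rather than static members of Token, std::stoi with a strict full-string check stands in for boost::lexical_cast<int>(), and classMatches() is a hypothetical helper.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Exception named in the text; the base class is an assumption.
class EvalTypeError : public std::runtime_error {
public:
    explicit EvalTypeError(const std::string& what)
        : std::runtime_error(what) {}
};

// Only the exact strings "true" and "false" are legal booleans.
bool toBool(const std::string& token_value) {
    if (token_value == "true") {
        return (true);
    }
    if (token_value == "false") {
        return (false);
    }
    throw EvalTypeError("not a boolean: '" + token_value + "'");
}

// Stand-in for boost::lexical_cast<int>(): the whole string must be a number.
int toInteger(const std::string& token_value) {
    // std::stoi would silently skip leading whitespace, so reject it here.
    if (token_value.empty() ||
        std::isspace(static_cast<unsigned char>(token_value[0]))) {
        throw EvalTypeError("not an integer: '" + token_value + "'");
    }
    try {
        std::size_t pos = 0;
        const int result = std::stoi(token_value, &pos);
        if (pos != token_value.size()) {
            throw EvalTypeError("trailing characters in '" + token_value + "'");
        }
        return (result);
    } catch (const std::logic_error&) {
        // std::invalid_argument and std::out_of_range both derive from
        // std::logic_error; rethrow them as EvalTypeError.
        throw EvalTypeError("not an integer: '" + token_value + "'");
    }
}

// A failed conversion aborts only this class: the packet simply does not
// match it, and the caller moves on to the next class.
bool classMatches(const std::string& final_value) {
    try {
        return (toBool(final_value));
    } catch (const EvalTypeError&) {
        return (false);
    }
}
```

The caller would invoke something like classMatches() once per configured class, so one class with a badly typed expression does not prevent the remaining classes from being evaluated.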

Accessing text representation (option[123].text selector)

It is useful to have a text representation of the contents of the option. This operator will call Option::toString(), which will return a terse text representation of the option. It will use constant length when possible, so a dissection using the substring operator would be a viable option. Each option type (all of the option classes deriving from Option) will have to provide a toString() implementation.

This text representation will use the most terse, human readable format available. For example, a field containing an IPv4 address will be printed as 192.0.2.1, not 192.000.002.001. The text representation will not include the option type or option length. If there's a need for such information, we may easily implement such accessors in the future (e.g. option[123].code or option[123].length).

If there are multiple fields in an option, each field will be separated by a single space, e.g. a DNS servers option that contains two addresses will be represented as "192.0.2.1 192.0.2.2" (without quotes).
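
As an illustration of this format, the sketch below renders the payload of an option that carries a list of IPv4 addresses. The function name addressListToText() is hypothetical, and ignoring a trailing partial address is our assumption. A real toString() would live in the specific option class.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Renders a payload made of consecutive 4-byte IPv4 addresses in terse
// dotted-decimal form, with fields separated by a single space.
// Assumption: trailing bytes that do not form a full address are ignored.
std::string addressListToText(const std::string& payload) {
    std::ostringstream out;
    for (std::size_t i = 0; i + 4 <= payload.size(); i += 4) {
        if (i > 0) {
            out << ' ';
        }
        for (std::size_t j = 0; j < 4; ++j) {
            if (j > 0) {
                out << '.';
            }
            // Convert through unsigned char so bytes >= 0x80 print correctly.
            out << static_cast<unsigned>(
                       static_cast<unsigned char>(payload[i + j]));
        }
    }
    return (out.str());
}
```

No padding, option code or length is emitted, which keeps the result predictable enough to compare with == or to cut up with substring.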

Accessing specific fields (option[123].enterprise-id selector)

Ultimately we will want to provide accessors for the most commonly used fields. In phase 2 (Kea 1.1) we will implement support for one such field as a trial. If the approach is viable, we will extend it to other fields in future releases. enterprise-id was chosen as the field to be implemented first, as it is present in vendor class (v6 option 16), vendor specific information option (v6 option 17), vendor identifying vendor class option (v4 option 124) and V-I Vendor-specific Information option (v4 option 125). Note that the actual name of the field is enterprise-number, but it's commonly referred to as enterprise-id or vendor-id, so we'll keep it that way.

This will be implemented as a new class derived from the Token class. In its evaluate() method the class will attempt to find an option of the specified type and, if found, will attempt to dynamic_cast it to one of the types that have the enterprise-id field.

Note: technically v4 options are capable of storing more than one enterprise-number, but I have not seen or heard about actual implementations that use that capability. While theoretically possible, it would require cooperation between multiple vendors, which seems unlikely. For simplicity of use, we will start with a single enterprise-number and, if we receive feedback requesting access to more than one enterprise-id, we will extend the selector to optionally provide an offset, e.g. option[123].enterprise-id[3].

This operator will return the value of the enterprise-id field in integer format. For example for DOCSIS vendor options it will return 4491 (that's enterprise-id used by docsis devices).
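
For illustration only, the sketch below reads the enterprise number straight from raw option content instead of using the dynamic_cast approach described above. It relies on the fact that all four options listed earlier start their payload with a 4-byte enterprise number in network byte order. The function name enterpriseIdToText() and the choice to return an empty string for a truncated option are assumptions.

```cpp
#include <cstdint>
#include <string>

// Reads the leading 4-byte enterprise number (network byte order) from raw
// option content and returns it as decimal text, ready for the value stack.
// Assumption: an option shorter than 4 bytes yields an empty string.
std::string enterpriseIdToText(const std::string& payload) {
    if (payload.size() < 4) {
        return (std::string());
    }
    uint32_t id = 0;
    for (int i = 0; i < 4; ++i) {
        id = (id << 8) | static_cast<unsigned char>(payload[i]);
    }
    return (std::to_string(id));
}
```

For a DOCSIS option whose content starts with the bytes 0x00 0x00 0x11 0x8B, this yields the string 4491.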

Future Kea releases will allow access to additional fields.

Accessing constant fields

There will be two extra classes that access constant fields: TokenConstFields4 and TokenConstFields6. During evaluation, they will extract values from specific constant fields. In particular:

  1. TokenConstFields4 will extract chaddr, giaddr, ciaddr, yiaddr, siaddr, hlen, htype, trans-id fields.
  2. TokenConstFields6 will extract type and trans-id fields.

The syntax used should be: pkt4.chaddr, pkt6.type.

When accessing a field whose type is an IPv4 or IPv6 address, the value placed on the stack will be a 4 (IPv4) or 16 (IPv6) byte string. In order to process such values we shall provide an address literal, so to check the gateway address one would use (note no quotes):

pkt4.giaddr == 1.2.3.4

For access to a field with a type of integer the current plan is to use a binary string 4 bytes long. So pkt4.hlen would result in a value of 0x00000006 and it could be compared using the following two statements:

pkt4.hlen == 6 
pkt4.hlen == 0x00000006
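
A minimal sketch of this encoding is shown below. The helper name encodeUint32() is hypothetical, and most-significant-byte-first ordering is assumed because the value is written as 0x00000006 above.

```cpp
#include <cstdint>
#include <string>

// Encodes an integer as a 4-byte binary string, most significant byte first.
std::string encodeUint32(uint32_t value) {
    std::string out(4, '\0');
    for (int i = 3; i >= 0; --i) {
        out[i] = static_cast<char>(value & 0xFF);
        value >>= 8;
    }
    return (out);
}
```

Both the pkt4.hlen token and the integer literal 6 would push the same 4-byte string, so the existing string comparison in the == operator works unchanged.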

Accessing relay options

Relay options in v4 are stored as sub-options in RAI (82) option, regardless of how many relays the packet has traversed. To access relay options in v4, we will use the following syntax: rai[123], which means sub-option 123 stored in the relay agent information option.

Relay options in v6 are stored in separate encapsulation layers, e.g. if the packet traversed 2 relays, the packet looks like this: relay-forw(relay-forw(solicit)). Each relay may insert its own options, which are distinct from the options inserted by the client. In an extreme case with 2 relays, the packet may contain 3 instances of an option: two inserted by the relays and a third one by the client.

To specify which nesting level we want to access, the following notation should be used:

relay6[X].option[123]

In this notation X is the nesting level of the encapsulation option, with 0 being the relay closest to the server (the outermost wrapper).
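
The lookup implied by this notation can be sketched as follows. RelayLayer and getRelayOption() are hypothetical names used only for this illustration, and pushing an empty string when the level or the option does not exist is our assumption, chosen to match how a missing option is handled in the TokenOption sketch earlier.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one relay-forw encapsulation: option code -> content.
struct RelayLayer {
    std::map<uint16_t, std::string> options;
};

// relays[0] is the outermost wrapper, i.e. the relay closest to the server.
// Assumption: a missing level or a missing option yields an empty string.
std::string getRelayOption(const std::vector<RelayLayer>& relays,
                           std::size_t level, uint16_t code) {
    if (level >= relays.size()) {
        return (std::string());
    }
    auto it = relays[level].options.find(code);
    if (it == relays[level].options.end()) {
        return (std::string());
    }
    return (it->second);
}
```

With two relays, relay6[0].option[123] and relay6[1].option[123] map to getRelayOption(relays, 0, 123) and getRelayOption(relays, 1, 123), and can return different values for the same option code.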

If needed, we will use the same approach as in Pkt6::getRelayOption() and Pkt6::getAnyRelayOption(), which allows specifying search order. See Pkt6::RelaySearchOrder for details.

Accessing relay6 constants

Access to the constant fields for a relay 6 option is via relay6[X].linkaddr and relay6[X].peeraddr. As with relay options the X refers to the nesting level of the relay encapsulation.

Accessing nested options

To access nested options, the following syntax should be used: option[123].option[45]. It will attempt to pick the top level option 123 and, if found, will try to get sub-option 45 from it.
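
A two-level lookup of this kind could look like the sketch below. TopLevelOption, OptionCollection and getNestedOption() are hypothetical names, and returning an empty string when either level is missing is our assumption.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model of a top level option carrying sub-options.
struct TopLevelOption {
    std::string data;
    std::map<uint16_t, std::string> sub_options;
};
typedef std::map<uint16_t, TopLevelOption> OptionCollection;

// Implements option[code].option[sub_code]: find the top level option
// first and, only if it is present, look for the sub-option inside it.
std::string getNestedOption(const OptionCollection& options,
                            uint16_t code, uint16_t sub_code) {
    auto top = options.find(code);
    if (top == options.end()) {
        return (std::string());
    }
    auto sub = top->second.sub_options.find(sub_code);
    if (sub == top->second.sub_options.end()) {
        return (std::string());
    }
    return (sub->second);
}
```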

IP Addresses

As there may be multiple places where an IP address is used it should be possible for the administrator to provide the address in a convenient form such as 10.0.0.1. This will be converted into a 4 byte string of 0x0A000001.
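
The conversion can be sketched with the POSIX inet_pton() call, as below. This is an illustration only: the function name ipv4LiteralToBinary() is hypothetical, Kea has its own address handling which is not shown here, and returning an empty string for an invalid literal is our assumption (the parser would more likely reject it).

```cpp
#include <arpa/inet.h>   // inet_pton (POSIX)
#include <sys/socket.h>  // AF_INET
#include <string>

// Converts a dotted-decimal IPv4 literal into its 4-byte binary form in
// network byte order. Assumption: invalid input yields an empty string.
std::string ipv4LiteralToBinary(const std::string& text) {
    unsigned char buf[4];
    if (inet_pton(AF_INET, text.c_str(), buf) != 1) {
        return (std::string());
    }
    return (std::string(reinterpret_cast<const char*>(buf), sizeof(buf)));
}
```

The result for 10.0.0.1 is the 4-byte string 0x0A000001, which can be compared directly with the value pushed for a field such as pkt4.giaddr.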

Debugging

In order to help debug classification expressions, each token can log the values it pops from and pushes to the value stack. To enable this, the user needs to set the severity of the logging to "DEBUG" and the debug level to at least 55.

The expression "substring(option[61].hex,0,3) == 'foo'" would result in log statements something like this:

2016-05-19 13:35:04.163 DEBUG [kea.eval/44478] EVAL_DEBUG_OPTION Pushing option 61 with value 0x666F6F626172
2016-05-19 13:35:04.164 DEBUG [kea.eval/44478] EVAL_DEBUG_STRING Pushing text string '0'
2016-05-19 13:35:04.165 DEBUG [kea.eval/44478] EVAL_DEBUG_STRING Pushing text string '3'
2016-05-19 13:35:04.166 DEBUG [kea.eval/44478] EVAL_DEBUG_SUBSTRING Popping length 3, start 0, string 0x666F6F626172 pushing result 0x666F6F
2016-05-19 13:35:04.167 DEBUG [kea.eval/44478] EVAL_DEBUG_STRING Pushing text string 'foo'
2016-05-19 13:35:04.168 DEBUG [kea.eval/44478] EVAL_DEBUG_EQUAL Popping 0x666F6F and 0x666F6F pushing result 'true'
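
Binary values appear in these messages in hexadecimal form. A helper producing that form could look like the sketch below. The name toHex() is hypothetical, and this is not the logging code used by the eval library.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a binary string as 0x followed by two uppercase hex digits per byte.
std::string toHex(const std::string& value) {
    std::ostringstream out;
    out << "0x" << std::hex << std::uppercase << std::setfill('0');
    for (unsigned char c : value) {
        // setw applies to the next insertion only, so set it for every byte.
        out << std::setw(2) << static_cast<unsigned>(c);
    }
    return (out.str());
}
```

Applied to the option content foobar, it gives 0x666F6F626172, and applied to the substring result foo it gives 0x666F6F, matching the values shown in the log above.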

Parser generation

The eval library Makefile has a parser entry which is available with the configure argument --enable-generate-parser which invokes flex and/or bison when needed.

Note that git does not preserve timestamps on *.hh files, so please use the bison wrapper from docs.isc.org:~fdupont/bin (also attached to kea.isc.org ticket #4210).

Kea 1.1 tickets

Phase 1 was released in Kea 1.0. The following tickets are required for phase 2. A prospective future phase 3 has not been evaluated at this time.

  1. Requirements and design - #4257
  2. Implement boolean (and, or, not) operators - #4255
  3. Implement grouping (parentheses) operator - #4256
  4. Extract constant length fields from DHCPv4 packet (chaddr, giaddr, ciaddr, yiaddr, siaddr, hlen, htype) - #4268
  5. Extract constant field from DHCPv6 packet (type, trans-id) - #4269
  6. Make options inserted by a v4 relay agent available for the classification - #4264
  7. Make options inserted by a v6 relay agent available for the classification - #4265
  8. Implement string concatenation (+) operator - #4233
  9. Implement option[123].text representation (this covers most/all option classes) - #4221
  10. Implement option specific fields for vendor-independent vendor-specific option in v4 (e.g. option[125].enterprise-id) - #4270
  11. Implement option specific fields for vendor-independent vendor-specific option in v6 (e.g. option[17].enterprise-id) - #4271
  12. Implement access to nested options for vendor-specific options in v4 (125) and v6 (17) - #4271
  13. Implement meta-information support in classification (at least interface name, src/dst IP and packet length should be supported) - #4272
  14. Implement per subnet information, i.e. extend existing options storage to also convey class information for both v4 and v6. (not required for phase 1) - #4104
  15. Add new ip address literal for classification expression - #4232
  16. Add debugging statements to display values to users - #4480
  17. Add easy access to relay nearest the client for relay6 - #4482 (phase 3 or later)
Last modified on Jun 13, 2016, 12:56:16 PM