Using resolver and cache for NSAS needs


Currently, the NSAS builds its own copy of the data. This is partly because it is easier to look up data by following a pointer, with all known IP addresses available as a vector, than by scanning the cache, and partly because it should be faster and leaves room for NSAS-specific optimisations. Furthermore, calls to the NSAS are much simpler, because the only parameter needed is the name of the zone.

But to build these structures, the data must first be obtained, and in a simple way.

Original design

The original design was to pass the referral information (authority and additional sections) together with the name of the requested zone. Anyone who knows the name of the zone probably has this information as well, because both come from a referral response, but the cache might contain better data than the referral. Furthermore, the additional section need not contain all IP addresses (for example, for an out-of-zone nameserver), so we need to be able to look up data anyway. And the lookup might run into a CNAME. In short, this does not sound like a simple approach, and it has the problem of using unauthoritative data even when the cache has authoritative data.

Asking the resolver for everything

The newer idea is to ask the resolver for any information we need. If someone asks us to choose an IP address for a zone, we simply query the resolver for the zone's NS records and then, when we get them, for the A and AAAA records of the nameservers (currently for at most 2 nameservers at a time; further requests are deferred until those are answered). Then we build the structure and answer.
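The flow described above can be sketched roughly as follows. This is only an illustration under assumptions: FakeResolver, ZoneLookup, MAX_PARALLEL and the example.org names are hypothetical and are not the real resolver or NSAS interfaces.

```python
from collections import deque

MAX_PARALLEL = 2  # the text says at most 2 nameservers are asked about at once


class FakeResolver:
    """Hypothetical stand-in for the resolver: answers from a dict, asynchronously."""

    def __init__(self, records):
        self.records = records    # {(name, rrtype): [values]}
        self.pending = deque()    # queued (name, rrtype, callback)

    def resolve(self, name, rrtype, callback):
        self.pending.append((name, rrtype, callback))

    def run(self):
        # Deliver answers until nothing is outstanding.
        while self.pending:
            name, rrtype, callback = self.pending.popleft()
            callback(self.records.get((name, rrtype), []))


class ZoneLookup:
    """Collects nameserver addresses for one zone, then calls on_done once."""

    def __init__(self, resolver, zone, on_done):
        self.resolver = resolver
        self.on_done = on_done
        self.waiting = deque()    # nameservers not yet asked about
        self.in_flight = 0        # nameservers currently being asked about
        self.max_in_flight = 0    # recorded only to illustrate the limit
        self.addresses = {}       # nameserver -> [addresses]
        resolver.resolve(zone, "NS", self._got_ns)

    def _got_ns(self, nameservers):
        self.waiting.extend(nameservers)
        if not self.waiting:
            self.on_done(self.addresses)
            return
        self._start_more()

    def _start_more(self):
        while self.waiting and self.in_flight < MAX_PARALLEL:
            ns = self.waiting.popleft()
            self.in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self.in_flight)
            self.addresses[ns] = []
            left = {"count": 2}   # one A and one AAAA answer expected
            for rrtype in ("A", "AAAA"):
                self.resolver.resolve(
                    ns, rrtype,
                    lambda addrs, ns=ns, left=left: self._got_addr(ns, addrs, left))

    def _got_addr(self, ns, addrs, left):
        self.addresses[ns].extend(addrs)
        left["count"] -= 1
        if left["count"] == 0:
            # Both answers for this nameserver arrived; start a deferred one.
            self.in_flight -= 1
            self._start_more()
            if self.in_flight == 0 and not self.waiting:
                self.on_done(self.addresses)
```

With three nameservers, the third is only asked about after one of the first two has been fully answered, and on_done is called exactly once when everything has arrived.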

This has several advantages:

  • The NSAS never needs to parse any messages or sections itself.
  • It is simple.
  • If the cache holds authoritative data, they are preferred over the referral data, which are not authoritative.
  • The resolver knows how to handle CNAMEs and DNAMEs.
  • We do not need to care whether the data have already arrived and are in the cache or whether a remote request is needed.

However, as described, this does not work; the following sections cover the problems.

The recursive problem

The usual situation is that the one asking us to provide an IP address is the resolver. So it tells us to give it the IP address of some nameserver of a zone. Because we do not know one yet, we ask the resolver to give us the NS records of that zone. And the resolver, to find out what the zone's nameservers are, will ask us to provide an IP address for the same zone. This does not loop infinitely, as the entry exists and is marked as IN_PROGRESS (waiting for data), so the callback is just stored. But we do not get any data and, worse, we do not time out, because timeouts apply to network operations, not to running code, and nothing here communicates over the network. This might even lead to a cyclic data structure of shared pointers, which will never be released.
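To make the stall concrete, here is a minimal sketch of the situation with nothing in the cache. The Nsas and Resolver classes and the zone name example.org are hypothetical simplifications, not the real components; only the IN_PROGRESS state comes from the text.

```python
IN_PROGRESS = "IN_PROGRESS"


class Nsas:
    """Hypothetical, heavily simplified NSAS."""

    def __init__(self):
        self.resolver = None
        self.state = {}        # zone -> entry state
        self.callbacks = {}    # zone -> callbacks waiting for an address

    def lookup(self, zone, callback):
        if self.state.get(zone) == IN_PROGRESS:
            # The entry already exists: store the callback, send nothing new.
            self.callbacks[zone].append(callback)
            return
        self.state[zone] = IN_PROGRESS
        self.callbacks[zone] = [callback]
        self.resolver.resolve_ns(
            zone, lambda nameservers: self._got_ns(zone, nameservers))

    def _got_ns(self, zone, nameservers):
        # A real NSAS would now fetch addresses; never reached in this sketch.
        raise AssertionError("unreachable in this sketch")


class Resolver:
    """Hypothetical resolver with an empty cache."""

    def __init__(self, nsas):
        self.nsas = nsas
        self.sent = []         # network queries actually sent

    def resolve_ns(self, zone, callback):
        # Nothing cached, so an address is needed before any query can be sent.
        self.nsas.lookup(
            zone, lambda address: self._send(zone, address, callback))

    def _send(self, zone, address, callback):
        self.sent.append((zone, address))


nsas = Nsas()
nsas.resolver = Resolver(nsas)
answers = []
nsas.lookup("example.org", answers.append)
```

After the call returns, two callbacks are parked on the same IN_PROGRESS entry, no network query has been sent, and so no network timeout can ever fire to release them.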

The one saving the day is the cache. Assuming it stores everything that goes by, including data from the additional and authority sections, the resolver does not ask us for an IP address, but answers our query directly from the cache. So in the usual situation this works.

Still, it is not bullet-proof. There might be a zone with a single nameserver that does not have any IP address. In that case the zone is unreachable, but the cache cannot conclude anything from seeing an empty additional section. It does not know that there are no IP addresses, so it will not provide an answer, and the resolver will try to fetch the addresses, asking us for an IP address.

This can be solved by providing a CACHE_ONLY flag to the resolver (assuming it will have one), forcing it not to ask anything remote and to fail right away if the cache does not have the data.

Such a flag would allow us to do a first round over the nameservers and fill in the IP addresses we already have right at initialization, and then start fetching the rest externally, for at most 2 nameservers at once.
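A rough sketch of such a first round follows. The flag name CACHE_ONLY comes from the text; the ResolveFlags enum, the CacheMiss exception, the FakeResolver class and the first_round function are assumptions made only for illustration, since the resolver interface is not defined here.

```python
from enum import Flag, auto


class ResolveFlags(Flag):
    NONE = 0
    CACHE_ONLY = auto()   # never go to the network, fail on a cache miss


class CacheMiss(Exception):
    """Raised when CACHE_ONLY is set and the cache has no data (assumed API)."""


class FakeResolver:
    """Hypothetical resolver backed by a plain dict used as the cache."""

    def __init__(self, cache):
        self.cache = cache            # {(name, rrtype): [values]}
        self.remote_queries = []      # what would have gone to the network

    def resolve(self, name, rrtype, flags=ResolveFlags.NONE):
        key = (name, rrtype)
        if key in self.cache:
            return self.cache[key]
        if ResolveFlags.CACHE_ONLY in flags:
            raise CacheMiss(key)
        self.remote_queries.append(key)
        return []                     # the remote path is not modelled here


def first_round(resolver, nameservers):
    """Fill in cached addresses; return (known, nameservers still to fetch)."""
    known, to_fetch = {}, []
    for ns in nameservers:
        addrs = []
        for rrtype in ("A", "AAAA"):
            try:
                addrs.extend(resolver.resolve(ns, rrtype, ResolveFlags.CACHE_ONLY))
            except CacheMiss:
                pass
        # Simplification: one cached address family is treated as enough.
        if addrs:
            known[ns] = addrs
        else:
            to_fetch.append(ns)
    return known, to_fetch
```

The nameservers returned in the second list are the ones that would then be fetched externally, at most 2 at once. Because the first round never leaves the cache, it cannot re-enter the NSAS.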

Unauthoritative data problem

When we create the data structure, the cache might know only unauthoritative data. But when the nameserver is queried, some authoritative data will arrive and the cache overwrites the unauthoritative data with the authoritative data. We, however, do not; we keep using the old data.

What is needed is for the cache to inform us about this. It is enough to be informed when the data actually change (most of the time the unauthoritative and authoritative data will be the same, and we do not need to be bothered). When we are informed, the simplest thing to do is to pretend the entry has expired, so that we fetch new data from the cache when they are needed.
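One possible shape for this notification is sketched below. The Cache and Nsas classes, the listener list and the method names are all hypothetical; the sketch only illustrates notifying on an actual change and treating the entry as expired.

```python
class Cache:
    """Hypothetical cache that reports authoritative replacements that differ."""

    def __init__(self):
        self.data = {}        # key -> (frozenset of rdata, authoritative flag)
        self.listeners = []   # callables taking the changed key

    def store(self, key, rdata, authoritative):
        old = self.data.get(key)
        new = (frozenset(rdata), authoritative)
        self.data[key] = new
        # Notify only when authoritative data replace something different.
        if old is not None and authoritative and old[0] != new[0]:
            for listener in self.listeners:
                listener(key)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[0]


class Nsas:
    """Hypothetical NSAS side: keeps its own copy and an expired marker."""

    def __init__(self, cache):
        self.cache = cache
        self.entries = {}     # key -> addresses copied from the cache
        self.expired = set()  # keys whose copy must be refreshed before use
        cache.listeners.append(self.on_cache_change)

    def on_cache_change(self, key):
        # Simplest reaction: pretend the entry has expired.
        if key in self.entries:
            self.expired.add(key)

    def addresses(self, key):
        if key not in self.entries or key in self.expired:
            self.entries[key] = self.cache.get(key)
            self.expired.discard(key)
        return self.entries[key]
```

Storing identical authoritative data leaves the NSAS entry alone; only a real difference marks it expired, and the fresh data are picked up lazily on the next use.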

Another problem is that we assume the resolver is willing to provide data that it knows are unauthoritative. But this can be solved simply by adding an UNAUTHORITATIVE_OK flag.

TTL 0, cache eviction

We assume that we find the glue data in the cache. But that might not happen, for example when some other thread needed to free some space there (the cache usually does not have an infinite amount of space) or when the data have TTL 0 and cannot be kept in the cache. This would lead to not answering the query and marking the entry as unreachable, which is not correct.

This might be solved, for example, by providing some kind of cache cookies. When data are put into the cache, it would return a cookie, and holding such a cookie would guarantee that the cache is able to provide at least the data passed to it. (Technically, the easiest way to implement this is to put a shared pointer to the data into the cookie; the cache would look first into itself and then, if nothing is found, into the cookie.)
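The cookie idea could look roughly like this. In Python an ordinary object reference plays the role of the shared pointer; CacheCookie, Cache and their methods are hypothetical names used only for this sketch.

```python
class CacheCookie:
    """Holds a reference to the data so they survive eviction or TTL 0."""

    def __init__(self, key, rdata):
        self.key = key
        self.rdata = rdata    # strong reference, like a shared pointer


class Cache:
    """Hypothetical cache that hands out a cookie for everything stored."""

    def __init__(self):
        self.data = {}        # key -> rdata

    def put(self, key, rdata, ttl):
        if ttl > 0:
            self.data[key] = rdata   # TTL 0 data are never kept in the store
        return CacheCookie(key, rdata)

    def evict(self, key):
        self.data.pop(key, None)

    def get(self, key, cookie=None):
        # Look into the cache first, then into the cookie if nothing is found.
        if key in self.data:
            return self.data[key]
        if cookie is not None and cookie.key == key:
            return cookie.rdata
        return None
```

Whoever holds the cookie can still get at least the data that were passed in, whether they were evicted by another thread or never stored because of TTL 0; a query without the cookie sees a plain miss.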

External assumptions

This approach assumes a few things about other components' behaviour. They are listed here, in a single place, for faster reference:

  • The cache needs to intercept every packet that comes in and store all information, including the additional and authority sections (possibly outside the main store, using cookies).
  • The cache needs to inform the NSAS when unauthoritative or old information is replaced by different authoritative information (i.e. only when they differ).
  • The cache needs to provide a way to store TTL 0 data and make them available to exactly one NSAS query only, for example by cookies.
  • The resolver needs to be able to handle the flags CACHE_ONLY (do not perform any external queries; if the information is not there, fail) and UNAUTHORITATIVE_OK (it is acceptable to receive unauthoritative data).
  • The resolver needs to be able to pass the cache cookie with the request back to the cache (if cookies are used).
  • The resolver interface should provide a way to ask for an RRset. It is not strictly required, but passing the RRset is probably better than constructing a response and then parsing it again.
Last modified on Nov 27, 2010, 8:14:37 PM