Opened 8 months ago

Closed 6 months ago

#5533 closed enhancement (complete)

Use caching host backend to actually cache hosts

Reported by: tomek Owned by: fdupont
Priority: medium Milestone: Kea1.4
Component: host-reservations Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 6
Total Hours: 0 Internal?: no

Description

Once #5531 and #5532 are implemented, the logic of HostMgr class should be further extended to cache hosts returned by other backends.

For details, see RadiusDesign.

Subtickets

Change History (15)

comment:1 Changed 7 months ago by fdupont

A way to implement negative answers is required too.

comment:2 Changed 7 months ago by fdupont

  • Owner set to fdupont
  • Status changed from new to accepted

comment:3 Changed 7 months ago by fdupont

Waiting for more reviews of dependency, i.e., #5531, #5532 (kea trac5532 branch) and #5534 (premium trac5534a branch).

comment:4 Changed 7 months ago by fdupont

Done. Branches to review are Kea trac5533a (I rebased it to make it mergeable without effort) and Premium trac5533.
When reviewing #5528, #5531, #5532 and #5534 please refer to these branches because many things were fixed and/or improved and of course not back-ported.

comment:5 Changed 7 months ago by fdupont

  • Add Hours to Ticket changed from 0 to 6
  • Owner changed from fdupont to UnAssigned
  • Status changed from accepted to reviewing

comment:6 Changed 7 months ago by fdupont

Internal note: simplify the host-id code as C++ integer conversion does not change bits (vs try to keep sign).

comment:7 Changed 7 months ago by tomek

  • Owner changed from UnAssigned to tomek

Since 5528, 5531, 5532, 5433 and 5534 are on the same branch, reviewing all them at once.

comment:8 Changed 7 months ago by fdupont

For the to come patch:

        if (!done) {
                struct timeval tv;
                unsigned long junk;

                gettimeofday(&tv, NULL);
                srandom((getpid() << 16) ^ tv.tv_sec ^ tv.tv_usec ^ junk);
                return;
        }

from srandomdev code when the random device fails...

Last edited 7 months ago by fdupont (previous) (diff)

comment:9 Changed 7 months ago by fdupont

Added the alternate code for srandomdev. If it is needed at another place (e.g., performance stuff) we can move it in missing or a tool location. BTW there is a ticket for adding RNG interface to cryptolink but here we only need an uniformly distributed generator (i.e., probability to get a particular k in 1..N is 1/N for all k's and for all drawings).

comment:10 follow-up: Changed 6 months ago by tomek

Here are my review comments for trac5533 branch in premium repo.

This is part one. It covers all changes, except those in libloadtests/ and tests/. Those will be delivered next week.

I made some edits on trac5533 in premium. Please pull and review.

container.h
What's the reason for defining new container
in container.h, rather than reusing the container
already defined in host_container.h in libdhcpsrv?
If it needs extending a bit, that's ok. I suppose it will be easier
to maintain if we have the same container type being used in both
places.

entry.cc
Why did you define toElement here? It would be better to use
Host::toElement4 and host::toElement6. See host.cc in
libdhcpsrv. We coule change its interface to cover both v4
and v6 if needed.

host_cache.h
What's the point of write-through flag? I get the write-through=false
mode. But how does it work when write-through is true? According
to the flag description add() doesn't add anything and always returns
false. This is both needless and incorrect. The add methods do not look
at the write_through_ flag at all. I suggest to simply remove this flag.
Will make the code simpler.

I don't understand what the cache_get_all_ids_ is useful for.
I my opinion the getAll method should return them every time
(i.e. act as if cache_get_all_ids_ is always true). Unless you
have some convincing arguments for the case of it being false,
I think this flag should be removed as well.

What's the point of having enabled_ flag? Is it for testing?
I don't see how it could be useful in production. If you don't
want to use caching, don't configure this backend.

Why getAll4 is not implemented? If it's time limitation, that's
fine, but please put a @todo there.

insert() - I really don't like that the overwrite parameter is
used in two different ways. It should be split into two separate
parameters.

Overall, I think the solution is a bit overengineered. It can
be simplified by removing all flags (enabled, cache_get_all_ids_
and write_through_) and you could use add instead of insert
and del instead of remove. Would be simpler.

cacheFlushHandler I don't like two things in the parameter syntax.
First, that the arguments may be either integer or a string. This
makes it impossible to write a deterministic sanity checker.
Second, it makes it impossible to add more parameters in the
future. it should use this syntax:


{

"command": "cache-flush",
"arguments": {

"number": 5

}

}

To flush all hosts, you can specify number: 0. Alternatively,
the number parameter could be optional. If not specified, all
are flushed.

host_cache.cc
Why does cache-start and cache-stop call extractCommand if those
commands do not take any arguments?

Cache flush could return how many cached entries were deleted.

Why do you need HCEntryListParser? Isn't the syntax the same
as in the hosts list in the config file?

The cache content should be preserved on its own. It shouldn't
require calling cache-write and cache-load. This is easy to
do, but probably a bit tricky to test, so it's ok to push
this to a separate ticket.

What is the benefit of having separate HostCacheImpl? class?
I think you can collapse HostCacheImpl? into HostCache?. The
implementation would be simpler that way. Ok, there are some
mechanisms that you probably want hidden (like relocate), but
you can simply make them private methods.

The handler methods should return CONTROL_RESULT_SUCCESS instead
of 0 (and other constants from command_interpreter.h).

container.h
Incorrect license!

host_cache_parsers.cc
Why do you need new parsers? Can't you use HostReservationParser?
from dhcpsrv/parsers/host_reservation_parser.h? One thing that
is missing there is the subnet-id, but that was solved in
host_cmds library. You could move the parsers from host_cmds to
libdhcpsrv and then use them in host_cmds and in host_cache.


Code compiled and unit-tests pass on mac os 10.12.6.

Last edited 6 months ago by tomek (previous) (diff)

comment:11 in reply to: ↑ 10 Changed 6 months ago by fdupont

Replying to tomek:

Here are my review comments for trac5533 branch in premium repo.

This is part one. It covers all changes, except those in libloadtests/ and tests/. Those will be delivered next week.

I made some edits on trac5533 in premium. Please pull and review.

container.h
What's the reason for defining new container
in container.h, rather than reusing the container
already defined in host_container.h in libdhcpsrv?
If it needs extending a bit, that's ok. I suppose it will be easier
to maintain if we have the same container type being used in both
places.

=> host container is not enough because it is missing two indexes,
one to manage the order of entries so you can flush oldest entries,
one to be able to find the host from the IPv6 reservation.
Without it you can't implement removal by IPv6 reservation.

entry.cc
Why did you define toElement here? It would be better to use
Host::toElement4 and host::toElement6. See host.cc in
libdhcpsrv. We could change its interface to cover both v4
and v6 if needed.

=> Host toElement are too incomplete. I'd prefer to cut & paste
the code and to not touch the Kea core code when not really
required.

host_cache.h
What's the point of write-through flag? I get the write-through=false
mode. But how does it work when write-through is true? According
to the flag description add() doesn't add anything and always returns
false. This is both needless and incorrect. The add methods do not look
at the write_through_ flag at all. I suggest to simply remove this flag.
Will make the code simpler.

=> it was clearer when the add method returned a boolean.
Note it is still useful for delete. I think I should rewrite this part of
the doc because I edited too many times.

I don't understand what the cache_get_all_ids_ is useful for.
I my opinion the getAll method should return them every time
(i.e. act as if cache_get_all_ids_ is always true).

=> it is an attempt to implement getAll but if you believe I should
change this I'll just remove it because the constraint to make
it to work with cache_get_all_ids_ set to true is too high.

And I am sorry but caching does not work well with getAll*.

Unless you
have some convincing arguments for the case of it being false,
I think this flag should be removed as well.

=> I can be convinced to remove the code, i.e. making
getAll always to return an empty collection.

What's the point of having enabled_ flag? Is it for testing?
I don't see how it could be useful in production. If you don't
want to use caching, don't configure this backend.

=> it was to be able to operate with the backend after the cache.
It is not critical...

Why getAll4 is not implemented? If it's time limitation, that's
fine, but please put a @todo there.

=> for 2 reasons: it can't work and it is not used. So there is
no reason to implement it.

insert() - I really don't like that the overwrite parameter is
used in two different ways. It should be split into two separate
parameters.

=> if you prefer.

Overall, I think the solution is a bit overengineered. It can
be simplified by removing all flags (enabled, cache_get_all_ids_
and write_through_) and you could use add instead of insert
and del instead of remove. Would be simpler.

=> no, add and insert must be different because add is
applied to all backends (so if the backend behind the cache
throws you are in trouble) and add does not return something
so you have no control over what it really does.

Some for del/remove even the difference is less critical.

cacheFlushHandler I don't like two things in the parameter syntax.
First, that the arguments may be either integer or a string. This
makes it impossible to write a deterministic sanity checker.
Second, it makes it impossible to add more parameters in the
future. it should use this syntax:


{

"command": "cache-flush",
"arguments": {

"number": 5

}

}

To flush all hosts, you can specify number: 0. Alternatively,
the number parameter could be optional. If not specified, all
are flushed.

=> too dangerous. I propose to have two different commands.

host_cache.cc
Why does cache-start and cache-stop call extractCommand if those
commands do not take any arguments?

=> because extractCommand does not what you can think from
its name. There is a ticket to provide the function with a new name.

Cache flush could return how many cached entries were deleted.

=> I don't remember (and has a flaky Internet connection)
what it returns.

Why do you need HCEntryListParser? Isn't the syntax the same
as in the hosts list in the config file?

=> no it is not (cf the toElement discussion).

The cache content should be preserved on its own. It shouldn't
require calling cache-write and cache-load. This is easy to
do, but probably a bit tricky to test, so it's ok to push
this to a separate ticket.

=> how do you propose to survive config reload? reseting
the host cache pointer flushes its whole content.

What is the benefit of having separate HostCacheImpl? class?

=> someone had the strange idea to make all base cache get* methods const.

I think you can collapse HostCacheImpl? into HostCache?. The
implementation would be simpler that way. Ok, there are some
mechanisms that you probably want hidden (like relocate), but
you can simply make them private methods.

=> you can't call a non const private method from a const public one.

The handler methods should return CONTROL_RESULT_SUCCESS instead
of 0 (and other constants from command_interpreter.h).

=> the doc says the constant is for the result entry in the JSON,
not for the returned value from the caller.
I agree 0 and 1 do not show a difference but there is at least
another defined constant. Note I coded CONTROL_RESULT_SUCCESS
but reading the code of other hooks I remarked they don't use it
so I tried to understand why...

container.h
Incorrect license!

=> oops.

host_cache_parsers.cc
Why do you need new parsers? Can't you use HostReservationParser?
from dhcpsrv/parsers/host_reservation_parser.h? One thing that
is missing there is the subnet-id, but that was solved in
host_cmds library. You could move the parsers from host_cmds to
libdhcpsrv and then use them in host_cmds and in host_cache.

=> host_reservation_parser.h is not generic (they are specialized
to IPv4 and IPv6), misses some fields (not only subnet-id which is
critical for the host cache, e.g. host-id), etc. So I prefer
to reuse a generic host entry parser I had somewhere.

Code compiled and unit-tests pass on mac os 10.12.6.

A short explanation about caching getAll* stuff:

  • you can't use an answer from a simple get because you don't know if it is the only one. If you returns it both other backends must be called to complete the collection (so the cache brings no benefit) and the host manager has to remove duplicates (so the cache brings deficit).
  • so you have to manage getAll* entries as a set of hosts which requires a different container.
  • if you try to return something to a single get* using the getAll* cache you break the LRU management. (the deep reason of this is the getAll* caller filters what is interesting to it from the set without feedback)

In fact the only case where you can implement getAll* with
the current host cache code is when you can assume strong
properties as the constraint for cache_get_all_ids_, or
that getAll* return always either zero or one entry.

I began the host cache implementation with the alloc engine
not using getAll* and it was decided it should not be used
for shared networks (BTW shared networks are really
incompatible with RADIUS because the subnet-id is a RADIUS
attribute).

comment:12 Changed 6 months ago by fdupont

Removed the getAll/cache_get_all_ids_ as it was more misleading than useful.
Moved the write through stuff to the host data source test which is its unique user.
About the enabled flag if I remove it I have to remove cache-start and cache-stop too.
Added a cache-clear command and improved cache-flush.
Changed the insert() type in Kea trac5533a and premium trac5533.
Reviewed and merge ident and comment changes.

comment:13 Changed 6 months ago by fdupont

Note the Kea part was merged.

comment:14 Changed 6 months ago by tomek

  • Owner changed from tomek to fdupont

I reviewed your changes and they look ok. The code compiles and unit-tests passed. I think we eventually need to merge the parsers and the toElement code. I've created new ticket (#5578) to address the duplication.

Also added 3 minor comments. Please pull and review. If you're ok with them, please merge and close all relevant tickets.

comment:15 Changed 6 months ago by fdupont

  • Resolution set to complete
  • Status changed from reviewing to closed

Merged. Closing (with 553[124]).

Note: See TracTickets for help on using tickets.