Opened 4 months ago

Closed 3 months ago

#5528 closed enhancement (complete)

Ability to register host data sources from a hook

Reported by: tomek Owned by: fdupont
Priority: medium Milestone: Kea1.4
Component: host-reservations Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 3
Total Hours: 0 Internal?: no

Description

We currently have 3 proper hook backends: mysql, postgres, and cassandra. The support for them is determined at the compilation time. With addition of yet another one (Radius) it would be even more problematic to determine which backends to support. This is particularly tricky for OS packagers as they can't realistically expect that a simple DHCP server will imply the need to install many heavy dependencies.

Therefore we need a mechanism that will allow to dynamically register new host backend types. In particular, it must be possible to register them from a hook library. The immediate goal of this work is to be able to implement Radius based backend in a hook. A longer term solution will be to trim down libdhcpsrv by moving mysql, postgres and cassandra to dedicated hook libs.

Subtickets

Change History (16)

comment:1 Changed 4 months ago by tomek

  • Summary changed from Ability to register host backend types from hooks to Ability to register host data sources from a hook

comment:2 Changed 4 months ago by fdupont

  • Owner set to fdupont
  • Status changed from new to accepted

The database backend improvement is an illusion until the lease database is handled by a similar mechanism too..
Anyway there is a benefit to make the HostDataSourceFactory::create() class method a real dynamic factory code.

comment:3 Changed 4 months ago by fdupont

Code and tests done. I think I should update the doxygen file too.

comment:4 Changed 4 months ago by fdupont

  • Add Hours to Ticket changed from 0 to 3
  • Owner changed from fdupont to UnAssigned
  • Status changed from accepted to reviewing

Done. Ready for review. Note that #5531 will depend on this.

comment:5 Changed 4 months ago by fdupont

Updated by #5531, i.e. you should review directly trac5531 branch.

comment:6 Changed 4 months ago by fdupont

Change the branch to review to trac5533a.

comment:7 Changed 4 months ago by tomek

  • Owner changed from UnAssigned to tomek

Since 5528, 5531, 5532, 5433 and 5534 are on the same branch, reviewing all them at once.

comment:8 follow-ups: Changed 4 months ago by tomek

Just finished reviewing trac5533a. Let me start this review with couple remarks.

a) This code change is way too big to review it. Its size is one of the
problems, but another is that these are several different features
put together. After many hours of going through the code changes
it was difficult to focus and keep in mind why specific change
was there. Was it for caching, for backend registration, because
multiple backends?

I admit that due to recent emergencies the reviews weren't going too
well. But in the future please don't stuff everything like this, ok?

b) The host caching code, especially the mechanism to handle negative cache
is brilliant. Excellent job!

src/lib/dhcpsrv/dhcaccess_aprser.h

Please don't use size_t, but rather the enum type from cfg_db_access.h.

src/lib/cfg_db_access.h
Why did you change the LEASE_DB and HOSTS_DB from enum to constants?
Please turn them back into enum. Note that in 1.5 we will
likely have SUBNETS_DB and possibly others in the near future.

dhaccess_parser.cc - the list of backends is hardcoded here (line
125). It shouldn't.

base_host_data_source.h
Why did you change the add interface. It used to be void add() and if
the addition is failing, the code should throw DuplicateHost?,
BadHostAddress? or something similar.

libdhcpsrv.dox
The doc should briefly explain how to register a backend. Something
like "See host_data_source_factory.cc for examples." will do the
trick. I did this. Pleae pull and review.

host.h
The comment for negative flag should explain what exactly the flag
means (that it is a host object represents a lack of reservation and
that it is used in negative caching).

host_mgr.h
Historically we have messed up the names in here. The thing is
sometimes called backend, host data source or host manager. How about
we change the name to simply backend? If you like the name (if not,
please propose something equally simple), let's use "backend" naming
in the new code. We don't want to rename existing code, at least not
yet. We may do it when we get all the patches for Cassandra finally
merged. If you like the name, please rename new methods in HostMgr? to
addBackend(), delBackend(), delAllBackends().

Descriptions for get4Any and get6Any should clearly s

host_mgr.cc
get4(subnet_id, hwaddr, duid) - I have one fundamental and one
technical comment here. The fundamental one is that in my opinion the
for loop should not continue if a host is found. That was one of the
assumptions for the concept of multiple designs. You have faster
backends first and slowest last, so if you find a host, you omit
interacting with the slow backends.

The technical comment is that we should try to limit down the number
of getX methods. One of them that is not really useful is
get4(subnet_id, hwaddr, duid), because it's essentially two queries
bundled together. I don't want this refactoring to be done now, but
we should not make it more complicated in the future. So it would
be better to use get4(subnet_id, identitype-type, identifier-begin,
identifier-len).

The same two comments apply to get6(subnet-id, duid, hwaddr).

get4Any - would it be ok to simplify the for loop with:

for (auto it : alternate_sources_)

Also, the last line could explicitly return empty return
(ConstHostPtr?()), rather than return host which is null at this
stage.

What happens in the first backend returns a negative cache in line
191? It would be returned, right? The same is true in get6Any in
line 345.

If I understand the code correctly, this method may return either
actual host or cached negative. This fact should be better described.
There should be 2 comments: one in the method comment explaining
that the method can return actual or negative cached hosts.

The method could use some comments. I've added some. Please pull
and review. More comments are welcome.

database_backends.dox
The only change you did is copyright year bump :) Either change
something in the file or bump it back to 2017.

host_data_source_factory.cc
The registerFactory should log something, like "Registration
successful. It is now possible to use host backend of type 'xyz'".
This should be logged on debug level.

Similar comment should be logged for deregisterFactory.

memory_host_data_source.h
The methods that are not implemented should be described as such.
Added missing comments. Please pull.

host_mgr_unittest.cc

testGet4Any, testGet6Any: Other access method should be tested they
do not return negative cached host:

get4(subnet-id, hwaddr, duid), get4(subnet-id, address),
get6(subnet-id, duid, hwaddr), get6(prefix, len), get6(subnet-id,
addr)

host_data_source_factory_unittest.cc
The tests you wrote are pretty thorough. Good job!

I've updated couple things in 12 files. Please pull and review.

Please pull my changes and address the comments I made above.
Once you do it, the code is ready for merge. I'm now moving
to review of the premium/ part.


This absolutely requires a changelog. Here's proposed entry:

13XX.	[func]		fdupont
	Host backends are no longer hardcoded and can now be registered
	from hooks. It is now possible to register multiple backends.
	(Trac #5528, git abcd)
	(Trac #5531, git abcd)

13XX.	[func]		fdupont
	Caching host backend implemented. This code is not yet ready
	for use, thus disabled by default.
	(Trac #5532, git abcd)
	(Trac #5533, git abcd)

Ok, the code builds on Ubuntu 17.04. Unit-tests are running.
They passed without any extra backends, now rerunning them with mysql, postgres and cassandra.
Will update this ticket if something breaks down.

This concludes review of the open source part. I'm moving on to premium/ review.
As I understand it, this covers 5528, 5531 and 5532. Remaining tickets (5533 and 5534)
will be merged once the premium/ code is reviewed.

comment:9 Changed 4 months ago by tomek

  • Owner changed from tomek to fdupont

comment:10 in reply to: ↑ 8 Changed 3 months ago by fdupont

Replying to tomek:

Just finished reviewing trac5533a. Let me start this review with couple remarks.

a) This code change is way too big to review it. Its size is one of the
problems, but another is that these are several different features
put together. After many hours of going through the code changes
it was difficult to focus and keep in mind why specific change
was there. Was it for caching, for backend registration, because
multiple backends?

=> it is the problem with a ticket train: with 2 dependent tickets you
can patch back the first ticket when developing the second shows
things to change. Unfortunately it becomes too hard with more than
two tickets, here we have 5!
Ideas:

  • do not do ticket trains... of course it does not apply in all situations and ticket trains are still better than jumbo tickets, both for developer and reviewer.
  • do a second development pass to update previous packets and rebase on them. Adds a lot of work to the developer and make review as easy as no train even some design choices can become unclear because they are from feedback.

I admit that due to recent emergencies the reviews weren't going too
well. But in the future please don't stuff everything like this, ok?

=> I understand the issue, not yet the right solution (other than wait
and eventually backtrack).

b) The host caching code, especially the mechanism to handle negative cache
is brilliant. Excellent job!

=> even if negative caching was not in the ticket description it was obvious
than for performance negative caching was required!

src/lib/dhcpsrv/dhcaccess_aprser.h

Please don't use size_t, but rather the enum type from cfg_db_access.h.

src/lib/cfg_db_access.h
Why did you change the LEASE_DB and HOSTS_DB from enum to constants?
Please turn them back into enum. Note that in 1.5 we will
likely have SUBNETS_DB and possibly others in the near future.

=> hum, reconsidering this I believe the best choice is to have
an enum in the isc::dhcp namespace: it is not so bound to the config.
And with C++11 you can make an enum a full type. I'll look for
details and propose something.

dhaccess_parser.cc - the list of backends is hardcoded here (line
125). It shouldn't.

=> I agree. Note even it is *not* a code touched by the host cache train
(cf git diff master...trac5533a dbaccess_parser.cc) it is the right
occasion to fix it. I propose to open a ticket to design what we want,
e.g. a list of backend names, a map of backend names with their
parsers (more flexible even there is still the syntax constrain), etc.
=> #5562

base_host_data_source.h
Why did you change the add interface. It used to be void add() and if
the addition is failing, the code should throw DuplicateHost?,
BadHostAddress? or something similar.

=> my idea is to call each backend in the sequence until one returns
true saying to added the entry. Of course the cache should return
false so the next and real/store backend gets it... i.e., something
similar to del.

DuplicateHost? is different because it is an error condition. BTW
my reasoning did not handle this case so I am afraid more design
is needed here even it does not impact the Radius stuff as
the add (or del) operation has no meaning for it...

I have to stop addressing comments here. Shall continue tomorrow.

comment:11 Changed 3 months ago by fdupont

Thought more about the add problem:

  • I believe I should restore the original type i.e. return void
  • add will be called to each backend if none raises an exception
  • if an exception is raised it is handled by the caller
  • cache will ignore add
  • host manager will cache the just added entry at the end of the add method
  • documentation should be updated explaining this behavior (in fact more clarifying because it is not really new)

BTW it is far to be critical because add is used only by the host command hook. The cache has its dedicated method (insert) with a different semantic.

comment:12 in reply to: ↑ 8 Changed 3 months ago by fdupont

Replying to tomek:
Next part

libdhcpsrv.dox
The doc should briefly explain how to register a backend. Something
like "See host_data_source_factory.cc for examples." will do the
trick. I did this. Pleae pull and review.

=> I'll review.

host.h
The comment for negative flag should explain what exactly the flag
means (that it is a host object represents a lack of reservation and
that it is used in negative caching).

=> there should be more about negative caching anyway.

host_mgr.h
Historically we have messed up the names in here. The thing is
sometimes called backend, host data source or host manager. How about
we change the name to simply backend?

=> I disagree about the last item: the host manager is the entity which
handles host data sources so if it has the same signature (C++ base class)
it is not at the same level. Backend is not generic enough because
the CfgHosts class is not a backend. Now this depends on the context.

If you like the name (if not,
please propose something equally simple), let's use "backend" naming
in the new code. We don't want to rename existing code, at least not
yet. We may do it when we get all the patches for Cassandra finally
merged. If you like the name, please rename new methods in HostMgr? to
addBackend(), delBackend(), delAllBackends().

=> a bit better than source and shorter when the whole name has to
be used. I vote for backend too.

Descriptions for get4Any and get6Any should clearly s

=> your editor or trac ate the end of the line. I believe you want a better
description?

host_mgr.cc
get4(subnet_id, hwaddr, duid) - I have one fundamental and one
technical comment here. The fundamental one is that in my opinion the
for loop should not continue if a host is found. That was one of the
assumptions for the concept of multiple designs. You have faster
backends first and slowest last, so if you find a host, you omit
interacting with the slow backends.

=> there is !host && it != alternate_sources_.end(); so the loop
stops at the first.

The technical comment is that we should try to limit down the number
of getX methods. One of them that is not really useful is
get4(subnet_id, hwaddr, duid), because it's essentially two queries
bundled together. I don't want this refactoring to be done now, but
we should not make it more complicated in the future. So it would
be better to use get4(subnet_id, identitype-type, identifier-begin,
identifier-len).

=> fully agree. BTW these hwaddr/duid methods are not used and
all backends translate them internally into the identity call.
If you like I can create a ticket removing them (simplifying the code
is nearly always the right option). BTW the hwaddr/duid led me
to the duplicate mess I removed between premium trac5534
and trac5534a (so before trac5533) so it won't be only code
clean up!

The same two comments apply to get6(subnet-id, duid, hwaddr).

get4Any - would it be ok to simplify the for loop with:

for (auto it : alternate_sources_)

=> I opened before reading this a ticket about range-for...
So please comment on it. If we adopt a policy in favor of it
on new code I'll update new code at merge time or before.

Also, the last line could explicitly return empty return
(ConstHostPtr?()), rather than return host which is null at this
stage.

=> matter of style.

What happens in the first backend returns a negative cache in line
191? It would be returned, right? The same is true in get6Any in
line 345.

=> line 191 is in log so I took line 345: yes if there is a negative
cache entry it is returned (and not cached) and it is the job of
the caller to recognize it (I added the corresponding new flag in
the host class at the first step). Of course it makes sense to
ask this to callers of getXAny method.

If I understand the code correctly, this method may return either
actual host or cached negative.

=> or empty entry. When you read the tests you can see you have
to call the method twice: the first time it returns empty and as a side
effect it is cached negative, the second time it returns the negative
cache.

This fact should be better described.
There should be 2 comments: one in the method comment explaining
that the method can return actual or negative cached hosts.

The method could use some comments. I've added some. Please pull
and review. More comments are welcome.

=> I'll carefully review your comments.

database_backends.dox
The only change you did is copyright year bump :) Either change
something in the file or bump it back to 2017.

=> argh! I opened it to add something and found nothing.
IMHO as we are short in comments the first option (add something)
is the best.

host_data_source_factory.cc
The registerFactory should log something, like "Registration
successful. It is now possible to use host backend of type 'xyz'".
This should be logged on debug level.

Similar comment should be logged for deregisterFactory.

=> yes I have to add more logs (of course it is better to have too
many logs than the opposite. Log levels are here to avoid to emit
too many logs in production.)

memory_host_data_source.h
The methods that are not implemented should be described as such.
Added missing comments. Please pull.

=> note I just copied this from host command hook. But now
it is promoted to core code I agree it should be improved too.

host_mgr_unittest.cc

testGet4Any, testGet6Any: Other access method should be tested they
do not return negative cached host:

get4(subnet-id, hwaddr, duid), get4(subnet-id, address),
get6(subnet-id, duid, hwaddr), get6(prefix, len), get6(subnet-id,
addr)

=> I'll add these new unit tests.

I've updated couple things in 12 files. Please pull and review.

This concludes review of the open source part. I'm moving on to premium/ review.
As I understand it, this covers 5528, 5531 and 5532. Remaining tickets (5533 and 5534)
will be merged once the premium/ code is reviewed.

=> Kea branches are trac5528, trac553[123] and trac5533a.
Premium branches are trac553[1234] and trac5534a but the last one is trac5533,
i.e., trac5534/trac5534a were done before trac5533.
Unfortunately the git log messages have no branch in them (I found the problem
and fixed it but too late). Perhaps it is not a big issue because anyway
they are new files (not the Doxyfile where only one line was changed over
its 2430 lines i.e. 40% of the diff can be reviewed in a few seconds...).

Next steps on my side should be tomorrow:

  • review your changes
  • restore add() in core and premium (it should simplify premium code and tests).
  • address other comments
  • give back the ticket to you

Between make (check)'s I'll put ideas in #5562 because #5527 could depend on it
(at least allows to put the "database part" of the config in the server config as in the
design (today/in #5525 it is in the hook config). BTW I am finishing #5526
(I forgot to add logs for instance) i.e. the previous ticket...).

comment:13 Changed 3 months ago by fdupont

  • Owner changed from fdupont to tomek

Addressed all comments. I added the requested tests (not easy because test setups didn't support it but it led to IMHO better code and doc so I have no complain...).
Ready for another pass. BTW I suggest to use #5533 ticket for the premium code review.

Last edited 3 months ago by fdupont (previous) (diff)

comment:14 follow-up: Changed 3 months ago by tomek

  • Owner changed from tomek to fdupont

I have reviewed your updated code on trac5533a and have a couple comments:

  • Your 3016b7117e1ed0dd6d0e86bf2fe8d2c9c2e7e535 commit added premium file. Please remove it before merging.
  • Your changes are good. I did only cosmetic fixes. please pull.
  • Make sure you add changelog entries. My proposals are in comment 8 above, but feel free to use different/additional text.
  • Thanks a lot for writing the overview description in .dox. It is super useful.

Ok, that's it. Please pull, review my minor changes and if you're ok with them, merge.

The next step for me is to review trac5533 in premium repo.

comment:15 in reply to: ↑ 14 Changed 3 months ago by fdupont

Replying to tomek:

I have reviewed your updated code on trac5533a and have a couple comments:

  • Your 3016b7117e1ed0dd6d0e86bf2fe8d2c9c2e7e535 commit added premium file. Please remove it before merging.

=> it is hard to get rid of premium. If you know a way to avoid it using git commit -a -m ...?
BTW it won't merge without a lot of conflicts with database reconnect. I propose solution in #5574.

comment:16 Changed 3 months ago by fdupont

  • Resolution set to complete
  • Status changed from reviewing to closed

Merged. Closing.

Note: See TracTickets for help on using tickets.