Opened 9 months ago

Closed 5 months ago

#5458 closed defect (complete)

Ability to put DHCPv6 packet processing on hold ("park")

Reported by: tomek Owned by: UnAssigned
Priority: medium Milestone: Kea1.4
Component: dhcp6 Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Sometimes it would be very useful during the packet processing to tell Kea that a certain packet processing should be put on hold. There are many use cases that could benefit from this, e.g. waiting for external database response, waiting for ping packet to possibly return, waiting for Radius to process the request and send back response etc. The point is that the mechanism should be as generic as possible.

It is recommended that the "park" mechanism be implemented as additional next step value returned by a callout. We already have skip, drop and continue. We should add on-hold, park or similar.

HADesign page discusses the HA application of this feature, but please do keep in mind that this is only one of many possible uses of this.

This is for DHCPv6 packet, its v4 counterpart is covered by #5457.

Subtickets

Change History (25)

comment:1 Changed 9 months ago by tomek

  • Component changed from Unclassified to dhcp6
  • Milestone changed from Kea-proposed to Kea1.4

comment:2 Changed 8 months ago by fdupont

  • Owner set to fdupont
  • Status changed from new to accepted

comment:3 follow-up: Changed 8 months ago by fdupont

I cut run_one() and processPacket() so a hook can reenter in the processing using ControlledDhcpv6Srv::getInstance() and one of the new methods (or processPacket()). I failed to share the code sending the response in the DHCPv4-over-DHCPv6 handler because sendPacket() was made virtual for testing purpose...
Still have to update the doc and perhaps write a schema of DHCPv6 processing with callout and entry points.

comment:4 in reply to: ↑ 3 ; follow-up: Changed 8 months ago by marcin

Replying to fdupont:

I cut run_one() and processPacket() so a hook can reenter in the processing using ControlledDhcpv6Srv::getInstance() and one of the new methods (or processPacket()). I failed to share the code sending the response in the DHCPv4-over-DHCPv6 handler because sendPacket() was made virtual for testing purpose...
Still have to update the doc and perhaps write a schema of DHCPv6 processing with callout and entry points.

I hope you based it on the trac5457 code, rather than creating it from scratch.

comment:5 in reply to: ↑ 4 Changed 8 months ago by marcin

Replying to marcin:

Replying to fdupont:

I cut run_one() and processPacket() so a hook can reenter in the processing using ControlledDhcpv6Srv::getInstance() and one of the new methods (or processPacket()). I failed to share the code sending the response in the DHCPv4-over-DHCPv6 handler because sendPacket() was made virtual for testing purpose...
Still have to update the doc and perhaps write a schema of DHCPv6 processing with callout and entry points.

I hope you based it on the trac5457 code, rather than creating it from scratch.

Apparently you didn't. I recommend to not implement this ticket until trac5457 is merged.

comment:6 Changed 8 months ago by fdupont

  • Owner changed from fdupont to UnAssigned
  • Status changed from accepted to assigned

Backtracked as #5457 took another direction...

comment:7 Changed 6 months ago by fdupont

trac5457 was merged so this ticket is unlocked. BTW I copied the dhcp4_srv_configured code to dhcp6 server in trac5530a branch (the IO service was needed to send Radius accounting requests in "background").

comment:8 Changed 6 months ago by fdupont

  • Owner changed from UnAssigned to fdupont
  • Status changed from assigned to accepted

comment:9 Changed 5 months ago by marcin

What is the status of this work? Or better, what is the estimated time of delivery?

comment:10 Changed 5 months ago by fdupont

I did most of the code but stall about tests. So unfortunately it should take time (I don't believe I'll do significant progress during all hands' and I'll need to recover jet lag on return. If I don't find something hard / time consuming I expect before the end of next week...

comment:11 Changed 5 months ago by fdupont

Code done (in trac5458a). Working on tests...
BTW there are open points in the code which rely on the expected usage of the leases committed callout to be solved.

comment:12 Changed 5 months ago by fdupont

  • Owner changed from fdupont to UnAssigned
  • Status changed from accepted to reviewing

Code is finished, DHCPv4 unit tests were ported but there are a lot of specific DHCPv6 things which should be tested too but are not yet. As this ticket blocks some others I give it to review and I'll work to add missing tests (if reviewer has some ideas about tests to add it will help too).

comment:13 Changed 5 months ago by marcin

  • Owner changed from UnAssigned to marcin

comment:14 follow-up: Changed 5 months ago by marcin

  • Owner changed from marcin to fdupont

I reviewed the code up to commit 89d8d0bce92afc6546e124d81eb24fc56384bb7d.

I didn't see any ChangeLog propose.

I note that you removed "this" from some number of lambdas. The changes are ok, but I wonder why they are applied here given that they are supposed to be done in another ticket.

The following unit test failed for me:

[ RUN      ] LoadUnloadDhcpv6SrvTest.Dhcpv6SrvConfigured
hooks_unittest.cc:3653: Failure
      Expected: 0
To be equal to: status_code
      Which is: 1
[  FAILED  ] LoadUnloadDhcpv6SrvTest.Dhcpv6SrvConfigured (11 ms)

After dumping the error message I got:

"unable to configure server identifier: unable to open DUID file /Users/marcin/devel/kea-build/var/kea/kea-dhcp6-serverid for writing"

I think the way around it is to use the similar approach as in the dhcp6_test_utils.cc

CfgMgr::instance().setDataDir(TEST_DATA_BUILDDIR);

src/bin/dhcp6/dhcp6_hooks.dox
I applied some minor corrections in this file, so please pull and review my changes.

src/bin/dhcp6/dhcp6_srv.cc
The following declaration:

Dhcp6Hooks Hooks;

used to be within the anonymous namespace. Now it is outside any namespace. Why is this?

processPacket: It is ok to do sanity check on the query prior to calling processPacket. Also, it is a good catch to increase statistics of dropped queries. However, I suggest that this whole section:

    try {
        if (!sanityCheck(query)) {
            // We received a packet type that we do not recognize.
            LOG_DEBUG(bad_packet6_logger, DBG_DHCP6_BASIC,
                      DHCP6_UNKNOWN_MSG_RECEIVED)
                .arg(static_cast<int>(query->getType()))
                .arg(query->getIface());
            // Increase the statistic of dropped packets.
            StatsMgr::instance().addValue("pkt6-receive-drop",
                                          static_cast<int64_t>(1));
            return;
        }
                
    } catch (const RFCViolation& e) {
        LOG_DEBUG(bad_packet6_logger, DBG_DHCP6_BASIC, DHCP6_REQUIRED_OPTIONS_CHECK_FAIL)
            .arg(query->getName())
            .arg(query->getRemoteAddr().toText())
            .arg(e.what());

        // Increase the statistic of dropped packets.
        StatsMgr::instance().addValue("pkt6-receive-drop", static_cast<int64_t>(1));

        return;
    }

is enclosed within the function to avoid bloating of the processPacket function which is already very long.

bool sanityCheck(const Pkt6Ptr& pkt);

processPacket: you have taken out the following call:

processDhcp4Query(query);

from the switch block where you call respective functions for other message types. I think I understand that you did that to avoid calling initContext for this particular case, but:

  • I think you should add a short comment before this call to explain that
  • the call is now outside of the try-catch block, so if it generates an exception it will be caught and logged by the DHCP6_PACKET_PROCESS_STD_EXCEPTION rather than DHCP6_PACKET_PROCESS_FAIL which is probably more appropriate and also bumps up the statistics of dropped packets?

The part of the code around leases6_committed callout should really have much better commentary. Specifically, the parts that set leases6 and deleted_leases6.

Also, I am not very enthusiastic about the fact that there is now a distinction between global new/deleted leases and IA specific leases. Why can't we just reuse IA specific new and deleted leases within the ClientContext6? That would make the code surrounding leases6_commiytted hook point simpler.

processSolicit, processRequest etc: It is redundant to pass both Pkt6Ptr and ClientContext6 to those functions, because ClientContext6 already contains a query_ object. It should rather be ClientContext6 and nothing else.

src/lib/dhcpsrv/alloc_engine.h
I said this already elsewhere, but is it really necessary to introduce global new and deleted leases, as opposed to just reusing IA specific ones in the ClientContext6?

Instead of introducing the committed_ flag? Could you simply use fake_allocation_ ?

comment:15 in reply to: ↑ 14 Changed 5 months ago by fdupont

  • Owner changed from fdupont to marcin

Replying to marcin:

I didn't see any ChangeLog propose.

=> no but recycling #5457 entry seems a good idea.

I note that you removed "this" from some number of lambdas. The changes are ok, but I wonder why they are applied here given that they are supposed to be done in another ticket.

=> I applied 5591 diffs to impacted code because it didn't compile. You can remove
them before merging if you believe it is too soon.

The following unit test failed for me:

[ RUN      ] LoadUnloadDhcpv6SrvTest.Dhcpv6SrvConfigured
hooks_unittest.cc:3653: Failure
      Expected: 0
To be equal to: status_code
      Which is: 1
[  FAILED  ] LoadUnloadDhcpv6SrvTest.Dhcpv6SrvConfigured (11 ms)

After dumping the error message I got:

"unable to configure server identifier: unable to open DUID file /Users/marcin/devel/kea-build/var/kea/kea-dhcp6-serverid for writing"

I think the way around it is to use the similar approach as in the dhcp6_test_utils.cc

CfgMgr::instance().setDataDir(TEST_DATA_BUILDDIR);

=> I did not get this problem but I agree with your solution.

src/bin/dhcp6/dhcp6_hooks.dox
I applied some minor corrections in this file, so please pull and review my changes.

=> done.

src/bin/dhcp6/dhcp6_srv.cc
The following declaration:

Dhcp6Hooks Hooks;

used to be within the anonymous namespace. Now it is outside any namespace. Why is this?

=> I tried to use Hooks in dhcp6to4_ipc.cc so I needed to make it public. But as it didn't
work even it compiled I stepped back to getHookIndex* and forgot to restore the declaration...

processPacket: It is ok to do sanity check on the query prior to calling processPacket. Also, it is a good catch to increase statistics of dropped queries.

=> I am afraid there are still points where statistics are not correctly updated.
Now my goal is to not add new...

However, I suggest that this whole section:

    try {
        if (!sanityCheck(query)) {
            // We received a packet type that we do not recognize.
            LOG_DEBUG(bad_packet6_logger, DBG_DHCP6_BASIC,
                      DHCP6_UNKNOWN_MSG_RECEIVED)
                .arg(static_cast<int>(query->getType()))
                .arg(query->getIface());
            // Increase the statistic of dropped packets.
            StatsMgr::instance().addValue("pkt6-receive-drop",
                                          static_cast<int64_t>(1));
            return;
        }
                
    } catch (const RFCViolation& e) {
        LOG_DEBUG(bad_packet6_logger, DBG_DHCP6_BASIC, DHCP6_REQUIRED_OPTIONS_CHECK_FAIL)
            .arg(query->getName())
            .arg(query->getRemoteAddr().toText())
            .arg(e.what());

        // Increase the statistic of dropped packets.
        StatsMgr::instance().addValue("pkt6-receive-drop", static_cast<int64_t>(1));

        return;
    }

is enclosed within the function to avoid bloating of the processPacket function which is already very long.

=> inserting a park point will split it but I agree about any changes which makes the function
easier to read and of course making it shorter enters into this category.

bool sanityCheck(const Pkt6Ptr& pkt);

processPacket: you have taken out the following call:

processDhcp4Query(query);

from the switch block where you call respective functions for other message types.

=> yes, processDhcp4Query is very special (and note I added a comment in its body
about its (non) ability to throw).

I think I understand that you did that to avoid calling initContext for this particular case, but:

  • I think you should add a short comment before this call to explain that

=> not only it saves a call to initContext but initContext simply does not make sense
for DHCPv4 queries or unknown message types. Please add a comment (and see below).

  • the call is now outside of the try-catch block, so if it generates an exception it will be caught and logged by the DHCP6_PACKET_PROCESS_STD_EXCEPTION rather than DHCP6_PACKET_PROCESS_FAIL which is probably more appropriate and also bumps up the statistics of dropped packets?

=> processDhcp4Query is not supposed to throw: this point should
be added in the comment.

The part of the code around leases6_committed callout should really have much better commentary. Specifically, the parts that set leases6 and deleted_leases6.

=> yes, I copied them from DHCPv4 where they are far simpler. I agree about
more comments.

Also, I am not very enthusiastic about the fact that there is now a distinction between global new/deleted leases and IA specific leases. Why can't we just reuse IA specific new and deleted leases within the ClientContext6? That would make the code surrounding leases6_commiytted hook point simpler.

=> unfortunately there is a clear() in alloc engine which made me not moving old_leases_ from IA's
to context. And release/decline code requires something in context because they don't know IA's.
I am not very satisfied to have both IA old_leases_ and deleted_leases_ so if you find an easy way
to merge them please proceed. About new_leases_ I copied the allocated_resources_ code in
alloc engine.

processSolicit, processRequest etc: It is redundant to pass both Pkt6Ptr and ClientContext6 to those functions, because ClientContext6 already contains a query_ object. It should rather be ClientContext6 and nothing else.

=> good catch! I warn you there are a lot of instances in tests but nothing an editor macro can't do.

src/lib/dhcpsrv/alloc_engine.h
I said this already elsewhere, but is it really necessary to introduce global new and deleted leases, as opposed to just reusing IA specific ones in the ClientContext6?

=> I already answered: I tried this for old_leases_ (there is no new_leases_ in IA's) but it was far too complex
to adapt dhcp6_srv.cc release and decline codes for this. So I gave up and added a global deleted_leases_.

Instead of introducing the committed_ flag? Could you simply use fake_allocation_ ?

=> this seems a nice idea because committed_ and fake_allocation_ are the opposite.
But I don't believe as they are used in very different contexts it is so easy, for instance
the default value of fake_allocation_ is false... To conclude a fine but potentially time
consuming idea.

comment:16 Changed 5 months ago by marcin

So, I managed to address all my comments. I think the major change that deserves some commentary here is that the declined lease is now sent in leases6 rather than deleted_leases6 to the leases6_committed callouts. The declined lease is in fact the lease with the appropriate state indicating that it is declined so it shouldn't be deleted.

comment:17 Changed 5 months ago by fdupont

It makes sense: I copied the release code (and example) to decline but as you said decline is more a special state. For the code the update is trivial, a bit less for the test but still easy.

Do you need something from my side?

comment:18 follow-up: Changed 5 months ago by fdupont

I looked at the last trac5458a commit:

  • good catch for the rapid commit but it should be added to the to do test list.
  • currentIA does the job for eliminating global deleted lease collection.

In conclusion I am very satisfied and it seems I have nothing to do now...

comment:19 in reply to: ↑ 18 ; follow-up: Changed 5 months ago by marcin

Replying to fdupont:

I looked at the last trac5458a commit:

  • good catch for the rapid commit but it should be added to the to do test list.
  • currentIA does the job for eliminating global deleted lease collection.

In conclusion I am very satisfied and it seems I have nothing to do now...

If you already have a todo list for testing, please add the rapid-commit case to it.

Nobody has done a formal review of my changes from Friday. By saying that you're satisfied, doesn't mean I can merge this code?

comment:20 follow-up: Changed 5 months ago by marcin

Making further modifications to this code, because the dhcp6_srv_configured callout does not provide network_state to the hooks libraries.

comment:21 in reply to: ↑ 19 Changed 5 months ago by fdupont

Replying to marcin:

If you already have a todo list for testing, please add the rapid-commit case to it.

=> I added the //// comments in hooks_unittest.cc for this.

Nobody has done a formal review of my changes from Friday. By saying that you're satisfied, doesn't mean I can merge this code?

=> unfortunately no because we both contributed to the code. So we need a third person to look at it (light review as we both write or read changes).

comment:22 in reply to: ↑ 20 Changed 5 months ago by fdupont

Replying to marcin:

Making further modifications to this code, because the dhcp6_srv_configured callout does not provide network_state to the hooks libraries.

=> we jabbered about this: network_state was introduced after 5457 so corresponding code was not in the 5458 base. Now it is only an order issue: it is clear the final result must include it.

comment:23 follow-up: Changed 5 months ago by fdupont

  • Owner changed from marcin to UnAssigned

Ask for a third party review.

comment:24 in reply to: ↑ 23 Changed 5 months ago by marcin

Replying to fdupont:

Ask for a third party review.

Francis, let's do not complicate things. The schedule is tight and other team members have stuff to do, including reviews of HA specific tickets. Third party review would involve the review of all changes made in this ticket, so re-review of your changes. Please review my changes and let me know if they seem ready.

comment:25 Changed 5 months ago by marcin

  • Resolution set to complete
  • Status changed from reviewing to closed

Per our agreement on Kea call on May 2nd it is now merged with commit 04d6fb0a0ac5b9dff2a02764cc9265f9a2a05ae8

Note: See TracTickets for help on using tickets.