Opened 11 months ago

Closed 9 months ago

#5107 closed task (complete)

Create Control Agent HttpResponseCreator

Reported by: marcin Owned by: marcin
Priority: medium Milestone: Kea1.2
Component: remote-management Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: Mozilla Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 1
Total Hours: 10 Internal?: no

Description

Control Agent needs its own derivation of the BaseCommandMgr to handle some of the (HTTP) commands and route some commands to the Kea services. See the http://kea.isc.org/wiki/ControlAPIDesign for the details.

Subtickets

Change History (11)

comment:1 Changed 10 months ago by marcin

  • Owner set to marcin
  • Status changed from new to accepted

comment:2 Changed 9 months ago by marcin

  • Owner changed from marcin to UnAssigned
  • Status changed from accepted to reviewing

This ticket is now ready for a review:

12XX.   [func]         marcin
	Control Agent uses libkea-http to process commands over
	the REST interface.
        (Trac #5107, git abcd)

comment:3 Changed 9 months ago by tomek

  • Sub-Project changed from DHCP to Mozilla

comment:4 Changed 9 months ago by tomek

  • Owner changed from UnAssigned to tomek

comment:5 follow-up: Changed 9 months ago by tomek

  • Add Hours to Ticket changed from 0 to 7
  • Total Hours changed from 0 to 7

When I run the tests on Ubuntu 16.04.1 x64, the following unit-tests
failed:

[ FAILED ] 4 tests, listed below:
[ FAILED ] CtrlAgentControllerTest?.launchNormalShutdown
[ FAILED ] CtrlAgentControllerTest?.sigintShutdown
[ FAILED ] CtrlAgentControllerTest?.sigtermShutdown
[ FAILED ] CtrlAgentProcessTest?.shutdown

Exact errors are below:

[ RUN      ] CtrlAgentControllerTest.launchNormalShutdown
2017-02-24 12:59:50.555 INFO  [kea.ctrl-agent/22075] CTRL_AGENT_STARTED Kea Control Agent version 1.1.0-git started
2017-02-24 12:59:50.556 FATAL [kea.ctrl-agent/22075] CTRL_AGENT_FAILED application experienced a fatal error: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
2017-02-24 12:59:50.556 FATAL [kea.dctl/22075] DCTL_PROCESS_FAILED Control-agent application execution failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
unknown file: Failure
C++ exception with description "Application process event loop failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use" thrown in the test body.
[  FAILED  ] CtrlAgentControllerTest.launchNormalShutdown (1 ms)
[ RUN      ] CtrlAgentControllerTest.sigintShutdown
2017-02-24 12:59:50.556 INFO  [kea.ctrl-agent/22075] CTRL_AGENT_STARTED Kea Control Agent version 1.1.0-git started
2017-02-24 12:59:50.556 FATAL [kea.ctrl-agent/22075] CTRL_AGENT_FAILED application experienced a fatal error: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
2017-02-24 12:59:50.556 FATAL [kea.dctl/22075] DCTL_PROCESS_FAILED Control-agent application execution failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
unknown file: Failure
C++ exception with description "Application process event loop failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use" thrown in the test body.
[  FAILED  ] CtrlAgentControllerTest.sigintShutdown (0 ms)
[ RUN      ] CtrlAgentControllerTest.sigtermShutdown
2017-02-24 12:59:50.556 INFO  [kea.ctrl-agent/22075] CTRL_AGENT_STARTED Kea Control Agent version 1.1.0-git started
2017-02-24 12:59:50.556 FATAL [kea.ctrl-agent/22075] CTRL_AGENT_FAILED application experienced a fatal error: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
2017-02-24 12:59:50.557 FATAL [kea.dctl/22075] DCTL_PROCESS_FAILED Control-agent application execution failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
unknown file: Failure
C++ exception with description "Application process event loop failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use" thrown in the test body.
[  FAILED  ] CtrlAgentControllerTest.sigtermShutdown (1 ms)
[----------] 6 tests from CtrlAgentControllerTest (3 ms total)

What I discovered is that the unit-tests fail when being run in parallel (make check -j9) and pass when run without it (make check). I vaguely recall that Francis or Wlodek did something with the unit-tests not being run in parallel, regardless of the switch.


connection.h
getRemoteEndpointAddressAsText() was not documented. Added. Please pull and review.

response_unittests.cc
Updated copyright year.

response_creator_unittests.cc
Are you sure that Content-Length: 0 is valid? I'm no HTTP expert by any means, but I looked at RFC2616, section 9.2 and that seem to be the only case that allows Content-Length to be zero (when generating empty response to OPTIONS method).

ctrl_agent_process.cc
The code added in CtrlAgentProcess::run() could use some comments. I have tried to provide such comments. Please pull and tweak as needed. My understanding of the control agent is very rudimentary.

ctrl_agent_controller.cc
Copyright year updated. Also, the agent_app_name_ change is identical with the one done on #5134.

ctrl_agent_command_mgr.cc
I have mixed feelings regarding the hierarchy of classes. This functionality is very useful, no question about it. But I think HookedCommandMgr? and BaseCommandMgr? could be collapsed into one class, so DHCP and soon D2 could take advantage of that. We want the ability to add extra commands with hook libraries everywhere, right? This is just a suggestion to consider. If you think it's worthy to follow, then it's probably out of scope for this ticket anyway...

CtrlAgentCommandMgr::handleCommand is implemented incorrectly in my opinion. I think it should not wrap the response in a list. I can speculate that you want to have a command that would retrieve
something from multiple servers, but that's not the right way to do it. All commands we have so far return a single response. If you want a command to have multiple responses, there should be a single
response with overall status and specific responses stuffed into params (note that params may be a list).

I wrote 2 unit-tests that check if this behavior is followed. They're currently failing, because that extra list. It's up to whether you want to update the handleCommand or the unit-tests, but I strongly recommend to fix the handleCommand code.

As a side note, to be able to test this behavior I made the handleCommand method public. Hope that's ok.

Makefile.am
agent_lexer.ll should not be part of the libagent_la_SOURCES, it should be in EXTRA_DIST. Corrected as needed. (This is still broken in Dhcp4 and Dhcp6, so don't use them as examples).

Files in src/bin/agent/tests
Those file names are definitely too long. It's clear that those are related to control agent, because they're in src/bin/agent directory. Can you rename them to something shorter? Removing ctrl_agent_ prefix seems like a good way to achieve that.

ctrl_agent_response_creator_unittest.cc
Fixed couple typos and two compilation warnings. Parameters command_name and command_arguments passed in lines 107,108 are not used, which cause the unused parameter warning. Fixed this. Please pull and review (there may be a doxygen warning, because I left the doxygen description of those parameters intact).


Your proposed ChangeLog? entry is correct, but what's the proper way to refer to the interface: REST interface or RESTful interface?

comment:6 follow-up: Changed 9 months ago by fdupont

We force in configure unit tests to be run sequentially. BTW this does not work with submodules (e.g. legal logging).

comment:7 in reply to: ↑ 6 Changed 9 months ago by tomek

Replying to fdupont:

We force in configure unit tests to be run sequentially. BTW this does not work with submodules (e.g. legal logging).

Agent is not a submodule, it's src/bin/agent. Nevertheless the enforcement does not work.

comment:8 Changed 9 months ago by tomek

  • Owner changed from tomek to marcin

comment:9 in reply to: ↑ 5 ; follow-up: Changed 9 months ago by marcin

  • Add Hours to Ticket changed from 7 to 2
  • Owner changed from marcin to tomek
  • Total Hours changed from 7 to 9

Replying to tomek:

When I run the tests on Ubuntu 16.04.1 x64, the following unit-tests
failed:

[ FAILED ] 4 tests, listed below:
[ FAILED ] CtrlAgentControllerTest?.launchNormalShutdown
[ FAILED ] CtrlAgentControllerTest?.sigintShutdown
[ FAILED ] CtrlAgentControllerTest?.sigtermShutdown
[ FAILED ] CtrlAgentProcessTest?.shutdown

Exact errors are below:

[ RUN      ] CtrlAgentControllerTest.launchNormalShutdown
2017-02-24 12:59:50.555 INFO  [kea.ctrl-agent/22075] CTRL_AGENT_STARTED Kea Control Agent version 1.1.0-git started
2017-02-24 12:59:50.556 FATAL [kea.ctrl-agent/22075] CTRL_AGENT_FAILED application experienced a fatal error: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
2017-02-24 12:59:50.556 FATAL [kea.dctl/22075] DCTL_PROCESS_FAILED Control-agent application execution failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
unknown file: Failure
C++ exception with description "Application process event loop failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use" thrown in the test body.
[  FAILED  ] CtrlAgentControllerTest.launchNormalShutdown (1 ms)
[ RUN      ] CtrlAgentControllerTest.sigintShutdown
2017-02-24 12:59:50.556 INFO  [kea.ctrl-agent/22075] CTRL_AGENT_STARTED Kea Control Agent version 1.1.0-git started
2017-02-24 12:59:50.556 FATAL [kea.ctrl-agent/22075] CTRL_AGENT_FAILED application experienced a fatal error: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
2017-02-24 12:59:50.556 FATAL [kea.dctl/22075] DCTL_PROCESS_FAILED Control-agent application execution failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
unknown file: Failure
C++ exception with description "Application process event loop failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use" thrown in the test body.
[  FAILED  ] CtrlAgentControllerTest.sigintShutdown (0 ms)
[ RUN      ] CtrlAgentControllerTest.sigtermShutdown
2017-02-24 12:59:50.556 INFO  [kea.ctrl-agent/22075] CTRL_AGENT_STARTED Kea Control Agent version 1.1.0-git started
2017-02-24 12:59:50.556 FATAL [kea.ctrl-agent/22075] CTRL_AGENT_FAILED application experienced a fatal error: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
2017-02-24 12:59:50.557 FATAL [kea.dctl/22075] DCTL_PROCESS_FAILED Control-agent application execution failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use
unknown file: Failure
C++ exception with description "Application process event loop failed: Process run method failed: unable to setup TCP acceptor for listening to the incoming HTTP requests: bind: Address already in use" thrown in the test body.
[  FAILED  ] CtrlAgentControllerTest.sigtermShutdown (1 ms)
[----------] 6 tests from CtrlAgentControllerTest (3 ms total)

What I discovered is that the unit-tests fail when being run in parallel (make check -j9) and pass when run without it (make check). I vaguely recall that Francis or Wlodek did something with the unit-tests not being run in parallel, regardless of the switch.

If I understand correctly, this error is not specific to the agent code and it has nothing to do with this ticket. I personally never use "-j" when running unit tests so this is probably why I never hit the problem. If we think there is a regression in the code causing the enforcement to not take effect, it probably warrants another (low priority) ticket?


connection.h
getRemoteEndpointAddressAsText() was not documented. Added. Please pull and review.

Thanks, your changes look fine.

response_unittests.cc
Updated copyright year.

Thanks for that.

response_creator_unittests.cc
Are you sure that Content-Length: 0 is valid? I'm no HTTP expert by any means, but I looked at RFC2616, section 9.2 and that seem to be the only case that allows Content-Length to be zero (when generating empty response to OPTIONS method).

In this particular case we set it to 0 to pass sanity checks on the request message that the server performs. The server requires Content-Length for the JSON messages, regardless of the value. As to whether the value of 0 is ok or not, I think the server should be liberal and accept it. In many cases the request doesn't have any content, e.g. when issuing a GET command which carries parameters in the URI. I don't think that any of the HTTP servers would make an issue with receiving GET with Content-Length of 0. It would simply discard such header. In the case of these tests I could also set it to 100, or 10101. It doesn't matter because the Content-Length is really only used to determine how to extract the body from the HTTP message. The test doesn't peform such parsing. It merely sets the parsed values in the context and attempts to create a (parsed) Request object from it. This would however fail if we didn't put the Content-Length (with any value).

ctrl_agent_process.cc
The code added in CtrlAgentProcess::run() could use some comments. I have tried to provide such comments. Please pull and tweak as needed. My understanding of the control agent is very rudimentary.

Thanks. The comments look good to me.

ctrl_agent_controller.cc
Copyright year updated. Also, the agent_app_name_ change is identical with the one done on #5134.

Thanks.

ctrl_agent_command_mgr.cc
I have mixed feelings regarding the hierarchy of classes. This functionality is very useful, no question about it. But I think HookedCommandMgr? and BaseCommandMgr? could be collapsed into one class, so DHCP and soon D2 could take advantage of that. We want the ability to add extra commands with hook libraries everywhere, right? This is just a suggestion to consider. If you think it's worthy to follow, then it's probably out of scope for this ticket anyway...

There are three different cases when you want to use Command Manager:

  • In the code that needs hooks but doesn't need unix socket for receiving commands: Control Agent
  • In the code that needs hooks and needs unix sockets for receiving commands: DHCP servers, D2
  • In hooks libraries which provide their own implementation for some specific commands.

In the last case you run an instance of command manager within a hooks library, thus you don't need/want functionality provided by the HookedCommandMgr. Hooks libraries should derive from BaseCommandMgr, as they also don't need unix sockets.

The middle case needs hooks and unix sockets so it should use the CommandMgr which derives from HookedCommandMgr. Finally, the first case needs hooks but no sockets, so it derives from HookedCommandMgr. As you see, all classes are derived from in specific cases.

CtrlAgentCommandMgr::handleCommand is implemented incorrectly in my opinion. I think it should not wrap the response in a list. I can speculate that you want to have a command that would retrieve
something from multiple servers, but that's not the right way to do it. All commands we have so far return a single response. If you want a command to have multiple responses, there should be a single
response with overall status and specific responses stuffed into params (note that params may be a list).

The handleCommand is implemented according to a design. So, if "incorrectly", it is an incorrect design rather than implementation. We started discussion about this last Friday and I made a point during this discussion that CA is talking to many servers. Currently, we're not prepared to run multiple DHCP servers of the same type simultaneously. But, in the future we should be able to use CA to control multiple DHCPv4 servers on the same system. Each of these servers responds to CA in the format defined for a single server. What should CA do with multiple responses containing: "result" and "error" etc? You're suggesting combining answers within a single answer, but that has certain implications:

  • there is only one "result" field which has to represent multiple results and it is unclear what the "result" should be set to if some commands passed and some failed.
  • you want to be able to identify the server(s) which is/are returning error status and may want to perform some specific actions, so you need a pid of the failing server returned somewhere.
  • By stuffing responses within other responses we actually complicate the data structures.

The CA was meant to mostly provide commands/responses forwarding function and the interpretation of the responses was meant to belong to the controlling client. When we start combining the data we slow the CA and we introduce ambiguity in what the global "result" means. I think it is worth to bring an example of the IA status code processing in DHCPv6, where there is a global status code and the status codes for specific IAs. I recall we have modified the RFC3315 to say that the client should explicitly look into status results for particular IAs and the meaning of the global status code was sort of deprecated. I think we should follow the same direction here.

Personally, I don't see any issue with the response being a list. Even more, I think it is the cleanest, least intrusive solution, most efficient, and conveying the exact information about the configuration result.

I wrote 2 unit-tests that check if this behavior is followed. They're currently failing, because that extra list. It's up to whether you want to update the handleCommand or the unit-tests, but I strongly recommend to fix the handleCommand code.

I fixed the tests, according to what I said above.

As a side note, to be able to test this behavior I made the handleCommand method public. Hope that's ok.

It would be probably better to make it protected but I can live with that too.

Makefile.am
agent_lexer.ll should not be part of the libagent_la_SOURCES, it should be in EXTRA_DIST. Corrected as needed. (This is still broken in Dhcp4 and Dhcp6, so don't use them as examples).

Thanks.

Files in src/bin/agent/tests
Those file names are definitely too long. It's clear that those are related to control agent, because they're in src/bin/agent directory. Can you rename them to something shorter? Removing ctrl_agent_ prefix seems like a good way to achieve that.

I don't disagree. They are indeed long. I stated in the review of #5034 that I am not terribly pleased with mass renaming files while there are other tickets touching the same Makefiles. This usually result in annoying conflicts and makes it easy to miss some files which, in case of unit tests, may not be discovered because unit tests do compile without other unit tests. I'd prefer do the refactoring as a separate ticket when other work is done.

I can think of 3 choices:

  • remove ctrl_agent_ prefix (least preferred)
  • Leave agent_ prefix (less preferred)
  • Add ca_ prefix instead of ctrl_agent_ (most preferred)

My preferences are provided in parens.

ctrl_agent_response_creator_unittest.cc
Fixed couple typos and two compilation warnings. Parameters command_name and command_arguments passed in lines 107,108 are not used, which cause the unused parameter warning. Fixed this. Please pull and review (there may be a doxygen warning, because I left the doxygen description of those parameters intact).

Warnings aren't generated for unit tests I believe.


Your proposed ChangeLog? entry is correct, but what's the proper way to refer to the interface: REST interface or RESTful interface?

People in the web point out that REST usually refers to a concept, whereas RESTful is an implementation. So in our case it should probably be RESTful.

comment:10 in reply to: ↑ 9 Changed 9 months ago by tomek

  • Add Hours to Ticket changed from 2 to 1
  • Owner changed from tomek to marcin
  • Total Hours changed from 9 to 10

Thanks for the answers. Left only those that I'd like to comment on.

Replying to marcin:

Replying to tomek:

What I discovered is that the unit-tests fail when being run in parallel (make check -j9) and pass when run without it (make check). I vaguely recall that Francis or Wlodek did something with the unit-tests not being run in parallel, regardless of the switch.

If I understand correctly, this error is not specific to the agent code and it has nothing to do with this ticket. I personally never use "-j" when running unit tests so this is probably why I never hit the problem. If we think there is a regression in the code causing the enforcement to not take effect, it probably warrants another (low priority) ticket?

Agent unit-tests are the only ones that fail. I suppose we can go with running the unit-tests without -j.

CtrlAgentCommandMgr::handleCommand is implemented incorrectly in my opinion. I think it should not wrap the response in a list. I can speculate that you want to have a command that would retrieve
something from multiple servers, but that's not the right way to do it. All commands we have so far return a single response. If you want a command to have multiple responses, there should be a single
response with overall status and specific responses stuffed into params (note that params may be a list).

The handleCommand is implemented according to a design. So, if "incorrectly", it is an incorrect design rather than implementation. We started discussion about this last Friday and I made a point

during this discussion that CA is talking to many servers. Currently, we're not prepared to run multiple DHCP servers of the same type simultaneously. But, in the future we should be able to use CA to control multiple DHCPv4 servers on the same system. Each of these servers responds to CA in the format defined for a single server. What should CA do with multiple responses containing: "result" and "error" etc? You're suggesting combining answers within a single answer, but that has certain implications:

  • there is only one "result" field which has to represent multiple results and it is unclear what the "result" should be set to if some commands passed and some failed.
  • you want to be able to identify the server(s) which is/are returning error status and may want to perform some specific actions, so you need a pid of the failing server returned somewhere.
  • By stuffing responses within other responses we actually complicate the data structures.

The CA was meant to mostly provide commands/responses forwarding function and the interpretation of the responses was meant to belong to the controlling client. When we start combining the data we slow the CA and we introduce ambiguity in what the global "result" means. I think it is worth to bring an example of the IA status code processing in DHCPv6, where there is a global status code and the status codes for specific IAs. I recall we have modified the RFC3315 to say that the client should explicitly look into status results for particular IAs and the meaning of the global status code was sort of deprecated. I think we should follow the same direction here.

Personally, I don't see any issue with the response being a list. Even more, I think it is the cleanest, least intrusive solution, most efficient, and conveying the exact information about the configuration result.

Thanks for that explanation. I still think there's some value in using the same format in Control API (as documented here: https://jenkins.isc.org/job/Kea_doc/guide/kea-guide.html#ctrl-channel) and in the Control Agent. Since this is a matter of preference, I don't want to hold this ticket. Since all other issues have been addressed, please go ahead and merge it. At the same time, I'll start a discussion on kea-dev explaining two possible choices here. If the outcome of that decision is to change the syntax, we will update it in a follow up ticket.

Files in src/bin/agent/tests
Those file names are definitely too long. It's clear that those are related to control agent, because they're in src/bin/agent directory. Can you rename them to something shorter? Removing ctrl_agent_ prefix seems like a good way to achieve that.

I don't disagree. They are indeed long. I stated in the review of #5034 that I am not terribly pleased with mass renaming files while there are other tickets touching the same Makefiles. This usually result in annoying conflicts and makes it easy to miss some files which, in case of unit tests, may not be discovered because unit tests do compile without other unit tests. I'd prefer do the refactoring as a separate ticket when other work is done.

I can think of 3 choices:

  • remove ctrl_agent_ prefix (least preferred)
  • Leave agent_ prefix (less preferred)
  • Add ca_ prefix instead of ctrl_agent_ (most preferred)

That's a good plan. D2 uses d2_ prefix, so ca_ will play along well.


Ok, it seems the only thing left to do is to rename the files to use ctrl_agent_ and this code will be ready for merge.

comment:11 Changed 9 months ago by marcin

  • Resolution set to complete
  • Status changed from reviewing to closed

Merged with commit 88ce715926a46b6b3832630116fc7782adc46c7b

Note: See TracTickets for help on using tickets.