Opened 4 months ago

Last modified 4 days ago

#5439 assigned enhancement

parallel tests

Reported by: fdupont Owned by: stephen
Priority: medium Milestone: Kea1.4
Component: Unclassified Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description (last modified by fdupont)

make distcheck does not work well when run in parallel (e.g. with -j 9) because the unit tests and system tests conflict. In bug RT 46648, parallel testing had to be disabled for a different reason; unfortunately there is no easy and portable way to do this.
In fact I now think we approached the issue from the wrong side: we should try to use the parallel test harness instead of forcing the sequential one, which is not recommended.
Note that this is more a topic for QA, so could someone from QA give an opinion?


Change History (6)

comment:1 Changed 3 months ago by fdupont

  • Description modified (diff)

comment:2 Changed 2 months ago by stephen

  • Owner set to stephen
  • Status changed from new to accepted

comment:3 Changed 8 weeks ago by fdupont

Note that bind9 and ISC DHCP were bound to an old version of ATF: it was a nightmare to switch to a recent version.
In conclusion, IMHO it is a bad idea to stay with a test harness which is not recommended and which clearly will no longer be supported at the next update (even if we have to wait some years for this update, the number of pending bugs is becoming high, so it will happen).

And if make distcheck were fast, we would use it more often and catch some bugs quicker...

comment:4 Changed 7 weeks ago by tomek

  • Milestone changed from Kea-proposed to Kea1.4
  • Status changed from accepted to assigned

Accepting in 1.4 as medium. Stephen volunteered to take care of this ticket. Thank you kindly.

comment:5 Changed 4 days ago by stephen

Started looking at this last month, but got side-tracked and forgot to update the ticket. From my notes:

1) I configured Kea and built it via "make -j 9 distcheck". As far as I could see, all the tests passed, whereas Francis reported that the unit and system tests conflict. The only worrying bit was during distcheck's installation of the tarball to the temporary directory, which generated messages like:

libtool: warning:
has not been installed in '/Users/Stephen/repo/kea/kea-1.3.0-git/_inst/lib'

However, the distcheck finished successfully, reporting no errors.

2) During discussion with Francis, he mentioned that he got the following output from the tests:

[ RUN      ] CtrlAgentControllerTest.noListenerChange
2018-02-08 22:08:38.179 INFO  [kea.ctrl-agent/67805] CTRL_AGENT_STARTED
Kea Control Agent version 1.3.0-git started
2018-02-08 22:08:38.720 INFO  [kea.dctl/67805] DCTL_SHUTDOWN
Control-agent has shut down, pid: 67805, version: 1.3.0-git
Failure
      Expected: exp_socket_name
      Which is: "/second/dhcp4/socket"
To be equal to: sock_info->get("socket-name")->stringValue()
      Which is: "/first/dhcp4/socket"
Failure
      Expected: exp_socket_name
      Which is: "/second/dhcp6/socket"
To be equal to: sock_info->get("socket-name")->stringValue()
      Which is: "/first/dhcp6/socket"
[  FAILED  ] CtrlAgentControllerTest.noListenerChange (751 ms)

When I ran it, I got:

[ RUN      ] CtrlAgentControllerTest.noListenerChange
2018-02-09 16:12:01.730 INFO  [kea.ctrl-agent/14654] CTRL_AGENT_STARTED
Kea Control Agent version 1.3.0-git started
2018-02-09 16:12:01.926 INFO  [kea.dctl/14654]
DCTL_CFG_FILE_RELOAD_SIGNAL_RECVD OS signal 1 received, reloading
configuration from file: d2-test-config.json
2018-02-09 16:12:01.926 INFO  [kea.ctrl-agent/14654]
2018-02-09 16:12:01.926 INFO  [kea.dctl/14654] DCTL_CONFIG_COMPLETE
server has completed configuration: listening on, port 8081,
control sockets: dhcp4 dhcp6, 0 lib(s):
2018-02-09 16:12:02.224 INFO  [kea.dctl/14654] DCTL_SHUTDOWN
Control-agent has shut down, pid: 14654, version: 1.3.0-git
[       OK ] CtrlAgentControllerTest.noListenerChange (502 ms)

In other words, Francis seemed to be seeing an immediate shutdown of the control agent after it starts up, whereas the successful test outputs a couple of additional messages.
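
The difference between the two outputs can also be listed mechanically. The following is only a sketch: the regular expression is an assumed approximation of what a Kea message identifier looks like (upper-case words joined by underscores), the helper name message_ids is made up for illustration, and the log excerpts are cut down from the outputs quoted above.

```python
import re

# Assumed shape of a message identifier, e.g. DCTL_SHUTDOWN:
# upper-case words joined by underscores.
MESSAGE_ID = re.compile(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b")


def message_ids(log_text):
    """Return the message identifiers in the order they appear in the log."""
    return MESSAGE_ID.findall(log_text)


# Excerpts from the two runs quoted in this comment (wrapped lines kept as posted).
FAILING_RUN = """\
2018-02-08 22:08:38.179 INFO  [kea.ctrl-agent/67805] CTRL_AGENT_STARTED
2018-02-08 22:08:38.720 INFO  [kea.dctl/67805] DCTL_SHUTDOWN
"""

PASSING_RUN = """\
2018-02-09 16:12:01.730 INFO  [kea.ctrl-agent/14654] CTRL_AGENT_STARTED
2018-02-09 16:12:01.926 INFO  [kea.dctl/14654]
DCTL_CFG_FILE_RELOAD_SIGNAL_RECVD OS signal 1 received, reloading
2018-02-09 16:12:01.926 INFO  [kea.dctl/14654] DCTL_CONFIG_COMPLETE
2018-02-09 16:12:02.224 INFO  [kea.dctl/14654] DCTL_SHUTDOWN
"""

failing = message_ids(FAILING_RUN)
# Identifiers logged by the passing run that never show up in the failing run.
missing = [m for m in message_ids(PASSING_RUN) if m not in failing]
print(missing)
# ['DCTL_CFG_FILE_RELOAD_SIGNAL_RECVD', 'DCTL_CONFIG_COMPLETE']
```

Both missing messages belong to the reload path, which is consistent with the failing run never handling the signal.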

The interesting point is the timing. In the successful case, there is 494 ms between the CTRL_AGENT_STARTED and the DCTL_SHUTDOWN messages. My reading of the code suggests that this is approximately the time that the dummy server should run for (500 ms). In that interval, the reload message is printed 196 ms after the start message (the code schedules a SIGHUP 200 ms after the start of the run) and the message about binding to port 8081 is printed at the same time.

In Francis's case, the CTRL_AGENT_STARTED and the DCTL_SHUTDOWN messages are 541 ms apart, but in that interval apparently no SIGHUP is received. Somewhere, the SIGHUP has got lost.
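
As a rough check of the arithmetic, the intervals can be recomputed from the timestamps in the two outputs. This is a minimal sketch, not part of Kea: the helper name elapsed_ms and the variable names are made up for illustration, and the timestamps are copied from the log lines quoted above.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the quoted log lines.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_ms(start, end):
    """Whole milliseconds between two log timestamps."""
    delta = datetime.strptime(end, TIMESTAMP_FORMAT) - datetime.strptime(
        start, TIMESTAMP_FORMAT
    )
    return delta // timedelta(milliseconds=1)


# Successful run (pid 14654).
passing_started = "2018-02-09 16:12:01.730"
passing_reload = "2018-02-09 16:12:01.926"
passing_shutdown = "2018-02-09 16:12:02.224"

# Failing run (pid 67805).
failing_started = "2018-02-08 22:08:38.179"
failing_shutdown = "2018-02-08 22:08:38.720"

print(elapsed_ms(passing_started, passing_reload))    # 196
print(elapsed_ms(passing_started, passing_shutdown))  # 494
print(elapsed_ms(failing_started, failing_shutdown))  # 541
```

Both runs last roughly the intended 500 ms, so the failing run did not exit early; it ran for the full period without logging the reload.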

This could be a race condition in the code, but it also could be due to the heavy load.

comment:6 Changed 4 days ago by fdupont

A possible source of problems (and not only for parallel tests) is the -DBOOST_ASIO_DISABLE_THREADS=1 left in
