Opened 8 months ago

Closed 7 months ago

#5217 closed defect (fixed)

HttpListenerTest fail on various systems

Reported by: wlodekwencel Owned by: marcin
Priority: medium Milestone: Kea1.2-final
Component: Unclassified Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 3
Total Hours: 3 Internal?: no

Description

Jenkins report:
https://jenkins.isc.org/job/Kea-multiconfig-build/138/testReport/

Tests like:

OS: Debian 7 amd64
CONF: ./configure --with-gtest-source=/opt/googletest --with-log4cplus=/opt/log4cplus --disable-silent-rules --with-werror=no --with-boost-include=/opt/boost/include --with-boost-libs-dir=/opt/boost
BOOST: 1.63
Test report: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost,WERROR=--with-werror=no,slaves=debian7-amd64-1/testReport/
Console output: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost,WERROR=--with-werror=no,slaves=debian7-amd64-1/console

OS: debian 7 amd64
CONF: ./configure --with-gtest-source=/opt/googletest --with-log4cplus=/opt/log4cplus --disable-silent-rules --with-werror=no --with-boost-include=/opt/boost158/include --with-boost-libs-dir=/opt/boost158
BOOST: 1.58
Test report: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost158,WERROR=--with-werror=no,slaves=debian7-amd64-1/testReport/
Console output: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost158,WERROR=--with-werror=no,slaves=debian7-amd64-1/console

OS: ubuntu 12.04.5 i686
CONF: ./configure --with-gtest-source=/opt/googletest --with-log4cplus=/opt/log4cplus --disable-silent-rules --with-werror=no --with-boost-include=/opt/boost/include --with-boost-libs-dir=/opt/boost158
BOOST: 1.63
Test report: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost,WERROR=--with-werror=no,slaves=ubuntu12-i686-1/testReport/
Console output: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost,WERROR=--with-werror=no,slaves=ubuntu12-i686-1/consoleFull

OS: Fedora 25 i686
CONF: ./configure --with-gtest-source=/opt/googletest --with-log4cplus=/opt/log4cplus --disable-silent-rules --with-werror=no --with-boost-include=/opt/boost/include --with-boost-libs-dir=/opt/boost
BOOST: 1.63
Test report: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost,WERROR=--with-werror=no,slaves=fedora25-i686-1/testReport/
Console output: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost,WERROR=--with-werror=no,slaves=fedora25-i686-1/console

OS: Fedora 25 i686
CONF: /configure --with-gtest-source=/opt/googletest --with-log4cplus=/opt/log4cplus --disable-silent-rules --with-werror=no --with-boost-include=/opt/boost158/include --with-boost-libs-dir=/opt/boost158
BOOST: 1.58
Test report: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost158,WERROR=--with-werror=no,slaves=fedora25-i686-1/testReport/
Console output: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost158,WERROR=--with-werror=no,slaves=fedora25-i686-1/console

OS: Fedora 25 x86_64
CONF: ./configure --with-gtest-source=/opt/googletest --with-log4cplus=/opt/log4cplus --disable-silent-rules --with-werror=no --with-boost-include=/opt/boost/include --with-boost-libs-dir=/opt/boost
BOOST: 1.63
Test report: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost,WERROR=--with-werror=no,slaves=fedora25-x86-1/testReport/
Console output: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost,WERROR=--with-werror=no,slaves=fedora25-x86-1/console

OS: Fedora 25 x86_64
CONF: ./configure --with-gtest-source=/opt/googletest --with-log4cplus=/opt/log4cplus --disable-silent-rules --with-werror=no --with-boost-include=/opt/boost158/include --with-boost-libs-dir=/opt/boost158
BOOST: 1.58
Test report: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost158,WERROR=--with-werror=no,slaves=fedora25-x86-1/testReport/
Console output: https://jenkins.isc.org/job/Kea-multiconfig-build/138/BOOST=boost158,WERROR=--with-werror=no,slaves=fedora25-x86-1/console

Subtickets

Change History (9)

comment:1 Changed 8 months ago by marcin

  • Owner set to marcin
  • Status changed from new to accepted

comment:2 Changed 8 months ago by marcin

  • Owner changed from marcin to UnAssigned
  • Status changed from accepted to reviewing

I created a fix for the HttpListener tests but also for the TCPAcceptor tests and TCPSocket sequence test. Note that the last one had been failing on Debian for quite a long time. The fix is exactly the same in all cases.

When the client connects to a remote endpoint over TCP one of the errors it may get is EINPROGRESS, which has the following documented meaning:

[EINPROGRESS] The socket is non-blocking and the connection cannot be
completed immediately. It is possible to select(2) for completion by selecting
the socket for writing.

The way I understand it is that the connection is not established but one can try to use the socket and it will either succeed or may fail, depending on the reason why this error has been returned. In general, the connection can't be established immediately. There are documented ways of dealing with this situation using synchronous connect, reads and writes. In the case of async_connect it is somewhat unclear to me why such error would ever be returned, rather than execution of the handler be delayed until the connection is finally established. But, this is what is happening on the Debian. The handler is invoked and indicates this error. Because it doesn't have to indicate a fatal error in I think it makes sense to simply continue with async_read/async_write using this socket. If this turns out to be a fatal error, both async_read and write will fail anyway, triggering a test failure.

So far, all my experiments on Debian 7 demonstrate that we better ignore EINPROGRESS error and continue with the tests. All fixed tests now consistently pass on Debian.

Proposed ChangeLog entry:

12XX.	[bug]		marcin
	Corrected failing HttpListener, TCPAcceptor and TCPSocket
	unit tests on Debian.
	(Trac #5217, git abcd)

comment:3 Changed 8 months ago by wlodekwencel

tested on jenkins, fix works for Debian systems

comment:4 Changed 7 months ago by tomek

  • Milestone changed from Kea1.2-final to Kea1.3

As discussed on 2017-04-13 Kea call, moving all low review tickets that have not started review yet to Kea1.3.

comment:5 Changed 7 months ago by marcin

  • Milestone changed from Kea1.3 to Kea1.2-final

Moving it back to 1.2-final on the grounds it is not the low ticket as previous comment says. Also, it applies critical fixes to the new code. Without those fixes we'd see crashes and other unwanted behaviors. Finally, this work has started prior to the call when we discussed this.

comment:6 Changed 7 months ago by marcin

This ticket now includes fixes for both #5217 and #5218, as it turned out to be hard to decouple all these fixes. Some of the issues seen on Ubuntu 16 turned out to be reproducible on Fedora etc. So, I combined the fixes and the review of this ticket will also cover the review of #5218.

There is a single branch on which the code should be reviewed. This is: trac5217a!

Note that we have another ticket #5189 which will further improve the situation because rather than problematic synchronous writes and receive we'd use asynchronous calls.

With the fixes applied here we should NOT see:

  • test failures on Debian,
  • test failures in HTTP library on Fedora and crashes in agent tests on Fedora,
  • crashes on Ubuntu 16 in agent tests

The ChangeLog entry should also be different:

12XX.	[bug]		marcin
	Corrected numerous failing and crashing tests for libkea-asiolink, libkea-http
	and kea-ctrl-agent on Debian, Fedora and Ubuntu.
	(Trac #5217, git abcd)

comment:7 Changed 7 months ago by tomek

  • Owner changed from UnAssigned to tomek

comment:8 Changed 7 months ago by tomek

  • Add Hours to Ticket changed from 0 to 3
  • Owner changed from tomek to marcin
  • Total Hours changed from 0 to 3

Code reviewed. Your changes are good.

Nice trick with enable_shared_from_this.

I admit I don't understand why some systems return inprogress error when connecting. But there's probably nothing we can do about it. At least not in the short time frame we have before 1.2.
But it bothers me. Let's take UnixDomainSocket::connect() as an example. It ignores ec, if its
value is inprogress. Shouldn't the code wait briefly and check again that the connection actually succeeded? If you think it's a good idea, please add @todo in the code.

The code compiles and unit-tests pass on Ubuntu 16.04 x64.

The changelog entry needs some improvement as the changes were not only in unit-tests. How about this:

12XX.	[bug]		marcin
	Improved socket connection handling code, corrected numerous failing and crashing 
        tests for libkea-asiolink, libkea-http and kea-ctrl-agent on Debian, Fedora and Ubuntu.
	(Trac #5217, git abcd)

I corrected one minor typo. Please pull.

Last edited 7 months ago by tomek (previous) (diff)

comment:9 Changed 7 months ago by marcin

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged with commit 4bcb45f0c88aba3d0f70ca48d9fff6f1d4616bc2

Note: See TracTickets for help on using tickets.