Opened 6 years ago

Closed 4 years ago

#1537 closed defect (wontfix)

Handle error receiving file descriptors

Reported by: shane
Owned by:
Priority: medium
Milestone: Remaining BIND10 tickets
Component: Unclassified
Version: bind10-old
Keywords:
Cc:
CVSS Scoring:
Parent Tickets:
Sensitive: no
Defect Severity: Medium
Sub-Project: Core
Feature Depending on Ticket:
Estimated Difficulty: 0
Add Hours to Ticket: 0
Total Hours: 0
Internal?: no

Description

I was seeing this in my log file:

2012-01-02 16:02:53.168 ERROR [b10-xfrout.xfrout] XFROUT_RECEIVE_FILE_DESCRIPTOR_ERROR error receiving the file descriptor for an XFR connection
2012-01-02 16:02:53.168 ERROR [b10-xfrout.xfrout] XFROUT_RECEIVE_FILE_DESCRIPTOR_ERROR error receiving the file descriptor for an XFR connection
2012-01-02 16:02:53.168 ERROR [b10-xfrout.xfrout] XFROUT_RECEIVE_FILE_DESCRIPTOR_ERROR error receiving the file descriptor for an XFR connection

Also, b10-xfrout then uses 100% of CPU once this condition occurs.

I discovered that this ultimately comes from fd_share.cc:

    const int cc = recvmsg(sock, &msghdr, 0);
    if (cc <= 0) {
        free(msghdr.msg_control);
        if (cc == 0) {
            errno = ECONNRESET;
        }
        return (FD_SYSTEM_ERROR);
    }

Looking via strace I find:

select(13, [9 12], [], [], NULL)        = 1 (in [12])
recvmsg(12, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0

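My guess is that the process on the other side of the Unix domain socket has closed the connection (perhaps because it died), and that xfrout then gets stuck in a loop: select() keeps reporting the socket as readable, recvmsg() keeps returning 0, and nothing ever stops watching the descriptor.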
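The suspected behavior can be reproduced outside BIND 10. The following is a minimal sketch, not code from the source tree: the function name countEofWakeups is hypothetical, and a socketpair() whose second end is closed stands in for the process that died. It counts how many consecutive times select() reports the socket readable while recvmsg() returns 0.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cstring>

// Hypothetical helper (not part of BIND 10): returns how many of `attempts`
// select()/recvmsg() rounds see "readable, then 0 bytes" after the peer has
// closed its end of a Unix domain socket. Returns -1 if setup fails.
int countEofWakeups(int attempts) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }
    close(sv[1]);  // simulate the peer process going away

    int eof_wakeups = 0;
    for (int i = 0; i < attempts; ++i) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(sv[0], &readfds);
        timeval timeout = {1, 0};  // safety net so the sketch cannot hang
        if (select(sv[0] + 1, &readfds, nullptr, nullptr, &timeout) != 1) {
            break;
        }

        char byte = 0;
        iovec iov;
        iov.iov_base = &byte;
        iov.iov_len = 1;
        msghdr msg;
        std::memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        // End-of-file is "readable" forever: every round lands here again.
        if (recvmsg(sv[0], &msg, 0) == 0) {
            ++eof_wakeups;
        }
    }
    close(sv[0]);
    return eof_wakeups;
}
```

If every round is counted, a caller that only logs the error and goes back to select() would spin without ever blocking, which would be consistent with the 100% CPU usage and the repeated log lines above.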

What I think we should do is:

  1. Check for this condition everywhere in the code and re-connect (or error in some meaningful way) when we discover it.
  2. Update the documentation to specify that this is necessary.
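As an illustration of item 1, here is a minimal sketch of a receive path that tells "peer closed" apart from other errors and stops watching the descriptor when that happens. It is not the fix applied in BIND 10; the names RecvStatus, recvOneByte, and drainUntilClosed are hypothetical, and a real daemon would presumably reconnect or report the error at the point where this sketch simply closes the socket.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

// Hypothetical status codes, not the FD_* constants from fd_share.cc.
enum RecvStatus { RECV_OK, RECV_PEER_CLOSED, RECV_ERROR };

// Receives one byte and reports end-of-file separately from system errors.
RecvStatus recvOneByte(int sock, char* out) {
    iovec iov;
    iov.iov_base = out;
    iov.iov_len = 1;
    msghdr msg;
    std::memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    ssize_t cc;
    do {
        cc = recvmsg(sock, &msg, 0);
    } while (cc < 0 && errno == EINTR);  // retry if interrupted by a signal

    if (cc == 0) {
        return RECV_PEER_CLOSED;  // orderly shutdown by the other side
    }
    return (cc < 0) ? RECV_ERROR : RECV_OK;
}

// Reads until the peer closes or an error occurs, then closes `sock` so it
// is never passed to select() again. Returns the number of bytes received,
// or -1 on a system error.
int drainUntilClosed(int sock) {
    int received = 0;
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(sock, &readfds);
        if (select(sock + 1, &readfds, nullptr, nullptr, nullptr) < 0) {
            if (errno == EINTR) {
                continue;
            }
            close(sock);
            return -1;
        }

        char byte = 0;
        const RecvStatus status = recvOneByte(sock, &byte);
        if (status == RECV_OK) {
            ++received;
            continue;
        }
        // Either the peer is gone or the socket failed: stop watching it.
        close(sock);
        return (status == RECV_PEER_CLOSED) ? received : -1;
    }
}
```

The design point is that a return value of 0 from recvmsg() ends the loop for that descriptor instead of being logged and retried.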

Change History (5)

comment:1 Changed 6 years ago by jreed

This was discussed on Jabber a couple of weeks ago.

I saw the same thing during an actual shutdown. In my case, I had received this earlier:

INFO  [b10-xfrout.xfrout] XFROUT_RECEIVED_SHUTDOWN_COMMAND shutdown command received

This appears to cause intermittent systest failures.

comment:2 Changed 6 years ago by shane

  • Milestone changed from New Tasks to Year 3 Task Backlog

comment:3 Changed 6 years ago by jinmei

Shane: The xfrout problem should have been resolved in (#988).
But are you actually requesting a comprehensive check throughout the
source tree that may have the same issue?

comment:4 Changed 4 years ago by tomek

  • Milestone set to Remaining BIND10 tickets

comment:5 Changed 4 years ago by tomek

  • Resolution set to wontfix
  • Status changed from new to closed
  • Version set to old-bind10

This issue is related to bind10 code that is no longer part of Kea.

If you are interested in BIND10/Bundy framework or its DNS components,
please check http://bundy-dns.de.

Closing ticket.
