Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#1704 closed defect (fixed)

log output mixed

Reported by: jreed Owned by: muks
Priority: medium Milestone: Sprint-20120612
Component: logging Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: Core Feature Depending on Ticket:
Estimated Difficulty: 15 Add Hours to Ticket: 0
Total Hours: 12.25 Internal?: no

Description

I continue to get some log output interspersed. Here are examples from two different runs today:

2012-02-22 11:11:58.238 INFO  [2012-02-22 11:11:58.238b10-xfrout.xfrout ] INFOXFROUT_RECEIVED_SHUTDOWN_COMMAND shutdown command received 
 [b10-boss.boss] BIND10_COMPONENT_STOP component b10-zonemgr is being stopped

and

2012-02-22 11:20:33.287 INFO  [b10-auth.auth] AUTH_SERVER_CREATED server created
2012-02-22 11:20:33.287 DEBUG2012-02-22 11:20:33.287 DEBUG [b10-auth.auth] AUTH_DNS_SERVICES_CREATED DNS services created
 [b10-auth.cc2012-02-22 11:20:33.287 DEBUG [b10-auth.auth] AUTH_CONFIG_CHANNEL_CREATED configuration session channel created

No logging configuration is in place. This is on CentOS 5.3 Linux.

I guess fix in #1405 didn't help. Also related maybe is #1189.

Subtickets

Attachments (1)

log4cplus-1.0.4-clog.patch (1.7 KB) - added by muks 6 years ago.
Patch for log4cplus to use clog instead of cerr, which writes to stderr and is buffered

Download all attachments as: .zip

Change History (72)

comment:1 Changed 6 years ago by shane

  • Milestone New Tasks deleted

comment:2 Changed 6 years ago by jreed

Here is another example on OpenBSD:

2012-02-29 15:27:33.6812012-02-29 15:27:33.689 DEBUG [b10-auth.cc] CC_ESTABLISH trying to establish connection with message queue daemon at /home/jreed/src/bind10/msgq_socket
2012-02-29 15:27:33.690 DEBUG [b10-auth.cc] CC_ESTABLISHED successfully connected to message queue daemon
 DEBUG [b10-stats-httpd.stats-httpd] STATHTTPD_STARTING_CC_SESSION starting cc session
2012-02-29 15:27:37.6862012-02-29 15:27:37.687 ERROR [b10-zonemgr.zonemgr] ZONEMGR_CCSESSION_ERROR command channel session error: No answer from ConfigManager when asking about Remote module Auth
2012-02-29 15:27:37.697  ERRORINFO [ b10-cmdctl.cmdctl [] b10-boss.bossCMDCTL_COMMAND_ERROR error in command get_module_spec to module ConfigManager: Module 'ConfigManager' not responding] 
BIND10_PROCESS_ENDED process 6469 of b10-zonemgr ended with status 0

So I am seeing noise on this on several platforms.

comment:3 Changed 6 years ago by jreed

  • Milestone set to Next-Sprint-Proposed

comment:4 Changed 6 years ago by jelte

  • Estimated Difficulty changed from 0 to 15

comment:5 follow-up: Changed 6 years ago by stephen

First off, what is the logging configuration for each of these problems? I'm guessing that this is logging to a file as syslog should force log messages to be sequential.

For the first and third examples, my guess would be that it is due to the Python logging to stdout and having no flush set. However, the second example - where messages from different components b10-auth appear to be interfering with each other - is a puzzler. If there is no separate configuration for the "auth" and "cc" logger in the b10-auth program, they should all be logging through the same stream: as the code is single-threaded there should be no chance of interference between two log messages. It makes me wonder if what we are seeing is some buffer corruption in the logger or log4cplus.

comment:6 Changed 6 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120403

comment:7 in reply to: ↑ 5 Changed 6 years ago by jreed

Replying to stephen:

First off, what is the logging configuration for each of these problems?

This is with default logging to the console.

comment:8 Changed 6 years ago by vorner

Hello

I believe I can guess the reason. If there's a lot of logging or long message, the IO library could buffer the messages a little bit ‒ especially if it is being output to a pipe or file through '>'. However, that could mean one process writes part of the message and wants to write the second part later (because the write call couldn't write everything at once, so it needed to retry). However, a different component have called its own write() in between and inserted the text in the middle. I heard this can be circumvented with files opened in „append“ mode and writing by blocks no bigger than some constant. But that is a magic.

So we either need to output to different files or have some kind of synchronisation (if that was the actual problem, but I can't imagine anything else). Or maybe logging through a different logger process that would mix it all together, but that would likely be slow.

comment:9 Changed 6 years ago by shane

On the Unix side, each call to write() is guaranteed to be atomic, at least as far as disk blocks go. I think if each write() is less than 512 bytes then we are safe, assuming buffering within the application is disabled. Using O_APPEND does indeed make you safe, since the OS also handles making sure the writes are at the end of the file. I don't know if this qualifies as "magic" or not.

comment:10 Changed 6 years ago by vorner

I don't think this is completely true. The write indeed is atomic. But imagine the kernel has a writeback buffer of certain size. The buffer is for example 4096 bytes long (well, it will be more, but anyway). And it so happened someone was busy logging and filled 4090 bytes already and the buffer was not flushed to disk yet, because it takes several milliseconds to seek there (or to send it over the ssh on a slow network, or whatever). So we now call our write with 10 bytes, but only 6 fit. So these 6 are put to the buffer (atomically) and the write returns immediately, doing a partial write. So we retry with the 4 bytes that we have left (and block). Once a 512 block of the buffer is written to the disk, and the kernel takes one of the writes now waiting, which might be the write of ours with 4 bytes or a completely different write with a complete message from someone else, puts it (atomically) to the end of the buffer and picks another write waiting.

The exact sizes can be expected to be much more on modern linux with huge amount of memory and SATA disks with several-megabytes of internal caches and 4k sized blocks, etc, which makes the situation less probable (which might be the reason I never saw it myself, or maybe because I don't store it to disk and my urxvtd is able to eat data really fast from the pipe), but it is said there are systems with less memory or older disks or different kernels or something…

The magic I have in mind (if it exists) was to prevent this situation somehow.

comment:11 Changed 6 years ago by stephen

I don't think this explains the second problem. There the messages are coming from b10-auth.auth and b10-auth.cc, which are two loggers in the same process. What should be happening (with the configuration left at the default) is that both messages are output to stderr (via a "cerr << message call). Even if one write does not completely output its internal buffer, the second write should empty the buffer before writing the second message.

comment:12 Changed 6 years ago by vorner

Hmm, if there are in the same process, than this indeed would not explain it. Is it possibly with multiple auth instances running? If not, then my explanation does not apply to it (but I believe the described problem does exist, no matter if there's another one or not).

comment:13 follow-up: Changed 6 years ago by stephen

A further question: does this mixing of output only happen with output going to the console? Do we get the same problem if output is sent to a file?

comment:14 in reply to: ↑ 13 Changed 6 years ago by jreed

Replying to stephen:

A further question: does this mixing of output only happen with output going to the console? Do we get the same problem if output is sent to a file?

On the n10 server I enabled file based logging yesterday. Prior to that I just saved the output and here are three broken examples (not saved by logging system):

2012-03-17 00:32:35.q DEBUG [b10-resolver.cache] 2012-03-17 00:32:35.q DEBUG [b10-auth.datasrc] DATASRC_CACHE_CREATE creating the hotspot cache

2012-02-29 20:46:29.q ERROR [b10-xfrout.xfrout2012-02-29 20:46:29.q INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 83491 of b10-stats-httpd ended with status 0

2012-03-16 19:53:52.q ERROR2012-03-16 19:53:52.q INFO  [b10-boss.boss] BIND10_PROCESS_ENDED process 80370 of b10-stats-httpd ended with status 0

I don't see any problems in past day, but I will check this again in a week to see.

comment:15 Changed 6 years ago by shane

So the mixing of these messages is not completely random. They appear to be in blocks. My suspicion is that the underlying Unix write() gets called for each element:

  • date/time
  • level
  • process/library
  • message

I think this issue is that on console output to stderr this write gets flushed immediately, not at end of line, because the C++ output library does not buffer writes at all.

When writing to a file, the C++ output library buffers, but then we tell log4cplus to flush at the end of each log message, which is exactly correct.

If this is correct we should be able to fix this problem for the console by enabling buffering for cerr.

comment:16 Changed 6 years ago by stephen

  • Owner set to stephen
  • Status changed from new to assigned

comment:17 follow-up: Changed 6 years ago by stephen

  • Owner changed from stephen to UnAssigned
  • Status changed from assigned to reviewing

Following discussions on the bind10-dev list over the period 10 April 2012 to 12 April 2012, no change has been made to the code. Instead, the documentation has been updated to suggest that the user switch to stdout if the message jumbling occurs.

This is a small change to the documentation, so no ChangeLog entry is planned.

comment:18 in reply to: ↑ 17 Changed 6 years ago by shane

I guess we also need to update our test farms to use use stdout to avoid problems with this, right?

comment:19 Changed 6 years ago by jinmei

s/a due/due/

+                jumbled up together.  This is a due to a combination of the way

Other than that it looks okay.

comment:20 Changed 6 years ago by jinmei

  • Owner changed from UnAssigned to stephen

comment:21 Changed 6 years ago by stephen

  • Resolution set to fixed
  • Status changed from reviewing to closed

Typo corrected (and the .txt and .html files regenerated) and changes committed to master. Commit c8e538c10d73ea56f079fcb8f3c6f99e5407bdef

Changed 6 years ago by muks

Patch for log4cplus to use clog instead of cerr, which writes to stderr and is buffered

comment:22 Changed 6 years ago by muks

  • Resolution fixed deleted
  • Status changed from closed to reopened

The patch in the previous comment failed. If our strategy about fixing this issue was correct, the patch should have worked.

I tried another testcase:

#include <stdio.h>
#include <unistd.h>

int
main (int argc, char *argv[])
{
  if (fork() == 0) {
    while (1) {
      printf ("012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789\n");
    }
  } else {
    while (1) {
      printf ("ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ\n");
    }
  }

  return 0;
}

This makes two processes which both write different strings to stdout (which is buffered). The first few lines of output look like this:

[muks@jurassic ~]$ ./log > log.txt
[muks@jurassic ~]$ cat log.txt | head -n 20
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDE012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678KLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
[muks@jurassic ~]$

Notice how the strings intermix here (pan to the right, or search for 'ZABCDE012345678'). Buffering doesn't help as the final write()s from each process are getting mixed up.

If multiple processes have to write legibly to a single output, we will have to use some sort of locking or have a logging server.

Reopening the bug.

comment:23 Changed 6 years ago by muks

The issue is reproduced with smaller logging strings too (about 52 chars long each in the two processes).

comment:24 follow-up: Changed 6 years ago by stephen

I'm not certain that's an exact comparison with the logging output.

If line flushing is enabled, log4cplus appears to call ostream::flush() to flush the buffer after writing each line. printf() does not flush its buffer even if you terminate the string with "\n". To get the same behaviour as log4cplus with line flushing enabled, you would need to call fflush(stdout) after each printf().

Having said that, we are assuming that each underlying call to write(2) to the console cannot interrupt another call to write(2) made by a different process (which may be running at the same time on a multi-core system). I've not managed to find anything that guarantees this.

If multiple processes have to write legibly to a single output, we will have to use some sort of locking or have a logging server.

I think locking and unlocking a single BIND10-wide mutex would be the easiest.

comment:25 in reply to: ↑ 24 Changed 6 years ago by muks

Replying to stephen:

I'm not certain that's an exact comparison with the logging output.

If line flushing is enabled, log4cplus appears to call ostream::flush() to flush the buffer after writing each line. printf() does not flush its buffer even if you terminate the string with "\n". To get the same behaviour as log4cplus with line flushing enabled, you would need to call fflush(stdout) after each printf().

The default buffering behavior is applicable when stdout writes to a tty. stdout is line-buffered in that case (it will flush when it sees a '\n'). When writing to a pipe, I don't know if this still applies.

Regarding the explicit flushing, I have tried using fflush(stdout) after every printf() in the example above, and the same mixing up happens.

Having said that, we are assuming that each underlying call to write(2) to the console cannot interrupt another call to write(2) made by a different process (which may be running at the same time on a multi-core system). I've not managed to find anything that guarantees this.

With strace, it can be seen that for the string lengths in the example above, write()s block till they're done. They don't return with partial counts (i.e., upon a printf() that flushes, libc doesn't call write() more than once per a buffer). BTW, even in a single core system, another process can be scheduled as the write() is blocked on I/O.

If multiple processes have to write legibly to a single output, we will have to use some sort of locking or have a logging server.

I think locking and unlocking a single BIND10-wide mutex would be the easiest.

comment:26 Changed 6 years ago by stephen

  • Owner changed from stephen to jreed
  • Status changed from reopened to reviewing

Milti-process logging appears to be something in the pipeline for the next release of log4cplus. This ticket is being passed to Jeremy who will contact the developers to chase it up.

comment:27 Changed 6 years ago by jreed

  • Owner changed from jreed to UnAssigned

On May 1, a log4cplus developer said There is no inter-process synchronization for console output. In theory, it should be possible to implement it using lock file.

We discussed ideas for this on the May 8, 2012 bi-weekly conference call.

comment:28 Changed 6 years ago by jinmei

Why is this ticket still in the review queue? The latest comments
seem to indicate the status is still quite open.

comment:29 Changed 6 years ago by jinmei

  • Owner changed from UnAssigned to jinmei
  • Status changed from reviewing to accepted

According to the latest sprint planning meeting
http://bind10.isc.org/wiki/SprintPlanning20120515
it seems the status of this ticket should be changed to open. Changing.

comment:30 Changed 6 years ago by jinmei

  • Owner changed from jinmei to UnAssigned
  • Status changed from accepted to assigned

comment:31 Changed 6 years ago by muks

  • Owner changed from UnAssigned to muks

Picking

comment:32 Changed 6 years ago by muks

  • Owner changed from muks to UnAssigned
  • Status changed from assigned to reviewing

Up for review.

comment:33 follow-up: Changed 6 years ago by muks

  • Owner changed from UnAssigned to muks

Taking back. Currently this code uses a lockfile with fcntl(), but it seems a boost::interprocess::interprocess_mutex is better for it.

comment:34 in reply to: ↑ 33 Changed 6 years ago by muks

  • Owner changed from muks to UnAssigned

Replying to muks:

Taking back. Currently this code uses a lockfile with fcntl(), but it seems a boost::interprocess::interprocess_mutex is better for it.

That was a bad idea. It seems that if the processes terminates due to some reason, the mutex is still valid if it's in shared memory. That would block other processes and a reincarnation of itself when it restarts. The lock file approach is better for us.

I have also fixed the unit tests which require logging and don't properly set B10_FROM_SOURCE.

Up for review again.

comment:35 Changed 6 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:36 follow-up: Changed 6 years ago by jinmei

I've not given the branch a full review, but have noticed a few
possibly substantial points. So I'd like to discuss them first.

  • should we worry about performance impact of locking and unlocking?
  • As documented, we cannot do complicated things involving resource allocation in the Logger constructor. In fact, this branch causes crash on my environment due to the static initialization fiasco. Not looking into it closely, but we should probably move this stuff to LoggerImpl.
  • I know it could be tricky, but I'd like to test this feature as much as possible.
  • We'll definitely need a changelog entry for this ticket.
  • I'd like to see documentation about the lock file, and the relationship with the magic environment variables (including when we need to define them).
  • Considering the bind10 process will often start as root and initially open the lock file, are others running as a non privileged user can get access to it?
  • I'm simply not sure, so just asking: is it a good idea to repeat the "unable to use lock file" log when the process fails to open it?
  • I guess a single process can open multiple lock files. Would that cause any problem?
  • Maybe we should allow the bind10 process to delete the lock file at the very end of its process?
  • The else block of Logger::output is (possibly) not exception safe. Consider the case where the main outputRaw throws.
  • On a related note, isn't there a better primitive than the very low level system call? From a quick look boost also has a similar (in addition to interprocess_mutex) API: file_lock.
  • If we want to log the event when locking/unlocking fails, I think it should be an ERROR, not a WARN.
  • for log_test.py, may be a matter of taste, but since we'd need to run it from Makefile in practice (due to other setups), we could just set B10_FROM_SOURCE in the Makefile before running the test script. That way we don't have to auto-generate it from .in.

comment:37 Changed 6 years ago by jinmei

  • Owner changed from jinmei to muks

comment:38 in reply to: ↑ 36 Changed 6 years ago by muks

Hi jinmei

Thank you for a detailed review :)

Replying to jinmei:

I've not given the branch a full review, but have noticed a few
possibly substantial points. So I'd like to discuss them first.

  • should we worry about performance impact of locking and unlocking?

In normal use-case, log messages are generated perhaps every few hundred
milliseconds at worst. So the impact of locking/locking (even if it is
file based) should be negligible.

In the branch currently, we open() in the constructor and not on every
log message. The open() would be more costly as it would involve path
operations and access checks.

Overall we don't have to worry about it I think.

  • As documented, we cannot do complicated things involving resource allocation in the Logger constructor. In fact, this branch causes crash on my environment due to the static initialization fiasco. Not looking into it closely, but we should probably move this stuff to LoggerImpl.

Will check this.

  • I know it could be tricky, but I'd like to test this feature as much as possible.

This would indeed be complicated, especially as the file lock would have
to be tested from a different process between the lock and unlock. Not
sure if the code can take a mock object. I'll check it too.

  • We'll definitely need a changelog entry for this ticket.

How about this:

XYZ.    [func]          muks
        The logger now uses a lockfile named `logger_lockfile' that is
        created in the local state directory to mutually separate
        individual logging operations from various processes. This is
        done so that log messages from different processes don't mix
        together in the middle of lines.
  • I'd like to see documentation about the lock file, and the relationship with the magic environment variables (including when we need to define them).

Where shall I put this documentation? In the guide?

The magic environment variables are the same as used for the rest of the
codebase when shared resources are used and programs are run from the
top srcdir (B10_FROM_SOURCE). It would be worth writing utility
functions to get paths to files (bug #1979) and document that instead.

  • Considering the bind10 process will often start as root and initially open the lock file, are others running as a non privileged user can get access to it?

No they can't. Good catch. I'll check this too.

  • I'm simply not sure, so just asking: is it a good idea to repeat the "unable to use lock file" log when the process fails to open it?

Maybe not. I'll check it, but I can't log errors in the constructor
anyway. Will see about changing this.

  • I guess a single process can open multiple lock files. Would that cause any problem?

No it won't. Locks over already granted ranges in the same process will
all apply without blocking. (I don't remember if the behavior is
recursive, but it doesn't matter.)

  • Maybe we should allow the bind10 process to delete the lock file at the very end of its process?

I'll check this too.

  • The else block of Logger::output is (possibly) not exception safe. Consider the case where the main outputRaw throws.

*nod*. Will fix.

  • On a related note, isn't there a better primitive than the very low level system call? From a quick look boost also has a similar (in addition to interprocess_mutex) API: file_lock.

After using interprocess_mutex and reverting it, I side with fcntl()
for this case. Its use here is very straightforward and any UNIX
programmer can follow it. The behaviour is well defined (multiple locks
are OK, and when process terminates, the locks are removed
automatically).

If it was a data structure or something complicated, then using boost
would be worthwhile.

  • If we want to log the event when locking/unlocking fails, I think it should be an ERROR, not a WARN.

I'll change this. I thought that failure on locking is not a
showstopper, so it can be WARN, but a failure to unlock would be a
problem.

  • for log_test.py, may be a matter of taste, but since we'd need to run it from Makefile in practice (due to other setups), we could just set B10_FROM_SOURCE in the Makefile before running the test script. That way we don't have to auto-generate it from .in.

I agree. Will change this too.

comment:39 follow-up: Changed 6 years ago by stephen

I've been interested in this problem, so I'd like to add my two-pennyworth.

In the branch currently, we open() in the constructor and not on every log message. The open() would be more costly as it would involve path operations and access checks.

Since all log messages generated by a process should be synchronised - and not those from a single logger - we don't need to open in every constructor. We only need to open the file once in the process.

Not looking into it closely, but we should probably move this stuff to LoggerImpl.

I concur. The logger is written using a pimpl idiom, and most of the logger code is in LoggerImpl.

On a related note, isn't there a better primitive than the very low level system call? From a quick look boost also has a similar (in addition to interprocess_mutex) API: file_lock.

After using interprocess_mutex and reverting it, I side with fcntl() for this case. Its use here is very straightforward and any UNIX programmer can follow it. The behaviour is well defined (multiple locks are OK, and when process terminates, the locks are removed automatically).

Looking at the boost::file_lock documentation, I strongly suspect that this is a layer on top of fcntl.

Some other general points:

  • Regarding the lock file, is there any requirement that it be created when the process starts? Can't we use an existing file - for example, the configuration database file?
  • I think the lock code should be made more general, perhaps separated out into a separate class. This has several advantages:

a) Removes the details of lock implementation from the logger class (which should be focused on logging and nothing else).
b) If the unlock operation is put into the class destructor, locks could be declared as scope limited and automatically freed if an exception is generated in the locked section of code and not caught.
c) Easier to test.
d) Easier to use locks for other purposes as well.

Regarding the last point, file locks only lock a range of bytes in the file. There is no reason why we can't use multiple ranges within the same file to implement multiple locks.

Finally, one specific point with regards to the code:

  • Even single line "if" or "else" clauses should be enclosed in braces.

comment:40 in reply to: ↑ 39 ; follow-up: Changed 6 years ago by jinmei

Replying to stephen:

Not looking into it closely, but we should probably move this stuff to LoggerImpl.

I concur. The logger is written using a pimpl idiom, and most of the logger code is in LoggerImpl.

On a related note, isn't there a better primitive than the very low level system call? From a quick look boost also has a similar (in addition to interprocess_mutex) API: file_lock.

After using interprocess_mutex and reverting it, I side with fcntl() for this case. Its use here is very straightforward and any UNIX programmer can follow it. The behaviour is well defined (multiple locks are OK, and when process terminates, the locks are removed automatically).

Looking at the boost::file_lock documentation, I strongly suspect that this is a layer on top of fcntl.

I suspected that, too, but

  • it implements RAII in it, so we don't have to do anything to make it exception safe ourselves.
  • if should be more portable than the direct use of a system call. especially so if and when we consider supporting Windows.

comment:41 in reply to: ↑ 40 ; follow-up: Changed 6 years ago by muks

Replying to jinmei:

  • it implements RAII in it, so we don't have to do anything to make it exception safe ourselves.

This will mean instantiating file_lock (->opening the lock file) in the log output method, which will be more expensive than just the fd based operations.

comment:42 in reply to: ↑ 41 ; follow-up: Changed 6 years ago by jinmei

Replying to muks:

Replying to jinmei:

  • it implements RAII in it, so we don't have to do anything to make it exception safe ourselves.

This will mean instantiating file_lock (->opening the lock file) in the log output method, which will be more expensive than just the fd based operations.

First, I'd like to clarify my position: I'm not necessarily pushing
the idea of using the boost framework. I'm just trying to show
various possible options.

Then, about this specific point: are you saying this by actually
checking the code or the behavior of the boost file lock? I used the
term "RAII" as an idiom, which in this context means "locked" status
is the resource and it will be automatically unlocked when the
corresponding object is destructed, rather than the resource
acquisition in its literal sense. It doesn't automatically mean
opening the file.

comment:43 in reply to: ↑ 42 Changed 6 years ago by muks

Replying to jinmei:

First, I'd like to clarify my position: I'm not necessarily pushing
the idea of using the boost framework. I'm just trying to show
various possible options.

I know.. we are just discussing what would be the best way. :)

Then, about this specific point: are you saying this by actually
checking the code or the behavior of the boost file lock? I used the
term "RAII" as an idiom, which in this context means "locked" status
is the resource and it will be automatically unlocked when the
corresponding object is destructed, rather than the resource
acquisition in its literal sense. It doesn't automatically mean
opening the file.

I meant to say that boost's file_lock takes a file path in the constructor and when it goes out of extent, would start again by opening the file.

I've finished implementing the review comments now. Am running some final tests. Will update this bug then. This code also uses raii for working with exceptions by using a lock object (which takes an fd).

comment:44 Changed 6 years ago by muks

Some of the points have already been addressed in other comments. I'll just reply to the ones which haven't.

Replying to jinmei:

  • As documented, we cannot do complicated things involving resource allocation in the Logger constructor. In fact, this branch causes crash on my environment due to the static initialization fiasco. Not looking into it closely, but we should probably move this stuff to LoggerImpl.

This is now done.

  • I know it could be tricky, but I'd like to test this feature as much as possible.

There is now a unit test for this feature, that forks another process and tests the lock while the log message is being output.

  • I'd like to see documentation about the lock file, and the relationship with the magic environment variables (including when we need to define them).

Please see my earlier reply on this (there's a question).

  • Considering the bind10 process will often start as root and initially open the lock file, are others running as a non privileged user can get access to it?

The branch now creates the lock file with mode 0660. This should be fine if the directory is bind10 group owned with g+w, g+s and the processes run as bind10 group.

  • I'm simply not sure, so just asking: is it a good idea to repeat the "unable to use lock file" log when the process fails to open it?

It is indeed a bad idea as it'll clutter up the output. This message is no longer repeated.

  • Maybe we should allow the bind10 process to delete the lock file at the very end of its process?

This is now implemented.

  • The else block of Logger::output is (possibly) not exception safe. Consider the case where the main outputRaw throws.

The lock/unlock functionality is now inside an object which goes out of extent when it exits that block.

  • If we want to log the event when locking/unlocking fails, I think it should be an ERROR, not a WARN.

You are right on this too. A problem with unlocking (though unlikely) should be an ERROR definitely as it'd block all other log attempts.

  • for log_test.py, may be a matter of taste, but since we'd need to run it from Makefile in practice (due to other setups), we could just set B10_FROM_SOURCE in the Makefile before running the test script. That way we don't have to auto-generate it from .in.

Done.


On the use of file_lock

It seems to open/close the file each time (if we instantiate the object every time for raii) which will reduce performance. The raii part to work with exceptions is now implemented in a utility class which is fd based.

On having a lock file class

There are two things to consider:

  • We just have one use case, and muxing several locks into a single lock file (maintaining range lists) will be over-implementing it. Even if we had more than one use-case, IMO we should have locks in separate files so that there's no global state.
  • Our use-case also ties in with the logger. We log if there's a problem locking/unlocking.

The lock/unlock code is in a different utility class now.

On re-using the b10-config.db file for range locking instead of a separate lock file

Applying an exclusive lock (even advisory) requires the file to be opened with write access flag. I think it's a bad approach to do this on the config db. We don't need to mix functionality this way when a lock file is almost free.

Last edited 6 years ago by muks (previous) (diff)

comment:45 Changed 6 years ago by muks

  • Owner changed from muks to jinmei

comment:46 follow-up: Changed 6 years ago by jinmei

Okay...I was impressed about the effort of making it testable, but I'd
like to have some more design level discussion.

First, I have to say I'm not really excited about making the Logger
class and the way it's used. I'm not sure why some of the
Logger or LoggerImpl classes have virtual methods (these classes
doesn't seem to be used as a base class), but ignoring these
(ineffective) "virtual" declarations, the wrapper trick really
introduces polymorphism with its own costs such as vtable lookups.
Since some of the logger related methods are expected to be very
efficient (e.g., getSeverity()), I'd like to keep the overhead as
small as possible, especially if the added overhead is just for
testing.

Second, the definition of LoggerWrapper? seems to be too ad hoc. It's
mostly a just redundant wrapper and only with fatal() being a virtual
method. There's no documentation about it, so the class definition
just looks so awkward unless you read every other part of the code
including the tests that use it.

Third, with this change it exposes the originally private LoggerImpl
at the application-facing Logger level as a mutable pointer. While
I understand sometimes ugliness is the least bad for better
testability, this hack seems to be too dangerous and dirty.

Fourth, the separate file lock class isn't tested. Actually, as a
separate class it should be easier to test (also as Stephen
commented), and once we can confirm this class really handles
inter-process synchronization, we'd be able to simplify the logger
tests with some level of abstraction. That would also address the
above concerns.

So, if I may, I'd like to suggest something like this:

  • Introduce a base class for interprocess synchronization (at this level it could even be any type of conceptual lock; doesn't have to be file based).
  • Define a flock based derived class. write tests for it. also note that it could be a trivial wrapper for something like boost::file_lock, in which case we could even omit our own tests, to the extent we trust that library.
  • Also define a mock lock class for tests. It could be, e.g., just dumps log message for lock/unlock operations.
  • The Logger(Impl) class holds a pointer to the base interprocess-sync class. By default, it instantiates the file-lock based one and holds it.
  • Allow an application of the Logger to dynamically replace the internal lock object for Logger. The logger test uses the mock lock class, and simply checks if log output is encapsulated with lock and unlock. No interprocess checking is necessary here.
  • Within the logger, we still use a lightweight wrapper like the util::file_lock class for exception safety. It would be constructed with a reference (or pointer) to the base lock class instead of FD.

This way, we could separate the complicated interprocess test case
from logging specific tests, which have their own complexity. Also,
we don't have to tweak the main logging related classes. And, the
synchronization class can be further flexible: for example, it could
even be a "null" lock if the user knows there should be no lock issue
or it's very performance sensitive and the user is willing to accept
the consequence of omitting the lock such as mixed log outputs.

What do you think?

Comments on some other points:

  • the file permission trick seems to be acceptable (while subtle), but I'm not sure if the log library is the right place to implement it. Logging itself is pretty generic, while the permission issue is very specific to how the entire BIND 10 system is working. So, making it group writable may not always be appropriate. Now the boss process tries to remove the file at the end of its lifetime, I wonder whether it makes sense if we have the boss process creates the file with necessary permission setting at the beginning, and let the logging library just open it.
  • similar to the previous point, since we have other standalone progams that (perhaps implicitly) use the logging (at least dbtool for now, and I guess it's also the case for loadzone), we might even want to make the file path or whether to use locks customizable. If we write a "zone checker" script, e.g, which can be run by anyone as an ordinary user, we don't want to make it emit noisy logs or even fail to run because of the logging issue, right?
  • As for changelog, it basically looks okay, but I'd note log file permission issue. Also, for that reason, I'd add the '*' mark.
  • As for the note about environment variables, I was thinking about part of doxygen. But in the case of logging, it may also be part of README because it already has it. Definitely not the guide.

Below I'm making some general comments on the code itself, although
they may become moot depending on the result of the design level
discussions above.

general

I've noticed several editorial/style issues and fixed them. They are
basically documented in the style guide wiki. Please take a look at
it again.

log_messages

  • I don't like to include test specific log message (LOG_LOCK_TEST_MESSAGE) in the main source.

file_lock.cc/h

  • naming convention: file_lock class should be (something like) FileLock?
  • why is the destructor defined as virtual?
  • documentation: overall class description is needed. In general all methods need documentation. If any of the method can throw an exception, it should be documented (if it never throws it's better to say so explicitly). Parameters need to be described (also for the constructor).

logger_impl.h

On LoggerWrapper?: my compiler complains if it doesn't explicitly
define the destructor and make it virtual:

    virtual ~LoggerWrapper() {}

when it has other virtual method.

comment:47 Changed 6 years ago by jinmei

  • Owner changed from jinmei to muks

comment:48 Changed 6 years ago by muks

  • Owner changed from muks to jinmei

Hi jinmei

I've rewritten the code to use this new design. It's in the trac1704_2 branch. While there are unit tests for the InterprocessSync? class, maybe it's no longer necessary for the logger to be tested with it. Can you check and tell me if you're happy with this new branch?

comment:49 Changed 6 years ago by muks

(note: I haven't added documentation, etc. yet. Will do it after the code passes review.)

comment:50 Changed 6 years ago by jinmei

General

  • The new version still needed editorial nits. I made them directly to the branch. Please check. Maybe we need a common checker script because some conventions seem to be so difficult to follow for new developers (but that's not a topic of this branch).
  • (not specific to this version; sorry I forgot to mention this before) How does it work for multi-thread components? Right now, some python daemons use multiple threads. Although we'll probably get rid of them according to the previous f2f discussions, but I guess b10-resolver will eventually be multi-threaded.
  • About the design: it now looks much better, but I don't see why we need the separate class hierarchy for InterprocessSync. If I were to implement it, I'd simply include all file-related logic in InterprocessSyncFile (which I'd call InterprocessSyncFileLock) and make InterprocessSyncLocker a value (i.e., not a base) class:
    class InterprocessSyncLocker {
    public:
        InterprocessSyncLocker(InterprocessLock& lock) :
            locked_(false), lock_(lock) {}
        ~InterprocessSyncLocker() {
            unlock();
        }
        void lock() {
            if (locked_) {
                lock_.lock();
                locked_ = true;
            }
        }
        void unlock() {
            if (locked_) {
                lock_.unlock();
                locked_ = false;
            }
        }
    private:
        bool locked_;
        InterprocessLock& lock_;
    };
    
    This way we can avoid newing the locker, avoid using friend, and avoid using dynamic_cast. With this approach LoggerImpl::outputRaw() would look like this:
    void
    LoggerImpl::outputRaw(const Severity& severity, const string& message) {
        InterprocessSyncLocker locker(*sync_);
        locker.lock();
    
        LOG4CPLUS_DEBUG(logger_, message);  //, and so on
    
        // Lock will be automatically released.
    }
    

logger implementation

Maybe you're planning it, but we'll still need to test the logger
behavior. As I suggested, I'd use some mock lock class. Also, in
that case, we'll probably not have to set the magic B10_FROM variable
for tests.

interprocess_sync_file

  • it's better to make component_name const string& (cppcheck will even fail due to that):
        /// \brief Constructor
        InterprocessSyncFile(const std::string component_name);
    
  • lock/unlock/tryLock share almost the same code pattern. I'd somehow combine them (note, however, that we might not need tryLock in the first place; see below)
  • top source dir may not be writable, so we should use somewhere always writable e.g., top_builddir:
        const char* const env = getenv("B10_FROM_SOURCE");
        if (env != NULL) {
            lockfile_path = env;
        }
    

interprocess_sync_file_unittest

  • As far as I can see InterprocessSyncFileTest doesn't have to be a fixture (i.e. no class for it, replace TEST_F with TEST)
  • Overall the code is not exception safe wrt dynamic allocation, e.g.:
      InterprocessSync* sync = new InterprocessSyncFile("test");
      InterprocessSyncLocker* locker = sync->getLocker();
    
    Although in tests it may not a big deal in practice, I'd generally like to keep any code in our tree as clean as possible. In these cases we should be easily able to make it safe with boost::scoped_ptr.
  • I'd rather specify the full path than relying on the B10_FROM_ trick with setting in main() globally:
      InterprocessSync* sync = new InterprocessSyncFile("test");
    
    Setting this environment variable could affect other tests, and the full path is in fact closer to what would be used in the actual deployment. Note that if we do this we can/should remove setenv from main().
  • TestLock?: pipe(2) can fail. I'd explicitly check the return value either by assert or ASSERT_NE, or by throwing an exception on failure.
  • same for fork(), read(), write() (and technically, close(), although we can probably say it should be so unlikely that it's not worth checking in this context)
  • Testing tryLock() is suboptimal because in the real logger code we don't use it. Since we are now free from the logging context, I'd suggest using lock() for actually writing something to the file. Both the parent and child would write something to the file, in that order ensured by the lock, and we can check what is written. (Of course, though, this test could incorrectly pass depending on the timing). Note also that if we take this approach, we can probably just remove tryLock(), at least until we really need it in an application.
  • read() at the parent could block if there's an unexpected bug in the child part, and in the worst case the whole test could hang. I'd try avoiding that scenario as much as possible. For example, we can introduce a timeout for read(); we can also make sure the child really dies in some way.
  • this can be simplified:
          if (locked == 1) {
            was_locked = true;
          } else {
            was_locked = false;
          }
    ...
      EXPECT_TRUE(was_locked);
    
    as:
          EXPECT_EQ(1, locked);
    
    and we can remove was_locked completely. Same for TestMultipleFilesForked? with was_not_locked.

comment:51 Changed 6 years ago by jinmei

  • Owner changed from jinmei to muks

comment:52 in reply to: ↑ 46 ; follow-up: Changed 6 years ago by muks

  • Owner changed from muks to jinmei

Replying to jinmei:

General

  • The new version still needed editorial nits. I made them directly to the branch. Please check. Maybe we need a common checker script because some conventions seem to be so difficult to follow for new developers (but that's not a topic of this branch).

I've tried to be more careful about it, but I may still make the odd mistakes.

  • (not specific to this version; sorry I forgot to mention this before) How does it work for multi-thread components? Right now, some python daemons use multiple threads. Although we'll probably get rid of them according to the previous f2f discussions, but I guess b10-resolver will eventually be multi-threaded.

IIRC, trying to acquire a constesting lock from another thread will block, but I have to check this. We may have to remove the flag inside InterprocessSync? this case, if a single logger object is used.

  • About the design: it now looks much better, but I don't see why we need the separate class hierarchy for InterprocessSync. If I were to implement it, I'd simply include all file-related logic in InterprocessSyncFile (which I'd call InterprocessSyncFileLock) and make InterprocessSyncLocker a value (i.e., not a base) class:
    class InterprocessSyncLocker {
    public:
        InterprocessSyncLocker(InterprocessLock& lock) :
            locked_(false), lock_(lock) {}
        ~InterprocessSyncLocker() {
            unlock();
        }
        void lock() {
            if (locked_) {
                lock_.lock();
                locked_ = true;
            }
        }
        void unlock() {
            if (locked_) {
                lock_.unlock();
                locked_ = false;
            }
        }
    private:
        bool locked_;
        InterprocessLock& lock_;
    };
    
    This way we can avoid newing the locker, avoid using friend, and avoid using dynamic_cast. With this approach LoggerImpl::outputRaw() would look like this:
    void
    LoggerImpl::outputRaw(const Severity& severity, const string& message) {
        InterprocessSyncLocker locker(*sync_);
        locker.lock();
    
        LOG4CPLUS_DEBUG(logger_, message);  //, and so on
    
        // Lock will be automatically released.
    }
    

Something similar is now implemented. :)

logger implementation

Maybe you're planning it, but we'll still need to test the logger
behavior. As I suggested, I'd use some mock lock class. Also, in
that case, we'll probably not have to set the magic B10_FROM variable
for tests.

This test is written now. Due to this test and the API in general, we now lazily open the lock file (when it is actually required). This allows us to replace the sync object with a mock object or a NULL object (in the case of command line utilites).

log_messages

  • I don't like to include test specific log message (LOG_LOCK_TEST_MESSAGE) in the main source.

It's moved to a separate file now in tests/.

interprocess_sync_file

  • it's better to make component_name const string& (cppcheck will even fail due to that):
        /// \brief Constructor
        InterprocessSyncFile(const std::string component_name);
    

I think you've made this change. :)

  • lock/unlock/tryLock share almost the same code pattern. I'd somehow combine them (note, however, that we might not need tryLock in the first place; see below)

This is reduced to a utility function now. It also became a place to put the lazy open code. On the topic of tryLock(), see below.

  • top source dir may not be writable, so we should use somewhere always writable e.g., top_builddir:
        const char* const env = getenv("B10_FROM_SOURCE");
        if (env != NULL) {
            lockfile_path = env;
        }
    

Done.

interprocess_sync_file_unittest

  • As far as I can see InterprocessSyncFileTest doesn't have to be a fixture (i.e. no class for it, replace TEST_F with TEST)

This is changed now.

  • Overall the code is not exception safe wrt dynamic allocation, e.g.:
      InterprocessSync* sync = new InterprocessSyncFile("test");
      InterprocessSyncLocker* locker = sync->getLocker();
    
    Although in tests it may not a big deal in practice, I'd generally like to keep any code in our tree as clean as possible. In these cases we should be easily able to make it safe with boost::scoped_ptr.

Due to the change in design (above) the code doesn't look like this anymore.

  • I'd rather specify the full path than relying on the B10_FROM_ trick with setting in main() globally:
      InterprocessSync* sync = new InterprocessSyncFile("test");
    
    Setting this environment variable could affect other tests, and the full path is in fact closer to what would be used in the actual deployment. Note that if we do this we can/should remove setenv from main().

I don't follow completely here. The constructor accepts just the component name, and builds a path according to the runtime/test environment. It has to be able to accept different paths as the install prefix may not be writable or even exist. I would not prefer allowing to specify the full pathname to the lock file by the caller, as the lockfile locations can be managed by us, or may not even
be an actual file.

  • TestLock?: pipe(2) can fail. I'd explicitly check the return value either by assert or ASSERT_NE, or by throwing an exception on failure.
  • same for fork(), read(), write() (and technically, close(), although we can probably say it should be so unlikely that it's not worth checking in this context)

It is an unlikely situation (though not impossible). In case they fail, a lot of other things will similarly fail and it's best the test crashes and burns then. The test is written such that any such failure will cause the test to fail.

I felt similarly about the read() below and was going to group it into this list, but a blocked process will stop testing which felt uneasy. So I've addressed that one.

  • Testing tryLock() is suboptimal because in the real logger code we don't use it. Since we are now free from the logging context, I'd suggest using lock() for actually writing something to the file. Both the parent and child would write something to the file, in that order ensured by the lock, and we can check what is written. (Of course, though, this test could incorrectly pass depending on the timing). Note also that if we take this approach, we can probably just remove tryLock(), at least until we really need it in an application.

tryLock() is a utility function written to test lock() and unlock(). It's not used otherwise. I think the current code is good as-is, because we are only testing lock()/unlock() and the test only passes if tryLock() finds a contesting lock. As you've noted, we can't rely on the order of writes without actually testing the lock.

  • read() at the parent could block if there's an unexpected bug in the child part, and in the worst case the whole test could hang. I'd try avoiding that scenario as much as possible. For example, we can introduce a timeout for read(); we can also make sure the child really dies in some way.

I agree with this reasoning now. I've put in a select() that waits for 5 seconds and only reads if there's data available (1 byte).

  • this can be simplified:
          if (locked == 1) {
            was_locked = true;
          } else {
            was_locked = false;
          }
    ...
      EXPECT_TRUE(was_locked);
    
    as:
          EXPECT_EQ(1, locked);
    
    and we can remove was_locked completely. Same for TestMultipleFilesForked? with was_not_locked.

This is also updated accordingly. :)

Thank you for the review jinmei :) I think the redesigned code looks pretty clean compared to what we had before.

comment:53 in reply to: ↑ 52 ; follow-up: Changed 6 years ago by jinmei

Replying to muks:

The latest version seems closer to merge. I still have some comments
(below), but they are not design level ones and are generally minor or
trivial.

  • I'd rather specify the full path than relying on the B10_FROM_ trick with setting in main() globally:
      InterprocessSync* sync = new InterprocessSyncFile("test");
    
    Setting this environment variable could affect other tests, and the full path is in fact closer to what would be used in the actual deployment. Note that if we do this we can/should remove setenv from main().

I don't follow completely here. The constructor accepts just the component name, and builds a path according to the runtime/test environment. It has to be able to accept different paths as the install prefix may not be writable or even exist. I would not prefer allowing to specify the full pathname to the lock file by the caller, as the lockfile locations can be managed by us, or may not even
be an actual file.

Ah, okay, I was confused a bit; we cannot really specify the directory
for the file. I have some related comment on the revised version.
See the "B10_FROM_xxx trick" thing on the log test comments.

Comments on the revised version:

general

We need more detailed documentation for InterprocessSync and
InterprocessSyncLocker. And, I'd generally suggest we complete doc
before review, because we often notice things like some hidden issue
or missing corner cases when we try to describe classes/methods (that
has been the case myself). But for this particular ticket, we've
already spent too much time on it. So maybe we should even defer it
to a separate ticket and close this one sooner.

I made a few more editorial fixes.

interprocess_sync.h

  • Why did you make lock() variants protected and use friend to allow InterprocessSyncLocker to call them (from a quick look and check that seems to be the only purpose of the friend)? In my understanding the more common convention is to just make such virtual methods public (no friend). In some style it's recommended to make pure virtuals protected, but in that case the convention is (in my understanding) to provide corresponding non virtual methods to the base class, not relying on friend. Besides, (I believe you know that but) friend is too powerful and should be considered a last resort weapon. In general, if the friend class only needs to access friend's protected (not private) members, I believe it's better to achieve that goal with a less powerful tool than friend.
  • I'd use 'const string&', and no, I don't think I modified it myself with my previous comment:
        InterprocessSync(const std::string component_name) :
            component_name_(component_name), is_locked_(false)
        {}
    
    and make component_name_ const. but this time I did it myself to save our time. Same for InterprocessSyncFile, and I modified it, too.

interprocess_sync_file

  • I'd (explicitly) declare the InterprocessSyncFile destructor as virtual. I thought it's also documented in the guideline.
  • If we use build dir, it's better to name the corresponding env variable "B10_FROM_BUILD". "FROM_SOURCE" could be used to really specify a source directory.

interprocess_sync_file_unittest

  • We might want to simplify timeval initialization:
          struct timeval tv = { 5, 0 };
    
    but it may not be a big deal as we cannot make it const with that due to the select(2) interface.
  • I'd add comments why we use select(). Also explain why it's 5 seconds, if only to say it's an arbitrary choice.

logger.h

  • What happens if we pass NULL to setInterprocessSync? (it may depend on that, but) I'd clarify this method is exception free, and that when non NULL the pointer must be delete-able and the ownership is transferred to Logger (so the caller doesn't have to / shouldn't delete it)
  • a minor point, but I'd place setInterprocessSync in logger.h somewhere separated from the log level/severity, etc related methods. the current position seems a bit awkward.

logger_impl.cc

  • I believe you can now avoid including <memory>
  • LoggerImpl? constructor: a minor deal, but you could initialize sync_ in the initialization list:
    LoggerImpl::LoggerImpl(const string& name) :
        name_(expandLoggerName(name)),
        logger_(log4cplus::Logger::getInstance(name_)),
        sync_(new InterprocessSyncFile("logger"))
    
  • LoggerImpl::outputRaw: the locker is now an abstract object, not necessarily a file based:
    LoggerImpl::outputRaw(const Severity& severity, const string& message) {
        // Use a lock file for mutual exclusion from other processes to
        // avoid log messages getting interspersed
    

log test

  • I suspect we don't have the chicken-egg problem here.
  • I also suspect we don't need the B10_FROM_xxx trick here some (or perhaps most) of the cases because we replace the sync before the original one tries to open a file and not many tests actually try to dump logs. btw the variable should better be named B10_FROM_BUILD where needed.
  • technically, we should test the lock is acquired before the log output and released after that. The Lock test doesn't actually confirm it. But now that we've spent too much time on this ticket, if it's not easy to achieve, the current test is probably okay as a compromise.

bind10_src

  • remove_lock_files needs a test.

comment:54 Changed 6 years ago by jinmei

  • Owner changed from jinmei to muks

comment:55 in reply to: ↑ 53 ; follow-up: Changed 6 years ago by muks

  • Owner changed from muks to jinmei

Replying to jinmei:

We need more detailed documentation for InterprocessSync and
InterprocessSyncLocker. And, I'd generally suggest we complete doc
before review, because we often notice things like some hidden issue
or missing corner cases when we try to describe classes/methods (that
has been the case myself). But for this particular ticket, we've
already spent too much time on it. So maybe we should even defer it
to a separate ticket and close this one sooner.

This has been added now.

interprocess_sync.h

  • Why did you make lock() variants protected and use friend to allow InterprocessSyncLocker to call them (from a quick look and check that seems to be the only purpose of the friend)? In my understanding the more common convention is to just make such virtual methods public (no friend). In some style it's recommended to make pure virtuals protected, but in that case the convention is (in my understanding) to provide corresponding non virtual methods to the base class, not relying on friend. Besides, (I believe you know that but) friend is too powerful and should be considered a last resort weapon. In general, if the friend class only needs to access friend's protected (not private) members, I believe it's better to achieve that goal with a less powerful tool than friend.

It's done to somewhat force the use of InterprocessSyncLocker? rather than
InterprocessSync? for locking (so that any locks are cleared up when the
basic block is quit). This needs the friend as the lock methods in the
InterprocessSync? class are protected. I've described it in the class
documentation now.

  • I'd use 'const string&', and no, I don't think I modified it myself with my previous comment:
        InterprocessSync(const std::string component_name) :
            component_name_(component_name), is_locked_(false)
        {}
    
    and make component_name_ const. but this time I did it myself to save our time. Same for InterprocessSyncFile, and I modified it, too.

*nod* :)

interprocess_sync_file

  • I'd (explicitly) declare the InterprocessSyncFile destructor as virtual. I thought it's also documented in the guideline.

Done.

  • If we use build dir, it's better to name the corresponding env variable "B10_FROM_BUILD". "FROM_SOURCE" could be used to really specify a source directory.

Done.

interprocess_sync_file_unittest

  • We might want to simplify timeval initialization:
          struct timeval tv = { 5, 0 };
    
    but it may not be a big deal as we cannot make it const with that due to the select(2) interface.

I was not sure from the posix definition of struct timeval if it can allow
other members. This is why the members were set individually.

  • I'd add comments why we use select(). Also explain why it's 5 seconds, if only to say it's an arbitrary choice.

Done.

logger.h

  • What happens if we pass NULL to setInterprocessSync? (it may depend on that, but) I'd clarify this method is exception free, and that when non NULL the pointer must be delete-able and the ownership is transferred to Logger (so the caller doesn't have to / shouldn't delete it)

A valid sync object always has to be present. I've made NULL do no change
now.

  • a minor point, but I'd place setInterprocessSync in logger.h somewhere separated from the log level/severity, etc related methods. the current position seems a bit awkward.

I've moved it to the end of the list now. Please see if it's ok.

logger_impl.cc

  • I believe you can now avoid including <memory>

*nod*. :) Removed.

  • LoggerImpl? constructor: a minor deal, but you could initialize sync_ in the initialization list:
    LoggerImpl::LoggerImpl(const string& name) :
        name_(expandLoggerName(name)),
        logger_(log4cplus::Logger::getInstance(name_)),
        sync_(new InterprocessSyncFile("logger"))
    

Done.

  • LoggerImpl::outputRaw: the locker is now an abstract object, not necessarily a file based:
    LoggerImpl::outputRaw(const Severity& severity, const string& message) {
        // Use a lock file for mutual exclusion from other processes to
        // avoid log messages getting interspersed
    

Comment was updated.

log test

  • I suspect we don't have the chicken-egg problem here.
  • I also suspect we don't need the B10_FROM_xxx trick here some (or perhaps most) of the cases because we replace the sync before the original one tries to open a file and not many tests actually try to dump logs. btw the variable should better be named B10_FROM_BUILD where needed.

Not most of the cases, but only one testcase where we actually test the
logger for the lock. In other places where a mock object is not used,
this is still needed. I've changed the use of the env variable to
B10_FROM_BUILD everywhere now.

  • technically, we should test the lock is acquired before the log output and released after that. The Lock test doesn't actually confirm it. But now that we've spent too much time on this ticket, if it's not easy to achieve, the current test is probably okay as a compromise.

This will require something similar to the last design. It would need a wrapper
around log4cplus so that logging calls are intercepted by a mock object *between*
the lock/unlock. It would result in the same sort of ugliness as before.

bind10_src

  • remove_lock_files needs a test.

Done. :)

comment:56 in reply to: ↑ 55 ; follow-up: Changed 6 years ago by jinmei

Sorry, on even a closer look, I still found a substantial issue:
We need to tweak the lock file path for all test. Otherwise a test
could crash like this if the path isn't writable for the test:

[ RUN      ] DataSrcTest.Query
*** Exception derived from isc::exception thrown:
    file: interprocess_sync_file.cc
    line: 67
    what: Unable to use interprocess sync lockfile: /usr/var/bind10-devel/logger_lockfile
terminate called throwing an exception/bin/sh: line 1: 73461 Abort trap: 6

So, I suggest this:

  • first, use a separate environment variable than B10_FROM_BUILD, e.g. B10_LOCKFILE_DIR_FROM_BUILD. The former is generic and can be used for a different purpose only for some selected situations. Now we are going to set it (almost) always for all tests, it could break something if we use the generic variable.
  • second, set the variable in logger_unittest_support.cc:initLogger(). rather than in each test's run_unittests. The latter approach is a duplicate and would be easy to miss.
  • btw, as far as I can see log/tests/run_initializer_unittests doesn't need to set this variable. That should be same for init_logger_test.sh or tests/run_unittests if we add this trick in initLogger(). Or even any other of them. Please check.

styles

there are still some issues. I've fixed them.

documentation

I originally planned to defer or ignore this, but now we still need to
do some more ting...I'd explain a bit more about
InterprocessSyncLocker:

  • describe parameters
  • describe return value (and for false on lock/unlock, whether it can normally happen or not, and what the caller would have to do with it)
  • any possible exceptions (and requirement on this point for derived classes)

bind10_test

  • not a big deal, but since this is already auto generated, you could embed @abs_top_builddir@ in the py.in file rather than set it via the Makefile. But that's a minor point, so I'd leave it to you.
  • I think it's safe to create/delete the dir in setup/teardown methods. please also check this doesn't break distcheck.
  • isn't it better to use os.path.exists than os.path.isfile?
  • what's the purpose of the explicit close?
                    open(fname, "w").close()
    
    If it's for the false positive about open file like due to the stupidity of Python as we discussed #1828, I'd avoid making our own code look counter intuitive due to that (besides, my python interpreter doesn't complain about it without close() even if it barks for other false positive cases). If we really want to keep it, please add a comment about why we bother to do that.
  • I'd confirm the file exists after the possible open:
                if not os.path.isfile(fname):
                    open(fname, "w").close()
                self.assertTrue(os.path.isfile(fname))
    

Replying to muks:

interprocess_sync.h

  • Why did you make lock() variants protected and use friend to allow InterprocessSyncLocker to call them (from a quick look and check that seems to be the only purpose of the friend)?

It's done to somewhat force the use of InterprocessSyncLocker? rather than
InterprocessSync? for locking (so that any locks are cleared up when the
basic block is quit). This needs the friend as the lock methods in the
InterprocessSync? class are protected. I've described it in the class
documentation now.

Hmm, okay. I'm not still convinced if it's a reasonable use of
friend, but I see the advantage of restricting the lock usage. So
this part is okay.

interprocess_sync_file_unittest

  • We might want to simplify timeval initialization:
          struct timeval tv = { 5, 0 };
    
    but it may not be a big deal as we cannot make it const with that due to the select(2) interface.

I was not sure from the posix definition of struct timeval if it can allow
other members. This is why the members were set individually.

Okay.

logger.h

  • What happens if we pass NULL to setInterprocessSync? (it may depend on that, but) I'd clarify this method is exception free, and that when non NULL the pointer must be delete-able and the ownership is transferred to Logger (so the caller doesn't have to / shouldn't delete it)

A valid sync object always has to be present. I've made NULL do no change
now.

Hmm, frankly, I don't like this behavior: if NULL isn't expected I'd
throw; otherwise I'd use it somewhat useful, like "disabling lock".
But at this stage for this ticket I won't argue.

  • a minor point, but I'd place setInterprocessSync in logger.h somewhere separated from the log level/severity, etc related methods. the current position seems a bit awkward.

I've moved it to the end of the list now. Please see if it's ok.

Yes, that looks better.

log test

  • I suspect we don't have the chicken-egg problem here.

What about this? I believe we can just auto generate .cc and .h. But
again, we should probably prioritize completing this ticket than
trying to solve all small problems...

  • technically, we should test the lock is acquired before the log output and released after that. The Lock test doesn't actually confirm it. But now that we've spent too much time on this ticket, if it's not easy to achieve, the current test is probably okay as a compromise.

This will require something similar to the last design. It would
need a wrapper around log4cplus so that logging calls are
intercepted by a mock object *between* the lock/unlock. It would
result in the same sort of ugliness as before.

I don't necessarily think so; I can imagine having the mock
lock/unlock produce additional logs, then check the ordering of the
resulting logged message some way. This approach shouldn't require a
change to the base and main (non test) classes. But that probably
requires an additional setup like a checker script, and due to the
excessive time we've spent for this ticket, I should probably live
with the current test.

comment:57 Changed 6 years ago by jinmei

  • Owner changed from jinmei to muks

comment:58 in reply to: ↑ 56 Changed 6 years ago by muks

  • Owner changed from muks to jinmei

Replying to jinmei:

Sorry, on even a closer look, I still found a substantial issue:
We need to tweak the lock file path for all test. Otherwise a test
could crash like this if the path isn't writable for the test:

[ RUN      ] DataSrcTest.Query
*** Exception derived from isc::exception thrown:
    file: interprocess_sync_file.cc
    line: 67
    what: Unable to use interprocess sync lockfile: /usr/var/bind10-devel/logger_lockfile
terminate called throwing an exception/bin/sh: line 1: 73461 Abort trap: 6

So, I suggest this:

  • first, use a separate environment variable than B10_FROM_BUILD, e.g. B10_LOCKFILE_DIR_FROM_BUILD. The former is generic and can be used for a different purpose only for some selected situations. Now we are going to set it (almost) always for all tests, it could break something if we use the generic variable.
  • second, set the variable in logger_unittest_support.cc:initLogger(). rather than in each test's run_unittests. The latter approach is a duplicate and would be easy to miss.
  • btw, as far as I can see log/tests/run_initializer_unittests doesn't need to set this variable. That should be same for init_logger_test.sh or tests/run_unittests if we add this trick in initLogger(). Or even any other of them. Please check.

These points are addressed now. Many unittests needed the B10_LOCKFILE_DIR_FROM_BUILD set. I've also added a null InterprocessSync? implementation so that logger_example (which is a standalone utility) can use it. I've verified that make check at toplevel passes without the install prefix dir being available.

styles

there are still some issues. I've fixed them.

:)

documentation

I originally planned to defer or ignore this, but now we still need to
do some more ting...I'd explain a bit more about
InterprocessSyncLocker:

  • describe parameters
  • describe return value (and for false on lock/unlock, whether it can normally happen or not, and what the caller would have to do with it)
  • any possible exceptions (and requirement on this point for derived classes)

Done.

bind10_test

  • not a big deal, but since this is already auto generated, you could embed @abs_top_builddir@ in the py.in file rather than set it via the Makefile. But that's a minor point, so I'd leave it to you.
  • I think it's safe to create/delete the dir in setup/teardown methods. please also check this doesn't break distcheck.
  • isn't it better to use os.path.exists than os.path.isfile?
  • what's the purpose of the explicit close?
                    open(fname, "w").close()
    
    If it's for the false positive about open file like due to the stupidity of Python as we discussed #1828, I'd avoid making our own code look counter intuitive due to that (besides, my python interpreter doesn't complain about it without close() even if it barks for other false positive cases). If we really want to keep it, please add a comment about why we bother to do that.

The .close() pretty much indicates that we don't (directly) use that file beyond that point. It doesn't appear to me as cluttered code.

  • I'd confirm the file exists after the possible open:
                if not os.path.isfile(fname):
                    open(fname, "w").close()
                self.assertTrue(os.path.isfile(fname))
    

Done. I've added a few more such asserts to make everything go lock-step.

Replying to muks:

logger.h

  • What happens if we pass NULL to setInterprocessSync? (it may depend on that, but) I'd clarify this method is exception free, and that when non NULL the pointer must be delete-able and the ownership is transferred to Logger (so the caller doesn't have to / shouldn't delete it)

A valid sync object always has to be present. I've made NULL do no change
now.

Hmm, frankly, I don't like this behavior: if NULL isn't expected I'd
throw; otherwise I'd use it somewhat useful, like "disabling lock".
But at this stage for this ticket I won't argue.

I've made the code throw now, updated the doc and have added a unit test that checks it.

log test

  • I suspect we don't have the chicken-egg problem here.

What about this? I believe we can just auto generate .cc and .h. But
again, we should probably prioritize completing this ticket than
trying to solve all small problems...

This is also done now. The .cc and .h are generated by the .mes file and are no longer checked in the tree.

  • technically, we should test the lock is acquired before the log output and released after that. The Lock test doesn't actually confirm it. But now that we've spent too much time on this ticket, if it's not easy to achieve, the current test is probably okay as a compromise.

This will require something similar to the last design. It would
need a wrapper around log4cplus so that logging calls are
intercepted by a mock object *between* the lock/unlock. It would
result in the same sort of ugliness as before.

I don't necessarily think so; I can imagine having the mock
lock/unlock produce additional logs, then check the ordering of the
resulting logged message some way. This approach shouldn't require a
change to the base and main (non test) classes. But that probably
requires an additional setup like a checker script, and due to the
excessive time we've spent for this ticket, I should probably live
with the current test.

This has been implemented now.

If the branch looks ok, let's get this in.

comment:59 Changed 6 years ago by jinmei

Quick feedback:

  • distcheck fails:
    rm -f Makefile
    ERROR: files left in build directory after distclean:
    ./logger_lockfile
    ./test1_lockfile
    ./test2_lockfile
    ./test_lockfile
    
    In general, if your branch introduces a new file, it often breaks distcheck in some way, so it's better to check it beforehand (distcheck is time consuming, so it's understandable if you're tempted to skip it for much trivial tasks, like fixing a bug within a single existing file. But I personally try to run it when my branch introduces a new file).

comment:60 follow-up: Changed 6 years ago by muks

Passes distcheck now. :)

comment:61 in reply to: ↑ 60 ; follow-up: Changed 6 years ago by jinmei

Replying to muks:

Passes distcheck now. :)

Okay, the substantial one was resolved.

As usual with a certain amount of diff, however, I have some more
comments. See below.

Other general things

Please don't forget updating changelog with taking into account the
review comment. Please also remember documenting the magic env
variables (which I think should go to a separate ticket at this stage).

top Makefile

Why are we using clean-local? Can't we use CLEANFILEs? What's the
purpose of '-' before 'rm'?

bind10_src.py

  • Should we still use B10_FROM_BUILD for the lock path?
        if "B10_FROM_BUILD" in os.environ:
            lpath = os.environ["B10_FROM_BUILD"]
    
    See also the similar comment on interprocess_sync_file.cc.

bind10_test.py

  • The intent of the trick in check_environment_unchanged isn't obvious. Please comment. (and does it really make sense? at least the tests seem to pass without this)

interprocess_sync_file.cc

  • Why are we still using B10_FROM_BUILD in InterprocessSyncFile::do_lock?
            const char* const env = getenv("B10_FROM_BUILD");
            if (env != NULL) {
                lockfile_path = env;
            }
    
    It doesn't seem to have any effect now (it doesn't cause any regression if we disable it.
  • tryLock doesn't seem to be necessary any more. In general, if we don't use a particular feature and don't have an immediate plan to use it, I'd apply the yagni principle and clean it up.

logger_manager.cc

What's the purpose of turning off the lock? Maybe the reason
explained in 6b3132f71b397e4925b80ed1dc1e3cb834be1294? It's not
obvious from the code, so please add some more detailed comments about
why.

log/tests/Makefile.am

Using AM_V_GEN seems to make sense now that we use the silent mode by
default. But in that case it's even better to use it for all similar
cases for consistency. Not an issue for this branch though. Maybe
create a new ticket?

log/tests/logger_example.cc

What are these setInterprocessSync calls testing? The same as
logger_manager trick? Please make some comments about the intent.

python tests

It doesn't seem to be sustainable to require the
B10_LOCKFILE_DIR_FROM_BUILD trick in all Makefile's. I'd, e.g.,
extend lib/python/isc/log/log.cc:resetUnitTestRootLogger() to set the
environment variable. But that's probably too much for this branch.
Let's create a separate ticket and defer the task to it.

interprocess_sync_null.h

  • Assuming this file could be used outside the source tree, I'd use <util/interprocess_sync.h> here:
    #include "util/interprocess_sync.h"
    
  • The declaration and definition of the destructor seem to be able to be skipped here.

interprocess_sync_file_unittest.cc

  • What's the purpose of the empty string at the beginning of unlink?
      EXPECT_EQ (0, unlink("" TEST_DATA_TOPBUILDDIR "/test_lockfile"));
    
  • And what's the purpose of these unlinks? Would something wrong happen without these? If so, it's generally better to do that by introducing a fixture and doing the cleanup in the destructor or in tearDown() and/or making sure the test starts cleanly by doing the cleanup in the constructor or in setUp().

interprocess_sync_null_unittest.cc

  • What's the intent of repeating lock variants 5 times?

logger_lock_test.cc

  • This comment seems to be a copy-paste and not make sense in this context:
    /// \brief Test InitLogger
    ///
    /// A program used in testing the logger that initializes logging using
    /// initLogger(), then outputs several messages at different severities and
    /// debug levels.  An external script sets the environment variables and checks
    /// that they have the desired effect.
    
  • Hmm, this output makes me notice we do unlock twice (once explicitly, then via the destructor). This may be safe in the file lock implementation, but I'd personally prevent that from happening at the InterprocessSyncLocker level. That's especially so as this implementation intends to force applications to use the InterprocessSyncLocker wrapper. And, in any event, if we want to require derived classes of InterprocessSync that duplicate unlock be safe, I think that should be explicitly documented.

comment:62 Changed 6 years ago by jinmei

  • Owner changed from jinmei to muks

comment:63 in reply to: ↑ 61 ; follow-up: Changed 6 years ago by muks

  • Owner changed from muks to jinmei

Replying to jinmei:

Please don't forget updating changelog with taking into account the
review comment. Please also remember documenting the magic env
variables (which I think should go to a separate ticket at this stage).

How does this look:

XYZ.    [func]*		muks
        The logger now uses a lockfile named `logger_lockfile' that is
        created in the local state directory to mutually separate
        individual logging operations from various processes. This is
        done so that log messages from different processes don't mix
        together in the middle of lines. The `logger_lockfile` is created
	with file permission mode 0660. BIND 10's local state directory
	should be writable and perhaps have g+s mode bit so that the
	`logger_lockfile` can be opened by a group of processes.

top Makefile

Why are we using clean-local? Can't we use CLEANFILEs? What's the
purpose of '-' before 'rm'?

Yes CLEANFILES can be used instead. Both are equivalent (whereas clean-local allows commands to be run for cleanup, CLEANFILES only deletes the files). I've updated the file. The '-' in front of 'rm' instructs make to ignore error exit status of rm and continue making.

bind10_src.py

  • Should we still use B10_FROM_BUILD for the lock path?
        if "B10_FROM_BUILD" in os.environ:
            lpath = os.environ["B10_FROM_BUILD"]
    
    See also the similar comment on interprocess_sync_file.cc.

In both cases, I've retained B10_FROM_BUILD and override it with B10_LOCKFILE_DIR_FROM_BUILD, so that at least if B10_FROM_BUILD is set, it'll use the corresponding dir for creating lockfiles.

bind10_test.py

  • The intent of the trick in check_environment_unchanged isn't obvious. Please comment. (and does it really make sense? at least the tests seem to pass without this)

That code has been removed now (it was necessary in a temporary step that never was committed).

interprocess_sync_file.cc

  • Why are we still using B10_FROM_BUILD in InterprocessSyncFile::do_lock?
            const char* const env = getenv("B10_FROM_BUILD");
            if (env != NULL) {
                lockfile_path = env;
            }
    
    It doesn't seem to have any effect now (it doesn't cause any regression if we disable it.

Same explanation as above.

  • tryLock doesn't seem to be necessary any more. In general, if we don't use a particular feature and don't have an immediate plan to use it, I'd apply the yagni principle and clean it up.

It is used in the unit tests to test the lock implementation. It's best if tryLock() is in the place it is, than in the unit tests as it would depend upon the lock implementation and uses similar code (the do_lock() method).

logger_manager.cc

What's the purpose of turning off the lock? Maybe the reason
explained in 6b3132f71b397e4925b80ed1dc1e3cb834be1294? It's not
obvious from the code, so please add some more detailed comments about
why.

A comment has been added.

log/tests/Makefile.am

Using AM_V_GEN seems to make sense now that we use the silent mode by
default. But in that case it's even better to use it for all similar
cases for consistency. Not an issue for this branch though. Maybe
create a new ticket?

Ticket #2015 has been created for it.

log/tests/logger_example.cc

What are these setInterprocessSync calls testing? The same as
logger_manager trick? Please make some comments about the intent.

A comment has been added.

python tests

It doesn't seem to be sustainable to require the
B10_LOCKFILE_DIR_FROM_BUILD trick in all Makefile's. I'd, e.g.,
extend lib/python/isc/log/log.cc:resetUnitTestRootLogger() to set the
environment variable. But that's probably too much for this branch.
Let's create a separate ticket and defer the task to it.

Even that is not enough. The logger is called/initialized by the tests in different ways. We can investigate this later, but setting the env variable in the Makefiles is necessary now for tests to pass.

interprocess_sync_null.h

  • Assuming this file could be used outside the source tree, I'd use <util/interprocess_sync.h> here:
    #include "util/interprocess_sync.h"
    

This has been updated, as have some other header files.

  • The declaration and definition of the destructor seem to be able to be skipped here.

As InterprocessSync? has an implementation, I have not added one to InterprocessSyncNull? as it'll be empty too.

interprocess_sync_file_unittest.cc

  • What's the purpose of the empty string at the beginning of unlink?
      EXPECT_EQ (0, unlink("" TEST_DATA_TOPBUILDDIR "/test_lockfile"));
    

That came out of habit. I have removed these.

  • And what's the purpose of these unlinks? Would something wrong happen without these? If so, it's generally better to do that by introducing a fixture and doing the cleanup in the destructor or in tearDown() and/or making sure the test starts cleanly by doing the cleanup in the constructor or in setUp().

The unlinks are used to cleanup (remove the test lockfiles). Without these, distcheck would complain about stale files that exist after make clean. They can be removed by a clean target, but in this case it's better if the test cleans up. Is a fixture absolutely necessary here?

interprocess_sync_null_unittest.cc

  • What's the intent of repeating lock variants 5 times?

It is just an arbitrary number. Any number of times lock() and unlock() are called, the null implementation should always return true.

This has changed to "just 4" calls in a row now due to code changes. :)

logger_lock_test.cc

  • This comment seems to be a copy-paste and not make sense in this context:
    /// \brief Test InitLogger
    ///
    /// A program used in testing the logger that initializes logging using
    /// initLogger(), then outputs several messages at different severities and
    /// debug levels.  An external script sets the environment variables and checks
    /// that they have the desired effect.
    

Eek. This has been fixed.

  • Hmm, this output makes me notice we do unlock twice (once explicitly, then via the destructor). This may be safe in the file lock implementation, but I'd personally prevent that from happening at the InterprocessSyncLocker level. That's especially so as this implementation intends to force applications to use the InterprocessSyncLocker wrapper. And, in any event, if we want to require derived classes of InterprocessSync that duplicate unlock be safe, I think that should be explicitly documented.

Now the destructor checks the lock status before unlocking. I have updated the class comment about maintaining this status variable, and also updated tests towards it.

comment:64 Changed 6 years ago by jinmei

Okay, this was false positive due to skipping re-running ./configure
(I thought automake regenerated these files, not sure why it didn't work):

Not looking into it yet, but logger lock test failed:

/bin/sh /Users/jinmei/src/isc/git/bind10-1704-2/src/lib/log/tests/logger_lock_test.sh
-n Testing that logger acquires and releases locks correctly:
4d3
< LOGGER_LOCK_TEST: UNLOCK
 FAIL
Last edited 6 years ago by jinmei (previous) (diff)

comment:65 in reply to: ↑ 63 ; follow-up: Changed 6 years ago by jinmei

Replying to muks:

In short, the branch now looks ready for merge. Thanks for your
patience:-)

Responding to some selected points:

How does this look:

XYZ.    [func]*		muks
        The logger now uses a lockfile named `logger_lockfile' that is
        created in the local state directory to mutually separate
        individual logging operations from various processes. This is
        done so that log messages from different processes don't mix
        together in the middle of lines. The `logger_lockfile` is created
	with file permission mode 0660. BIND 10's local state directory
	should be writable and perhaps have g+s mode bit so that the
	`logger_lockfile` can be opened by a group of processes.

Looks okay.

bind10_src.py

  • Should we still use B10_FROM_BUILD for the lock path?
        if "B10_FROM_BUILD" in os.environ:
            lpath = os.environ["B10_FROM_BUILD"]
    
    See also the similar comment on interprocess_sync_file.cc.

In both cases, I've retained B10_FROM_BUILD and override it with B10_LOCKFILE_DIR_FROM_BUILD, so that at least if B10_FROM_BUILD is set, it'll use the corresponding dir for creating lockfiles.

Hmm, I'm not sure if it really makes sense, but I don't oppose. In
any case, we'll need to clarify the magic variables somewhere.

  • And what's the purpose of these unlinks? Would something wrong happen without these? If so, it's generally better to do that by introducing a fixture and doing the cleanup in the destructor or in tearDown() and/or making sure the test starts cleanly by doing the cleanup in the constructor or in setUp().

The unlinks are used to cleanup (remove the test lockfiles). Without these, distcheck would complain about stale files that exist after make clean. They can be removed by a clean target, but in this case it's better if the test cleans up. Is a fixture absolutely necessary here?

Hmm, it does not necessarily have to be a fixture, but, technically,
adding cleanup code in the end of a test case is fragile because we
might skip it due to ASSERT_xx failure (and the missing cleanup might
cause unrelated false positives). Also, it's more error prone because
we could easily add another similar test case but forgetting the
cleanup. For the distcheck case you explained, the first reason
doesn't matter because the check will fail anyway, but in general I
prefer keeping it as safe as possible.

That said, it's quite minor at this stage. Let's keep it as it is and
merge. If we encounter a problem due to the way we clean it up we can
then revisit it.

comment:66 Changed 6 years ago by jinmei

  • Owner changed from jinmei to muks

comment:67 Changed 6 years ago by jinmei

  • Total Hours changed from 0 to 12.25

comment:68 in reply to: ↑ 65 Changed 6 years ago by muks

Replying to jinmei:

Hmm, it does not necessarily have to be a fixture, but, technically,
adding cleanup code in the end of a test case is fragile because we
might skip it due to ASSERT_xx failure (and the missing cleanup might
cause unrelated false positives). Also, it's more error prone because
we could easily add another similar test case but forgetting the
cleanup. For the distcheck case you explained, the first reason
doesn't matter because the check will fail anyway, but in general I
prefer keeping it as safe as possible.

That does make good sense for cleanup. We should update it at some point.

comment:69 Changed 6 years ago by muks

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged to master:

* f71a0c6 [1704] constify
* 7987b09 [1704] Use the is_locked_ flag to call unlock() just once
* 1c0e6f8 [1704] Fix comment describing the test in logger_lock_test.cc
* 52edf76 [1704] Remove empty strings used when concatenating to TEST_DATA_TOPBUILDDIR
* cc84b0a [1704] Change search style for includes in the interprocess_sync_* headers
* f68857b [1704] Don't update the environment inside check_environment_unchanged()
* 0785152 [1704] Clarify why the null logger is used in logger_example
* 6306fee [1704] Clarify why a null sync object is used in LoggerManager::readLocalMessageFile()
* 8d55fd2 [1704] Clarify why we update the environment in check_environment_unchanged()
* a0bf4ce [1704] Use CLEANFILES instead of clean-local
* 0887605 [1704] style matters: fold long lines, explicit virtual for inherited methods.
* 56c895c [1704] Cleanup logger_lockfile during make clean
* 60c3dce [1704] Delete test lockfiles when done
* d6859a6 [1704] Change env variable used in tests for lockfiles
* 162e151 [1704] Change task name of object
* 1f12f3c [1704] Add some tests for InterprocessSyncNull class too
* 3e8b461 [1704] Use 'logger' as the task name for all interprocess sync objects inside log/
* 949d3d2 [1704] Check that logger acquires and releases locks correctly when logging messages
* e04531f [1704] Add note about exception thrown
* 3098730 [1704] Set B10_LOCKFILE_DIR_FROM_BUILD for various tests
* 6b3132f [1704] Use a null InterprocessSync object for some tests
* 865669e [1704] Update documentation
* 58c0a8e [1704] Set and use B10_LOCKFILE_DIR_FROM_BUILD in tests
* 8c38fe1 [1704] Remove log_test_messages.{cc,h} from the tree
* e007ac3 [1704] Add targets to create log_test_messages.{h,cc} from .mes
* fcb6345 [1704] Throw if NULL is passed to logger.setInterprocessSync()
* cd37331 [1704] Use @abs_top_builddir@ directly in the .py.in file instead of using an env variable
* 277be10 [1704] Assert that the lockfiles exist
* 8eb4eb3 [1704] Assert that the lockfile_test directory exists, and is removed
* e72bc12 [1704] Move test directory creation to setUp() / tearDown()
* 075eef9 [1704] style matters: don't omit {}, func name convention, typo.
* 2e95197 [1704] Test remove_lock_files()
* 7c1277b [1704] Use B10_FROM_BUILD instead of B10_FROM_SOURCE (contd.)
* 994827b [1704] Use B10_FROM_BUILD instead of B10_FROM_SOURCE
* 39ebe9d [1704] Don't set B10_FROM_SOURCE for logger unittest
* 9227b2d [1704] Add documentation for InterprocessSync and InterprocessSyncFile
* 1d5bd47 [1704] Add documentation for setInterprocessSync()
* 328f878 [1704] In case NULL is passed as the sync object, do nothing and return
* 16fc370 [1704] Move code to different places (no functional changes)
* d575fba [1704] Add a comment about use of select() in the test
* bd5494f [1704] Initialize sync_ in the initialization list
* 8b555fc [1704] Update comment about using locking in the logger
* 93fb5ca [1704] Remove obsolete include
* caa5bb9 [1704] Make InterprocessSyncFile's destructor virtual
* 227d017 [1704] style matters: long line, () for return, const, { position.
* 30198a3 [1704] Use top_builddir during tests as top_srcdir may not be writable
* c0a328b [1704] Unit test the logger for the lock
* b45a54e [1704] Open lock files lazily
* 8c455b3 [1704] Add a select() to the parent reader in case the child messes up
* 31b153d [1704] Replace TEST_F with TEST
* 10b434b [1704] Minimize code
* db2fa0c [1704] fd_ cannot be -1 here
* 09e8c9e [1704] Use a helper for lock/unlock/tryLock as they use the same pattern
* 2bf7dad [1704] Remove separate class hierarchy for InterprocessSyncLocker
* 876dd74 [1704] editorial nits: folded long lines, position of '*', sizeof(), constify.
* 17f327d [1704] Add proper testcases for the file based locker
* 9639134 [1704] Delete lock files when BIND 10 shuts down
* 9427bd4 [1704] Update test environments to set B10_FROM_SOURCE
* 588bffd [1704] Add InterprocessSync interface and a file based implementation
* 106ec83 [1704] Use LoggerImpl as a testcase for InterprocessSync

Resolving as fixed. Thank you for the reviews Jinmei! :)

comment:70 follow-up: Changed 6 years ago by wilx

I see you have fixed it with your own solution, so this is just FYI. I have extended the locking mechanism I have recently added to log4cplus::FileAppender over to all appenders on trunk and 1.1.x branches of log4cplus. If you want to test it, it is enabled by UseLockFile and LockFile properties. Setting the LockFile property is necessary for all appenders not based on FileAppender.

comment:71 in reply to: ↑ 70 Changed 6 years ago by shane

Replying to wilx:

I see you have fixed it with your own solution, so this is just FYI. I have extended the locking mechanism I have recently added to log4cplus::FileAppender over to all appenders on trunk and 1.1.x branches of log4cplus. If you want to test it, it is enabled by UseLockFile and LockFile properties. Setting the LockFile property is necessary for all appenders not based on FileAppender.

Cool, that's nice to see. Of course we need to work with the 1.0.x code for the near- and medium-terms, so we needed our own solution. :)

Note: See TracTickets for help on using tickets.