wiki:WeeklyMinutes20121127

BIND 10 Team Call 2012-11-26

Attendance

  • Shane
  • Kambe
  • Jelte
  • Mukund
  • Jinmei
  • Jeff
  • Michal
  • Aharen
  • Larissa
  • Jeremy
  • Fujiwara

Sprint Progress (10 minutes)

http://bind10.isc.org/query?group=status&milestone=Sprint-20121204

Michal: problems with dependencies, next time check dependency graph
Jinmei: #2442 should now be possible
Jelte: a lot of parallelism at the beginning and at the end, but a lot of dependencies in the middle

Michal: it would be nice to have a tool to see the dependency graph
Shane: spend cycles on trying to find Trac dependency plugins?
Michal: not now, with beta coming soon
Jinmei: a tool may help, but another background reason is that sometimes the ticket description is incomplete or does not fully describe dependencies, or we don't recognize them

AP: Shane research dependency plugins for Trac (for after beta)
(have a look at http://trac-hacks.org/wiki/TracLinks and http://trac-hacks.org/wiki/MasterTicketsPlugin)
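
A minimal sketch of what such a tool could compute, assuming the "blocked by" relations have already been pulled out of Trac into a map. The ticket IDs, the `DependencyGraph` type, and `parallelWaves` are all hypothetical and not part of any Trac plugin. It groups tickets into waves that can be worked on in parallel, which would make the "parallel at both ends, serial in the middle" shape visible before the sprint starts.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical model: blockedBy[t] is the set of tickets that must be
// finished before ticket t can start. Every ticket, including those with
// no dependencies, must appear as a key.
typedef std::map<int, std::set<int>> DependencyGraph;

// Groups tickets into "waves". All tickets in one wave can be worked on in
// parallel once the earlier waves are done. Takes the graph by value
// because it is consumed while the waves are built.
std::vector<std::vector<int>> parallelWaves(DependencyGraph blockedBy) {
    std::vector<std::vector<int>> waves;
    while (!blockedBy.empty()) {
        // Tickets with no remaining blockers are ready now.
        std::vector<int> ready;
        for (const auto& entry : blockedBy) {
            if (entry.second.empty()) {
                ready.push_back(entry.first);
            }
        }
        if (ready.empty()) {
            throw std::runtime_error("dependency cycle or unknown ticket");
        }
        // Remove the ready tickets, then unblock whatever waited on them.
        for (int ticket : ready) {
            blockedBy.erase(ticket);
        }
        for (auto& entry : blockedBy) {
            for (int ticket : ready) {
                entry.second.erase(ticket);
            }
        }
        waves.push_back(ready);
    }
    return waves;
}
```

A sprint whose waves are wide at the start and end but one ticket wide in the middle has the shape Jelte describes.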

Jinmei: do we want to maximize parallelism by starting some tasks partially?
(crickets chirping)
Jinmei: maybe we should wait a couple of days if everyone has work right now?

Beta Discussion (15 minutes)

https://lists.isc.org/pipermail/bind10-dev/2012-November/004080.html
Jinmei has modified the plan for zone loading to meet our beta deadline. We'll discuss this new plan, as well as other issues.

I note that Jeremy added something like 15 tickets to Next-Sprint-Proposed, which seems tricky. Probably we should discuss the motivation there.
(wild guess: Things That Should Be Fixed For The Beta)

Jeremy: These are old tickets... some were already in Next-Sprint-Proposed in last couple of months.
Jeremy: Around half were in list of beta tasks. Other half were already marked as a High Defect Severity.

Jelte: Backlog for big release?
[ Shane & Jelte to discuss ]

Jinmei: 2 options once basic parser complete:

  • post-load validation
  • enhance in-memory version (out-of-order loading)

Shane: slight preference for #2, partially depends on progress of this sprint
Jelte: I prefer later part for a different reason - *IF* unvalidated data cannot crash our server
Shane: not sure... depends on the specific checks

Future BIND 9 Feature Support (15 minutes)

BIND 9 has a number of new features that will probably be coming soonish. We should discuss general approach for handling these in BIND 10, before we actually develop them in BIND 9.

  • EDNS-client-subnet (Google-sponsored idea; auth server can discover subnet of stub resolver) (To be funded for BIND9...what about BIND10?)

Shane: I thought about a hook for this
Michal: EDNS as an ACL check, could be reasonably simple

  • Geo-Location (Giving a different answer depending on where the query originates from, based on IP addr of resolver)

Jelte: If we have it, we should support client-subnet

Michal: like an ACL check, extended into dynamic libraries in the future

  • RRL (response rate limiting - technique to mitigate reflection (and to a lesser extent, flood) attacks)
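
A rough sketch of the ACL-style check suggested above for client-subnet and geo-location, assuming IPv4 only and a first-match rule list. `SubnetRule`, `inPrefix`, `selectAnswerSet`, and `ipv4` are hypothetical names for illustration and are not the BIND 10 ACL API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One ACL-style rule: clients inside network/prefixLen get the named
// answer set. All names here are hypothetical.
struct SubnetRule {
    std::uint32_t network;  // IPv4 network, host byte order
    int prefixLen;          // 0..32
    std::string answerSet;
};

// Packs four octets into a host-order IPv4 address.
std::uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8) | std::uint32_t(d);
}

// True if addr lies inside network/prefixLen.
bool inPrefix(std::uint32_t addr, std::uint32_t network, int prefixLen) {
    if (prefixLen <= 0) {
        return true;  // a /0 matches every address
    }
    const std::uint32_t mask =
        prefixLen >= 32 ? 0xFFFFFFFFu : ~(0xFFFFFFFFu >> prefixLen);
    return (addr & mask) == (network & mask);
}

// First matching rule wins; defaultSet is used when nothing matches.
// clientAddr would be the address carried in the EDNS client-subnet option
// when present, and the resolver's source address otherwise.
std::string selectAnswerSet(const std::vector<SubnetRule>& rules,
                            std::uint32_t clientAddr,
                            const std::string& defaultSet) {
    for (const SubnetRule& rule : rules) {
        if (inPrefix(clientAddr, rule.network, rule.prefixLen)) {
            return rule.answerSet;
        }
    }
    return defaultSet;
}
```

With the selection keyed on a single address argument, the same check serves both features: only the source of `clientAddr` differs.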

multi thread/process issues and logging (from Jinmei, 15 minutes)

also related to: what we should do for #2198
http://bind10.isc.org/ticket/2198

I'm afraid we've got lost in the use of log4cplus with multi processes and multi threads. The multi-process usage revealed the problem of mixed logs, and the recent introduction of multi threads introduced various race conditions and portability issues. To deal with these we've been developing our own hack at the lower level (not even using third party tools like the boost interprocess or thread related modules), such as inter-process file locking and in-house inter thread locks (the latter is used for other purposes, but we are going to end up introducing the try-lock primitive just for testing logging usage of the locks). We've then suffered from all lower level troubles.

IMO we're now losing major advantages of outsourcing non-critical business: avoid re-inventing wheels and concentrate on our core business, and yet having drawbacks of increasing dependency. I think we should revisit our approach based on these lessons, rather than trying to fix specific issues superficially.

Some options I can see are:

  • use our own logging module instead of log4cplus (we'll need to implement more, but we can control the things, and its availability).
  • say "use log4cplus 1.1 for advanced stuff like multi-process or multi-thread" and ask for using some workaround if it cannot be done (a strawman idea is to separate log files for each process; for multi-thread issue we may not be able to solve all problems this way but at least we can minimize the scope of the problems).
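
A minimal sketch of the strawman in the second option, one log file per process. The naming scheme and `perProcessLogName` are assumptions for illustration, not an existing BIND 10 or log4cplus setting.

```cpp
#include <sstream>
#include <string>

// Builds a log file name that is unique to one process, for example
// "b10-auth.1234.log". The scheme is hypothetical.
std::string perProcessLogName(const std::string& program, long pid) {
    std::ostringstream name;
    name << program << '.' << pid << ".log";
    return name.str();
}
```

On POSIX systems the pid would come from `getpid()`. Since each process then owns its file, no inter-process file lock is needed to keep log lines from mixing; the cost is that the logs have to be merged when reading them.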

Jeremy: summarize problems and go back to log4cplus developers
Jelte: API incompatibilities with latest version...
Jeremy: I've installed latest release
Shane: I think that makes sense, and in principle we can say "upgrade to 1.1 if you have problems"

Mukund: we should get log4cplus to work properly, if that takes too long we need an intermediate solution
Mukund: so ask log4cplus developers to fix the problem, we haven't scheduled implementing a logging library

Shane: do we have a list of issues?
Jeremy: there are tickets for everything, can find after call
Mukund will gather info to contact the log4cplus developers with
(note regarding those tickets: we need to identify which ones are indeed log4cplus issues and mention that in the ticket, preferably with a link to the log4cplus list/issue tracker)

++/-- style proposal (bikeshed of the week) (from Jinmei, 5 minutes)

I thought it was already in the style guideline but it doesn't seem to be so, so:

use the prefix style by default: i.e.,

  int i = 0;
  ++i; // instead of i++

it's a well known practice for non trivial types for performance reasons, but I suggest we do this for basic types like int for consistency. by being consistent, it will be easier to notice when we use the less efficient style when it really matters.

of course, sometimes the context requires a particular style, so there can be exceptions.

if it's not obvious from the context, leave a comment about why we use the different style.
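
To illustrate the performance point for non trivial types, here is a small sketch with a hypothetical `Counter` class that counts its own copies. The postfix form has to keep the old value so it can return it, which costs at least one copy; the prefix form needs none.

```cpp
#include <cstddef>

// Hypothetical type that records how many times it has been copied.
class Counter {
public:
    static std::size_t copies;

    Counter() : value_(0) {}
    Counter(const Counter& other) : value_(other.value_) { ++copies; }

    // Prefix: increment in place and return a reference, no copy.
    Counter& operator++() {
        ++value_;
        return *this;
    }

    // Postfix: must save the old state in order to return it.
    Counter operator++(int) {
        Counter old(*this);  // the extra copy
        ++value_;
        return old;
    }

    int value() const { return value_; }

private:
    int value_;
};

std::size_t Counter::copies = 0;
```

After `Counter c; ++c;` the copy count is still zero, while `c++;` raises it by at least one (the exact number depends on whether the compiler elides the copy of the returned object).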

jinmei will update style guide for this.

Status of trac (from Michal)

It's down from time to time for longer periods of time. Should we have a backup plan?

Shane to talk to operations team

Michal: snapshot of current tickets from time to time, work without Trac running...?

Shane to ask ops for monitoring. They use nrpe, but it is not enabled on the bind10 webserver.

Cause of inconsistencies in in-memory datasrc APIs

There seem to be many small glitches and inconsistencies in the APIs of the new in-memory data (ZoneSegment is created by ::create, but some places expect it as shared_ptr, for example). But I don't remember these issues happening before. Is my memory bad, or something changed recently?

User complaints about installation barriers, mainly dependencies
an everlasting issue, but it is becoming more serious toward beta.
several random ideas:

  • (try to) push ports/pkg/etc to various OSes/distributions before beta
  • include boost tar ball in bind10 tar ball
    Jelte: if we remove Boost from public headers, we can do away with external problems
    Michal: I don't think that's an option
    Mukund: is Boost a problem?
    Michal: we can ship, use if no system library, but prefer system library

    Jelte: another idea was to make a tarball that include all dependencies
    Mukund: Boost is not unreasonable to ask
  • introduce libdns++-only mode (those who don't need servers)
  • introduce null-encryption mode (to exclude botan)

Jinmei: propose we pursue this for beta (or soonish)
Mukund: can we cut down on dependencies? like use OpenSSL instead of Botan
see http://bind10.isc.org/ticket/2406
Or use python2 instead of python3 (as an example)

Security practice review

Jinmei: Private repository should be moved to a shared place

Production service crashing too often

The as112 server has seen nearly 50 crashes of b10-auth, and a few crashes of the whole of bind10. Yesterday everything crashed again.
http://bind10.isc.org/ticket/1937
http://bind10.isc.org/ticket/2398 (maybe)
https://lists.isc.org/pipermail/bind10-dev/2012-November/004075.html

Jelte: kind of convinced we should remove top-level try/catch
Shane: I agree, I think it obscures useful information
Jinmei: removing catch is not good solution, will terminate in an uncontrolled way
Jinmei: could get stack trace in another way
Michal: are we doing any kind of cleanup?
Shane: possibly a data source could register some kind of cleanup (atexit)
Jinmei: okay removing catch for debug-only environment
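
A minimal sketch of the cleanup registration idea, assuming a process-wide registry hooked into `std::atexit`. `CleanupRegistry`, `globalCleanups`, and `installExitCleanup` are hypothetical names, not existing BIND 10 code.

```cpp
#include <cstdlib>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical registry where a data source can register cleanup work.
class CleanupRegistry {
public:
    void add(std::function<void()> handler) {
        handlers_.push_back(std::move(handler));
    }

    // Runs the handlers in reverse registration order (the same order
    // std::atexit uses) and empties the registry.
    void runAll() {
        while (!handlers_.empty()) {
            std::function<void()> handler = std::move(handlers_.back());
            handlers_.pop_back();
            handler();
        }
    }

private:
    std::vector<std::function<void()>> handlers_;
};

// Process-wide registry, constructed on first use.
CleanupRegistry& globalCleanups() {
    static CleanupRegistry registry;
    return registry;
}

void runGlobalCleanups() {
    globalCleanups().runAll();
}

// Hooks the registry into normal process exit. Returns true on success.
bool installExitCleanup() {
    globalCleanups();  // construct the registry before registering the hook
    return std::atexit(runGlobalCleanups) == 0;
}
```

Handlers registered with `std::atexit` run on a normal exit (returning from `main` or calling `std::exit`), but not when the process ends through `std::abort`, which is what the default `std::terminate` handler calls for an uncaught exception. That gap is the uncontrolled termination raised above.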

Jelte: if we have a way to print stack trace I would like to have it!
Michal: in the catch part of code, the stack is already unwound
http://stackoverflow.com/questions/691719/c-display-stack-trace-on-exception
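
A small sketch of one way around the unwinding problem: record the call context when the exception object is constructed, at the throw site, and carry it inside the exception. The frame list is maintained by hand here so that the example stays portable; `FrameGuard`, `TracedError`, `activeFrames`, and the two sample functions are hypothetical, and a real build might use a platform backtrace facility instead.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Per-thread list of active frames, maintained by FrameGuard objects.
std::vector<std::string>& activeFrames() {
    thread_local std::vector<std::string> frames;
    return frames;
}

// RAII marker: pushes a frame name on entry and pops it when the scope
// exits, including during stack unwinding.
class FrameGuard {
public:
    explicit FrameGuard(const std::string& name) {
        activeFrames().push_back(name);
    }
    ~FrameGuard() { activeFrames().pop_back(); }
    FrameGuard(const FrameGuard&) = delete;
    FrameGuard& operator=(const FrameGuard&) = delete;
};

// Exception that snapshots the active frames when it is constructed,
// that is, at the throw site and before any unwinding.
class TracedError : public std::runtime_error {
public:
    explicit TracedError(const std::string& what)
        : std::runtime_error(what), trace_(activeFrames()) {}
    const std::vector<std::string>& trace() const { return trace_; }

private:
    std::vector<std::string> trace_;
};

void loadZone() {
    FrameGuard guard("loadZone");
    throw TracedError("bad zone data");
}

void handleQuery() {
    FrameGuard guard("handleQuery");
    loadZone();
}
```

In a top-level `catch (const TracedError& e)` around `handleQuery()`, `activeFrames()` is already empty because the stack has been unwound, while `e.trace()` still holds both frame names.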

Jinmei: temporarily remove catch to debug issue; want to avoid doing it permanently

Status of 2494 issue?

need to score it
Jeff & Shane (or Larissa) to discuss

Over at 16:06 UTC

Last modified on Dec 12, 2012, 3:19:13 PM