Changes between Version 1 and Version 2 of March2011OriginalMeetingAgenda

Mar 28, 2011, 9:23:30 AM (7 years ago)



  • March2011OriginalMeetingAgenda

    v1 v2  
    1 ''The March 2011 meeting falls at the end of the project's 2nd year. We
    2 should have a pretty good idea of where the project is at - the focus
    3 of this meeting will be organizing the work for the project's 3rd
    4 year.
    6 CZ.NIC is hosting the meeting, and we will be joined by 3 people from the CZ.NIC DNS server team. This should be good for the BIND 10 developers and the DNS operators!
    8 = Logistics =
    10 We will be at the CZ.NIC office, and try to keep to 09:30 to 17:00 each day.''
    12 ----
    14 ''= Agenda =
    16 We have certain topics scheduled at particular times, and a number of "free-floating" topics that we can slot in on an "as needed" basis.
    18 == Monday, 2011-03-21 ==
    20 We have no new people attending this meeting, so we will not have an introduction to BIND 10 development.
    22 '''Opening Remarks'''
    24 When everyone arrives, we'll officially open the meeting, with our standard opening.
    25 ''
    26  * Greetings & Salutations
    27  * Introductions
    28  * Meeting roles & etiquette
    29  * Meeting goals
    30  * Meeting plan overview
    32 Introductions
    34 Attendees:
    35  * Shane
    36  * Larissa
    37  * Stephen
    38  * Jeremy
    39  * Michal
    40  * Jelte
    41  * Likun
    42  * Jerry
    43  * Lubos (guest from CZ.NIC)
    45 Via phone starting Tuesday:
    46  * Aharen
    47  * Kambe
    49 Joining us later in the week:
    50  * Fujiwara
    51  * Michael
    52  * Other cz.nic DNS developer folks
    54 Scrum planning is later in the week after some overall topic discussion
    55 Agenda is vast but flexible
    57 Goals:
    58 Like all of our annual kickoff meetings we have two goals
    59 1. project status - not the major goal; we hope to review, wrap up, and evaluate today.
    60 2. the rest of the week will be a discussion of year three stuff: what and how we will deliver in year three.
    62 '''BIND 10 Year 2 Wrap-Up'''
    63 ''
    64 This will be a discussion of how Year 2 of the project went. We should look at the technical and other aspects of the project, and make sure we have consensus about where the project is at right now.
    65 ''
    66 (viewing
    68 After our January meeting we knew what we would actually deliver in Year 2. Shane then sent mail to the BIND 10 Steering Committee detailing what we could and could not send, and some of that is the list represented here.
    70 Our main goal was to focus on the resolver, and our secondary goals were an authoritative performance increase, additional backends, and to begin looking at the command tool.
    72  - We do have a working resolver, but we do not have a DNSSEC-enabled resolver.
    74  - The hot spot cache is actually slower than the in-memory, for now.
    76  - We didn't quite get "bind 9 query performance" but we are within a binary order of magnitude.
    78  - We don't have a command language prototype but we may get a hack going next week
    80  - We will definitely not have XML statistics reporting done - it's in a branch but it needs thorough review. It is not in the release tarball.
    82  - We have done a lot of work (maybe not quite half) toward DDNS and IXFR - though the recent in memory work may have impacted this.
    84 Shane's thinking about why we got where we got and did not make all y2 objectives:
    86 1. At the year one deliverable, we had a lot of extra work we delayed - a lot of technical debt from the y1 release had to be done after year one which took about two months.
    88 2. We had one less ISC engineer working on the project than we expected - it took a long time to hire Scott and then he moved to the BIND 9 project.
    90 3. We also took a long time to come to what it meant for us to build a resolver. We're not sure how we could have sped this up. It seemed important to analyse processes but maybe it wasn't necessary at this point in the project. We didn't know how to break the big problem into little problems. We didn't really know the problem space.
    92 4. We adopted scrum in a somewhat piecemeal fashion, and the adoption of new process temporarily slowed us down. It has probably now sped us up though.
    94 Jeremy's thinking:
    96 1. Not always knowing what other developers are doing - we could have reused code that we didn't, and new designs were implemented without using existing code
    98 Jinmei:
    100 I don't think that is such an issue - using jabber and the scrum organisation
    102 Shane:
    104 Early on in the project we adopted a default ISC development model where you assign a piece of work to a developer and they "submarine" with it. One of the really good things about Scrum is it takes us away from that. You're given "automatic buoyancy"
    106 Jeremy: in the in-memory implementation, we didn't implement the features that the SQLite implementation has
    108 Stephen: the last few months have been very much about the y2 deliverables, and as ScrumMaster I was pushing just to get year two out, focus on what is essential for y2 deliverables only. It was necessity over function. Shane is aware of the politic, but it is important for us to keep our obligations so sponsors see progress. So when we're coming up to planning for year three, I would like for us to say by xx dates we will get certain deliverables done. That way we can tell the sponsors periodic progress updates.
    110 Larissa: I think that is the plan and should be the plan. We can break it down even more.
    112 Jelte: do the sponsors think we achieved our goal?
    114 Shane: we have 11 or 12 sponsors. Of them, the ones who have contributed developers have taken the strongest interest. We have a mix of involvement. A lot of them are non-profit TLDs who have money, and they don't worry so much about it.
    116 Stephen: but some of the sponsors *have* given feedback, and we should take that seriously.
    118 Shane: but not all are in this situation, many have specific hopes for BIND 10. A lot of the sponsors want a higher performance server. If you've got 20 sites around the world with a stack of servers in each one, it would be a considerable savings to have fewer servers in each. They're all adopting DNSSEC, etc. From this point of view, most sponsors are just happy for us to make good progress. As far as being visible and giving them incremental progress, one of the things we don't have that Scrum insists upon is customers working with the team. We don't have that and we have to guess. However, we're not just building the software for the sponsors. None of our current sponsors run big resolver farms. As much as I send updates to the sponsors, they are very busy and distracted, and we often don't get feedback from them. We sent a draft of the year 3 plans to the sponsors and we heard nothing. Silence doesn't mean they're happy, just that they're busy. So Shane approached each of them directly and then we got a lot of suggestions. We're going to try to get more feedback and more testing in year three.
    120 Jelte: I think they don't care too much about the resolver, for which the essential parts of the resolver only came out last week.
    122 Larissa: Perhaps we have less technical debt at the end of year two?
    124 Shane: I don't think so, but I think we know what it is and the work is planned. Also we now are always putting out user visible features.
    126 Jinmei: Regarding the sponsors, I've been feeling that they are too generous. If they're serious, I would think we're going to get more pressure from them.
    128 Shane: the one sponsor who has said exactly what they want is JPRS. But I think they are reluctant to do that. I think it cost us a little bit, that we didn't know sooner what they wanted. So for example, we had a discussion at the 2010 fall face to face, about performance, where we said well, from our perspective we've met our year two goals. JPRS wasn't happy about that and eventually came to us and said that it wasn't okay, so then we changed our goals. But changing goals has a cost. We hope now we can get feedback earlier and with scrum we can change course more quickly when we need to respond to changing sponsor requirements.
    130 Stephen: of course, we have a long range plan, in that we need to accommodate all the requirements laid out in the RFCs.
    132 Jinmei: I think there have been some inevitable overhead cycles with Scrum. If we simply want to port BIND 9 behavior such as the red-black tree to BIND 10, and we only care about the speed/easiest way to develop this, that is one thing, but instead we tried to break it down and share it in a scrum model, and this caused overhead.  Of course this is also a good thing, because many more people are familiar with the development. Hopefully it is a long term investment, which we can benefit from in year three.
    134 Shane: I hope so, I've been very happy with the recent cycles since the last face to face meeting.
    136 Jinmei: I hope so too, we just need to carefully watch how things improve, or not.
    138 Shane: on paper, the overhead cost of Scrum is very high.
    140 Stephen: but it's actually standard for other projects.
    142 Jelte: of course in waterfall the initial phase is more than 10%
    144 Shane: and quite a few of our sponsors also use Scrum. Fredrico, our Brazilian sponsor, they recommended it, so does RIPE NCC and CIRA.
    146 Jinmei: The overhead of the project, with the testing and review, is also considerable but important. Just for example, comparing us to BIND 9 development two or three years ago, all of our code must have tests and careful review, so we tended to underestimate the work time, and I guess that's another reason why we couldn't make the progress we expected. I guess providing the tests is also a kind of investment. It will help with refactoring later. For the long term it may result in shorter development.
    148 Shane: In engineering courses we were taught that 30% of the time is coding, 30% is testing and review, and 40% is design, requirements, and other overhead. I actually think it's pretty accurate. I probably spend twice as much time testing when I work on BIND 10 code as I do coding. I don't think testing is overhead; I think it's part of the process.
    150 Jinmei: Maybe I should have said it differently, I am actually positive about providing tests. I just think it introduces underestimation.
    152 Shane: I think we have gotten better about not having review backlogs. I think this is a natural output of getting away from the submarine model.
    154 Larissa: I think the scrum model has helped us a lot with estimates and understanding status and we can do even better this year.
    156 Shane: I'm really happy with the status of the project now. We had a dip this year but if we can do even 75% of what we've done the last three months, I think we can make our goals this year.
    158 Jinmei: I hear that Google is going to sponsor. What do they want?
    160 Larissa: I believe they want geolocation. We can meet with Warren Kumari next week.
    162 Shane: they did reject us for Google Summer of Code though. Perhaps we can get some of our own interns though. (general agreement that we will try to do this)
    167 '''BIND 10 Scope of Work''' (Shane)
    169 ''Shane will present the SoW document that is being used to get grant agreements from sponsors for the year 3 work, and go through it in some detail.''
    171 Shane: we have to get renewed commitments from the sponsors annually. We show them what we did and what we're going to do, and we don't know for sure who will sponsor year to year. Sponsors may change year to year, Norm Ritchie is getting us some new ones for year three (Google, .ru, .nz, possibly Chilean and/or Australian registries)
    174 The SoW is the document we use to tell people what we're doing. The year by year plan should be no real surprise: year 1 auth server, year 2 recursive server, year 3 goal, build on previous work and come up with a "production ready" server. Of course "production ready" is highly subjective. The goal of year three is the "80/20 rule": 20% of the work can cover 80% of the users.
    176 Years four and five - our two most important things to do in BIND 10 are not to make the mistakes of BIND 9 - we must be faster, and more compatible, not less. That is the work of year four. In year five, we have reserved time for "all the other stuff". It will be a wrap up, loose ends, and some "cool" things. Right now BIND 10 is about 10% of ISC's operational budget. We have to transition the project financially at that time. We move from being a special project to a mainstream ISC software product. The "ideas page" on the wiki is a place where potential year 5 projects can go. We hope by then the thriving user community will give us guidance as well.
    178 Referencing the Y3 Wiki Page: (
    180 JPRS suggested we divide this work into three categories: things originally planned for year two, things originally planned for year three, and newly introduced things, and that makes sense.
    182 (discussion of the list on the wiki)
    184 Jinmei: regarding views, in some sense we already have them in the modular system, separating authoritative and recursive. Some people will use them that way.
    186 Shane: I think a lot of people use views to provide two faces on the authoritative side.
    188 Jinmei: so I want to know what people want: separate auth and recursive, or views within one of these?
    190 Larissa: I will look into the survey data more thoroughly.
    192 Michal: It's harder if it's on one side
    194 Shane: yes we may have to do extra refactoring work
    196 Shane: I always planned operational tools support, this is an umbrella topic. One thing we need here is "phone home" technology. Recursive resolution tracing will be a stand out feature for operators using BIND 10. Full system information aka "BIND 10 showtech" is important for support, debugging, etc. Fedora has a really cool crash report feature where you submit a report and then it looks up your issues in the database to see if others have had it. Quite sexy. We have to decide how many of the BIND 9 tools we duplicate this year, as well.
    198 Features we added since project inception:
    200 Command tool: Jerry Scharf has specified this and we need the initial implementation this year.
    202 Authoritative data sources: we have multiple options here. We had to choose the lowest risk path in BIND 9, but we have other options: radix trees, Jinmei's initial work, and something Vixie suggested - the "first no compromise DNS in memory data structure".
    204 Jinmei: these are not necessarily the only alternatives nor are they exclusive. The other thing is that I have actually looked at the code Vixie provided and my opinion is the application is quite limited. It's mainly focused on a recursive IPv4 reverse lookup use case. For such purposes it might be quite fast, but it's not a general purpose use case. (Shane notes maybe this was for blackhole use cases)
    206 Michal: maybe we can get the ideas out there and then decide if we can apply it.
    208 Shane: my personal feeling is that performance optimization on the auth side is mostly a distraction. Most authoritative operators don't need more performance than we currently have. However, like graphics benchmarking, it's what people look at. And our sponsors care about it.
    210 Another goal we want for this year now is hooks for plugins.
    212 Jeremy: what about recursive performance?
    214 Shane: it's not on the official year 3 goals for now. (it is on the overall project goals of course.)
    216 Jeremy: I have benchmarked it.
    218 Shane: I do want to see a comparison across resolver products on this. It's hard to benchmark this but it would be very interesting.
    220 Jelte: actually this is an area where the current sponsors may care about recursive performance - when we start resolution lower down, we can save traffic for root operators and TLDs.
    222 Shane: we have other topics for this year which are not exactly development but they are in increasing reliability and people's trust in the reliability of the software. Testing, test platforms, security audit, system testing, and operational experience and documentation of that.
    224 Jeremy: we have done some of the interoperability work already using some tools by Robert Edmonds.
    226 Shane and Jelte: and building on the NSD work, some of which Jelte worked on for NSD 3.
    228 Shane: we need a security audit, but Barry Greene has pointed out that this could mean a lot of things. We need to figure out what this will mean. We need to help people feel confident. Security is always a trade off between functionality, ease of use, and cost.
    230 Jeremy: regarding system tests, we need to review the BIND 9 model this week and decide if we want to move forward with that model or use another.
    232 Shane: operational experience. We're going to be running this on some ISC servers in operation. We're starting with and then moving to our AS112 server as our next operational step, the ISC's internal resolvers, and then the big scary things, SNS-PB (which is a best effort service) hopefully in September. And then hopefully after a few months we could test it on root nameservers. By our year four kickoff, maybe we can discuss the operational problems of root nameservers in BIND 10 :)
    234 Stephen: I am wondering about discussion of other refactoring in year 3, in logging, and TCP, probably a lot of other areas.
    236 Shane: we had to make some estimates, so I made some SWAGs. I don't think they're too far off.  (see SoW for SWAG estimates) We will revise the estimates based on the output of this meeting. If history is any indicator, we will end up with more work not less.
    238 If we get more input from users, we could get more direction based on actual user experience. That is our goal.
    240 Jeremy: one thing that isn't on the list is a DNS specifications document. Basically going through all the RFCs, BIND 9, and other implementations and creating a specifications document now.
    242 Jelte: people have started to do this inside the IETF but it's crazy to do it there.
    244 The existing Wiki page on resolver design is more design than requirements and does not reference RFCs specifically.
    246 Jeremy: this would be a thousand pages if it doesn't just reference the RFCs.
    248 Jeremy: having a design document like this might really help us know what we missed.
    250 Stephen: a design document says how you do it. Functional testing requires a list of what you do, which is what this would be.
    252 Shane: when you do requirements documents, the scope of what you're describing is the requirements document itself. If it's meant to be for other implementations besides just BIND 10....
    254 Stephen: we're also talking about answering questions like "what does BIND 9 compatibility mean?"
    256 Jelte: and sometimes the thing that the RFC says is not the thing any implementation is actually doing, because the RFC doesn't make sense.
    258 Larissa: are there other year three feature level items the team thinks are missing?
    260 Shane: Jinmei sent one about standalone library packaging.
    262 Jelte: we also need to discuss an API freeze at some point.
    264 Shane: or at least versioned.
    266 Michal: or you can do it the way Linux kernel does it.
    268 Shane: we may have certain points where we can break API compatibility, that's how BIND 9 does it.
    270 Jelte: you can go the Linux Kernel way, or the Firefox (and BIND 9) way. In Firefox only break on .0 releases.
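A minimal sketch of the "only break on .0 releases" rule Jelte describes, assuming a hypothetical `major.minor` version string. The function names are illustrative and are not part of BIND 10 or Firefox. Under this rule, code built against one release is treated as compatible with any equal or later release that shares the same major number.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Split a version such as "3.2" or "3.2.1" into (major, minor).

    Assumption: the string has at least a major and a minor component.
    """
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def api_compatible(built_against: str, running: str) -> bool:
    """Return True if code built against one release should work on another.

    The API may only break when the major number changes (a ".0" release),
    so the majors must match and the running minor must not be older.
    """
    built_major, built_minor = parse_version(built_against)
    run_major, run_minor = parse_version(running)
    return built_major == run_major and run_minor >= built_minor
```

With this check, a plugin built against 3.2 is accepted on 3.5 but rejected on 4.0, which is the point at which the API is allowed to change.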
    272 The missing and incomplete items seem to be:
    274  * DNS requirements document
    275  * refactoring of: ASIO, and....??
    276  * finishing incomplete features such as logging and TCP connection handling
    277  * standalone packaged libraries
    279 (list to be continued later)
    281 Shane: we need a yardstick for deciding what we include. I think "will this help us become production ready" should be it.
    283 Larissa: yes
    285 Michal: the problem is we keep putting things off for later.
    287 Shane: everything we add means bumping something off.
    289 Shane: people seem to be uncomfortable with my proposed yardstick, so maybe we can discuss that.
    291 Stephen: we have already given the current list to the sponsors, so must we stick with it?
    293 Larissa and Shane: not necessarily.
    295 Larissa: so we can add other features to the "if we can" list
    297 Discussion here about how to handle requests we cannot currently accommodate, such as HSM support and IXFR-from-differences, etc. Current plan is that we let people know we cannot add more features without more human resources - this is tricky, the sponsors' views are critical, but they don't always reflect 80% of DNS users' needs.
    299 Michal: when I ask around, what people want is support for MySQL. That seems to be what would motivate DNS users I know to try it out.
    301 This is a topic we had discussed assigning to a GSoC intern, we'll see what happens now.
    303 Likun: some Chinese DNS users prefer PowerDNS because the backend is Oracle. They use Times Ten.
    305 Shane: I looked at doing zone transfers with PostgreSQL but it's not recommended.
    307 '''''Lunch'''''
    309 '''Scrum Setup'''
    311 ''The A-Team / R-Team split made sense when we were working on two separate goals at the same time. It was based on needing to finish 2 separate pieces of work, as well as the expected size of the team becoming quite large. For Y3, we have many more work items, and the team size does not look like it will expand that much (I hope I am wrong - in which case we will revisit the setup).
    313 We need to discuss how we want to organize our team.''
    315 Shane: for the past several months we've been split into two teams. The prime motivation was that the team is too large for the ideal scrum size. Ideal size is 5-10 people. We thought it would grow to 20 people by the end of the project year. The other motivation was that we had two distinct pieces of work to do. It has worked pretty well, each of the teams has focused, but people have expressed some concern that they don't know what the other team is doing.
    317 What has changed now? The team has not gotten smaller, but we don't all work full time on the project. We haven't added people as much as we might have. We also all work remotely, which makes communication more formal. We have delivered the initial server implementations. There does not seem to be a logical team split for year three.
    319 I think we should reunify into one scrum team. What do people think?
    321 Jelte: I think the way the teams are split now they are too small.
    323 Shane: sometimes people seem lonely
    325 Jinmei: I think this makes sense
    327 Michal and Jinmei: I have been the only person on my team working in my timezone (Jinmei: on the CONTINENT!)
    329 Jeremy: I think this would save us a lot of time, for all of the people who then don't have to attend two sprint plannings.
    331 Larissa: sprint planning might run slightly longer.
    333 Jinmei: I do think a slightly larger team is preferred but we will have a slightly larger team than the ideal scrum size.
    335 Jelte: also it sometimes happens that people who are co-located in a timezone or company work together more, so we end up with specialized teams of two.
    337 Stephen: the thing with the timezones is important. Speaking in real time to discuss a problem is really helpful.
    339 Michal: when there is a problem with some code, and the person who wrote it is on the other team from you, it is difficult to figure out what to do.
    341 Stephen: I think we can do it in one team. If we have one team, there will be related groups of tasks. It would then be logical for people in the same areas of the world to do related groups of task, so they can talk more easily.
    343 Jinmei: if we have one big team, sprint planning session could become uncomfortably long.
    345 Shane: we're trying to do most of our sprint planning in the face to face meetings. However, we may need a marathon planning session or two before the next face to face.
    347 Shane: so we have consensus we will try one team, though we know there may be a few problems.
    350 '''BIND 10 Year 3 Release Schedule'''
    351 ''
    352 Based on the SoW we need to discuss the release schedule for Y3.''
    354 Returning to the discussion topic from the morning, about additional features and issues we must handle in Year 3 beyond the statement of work items.
    356 The things we know we need:
    358  * BIND 9 style IP address based ACLs (TSIG, IP, extensions/hooks)
    359  * TSIG
    360  * IXFR in and out - protocol level (and data source level)
    361  * DDNS (server side only - same issues as IXFR)
    362  * DNSSEC validation for resolver
    363  * DNSSEC support for in memory data source
    364  * Views
    365  * Operational Support Tools:
    366     * Version Check / Phone Home
    367     * Recursive Resolution Tracing
    368     * DNS ShowTech
    369     * Cache Management (deleting, injecting, viewing, loading, dumping)
    370  * Command Tool
    371     * Demonstration Version
    372     * Framework
    373     * Specific functions:
    374       * Replicate functionality in bindctl
    375       * Load, Delete, List, Modify(?) Zone ("rndc addzone")
    376       * Per Feature Configuration
    377  * High Performance Back end (faster than BIND 9 for in memory and *maybe* hot spot cache)
    378  * Requirements, Design, and Implementation for Hooks (for Plugins)
    379  * Test Platform for Recursive Resolution
    380  * Interoperability Testing
    381  * Security Audit (and followup)
    382  * System Level Testing
    383  * Operational Experience (and followup)
    385 Additional Possible Requirements:
    386  * Completion of Logging - multiple files, destinations, filters. (logging API)
    387  * Configuration of the BOSS (using cmdctl), command line configuration, and config manager configuration
    388  * Save and load config (export and import)
    389  * Scattered TODO items
    390  * Refactoring
    391    * ASIO
    392    * Auth and Recursive server callbacks
    393    * General utility library
    394    * Generic BIND 10 process (make modules into libraries)
    395    * Stand Alone Mode (like b10-auth only)
    396    * datasource refactoring
    397  * Finish Socket Creator
    398  * DNS Specifications Document (Referencing RFCs, etc)
    399  * Complete support for RR types (everything on the IANA list)
    400  * Link to Crypto Libraries
    401  * Replacing msgq
    402  * Supporting multicore systems (multiple process model) at least auth
    403  * Complete zone file parser
    404  * Offline Configuration
    405  * Additional datasources: MySQL, PostgreSQL, BDB
    406  * Reduce the bug backlog! (resume inclusion of bug fixes per sprint)
    407  * Status query (zones being transferred, timeouts, qps, acls loaded, etc)
    408  * Demuxer (handling multiple queries on the same port) - suppressing duplicate queries
    409  * Randomization of Ports
    411 We have two major milestones listed, and 41 feature level tasks. (minor issues with the dependencies working correctly). There are a lot more tasks listed for auth than for recursive. (note today we have remote participation by JPRS members)
    413 List of tasks is on a separate url to be added to these notes.
    415 List of features with dependencies is complete but we will be breaking out into tasks for sprint planning.
    417 Replacing msgq is an item that we may leave out if time does not allow - but Michal notes that it needs enough work, that perhaps replacing it would be faster than refactoring it.
    419 We wonder how many people use TSIG in their recursive implementations today.
    421 We note GSS-TSIG is something to do when we do our Windows implementation. Later.
    423 Discussion of how early in the year to start DNSSEC validation work. It is one of the most complex tasks, but it is also not linked to the first major deliverable of the year (production Auth only server). We think we need to hold off starting in on DNSSEC validation for a few months, though there is risk in not specifying this work for too long. So a second quarter start for DNSSEC validation.
    425 Added a dependency between the DNS specifications document, TSIG, and DNSSEC validation.
    427 Much of the work is split into the authoritative and the recursive implementations (documentation, multicore support, etc)
    429 Refactoring tasks are generally put ahead of new code, though not always.
    431 Added a feature to the task list for datasource refactoring (and API standardization)
    433 Operational support tools actually can be done at any time on an as needed (as resource is available) basis.
    435 Command tool has quite a few tasks inside of it, but it does not depend on any other tasks
    437 When do we do support for multi-core systems? It's not that it's necessary for administrators in auth-only, but people may not want to install if the system does not benchmark fast.
    439 Once we refactor the SQL backend perhaps it will be non-major to implement more SQL backends.
    441 Status query can be pushed to later in the year if necessary, though administrators will start asking for it, and it
    442 "looks cool".
    444 A lot of things depend on the security audit. It may take a few weeks to do. We need to decide what the terms of reference are and who will do it.
    446 Views - scheduled for the recursive timeframe, even though they are useful to auth only systems.
    448 High performance data source is moved into the recursive section. This is something we can drop off if we need to.
    450 Hooks: it can be held to the recursive part of the year, but the team expressed concern that the longer we wait the more refactoring we would need to do. Stephen felt that once we write hooks we need to freeze the relevant APIs. Michal said he would like to be able to tell the world we have an early implementation of hooks, to get them to play with it. Stephen feels we should potentially hold off because this is not essential and we have so much to do.  General agreement that among things we will "do with enough time", this would be very high priority.
    452 Interoperability testing and system level testing: when? We may want to start approaching it as we write our unit tests.
    454 Which items should depend on the DNS specifications document being written first? Stephen: I see the auth and recursive as being written in parallel, not dependent. We need these to be written in bite size chunks - it's boring and tedious work, and we need to be able to consume it as we develop as well. If we do them in parallel we can move through it one RFC at a time.
    456 Export/import configuration is not recursive specific but we can delay it to later in the year so it falls into that part of the year for now.
    458 Offline configuration - we may want to do before the auth server release because bootstrapping may be cumbersome otherwise.
    461 '''What is a "Forwarder" Anyway?'''
    463 ''Apparently nobody has ever defined what a DNS forwarder is. At least not to our satisfaction. We need a list of what a forwarder does and does not do.''
    465 Jelte: We've been adding and removing features to/from the current forwarding feature as we've developed the resolver. Let's make a list.
    467 Shane: is the concept of a forwarder defined in the RFCs?
    469 Michal: A proxy is mentioned, but not much. Only to the level that it exists.
    471 Jelte: it may be mentioned in an informational doc on DNS setups
    473 Jeremy: 5625?
    475 Michal: 1033?
    477 Shane: if it's just casually mentioned and not carefully defined, does anyone else implement a forwarder? I guess DNSmasq has a forwarder?
    479 Michal: it has a cache and a dumb proxy
    481 Shane: is this a BIND-ism or not?
    483 Jelte: I guess Unbound has this but I am not sure how it implements it
    485 Michal: you can forward, but it's still a resolver
    487 Stephen: RFC 2308 section 1, defines a forwarder.
    489 Jelte: this definition would suggest it is on the other side of the resolver - between the resolver and the internet, not between the stub and the resolver.
    491 Jeremy: RFC 2136 section 6 also discusses the forwarder.
    493 Michal: Why do we actually need one? We can create whatever server we like as long as it speaks the protocol correctly, so we can have the feature there, but what is the point?
    495 Shane: my use case: my ISP runs a resolver and it's fine, but I'd like to have a local cache also, to save time.
    497 Michal: I use the DNSmasq for this.
    499 Jelte: I can see that use case in this scenario, but I don't see that it has a lot of benefit.
    501 Michal: I use unbound for another thing, I want validation that my provider doesn't do, but I was unable to configure BIND to do it because the provider blocks all other DNS traffic than to its own server. I use a validating forwarder.
    503 Jelte: that's a good thing to come out of this discussion, you would implement this differently than what we had in mind.
    505 Michal: maybe if we had plugins, we would do this this way, by replacing the part that sends queries.
    507 Jelte: Maybe we directly call the query which sends to the upstream address and then when it returns instead of going back into the resolver query you just pass the answer to the original client. That does mean you would be going through all the logic even though you don't need to.
    509 Jelte: if you run a straight forwarder you want to copy all the flags, but if it's a validating forwarder you do not.
    511 Stephen: three modes? one, pass through, no interpretation, second way adds a cache, third way is a validating forwarder
    513 Use cases:
    515  * First is for firewalls, or a computer not connected to the internet
    516  * Second is for local cache
    517  * Third is for getting additional or more trustworthy validation than is provided upstream
    518  * Fourth is selective forwarding - to get specific information from a particular server
    520 (Note that BIND 9 by default falls back to iteration when the forwarder fails. We may or may not want to do this. It would be useful to know why people want this and what the behavior is.)
    522  * There is also DDNS forwarding - some clients try to send requests to non primary master (other auth servers) - causing problems.
    524 Michal: we may want to try some scenarios in the forwarder before putting it in the resolver
    526 Jelte suggests that a forwarder does minimal work. No retry or fallback is done by a forwarder.
    528 Shane considers a forwarder as one which acts as a proxy and does fallback (and maybe retries).
    530 Behavior:
    532 See RFC 5625:
    536  1. Very, VERY Simple Forwarder
    537     Pass everything through without interpretation, except:
    538     * QID
    539     * port number
    540     * ACL considerations?
    542  2. Very Simple Forwarder
    543     Pass everything through without interpretation, except:
    544     * QID
    545     * port number
    546     * ACL considerations?
    547     * EDNS0 (adjusted?)
    548     * VERSION.BIND
    550  3. Proxy Forwarder
    551     Read query
    552     Do everything (interpret/strip EDNS, ...) except follow delegation, TCP fallback (?)
    553     Note: BIND 9 may originate other queries, for example follow CNAME chains
    555  4. Very Simple Forwarder + Cache
    557  5. Proxy Forwarder + Cache
    558     Maybe setting DO bit is helpful so we can cache that information. That may bloat cache though.
    560  6. Validating Forwarder
    561     Full resolver that only goes to specific address(es) (except with RD bit on)
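The "Very, VERY Simple Forwarder" in the list above passes messages through untouched except for the QID (and the port, which the socket layer handles). A minimal sketch of that QID rewriting follows, assuming raw wire-format messages and a hypothetical `SimplestForwarder` class that is not part of BIND 10. It does no socket I/O, ACL checking, or timeout handling.

```python
import secrets
import struct


def get_qid(message: bytes) -> int:
    """Read the 16-bit query ID from the first two bytes of a DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    return struct.unpack("!H", message[:2])[0]


def rewrite_qid(message: bytes, new_qid: int) -> bytes:
    """Return a copy of the message with only its query ID replaced."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    return struct.pack("!H", new_qid) + message[2:]


class SimplestForwarder:
    """Hypothetical helper: remembers which client each upstream QID belongs to."""

    def __init__(self):
        # upstream QID -> (client address, QID the client originally used)
        self._pending = {}

    def _fresh_qid(self) -> int:
        # Pick a random QID that is not already in flight.
        while True:
            qid = secrets.randbelow(0x10000)
            if qid not in self._pending:
                return qid

    def to_upstream(self, client_addr, query: bytes) -> bytes:
        """Rewrite the QID of a client query before it is sent upstream."""
        upstream_qid = self._fresh_qid()
        self._pending[upstream_qid] = (client_addr, get_qid(query))
        return rewrite_qid(query, upstream_qid)

    def to_client(self, response: bytes):
        """Restore the client's QID and report where the response should go."""
        client_addr, original_qid = self._pending.pop(get_qid(response))
        return client_addr, rewrite_qid(response, original_qid)
```

Everything after the first two bytes is copied verbatim, which is the point of this forwarder type: flags, EDNS options and the rest are not interpreted.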
    563 RFC 3490 mentions forwarders for IDN transformations.
    565 RFC 3901 mentions using forwarders for IPv6 to IPv4.
    567 RFC 2845 is about forwarders and TSIG.
    569 Forwarder is not a goal for Y1/2/3 so maybe we should remove the current support.
    571 Google draft about geolocation EDNS0 option.
    573 RFC 2671 mentions what *not* to forward.
    575 Jelte notes that if we modify the current ticket to pass the DO bit (#598) so that it also lowers the EDNS buffer size when the client's is greater than ours, then we have forwarder type 2 (simple forwarder). Also, we probably don't copy all the correct response flags yet.
    577 == Tuesday, 2011-03-22 ==
    579 '''BIND 10 Year 2 Release'''
    581 We'll actually make our official Year 2 release. Everything will be prepared in advance, so it should just be a matter of sending some e-mails and updating some Trac pages.
    583 Hurrah! Champagne and sparkling cider were had.
    585 '''Y3 deliverables: approach to discussion'''
    587 ''Make sure we all understand how we're going to go through the list. Shane did his homework and made a list with dependencies, and we'll go through those together. Shane decided to organize this using Task Juggler. A copy of the gantt chart will be linked to the developer wiki. We did leave a few things out. We have a lot of work to do, and a few of the tasks were not needed or requested by sponsors in year three. We may remove additional items as we discuss.''
    589 '''Y3 deliverable: ACLs'''
    591 '''Y3 deliverable: TSIG'''
    593 Stephen: what does BIND 9 do?
    595 Jinmei: named key-gen
    597 Stephen: do we want to replicate this? or not?
    599 Jelte: TSIG key generation is basically just writing random data
    601 Larissa: what would be easier?
    603 Stephen: is it a separate program?
    605 Jeremy: it is but it uses the libraries
    607 Stephen: so we write our own
    609 Jelte: but it shouldn't be hard
    611 Jeremy: we can also provide workarounds to do it with OpenSSL etc
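Jelte's point that TSIG key generation is basically just writing random data can be illustrated with a few lines of standard-library Python. This is a sketch only: the function name `generate_tsig_secret` and the 32-byte default are assumptions, not a BIND 10 tool or a recommended key size.

```python
import base64
import secrets


def generate_tsig_secret(num_bytes: int = 32) -> str:
    """Return num_bytes of random data as base64 text.

    Base64 is the form in which TSIG secrets appear in BIND 9 configuration.
    """
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    print(generate_tsig_secret())
```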
    613 Stephen: how about the relevant crypto?
    615 Jinmei: we don't have it
    617 Stephen: okay so the first question is what crypto library
    619 Jinmei: well, we do have SHA1 code. And so we have some minimal crypto of our own, but it is still a question whether we want to have an outside crypto library or use our own minimum version.
    621 Stephen: this is really our first foray into crypto. So what are the options:
    623 (refer also to the Beijing meeting notes at:
    625  * Soft HSM (is this where we add our HSM transparency layer?)
    626  * Botan
    627  * OpenSSL
    628  * Crypto++
    630 Discussion: how much more work would it be to add the HSM transparency layer when we're already adding crypto?
    632 Jelte: so if we define an abstract crypto interface that takes keys as arbitrary identifiers, it doesn't matter what that uses internally.
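A rough sketch of what such an abstract crypto interface could look like follows. The class and method names (`CryptoBackend`, `InMemoryHmacBackend`, `add_key`) are hypothetical, and HMAC-SHA256 from the Python standard library stands in for whichever library (Botan, OpenSSL, an HSM layer) would sit underneath.

```python
import hashlib
import hmac
from abc import ABC, abstractmethod


class CryptoBackend(ABC):
    """Callers name keys by an opaque identifier and never see key material."""

    @abstractmethod
    def sign(self, key_id: str, data: bytes) -> bytes:
        """Return a signature over data made with the key named key_id."""

    @abstractmethod
    def verify(self, key_id: str, data: bytes, signature: bytes) -> bool:
        """Return True if signature matches data under the key named key_id."""


class InMemoryHmacBackend(CryptoBackend):
    """One possible backend: secrets held in a dict, HMAC-SHA256 signatures."""

    def __init__(self):
        self._keys = {}

    def add_key(self, key_id: str, secret: bytes) -> None:
        self._keys[key_id] = secret

    def sign(self, key_id: str, data: bytes) -> bytes:
        return hmac.new(self._keys[key_id], data, hashlib.sha256).digest()

    def verify(self, key_id: str, data: bytes, signature: bytes) -> bool:
        # compare_digest avoids leaking how many leading bytes matched.
        return hmac.compare_digest(self.sign(key_id, data), signature)
```

Because callers only hold the identifier, a backend that keeps keys inside an HSM could implement the same two methods without any change on the calling side.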
    634 Stephen: we probably don't want to rely on SoftHSM. So what underlying library?
    636 Larissa: OpenSSL has been problematic in BIND 9. What about Botan? SoftHSM uses it...
    638 Jelte: the reason I wanted the SoftHSM in OpenDNSSEC was that I didn't want a different code path whether you used an HSM or not.
    640 Stephen: we need to do our own implementation of libHSM?
    642 Jeremy: why don't we have someone try replacing our current SHA code with Botan and see how it goes?
    644 Larissa: what about GOST?
    646 Jeremy: maybe we get Botan to support GOST.
    648 (Continued after lunch...)
    650 Shane: Current BIND9 use of TSIG is broken - can't have two keys with the same wire information.  Need to decouple DNS name from identifier in configuration.
    652 Shane: TSIG from resolver side not a priority this year.
    654 (Discussion on bootstrapping problem.)
    656 Shane: Clients/stub resolver out of scope.  Main use of TSIG is to secure connection between servers.  What are issues with Crypto library?
    658 Jinmei: none that are insuperable.
    660 Jelte: One issue - if query signed with TSIG, answer must be so signed.  However, must be aware of keeping copy of wire data.
    662 Jinmei: TCP is tricky - need to provide signature every 100 messages or so.  Current impression is that it will be part of libdns++.
    664 Shane: need way to configure TSIG certificate as "global" data. 
    666 Jelte: Need way to configure data.  Question is where to put it?  How about "System" meta-module?
    668 Shane: Create TSIG configuration module in which TSIG data is put.
    670 Shane: Issue about NOTIFYs.  BIND9 does not support this (NSD does).
    672 Vorner: Not critical to sign them - can't do it now.
    674 Jeremy: Can configure BIND9 to do this.
    676 Shane: Motivator: NSD does it now.  Also, do we want to avoid a remote user being able to get the server to do something?
    678 Michael: Q: how will it be configured? (A: via bindctl.)
    682 '''Y3 deliverable: Views'''
    684 Jeremy: in BIND 9, Views are basically matching a client, matching a destination, match TSIG,  or match if the recursive RD bit is set. The goal is to provide a different data source back end based on the match.
    686 Stephen: If you have a look at the NSCP draft, we discuss views in there.
    688 Shane: Views are a BIND-ism, right?
    690 All: Pretty much.
    692 Shane: what do you do based on the match?
    694 Jeremy: provide different data.
    696 Stephen read out more on zones from the NSCP draft (
    698 Jelte and Michal: this gets tricky if you mix auth and recursive
    700 Jeremy: the match takes you to separate data sources. This is why you need to figure out ACLs and TSIGs first. I thought at one time we had talked about being able to provide different data sources.
    702 Jinmei: is this recursive, auth, or both?
    704 Stephen: there are two parts, the access part, and then the selection of the data source part.
    706 Shane: and there is first match or best match.
    708 Stephen: what happens when you have 10,000 zones
    710 Shane: is there a performance penalty with zones?
    712 Jinmei: yes, with the matching part.
    714 Stephen: is that the way we want to do that? For a given zone you probably have relatively few views, but if you have 10,000 zones with different views, and you match by view, you have potentially many thousands of views...
    716 Jinmei: views have zones. not the other way.
    718 Michal: that is why it is so powerful. you can have one server pretending to be many servers.
    720 Shane: the difference between the way we use datasources and views: different views can contain the same name of a zone, but in a datasource they would have to physically copy the data.
    722 Jeremy: I would like to see our nameserver, regardless of views, be able to use multiple datasources at the same time.
    724 Jinmei: we can't do that now
    726 Jelte: but we will refactor to be able to
    728 Jeremy: in BIND 9 you're always loading everything into memory. This could make it easier.
    730 Larissa: and faster?
    732 Michal: you can have a mix, too. some things in both, and some in other things. how would this look?
    734 Jelte: I've never used views, but I think each view has its own full configuration.
    736 Jeremy: yes. There are 60 or more toggles you can put inside a view
    738 Stephen: and its like a virtual server
    740 Jelte: Michal suggested you could change your pipeline by views
    742 Michal: yes, then only the critical part would care about zones the rest could ignore
    744 Shane: as far as working with hooks, the view you're in should be maintained as context and passed around. That should be straightforward.
    746 Michal: we would need hooks per view. if we have different configuration per view, we could have a hook in one view and not in another one.
    748 Stephen: do you pass the view to every hook and the hook decides?
    750 Michal: the first thought is you need to take care of views everywhere. It's a lot of code.
    752 Stephen: we're in danger of getting very very complex for corner cases. the main use of views as i understand it is to separate internal vs external networks in a company. That is the use case we should optimize for.
    754 Jeremy: one easy solution we have now for a destination based view is to make sure BIND 10 can run multiple resolver processes listening on different IPs. They would have different configuration and different caches. Same with multiple b10-auths.
    756 Shane: we talked about config being different but there are different caches per view on the recursive side?
    758 Michal: so you can redirect a zone in one view but not another
    760 Shane: some people will not be able to set up two processes listening on different IPs
    762 Stephen and Larissa: let's work on the common case. 80/20. Corporate situation, intranet/extranet.
    764 Michal: maybe we can simply solve both the common case and many corner cases.
    766 Jinmei: maybe there isn't much difference between common case and corner cases.
    768 Michal: we can restrict configuration somewhat
    770 Jinmei: there will be an exception
    772 Shane: thinking from an administrator point of view. I've got three zones, one each in two views and a third zone in both views. Would i have to put them each in their own database?
    774 Jeremy: our database needs another level
    776 Shane: we need a layer of indirection
    778 Jinmei: we should separate the notion of type of datasource and the database files
    780 Shane: I'm thinking of the abstract concept of a datasource. Right now when I query a datasource I ask for a name. When we add views, I have to ask for a name, and a view.
    782 Michal: yes, and the datasource can either look specifically for data based on the view, or...
    784 Shane: I don't care right now. What I'm realizing is what we need to do is expand our data source API to include views.
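To make Shane's point concrete, here is a minimal sketch of a data source lookup that takes a view as well as a name, with the "layer of indirection" mentioned earlier so that one zone can appear in several views without copying its data. The class `ViewAwareDataSource` and its methods are hypothetical and are not the BIND 10 data source API. Names are assumed to be lowercase and fully qualified.

```python
class ViewAwareDataSource:
    """Hypothetical data source keyed by (view, name) instead of name alone."""

    def __init__(self):
        self._zone_data = {}   # zone id -> {(owner name, rrtype): [rdata, ...]}
        self._view_index = {}  # view name -> {zone origin: zone id}

    def add_zone(self, zone_id, records):
        """Store one copy of a zone's records under an internal identifier."""
        self._zone_data[zone_id] = records

    def attach(self, view, origin, zone_id):
        """Make an already stored zone visible in a view (no data is copied)."""
        self._view_index.setdefault(view, {})[origin] = zone_id

    def find(self, view, name, rrtype):
        """Return the rdata list, or None if the view has no enclosing zone."""
        zones = self._view_index.get(view, {})
        labels = name.rstrip(".").split(".")
        # Try the longest candidate origin first, then strip leading labels.
        for i in range(len(labels)):
            origin = ".".join(labels[i:]) + "."
            zone_id = zones.get(origin)
            if zone_id is not None:
                return self._zone_data[zone_id].get((name, rrtype), [])
        return None


ds = ViewAwareDataSource()
ds.add_zone("z1", {("host.corp.example.", "A"): ["10.0.0.5"]})
ds.add_zone("z2", {("www.example.org.", "A"): ["192.0.2.10"]})
ds.add_zone("z3", {("www.example.com.", "A"): ["192.0.2.20"]})
ds.attach("internal", "corp.example.", "z1")
ds.attach("external", "example.org.", "z2")
ds.attach("internal", "example.com.", "z3")  # third zone lives in both views
ds.attach("external", "example.com.", "z3")
```

This mirrors the administrator scenario raised above: three zones, one each in two views and a third in both, without a separate database per view.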
    786 Jeremy and Jinmei: how will we share a single zone file in multiple views?
    788 Jinmei: I see the desire but that will be very tricky and error prone
    790 Michal: the price of passing it to the API is nearly zero. I think we can handle this better on the datasource level than the higher level
    792 Jeremy: if you're changing configuration all the time, do you need to replicate that in your data source?
    794 Shane: not if it is done in an abstract way, or in SQL, in a reference table. Depending on how we implement, SQL could look to see if it has views and do different queries if it has it or if it doesn't, for performance.
    796 Jeremy: I don't know how you do this in BDB.
    798 Michal: every piece of configuration can be different. We don't want to go through the whole server and add conditions.
    800 Shane: we can say views are not able to configure *everything* just a specific set of commonly used things.
    802 Michal: it depends on the plugin system I suppose but the plugin system could provide a piece of logic that could copy views itself
    804 Shane: not a bad design but I hesitate to implement that without a use case
    806 Stephen: what about the receptionist model?
    808 Michal: this is similar to my idea
    810 Stephen: the plugins could be determined by the configuration of the server
    812 Michal: the plugin means that its in some hook, and would be in the hook for one view and not for another. But you could also have common places for all views. You don't configure everything differently. You just can.
    814 Shane: I worry about using receptionist for this. I don't think it would be that much simpler and it might cost performance.
    815 Maybe for BIND 9 compatibility, where everything is configurable per view.
    817 (Michal draws on the notepad)
    819 Jeremy: there is no memory sharing between caches in the BIND 9 way. So important information doesn't leak, but it uses 10 times the memory.
    821 Stephen: only if you have 10x the queries.
    823 Jelte: well.....
    825 Stephen: so where do the definitions of the views live. In the configuration database?
    827 Michal: I don't know how. If we're allowed to configure everything, you need a configuration overlay.
    829 Shane: that won't be the initial implementation. We will just configure zones and recursive behavior.
    831 Michal: the configuration manager can be handled somehow.
    833 Jinmei: we should ensure that view implementations
    834 are consistent across all the modules.
    836 Shane: we need a work item for non module related configuration.
    838 Stephen: we could have a pseudo module called system, and put it all in there
    840 Jeremy: in BIND 9, statistics can be separated by view
    842 Stephen: you need statistics per zone too
    844 Shane: we will need to capture that and report it, reporting should not be a problem with this, reporting is quite flexible now.
    846 Jeremy: BIND 9 by default has three views: BIND 9 view, _default view, _meta view.
    848 Stephen: even if you don't define views, everything goes through BIND 9 views. It simplifies the data model.
    850 (interlude about NSCP and nominet and whatnot)
    853 '''''Lunch'''''
    855 '''Y3 deliverable: DDNS'''
    857 Shane: can you tell us about the current status of the changes to the backends to make them writeable?
    859 Jelte: yes, for the first SQLite data source I added functions that could add and remove RRs and also parse a dynamic update and perform nearly every action in there. It does not do data consistency. But that was for the SQLite data source. Jinmei had one look, and kind of disagreed with the general design, since I added everything on the abstract data source level. He thought we might want to add a separate class. We might want to make every datasource writeable.
    861 Shane: no, surely you want read only data sources.
    863 Michal: could we make a write only datasource?
    865 Jelte: I don't see a use case for that
    867 Shane: it might make sense if you had programmatic data sources. You could say, use DDNS to do logging.
    869 Michal: I am just thinking it might make sense to have readable and inherit read writeable, or to have all three.
    871 Shane: you can do this with aggregation instead. I don't know. I see why it would be nice to do that, then if you are implementing a datasource and you don't want it to be writeable you don't have to implement it at all.
    873 Jelte: I think you can do that today. It was written over 6 months ago now, though, so...
    875 Michal: I believe we want to merge first and see what a writeable datasource might look like before we start refactoring.
    877 Shane: questions: How do you handle concurrent access in the current code?
    879 Jelte: for DDNS it is the datasource itself that handles the packet, so right now it doesn't worry about it. In the case of IXFR it is a separate process and it will send a fail.
    881 Shane: with IXFR we should not have to worry about that since we are the ones doing the updates. That's probably appropriate, though we may want to define a default where we lock everything, for naive implementations.
    883 Jelte: if you make a very simple implementation it just sets a lock.
    885 Shane: for ease of use for implementors, we may want to put an in memory mutex there by default.
    887 Shane: can we not use an in memory lock if we have multiple processes?
    889 Michal: we could but it wouldn't be easy. If you provide it in the abstract class the simple version may use it, but...
    891 Shane: okay we can refactor this later if we need to.
    893 Shane: Multiple processes? I guess with SQLite we don't care too much. I guess Jinmei or Michal or someone thought about this for multiple processes.
    895 Jinmei: in the in memory data source?
    897 Michal: if you have the memory shared you can share a semaphore. But you need another daemon that handles it that holds the data. It would be another process. It seems quite heavyweight.
    899 Shift to some discussion of multicore model as it relates.
    901 Shane: my thinking is we would scale across multiple cores by using multiple processes.
    903 Michal: you could use the writeable as SQLite and inmemory as the secondary store.
    905 Shane: we could encode deltas as well, useful for a big zone.
    907 Michal: we could start loading from the datasource in parallel with handling the current data as well.
    909 Shane: if you're using a system that requires the performance of an in memory data store, you will then start dropping queries.
    911 Jinmei: can we get back to dynamic updates?
    913 Shane: the proposal is that we don't allow dynamic updates to the in-memory source at all - that when you need to change it you do a partial or full zone reload.
    915 Jelte: either DDNS or IXFR says you have to store it there before you start serving it anyway.
    917 Stephen: one thing about an auth server is the updates won't be that frequent.
    919 Shane: I really think this might be the right way, where in memory gets its data from another source, and if you want to update it, we have an upload method that can be done with a delta, and have an API for the upload method.
    921 Stephen: you can load it into memory as soon as possible, but if you get multiple updates to a single record (Shane notes this happens in DHCP) it is complex.
    923 Shane: if you presume the set of changes will be small and infrequent, you can lock the whole dataset to make the changes.
    925 Jelte: this sounds remarkably similar to constructing an IXFR out packet.
    927 Stephen: how often are things read in the DHCP case?
    929 Shane: it depends on the environment. In a reverse tree, probably pretty soon. For some reason many machines want to do reverse lookup.
    931 Stephen: if it's going to be updated 5x before it's uploaded again there is no point. Just mark it as dirty.
    932 If you've stored it, and you update it on disc before you bring it to memory, then you do essentially have a hot spot cache.
    934 Jelte: as a general design thing.
    936 Shane: it listens for the query, applies all the prerequisites, and then pushes it down to the datasource
    938 Shane: do views apply to DDNS?
    940 Jinmei: yes
    942 Jelte: the reason I applied this to all the layer of abstraction is that if you have a datasource that can handle more efficiently you can rewrite it
    944 Shane: if I have prerequisites across multiple data sources will that be a problem for us?
    946 Shane: Then the datasource layer needs to do prerequisite checking and then the actual updates. Then for in memory, we need an abstract class for stable storage.
    948 Stephen: I think this is a problem for a very large zone
    950 Shane: so we need a signaling mechanism for them to get updated, which would end up a lot like IXFR out.
    952 Stephen: unless we say well, when we load a zone from zone file, we load it into a database, full stop. If you want a zone file out, we just write a zone file back out.
    954 Shane: yes that is the right model
    956 Michal: yes and you can use the zone file as a source for the inmemory datasource
    958 Stephen: only going through an intermediate database
    960 Michal: you don't need that.
    962 Shane: we probably need a special case.
    964 Shane: it probably needs to be a synchronous notify, so it can also send data back to the stable database.
    966 Michal: but there is no guarantee.
    968 Jinmei: so how do we ensure consistency between original and in memory?
    970 Shane: that is why I propose the synchronous model, so when there is an update to the "disk space" datasource, it sends an update to the in memory, which is a process, waits for the reply, indicating the update, and only then is the process complete. This will also allow us to do other things in the future.
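The synchronous model Shane proposes can be sketched roughly as below. This is an illustration under loud assumptions: in the design discussed, the in-memory data source is a separate process reached over the message queue, whereas here a direct method call stands in for that request and reply. The class names `InMemoryMirror` and `StableStore` are hypothetical.

```python
class InMemoryMirror:
    """Stands in for the in-memory data source process."""

    def __init__(self):
        self.records = set()

    def apply_delta(self, added, removed):
        """Apply a delta and return True as the acknowledgement."""
        self.records -= set(removed)
        self.records |= set(added)
        return True


class StableStore:
    """Stands in for the "disk space" data source (for example, SQL)."""

    def __init__(self, mirror):
        self.records = set()
        self._mirror = mirror

    def update(self, added=(), removed=()):
        """Write the change, then wait for the mirror before reporting success."""
        # Step 1: apply the change to stable storage.
        self.records -= set(removed)
        self.records |= set(added)
        # Step 2: send the same delta to the in-memory copy and wait for
        # its reply (here, simply the return value of the call).
        acked = self._mirror.apply_delta(added, removed)
        # Step 3: only an acknowledged delta counts as a completed update.
        if not acked:
            raise RuntimeError("in-memory copy did not acknowledge the update")
        return True


mirror = InMemoryMirror()
store = StableStore(mirror)
store.update(added=[("www.example.com.", "A", "192.0.2.1")])
```

The sketch does not address what to do with the stable copy when the acknowledgement never arrives, which is the consistency question Jinmei raises.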
    972 Jeremy: I don't think the current msgq can keep up with this. Which is why we may replace it.
    974 Shane draws the current design plan on the notepad (see photo)
    976 Shane: updates are *really* slow in BIND 9, I think using a real SQL database in the back can buy us a lot.
    978 Jinmei: this might be faster than writing to disk directly?
    980 Shane: I think maybe. SQL people have worked very hard to get their writes fast.
    982 Michal: they tune the performance toward parallel updates
    984 Shane: the trick here is the SOA update which has to occur with every update
    986 Jeremy: no, it can be trained to every 300 seconds.
    988 Shane: so that would be a lot faster, yes
    990 Jelte: I was thinking of a shorter time, but yeah, we would do that
    992 Shane: in DHCP they queue up the answers and synch them periodically.
    994 Jinmei: does this architecture have its own bottlenecks in it, and in the worst case, does the request from DDNS block the auth server from responding to further queries?
    996 Michal: this is where we need the good msgq.
    998 Shane: there are potential bottlenecks. I think with this model though, it's a bit like microkernel architecture, you can throw it away if it's a problem.
    1000 Jinmei: in the case of IXFR with NSD, I thought that it does periodic updates, like every 30 seconds or something. Not update immediately upon receipt.
    1002 Shane: in the update I worked with it in we did updates every minute.
    1004 Jinmei: so it can combine incoming updates. In that case I don't know if it also makes sense for dynamic updates. Especially if the update rate is quite high.
    1006 Shane: could be.
    1008 Jinmei: I simply don't know.
    1010 Shane: batch processing can be a lot more efficient but with DDNS it may be difficult to ensure fairness.
    1012 Likun: we need to think about the lightest uses, like a user who just needs to start an auth server, and we don't want the model coupling too much.
    1014 Shane: I think we can easily hide all this from the user. You just load a zone file in and start. That should be the default. If you're not configured as a secondary we should not start xfrin or zone manager modules. Automatically.
    1016 '''Y3 deliverable: logging'''
    1018 Stephen: log4cxx did not work, so now logging just goes to stdout and that's it. We have to decide what we want to do with the logging. Do we go to another existing package or do we write our own? The log4cxx has the advantage that you can create independent loggers with individual characteristics. So for one module you could have detailed logging and very primitive logging for another. It also provides multiple levels and destinations. My principal reason for choosing it was that it is already there. If however, we decide we want to implement our own, we need to do everything log4cxx does now plus... ?? So.... what do we do?
    1020 Jinmei: Is it true that FreeBSD doesn't have sufficiently new version of Log4cxx?
    1022 Jeremy: it was not in the packages collection.
    1024 Jinmei: which version?
    1026 Stephen: I downloaded the Ubuntu version and that was 0.9.8.
    1028 Jinmei: on my laptop is 0.10.0
    1030 Stephen: the version we had problems with was a 0.9.x and the issue was they changed an underlying strings thing due to a windows issue.
    1032 Stephen: we could leave it logging to stdout for the OSes it doesn't work on and hope that upcoming versions fix this? Log4cxx comes from Apache, but there are others.
    1034 Jelte: SyslogNG has its own API?
    1036 Stephen: so what is a simple logging system? Log4cxx is really complex, and realistically you don't need this.
    1038 Jeremy: BIND 9 logging is hard to configure, but it does have a lot of features
    1040 Stephen: this is part of why I wanted Log4cxx, because it seems to have the features people want
    1042 Jeremy: what about log4c+? It is just run by one guy (
    1044 Stephen: yeah that makes it a non starter
    1046 Jeremy: or we embed and maintain.
    1048 Stephen: going back, would it be right to go along the same lines as log4cxx, but not as flexible, in our own implementation.
    1050 Jelte: I would be fine with that
    1052 Jeremy: I don't think I want us handling log rotations or anything like that
    1054 Stephen: so we need to either base on an existing package or...
    1056 Stephen: principle of least surprise, do we make it do what BIND 9 does?
    1058 Jinmei: maybe it's sufficient to use the features in the operating system support - but I also think it makes sense to have a minimal version that does *not* rely on something like Log4cxx, as Jelte said.
    1060 Stephen: ok, compromise. we write a minimal implementation, no log rotations, goes to a few specific locations, and we have the option for plugging in log4cxx later for people for whom it works with their OS.
    1062 Others: we also looked at glog, logging for C++ by Google. It didn't have documentation with it.
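The feature set in Stephen's compromise (independent loggers per module, a few levels, a few destinations, no rotation) can be illustrated with Python's standard `logging` module. This is only an illustration of the behavior being discussed, not the BIND 10 logging API. The helper `make_logger` and the logger names used with it are assumptions.

```python
import io
import logging


def make_logger(name, level, stream):
    """Return a logger with its own severity threshold and its own destination."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False      # keep this logger independent of the root
    logger.handlers.clear()       # start from a known state
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


auth_out = io.StringIO()
xfrin_out = io.StringIO()
auth_log = make_logger("example-auth", logging.DEBUG, auth_out)     # detailed
xfrin_log = make_logger("example-xfrin", logging.ERROR, xfrin_out)  # primitive

auth_log.debug("received query")
xfrin_log.debug("this message is filtered out")
xfrin_log.error("transfer failed")
```

One module logs at debug level while another only reports errors, and each writes to its own destination, which is the "independent loggers with individual characteristics" property credited to log4cxx above.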
    1064 '''Y3 deliverable: IXFR-out'''
    1067 '''Y3 deliverable: XFR-in'''
    1070 '''Y3 deliverable: DNSSEC validation '''
    1072 Jeremy: really need a specifications document.
    1074 Stephen: Q: are we really trying to do requirements or design? (A: neither at the moment.)
    1076 Michael: can approach this by supporting one algorithm initially.
    1078 Shane: can decompose.  e.g. know about trust anchor management, could document that.  However, really do need to understand this before we start writing code.
    1080 Michael: corner cases make life very complicated.  Also, validation is a combination of top-down and bottom-up validation.  Odd cases where you can almost reach it one day, then have to go back and read data another day.  If I plead for one requirement, it's to make it easily validatable.
    1082 Michael: many recent bugs in BIND9 due to different trust levels of data.  Suggest having two caches, and copy between untrusted and trusted data.
    1084 Vorner: Suggest we walk chain from root each time and check - don't need to do crypto every time.
    1086 (Discussion on validation procedure: Problems when elements of the validation chain have different TTLs.  Hardest cases come when something is wrong - remember "roll over and die".  Stress need to have every corner case as a test case.)
    1088 Michael: Really do need a specification/design document - need to document Mark Andrews's experience.  Can see it becoming a best practice document.
    1090 Jinmei: can't document corner case here.
    1092 Michael: how easy is it to issue queries?
    1094 Jelte: not too difficult.
    1096 Michael: need to do fetches in parallel.
    1098 (Discussion on when to issue queries for DS records.)
    1100 Michael: biggest problem in BIND-9 is retry time and retries.
    1102 (Discussion on what to do with insecure responses.)
    1104 Shane: Will task Jeremy to produce document describing validation process.  Will need to get periodic updates on the document - say every two weeks.
    1106 Jeremy: Will work with BIND-9 developers to document existing code.
    1109 Jinmei: Q: 5011 support?
    1111 Michael: A: Yes - is critical.
    1113 Jinmei: Q: DLV Support?
    1115 Michael: depends on whether or not BizOps says we need it.
    1117 Shane: Need to support it for next year.
    1119 Michael: Why do we need it? (If parent does not support DNSSEC)
    1121 Jelte: Nice but not essential (Shane: agree.  Michael: recommend we don't implement it - nasty hack needed before root was signed; expect fewer zones will be signed with key here.)
    1123 Conclusion - does not make sense to implement DLV now.
    1126 '''Refactoring ASIO'''/'''Event Driven or Threaded Model?'''
    1128 ''We need to talk about how we're going to refactor the ASIO code, or at least the coroutine style. It's hard to work with.''[[BR]]
    1129 ''A suggestion to use non-preemptive threads for processing. We need to decide if this is worth pursuing, and what it would mean if we did.''
    1131 '''Event Driven or Threaded Model'''
    1133 External Assertion: event driven not good for high-performance server.
    1135 With threads, have problem about concurrent access, and scaling gives problems?  (Assertion - no way to make program thread-safe?)  Proposed that real problem with threading is concurrency.  Proposed that threads operate one at a time.  Way to do this is co-operative multi-tasking(?)
    1137 (Discussion on ASIO and coroutines.  Recommendation to remove coroutines)
    1139 Jinmei: if we use event-driven model, don't see reason to drop ASIO.
    1141 Michal: thread model can be used, but will hit problems with it.
    1143 Conclusion: can get rid of coroutines with relatively little effort.
    1145 Q: What version of ASIO do we use and are we updating it?  A: Jeremy will check.
    1147 Q: do templates give code-bloat? A: Stephen will investigate.
    1149 Shane: Threaded code may be simpler to read, but interface provided by pthreads is not easier to read.  However, have problems with things like cancel.
    1151 Michael: multiple threads but only one thread running at a time.
    1153 Shane: proposal from the comments was state threads (Apache project).  (Discussion of the state threads model)
    1155 Vorner: potentially only single core, but multiple processes.  This does not appear to support multiple (real) threads.
    1157 Vorner: Believe that we can run multiple processes for authoritative server.  But will need multiple threads to run resolver.
    1159 Shane: What about current code?  Proposal is that we won't pursue this now - event driven code is easy enough to read.
    1161 (Discussion about general multi-threading issues.)
    1163 Michael: authoritative server can be done multi-process (although there is a lot of interaction in the data base).  Recursive server has too much interaction.
    1167 == Wednesday, 2011-03-23 ==
    1170 '''Unit Testing: How to Do It''' (Medium)
    1172 ''We should talk about our unit tests, and where and how we draw the line on testability. Some things are ''hard''.''
    1174 Shane: our general rule is we test everything. There are cases where that is really hard. I have to say, though, some places I thought it would not be possible, it was, with refactoring. Do we have examples of places we don't have tests now because they're too hard? Assuming we don't test the libraries we rely upon.
    1176 Jelte: I have one test that doesn't actually do statistical test on the QID but it does test that it doesn't get the same QID a few times in a row.
    1178 Michael: a random number doesn't mean you never get a repeat.
    1180 Jelte: which is why it does a few checks in a row.
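A minimal sketch of the kind of check Jelte describes, written in Python for brevity. `QidGenerator` and `all_identical` are hypothetical names for illustration, not BIND 10 code. The idea is that one repeated QID is legitimate for a random source, so the test only fails when several draws in a row are identical.

```python
import random
import unittest


class QidGenerator:
    """Hypothetical stand-in for a 16-bit query ID generator."""

    def __init__(self, rng=None):
        self._rng = rng or random.SystemRandom()

    def next_qid(self):
        return self._rng.randrange(0, 0x10000)


def all_identical(generator, draws):
    """Return True if `draws` consecutive QIDs all have the same value."""
    first = generator.next_qid()
    return all(generator.next_qid() == first for _ in range(draws - 1))


class QidTest(unittest.TestCase):
    def test_not_constant(self):
        # A single repeat is allowed; five identical draws in a row from a
        # 16-bit space almost certainly means the generator is stuck.
        self.assertFalse(all_identical(QidGenerator(), draws=5))
```

This is not a statistical test of randomness; it only catches a generator that returns a constant.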
    1182 Michal: the part of the code without many tests is the TCP and UDP servers.
    1184 Jelte: msgq is also insufficiently tested.
    1186 Shane: that is one area that is quite difficult - when you interact with the external environment
    1188 Michal: I don't think that's why they don't have tests - they were written at the beginning before the strict policy.
    1190 Shane: For things like that, we could create our own descendent of the listening classes themselves, and use that for testing somehow.
    1192 Michael: the Samba folks have a full virtual networking layer that lets you inject any format you want without using a networking stack to do it.
    1194 Michal: you could use the loopback interface
    1196 Shane: how do you cause bad behavior then?
    1198 Stephen: the problem is testing how it fails
    1200 Shane: if our code is structured so anything that doesn't succeed goes to the same code paths, this matters less.
    1202 Michael: if you remove the network part of the unit testing, it's more reliable.
    1204 Jinmei: what is the goal of the topic?
    1206 Shane: to discuss where we are failing to make unit tests, how to fix it, what we can do about it?
    1208 (looking at an example of tcp server code)
    1210 Shane: it's easier to instrument Python code for testing than C++.
    1212 Stephen: if you're writing your C++ code and you want to point to something different for testing, build it into the object and put in a flag, so the production code includes code for testing, and I think that's valid. It's like an automobile with diagnostics for maintenance.
    1214 Michal: you could inject the tests with templates if you don't want the test code compiled in?
    1216 Shane: possible.
    1218 Jinmei: we can also use some higher level abstractions, by introducing class hierarchy just for the purpose of tests. There are techniques, but it is true it will be more difficult.
    1220 Shane: it's early binding which makes it more difficult.
    1222 Jinmei: I don't think that is the essential difficulty.
    1224 Michal: the places we don't test are sometimes main functions.
    1226 Jinmei: One possible good thing is to have a wrapper layer - then we can separate the dependency - so we can test the code using the network related things.
    1228 Shane: so add an indirection layer?
    1230 Jinmei: right. Then we can use a fake certificate, fake network communication, etc. Then we can test all of these other things with the ASIO wrapper.
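The indirection layer Jinmei describes can be sketched as follows, in Python for brevity. All class and function names here (`UdpTransport`, `FakeTransport`, `serve_once`) are hypothetical and do not correspond to the ASIO wrapper in BIND 10. The point is that the code under test only sees `receive()` and `send()`, so a test can substitute canned packets for a real socket.

```python
import socket


class UdpTransport:
    """Thin wrapper around a real UDP socket; kept trivial on purpose."""

    def __init__(self, address):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.bind(address)

    def receive(self):
        return self._sock.recvfrom(65535)

    def send(self, data, peer):
        self._sock.sendto(data, peer)


class FakeTransport:
    """Test double: replays canned packets and records what was sent."""

    def __init__(self, incoming):
        self._incoming = list(incoming)
        self.sent = []

    def receive(self):
        if not self._incoming:
            raise TimeoutError("no more packets")
        return self._incoming.pop(0)

    def send(self, data, peer):
        self.sent.append((data, peer))


def serve_once(transport, handler):
    """Read one packet, pass it to `handler`, and send back any reply."""
    data, peer = transport.receive()
    reply = handler(data)
    if reply is not None:
        transport.send(reply, peer)
```

With this split, only the thin `UdpTransport` wrapper is left without a network-free test, which matches the later remark that a trivial wrapper may not need tests of its own.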
    1232 Jelte: so we already have the layer, but if you replaced it you'd be rewriting much of ASIO. We have that layer and we don't use ASIO directly, we use ASIO link. But if we replace it for testing, we'd have to replace all the functionality.
    1234 Michal: we only have to replace some specific network parts.
    1236 Stephen: you can inject packets, but if you have a fake where you replace a routine to write packets to the network, the routine has to do a callback, and it replicates a lot of effort. I think it's really only the servers we haven't really tested.
    1238 Michael: have you tested the client query stuff?
    1240 Shane: no, we don't check for it.
    1242 Jelte: we do test the resolver behavior.
    1244 Likun: can we look at ASIO's test code?
    1246 Jeremy: I was just looking, ASIO and Boost have unit tests. Maybe we can work with them.
    1248 (Shane brings up a Google search for ASIO and Boost tests)
    1250 Shane: we should research this
    1252 Jinmei: at least in theory we should be able to test all parts but the wrapper itself, but some things heavily rely on the core ASIO. Another thing is that if the wrapper itself is very trivial, we can maybe skip that - it will simply mean testing with the external library itself. If the wrapper is difficult, then it needs tests
    1254 Shane: infinite regression!
    1256 Michael: if they test the ASIO stuff, they've got to have a way to do this
    1258 Jeremy: check out
    1260 for example:
    1262 Michael: I think client behavior is trickiest.
    1264 Shane: do you mean the resolver?
    1266 Michael: yes.
    1268 Shane: we will test packet drops, packet delays, incorrect answers, etc, but we won't test UDP checksum errors etc.
    1270 Michael: of course not.
    1272 Michal: we will have the demultiplex thing, so we will test on that level. Right now the client in the resolver is... temporary, right?
    1274 Shane: part of it is. the demuxer is a layer in front of that.
    1276 Jelte: yes. Right now the resolver issues its own queries and it would ask the demuxer to do the sending of the actual packets.
    1278 Michael: do you use the system resolver to send notifies?
    1280 Shane: yes
    1282 Michael: that's the right way. Don't change that.
    1284 Shane: for unit testing, for new code, there should be *no* new code that you can't write a unit test for. If you can't figure out how to write the test, speak to the team. I was trying to do some BOSS work and I couldn't figure it out and tried functional tests and then Michal asked if I needed to and I realized I didn't. So that works.
    1286 Larissa: and people can mention it in a daily scrum call if they're stuck on a test
    1288 Shane: yes
    1290 Larissa: aren't we also doing TDD?
    1292 Shane: yes.
    1294 Larissa: and aren't those unit tests?
    1296 Shane: yes, but people get stuck, so they don't write the test, they just write the code.
    1298 Michal: what about refactoring?
    1300 Shane: if you refactor the code you refactor the test. If you're writing a sort function and then you refactor, even though it's internal, or private, you refactor the test.
    1302 Stephen: you test *all* the code you write.
    1304 Michael: if you don't test functions, you have to write more tests from the outside. If you have internal tests, your tests are less fragile.
    1306 Shane: you have to test the function somehow.
    1308 Michael: right, if you know you tested that, then at the higher level you can trust that it's tested; it's opaque, and that's okay.
    1310 Jinmei: I got lost. This is about testing private things? I am afraid there is no single universal solution to this problem; I think we need to use our discretion.
    1312 Stephen: the simple way is to make it protected.
    1314 Michael: we had this discussion in another BIND 10 meeting, that we will allow other people to shoot themselves in the foot, if they want to mess with this stuff. Why make it private?
    1316 Shane: private is an *advisory notice*, not something we use to prevent.
    1318 Jelte: I thought the decision back then was to not change our interfaces for testing.
    1320 Stephen: as the code becomes more complex, why not put in code that is just for testing?
    1322 Shane: I think we were saying we didn't want different code executed for testing.
    1324 Michael: the plan for BIND 9 is to be able to compile a test version that's static. We also have to rename functions in BIND 9, but you're protected from that with c++.
    1326 Stephen: if you access something protected for test use, but have it set to private for regular use, and there is a macro, it just won't compile if you try to compile it for real environment not test environment. It will compile for testing only.
    1328 Michael: it's worth trying this and seeing how it goes, but it may end up you just need a comment or something.
    1330 Michal: people don't read docs/comments.
    1332 Shane: it's good to use the standard way the language is normally used.
    1334 Larissa: project goal of understandable hackable code....
    1336 Jeremy: should we focus time on getting better coverage? We have some specific areas with poor coverage.
    1338 Shane: I think this has been getting better. Except msgq. And BOSS. But these have a refactoring scheduled.
    1340 Jeremy: bindctl and xfrout and xfr library need more tests. The datasource master. We knew, but it needs all testing done.
    1342 Shane: we will also be refactoring datasource soon. I hope.
    1344 Jeremy: there are a few things.
    1346 Shane: all of these places will be touched within the next 3-6 months, so the question is should we expand the scope of the changes to also add tests.
    1348 Jeremy: I would guess yes, because otherwise people will only test what they are writing
    1350 Michael: and it's always better to have tests first.
    1352 Jinmei: I think in general we should care about test coverage but should we introduce specific action to address this concern?
    1354 Shane: we have two pieces of work scheduled that will affect xfrout daemon. So we can schedule another task before those that is for writing tests and the relevant refactoring.
    1356 Jinmei: there are some other cases that are normally considered difficult to test. Database related things. That would be more so when we add more backend databases. I anticipate some excuses and reasons we can't test in this instance.
    1358 Shane: I think the tests we have for SQLite now are a little broken.
    1360 Michael: you have to run the relevant server to test the specific backend, which can eventually not scale.
    1362 Jelte: it would be nice to have a generalized datasource functional test suite.
    1364 Michal: isn't there some kind of general database library where you send SQL but it doesn't matter which server is there?
    1366 Shane: I looked at this 7 years ago and the answer was no because once you do anything non trivial, things vary really a lot.
    1368 Michael: databases are becoming more standard now.
    1370 Shane: it's the details of how things work within databases that are really different. Jelte and I looked at the SQLite schema, and normally where you would expect a between command to work, it doesn't work there. That's a really simple thing. Some systems don't support nested selects, etc and so forth.
    1372 Jelte: we need to have some high level tests, functional tests, that run on any datasource.
    1374 Michael: unit test what you can, don't unit test what you can't, in this instance.
    1376 Jinmei: we can't solve this today, but this way we are prepared for the case.
    1378 Michal: some people do not want the SQL backend to be compiled at all, and some will, and they will have SQL running anyway, and will want to test it, and we want to test it.
    1380 Jinmei: another point: time related tests?
    1382 Michal: for some of the time related stuff we could provide our own function that gives the time. And then the time moves. We could put it in a common library.
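Michal's suggestion of providing our own time function can be sketched like this, in Python for brevity. `SystemClock`, `FakeClock` and `ExpiringCache` are hypothetical names made up for this illustration. The cache never calls the system clock directly, so a test can move time forward by a month without waiting.

```python
import time


class SystemClock:
    """Production clock: delegates to the operating system."""

    def now(self):
        return time.time()


class FakeClock:
    """Test clock: time only moves when the test says so."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def advance(self, seconds):
        self._now += seconds


class ExpiringCache:
    """Hypothetical TTL cache that asks an injected clock for the time."""

    def __init__(self, clock=None):
        self._clock = clock or SystemClock()
        self._entries = {}

    def put(self, key, value, ttl):
        self._entries[key] = (value, self._clock.now() + ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires = entry
        if self._clock.now() >= expires:
            del self._entries[key]
            return None
        return value
```

A test would build `ExpiringCache(FakeClock())`, store an entry with a 30-day TTL, call `advance()` on the clock, and check that the entry disappears exactly at expiry, all in microseconds of real time.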
    1384 Jeremy: I am just wondering how important these things are. I don't know what all the tests are but 5 time tests have been failing. I don't know how important they are. Would bind10-auth or resolver fail on a virtual machine?
    1386 Jinmei: possibly. Even forgetting about VMs, time related tests are tricky.
    1388 Larissa: at BayLISA multiple operators asked me if we are optimizing for VMs.
    1390 Jinmei: I think even if it's ugly, it's much better to test it than not test it - but it's not so sophisticated.
    1392 Michal: one of the tests that failed, is a test where somehow I created a msgq core and a client, and tried to see if the traffic will arrive, and I put a timeout there. There is no timeout in real life, but if it's stuck forever... I put a timeout there I thought was large enough but it turned out it was not.
    1394 Michael: we also have to start considering timing involving DNSSEC validation stuff. Then you have to plan time tests involving months.
    1396 Larissa: Francis wrote some sort of time machine meant to help with that.
    1398 Michal: we don't want to ask for the time once we are computing, but we ask so many times, and the time only differs by milliseconds.
    1400 Michael: but how do you know?
    1402 Michael: BIND 9 has two useful things - one, once a test starts, gettimeofday locks down. Second, Francis wrote this time library with an exponential curve that crushes 30 days into 15 seconds. There are some tests you can do that are helpful that way. Particularly for functional tests. It's a library that you can use. Compiled in for some things.
    1404 Jeremy: to finish my point, once we know the test is what we want, and it still fails on virtual machines, maybe it's the code that needs tuning, not the test.
    1406 Shane: sometimes it's really not the code causing the test to fail.
    1408 Shane: also about timing, every time we add time to a timing test, it adds waiting when I type make check. Sometimes you need a small wait, but they add up over time.
    1410 Stephen: then get a biscuit with your coffee.
    1412 Shane: but in a year or two, will it take three hours? Let's think about this as we write the tests.
    1414 Michael: eventually maybe we can get tests running in the background. make test running continuously on the laptop.
    1416 Shane: I run make check across the whole system when I do a review.
    1418 Michael: it takes 8 hours to run the tests on BIND 9. Don't ever get there on BIND 10.
    1420 Jinmei: Can we make a rule for this? Timing tests? We may want a generic framework for faking time.
    1422 Stephen: can we pull across Francis's work?
    1424 Shane: for functional testing. For unit tests we need arbitrary time values.
    1426 Jinmei: regarding tests taking time, there are several issues. In general taking time for tests is a bad thing because it makes people skip running tests. So one question is whether we want to avoid that. I personally think it's better to run the tests.
    1428 Shane: could we flag time related tests?
    1430 Jinmei: there is not a general flag but we could include time in the name and separate them that way.
    1432 Shane: is it possible in google test to run tests in parallel?
    1434 Jinmei: maybe
    1436 Michal: I don't think so. But we have many test programs, they could run in parallel, but I worry that they use ports.
    1438 Michael: we can't run all our variants in BIND 9 in parallel. We have to stop unit tests to run specific tests and then remember to turn them back on, and it sucks. This is why I recommend looking at what Samba does.
    1440 Jelte: I think this is also what Unbound does.
    1442 Michael: if you don't use ports, you can run in parallel.
    1444 Michal: it would work if we didn't use auto tools.
    1446 Jinmei: so we could introduce a filter for longer duration tests. The other thing is that I would suggest using smaller timeouts as much as possible. That also means we may want to change the API so that it will take a millisecond granularity.
    1448 Shane: which API?
    1450 Jinmei: an example would be the cache timeout for Hot Spot Cache. It is set to seconds which makes sense functionally but not for tests.
    1452 Michael: Google Test does not run tests in parallel and has no built-in ability to do so, but it does support selecting tests by name pattern. So if you, say, named things "slow" or "fast" you could break down some tests.
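In Google Test the name-pattern selection is done with the `--gtest_filter` command-line flag. The same idea can be sketched with Python's `unittest`; the `CacheTest` class, its test names and the `select_tests` helper below are hypothetical and only illustrate splitting a suite by a "slow"/"fast" naming convention.

```python
import fnmatch
import unittest


class CacheTest(unittest.TestCase):
    def test_lookup_fast(self):
        self.assertEqual(1 + 1, 2)

    def test_expiry_slow(self):
        # Imagine a test here that has to wait for a real timeout.
        self.assertTrue(True)


def select_tests(case_class, pattern):
    """Build a suite containing only the test methods matching `pattern`."""
    loader = unittest.TestLoader()
    names = [name for name in loader.getTestCaseNames(case_class)
             if fnmatch.fnmatchcase(name, pattern)]
    return unittest.TestSuite(case_class(name) for name in names)
```

A quick developer run could then execute `select_tests(CacheTest, "*_fast")`, while the review or nightly run uses `"*"` so that the slow tests are not skipped forever.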
    1454 Jelte: lots of projects do "make test" or "make all tests"
    1456 Shane: then people never run "make all tests" - I want there to be pressure against avoiding tests
    1458 Jelte: except that if the tests take sooooo long people stop running them at all. Just run the tests you are interested in. You can specify which tests run with which features too.
    1460 Jinmei: in any case my approach would be to have high level techniques to shorten the time we need for tests, and to have that concept in the review test, so if the reviewer can check the time of the test and bring it up if its long...
    1462 Shane: someone add that to the review process now!
    1465 '''Functional Testing: How to Do It''' (Medium)
    1466 ''
    1467 This is testing at a higher level. We have had some brainstorming about this at the end of Y2 during our mad testing phase, but we need to formalize our work here.''
    1469 Shane: testing is one of those things where getting the terminology right is tricky. In our project we understand unit testing but we have no or nearly no functional testing. In our case we mean running the software as a system and seeing what happens.
    1471 Jeremy: I have a few ad-hoc scripts for server start, loadzone, xfrin, dig, etc
    1473 Shane: unlike unit testing we want to do this at the system level, right? Do we want to define it by module?
    1475 Jinmei: what?
    1477 Shane: do we want to define tests for cmd-ctl, or just for configuration, etc
    1479 Stephen: if you list requirements, there might be functional tests that correspond.
    1481 Shane: a note for Jeremy, we need to at least identify which tests cover which areas of the functional dns specification.
    1483 Michael: how will you write specifications? Is it a user story format?
    1485 Stephen: I think we're talking about the same thing. Every requirement should be testable.
    1487 Michael: the reason I like user stories is because it focuses you on the user focused outcome.
    1489 Stephen: except we write from RFCs
    1491 Michael: BIND 9 was written in RFCs... and the user interface...
    1493 (discussion about what user stories are)
    1495 Michael: the idea that a user story translates to a functional test is very useful.
    1497 Shane: let me pull up an example.
    1501 This has functional and programmatic requirements.
    1503 Shane: assuming we have a framework to execute tests on a functional level, who writes the tests, when do they get written, and do we have a document to track them?
    1505 Stephen: whether we use user stories or requirements statements or a combination, how do we test it?
    1507 Michael: you can do a "work in progress test" where a test you're going to add goes.
    1509 Stephen: the reason why this business about the requirements came up is that DNS is specified by many RFCs plus we have BIND 9 compatibility.
    1511 Michael: can the requirements be generated from the test suite, or are the requirements their own document?
    1513 Michal: I would rather have them in the same file, from the developer point of view.
    1515 Michael: this is what I would recommend. But there is one catch - you end up with one functional spec, but 40 tests for one functional spec. Numbering can get weird.
    1517 Jeremy: let's say I write 700 statements. They are a few sentences each, and I attribute them to source code or RFCs. I can put it in XML, parse it out, generate HTML, whatever, and point to URL in the test cases?
    1519 Shane: in XML it will generate directories, it could even generate test stubs.
    1521 Michal: then someone has to write the test, and they can put a comment that links to the specification. But when the test fails the error message should indicate what the test tried to do.
    1523 Jeremy: I have this document and then changes go along and we change a requirement, then we change tests?
    1525 Michael: but we're talking about having the descriptions in the tests. So the master file is that XML document. How do we structure this and is there a tool that will do it?
    1527 Shane: there are probably 700 test frameworks that academics have written.
    1529 Jeremy: I think we should try one of the three python cucumber clones.
    1531 Michael: you can use either one, you don't end up writing much code in those. It's very verbose, English language type testing. It's really driven for user stories.
    1533 (looking at
    1535 Michael: I experimented with this and I liked it, but I don't think it would be easy to get BIND 9 people to do it. You would be more able to do this because you're just starting to implement functional tests. Also this is a very good format for developing tests progressively.
    1537 Shane: I'm trying to think of corner cases. How would this work for say a key rollover in DNSSEC. There are a lot of ways to *do* a key rollover. Do you document them all?
    1539 Stephen: there is a sequence of tests. "Given I have put a DNS key in the zone and I have waited xyz I should see xxx"
    1541 Shane: and I guess we choose how we implement this.
    1543 Jeremy: in some situations we start one server and run many tests. In other situations we run multiple servers to run one test, and stop between, etc. How does that work?
    1545 Michael: it's just slow. You can set it up to specifically track and kill processes, etc. I also have things I call "meta sets". It knows what having a dnssec implementation with 3 masters means.
    1547 Jeremy: the good thing for us as we create these rules, if it doesn't work right, we can fix BIND 10.
    1549 Michael: I would love to be able to run the same test suite against BIND 9 for things that make sense.
    1551 Shane: like tests where we change config engines would be different.
    1553 Shane: so getting to implementation, I think finding a Python cucumber clone would make sense. In the past I would have asked Jeremy to look for that, but will you have time?
    1555 Jeremy: I would like to but I would only have a couple of days to look. I would also like Jinmei and Michael to explain the systest that is in BIND 9 now.
    1557 Jinmei: its basically lifted from BIND 9's system tests.
    1559 Shane: is this an executable program?
    1561 Jinmei: for now its a shell wrapper thing. You can look at the source code.
    1563 (team looks at in bindctl)
    1565 Michael: yes this is very much like what BIND 9 does, it's disgusting, but it works.
    1567 Jinmei: yes this was a quick hack to get some testing done before a release. We can throw it away or enhance and integrate it. Or I don't know.
    1569 Michael: the one problem with BIND 9's system tests is that you really want to start the server, issue a query, do a specific thing, shut it down, do the next one. BIND 9 starts, does a lot of tests, and then shuts down. It's not as clean of a test. It's expedient in some cases but it's not good test methodology.
    1571 Shane: this may depend on the kind of test.
    1573 Michael: one improvement I want is, the way you make a test is, you find one that does something like you did and you copy it. Refactoring to a library for common use cases would be better. This could be shared between BIND 9 and 10.
    1575 Shane: so.... yeah. I don't even know if we would port these, maybe we would, but they should reflect a requirement. We will have requirements that aren't in the DNS spec. Like statistics, etc.
    1577 Stephen: we need to make an assessment, as to how much is automated, a couple of things may not be worth it.
    1579 Shane: we may need at least two documents. One is a DNS specification but the other is other related things.
    1581 Michael: in cucumber you can tag them, so we can have a set of RFC compliance specific tests, statistics specific tests, etc.
    1583 Michal: can you have a test that has no requirement?
    1585 Shane: no, actually, there needs to be a requirement or why is it there? You need to say what happens if you start a server when it's already running? etc.
    1587 Michael: remember how we're doing unit testing. Once something runs cleanly you can rely on the unit test.
    1589 Shane: this also applies backward pressure on developers to avoid adding cool features that no one asked for.
    1591 Shane: we may have to have developers do some of the research on test frameworks and set it up.
    1593 Michael: maybe 3 people each research one and bring it to the engineering forum for 15 minutes.
    1595 Shane: hmm...
    1597 All: maybe we do this in a bind 10 staff meeting and then present the decision.
    1599 Jeremy: there are 3 python based cucumber clones, and maybe we can just look at those.
    1601 Michael: ATF is an option too. It spits out XML.
    1603 Jeremy: and I know the ATF developer.
    1605 All: hmmmmm.
    1607 Shane: okay. Jeremy, if you have time over the next two weeks to figure this out, then cool. If not, we'll flag it, and we'll get other resources onto the solution.
    1609 Jinmei: what do we do with the existing test framework?
    1611 Shane: will we need to add tests in the next two weeks? We don't know.
    1613 Michael: did the tests you ported over from BIND 9 find problems?
    1615 Jinmei: yes I did
    1617 Michael: then I would continue with this and prioritize for importance and ease
    1619 (Discussion of existing tests written against dns-python and what to do with them: should we rewrite them to use our own library or not?)
    1621 Jeremy: can we set goals for the year?
    1622  * Jeremy will research test frameworks and not spend more than 3 days
    1623  * set our functional test framework by end of May
    1624  * develop xx number of functional tests or % of tests by end of Y3 - for example (100% P1, 50% P2, 0% P3)
    1625  * Jeremy will share his list of requirement/stories with Larissa sprint by sprint and she will set priority with guidance from the team (we will see if this works, resource wise) - developers write test implementations and they are reviewed with code.
    1628 '''Testing Suites''' (Medium)
    1629 ''
    1630 In addition to functional testing, we may want to include several other types of testing suites such as Tahi (for example, performance).''
    1632 Shane: Jeremy looked at Tahi, which is an IPv6 thing with close ties to the WIDE project.
    1634 Michal: I looked at it, it's for testing IPv6 infrastructure.
    1636 Jeremy: it seems like the scripts and requirements are not generated automatically, but I've never set up the platform.
    1638 Michal: It seems like you need a complete laptop setup and you need to change your environment to run it. They provide their own DNS server and client. If I understand correctly, they are checking to see that the network runs DNS, not that the DNS server runs.
    1640 Jeremy: it might be useful, but the setup time might be high. 2-3 days at least to set up virtual servers.
    1642 Shane: the main use of it is probably to tell people we run it.
    1644 Jinmei: I can talk to the developers of it, I know them.
    1646 Shane: the coolest thing would be if there is an existing lab we could use it in. CNNIC is using it.
    1648 Jeremy: we could ask Cathy that.
    1650 Shane: if Jinmei could talk to the developer, that might be best.
    1652 Jinmei: if we are very lucky they may be interested in testing BIND 10, but I don't know. I will ask for general advice.
    1654 Jeremy: there is another test suite called Protos that is a Java-based conformance suite.
    1656 Michael: there is a huge set of people writing test suites. It's a service model
    1658 Shane: maybe OARC could ask people... let's ask dnssec-deployment what suites they are using for dnssec conformance? Shane will ask.
    1660 Fujiwara-San: I made a specifications document I will share with the team.
    1662 Larissa: that document was excellent and may be useful to Jeremy's requirements doc as well.
    1664 Shane: there is also non functional testing, you can convert a lot of it to functional testing. But for performance benchmarking you really want a chart or a list.
    1666 Jeremy: our current tests are not automated because there were always failures.
    1668 Shane: it would be really nice if we could include that testing in our test suites so the team can run the tests.
    1670 Jeremy: some of it will be duplicated by what the functional tests do. So I am wondering if I should move it into the functional test layout.
    1672 Shane: maybe see if any of the functional test framework supports performance benchmarking. Or we could also have timing reported for all our tests and tag things for performance specific tests.
    1674 Jeremy: we also have Jinmei's microbenchmark testing that is a bit like unit tests. I don't think people use them outside development.
    1676 Jinmei: they are not for regular use, they are for when you want to introduce an optimization to see if you actually improve performance.
    1678 Jeremy: my concern is maybe people don't know about it.
    1680 Shane: what about Stephen's fuzz testing?
    1682 Stephen: yes I am planning to expand it actually.
    1684 Shane: what we want to do at some point is leave fuzz testing running for a weekend prior to release. We will want to include that.
    1686 Stephen: it's in the experimental branch for now.
    1688 Shane: there is a test directory off main. It can go there.
    1690 Jeremy: Fujiwara-San also has a fuzz tool that fakes traffic.
    1692 '''Modularity & Hooks''' (Medium)
    1694 Michal notes:
    1696    I proposed it some time ago on the mailing list,
    1697    some people looked at it, I got a few comments from a few people, but we should
    1698    talk more widely if we want something like this. If so, we should start using
    1699    it ASAP, because it could ease some development or at least lower the need to
    1700    refactor later.
    1702    The ideas are here:
    1705 Michal: I would like the user to be able to not just add behavior but also remove the default behavior to replace it with theirs. We would build a whole system for the hooks, and it would have advantages for us as well, where we can generalize a library that does listening on the network.
    1707 Larissa: so are you saying make all the existing process modules act like hooks?
    1709 Michal: yes.
    1711 Stephen: One of the things about hooks and putting data out and pulling it in is the data is basically self contained. As soon as you start doing processing, you're accessing internal data structures, and that complicates things. If you want to change data in the cache, do you put hooks into the cache?
    1713 Michal: I would make the cache itself a hook.
    1715 Stephen: I see hooks more as a set of well defined points where you can change specific simple things in the code.
    1717 Shane: explain more please.
    1719 Michael: is this a hook or more like a filter?
    1721 Michal: I don't know exactly what to call it; it's a bit like Apache.
    1723 Shane: ok, so... I can see how this could be fairly straightforward in our event processing today.
    1725 Michal: so then you build the server at runtime from the parts.
    1727 Shane: so basically when we get an event we do things and at the end we register a callback to another thing. We could change the callback to be what the user wanted, which would fit with this model.
    1729 Jelte: we kind of discussed this before, but currently we have two callbacks, dns-lookup and dns-answer, and if we made that a configurable list of dynamically available callbacks, maybe that would work?
    1731 Michal: I want the callback to be able to modify the data. You could say "this is bogus, drop it" or "Stop processing, servfail" or...
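A minimal sketch of the idea in the last two remarks: a configurable list of callbacks, each of which may modify the message and return a verdict that can stop processing. Everything here (the `Verdict` names, the dict-shaped message, the two example callbacks) is hypothetical and only illustrates the shape of the proposal; it is not BIND 10 code.

```python
from enum import Enum


class Verdict(Enum):
    CONTINUE = "continue"   # hand the message to the next callback
    DROP = "drop"           # "this is bogus, drop it"
    SERVFAIL = "servfail"   # "stop processing, servfail"


def run_callbacks(callbacks, message):
    """Run callbacks in order; stop at the first non-CONTINUE verdict."""
    for callback in callbacks:
        verdict = callback(message)
        if verdict is not Verdict.CONTINUE:
            return verdict
    return Verdict.CONTINUE


# Two hypothetical callbacks. A message is modelled as a plain dict.
def lowercase_qname(message):
    message["qname"] = message["qname"].lower()  # callbacks may modify the data
    return Verdict.CONTINUE


def drop_bogus(message):
    if message["qname"].endswith(".bogus."):
        return Verdict.DROP
    return Verdict.CONTINUE


# In a real system this list would come from configuration (assumption).
chain = [lowercase_qname, drop_bogus]
query = {"qname": "WWW.Example.ORG.", "qtype": "A"}
result = run_callbacks(chain, query)
```

Because the chain is just an ordered list, adding or removing behavior amounts to editing that list, which is the property the discussion is circling around.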
    1733 Michael: in Asterisk call forwarding, of all things, you do something and then you call what the next hook would be. Then you don't have a pre-defined list but you do have a library of options.
    1735 Shane: if you're too flexible, if you don't want to write an entire telephone system, it is hard to set up Asterisk.
    1737 Jeremy: I think we need to write down 20 things we would want here. Some of them were discussed before we started the BIND 10 project. Two examples: have code points that point out to places where people would write scripts with an if-then statement. Another way is using firewall rules, like if ___ matches ____, accept/reject. Those would be a lot easier to do than configuring named.conf is today.
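The firewall-rule style Jeremy describes ("if ___ matches ____, accept/reject") could look roughly like the sketch below. The rule format, field names, and addresses are invented for illustration; the only library call used is the standard `fnmatch.fnmatchcase`.

```python
import fnmatch

# Each rule is (field, glob pattern, action). The first matching rule wins.
RULES = [
    ("qname", "*.blocked.example.", "reject"),
    ("client", "192.0.2.*", "accept"),
]


def evaluate(rules, query, default="accept"):
    """Return the action of the first rule whose pattern matches the query."""
    for field, pattern, action in rules:
        if fnmatch.fnmatchcase(query.get(field, ""), pattern):
            return action
    return default


decision = evaluate(
    RULES, {"qname": "www.blocked.example.", "client": "198.51.100.7"}
)
```

An administrator only edits the rule table here, not code, which is the ease-of-use point being made.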
    1739 Michal: if we could configure them like this, we could make them very powerful for power users.
    1741 Shane: to me this seems like... how would this be different, for the user, than writing code? Easier I mean?
    1743 Michal: because you can replace the library at run time. I want them to be able to both put in and take code out.
    1745 Stephen: at some point, you can reconfigure everything at run time, and providing we've got our encapsulation right, you could replace the cache, you've got the object interface, replace it, and it works.
    1747 Jelte: I would not do that with the cache.
    1749 Michal: I would make the cache replaceable because the cache would be a source of data.
    1751 Shane: if you want to change cache data, you can inherit from the existing cache and write your own, or you can also use the API for how the cache works today, and in the hook world, when you do xyz with the cache, a series of hooks are called. Administrators can make changes at each point.
    1753 Jelte: I don't like that; I think it's the wrong way round. I don't think people should modify cache behavior.
    1755 Michal: if you want to change what to throw out, what do you do?
    1757 Shane: An administrator wants to never cache data related to a specific website. So there is a specific hook point he can edit.
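Shane's example, a hook point that lets an administrator keep one site's data out of the cache, might be sketched as follows. `HookedCache`, its "pre-insert" hook point, and `never_cache` are hypothetical names for this illustration only and say nothing about how the BIND 10 cache is actually structured.

```python
class HookedCache:
    """Toy cache that consults 'pre-insert' hooks before storing an entry."""

    def __init__(self):
        self._data = {}
        self._pre_insert_hooks = []

    def add_pre_insert_hook(self, hook):
        self._pre_insert_hooks.append(hook)

    def insert(self, name, rdata):
        # Any hook returning False vetoes the insert.
        if all(hook(name, rdata) for hook in self._pre_insert_hooks):
            self._data[name] = rdata
            return True
        return False

    def lookup(self, name):
        return self._data.get(name)


def never_cache(suffix):
    """Build a hook that refuses to cache names at or below `suffix`."""
    def hook(name, rdata):
        return not (name == suffix or name.endswith("." + suffix))
    return hook


cache = HookedCache()
cache.add_pre_insert_hook(never_cache("example.com."))
stored_blocked = cache.insert("www.example.com.", "192.0.2.1")
stored_other = cache.insert("www.example.org.", "192.0.2.2")
```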
    1759 Stephen: what is the business case? 80/20 rule
    1761 Michael: if you can't make a case for why it's useful, then why do it?
    1763 Shane: there are blacklists in BIND 9, right? It would be nice if you didn't have to have special code to do that.
    1765 Michael: that's a specific example.
    1767 Jelte: I think everything people will want to do can be done with a fairly simple API. If we have several places (currently in the TCP or UDP server) where we point to specific callouts, we can do everything people would want.
    1769 Larissa: I just want to make sure that this is still something sysadmins can deal with.
    1771 Shane: what is the difference between this and writing a new ASIO block?
    1772 On a web server it used to be you had a callout point and you added a function.
    1774 Jelte: if you write a module for Apache or lighty, you write a function that's called, you configure when it will be called, and the context. It can modify anything, and it can send back some defined options.
    1776 Shane: and there are defined steps. In this way there are no defined steps.
    1778 Jelte proposes a model with a specific plugin module and specific limited list of points where it plugs in.
    1780 Shane: why does this scare me less?
    1782 Michael: I am worried we will write a language here. That is a big mistake. Think of the blacklist option? You're actually short-circuiting certain options.
    1784 Stephen: I think we need to keep it simple.
    1786 Shane: maybe we do something simpler and then consider Michal's option later if we need to
    1788 Larissa: I suggest a very simple prototype and then some user discussion.
    1790 Jinmei: we need it to be testable by itself; we don't want to be able to replace everything. I generally think it's a good idea to have a small potentially replaceable module. I kind of think it's a good idea to have a framework that makes this whole idea possible.
    1792 Shane: one possible concern is that whenever you design something new that's complex you will get it wrong the first time.
    1794 Michal: I really didn't completely design it, I just was inspired by Miranda and Apache.
    1796 Shane: I am worried about an elaborate design that won't get used.
    1798 Jelte: SIDN very much wants exactly the thing I described.
    1800 Shane: we need a defined set of calls.
    1802 Jinmei: decomposing the feature into separate pieces, or making everything decomposable, seems to be different.
    1804 Larissa: I need to understand what people want to do. To figure out whether this is more complex than we need.
    1806 Shane: Jelte and Michal's position is that it won't be any harder to do what they want than to do a smaller thing. So I suggest whoever wants to propose a design does so. Define an API and some configuration examples, maybe some pseudo code, and then we evaluate it.
    1808 Potential use cases:
    1810  * DNSSEC signing w/ on the fly answers
    1811  * validating forwarding resolver
    1812  * blacklists
    1813  * NXDOMAIN redirection
    1814  * NSEC masking
    1815  * non DNS operational data management?
    1816  * script run upon AXFR
    1817  * query introspection (need to know why)
    1818  * alternate method to configure ACLs - to use an LDAP database to authenticate updates
    1819  * dynamically generated content of zone data - be able to write a script to send answers
    1820  * experiments with new data sources
    1821  * debugging - log various steps
    1822  * AS112?
    1823  * possibly use this to combine auth and recurse
    1824  * evlDNS stuff
    1825  * network discovery from behind a NAT
    1826  * change timing behaviors on the XFR side - have zones refresh more or less often
    1827  * pick or prefer specific masters
    1828  * change query behavior - resolver gets a timeout then it tries all the servers in the NS set
    1829  * non expiring cache for better performance
    1830  * reduction of configuration knobs
    1831  * Filter-AAAA or other IPv6
    1832  * stub zones?
    1833  * SCTP
    1834  * Shim6?
    1835  * alternate classes (think MIT people like Hesiod users)
    1837 Thoughts: could we use the hooks system for BIND 9 compatibility?
    1839 We don't want to avoid coding in things that we really want, though
    1841 What kind of programming languages will we support hooks in? C++ and Python, but... do we extend to other languages... we probably need Perl. Could other people write layers to support other languages?
    1843 '''''Lunch'''''
    1845 '''Task Breakdown Part 1'''
    1847 We begin our Epic Quest to break down the tasks for the first 6 months of Y3.
    1849 == Thursday, 2011-03-24 ==
    1851 '''Task Breakdown Part 2'''
    1853 '''''Lunch'''''
    1855 '''Scrum Estimation Part 1'''
    1857 We need to do some planning poker for the tasks that we have identified for the start of Y3, so we can estimate how much we can deliver in each sprint, and so we can track our performance on an ongoing basis.
    1859 == Friday, 2011-03-25 ==
    1861 '''Scrum Estimation Part 2'''
    1863 We should be able to finish our Scrum estimations here.
    1865 '''''Lunch'''''
    1867 '''Working with BIND 9 (Michael Graff)'''
    1869 The main goal for Y3 is not BIND 9 compatibility, but we are going to
    1870 be living in a world where BIND 9 and BIND 10 are both running in the
    1871 wild. We would also like to avoid duplicate work and divergent code
    1872 paths as much as possible.
    1874 Michael Graff, the BIND 9 programme manager, will be joining us and we
    1875 will discuss this topic.
    1877 Shane: Michael has been running BIND 9 for about a year, as its first dedicated engineering manager.
    1879 Michael: So we've been trying to do TDD, Scrum, and some other concepts used in BIND 10, with varying success.
    1881 Jeremy: how long will BIND 9 last?
    1883 Michael/Larissa/Shane: well, 7-10 more years... some current OS versions can't upgrade, people need motivation to upgrade, but there is a plan to deprecate unused features in BIND 9 so they need not be ported to BIND 10.
    1885 Larissa: and can we talk about how code can be shared?
    1887 Michael: yes, we are going to be using Python
    1889 <discussion of python 2 and 3>
    1891 Michael: we will be writing key management tools in BIND 9 in Python that maybe we can use for both. (Discussion)
    1893 Shane: one challenge I have in BIND 9 is the tight coupling.
    1895 Michael: The biggest problem I think is that it was written by engineers without object oriented experience to separate the data parts. That was a decision by some original BIND 9 developers and it was questioned then and it's not consistent in the code.
    1897 Shane: you're trying to figure out what behavior is going on but it has pseudo object orientation and you can't figure it out. This was to the database.
    1899 Shane: we would like to lift/share code from BIND 9 when possible. If we do that, how do we keep changes in sync?
    1901 Michael: It seems silly to reproduce things. There are a couple of things. In BIND 9 we need to write code that's easier to test and compatible with modern design techniques used in BIND 10. We have a unit test framework now. And we use it! We're working on writing testable functions and reasonably sized functions. (discussion of code copying and problems therein) No more 5,000 line functions.
    1903 Shane: BIND 9 also has a lot of functions with 15 parameters
    1905 Michael: actually I think it's about 8. The problem is you pass them in almost every context and that makes it bigger
    1907 Shane: I don't understand the directory structure.
    1909 Michael: libdns is a supporting library for named. There are a lot of things in libdns specific to named and vice versa
    1911 Shane: let's talk about the logger in particular
    1913 Stephen: we're talking about how to share code. That's a goal, to make an independent library both projects can use.
    1915 Jelte: the "real" libbind. If we have tools that work with either project, it should be a separately distributable thing.
    1917 Michael: not distribute but treat separately.
    1919 Jelte: I mean package.
    1921 Stephen: say you want to release BIND 9, there is a formal internal release of the library, and it's separate.
    1923 Michael: we kind of have this issue with DHCP already.
    1925 Larissa: maybe DHCP could use this library instead of libDNS which makes a mess.
    1927 Shane: and we can optimize things in one place.
    1929 Michael: someone has to change, but I don't care who. Maybe easier for BIND 10 because it has tests and because most C programs are valid in C++ but not the other way.
    1931 Jeremy: BIND 9 has code that's compiled and built, but no paths ever use it. Like logging from source. Bob Halley told me nothing uses it. I found that easily.
    1933 Michael: I've considered writing a script that changes the names of the functions and then if it compiles, nothing is using it, and we can clean it out. We add functionality but we don't remove it.
    1935 Discussion of issues with shared libraries.
    1937 Shane: Michael, tell us the release schedule plans.
    1939 Michael: we're releasing a feature version about every 6 months, and maintenance releases between quarterly and monthly, depending on what's going on.
    1941 Shane: all of our bug tickets are currently private, in BIND 9, right?
    1943 Michael: yes. Working on this.
    1945 Shane: and it's all in RT?
    1947 Michael and Larissa: yes, there are two instances, so support manages a case a customer logs, and the customer can see it, but then if it becomes a bug, it goes to the bugs instance, which is closed to ISC people only. RT is almost too powerful. An example is our review process. It is in the bug queue, moves to the review queue, then the notes queue, then the resolved queue. But it looks like the guys didn't finish work because things never just close. Also Dan has gone to RT training now, and he has ideas about how to fix it.
    1949 Shane: we discussed this at all hands, and Barry mentioned that you do want to decouple ticket handling from bugs.
    1951 Michael: I'm not worried about trac.
    1953 We can put a trac ticket item link into the support instance.
    1955 Shane: you also have an alpha, beta, release candidate model for major releases.
    1957 Michael: we have an obligation to the forum, for advance code release at each point. This impacts our schedule. Alpha is something we have for .0s; betas and RCs are for everything.
    1959 Shane: are there fixed times?
    1961 Larissa: no, but there should be. It's an attempt to build community testing but it fails.
    1963 Michael: people ignore everything until the .0 and then they send bugs.
    1965 Stephen: and you don't change after beta
    1967 Michael: our rules: alpha establishes syntax. Beta is bugfixes but the feature set is locked down. RC1 is critical bug fixes and docs, and the final only has docs changes.
    1969 Michael: some things that didn't work well. I wanted to start putting features in point releases. Lots of projects do it. But it was a disaster for us. We can't do it, it confused everyone. The other thing that didn't work well was setting a fixed release date. What they really wanted was release on this day, except if there are bugs, and well, don't take my features out.
    1971 Discussion of the forum model and its issues and open source etc.
    1973 Stephen: we do have to be careful about the copyright for patches etc.
    1975 Michael: let's not go into legal issues
    1977 Stephen: if something is a release candidate, make sure it's a real release candidate, and not a buggy version you put out because your release schedule said so.
    1979 Michael: agile has helped with this; we know sooner when a feature will be too buggy and not ready in time. We let release dates slip, but if they slip because of poor planning, we need to fix planning; if they slip because of late bugs, we need to fix the schedule.
    1981 Larissa: we're going to have beta programs across the board too
    1983 Jeremy: and we claim ops tests our software but it's not that effective
    1985 Michael: they compile it (which is a good test) and they run it for a bit, at least a weekend. Is that real testing? It does show that someone could install this.
    1987 Larissa: Jim has indicated he would like to improve this.
    1989 Michael: we need to give them a specific checklist.
    1991 Jeremy: BIND 10 has the same problem. You'll probably notice my bursts of bug submission; it's because suddenly I'm using stuff or new stuff.
    1993 Larissa: we need to treat ops a bit like a beta test person. Specific instructions.
    1995 Jeremy: BIND 9 sometimes gets bad press on security issues. You know, there was a long period of no security bugs. Do we know what happened?
    1997 Michael: DNSSEC. In 1994 we wrote DNSSEC. In 1995 we rewrote it because the spec changed. In 2004 we rewrote DNSSEC because the spec changed. We didn't introduce it per se. Now, in 2011, people are using DNSSEC. All of a sudden, here are the bugs. It was written poorly, it had no tests. Rob warned us about this.
    1999 Jelte: DNSSEC is so new, it's logical that it's in this state.
    2001 Michael: and we don't get yelled at for this. People understand. But also one person's little bug is actually a giant security hole.
    2003 Jeremy: so in hindsight unit tests and functionality tests might help.
    2005 Michael: the projects compete for resources. BIND 10 had money but BIND 9 didn't, and we had to shuffle people around because we didn't want to lay off, and we are still suffering from this. In any case, I'm looking for BIND 9 developers, if anyone is looking! Especially someone who can do Windows *and* UNIX
    2007 Shane: let's talk about how to organize shared efforts.
    2009 Stephen: maybe logging is a good first option
    2011 Jelte: ideally you could have a shared scrum thing for the shared project
    2013 Michael: or a "prisoner exchange" where developers trade for a sprint or a few sprints or something.
    2015 Larissa: I would want to have people on sprints, and probably more than one in a row, for coherency. Mike Cohn advised on this.
    2017 Michael: maybe pair programming is the solution here.
    2019 Shane: hmmm so we put one BIND 10 person on logging paired with one BIND 9 person, on logging, together.
    2021 Larissa: I want to also figure out how we share the whole culture, not just the code, so we need to figure that out.
    2023 Michael: one last word: when you go to develop things, please consider the BIND 9 code, and why you did things. Please.
    2025 Shane: we are thinking about crypto libraries. What does BIND 9 do? OpenSSL?
    2027 Michael: also, please, tell us, when you find a BIND 9 bug?
    2029 Jinmei: I often refer to BIND 9, to import logic, and I do report bugs I find.
    2031 Jeremy: I think this should be a blog article
    2034 '''Confidential Work for Security (Jeremy Reed)''' (Medium)
    2036 We need a procedure for privately using git and our discussions for security issues (such as #80).
    2038 Jeremy: Aaaright. We need a way for the customer to contact us if they have an issue. And they might need a private way. Phone or an alias.
    2040  * We need a secure email method (Securityofficer@)
    2041  * We need an obvious way to mark a ticket confidential (Jeremy needs to note it still works)
    2042  * We need a wiki page on how to do this (and report problems)
    2043  * Form that goes to the securityofficer list?
    2044  * Maybe we should default to the secured method of submission
    2045  * Michael: maybe we can do a three-state toggle
    2046  * we should always *ask* the customer before we mark something insecure after they mark it secure.
    2047  * Decision: the best solution is a pulldown box that defaults blank, with yes no or not set. (do not display not set tickets until we review)
    2048  * We need a human to respond when someone submits a security issue (bug triage)
    2049  * if the issue comes to securityofficer@, that person creates a ticket and then comments to the submitter.
    2050  * Quickly evaluate the issue - run a CVSS check - determine approximate severity and work estimate
    2051  * move discussion to a private email list
    2052  * determine if the issue is in the wild or not - type 1 vs type 2
    2053  * if the issue came in over an open list, assume it is in the wild
    2054  * contact reporter, inform them that we think it's a security issue, ask them to refrain from discussion, and offer them a credit in the CVE if desired
    2055  * determine schedule for phased security notification
    2056  * we need a private git repository for security specific branches.
    2057  * need filter and git commands to keep repository secure
    2058  * need to ensure the bind10-team@ list only has core developers who are (or whose organizations are) under NDA
    2059  * we need to use a password or invitation only jabber room for security issues
    2060  * beyond these things process sticks as closely to existing security process as possible
    2061  * by the end of the next sprint (April 15th) policy and git changes are established and in the second half of April a test security event will be rehearsed.
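The pulldown decision above (blank by default, with yes, no, or not set, and "not set" tickets hidden until reviewed) can be sketched as a three-state value. The names `Confidential`, `parse_pulldown`, and `publicly_visible` are made up for this illustration and are not Trac fields or APIs.

```python
from enum import Enum


class Confidential(Enum):
    NOT_SET = ""   # the pulldown's blank default
    YES = "yes"
    NO = "no"


def parse_pulldown(value):
    """Map the submitted form value onto one of the three states."""
    return Confidential(value.strip().lower())


def publicly_visible(flag):
    """Show a ticket only once it is explicitly marked not confidential.

    NOT_SET tickets stay hidden until someone has reviewed them.
    """
    return flag is Confidential.NO


new_ticket = parse_pulldown("")  # submitter left the pulldown blank
```

Treating "not set" the same as "yes" for display purposes is the safe default that the "default to the secured method" bullet asks for.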
    2063 Note: we also need to redesign our front page to make it clear how to report issues and security issues (and in general, redesign)
    2065 '''Writing Down What a DNS Server Is''' (Medium)
    2067 Several team members feel that it is important to document what a DNS server is so that we can be sure we have built it. We need to discuss what exactly the goal of this activity would be and how we can achieve it. This is to create a plan for how we will document, not to actually document it.
    2069 '''Scheduling Team Calls & Suchlike''' (Short)
    2071 ''Once we have decided how our team(s) will be organized, we should probably take a moment to review our regular meetings/calls.''
    2073 Shane: we decided earlier this week that we're abandoning the A and R team split at least for now. We have three regularly scheduled calls now:
    2075  * daily call
    2076  * team call every two weeks
    2077  * R-team planning
    2078  * A-team planning
    2080 We still need the daily call. It is currently at 08:00 UTC. This is a good time for Europe (9 and 10 am) and Beijing (4pm) and Tokyo (5pm) but a poor time for North Americans. Jinmei will call in on a best effort basis. Larissa and Jeremy are not expected to call. Larissa, Shane, and Jeremy plan to meet a few days a week at 6:30am Pacific (8:30 US Central and 15:30 Central European).
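The Beijing and Tokyo figures for the daily call can be checked with a few lines of arithmetic. This sketch uses fixed offsets (UTC+8 and UTC+9), which is safe because neither zone observes daylight saving time; the calendar date is arbitrary and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: neither China nor Japan observes daylight saving time.
BEIJING = timezone(timedelta(hours=8), "Beijing")
TOKYO = timezone(timedelta(hours=9), "Tokyo")


def local_times(utc_dt, zones):
    """Return {zone name: 'HH:MM'} for a timezone-aware UTC datetime."""
    return {
        tz.tzname(None): utc_dt.astimezone(tz).strftime("%H:%M")
        for tz in zones
    }


# 08:00 UTC daily call; the date itself is an arbitrary example.
daily_call = datetime(2011, 3, 29, 8, 0, tzinfo=timezone.utc)
times = local_times(daily_call, [BEIJING, TOKYO])
```

European and North American times are left out on purpose, since those depend on daylight saving rules that a fixed offset cannot express.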
    2082 We need to set up the sprint planning call and the staff call. We will continue the idea of one week sprint planning one week team call. We are also looking into using the team meeting time for scrum style demos and retrospectives/reviews.
    2084 We need to keep the meeting at the same or a similar time to what we have now. We acknowledge that this is a rough time of night for Asian colleagues. We also need to mind the date line factor. We will leave the time as it is for now. Which day of the week is good?
    2086 Michal: Tuesday remains good.
    2088 Shane: Tuesday remains good.
    2090 Stephen: Tuesday is good
    2092 Larissa: Tuesday is good
    2094 Jinmei: I am worried the combined sprint planning will be a very long call.
    2096 Shane: maybe we reserve the same time on Wednesdays in case we need it.
    2098 Stephen: also after two hours people tail off
    2100 Michael: In BIND 9 we now do breakdown Tuesday and estimation Thursday
    2102 Stephen: also, more than 90 minutes is really going to be hard on the Asians
    2104 Shane: developers, how do you feel about sometimes having a second call in the same week?
    2106 Jinmei: I don't think I mind.
    2108 Shane and Larissa: and our current round of advanced planning will fall apart around June/July this time
    2110 Jelte: and we had a lot of clarity on tasks in the last meeting
    2112 Stephen: how many releases per year? Would it be worthwhile breaking up in to 18 weeks so every three sprints we have distinct goals?
    2114 Larissa: so quarterly deadlines for feature sets?
    2116 Stephen: every four months.
    2118 Shane: to get back to the planning issue, my proposal remains that we have an optional meeting on Wednesday or Thursday.
    2120 Stephen: a lot of time is taken up with estimating. you can actually start a task without an estimate when necessary. how do people feel the email estimating went? People sent their estimates via email, and I took a consensus value and we accepted it without further discussion, and we only discuss when opinions diverge wildly.
    2122 Shane: Likun, how do you feel about that?
    2124 Likun: it's okay, sometimes if I'm not clear on the task I can then find out more independently
    2126 Jinmei: I'm basically negative on email estimation, people forget, it tends to introduce delay in the timeframe of a two week sprint that is significant. If we are going to a compromise I'd rather go more aggressive, like someone who is picking up the task just does the estimate.
    2128 Stephen: what I get in email is usually relatively close in size. It's only when I get a large disparity that we need the discussion. The difference between a 1 and a 2 comes out in the noise.
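Stephen's email procedure (take a consensus value, discuss only when estimates diverge widely) could be written down as below. Using the median as the "consensus value" and a spread of more than 2 points as "wildly divergent" are assumptions made for this sketch; the minutes do not say how either was actually decided.

```python
from statistics import median


def consensus(estimates, max_spread=2):
    """Return (value, needs_discussion) for a list of emailed estimates.

    value is the median of the estimates (an assumed stand-in for the
    'consensus value'); needs_discussion is True when the gap between the
    highest and lowest estimate exceeds max_spread (an assumed threshold).
    """
    value = median(estimates)
    needs_discussion = max(estimates) - min(estimates) > max_spread
    return value, needs_discussion


close = consensus([1, 2, 2])  # a 1-versus-2 difference is noise
wide = consensus([1, 2, 8])   # one outlier, so this goes to the call
```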
    2130 Jelte: doing it in email does eliminate discussion that's not necessary, but I agree that it introduces delay, and that people forget.
    2132 Discussion of estimation and sprint practices and whether the email thing would work.
    2134 Shane: how do JPRS and CNNIC feel about an overflow meeting on Wednesday or Thursday if we need to?
    2136 Jinmei: I am not sure it's an "if"; I suspect we always will need it.
    2138 Fujiwara: It is okay.
    2140 Likun: if there is no other solution we will survive it.
    2142 Larissa: if we start half an hour earlier and allow two hours, that might help?
    2144 Michal: yes?
    2146 Larissa: How would that be for you Jinmei?
    2148 Jinmei: I guess it is okay. Maybe not in standard time, but that is a long way away.
    2150 Shane: what if the second sprint planning call every other week was at night for europe, afternoon for california, morning in asia?
    2152 Jelte: if it's Wednesday, that's fine.
    2154 Stephen: I'm fine with that.
    2156 Shane: okay. One proposal is the Tuesday call is always 15:00 UTC once a week. When we need a second sprint planning call, it would be at 23:00 UTC Wednesdays, which means 8 am Thursdays for China and 9am for Japan.
    2158 Shane: if we do an a.m. call for Asia and factor in the time that Kambe and Aharen san are traveling to work in the mornings, the meeting would be 3am for Europeans. Maybe what we should do is steal time from the standup calls.
    2160 Larissa: personally I am okay with missing the estimating.
    2162 Shane: okay so task estimation could happen in slightly extended daily scrum calls.
    2164 Michael: so proposal: task breakdown at scrum planning call, then emails for estimation, then discrepancies discussed on the daily sprints.
    2166 Larissa: yes
    2168 Shane: and start sprint planning at 14:30 UTC.
    2170 '''Things for End of Each Sprint''' (Short)
    2172 ''We are missing a couple of things from the end of each Scrum now. We don't do a true retrospective, and we do not do demos. We have been doing Scrum long enough that it may be time to adopt these practices.''
    2174 Demos: at the end of the sprint, Shane can ask one or two developers to come up with a demo for their new stuff. We would do that at the next team meeting. Demos would last 15 minutes. After a few rounds of this, we will start including customers and users in the demonstration. In general we might allow specific customers/users to attend the "internal" demos, but it may depend. We would probably invite close outside colleagues we know well.
    2176 Reviews: at the 6 week release point we will review all features against the definition of done.
    2178 Retrospectives: Stephen will call for a stop-keep-start style retrospective at the beginning of each sprint planning session. Shane will send a reminder email about the retrospective the day before.
    2181 '''Unification of in-memory and SQLite Back-ends''' (Medium)
    2183 Michal notes:
    2185    Some unification of in-memory & sqlite3. Or should this be handled on the ML
    2186    rather? Because this would probably include a little homework to look through
    2187    both the APIs to be able to talk about it.
    2189 Michal: we have a base class for the datasource, and we have SQLite based on that, and we have another base class, and inmemory based on that, and this misses the point of having the abstract base class. So I think we should look at them and unify it.
    2191 Larissa: will this help us to have a shared API for datasources?
    2193 Others: yes
    2195 Shane: how did we come to be in such a place?
    2197 Michal: well, the base class was created, and the SQL backend was in mind, but it's a little bit specialized.
    2199 All: it was all because we needed to do the inmemory structure quickly.
    2201 Michal: I don't think either is what we want. We need to modify both a little bit. I think we could then get to the point that we find what we want in the end by merging them.
    2203 Stephen: it could be one task, to merge them?
    2205 Michal and Jinmei: three or four.
    2207 Decision: While our inmemory datasource will support DNSSEC, our API for datasources needs to allow databases that do not support DNSSEC to integrate with BIND 10.
    2209 (see task list)
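The shape being asked for, one abstract base class that both back-ends implement with DNSSEC support as an optional capability, might look like this sketch. The class names, the `find`/`add` methods, and the `supports_dnssec` flag are invented for illustration and do not reflect the real BIND 10 datasource API; in particular, the flag value on the SQLite class is arbitrary and is not a claim about BIND 10's SQLite back-end.

```python
from abc import ABC, abstractmethod
import sqlite3


class DataSource(ABC):
    """Hypothetical common interface for all back-ends."""

    supports_dnssec = False  # back-ends opt in explicitly

    @abstractmethod
    def add(self, name, rrtype, rdata):
        """Store one record."""

    @abstractmethod
    def find(self, name, rrtype):
        """Return a list of rdata strings for (name, rrtype)."""


class InMemoryDataSource(DataSource):
    supports_dnssec = True

    def __init__(self):
        self._records = {}

    def add(self, name, rrtype, rdata):
        self._records.setdefault((name, rrtype), []).append(rdata)

    def find(self, name, rrtype):
        return list(self._records.get((name, rrtype), []))


class SimpleSQLiteDataSource(DataSource):
    # Stands in for "a database that does not support DNSSEC".
    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS records "
            "(name TEXT, rrtype TEXT, rdata TEXT)"
        )

    def add(self, name, rrtype, rdata):
        self._db.execute(
            "INSERT INTO records VALUES (?, ?, ?)", (name, rrtype, rdata)
        )

    def find(self, name, rrtype):
        rows = self._db.execute(
            "SELECT rdata FROM records WHERE name = ? AND rrtype = ?",
            (name, rrtype),
        )
        return [row[0] for row in rows]


sources = [InMemoryDataSource(), SimpleSQLiteDataSource()]
for source in sources:
    source.add("www.example.org.", "A", "192.0.2.1")
answers = [source.find("www.example.org.", "A") for source in sources]
```

Callers only see `DataSource`, so a server could check `supports_dnssec` before asking a back-end for signatures instead of needing a second base class.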
    2211 '''Lack of Users''' (Short)
    2213 Michal notes:
    2215    I also worry a little bit about the fact that, contrary to the fact
    2216    that the software is generally buggy, we get really few bug reports, emails,
    2217    complaints. We should have a situation where, when we release a tarball, we get ten
    2218    people hammering onto the door of jabber room demanding it's fixed. Also,
    2219    it's two years already, but we still don't have anything that could be really
    2220    used, though it's already planned I guess. But I'm not sure there's anything
    2221    to talk about here.
    2223 Shane: we actually only got tarballs 12 months ago and have been actively recommending against production, so that is probably part of the problem. I think now though we should be telling people they should run it, in a specific limited capacity.
    2225 Jinmei: I want to encourage people to play with it, but probably there is currently no reason for people to play with it, because it's slower and missing many features.
    2227 Jeremy: we don't want to give a poor impression?
    2229 Shane: the analogy I've been recently giving is to Mozilla .6, when it was slower, and crashed all the time, and didn't do what you wanted, but the potential was there. Eventually, it got to the point around .9, where it would finally render some sites better than Netscape.
    2231 Jinmei: if, for example there is a website that can be better with Mozilla, that can be a reason. My point is we don't have that.
    2233 Stephen: so, is there anything we can add that BIND 9 doesn't have? that will get people to try?
    2235 Shane: we have the SQL.
    2237 Stephen: should we make a bigger play?
    2239 Larissa: we do have some plans to start telling people to try it, the webinar, demos, beta program, and blogs, are all oriented toward that.
    2241 Shane: we can show the simple demo, because right now, (once we fix the unindexed query bug) we could say look, we start in two seconds for a zone with a million records. That would be sexy.
    2243 Larissa: and we need a cool thing to do with a user story for every release.
    2245 Shane: so next after this one is TSIG, then configuring the BOSS for the release after that.
    2247 Jeremy: we need to make sure Ops really runs it and that we point people to it. We could also possibly run a public resolver that could take a beating.
    2249 Shane: is there a problem with that?
    2251 Michael: I want to do that in BIND 9.9. If you're confident that your resolver will hold up to high load with DoS attacks, go for it.
    2253 Jelte: not yet!
    2255 Shane: if we put it on the BIND 10 dev list and wiki for a while before putting it to, say, bind-users, that could work.
    2257 Jeremy: the bind10 box has been running the iterator without crashing since March 17th.
    2259 Shane: do we have statistics for the resolver yet?
    2261 Jeremy: no but I can find some information with verbose logging (which we have)
    2263 Jelte: sometimes it gives up too fast.
    2265 Larissa and Jinmei: so the three things to get this going:
    2266  * make sure there is sexy "geeky dns catnip" in each release (i.e. speedy load of large zones, TSIG, BOSS configuration)
    2267  * communicate increasingly about BIND 10 with webinars, demos, blogs, events
    2268  * demonstrate the stability etc of the server by getting it into limited use with ISC ops, beta programs, etc.
    2270 Of course, we want to be cautious. We don't want to increase users faster than we can keep up with new features, bug fixes, etc. It's a delicate balance.
    2277 '''Blogging'''
    2279 We agreed that BIND 10 will be doing a blog per month - we will schedule one as a sprint task every other sprint. Larissa will enforce this.
    2281 Topics we didn't quite get to:
    2283  * '''API/ABI Versioning'''
    2286  * '''How to benefit from Multi-core/processor'''(TBD)
    2288 There was a discussion on the dev list before:
    2290 but there didn't seem to be a clear conclusion.
    2293  * '''msgq Replacement''' (Medium)
    2295 It may be time to consider using something other than our own, hand-crafted message bus. We need to worry about portability, increased dependencies, ease of use, reliability, feature set, and so on. Plus at least sketch out a plan for selecting and adopting such technology.
    2297  * '''External Tester Program''' (Medium)
    2299 Larissa, Jeremy, and Shane have worked to outline how we may work with external testers. This may be interesting for everyone on the project.