Changes between Version 1 and Version 2 of March2011OriginalMeetingAgenda


Timestamp:
Mar 28, 2011, 9:23:30 AM (7 years ago)
Author:
zhanglikun

  • March2011OriginalMeetingAgenda

    v1 v2  
    1 ''The March 2011 meeting falls at the end of the project's 2nd year. We
    2 should have a pretty good idea of where the project is at - the focus
    3 of this meeting will be organizing the work for the project's 3rd
    4 year.
    5 
    6 CZ.NIC is hosting the meeting, and we will be joined by 3 people from the CZ.NIC DNS server team. This should be good for the BIND 10 developers and the DNS operators!
    7 
    8 = Logistics =
    9 
    10 We will be at the CZ.NIC office, and try to keep to 09:30 to 17:00 each day.''
    11 
    12 ----
    13 
    14 ''= Agenda =
    15 
    16 We have certain topics scheduled at particular times, and a number of "free-floating" topics that we can slot in on an "as needed" basis.
    17 
    18 == Monday, 2011-03-21 ==
    19 
    20 We have no new people attending this meeting, so we will not have an introduction to BIND 10 development.
    21 
    22 '''Opening Remarks'''
    23 
    24 When everyone arrives, we'll officially open the meeting, with our standard opening.
    25 ''
    26  * Greetings & Salutations
    27  * Introductions
    28  * Meeting roles & etiquette
    29  * Meeting goals
    30  * Meeting plan overview
    31 
    32 Introductions
    33 
    34 Attendees:
    35  * Shane
    36  * Larissa
    37  * Stephen
    38  * Jeremy
    39  * Michal
    40  * Jelte
    41  * Likun
    42  * Jerry
    43  * Lubos (guest from CZ.NIC)
    44 
    45 Via phone starting Tuesday:
    46  * Aharen
    47  * Kambe
    48 
    49 Joining us later in the week:
    50  * Fujiwara
    51  * Michael
    52  * Other cz.nic DNS developer folks
    53 
    54 Scrum planning is later in the week after some overall topic discussion
    55 Agenda is vast but flexible
    56 
    57 Goals:
    58 Like all of our annual kickoff meetings we have two goals
    59 1. project status - not the major goal; we hope to review, wrap up, and evaluate today.
    60 2. the rest of the week will be a discussion of year three stuff: what and how we will deliver in year three.
    61 
    62 '''BIND 10 Year 2 Wrap-Up'''
    63 ''
    64 This will be a discussion of how Year 2 of the project went. We should look at the technical and other aspects of the project, and make sure we have consensus about where the project is at right now.
    65 ''
    66 (viewing https://bind10.isc.org/wiki/Year2Deliverable)
    67 
    68 After our January meeting we knew what we would actually deliver in Year 2. Shane then sent mail to the BIND 10 Steering Committee detailing what we could and could not send, and some of that is the list represented here.
    69 
    70 Our main goal was to focus on the resolver, and our secondary goals were an authoritative performance increase, additional backends, and to begin looking at the command tool.
    71 
    72  - We do have a working resolver, but we do not have a DNSSEC-enabled resolver.
    73 
    74  - The hot spot cache is actually slower than the in-memory, for now.
    75 
    76  - We didn't quite get "bind 9 query performance" but we are within a binary order of magnitude.
    77 
    78  - We don't have a command language prototype but we may get a hack going next week
    79 
    80  - We will definitely not have XML statistics reporting done - it's in a branch but it needs thorough review. It is not in the release tarball.
    81 
    82  - We have done a lot of work (maybe not quite half) toward DDNS and IXFR - though the recent in memory work may have impacted this.
    83 
    84 Shane's thinking about why we got where we got and did not make all y2 objectives:
    85 
    86 1. At the year one deliverable, we had a lot of extra work we delayed - a lot of technical debt from the y1 release had to be done after year one which took about two months.
    87 
    88 2. We had one less ISC engineer working on the project than we expected - it took a long time to hire Scott and then he moved to the BIND 9 project.
    89 
    90 3. We also took a long time to come to what it meant for us to build a resolver. We're not sure how we could have sped this up. It seemed important to analyse processes but maybe it wasn't necessary at this point in the project. We didn't know how to break the big problem into little problems. We didn't really know the problem space.
    91 
    92 4. We adopted scrum in a somewhat piecemeal fashion, and the adoption of new process temporarily slowed us down. It has probably now sped us up though.
    93 
    94 Jeremy's thinking:
    95 
    96 1. Not always knowing what other developers are doing - we could have reused code that we didn't, and new designs were implemented without using existing code
    97 
    98 Jinmei:
    99 
    100 I don't think that is such an issue - using jabber and the scrum organisation
    101 
    102 Shane:
    103 
    104 Early on in the project we adopted a default ISC development model where you assign a piece of work to a developer and they "submarine" with it. One of the really good things about Scrum is it takes us away from that. You're given "automatic buoyancy"
    105 
    106 Jeremy: we didn't implement the features that the SQLite implementation has in the in-memory implementation
    107 
    108 Stephen: the last few months have been very much about the y2 deliverables, and as ScrumMaster I was pushing just to get year two out, focus on what is essential for y2 deliverables only. It was necessity over function. Shane is aware of the politics, but it is important for us to keep our obligations so sponsors see progress. So when we're coming up to planning for year three, I would like for us to say by xx dates we will get certain deliverables done. That way we can give the sponsors periodic progress updates.
    109 
    110 Larissa: I think that is the plan and should be the plan. We can break it down even more.
    111 
    112 Jelte: do the sponsors think we achieved our goal?
    113 
    114 Shane: we have 11 or 12 sponsors. Of them, the ones who have contributed developers have taken the strongest interest. We have a mix of involvement. A lot of them are non-profit TLDs who have money, and they don't worry so much about it.
    115 
    116 Stephen: but some of the sponsors *have* given feedback, and we should take that seriously.
    117 
    118 Shane: but not all are in this situation, many have specific hopes for BIND 10. A lot of the sponsors want a higher performance server. If you've got 20 sites around the world with a stack of servers in each one, it would be a considerable savings to have fewer servers in each. They're all adopting DNSSEC, etc. From this point of view, most sponsors are just happy for us to make good progress. As far as being visible and giving them incremental progress, one of the things we don't have that Scrum insists upon is customers working with the team. We don't have that and we have to guess. However, we're not just building the software for the sponsors. None of our current sponsors run big resolver farms. As much as I send updates to the sponsors, they are very busy and distracted, and we often don't get feedback from them. We sent a draft of the year 3 plans to the sponsors and we heard nothing. Silence doesn't mean they're happy, just that they're busy. So Shane approached each of them directly and then we got a lot of suggestions. We're going to try to get more feedback and more testing in year three.
    119 
    120 Jelte: I think they don't care too much about the resolver, the essential parts of which only came out last week.
    121 
    122 Larissa: Perhaps we have less technical debt at the end of year two?
    123 
    124 Shane: I don't think so, but I think we know what it is and the work is planned. Also we now are always putting out user visible features.
    125 
    126 Jinmei: Regarding the sponsors, I've been feeling that they are too generous. If they're serious, I would think we're going to get more pressure from them.
    127 
    128 Shane: the one sponsor who has said exactly what they want is JPRS. But I think they are reluctant to do that. I think it cost us a little bit, that we didn't know sooner what they wanted. So for example, we had a discussion at the 2010 fall face to face, about performance, where we said well, from our perspective we've met our year two goals. JPRS wasn't happy about that and eventually came to us and said that it wasn't okay, so then we changed our goals. But changing goals has a cost. We hope now we can get feedback earlier and with scrum we can change course more quickly when we need to respond to changing sponsor requirements.
    129 
    130 Stephen: of course, we have a long range plan, in that we need to accommodate all the requirements laid out in the RFCs.
    131 
    132 Jinmei: I think there have been some inevitable overhead cycles with Scrum. If we simply want to port BIND 9 behavior such as the red-black tree to BIND 10, and we only care about the speed/easiest way to develop this, that is one thing, but instead we tried to break it down and share it in a scrum model, and this caused overhead.  Of course this is also a good thing, because many more people are familiar with the development. Hopefully it is a long term investment, which we can benefit from in year three.
    133 
    134 Shane: I hope so, I've been very happy with the recent cycles since the last face to face meeting.
    135 
    136 Jinmei: I hope so too, we just need to carefully watch how things improve, or not.
    137 
    138 Shane: on paper, the overhead cost of Scrum is very high.
    139 
    140 Stephen: but it's actually standard for other projects.
    141 
    142 Jelte: of course in waterfall the initial phase is more than 10%
    143 
    144 Shane: and quite a few of our sponsors also use Scrum. Fredrico, at our Brazilian sponsor, recommended it, and so do RIPE NCC and CIRA.
    145 
    146 Jinmei: The overhead of the project, with the testing and review, is also considerable but important. Just for example, comparing us to BIND 9 development two or three years ago, all of our code must have tests and careful review, so we tended to underestimate the work time, and I guess that's another reason why we couldn't make the progress we expected. I guess providing the tests is also a kind of investment. It will help with refactoring later. For the long term it may result in shorter development.
    147 
    148 Shane: In engineering courses we were taught that 30% of the time is coding, 30% is testing and review, and 40% is design, requirements, and other overhead. I actually think it's pretty accurate. I probably spend twice as much time testing when I work on BIND 10 code as I do coding. I don't think testing is overhead; I think it's part of the process.
    149 
    150 Jinmei: Maybe I should have said it differently, I am actually positive about providing tests. I just think it introduces underestimation.
    151 
    152 Shane: I think we have gotten better about not having review backlogs. I think this is a natural output of getting away from the submarine model.
    153 
    154 Larissa: I think the scrum model has helped us a lot with estimates and understanding status and we can do even better this year.
    155 
    156 Shane: I'm really happy with the status of the project now. We had a dip this year but if we can do even 75% of what we've done the last three months, I think we can make our goals this year.
    157 
    158 Jinmei: I hear that Google is going to sponsor. What do they want?
    159 
    160 Larissa: I believe they want geolocation. We can meet with Warren Kumari next week.
    161 
    162 Shane: they did reject us for Google Summer of Code though. Perhaps we can get some of our own interns though. (general agreement that we will try to do this)
    163 
    164 
    165 
    166 
    167 '''BIND 10 Scope of Work''' (Shane)
    168 
    169 ''Shane will present the SoW document that is being used to get grant agreements from sponsors for the year 3 work, and go through it in some detail.''
    170 
    170 Shane: we have to get renewed commitments from the sponsors annually. We show them what we did and what we're going to do, and we don't know for sure who will sponsor year to year. Sponsors may change year to year; Norm Ritchie is getting us some new ones for year three (Google, .ru, .nz, possibly Chilean and/or Australian registries)
    172 
    173 
    174 The SoW is the document we use to tell people what we're doing. The year by year plan should be no real surprise: year 1 auth server, year 2 recursive server, year 3 goal, build on previous work and come up with a "production ready" server. Of course "production ready" is highly subjective. The goal of year three is the "80/20 rule" - 20% of the work can cover 80% of the users - this is the goal for year 3.
    175 
    176 Years four and five - our two most important things to do in BIND 10 are not to make the mistakes of BIND 9 - we must be faster, and more compatible, not less. That is the work of year four. In year five, we have reserved time for "all the other stuff". It will be a wrap up, loose ends, and some "cool" things. Right now BIND 10 is about 10% of ISC's operational budget. We have to transition the project financially at that time. We move from being a special project to a mainstream ISC software product. The "ideas page" on the wiki is a place where potential year 5 projects can go. We hope by then the thriving user community will give us guidance as well.
    177 
    178 Referencing the Y3 Wiki Page: (http://bind10.isc.org/wiki/Year3Goals)
    179 
    180 JPRS suggested we divide this work into three categories: things originally planned for year two, things originally planned for year three, and newly introduced things, and that makes sense.
    181 
    182 (discussion of the list on the wiki)
    183 
    184 Jinmei: regarding views, in some sense we already have them in the modular system, separating authoritative and recursive. Some people will use them that way.
    185 
    186 Shane: I think a lot of people use views to provide two faces on the authoritative side.
    187 
    188 Jinmei: so I want to know what people want: separate auth and recursive, or views within one of these?
    189 
    190 Larissa: I will look into the survey data more thoroughly.
    191 
    192 Michal: It's harder if it's on one side
    193 
    194 Shane: yes we may have to do extra refactoring work
    195 
    196 Shane: I always planned operational tools support, this is an umbrella topic. One thing we need here is "phone home" technology. Recursive resolution tracing will be a stand out feature for operators using BIND 10. Full system information aka "BIND 10 showtech" is important for support, debugging, etc. Fedora has a really cool crash report feature where you submit a report and then it looks up your issues in the database to see if others have had it. Quite sexy. We have to decide how many of the BIND 9 tools we duplicate this year, as well.
    197 
    198 Features we added since project inception:
    199 
    200 Command tool: Jerry Scharf has specified this and we need the initial implementation this year.
    201 
    202 Authoritative data sources: we have multiple options here. We had to choose the lowest risk path in BIND 9, but we have other options: radix trees, Jinmei's initial work, and something Vixie suggested - the "first no compromise DNS in memory data structure".
    203 
    204 Jinmei: these are not necessarily the only alternatives nor are they exclusive. The other thing is that I have actually looked at the code Vixie provided and my opinion is that the application is quite limited. It's mainly focused on a recursive IPv4 reverse lookup use case. For such purposes it might be quite fast, but it's not a general purpose use case. (Shane notes maybe this was for blackhole use cases)
    205 
    206 Michal: maybe we can get the ideas out there and then decide if we can apply it.
    207 
    208 Shane: my personal feeling is that performance optimization on the auth side is mostly a distraction. Most authoritative operators don't need more performance than we currently have. However, like graphics benchmarking, it's what people look at. And our sponsors care about it.
    209 
    210 Another goal we want for this year now is hooks for plugins.
    211 
    212 Jeremy: what about recursive performance?
    213 
    214 Shane: it's not on the official year 3 goals for now. (it is on the overall project goals of course.)
    215 
    216 Jeremy: I have benchmarked it.
    217 
    218 Shane: I do want to see a comparison across resolver products on this. It's hard to benchmark this but it would be very interesting.
    219 
    220 Jelte: actually this is an area where the current sponsors may care about recursive performance - when we start resolution lower down, we can save traffic for root operators and TLDs.
    221 
    222 Shane: we have other topics for this year which are not exactly development but are about increasing reliability and people's trust in the reliability of the software. Testing, test platforms, security audit, system testing, and operational experience and documentation of that.
    223 
    224 Jeremy: we have done some of the interoperability work already using some tools by Robert Edmonds.
    225 
    226 Shane and Jelte: and building on the NSD work, some of which Jelte worked on for NSD 3.
    227 
    228 Shane: we need a security audit, but Barry Greene has pointed out that this could mean a lot of things. We need to figure out what this will mean. We need to help people feel confident. Security is always a trade off between functionality, ease of use, and cost.
    229 
    230 Jeremy: regarding system tests, we need to review the BIND 9 model this week and decide if we want to move forward with that model or use another.
    231 
    232 Shane: operational experience. We're going to be running this on some ISC servers in operation. We're starting with bind10.isc.org and then moving to our AS112 server as our next operational step, the ISC's internal resolvers, and then the big scary things, SNS-PB (which is a best effort service) hopefully in September. And then hopefully after a few months we could test it on root nameservers. By our year four kickoff, maybe we can discuss the operational problems of root nameservers in BIND 10 :)
    233 
    234 Stephen: I am wondering about discussion of other refactoring in year 3, in logging, and TCP, probably a lot of other areas.
    235 
    236 Shane: we had to make some estimates, so I made some SWAGs. I don't think they're too far off.  (see SoW for SWAG estimates) We will revise the estimates based on the output of this meeting. If history is any indicator, we will end up with more work not less.
    237 
    238 If we get more input from users, we could get more direction based on actual user experience. That is our goal.
    239 
    240 Jeremy: one thing that isn't on the list is a DNS specifications document. Basically going through all the RFCs, BIND 9, and other implementations and creating a specifications document now.
    241 
    242 Jelte: people have started to do this inside the IETF but it's crazy to do it there.
    243 
    244 The existing Wiki page on resolver design is more design than requirements and does not reference RFCs specifically.
    245 
    246 Jeremy: this would be a thousand pages if it doesn't just reference the RFCs.
    247 
    248 Jeremy: having a design document like this might really help us know what we missed.
    249 
    250 Stephen: a design document says how you do it. Functional testing requires a list of what you do, which is what this would be.
    251 
    252 Shane: when you do requirements documents, the scope of what you're describing is the requirements document itself. If it's meant to be for other implementations besides just BIND 10....
    253 
    254 Stephen: we're also talking about answering questions like "what does BIND 9 compatibility mean?"
    255 
    256 Jelte: and sometimes the thing that the RFC says is not the thing any implementation is actually doing, because the RFC doesn't make sense.
    257 
    258 Larissa: are there other year three feature level items the team thinks are missing?
    259 
    260 Shane: Jinmei sent one about standalone library packaging.
    261 
    262 Jelte: we also need to discuss an API freeze at some point.
    263 
    264 Shane: or at least versioned.
    265 
    266 Michal: or you can do it the way Linux kernel does it.
    267 
    268 Shane: we may have certain points where we can break API compatibility, that's how BIND 9 does it.
    269 
    270 Jelte: you can go the Linux kernel way, or the Firefox (and BIND 9) way. Firefox only breaks compatibility on .0 releases.
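
A minimal sketch of the ".0 releases only" policy Jelte describes, assuming simple "MAJOR.MINOR" version strings. The function names and the version format are illustrative assumptions; the notes do not record any agreed BIND 10 versioning scheme.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Split a hypothetical "MAJOR.MINOR" string into integers."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def may_break_api(previous: str, new: str) -> bool:
    """Return True only if `new` is a .0 release with a higher major number.

    Under the assumed policy, every other release must stay API compatible.
    """
    prev_major, _ = parse_version(previous)
    new_major, new_minor = parse_version(new)
    return new_minor == 0 and new_major > prev_major
```

A release script could call `may_break_api("1.4", "2.0")` before accepting an incompatible change, and refuse the change for something like `"1.4"` to `"1.5"`.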
    271 
    272 The missing and incomplete items seem to be:
    273 
    274  * DNS requirements document
    275  * refactoring of: ASIO, and....??
    276  * finishing incomplete features such as logging and TCP connection handling
    277  * standalone packaged libraries
    278 
    279 (list to be continued later)
    280 
    281 Shane: we need a measure for deciding what we include. I think "will this help us become production ready" should be it.
    282 
    283 Larissa: yes
    284 
    285 Michal: the problem is we keep putting things off for later.
    286 
    287 Shane: everything we add means bumping something off.
    288 
    289 Shane: people seem to be uncomfortable with my proposed yardstick, so maybe we can discuss that.
    290 
    291 Stephen: we have already given the current list to the sponsors, so must we stick with it?
    292 
    293 Larissa and Shane: not necessarily.
    294 
    295 Larissa: so we can add other features to the "if we can" list
    296 
    297 Discussion here about how to handle requests we cannot currently accommodate, such as HSM support and IXFR-from-differences, etc. Current plan is that we let people know we cannot add more features without more human resources - this is tricky, the sponsors' views are critical, but they don't always reflect 80% of DNS users' needs.
    298 
    299 Michal: when I ask around, what people want is support for MySQL. That seems to be what would motivate DNS users I know to try it out.
    300 
    301 This is a topic we had discussed assigning to a GSoC intern, we'll see what happens now.
    302 
    303 Likun: some Chinese DNS users prefer PowerDNS because the backend is Oracle. They use TimesTen.
    304 
    305 Shane: I looked at doing zone transfers with PostgreSQL but it's not recommended.
    306 
    307 '''''Lunch'''''
    308 
    309 '''Scrum Setup'''
    310 
    311 ''The A-Team / R-Team split made sense when we were working on two separate goals at the same time. It was based on needing to finish 2 separate pieces of work, as well as the expected size of the team becoming quite large. For Y3, we have many more work items, and the team size does not look like it will expand that much (I hope I am wrong - in which case we will revisit the setup).
    312 
    313 We need to discuss how we want to organize our team.''
    314 
    315 Shane: for the past several months we've been split into two teams. The prime motivation was that the team is too large for the ideal scrum size. Ideal size is 5-10 people. We thought it would grow to 20 people by the end of the project year. The other motivation was that we had two distinct pieces of work to do. It has worked pretty well, each of the teams has focused, but people have expressed some concern that they don't know what the other team is doing.
    316 
    317 What has changed now? The team has not gotten smaller, but we don't all work full time on the project. We haven't added people as much as we might have. We also all work remotely, which makes communication more formal. We have delivered the initial server implementations. There does not seem to be a logical team split for year three.
    318 
    319 I think we should reunify into one scrum team. What do people think?
    320 
    321 Jelte: I think the way the teams are split now they are too small.
    322 
    323 Shane: sometimes people seem lonely
    324 
    325 Jinmei: I think this makes sense
    326 
    327 Michal and Jinmei: I have been the only person on my team working in my timezone (Jinmei: on the CONTINENT!)
    328 
    329 Jeremy: I think this would save us a lot of time. All of the people who don't have to attend two sprint plannings.
    330 
    331 Larissa: sprint planning might run slightly longer.
    332 
    333 Jinmei: I do think a slightly larger team is preferred but we will have a slightly larger team than the ideal scrum size.
    334 
    335 Jelte: also it sometimes happens that people who are co-located in a timezone or company work together more, so we end up with specialized teams of two.
    336 
    337 Stephen: the thing with the timezones is important. Speaking in real time to discuss a problem is really helpful.
    338 
    339 Michal: when there is a problem with some code, and the person who wrote it is on the other team from you, it is difficult to figure out what to do.
    340 
    341 Stephen: I think we can do it in one team. If we have one team, there will be related groups of tasks. It would then be logical for people in the same areas of the world to do related groups of tasks, so they can talk more easily.
    342 
    343 Jinmei: if we have one big team, sprint planning session could become uncomfortably long.
    344 
    345 Shane: we're trying to do most of our sprint planning in the face to face meetings. However, we may need a marathon planning session or two before the next face to face.
    346 
    347 Shane: so we have consensus we will try one team, though we know there may be a few problems.
    348 
    349 
    350 '''BIND 10 Year 3 Release Schedule'''
    351 ''
    352 Based on the SoW we need to discuss the release schedule for Y3.''
    353 
    354 Returning to the discussion topic from the morning, about additional features and issues we must handle in Year 3 beyond the statement of work items.
    355 
    356 The things we know we need:
    357 
    358  * BIND 9 style IP address based ACLs (TSIG, IP, extensions/hooks)
    359  * TSIG
    360  * IXFR in and out - protocol level (and data source level)
    361  * DDNS (server side only - same issues as IXFR)
    362  * DNSSEC validation for resolver
    363  * DNSSEC support for in memory data source
    364  * Views
    365  * Operational Support Tools:
    366     * Version Check / Phone Home
    367     * Recursive Resolution Tracing
    368     * DNS ShowTech
    369     * Cache Management (deleting, injecting, viewing, loading, dumping)
    370  * Command Tool
    371     * Demonstration Version
    372     * Framework
    373     * Specific functions:
    374       * Replicate functionality in bindctl
    375       * Load, Delete, List, Modify(?) Zone ("rndc addzone")
    376       * Per Feature Configuration
    377  * High Performance Back end (faster than BIND 9 for in memory and *maybe* hot spot cache)
    378  * Requirements, Design, and Implementation for Hooks (for Plugins)
    379  * Test Platform for Recursive Resolution
    380  * Interoperability Testing
    381  * Security Audit (and followup)
    382  * System Level Testing
    383  * Operational Experience (and followup)
    384 
    385 Additional Possible Requirements:
    386  * Completion of Logging - multiple files, destinations, filters. (logging API)
    387  * Configuration of the BOSS (using cmdctl), command line configuration, and config manager configuration
    388  * Save and load config (export and import)
    389  * Scattered TODO items
    390  * Refactoring
    391    * ASIO
    392    * Auth and Recursive server callbacks
    393    * General utility library
    394    * Generic BIND 10 process (make modules into libraries)
    395    * Stand Alone Mode (like b10-auth only)
    396    * datasource refactoring
    397  * Finish Socket Creator
    398  * DNS Specifications Document (Referencing RFCs, etc)
    399  * Complete support for RR types (everything on the IANA list)
    400  * Link to Crypto Libraries
    401  * Replacing msgq
    402  * Supporting multicore systems (multiple process model) at least auth
    403  * Complete zone file parser
    404  * Offline Configuration
    405  * Additional datasources: MySQL, PostgreSQL, BDB
    406  * Reduce the bug backlog! (resume inclusion of bug fixes per sprint)
    407  * Status query (zones being transferred, timeouts, qps, acls loaded, etc)
    408  * Demuxer (handling multiple queries on the same port) - suppressing duplicate queries
    409  * Randomization of Ports
    410 
    411 We have two major milestones listed, and 41 feature level tasks. (minor issues with the dependencies working correctly). There are a lot more tasks listed for auth than for recursive. (note today we have remote participation by JPRS members)
    412 
    413 List of tasks is on a separate url to be added to these notes.
    414 
    415 List of features with dependencies is complete but we will be breaking out into tasks for sprint planning.
    416 
    417 Replacing msgq is an item that we may leave out if time does not allow - but Michal notes that it needs enough work that perhaps replacing it would be faster than refactoring it.
    418 
    419 We wonder how many people use TSIG in their recursive implementations today.
    420 
    421 We note GSS-TSIG is something to do when we do our Windows implementation. Later.
    422 
    423 Discussion of how early in the year to start DNSSEC validation work. It is one of the most complex tasks, but it is also not linked to the first major deliverable of the year (production Auth only server). We think we need to hold off starting in on DNSSEC validation for a few months, though there is risk in not specifying this work for too long. So a second quarter start for DNSSEC validation.
    424 
    425 Added a dependency between the DNS specifications document, TSIG, and DNSSEC validation.
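
A minimal sketch of how a feature list with dependencies can be turned into a workable order, assuming (this direction is an assumption, not recorded in the notes) that TSIG and DNSSEC validation each wait on the DNS specifications document. The feature names come from the list above; the graph itself is illustrative only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each feature to the set of features it is assumed to depend on.
# These edges are hypothetical examples, not the team's actual task list.
dependencies = {
    "DNS specifications document": set(),
    "TSIG": {"DNS specifications document"},
    "DNSSEC validation": {"DNS specifications document"},
    "Command tool": set(),  # noted as not depending on any other task
}

# static_order() yields every feature after all of its dependencies.
work_order = list(TopologicalSorter(dependencies).static_order())
print(work_order)
```

Features with no dependencies, such as the command tool, can land anywhere in the order, which matches the note that such work can be scheduled as resources allow.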
    426 
    427 Much of the work is split into the authoritative and the recursive implementations (documentation, multicore support, etc)
    428 
    429 Refactoring tasks are generally put ahead of new code, though not always.
    430 
    431 Added a feature to the task list for datasource refactoring (and API standardization)
    432 
    433 Operational support tools actually can be done at any time on an as needed (as resource is available) basis.
    434 
    435 Command tool has quite a few tasks inside of it, but it does not depend on any other tasks
    436 
    437 When do we do support for multi-core systems? It's not that it's necessary for administrators in auth-only, but people may not want to install if the system does not benchmark fast.
    438 
    439 Once we refactor the SQL backend perhaps it will be non-major to implement more SQL backends.
    440 
    441 Status query can be pushed to later in the year if necessary, though administrators will start asking for it, and it
    442 "looks cool".
    443 
    444 A lot of things depend on the security audit. It may take a few weeks to do. We need to decide what the terms of reference are and who will do it.
    445 
    446 Views - scheduled for the recursive timeframe, even though they are useful to auth only systems.
    447 
    448 High performance data source is moved into the recursive section. This is something we can drop off if we need to.
    449 
    450 Hooks: it can be held to the recursive part of the year, but the team expressed concern that the longer we wait the more refactoring we would need to do. Stephen felt that once we write hooks we need to freeze the relevant APIs. Michal said he would like to be able to tell the world we have an early implementation of hooks, to get them to play with it. Stephen feels we should potentially hold off because this is not essential and we have so much to do.  General agreement that among things we will "do with enough time", this would be very high priority.
    451 
    452 Interoperability testing and system level testing: When? We may want to start approaching it as we write our unit tests.
    453 
    454 Which items should depend on the DNS specifications document being written first? Stephen: I see the auth and recursive as being written in parallel, not dependent. We need these to be written in bite size chunks - it's boring and tedious work, and we need to be able to consume it as we develop as well. If we do them in parallel we can move through it one RFC at a time.
    455 
    456 Export/import configuration is not recursive specific but we can delay it to later in the year so it falls into that part of the year for now.
    457 
    458 Offline configuration - we may want to do before the auth server release because bootstrapping may be cumbersome otherwise.
    459 
    460 
    461 '''What is a "Forwarder" Anyway?'''
    462 
    463 ''Apparently nobody has ever defined what a DNS forwarder is. At least not to our satisfaction. We need a list of what a forwarder does and does not do.''
    464 
    465 Jelte: We've been adding and removing features to/from the current forwarding feature as we've developed the resolver. Let's make a list.
    466 
    467 Shane: is the concept of a forwarder defined in the RFCs?
    468 
    469 Michal: A proxy is mentioned, but not much. Only to the level that it exists.
    470 
    471 Jelte: it may be mentioned in an informational doc on DNS setups
    472 
    473 Jeremy: 5625?
    474 
    475 Michal: 1033?
    476 
    477 Shane: if it's just casually mentioned and not carefully defined, does anyone else implement a forwarder? I guess DNSmasq has a forwarder?
    478 
    479 Michal: it has a cache and a dumb proxy
    480 
    481 Shane: is this a BIND-ism or not?
    482 
    483 Jelte: I guess Unbound has this but I am not sure how it implements it
    484 
    485 Michal: you can forward, but it's still a resolver
    486 
    487 Stephen: RFC 2308, section 1, defines a forwarder.
    488 
    489 Jelte: this definition would suggest it is on the other side of the resolver - between the resolver and the internet, not between the stub and the resolver.
    490 
    491 Jeremy: RFC 2136 section 6 also discusses the forwarder.
    492 
    493 Michal: Why do we actually need one? We can create whatever server we like as long as it speaks the protocol correctly, so we can have the feature there, but what is the point?
    494 
    495 Shane: my use case: my ISP runs a resolver and it's fine, but I'd like to have a local cache also, to save time.
    496 
    497 Michal: I use DNSmasq for this.
    498 
    499 Jelte: I can see that use case in this scenario, but I don't see that it has a lot of benefit.
    500 
    501 Michal: I use unbound for another thing, I want validation that my provider doesn't do, but I was unable to configure BIND to do it because the provider blocks all other DNS traffic than to its own server. I use a validating forwarder.
    502 
    503 Jelte: that's a good thing to come out of this discussion, you would implement this differently than what we had in mind.
    504 
    505 Michal: maybe if we had plugins, we would do this this way, by replacing the part that sends queries.
    506 
    507 Jelte: Maybe we directly call the query which sends to the upstream address and then when it returns instead of going back into the resolver query you just pass the answer to the original client. That does mean you would be going through all the logic even though you don't need to.
    508 
    509 Jelte: if you run a straight forwarder you want to copy all the flags but if it's a validating forwarder you do not.
    510 
    511 Stephen: three modes? one, pass through, no interpretation, second way adds a cache, third way is a validating forwarder
    512 
    513 Use cases:
    514 
    515  * First is for firewalls, or a computer not connected to the internet
    516  * Second is for local cache
    517  * Third is for getting additional or more trustworthy validation than is provided upstream
    518  * Fourth is selective forwarding - to get specific information from a particular server
    519 
    520 (Note that BIND 9 by default falls back to iteration when the forwarder fails. We may or may not want to do this. It would be useful to know why people would want this and what the behavior is.)
    521 
    522  * There is also DDNS forwarding - some clients try to send requests to non primary master (other auth servers) - causing problems.
    523 
    524 Michal: we may want to try some scenarios in the forwarder before putting it in the resolver
    525 
    526 Jelte suggests that a forwarder does minimal work. No retry or fallback is done by a forwarder.
    527 
    528 Shane considers a forwarder as one which acts as a proxy and does fallback (and maybe retries).
    529 
    530 Behavior:
    531 
    532 See RFC 5625:
    533 
    534 http://www.faqs.org/rfcs/rfc5625.html
    535 
    536  1. Very, VERY Simple Forwarder
    537     Pass everything through without interpretation, except:
    538     * QID
    539     * port number
    540     * ACL considerations?
    541    
    542  2. Very Simple Forwarder
    543     Pass everything through without interpretation, except:
    544     * QID
    545     * port number
    546     * ACL considerations?
    547     * EDNS0 (adjusted?)
    548     * VERSION.BIND
    549  
    550  3. Proxy Forwarder
    551     Read query
    552     Do everything (interpret/strip EDNS, ...) except follow delegation, TCP fallback (?)
    553     Note: BIND 9 may originate other queries, for example follow CNAME chains
    554 
    555  4. Very Simple Forwarder + Cache
    556 
    557  5. Proxy Forwarder + Cache
    558     Maybe setting DO bit is helpful so we can cache that information. That may bloat cache though.
    559 
    560  6. Validating Forwarder
    561     Full resolver that only goes to specific address(es) (except with RD bit on)
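
As a rough illustration of forwarder type 1 above, the following Python sketch passes a message through untouched except for the query ID, which is swapped on the way upstream and restored on the way back. The class and function names are hypothetical and are not BIND 10 code; port handling and ACLs are left out, and the sequential ID counter is only a placeholder (a real forwarder would pick unpredictable IDs).

```python
import struct


def get_qid(wire: bytes) -> int:
    """Read the 16-bit query ID from the start of a DNS message."""
    return struct.unpack("!H", wire[:2])[0]


def rewrite_qid(wire: bytes, new_qid: int) -> bytes:
    """Return a copy of a DNS message with only the 16-bit ID replaced."""
    if len(wire) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    if not 0 <= new_qid <= 0xFFFF:
        raise ValueError("query ID must fit in 16 bits")
    return struct.pack("!H", new_qid) + wire[2:]


class SimpleForwarder:
    """Hypothetical 'very, very simple forwarder': no interpretation at all."""

    def __init__(self):
        # upstream query ID -> (client address, client's original query ID)
        self._pending = {}
        # Placeholder allocation; sequential IDs are predictable and unsafe.
        self._next_qid = 0

    def outgoing(self, client_addr, query: bytes) -> bytes:
        """Remember the client and return the message to send upstream."""
        upstream_qid = self._next_qid
        self._next_qid = (self._next_qid + 1) & 0xFFFF
        self._pending[upstream_qid] = (client_addr, get_qid(query))
        return rewrite_qid(query, upstream_qid)

    def incoming(self, response: bytes):
        """Map an upstream response back to (client address, message)."""
        client_addr, client_qid = self._pending.pop(get_qid(response))
        return client_addr, rewrite_qid(response, client_qid)
```

Type 2 would additionally have to look inside the message (EDNS0, VERSION.BIND), which is exactly the interpretation this sketch avoids.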
    562 
    563 RFC 3490 mentions forwarders for IDN transformations.
    564 
    565 RFC 3901 mentions using forwarders for IPv6 to IPv4.
    566 
    567 RFC 2845 is about forwarders and TSIG.
    568 
    569 Forwarder is not a goal for Y1/2/3 so maybe we should remove the current support.
    570 
    571 Google draft about geolocation EDNS0 option.
    572 
    573 RFC 2671 mentions what *not* to forward.
    574 
    575 Jelte notes that if we modify the current ticket to pass the DO bit (#598) to lower the EDNS buffer size if the client's is greater than ours, then we have forwarder type 2 (simple forwarder). Also we probably don't copy all the correct response flags yet.
    576 
    577 == Tuesday, 2011-03-22 ==
    578 
    579 '''BIND 10 Year 2 Release'''
    580 
    581 We'll actually make our official Year 2 release. Everything will be prepared in advance, so it should just be a matter of sending some e-mails and updating some Trac pages.
    582 
    583 Hurrah! Champagne and sparkling cider were had.
    584 
    585 '''Y3 deliverables: approach to discussion'''
    586 
    587 ''Make sure we all understand how we're going to go through the list. Shane did his homework and made a list with dependencies, and we'll go through those together. Shane decided to organize this using Task Juggler. A copy of the gantt chart will be linked to the developer wiki. We did leave a few things out. We have a lot of work to do, and a few of the tasks were not needed or requested by sponsors in year three. We may remove additional items as we discuss.''
    588 
    589 '''Y3 deliverable: ACLs'''
    590 
    591 '''Y3 deliverable: TSIG'''
    592 
    593 Stephen: what does BIND 9 do?
    594 
    595 Jinmei: named key-gen
    596 
    597 Stephen: do we want to replicate this? or not?
    598 
    599 Jelte: TSIG key generation is basically just writing random data
    600 
    601 Larissa: what would be easier?
    602 
    603 Stephen: is it a separate program?
    604 
    605 Jeremy: it is but it uses the libraries
    606 
    607 Stephen: so we write our own
    608 
    609 Jelte: but it shouldn't be hard
    610 
    611 Jeremy: we can also provide workarounds to do it with OpenSSL etc
    612 
    613 Stephen: how about the relevant crypto?
    614 
    615 Jinmei: we don't have it
    616 
    617 Stephen: okay so the first question is what crypto library
    618 
    619 Jinmei: well, we do have SHA1 code. And so we have some minimal crypto of our own, but it is still a question whether we want to have an outside crypto library or use our own minimum version.
    620 
    621 Stephen: this is our first foray into crypto, really. So what are the options:
    622  
    623 (refer also to the Beijing meeting notes at: http://bind10.isc.org/wiki/f2F1_Y2_Tue)
    624 
    625  * Soft HSM (is this where we add our HSM transparency layer?)
    626  * Botan
    627  * OpenSSL
    628  * Crypto++
    629 
    630 Discussion: how much more work would it be to add the HSM transparency layer when we're already adding crypto?
    631 
    632 Jelte: so if we define an abstract crypto interface that takes keys as arbitrary identifiers, it doesn't matter what that uses internally.
    633 
    634 Stephen: we probably don't want to rely on SoftHSM. So what underlying library?
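
A minimal Python sketch of the kind of interface Jelte describes: callers name keys by an opaque identifier and never see key material, so the backend could be an in-process library or an HSM. All names here are hypothetical, and HMAC-SHA1 from the Python standard library stands in for whichever crypto library is eventually chosen.

```python
import hashlib
import hmac
from abc import ABC, abstractmethod


class CryptoBackend(ABC):
    """Hypothetical abstract interface: keys are referred to by identifier only."""

    @abstractmethod
    def sign(self, key_id: str, data: bytes) -> bytes:
        """Return a signature over data using the key named key_id."""

    @abstractmethod
    def verify(self, key_id: str, data: bytes, signature: bytes) -> bool:
        """Return True if signature is valid for data under key_id."""


class InProcessHmacBackend(CryptoBackend):
    """Software backend; an HSM-backed class could implement the same methods."""

    def __init__(self):
        self._keys = {}  # key_id -> secret bytes, never handed back to callers

    def add_key(self, key_id: str, secret: bytes) -> None:
        self._keys[key_id] = secret

    def sign(self, key_id: str, data: bytes) -> bytes:
        return hmac.new(self._keys[key_id], data, hashlib.sha1).digest()

    def verify(self, key_id: str, data: bytes, signature: bytes) -> bool:
        # compare_digest avoids leaking how many bytes matched.
        return hmac.compare_digest(self.sign(key_id, data), signature)
```

Code that signs a message only holds a `CryptoBackend` and a key identifier, which is what would let the code path stay the same with or without an HSM.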
    635 
    636 Larissa: OpenSSL has been problematic in BIND 9. What about Botan? SoftHSM uses it...
    637 
    638 Jelte: the reason I wanted the SoftHSM in OpenDNSSEC was that I didn't want a different code path whether you used an HSM or not.
    639 
    640 Stephen: we need to do our own implementation of libHSM?
    641 
    642 Jeremy: why don't we have someone try replacing our current SHA code with Botan and see how it goes?
    643 
    644 Larissa: what about GOST?
    645 
    646 Jeremy: maybe we get Botan to support GOST.
    647 
    648 (Continued after lunch...)
    649 
    650 Shane: Current BIND9 use of TSIG is broken - can't have two keys with the same wire information.  Need to decouple DNS name from identifier in configuration.
    651 
    652 Shane: TSIG from resolver side not a priority this year.
    653 
    654 (Discussion on bootstrapping problem.)
    655 
    656 Shane: Clients/stub resolver out of scope.  Main use of TSIG is to secure connection between servers.  What are issues with Crypto library?
    657 
    658 Jinmei: none that are insuperable.
    659 
    660 Jelte: One issue - if query signed with TSIG, answer must be so signed.  However, must be aware of keeping copy of wire data.
    661 
    662 Jinmei: TCP is tricky - need to provide signature every 100 messages or so.  Current impression is that it will be part of libdns++.
    663 
    664 Shane: need way to configure TSIG certificate as "global" data. 
    665 
    666 Jelte: Need way to configure data.  Question is where to put it?  How about "System" meta-module?
    667 
    668 Shane: Create TSIG configuration module in which TSIG data is put.
    669 
    670 Shane: Issue about NOTIFYs.  BIND9 does not support this (NSD does).
    671 
    672 Vorner: Not critical to sign them - can't do it now.
    673 
    674 Jeremy: Can configure BIND9 to do this.
    675 
    676 Shane: Motivator: NSD does it now.  Also, do we want to avoid a remote user being able to get the server to do something?
    677 
    678 Michael: Q: how will it be configured? (A: via bindctl.)
    679 
    680 
    681 
    682 '''Y3 deliverable: Views'''
    683 
    684 Jeremy: in BIND 9, Views are basically matching a client, matching a destination, match TSIG,  or match if the recursive RD bit is set. The goal is to provide a different data source back end based on the match.
    685 
    686 Stephen: If you have a look at the NSCP draft, we discuss views in there.
    687 
    688 Shane: Views are a BIND-ism, right?
    689 
    690 All: Pretty much.
    691 
    692 Shane: what do you do based on the match?
    693 
    694 Jeremy: provide different data.
    695 
    696 Stephen read out more on zones from the NSCP draft (http://tools.ietf.org/html/draft-dickinson-dnsop-nameserver-control-02)
    697 
    698 Jelte and Michal: this gets tricky if you mix auth and recursive
    699 
    700 Jeremy: the match takes you to separate data sources. This is why you need to figure out ACLs and TSIGs first. I thought at one time we had talked about being able to provide different data sources.
    701 
    702 Jinmei: is this recursive, auth, or both?
    703 
    704 Stephen: there are two parts, the access part, and then the selection of the data source part.
    705 
    706 Shane: and there is first match or best match.
    707 
    708 Stephen: what happens when you have 10,000 zones
    709 
    710 Shane: is there a performance penalty with zones?
    711 
    712 Jinmei: yes, with the matching part.
    713 
    714 Stephen: is that the way we want to do that? For a given zone you probably have relatively few views, but if you have 10,000 zones with different views, and you match by view, you have potentially many thousands of views...
    715 
    716 Jinmei: views have zones. not the other way.
    717 
    718 Michal: that is why it is so powerful. you can have one server pretending to be many servers.
    719 
    720 Shane: the difference between the way we use datasources and views: different views can contain the same name of a zone, but in a datasource they would have to physically copy the data.
    721 
    722 Jeremy: I would like to see our nameserver, regardless of views, be able to use multiple datasources at the same time.
    723 
    724 Jinmei: we can't do that now
    725 
    726 Jelte: but we will refactor to be able to
    727 
    728 Jeremy: in BIND 9 you're always loading everything into memory. This could make it easier.
    729 
    730 Larissa: and faster?
    731 
    732 Michal: you can have a mix, too. some things in both, and some in other things. how would this look?
    733 
    734 Jelte: I've never used views, but I think each view has its own full configuration.
    735 
    736 Jeremy: yes. There are 60 or more toggles you can put inside a view
    737 
    738 Stephen: and it's like a virtual server
    739 
    740 Jelte: Michal suggested you could change your pipeline by views
    741 
    742 Michal: yes, then only the critical part would care about zones; the rest could ignore them
    743 
    744 Shane: as far as working with hooks, the view you're in should be maintained and passed around as context. Should be straightforward.
    745 
    746 Michal: we would need hooks per view. if we have different configuration per view, we could have a hook in one view and not in another one.
    747 
    748 Stephen: do you pass the view to every hook and the hook decides?
    749 
    750 Michal: the first thought is you need to take care of views everywhere. It's a lot of code.
    751 
    752 Stephen: we're in danger of getting very very complex for corner cases. The main use of views as I understand it is to separate internal vs external networks in a company. That is the use case we should optimize for.
    753 
    754 Jeremy: one easy solution we have now for a destination based view is to make sure BIND 10 can run multiple resolver processes listening on different IPs. They would have different configuration and different caches. Same with multiple b10-auths.
    755 
    756 Shane: we talked about config being different but there are different caches per view on the recursive side?
    757 
    758 Michal: so you can redirect a zone in one view but not another
    759 
    760 Shane: some people will not be able to set up two processes listening on different IPs
    761 
    762 Stephen and Larissa: let's work on the common case. 80/20. Corporate situation, intranet/extranet.
    763 
    764 Michal: maybe we can simply solve both the common case and many corner cases.
    765 
    766 Jinmei: maybe there isn't much difference between common case and corner cases.
    767 
    768 Michal: we can restrict configuration somewhat
    769 
    770 Jinmei: there will be an exception
    771 
    772 Shane: thinking from an administrator point of view. I've got three zones, one each in two views and a third zone in both views. Would I have to put them each in their own database?
    773 
    774 Jeremy: our database needs another level
    775 
    776 Shane: we need a layer of indirection
    777 
    778 Jinmei: we should separate the notion of type of datasource and the database files
    779 
    780 Shane: I'm thinking of the abstract concept of a datasource. Right now when I query a datasource I ask for a name. When we add views, I have to ask for a name, and a view.
    781 
    782 Michal: yes, and the datasource can either look specifically for data based on the view, or...
    783 
    784 Shane: I don't care right now. What I'm realizing is what we need to do is expand our data source API to include views.
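
A small Python sketch of what "a name and a view" could look like at the datasource level, with the layer of indirection mentioned above so that a zone attached to two views is stored once. The class and method names are made up for illustration; this is not the BIND 10 datasource API.

```python
class ViewAwareDataSource:
    """Hypothetical datasource whose lookups take both a name and a view."""

    def __init__(self):
        self._zones = {}       # zone_id -> {owner name: [rdata strings]}
        self._view_zones = {}  # (view, zone name) -> zone_id (the indirection)

    def add_zone(self, zone_id, records):
        """Store zone data once; names are lower case without a trailing dot."""
        self._zones[zone_id] = records

    def attach(self, view, zone_name, zone_id):
        """Make an already stored zone visible in a view, without copying it."""
        self._view_zones[(view, zone_name.lower().rstrip("."))] = zone_id

    def find(self, name, view):
        """Return rdata for name as seen from view.

        Returns None if the view has no enclosing zone, and an empty list
        if the zone exists in the view but has no such name.
        """
        name = name.lower().rstrip(".")
        labels = name.split(".")
        # Try the longest enclosing zone first, then strip labels from the left.
        for i in range(len(labels)):
            zone_name = ".".join(labels[i:])
            zone_id = self._view_zones.get((view, zone_name))
            if zone_id is not None:
                return self._zones[zone_id].get(name, [])
        return None
```

With Shane's example of three zones, the zone that appears in both views is attached twice but added once, so no data is duplicated.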
    785 
    786 Jeremy and Jinmei: how will we share a single zone file in multiple views?
    787 
    788 Jinmei: I see the desire but that will be very tricky and error prone
    789 
    790 Michal: the price of passing it to the API is nearly zero. I think we can handle this better on the datasource level than the higher level
    791 
    792 Jeremy: if you're changing configuration all the time, do you need to replicate that in your data source?
    793 
    794 Shane: not if it is done in an abstract way, or in SQL, in a reference table. Depending on how we implement, SQL could look to see if it has views and do different queries if it has it or if it doesn't, for performance.
    795 
    796 Jeremy: I don't know how you do this in BDB.
    797 
    798 Michal: every piece of configuration can be different. We don't want to go through the whole server and add conditions.
    799 
    800 Shane: we can say views are not able to configure *everything* just a specific set of commonly used things.
    801 
    802 Michal: it depends on the plugin system I suppose but the plugin system could provide a piece of logic that could copy views itself
    803 
    804 Shane: not a bad design but I hesitate to implement that without a use case
    805 
    806 Stephen: what about the receptionist model?
    807 
    808 Michal: this is similar to my idea
    809 
    810 Stephen: the plugins could be determined by the configuration of the server
    811 
    812 Michal: the plugin means that it's in some hook, and would be in the hook for one view and not for another. But you could also have common places for all views. You don't configure everything differently. You just can.
    813 
    814 Shane: I worry about using receptionist for this. I don't think it would be that much simpler and it might cost performance.
    815 Maybe for BIND 9 compatibility. Where everything is configurable per view.
    816 
    817 (Michal draws on the notepad)
    818 
    819 Jeremy: there is no memory sharing between caches in the BIND 9 way. So important information doesn't leak, but it uses 10 times the memory.
    820 
    821 Stephen: only if you have 10x the queries.
    822 
    823 Jelte: well.....
    824 
    825 Stephen: so where do the definitions of the views live. In the configuration database?
    826 
    827 Michal: I don't know how. If we're allowed to configure everything, you need a configuration overlay.
    828 
    829 Shane: that won't be the initial implementation. We will just configure zones and recursive behavior.
    830 
    831 Michal: the configuration manager can be handled somehow.
    832 
    833 Jinmei: we should ensure that the views implementation
    834 is consistent across all the modules.
    835 
    836 Shane: we need a work item for non module related configuration.
    837 
    838 Stephen: we could have a pseudo module called system, and put it all in there
    839 
    840 Jeremy: in BIND 9, statistics can be separated by view
    841 
    842 Stephen: you need statistics per zone too
    843 
    844 Shane: we will need to capture that and report it, reporting should not be a problem with this, reporting is quite flexible now.
    845 
    846 Jeremy: BIND 9 by default has three views: BIND 9 view, _default view, _meta view.
    847 
    848 Stephen: even if you don't define views, everything goes through BIND 9 views. It simplifies the data model.
    849 
    850 (interlude about NSCP and nominet and whatnot)
    851 
    852 
    853 '''''Lunch'''''
    854 
    855 '''Y3 deliverable: DDNS'''
    856 
    857 Shane: tell us about the current status of changes to the backends to make them writeable?
    858 
    859 Jelte: yes, for the first SQLite data source I added functions that could add and remove RRs and also parse a dynamic update and perform nearly every action in there. It does not do data consistency. But that was for the SQLite data source. Jinmei had one look, and kind of disagreed with the general design, since I added everything on the abstract data source level. He thought we might want to add a separate class. We might want to make every datasource writeable.
    860 
    861 Shane: no, surely you want read only data sources.
    862 
    863 Michal: could we make a write only datasource?
    864 
    865 Jelte: I don't see a use case for that
    866 
    867 Shane: it might make sense if you had programmatic data sources. You could say, use DDNS to do logging.
    868 
    869 Michal: I am just thinking it might make sense to have readable and inherit read writeable, or to have all three.
    870 
    871 Shane: you can do this with aggregation instead. I don't know. I see why it would be nice to do that, then if you are implementing a datasource and you don't want it to be writeable you don't have to implement it at all.
    872 
    873 Jelte: I think you can do that today. It was written over 6 months ago now, though, so...
    874 
    875 Michal: I believe we want to merge first and see what a writeable datasource might look like before we start refactoring.
    876 
    877 Shane: questions: How do you handle concurrent access in the current code?
    878 
    879 Jelte: for DDNS it is the datasource itself that handles the packet, so right now it doesn't worry about it. In the case of IXFR it is a separate process and it will send a fail.
    880 
    881 Shane: with IXFR we should not have to worry about that since we are the ones doing the updates. That's probably appropriate though we may want to define a default where we lock everything, for naive implementations.
    882 
    883 Jelte: if you make a very simple implementation it just sets a lock.
    884 
    885 Shane: for ease of use for implementors, we may want to put an in memory mutex there by default.
    886 
    887 Shane: can we not use an in memory lock if we have multiple processes?
    888 
    889 Michal: we could but it wouldn't be easy. If you provide it in the abstract class the simple version may use it, but...
    890 
    891 Shane: okay we can refactor this later if we need to.
    892 
    893 Shane: Multiple processes? I guess with SQLite we don't care too much. I guess Jinmei or Michal or someone thought about this for multiple processes.
    894 
    895 Jinmei: in the in memory data source?
    896 
    897 Michal: if you have the memory shared you can share a semaphore. But you need another daemon that handles it that holds the data. It would be another process. It seems quite heavyweight.
    898 
    899 Shift to some discussion of multicore model as it relates.
    900 
    901 Shane: my thinking is we would scale across multiple cores by using multiple processes.
    902 
    903 Michal: you could use the writeable as SQLite and inmemory as the secondary store.
    904 
    905 Shane: we could encode deltas as well, useful for a big zone.
    906 
    907 Michal: we could start loading from the datasource in parallel with handling the current data as well.
    908 
    909 Shane: if you're using a system that requires the performance of an in memory data store, you will then start dropping queries.
    910 
    911 Jinmei: can we get back to dynamic updates?
    912 
    913 Shane: the proposal is that we don't allow dynamic updates to the inmemory source at all - that when you need to change it you do partial or full zone reload.
    914 
    915 Jelte: either DDNS or IXFR says you have to store it there before you start serving it anyway.
    916 
    917 Stephen: one thing about an auth server is the updates won't be that frequent.
    918 
    919 Shane: I really think this might be the right way, where in memory gets its data from another source, and if you want to update it, we have an upload method that can be done with a delta, and have an API for the upload method.
    920 
    921 Stephen: you can load it into memory as soon as possible, but if you get multiple updates to a single record (Shane notes this happens in DHCP) it is complex.
    922 
    923 Shane: if you presume the set of changes will be small and infrequent, you can lock the whole dataset to make the changes.
    924 
    925 Jelte: this sounds remarkably similar to constructing an IXFR out packet.
    926 
    927 Stephen: how often are things read in the DHCP case?
    928 
    929 Shane: it depends on the environment. In a reverse tree, probably pretty soon. For some reason many machines want to do reverse lookup.
    930 
    931 Stephen: if it's going to be updated 5x before it's uploaded again there is no point. Just mark it as dirty.
    932 If you've stored it, and you update it on disc before you bring it to memory, then you do essentially have a hot spot cache.
    933 
    934 Jelte: as a general design thing.
    935 
    936 Shane: it listens for the query, applies all the prerequisites, and then pushes it down to the datasource
    937 
    938 Shane: do views apply to DDNS?
    939 
    940 Jinmei: yes
    941 
    942 Jelte: the reason I applied this at the abstraction layer is that if you have a datasource that can handle it more efficiently you can rewrite it
    943 
    944 Shane: if I have prerequisites across multiple data sources will that be a problem for us?
    945 
    946 Shane: Then the datasource layer needs to do prerequisite checking and then the actual updates. Then for in memory, we need an abstract class for stable storage.
    947 
    948 Stephen: I think this is a problem for a very large zone
    949 
    950 Shane: so we need a signaling mechanism for them to get updated, which would end up a lot like IXFR out.
    951 
    952 Stephen: unless we say well, when we load a zone from zone file, we load it into a database, full stop. If you want a zone file out, we just write a zone file back out.
    953 
    954 Shane: yes that is the right model
    955 
    956 Michal: yes and you can use the zone file as a source for the inmemory datasource
    957 
    958 Stephen: only going through an intermediate database
    959 
    960 Michal: you don't need that.
    961 
    962 Shane: we probably need a special case.
    963 
    964 Shane: it probably needs to be a synchronous notify, so it can also send data back to the stable database.
    965 
    966 Michal: but there is no guarantee.
    967 
    968 Jinmei: so how do we ensure consistency between original and in memory?
    969 
    970 Shane: that is why I propose the synchronous model, so when there is an update to the "disk space" datasource, it sends an update to the in memory, which is a process, waits for the reply, indicating the update, and only then is the process complete. This will also allow us to do other things in the future.
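
The synchronous model Shane proposes can be sketched in Python as follows. In the design discussed here the in-memory datasource would be a separate process reached over the message bus; in this sketch a direct method call stands in for "send the update and wait for the reply". The class names and the delta format (name mapped to rdata, or to `None` for a deletion) are assumptions made for the example.

```python
class InMemoryStore:
    """Stand-in for the in-memory datasource process."""

    def __init__(self):
        self.records = {}

    def apply(self, delta):
        """Apply a delta and return True as the acknowledgement."""
        for name, rdata in delta.items():
            if rdata is None:
                self.records.pop(name, None)  # deletion
            else:
                self.records[name] = rdata    # add or replace
        return True


class StableStore:
    """Stand-in for the 'disk space' datasource that owns the data."""

    def __init__(self, memory):
        self.records = {}
        self._memory = memory

    def update(self, delta):
        """Write to stable storage, then wait for the in-memory copy to confirm."""
        for name, rdata in delta.items():
            if rdata is None:
                self.records.pop(name, None)
            else:
                self.records[name] = rdata
        # The update only counts as complete once the in-memory side has replied.
        if not self._memory.apply(delta):
            raise RuntimeError("in-memory store did not acknowledge the update")
        return True
```

Because `update` does not return until the acknowledgement arrives, the two stores cannot silently diverge, at the cost of making every update wait on the slower of the two.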
    971 
    972 Jeremy: I don't think the current msgq can keep up with this. Which is why we may replace it.
    973 
    974 Shane draws the current design plan on the notepad (see photo)
    975 
    976 Shane: updates are *really* slow in BIND 9, I think using a real SQL database in the back can buy us a lot.
    977 
    978 Jinmei: this might be faster than writing to disk directly?
    979 
    980 Shane: I think maybe. SQL people have worked very hard to get their writes fast.
    981 
    982 Michal: they tune the performance toward parallel updates
    983 
    984 Shane: the trick here is the SOA update which has to occur with every update
    985 
    986 Jeremy: no, it can be trained to every 300 seconds.
    987 
    988 Shane: so that would be a lot faster, yes
    989 
    990 Jelte: I was thinking of a shorter time, but yeah, we would do that
    991 
    992 Shane: in DHCP they queue up the answers and synch them periodically.
    993 
    994 Jinmei: does this architecture have its own bottlenecks in it, and in the worst case, does the request from DDNS block the auth server from responding to further queries?
    995 
    996 Michal: this is where we need the good msgq.
    997 
    998 Shane: there are potential bottlenecks. I think with this model though, it's a bit like microkernel architecture, you can throw it away if it's a problem.
    999 
    1000 Jinmei: in the case of IXFR with NSD, I thought that it does periodic updates, like every 30 seconds or something. Not update immediately upon receipt.
    1001 
    1002 Shane: in the update I worked with it in we did updates every minute.
    1003 
    1004 Jinmei: so it can combine incoming updates. In that case I don't know if it also makes sense for dynamic updates. Especially if the update rate is quite high.
    1005 
    1006 Shane: could be.
    1007 
    1008 Jinmei: I simply don't know.
    1009 
    1010 Shane: batch processing can be a lot more efficient but with DDNS it may be difficult to ensure fairness.
    1011 
    1012 Likun: we need to think about the lightest uses, like a user who just needs to start an auth server, and we don't want the model coupling too much.
    1013 
    1014 Shane: I think we can easily hide all this from the user. You just load a zone file in and start. That should be the default. If you're not configured as a secondary we should not start xfrin or zone manager modules. Automatically.
    1015 
    1016 '''Y3 deliverable: logging'''
    1017 
    1018 Stephen: log4cxx did not work, so now logging just goes to stdout and that's it. We have to decide what we want to do with the logging. Do we go to another existing package or do we write our own? log4cxx has the advantage that you can create independent loggers with individual characteristics. So for one module you could have detailed logging and very primitive logging for another. It also provides multiple levels and destinations. My principal reason for choosing it was that it is already there. If, however, we decide we want to implement our own, we need to do everything log4cxx does now plus... ?? So.... what do we do?
    1019 
    1020 Jinmei: Is it true that FreeBSD doesn't have a sufficiently new version of Log4cxx?
    1021 
    1022 Jeremy: it was not in the packages collection.
    1023 
    1024 Jinmei: which version?
    1025 
    1026 Stephen: I downloaded the Ubuntu version and that was 0.9.8.
    1027 
    1028 Jinmei: on my laptop it is 0.10.0
    1029 
    1030 Stephen: the version we had problems with was a 0.9.x and the issue was they changed an underlying strings thing due to a Windows issue.
    1031 
    1032 Stephen: we could leave it logging to stdout for the OSes it doesn't work on and hope that upcoming versions fix this? Log4cxx comes from Apache, but there are others.
    1033 
    1034 Jelte: SyslogNG has its own API?
    1035 
    1036 Stephen: so what is a simple logging system? Log4cxx is really complex, and realistically you don't need this.
    1037 
    1038 Jeremy: BIND 9 logging is hard to configure, but it does have a lot of features
    1039 
    1040 Stephen: this is part of why I wanted Log4cxx, because it seems to have the features people want
    1041 
    1042 Jeremy: what about log4cplus? It is just run by one guy (http://log4cplus.sourceforge.net/)
    1043 
    1044 Stephen: yeah that makes it a non starter
    1045 
    1046 Jeremy: or we embed and maintain.
    1047 
    1048 Stephen: going back, would it be right to go along the same lines as log4cxx, but not as flexible, in our own implementation.
    1049 
    1050 Jelte: I would be fine with that
    1051 
    1052 Jeremy: I don't think I want us handling log rotations or anything like that
    1053 
    1054 Stephen: so we need to either base on an existing package or...
    1055 
    1056 Stephen: principle of least surprise, do we make it do what bind 9 does?
    1057 
    1058 Jinmei: maybe it's sufficient to use the features in the operating system support - but I also think it makes sense to have a minimal version that does *not* rely on something like Log4cxx, as Jelte said.
    1059 
    1060 Stephen: OK, compromise. We write a minimal implementation: no log rotation, output to a few specific destinations, and the option of plugging in log4cxx later for people whose OS it works on.
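
A minimal sketch of what that compromise could look like, written in Python for brevity (the real implementation would presumably be C++). The names `MiniLogger`, `Severity` and the `sink` callable are hypothetical and are not an existing BIND 10 API. Each module gets an independent logger with its own severity, and the destination is a plain callable, which is where a log4cxx-backed sink could be plugged in later.

```python
import sys
from enum import IntEnum


class Severity(IntEnum):
    DEBUG = 10
    INFO = 20
    WARN = 30
    ERROR = 40


class MiniLogger:
    """Hypothetical minimal logger: one instance per module, no rotation."""

    _loggers = {}

    def __init__(self, name, severity=Severity.INFO, sink=None):
        self.name = name
        self.severity = severity
        # Default destination is stdout; any callable taking a string works.
        self.sink = sink or (lambda line: sys.stdout.write(line + "\n"))

    @classmethod
    def get(cls, name):
        # Independent logger per module name, created on first use.
        if name not in cls._loggers:
            cls._loggers[name] = cls(name)
        return cls._loggers[name]

    def log(self, severity, message):
        # Messages below this logger's own threshold are dropped.
        if severity >= self.severity:
            self.sink("%s [%s] %s" % (severity.name, self.name, message))


lines = []
xfrin = MiniLogger.get("xfrin")
xfrin.severity = Severity.DEBUG      # detailed logging for one module
xfrin.sink = lines.append
auth = MiniLogger.get("auth")
auth.severity = Severity.ERROR       # very primitive logging for another
auth.sink = lines.append

xfrin.log(Severity.DEBUG, "starting transfer")
auth.log(Severity.INFO, "query received")    # suppressed by auth's threshold
auth.log(Severity.ERROR, "zone load failed")
```

Keeping the sink as a callable means rotation and remote delivery stay outside the logger, in line with the wish not to handle log rotation ourselves.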
    1061 
    1062 Others: we also looked at glog, Google's logging library for C++. It didn't have documentation with it.
    1063 
    1064 '''Y3 deliverable: IXFR-out'''
    1065 
    1066 
    1067 '''Y3 deliverable: XFR-in'''
    1068 
    1069 
    1070 '''Y3 deliverable: DNSSEC validation '''
    1071 
    1072 Jeremy: really need a specifications document.
    1073 
    1074 Stephen: Q: are we really trying to do requirements or design? (A: neither at the moment.)
    1075 
    1076 Michael: can approach this by supporting one algorithm initially.
    1077 
    1078 Shane: can decompose.  e.g. know about trust anchor management, could document that.  However, really do need to understand this before we start writing code.
    1079 
    1080 Michael: corner cases make life very complicated.  Also, validation is a combination of top-down and bottom-up validation.  Odd cases where you can almost reach it one day, then have to go back and read data another day.  If I plead for one requirement, it's to make it easily validatable.
    1081 
    1082 Michael: many recent bugs in BIND9 due to different trust levels of data.  Suggest having two caches, and copy between untrusted and trusted data.
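
A rough sketch of the two-cache idea, assuming a hypothetical `TwoTierCache` class and a `validate` callable that stands in for real DNSSEC validation (no cryptography is done here). Data from the network only ever enters the untrusted cache, and answers are only served from the trusted one.

```python
class TwoTierCache:
    """Hypothetical sketch: answers are only served from the trusted cache."""

    def __init__(self, validate):
        self.untrusted = {}
        self.trusted = {}
        # callable(key, rrset) -> bool; a stand-in for DNSSEC validation
        self.validate = validate

    def store(self, key, rrset):
        # Everything fresh off the network lands in the untrusted cache.
        self.untrusted[key] = rrset

    def promote(self, key):
        # Copy to the trusted cache only after validation succeeds.
        rrset = self.untrusted.get(key)
        if rrset is not None and self.validate(key, rrset):
            self.trusted[key] = rrset
            return True
        return False

    def lookup(self, key):
        return self.trusted.get(key)


# Stand-in validator: accepts only data marked as signed.
cache = TwoTierCache(lambda key, rrset: rrset.get("signed", False))
cache.store(("example.org.", "A"), {"rdata": "192.0.2.1", "signed": True})
cache.store(("example.net.", "A"), {"rdata": "192.0.2.2", "signed": False})
promoted_org = cache.promote(("example.org.", "A"))
promoted_net = cache.promote(("example.net.", "A"))
```

With a single trust level per cache, there is no per-record trust field to get wrong when data of different provenance is mixed.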
    1083 
    1084 Vorner: Suggest we walk chain from root each time and check - don't need to do crypto every time.
    1085 
    1086 (Discussion on validation procedure: Problems when elements of the validation chain have different TTLs.  Hardest cases come when something is wrong - remember "roll over and die".  Stress need to have every corner case as a test case.)
    1087 
    1088 Michael: Really do need a specification/design document - need to document Mark Andrews's experience.  Can see it becoming a best practice document.
    1089 
    1090 Jinmei: can't document corner case here.
    1091 
    1092 Michael: how easy is it to issue queries?
    1093 
    1094 Jelte: not too difficult.
    1095 
    1096 Michael: need to do fetches in parallel.
    1097 
    1098 (Discussion on when to issue queries for DS records.)
    1099 
    1100 Michael: biggest problem in BIND-9 is retry time and retries.
    1101 
    1102 (Discussion on what to do with insecure responses.)
    1103 
    1104 Shane: Will task Jeremy to produce document describing validation process.  Will need to get periodic updates on the document - say every two weeks.
    1105 
    1106 Jeremy: Will work with BIND-9 developers to document existing code.
    1107 
    1108 
    1109 Jinmei: Q: 5011 support?
    1110 
    1111 Michael: A: Yes - is critical.
    1112 
    1113 Jinmei: Q: DLV Support?
    1114 
    1115 Michael: depends on whether or not BizOps says we need it.
    1116 
    1117 Shane: Need to support it for next year.
    1118 
    1119 Michael: Why do we need it? (If parent does not support DNSSEC)
    1120 
    1121 Jelte: Nice but not essential (Shane: agree.  Michael: recommend we don't implement it - nasty hack needed before root was signed; expect fewer zones will be signed with key here.)
    1122 
    1123 Conclusion - does not make sense to implement DLV now.
    1124 
    1125 
    1126 '''Refactoring ASIO'''/'''Event Driven or Threaded Model?'''
    1127 
    1128 ''We need to talk about how we're going to refactor the ASIO code, or at least the coroutine style. It's hard to work with.''[[BR]]
    1129 ''A suggestion to use non-preemptive threads for processing. We need to decide if this is worth pursuing, and what it would mean if we did.''
    1130 
    1131 '''Event Driven or Threaded Model'''
    1132 
    1133 External Assertion: event driven not good for high-performance server.
    1134 
    1135 With threads, have problem about concurrent access, and scaling gives problems?  (Assertion - no way to make program thread-safe?)  Proposed that real problem with threading is concurrency.  Proposed that threads operate one at a time.  Way to do this is co-operative multi-tasking(?)
    1136 
    1137 (Discussion on ASIO and coroutines.  Recommendation to remove coroutines)
    1138 
    1139 Jinmei: if we use event-driven model, don't see reason to drop ASIO.
    1140 
    1141 Michal: thread model can be used, but will hit problems with it.
    1142 
    1143 Conclusion: can get rid of coroutines with relatively little effort.
    1144 
    1145 Q: What version of ASIO do we use and are we updating it?  A: Jeremy will check.
    1146 
    1147 Q: do templates give code-bloat? A: Stephen will investigate.
    1148 
    1149 Shane: Threaded code may be simpler to read, but interface provided by pthreads is not easier to read.  However, have problems with things like cancel.
    1150 
    1151 Michael: multiple threads but only one thread running at a time.
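
To illustrate "multiple threads but only one running at a time", here is a small sketch that uses Python generators as a stand-in for non-preemptive threads. The scheduler `run_round_robin` and the `worker` tasks are hypothetical; the point is only that a task runs until it explicitly yields, so shared state needs no locking between switch points.

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks one at a time until all finish."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)           # run until the task voluntarily yields
            queue.append(task)   # not finished: back of the queue
        except StopIteration:
            pass                 # task completed


trace = []
counter = {"value": 0}


def worker(name, steps):
    for _ in range(steps):
        # No lock needed: nothing else runs between these two statements.
        current = counter["value"]
        counter["value"] = current + 1
        trace.append(name)
        yield                    # explicit switch point


run_round_robin([worker("a", 2), worker("b", 3)])
```

As noted in the discussion, a scheme like this uses a single core per process, so scaling across cores would still need multiple processes or real threads.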
    1152 
    1153 Shane: proposal from the comments was state threads (Apache project).  (Discussion of the state threads model)
    1154 
    1155 Vorner: potentially only single core, but multiple processes.  This does not appear to support multiple (real) threads.
    1156 
    1157 Vorner: Believe that we can run multiple processes for authoritative server.  But will need multiple threads to run resolver.
    1158 
    1159 Shane: What about current code?  Proposal is that we won't pursue this now - event driven code is easy enough to read.
    1160 
    1161 (Discussion about general multi-threading issues.)
    1162 
    1163 Michael: authoritative server can be done multi-process (although there is a lot of interaction in the data base).  Recursive server has too much interaction.
    1164 
    1165 
    1166 
    1167 == Wednesday, 2011-03-23 ==
    1168 
    1169 
    1170 '''Unit Testing: How to Do It''' (Medium)
    1171 
    1172 ''We should talk about our unit tests, and where and how we draw the line on testability. Some things are ''hard''.''
    1173 
    1174 Shane: our general rule is we test everything. There are cases where that is really hard. I have to say, though, some places I thought it would not be possible, it was, with refactoring. Do we have examples of places we don't have tests now because they're too hard? Assuming we don't test the libraries we rely upon.
    1175 
    1176 Jelte: I have one test that doesn't actually do statistical test on the QID but it does test that it doesn't get the same QID a few times in a row.
    1177 
    1178 Michael: a random number doesn't mean you never get a repeat.
    1179 
    1180 Jelte: which is why it does a few checks in a row.
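
A worked version of that argument, assuming QIDs are drawn uniformly from the 16-bit space: the chance that n consecutive draws are all identical is (1/65536)^(n-1), so checking a few draws in a row makes a false failure vanishingly unlikely. The helper names below are hypothetical, and the seeded generator merely stands in for the real QID source.

```python
import random
from fractions import Fraction

QID_SPACE = 2 ** 16   # DNS query IDs are 16 bits


def false_failure_probability(draws):
    """Chance that `draws` uniform QIDs are all equal, failing a correct generator."""
    return Fraction(1, QID_SPACE) ** (draws - 1)


def qids_not_stuck(generate_qid, draws=5):
    """Pass unless every one of `draws` consecutive QIDs is the same value."""
    seen = {generate_qid() for _ in range(draws)}
    return len(seen) > 1


rng = random.Random(42)              # seeded stand-in for the real QID generator
healthy = qids_not_stuck(lambda: rng.randrange(QID_SPACE))
broken = qids_not_stuck(lambda: 4)   # a generator stuck on one value
```

This only catches a generator that is stuck; it says nothing about the statistical quality of the QIDs, which matches what the test is described as doing.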
    1181 
    1182 Michal: the part of the code without many tests is the TCP and UDP servers.
    1183 
    1184 Jelte: msgq is also insufficiently tested.
    1185 
    1186 Shane: that is one area that is quite difficult - when you interact with the external environment
    1187 
    1188 Michal: I don't think that's why they don't have tests - they were written at the beginning before the strict policy.
    1189 
    1190 Shane: For things like that, we could create our own descendent of the listening classes themselves, and use that for testing somehow.
    1191 
    1192 Michael: the Samba folks have a full virtual networking layer that lets you inject any format you want without using a networking stack to do it.
    1193 
    1194 Michal: you could use the loopback interface
    1195 
    1196 Shane: how do you cause bad behavior then?
    1197 
    1198 Stephen: the problem is testing how it fails
    1199 
    1200 Shane: if our code is structured so anything that doesn't succeed goes to the same code paths, this matters less.
    1201 
    1202 Michael: if you remove the network part of the unit testing, it's more reliable.
    1203 
    1204 Jinmei: what is the goal of the topic?
    1205 
    1206 Shane: to discuss where we are failing to make unit tests, how to fix it, what we can do about it?
    1207 
    1208 (looking at an example of tcp server code)
    1209 
    1210 Shane: it's easier to instrument Python code for testing than C++.
    1211 
    1212 Stephen: if you're writing your C++ code and you want to point to something different for testing, build it into the object and put in a flag, so the production code includes code for testing, and I think that's valid. It's like an automobile with diagnostics for maintenance.
    1213 
    1214 Michal: you could inject the tests with templates if you don't want the test code compiled in?
    1215 
    1216 Shane: possible.
    1217 
    1218 Jinmei: we can also use some higher level abstractions, by introducing class hierarchy just for the purpose of tests. There are techniques, but it is true it will be more difficult.
    1219 
    1220 Shane: it's early binding which makes it more difficult.
    1221 
    1222 Jinmei: I don't think that is the essential difficulty.
    1223 
    1224 Michal: the places we don't test are sometimes main functions.
    1225 
    1226 Jinmei: One possible good thing is to have a wrapper layer - then we can separate the dependency - so we can test the code using the network related things.
    1227 
    1228 Shane: so add an indirection layer?
    1229 
    1230 Jinmei: right. Then we can use a fake certificate, fake network communication, etc. Then we can test all of these other things with the ASIO wrapper.
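
A sketch of what such an indirection layer buys in tests. `FakeTransport` and `query_with_retry` are hypothetical names and are not the ASIO link API; the assumption is simply that the code under test talks to a small wrapper interface, so a scripted fake can produce failures that a loopback interface cannot easily cause.

```python
class FakeTransport:
    """Hypothetical test double for the network wrapper: scripted replies, no sockets."""

    def __init__(self, script):
        # Each entry is reply bytes, or None to simulate a dropped packet.
        self.script = list(script)
        self.sent = []

    def send_and_receive(self, packet):
        self.sent.append(packet)
        return self.script.pop(0) if self.script else None


def query_with_retry(transport, packet, attempts=3):
    """Code under test: depends only on the wrapper interface, not on ASIO."""
    for _ in range(attempts):
        reply = transport.send_and_receive(packet)
        if reply is not None:
            return reply
    raise TimeoutError("no reply after %d attempts" % attempts)


# Two drops, then an answer.
lossy = FakeTransport([None, None, b"answer"])
reply = query_with_retry(lossy, b"query")

# Nothing ever answers.
dead = FakeTransport([])
try:
    query_with_retry(dead, b"query")
    timed_out = False
except TimeoutError:
    timed_out = True
```

The fake only has to cover the narrow interface the caller uses, which is the counterpoint to the worry about having to replace all of the wrapper's functionality.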
    1231 
    1232 Jelte: so we already have the layer, but if you replaced it you'd be rewriting much of ASIO. If we have that layer and we don't use ASIO directly, we use ASIO link. But if we replace it for testing, we'd have to replace all the functionality.
    1233 
    1234 Michal: we only have to replace some specific network parts.
    1235 
    1236 Stephen: you can inject packets, but if you have a fake where you replace a routine to write packets to the network, the routine has to do a callback, and it replicates a lot of effort. I think it's really only the servers we haven't really tested.
    1237 
    1238 Michael: have you tested the client query stuff?
    1239 
    1240 Shane: no, we don't check for it.
    1241 
    1242 Jelte: we do test the resolver behavior.
    1243 
    1244 Likun: can we look at ASIO's test code?
    1245 
    1246 Jeremy: I was just looking, ASIO and Boost have unit tests. Maybe we can work with them.
    1247 
    1248 (Shane brings up boost.org and a google search for ASIO and Boost tests)
    1249 
    1250 Shane: we should research this
    1251 
    1252 Jinmei: at least in theory we should be able to test all parts but the wrapper itself, but some things heavily rely on the core ASIO. Another thing is that if the wrapper itself is very trivial, we can maybe skip that - it will simply mean testing with the external library itself. If the wrapper is difficult, then it needs tests
    1253 
    1254 Shane: infinite regression!
    1255 
    1256 Michael: if they test the ASIO stuff, they've got to have a way to do this
    1257 
    1258 Jeremy: check out
    1259 http://www.boost.org/development/tests/trunk/developer/asio.html
    1260 for example: http://svn.boost.org/svn/boost/trunk/libs/asio/test/basic_datagram_socket.cpp
    1261 
    1262 Michael: I think client behavior is trickiest.
    1263 
    1264 Shane: do you mean the resolver?
    1265 
    1266 Michael: yes.
    1267 
    1268 Shane: we will test packet drops, packet delays, incorrect answers, etc, but we won't test UDP checksum errors etc.
    1269 
    1270 Michael: of course not.
    1271 
    1272 Michal: we will have the demultiplex thing, so we will test on that level. Right now the client in the resolver is... temporary, right?
    1273 
    1274 Shane: part of it is. the demuxer is a layer in front of that.
    1275 
    1276 Jelte: yes. Right now the resolver issues its own queries and it would ask the demuxer to do the sending of the actual packets.
    1277 
    1278 Michael: do you use the system resolver to send notifies?
    1279 
    1280 Shane: yes
    1281 
    1282 Michael: that's the right way. don't change that.
    1283 
    1284 Shane: for unit testing, for new code, there should be *no* new code that you can't write a unit test for. If you can't figure out how to write the test, speak to the team. I was trying to do some BOSS work and I couldn't figure it out and tried functional tests and then Michal asked if I needed to and I realized I didn't. So that works.
    1285 
    1286 Larissa: and people can mention it in a daily scrum call if they're stuck on a test
    1287 
    1288 Shane: yes
    1289 
    1290 Larissa: aren't we also doing TDD?
    1291 
    1292 Shane: yes.
    1293 
    1294 Larissa: and aren't those unit tests?
    1295 
    1296 Shane: yes, but people get stuck, so they don't write the test, they just write the code.
    1297 
    1298 Michal: what about refactoring?
    1299 
    1300 Shane: if you refactor the code you refactor the test. If you're writing a sort function and then you refactor, even though it's internal, or private, you refactor the test.
    1301 
    1302 Stephen: you test *all* the code you write.
    1303 
    1304 Michael: if you don't test functions, you have to write more tests from the outside. If you have internal tests, your tests are less fragile.
    1305 
    1306 Shane: you have to test the function somehow.
    1307 
    1308 Michael: right, if you know you tested that then at the higher level you can trust that it's tested, it's opaque, and that's okay.
    1309 
    1310 Jinmei: I got lost. This is about testing private things? I am afraid there is no single universal solution to this problem. I think we need to use our discretion.
    1311 
    1312 Stephen: the simple way is to make it protected.
    1313 
    1314 Michael: we had this discussion in another BIND 10 meeting, that we will allow other people to shoot themselves in the foot, if they want to mess with this stuff. Why make it private?
    1315 
    1316 Shane: private is an *advisory notice*, not something we use to prevent.
    1317 
    1318 Jelte: I thought the decision back then was to not change our interfaces for testing.
    1319 
    1320 Stephen: as the code becomes more complex, why not put in code that is just for testing?
    1321 
    1322 Shane: I think we were saying we didn't want different code executed for testing.
    1323 
    1324 Michael: the plan for BIND 9 is to be able to compile a test version that's static. We also have to rename functions in BIND 9, but you're protected from that with C++.
    1325 
    1326 Stephen: if you access something protected for test use, but have it set to private for regular use, and there is a macro, it just won't compile if you try to compile it for the real environment rather than the test environment. It will compile for testing only.
    1327 
    1328 Michael: it's worth trying this and seeing how it goes, but it may end up you just need a comment or something.
    1329 
    1330 Michal: people don't read docs/comments.
    1331 
    1332 Shane: it's good to use the standard way the language is normally used.
    1333 
    1334 Larissa: project goal of understandable hackable code....
    1335 
    1336 Jeremy: should we focus time on getting better coverage? We have some specific areas with poor coverage.
    1337 
    1338 Shane: I think this has been getting better. Except msgq. And BOSS. But these have a refactoring scheduled.
    1339 
    1340 Jeremy: bindctl and xfrout and xfr library need more tests. The datasource master. We knew, but it needs all testing done.
    1341 
    1342 Shane: we will also be refactoring datasource soon. I hope.
    1343 
    1344 Jeremy: there are a few things.
    1345 
    1346 Shane: all of these places will be touched within the next 3-6 months, so the question is should we expand the scope of the changes to also add tests.
    1347 
    1348 Jeremy: I would guess yes, because otherwise people will only test what they are writing
    1349 
    1350 Michael: and its always better to have tests first.
    1351 
    1352 Jinmei: I think in general we should care about test coverage but should we introduce specific action to address this concern?
    1353 
    1354 Shane: we have two pieces of work scheduled that will affect xfrout daemon. So we can schedule another task before those that is for writing tests and the relevant refactoring.
    1355 
    1356 Jinmei: there are some other cases that are normally considered difficult to test. Database related things. That would be more so when we add more backend databases. I anticipate some excuses and reasons we can't test in this instance.
    1357 
    1358 Shane: I think the tests we have for SQLite now are a little broken.
    1359 
    1360 Michael: you have to run the relevant server to test the specific backend, which can eventually not scale.
    1361 
    1362 Jelte: it would be nice to have a generalized datasource functional test suite.
    1363 
    1364 Michal: isn't there some kind of general database library where you send SQL but it doesn't matter which server is there?
    1365 
    1366 Shane: I looked at this 7 years ago and the answer was no because once you do anything non trivial, things vary really a lot.
    1367 
    1368 Michael: databases are becoming more standard now.
    1369 
    1370 Shane: it's the details of how things work within databases that are really different. Jelte and I looked at the SQLite schema, and normally where you would expect a between command to work, it doesn't work there. That's a really simple thing. Some systems don't support nested selects, and so forth.
    1371 
    1372 Jelte: we need to have some high level tests, functional tests, that run on any datasource.
    1373 
    1374 Michael: unit test what you can, don't unit test what you can't, in this instance.
    1375 
    1376 Jinmei: we can't solve this today, but this way we are prepared for the case.
    1377 
    1378 Michal: some people do not want the SQL backend to be compiled at all, and some will, and they will have SQL running anyway, and will want to test it, and we want to test it.
    1379 
    1380 Jinmei: another point: time related tests?
    1381 
    1382 Michal: for some of the time related stuff we could provide our own function that gives the time. And then the time moves. We could put it in a common library.
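
A minimal sketch of that idea: code under test asks an injected clock object for the time instead of calling the system directly, and the test moves the clock by hand. `SystemClock`, `FakeClock` and `CacheEntry` are hypothetical names, not existing BIND 10 classes.

```python
import time


class SystemClock:
    """Production clock: simply the system time."""

    def now(self):
        return time.time()


class FakeClock:
    """Test clock: time only moves when the test says so."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def advance(self, seconds):
        self._now += seconds


class CacheEntry:
    """Code under test takes its clock as a parameter instead of calling time.time()."""

    def __init__(self, value, ttl, clock):
        self.value = value
        self.clock = clock
        self.expires = clock.now() + ttl

    def is_fresh(self):
        return self.clock.now() < self.expires


clock = FakeClock(start=1000.0)
entry = CacheEntry("192.0.2.1", ttl=30, clock=clock)
fresh_at_start = entry.is_fresh()
clock.advance(29)
fresh_before_expiry = entry.is_fresh()
clock.advance(1)
fresh_at_expiry = entry.is_fresh()
```

A 30-second expiry is exercised here without any real waiting, so the test neither slows down `make check` nor depends on how fast the (possibly virtual) machine is.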
    1383 
    1384 Jeremy: I am just wondering how important these things are. I don't know what all the tests are but 5 time tests have been failing. I don't know how important they are. Would bind10-auth or resolver fail on a virtual machine?
    1385 
    1386 Jinmei: possibly. Even forgetting about VMs, time related tests are tricky.
    1387 
    1388 Larissa: at BayLISA multiple operators asked me if we are optimizing for VMs.
    1389 
    1390 Jinmei: I think even if it's ugly, it's much better to test it than not test it - but it's not so sophisticated.
    1391 
    1392 Michal: one of the tests that failed is a test where somehow I created a msgq core and a client, and tried to see if the traffic will arrive, and I put a timeout there. There is no timeout in real life, but if it's stuck forever... I put a timeout there I thought was large enough but it turned out it was not.
    1393 
    1394 Michael: we also have to start considering timing involving DNSSEC validation stuff. Then you have to plan time tests involving months.
    1395 
    1396 Larissa: Francis wrote some sort of time machine meant to help with that.
    1397 
    1398 Michal: we don't want to ask for the time once we are computing, but we ask so many times, and the time only differs by milliseconds.
    1399 
    1400 Michael: but how do you know?
    1401 
    1402 Michael: BIND 9 has two useful things - one, once a test starts, gettimeofday locks down. Second, Francis wrote this time library with an exponential curve that crushes 30 days into 15 seconds. There are some tests you can do that are helpful that way. Particularly for functional tests. It's a library that you can use. Compiled in for some things.
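
For illustration only, one way an exponential curve could map 15 real seconds onto 30 virtual days. This is not the library mentioned above, whose actual formula is not recorded in these notes; the function `virtual_elapsed` and the `STEEPNESS` value are assumptions made up for this sketch.

```python
import math

REAL_SPAN = 15.0                    # seconds of wall-clock time
VIRTUAL_SPAN = 30 * 24 * 3600.0     # 30 days, in seconds
STEEPNESS = 0.5                     # assumed curve parameter, for illustration only


def virtual_elapsed(real_elapsed):
    """Map real seconds since test start to virtual seconds on an exponential curve."""
    # Scale so that the curve passes through (0, 0) and (REAL_SPAN, VIRTUAL_SPAN).
    scale = VIRTUAL_SPAN / math.expm1(STEEPNESS * REAL_SPAN)
    return scale * math.expm1(STEEPNESS * real_elapsed)


start = virtual_elapsed(0.0)
first_second = virtual_elapsed(1.0)
end = virtual_elapsed(REAL_SPAN)
```

With a curve of this shape, virtual time moves slowly at first, so short timers can still be observed, and then accelerates so that month-scale events such as signature expiry are reached before the test ends.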
    1403 
    1404 Jeremy: to finish my point, once we know the test is what we want, and it still fails on virtual machines, maybe it's the code that needs tuning not the test.
    1405 
    1406 Shane: sometimes it's really not the code causing the test to fail.
    1407 
    1408 Shane: also about timing, every time we add time to a timing test, it adds waiting when I type make check. Sometimes you need a small wait, but they add up over time.
    1409 
    1410 Stephen: then get a biscuit with your coffee.
    1411 
    1412 Shane: but in a year or two, will it take three hours? Let's think about this as we write the tests.
    1413 
    1414 Michael: eventually maybe we can get tests running in the background. make test running continuously on the laptop.
    1415 
    1416 Shane: I run make check across the whole system when I do a review.
    1417 
    1418 Michael: it takes 8 hours to run the tests on BIND 9. Don't ever get there on BIND 10.
    1419 
    1420 Jinmei: Can we make a rule for this? Timing tests? We may want a generic framework for faking time.
    1421 
    1422 Stephen: can we pull across Francis's work?
    1423 
    1424 Shane: for functional testing. For unit tests we need arbitrary time values.
    1425 
    1426 Jinmei: regarding tests taking time, there are several issues. In general taking time for tests is a bad thing because it makes people skip running tests. So one question is whether we want to avoid that. I personally think it's better to run the tests.
    1427 
    1428 Shane: could we flag time related tests?
    1429 
    1430 Jinmei: there is not a general flag but we could include time in the name and separate them that way.
    1431 
    1432 Shane: is it possible in Google Test to run tests in parallel?
    1433 
    1434 Jinmei: maybe
    1435 
    1436 Michal: I don't think so. But we have many test programs, they could run in parallel, but I worry that they use ports.
    1437 
    1438 Michael: we can't run all our variants in BIND 9 in parallel. We have to stop unit tests to run specific tests and then remember to turn them back on, and it sucks. This is why I recommend looking at what Samba does.
    1439 
    1440 Jelte: I think this is also what Unbound does.
    1441 
    1442 Michael: if you don't use ports, you can run in parallel.
    1443 
    1444 Michal: it would work if we didn't use auto tools.
    1445 
    1446 Jinmei: so we could introduce a filter for longer duration tests. The other thing is that I would suggest using smaller timeouts as much as possible. That also means we may want to change the API so that it will take a millisecond granularity.
    1447 
    1448 Shane: which API?
    1449 
    1450 Jinmei: an example would be the cache timeout for Hot Spot Cache. It is set to seconds which makes sense functionally but not for tests.
    1451 
    1452 Michael: Google Test does not run tests in parallel and has no built-in ability to do so, but it does support selecting tests by naming pattern. So if you, say, named things "slow" or "fast" you could break down some tests.
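
In Google Test itself this kind of selection is done with the `--gtest_filter` option. The same naming convention can be sketched for the Python tests with the standard `unittest` module; the test class and the `select` helper below are hypothetical, and the `_slow_` marker in the method name is just one possible convention.

```python
import unittest


class CacheTests(unittest.TestCase):
    def test_fast_lookup(self):
        self.assertEqual(1 + 1, 2)

    def test_slow_expiry(self):
        # Imagine a real wait here; the name marks the test as slow.
        self.assertTrue(True)


def select(test_class, include_slow):
    """Build a suite from test names, keeping or dropping the '_slow_' ones."""
    names = unittest.TestLoader().getTestCaseNames(test_class)
    chosen = [n for n in names if include_slow or "_slow_" not in n]
    return unittest.TestSuite(test_class(n) for n in chosen)


quick_suite = select(CacheTests, include_slow=False)
full_suite = select(CacheTests, include_slow=True)
```

A quick run would then execute `quick_suite`, while reviews and the build farm run `full_suite`, which keeps the pressure on actually running the slow tests somewhere.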
    1453 
    1454 Jelte: lots of projects do "make test" or "make all tests"
    1455 
    1456 Shane: then people never run "make all tests" - I want there to be pressure against avoiding tests
    1457 
    1458 Jelte: except that if the tests take sooooo long people stop running them at all. Just run the tests you are interested in. You can specify which tests run with which features too.
    1459 
    1460 Jinmei: in any case my approach would be to have high level techniques to shorten the time we need for tests, and to have that concept in the review test, so if the reviewer can check the time of the test and bring it up if it's long...
    1461 
    1462 Shane: someone add that to the review process now!
    1463 
    1464 
    1465 '''Functional Testing: How to Do It''' (Medium)
    1466 ''
    1467 This is testing at a higher level. We have had some brainstorming about this at the end of Y2 during our mad testing phase, but we need to formalize our work here.''
    1468 
    1469 Shane: testing is one of those things where getting the terminology right is tricky. In our project we understand unit testing but we have no or nearly no functional testing. In our case we mean running the software as a system and seeing what happens.
    1470 
    1471 Jeremy: I have a few ad-hoc scripts for server start, loadzone, xfrin, dig, etc
    1472 
    1473 Shane: unlike unit testing we want to do this at the system level, right? Do we want to define it by module?
    1474 
    1475 Jinmei: what?
    1476 
    1477 Shane: do we want to define tests for cmd-ctl, or just for configuration, etc
    1478 
    1479 Stephen: if you list requirements, there might be functional tests that correspond.
    1480 
    1481 Shane: a note for Jeremy, we need to at least identify which tests cover which areas of the functional DNS specification.
    1482 
    1483 Michael: how will you write specifications? Is it a user story format?
    1484 
    1485 Stephen: I think we're talking about the same thing. Every requirement should be testable.
    1486 
    1487 Michael: the reason I like user stories is because it focuses you on the user focused outcome.
    1488 
    1489 Stephen: except we write from RFCs
    1490 
    1491 Michael: BIND 9 was written in RFCs... and the user interface...
    1492 
    1493 (discussion about what user stories are)
    1494 
    1495 Michael: the idea that a user story translates to a functional test is very useful.
    1496 
    1497 Shane: let me pull up an example.
    1498 
    1499 http://bind10.isc.org/wiki/MasterOfBindRequirements
    1500 
    1501 This has functional and programmatic requirements.
    1502 
    1503 Shane: assuming we have a framework to execute tests on a functional level, who writes the tests, when do they get written, and do we have a document to track them?
    1504 
    1505 Stephen: whether we use user stories or requirements statements or a combination, how do we test it?
    1506 
    1507 Michael: you can do a "work in progress test" where a test you're going to add goes.
    1508 
    1509 Stephen: the reason why this business about the requirements came up is that DNS is specified by many RFCs plus we have BIND 9 compatibility.
    1510 
    1511 Michael: can the requirements be generated from the test suite, or are the requirements their own document?
    1512 
    1513 Michal: I would rather have them in the same file, from the developer point of view.
    1514 
    1515 Michael: this is what I would recommend. But there is one catch - you end up with one functional spec, but 40 tests for one functional spec. Numbering can get weird.
    1516 
    1517 Jeremy: let's say I write 700 statements. They are a few sentences each, and I attribute them to source code or RFCs. I can put it in XML, parse it out, generate HTML, whatever, and point to URL in the test cases?
    1518 
    1519 Shane: in XML it will generate directories, it could even generate test stubs.
    1520 
    1521 Michal: then someone has to write the test, and they can put a comment that links to the specification. But when the test fails the error message should indicate what the test tried to do.
    1522 
    1523 Jeremy: I have this document and then changes go along and we change a requirement, then we change tests?
    1524 
    1525 Michael: but we're talking about having the descriptions in the tests. So the master file is that XML document. How do we structure this and is there a tool that will do it?
    1526 
    1527 Shane: there are probably 700 test frameworks that academics have written.
    1528 
    1529 Jeremy: I think we should try one of the three Python cucumber clones.
    1530 
    1531 Michael: you can use either one, you don't end up writing much code in those. It's very verbose, English language type testing. It's really driven for user stories.
    1532 
    1533 (looking at http://cukes.info)
    1534 
    1535 Michael: I experimented with this and I liked it, but I don't think it would be easy to get BIND 9 people to do it. You would be more able to do this because you're just starting to implement functional tests. Also this is a very good format for developing tests progressively.
    1536 
    1537 Shane: I'm trying to think of corner cases. How would this work for say a key rollover in DNSSEC. There are a lot of ways to *do* a key rollover. Do you document them all?
    1538 
    1539 Stephen: there is a sequence of tests. "Given I have put a DNS key in the zone and I have waited xyz I should see xxx"
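
To make the "Given ... I should see ..." format concrete, here is a toy step runner. It is not the API of cucumber or of any of its Python clones; the `step` decorator, the `world` dictionary and the step wording are all assumptions made for this sketch, and no real server is started.

```python
import re

STEPS = []


def step(pattern):
    """Register a function as the implementation of a plain-language step."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"Given I have put a DNSKEY in the zone")
def add_key(world):
    world["keys"] = 1
    world["clock"] = 0


@step(r"When I have waited (\d+) seconds")
def wait(world, seconds):
    world["clock"] += int(seconds)


@step(r"Then I should see (\d+) published keys?")
def check_keys(world, count):
    assert world["keys"] == int(count)


def run_scenario(text):
    """Run each line against the first matching step; return the final state."""
    world = {}
    for line in filter(None, (raw.strip() for raw in text.splitlines())):
        for pattern, func in STEPS:
            match = pattern.fullmatch(line)
            if match:
                func(world, *match.groups())
                break
        else:
            raise LookupError("no step matches: " + line)
    return world


final = run_scenario("""
    Given I have put a DNSKEY in the zone
    When I have waited 3600 seconds
    Then I should see 1 published key
""")
```

Different ways of doing a key rollover would then be separate scenarios that reuse the same small set of step definitions.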
    1540 
    1541 Shane: and I guess we choose how we implement this.
    1542 
    1543 Jeremy: in some situations we start one server and run many tests. In other situations we run multiple servers to run one test, and stop between, etc. How does that work?
    1544 
    1545 Michael: it's just... slow. You can set it up to specifically track and kill processes, etc. I also have things I call "meta sets". It knows what having a DNSSEC implementation with 3 masters means.
    1546 
    1547 Jeremy: the good thing for us as we create these rules, if it doesn't work right, we can fix BIND 10.
    1548 
    1549 Michael: I would love to be able to run the same test suite against BIND 9 for things that make sense.
    1550 
    1551 Shane: like tests where we change config engines would be different.
    1552 
    1553 Shane: so getting to implementation, I think finding a Python cucumber clone would make sense. In the past I would have asked Jeremy to look for that, but will you have time?
    1554 
    1555 Jeremy: I would like to but I would only have a couple of days to look. I would also like Jinmei and Michael to explain the systest that is in BIND 9 now.
    1556 
    1557 Jinmei: it's basically lifted from BIND 9's system tests.
    1558 
    1559 Shane: is this an executable program?
    1560 
    1561 Jinmei: for now it's a shell wrapper thing. You can look at the source code.
    1562 
    1563 (team looks at test.sh in bindctl)
    1564 
    1565 Michael: yes this is very much like what BIND 9 does, it's disgusting, but it works.
    1566 
    1567 Jinmei: yes this was a quick hack to get some testing done before a release. We can throw it away or enhance and integrate it. Or I don't know.
    1568 
    1569 Michael: the one problem with BIND 9's system tests is that you really want to start the server, issue a query, do a specific thing, shut it down, do the next one. BIND 9 starts, does a lot of tests, and then shuts down. It's not as clean of a test. It's expedient in some cases but it's not good test methodology.
    1570 
    1571 Shane: this may depend on the kind of test.
    1572 
    1573 Michael: one improvement I want is, the way you make a test is, you find one that does something like you did and you copy it. Refactoring to a library for common use cases would be better. This could be shared between BIND 9 and 10.
    1574 
    1575 Shane: so... yeah. I don't even know if we would port these, maybe we would, but they should reflect a requirement. We will have requirements that aren't in the DNS spec. Like statistics, etc.
    1576 
    1577 Stephen: we need to make an assessment, as to how much is automated, a couple of things may not be worth it.
    1578 
    1579 Shane: we may need at least two documents. One is a DNS specification but the other is other related things.
    1580 
    1581 Michael: in cucumber you can tag them, so we can have a set of RFC compliance specific tests, statistics specific tests, etc.
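
As a rough sketch of the tagging idea (this is not cucumber's own syntax; the `tagged` decorator, the tag names, and the placeholder test bodies are all made up for illustration), tests could be registered under one or more tags and then run by tag:

```python
from collections import defaultdict

# Maps a tag name to the test functions registered under it.
_TESTS_BY_TAG = defaultdict(list)


def tagged(*tags):
    """Register a test function under each of the given tags."""
    def register(func):
        for tag in tags:
            _TESTS_BY_TAG[tag].append(func)
        return func
    return register


@tagged("rfc-compliance")
def test_response_echoes_query_id():
    query_id = 0x1234
    response_id = query_id      # placeholder for a real server round trip
    assert response_id == query_id


@tagged("statistics")
def test_query_counter_increments():
    counter = 0
    counter += 1                # placeholder for reading server statistics
    assert counter == 1


@tagged("rfc-compliance", "statistics")
def test_counts_only_well_formed_queries():
    well_formed = [True, False, True]
    assert sum(well_formed) == 2


def run_tagged(tag):
    """Run every test registered under `tag`; return the names that ran."""
    ran = []
    for func in _TESTS_BY_TAG[tag]:
        func()
        ran.append(func.__name__)
    return ran
```

Calling `run_tagged("statistics")` would then run only the statistics-specific tests, including any test that carries several tags.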
    1582 
    1583 Michal: can you have a test that has no requirement?
    1584 
    1585 Shane: no, actually, there needs to be a requirement or why is it there? You need to say what happens if you start a server when it's already running, etc.
    1586 
    1587 Michael: remember how we're doing unit testing. Once something runs cleanly you can rely on the unit test.
    1588 
    1589 Shane: this also applies backward pressure on developers to avoid adding cool features that no one asked for.
    1590 
    1591 Shane: we may have to have developers do some of the research on test frameworks and set it up.
    1592 
    1593 Michael: maybe 3 people each research one and bring it to the engineering forum for 15 minutes.
    1594 
    1595 Shane: hmm...
    1596 
    1597 All: maybe we do this in a BIND 10 staff meeting and then present the decision.
    1598 
    1599 Jeremy: there are 3 python based cucumber clones, and maybe we can just look at those.
    1600 
    1601 Michael: ATF is an option too. It spits out XML.
    1602 
    1603 Jeremy: and I know the ATF developer.
    1604 
    1605 All: hmmmmm.
    1606 
    1607 Shane: okay. Jeremy, if you have time over the next two weeks to figure this out, then cool. If not, we'll flag it, and we'll get other resources onto the solution.
    1608 
    1609 Jinmei: what do we do with the existing test framework?
    1610 
    1611 Shane: will we need to add tests in the next two weeks? We don't know.
    1612 
    1613 Michael: did the tests you ported over from BIND 9 find problems?
    1614 
    1615 Jinmei: yes I did
    1616 
    1617 Michael: then I would continue with this and prioritize for importance and ease
    1618 
    1619 discussion of existing tests written against dns-python and what to do with them? Should we rewrite to use our own library or not?
    1620 
    1621 Jeremy: can we set goals for the year?
    1622  * Jeremy will research test frameworks and not spend more than 3 days
    1623  * set our functional test framework by end of May
    1624  * develop xx number functional tests or % of  tests by end of y3 - for example (100% P1, 50% P2, 0% P3)
    1625  * Jeremy will share his list of requirements/stories with Larissa sprint by sprint and she will set priority with guidance from the team (we will see if this works, resource wise) - developers write test implementations and they are reviewed with code.
    1626 
    1627 
    1628 '''Testing Suites''' (Medium)
    1629 ''
    1630 In addition to functional testing, we may want to include several other types of testing suites such as Tahi (for example, performance).''
    1631 
    1632 Shane: Jeremy looked at Tahi, which is an IPv6 thing with close ties to the WIDE project.
    1633 
    1634 Michal: I looked at it, it's for testing IPv6 infrastructure.
    1635 
    1636 Jeremy: it seems like the scripts and requirements are not generated automatically, but I've never set up the platform.
    1637 
    1638 Michal: It seems like you need a complete laptop setup and you need to change your environment to run it. They provide their own DNS server and client. If I understand correctly, they are checking to see that the network runs DNS, not that the DNS server runs.
    1639 
    1640 Jeremy: it might be useful, but the setup time might be high. 2-3 days at least to set up virtual servers.
    1641 
    1642 Shane: the main use of it is probably to tell people we run it.
    1643 
    1644 Jinmei: I can talk to the developers of it, I know them.
    1645 
    1646 Shane: the coolest thing would be if there is an existing lab we could use it in. CNNIC is using it.
    1647 
    1648 Jeremy: we could ask Cathy that.
    1649 
    1650 Shane: if Jinmei could talk to the developer, that might be best.
    1651 
    1652 Jinmei: if we are very lucky they may be interested in testing BIND 10, but I don't know. I will ask for general advice.
    1653 
    1654 Jeremy: there is another test suite called Protos that is a java based conformance suite.
    1655 
    1656 Michael: there is a huge set of people writing test suites. It's a service model.
    1657 
    1658 Shane: maybe OARC could ask people... let's ask dnssec-deployment what suites they are using for DNSSEC conformance? Shane will ask.
    1659 
    1660 Fujiwara-San: I made a specifications document I will share with the team.
    1661 
    1662 Larissa: that document was excellent and may be useful to Jeremy's requirements doc as well.
    1663 
    1664 Shane: there is also non functional testing, you can convert a lot of it to functional testing. But for performance benchmarking you really want a chart or a list.
    1665 
    1666 Jeremy: our current tests are not automated because there were always failures.
    1667 
    1668 Shane: it would be really nice if we could include that testing in our test suites so the team can run the tests.
    1669 
    1670 Jeremy: some of it will be duplicated by what the functional tests do. So I am wondering if I should move it into the functional test layout.
    1671 
    1672 Shane: maybe see if any of the functional test frameworks support performance benchmarking. Or we could also have timing reported for all our tests and tag things for performance specific tests.
    1673 
    1674 Jeremy: we also have Jinmei's microbenchmark testing that is a bit like unit tests. I don't think people use them outside development.
    1675 
    1676 Jinmei: they are not for regular use, they are for when you want to introduce an optimization to see if you actually improve performance.
    1677 
    1678 Jeremy: my concern is maybe people don't know about it.
    1679 
    1680 Shane: what about Stephen's fuzz testing?
    1681 
    1682 Stephen: yes I am planning to expand it actually.
    1683 
    1684 Shane: what we want to do at some point is leave fuzz testing running for a weekend prior to release. We will want to include that.
    1685 
    1686 Stephen: it's in the experimental branch for now.
    1687 
    1688 Shane: there is a test directory off main. It can go there.
    1689 
    1690 Jeremy: Fujiwara-San also has a fuzz tool that fakes traffic.
    1691 
    1692 '''Modularity & Hooks''' (Medium)
    1693 
    1694 Michal notes:
    1695 
    1696    I proposed it some time ago on the mailing list,
    1697    some people looked at it, I got a few comments from a few people, but we should
    1698    talk more widely if we want something like this. If so, we should start using
    1699    it ASAP, because it could ease some development or at least lower the need to
    1700    refactor later.
    1701 
    1702    The ideas are here: http://bind10.isc.org/wiki/modularity
    1703 
    1704 
    1705 Michal: I would like the user to be able to not just add behavior but also remove the default behavior to replace it with theirs. We would build a whole system for the hooks, and it would have advantages for us as well, where we can generalize a library that does listening on the network.
    1706 
    1707 Larissa: so are you saying make all the existing process modules act like hooks?
    1708 
    1709 Michal: yes.
    1710 
    1711 Stephen: One of the things about hooks and putting data out and pulling it in is the data is basically self contained. As soon as you start doing processing, you're accessing internal data structures, and that complicates things. If you want to change data in the cache, do you put hooks into the cache?
    1712 
    1713 Michal: I would make the cache itself a hook.
    1714 
    1715 Stephen: I see hooks more as a set of well defined points where you can change specific simple things in the code.
    1716 
    1717 Shane: explain more please.
    1718 
    1719 Michael: is this a hook or more like a filter?
    1720 
    1721 Michal: I don't know exactly what to call it, it's a bit like Apache.
    1722 
    1723 Shane: ok, so..... I can see how this could be fairly straightforward in our event processing today.
    1724 
    1725 Michal: so then you build the server at runtime from the parts.
    1726 
    1727 Shane: so basically when we get an event we do things and at the end we register a callback to another thing. We could change the callback to be what the user wanted, which would fit with this model.
    1728 
    1729 Jelte: we kind of discussed this before, but currently we have two callbacks, dns-lookup and dns-answer, and if we made that a configurable list of dynamically available callbacks, maybe that would work?
    1730 
    1731 Michal: I want the callback to be able to modify the data. You could say "this is bogus, drop it" or "Stop processing, servfail" or...
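
A minimal sketch of what a configurable callback list with these semantics might look like. Everything here is hypothetical (the `Verdict` values, the dictionary used as a query, and the hook functions are invented for illustration); it is not the BIND 10 design, only one way to read "a list of callbacks that may modify the data or stop processing":

```python
from enum import Enum


class Verdict(Enum):
    CONTINUE = "continue"    # hand the query to the next hook
    DROP = "drop"            # "this is bogus, drop it"
    SERVFAIL = "servfail"    # "stop processing, servfail"


def run_hooks(hooks, query):
    """Call each hook in order; stop at the first non-CONTINUE verdict."""
    for hook in hooks:
        verdict = hook(query)
        if verdict is not Verdict.CONTINUE:
            return verdict
    return Verdict.CONTINUE


def reject_empty_name(query):
    return Verdict.SERVFAIL if not query["name"] else Verdict.CONTINUE


def lowercase_name(query):
    # Hooks may modify the data that later hooks will see.
    query["name"] = query["name"].lower()
    return Verdict.CONTINUE


def make_blacklist(blocked):
    def blacklist(query):
        return Verdict.DROP if query["name"] in blocked else Verdict.CONTINUE
    return blacklist


# The "configuration": an ordered list that could be changed at run time.
HOOKS = [reject_empty_name, lowercase_name, make_blacklist({"bad.example."})]
```

With this list, `run_hooks(HOOKS, {"name": "BAD.example."})` ends in `Verdict.DROP`, because the name is lowercased before the blacklist hook sees it. The order of the list is part of the configuration, which is one source of the complexity discussed in this session.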
    1732 
    1733 Michael: in Asterisk call forwarding, of all things, you do something and then you call what the next hook would be. Then you don't have a pre-defined list but you do have a library of options.
    1734 
    1735 Shane: if you're too flexible... if you don't want to write an entire telephone system, it is hard to set up Asterisk.
    1736 
    1737 Jeremy: I think we need to write down 20 things we would want here. Some of them were discussed before we started the BIND 10 project. Two examples: have code points that point out to places where people would write scripts with an if-then statement. Another way is using firewall rules, like if ___ matches ____, accept/reject. Those would be a lot easier to do than configuring named.conf is today.
    1738 
    1739 Michal: if we could configure them like this, we could make them very powerful for power users.
    1740 
    1741 Shane: to me this seems like... how would this be different, for the user, than writing code? Easier I mean?
    1742 
    1743 Michal: because you can replace the library at run time. I want them to be able to both put in and take code out.
    1744 
    1745 Stephen: at some point, you can reconfigure everything at run time, and providing we've got our encapsulation right, you could replace the cache, you've got the object interface, replace it, and it works.
    1746 
    1747 Jelte: I would not do that with the cache.
    1748 
    1749 Michal: I would make the cache replaceable because the cache would be a source of data.
    1750 
    1751 Shane: if you want to change cache data, you can inherit from the existing cache and write your own, or you can also use the API for how the cache works today, and in the hook world, when you do xyz with the cache, a series of hooks are called. Administrators can make changes at each point.
    1752 
    1753 Jelte: I don't like that; I think it's the wrong way round. I don't think people should modify cache behavior.
    1754 
    1755 Michal: if you want to change what to throw out, what do you do?
    1756 
    1757 Shane: An administrator wants to never cache data related to a specific website. So there is a specific hook point he can edit.
    1758 
    1759 Stephen: what is the business case? 80/20 rule
    1760 
    1761 Michael: if you can't make a case for why it's useful, then why do it?
    1762 
    1763 Shane: there are blacklists in BIND 9, right? It would be nice if you didn't have to have special code to do that.
    1764 
    1765 Michael: that's a specific example.
    1766 
    1767 Jelte: I think everything people will want to do can be done with a fairly simple API. If we have several places (currently in the TCP or UDP server) where we point to specific callouts, we can do everything people would want.
    1768 
    1769 Larissa: I just want to make sure that this is still something sysadmins can deal with.
    1770 
    1771 Shane: what is the difference between this and writing a new ASIO block?
    1772 On a web server it used to be you had a callout point and you added a function.
    1773 
    1774 Jelte: if you write a module for Apache or lighty, you write a function that's called, you configure when it will be called, and the context. It can modify anything, and it can send back some defined options.
    1775 
    1776 Shane: and there are defined steps. In this way there are no defined steps.
    1777 
    1778 Jelte proposes a model with a specific plugin module and specific limited list of points where it plugs in.
    1779 
    1780 Shane: why does this scare me less?
    1781 
    1782 Michael: I am worried we will write a language here. That is a big mistake. Think of the blacklist option. You're actually short-circuiting certain options.
    1783 
    1784 Stephen: I think we need to keep it simple.
    1785 
    1786 Shane: maybe we do something simpler and then consider Michal's option later if we need to
    1787 
    1788 Larissa: I suggest a very simple prototype and then some user discussion.
    1789 
    1790 Jinmei: we need it to be testable by itself; we don't want to be able to replace everything. I generally think it's a good idea to have a small, potentially replaceable module. I kind of think it's a good idea to have a framework that makes this whole idea possible.
    1791 
    1792 Shane: one possible concern is that whenever you design something new that's complex you will get it wrong the first time.
    1793 
    1794 Michal: I really didn't completely design it, I just was inspired by Miranda and Apache.
    1795 
    1796 Shane: I am worried about an elaborate design that won't get used.
    1797 
    1798 Jelte: SIDN very much wants exactly the thing I described.
    1799 
    1800 Shane: we need a defined set of calls.
    1801 
    1802 Jinmei: decomposing the feature into separate pieces, or making everything decomposable, seems to be different.
    1803 
    1804 Larissa: I need to understand what people want to do. To figure out whether this is more complex than we need.
    1805 
    1806 Shane: Jelte and Michal's position is that it won't be any harder to do what they want than to do a smaller thing. So I suggest whoever wants to proposes a design. Define an API and some configuration examples, maybe some pseudo code, and then we evaluate it.
    1807 
    1808 Potential use cases:
    1809 
    1810  * DNSSEC signing w/ on the fly answers
    1811  * validating forwarding resolver
    1812  * blacklists
    1813  * NXDOMAIN redirection
    1814  * NSEC masking
    1815  * non DNS operational data management?
    1816  * script run upon AXFR
    1817  * query introspection (need to know why)
    1818  * alternate method to configure ACLs - to use an LDAP database to authenticate updates
    1819  * dynamically generated content of zone data - be able to write a script to send answers
    1820  * experiments with new data sources
    1821  * debugging - log various steps
    1822  * AS112?
    1823  * possibly use this to combine auth and recurse
    1824  * evlDNS stuff
    1825  * network discovery from behind a NAT
    1826  * change timing behaviors on the XFR side - have zones refresh more or less often
    1827  * pick or prefer specific masters
    1828  * change query behavior - resolver gets a timeout then it tries all the servers in the NS set
    1829  * non expiring cache for better performance
    1830  * reduction of configuration knobs
    1831  * Filter-AAAA or other IPv6
    1832  * stub zones?
    1833  * SCTP
    1834  * Shim6?
    1835  * alternate classes (think MIT people like Hesiod users)
    1836 
    1837 Thoughts: could we use the hooks system for BIND 9 compatibility?
    1838 
    1839 We don't want to avoid coding in things that we really want, though
    1840 
    1841 What kind of programming languages will we support hooks in? C++ and Python, but... do we extend to other languages? We probably need Perl. Could other people write layers to support other languages?
    1842 
    1843 '''''Lunch'''''
    1844 
    1845 '''Task Breakdown Part 1'''
    1846 
    1847 We begin our Epic Quest to break down the tasks for the first 6 months of Y3.
    1848 
    1849 == Thursday, 2011-03-24 ==
    1850 
    1851 '''Task Breakdown Part 2'''
    1852 
    1853 '''''Lunch'''''
    1854 
    1855 '''Scrum Estimation Part 1'''
    1856 
    1857 We need to do some planning poker for the tasks that we have identified for the start of Y3, so we can estimate how much we can deliver in each sprint, and so we can track our performance on an ongoing basis.
    1858 
    1859 == Friday, 2011-03-25 ==
    1860 
    1861 '''Scrum Estimation Part 2'''
    1862 
    1863 We should be able to finish our Scrum estimations here.
    1864 
    1865 '''''Lunch'''''
    1866 
    1867 '''Working with BIND 9 (Michael Graff)'''
    1868 
    1869 The main goal for Y3 is not BIND 9 compatibility, but we are going to
    1870 be living in a world where BIND 9 and BIND 10 are both running in the
    1871 wild. We would also like to avoid duplicate work and divergent code
    1872 paths as much as possible.
    1873 
    1874 Michael Graff, the BIND 9 programme manager, will be joining us and we
    1875 will discuss this topic.
    1876 
    1877 Shane: Michael has been running BIND 9 for about a year, as its first dedicated engineering manager.
    1878 
    1879 Michael: So we've been trying to do TDD, Scrum, and some other concepts used in BIND 10, with varying success.
    1880 
    1881 Jeremy: how long will BIND 9 last?
    1882 
    1883 Michael/Larissa/Shane: well, 7-10 more years... some current OS versions can't upgrade, people need motivation to upgrade, but there is a plan to deprecate unused features in BIND 9 so they need not be ported to BIND 10.
    1884 
    1885 Larissa: and can we talk about how code can be shared?
    1886 
    1887 Michael: yes, we are going to be using Python.
    1888 
    1889 <discussion of python 2 and 3>
    1890 
    1891 Michael: we will be writing key management tools in BIND 9 in Python that maybe we can use for both. (Discussion)
    1892 
    1893 Shane: one challenge I have in BIND 9 is the tight coupling.
    1894 
    1895 Michael: The biggest problem I think is that it was written by engineers without object oriented experience to separate the data parts. That was a decision by some original BIND 9 developers and it was questioned then, and it's not consistent in the code.
    1896 
    1897 Shane: you're trying to figure out what behavior is going on but it has pseudo object orientation and you can't figure it out. This was to the database.
    1898 
    1899 Shane: we would like to lift/share code from BIND 9 when possible. If we do that, how do we keep changes in sync?
    1900 
    1901 Michael: It seems silly to reproduce things. There are a couple of things. In BIND 9 we need to write code that's easier to test and compatible with modern design techniques used in BIND 10. We have a unit test framework now. And we use it! We're working on writing testable functions and reasonably sized functions. (discussion of code copying and problems therein) No more 5,000 line functions.
    1902 
    1903 Shane: BIND 9 also has a lot of functions with 15 parameters
    1904 
    1905 Michael: actually I think it's about 8. The problem is you pass them in almost every context and that makes it bigger.
    1906 
    1907 Shane: I don't understand the directory structure.
    1908 
    1909 Michael: libdns is a supporting library for named. There are a lot of things in libdns specific to named and vice versa
    1910 
    1911 Shane: let's talk about the logger in particular.
    1912 
    1913 Stephen: we're talking about how to share code. That's a goal, to make an independent library both projects can use.
    1914 
    1915 Jelte: the "real" libbind. If we have tools that work with either project, it should be a separately distributable thing.
    1916 
    1917 Michael: not distribute but treat separately.
    1918 
    1919 Jelte: I mean package.
    1920 
    1921 Stephen: say you want to release BIND 9, there is a formal internal release of the library, and it's separate.
    1922 
    1923 Michael: we kind of have this issue with DHCP already.
    1924 
    1925 Larissa: maybe DHCP could use this library instead of libDNS which makes a mess.
    1926 
    1927 Shane: and we can optimize things in one place.
    1928 
    1929 Michael: someone has to change, but I don't care who. Maybe easier for BIND 10 because it has tests and because most C programs are valid in C++ but not the other way.
    1930 
    1931 Jeremy: BIND 9 has code that's compiled and built, but no paths ever use it. Like logging from source. Bob Halley told me nothing uses it. I found that easily.
    1932 
    1933 Michael: I've considered writing a script that changes the names of the functions and then if it compiles, nothing is using it, and we can clean it out. We add functionality but we don't remove it.
    1934 
    1935 Discussion of issues with shared libraries.
    1936 
    1937 Shane: Michael, tell us the release schedule plans.
    1938 
    1939 Michael: we're releasing a feature version about every 6 months, and maintenance releases between quarterly and monthly, depending on what's going on.
    1940 
    1941 Shane: all of our bug tickets are currently private, in BIND 9, right?
    1942 
    1943 Michael: yes. Working on this.
    1944 
    1945 Shane: and it's all in RT?
    1946 
    1947 Michael and Larissa: yes, there are two instances, so support manages a case a customer logs, and the customer can see it, but then if it becomes a bug, it goes to the bugs instance, which is closed to ISC people only. RT is almost too powerful. An example is our review process. It is in the bug queue, moves to the review queue, then the notes queue, then the resolved queue. But it looks like the guys didn't finish work because things never just close. Also Dan has gone to RT training now, and he has ideas about how to fix it.
    1948 
    1949 Shane: we discussed this at all hands, and Barry mentioned that you do want to decouple ticket handling from bugs.
    1950 
    1951 Michael: I'm not worried about trac.
    1952 
    1953 We can put a trac ticket item link into the support instance.
    1954 
    1955 Shane: you also have an alpha, beta, release candidate model for major releases.
    1956 
    1957 Michael: we have an obligation to the forum, for advance code release at each point. This impacts our schedule. Alpha is something we have for .0s; betas and RCs are for everything.
    1958 
    1959 Shane: are there fixed times?
    1960 
    1961 Larissa: no, but there should be. It's an attempt to build community testing but it fails.
    1962 
    1963 Michael: people ignore everything until the .0 and then they send bugs.
    1964 
    1965 Stephen: and you don't change after beta.
    1966 
    1967 Michael: our rules: alpha establishes syntax. Beta is bugfixes but the feature set is locked down. RC1 is critical bug fixes and docs, and the final only has docs changes.
    1968 
    1969 Michael: some things that didn't work well. I wanted to start putting features in point releases. Lots of projects do it. But it was a disaster for us. We can't do it, it confused everyone. The other thing that didn't work well was setting a fixed release date. What they really wanted was release on this day, except if there are bugs, and well, don't take my features out.
    1970 
    1971 Discussion of the forum model and its issues and open source etc.
    1972 
    1973 Stephen: we do have to be careful about the copyright for patches etc.
    1974 
    1975 Michael: let's not go into legal issues.
    1976 
    1977 Stephen: if something is a release candidate, make sure it's a real release candidate, and not a buggy version you put out because your release schedule said so.
    1978 
    1979 Michael: agile has helped with this; we know sooner when a feature will be too buggy and not ready in time. We let release dates slip, but if they slip because of poor planning, we need to fix planning; if they slip because of late bugs, we need to fix the schedule.
    1980 
    1981 Larissa: we're going to have beta programs across the board too
    1982 
    1983 Jeremy: and we claim ops tests our software but it's not that effective.
    1984 
    1985 Michael: they compile it (which is a good test) and they run it for a bit, at least a weekend. Is that real testing? It does show that someone could install this.
    1986 
    1987 Larissa: Jim has indicated he would like to improve this.
    1988 
    1989 Michael: we need to give them a specific checklist.
    1990 
    1991 Jeremy: BIND 10 has the same problem. You'll probably notice my bursts of bug submission, its because suddenly I'm using stuff or new stuff.
    1992 
    1993 Larissa: we need to treat ops a bit like a beta test person. Specific instructions.
    1994 
    1995 Jeremy: BIND 9 sometimes gets bad press on security issues. You know, there was a long period of no security bugs. Do we know what happened?
    1996 
    1997 Michael: DNSSEC. In 1994 we wrote DNSSEC. In 1995 we rewrote it because the spec changed. In 2004 we rewrote DNSSEC because the spec changed. We didn't introduce it per se. Now, in 2011, people are using DNSSEC. All of a sudden, here are the bugs. It was written poorly, it had no tests. Rob warned us about this.
    1998 
    1999 Jelte: DNSSEC is so new, it's logical that it's in this state.
    2000 
    2001 Michael: and we don't get yelled at for this. People understand. But also one person's little bug is actually a giant security hole.
    2002 
    2003 Jeremy: so in hindsight unit tests and functionality tests might help.
    2004 
    2005 Michael: the projects compete for resources. BIND 10 had money but BIND 9 didn't, and we had to shuffle people around because we didn't want to lay off, and we are still suffering from this. In any case, I'm looking for BIND 9 developers, if anyone is looking! Especially someone who can do Windows *and* UNIX
    2006 
    2007 Shane: let's talk about how to organize shared efforts.
    2008 
    2009 Stephen: maybe logging is a good first option
    2010 
    2011 Jelte: ideally you could have a shared scrum thing for the shared project
    2012 
    2013 Michael: or a "prisoner exchange" where developers trade for a sprint or a few sprints or something.
    2014 
    2015 Larissa: I would want to have people on sprints, and probably more than one in a row, for coherency. Mike Cohn advised on this.
    2016 
    2017 Michael: maybe pair programming is the solution here.
    2018 
    2019 Shane: hmmm so we put one BIND 10 person on logging paired with one BIND 9 person, on logging, together.
    2020 
    2021 Larissa: I want to also figure out how we share the whole culture, not just the code, so we need to figure that out.
    2022 
    2023 Michael: one last word: when you go to develop things, please consider the BIND 9 code, and why you did things. Please.
    2024 
    2025 Shane: we are thinking about crypto libraries. What does BIND 9 do? OpenSSL?
    2026 
    2027 Michael: also, please, tell us, when you find a BIND 9 bug?
    2028 
    2029 Jinmei: I often refer to BIND 9, to import logic, and I do report bugs I find.
    2030 
    2031 Jeremy: I think this should be a blog article
    2032 
    2033 
    2034 '''Confidential Work for Security (Jeremy Reed)''' (Medium)
    2035 
    2036 We need a procedure for privately using git and our discussions for security issues (such as #80).
    2037 
    2038 Jeremy: Aaaright. We need a way for the customer to contact us if they have an issue. And they might need a private way. Phone or an alias.
    2039 
    2040  * We need a secure email method (Securityofficer@)
    2041  * We need an obvious way to mark a ticket confidential (Jeremy needs to note it still works)
    2042  * We need a wiki page on how to do this (and report problems)
    2043  * Form that goes to the securityofficer list?
    2044  * Maybe we should default to the secured method of submission
    2045  * Michael: maybe we can do a threestate toggle
    2046  * we should always *ask* the customer before we mark something insecure after they mark it secure.
    2047  * Decision: the best solution is a pulldown box that defaults blank, with yes no or not set. (do not display not set tickets until we review)
    2048  * We need a human to respond when someone submits a security issue (bug triage)
    2049  * if the issue comes to securityofficer@, that person creates a ticket and then comments to the submitter.
    2050  * Quickly evaluate the issue - run a CVSS check - determine approximate severity and work estimate
    2051  * move discussion to a private email list
    2052  * determine if the issue is in the wild or not - type 1 vs type 2
    2053  * if the issue came in over an open list, assume it is in the wild
    2054  * contact reporter, inform them that we think it's a security issue, ask them to refrain from discussion, and offer them a credit in the CVE if desired
    2055  * determine schedule for phased security notification
    2056  * we need a private git repository for security specific branches.
    2057  * need filter and git commands to keep repository secure
    2058  * need to ensure the bind10-team@ list only has core developers who are (or whose organizations are) under NDA
    2059  * we need to use a password or invitation only jabber room for security issues
    2060  * beyond these things process sticks as closely to existing security process as possible
    2061  * by the end of the next sprint (April 15th) policy and git changes are established and in the second half of April a test security event will be rehearsed.
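
The pulldown decision above (blank default, then yes or no, with "not set" tickets hidden until reviewed) amounts to a three-state field. A small sketch of that rule follows, assuming a hypothetical `Ticket` record; this is not Trac's data model or plugin API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    number: int
    summary: str
    # None = "not set" (the blank default), True = confidential, False = public
    confidential: Optional[bool] = None


def publicly_visible(ticket):
    """Show a ticket only if it was explicitly marked non-confidential.

    "Not set" is treated like confidential until someone has reviewed it.
    """
    return ticket.confidential is False


def public_listing(tickets):
    """Return the numbers of the tickets that may be shown publicly."""
    return [t.number for t in tickets if publicly_visible(t)]
```

The design choice is that forgetting to fill in the field fails safe: an unreviewed ticket stays hidden rather than being published by default.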
    2062 
    2063 Note: we also need to redesign our front page to make it clear how to report issues and security issues (and in general, redesign)
    2064 
    2065 '''Writing Down What a DNS Server Is''' (Medium)
    2066 
    2067 Several team members feel that it is important to document what a DNS server is so that we can be sure we have built it. We need to discuss what exactly the goal of this activity would be and how we can achieve it. This is to create a plan for how we will document, not to actually document it.
    2068 
    2069 '''Scheduling Team Calls & Suchlike''' (Short)
    2070 
    2071 ''Once we have decided how our team(s) will be organized, we should probably take a moment to review our regular meetings/calls.''
    2072 
    2073 Shane: we decided earlier this week that we're abandoning the A and R team split at least for now. We have three regularly scheduled calls now:
    2074 
    2075  * daily call
    2076  * team call every two weeks
    2077  * R-team planning
    2078  * A-team planning
    2079 
    2080 We still need the daily call. The time it is now is 08:00 UTC. This is a good time for Europe (9 and 10 am) and Beijing (4pm) and Tokyo (5pm) but a poor time for North Americans. Jinmei will call in on a best effort basis. Larissa and Jeremy are not expected to call. Larissa, Shane, and Jeremy plan to meet a few days a week at 6:30am Pacific (8:30 Central American and 15:30 Central European)
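
The local times quoted above can be checked with a few lines of Python. The fixed UTC offsets below are assumptions for spring 2011 (Central Europe at UTC+1 in winter and UTC+2 in summer, which would explain the "9 and 10 am", and US Pacific on daylight time at UTC-7); the calendar date is arbitrary and only needed to build a timestamp:

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets in hours from UTC; real zones shift with DST.
OFFSETS = {
    "Central Europe, winter (UTC+1)": 1,
    "Central Europe, summer (UTC+2)": 2,
    "Beijing (UTC+8)": 8,
    "Tokyo (UTC+9)": 9,
    "US Pacific, daylight time (UTC-7)": -7,
}


def local_times(utc_hour, utc_minute=0):
    """Map each place to the local clock time of a call at the given UTC time."""
    call = datetime(2011, 3, 29, utc_hour, utc_minute, tzinfo=timezone.utc)
    return {
        place: call.astimezone(timezone(timedelta(hours=hours))).strftime("%H:%M")
        for place, hours in OFFSETS.items()
    }


for place, clock in local_times(8).items():
    print(f"{clock}  {place}")
```

For the 08:00 UTC daily call this prints 09:00 and 10:00 for Central Europe, 16:00 for Beijing, 17:00 for Tokyo, and 01:00 for US Pacific, which is why the slot is poor for North Americans.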
    2081 
    2082 We need to set up the sprint planning call and the staff call. We will continue the idea of one week sprint planning one week team call. We are also looking into using the team meeting time for scrum style demos and retrospectives/reviews.
    2083 
    2084 We need to keep the meeting at the same or a similar time to what we have now. We acknowledge that this is a rough time of night for Asian colleagues. We also need to mind the date line factor. We will leave the time as it is for now. Which day of the week is good?
    2085 
    2086 Michal: Tuesday remains good.
    2087 
    2088 Shane: Tuesday remains good.
    2089 
    2090 Stephen: Tuesday is good.
    2091 
    2092 Larissa: Tuesday is good.
    2093 
    2094 Jinmei: I am worried the combined sprint planning will be a very long call.
    2095 
    2096 Shane: maybe we reserve the same time on Wednesdays in case we need it.
    2097 
    2098 Stephen: also after two hours people tail off
    2099 
    2100 Michael: In BIND 9 we now do breakdown Tuesday and estimation Thursday.
    2101 
    2102 Stephen: also, more than 90 minutes is really going to be hard on our colleagues in Asia.
    2103 
    2104 Shane: developers, how do you feel about sometimes having a second call in the same week?
    2105 
    2106 Jinmei: I don't think I mind.
    2107 
    2108 Shane and Larissa: and our current round of advanced planning will fall apart around June/July this time
    2109 
    2110 Jelte: and we had a lot of clarity on tasks in the last meeting
    2111 
    2112 Stephen: how many releases per year? Would it be worthwhile breaking up into 18 weeks so every three sprints we have distinct goals?
    2113 
    2114 Larissa: so quarterly deadlines for feature sets?
    2115 
    2116 Stephen: every four months.
    2117 
    2118 Shane: to get back to the planning issue, my proposal remains that we have an optional meeting on Wednesday or Thursday.
    2119 
    2120 Stephen: a lot of time is taken up with estimating. You can actually start a task without an estimate when necessary. How do people feel the email estimating went? People sent their estimates via email, I took a consensus value, and we accepted it without further discussion; we only discuss when opinions diverge wildly.
    2121 
    2122 Shane: Likun, how do you feel about that?
    2123 
    2124 Likun: it's okay, sometimes if I'm not clear on the task I can then find out more independently.
    2125 
    2126 Jinmei: I'm basically negative on email estimation. People forget, and it tends to introduce a delay that is significant in the timeframe of a two-week sprint. If we are going to compromise, I'd rather be more aggressive, e.g. whoever picks up the task just does the estimate.
    2127 
    2128 Stephen: what I get in email is usually relatively close in size. It's only when I get a large disparity that we need the discussion. The difference between a 1 and a 2 comes out in the noise.
    2129 
    2130 Jelte: doing it in email does eliminate discussion that's not necessary, but I agree that it introduces delay, and that people forget.
    2131 
    2132 Discussion of estimation and sprint practices and whether the email thing would work.
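
The email estimation rule Stephen describes (take a consensus value, discuss only when estimates diverge widely) can be sketched as follows. This is an illustration only: using the median as the "consensus value" and a spread threshold of 2 points are assumptions, and `consensus` is a hypothetical helper, not a tool the team uses.

```python
from statistics import median


def consensus(estimates, max_spread=2):
    """Return (consensus_value, needs_discussion) for a list of estimates.

    Assumptions: the consensus is the median, and a task needs discussion
    when the gap between the highest and lowest estimate exceeds max_spread.
    """
    value = median(estimates)
    needs_discussion = max(estimates) - min(estimates) > max_spread
    return value, needs_discussion


close = consensus([1, 2, 2])      # estimates that differ only in the noise
disparate = consensus([1, 2, 8])  # a large disparity
print(close, disparate)
```

With these assumptions, estimates of 1, 2 and 2 are accepted without a call, while 1, 2 and 8 are flagged for discussion.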
    2133 
    2134 Shane: how do JPRS and CNNIC feel about an overflow meeting on Wednesday or Thursday if we need to?
    2135 
    2136 Jinmei: I am not sure it's an "if"; I suspect we will always need it.
    2137 
    2138 Fujiwara: It is okay.
    2139 
    2140 Likun: if there is no other solution we will survive it.
    2141 
    2142 Larissa: if we start half an hour earlier and allow two hours, that might help?
    2143 
    2144 Michal: yes?
    2145 
    2146 Larissa: How would that be for you Jinmei?
    2147 
    2148 Jinmei: I guess it is okay. Maybe not in standard time, but that is a long way away.
    2149 
    2150 Shane: what if the second sprint planning call every other week was at night for Europe, afternoon for California, morning in Asia?
    2151 
    2152 Jelte: if it's Wednesday, that's fine.
    2153 
    2154 Stephen: I'm fine with that.
    2155 
    2156 Shane: okay. One proposal is the Tuesday call is always 15:00 UTC once a week. When we need a second sprint planning call, it would be at 23:00 UTC Wednesdays, which means 8 am Thursdays for China and 9 am for Japan.
    2157 
    2158 Shane: if we do a morning call for Asia and factor in the time that Kambe and Aharen san are traveling to work in the mornings, the meeting would be 3 am for Europeans. Maybe what we should do is steal time from the standup calls.
    2159 
    2160 Larissa: personally I am okay with missing the estimating.
    2161 
    2162 Shane: okay, so task estimation could happen in slightly extended daily scrum calls.
    2163 
    2164 Michael: so proposal: task breakdown at scrum planning call, then emails for estimation, then discrepancies discussed on the daily sprints.
    2165 
    2166 Larissa: yes
    2167 
    2168 Shane: and start sprint planning at 14:30 UTC.
    2169 
    2170 '''Things for End of Each Sprint''' (Short)
    2171 
    2172 ''We are missing a couple of things from the end of each Scrum now. We don't do a true retrospective, and we do not do demos. We have been doing Scrum long enough that it may be time to adopt these practices.''
    2173 
    2174 Demos: at the end of the sprint, Shane can ask one or two developers to come up with a demo for their new stuff. We would do that at the next team meeting. Demos would last 15 minutes. After a few rounds of this, we will start including customers and users in the demonstration. In general we might allow specific customers/users to attend the "internal" demos, but it may depend. We would probably invite close outside colleagues we know well.
    2175 
    2176 Reviews: at the 6-week release point we will review all features against the definition of done.
    2177 
    2178 Retrospectives: Stephen will call for a stop-keep-start style retrospective at the beginning of each sprint planning session. Shane will send a reminder email about the retrospective the day before.
    2179 
    2180 
    2181 '''Unification of in-memory and SQLite Back-ends''' (Medium)
    2182 
    2183 Michal notes:
    2184 
    2185    Some unification of in-memory & sqlite3. Or should this be handled on the ML
    2186    rather? Because this would probably include a little homework to look through
    2187    both the APIs to be able to talk about it.
    2188 
    2189 Michal: we have a base class for the datasource, and we have SQLite based on that, and we have another base class, and inmemory based on that, and this misses the point of having the abstract base class. So I think we should look at them and unify it.
    2190 
    2191 Larissa: will this help us to have a shared API for datasources?
    2192 
    2193 Others: yes
    2194 
    2195 Shane: how did we come to be in such a place?
    2196 
    2197 Michal: well, the base class was created with the SQL backend in mind, but it's a little bit specialized.
    2198 
    2199 All: it was all because we needed to do the inmemory structure quickly.
    2200 
    2201 Michal: I don't think either is what we want. We need to modify both a little bit. I think we could then get to the point that we find what we want in the end by merging them.
    2202 
    2203 Stephen: it could be one task, to merge them?
    2204 
    2205 Michal and Jinmei: three or four.
    2206 
    2207 Decision: While our inmemory datasource will support DNSSEC, our API for datasources needs to allow databases that do not support DNSSEC to integrate with BIND 10.
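
A minimal sketch of what a single shared interface could look like, under the decision above. Everything here is hypothetical and is not the actual BIND 10 API: the class names, the `find` and `add` methods, the `supports_dnssec` flag, and the `lookup` helper are invented for illustration. The SQL-backed class is marked as not supporting DNSSEC purely to show how such a backend could still plug in.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical common interface for all backends."""

    supports_dnssec = False  # backends opt in explicitly

    @abstractmethod
    def find(self, name, rrtype):
        """Return a list of rdata strings for (name, rrtype)."""


class InMemoryDataSource(DataSource):
    supports_dnssec = True

    def __init__(self):
        self._records = {}

    def add(self, name, rrtype, rdata):
        self._records.setdefault((name, rrtype), []).append(rdata)

    def find(self, name, rrtype):
        return list(self._records.get((name, rrtype), []))


class SqlDataSource(DataSource):
    # Left at supports_dnssec = False only to illustrate a non-DNSSEC backend.
    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS records (name TEXT, rrtype TEXT, rdata TEXT)"
        )

    def add(self, name, rrtype, rdata):
        self._db.execute("INSERT INTO records VALUES (?, ?, ?)", (name, rrtype, rdata))

    def find(self, name, rrtype):
        rows = self._db.execute(
            "SELECT rdata FROM records WHERE name = ? AND rrtype = ?", (name, rrtype)
        )
        return [row[0] for row in rows]


def lookup(source, name, rrtype, want_dnssec=False):
    """Query any backend; add signatures only if the backend supports DNSSEC."""
    answer = source.find(name, rrtype)
    if want_dnssec and source.supports_dnssec:
        answer = answer + source.find(name, "RRSIG")
    return answer


mem, sql = InMemoryDataSource(), SqlDataSource()
for src in (mem, sql):
    src.add("example.org.", "A", "192.0.2.1")
mem.add("example.org.", "RRSIG", "placeholder-signature")
print(lookup(mem, "example.org.", "A", want_dnssec=True))
print(lookup(sql, "example.org.", "A", want_dnssec=True))
```

The point of the sketch is that callers only see `DataSource`; whether signatures are available is a capability of the backend rather than a separate class hierarchy.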
    2208 
    2209 (see task list)
    2210 
    2211 '''Lack of Users''' (Short)
    2212 
    2213 Michal notes:
    2214 
    2215    I also worry a little bit about the fact that, contrary to the fact
    2216    that the software is generally buggy, we get really few bug reports, emails,
    2217    complaints. We should have a situation when we release a tarball, we get ten
    2218    people hammering onto the door of jabber room demanding it's fixed. Also,
    2219    it's two years already, but we still don't have anything that could be really
    2220    used, though it's already planned I guess. But I'm not sure there's anything
    2221    to talk about here.
    2222 
    2223 Shane: we actually only got tarballs 12 months ago and have been actively recommending against production, so that is probably part of the problem. I think now though we should be telling people they should run it, in a specific limited capacity.
    2224 
    2225 Jinmei: I want to encourage people to play with it, but probably there is currently no reason for people to play with it, because it's slower and missing many features.
    2226 
    2227 Jeremy: we don't want to give a poor impression?
    2228 
    2229 Shane: the analogy I've been giving recently is to Mozilla .6, when it was slower, and crashed all the time, and didn't do what you wanted, but the potential was there. Eventually, it got to the point around .9, where it would finally render some sites better than Netscape.
    2230 
    2231 Jinmei: if, for example, there is a website that can be better with Mozilla, that can be a reason. My point is we don't have that.
    2232 
    2233 Stephen: so, is there anything we can add that BIND 9 doesn't have? That will get people to try?
    2234 
    2235 Shane: we have the SQL.
    2236 
    2237 Stephen: should we make a bigger play?
    2238 
    2239 Larissa: we do have some plans to start telling people to try it, the webinar, demos, beta program, and blogs, are all oriented toward that.
    2240 
    2241 Shane: we can show the simple demo, because right now, (once we fix the unindexed query bug) we could say look, we start in two seconds for a zone with a million records. That would be sexy.
    2242 
    2243 Larissa: and we need a cool thing to do with a user story for every release.
    2244 
    2245 Shane: so next after this one is TSIG, then configuring the BOSS for the release after that.
    2246 
    2247 Jeremy: we need to make sure Ops really runs it and that we point people to it. We could also possibly run a public resolver that could take a beating.
    2248 
    2249 Shane: is there a problem with that?
    2250 
    2251 Michael: I want to do that in BIND 9.9. If you're confident that your resolver will hold up to high load with DoS attacks, go for it.
    2252 
    2253 Jelte: not yet!
    2254 
    2255 Shane: if we put it on the BIND 10 dev list and wiki for a while before putting it to, say, isc.org and bind-users, that could work.
    2256 
    2257 Jeremy: the bind10 box has been running the iterator without crashing since March 17th.
    2258 
    2259 Shane: do we have statistics for the resolver yet?
    2260 
    2261 Jeremy: no, but I can find some information with verbose logging (which we have).
    2262 
    2263 Jelte: sometimes it gives up too fast.
    2264 
    2265 Larissa and Jinmei: so the three things to get this going:
    2266  * make sure there is sexy "geeky dns catnip" in each release (i.e. speedy load of large zones, TSIG, BOSS configuration)
    2267  * communicate increasingly about BIND 10 with webinars, demos, blogs, events
    2268  * demonstrate the stability etc. of the server by getting it into limited use with ISC ops, beta programs, etc.
    2269 
    2270 Of course, we want to be cautious. We don't want to increase users faster than we can keep up with new features, bug fixes, etc. It's a delicate balance.
    2271 
    2272 
    2273 
    2274 
    2275 
    2276 
    2277 '''Blogging'''
    2278 
    2279 We agreed that BIND 10 will be doing a blog per month - we will schedule one as a sprint task every other sprint. Larissa will enforce this.
    2280 
    2281 Topics we didn't quite get to:
    2282 
    2283  * '''API/ABI Versioning'''
    2284 
    2285 
    2286  * '''How to benefit from Multi-core/processor''' (TBD)
    2287 
    2288 There was a discussion on the dev list before:
    2289 https://lists.isc.org/pipermail/bind10-dev/2010-December/001738.html
    2290 but there didn't seem to be a clear conclusion.
    2291 
    2292 
    2293  * '''msgq Replacement''' (Medium)
    2294 
    2295 It may be time to consider using something other than our own, hand-crafted message bus. We need to worry about portability, increased dependencies, ease of use, reliability, feature set, and so on. We should also at least sketch out a plan for selecting and adopting such technology.
    2296 
    2297  * '''External Tester Program''' (Medium)
    2298 
    2299 Larissa, Jeremy, and Shane have worked to outline how we may work with external testers. This may be interesting for everyone on the project.