wiki:RTeamSprintMinutes20101019

Attendees

Pigs

Jelte
Stephen
Jeremy
Likun
Ocean
Shane
Larissa
Aharen
Michal

Chickens

Shawn
Evan

Review Feature Backlog

[ Refer to spreadsheet sent to list ]

Evan reports on forwarding work he has done.

Evan: Did a simple forwarder that passes a DNS query along to an upstream name server and passes back the answer. Major refactor of the asio_link file that was in auth, which implemented the simple service of answering UDP/TCP packets. That is now in a library, and there is a UDP query feature, so that while answering a query from elsewhere it can initiate a query upstream, then get the answer back. Added a b10-recurs library (a misnomer, really b10-forwarder now), parallel to b10-auth (shares a lot of code). Reviewed about 1.5 weeks ago by Stephen. Trying to address comments... about 90% done. Today is the day I move over to BIND 9 full time.

Evan: Suggest someone else should pick up this job, respond to change requests from Stephen's review, and get it re-based and merged. That would take care of a chunk of the first step - the ability to send messages and receive answers.

Jelte: And also with NSAS, we should be able to add a simple algorithm to resolve queries? Is this fully event-based?

Evan: Yes, it is fully event-based.

Evan: The code is cute. I used a co-routine model. Fairly easy to understand, should be easy to expand. Very simple model - gets query, sends up, waits for answer, sends answer back. With full resolver the event model is much more complicated. Will need to send multiple queries, need to re-issue to different servers, may have incomplete answer after queries, and so on. A lot more loops in a real resolver, but hopefully it can be expanded without too much more difficulty.
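
The co-routine model Evan describes (get query, send upstream, wait for answer, send answer back) can be illustrated with a Python generator. This is only a sketch of the idea, not the actual asio_link code: `forward_query`, `run_once` and the action names are hypothetical, and the network round trip is replaced by a plain function.

```python
def forward_query(query):
    """Coroutine for one client query: forward upstream, wait, reply."""
    # Suspend here until whoever drives the coroutine supplies the answer.
    answer = yield ("send_upstream", query)
    # Execution resumes at this point as if the function had never stopped.
    yield ("reply_to_client", answer)


def run_once(query, fake_upstream):
    """Drive one coroutine by hand; a real server would do this from I/O callbacks."""
    events = []
    coro = forward_query(query)
    action, payload = next(coro)         # runs up to the first yield
    events.append((action, payload))
    answer = fake_upstream(payload)      # stands in for the network round trip
    action, payload = coro.send(answer)  # resume the coroutine with the answer
    events.append((action, payload))
    return events
```

A full resolver would need more suspension points per query (re-issuing to other servers, following incomplete answers), which is the extra complexity Evan mentions.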

Stephen: I think the first task is to finish the I/O system.

Stephen: What other features?

Shane: What about NSAS?

Stephen: Doable in the sprint. Tricky bit is sorting out the interface with all the I/O's.

Larissa: Because it's only doing one query at a time (does not address feature #2)... or does it?

Evan: Sends a query, not to the name server for a given zone. It is not timing out.

Jelte: If you ask the same thing twice, it should not ask same question twice.

Evan: Not done yet.
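
One possible shape for the behaviour Jelte asks for is a table of in-flight questions, where a second identical question joins the waiters of the first instead of going upstream again. This is a minimal sketch under that assumption; the class and method names are hypothetical and nothing like it exists in the code yet.

```python
class InFlightQueries:
    """Tracks questions already sent upstream so duplicates are not re-sent."""

    def __init__(self):
        self._waiting = {}  # question -> list of callbacks awaiting the answer

    def ask(self, question, callback):
        """Register interest; return True only if the caller must query upstream."""
        waiters = self._waiting.get(question)
        if waiters is not None:
            waiters.append(callback)   # same question already in flight
            return False
        self._waiting[question] = [callback]
        return True

    def answer(self, question, response):
        """Deliver one upstream response to every waiter and forget the question."""
        for callback in self._waiting.pop(question, []):
            callback(response)
```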

Larissa: Suggest that dispatch plus NSAS is first sprint.

Jelte: Suggest focus on getting general framework in first.

Stephen: Question: one of the features we want is to control where log output goes. Should we work on that now, so we don't have to go back through the code and re-work all of our messages? (Item 29: "As an administrator, I want to be able to control where log output goes to")

Jeremy: I would say "yes", because we will need this for troubleshooting.

Larissa: You think this is important at the very beginning because it will be useful for testing.

Stephen: Maybe we can just define an API for this.

Jelte: Which team should do this?

[ All agree this belongs to both teams. ]

Larissa: I suggest we should add a much smaller story about the API, and put that on this sprint list.

Evan: Certain things we know a logging API will need. Message, channel, priority, ... Without defining the API, we can define a no-op logging operation.
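
A no-op logging operation along the lines Evan suggests might look like the sketch below. Only the three fields (message, channel, priority) come from the discussion; the function names and the optional sink hook are assumptions made for illustration.

```python
_sink = None  # hypothetical hook; while it is None, logging does nothing


def set_sink(sink):
    """Install a callable taking (channel, priority, message), or None to disable."""
    global _sink
    _sink = sink


def log(channel, priority, message):
    """Placeholder logging call: accepts the expected fields, no-op by default."""
    if _sink is not None:
        _sink(channel, priority, message)
```

Code written against a placeholder like this already marks where events should be logged, so less has to be reworked once the real API is defined.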

Stephen: Unless we actually define the API now. We don't have to build it. The earlier we start it, the less code we need to rework.

Evan: If I had put a comment everywhere I thought of logging, it would have been helpful.

Larissa: The reason the stories are high level is exactly this. I would add one, and make it a priority for this sprint.


Summary:

  • NSAS and dispatch for this release
  • Get the software to do something early on
  • Want logging so we can follow the recursion process

Shane: Intention was for A-Team to do "Anything" that is not recursion.

Jelte: At the very start, until we have global components we can build on, not *everyone* will be building. So we need something for the others to do.

Evan: Also the authoritative server needs significant refactoring. Event driven now, so asio-link piece calls something in the calling daemon.

Jelte: We don't have a receptionist kind of thing. Not necessary for early work.

Michal: We can run on a different port for the start.

Evan: Conclusion was that we would have an auth-only version, but most people would want a recursive version that can also handle authoritative stuff.

Stephen: Things like that are out of scope for this sprint.

Jelte: My idea was control over which processes run in the boss process.

Evan: Right now it runs recurse if you specify a forward address, otherwise runs boss.

Stephen: General recursor logic.

Evan: "General framework" is having a recursive framework that can be run by BIND 10. "dispatch" can send queries and get answers back. "demux" to correlate (address, port, id) tuple back to specific queries.

Stephen: Dispatch, general framework, demux, recursive logic, NSAS.

  • Dispatch
  • general framework
  • demux
  • NSAS
  • recursive logic

Jeremy: Are we planning to add the DNSSEC validation steps after the fact, or abstract them now? For example, we can check the RRSIG... don't do any chain of trust, but do [start date, end date] validation. I'm wondering what we can look at now.
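
The [start date, end date] check Jeremy mentions could be sketched as follows. The function names are hypothetical. The sketch assumes the RRSIG inception and expiration fields are 32-bit second counts compared with serial number arithmetic (RFC 4034, RFC 1982), and it does no signature or chain-of-trust verification.

```python
MASK32 = 0xFFFFFFFF
HALF32 = 0x80000000


def serial_le(a, b):
    """True if a <= b in 32-bit serial number arithmetic (wraparound-aware)."""
    return ((b - a) & MASK32) < HALF32


def rrsig_time_valid(inception, expiration, now):
    """Check only that 'now' lies within the signature validity window."""
    return serial_le(inception, now) and serial_le(now, expiration)
```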

Shane: I was told the cache & validation are to be built together.

Stephen: My feeling is that we want to be stubbing stuff in. Otherwise we will go down a bunch of rabbit holes chasing DNSSEC problems. If we can get a recursor built with stubs in we will show more progress.

Evan: I think you're probably right.

Stephen: We will now break these into tasks.

Jelte: It's a bit much.

Larissa: Yes, but as we break it down into tasks we can figure out which tasks to break into the next sprint.

Task Breakdown

Larissa: Need to convert this into tasks.

Stephen: Shane can note everything down, then we can convert into a list of tasks tomorrow.

Dispatch

Write down an outline design.

Evan: Done isn't it?

Stephen: Then we just need to point to the appropriate page.

Shane: Doesn't need to be low-level design.

Stephen: I found it helpful to think about it; the main data structures. Helpful so people can look at it and know how it works.


  • Timeout estimate: 1w
  • Refactoring into a more reusable library with services for DNS servers (comment in Trac ticket) estimate: 3d to move it, 2d to massage it (verify tests are in the right place, and so on)
  • Generalize the I/O service implementation constructor (comment in Trac ticket) estimate: 0.5d
  • Refactoring of auth & recursive tests (going through the tests in auth & recurse and moving them into a single common .cc file that both can use) estimate: 1d
  • Add logging (notes to be added where events should be logged) estimate: 2d
  • Review of all tasks together estimate: 1d
  • Review of Evan's existing work (Trac 327) estimate: 2h

General Framework

  • Configuration to determine whether to run recursive or auth binary estimate: 3d
  • Definitions of various parameters for recursive server & addition to configuration database (currently only forward name server) estimate: 1d

Stephen: How much tuning by configuration vs. rebuild binaries? For example size of hash tables.

Shane: Michael's approach is 'pick a sane value, and if you ever need to change it think about making it configurable'

Jelte: If you can make it configurable, then it should be configurable dynamically.

Shane: I agree. (That is, no recompilation.)

Michal: Probably off-topic, but can't the code pick a size and resize based on need?

Stephen: Yes, but that's additional code.

x Built in root zone, and way to ask for more root zone (priming queries) - Evan suggests this is part of general logic.

demux

Data structure that allows the server to rapidly identify which co-routine initiated the query.

Tuple:

  NAME
  RRCLASS
  RRTYPE
  QID 
  Destination port (where response is going to)
  Source port (maybe... but always port 53)
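
As a rough illustration of the lookup, the tuple can serve as a dictionary key that maps an outstanding upstream query back to the state that issued it. All names here are hypothetical; the source port is left out because it is marked "maybe" above, and lower-casing the name is an added assumption based on DNS names being case-insensitive.

```python
from collections import namedtuple

QueryKey = namedtuple("QueryKey", "name rrclass rrtype qid dest_port")


def make_key(name, rrclass, rrtype, qid, dest_port):
    """Build a lookup key; the name is lower-cased so case differences still match."""
    return QueryKey(name.lower(), rrclass, rrtype, qid, dest_port)


class Demux:
    """Maps outstanding upstream queries back to the state that issued them."""

    def __init__(self):
        self._outstanding = {}

    def register(self, key, state):
        if key in self._outstanding:
            raise KeyError("duplicate outstanding query: %r" % (key,))
        self._outstanding[key] = state

    def match(self, key):
        """Return and forget the state for a response, or None if unexpected."""
        return self._outstanding.pop(key, None)
```

A response that matches no registered key returns None and can be dropped.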

Michal: Can we borrow?

Evan: The current model for the server is to use a co-routine concept. Every time a query comes in it allocates a certain amount of memory. (This is a generic problem with asynchronous programming - things fall out of scope and memory goes away.) Right now using heap allocation, which is inefficient. Better to pre-allocate structures, put on free list, and grab when packets come in. When done put back on the free list.

Evan: That's a performance enhancement, but can combine with attaching this to the demux data structure. When a query comes in we would be able to quickly identify which query state object is associated with this response and hand the query state object back to server as if it had not stopped.
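
The free-list idea can be sketched as a small pool of pre-allocated query state objects. The class names are hypothetical, and Python is used only to show the acquire/release pattern; the performance benefit Evan describes applies to the C++ server, not to this sketch.

```python
class QueryState:
    """Per-query state that is reused rather than reallocated."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.query = None
        self.answer = None


class QueryStatePool:
    """Pre-allocates state objects and hands them out from a free list."""

    def __init__(self, size):
        self._free = [QueryState() for _ in range(size)]

    def acquire(self):
        if self._free:
            return self._free.pop()
        return QueryState()  # free list empty: fall back to a fresh allocation

    def release(self, state):
        state.reset()        # clear old data before the object is reused
        self._free.append(state)
```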

Shane: So we can't borrow the BIND 9 code?

Evan: No. I drew a diagram, talked about it with Stephen for an hour or so...

Stephen: I can e-mail the photograph of the diagram.

  • Design phase 1 (Evan brain dump) estimate: 1d
  • Design phase 2 estimate: 1d

NSAS

Name Server Address Store

Stephen: I've checked in some of the basic classes.

Fundamental data structures are there.

  • Include logic to handle NS queries. estimate: 1w
  • Include logic to update RTT. estimate: 1d
  • Address selection logic. estimate: 3d
  • RTT banding. estimate: 2d
  • Overall logic. estimate: 1w
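
To make the RTT-related tasks concrete, here is one way they could fit together: smooth each new RTT sample into a running value, group addresses into bands of similar RTT, and pick at random within the fastest band. The smoothing weight, band width and function names are all assumptions for illustration; the minutes do not specify the algorithm.

```python
import random


def update_rtt(old_rtt_ms, sample_ms, weight=0.3):
    """Blend a new RTT sample into the stored value (weighted moving average)."""
    return (1 - weight) * old_rtt_ms + weight * sample_ms


def rtt_band(rtt_ms, band_width_ms=50):
    """Map an RTT to a band number so that similar RTTs are treated as equal."""
    return int(rtt_ms // band_width_ms)


def select_address(rtts, rng=random):
    """rtts maps address -> smoothed RTT in ms; choose randomly in the fastest band."""
    best = min(rtt_band(r) for r in rtts.values())
    candidates = sorted(a for a, r in rtts.items() if rtt_band(r) == best)
    return rng.choice(candidates)
```

Banding keeps load spread across servers that are roughly equally fast instead of always picking the single lowest RTT.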

Jeremy: Does the address store cache other information like DS records that correspond with NS records?

Stephen: Initial intention is not. Just to handle address of name servers.

Shane: Eventually store things like EDNS(0)?

Stephen: I had not envisioned that.

Shane: I think we'll probably want to extend it at some point.

recursive logic

[ To be done later ]

Evan: Note that cache stubs and DNSSEC API stubs need to be considered.

Evan: Breaking the recursive server down is almost the same as writing it!

Evan: The existing data source model is that we call a function and it provides a complete answer. That needs to change, because we are going to have a model where we get a partial answer and need to go someplace to get the answer completed. Need to have a way to have an object with a partial answer that you can add enough partial answers to get a full answer.
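
One hypothetical shape for such an object: it tracks which questions are still unanswered and collects records as partial answers arrive. The class and method names are assumptions, not part of the existing data source API.

```python
class PartialAnswer:
    """Accumulates partial answers until nothing remains to be looked up."""

    def __init__(self, needed):
        self._missing = set(needed)  # questions that still lack an answer
        self.records = []

    def need(self, question):
        """Note a follow-up question discovered while resolving (e.g. a CNAME target)."""
        self._missing.add(question)

    def add(self, question, records):
        """Record a partial answer and mark its question as satisfied."""
        self.records.extend(records)
        self._missing.discard(question)

    def is_complete(self):
        return not self._missing
```

For example, a CNAME answer satisfies the original question but adds a new one for the target name, so the object stays incomplete until the target's address arrives.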

Estimation

(inserted above)

Task Selection

Stephen: Coming to end of meeting. Propose Stephen & Larissa will extract tasks. Short call for task selection?

Larissa: Everyone doing coding is on Eastern hemisphere call. Can we do that tomorrow?

Stephen: Can spend next half hour doing that. Does that sound okay?

Closing

Stephen: Anything else?

Michal: Will choosing tasks work like everyone picking a task until they are all gone, or...?

Stephen: We'll sort that out tomorrow. We're talking about a 2-week sprint. Bear in mind this is first pass of estimating. Almost certainly all numbers are underestimates.

Larissa: Thanks for working with the new process.

Michal: If I am not sure I can do any of this without help...? Can I ask someone to help or?

Stephen: There will be people on line to help. Don't worry about that... we're all feeling our way through this. Okay, that's it then.

Last modified on Oct 19, 2010, 4:31:09 PM