DHCP Performance Measurements

Results to date are available on the internal wiki; see the DhcpPerfromance page (internal wikis are not linked directly from external sites).

Performance was measured on an HP machine with a smart disk array, which affects how the system flushes disk writes. We got around 1000 leases/sec for dhcp4. We expect radically worse performance on regular disks.

The memfile backend (Kea) reached 8000 leases/sec, with CPU usage below 50%. With the MySQL backend (Kea), we got around 1000 leases/sec for both DHCPv4 and DHCPv6.

dhcp4 crashed when configured with a /8 pool (16 million leases). The performance of the dhcp4 server is strongly correlated with the size of the pool.
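As a quick check of the pool size quoted above: a /8 prefix leaves 24 host bits, so it spans 2^24 addresses. A minimal sketch using Python's standard ipaddress module (the 10.0.0.0/8 prefix is only an illustrative choice; the notes do not say which /8 was configured):

```python
import ipaddress

# Illustrative prefix only; the notes do not say which /8 was configured.
pool = ipaddress.ip_network("10.0.0.0/8")

# A /8 leaves 32 - 8 = 24 host bits, i.e. 2**24 addresses.
pool_size = pool.num_addresses
print(pool_size)
```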

DHCPv6 in dhcp4 performs much better: 1800 leases/sec.

Profiling was done with Valgrind, which offers many tools. One of them is callgrind. Running a test under callgrind slows everything down tenfold, but it gives exact call information and exposes bottlenecks in the code. This is how Marcin found an issue in Kea: it (unnecessarily) checked whether the option format definition (something we specify in the config) is valid every time an incoming packet was processed. There is also a bottleneck in dhcp4's lease_enqueue() function, which just iterates linearly over the list.
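To illustrate why a linear walk over a list becomes a bottleneck as the pool grows, here is a minimal sketch. It is not the dhcp4 code: it assumes, purely for illustration, that the queue is kept ordered by lease expiry time, and the names enqueue_linear, enqueue_heap and drain_heap are hypothetical. The linear version costs O(n) per insert, while a binary heap costs O(log n) and yields the same order.

```python
import heapq
import itertools

# Assumption for illustration: a lease is an (expiry_time, address) pair
# and the queue is ordered by expiry time.

def enqueue_linear(queue, lease):
    """Insert into a sorted list by scanning from the front: O(n) per insert."""
    i = 0
    while i < len(queue) and queue[i][0] <= lease[0]:
        i += 1
    queue.insert(i, lease)

# Tie-breaker so that leases with equal expiry keep their insertion order.
_counter = itertools.count()

def enqueue_heap(heap, lease):
    """Push onto a binary heap keyed by expiry time: O(log n) per insert."""
    heapq.heappush(heap, (lease[0], next(_counter), lease[1]))

def drain_heap(heap):
    """Pop every lease in expiry order, returning (expiry_time, address) pairs."""
    out = []
    while heap:
        expiry, _, address = heapq.heappop(heap)
        out.append((expiry, address))
    return out
```

Whether a different container is the right fix for lease_enqueue() depends on how the real list is used elsewhere; the sketch only shows the difference in per-insert cost.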

One outstanding thing is to repeat the measurements on a regular machine (without smart disk array).

We discussed repeating the tests on a regular PC (no smart disk array). This task was estimated to take around one week. Marcin spent 2 weeks.

Desired test extensions:

  • We should be able to get information from a customer who did a fair amount of work on performance testing.
  • Shorten the lease time, so we can observe lease expiration and lease reuse.
  • A traffic model that better represents reality: a mix of new, renewing, releasing and expiring clients.
  • "Waves" observed in dhcp4: periods of high expiration processing time interleaved with periods of high packet processing time.
  • Tom: we can run multiple instances of perfdhcp on several machines (sending queries to the same interface or to different interfaces).
  • We'll get an IXIA box. Jeff: We should see what it offers regarding DHCP. We got an XT80v2; it has 8 gigabit interfaces. It does DHCP performance testing (I think). We also get IxANVL. IXIA can be used to simulate multiple clients. IXIA does response verification.
  • v4 failover testing: add a spare server and configure the server under test to send updates to its failover partner. Repeat with failover enabled and disabled to measure the impact of failover on performance.
  • Develop tests that exercise different code paths: request existing pools, get a lease with a known hw addr: getLease(address), getLease(hwaddr, subnetid), getLease(clientaddr, subnetid)
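The mixed traffic model from the list above could be sketched as a weighted random choice of client behaviours. This is only an illustration: the weights in TRAFFIC_MIX are made up (the notes do not give a real-world mix), and generate_events is a hypothetical helper, not part of perfdhcp.

```python
import random
from collections import Counter

# Hypothetical weights; the notes do not specify the real-world mix.
# "expire" stands for a client that goes silent and lets its lease time out.
TRAFFIC_MIX = {"new": 0.2, "renew": 0.6, "release": 0.1, "expire": 0.1}

def generate_events(n, mix=TRAFFIC_MIX, seed=0):
    """Return n client behaviours drawn according to the weights in mix."""
    rng = random.Random(seed)  # fixed seed keeps a test run repeatable
    kinds = list(mix)
    weights = [mix[kind] for kind in kinds]
    return rng.choices(kinds, weights=weights, k=n)

events = generate_events(10000)
counts = Counter(events)
print(counts)
```

A fixed seed makes two runs send the same sequence, which helps when comparing results before and after a code change.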

How performance tests are run:

  • Marcin: set the Jenkins test to run at 3x the measured performance (e.g. current performance is ~1000 leases/sec, so run at 3000 leases/sec).
  • There's a performance plugin in Jenkins.
  • Proposal: select a small subset of tests to be automated in Jenkins.
  • Shawn: response time should be measured. Send 10000 queries and see what the response time is.
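The two numeric ideas above (offering 3x the measured rate, and summarizing response times over a batch of queries) can be sketched as follows. The helper names offered_rate, percentile and summarize are hypothetical, and the choice of median and 95th percentile is our assumption; the notes only say that response time should be measured.

```python
import math

def offered_rate(measured_rate, factor=3):
    """Rate to offer in the test, e.g. 3x the currently measured rate."""
    return measured_rate * factor

def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list (p is an integer 1..100)."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]

def summarize(latencies_ms):
    """Summarize per-query response times given in milliseconds."""
    values = sorted(latencies_ms)
    return {
        "count": len(values),
        "min": values[0],
        "median": percentile(values, 50),
        "p95": percentile(values, 95),
        "max": values[-1],
    }

print(offered_rate(1000))
```

Reporting a high percentile next to the median shows whether a few slow responses hide behind a good average.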

Questions:

  • What is the minimum performance that would be satisfactory? Obviously, the higher the better, but the question is about the bare minimum that is sufficient.

Performance improvements in the code:

  • Kea: migrate to binary format for storing data.
  • IOAddress has a constructor that takes a text representation. (IOAddress::fromBytes takes binary, converts it to text, then passes the text to the constructor, which converts it back to binary.)
  • Merge several queries: getLease(address), getLease(hwaddr, subnetid), getLease(clientaddr, subnetid) into one stored procedure
  • Try to merge getLease(address), getLease(hwaddr, subnetid), getLease(clientaddr, subnetid) into one SELECT lease where address=foo OR (hwaddr=bar AND subnetid=baz) OR (...)
  • Use EXPLAIN to investigate how the queries are run.
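The merged-query idea from the list above can be tried out quickly with a sketch like the one below. Several things here are assumptions: SQLite (from Python's standard library) stands in for MySQL, the lease4 table and its columns are a hypothetical schema rather than Kea's, and SQLite's EXPLAIN QUERY PLAN is used where MySQL would use EXPLAIN, so the plan output will look different on the real backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema for illustration only; not Kea's actual lease table.
conn.executescript("""
    CREATE TABLE lease4 (
        address    INTEGER PRIMARY KEY,
        hwaddr     BLOB,
        clientaddr BLOB,
        subnetid   INTEGER
    );
    CREATE INDEX lease4_by_hwaddr ON lease4 (hwaddr, subnetid);
    CREATE INDEX lease4_by_clientaddr ON lease4 (clientaddr, subnetid);
""")
conn.executemany(
    "INSERT INTO lease4 VALUES (?, ?, ?, ?)",
    [
        (0xC0000201, b"\x00\x11\x22\x33\x44\x55", b"client-a", 1),
        (0xC0000202, b"\x00\x11\x22\x33\x44\x66", b"client-b", 1),
        (0xC0000203, b"\x00\x11\x22\x33\x44\x77", b"client-c", 2),
    ],
)

# One SELECT covering all three lookups instead of three round trips.
MERGED_QUERY = """
    SELECT address, hwaddr, clientaddr, subnetid FROM lease4
    WHERE address = :address
       OR (hwaddr = :hwaddr AND subnetid = :subnetid)
       OR (clientaddr = :clientaddr AND subnetid = :subnetid)
"""
params = {
    "address": 0xC0000201,
    "hwaddr": b"\x00\x11\x22\x33\x44\x66",
    "clientaddr": b"client-c",
    "subnetid": 1,
}

rows = conn.execute(MERGED_QUERY, params).fetchall()

# Ask the database how it intends to run the query (index use vs. full scan).
plan = conn.execute("EXPLAIN QUERY PLAN " + MERGED_QUERY, params).fetchall()
for step in plan:
    print(step)
```

With the sample data, the first two rows match (by address and by hwaddr within subnet 1), while the third does not because its subnetid differs. An OR across differently indexed columns can push some databases into a full table scan, which is exactly what the plan output should be checked for.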
Last modified on Apr 18, 2013, 7:01:25 PM