Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#3601 closed task (complete)

Script to update "memfile" file

Reported by: stephen Owned by: tmark
Priority: medium Milestone: Kea1.0-beta
Component: database-backend Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

See the description for #3599. This ticket applies to the updating of a memfile CSV file.

Subtickets

Change History (16)

comment:1 Changed 3 years ago by tomek

  • Owner set to tomek
  • Status changed from new to reviewing

This is being covered by #3555 - the code is able to read both old and new format, so no upgrade script is needed.

Moving this to review. Will resolve once #3555 is done.

comment:2 Changed 3 years ago by tomek

  • Status changed from reviewing to accepted

comment:3 Changed 3 years ago by hschempf

  • Milestone changed from Kea0.9.1 to Kea0.9.2

comment:4 Changed 3 years ago by tomek

  • Owner changed from tomek to UnAssigned
  • Status changed from accepted to assigned

comment:5 Changed 3 years ago by hschempf

  • Milestone changed from Kea0.9.2 to Kea1.0

comment:6 Changed 2 years ago by tmark

  • Owner changed from UnAssigned to tmark

comment:7 Changed 2 years ago by tmark

  • Owner changed from tmark to unassigned
  • Status changed from assigned to reviewing

After a fair amount of discussion it was decided to build in the ability to convert Memfile lease files on the fly rather than build a separate tool or script for doing so.

The approach taken is relatively straight forward:

  1. A new class, VersionedCSVFile which is derived from CVSFile was created. This class

contains metadata for each column in the file including name, version, and default value.
VersionedCSVFiles automatically convert data read from earlier schemas into the most
recent schema as the rows of the file are read in.

  1. CSVFileLeaseFile4 and CSVLeaseFile6 were changed to derive from VersionedCSVFile.
  1. Memefile_LeaseMgr now launches an LFC immediately after loading the lease files if

any of the files loaded need upgrading. This is done regardless of whether or not LFC
is enabled. This ensures that all new lease data is written in the current schema, while
any prior lease data gets converted and written out.

  1. Admin guide was amended to explain this process and to inform users that they may run

LFC manually prior to server startup if they wish to convert the files themselves.

We may wish to consider extending kea-admin to run LFC for purposes of updating lease files. If so, this should be addressed under a new ticket.

I would propose the following ChangeLog

1xxx.   [func]      tmark
    Upon startup Kea servers will now detect memfile lease files
    that need upgrading, and will launch in instance of the LFC 
    to convert them to the most current memfile schema version.
    (Trac #3601, git TBD)

Ticket is ready for review.

comment:8 Changed 2 years ago by tmark

  • Owner changed from unassigned to Unassigned

comment:9 Changed 2 years ago by stephen

  • Owner changed from Unassigned to stephen

comment:10 Changed 2 years ago by stephen

  • Owner changed from stephen to tmark

Reviewed commits a54b11073030d0df9e3b3d530424623b740ae259 through 91a9427f7feaadb4b18bcd2c151918ae3e0510b6

doc/guide/admin.xml
Line 157: s/from earlier/from an earlier/
Lines 164/5: s/yourself, see/yourself. See/

src/lib/dhcpsrv/csv_lease_file6.cc
initColumns: For completeness, shouldn't a default value be set for the "hwaddr" column (as the minimum valid column is the one before it, "hostname")?

src/lib/dhcpsrv/dhcpsrv_messages.mes
DHCPSRV_MEMFILE_NEEDS_UPGRADING and DHCPSRV_MEMFILE_UPGRADING_LEASE_FILES - the first word of the message text (the string after the message ID) should not start with a capital letter.

src/lib/dhcpsrv/memfile_lease_mgr.cc
LFCSetup::setup: as run_once_now is boolean, "! run_once_now" is preferred to "run_once_now == false".

loadLeasesfromFiles: the construct upgrade_needed |= value is used in four places. The operands are "bool", but "|=" is a bitwise operator. The construct upgrade_needed == upgrade_needed || value should be used instead.

src/lib/util/tests/versioned_csv_file_unittest.cc
A check should be added that VersionedCSVFile will read a file if the number of columns is correct (all tests are assuming an update).

Line 259: There are three defined columns, not two.

Line 314: typo s/existant/existent/

Line 366: I was confused for a bit - extend the comment to note that it is a V2.0 schema file with too many header row columns.

The compilation of this file failed on my Ubuntu system with the error:

versioned_csv_file_unittest.cc:166:167: error: converting 'false' to pointer type for
argument 1 of 'char testing::internal::IsNullLiteralHelper(testing::internal::Secret*)'
[-Werror=conversion-null]

The reason is the statement at line 166:

EXPECT_EQ(false, csv->needsUpgrading());

This should be replaced with:

EXPECT_FALSE(csv->needsUpgrading());

src/lib/util/versioned_csv_file.cc
Line 118: typo - "physicall"

Line 131: A row has too few valid columns, not the size of a row.

Line 152 (validateHeader): If the header row has no columns, surely it is invalid?

src/lib/util/versioned_csv_file.h
VersionedCSVFile class header: I think that your "todo" - allowing the class to read a line containing too many columns - should be added here. Behaviour should be that if the row contains too many columns, the additional ones are ignored. (I'm thinking about someone who upgrades, has problems and reverse to the previous version of the software.)

ChangeLog
The proposed entry is OK.

comment:11 Changed 2 years ago by tmark

  • Owner changed from tmark to stephen

Replying to stephen:

Reviewed commits a54b11073030d0df9e3b3d530424623b740ae259 through 91a9427f7feaadb4b18bcd2c151918ae3e0510b6

doc/guide/admin.xml
Line 157: s/from earlier/from an earlier/
Lines 164/5: s/yourself, see/yourself. See/

Got it.

src/lib/dhcpsrv/csv_lease_file6.cc
initColumns: For completeness, shouldn't a default value be set for the "hwaddr" column (as the minimum valid column is the one before it, "hostname")?

The "default" default_value for a VersionedColumn? is "". "hwaddr" column in lease6 permitted to be left blank. Leaving
it blank is probably the safest option given that for v6 the server has to derive it. This would require an extension
to VersionedColumn? to allow a missing column's value to be calculated using a "calculation handler".

src/lib/dhcpsrv/dhcpsrv_messages.mes
DHCPSRV_MEMFILE_NEEDS_UPGRADING and DHCPSRV_MEMFILE_UPGRADING_LEASE_FILES - the first word of the message text (the string after the message ID) should not start with a capital letter.

Oops. Fixed.

src/lib/dhcpsrv/memfile_lease_mgr.cc
LFCSetup::setup: as run_once_now is boolean, "! run_once_now" is preferred to "run_once_now == false".

Right. Shawn prefers it be the other way for ISC_DHCP.

loadLeasesfromFiles: the construct upgrade_needed |= value is used in four places. The operands are "bool", but "|=" is a bitwise operator. The construct upgrade_needed == upgrade_needed || value should be used instead.

Right again. Fixed.

src/lib/util/tests/versioned_csv_file_unittest.cc
A check should be added that VersionedCSVFile will read a file if the number of columns is correct (all tests are assuming an update).

Added "currentSchemaVersion" test which verifies current schema'd file.

Line 259: There are three defined columns, not two.

Fixed.

Line 314: typo s/existant/existent/

Got it.

Line 366: I was confused for a bit - extend the comment to note that it is a V2.0 schema file with too many header row columns.

Cleaned this up a bit.

The compilation of this file failed on my Ubuntu system with the error:

versioned_csv_file_unittest.cc:166:167: error: converting 'false' to pointer type for
argument 1 of 'char testing::internal::IsNullLiteralHelper(testing::internal::Secret*)'
[-Werror=conversion-null]

The reason is the statement at line 166:

EXPECT_EQ(false, csv->needsUpgrading());

This should be replaced with:

EXPECT_FALSE(csv->needsUpgrading());

Changed all such tests.

src/lib/util/versioned_csv_file.cc
Line 118: typo - "physicall"

Line 131: A row has too few valid columns, not the size of a row.

Line 152 (validateHeader): If the header row has no columns, surely it is invalid?

Actually, this test is if there are defined columns, which is a slightly different question. What would be the point of using a VersionedCSVFile if there is no schema required to validate the header.

As to whether or not an empty header row is valid, that is actually gated (or can be gated) by the user via setMinimumColumns().

I have added a throw if someone attempts to call validateHeader with no schema defined. Though the open() guards against this already.

src/lib/util/versioned_csv_file.h
VersionedCSVFile class header: I think that your "todo" - allowing the class to read a line containing too many columns - should be added here. Behaviour should be that if the row contains too many columns, the additional ones are ignored. (I'm thinking about someone who upgrades, has problems and reverse to the previous version of the software.)

I have implemented the ability to downgrade. Assuming the defined schema is a ssubset of the file's header, then all columns beyond the defined columns are dropped from the data row before it is returned to the calller.

In addition, the VersionedCSVFile now tracks the "state" of the input file's schema relative to the defined schema as CURRENT, NEEDS_UPGRADE, and
NEEDS_DOWNGRADE. This change ripples up, so MemfileLeaseMgr? will invoke LFC
automatically if any of the lease files are out of step (older or newer).

comment:12 Changed 2 years ago by stephen

  • Owner changed from stephen to tmark

Reviewed commit 91a4978e3045a547921f357ffea90bb48fca37c5

doc/guide/admin.xml
Line 162: typo - "vesion"

src/lib/dhcpsrv/dhcpsrv_messages.mes
DHCPSRV_MEMFILE_NEEDS_DOWNGRADING: Rather than "lease file: %s is beyond version %2" suggest "schema of lease file %1 is later than version %2".

For symmetry, the message for DHCPSRV_MEMFILE_NEEDS_UPGRADING could start "schema of lease file %1 is at version %2"

src/lib/dhcpsrv/tests/csv_lease_file4_unittest.cc
CSVLeaseFile4Test (and elsewhere): Minor comment. The code reads:

    boost::scoped_ptr<CSVLeaseFile4> lf(new CSVLeaseFile4(filename_));
    ASSERT_NO_THROW(lf->open());
    :

with a couple of "lf->whatever()" calls later on. Why not avoid the need for a pointer and just create the CSVLease4File object as an automatic variable, i.e.

    CSVLeaseFile4 lf(filename_));
    ASSERT_NO_THROW(lf.open());
    :

(The same goes for the equivalent LeaseFile6 test.) No need to change though, as it works as it is.

src/lib/util/csv_file.cc
CSVRow::trim: the use of vector::erase had me reaching for my STL manual as I wasn't sure that erase() would resize the vector as well. (The function std::remove has a similar effect - it removes the elements from a range - but it does not resize the container.) It is OK and works, but wouldn't

values_.resize(values_.size() - count);

... have been simpler?

src/lib/util/versioned_csv_file.h
VersionedCSVFile header comments: s/does contain match up/does not contain a match up/

The header reads:

/// After successfully opening a file,  rows are read one at a time via
/// @ref VersionedCSVFile::next().  Each data row is expected to have at least
/// the same number of columns as were found in the header. Any row which as
/// fewer values is discarded as invalid.  Similarly, any row which is found
/// to have more values than were found in the header is discarded as invalid.

... and then goes on to state what happens if the row contains more or fewer columns. I think that paragraph needs to mention that the action is conditional on the schema state.

getInputHeaderCount() needs a method header.

recreate() - as they always match, input_header_count_ could be set equal to valid_column_count_ rather than incurring another method call. (In fact, this is a case where a construct:

input_header_count_ = valid_column_count_ = getColumnCount();

... would be very clear.)

comment:13 Changed 2 years ago by tmark

  • Owner changed from tmark to stephen

Replying to stephen:

Reviewed commit 91a4978e3045a547921f357ffea90bb48fca37c5

doc/guide/admin.xml
Line 162: typo - "vesion"

Fixed.

src/lib/dhcpsrv/dhcpsrv_messages.mes
DHCPSRV_MEMFILE_NEEDS_DOWNGRADING: Rather than "lease file: %s is beyond version %2" suggest "schema of lease file %1 is later than version %2".

For symmetry, the message for DHCPSRV_MEMFILE_NEEDS_UPGRADING could start "schema of lease file %1 is at version %2"

That is cleaner. Well done.

src/lib/dhcpsrv/tests/csv_lease_file4_unittest.cc
CSVLeaseFile4Test (and elsewhere): Minor comment. The code reads:

    boost::scoped_ptr<CSVLeaseFile4> lf(new CSVLeaseFile4(filename_));
    ASSERT_NO_THROW(lf->open());
    :

with a couple of "lf->whatever()" calls later on. Why not avoid the need for a pointer and just create the CSVLease4File object as an automatic variable, i.e.

    CSVLeaseFile4 lf(filename_));
    ASSERT_NO_THROW(lf.open());
    :

(The same goes for the equivalent LeaseFile6 test.) No need to change though, as it works as it is.

Done. The scoped_ptr was the pattern used by the original author, and shamefully, I simply followed suit.

src/lib/util/csv_file.cc
CSVRow::trim: the use of vector::erase had me reaching for my STL manual as I wasn't sure that erase() would resize the vector as well. (The function std::remove has a similar effect - it removes the elements from a range - but it does not resize the container.) It is OK and works, but wouldn't

values_.resize(values_.size() - count);

... have been simpler?

Indeed it is simpler. Somehow I missed resize().

src/lib/util/versioned_csv_file.h
VersionedCSVFile header comments: s/does contain match up/does not contain a match up/

Fixed.

The header reads:

/// After successfully opening a file,  rows are read one at a time via
/// @ref VersionedCSVFile::next().  Each data row is expected to have at least
/// the same number of columns as were found in the header. Any row which as
/// fewer values is discarded as invalid.  Similarly, any row which is found
/// to have more values than were found in the header is discarded as invalid.

... and then goes on to state what happens if the row contains more or fewer columns. I think that paragraph needs to mention that the action is conditional on the schema state.

I went a step further. Your comment suggested to me that processing a row should actually look at the input schema state. It's certainly clearer this way.

getInputHeaderCount() needs a method header.

Oops. Fixed. I also moved it up above getValidHeaderCount() for clarity.

recreate() - as they always match, input_header_count_ could be set equal to valid_column_count_ rather than incurring another method call. (In fact, this is a case where a construct:

input_header_count_ = valid_column_count_ = getColumnCount();

... would be very clear.)

Good catch.

comment:14 follow-up: Changed 2 years ago by stephen

  • Owner changed from stephen to tmark

Reviewed commit 2023588f8ac78ae09161d7df782199af39dbfad1

src/lib/dhcpsrv/dhcpsrv_messages.mes
DHCPSRV_MEMFILE_NEEDS_DOWNGRADING: Rather than "lease file: %s is beyond version %2" suggest "schema of lease file %1 is later than version %2".
For symmetry, the message for DHCPSRV_MEMFILE_NEEDS_UPGRADING could start "schema of lease file %1 is at version %2"

That is cleaner. Well done.

Still not quite right. The upgrade message implies that the version of the lease file schema is at the current version, not earlier than it. How about "version of lease file %1 schema is earlier/later than %1, the file's schema will be upgraded/downgraded"?

Messages should not end with a period.

Finally, I checked the code in lease_file_loader.h that outputs these messages. The second argument should be obtained with getSchemaVersion(), not getInputSchemaState(). (I missed that in the first review.)

src/lib/util/versioned_csv_file.cc
next(): When switching on the input schema state, the NEEDS_UPGRADE comment is incorrect - columns must be at least as long as the minimum number of columns required, not at least as long as the header.

These are minor fixes so please merge when you have made them - I don't need to to see this ticket again.

comment:15 in reply to: ↑ 14 Changed 2 years ago by tmark

  • Resolution set to complete
  • Status changed from reviewing to closed

Replying to stephen:

Reviewed commit 2023588f8ac78ae09161d7df782199af39dbfad1

src/lib/dhcpsrv/dhcpsrv_messages.mes
DHCPSRV_MEMFILE_NEEDS_DOWNGRADING: Rather than "lease file: %s is beyond version %2" suggest "schema of lease file %1 is later than version %2".
For symmetry, the message for DHCPSRV_MEMFILE_NEEDS_UPGRADING could start "schema of lease file %1 is at version %2"

That is cleaner. Well done.

Still not quite right. The upgrade message implies that the version of the lease file schema is at the current version, not earlier than it. How about "version of lease file %1 schema is earlier/later than %1, the file's schema will be upgraded/downgraded"?

Done.

Messages should not end with a period.

Done.

Finally, I checked the code in lease_file_loader.h that outputs these messages. The second argument should be obtained with getSchemaVersion(), not getInputSchemaState(). (I missed that in the first review.)

Done.

src/lib/util/versioned_csv_file.cc
next(): When switching on the input schema state, the NEEDS_UPGRADE comment is incorrect - columns must be at least as long as the minimum number of columns required, not at least as long as the header.

Amended commentary.

These are minor fixes so please merge when you have made them - I don't need to to see this ticket again.

I have merged the changes with git ce4b0e42e8a01bbf3b58fdb1f505bbd6e2fada134

Added ChangeLog entry 1046.

Ticket is closed.

comment:16 Changed 2 years ago by tomek

  • Milestone changed from Kea1.0 to Kea1.0-beta

Milestone renamed

Note: See TracTickets for help on using tickets.