External Message Files

Problem Statement

One of the things that needs to happen in the command tool is language localization. The common way I have seen this done is to put tokens (numbers, text, whatever) in the program and have the display system replace them with the text for the appropriate language.

It could really aid the parsing of the results if there was an option to include the token as well as the translation in the output: humans ignore the tokens and parsers ignore the language text. My initial thought was to put the tokens in parentheses and the text in double quotes, and to prohibit those characters from appearing elsewhere in the display and in the localization texts. The choice of marking characters is arbitrary on my part.
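As a rough illustration of the idea, here is a minimal Python sketch. It assumes the (arbitrary) marking characters suggested above; the table TEXTS and the helpers format_message and extract_token are hypothetical names used only for this example.

```python
import re

# Hypothetical localization table: token -> text for the current language.
TEXTS = {"INVPORTNO": "invalid port number"}


def format_message(token, include_token=True):
    """Render a message, optionally preceded by its token in parentheses."""
    text = TEXTS[token]
    # The marking characters must not appear in the localization text.
    if any(ch in text for ch in '"()'):
        raise ValueError("marking characters are not allowed in message text")
    if include_token:
        return '({}) "{}"'.format(token, text)
    return '"{}"'.format(text)


# A parser only needs the token and can skip the localized text entirely.
TOKEN_RE = re.compile(r"\(([^()]+)\)")


def extract_token(line):
    """Return the token found in a display line, or None if there is none."""
    match = TOKEN_RE.search(line)
    return match.group(1) if match else None
```

A human reads the quoted text and skips the parenthesised token; a script calls something like extract_token and never looks at the language-dependent part.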

Background

There is nothing new under the sun :-) The VMS operating system (still around but now a distinctly minority interest) outputs its messages in the form

%<facility>-<severity>-<identification>, <text>

... where ...

"%" is the message introducer. Related messages are preceded by a "-" to indicate continuation, e.g.

%TPU-F-OPENIN, error opening PROGRAM.C as input
-RMS-E-FNF, file not found

<facility> is the system component originating the error. In the above example, "TPU" is the text processing utility (something in which editors were written) and "RMS" is "Record Management Services", the part of the system that handles file I/O.

<severity> is the severity of the error. VMS defined five severities: S (success), I (information), W (warning), E (error) and F (fatal). (The difference between information and success messages is principally when they are used: an information message is "I am doing something" and a success message is "I have completed something".)

<identification> is a string that uniquely identifies the message. It is the key in the "System Messages and Recovery Procedures" manual (see for example http://h71000.www7.hp.com/doc/73final/documentation/pdf/ovms_msg_ref_al.pdf).

<text> is the text of the message.
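The value of such a fixed layout is that it is trivial to parse. The following Python sketch splits a line in the format described above into its fields. The regular expression is an assumption based only on the description here (in particular the characters allowed in facility and identification names), and parse_vms_message is a hypothetical helper name.

```python
import re

# %<facility>-<severity>-<identification>, <text>
# A leading "-" instead of "%" marks a continuation of the previous message.
VMS_RE = re.compile(
    r"^(?P<intro>[%-])(?P<facility>[A-Z0-9$_]+)-(?P<severity>[SIWEF])-"
    r"(?P<identification>[A-Z0-9$_]+), (?P<text>.*)$"
)


def parse_vms_message(line):
    """Return a dict of message fields, or None if the line does not match."""
    match = VMS_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Replace the raw introducer with a boolean continuation flag.
    fields["continuation"] = fields.pop("intro") == "-"
    return fields
```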

The mechanism can be used for user programs as well. Messages are defined in a message file which is processed with the message compiler. The output is an object file, which is linked into the program. As well as the text of the messages, the object module defines symbols which the program can use to refer to the messages.

Suggestion

A similar thing could be done in BIND-10. For example, suppose we defined messages in a file that had a syntax along the lines of:

#FACILITY    b10auth
INVPORTNO    invalid port number: %d
NETINITFAI   failed to initialize network servers: %d

It could then be run through a message compiler (written in Python) to produce a ".h" file:

static const char* B10AUTH__FACILITY  = "b10auth";
static const char* B10AUTH_INVPORTNO  = "[b10auth] INVPORTNO, invalid port number: %d";
static const char* B10AUTH_NETINITFAI = "[b10auth] NETINITFAI, failed to initialize network servers: %d";

... and/or a python module:

class B10AUTH:
   FACILITY_ = "b10auth"
   INVPORTNO = "[b10auth] INVPORTNO, invalid port number: %d"
   NETINITFAI = "[b10auth] NETINITFAI, failed to initialize network servers: %d"
   :

There is a drawback with this approach, at least in the C++ case: with the g++ compiler, unused "static const char*" definitions are only removed when compiler optimisation is specified. For this reason there will be some code bloat during development.
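To make the string-based approach concrete, here is a minimal sketch of what such a message compiler might look like in Python. It assumes the two-column file syntax shown above (a #FACILITY line followed by identification/text pairs); the function names parse_message_file, emit_header and emit_python are hypothetical, and error handling is kept to the bare minimum.

```python
def parse_message_file(text):
    """Return (facility, [(identification, message_text), ...])."""
    facility = None
    messages = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#FACILITY"):
            facility = line.split(None, 1)[1]
            continue
        # First word is the identification, the rest is the message text.
        ident, message_text = line.split(None, 1)
        messages.append((ident, message_text))
    if facility is None:
        raise ValueError("no #FACILITY line found")
    return facility, messages


def c_string(value):
    """Quote a Python string as a C string literal (backslash and quote only)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def emit_header(facility, messages):
    """Produce the contents of the generated .h file."""
    prefix = facility.upper()
    lines = ["static const char* {}__FACILITY = {};".format(
        prefix, c_string(facility))]
    for ident, text in messages:
        full = "[{}] {}, {}".format(facility, ident, text)
        lines.append("static const char* {}_{} = {};".format(
            prefix, ident, c_string(full)))
    return "\n".join(lines) + "\n"


def emit_python(facility, messages):
    """Produce the source of the generated Python module."""
    lines = ["class {}:".format(facility.upper()),
             "    FACILITY_ = {!r}".format(facility)]
    for ident, text in messages:
        full = "[{}] {}, {}".format(facility, ident, text)
        lines.append("    {} = {!r}".format(ident, full))
    return "\n".join(lines) + "\n"
```

Both generators work from the same parsed data, so the C++ and Python symbols cannot drift apart.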

An alternative would be to follow the VMS model more closely and instead of generating symbols that define strings, generate symbols that define message numbers. So the message compiler could produce two files: a .h file giving symbol definitions, e.g.

#define B10AUTH__FACILITY   0x12
#define B10AUTH_INVPORTNO   0x120001
#define B10AUTH_NETINITFAI  0x120002

... and an object file that holds code to return the message text. (These modules would also contain code to register their facility and messages at program startup time, thereby allowing the message retrieval code to locate the text for any given message code.)
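The registration and lookup logic can be sketched in Python, although the real thing would live in the generated C++ object module. The encoding used here (facility number in the bits above the low 16, message index in the low 16 bits) is an assumption inferred from the example values 0x12 and 0x120001; register_facility and get_message are hypothetical names.

```python
FACILITY_SHIFT = 16  # assumption: facility number sits above the low 16 bits

# message code -> (facility name, identification, message text)
_registry = {}


def register_facility(name, number, messages):
    """Register a facility's messages; return {identification: code}."""
    codes = {}
    for index, (ident, text) in enumerate(messages, start=1):
        code = (number << FACILITY_SHIFT) | index
        if code in _registry:
            raise ValueError("duplicate message code 0x{:x}".format(code))
        _registry[code] = (name, ident, text)
        codes[ident] = code
    return codes


def get_message(code, *args):
    """Look up a message code and format its text with the given arguments."""
    name, ident, text = _registry[code]
    return "[{}] {}, {}".format(name, ident, text % args)
```

Because the program only ever holds the numeric code, the text behind it can be swapped (for another language, or for a terser form) without touching the calling code.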

In fact, it occurs to me that the message file, if it also carried an explanation of each message, could be used to generate the supporting documentation. For example:

#FACILITY    b10auth

INVPORTNO    invalid port number: %d
+The program failed to start because the port number was invalid: either the port number
+specified in the "auth" configuration was outside the range 0..65,535 or was not numeric.

NETINITFAI   failed to initialize network servers: %s
+This is a generic message, output when the asynchronous I/O layer has generated an
+exception.  The specific reason for the exception is given as the message argument.

Suitable processing could provide a text file or HTML pages.
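A sketch of that processing in Python, assuming the syntax of the example above (lines starting with "+" belong to the preceding message, lines starting with "#" are directives). The names parse_documented_messages and render_text are hypothetical, and only plain-text output is shown.

```python
def parse_documented_messages(text):
    """Return a list of dicts with 'ident', 'text' and 'description' keys."""
    entries = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("#"):
            continue  # blank lines and directives such as #FACILITY
        if line.startswith("+"):
            if not entries:
                raise ValueError("description line before any message")
            entries[-1]["description"].append(line[1:].strip())
            continue
        ident, message_text = line.split(None, 1)
        entries.append(
            {"ident": ident, "text": message_text, "description": []})
    return entries


def render_text(entries):
    """Render the entries as a simple plain-text reference."""
    blocks = []
    for entry in entries:
        description = " ".join(entry["description"])
        blocks.append("{}: {}\n    {}".format(
            entry["ident"], entry["text"], description))
    return "\n\n".join(blocks) + "\n"
```

An HTML renderer would walk the same list of entries, so the parsing step is shared between output formats.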

Finally, if we are encouraging users to make changes we would need to take care during upgrades and, in particular, come up with a way of preserving changes that users have made. For example:

  • if they have spent time translating 500 messages and we add another 100 scattered through the message files, asking them to manually reapply their changes represents a lot of work. It would be much better to have a system whereby their changes are picked up automatically and if they have not given a translation for a particular message, the default supplied with the distribution is used.
  • although we could change the text in a message without impact, changing the order or types of arguments could cause problems if the user has translated the message string. In these cases we should stipulate that a new message be created with a new identification and the old one retired.
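Both points can be sketched together in Python: user translations are overlaid on the distributed defaults, and a translation is only accepted if its printf-style conversions match those of the default. The names conversions and merge_messages are hypothetical, and the regular expression covers only simple conversions such as %d and %s, so treat this as an illustration rather than a complete check.

```python
import re

# Simple printf-style conversions; "%%" is a literal percent sign.
SPEC_RE = re.compile(r"%(?:%|[-+ #0]*\d*(?:\.\d+)?[diouxXeEfgGcs])")


def conversions(text):
    """Return the ordered list of conversion types used in a message text."""
    return [m[-1] for m in SPEC_RE.findall(text) if m != "%%"]


def merge_messages(defaults, translations):
    """Overlay user translations on the defaults.

    Returns (merged, rejected): 'merged' maps every default identification
    to the text to use; 'rejected' lists translations that were ignored
    because the message is unknown or its arguments no longer match.
    """
    merged = dict(defaults)
    rejected = []
    for ident, text in translations.items():
        if ident not in defaults:
            rejected.append(ident)  # retired or unknown message
        elif conversions(text) != conversions(defaults[ident]):
            rejected.append(ident)  # argument order or types differ
        else:
            merged[ident] = text
    return merged, rejected
```

Messages added in a new release simply fall through to the distributed default until the user supplies a translation, and the rejected list tells them which of their existing translations need attention.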

[Comment] Very nicely done. For what it's worth, I am a big fan of using #defines and numerical codes for this sort of thing. static const char* isn't always the right thing to do (after all, there are times when it is appropriate to use goto statements) and in this case, the use of the internal #define code hides the underlying numerical value (which could be organized or not) and the code could be translated into a string in something like an enum. Coding is straightforward and is only an issue if two or more developers add codes at the same time in different branches that must later be merged. Furthermore, this approach supports both localization and (as illustrated above) verbosity, which could be a compile-time (or run-time) selection.

Last modified on Dec 29, 2010, 9:33:40 PM