.TH %(program)s %(section)s "%(month)s %(day)s %(year)s" "version %(version)s" "USER COMMANDS"
.SH NAME
%(program)s \- Parse various log formats and output NetLogger format 
.SH SYNOPSIS
.B %(program)s [options] [files..]
.SH DESCRIPTION
.PP
This program converts from known log formats to NetLogger
(a.k.a. CEDPS Best-Practices) format, which can then be used by the
rest of the NetLogger tools. There are a number of built-in parsers.
Any Python module implementing the API documented below can also be
used as a parser. %(program)s can operate on many different files at
once, using pattern rules to match parsers to files. It can also
handle combined logs from different applications -- e.g., as from
syslog -- as long as there is a header that can be used to distinguish
them. The output is always a single file, standard output by default; see
the Output section below for details.
.PP
This program can be run either with command-line options or with a configuration
file. It can also run as a daemon, which requires a configuration
file. From the command line, functionality is restricted to processing
with a single parser module and writing to standard output.
.SH OPTIONS
%(options)s
.SH USAGE
.SS Output file
The output is a single file. If using the configuration file, it can
be specified in the global section using the 'output_file' keyword.
Otherwise, it is standard output.
.PP
If the persistence feature is used, the output file name in the persistent
state will be tried first; if that file cannot be opened, the configuration
file's value will be tried. If all else fails, standard output will be used.
.PP
The only time standard output will not be used as a fallback is if the
file rotation feature is used. It is very hard to rotate standard output,
so in this case the persistent state file (if present) or the configuration
file must have a valid path. Note that the value in the persistent state file
will always be used 'as-is', whereas with rotation turned on the configuration
file value will be postfixed with a number: see the 'rotate' keyword
in the Configuration section for details.
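.PP
As a minimal sketch of the rotation case, the fragment below sets both keywords
in the global section. The path and the 60-minute interval are illustrative
assumptions, not defaults.
.nf
.RS

[global]
output_file = /var/log/netlogger/parsed.log
rotate = 60

.fi
.RE
.PP
With rotation on, output would be written to numbered files of the form
"/var/log/netlogger/parsed.log.{num}" rather than to the bare path, unless a
persistent state file supplies a different name.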
.SS Configuration
.PP
The general syntax of the configuration file is an 'INI' variant
recognized by the Python ConfigObj module.  See the ConfigObj homepage
(listed at bottom) for details.  Basically, the format consists of
sections of keyword, value pairs. 
Keywords and values are separated by an '=', and section markers 
are between square brackets. 
Keywords, values, and section names can be surrounded by single or double
quotes. 
Nested subsections are indicated by increasing the number of square brackets in the
section marker, e.g., "[section]", "[[subsection]]", and "[[[sub-subsection]]]".
List values are written by separating items with commas, and values spanning
multiple lines are written using triple quotes (single or double).
.PP
The specific sections and keywords used here will be explained in the
context of the following example:
.nf
.RS

[global]
files_root = /var/log
modules_root = netlogger.parsers.modules
pre_path = ~/lib/python
post_path = /some/other/lib/python
use_system_path = yes
state_file = /tmp/nlparser-saved.state
eof_event = True
tail = True
rotate = 60

[p1]
files_root = /tmp
[[bp]]
files = *.log, *.out
[[[parameters]]]
has_gid = yes

[p2]
files = *.nllog
pattern = "\[(?P<pid>\d+)\] (?P<level>[A-Z]+)/(?P<app>\S+):"
[[gk]]
[[[match]]]
app = "globus_gatekeeper"
[[generic]]
[[[match]]]
app = ".*"

.fi
.RE
.PP
The 
.B [global]
section has settings that apply to all parsers:
.RS
.TP
.B files_root
Path to prepend to 'files' paths. This can be overridden
inside the parser sections, as shown here.
.br
Default value = 
.I . 
(current directory) 
.TP
.B pre_path
Colon-separated module path to put
.I before
the system path.
.br
Default value =
.I empty
.TP
.B post_path
Colon-separated module path to put
.I after
the system path.
.br
Default value =
.I empty
.TP
.B modules_root
Module path to prepend to module names.
.br
Default value = 
.I netlogger.parsers.modules 
.TP
.B use_system_path
If 'no', do not include the normal Python system path to find modules.
.br
Default value = 
.I yes 
.TP
.B state_file
Save state to the given file. Any value here turns persistence "on", except
either the special value "None" or an empty string.
For more details about saving state, see the Persistence section.
.br
Default value =
.I /tmp/netlogger_parser_state
.TP
.B eof_event
Append a special end-of-file NetLogger event when closing
or rotating the file. Usually used with the 'rotate' option.
.br
Default value = 
.I False
.TP
.B tail
Flag indicating whether to 'tail' the file forever.
.br
Default value = 
.I False
.TP
.B rotate
Number of minutes between rotations of the file.
Zero (0) means 'off'. If this is 'on', then
.I all
output filenames will have ".{num}" added to them, where
{num} is one greater than the highest number already in use
for that file name in the same directory. For example, if the output
file name is "/tmp/foo.log" and there is already a
"/tmp/foo.log.3" and "/tmp/foo.log.5" at startup, then
the first output file name will be "/tmp/foo.log.6".
.br
Default value = 
.I 0 
(off)
.TP
.B throttle
Percent of "full speed" to which the parser should
throttle itself. This only has an effect if the data
starts or stays ahead of the parser; it should not slow
processing down if the parser only has short bursts
of activity to perform. The value is a number in the
interval (0,1], i.e. greater than 0 and less than or
equal to 1. Note that the parser is single-threaded, so
on a dual-CPU machine it can use at most 50%% of the
available CPU (and correspondingly less on machines with more CPUs).
.br
Default value =
.I 1
.RE
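.PP
The example above leaves 'throttle' at its default. A hypothetical fragment
that asks the parser to run at about one quarter of full speed (the value 0.25
is chosen only for illustration) could look like this:
.nf
.RS

[global]
throttle = 0.25

.fi
.RE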
.PP
The 
.B [p1] 
section is an example of a static mapping of a parser module
to a file pattern. The titles "p1" and "p2" are arbitrary; anything
except the reserved word "global" can be used.
.RS
.TP
.B files_root
Each section optionally specifies its own value for this.
.br
Default value = 
.I global value
.TP
.B files
File pattern, or list of patterns, that selects the input files. This is concatenated to
the 
.I files_root
value and then matched with UNIX 'glob' semantics. Note that this matching
is done only during initial configuration, so new files that match the pattern
will not be "seen" until the program is restarted or re-configured (see Signals, below).
.br
Default value =
.I empty
.TP
.B [[<module>]] 
Sub-section whose name is the name of the
Python parser module to use. This module name has the modules_root
value prepended, so in this example the module would be:
"netlogger.parsers.modules.bp".
.br
.TP
.B [[[parameters]]]
Additional keyword, value pairs to be passed in to the module at
initialization time. The meaning of these keywords is module-specific.
.br
Default value = 
.I empty
.RE
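.PP
As a further sketch, a static section can rely on the global 'files_root'
instead of overriding it. The section name "app_logs" and the file pattern
are made up for illustration; "bp" is the module used in the example above.
.nf
.RS

[app_logs]
files = app-*.log
[[bp]]

.fi
.RE
.PP
With the global settings shown earlier, this would be expected to select files
matching "app-*.log" under /var/log and parse them with
"netlogger.parsers.modules.bp".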
.PP
The 
.B [p2] 
section is an example of a dynamic mapping of a parser module
to a file pattern, where the correct module is selected based on a
regular expression match to the contents of the header of each log line.
This header is stripped before the line itself is passed to the parser, so
we can use the same parser modules as for static mappings.
.RS
.TP
.B files
See description of "files" parameter under [p1]. This cannot be specified
on a per-module basis, as the modules are chosen using the 'pattern' instead.
.br
Default value =
.I empty
.TP
.B pattern
Regular expression used to extract the header from each line.
This keyword implies that the mapping is dynamic, rather than static.
Named pattern groups in the expression use the Python 're' module syntax
of "(?P<name>PATTERN)"
to extract things matching PATTERN as group 'name'. Other regular expression
syntax may be used, but these named groups are important because they
are used in the subsequent [[[match]]] sub-section.
.TP
.B [[<module>]] 
As for
.BR [p1] ,
a sub-section whose name is the name of the
Python parser module to use. Similarly, a [[[parameters]]] sub-sub-section
is allowed.
.TP
.B [[[match]]]
Keyword, value pairs that describe which headers should be matched to
this parser. The keywords should be the same as the names of the named
patterns given in the 'pattern' expression. The values are regular
expressions matched against the corresponding strings extracted from the header.
The first, and only the first, module to match a given header 
is used to parse that header's log line.
.br
Default value = 
.I empty
(match anything)
.RE 
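.PP
To illustrate the dynamic mapping, consider a hypothetical input line (invented
here, not taken from a real log) processed with the [p2] settings above:
.nf
.RS

[1234] INFO/globus_gatekeeper: rest of the log line

.fi
.RE
.PP
The 'pattern' expression would match the header "[1234] INFO/globus_gatekeeper:",
giving the named groups pid = "1234", level = "INFO", and app = "globus_gatekeeper".
The [[gk]] sub-section is listed first and its 'app' expression matches, so
the remainder of the line, with the header stripped, would go to the "gk" module.
A line whose 'app' value is anything else would fall through to [[generic]].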
.PP
The command-line configuration creates a simplified configuration
file with a single parser module. Persistence is turned off, and neither the
file rotation nor the EOF-marker features are used.
The following options and arguments configure the parser modules:
.RS
.TP
.B -e, --expr
is equivalent to the "pattern" keyword.
Since only one parser module can be specified,
the given pattern cannot be used to dynamically choose between modules;
it simply strips the header.
.TP
.B -m, --module
is equivalent to [[<module>]]
.TP
.B -p, --param
populates [[[parameters]]]
.TP
.B files..
populates the "files" keyword.
The value of "files_root" is set to '' (the empty string),
so that relative and absolute paths can both be used.
.RE
.PP
For example, the options:
.nf
.RS

--expr '(?P<foo>bar.*):' --module mymod --param add_host=yes /tmp/abc*.log

.fi
.RE
are equivalent to the following configuration:
.nf
.RS

[cmdline]
files_root = ""
files = /tmp/abc*.log
pattern = "(?P<foo>bar.*):"
[[mymod]]
[[[parameters]]]
add_host = yes

.fi
.RE
.PP
.SS Persistence
.PP
Persistence saves the current position in all input files (along with
parser-specific state information, if needed) in a "state file".  It
should be used whenever the program is run over a long period of time,
where it may need to be restarted or reconfigured while running.  
In this mode, three things happen:
.PP
1. At startup, %(program)s attempts to read the specified state file for
the current positions. 
If this file does not exist, a warning will be printed, and it will be created.
.PP
2. When reconfiguration is triggered by a signal, the state is first
saved and then restored.
.PP
3. When the program exits gracefully, state is first saved. In addition,
the state is saved periodically (every time all files reach EOF), so
even termination with SIGKILL will, in general, not lose much information.
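.PP
In configuration terms, persistence is controlled by the 'state_file' keyword
described under Configuration. A minimal sketch, with an illustrative path:
.nf
.RS

[global]
# Persistence on: input positions are saved to this file
state_file = /var/tmp/nl_parser.state

.fi
.RE
.PP
Setting the value to "None" or to an empty string instead would turn persistence off.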
.SS Signals
Some signals cause %(program)s to perform special actions:
.PP
*  SIGTERM, SIGINT, SIGUSR2: Terminate gracefully
.PP
*  SIGUSR1: Rotate the output file. The current file's contents will be moved to a new name with a unique prefix in the same directory, and the file will start over at zero length.  
.PP
*  SIGHUP: Re-read the configuration file. All input files are closed and re-opened; if persistence is turned on, this should not cause any anomalies in the processing of files that are present in both configurations.
.SS Adding New Parsers
New parsers should be written as Python modules.
New parser classes should inherit from
netlogger.parsers.base.BaseParser. The parser class API consists of a
single overridden method, process(), that takes as an argument a line
of input and returns a list of dictionaries or formatted log strings
(with newlines).  If there is an error with the format, process()
should raise a ValueError, KeyError, or the base module's
ParseError. If nothing is yet ready, it should return an empty list or
tuple; if nothing will ever be ready, it should return None.
.SH EXAMPLES
.TP
Use the configuration file "my.conf", fork into the background, and run on the given input files until a termination signal (see USAGE:Signals) is received. Use the persistence feature to save current state in a file. The PID of the process will be written to /var/run/nl_parser.pid.
.B %(program)s
\-c my.conf \-d /var/run/nl_parser.pid \-f \-p
.TP
Parse standard input as a GridFTP "info" log, writing the result to standard output.
.B %(program)s
\-M gridftp
.SH EXIT STATUS
Returns zero on success, non-zero when it encounters a misconfiguration, missing file, or fatal parsing error.
.SH BUGS
No known bugs.
.SH AUTHOR
Dan Gunter (dkgunter (at) lbl.gov)
.SH SEE ALSO
NetLogger home page
.RS
http://dsd.lbl.gov/NetLoggerWiki
.RE
ConfigObj home page
.RS
http://www.voidspace.org.uk/python/configobj.html
.RE

