logtools
A log file analysis and filtering framework.

Author: Adam Ever-Hadani <adamhadani@gmail.com>

logtools consists of a few easy-to-use, easy-to-configure command-line
tools, typically used in conjunction with Apache logs.

The idea is to standardize log parsing and filtering using a coherent
configuration methodology and a UNIX command-line interface (STDIN input
streaming, command-line piping, etc.), so as to provide a consistent
environment for creating reports, charts and other log mining artifacts
that are typically employed in a website context.

This software is distributed under the Apache 2.0 license.


Installation
------------
To install this package and associated console scripts, unpack the distributable tar file,
or check out the project directory, and then run:
	python setup.py install


Console Scripts
---------------
* filterbots - Filter bots based on an IP blacklist file and a user agent blacklist file.
               The regular expression mask used for matching is also user-specified,
               so this can be used with any log format (see the examples below).

* geoip      - Simple helper utility for using the GeoIP tool to tag log lines by the IP's country.
               The regular expression mask used for matching the IP in the log line is user-specified.

* logsample  - Produce a (uniform) random sample from a log stream. This uses Reservoir Sampling to
               efficiently produce a random sample over an arbitrarily large input stream.
			   
* logplot      - Render a plot of successive values based on log parsing. Under Construction.

* logplotserve - Start a compact webserver (WSGI-based) serving logplots. Under Construction.
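
The Reservoir Sampling technique mentioned for logsample can be sketched as follows.
This is a minimal illustration of the classic algorithm, not the package's actual code;
the function name reservoir_sample and its parameters are hypothetical.

```python
import random


def reservoir_sample(lines, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Hypothetical helper for illustration only. The input is read once and
    only k items are held in memory, so the stream can be arbitrarily large.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(line)
        else:
            # Pick a slot in [0, i]; the new item replaces an old one
            # only if the slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoir
```

Passing sys.stdin as the iterable would sample lines from a piped log stream in the
same way the command-line tool consumes its input.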


Configuration
-------------
Every tool's command-line parameters can take default values, via parameter interpolation,
from /etc/logtools.cfg and ~/.logtoolsrc, if these exist.
This allows for convenient operation in the usual case where these values rarely change.
The configuration file has the form:

[script_name]
optname: optval

For example:

[geoip]
ip_re: ^(.*?) -

[filterbots]
bots_ua: /home/www/conf/bots_useragents.txt
bots_ips: /home/www/conf/bots_hosts.txt
ip_ua_re: ^(?P<ip>.*?) -(?:.*?"){5}(?P<ua>.*?)"
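
To see what such a configuration yields, the following sketch reads the example above
with Python's standard configparser module and applies the ip_ua_re mask to a log line.
How logtools itself loads these files is not shown here, so the use of configparser,
the helper name load_defaults and the sample log line are all assumptions made for
illustration.

```python
import configparser
import re

# Same values as the example configuration above.
SAMPLE_CFG = r'''
[geoip]
ip_re: ^(.*?) -

[filterbots]
bots_ua: /home/www/conf/bots_useragents.txt
bots_ips: /home/www/conf/bots_hosts.txt
ip_ua_re: ^(?P<ip>.*?) -(?:.*?"){5}(?P<ua>.*?)"
'''


def load_defaults(cfg_text, script_name):
    """Return the options of one [script_name] section as a dict.

    Hypothetical helper. RawConfigParser is used so that regular
    expressions are read back verbatim, with no value interpolation.
    """
    parser = configparser.RawConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section(script_name):
        return {}
    return dict(parser.items(script_name))


defaults = load_defaults(SAMPLE_CFG, "filterbots")
pattern = re.compile(defaults["ip_ua_re"])

# Made-up line in Apache combined log format.
LOG_LINE = (
    '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "http://example.com/" "ExampleBot/1.0"'
)
match = pattern.match(LOG_LINE)
print(match.group("ip"), match.group("ua"))
```

The mask skips five double quotes after the IP, so the named group ua captures the
last quoted field of the line, which is the user agent in the combined log format.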


Usage Examples
--------------
1. The following example demonstrates specifying a custom regular expression for matching
the IP/user agent. Notice the use of named match groups in the regular expression - (?P<name>...).
The IP/user agent files are not specified on the command line and are therefore assumed to be defined
in ~/.logtoolsrc or /etc/logtools.cfg. The option --print is used to actually print matching lines.

	cat error_log.1 | filterbots -r ".*\[client (?P<ip>.*?)\].*USER_AGENT:(?P<ua>.*?)\'" --print

Notice that it's easy to reverse the filter mask simply by adding the --reverse flag:

	cat error_log.1 | filterbots -r ".*\[client (?P<ip>.*?)\].*USER_AGENT:(?P<ua>.*?)\'" --print --reverse

2. The following example demonstrates using the geoip wrapper. Pretty self-explanatory:

	cat access_log.1 | geoip -r '.*client (.*?)\]'

3. Naturally, piping between utilities is useful:
	
	cat access_log.1 | filterbots -r "^(?P<ip>.*?) -.*(?P<ua>.*?)" --print | geoip -r '.*client (.*?)\]'

4. All tools accept a --help command-line option that prints detailed information about the
   available options.
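
To illustrate how the named match groups from example 1 behave, here is a small Python
sketch that applies the same regular expression to a single line. The sample error log
line and the helper name extract_ip_ua are made up for illustration; this is not the
filterbots implementation.

```python
import re

# Same mask as passed to filterbots -r in example 1.
MASK = re.compile(r".*\[client (?P<ip>.*?)\].*USER_AGENT:(?P<ua>.*?)\'")


def extract_ip_ua(line):
    """Return (ip, ua) taken from the named groups, or None if no match."""
    m = MASK.match(line)
    if m is None:
        return None
    return m.group("ip"), m.group("ua")


# Hypothetical error log line containing a client IP and a user agent.
SAMPLE_LINE = (
    "[Sat Jan 02 10:11:12 2010] [error] [client 10.0.0.1] "
    "request failed, 'USER_AGENT:ExampleBot/1.0'"
)
print(extract_ip_ua(SAMPLE_LINE))
```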

Unit-testing
------------
A test suite is included in the package. The simplest way to run it is with nose. From the package root directory, run:

	nosetests

