Metadata-Version: 1.1
Name: mole
Version: 0.1
Summary: A flexible log analyzer and operational intelligence tool.
Home-page: http://github.com/ajdiaz/mole
Author: Andres J. Diaz
Author-email: ajdiaz@connectical.com
License: GPLv2
Description: Mole: A flexible operational log analyzer.
        ==========================================
        
        Mole is a log analyzer that parses your log files (any kind of log) using
        user-specified definitions (usually regular expressions) and automatically
        interprets some fields (numbers, dates, ...). Mole provides a set of
        functions to analyze that data.
        
        Installation
        ------------
        As with any other Python package::
        
            pip install mole
        
        Getting started
        ---------------
        
        In this example we use an access log file generated by Apache (or any other
        HTTP server). Let's suppose that this file is located at
        ``/var/log/apache/access.log``.
        
        .. note:: Don't worry about log rotation; mole handles it.
        
        1. Configure mole
        ~~~~~~~~~~~~~~~~~
        
        Edit ``/etc/mole/input.conf`` and add:
        
        .. code-block:: ini
        
            [apache_log]
            type   = tail
            source = /var/log/apache/access.log
        
        This defines a new input called *apache_log* of type ``tail`` (meaning that
        new lines are read as they are written to the file, and log rotation is
        handled), pointing to our log file at ``/var/log/apache/access.log``.
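        
        The ``tail`` behaviour can be pictured with a small polling helper. This is
        a hypothetical sketch written for Python 3, not mole's implementation: the
        ``TailState`` class and the ``poll_new_lines`` function are made-up names.
        It remembers the file's inode and read offset, and starts again from the
        beginning of the file when the inode changes (the log was rotated and
        recreated) or the file shrinks (it was truncated). The first call returns
        everything already in the file.
        
        .. code-block:: python
        
            import os


            class TailState(object):
                """Where the previous poll stopped (hypothetical helper)."""

                def __init__(self):
                    self.inode = None
                    self.offset = 0


            def poll_new_lines(path, state):
                """Return the complete lines appended to ``path`` since the last call."""
                st = os.stat(path)
                # A new inode means the file was rotated and recreated; a size
                # below the saved offset means it was truncated. Restart at 0.
                if state.inode != st.st_ino or st.st_size < state.offset:
                    state.inode = st.st_ino
                    state.offset = 0
                with open(path, "rb") as handle:
                    handle.seek(state.offset)
                    data = handle.read()
                # Stop at the last newline so a half-written line is returned
                # by a later poll, once it is complete.
                end = data.rfind(b"\n") + 1
                state.offset += end
                return [line.decode("utf-8", "replace")
                        for line in data[:end].splitlines()]
        
        Calling ``poll_new_lines`` periodically (for example once per second)
        approximates following the file. A production implementation would also
        drain the old file after a rotation, which this sketch does not do.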
        
        Edit ``/etc/mole/index.conf`` and add:
        
        .. code-block:: ini
        
            [apache_log]
            path = /var/db/mole/apache_log
        
        This defines a new index. An index is the mole database where logs are
        stored in a format suited to fast searches.
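        
        Why an index speeds up searches can be shown with a toy in-memory
        structure. This is a hypothetical Python 3 sketch: mole's actual storage
        goes through an index back-end (Whoosh), and ``TimeIndex`` is a made-up
        name. Following the Design section's description of an index as raw data
        plus a time pointer, it keeps raw lines sorted by timestamp, so a
        time-range query becomes a binary search with the standard ``bisect``
        module instead of a scan over the whole log.
        
        .. code-block:: python
        
            import bisect


            class TimeIndex(object):
                """Toy index: raw log lines stored alongside their timestamps."""

                def __init__(self):
                    self._times = []  # sorted timestamps (the "time pointers")
                    self._raw = []    # raw lines, in the same order as _times

                def add(self, timestamp, raw_line):
                    """Insert a line, keeping both lists sorted by timestamp."""
                    pos = bisect.bisect_right(self._times, timestamp)
                    self._times.insert(pos, timestamp)
                    self._raw.insert(pos, raw_line)

                def between(self, start, end):
                    """Return the raw lines with start <= timestamp < end."""
                    lo = bisect.bisect_left(self._times, start)
                    hi = bisect.bisect_left(self._times, end)
                    return self._raw[lo:hi]


            index = TimeIndex()
            index.add(1349870136, "GET /login")
            index.add(1349870100, "GET /")
            index.add(1349870200, "GET /logout")
            print(index.between(1349870100, 1349870150))  # ['GET /', 'GET /login']
        
        Inserting into a Python list is linear in the number of entries, so this
        only illustrates the lookup side; a real on-disk index needs a structure
        that also appends cheaply.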
        
        2. Start the daemons
        ~~~~~~~~~~~~~~~~~~~~
        
        .. code-block:: bash
        
            $ mole-indexer -C /etc/mole
            $ mole-seeker -C /etc/mole
        
        3. Enjoy some searches
        ~~~~~~~~~~~~~~~~~~~~~~
        
        For example, get the IP addresses that requested the most traffic:
        
        .. code-block:: bash
        
            $ mole 'input apache_log | sum bytes by src_ip | top'
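        
        As a rough illustration of what this query computes, here is plain
        Python 3 over records that have already been parsed into dictionaries. The
        record values are made up, the function names are hypothetical, and the
        number of rows ``top`` returns by default is not documented here, so the
        sketch takes it as the parameter ``n`` (assumed default 10).
        
        .. code-block:: python
        
            from collections import defaultdict


            def sum_by(records, field, key):
                """Like 'sum <field> by <key>': total of the field per key value."""
                totals = defaultdict(int)
                for record in records:
                    totals[record[key]] += int(record[field])
                return dict(totals)


            def top(totals, n=10):
                """Like 'top': the n keys with the largest totals, descending."""
                return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


            # Made-up sample records, already parsed into fields.
            records = [
                {"src_ip": "127.0.0.1", "bytes": "512"},
                {"src_ip": "192.168.0.21", "bytes": "2048"},
                {"src_ip": "127.0.0.1", "bytes": "256"},
            ]
            print(top(sum_by(records, "bytes", "src_ip")))
            # [('192.168.0.21', 2048), ('127.0.0.1', 768)]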
        
        
        Understanding Mole Components
        -----------------------------
        
        The mole pipeline is responsible for reading log items from a source,
        processing them (and transforming them if required) and, finally, returning
        an output. If no output is explicitly defined, mole picks the most suitable
        output format for the current context (serialized data over the network,
        plain printed text on a console).
        
        .. image:: http://yuml.me/diagram/scruffy;/class/[element]++-0..*%3E[input],%20[element]++-0..*%3E[index],%20[element]++-0..*%3E[parser],%20[index]-%3E[schema]
          :align: center
        
        There are a few components worth knowing about:
        
        **input:** The input is responsible for reading the log source. Sources can
        be of different kinds, such as normal files, network streams, index files
        and so on.
        
        **plotter:** The plotter's main function is to split the source into logical
        lines. In a normal log file each line is usually a new log entry, but some
        logs use several lines for the same logical entry (e.g. Java exceptions
        usually span a number of lines).
        
        **parser:** Once a logical line is obtained, you need to know the meaning of
        each field. The parser assigns names to the fields, using regular
        expressions to do so.
        
        **actions:** Actions are transformations, filters and, in general, any other
        operation performed on the log dataset.
        
        **output:** The output encapsulates the results of the actions in a
        human-readable (or machine-readable) form. You can think of the output as a
        kind of serialization.
        
        So the final pipeline in mole looks something like this::
        
            <input> | <plotter> | <parser> | <action> | <action> ... | <output>
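        
        Because every stage consumes the output of the previous one, the pipeline
        maps naturally onto chained Python generators. The sketch below (Python 3)
        is a hypothetical illustration of that data flow only: the stage
        functions, the regular expression for Apache-style lines and the field
        names are assumptions, not mole's internal API.
        
        .. code-block:: python
        
            import re

            # Hypothetical pattern for the leading fields of an Apache access log line.
            LINE_RE = re.compile(
                r'(?P<src_ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
                r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
            )


            def input_stage(lines):
                """<input>: yield raw text from a source (here, an in-memory list)."""
                for line in lines:
                    yield line


            def plotter_stage(raw):
                """<plotter>: for access logs, one physical line is one logical line."""
                for line in raw:
                    line = line.rstrip("\n")
                    if line:
                        yield line


            def parser_stage(logical_lines):
                """<parser>: name the fields with a regex; skip non-matching lines."""
                for line in logical_lines:
                    match = LINE_RE.match(line)
                    if match:
                        yield match.groupdict()


            def action_stage(records):
                """<action>: a filter keeping successful (2xx) requests only."""
                for record in records:
                    if record["status"].startswith("2"):
                        yield record


            def output_stage(records):
                """<output>: serialize each record as sorted key=value pairs."""
                for record in records:
                    yield " ".join("%s=%s" % (k, record[k]) for k in sorted(record))


            log = [
                '127.0.0.1 - - [10/Oct/2012:13:55:36 +0200] "GET /login HTTP/1.1" 200 2326\n',
                '10.0.0.5 - - [10/Oct/2012:13:55:40 +0200] "GET /missing HTTP/1.1" 404 -\n',
            ]
            pipeline = output_stage(action_stage(parser_stage(
                plotter_stage(input_stage(log)))))
            for text in pipeline:
                print(text)
            # bytes=2326 method=GET path=/login src_ip=127.0.0.1 status=200 time=10/Oct/2012:13:55:36 +0200
        
        Generators keep memory use flat: each log entry flows through all stages
        before the next one is read, which suits large or endless (tailed) inputs.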
        
        
        Daemons
        -------
        Mole is currently composed of three programs, two daemons and a client:
        
        **mole-indexer**: the daemon responsible for reading the log files and
        indexing them, using an index back-end (only Whoosh right now).
        
        **mole-seeker**: the daemon responsible for looking up data in the index; it
        receives queries over a TCP port.
        
        **mole**: the client used to query mole-seeker.
        
        Running
        -------
        To start mole, you first need to configure the server. An example
        configuration is included in the configuration directory of the source code.
        The configuration directory contains one file per mole component.
        
        Once your server is configured, start both mole-indexer and mole-seeker.
        
        Finally, run your queries with the mole client.
        
        Configuration
        -------------
        The configuration directory contains a separate file for each mole
        component, for example:
        
        **input.conf** configures inputs. An input is a reader over a file, a
        network stream or anything else that can be used to retrieve data to be
        analyzed.
        
        **index.conf** sets up indexes. Indexes are special files where mole stores
        the log data in a format suited to fast searches.
        
        Examples
        --------
        Count the lines of an input (in this case, the input is an Apache access
        log)::
        
          $ mole 'input apache_log | count *'
          count(*)=3445
        
        The same query, grouped by source IP::
        
          $ mole 'input apache_log | count * by src_ip'
          src_ip=127.0.0.1 count=121
          src_ip=192.168.0.21 count=1203
        
        Calculate the average transfer size per URL path in the Apache log, and keep
        only the top three::
        
          $ mole 'input apache_log | avg bytes by path | top 3'
          path=/ avg(bytes)=12343
          path=/login avg(bytes)=6737
          path=/logout avg(bytes)=2128
        
        Search for an expression and count occurrences::
        
          $ mole 'input apache_log | search path=*login* | count *'
          count(*)=3838
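        
        For readers who want to check results by hand, the four queries above
        correspond roughly to the following plain Python 3 over parsed records.
        This is a hypothetical sketch: it assumes the records are dictionaries with
        ``src_ip``, ``path`` and ``bytes`` fields, and it treats the ``*login*``
        pattern of ``search`` as a shell-style wildcard matched with the standard
        ``fnmatch`` module, which may differ from mole's own matching rules.
        
        .. code-block:: python
        
            from collections import Counter, defaultdict
            from fnmatch import fnmatchcase


            def count_all(records):
                """'count *': number of records."""
                return sum(1 for _ in records)


            def count_by(records, key):
                """'count * by <key>': number of records per key value."""
                return dict(Counter(record[key] for record in records))


            def avg_by(records, field, key):
                """'avg <field> by <key>': mean of the field per key value."""
                sums, counts = defaultdict(float), defaultdict(int)
                for record in records:
                    sums[record[key]] += float(record[field])
                    counts[record[key]] += 1
                return dict((k, sums[k] / counts[k]) for k in sums)


            def search(records, field, pattern):
                """'search <field>=<pattern>' with a shell-style wildcard."""
                return [r for r in records if fnmatchcase(r[field], pattern)]


            # Made-up sample records, already parsed into fields.
            records = [
                {"src_ip": "127.0.0.1", "path": "/login", "bytes": "600"},
                {"src_ip": "127.0.0.1", "path": "/", "bytes": "1000"},
                {"src_ip": "192.168.0.21", "path": "/login", "bytes": "800"},
            ]
            print(count_all(records))           # 3
            print(count_by(records, "src_ip"))  # {'127.0.0.1': 2, '192.168.0.21': 1}
            averages = avg_by(records, "bytes", "path")
            top3 = sorted(averages.items(), key=lambda item: item[1], reverse=True)[:3]
            print(top3)                         # [('/', 1000.0), ('/login', 700.0)]
            print(count_all(search(records, "path", "*login*")))  # 2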
        
        
        Development
        -----------
        The Mole code is hosted on github_, and you can download it using git, as
        usual::
        
          $ git clone https://github.com/ajdiaz/mole
        
        .. _github: http://github.com/ajdiaz/mole
        
        
        Design
        ------
        The basic design of mole is a linear pipeline that includes the following
        components:
        
        * The *input* is responsible for reading the data source byte by byte (or
          line by line; it is agnostic to the format).
        
        * The *plotter* splits the input into logical lines. A logical line can be
          a single text line, a number of text lines or a binary block.
        
        * The *parser* is responsible for extracting fields from the lines, for
          example using a regular expression or a comma-separated pattern.
        
        * The *actions* are a number of transformations over the fields.
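        
        The plotter and parser stages from the list above can be illustrated with a
        hypothetical Python 3 sketch (not mole's code; the function and field names
        are made up). The plotter assumes that a line starting with whitespace
        continues the previous logical line, which is how Java stack traces are
        usually laid out; the parser shows the comma-separated case using the
        standard ``csv`` module.
        
        .. code-block:: python
        
            import csv


            def plot_multiline(lines):
                """Join continuation lines (leading whitespace) into one logical line."""
                current = None
                for line in lines:
                    line = line.rstrip("\n")
                    if current is not None and line[:1] in (" ", "\t"):
                        current += "\n" + line
                    else:
                        if current is not None:
                            yield current
                        current = line
                if current is not None:
                    yield current


            def parse_csv(logical_lines, field_names):
                """Parse each logical line as CSV and name the resulting fields."""
                for row in csv.reader(logical_lines):
                    yield dict(zip(field_names, row))


            java_log = [
                "ERROR java.lang.NullPointerException\n",
                "\tat com.example.Foo.bar(Foo.java:10)\n",
                "\tat com.example.Main.main(Main.java:5)\n",
                "INFO request served\n",
            ]
            entries = list(plot_multiline(java_log))
            print(len(entries))  # 2

            csv_lines = ["127.0.0.1,/login,2326\n", "192.168.0.21,/,512\n"]
            rows = list(parse_csv(plot_multiline(csv_lines), ["src_ip", "path", "bytes"]))
            print(rows[0])  # {'src_ip': '127.0.0.1', 'path': '/login', 'bytes': '2326'}
        
        Keeping the plotter separate from the parser means the same CSV or regex
        parser can be reused whether an entry spans one physical line or several.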
        
        Inputs can be normal files (or tails of files) or special files called
        "indexes". An index contains the raw data plus time pointers.
        
        Bugs, feedback, comments and spam
        ---------------------------------
        To report bugs or propose enhancements, please use the `github issues tool`_.
        If you have any suggestions, do not hesitate to contact me.
        
        .. _`github issues tool`: http://github.com/ajdiaz/mole/issues
        
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 2.6
Classifier: Natural Language :: English
