===========
Performance
===========

This page shows performance tests and their results. For element-by-element pipelines, the performance is as good as that of ad-hoc crafted Python code written without the same flexibility. For chunking pipelines
(see :doc:`numpy_chunking`), the performance is better than that of either ad-hoc crafted C code that processes data element by element, or NumPy code that processes all data in a single chunk.
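
A harness along the following lines is one way to collect such timings (a minimal sketch; `_time_fn` and the repetition count are illustrative, not part of the original benchmark code):
::

    import time

    def _time_fn(fn, reps = 3):
        # Run fn several times and keep the best wall-clock time,
        # which is less sensitive to transient system load.
        best = float('inf')
        for _ in range(reps):
            start = time.time()
            fn()
            best = min(best, time.time() - start)
        return best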


----------------------
CSV Element By Element
----------------------

The following shows the performance of three implementations finding the correlation between two CSV columns pruned for outliers:

.. figure:: NoChunksPerf.png
  :alt: no-chunks performance
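
The input is a two-column CSV file whose header fields are named '0' and '1'. A file of this shape can be generated along the following lines (a sketch; the row count, value range, and file name are assumptions rather than the benchmark's actual parameters):
::

    import csv
    import random

    # 'wb' is the correct mode for the csv module under Python 2;
    # under Python 3 use open('perf.csv', 'w', newline = '') instead.
    with open('perf.csv', 'wb') as f:
        w = csv.writer(f)
        w.writerow(('0', '1'))
        for _ in range(1000000):
            w.writerow((random.random(), random.random()))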

The ad-hoc crafted CSV code using `csv.reader` is:
::

    import csv
    import math

    r = csv.reader(open(_f_name, 'r'))

    # Locate the two columns by their header fields.
    fields = r.next()
    ind0, ind1 = fields.index('0'), fields.index('1')

    sx, sxx, sy, syy, sxy, n = 0, 0, 0, 0, 0, 0
    try:
        while True:
            row = r.next()
            x, y = float(row[ind0]), float(row[ind1])
            if x < 0.5 and y < 0.5:
                sx += x
                sxx += x * x
                sy += y
                sxy += x * y
                syy += y * y
                n += 1
    except StopIteration:
        pass
    c = (n * sxy - sx * sy) / math.sqrt(n * sxx - sx * sx) / math.sqrt(n * syy - sy * sy)
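
The final line computes the Pearson correlation in a single pass from the running sums:

.. math::

    c = \frac{n \sum xy - \sum x \sum y}
             {\sqrt{n \sum x^2 - \left(\sum x\right)^2}\,\sqrt{n \sum y^2 - \left(\sum y\right)^2}}

where the sums run over the retained pairs only.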
    
The ad-hoc crafted CSV code using `csv.DictReader` is:
::

    def _dict_corr():
        r = csv.DictReader(open(_f_name, 'r'), ('0', '1'))
    
        sx, sxx, sy, syy, sxy, n = 0, 0, 0, 0, 0, 0
        for row in r:
            x, y = float(row['0']), float(row['1'])
            if x < 0.5 and y < 0.5:
                sx += x
                sxx += x * x
                sy += y
                sxy += x * y
                syy += y * y
                n += 1
        c = (n * sxy - sx * sy) / math.sqrt(n * sxx - sx * sx) / math.sqrt(n * syy - sy * sy)
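
`csv.DictReader` constructs a dictionary for every row, which accounts for its extra overhead relative to `csv.reader` in the figure above.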
    
The pipeline code is:
::

    def _csv_pipes_corr():
        c = csv_vals(open(_f_name, 'r'), ('0', '1')) | \
            filt(pre = lambda (x, y) : x < 0.5 and y < 0.5) | \
            corr()
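
Here `filt` with a `pre` argument drops pairs failing the predicate before they reach `corr`; the stage names come from the package's top-level namespace (typically brought in with `from dagpype import *`).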


------------
CSV Chunking
------------

The following shows the performance of two implementations finding the mean of a CSV column, using direct NumPy and dagpype:

.. figure:: ChunksCsv.png
  :alt: chunks CSV performance

The NumPy implementation processing all data in a single chunk is:
::

    import numpy

    x = numpy.genfromtxt(_f_name, usecols = (0,), delimiter = ',')
    a = numpy.mean(x)
    
The pipeline implementation is:
::

    c = np.chunk_stream_vals(_f_name, '0') | np.mean()
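
Conceptually, the chunked pipeline reads the file a block at a time and folds each block into running statistics, so, unlike `genfromtxt`, it never materializes the whole column. The idea is roughly the following (a sketch only, not dagpype's actual implementation; the chunk length is arbitrary):
::

    import numpy

    def _chunked_mean(f_name, chunk_len = 100000):
        # Accumulate a running sum and count one chunk at a time,
        # so at most one chunk is ever held in memory.
        total, n = 0.0, 0
        f = open(f_name, 'r')
        f.readline()  # skip the header row
        while True:
            lines = [l for l in (f.readline() for _ in range(chunk_len)) if l]
            if not lines:
                break
            chunk = numpy.array([float(l.split(',')[0]) for l in lines])
            total += numpy.sum(chunk)
            n += len(chunk)
        f.close()
        return total / n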


--------------------
Binary File Chunking
--------------------

Correlation
~~~~~~~~~~~

The following shows the performance of three implementations finding the correlation between two columns of binary data, using C code, direct NumPy, and dagpype:

.. figure:: ChunksPerf.png
  :alt: chunks performance
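
The binary input is a flat file of native-endian float64 pairs. Such a file can be produced along these lines (a sketch; the size, distribution, and file name are assumptions):
::

    import numpy

    # Write (x, y) pairs as raw native-endian float64 values, in the
    # row-major layout that both the C code and numpy.fromstring expect.
    xy = numpy.random.rand(1000000, 2)
    xy.tofile('perf.dat')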

The C implementation is:
::

    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    double c_corr(const char f_name[])
    {
        FILE *const pf = fopen(f_name, "rb");
        assert(pf != NULL);
        double sx = 0, sxx = 0, sy = 0, syy = 0, sxy = 0;
        size_t n = 0;
        while(1)
        {
            double x, y;
            if(fread(&x, sizeof(double), 1, pf) != 1 || fread(&y, sizeof(double), 1, pf) != 1 || feof(pf))
            {
                fclose(pf);
                break;
            }
            sx += x;
            sxx += x * x;
            sy += y;
            sxy += x * y;
            syy += y * y;
            ++n;
        }

        // printf("C %ld values\n", n);

        return (n * sxy - sx * sy) / sqrt(n * sxx - sx * sx) / sqrt(n * syy - sy * sy);
    }
    
The NumPy implementation processing all data in a single chunk is:
::

    s = open(_f_name, 'rb').read()
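    # (numpy.fromstring is deprecated in recent NumPy releases;
    # numpy.frombuffer is the modern equivalent.)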
    a = numpy.fromstring(s)
    xy = a.reshape(a.shape[0] / 2, 2)    
    
    s = numpy.sum(xy, axis = 0)
    sx = s[0]
    sy = s[1]
    
    c = numpy.dot(xy.T, xy)
    
    sxx = c[0, 0]
    sxy = c[0, 1]
    syy = c[1, 1]
    
    n = xy.shape[0]
    # print 'numpy core', n, 'values'
    res = (n * sxy - sx * sy) / math.sqrt(n * sxx - sx * sx) / math.sqrt(n * syy - sy * sy)

The pipeline implementation is:
::

    c = np.chunk_stream_bytes(_f_name, num_cols = 2) | np.corr()


Pruned Correlation
~~~~~~~~~~~~~~~~~~

The following shows the performance of three implementations finding the correlation between two columns of binary data, pruning pairs in which either value is at least 0.25, using C code, direct NumPy, and dagpype:

.. figure:: ChunksPerfPrune.png
  :alt: chunks + pruning performance

The C implementation is:
::

    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    double c_corr_prune(const char f_name[])
    {
        FILE *const pf = fopen(f_name, "rb");
        assert(pf != NULL);
        double sx = 0, sxx = 0, sy = 0, syy = 0, sxy = 0;
        size_t n = 0;
        while(1)
        {
            double x, y;
            if(fread(&x, sizeof(double), 1, pf) != 1 || fread(&y, sizeof(double), 1, pf) != 1 || feof(pf))
            {
                fclose(pf);
                break;
            }
            // Prune pairs in which either value is at least 0.25.
            if(x >= 0.25 || y >= 0.25)
                continue;
            sx += x;
            sxx += x * x;
            sy += y;
            sxy += x * y;
            syy += y * y;
            ++n;
        }

        // printf("C %ld values\n", n);

        return (n * sxy - sx * sy) / sqrt(n * sxx - sx * sx) / sqrt(n * syy - sy * sy);
    }

The NumPy implementation processing all data in a single chunk is:
::

    s = open(_f_name, 'rb').read()
    a = numpy.fromstring(s)
    xy = a.reshape(a.shape[0] / 2, 2)    
    xy = xy[numpy.logical_and(xy[:, 0] < 0.25, xy[:, 1] < 0.25), :]
    
    s = numpy.sum(xy, axis = 0)
    sx = s[0]
    sy = s[1]
    
    c = numpy.dot(xy.T, xy)
    
    sxx = c[0, 0]
    sxy = c[0, 1]
    syy = c[1, 1]
    
    n = xy.shape[0]
    # print 'numpy core', n, 'values'
    res = (n * sxy - sx * sy) / math.sqrt(n * sxx - sx * sx) / math.sqrt(n * syy - sy * sy)

The pipeline implementation is:
::

    c = np.chunk_stream_bytes(_f_name, num_cols = 2) | \
        filt(lambda a : a[numpy.logical_and(a[:, 0] < 0.25, a[:, 1] < 0.25), :]) | \
        np.corr()
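
Because `filt` here receives whole NumPy chunks rather than individual pairs, the pruning stays vectorized: each chunk is reduced with the same boolean indexing used by the single-chunk NumPy version above.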



Truncated Correlation
~~~~~~~~~~~~~~~~~~~~~

The following shows the performance of three implementations finding the correlation between two columns of binary data, truncating values at 0.25, using C code, direct NumPy, and dagpype:


.. figure:: ChunksPerfTrunc.png
  :alt: chunks + truncation performance

The C implementation is:
::

    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    double c_corr_trunc(const char f_name[])
    {
        FILE *const pf = fopen(f_name, "rb");
        assert(pf != NULL);
        double sx = 0, sxx = 0, sy = 0, syy = 0, sxy = 0;
        size_t n = 0;
        while(1)
        {
            double x, y;
            if(fread(&x, sizeof(double), 1, pf) != 1 || fread(&y, sizeof(double), 1, pf) != 1 || feof(pf))
            {
                fclose(pf);
                break;
            }
            // Truncate both values at 0.25.
            x = fmin(x, 0.25);
            y = fmin(y, 0.25);
            sx += x;
            sxx += x * x;
            sy += y;
            sxy += x * y;
            syy += y * y;
            ++n;
        }

        // printf("C %ld values\n", n);

        return (n * sxy - sx * sy) / sqrt(n * sxx - sx * sx) / sqrt(n * syy - sy * sy);
    }

The NumPy implementation processing all data in a single chunk is:
::

    s = open(_f_name, 'rb').read()
    a = numpy.fromstring(s)
    xy = a.reshape(a.shape[0] / 2, 2)    
    xy = numpy.where(xy < 0.25, xy, 0.25)
    
    s = numpy.sum(xy, axis = 0)
    sx = s[0]
    sy = s[1]
    
    c = numpy.dot(xy.T, xy)
    
    sxx = c[0, 0]
    sxy = c[0, 1]
    syy = c[1, 1]
    
    n = xy.shape[0]
    # print 'numpy core', n, 'values'
    res = (n * sxy - sx * sy) / math.sqrt(n * sxx - sx * sx) / math.sqrt(n * syy - sy * sy)

The pipeline implementation is:
::

    c = np.chunk_stream_bytes(_f_name, num_cols = 2) | \
        filt(lambda a : numpy.where(a < 0.25, a, 0.25)) | \
        np.corr()
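
The `numpy.where(a < 0.25, a, 0.25)` transform is equivalent to `numpy.minimum(a, 0.25)`; either form clamps both columns one chunk at a time.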

