=================
Sed-Like Examples
=================

This page contains common examples of common sed_-like operations (taken from `Sed One-Liners Explained`_).

.. _sed: http://www.gnu.org/software/sed/
.. _`Sed One-Liners Explained`: http://www.osnews.com/story/21004/Awk_and_Sed_One-Liners_Explained

.. figure:: post-icon-sed.jpg
  :alt: sed


------------
File spacing
------------

1. Double-space a file.
    ::

        >>> stream_lines('foo.txt') | filt(lambda l: l + '\n') | to_stream(sys.stdout)

2. Double-space a file which already has blank lines in it. Do it so that the output contains no more than one blank line between two lines of text.
    ::

        >>> stream_lines('foo.txt') | filt(lambda l: l + '\n', pre = lambda l: l.strip()) | to_stream(sys.stdout)

3. Triple-space a file.
    ::
    
        >>> stream_lines('foo.txt') | filt(lambda l: l + '\n\n') | to_stream(sys.stdout)


.. _`Item 4`:

4. Undo double-spacing. This assumes that even-numbered lines are always blank.
    ::

        >>> stream_lines('foo.txt') | enumerate_() | \
        ...     consec_group(lambda (n, _): int(n / 2), lambda d: select_inds(1) | nth(0))  | \
        ...     to_stream(sys.stdout)
        
    .. Note::    
        
        The first line,        
        ::
        
            >>> stream_lines('foo.txt') | enumerate_() | \
            
        transforms the lines of the files to tuples, whose first item is a running index. 
        
        The second line, 
        ::
        
            ...     consec_group(lambda (n, _): int(n / 2), lambda d: select_inds(1) | nth(0))  | \
            
        uses the :func:`dagpype.consec_group` filter. The key determining consecutive group is the integer part of the running index divided by 2 (hence each two consecutive lines are in a group); the pipe for a consecutive group selects the first index, then pipes the result to a terminal stage taking only the first element piped.
        
5. Insert a blank line above every line that matches "regex".
    ::

        >>> stream_lines('foo.txt') | \
        ...     filt(lambda l: '\nregex' if l == 'regex' else l) | to_stream(sys.stdout)

6. Insert a blank line below every line that matches "regex".
    ::

        >>> stream_lines('foo.txt') | filt(lambda l: 'regex\n' if l == 'regex' else l) | to_stream(sys.stdout)

7. Insert a blank line above and below every line that matches "regex".
    ::

        >>> stream_lines('foo.txt') | filt(lambda l: '\nregex\n' if l == 'regex' else l) | to_stream(sys.stdout)


---------
Numbering
---------

8. Number each line of a file (named filename). Left align the number.
    ::

        >>> stream_lines('foo.txt') | enumerate_() | \
        ...     filt(lambda (n, l): '{:<5} {}'.format(n, l)) | \
        ...     to_stream(sys.stdout)

9. Number each line of a file (named filename). Right align the number.
    ::

        >>> stream_lines('foo.txt') | enumerate_() | \
        ...     filt(lambda (n, l): '{:>5} {}'.format(n, l)) | \
        ...     to_stream(sys.stdout)

10. Number each non-empty line of a file.
    ::

        >>> source((int(l.strip() != ''), l.strip()) for l in open('foo.txt')) | \
        ...     (select_inds(0) | cum_sum()) + select_inds(1) | \
        ...     filt(lambda (n, l): '{} {}'.format(n, l) if l else '') | \
        ...     to_stream('tmp.txt')
        
11. Count the number of lines in a file.
    ::

        >>> print stream_lines('foo.txt') | count()


--------------------------------
Text Conversion and Substitution
--------------------------------

22. Delete leading whitespace (tabs and spaces) from each line.
    ::
        
        >>> stream_lines('foo.txt') | filt(lambda l: l.lstrip()) | to_stream(sys.stdout)

23. Delete trailing whitespace (tabs and spaces) from each line.
    ::
        
        >>> stream_lines('foo.txt') | filt(lambda l: l.rstrip()) | to_stream(sys.stdout)

24. Delete both leading and trailing whitespace from each line.
    ::
        
        >>> stream_lines('foo.txt') | filt(lambda l: l.strip()) | to_stream(sys.stdout)

25. Insert five blank spaces at the beginning of each line.
    ::
        
        >>> stream_lines('foo.txt') | filt(lambda l: 5 * ' ' + l) | to_stream(sys.stdout)

26. Align lines right on a 79-column width.
    ::

        >>> stream_lines('foo.txt') | filt(lambda l: '{:>79}'.format(l)) | to_stream(sys.stdout)

27. Align lines right on a 79-column width.
    ::

        >>> stream_lines('foo.txt') | filt(lambda l: '{:^79}'.format(l)) | to_stream(sys.stdout)

28. Substitute (find and replace) the fist occurrence of "foo" with "bar" on each line.
    ::
    
        >>> stream_lines('foo.txt') | filt(lambda l: l.replace('foo', 'bar', 1)) | to_stream(sys.stdout)

29. Substitute (find and replace) the fourth occurrence of "foo" with "bar" on each line.
    ::
    
        >>> stream_lines('foo.txt') | 
        ...     filt(lambda l: 'foo'.join(l.split('foo', 4)[: 4]) + 'bar' + ''.join(l.split('foo', 4)[4: ]) \
        ...         if len(l.split('foo')) > 3 else l) | \
        ...     to_stream(sys.stdout)

30. Substitute (find and replace) all occurrence of "foo" with "bar" on each line.
    ::
    
        >>> stream_lines('foo.txt') | 
        ...     filt(lambda l: l.replace('foo', 'bar')) | \
        ...     to_stream(sys.stdout)

31. Substitute (find and replace) the first occurrence of a repeated occurrence of "foo" with "bar".
    ::

        >>> stream_lines('foo.txt') | 
        ...     filt(lambda l: l.replace('foo', 'bar', 1)) | \
        ...     to_stream(sys.stdout)
        
32. Substitute (find and replace) only the last occurrence of "foo" with "bar".
    ::

        >>> stream_lines('foo.txt') | 
        ...     filt(lambda l: 'bar'.join(l.rsplit('foo', 1))) | \
        ...     to_stream(sys.stdout)

33. Substitute all occurrences of "foo" with "bar" on all lines that contain "baz".
    ::

        >>> stream_lines('foo.txt') | 
        ...     filt(lambda l: l.replace('foo', 'bar') if 'baz' in l else l) | \
        ...     to_stream(sys.stdout)

34. Substitute all occurrences of "foo" with "bar" on all lines that DO NOT contain "baz".
    ::

        >>> stream_lines('foo.txt') | 
        ...     filt(lambda l: l.replace('foo', 'bar') if 'baz' not in l else l) | \
        ...     to_stream(sys.stdout)

35. Change text "scarlet", "ruby" or "puce" to "red".
    ::

        >>> stream_lines('foo.txt') | 
        ...     filt(lambda l: l.replace('scarlet', 'red').replace('ruby', 'red').replace('puce', 'red')) | \
        ...     to_stream(sys.stdout)

38. Join pairs of lines side-by-side (emulates "paste" Unix command).
    ::
    
        >>> stream_lines('foo.txt') | enumerate_() | \
        ...     consec_group(
        ...         lambda (n, _): int(n / 2), 
        ...         lambda d: select_inds(1) | (to_list() | sink(lambda l: '\t'.join(l))))  | \
        ...     to_stream(sys.stdout)
        
    .. Note::
        
        see `Item 4`_ for an explanation of the :func:`dagpype.consec_group` filter use here.        

.. _`Item 39`:

39. Append a line to the next if it ends with a backslash "\".
    ::

        >>> c = [0]
        >>> stream_lines('foo.txt') | \
        ...     consec_group(
        ...         lambda l: (c[0], c.__setitem__(0, c[0] + int(len(l) == 0 or l[-1] != '\\'))), 
        ...         lambda d: filt(lambda l: l[: -1] if len(l) > 0 and l[-1] == '\\' else l) | sum_())  | \
        ...     to_stream(sys.stdout)

    .. Note::
    
        The first line,
        ::
        
            >>> c = [0]

        initializes ``c`` to a list with a single element (0).

        lines 3-5,
        ::    
        
            ...     consec_group(
            ...         lambda l: (c[0], c.__setitem__(0, c[0] + int(len(l) == 0 or l[-1] != '\\'))), 
            ...         lambda d: filt(lambda l: l[: -1] if len(l) > 0 and l[-1] == '\\' else l) | sum_())  | \

        group paragraphs together using the :func:`dagpype.consec_group` filter. The key determining consecutive group is a counter incremented after each line not ending in ``\`` (see `Stupid lambda tricks`_); the pipe for a consecutive group strips the ``\`` character, and joins the lines.        

        .. _`Stupid lambda tricks`: http://www.p-nand-q.com/python/stupid_lambda_tricks.html

40. Append a line to the previous if it starts with an equal sign "=".
    ::

        >>> c = [0]
        >>> stream_lines('foo.txt') | \
        ...     consec_group(
        ...         lambda l: (c[0], c.__setitem__(0, c[0] + int(len(l) == 0 or l[0] != '='))), 
        ...         lambda d: filt(lambda l: l[1 :] if len(l) > 0 and l[0] == '=' else l) | sum_())  | \
        ...     to_stream(sys.stdout)

    .. Note::
        
        see `Item 39`_ for an explanation.        

43. Add a blank line after every five lines.
    ::

        >>> stream_lines('foo.txt') | enumerate_(1) | \
        ...     filt(lambda (n, l): l if n % 5 else (l + '\n')) | to_stream(sys.stdout)

    .. Note::    


        The first line,        
        ::
            >>> stream_lines('foo.txt') | enumerate_(1) | \
            
        transforms the lines of the files to tuples, whose first item is a running index starting from 1. 

-----------------------------------
Selective Printing of Certain Lines
-----------------------------------

44. Print the first 10 lines of a file.
    ::

        >>> print slice_(open('foo.txt'), 10) | to_list()

45. Print the first line of a file.
    ::

        >>> print stream_lines('foo.txt') | nth(0)
        
46. Print the last 10 lines of a file.
    ::
    
        >>> print stream_lines('foo.txt') | tail(10) | to_list()        

47. Print the last 2 lines of a file.
    ::
    
        >>> print stream_lines('foo.txt') | tail(2) | to_list()        

48. Print the last line of a file.
    ::

        >>> print stream_lines('foo.txt') | nth(-1)

49. Print next-to-the-last line of a file.
    ::

        >>> print stream_lines('foo.txt') | nth(-2)
        
50. Print only the lines that match a regular expression (emulates "grep").
    ::
      
        >>> stream_lines('foo.txt') | grep(re.compile(r'foo(.+?)bar')) | to_stream(sys.stdout)
        
51. Print only the lines that do not match a regular expression.
    ::
    
    >>> r = re.compile(r'foo(.+?)bar')
    >>> stream_lines('foo.txt') | filt(pre = lambda l: r.match(l) is None) | to_stream(sys.stdout)        
        
52. Print the line immediately before regexp, but not the line containing the regexp.
    ::
    
        >>> r = re.compile(r'foo(.+?)bar')
        >>> stream_lines('foo.txt') | to(lambda l: r.match(l)) | (nth(-2) | to_stream(sys.stdout))
        
53. Print the line immediately after regexp, but not the line containing the regexp.        
    ::
    
        >>> r = re.compile(r'foo(.+?)bar')
        >>> stream_lines('foo.txt') | from_(lambda l: r.match(l)) | (nth(1) | to_stream(sys.stdout))
        
54. Print one line before and after regexp. Also print the line matching regexp and its line number. (emulates "grep -A1 -B1").
    ::
    
        >>> r = re.compile(r'foo(.+?)bar')
        >>> stream_lines('foo.txt') | \
        ...     (to(lambda l: r.match(l)) | (nth(-2))) + \
        ...     (from_(lambda l: r.match(l)) | (nth(1))) + \
        ...     (enumerate_() | filt(pre = lambda (n, l): r.match(l)) | nth(0))

55. Grep for "AAA" and "BBB" and "CCC" in any order.
    ::
    
        >>> stream_lines('foo.txt') | \
        ...     filt(pre = lambda l: 'AAA' in l and 'BBB' in l and 'CCC' in l) | to_stream(sys.stdout)
        
56. Grep for "AAA" and "BBB" and "CCC" in that order.
    ::
    
        >>> stream_lines('foo.txt') | \
        ...     grep(re.compile(r'(.*)AAA(.*)BBB(.*)CCC(.*)')) | to_stream(sys.stdout)
        
57. Grep for "AAA" or "BBB", or "CCC".
    ::
    
        >>> stream_lines('foo.txt') | \
        ...     filt(pre = lambda l: 'AAA' in l or 'BBB' in l or 'CCC' in l) | to_stream(sys.stdout)
        
.. _`Item 58`:

58. Print a paragraph that contains "AAA". (Paragraphs are separated by blank lines).
    ::
    
        >>> stream_lines('foo.txt') | \
        ...     consec_group(lambda l: l.strip() == '', lambda is_para: to_list()) | \
        ...     filt(lambda ls: '\n'.join(ls) + '\n', pre = lambda ls: sum(['AAA' in l for l in ls]) > 0) | \
        ...     to_stream(sys.stdout)

    .. Note::    
        
        The second line, 
        ::    
        
            ...     consec_group(lambda l: l.strip() == '', lambda is_para: to_list()) | \
            
        groups paragraphs together using the :func:`dagpype.consec_group` filter. The key determining consecutive group is whether a line is empty (hence all lines in a paragraph will get ``True`` key, whereas separating lines will get a ``False`` key); the pipe for a consecutive group places the lines in a list.
        
        The third line,
        ::
        
            ...     filt(lambda ls: '\n'.join(ls) + '\n', pre = lambda ls: sum(['AAA' in l for l in ls]) > 0) | \
            
        filters lists: the precondition is that 'AAA' appears somewhere in the list, and the transformation function joins the lines using a newline.
        

59. Print a paragraph if it contains "AAA" and "BBB" and "CCC" in any order.
    ::
    
        >>> stream_lines('foo.txt') | \
        ...     consec_group(lambda l: l.strip() == '', lambda is_para: to_list()) | \
        ...     filt(
        ...         lambda ls: '\n'.join(ls) + '\n', 
        ...         pre = lambda ls: sum(['AAA' in l and 'BBB' in l and 'CCC' in l for l in ls]) > 0) | \
        ...     to_stream(sys.stdout)

    .. Note::
        
        see `Item 58`_ for an explanation.

60. Print a paragraph if it contains "AAA" or "BBB" or "CCC".
    ::
    
        >>> stream_lines('foo.txt') | \
        ...     consec_group(lambda l: l.strip() == '', lambda is_para: to_list()) | \
        ...     filt(
        ...         lambda ls: '\n'.join(ls) + '\n', 
        ...         pre = lambda ls: sum(['AAA' in l or 'BBB' in l or 'CCC' in l for l in ls]) > 0) | \
        ...     to_stream(sys.stdout)

    .. Note::
        
        see `Item 58`_ for an explanation.

61. Print only the lines that are 65 characters in length or more.
    ::
    
        >>> print stream_lines('foo.txt') | filt(pre = lambda l: len(l) >= 65) | to_list()        

62. Print only the lines that are less than 65 chars.
    ::
    
        >>> print stream_lines('foo.txt') | filt(pre = lambda l: len(l) < 65) | to_list()        

66. Beginning at line 3, print every 7th line.
    ::

        >>> print stream_lines('foo.txt') | slice_(3, None, 7) | to_list()


-----------------------------------
Selective Deletion of Certain Lines
-----------------------------------

69. Delete duplicate, consecutive lines from a file.
    ::

        >>> stream_lines('foo.txt') | \
        ...     consec_group(lambda l: l, lambda l: nth(0)) | to_stream(sys.stdout)

70. Delete duplicate, nonconsecutive lines from a file.
    ::

        >>> stream_lines('foo.txt') | \
        ...     group(lambda l: l, lambda l: nth(0)) | to_stream(sys.stdout)

71. Delete all lines except duplicate consecutive lines (emulates "uniq -d").
    ::

72. Delete the first 10 lines of a file.
    ::
    
        >>> stream_lines('foo.txt') | slice_(10, None) | to_stream(sys.stdout)
        
73. Delete the last line of a file.        
    ::
    
        >>> stream_lines('foo.txt') | skip(-1) | to_stream(sys.stdout)

74. Delete the last 2 lines of a file.        
    ::
    
        >>> stream_lines('foo.txt') | skip(-2) | to_stream(sys.stdout)

75. Delete the last 10 lines of a file.        
    ::
    
        >>> stream_lines('foo.txt') | skip(-10) | to_stream(sys.stdout)

76. Delete every 8th line.      
    ::
    
        >>> stream_lines('foo.txt') | enumerate_(1) | \
        ...     filt(pre: lambda (n, l): n % 8) | to_stream(sys.stdout)
        
77. Delete lines that match regular expression pattern.
    ::

        >>> r = re.compile(r'foo(.+?)bar')
        >>> stream_lines('foo.txt') | filt(pre = lambda l: r.match(l) is None) | to_stream(sys.stdout)

78. Delete all blank lines in a file (emulates "grep '.'").
    ::
    
        >>> stream_lines('foo.txt') | filt(pre = lambda l: l.strip()) | to_stream(sys.stdout)

