% Mailing List Filter
% Yang Zhang
% gmail:yaaang

[download 0.1 egg] | [download 0.1 src tgz] | [PyPI page] | [browse svn] | [home page]

[download 0.1 egg]: http://pypi.python.org/packages/2.5/p/mailing-list-filter/mailing_list_filter-0.1-py2.5.egg
[download 0.1 src tgz]: http://pypi.python.org/packages/source/p/mailing-list-filter/mailing-list-filter-0.1.tar.gz
[PyPI page]: http://pypi.python.org/pypi/mailing-list-filter/
[more downloads]: http://code.google.com/p/assorted/downloads/list
[browse svn]: http://assorted.svn.sourceforge.net/viewvc/assorted/mailing-list-filter/trunk/
[home page]: http://assorted.sf.net/mailing-list-filter/


Overview
--------

I have a Gmail account that I use for subscribing to and posting to mailing
lists.  When dealing with high-volume mailing lists, I am typically only
interested in those threads that I participated in.  This is a simple filter
for starring and marking unread any messages belonging to such threads.

This is accomplished by looking at the set of messages that were either sent
from me or explicitly addressed to me.  From this "root set" of messages, we
can use the `Message-ID`, `References`, and `In-Reply-To` headers to determine
threads, and thus the other messages that we care about.
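
The thread-expansion idea can be sketched in a few lines of modern Python. This is only an illustration under stated assumptions, not the script's actual code: the function names (`ids_in`, `involves_me`, `interesting_ids`) and the sample addresses are hypothetical, and messages are assumed to be already fetched and parsed with the standard `email` package. Each message is linked to every ID it references, and a breadth-first search from the root set then collects everything reachable.

```python
import re
from collections import defaultdict, deque
from email import message_from_string
from email.utils import getaddresses

# Message IDs appear in headers as <local@domain> tokens.
MSG_ID = re.compile(r"<[^<>\s]+>")


def ids_in(msg, *headers):
    """Collect every <...> message ID found in the given headers."""
    found = []
    for header in headers:
        for value in msg.get_all(header, []):
            found.extend(MSG_ID.findall(value))
    return found


def involves_me(msg, me):
    """True if `me` appears in From, To, or Cc (the "root set" test)."""
    values = []
    for header in ("From", "To", "Cc"):
        values.extend(msg.get_all(header, []))
    return any(addr.lower() == me.lower() for _, addr in getaddresses(values))


def interesting_ids(messages, me):
    """Return IDs of all messages in threads that touch the root set."""
    neighbors = defaultdict(set)  # undirected reply graph over message IDs
    present, roots = set(), set()
    for msg in messages:
        own = ids_in(msg, "Message-ID")
        if not own:
            continue  # cannot place a message without an ID in any thread
        mid = own[0]
        present.add(mid)
        for ref in ids_in(msg, "References", "In-Reply-To"):
            neighbors[mid].add(ref)
            neighbors[ref].add(mid)
        if involves_me(msg, me):
            roots.add(mid)

    # Breadth-first search outward from the root set.
    seen, queue = set(roots), deque(roots)
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Referenced IDs that are absent from the mailbox are dropped here.
    return seen & present


if __name__ == "__main__":
    raw = [
        "From: me@example.com\nTo: list@example.org\nMessage-ID: <a@x>\n\nhi",
        "From: bob@example.org\nTo: list@example.org\nMessage-ID: <b@x>\n"
        "In-Reply-To: <a@x>\nReferences: <a@x>\n\nre",
        "From: eve@example.org\nTo: list@example.org\nMessage-ID: <d@x>\n\nnew",
    ]
    msgs = [message_from_string(text) for text in raw]
    print(sorted(interesting_ids(msgs, "me@example.com")))  # ['<a@x>', '<b@x>']
```

Treating the reply graph as undirected means a thread is picked up whether you started it or joined it partway through, and it still works when a reply cites an ancestor that is missing from the mailbox.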

I have found this to be more accurate than my two original approaches.  The
first was a set of Gmail filters that starred and marked unread any message
containing my name anywhere in it.  This worked OK since my name is not too
common, but it produced some false positives (not that bad; I just unstar those
messages) and some false negatives (much harder to detect).

A second approach is to tag all subjects with some signature string.  This
usually is fine, but it doesn't work when you did not start the thread (and
thus did not choose the subject).  You can try to change the subject line, but
this is (1) poor netiquette, (2) unreliable because your reply may not register
in other mail clients as being part of the same thread (and thus other
participants may miss your reply), and (3) unreliable because replies might not
directly reference your post (either intentionally or unintentionally).  It
also fails when others change the subject.  Finally, this approach is
unsatisfactory because it pollutes subject lines, and it essentially replicates
exactly what `Message-ID` was intended for.

This script is not intended to be a replacement for the Gmail filters. I still
keep those active so that I can get immediate first-pass filtering. I run this
script daily to perform second-pass filtering/unfiltering, which catches the
false negatives that the Gmail filters missed.

Setup
-----

Requirements:

- [argparse](http://argparse.python-hosting.com/)
- [Python Commons](http://assorted.sf.net/python-commons/) 0.4
- [path](http://www.jorendorff.com/articles/python/path/)

Install the program using the standard `setup.py` script (`python setup.py
install`).

Future Work Ideas
-----------------

- Currently, we assume that the server specification points to a mailbox
  containing all messages (both sent and received), and a message is determined
  to have been sent by you by looking at the From: header field. This works
  well with Gmail. An alternative strategy is to look through two folders, one
  that's the Inbox and one that's the Sent mailbox, and treat all messages in
  Sent as having been sent by you. This is presumably how most other IMAP
  servers work.

- Implement incremental maintenance of local cache.

- Accept custom operations for filtered/unfiltered messages
  (trashing/untrashing, labeling/unlabeling, etc.).

- Refactor the message fetching/management part out into its own library.
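
For the first idea above, the two ways of deciding whether a message was sent by you could be combined in one helper. This is a rough sketch only: the function `sent_by_me`, its parameters, and the folder name `"Sent"` are all hypothetical, and real IMAP servers name their sent folder differently.

```python
from email import message_from_string
from email.utils import parseaddr


def sent_by_me(msg, my_addresses, folder=None, sent_folder="Sent"):
    """Decide whether a message was sent by you.

    With `folder` given (two-folder strategy), a message counts as yours
    exactly when it lives in the sent folder. Without it (single mailbox,
    as with Gmail), the From: header is compared against your addresses.
    """
    if folder is not None:
        return folder == sent_folder
    _, address = parseaddr(msg.get("From", ""))
    return address.lower() in {a.lower() for a in my_addresses}


if __name__ == "__main__":
    msg = message_from_string("From: Me <me@example.com>\n\nhello")
    print(sent_by_me(msg, ["me@example.com"]))                  # True
    print(sent_by_me(msg, ["me@example.com"], folder="INBOX"))  # False
```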

License
-------

Mailing List Filter is released under the [PSF License], the same as Python's license.

[PSF License]: http://www.python.org/psf/license.html

Contact
-------

Copyright 2008 [Yang Zhang].  
All rights reserved.

Back to [assorted.sf.net].

[Yang Zhang]: http://www.mit.edu/~y_z/
[assorted.sf.net]: http://assorted.sourceforge.net/

