nltk >= 3.0a
ftfy >= 3