pdfminer
pyquery
lxml
roman