Metadata-Version: 1.1
Name: yalign
Version: 0.1.1
Summary: A tool to align comparable corpora
Home-page: https://github.com/machinalis/yalign
Author: Rafael Carrascosa, Gonzalo Garcia Berrotaran, Andrew Vine
Author-email: rafacarrascosa@gmail.com
License: BSD
Description: About
        =====
        
        Yalign is a tool for extracting parallel sentences from comparable corpora.
        
        `Statistical Machine Translation <http://en.wikipedia.org/wiki/Statistical_machine_translation>`_ relies on `parallel corpora <http://en.wikipedia.org/wiki/Parallel_text>`_ (e.g. `europarl <http://www.statmt.org/europarl/>`_) for training translation models. However, these corpora are limited and take time to create. Yalign is designed to automate this process by finding sentences in `comparable corpora <http://www.statmt.org/survey/Topic/ComparableCorpora>`_ that are close translation matches. This opens up avenues for harvesting parallel corpora from sources like translated documents and the web.
        
        Installation
        ============
        
        Yalign requires `scikit-learn <http://scikit-learn.org/stable/install.html>`_, so install it first.
        
        After that, you can install Yalign from PyPI via pip:
        
        ::
        
            sudo pip install yalign
        
        Usage
        =====
        
        First, download and unpack the English-to-Spanish model.
        
        ::
        
            wget http://yalign.machinalis.com/models/0.1/en-es.tar.gz
            tar -xvzf en-es.tar.gz
        
        Now use the **yalign-align** script with the English-to-Spanish model to align two web pages.
        
        ::
        
            yalign-align en-es http://en.wikipedia.org/wiki/Antiparticle http://es.wikipedia.org/wiki/Antipart%C3%ADcula
        
        Yalign is not limited to any one language pair. By creating your own models, you can align any two languages. For more details on how to use Yalign and on its implementation, please `read the docs <http://yalign.readthedocs.org/>`_.
        
        Yalign is a `Machinalis <http://www.machinalis.com>`_ project.
        You can view our other open source contributions `here <https://github.com/machinalis/>`_.
        
        **The Yalign Team:**
        
        | Andrew Vine
        | Gonzalo García Berrotarán
        | Rafael Carrascosa
        | Elías Andrawos
        | Laura Alonso Alemany
        
Keywords: align,corpus,corpus alignment
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Human Machine Interfaces
Classifier: Topic :: Scientific/Engineering :: Interface Engine/Protocol Translator
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Text Processing
Classifier: Topic :: Utilities
