Metadata-Version: 1.1
Name: webstruct
Version: 0.2
Summary: A library for creating statistical NER systems that work on HTML data
Home-page: https://github.com/scrapinghub/webstruct
Author: Mikhail Korobov, Terry Peng
Author-email: kmike84@gmail.com, pengtaoo@gmail.com
License: MIT
Description: Webstruct
        =========
        
        Webstruct is a library for creating statistical NER_ systems that work
        on HTML data, i.e. a library for building tools that extract named
        entities (addresses, organization names, open hours, etc.) from webpages.
        
        Unlike most NER systems, webstruct works on HTML data, not only
        on text data. This makes it possible to define features that use
        HTML structure, and to embed annotation results back into HTML.
        
        Read the docs_ for more info.
        
        License is MIT.
        
        .. _docs: http://webstruct.readthedocs.org/en/latest/
        .. _NER: http://en.wikipedia.org/wiki/Named-entity_recognition
        
        Contributing
        ------------
        
        * Source code: https://github.com/scrapinghub/webstruct
        * Bug tracker: https://github.com/scrapinghub/webstruct/issues
        
        To run tests, make sure nose_ is installed, then run the ``runtests.sh`` script.
        
        .. _nose: https://github.com/nose-devs/nose
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Linguistic
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Requires: sklearn
Requires: lxml
