Metadata-Version: 1.1
Name: xmlpumpkin
Version: 0.1
Summary: CaboCha output-XML accessor
Home-page: https://github.com/drowse314-dev-ymat/xmlpumpkin
Author: ymat
Author-email: drowse314@gmail.com
License: BSD
Description: XMLPumpkin
        ==========
        
        Parses XML output from `CaboCha
        <http://code.google.com/p/cabocha/>`_ and provides simple tree accessors.
        
        
        Usage
        -----
        
        The expected usage centers on chunk surfaces and dependency links::
        
            >>> import xmlpumpkin
            >>> aisansan = xmlpumpkin.parse_to_tree(
            ...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
            ... )
            >>> len(aisansan.chunks)
            8
            >>> print(aisansan.root.surface)
            流したりして
            >>> print(aisansan.root.func_surface)
            て
            >>> for dep in aisansan.root.linked:
            ...     print(dep.surface)
            ...
            降って
            涙を
        
        You need CaboCha on your PATH. Alternatively, you can build the tree directly from prepared XML::
        
            >>> tree = xmlpumpkin.Tree(xml_as_unicode)
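        
        For reference, prepared XML here means CaboCha's XML output (`cabocha -f3`).
        The following is a minimal Python 3 sketch using only the standard library.
        It is not xmlpumpkin's implementation. It assumes a layout of `chunk` elements
        with `id`, `link` (`-1` for the root) and `func` attributes wrapping `tok` elements.
        It shows how the root, its surface, its functional surface and its linked chunks
        can be read from such XML. The sample is hand-written and simplified, and real
        CaboCha output carries more attributes.
        
        ```python
        import xml.etree.ElementTree as ET
        
        # Hand-written, simplified sample shaped like CaboCha's XML output
        # (illustrative only; real output has more attributes per element).
        SAMPLE_XML = """<sentence>
         <chunk id="0" link="1" func="0">
          <tok id="0">うれしい</tok>
         </chunk>
         <chunk id="1" link="2" func="2">
          <tok id="1">涙</tok>
          <tok id="2">を</tok>
         </chunk>
         <chunk id="2" link="-1" func="4">
          <tok id="3">流し</tok>
          <tok id="4">た</tok>
         </chunk>
        </sentence>"""
        
        
        def summarize(xml_text):
            """Return (root surface, root functional surface, surfaces linked to root)."""
            sentence = ET.fromstring(xml_text)
            chunks = {c.get("id"): c for c in sentence.findall("chunk")}
        
            def surface(chunk):
                # A chunk's surface is the concatenation of its token texts.
                return "".join(tok.text for tok in chunk.findall("tok"))
        
            def func_surface(chunk):
                # The functional token is the one whose id equals the chunk's func attribute.
                for tok in chunk.findall("tok"):
                    if tok.get("id") == chunk.get("func"):
                        return tok.text
                return None
        
            # The root chunk depends on no other chunk, i.e. its link is "-1".
            root = next(c for c in chunks.values() if c.get("link") == "-1")
            # Chunks linked to the root are those whose link points at the root's id.
            linked = [surface(c) for c in chunks.values() if c.get("link") == root.get("id")]
            return surface(root), func_surface(root), linked
        
        
        print(summarize(SAMPLE_XML))  # → ('流した', 'た', ['涙を'])
        ```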
        
        If you need a simple interface from Python to CaboCha itself::
        
            >>> from xmlpumpkin import cabocha
            >>> print(cabocha.txttree(
            ...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
            ... ))
                愛燦々と-----D
                      この-D |
                        身に-D
                        降って-------D
                        心密かな---D |
                          うれしい-D |
                                涙を-D
                          流したりして
            EOS
            >>> print(cabocha.as_xml(
            ...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
            ... ))
            <sentence>
              ...
            </sentence>
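        
        The following minimal sketch shows what a wrapper like this involves. It is not
        the `xmlpumpkin.cabocha` implementation. It pipes a unicode string through the
        `cabocha` command with `-f3` (XML output) and uses one configurable encoding for
        both input and output. The name `run_cabocha` and its parameters are hypothetical.
        The `command` parameter exists only so the sketch can run without CaboCha installed.
        
        ```python
        import subprocess
        import sys
        
        
        def run_cabocha(text, encoding="utf-8", command=("cabocha", "-f3")):
            """Hypothetical helper: pipe `text` through CaboCha and return its stdout
            decoded as unicode. `-f3` selects CaboCha's XML output format."""
            # Terminate the input line; this sketch assumes line-by-line input.
            if not text.endswith("\n"):
                text += "\n"
            proc = subprocess.Popen(list(command), stdin=subprocess.PIPE, stdout=subprocess.PIPE)
            # The child process exchanges bytes: encode the input, decode the output.
            out, _ = proc.communicate(text.encode(encoding))
            if proc.returncode != 0:
                raise RuntimeError("%s exited with status %d" % (command[0], proc.returncode))
            return out.decode(encoding)
        
        
        if __name__ == "__main__":
            # Stand-in for CaboCha: a Python process that echoes stdin back unchanged.
            echo = (sys.executable, "-c",
                    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())")
            print(run_cabocha("うれしい涙を流した", command=echo))
        ```
        
        With CaboCha installed, `run_cabocha(text)` would return its XML as unicode.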
        
        All input and output is unicode.
        If you prefer an encoding other than UTF-8, modify the following constants directly::
        
            >>> import xmlpumpkin.runner
            >>> xmlpumpkin.runner.CABOCHA_ENCODING = 'SJIS'
            >>>
            >>> import xmlpumpkin.tree
            >>> xmlpumpkin.tree.XML_ENCODING = 'SJIS'
        
        
        Properties
        ----------
        
        Only a few properties are provided so far, through `Tree` and `Chunk` objects.
        
        `class xmlpumpkin.Tree(cabocha_xml)`
            * chunks - tuple of Chunk objects
            * root - the root Chunk object (the one that depends on no other chunk)
            * chunk_by_id(chunk_id) - returns the Chunk object with the given CaboCha-assigned id
            * _element - the original XML as an lxml Element object
        
        `class xmlpumpkin.Chunk(element, parent)`
            * id - chunk id
            * link_to_id - id of the chunk this chunk depends on
            * linked_from_ids - tuple of ids of the chunks that depend on this chunk
            * func_id - id of this chunk's functional token
            * dep - the Chunk object this chunk depends on
            * linked - list of all Chunk objects that depend on this chunk
            * surface - surface string of this chunk
            * func_surface - surface string of this chunk's functional token
            * _tokens() - the tokens in this chunk as lxml Element objects
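        
        `linked` holds the chunks that depend on a chunk, so recursing over `linked`
        from `root` visits the whole dependency tree. The minimal sketch below relies
        only on the documented `surface` and `linked` attributes. `dependency_lines` is
        a hypothetical helper, and `FakeChunk` is a stand-in so the example runs
        without CaboCha.
        
        ```python
        from collections import namedtuple
        
        
        def dependency_lines(chunk, depth=0):
            """Hypothetical helper: list chunk surfaces depth-first, indenting each
            dependent chunk one level below the chunk it depends on."""
            lines = ["  " * depth + chunk.surface]
            for child in chunk.linked:
                lines.extend(dependency_lines(child, depth + 1))
            return lines
        
        
        # Stand-in exposing only the two documented attributes used above.
        FakeChunk = namedtuple("FakeChunk", ["surface", "linked"])
        
        root = FakeChunk("流した", [
            FakeChunk("涙を", [FakeChunk("うれしい", [])]),
        ])
        print("\n".join(dependency_lines(root)))
        ```
        
        With a real tree, the same helper would be called as
        `dependency_lines(xmlpumpkin.parse_to_tree(text).root)`.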
        
Keywords: cabocha nlp xml parsing
Platform: any
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Utilities
