Stanza · DeepNLPF

Stanza is a Python natural language analysis package. Its tools can be combined in a pipeline that converts a string of human-language text into lists of sentences and words, generates the base forms (lemmas) of those words together with their parts of speech and morphological features, produces a syntactic dependency parse, and recognizes named entities. The toolkit is designed to behave consistently across more than 70 languages, using the Universal Dependencies formalism.
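As a rough illustration of what such a pipeline yields, the sketch below calls Stanza directly (not through DeepNLPF) and flattens the result into plain tuples. The helper names `summarize` and `analyze` are hypothetical and only serve this example; it assumes Stanza is installed and that the models for the chosen language have already been downloaded.

```python
def summarize(doc):
    """Flatten a Stanza Document into word rows and named entities."""
    # One row per word: surface form, lemma, universal POS tag,
    # index of the syntactic head (0 = root) and dependency relation.
    rows = [
        (word.text, word.lemma, word.upos, word.head, word.deprel)
        for sentence in doc.sentences
        for word in sentence.words
    ]
    # Named entities as (text, type) pairs.
    entities = [(ent.text, ent.type) for ent in doc.ents]
    return rows, entities


def analyze(text, lang="en"):
    """Run Stanza's default processors for `lang` on `text`."""
    import stanza  # imported lazily so summarize() works without Stanza installed

    nlp = stanza.Pipeline(lang)  # loads the default processors for the language
    return summarize(nlp(text))
```

Calling `analyze("Barack Obama was born in Hawaii.")` would return the word rows and the entities found in that sentence; the exact tags depend on the model version installed.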

Install Plugin

deepnlpf --install stanza

Download English Model

python -c "import stanza; stanza.download('en')"

Model Language

Stanza supports multiple languages; see the list of available models in the Stanza documentation.

Pipeline

JSON

{
    "lang": "en",
    "tools": {
        "stanza": {
            "processors": [
                "tokenize",
                "mwt",
                "pos",
                "lemma",
                "ner",
                "depparse"
            ]
        }
    }
}

YAML

---
lang: en
tools:
- stanza:
    processors:
    - tokenize
    - mwt
    - pos
    - lemma
    - ner
    - depparse
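The JSON pipeline file can also be generated from Python rather than written by hand. The following is a minimal sketch using only the standard library; the helper names `write_pipeline` and `stanza_processors` are hypothetical, and the comma-separated string is simply the form that `stanza.Pipeline` accepts for its `processors` argument, not necessarily how DeepNLPF passes the list on internally.

```python
import json
import tempfile
from pathlib import Path

# Same structure as the JSON pipeline file shown above.
pipeline_config = {
    "lang": "en",
    "tools": {
        "stanza": {
            "processors": ["tokenize", "mwt", "pos", "lemma", "ner", "depparse"]
        }
    },
}


def write_pipeline(config, directory):
    """Write the config as pipeline.json inside `directory` and return its path."""
    path = Path(directory) / "pipeline.json"
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return path


def stanza_processors(path):
    """Read the file back and join the Stanza processors with commas."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return ",".join(config["tools"]["stanza"]["processors"])


# A temporary directory stands in for the real project directory.
pipeline_path = write_pipeline(pipeline_config, tempfile.mkdtemp())
print(pipeline_path.name)                # pipeline.json
print(stanza_processors(pipeline_path))  # tokenize,mwt,pos,lemma,ner,depparse
```

Reading the file back before handing it to the pipeline is a cheap way to catch malformed JSON or a missing `processors` key early.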

Example

python

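from deepnlpf.pipeline import Pipeline

# Directory containing the dataset to annotate.
path_dataset = "<path_dir_dataset>"
# Pipeline configuration file (JSON, as shown above).
path_pipeline = "<path_file>/pipeline.json"

# Annotate the dataset with the configured pipeline and write the result to a file.
nlp = Pipeline(_input=path_dataset, pipeline=path_pipeline, _output='file')
annotation = nlp.annotate()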