Stanza
Stanza is a Python natural language analysis package. Its tools can be chained in a pipeline to split raw text into sentences and words, generate the base form (lemma), part of speech, and morphological features of each word, produce a syntactic dependency parse of each sentence, and recognize named entities. The toolkit works the same way across more than 70 languages and follows the Universal Dependencies formalism.
Install Plugin
deepnlpf --install stanza
Download English Model
python -c "import stanza; stanza.download('en')"
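Once the English model is downloaded, it can be checked by running Stanza directly, outside DeepNLPF. The following is a minimal sketch, assuming the `stanza` package and the English model are installed; `processors_arg` and `analyze` are illustrative helper names, not part of Stanza or DeepNLPF.

```python
# Processor names for the analysis steps described at the top of this page.
PROCESSORS = ["tokenize", "mwt", "pos", "lemma", "depparse", "ner"]


def processors_arg(processors):
    """Join processor names into the comma-separated string stanza.Pipeline accepts."""
    return ",".join(processors)


def analyze(text, lang="en", processors=PROCESSORS):
    """Run Stanza on `text` and return one list of word records per sentence."""
    import stanza  # imported lazily so processors_arg works without Stanza installed

    nlp = stanza.Pipeline(lang=lang, processors=processors_arg(processors))
    doc = nlp(text)
    return [
        [
            {
                "text": word.text,      # surface form
                "lemma": word.lemma,    # base form
                "upos": word.upos,      # universal part-of-speech tag
                "feats": word.feats,    # morphological features
                "head": word.head,      # index of the syntactic head (0 = root)
                "deprel": word.deprel,  # dependency relation to the head
            }
            for word in sentence.words
        ]
        for sentence in doc.sentences
    ]
```

Calling `analyze("Barack Obama was born in Hawaii.")` returns one list of word records per sentence. Named entities are not included in this sketch; Stanza exposes them on the document object as `doc.ents`.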
Model Language
Stanza supports many languages; the list of available models is in the Stanza documentation.
Pipeline
{
    "lang": "en",
    "tools": {
        "stanza": {
            "processors": [
                "tokenize",
                "mwt",
                "pos",
                "lemma",
                "ner",
                "depparse"
            ]
        }
    }
}
---
lang: en
tools:
  - stanza:
      processors:
        - tokenize
        - mwt
        - pos
        - lemma
        - ner
        - depparse
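The JSON pipeline shown above can also be generated from Python instead of written by hand. This is a minimal sketch using only the standard library; `build_pipeline_config`, `write_pipeline_config`, and the `KNOWN_PROCESSORS` check are hypothetical helpers for illustration and are not part of DeepNLPF. The set only covers the processors listed on this page.

```python
import json
from pathlib import Path

# Processor names that appear in the configuration on this page.
# Assumption: Stanza may offer others; extend the set as needed.
KNOWN_PROCESSORS = {"tokenize", "mwt", "pos", "lemma", "ner", "depparse"}


def build_pipeline_config(processors, lang="en"):
    """Return a dict with the same layout as the JSON pipeline shown above."""
    unknown = [p for p in processors if p not in KNOWN_PROCESSORS]
    if unknown:
        raise ValueError(f"unknown processor(s): {', '.join(unknown)}")
    return {"lang": lang, "tools": {"stanza": {"processors": list(processors)}}}


def write_pipeline_config(path, processors, lang="en"):
    """Write the configuration to `path` as indented JSON and return the path."""
    path = Path(path)
    config = build_pipeline_config(processors, lang)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return path
```

For example, `write_pipeline_config("pipeline.json", ["tokenize", "mwt", "pos", "lemma"])` writes a configuration limited to those four processors, and a misspelled name such as `"tokenise"` raises a `ValueError` before anything is written.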
Example
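from deepnlpf.pipeline import Pipeline

path_dataset = "<path_dir_dataset>"  # directory containing the dataset to annotate
path_pipeline = "<path_file>/pipeline.json"  # pipeline configuration file

# Run the pipeline on the dataset; _output='file' requests file output.
nlp = Pipeline(_input=path_dataset, pipeline=path_pipeline, _output='file')
annotation = nlp.annotate()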