Stanford CoreNLP

Stanford CoreNLP is a comprehensive Java-based NLP toolkit developed at Stanford University. It provides a wide range of linguistic annotations including tokenization, POS tagging, NER, coreference resolution, sentiment analysis, dependency parsing, and more — all in a single integrated pipeline.

Two Ways to Use CoreNLP from Python

Option 1: Stanza, Stanford’s official Python NLP library (recommended). It runs its own neural models in Python and also ships a client for the CoreNLP server.

Option 2: CoreNLP Server + py-corenlp. Run the Java CoreNLP server and call it from Python over its HTTP API (the wrapper is installed from PyPI as pycorenlp).

Option 1: Stanza (Recommended)

Stanza is Stanford’s Python NLP library. It uses its own neural models built on PyTorch rather than CoreNLP’s Java models, and it supports 70+ languages:

pip install stanza

import stanza

# Download English models
stanza.download('en')

nlp = stanza.Pipeline('en', processors='tokenize,mwt,pos,lemma,depparse,ner')

text = "Apple CEO Tim Cook announced the new Mac Studio at the company's Cupertino headquarters in March 2025."
doc = nlp(text)

# Tokens and POS
print("=== POS Tags ===")
for sent in doc.sentences:
    for word in sent.words:
        print(f"{word.text:<20} pos: {word.pos:<8} lemma: {word.lemma}")

# Named Entities
print("\n=== Named Entities ===")
for ent in doc.ents:
    print(f"{ent.text:<25} type: {ent.type}")

# Dependency Parse
print("\n=== Dependencies ===")
for sent in doc.sentences:
    for word in sent.words:
        head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
        print(f"{word.text:<20} deprel: {word.deprel:<10} head: {head}")
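
The head lookup above relies on Stanza's indexing convention: word ids are 1-based within a sentence, and a head of 0 marks the root. The following is a minimal sketch of that convention. The `Word` dataclass is a hypothetical stand-in for Stanza's word objects and the sample parse is hand-written rather than pipeline output, so the block runs without downloading any models.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """Hypothetical stand-in for a Stanza word: 1-based id, head id (0 = root)."""
    id: int
    text: str
    head: int
    deprel: str


def head_pairs(words: List[Word]) -> List[Tuple[str, str, str]]:
    """Return (dependent, relation, head) triples, resolving head ids to text."""
    triples = []
    for word in words:
        # Head ids are 1-based, so subtract 1 to index the list; 0 means root.
        head_text = words[word.head - 1].text if word.head > 0 else "ROOT"
        triples.append((word.text, word.deprel, head_text))
    return triples


def path_to_root(words: List[Word], word_id: int) -> List[str]:
    """Follow head links from one word up to the root of the sentence."""
    path = []
    current = word_id
    while current > 0:
        word = words[current - 1]
        path.append(word.text)
        current = word.head
    return path


# Hand-written parse of "Tim Cook announced products", for illustration only.
sample = [
    Word(1, "Tim", 3, "nsubj"),
    Word(2, "Cook", 1, "flat"),
    Word(3, "announced", 0, "root"),
    Word(4, "products", 3, "obj"),
]

print(head_pairs(sample))
print(path_to_root(sample, 2))  # ['Cook', 'Tim', 'announced']
```

The same two helpers should work on a real `sent.words` list, since they only read the `text`, `head`, and `deprel` attributes used in the example above.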

Multilingual Processing with Stanza

import stanza

# Process multiple languages with the same interface
languages = {
    'en': "The transformer model achieved state-of-the-art results.",
    'fr': "Le modèle de transformateur a obtenu des résultats de pointe.",
    'de': "Das Transformer-Modell erzielte modernste Ergebnisse.",
    'zh': "变换器模型取得了最先进的结果。"
}

for lang_code, text in languages.items():
    stanza.download(lang_code, verbose=False)
    nlp = stanza.Pipeline(lang_code, verbose=False)
    doc = nlp(text)
    tokens = [word.text for sent in doc.sentences for word in sent.words]
    print(f"{lang_code}: {tokens}")
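
Constructing a pipeline loads models from disk, so a loop that rebuilds it for every input becomes expensive once a language appears more than once. One possible approach is to cache one pipeline per language code. In this sketch, `get_pipeline` and its `factory` parameter are hypothetical names introduced for illustration. The default factory calls `stanza.download` and `stanza.Pipeline` as in the example above, while the demo swaps in a stub so it runs without models.

```python
from typing import Any, Callable, Dict, Optional

_pipelines: Dict[str, Any] = {}


def get_pipeline(lang: str, factory: Optional[Callable[[str], Any]] = None) -> Any:
    """Return a cached pipeline for `lang`, building it on first use."""
    if lang not in _pipelines:
        if factory is None:
            # Imported lazily so the stubbed demo below runs without Stanza installed.
            import stanza

            def factory(code: str) -> Any:
                stanza.download(code, verbose=False)
                return stanza.Pipeline(code, verbose=False)

        _pipelines[lang] = factory(lang)
    return _pipelines[lang]


# Demo with a stub factory that records how often a pipeline is built.
built = []


def stub_factory(code: str) -> str:
    built.append(code)
    return f"pipeline-{code}"


for code in ["en", "fr", "en", "de", "en"]:
    get_pipeline(code, factory=stub_factory)

print(built)  # ['en', 'fr', 'de']: each language is built only once
```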

Coreference Resolution

Coreference resolution identifies when multiple mentions in a text refer to the same entity, and it is one of CoreNLP’s most distinctive features. The example below calls the deterministic `dcoref` annotator through the CoreNLP server:

# In a shell, from the unpacked CoreNLP directory: start the server (requires Java 8+)
# Download from: https://stanfordnlp.github.io/CoreNLP/
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
  -port 9000 -timeout 15000

# In Python (install the client first with: pip install pycorenlp)
from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')

text = """
Mary said she would finish the NLP project by Friday.
She mentioned that her team had already completed the data preprocessing step.
"""

result = nlp.annotate(text, properties={
    'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,dcoref',  # dcoref needs constituency parses
    'outputFormat': 'json',
    'timeout': 15000
})

print("Coreference chains:")
for chain_id, chain in result['corefs'].items():
    mentions = [(m['text'], m['sentNum'], m['startIndex']) for m in chain]  # startIndex: 1-based token index
    print(f"Chain {chain_id}: {mentions}")

# Illustrative output (chain IDs and the exact chains depend on the CoreNLP version):
# Chain 0: [('Mary', 1, 1), ('she', 1, 3), ('She', 2, 1), ('her', 2, 4)]
# Chain 1: [('the NLP project', 1, 6), ('the data preprocessing step', 2, 8)]
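
Each chain in the server's JSON is a list of mention dictionaries, and one mention per chain is flagged as the representative. As a rough sketch, the chains can be regrouped under that representative mention. `summarize_chains` is a hypothetical helper, and `sample_corefs` is hand-written to mirror the fields used above (`text`, `sentNum`, `startIndex`, `isRepresentativeMention`); it is not real server output.

```python
from typing import Any, Dict, List, Tuple

Mention = Dict[str, Any]


def summarize_chains(corefs: Dict[str, List[Mention]]) -> Dict[str, List[Tuple[str, int, int]]]:
    """Map each chain's representative text to its other mentions."""
    summary = {}
    for chain in corefs.values():
        if len(chain) < 2:
            continue  # a single mention has nothing to corefer with
        # Fall back to the first mention if no representative flag is set.
        rep = next((m for m in chain if m.get("isRepresentativeMention")), chain[0])
        summary[rep["text"]] = [
            (m["text"], m["sentNum"], m["startIndex"]) for m in chain if m is not rep
        ]
    return summary


# Hand-written data shaped like result['corefs'], for illustration only.
sample_corefs = {
    "1": [
        {"text": "Mary", "sentNum": 1, "startIndex": 1, "isRepresentativeMention": True},
        {"text": "she", "sentNum": 1, "startIndex": 3, "isRepresentativeMention": False},
        {"text": "She", "sentNum": 2, "startIndex": 1, "isRepresentativeMention": False},
        {"text": "her", "sentNum": 2, "startIndex": 4, "isRepresentativeMention": False},
    ],
    "2": [
        {"text": "Friday", "sentNum": 1, "startIndex": 10, "isRepresentativeMention": True},
    ],
}

print(summarize_chains(sample_corefs))
# {'Mary': [('she', 1, 3), ('She', 2, 1), ('her', 2, 4)]}
```

Keying by representative text is a simplification: two chains with the same representative string would collide, so keying by chain id may be safer for longer documents.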

Sentiment Analysis

CoreNLP’s sentiment analyzer scores each sentence on a 5-point scale (Very Negative to Very Positive):

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')

reviews = [
    "The new model is absolutely brilliant and works flawlessly.",
    "Terrible performance, crashes constantly, completely unusable.",
    "The library works fine, documentation could be better."
]

sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}

for review in reviews:
    result = nlp.annotate(review, properties={
        'annotators': 'tokenize,ssplit,pos,parse,sentiment',  # sentiment runs on parse trees
        'outputFormat': 'json'
    })
    for sentence in result["sentences"]:
        score = int(sentence["sentimentValue"])
        print(f"[{sentiment_map[score]}] {review}")
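
Because the score is per sentence, a multi-sentence review yields several values. One simple way to get a single review-level label, sketched here as an assumption rather than anything CoreNLP does itself, is to average the sentence scores and round to the nearest class. `review_sentiment` is a hypothetical helper, and `sample_response` is hand-written to mirror the `sentences` / `sentimentValue` fields read in the loop above.

```python
from typing import Any, Dict, Tuple

LABELS = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}


def review_sentiment(response: Dict[str, Any]) -> Tuple[float, str]:
    """Average the per-sentence scores and map the rounded mean to a label."""
    # sentimentValue is read with int() because the example above does the same.
    scores = [int(s["sentimentValue"]) for s in response["sentences"]]
    if not scores:
        raise ValueError("response contains no sentences")
    average = sum(scores) / len(scores)
    return average, LABELS[round(average)]


# Hand-written data shaped like a server response, for illustration only.
sample_response = {
    "sentences": [
        {"sentimentValue": "4"},
        {"sentimentValue": "3"},
        {"sentimentValue": "1"},
    ]
}

average, label = review_sentiment(sample_response)
print(f"{average:.2f} -> {label}")  # 2.67 -> Positive
```

Averaging hides mixed reviews (a very positive and a very negative sentence average to neutral), so keeping the per-sentence scores alongside the aggregate is often worthwhile.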

CoreNLP Annotators Reference

Annotator	What it does	Requires
`tokenize`	Split text into tokens	—
`ssplit`	Sentence splitting	tokenize
`pos`	POS tagging	tokenize, ssplit
`lemma`	Lemmatization	pos
`ner`	Named entity recognition	pos, lemma
`depparse`	Dependency parsing	pos
`parse`	Constituency parsing	tokenize, ssplit
`coref`/`dcoref`	Coreference resolution	ner, parse (`coref` can use depparse instead)
`sentiment`	Sentiment per sentence	pos, parse
`openie`	Open information extraction	depparse, natlog
`kbp`	Relation extraction	ner, parse, coref
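
Since requirements chain (for example, `ner` needs `lemma`, which needs `pos`), it can help to expand a target annotator into a full, ordered `annotators` string. The sketch below does this with a depth-first walk. `REQUIRES` and `expand_annotators` are hypothetical names, and the requirement map is hard-coded from my reading of the CoreNLP documentation, so check it against the version you run.

```python
from typing import Dict, Iterable, List

# Assumed direct requirements for a subset of annotators (verify for your version).
REQUIRES: Dict[str, List[str]] = {
    "tokenize": [],
    "ssplit": ["tokenize"],
    "pos": ["tokenize", "ssplit"],
    "lemma": ["pos"],
    "ner": ["pos", "lemma"],
    "parse": ["tokenize", "ssplit"],
    "depparse": ["pos"],
    "dcoref": ["ner", "parse"],
    "sentiment": ["pos", "parse"],
}


def expand_annotators(targets: Iterable[str]) -> str:
    """Return a comma-separated list in which every requirement precedes its user."""
    ordered: List[str] = []

    def visit(name: str) -> None:
        if name in ordered:
            return
        for dependency in REQUIRES[name]:  # the map is assumed to be acyclic
            visit(dependency)
        ordered.append(name)

    for target in targets:
        visit(target)
    return ",".join(ordered)


print(expand_annotators(["dcoref"]))     # tokenize,ssplit,pos,lemma,ner,parse,dcoref
print(expand_annotators(["sentiment"]))  # tokenize,ssplit,pos,parse,sentiment
```

The resulting string can be passed as the `annotators` property in the `nlp.annotate` calls shown earlier.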

CoreNLP vs spaCy vs Stanza

Feature	CoreNLP (Java)	spaCy	Stanza
Language	Java (Python via server)	Python	Python
Speed	Slow	Fast	Medium
Coreference	Excellent	Limited	Limited
Languages	8	20+	70+
Ease of use	Complex	Easy	Easy
Accuracy	High	High	High
Streaming corpora	No	Yes	Yes

CoreNLP is the right choice when you need its coreference resolution or relation extraction annotators and the other libraries do not cover your use case. For standard NLP tasks, Stanza offers Stanford’s neural pipeline with a much simpler Python interface.