
💬 Natural Language Processing 40 guides · updated 2026

From tokenisation and embeddings to transformer-based language understanding — the NLP fundamentals that underpin every modern LLM.

Stanford CoreNLP

Stanford CoreNLP is a comprehensive Java-based NLP toolkit developed at Stanford University. It provides a wide range of linguistic annotations including tokenization, POS tagging, NER, coreference resolution, sentiment analysis, dependency parsing, and more — all in a single integrated pipeline.


Two Ways to Use CoreNLP from Python

Option 1: Stanza, the Stanford NLP Group's official Python library (recommended). It runs its own neural models natively in Python and also ships a client for the Java CoreNLP server.

Option 2: CoreNLP Server + pycorenlp (the py-corenlp project). Run the Java CoreNLP server and call it from Python over its HTTP API.
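
Option 2 comes down to an HTTP POST against the running server. As a minimal sketch of what a client wrapper does, the snippet below builds such a request with only the standard library and does not send it. The helper name `build_corenlp_request` and the default `base_url` are assumptions made for illustration; the `properties` query parameter follows the CoreNLP server's documented convention.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_corenlp_request(text, annotators, base_url="http://localhost:9000"):
    """Build (but do not send) a POST request for a running CoreNLP server."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    # The server reads its settings from a JSON object in the `properties` query parameter.
    query = urlencode({"properties": json.dumps(properties)})
    # The raw text travels in the request body, encoded as UTF-8.
    return Request(f"{base_url}/?{query}", data=text.encode("utf-8"), method="POST")


request = build_corenlp_request("Mary said she would finish by Friday.", "tokenize,ssplit,pos")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.loads() on the response body.
```

Building the request separately from sending it keeps the sketch runnable without a server and makes the URL easy to inspect when a call misbehaves.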


Stanza is the Stanford NLP Group's Python library. It is built on PyTorch, uses its own neural models rather than CoreNLP's Java models, and supports 70+ languages:

pip install stanza

import stanza

# Download English models
stanza.download('en')
nlp = stanza.Pipeline('en', processors='tokenize,mwt,pos,lemma,depparse,ner')
text = "Apple CEO Tim Cook announced the new Mac Studio at the company's Cupertino headquarters in March 2025."
doc = nlp(text)

# Tokens and POS
print("=== POS Tags ===")
for sent in doc.sentences:
    for word in sent.words:
        print(f"{word.text:<20} pos: {word.pos:<8} lemma: {word.lemma}")

# Named Entities
print("\n=== Named Entities ===")
for ent in doc.ents:
    print(f"{ent.text:<25} type: {ent.type}")

# Dependency Parse
print("\n=== Dependencies ===")
for sent in doc.sentences:
    for word in sent.words:
        # word.head is 1-based; 0 marks the root of the sentence
        head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
        print(f"{word.text:<20} deprel: {word.deprel:<10} head: {head}")
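
The head lookup in the dependency loop can also be done on plain dictionaries, which is handy once a parse has been serialised. The sketch below assumes each sentence is a list of word dicts with `id`, `text`, `head` and `deprel` keys (the shape assumed here for one entry of Stanza's `doc.to_dict()`). The function name `dependency_edges` and the sample sentence are hypothetical, and the sample is hand-written rather than real parser output.

```python
def dependency_edges(sentence):
    """Return (dependent, relation, head) triples for one sentence.

    `sentence` is a list of word dicts with 'id', 'text', 'head' and 'deprel'
    keys, the shape assumed here for one entry of Stanza's doc.to_dict().
    """
    # Map 1-based word ids to their surface text for head lookups.
    by_id = {w["id"]: w["text"] for w in sentence if isinstance(w["id"], int)}
    edges = []
    for w in sentence:
        if "head" not in w:
            continue  # entries without a head (e.g. multi-word token ranges) are skipped
        head = by_id[w["head"]] if w["head"] > 0 else "ROOT"
        edges.append((w["text"], w["deprel"], head))
    return edges


# Hand-written sample, not real parser output.
sample = [
    {"id": 1, "text": "Tim", "head": 3, "deprel": "nsubj"},
    {"id": 2, "text": "quietly", "head": 3, "deprel": "advmod"},
    {"id": 3, "text": "left", "head": 0, "deprel": "root"},
]
for dep, rel, head in dependency_edges(sample):
    print(f"{dep:<10} --{rel}--> {head}")
```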

Multilingual Processing with Stanza

import stanza

# Process multiple languages with the same interface
languages = {
    'en': "The transformer model achieved state-of-the-art results.",
    'fr': "Le modèle de transformateur a obtenu des résultats de pointe.",
    'de': "Das Transformer-Modell erzielte modernste Ergebnisse.",
    'zh': "变换器模型取得了最先进的结果。"
}

for lang_code, text in languages.items():
    stanza.download(lang_code, verbose=False)
    nlp = stanza.Pipeline(lang_code, verbose=False)
    doc = nlp(text)
    tokens = [word.text for sent in doc.sentences for word in sent.words]
    print(f"{lang_code}: {tokens}")
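
The loop above builds a fresh pipeline on every iteration, which is fine for a one-off demo but costly when texts in the same language arrive repeatedly. One possible approach, sketched here, is to cache pipelines per language code. `make_pipeline_cache`, `_load_stanza_pipeline` and `fake_loader` are hypothetical names introduced for this illustration; the stand-in loader lets the caching behaviour be checked without downloading any models.

```python
from functools import lru_cache


def _load_stanza_pipeline(lang_code):
    # Imported lazily so the caching logic can be exercised without Stanza installed.
    import stanza
    stanza.download(lang_code, verbose=False)
    return stanza.Pipeline(lang_code, verbose=False)


def make_pipeline_cache(loader=_load_stanza_pipeline, maxsize=8):
    """Return a function that builds each language's pipeline at most once."""
    return lru_cache(maxsize=maxsize)(loader)


# Demonstration with a stand-in loader (hypothetical) that records its calls.
calls = []


def fake_loader(lang_code):
    calls.append(lang_code)
    return f"pipeline-{lang_code}"


get_pipeline = make_pipeline_cache(fake_loader)
for lang_code in ["en", "fr", "en", "en"]:
    get_pipeline(lang_code)
print(calls)  # ['en', 'fr']
```

With the default loader, `make_pipeline_cache()` would return a function that downloads and constructs each Stanza pipeline on first use only. The `maxsize` bound is there because loaded pipelines hold their models in memory.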

Coreference Resolution

Coreference resolution identifies when multiple mentions in a text refer to the same entity — one of CoreNLP’s most distinctive features:

# Run CoreNLP server (requires Java 8+)
# Download from: https://stanfordnlp.github.io/CoreNLP/
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-port 9000 -timeout 15000

# pip install pycorenlp
from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
text = """
Mary said she would finish the NLP project by Friday.
She mentioned that her team had already completed the data preprocessing step.
"""
# dcoref relies on constituency parses, so 'parse' is listed explicitly
result = nlp.annotate(text, properties={
    'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,dcoref',
    'outputFormat': 'json',
    'timeout': 15000
})

print("Coreference chains:")
for chain_id, chain in result['corefs'].items():
    # 'position' is [sentence number, mention number within that sentence]
    mentions = [(m['text'], m['sentNum'], m['position'][1]) for m in chain]
    print(f"Chain {chain_id}: {mentions}")
# The mentions of Mary ('Mary', 'she', 'She', 'her') should land in a single chain.
# Chain IDs and position values depend on the CoreNLP version and models.
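
Raw chains are easier to read once each one is reduced to its representative mention plus the expressions that refer back to it. The sketch below assumes the shape of `result['corefs']` used above, where each mention dict carries a `text` field and an `isRepresentativeMention` flag. The function name `summarize_chains` is hypothetical, and the sample data is hand-written, not real server output.

```python
def summarize_chains(corefs):
    """Map each chain's representative mention to its other mentions."""
    summary = {}
    for chain in corefs.values():
        # Fall back to the first mention if none is flagged as representative.
        rep = next((m for m in chain if m.get("isRepresentativeMention")), chain[0])
        others = [m["text"] for m in chain if m is not rep]
        if others:  # skip singleton chains
            summary[rep["text"]] = others
    return summary


# Hand-written sample in the assumed shape of result['corefs']; not real server output.
sample_corefs = {
    "1": [
        {"text": "Mary", "sentNum": 1, "isRepresentativeMention": True},
        {"text": "she", "sentNum": 1, "isRepresentativeMention": False},
        {"text": "her", "sentNum": 2, "isRepresentativeMention": False},
    ],
    "2": [{"text": "Friday", "sentNum": 1, "isRepresentativeMention": True}],
}
print(summarize_chains(sample_corefs))  # {'Mary': ['she', 'her']}
```

Keying the summary by mention text keeps the output compact, at the cost of merging two chains whose representative mentions happen to share the same text. Keying by chain ID avoids that if it matters.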

Sentiment Analysis

CoreNLP’s sentiment analyzer scores each sentence on a 5-point scale (Very Negative to Very Positive):

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
reviews = [
    "The new model is absolutely brilliant and works flawlessly.",
    "Terrible performance, crashes constantly, completely unusable.",
    "The library works fine, documentation could be better."
]
sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}

for review in reviews:
    # The sentiment model scores constituency trees, so its prerequisites are listed explicitly
    result = nlp.annotate(review, properties={
        'annotators': 'tokenize,ssplit,pos,parse,sentiment',
        'outputFormat': 'json'
    })
    for sentence in result["sentences"]:
        # sentimentValue arrives as a string from "0" to "4"
        score = int(sentence["sentimentValue"])
        print(f"[{sentiment_map[score]}] {review}")
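
Because scores are per sentence, a multi-sentence review needs some aggregation step. The sketch below shows one simple option: the mean score plus a count of labels. The function name `summarize_sentiment` is hypothetical and the sample result is hand-written. The type check reflects the assumption that pycorenlp returns the raw response text when the server's reply is not valid JSON, for example after a timeout.

```python
from collections import Counter

LABELS = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}


def summarize_sentiment(result):
    """Return (mean score, label counts) over all sentences in a JSON result."""
    if not isinstance(result, dict):
        # Assumed behaviour: a non-JSON reply comes back as plain text.
        raise ValueError(f"Unexpected server response: {str(result)[:80]}")
    scores = [int(s["sentimentValue"]) for s in result["sentences"]]
    if not scores:
        raise ValueError("No sentences in result")
    counts = Counter(LABELS[s] for s in scores)
    return sum(scores) / len(scores), counts


# Hand-written sample in the shape used above; not real server output.
sample = {"sentences": [{"sentimentValue": "3"}, {"sentimentValue": "1"}, {"sentimentValue": "3"}]}
mean, counts = summarize_sentiment(sample)
print(f"{mean:.2f}", dict(counts))
```

A plain mean treats every sentence equally. Weighting by sentence length or taking the most extreme score are equally defensible choices depending on the application.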

CoreNLP Annotators Reference

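| Annotator | What it does | Requires |
| --- | --- | --- |
| tokenize | Split text into tokens | (none) |
| ssplit | Sentence splitting | tokenize |
| pos | POS tagging | tokenize, ssplit |
| lemma | Lemmatization | pos |
| ner | Named entity recognition | pos, lemma |
| parse | Constituency parsing | tokenize, ssplit |
| depparse | Dependency parsing | pos |
| coref/dcoref | Coreference resolution | ner, parse (dcoref); ner, parse or depparse (coref) |
| sentiment | Sentiment per sentence | pos, parse |
| openie | Open information extraction | depparse, natlog |
| kbp | Relation extraction | ner |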
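
The Requires column is effectively a dependency graph, so a full annotator list can be derived from the annotators you actually want. The sketch below walks that graph depth-first for a subset of the table (tokenize through depparse). `REQUIRES` and `resolve_annotators` are hypothetical names for this illustration, not part of CoreNLP, and the code assumes the graph has no cycles.

```python
# Direct prerequisites, transcribed from part of the table above.
REQUIRES = {
    "tokenize": [],
    "ssplit": ["tokenize"],
    "pos": ["tokenize", "ssplit"],
    "lemma": ["pos"],
    "ner": ["pos", "lemma"],
    "depparse": ["pos"],
}


def resolve_annotators(wanted, requires=REQUIRES):
    """Return `wanted` plus all prerequisites, each listed after what it needs."""
    ordered = []

    def visit(name):
        if name in ordered:
            return
        for dep in requires[name]:  # raises KeyError for unknown annotators
            visit(dep)
        ordered.append(name)

    for name in wanted:
        visit(name)
    return ordered


print(",".join(resolve_annotators(["ner", "depparse"])))
# tokenize,ssplit,pos,lemma,ner,depparse
```

The resulting string can be passed as the `annotators` property in the requests shown earlier.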

CoreNLP vs spaCy vs Stanza

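| Feature | CoreNLP (Java) | spaCy | Stanza |
| --- | --- | --- | --- |
| Implementation language | Java (Python via server) | Python | Python |
| Speed | Slow | Fast | Medium |
| Coreference | Excellent | Limited | Limited |
| Supported languages | 8 | 20+ | 70+ |
| Ease of use | Complex | Easy | Easy |
| Accuracy | High | High | High |
| Streaming corpora | No | Yes | Yes |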

CoreNLP is the right choice when you need coreference resolution or relation extraction that other libraries lack. For standard NLP tasks, Stanza provides Stanford's own neural models behind a much simpler Python interface.