Dependency Parsing in NLP
Dependency parsing maps the grammatical relationships between words in a sentence. Every word except the root connects to a head word through a labeled relation such as subject, object, or modifier.
How Dependency Parsing Works
In a dependency parse, the sentence has a single root, usually the main verb, and every other word attaches to exactly one head:
"The researcher published a new paper on transformers."
    published (ROOT)
    ├── researcher (nsubj)
    │   └── The (det)
    ├── paper (dobj)
    │   ├── a (det)
    │   ├── new (amod)
    │   └── on (prep)
    │       └── transformers (pobj)
    └── . (punct)

Each edge carries a dependency relation: nsubj (nominal subject), dobj (direct object), det (determiner), prep (prepositional modifier), pobj (object of preposition), amod (adjectival modifier), punct (punctuation).
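The same structure can be stored as one head index and one label per token. The sketch below is hand-annotated for the example sentence rather than produced by a parser, and the names `PARSE`, `children_of`, `find_root`, and `tree_lines` are hypothetical helpers introduced only for illustration.

```python
# Hand-annotated parse of the example sentence (an illustration, not parser output).
# Each entry is (word, head_index, label); head_index -1 marks the root.
PARSE = [
    ("The", 1, "det"),
    ("researcher", 2, "nsubj"),
    ("published", -1, "ROOT"),
    ("a", 5, "det"),
    ("new", 5, "amod"),
    ("paper", 2, "dobj"),
    ("on", 5, "prep"),
    ("transformers", 6, "pobj"),
    (".", 2, "punct"),
]


def children_of(parse, head_index):
    """Return indices of tokens whose head is `head_index`, in sentence order."""
    return [i for i, (_, head, _) in enumerate(parse) if head == head_index]


def find_root(parse):
    """Return the index of the single root token, or raise if the parse is malformed."""
    roots = [i for i, (_, head, _) in enumerate(parse) if head == -1]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]


def tree_lines(parse, index=None, depth=0):
    """Render the tree as indented 'word (label)' lines, depth-first from the root."""
    if index is None:
        index = find_root(parse)
    word, _, label = parse[index]
    lines = ["  " * depth + f"{word} ({label})"]
    for child in children_of(parse, index):
        lines.extend(tree_lines(parse, child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(tree_lines(PARSE)))
```

Because every token stores exactly one head, the whole tree fits in a flat list, and the children of any word can be recovered by scanning for tokens that point at it.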
Core Dependency Labels
| Label | Meaning | Example |
|---|---|---|
| nsubj | Nominal subject | "Alice runs" |
| nsubjpass | Passive subject | "The paper was written" |
| dobj | Direct object | "She read the book" |
| iobj | Indirect object | "He gave her a gift" |
| prep | Prepositional modifier | "She works at Google" |
| pobj | Object of preposition | "at Google" |
| amod | Adjectival modifier | "a new model" |
| advmod | Adverbial modifier | "runs quickly" |
| det | Determiner | "the model" |
| compound | Compound modifier | "language model" |
| conj | Conjunct | "Apple and Google" |
| ROOT | Root of the sentence (usually the main verb) | "She published…" |
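The labels in this table follow the scheme used by spaCy's English models, while Universal Dependencies (UD) uses somewhat different names. The mapping below is a rough, hand-written correspondence for the labels in the table, offered as an assumption of this sketch and not as an official conversion; `SPACY_TO_UD` and `to_ud` are hypothetical names.

```python
# Rough correspondence between the spaCy English labels in the table and
# Universal Dependencies v2 labels. Hand-written for illustration only;
# this is not an official conversion table.
SPACY_TO_UD = {
    "nsubj": "nsubj",
    "nsubjpass": "nsubj:pass",
    "dobj": "obj",
    "iobj": "iobj",
    "amod": "amod",
    "advmod": "advmod",
    "det": "det",
    "compound": "compound",
    "conj": "conj",
    "ROOT": "root",
    # prep/pobj have no one-to-one counterpart: UD attaches the noun to the
    # head (as obl or nmod) and marks the preposition as its `case` dependent.
    "prep": None,
    "pobj": None,
}


def to_ud(label):
    """Return the approximate UD label, or None when the tree structure itself differs."""
    if label not in SPACY_TO_UD:
        raise KeyError(f"no mapping recorded for label {label!r}")
    return SPACY_TO_UD[label]


if __name__ == "__main__":
    for spacy_label in ("dobj", "nsubjpass", "prep"):
        print(spacy_label, "->", to_ud(spacy_label))
```

A plain label rename is enough for most rows, but prepositional phrases are attached differently in the two schemes, so converting them requires restructuring the tree and not just relabeling it.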
Dependency Parsing with spaCy
    import spacy

    nlp = spacy.load("en_core_web_sm")

    text = "OpenAI released GPT-5 which significantly outperformed previous language models."
    doc = nlp(text)

    # Print each token with its dependency label and its head word.
    for token in doc:
        print(f"{token.text:<20} dep: {token.dep_:<12} head: {token.head.text}")

    # Example output (labels can vary with the model version):
    # OpenAI               dep: nsubj        head: released
    # released             dep: ROOT         head: released
    # GPT-5                dep: dobj         head: released
    # which                dep: nsubj        head: outperformed
    # significantly        dep: advmod       head: outperformed
    # outperformed         dep: relcl        head: GPT-5
    # previous             dep: amod         head: models
    # language             dep: compound     head: models
    # models               dep: dobj         head: outperformed

Visualizing Dependency Trees
    from spacy import displacy

    doc = nlp("The model efficiently handles long-context reasoning tasks.")

    # Render inline in a Jupyter notebook.
    displacy.render(doc, style="dep", jupyter=True, options={"distance": 120})

    # For a standalone script:
    # displacy.serve(doc, style="dep")

Extracting Subject-Verb-Object Triples
    import spacy

    nlp = spacy.load("en_core_web_sm")


    def get_svo_triples(text):
        """Return subject-verb-object triples found in `text`."""
        doc = nlp(text)
        triples = []

        for token in doc:
            if token.pos_ == "VERB":
                # Subjects are taken from the verb's left children, objects from its right children.
                # Note: pobj normally attaches to a preposition, not the verb, so it rarely matches here.
                subjects = [w for w in token.lefts if w.dep_ in ("nsubj", "nsubjpass")]
                objects = [w for w in token.rights if w.dep_ in ("dobj", "pobj", "attr")]

                # Emit one triple per subject/object pair, using head words only.
                for subj in subjects:
                    for obj in objects:
                        triples.append({
                            "subject": subj.text,
                            "verb": token.lemma_,
                            "object": obj.text,
                        })

        return triples


    texts = [
        "Google acquired YouTube in 2006 for $1.65 billion.",
        "Anthropic trained Claude using constitutional AI methods.",
        "Researchers published findings that challenged existing benchmarks.",
    ]

    for t in texts:
        print(get_svo_triples(t))

Navigating the Dependency Tree
spaCy provides helpers to traverse the parse tree:
    doc = nlp("The startup's innovative NLP platform attracted significant investor attention.")

    for token in doc:
        if token.dep_ == "ROOT":
            root = token
            print(f"Root verb: {root.text}")
            print(f"Subtree: {[t.text for t in root.subtree]}")
            print(f"Left children: {[t.text for t in root.lefts]}")
            print(f"Right children: {[t.text for t in root.rights]}")

Multilingual Dependency Parsing with Stanza
Stanza supports Universal Dependencies across 70+ languages:
    import stanza

    stanza.download('en')  # one-time model download

    nlp_stanza = stanza.Pipeline('en')
    doc = nlp_stanza("She quickly analyzed the complex dataset.")

    for sent in doc.sentences:
        for word in sent.words:
            # word.head is a 1-based index into sent.words; 0 means the word is the root.
            head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
            print(f"{word.text:<15} deprel: {word.deprel:<10} head: {head}")

Real-World Use Cases
Knowledge graph construction: extract entity relationships at scale from news or scientific articles using SVO triples.
Coreference resolution: use dependency paths and grammatical roles as features when deciding which noun a pronoun refers to.
Semantic role labeling: build on dependency parses to identify “who did what to whom, when, and where.”
RAG preprocessing: annotating documents with dependency-derived facts can support structured retrieval in knowledge-intensive QA systems.
Document-level relation extraction: the attention in large language models implicitly captures dependency-like relationships, but explicit parses still help with interpretability and structured pipelines.
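Several of these use cases, relation extraction in particular, often look at the path that connects two words in the tree. The sketch below computes that path from a list of head indices. The heads are hand-annotated for illustration (not parser output), and `path_to_root`, `dependency_path`, `WORDS`, and `HEADS` are hypothetical names.

```python
def path_to_root(heads, index):
    """Return the chain of token indices from `index` up to the root (inclusive)."""
    chain = [index]
    while heads[chain[-1]] != -1:
        chain.append(heads[chain[-1]])
    return chain


def dependency_path(heads, start, end):
    """Return the token indices on the tree path from `start` to `end`."""
    up = path_to_root(heads, start)
    down = path_to_root(heads, end)
    on_down = set(down)
    # First node on start's chain that is also `end` or one of its ancestors.
    common = next(i for i in up if i in on_down)
    # Climb from `start` to the common node, then descend to `end`.
    return up[: up.index(common) + 1] + down[: down.index(common)][::-1]


# Hand-annotated heads for "Google acquired YouTube in 2006" (illustrative only).
WORDS = ["Google", "acquired", "YouTube", "in", "2006"]
HEADS = [1, -1, 1, 1, 3]  # -1 marks the root

if __name__ == "__main__":
    path = dependency_path(HEADS, 0, 4)
    print(" -> ".join(WORDS[i] for i in path))
```

With a real parser, the head list could be filled from each token's head index instead of being written by hand; the path logic itself does not depend on which library produced the parse.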