💬 Natural Language Processing 40 guides · updated 2026

From tokenisation and embeddings to transformer-based language understanding — the NLP fundamentals that underpin every modern LLM.

Part-of-Speech Tagging in NLP

Part-of-speech (POS) tagging assigns a grammatical label — noun, verb, adjective, adverb, and so on — to each word in a sentence. It’s one of the foundational steps in understanding the structure of language.


Why POS Tagging Matters

The same word can play different grammatical roles, and its meaning changes with them: "book" is a noun in "I read a book" but a verb in "Please book a flight".

Without POS tags, a system can’t disambiguate these. Downstream tasks such as lemmatization, named entity recognition, syntactic parsing and information extraction all benefit from accurate POS labels.
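The simplest possible tagger shows why context matters. A most-frequent-tag (unigram) baseline memorises each word's commonest tag in its training data and ignores the neighbouring words, so an ambiguous word always gets the same label. The sketch below uses a tiny hand-tagged corpus invented for illustration; train_unigram and tag_unigram are hypothetical helpers, not library functions.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus (Penn Treebank tags), invented for this illustration.
TRAIN = [
    [("I", "PRP"), ("read", "VBD"), ("a", "DT"), ("book", "NN")],
    [("The", "DT"), ("book", "NN"), ("was", "VBD"), ("long", "JJ")],
    [("Please", "UH"), ("book", "VB"), ("a", "DT"), ("flight", "NN")],
]


def train_unigram(sentences):
    """Map each (lower-cased) word to the tag it carries most often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_unigram(model, tokens, default="NN"):
    """Tag every token in isolation; unseen words fall back to `default`."""
    return [(tok, model.get(tok.lower(), default)) for tok in tokens]


model = train_unigram(TRAIN)
print(tag_unigram(model, ["book", "a", "flight"]))
# [('book', 'NN'), ('a', 'DT'), ('flight', 'NN')]  -- "book" should be VB here
```

Because "book" was seen twice as NN and once as VB, the baseline labels it NN everywhere. Taggers that use features of the surrounding words, such as NLTK's averaged perceptron or spaCy's neural models, have the information needed to choose VB in this sentence.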


Common Tag Sets

A selection of Penn Treebank POS tags, the most widely used tag set in English NLP:

Tag   Description                               Examples
NN    Noun, singular or mass                    "dog", "city"
NNS   Noun, plural                              "dogs", "cities"
NNP   Proper noun, singular                     "London", "Alice"
VB    Verb, base form                           "run", "think"
VBD   Verb, past tense                          "ran", "thought"
VBG   Verb, gerund or present participle        "running", "thinking"
JJ    Adjective                                 "fast", "beautiful"
JJR   Adjective, comparative                    "faster", "prettier"
RB    Adverb                                    "quickly", "very"
DT    Determiner                                "the", "a", "an"
IN    Preposition or subordinating conjunction  "in", "on", "of"
CC    Coordinating conjunction                  "and", "but", "or"
PRP   Personal pronoun                          "I", "he", "they"

POS Tagging with NLTK

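import nltk
from nltk import pos_tag
from nltk.tokenize import word_tokenize
# One-time downloads: the averaged perceptron tagger model and the Punkt tokenizer data
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('punkt_tab')
text = "LLMs have revolutionized how developers build language-aware applications."
tokens = word_tokenize(text)   # split the sentence into word tokens
tagged = pos_tag(tokens)       # list of (word, Penn Treebank tag) pairs
print(tagged)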
# [('LLMs', 'NNS'), ('have', 'VBP'), ('revolutionized', 'VBN'),
# ('how', 'WRB'), ('developers', 'NNS'), ('build', 'VBP'),
# ('language-aware', 'JJ'), ('applications', 'NNS'), ('.', '.')]

Extract only the nouns:

nouns = [word for word, tag in tagged if tag.startswith('NN')]
print(nouns) # ['LLMs', 'developers', 'applications']
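The same prefix test works for any tag family, because related Penn Treebank tags share a prefix (NN*, VB*, JJ*, RB*). A minimal sketch with a hypothetical helper, words_with_tags; the (word, tag) pairs are copied from the NLTK output above so it runs without downloading any models:

```python
# (word, tag) pairs copied from the NLTK output shown earlier.
tagged = [('LLMs', 'NNS'), ('have', 'VBP'), ('revolutionized', 'VBN'),
          ('how', 'WRB'), ('developers', 'NNS'), ('build', 'VBP'),
          ('language-aware', 'JJ'), ('applications', 'NNS'), ('.', '.')]


def words_with_tags(tagged, prefixes):
    """Return words whose Penn Treebank tag starts with any of the given prefixes."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of candidates
    return [word for word, tag in tagged if tag.startswith(prefixes)]


print(words_with_tags(tagged, ["NN"]))        # ['LLMs', 'developers', 'applications']
print(words_with_tags(tagged, ["VB"]))        # ['have', 'revolutionized', 'build']
print(words_with_tags(tagged, ["JJ", "RB"]))  # ['language-aware']
```

Note that "how" (WRB) is not returned for the "RB" prefix: prefix matching only groups tags that genuinely start with the same letters.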

POS Tagging with spaCy

spaCy provides both the Penn Treebank tag (token.tag_) and a simpler universal tag (token.pos_):

import spacy
# Requires the small English pipeline: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
text = "The new model released in 2025 outperforms its predecessors significantly."
doc = nlp(text)
# Columns: token text, universal tag (pos_), Penn Treebank tag (tag_), readable gloss of tag_
print(f"{'Token':<15} {'POS':<8} {'Tag':<8} {'Explanation'}")
print("-" * 55)
for token in doc:
    print(f"{token.text:<15} {token.pos_:<8} {token.tag_:<8} {spacy.explain(token.tag_)}")
# Output (abridged: rows for "in", "its" and "." are omitted; exact tags may vary by model version)
# Token           POS      Tag      Explanation
# -------------------------------------------------------
# The             DET      DT       determiner
# new             ADJ      JJ       adjective (English), other noun-modifier (Chinese)
# model           NOUN     NN       noun, singular or mass
# released        VERB     VBN      verb, past participle
# 2025            NUM      CD       cardinal number
# outperforms     VERB     VBZ      verb, 3rd person singular present
# predecessors    NOUN     NNS      noun, plural
# significantly   ADV      RB       adverb

Universal POS Tags

When working with multilingual models, the Universal Dependencies (UD) POS tags offer a consistent set of 17 coarse tags across languages:

ADJ    ADP    ADV    AUX    CCONJ  DET
INTJ   NOUN   NUM    PART   PRON   PROPN
PUNCT  SCONJ  SYM    VERB   X

spaCy’s token.pos_ returns these universal tags.
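Converting fine-grained Penn Treebank tags to universal tags is mostly a lookup, which is why tools can offer both views. The sketch below is a simplified, assumed mapping covering the tags listed earlier, with one fixed target per tag; real converters also use context, since IN can be ADP ("in the box") or SCONJ ("because"), and a VB* tag can be AUX ("have" in "have revolutionized"). PTB_TO_UPOS and to_universal are hypothetical names.

```python
# Simplified Penn Treebank -> Universal POS mapping (assumption: one fixed target per tag).
PTB_TO_UPOS = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "PROPN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "RB": "ADV", "DT": "DET",
    "IN": "ADP",  # could also be SCONJ, depending on context
    "CC": "CCONJ", "PRP": "PRON", "CD": "NUM", ".": "PUNCT",
}


def to_universal(tagged):
    """Convert (word, PTB tag) pairs to (word, UPOS) pairs; unmapped tags become X."""
    return [(word, PTB_TO_UPOS.get(tag, "X")) for word, tag in tagged]


print(to_universal([("The", "DT"), ("model", "NN"), ("outperforms", "VBZ"),
                    ("rivals", "NNS"), (".", ".")]))
# [('The', 'DET'), ('model', 'NOUN'), ('outperforms', 'VERB'), ('rivals', 'NOUN'), ('.', 'PUNCT')]
```

Falling back to X ("other") keeps the output within the universal tag set when a tag is missing from this partial table.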


Real-World Applications

Keyword extraction — filter for nouns and proper nouns, skipping stop words:

keywords = [token.text for token in doc
            if token.pos_ in ('NOUN', 'PROPN') and not token.is_stop]
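Counting how often each noun appears turns that filter into a ranked keyword list. A sketch on hypothetical, hand-tagged (word, universal tag) pairs, which in practice would come from token.text and token.pos_; the tiny STOP_WORDS set is an illustrative stand-in for the much larger list behind spaCy's token.is_stop:

```python
from collections import Counter

# Hypothetical (word, universal tag) pairs, as token.text / token.pos_ might produce.
tagged = [("The", "DET"), ("model", "NOUN"), ("beats", "VERB"), ("the", "DET"),
          ("old", "ADJ"), ("model", "NOUN"), ("on", "ADP"), ("GLUE", "PROPN"), (".", "PUNCT")]

# Tiny illustrative stop-word list (an assumption, not spaCy's list).
STOP_WORDS = {"the", "a", "an", "on", "of"}


def top_keywords(tagged, n=3):
    """Count nouns and proper nouns (minus stop words) and return the n most common."""
    counts = Counter(word.lower() for word, pos in tagged
                     if pos in ("NOUN", "PROPN") and word.lower() not in STOP_WORDS)
    return counts.most_common(n)


print(top_keywords(tagged))  # [('model', 2), ('glue', 1)]
```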

Sentiment-aware adjective extraction:

adjectives = [token.text for token in doc if token.pos_ == 'ADJ']  # candidate opinion words
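Adjectives are more useful for sentiment when you know what they describe. A simple tag-based sketch, assuming hypothetical hand-tagged input, pairs each adjective with a noun that immediately follows it; adjective_noun_pairs is a hypothetical helper:

```python
def adjective_noun_pairs(tagged):
    """Pair each adjective with the noun (or proper noun) that immediately follows it."""
    return [(w1, w2) for (w1, p1), (w2, p2) in zip(tagged, tagged[1:])
            if p1 == "ADJ" and p2 in ("NOUN", "PROPN")]


tagged = [("A", "DET"), ("fast", "ADJ"), ("model", "NOUN"), ("with", "ADP"),
          ("terrible", "ADJ"), ("latency", "NOUN"), ("is", "AUX"), ("useless", "ADJ")]
print(adjective_noun_pairs(tagged))  # [('fast', 'model'), ('terrible', 'latency')]
```

Predicative adjectives such as "useless" (after "is") are missed by this adjacency rule; catching them needs the dependency parse, for example spaCy's token.head.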

Coreference — identify pronouns to resolve:

pronouns = [token.text for token in doc if token.pos_ == 'PRON']
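Finding the pronouns is only the first step; each one still has to be linked to what it refers to. The sketch below is a deliberately naive recency heuristic (link each pronoun to the closest preceding noun), shown only to make the role of POS tags concrete; naive_antecedents is a hypothetical helper, and real coreference systems use learned models plus agreement in number and gender.

```python
def naive_antecedents(tagged):
    """Link each pronoun to the closest preceding noun (a deliberately naive heuristic)."""
    links, last_noun = [], None
    for word, pos in tagged:
        if pos in ("NOUN", "PROPN"):
            last_noun = word
        elif pos == "PRON":
            links.append((word, last_noun))
    return links


tagged = [("Alice", "PROPN"), ("trained", "VERB"), ("the", "DET"), ("model", "NOUN"),
          ("and", "CCONJ"), ("it", "PRON"), ("converged", "VERB")]
print(naive_antecedents(tagged))  # [('it', 'model')]
```

Replace "it" with "she" and the heuristic still answers "model", which is exactly the kind of error agreement features are meant to prevent.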

Grammar-based chunking — extract noun phrases:

for chunk in doc.noun_chunks:
    print(chunk.text, "→", chunk.root.pos_)  # phrase text and the POS of its head word
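spaCy's doc.noun_chunks is built on its dependency parse. Noun phrases can also be found from POS tags alone with a tag pattern, as in the NLTK RegexpParser grammar NP: {<DT>?<JJ.*>*<NN.*>+}. A hand-rolled sketch of that same pattern over Penn Treebank tags (chunk_noun_phrases is a hypothetical name):

```python
def chunk_noun_phrases(tagged):
    """Greedy chunker for the pattern: optional DT, any JJ*, then one or more NN* tags."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1].startswith("JJ"):  # adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):  # nouns
            k += 1
        if k > j:  # at least one noun closes the phrase
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks


tagged = [("The", "DT"), ("new", "JJ"), ("model", "NN"), ("outperforms", "VBZ"),
          ("its", "PRP$"), ("older", "JJR"), ("predecessors", "NNS"), (".", ".")]
print(chunk_noun_phrases(tagged))  # ['The new model', 'older predecessors']
```

Because PRP$ is not in the pattern, "its" is left out of the second phrase, whereas a parse-based chunker would normally include it.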

POS Tagging Accuracy in 2025

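Modern neural taggers built into spaCy, Stanza, and Hugging Face pipelines reach roughly 97–98% token-level accuracy on standard English benchmarks such as the Wall Street Journal section of the Penn Treebank. Accuracy drops on text that differs from the data a tagger was trained on, for example informal or highly specialized writing.

For specialized domains, fine-tuning a small transformer model on in-domain annotated data typically gives the largest gains.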
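The figures above are token-level accuracy: the share of tokens whose predicted tag matches a human-annotated gold tag. When checking a tagger on your own domain, a sketch like the following (hypothetical helpers, toy tags) computes the score and lists the most frequent confusions, which hint at where extra annotated data would help:

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Token-level accuracy: share of tokens whose predicted tag equals the gold tag."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and the same length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def confusions(gold, predicted):
    """Count (gold, predicted) tag pairs for every token the tagger got wrong."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)


# Hypothetical gold and predicted tags for "The tagger runs fast ."
gold = ["DT", "NN", "VBZ", "RB", "."]
predicted = ["DT", "NN", "NNS", "RB", "."]
print(tagging_accuracy(gold, predicted))          # 0.8
print(confusions(gold, predicted).most_common())  # [(('VBZ', 'NNS'), 1)]
```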