Flair NLP

Flair is a Python NLP framework, originally developed at Zalando Research, best known for its contextual string embeddings: character-level language model embeddings that capture context and handle rare words, misspellings, and subword morphology. Taggers built on these embeddings set state-of-the-art results on NER and other sequence labeling benchmarks when they were introduced.

Installation

pip install flair

The Key Innovation: Contextual String Embeddings

Traditional word embeddings give “bank” the same vector in every context. Flair’s embeddings instead come from a character-level language model: a word’s vector is computed from the characters of the sentence around it, so the same word gets a different vector in each sentence. This makes them sensitive to capitalization, context, and morphology.
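To make the idea concrete, here is a conceptual sketch in plain Python. It is not Flair's implementation. It only shows which characters each language model has read at the point where a word's vector is taken, assuming the usual scheme for contextual string embeddings: the forward model's state is read after the word's last character, and the backward model's state is read at the word's first character. The function name `lm_contexts` is hypothetical.

```python
from typing import Tuple


def lm_contexts(text: str, word: str) -> Tuple[str, str]:
    """Return the characters the forward and backward models have read for `word`."""
    start = text.index(word)  # first occurrence is enough for this illustration
    end = start + len(word)
    forward_seen = text[:end]            # read left to right, up to the word's last character
    backward_seen = text[start:][::-1]   # read right to left, down to the word's first character
    return forward_seen, backward_seen


river = "She sat on the bank of the river."
money = "She opened an account at the bank."

fwd_river, bwd_river = lm_contexts(river, "bank")
fwd_money, bwd_money = lm_contexts(money, "bank")

print(fwd_river)
print(fwd_money)
```

Because the two models have read different character sequences in each sentence, the hidden states they produce for “bank” differ, and so does the resulting embedding.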

Named Entity Recognition with Flair

from flair.data import Sentence
from flair.models import SequenceTagger

# Load pre-trained NER model (downloads automatically on first run)
tagger = SequenceTagger.load("ner")  # English NER (CoNLL-2003)

text = "Anthropic, founded by Dario Amodei and Daniela Amodei in San Francisco, released Claude 3 in March 2024."
sentence = Sentence(text)

tagger.predict(sentence)

print("Named Entities:")
for entity in sentence.get_spans("ner"):
    print(f"  {entity.text:<30} [{entity.tag}] score: {entity.score:.4f}")

# Example output (exact scores vary with the model version):
# Anthropic                      [ORG] score: 0.9994
# Dario Amodei                   [PER] score: 0.9986
# Daniela Amodei                 [PER] score: 0.9981
# San Francisco                  [LOC] score: 0.9997
# Claude 3                       [MISC] score: 0.9842
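
Predictions are often post-processed before use, for example by dropping low-confidence spans and grouping the rest by label. The sketch below does this on plain `(text, tag, score)` tuples so that it runs without Flair. The helper name `group_entities` and the 0.99 threshold are assumptions for illustration, and the scores are the illustrative values from the listing above.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group entity texts by tag, keeping only spans at or above min_score."""
    grouped = defaultdict(list)
    for text, tag, score in entities:
        if score >= min_score:
            grouped[tag].append(text)
    return dict(grouped)


# With Flair, these tuples could be built as:
#   [(e.text, e.tag, e.score) for e in sentence.get_spans("ner")]
entities = [
    ("Anthropic", "ORG", 0.9994),
    ("Dario Amodei", "PER", 0.9986),
    ("Daniela Amodei", "PER", 0.9981),
    ("San Francisco", "LOC", 0.9997),
    ("Claude 3", "MISC", 0.9842),
]

by_label = group_entities(entities, min_score=0.99)
print(by_label)
```

With this threshold the `MISC` span is dropped, because its score falls below 0.99.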

Available NER Models

from flair.models import SequenceTagger

# Standard English NER (CoNLL-2003: PER, ORG, LOC, MISC)
tagger_standard = SequenceTagger.load("ner")

# Large model (higher accuracy)
tagger_large = SequenceTagger.load("ner-large")

# Fast model (lower latency)
tagger_fast = SequenceTagger.load("ner-fast")

# Multilingual NER (trained on English, German, Dutch, and Spanish)
tagger_multi = SequenceTagger.load("ner-multi")

# Fine-grained NER (18 entity types, including dates, events, and products)
tagger_ontonotes = SequenceTagger.load("ner-ontonotes-large")

POS Tagging

from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("pos")

sentence = Sentence("Large language models generate contextually appropriate text responses.")
tagger.predict(sentence)

for token in sentence.tokens:
    print(f"{token.text:<25} {token.get_label('pos').value}")

# Large                     JJ
# language                  NN
# models                    NNS
# generate                  VBP
# contextually              RB
# appropriate               JJ
# text                      NN
# responses                 NNS
# .                         .

Stacked Embeddings

Flair’s superpower is the ability to stack multiple embedding types to combine their strengths:

from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.data import Sentence

# Combine GloVe word vectors with forward and backward Flair embeddings
stacked_embeddings = StackedEmbeddings([
    WordEmbeddings('glove'),               # Static GloVe word vectors
    FlairEmbeddings('news-forward'),       # Forward character language model
    FlairEmbeddings('news-backward'),      # Backward character language model
])

sentence = Sentence("The NLP model achieved remarkable benchmark performance.")
stacked_embeddings.embed(sentence)

for token in sentence:
    print(f"{token.text:<20} embedding dim: {token.embedding.shape}")
    # e.g. torch.Size([4196]): GloVe (100) + Flair forward (2048) + Flair backward (2048)
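
Stacking amounts to concatenating each token's vectors from the individual embedding types, so the stacked dimension is the sum of the parts. The sketch below shows this with short toy lists standing in for real embeddings. The function name `stack_vectors` is hypothetical and is not part of Flair.

```python
def stack_vectors(*vectors):
    """Concatenate per-token vectors from several embedding types into one."""
    stacked = []
    for vector in vectors:
        stacked.extend(vector)
    return stacked


# Toy vectors; real GloVe and Flair vectors are far larger.
glove_vec = [0.1, 0.2, 0.3]
flair_forward_vec = [0.5, 0.6, 0.7, 0.8]
flair_backward_vec = [0.9, 1.0, 1.1, 1.2]

token_vec = stack_vectors(glove_vec, flair_forward_vec, flair_backward_vec)
print(len(token_vec))  # 3 + 4 + 4 = 11
```

In Flair itself, the combined size should be readable from `stacked_embeddings.embedding_length`, which avoids doing the arithmetic by hand.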

Sentence-Level Embeddings

from flair.embeddings import SentenceTransformerDocumentEmbeddings
from flair.data import Sentence

# Use sentence-transformers through Flair
sentence_embedder = SentenceTransformerDocumentEmbeddings('all-MiniLM-L6-v2')

sentences = [
    Sentence("Flair achieves excellent results on NER benchmarks."),
    Sentence("Named entity recognition identifies persons and organizations."),
    Sentence("Pizza dough needs to ferment overnight in the refrigerator."),
]

for sent in sentences:
    sentence_embedder.embed(sent)
    print(f"Embedding shape: {sent.embedding.shape}")  # torch.Size([384])
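
Sentence embeddings are typically compared with cosine similarity. The sketch below implements it in plain Python on toy 3-dimensional vectors. The vectors are made up for illustration and stand in for the 384-dimensional embeddings. With Flair, a plain list could be obtained with `sent.embedding.tolist()`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors: two NER-related sentences and one unrelated sentence.
ner_a = [0.9, 0.1, 0.0]
ner_b = [0.8, 0.2, 0.1]
pizza = [0.0, 0.1, 0.9]

related = cosine_similarity(ner_a, ner_b)
unrelated = cosine_similarity(ner_a, pizza)
print(f"related: {related:.3f}, unrelated: {unrelated:.3f}")
```

With real embeddings, the two NER sentences would be expected to score higher with each other than either does with the pizza sentence.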

Text Classification with Flair

from flair.models import TextClassifier
from flair.data import Sentence

# Load sentiment classifier
classifier = TextClassifier.load("sentiment")

texts = [
    "This NLP library is incredibly powerful and easy to use!",
    "The documentation is incomplete and the examples are confusing.",
    "The performance is adequate for most basic NLP tasks."
]

for text in texts:
    sentence = Sentence(text)
    classifier.predict(sentence)
    label = sentence.labels[0].value
    score = sentence.labels[0].score
    print(f"[{label} {score:.3f}] {text[:55]}")
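
The sentiment model predicts only `POSITIVE` or `NEGATIVE`, so a lukewarm sentence like the third one still receives one of those two labels. One simple workaround, sketched below, is to treat low-confidence predictions as neutral. The helper name `with_neutral_band`, the 0.8 threshold, and the example scores are all assumptions for illustration. Low confidence is not the same as neutral sentiment, so the threshold should be validated on your own data.

```python
def with_neutral_band(label, score, threshold=0.8):
    """Keep the predicted label only when the model is confident enough."""
    return label if score >= threshold else "NEUTRAL"


# Hypothetical (label, score) pairs, as read from sentence.labels[0]
predictions = [("POSITIVE", 0.998), ("NEGATIVE", 0.995), ("POSITIVE", 0.62)]

results = [with_neutral_band(label, score) for label, score in predictions]
print(results)
```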

Training a Custom NER Model

from flair.data import Corpus
from flair.datasets import CONLL_03
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

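# Load training corpus (CoNLL-2003 format)
# Depending on your Flair version, the CoNLL-2003 files may need to be
# obtained separately and placed in Flair's dataset folder first.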
corpus: Corpus = CONLL_03()

# Define tag type
tag_type = 'ner'
tag_dictionary = corpus.make_label_dictionary(label_type=tag_type)

# Define embeddings
embeddings = StackedEmbeddings([
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward-fast'),
    FlairEmbeddings('news-backward-fast'),
])

# Create sequence tagger
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type=tag_type,
    use_crf=True    # Conditional Random Field output layer
)

# Train
trainer = ModelTrainer(tagger, corpus)
trainer.train(
    base_path='./ner-model',
    learning_rate=0.1,
    mini_batch_size=32,
    max_epochs=10
)
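
To train on your own data instead of CoNLL-2003, the annotations first need to be written in a column format with one token and its BIO tag per line. The sketch below converts token-level spans into such lines. The function name `to_bio_lines` and the span representation `(start, end_exclusive, label)` are assumptions for illustration.

```python
def to_bio_lines(tokens, spans):
    """Convert token spans (start, end_exclusive, label) to 'token TAG' lines."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"       # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # remaining tokens of the entity
    return [f"{token} {tag}" for token, tag in zip(tokens, tags)]


tokens = ["Dario", "Amodei", "works", "in", "San", "Francisco", "."]
spans = [(0, 2, "PER"), (4, 6, "LOC")]

lines = to_bio_lines(tokens, spans)
print("\n".join(lines))
```

Files written this way, with a blank line between sentences, can then be read with Flair's `ColumnCorpus` using a column mapping such as `{0: "text", 1: "ner"}`.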

Flair vs spaCy vs Hugging Face

Aspect	Flair	spaCy	Hugging Face
NER accuracy	Excellent	High	Highest
Contextual embeddings	Yes (character-level)	Via transformer pipelines	Yes (transformer-based)
Speed	Slower	Fast	Moderate
Stacking	Yes	No	No
Ease of use	Good	Excellent	Good
Model hub	Yes	Yes	Very large

Flair is a good choice when you need strong NER accuracy without writing custom model code, or when you want to experiment with stacked embeddings. For speed-critical production systems, spaCy is a good alternative: its CPU-optimized pipelines are fast, and its transformer pipelines trade some of that speed for accuracy.