Quick Answer
Natural language processing (NLP) is the area of AI that deals with human language: reading, writing, translating, summarising, classifying, and generating text. Every product that understands or produces language uses NLP: search engines, chatbots, translation tools, voice assistants, email spam filters, and document summarisers. Large language models like ChatGPT and Claude are the most capable NLP systems available in 2026.

Figure: NLP systems process language as a continuous stream of tokens: every word, sentence, and document flows through models that extract structure and meaning from unstructured text.

Where you encounter NLP every day

NLP is one of the most widely deployed AI technologies. You interact with it constantly:

  • Search: Google interprets the meaning of your query, not just the keywords
  • Email: Gmail’s spam filter, smart reply suggestions, and category sorting
  • Translation: Google Translate, DeepL, and web page auto-translation
  • Voice assistants: Siri, Alexa, and Google Assistant all understand spoken (then transcribed) language
  • Customer support: Chatbots that understand your question and route you to the right team
  • Document processing: PDF extraction, invoice parsing, contract review
  • Social media: Content moderation, trending topic detection, ad targeting by interest

Core NLP tasks

Understanding text

  • Classification
  • Sentiment analysis
  • Named entity recognition
  • Intent detection
  • Information extraction

Transforming text

  • Translation
  • Summarisation
  • Paraphrasing
  • Text cleaning and normalisation

Generating text

  • Question answering
  • Text generation (LLMs)
  • Dialogue systems
  • Document drafting

Searching and matching

  • Semantic search
  • Document similarity
  • Embedding-based retrieval (RAG)

How NLP works: from words to numbers

Computers process numbers, not words. The first step of any NLP pipeline is converting text into numbers that capture meaning.

Tokenisation: Split text into tokens (words or word fragments)

"AI-solutions.wiki is useful" → ["AI", "-", "solutions", ".", "wiki", "is", "useful"]

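The split shown above can be reproduced with a simple regular expression. This is only a minimal sketch: production tokenisers for models such as BERT or GPT use learned subword vocabularies (for example WordPiece or byte-pair encoding) and will split text differently. The helper name `simple_tokenise` is hypothetical.

```python
import re

def simple_tokenise(text: str) -> list[str]:
    """Split text into runs of word characters and single punctuation marks."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenise("AI-solutions.wiki is useful")
print(tokens)
```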
Embeddings: Convert each token to a vector (list of numbers) that encodes meaning

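```python
from sentence_transformers import SentenceTransformer, util

# Small general-purpose embedding model (downloaded on first use)
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The meeting was cancelled",
    "The appointment was called off",  # same meaning, different words
    "The product launch went well",    # different meaning
]

# One vector per sentence; this model produces 384-dimensional embeddings
embeddings = model.encode(sentences)

# Cosine similarity: sentences 1 and 2 should score noticeably higher
# with each other than either does with sentence 3
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))
```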

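The key insight: text with similar meaning ends up close together in vector space. “King” and “Queen” sit near each other. With contextual models, “bank” (financial) and “bank” (river) receive different vectors depending on the surrounding words; older static embeddings such as Word2Vec assign a single vector per word regardless of context.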
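
"Close together" is usually measured with cosine similarity, the cosine of the angle between two vectors. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embedding models output hundreds of dimensions, and the variable names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors, NOT real model output
cancelled = [0.9, 0.1, 0.0]   # "The meeting was cancelled"
called_off = [0.8, 0.2, 0.1]  # "The appointment was called off"
launch = [0.1, 0.2, 0.9]      # "The product launch went well"

sim_same = cosine_similarity(cancelled, called_off)
sim_diff = cosine_similarity(cancelled, launch)
print(f"similar meaning:   {sim_same:.2f}")
print(f"different meaning: {sim_diff:.2f}")
```

Semantic search and embedding-based retrieval apply the same comparison at scale: embed the query, then rank stored documents by similarity to it.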

From classical NLP to modern LLMs

NLP methods have evolved dramatically:

| Era | Approach | Example |
|---|---|---|
| 1990s | Rule-based: hand-written grammars and dictionaries | Early spell checkers |
| 2000s | Statistical ML: count-based models (TF-IDF, n-grams) | Naive Bayes spam filter |
| 2010s | Word embeddings + RNNs | Word2Vec, early chatbots |
| 2017+ | Transformers | BERT (classification), GPT (generation) |
| 2022+ | Instruction-following LLMs | ChatGPT, Claude, Gemini |
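
To make the "count-based" row concrete, here is a minimal TF-IDF sketch in plain Python. It uses one common variant (relative term frequency multiplied by log(N / document frequency)); libraries such as scikit-learn apply additional smoothing, so their numbers will differ. The example documents and the function name `tf_idf` are made up for illustration.

```python
import math
from collections import Counter

# Made-up example documents
docs = [
    "win a free prize now",
    "free prize inside claim now",
    "meeting moved to friday",
]

def tf_idf(docs):
    """Return one {word: score} dict per document."""
    tokenised = [d.split() for d in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each word appears
    df = Counter(word for tokens in tokenised for word in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        scores.append({
            # term frequency * inverse document frequency
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

scores = tf_idf(docs)
# Words unique to one document ("win") outscore words shared across documents ("free")
print(scores[0])
```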

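Modern LLMs handle most classical NLP tasks (classification, summarisation, extraction, translation) as part of general instruction following. For many applications you no longer need a separate specialised model for each task; you prompt a single large model and change only the instruction. Smaller task-specific models remain useful where cost, latency, or data privacy matter.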
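
As a rough illustration of "one model, many tasks", the sketch below builds requests in the role/content message format used by chat-style LLM APIs. Only the instruction changes between tasks. It makes no API call; `TASK_INSTRUCTIONS` and `build_messages` are hypothetical names, and the instruction wording is an assumption.

```python
# Hypothetical task-to-instruction mapping; the wording is illustrative only
TASK_INSTRUCTIONS = {
    "classify": "Classify the sentiment of the text as positive, negative, or neutral.",
    "summarise": "Summarise the text in one sentence.",
    "translate": "Translate the text into French.",
}

def build_messages(task: str, text: str) -> list[dict]:
    """Build a chat-style message list: the task picks the system instruction."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"Unknown task: {task}")
    return [
        {"role": "system", "content": TASK_INSTRUCTIONS[task]},
        {"role": "user", "content": text},
    ]

# Same input text, three different tasks
for task in TASK_INSTRUCTIONS:
    messages = build_messages(task, "The delivery arrived two days early.")
    print(task, "->", messages[0]["content"])
```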

A practical NLP pipeline

```python
from openai import OpenAI

# With no arguments, the client reads the key from the OPENAI_API_KEY
# environment variable, which avoids hard-coding secrets in source files
client = OpenAI()

def analyse_customer_email(email_text):
    """Ask the model for a structured analysis; returns a JSON string."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": """Analyse the customer email and return JSON with:
                - sentiment: positive | negative | neutral
                - intent: complaint | question | cancellation | praise | other
                - urgency: high | medium | low
                - summary: one sentence max
                - suggested_action: what the support team should do"""
            },
            {"role": "user", "content": email_text}
        ],
        # JSON mode: the reply is constrained to be a valid JSON object
        response_format={"type": "json_object"}
    )
    # The content is a JSON string; use json.loads() to turn it into a dict
    return response.choices[0].message.content

result = analyse_customer_email("""
I've been waiting three weeks for my order and nobody has responded
to my previous two emails. This is completely unacceptable.
""")
# Likely result for this email: sentiment=negative, intent=complaint, urgency=high
```

Step 1: Ingest text. Receive raw text: emails, documents, social posts, support tickets, contracts. Clean and normalise it (remove HTML, fix encoding).

Step 2: Tokenise and embed. Split text into tokens. Convert to vector embeddings for search/similarity, or pass directly to an LLM for understanding.

Step 3: Apply NLP task. Classify, extract, summarise, translate, or generate. Modern LLMs can do all of these from a single API call with the right prompt.

Step 4: Use the structured output. Feed results into downstream systems: CRM, dashboards, databases, email automation, routing queues, or human review workflows.
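
Steps 1 and 4 are ordinary code around the model call. The sketch below shows one possible shape using only the standard library. The regex-based tag stripping is deliberately crude (a real pipeline would likely use an HTML parser), and the queue names and the helper names `clean_text` and `route` are hypothetical.

```python
import html
import json
import re
import unicodedata

def clean_text(raw: str) -> str:
    """Step 1: strip HTML tags, decode entities, normalise Unicode and whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)              # crude tag removal
    decoded = html.unescape(no_tags)                     # &#39; -> ', &nbsp; -> non-breaking space
    normalised = unicodedata.normalize("NFKC", decoded)  # fold compatibility characters
    return re.sub(r"\s+", " ", normalised).strip()       # collapse runs of whitespace

def route(analysis_json: str) -> str:
    """Step 4: choose a queue from the model's JSON reply (queue names are made up)."""
    analysis = json.loads(analysis_json)
    if analysis.get("urgency") == "high":
        return "priority_queue"
    if analysis.get("intent") == "cancellation":
        return "retention_team"
    return "general_queue"

email = clean_text("<p>I&#39;ve been waiting   three weeks&nbsp;for my order.</p>")
print(email)

# Hand-written stand-in for a model reply, so the sketch runs without an API call
print(route('{"sentiment": "negative", "intent": "complaint", "urgency": "high"}'))
```

Keeping routing rules in plain code, separate from the prompt, makes them easy to test and audit without calling the model.
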

| Tool | Type | Best for |
|---|---|---|
| spaCy | Python library | Named entity recognition, dependency parsing, fast rule-based NLP |
| Hugging Face Transformers | Python library | Running any open-source transformer model locally |
| sentence-transformers | Python library | Semantic search, document similarity, embeddings |
| OpenAI API | API service | General NLP via LLM: classification, summarisation, extraction |
| AWS Comprehend | Managed service | Sentiment, entities, key phrases, language detection at scale |
| Google Natural Language API | Managed service | Sentiment, entity recognition, content classification |

What’s next

Further reading