What is Natural Language Processing (NLP)?
Natural language processing (NLP) is the field of AI concerned with understanding and generating human language. This plain-English guide covers how NLP works and where you encounter it.

Where you encounter NLP every day
NLP is one of the most widely deployed AI technologies. You interact with it constantly:
- Search: Google interprets the meaning of your query, not just the keywords
- Email: Gmail’s spam filter, smart reply suggestions, and category sorting
- Translation: Google Translate, DeepL, and web page auto-translation
- Voice assistants: Siri, Alexa, and Google Assistant transcribe your speech to text, then use NLP to interpret it
- Customer support: Chatbots that understand your question and route you to the right team
- Document processing: PDF extraction, invoice parsing, contract review
- Social media: Content moderation, trending topic detection, ad targeting by interest
Core NLP tasks
How NLP works: from words to numbers
Computers process numbers, not words. The first step of any NLP pipeline is converting text into numbers that capture meaning.
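A minimal sketch of the most basic conversion is to map each token to an integer ID. It assumes a simple regex tokeniser and a vocabulary built from the input text itself. `tokenise` and `build_vocab` are hypothetical helpers for illustration. Production models use learned subword tokenisers with fixed vocabularies, and the IDs only start to carry meaning once each one is mapped to an embedding vector.

```python
import re


def tokenise(text: str) -> list[str]:
    # Hypothetical tokeniser: a run of word characters, or a single punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)


def build_vocab(tokens: list[str]) -> dict[str, int]:
    # Give each distinct token an integer ID, in order of first appearance
    vocab: dict[str, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


tokens = tokenise("AI-solutions.wiki is useful")
vocab = build_vocab(tokens)
ids = [vocab[tok] for tok in tokens]
print(tokens)  # ['AI', '-', 'solutions', '.', 'wiki', 'is', 'useful']
print(ids)     # [0, 1, 2, 3, 4, 5, 6]
```

On the example sentence used in the tokenisation step, this tokeniser produces the same split. A repeated token reuses its existing ID.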
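- Tokenisation: split text into tokens (words or word fragments). The exact split depends on the tokeniser; a simple word-and-punctuation tokeniser gives:
  "AI-solutions.wiki is useful" → ["AI", "-", "solutions", ".", "wiki", "is", "useful"]
- Embeddings: convert each token to a vector (a list of numbers) that encodes its meaning. Sentence-embedding models then pool the token vectors into a single vector per sentence, as in this example using the sentence-transformers library: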
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The meeting was cancelled",
    "The appointment was called off",  # same meaning, different words
    "The product launch went well",    # different meaning
]
embeddings = model.encode(sentences)

# Pairwise cosine similarities: sentences 1 and 2 should score high,
# while sentence 3 should score low against both
similarities = util.cos_sim(embeddings, embeddings)
print(similarities)
```

The key insight: words and sentences with similar meanings end up close together in vector space. “King” and “Queen” are near each other. With contextual embeddings from transformer models, “bank” (financial) and “bank” (river) get different vectors depending on the surrounding words. Older static embeddings such as Word2Vec assign one vector per word regardless of context.
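“Close together” is usually measured with cosine similarity, the cosine of the angle between two vectors. The minimal pure-Python sketch below uses tiny hand-made 3-dimensional vectors. The numbers in `toy_vectors` are invented for illustration; real models learn embeddings with hundreds of dimensions (all-MiniLM-L6-v2 produces 384).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(theta) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy vectors for illustration, NOT real embeddings
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.05, 0.95],
}

king_queen = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
king_apple = cosine_similarity(toy_vectors["king"], toy_vectors["apple"])
print(f"king vs queen: {king_queen:.3f}")
print(f"king vs apple: {king_apple:.3f}")
```

Cosine similarity compares direction and ignores vector length, which makes it a common default for comparing embeddings.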
From classical NLP to modern LLMs
NLP methods have evolved dramatically:
| Era | Approach | Example |
|---|---|---|
| Pre-1990s | Rule-based: hand-written grammars and dictionaries | Early spell checkers |
| 1990s-2000s | Statistical ML: count-based models (TF-IDF, n-grams) | Naive Bayes spam filter |
| 2010s | Word embeddings + RNNs | Word2Vec, early chatbots |
| 2017+ | Transformers | BERT (classification), GPT (generation) |
| 2022+ | Instruction-following LLMs | ChatGPT, Claude, Gemini |
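The sketch below illustrates the count-based statistical approach with a minimal multinomial Naive Bayes spam filter. It works on bag-of-words counts with add-one (Laplace) smoothing. The four training messages are invented for illustration, and real filters train on far larger corpora. Treat it as a sketch of the technique, not of any particular product's filter.

```python
import math
from collections import Counter, defaultdict


def words(text: str) -> list[str]:
    # Bag-of-words: lowercase and split on whitespace
    return text.lower().split()


def train(examples: list[tuple[str, str]]):
    """Count words per class and documents per class."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in examples:
        word_counts[label].update(words(text))
        doc_counts[label] += 1
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab


def classify(text: str, word_counts, doc_counts, vocab) -> str:
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Work in log space to avoid underflow:
        # log P(label) + sum of log P(word | label), with add-one smoothing
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for word in words(text):
            if word in vocab:  # skip words never seen in training
                score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training data for illustration
training_data = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to monday", "ham"),
    ("can you review my report", "ham"),
]
nb_model = train(training_data)
print(classify("free prize money", *nb_model))          # spam
print(classify("review the meeting notes", *nb_model))  # ham
```

Unlike an LLM, this model knows nothing beyond the word counts it was trained on. Yet it is cheap, fast, and easy to inspect.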
Modern LLMs handle most classical NLP tasks (classification, summarisation, extraction, translation) as part of general instruction following. For many applications you no longer need a separate specialised model for each task; you can prompt a single large model instead.
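The sketch below shows the “one model, many tasks” idea. The same chat-message structure serves classification, summarisation, extraction, and translation, and only the instruction changes. The task names and instruction wording in `TASK_INSTRUCTIONS` are hypothetical examples. The code only builds the messages; sending them to a model is left to whichever client you use.

```python
# Hypothetical instructions; tune the wording for your model and data
TASK_INSTRUCTIONS = {
    "classify": "Classify the sentiment of the text as positive, negative, or neutral. Reply with one word.",
    "summarise": "Summarise the text in one sentence.",
    "extract": "List every person and organisation named in the text, one per line.",
    "translate": "Translate the text into French.",
}


def build_messages(task: str, text: str) -> list[dict[str, str]]:
    # Same structure for every task: a system instruction plus the user's text
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"Unknown task: {task!r}")
    return [
        {"role": "system", "content": TASK_INSTRUCTIONS[task]},
        {"role": "user", "content": text},
    ]


for task in TASK_INSTRUCTIONS:
    messages = build_messages(task, "Ada Lovelace worked with Charles Babbage.")
    print(task, "->", messages[0]["content"])
```

The role/content dictionaries follow the same shape as the `messages` list passed to the OpenAI chat API in the pipeline below.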
A practical NLP pipeline
```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable


def analyse_customer_email(email_text: str) -> str:
    """Ask the model to analyse an email; returns its reply as a JSON string."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": """Analyse the customer email and return JSON with:
- sentiment: positive | negative | neutral
- intent: complaint | question | cancellation | praise | other
- urgency: high | medium | low
- summary: one sentence max
- suggested_action: what the support team should do""",
            },
            {"role": "user", "content": email_text},
        ],
        # JSON mode: the conversation must mention JSON, which the system prompt does
        response_format={"type": "json_object"},
    )
    return response.choices[0].message.content
```
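JSON mode (`response_format={"type": "json_object"}`) is designed to make the model return syntactically valid JSON. Even so, the reply arrives as a string, and the field values are not guaranteed to match the schema described in the system prompt. The sketch below checks them, assuming the field names and allowed values from that prompt. `validate_analysis` is a hypothetical helper, not part of the OpenAI SDK, and the sample reply is invented.

```python
import json

# Allowed values copied from the system prompt in analyse_customer_email
ALLOWED_VALUES = {
    "sentiment": {"positive", "negative", "neutral"},
    "intent": {"complaint", "question", "cancellation", "praise", "other"},
    "urgency": {"high", "medium", "low"},
}
REQUIRED_TEXT_FIELDS = ("summary", "suggested_action")


def validate_analysis(raw: str) -> dict:
    """Parse the model's JSON reply and check it matches the expected schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    for field, allowed in ALLOWED_VALUES.items():
        value = data.get(field)
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(f"{field}={value!r} is not one of {sorted(allowed)}")
    for field in REQUIRED_TEXT_FIELDS:
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{field} must be a non-empty string")
    return data


# Invented example reply, for illustration only
sample_reply = json.dumps({
    "sentiment": "negative",
    "intent": "complaint",
    "urgency": "high",
    "summary": "Customer has waited three weeks for an order and received no reply.",
    "suggested_action": "Escalate and contact the customer today.",
})
print(validate_analysis(sample_reply)["urgency"])  # high
```

Failing loudly on an unexpected value is usually safer than silently routing a ticket to the wrong team.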
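Calling `analyse_customer_email` on a sample complaint:

```python
result = analyse_customer_email("""
I've been waiting three weeks for my order and nobody has responded
to my previous two emails. This is completely unacceptable.
""")
print(result)
# Expected to return a JSON string along the lines of:
# sentiment=negative, intent=complaint, urgency=high (exact wording varies between runs)
```

Popular NLP libraries and services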
| Tool | Type | Best for |
|---|---|---|
| spaCy | Python library | Named entity recognition, dependency parsing, fast rule-based NLP |
| Hugging Face Transformers | Python library | Running any open-source transformer model locally |
| sentence-transformers | Python library | Semantic search, document similarity, embeddings |
| OpenAI API | API service | General NLP via LLM: classification, summarisation, extraction |
| AWS Comprehend | Managed service | Sentiment, entities, key phrases, language detection at scale |
| Google Natural Language API | Managed service | Sentiment, entity recognition, content classification |
What’s next
- What is a Large Language Model? : The most capable current form of NLP
- Building RAG Systems : Using NLP embeddings to search your own documents
- What is Generative AI? : How NLP models produce new text
Further reading
- Hugging Face NLP Course : Free, comprehensive course covering transformers and modern NLP
- spaCy documentation : Industrial-strength Python NLP library, well documented
- AWS Comprehend documentation : Managed NLP for teams using AWS
Frequently asked questions