Amazon Comprehend is a managed NLP service that provides trained models for common text analysis tasks without requiring ML expertise or model training. It handles the high-volume, structured NLP tasks that would otherwise require either custom model development or expensive LLM calls.

Official documentation: https://docs.aws.amazon.com/comprehend/latest/dg/
Pricing: https://aws.amazon.com/comprehend/pricing/
Service quotas: https://docs.aws.amazon.com/comprehend/latest/dg/guidelines-and-limits.html

What Comprehend Does

Sentiment analysis - Classifies text as positive, negative, neutral, or mixed, with a confidence score for each. Works at the document level and at the entity level (targeted sentiment). Useful for customer feedback analysis, social media monitoring, and product review processing.

Entity recognition - Identifies and classifies named entities: people, organizations, locations, dates, quantities, events. Returns the entity text, type, and position in the document. For domain-specific entities (medical terms, legal entities, product names), Comprehend custom entity recognition allows training on your own labeled examples.

Key phrase extraction - Identifies the main concepts and topics in text without requiring a predefined taxonomy. Useful for search indexing and quick summarization of large text volumes.

Topic modeling - Identifies recurring themes across a document collection using Latent Dirichlet Allocation (LDA). Batch operation: you provide thousands of documents, Comprehend returns topic clusters and the terms that characterize each topic. Useful for understanding what a large customer feedback corpus is actually about.

Language detection - Identifies the language of text with a confidence score. Supports 100 languages. Useful as a preprocessing step before language-specific processing pipelines.

PII detection and redaction - Identifies personally identifiable information (names, addresses, credit card numbers, etc.) and can automatically redact it. Useful for compliance workflows where documents must be processed without exposing PII.

Comprehend vs. Bedrock for NLP Tasks

Both services can perform NLP tasks, but they have different cost and performance profiles:

Use Comprehend when:

  • Processing high volumes (10,000+ documents per day) - Comprehend is significantly cheaper than Bedrock per document for standard tasks
  • You need consistent, structured output - Comprehend returns structured JSON responses with defined schemas
  • Latency is critical - Comprehend is faster for synchronous calls
  • Tasks are well-defined and standard - sentiment, entities, language, key phrases

Use Bedrock (Claude) when:

  • The task requires reasoning or judgment beyond classification - “What is the author’s attitude toward the policy?” vs. “Positive/Negative?”
  • You need nuanced extraction that benefits from context - extracting contract obligations requires understanding, not just pattern matching
  • You need explanation with the output - “Why is this review negative?” requires generation
  • Volume is low - LLM costs per call are higher but acceptable at modest volumes

For many production NLP pipelines, Comprehend and Bedrock are complementary: Comprehend handles volume preprocessing (language detection, basic entity extraction, sentiment flagging), and Bedrock handles the cases Comprehend flags for deeper analysis.

Custom Classification and Entity Recognition

Comprehend Custom allows training classification and entity recognition models on your own labeled data. Training a custom classifier requires 100+ labeled examples per class minimum, and performance improves substantially with 1,000+.

This is the right choice when you have domain-specific categories that general models do not know about: product category taxonomies, internal issue classification schemas, or specialized document types. Training cost is low (typically under 100 EUR per training job); inference is priced per unit, cheaper than Bedrock for high-volume classification.