Amazon Personalize is a managed machine learning service that generates individualized recommendations for users. It uses the same recommendation technology that Amazon.com uses for product suggestions. You provide interaction data (user clicked item X, user purchased item Y), and Personalize trains models that predict what each user is most likely to engage with next. No ML expertise is required to get started, though the service exposes tuning parameters for teams that want fine-grained control.

Official documentation: https://docs.aws.amazon.com/personalize/ Pricing: https://aws.amazon.com/personalize/pricing/ Service quotas: https://docs.aws.amazon.com/personalize/latest/dg/limits.html

Core Concepts

Dataset Group - The top-level container. It holds three dataset types: Interactions (required, the event history of user-item actions), Items (optional metadata like category, price, genre), and Users (optional metadata like age segment, subscription tier).

Solution - A trained model. You select a recipe (algorithm type), Personalize trains on your data, and produces a solution version. Recipes include User-Personalization (general recommendations), Similar-Items (item-to-item), and Personalized-Ranking (reranking a supplied list).

Campaign - A deployed endpoint that serves real-time recommendations. You create a campaign from a solution version and call it via API to get recommendations for a specific user. Campaigns have a configurable minimum throughput (transactions per second) that affects cost.

Event Tracker - A real-time ingestion endpoint. As users interact with your application, you send events (clicks, views, purchases) to the tracker. Personalize incorporates these events into recommendations immediately, without retraining.

Recipe Selection

Choosing the right recipe is the most consequential decision in a Personalize project.

User-Personalization is the default choice for most use cases. It generates a ranked list of items for a specific user based on their interaction history and the behavior patterns of similar users. This is the recipe to use for “recommended for you” experiences.

Similar-Items generates recommendations based on item co-occurrence in interaction data plus item metadata. Use this for “customers who viewed this also viewed” or “related products” experiences. It does not require a user context, making it useful for anonymous or new users.

Personalized-Ranking takes a list of items you supply and reranks them for a specific user. This is valuable when you have a curated list (search results, category page items) and want to personalize the ordering without changing the set.

Data Requirements

Personalize requires a minimum of 1,000 interaction records from at least 25 unique users to train a model. In practice, models become meaningfully accurate at around 50,000 interactions. The interaction dataset must include a USER_ID, ITEM_ID, and TIMESTAMP at minimum.

Data quality matters more than quantity. Remove bot traffic, test accounts, and duplicate events before import. If your interaction data is sparse, supplement it with item and user metadata to help the model generalize.

Real-Time vs Batch Recommendations

Real-time recommendations via campaigns are the primary pattern. Latency is typically under 100ms. Use this for web and mobile applications where recommendations are generated at page load.

Batch recommendations generate recommendations for all users at once, outputting to S3. Use this for email campaigns, push notification targeting, or pre-computing recommendations for offline consumption. Batch jobs are significantly cheaper than maintaining a campaign endpoint.

Cold Start Handling

New users with no interaction history are a common challenge. Personalize handles this through exploration: the User-Personalization recipe automatically balances exploitation (recommending items the model is confident about) with exploration (surfacing items that need more data). You can tune the exploration weight to control this balance.

For new items, include rich item metadata (categories, tags, descriptions) so the model can recommend them based on attribute similarity even before interaction data accumulates.

Pricing

Personalize charges for data ingestion, training, and real-time inference. Training costs depend on data volume and are one-time per solution version. Campaign costs are based on the minimum provisioned throughput. For most projects, the campaign hosting cost dominates. Start with the minimum 1 TPS and scale up as traffic increases.