Event Sourcing and CQRS for AI Pipelines
Using event-driven architecture patterns for AI data pipelines: immutable event logs, replay capability, audit trails, and CQRS for separating read and write models.
Event Sourcing treats every state change as an immutable event appended to a log. Instead of storing the current state of a record, you store the full sequence of events that produced that state. The current state is derived by replaying the log. For AI systems, this pattern solves several problems that are hard to address with mutable state stores: audit trails, pipeline replay, debugging data quality issues, and reconstructing model inputs retrospectively.
Why AI Pipelines Benefit from Event Sourcing
AI systems have unique auditability requirements that event sourcing addresses directly.
Regulatory compliance - Financial, healthcare, and legal AI applications often require proof of what data was used to produce a specific output. An event log provides a complete provenance trail: document version X was ingested at time T, processed with chunking configuration V, retrieved as context for request R, and produced output O.
Pipeline replay - When an embedding model is upgraded, you may need to re-embed all existing documents. With event sourcing, the ingestion events are already stored. You replay them through the new processing pipeline without re-ingesting from source systems.
Debugging data quality - If a model produces a bad output, you can reconstruct exactly what context was retrieved and what prompt was sent. With mutable state, that information is overwritten on the next request.
Gradual rollout - New pipeline versions can process a copy of the event stream in parallel with the existing version, allowing comparison before cutover.
The Event Model
Events are immutable records of something that happened. For an AI document pipeline, a minimal set of events might look like:
DocumentSubmitted { document_id, source, raw_content, submitted_at }
DocumentValidated { document_id, format, byte_count, validated_at }
DocumentChunked { document_id, chunks: [{chunk_id, text, position}], chunked_at }
ChunkEmbedded { chunk_id, embedding_model, vector, embedded_at }
ChunkIndexed { chunk_id, index_name, indexed_at }
QueryReceived { query_id, query_text, user_id, received_at }
ContextRetrieved { query_id, chunk_ids, scores, retrieved_at }
InferenceRequested { query_id, model_id, prompt_hash, requested_at }
InferenceCompleted { query_id, response_text, tokens_used, completed_at }
Each event is written once and never modified. The pipeline advances by reacting to events and producing new events.
CQRS: Separating Reads from Writes
Command Query Responsibility Segregation (CQRS) complements event sourcing by separating write models (commands that produce events) from read models (projections built from events).
In an AI knowledge base:
Write side (commands): Submit document, update document, delete document. Each command produces one or more events. Write side is optimised for consistency and auditability.
Read side (projections): Current document index, embedding status per document, retrieval statistics. Each projection is built by subscribing to the event stream and maintaining a denormalised view optimised for its query pattern. The vector search index is a read projection. The document status dashboard is another projection.
This separation allows the vector index to be rebuilt by replaying events without touching the write-side model.
Implementation on AWS
Event store: Kinesis Data Streams for high-throughput pipelines. DynamoDB with a composite key (entity_id + timestamp) for moderate-volume applications where queryability by entity matters. S3 with a partition scheme like events/YYYY/MM/DD/entity_id/ for archival and batch replay.
Event bus: EventBridge for routing events to multiple consumers without coupling producers to consumers. Each service subscribes to the event types it cares about.
Projections: Lambda functions or ECS tasks that consume from the event stream and update read models (OpenSearch index, DynamoDB status table, CloudWatch metrics).
Replay: To replay events, read from the event store with a start timestamp and process forward. For S3-based stores, an S3 batch operation or Glue job can feed events back into the processing pipeline.
Handling Schema Evolution
Events are written by the version of the code that existed at the time. When event schemas change, old events still need to be readable.
Three strategies:
- Upcasting - At read time, transform old event versions to the current schema before processing.
- Versioned schemas - Include a
schema_versionfield in every event. Consumers handle each version explicitly. - Additive-only changes - New fields are optional with defaults. Never remove or rename fields in events that have been written.
For AI pipelines, additive-only changes work well for most cases. If a processing step adds a new metadata field, old events simply do not have it and the consumer treats it as absent.
What Event Sourcing Does Not Solve
Event sourcing adds complexity. Do not apply it when:
- The pipeline is simple and linear with no replay requirements
- The team is small and the operational overhead of managing an event store outweighs the benefits
- Regulatory requirements do not demand an immutable audit trail
A simpler alternative for audit logging is appending structured log entries to CloudWatch Logs or S3. Event sourcing is warranted when replay, projection rebuilding, or state reconstruction from history are actual requirements rather than hypothetical ones.
Sources and Further Reading
- Evans, E. (2003). Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley. — Introduced the strategic and tactical design patterns of DDD, the conceptual foundation from which Event Sourcing emerged.
- Young, G. (2010). “CQRS Documents.” Self-published. https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf — Greg Young’s canonical description of CQRS and its relationship to Event Sourcing.
- Fowler, M. “Event Sourcing.” martinfowler.com, 2005. https://martinfowler.com/eaaDev/EventSourcing.html — Canonical pattern description.
- Kleppmann, M. (2017). Designing Data-Intensive Applications. O’Reilly Media. — Chapter 11 (“Stream Processing”) covers event logs, derived data, and the relationship between stream processing and event sourcing at the systems level.
Need help implementing this?
Turn this knowledge into a working prototype. Our structured workshop methodology takes you from idea to deployed AI solution in three sessions.
Explore AI Workshops