Most organizational data is managed as a byproduct of operational systems. Tables are created as implementation details of applications, poorly documented, and governed informally. Downstream consumers (analysts, ML engineers, other teams) reverse-engineer schemas, guess at semantics, and build pipelines on unstable foundations. The data product pattern treats each shared dataset as a product with defined consumers, quality guarantees, and an accountable owner.

What Makes Data a Product

A dataset becomes a data product when it has five properties:

Ownership - A specific team is accountable for the data product’s quality, availability, and evolution. The owning team is the domain team that generates or best understands the data, not a central data team.

Documented schema and semantics - Every field has a description, data type, and business meaning. Consumers should not need to read source code or query the generating team to understand what the data means.

Quality SLAs - The data product defines measurable quality guarantees: freshness (data is no more than N hours old), completeness (null rate below threshold), accuracy (validated against source of truth), and availability (uptime percentage).

Discoverability - The data product is registered in a data catalog where consumers can find it through search. The catalog entry includes schema, documentation, ownership, SLAs, sample data, and lineage information.

Access control - Consumers request access through a self-service mechanism. The data product owner approves or denies access based on data classification and consumer need. Access is audited.

Technical Implementation

Data product interface - Expose the data product through standardized interfaces: a table in the lakehouse, an API endpoint, or a streaming topic. The interface abstracts the underlying implementation so the owner can change internals without breaking consumers.

Quality monitoring - Automated checks run continuously to verify that SLAs are met. Freshness checks confirm that the latest data arrived on schedule. Completeness checks validate null rates. Distribution checks detect anomalies. SLA violations trigger alerts to the owning team.

Schema registry - A central schema registry stores the canonical schema for each data product. Consumers validate their pipelines against the registry. Schema changes go through a compatibility check to prevent breaking downstream consumers.

Versioning - Data products support versioning for breaking changes. A new version can coexist with the previous version during a migration period. Consumers migrate to the new version on their own schedule within a deprecation window.

Organizational Pattern

The data product pattern works best with a federated ownership model where domain teams own their data products and a central platform team provides the shared infrastructure (storage, catalog, monitoring, access control). The central team sets standards; domain teams implement them for their specific data.

Relationship to AI

ML models are only as good as their training data. Data products with documented schemas, quality SLAs, and clear ownership give ML teams confidence that the data they train on is reliable. When a model degrades, the data product SLA framework helps identify whether the cause is upstream data quality rather than a modeling problem.