Federated learning trains machine learning models across multiple devices or organizations without moving the training data to a central location. Instead of “bring the data to the model,” federated learning “brings the model to the data.” This is valuable when data cannot be centralized due to privacy regulations, competitive concerns, or practical constraints.

How Federated Learning Works

The basic federated learning process:

  1. Central server distributes a model. The coordinating server sends the current model to all participating clients (devices, organizations, data centers).
  2. Clients train locally. Each client trains the model on its local data for one or more epochs.
  3. Clients send model updates. Instead of sending raw data, each client sends its model update (gradients or updated weights) back to the server.
  4. Server aggregates updates. The server combines the updates from all clients (commonly with Federated Averaging, or FedAvg, which averages the clients' weights weighted by each client's dataset size) to produce a new global model.
  5. Repeat. Steps 1-4 repeat for multiple rounds until the model converges.

The key insight: raw data never leaves the client. Only model parameters (which do not directly reveal individual data points) are communicated.
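The aggregation step above can be sketched in plain Python. This is a minimal, illustrative version of FedAvg that averages flat weight vectors weighted by each client's dataset size; the function and variable names are our own, and real implementations operate on full model parameter tensors:

```python
# Federated Averaging (FedAvg) in miniature: each client's weights are
# folded into the global model, weighted by local dataset size.

def fedavg(client_weights, client_sizes):
    """Weighted average of client weight vectors (flat lists of floats)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    global_weights = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            global_weights[i] += w * (size / total)
    return global_weights

# Three clients with different data volumes: the client with the most
# data (700 samples) pulls the average toward its weights.
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 200, 700]
print(fedavg(updates, sizes))  # ≈ [4.2, 5.2]
```

The size weighting matters: a plain unweighted mean would let a client with ten samples influence the global model as much as one with ten million.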

When to Use Federated Learning

Regulatory constraints. Healthcare data (HIPAA), financial data (GLBA), or data subject to GDPR data residency requirements that cannot be centralized.

Cross-organization collaboration. Multiple hospitals want to train a model on their combined patient data without sharing patient records. Multiple banks want a fraud detection model trained on combined transaction data without sharing customer information.

Mobile and IoT. Training on data generated by millions of mobile devices without uploading personal data to a central server. This is how Google trains the keyboard prediction model on Android devices.

Data sovereignty. Organizations that want to contribute to shared model training without giving up control of their data.

When Not to Use Federated Learning

When data can be centralized. If there are no privacy, regulatory, or practical barriers to centralizing data, standard centralized training is simpler and typically produces better models.

When data distribution is highly heterogeneous. If each client has fundamentally different data (non-IID), federated learning struggles to produce a model that works well for all clients.

When communication is severely limited. Federated learning requires multiple rounds of communication between server and clients. If clients have very limited bandwidth or intermittent connectivity, this communication overhead may be prohibitive.

When the privacy guarantees are insufficient. While federated learning does not share raw data, model updates can still leak information about the training data (for example, through gradient inversion attacks). For strong privacy guarantees, federated learning must be combined with techniques such as differential privacy or secure aggregation.

Implementation Approaches

Frameworks

Flower (flwr). A widely used open-source federated learning framework. Framework-agnostic (works with PyTorch, TensorFlow, scikit-learn). Flexible and well documented. Suitable for both research and production.

PySyft. Privacy-preserving ML framework. Supports federated learning with differential privacy and secure multi-party computation. More focused on privacy guarantees than pure federated learning.

NVIDIA FLARE. Enterprise federated learning framework. Designed for healthcare and financial services. Supports complex workflows and provisioning.

AWS SageMaker with federated learning. Not a native feature, but SageMaker can be used to orchestrate federated learning across multiple accounts or regions.

Architecture Choices

Cross-device. Many clients (thousands to millions), each with small amounts of data. Clients are typically mobile devices or IoT sensors. Communication is unreliable; not all clients participate in every round. Example: keyboard prediction on smartphones.

Cross-silo. Few clients (2-20), each with large amounts of data. Clients are typically organizations (hospitals, banks). Communication is reliable. All clients participate in every round. Example: multi-hospital medical AI model.

Selection for enterprise: Most enterprise federated learning is cross-silo. Cross-device federated learning is primarily relevant for consumer-facing mobile and IoT applications.

Practical Challenges

Data Heterogeneity

In real federated settings, clients have different data distributions. Hospital A may see mostly cardiac patients while Hospital B sees mostly oncology patients. This non-IID (non-independent and identically distributed) data makes training harder because local model updates push in different directions.

Mitigations: Personalization layers (shared base model with client-specific layers), local fine-tuning after federated training, careful client selection per training round.
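The personalization-layer idea can be illustrated with a toy sketch: federate only a shared "base" parameter vector while each client keeps a private "head" that never leaves the device. The model structure and names here are hypothetical; in practice the split is between the layers of a neural network:

```python
# Personalization sketch: aggregate only the shared "base" parameters;
# each client's "head" stays local and client-specific.

def aggregate_base_only(client_models, sizes):
    """FedAvg over the 'base' parameters; 'head' parameters stay local."""
    total = sum(sizes)
    n = len(client_models[0]["base"])
    global_base = [0.0] * n
    for model, size in zip(client_models, sizes):
        for i, w in enumerate(model["base"]):
            global_base[i] += w * (size / total)
    # Each client receives the new shared base but keeps its own head.
    return [{"base": list(global_base), "head": m["head"]}
            for m in client_models]

clients = [
    {"base": [1.0], "head": [10.0]},
    {"base": [3.0], "head": [20.0]},
]
print(aggregate_base_only(clients, [1, 1]))
# base averaged to [2.0]; heads remain client-specific
```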

Communication Efficiency

Sending full model updates is expensive for large models. A 1 GB model updated by 100 clients generates roughly 100 GB of upstream network traffic per round, before counting the cost of distributing the global model back to the clients.

Mitigations: Gradient compression (send sparse or quantized gradients), fewer communication rounds (more local training per round), federated distillation (send predictions instead of weights).
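One of these mitigations, top-k sparsification, can be sketched in a few lines: send only the k largest-magnitude gradient entries as (index, value) pairs. This is illustrative only; production systems typically also accumulate the dropped residual locally so that small gradients are eventually transmitted:

```python
# Top-k gradient sparsification: transmit only the k entries with the
# largest absolute value, as (index, value) pairs.

def sparsify_topk(gradient, k):
    """Return the k largest-magnitude entries as (index, value) pairs."""
    ranked = sorted(range(len(gradient)),
                    key=lambda i: abs(gradient[i]), reverse=True)
    return [(i, gradient[i]) for i in sorted(ranked[:k])]

def densify(sparse_update, length):
    """Rebuild a full-length vector from (index, value) pairs."""
    dense = [0.0] * length
    for i, v in sparse_update:
        dense[i] = v
    return dense

grad = [0.01, -0.9, 0.05, 0.7, -0.02]
sparse = sparsify_topk(grad, 2)
print(sparse)  # [(1, -0.9), (3, 0.7)] — 2 pairs instead of 5 floats
```

Here 5 floats shrink to 2 index/value pairs; on a model with millions of parameters and k around 1% of them, the per-round traffic drops accordingly.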

Privacy Enhancement

Federated learning alone does not guarantee privacy. Model updates can leak information.

Differential privacy. Add calibrated noise to model updates before sending them. This provides mathematical privacy guarantees at the cost of some model accuracy.
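The clip-and-noise step can be sketched as follows. The parameter names are hypothetical, and a real deployment would additionally track the cumulative privacy budget (epsilon, delta) across rounds:

```python
# Differentially private update: clip the update to a fixed L2 norm,
# then add Gaussian noise calibrated to that clipping bound.

import math
import random

def dp_sanitize(update, clip_norm=1.0, noise_multiplier=1.1, seed=None):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise."""
    rng = random.Random(seed)
    # Clip: scale the whole update down if its L2 norm exceeds the bound.
    norm = math.sqrt(sum(w * w for w in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [w * scale for w in update]
    # Noise: standard deviation proportional to the clipping bound, so no
    # single client's contribution can dominate or be singled out.
    sigma = noise_multiplier * clip_norm
    return [w + rng.gauss(0.0, sigma) for w in clipped]

# An update of norm 5.0 is clipped to norm 1.0 before noise is added.
noisy = dp_sanitize([3.0, 4.0], clip_norm=1.0, seed=42)
```

Clipping bounds each client's influence on the aggregate; the noise then hides whether any individual client (or data point) participated.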

Secure aggregation. Cryptographic protocols that allow the server to compute the aggregate model update without seeing individual client updates. Prevents the server from learning about individual clients.
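The core masking idea behind secure aggregation can be shown with a toy sketch: each pair of clients shares a random mask that one adds and the other subtracts, so every individual masked update looks random to the server while the masks cancel in the sum. This is not a real protocol — production systems derive the pairwise masks via key agreement and must tolerate client dropouts:

```python
# Toy pairwise masking: masks cancel in the aggregate, so the server
# learns the sum of updates without seeing any individual update.

import random

def masked_updates(updates, seed=0):
    """Apply cancelling pairwise masks to a list of integer updates."""
    rng = random.Random(seed)
    n = len(updates)
    masked = [list(u) for u in updates]
    for a in range(n):
        for b in range(a + 1, n):
            mask = [rng.randint(-1000, 1000) for _ in updates[0]]
            for i, m in enumerate(mask):
                masked[a][i] += m   # client a adds the pairwise mask
                masked[b][i] -= m   # client b subtracts the same mask
    return masked

updates = [[1, 2], [3, 4], [5, 6]]
masked = masked_updates(updates)
aggregate = [sum(col) for col in zip(*masked)]
print(aggregate)  # masks cancel: [9, 12], the true sum of updates
```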

Model Governance

In federated settings, model governance is complex:

  • Who owns the trained model?
  • How are model updates audited?
  • How is model quality verified across clients?
  • What happens when a client wants to withdraw?

Define governance agreements before starting federated training.

Getting Started

  1. Validate the need. Confirm that data truly cannot be centralized. Federated learning adds significant complexity; use it only when necessary.
  2. Start with simulation. Before deploying across real organizations, simulate federated learning on partitioned data within a single environment. This validates the approach and hyperparameters.
  3. Choose a framework. Flower for flexibility, NVIDIA FLARE for enterprise healthcare and finance.
  4. Start with cross-silo. Two organizations, well-defined data, reliable communication. This is the simplest real-world federated setup.
  5. Implement privacy enhancements. Add differential privacy or secure aggregation based on your privacy requirements.
  6. Establish governance. Document ownership, audit processes, and withdrawal procedures.
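For step 2, a single-machine simulation needs a non-IID partition of one dataset across simulated clients. A common trick, sketched below in plain Python (toy example, our own names), is to sort samples by label and hand each client a contiguous shard, giving each client a deliberately skewed label distribution:

```python
# Non-IID partitioning for federated simulation: sort sample indices by
# label, then split into contiguous shards so each simulated client
# sees mostly one or two labels.

def partition_non_iid(labels, n_clients):
    """Assign sample indices to clients after sorting by label."""
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shard = len(order) // n_clients
    return [order[c * shard:(c + 1) * shard] for c in range(n_clients)]

labels = [0, 1, 0, 1, 2, 2, 0, 1, 2]
shards = partition_non_iid(labels, 3)
print(shards)  # each shard holds the indices of a single label
```

Running federated training on shards like these, versus a random (IID) split, quickly shows how much accuracy the chosen aggregation strategy loses under heterogeneity — before any real organization is involved.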

Federated learning is a specialized technique for specific circumstances. When those circumstances apply - data that cannot be centralized, organizations that want to collaborate without sharing data - it enables AI that would otherwise be impossible.

Sources and Further Reading

  1. McMahan, H.B., Moore, E., Ramage, D., Hampson, S., and Agüera y Arcas, B. (2017). “Communication-Efficient Learning of Deep Networks from Decentralized Data.” Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). — Introduced the term “federated learning” and the FedAvg aggregation algorithm. The foundational paper for the field. https://arxiv.org/abs/1602.05629
  2. Bonawitz, K. et al. (2019). “Towards Federated Learning at Scale: A System Design.” Proceedings of Machine Learning and Systems (MLSys) 1. — Describes Google’s production deployment of federated learning across Android devices for keyboard prediction, covering engineering challenges at scale. https://arxiv.org/abs/1902.01046
  3. Li, T. et al. (2020). “Federated Learning: Challenges, Methods, and Future Directions.” IEEE Signal Processing Magazine 37(3), pp. 50–60. — Survey covering non-IID data, communication efficiency, privacy, and open research questions. https://arxiv.org/abs/1908.07873
  4. Kairouz, P. et al. (2021). “Advances and Open Problems in Federated Learning.” Foundations and Trends in Machine Learning 14(1–2), pp. 1–210. — Comprehensive 200-page survey written by 58 authors from academic and industry institutions; covers virtually every open problem in the field. https://arxiv.org/abs/1912.04977