TensorFlow is built as a full pipeline: train, package as a SavedModel, then serve it on a server, a phone, or a browser.

TensorFlow is an open-source, end-to-end machine learning platform developed and maintained by Google. It represents computation as dataflow graphs whose nodes are operations and whose edges carry tensors, and it maps those graphs across CPUs, GPUs, and Google’s Tensor Processing Units (TPUs). TensorFlow 1.x used static graphs you built then ran inside a session. TensorFlow 2.x made eager execution the default, so code runs immediately like normal Python, and reintroduced graph speed on demand through the @tf.function decorator. Today its centre of gravity is production: serving, mobile and edge deployment, and TPU training, rather than research prototyping.

Where TensorFlow sits

TensorFlow spans the whole lifecycle, from a high-level model API down to the hardware, and out to deployment targets that most other frameworks leave to third parties.

  • API: Keras 3, tf.keras. High-level model definition, multi-backend.
  • Core: dataflow graphs, tf.GradientTape, tf.data, tf.distribute. Autodiff, input pipelines, distribution strategies.
  • Compiler: XLA / OpenXLA. Op fusion and TPU compilation.
  • Deployment: TF Serving, LiteRT, TensorFlow.js. Server, on-device, and browser targets.
  • Hardware: CPU, GPU, TPU. TPU support is first-class.

Graphs, tracing, and autodiff

Two mechanisms define how TensorFlow runs. The @tf.function decorator turns a Python function into a graph: on the first call it traces the Python code into a tf.Graph, and later calls with the same input signature reuse that graph, skipping the interpreter for speed and portability. A call with a new input shape or dtype triggers a fresh trace. AutoGraph converts Python control flow such as if and for into graph operations automatically. Gradients come from tf.GradientTape, which records the operations executed inside its context and computes derivatives by reverse-mode automatic differentiation when you call tape.gradient(loss, variables). Underneath, XLA can fuse operations into optimised kernels and is the compilation backbone for TPUs. The trained result is packaged as a SavedModel, the language-neutral artifact that TF Serving, LiteRT, and TensorFlow.js all consume, either directly or after conversion.
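
A minimal sketch of both mechanisms, assuming TensorFlow 2.x. The names sum_positive and trace_count are illustrative, not part of TensorFlow. The Python-side counter only changes while the function is being traced, so it shows when a graph is built and when it is reused.

```python
import tensorflow as tf

trace_count = 0  # plain Python state: updated during tracing, not on graph reuse


@tf.function
def sum_positive(x):
    global trace_count
    trace_count += 1  # Python side effect, runs once per trace
    total = tf.constant(0.0)
    # AutoGraph rewrites this tensor-dependent loop and branch into graph ops.
    for value in x:
        if value > 0:
            total += value
    return total


a = sum_positive(tf.constant([1.0, -2.0, 3.0]))  # first call: traces a graph
b = sum_positive(tf.constant([4.0, 5.0, -6.0]))  # same shape and dtype: reuses it
c = sum_positive(tf.constant([1.0, 2.0]))        # new shape: traces again
print(float(a), float(b), float(c), trace_count)

# Reverse-mode autodiff: the tape records ops on watched variables.
w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    loss = w * w + 2.0 * w  # d(loss)/dw = 2w + 2
grad = tape.gradient(loss, w)
print(float(grad))  # 2 * 3 + 2 = 8.0
```

Because Python side effects run only during tracing, anything that must happen on every call (logging, counters, random draws) should be expressed with TensorFlow ops rather than plain Python.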

Keras 3 is now multi-backend

The most important recent change is Keras 3. Released in late 2023, it is a full rewrite that runs the same model code on TensorFlow, JAX, or PyTorch. You pick the backend with the KERAS_BACKEND environment variable before importing Keras; TensorFlow is the default. Since TensorFlow 2.16, tf.keras is Keras 3 with the TensorFlow backend. This means TensorFlow is now one backend among several rather than Keras’s exclusive engine, and Keras releases publish on keras.io rather than in TensorFlow’s notes.
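
Backend selection might look like the sketch below, assuming the standalone keras package (Keras 3) is installed. The tiny model is purely illustrative. The variable has to be set before the first import of Keras; "jax" or "torch" can be substituted only if those packages are installed.

```python
import os

# Must be set before the first `import keras`; it is read once at import time.
os.environ["KERAS_BACKEND"] = "tensorflow"

import keras

print(keras.backend.backend())  # name of the active backend

# The same model definition is intended to run unchanged on any backend.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])
print(model.count_params())  # (4*8 + 8) + (8*1 + 1) = 49
```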

Installing TensorFlow

bash
# CPU only, all platforms
pip install tensorflow

# GPU support on Linux or WSL2 (bundles CUDA and cuDNN wheels)
# Quote the extra so shells such as zsh do not expand the brackets
pip install 'tensorflow[and-cuda]'

# Verify the GPU is visible
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

The old tensorflow-gpu package is gone; use the [and-cuda] extra. Native Windows GPU support ended after TF 2.10, so use WSL2 for GPU on Windows. Keras 3 ships as a dependency, so tf.keras already gives you Keras 3.

A Keras 3 model, end to end

Define a convolutional classifier, compile it with a loss and optimiser, then fit with an early-stopping callback that restores the best weights.

python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
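# Cast to float32 before scaling; plain division would produce float64 arrays.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0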

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.4),
    layers.Dense(128, activation="relu"),
    layers.Dense(10),  # logits
])

model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

model.fit(
    x_train, y_train, validation_split=0.1, epochs=20, batch_size=128,
    callbacks=[keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)],
)
model.evaluate(x_test, y_test)

A tf.data pipeline with a custom training loop

For full control, build an input pipeline with tf.data and write the training step yourself inside a @tf.function.

python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = (x_train / 255.0).astype("float32")

train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(10_000).batch(128).prefetch(tf.data.AUTOTUNE)  # overlap prep with training
)

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    layers.Reshape((28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10),
])

optimizer = keras.optimizers.Adam()
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function  # traced once per input signature, then reused
def train_step(images, labels):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for epoch in range(5):
    for images, labels in train_ds:
        loss = train_step(images, labels)
    print(f"epoch {epoch}: loss={float(loss):.4f}")  # loss of the final batch only

From training to deployment

Step 1: Build. Define a model in Keras 3 or with low-level tf ops.
Step 2: Train. Fit with tf.data pipelines and tf.distribute across GPUs or TPUs.
Step 3: Export. Save a SavedModel, the portable deployment artifact.
Step 4: Serve. Deploy with TF Serving, on-device with LiteRT, or in-browser with TF.js.
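
The export step can be sketched as follows, assuming TensorFlow 2.16 or later, where tf.keras is Keras 3. The untrained model and the temporary directory are placeholders for illustration. The sketch relies on Model.export writing a SavedModel whose reloaded object exposes a serve endpoint.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Placeholder model; a real pipeline would export the trained model instead.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(3),
])

# Hypothetical location: any writable directory works.
export_dir = os.path.join(tempfile.mkdtemp(), "classifier")
model.export(export_dir)  # writes a SavedModel for inference

# Reload without any Keras model code, as a serving runtime would.
reloaded = tf.saved_model.load(export_dir)
x = tf.constant(np.ones((2, 4), dtype="float32"))
served = np.asarray(reloaded.serve(x))

# The exported graph should reproduce the in-memory model's outputs.
print(served.shape, np.allclose(served, model(x).numpy(), atol=1e-5))
```

The resulting directory is the kind of artifact TF Serving loads directly; the LiteRT and TensorFlow.js converters take it as input.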

For scale, tf.distribute strategies handle multi-GPU (MirroredStrategy), multi-machine (MultiWorkerMirroredStrategy), and TPU (TPUStrategy) training with the same model code. For edge, LiteRT (the 2024 rebrand of TensorFlow Lite) runs models on mobile and embedded targets, and now also runs models authored in PyTorch and JAX, which is why the name dropped “TensorFlow”.
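
A minimal sketch of the multi-GPU case, assuming TensorFlow 2.16 or later and synthetic data made up for illustration. On a machine without GPUs, MirroredStrategy falls back to a single CPU replica, so the same code still runs.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Uses every visible GPU, or one CPU replica when none is present.
strategy = tf.distribute.MirroredStrategy()
print("replicas:", strategy.num_replicas_in_sync)

# Create the model and optimiser inside the scope so that their
# variables are mirrored across replicas.
with strategy.scope():
    model = keras.Sequential([
        keras.Input(shape=(8,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(2),
    ])
    model.compile(
        optimizer="adam",
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Synthetic data, for illustration only.
x = np.random.rand(256, 8).astype("float32")
y = np.random.randint(0, 2, size=(256,))

# Batch by the global batch size; each replica receives an equal share.
global_batch = 32 * strategy.num_replicas_in_sync
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(global_batch)

history = model.fit(dataset, epochs=1, verbose=0)
print(history.history["loss"])
```

Swapping in MultiWorkerMirroredStrategy or TPUStrategy changes how the strategy object is constructed and how the cluster is configured, while the code inside the scope is meant to stay the same.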

How it compares

| | TensorFlow | PyTorch | JAX | Keras 3 |
|---|---|---|---|---|
| Core model | Eager, graphs via tf.function | Eager, compile optional | Functional transforms | API over a backend |
| Autodiff | GradientTape | autograd | jax.grad | Delegates |
| Research use | Declining | Dominant | Rising at scale | Sits on top |
| Production | Very strong (TF Serving, TFX) | Strong, growing | Emerging | Inherits backend |
| Mobile / edge | Most mature (LiteRT) | ExecuTorch (newer) | Limited | Via TF backend |
| Best for | Production, edge, TPU | New research, LLMs | TPU-scale training | Portable code |

When not to use TensorFlow

  • You are doing research or reproducing recent papers. The overwhelming majority of new papers and open-weight models ship in PyTorch first, so reproducing state of the art in TensorFlow means fighting missing ports.
  • You want frontier TPU-scale training. Google itself trains its largest models in JAX; the JAX stack is the recommended path in that regime.
  • You are sensitive to migration churn. The TF1 to TF2 move, the Keras 3 transition, and the tf.lite to LiteRT split have each imposed real migration costs.
  • You want the biggest community momentum. Fewer new libraries and checkpoints target TensorFlow now, which slows debugging and hiring.
  • You only need an on-device runtime. For pure on-device inference, LiteRT or ExecuTorch as a standalone runtime is the target, not full TensorFlow.
