What is Fine-tuning?
Fine-tuning adapts a pre-trained AI model to a specific task or domain using your own data. This guide covers when it makes sense, what it costs, and when prompt engineering is the better choice.

The problem fine-tuning solves
A pre-trained model like GPT-4o or Llama 3 is trained on general internet data. It can write in many styles, discuss many topics, and answer many types of questions. But it does not know:
- Your company’s internal terminology
- The exact output format your system expects
- The tone of voice your brand uses
- The regulatory constraints specific to your industry
- Domain-specific concepts that are underrepresented in public training data
You can address some of these with prompt engineering: add instructions to every prompt. But for complex domains, long instruction lists inflate costs and still produce inconsistent results.
Fine-tuning bakes the knowledge or behaviour into the model itself, so you do not need to explain it every time.
Prompt engineering vs fine-tuning
| | Prompt engineering | Fine-tuning |
|---|---|---|
| Setup time | Minutes to hours | Days to weeks |
| Setup cost | €0 | €2 to €10,000+ |
| Inference cost | Higher (long prompts cost more) | Lower (shorter prompts needed) |
| Consistency | Variable | High |
| Data required | None | 50 to 100,000+ examples |
| Model ownership | No | Yes (if self-hosted) |
| Best for | Exploratory, general tasks | High-volume, specialised tasks |
Rule of thumb: spend one week on prompt engineering first. If quality is still insufficient after exhausting prompt techniques, evaluate fine-tuning.
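The trade-off between setup cost and inference cost in the table can be estimated with simple arithmetic. The sketch below is illustrative only: the function names, token counts, and all prices are hypothetical placeholders, and hosted fine-tuned models may be billed at a different per-token rate than the base model, so the two per-request costs are computed separately.

```python
import math


def cost_per_request(prompt_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of a single request, given per-1,000-token prices (hypothetical values)."""
    return (prompt_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


def break_even_requests(setup_cost, prompted_cost, finetuned_cost):
    """Number of requests after which the fine-tuning setup cost is recovered.

    Returns None when the fine-tuned model is not cheaper per request,
    in which case the setup cost is never recovered through inference savings.
    """
    saving = prompted_cost - finetuned_cost
    if saving <= 0:
        return None
    return math.ceil(setup_cost / saving)


# Hypothetical scenario: a long instruction prompt vs. a short prompt to a fine-tuned model.
prompted = cost_per_request(prompt_tokens=2000, output_tokens=200,
                            price_in_per_1k=0.001, price_out_per_1k=0.002)
finetuned = cost_per_request(prompt_tokens=300, output_tokens=200,
                             price_in_per_1k=0.001, price_out_per_1k=0.002)

print(break_even_requests(setup_cost=500.0, prompted_cost=prompted, finetuned_cost=finetuned))
```

Plugging in your own provider's prices and expected request volume gives a rough sense of whether the "high-volume" condition in the table applies to you.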
Types of fine-tuning
The fine-tuning workflow
Fine-tuning with the OpenAI API
pip install openai

import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

# Training data format: JSONL with messages arrays
training_examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a customer support agent for an Austrian fintech company. Reply in formal German."},
            {"role": "user", "content": "Ich habe eine Frage zu meiner Rechnung."},
            {"role": "assistant", "content": "Guten Tag! Ich helfe Ihnen gerne bei Ihrer Frage zur Rechnung. Bitte teilen Sie mir Ihre Kundennummer mit, damit ich Ihr Konto prüfen kann."}
        ]
    },
    # ... add 99+ more examples
]

# Save as JSONL
with open("training.jsonl", "w") as f:
    for ex in training_examples:
        f.write(json.dumps(ex) + "\n")

# Upload training file
with open("training.jsonl", "rb") as f:
    response = client.files.create(file=f, purpose="fine-tune")
file_id = response.id

# Start fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=file_id,
    model="gpt-4o-mini",
)
print(f"Fine-tuning job created: {job.id}")

LoRA fine-tuning with Hugging Face
For open-weight models (Llama 3, Mistral), LoRA is the standard approach:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import get_peft_model, LoraConfig, TaskType
import torch

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Configure LoRA: only train a small adapter
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,  # rank: higher = more capacity, more memory
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # which layers to adapt
    lora_dropout=0.1,
)
model = get_peft_model(model, lora_config)

# Prints the trainable and total parameter counts.
# With this configuration, expect roughly 6.8M trainable parameters
# out of about 8B in total, i.e. under 0.1% of the model is trained.
model.print_trainable_parameters()

The LoRA adapter is saved as a small file and applied to the base model at inference time.
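The note on rank in the configuration ("higher = more capacity, more memory") can be made concrete with a little arithmetic. For each adapted weight matrix of shape (d_out, d_in), LoRA adds two low-rank matrices with rank × d_in and d_out × rank entries. The sketch below counts those entries; the layer dimensions are assumptions for a Llama-3-8B-style architecture (hidden size 4096, 32 layers, a 1024-wide value projection due to grouped-query attention) and the helper name is hypothetical. Check `model.config` and the output of `print_trainable_parameters()` for the authoritative numbers.

```python
def lora_trainable_params(rank, layer_shapes, num_layers):
    """Count LoRA adapter parameters.

    layer_shapes lists the (d_out, d_in) shape of every adapted weight in one
    transformer layer. Each one gets an A matrix (rank x d_in) and a
    B matrix (d_out x rank), so it contributes rank * (d_in + d_out) parameters.
    """
    per_layer = sum(rank * (d_in + d_out) for d_out, d_in in layer_shapes)
    return per_layer * num_layers


# Assumed dimensions for a Llama-3-8B-style model (not read from the model itself).
HIDDEN = 4096      # model hidden size
KV_DIM = 1024      # value projection width with grouped-query attention
NUM_LAYERS = 32

# q_proj and v_proj, expressed as (d_out, d_in)
adapted_shapes = [(HIDDEN, HIDDEN), (KV_DIM, HIDDEN)]

for rank in (8, 16, 64):
    count = lora_trainable_params(rank, adapted_shapes, NUM_LAYERS)
    print(f"rank={rank}: {count:,} trainable adapter parameters")
```

The count grows linearly with the rank and with the number of target modules, which is why adapting more projections or raising `r` increases both adapter size and training memory.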
When not to fine-tune
You have not actually reached the prompt engineering ceiling: Most companies that think they need fine-tuning have not exhausted prompt engineering. Few-shot examples, structured output formats, and chain-of-thought prompting can often get you roughly 80% of the quality with none of the fine-tuning overhead.
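As an illustration of the few-shot technique mentioned here, the sketch below assembles a chat-style message list in which worked examples precede the real question. The helper name and the sample texts are hypothetical; the role/content layout follows the common chat message convention and can be passed to whichever chat API you use.

```python
def build_few_shot_messages(system_prompt, examples, user_input):
    """Build a chat message list with worked examples ahead of the real input.

    examples is a list of (user_text, assistant_text) pairs that demonstrate
    the desired tone and output format.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_input})
    return messages


# Hypothetical examples showing the expected answer style.
examples = [
    ("Where is my invoice?", "CATEGORY: billing\nREPLY: You can find it under Account > Invoices."),
    ("I cannot log in.", "CATEGORY: access\nREPLY: Please reset your password via the login page."),
]

messages = build_few_shot_messages(
    system_prompt="Classify the request and reply in two lines: CATEGORY and REPLY.",
    examples=examples,
    user_input="My card was charged twice.",
)
print(len(messages))
```

If a handful of such examples already produces consistent output, fine-tuning is probably not needed yet.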
Your task changes frequently: Fine-tuned models encode the task into weights. Changing the task requires retraining. For rapidly evolving use cases, prompt engineering stays flexible.
Your dataset has quality problems: Fine-tuning amplifies patterns in training data. A model trained on inconsistent or low-quality examples learns to be inconsistent and low-quality.
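One simple way to spot inconsistency before training is to look for identical user inputs that are paired with different assistant answers. The sketch below assumes examples in the chat format with a `messages` array of role/content entries; the function name and the normalisation rule (lowercasing and collapsing whitespace) are illustrative choices, not a standard.

```python
from collections import defaultdict


def find_conflicting_examples(examples):
    """Return user prompts that appear with more than one distinct assistant answer."""
    answers_by_prompt = defaultdict(set)
    for example in examples:
        user_text = " ".join(
            m["content"] for m in example["messages"] if m["role"] == "user"
        )
        assistant_text = " ".join(
            m["content"] for m in example["messages"] if m["role"] == "assistant"
        )
        # Normalise the prompt so trivial spacing or casing differences still match.
        key = " ".join(user_text.lower().split())
        answers_by_prompt[key].add(assistant_text.strip())
    return {
        prompt: sorted(answers)
        for prompt, answers in answers_by_prompt.items()
        if len(answers) > 1
    }


# Hypothetical data: the first two examples contradict each other.
sample = [
    {"messages": [{"role": "user", "content": "Reset my password"},
                  {"role": "assistant", "content": "Use the login page."}]},
    {"messages": [{"role": "user", "content": "reset my  password"},
                  {"role": "assistant", "content": "Call support."}]},
    {"messages": [{"role": "user", "content": "Where is my invoice?"},
                  {"role": "assistant", "content": "Under Account > Invoices."}]},
]

conflicts = find_conflicting_examples(sample)
print(conflicts)
```

A check like this only catches exact contradictions; reviewing a sample of examples by hand is still worthwhile.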
You are under EU AI Act obligations: If your use case falls under the EU AI Act’s high-risk categories, a fine-tuned model may trigger documentation, evaluation, and compliance obligations that a prompted model using a compliant third-party API does not.
What’s next
- What is Machine Learning? : The technical foundation that makes fine-tuning work
- What is a Large Language Model? : Understanding what you are actually fine-tuning
- Custom ML vs Foundation Models : When to fine-tune vs build a custom model vs prompt a foundation model
- MLOps: Getting Started : How to manage the lifecycle of fine-tuned models in production
Further reading
- OpenAI Fine-tuning Guide : Step-by-step with data format specs and pricing
- Hugging Face PEFT documentation : LoRA, prefix tuning, and other parameter-efficient methods
- AWS Bedrock Fine-tuning : Managed fine-tuning for Titan, Llama 3, and Mistral models
Frequently asked questions