Fine-Tuning LLMs
Customize language models for your use case — when to fine-tune vs RAG, LoRA, training data preparation, and evaluation
What is Fine-Tuning?
Fine-tuning takes a pre-trained language model and further trains it on your specific data to make it better at a particular task. It's like teaching a college graduate the specifics of your company — they already know the fundamentals, they just need domain-specific knowledge.
🤔 Fine-Tuning vs RAG vs Prompting
These three techniques solve different problems: prompting changes instructions at inference time, RAG retrieves external knowledge per query, and fine-tuning bakes behavior into the model's weights. The sections below cover when each one fits.
When to Fine-Tune
✅ Good Reasons to Fine-Tune
- Consistent output format (JSON schemas, markdown, etc.)
- Domain-specific tone or writing style
- Reducing prompt length (internalize instructions)
- Teaching specialized knowledge or terminology
- Improving performance on a specific task category
❌ Don't Fine-Tune When
- You need access to frequently changing data (use RAG)
- Good prompting achieves similar results (try that first)
- You have fewer than 50 high-quality examples
- The task is too general or varied
- You need source attribution (use RAG)
Training Data Preparation
Quality training data is the most important factor. Here's the format for OpenAI fine-tuning:
// training-data.jsonl — one example per line
// (wrapped here for readability; in the real file each JSON object must sit on a single line)
// Each example is a complete conversation
{"messages": [
{"role": "system", "content": "You are a customer support agent for TechLead, a web development tutorial platform."},
{"role": "user", "content": "How do I reset my password?"},
{"role": "assistant", "content": "To reset your password:\n\n1. Go to frontendtechlead.com/login\n2. Click 'Forgot Password'\n3. Enter your email\n4. Check your inbox for the reset link\n\nThe link expires in 24 hours. Need help with anything else?"}
]}
{"messages": [
{"role": "system", "content": "You are a customer support agent for TechLead, a web development tutorial platform."},
{"role": "user", "content": "Is the React tutorial free?"},
{"role": "assistant", "content": "Yes! All tutorials on TechLead are completely free, including our comprehensive React tutorial with 17 lessons covering components, hooks, state management, and more.\n\nStart here: frontendtechlead.com/learn-react"}
]}
// Guidelines for training data:
// - Minimum 50 examples (100-500 is ideal)
// - High quality, consistent format
// - Cover edge cases and variations
// - Match the exact format you want the model to output
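A quick validation pass before uploading catches malformed examples early. The helper below is an illustrative sketch (not part of the OpenAI SDK) that checks each JSONL line against the chat format shown above:

```typescript
// validate-training-data.ts — sanity-check a JSONL file before upload (illustrative helper)
import fs from 'fs';

type Message = { role: string; content: string };

// Returns a list of problems for one JSONL line; an empty list means the example looks valid.
function validateExample(line: string): string[] {
  const problems: string[] = [];
  let parsed: { messages?: Message[] };
  try {
    parsed = JSON.parse(line);
  } catch {
    return ['not valid JSON'];
  }
  const messages = parsed.messages;
  if (!Array.isArray(messages) || messages.length === 0) {
    return ['missing "messages" array'];
  }
  for (const m of messages) {
    if (!['system', 'user', 'assistant'].includes(m.role)) {
      problems.push(`unknown role: ${m.role}`);
    }
    if (typeof m.content !== 'string' || m.content.length === 0) {
      problems.push('empty or missing content');
    }
  }
  // The model learns from assistant turns, so at least one must be present.
  if (!messages.some((m) => m.role === 'assistant')) {
    problems.push('no assistant message to learn from');
  }
  return problems;
}

// Usage: report problems line by line before running the upload step.
if (fs.existsSync('training-data.jsonl')) {
  const lines = fs.readFileSync('training-data.jsonl', 'utf8').split('\n').filter(Boolean);
  lines.forEach((line, i) => {
    const problems = validateExample(line);
    if (problems.length > 0) console.error(`line ${i + 1}: ${problems.join(', ')}`);
  });
}
```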
LoRA: Efficient Fine-Tuning
Full fine-tuning updates all model parameters (expensive!). LoRA (Low-Rank Adaptation) instead freezes the original weights and trains small low-rank adapter matrices that are added on top:
Full Fine-Tuning
- Updates all billions of parameters
- Needs powerful GPUs (A100, H100)
- Creates a full copy of the model
- Hours to days of training
- Risk of catastrophic forgetting
LoRA Fine-Tuning
- Trains only 0.1-1% of parameters
- Works on consumer GPUs
- Adapter is just a few MB
- Minutes to hours of training
- Original model weights preserved
QLoRA goes further: quantizes the base model to 4-bit, making fine-tuning possible on a single GPU with 24GB VRAM.
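The parameter savings are easy to see with back-of-the-envelope arithmetic. For one d×d weight matrix, LoRA replaces the full d×d update with two thin matrices B (d×r) and A (r×d), so the trainable count drops from d² to 2·d·r. The values of d and r below are illustrative (a typical hidden size and a common LoRA rank):

```typescript
// lora-params.ts — compare trainable parameters: full update vs LoRA adapter
// Illustrative arithmetic only; d and r are assumed typical values, not fixed by any library.

// Full fine-tuning of one d x d weight matrix trains every entry.
function fullParams(d: number): number {
  return d * d;
}

// LoRA trains B (d x r) and A (r x d) instead, adding the product B·A to the frozen weight.
function loraParams(d: number, r: number): number {
  return 2 * d * r;
}

const d = 4096; // hidden size in the range of a 7B-class model
const r = 8;    // common LoRA rank

const full = fullParams(d);        // 16,777,216
const lora = loraParams(d, r);     //     65,536
const ratio = (100 * lora) / full; // ~0.39% — inside the "0.1-1%" range above

console.log(`full: ${full}, lora: ${lora}, ratio: ${ratio.toFixed(2)}%`);
```

Repeating this across every adapted matrix in the model is what keeps the final adapter down to a few megabytes.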
Fine-Tuning with OpenAI API
// Fine-tuning workflow with OpenAI
import OpenAI from 'openai';
import fs from 'fs';
const openai = new OpenAI();
// Step 1: Upload training data
const file = await openai.files.create({
file: fs.createReadStream('training-data.jsonl'),
purpose: 'fine-tune',
});
console.log('File uploaded:', file.id);
// Step 2: Create fine-tuning job
const job = await openai.fineTuning.jobs.create({
training_file: file.id,
model: 'gpt-4o-mini-2024-07-18',
hyperparameters: {
n_epochs: 3, // Number of passes through the data
batch_size: 'auto', // Let OpenAI optimize
learning_rate_multiplier: 'auto',
},
suffix: 'my-custom-model', // Custom name suffix
});
console.log('Fine-tuning job created:', job.id);
// Step 3: Monitor progress — poll until the job finishes (minutes to hours)
const status = await openai.fineTuning.jobs.retrieve(job.id);
console.log('Status:', status.status); // 'validating_files' | 'queued' | 'running' | 'succeeded' | 'failed' | 'cancelled'
// On success, status.fine_tuned_model holds the full model name to use in step 4
// Step 4: Use your fine-tuned model
const completion = await openai.chat.completions.create({
model: 'ft:gpt-4o-mini-2024-07-18:org:my-custom-model:abc123',
messages: [
{ role: 'user', content: 'How do I start learning React?' }
],
});
console.log(completion.choices[0].message.content);
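In practice, step 3 is a polling loop rather than a single retrieve call. A minimal sketch, with the retrieve call injected as a parameter so the loop can be exercised without the API (the function name and interval are assumptions, not part of the SDK):

```typescript
// poll-job.ts — wait for a fine-tuning job to finish (sketch; retrieve is injected)
type JobStatus = { status: string; fine_tuned_model: string | null };

async function waitForJob(
  retrieve: () => Promise<JobStatus>, // e.g. () => openai.fineTuning.jobs.retrieve(job.id)
  intervalMs = 60_000,
): Promise<string> {
  for (;;) {
    const job = await retrieve();
    if (job.status === 'succeeded' && job.fine_tuned_model) {
      // The API fills fine_tuned_model on success, e.g. 'ft:gpt-4o-mini-2024-07-18:...'
      return job.fine_tuned_model;
    }
    if (job.status === 'failed' || job.status === 'cancelled') {
      throw new Error(`fine-tuning ended with status: ${job.status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Usage against the job created above would look like `const modelName = await waitForJob(() => openai.fineTuning.jobs.retrieve(job.id));`.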
Evaluation & Iteration
Hold-out Test Set
Reserve 10-20% of your data for evaluation. Never train on test data. Compare fine-tuned vs base model on identical prompts.
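A deterministic shuffle-and-split keeps the train and test sets stable across runs. A minimal sketch, assuming a simple seeded LCG for reproducibility (illustrative only, not a library function):

```typescript
// split-data.ts — reserve a held-out test set (sketch; seeded LCG shuffle for reproducibility)
function splitData<T>(examples: T[], testFraction = 0.2, seed = 42): { train: T[]; test: T[] } {
  // Simple linear congruential generator so the split is identical on every run.
  let state = seed;
  const rand = () => {
    state = (state * 1664525 + 1013904223) % 2 ** 32;
    return state / 2 ** 32;
  };
  // Fisher-Yates shuffle on a copy, then slice off the test set.
  const shuffled = [...examples];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const testSize = Math.max(1, Math.round(shuffled.length * testFraction));
  return { test: shuffled.slice(0, testSize), train: shuffled.slice(testSize) };
}
```

Train only on `train`, evaluate both the base and fine-tuned model on `test`, and never let the two sets mix between iterations.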
Human Evaluation
Have domain experts rate outputs on accuracy, tone, format, and helpfulness. Automated metrics miss nuance.
A/B Testing
Deploy both models side-by-side and measure real user engagement, satisfaction, and task completion rates.
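Assignment in the A/B test should be sticky per user, so the same person always sees the same model. Hashing the user ID into a bucket does this without storing state; the FNV-1a hash below is one illustrative choice:

```typescript
// ab-assign.ts — sticky model assignment by hashing the user ID (illustrative)
function pickModel(userId: string, models: string[]): string {
  // FNV-1a hash: deterministic, so a given user always lands in the same bucket.
  let hash = 2166136261;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 16777619) >>> 0;
  }
  return models[hash % models.length];
}
```

For example, `pickModel(user.id, [baseModel, fineTunedModel])` routes roughly half of traffic to each model while keeping every user's experience consistent, which makes per-user satisfaction and completion metrics meaningful.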
Iterative Improvement
Analyze failures → add corrective examples → retrain. Each iteration should target specific weaknesses.
🔑 Key Takeaways
- Try prompting first, then RAG, then fine-tuning — escalate only when needed
- Quality of training data matters more than quantity (50-500 high-quality examples)
- LoRA makes fine-tuning accessible with minimal compute
- Always evaluate with a held-out test set and human review
- Fine-tuning changes model behavior; RAG adds knowledge — they complement each other