
Fine-Tuning LLMs

Customize language models for your use case — when to fine-tune vs RAG, LoRA, training data preparation, and evaluation

What is Fine-Tuning?

Fine-tuning takes a pre-trained language model and further trains it on your specific data to make it better at a particular task. It's like teaching a college graduate the specifics of your company — they already know the fundamentals, they just need domain-specific knowledge.

🤔 Fine-Tuning vs RAG vs Prompting

Prompting: Cheapest. No training. Best for general tasks with good instructions.
RAG: Medium effort. Access to current/private data. Best when knowledge changes often.
Fine-Tuning: Most effort. Changes model behavior. Best for specific tone, format, or domain expertise.

When to Fine-Tune

✅ Good Reasons to Fine-Tune

• Consistent output format (JSON schemas, markdown, etc.)
• Domain-specific tone or writing style
• Reducing prompt length (internalize instructions)
• Teaching specialized knowledge or terminology
• Improving performance on a specific task category

❌ Don't Fine-Tune When

• You need access to frequently changing data (use RAG)
• Good prompting achieves similar results (try that first)
• You have fewer than 50 high-quality examples
• The task is too general or varied
• You need source attribution (use RAG)

Training Data Preparation

Quality training data is the most important factor. Here's the format for OpenAI fine-tuning:

// training-data.jsonl — each example is a complete conversation.
// Shown pretty-printed here for readability; in the actual file, each
// example must be a single compact JSON object on one line, and JSONL
// does not allow comment lines.

{"messages": [
  {"role": "system", "content": "You are a customer support agent for TechLead, a web development tutorial platform."},
  {"role": "user", "content": "How do I reset my password?"},
  {"role": "assistant", "content": "To reset your password:\n\n1. Go to frontendtechlead.com/login\n2. Click 'Forgot Password'\n3. Enter your email\n4. Check your inbox for the reset link\n\nThe link expires in 24 hours. Need help with anything else?"}
]}

{"messages": [
  {"role": "system", "content": "You are a customer support agent for TechLead, a web development tutorial platform."},
  {"role": "user", "content": "Is the React tutorial free?"},
  {"role": "assistant", "content": "Yes! All tutorials on TechLead are completely free, including our comprehensive React tutorial with 17 lessons covering components, hooks, state management, and more.\n\nStart here: frontendtechlead.com/learn-react"}
]}

Guidelines for training data:

• Minimum 50 examples (100-500 is ideal)
• High quality, consistent format
• Cover edge cases and variations
• Match the exact format you want the model to output
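Malformed examples will fail validation at upload time, so it's worth checking the file locally first. Below is a minimal sketch of such a check — `validateExample` is a hypothetical helper, not part of any SDK:

```javascript
// Validate one line of a chat fine-tuning JSONL file (hypothetical helper).
// Returns null if the example looks valid, or an error string otherwise.
function validateExample(line) {
  let obj;
  try {
    obj = JSON.parse(line);
  } catch {
    return 'invalid JSON';
  }
  if (!Array.isArray(obj.messages) || obj.messages.length < 2) {
    return 'messages must contain at least a user and an assistant turn';
  }
  const roles = new Set(['system', 'user', 'assistant']);
  for (const m of obj.messages) {
    if (!roles.has(m.role)) return `unknown role: ${m.role}`;
    if (typeof m.content !== 'string') return 'content must be a string';
  }
  if (obj.messages.at(-1).role !== 'assistant') {
    return 'last message should be the assistant response the model learns from';
  }
  return null; // valid
}

const good =
  '{"messages":[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello!"}]}';
const bad = '{"messages":[{"role":"user","content":"Hi"}]}';
console.log(validateExample(good)); // null
console.log(validateExample(bad));  // error string
```

Run it over every line of the file (e.g. `fs.readFileSync('training-data.jsonl', 'utf8').split('\n')`) before uploading.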

LoRA: Efficient Fine-Tuning

Full fine-tuning updates all model parameters (expensive!). LoRA (Low-Rank Adaptation) freezes the original weights and instead trains small low-rank adapter matrices injected alongside them:

Full Fine-Tuning

• Updates all billions of parameters
• Needs powerful GPUs (A100, H100)
• Creates a full copy of the model
• Hours to days of training
• Risk of catastrophic forgetting

LoRA Fine-Tuning

• Trains only 0.1-1% of parameters
• Works on consumer GPUs
• Adapter is just a few MB
• Minutes to hours of training
• Original model weights preserved

QLoRA goes further: quantizes the base model to 4-bit, making fine-tuning possible on a single GPU with 24GB VRAM.
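The "0.1-1% of parameters" figure falls out of simple arithmetic: for a d×k weight matrix, LoRA replaces the d·k trainable values with two low-rank factors B (d×r) and A (r×k), so only r·(d+k) values are trained. A back-of-the-envelope sketch (the matrix size and rank below are illustrative, not tied to any specific model):

```javascript
// Trainable-parameter count for one weight matrix: full fine-tuning vs LoRA.
// Full: d*k values. LoRA with rank r: r*(d+k) values for the B and A factors.
function loraFraction(d, k, r) {
  const full = d * k;
  const lora = r * (d + k);
  return { full, lora, fraction: lora / full };
}

// Example: a 4096×4096 attention projection with LoRA rank r = 8
const { full, lora, fraction } = loraFraction(4096, 4096, 8);
console.log(full);     // 16777216 frozen parameters
console.log(lora);     // 65536 trainable adapter parameters
console.log((fraction * 100).toFixed(2) + '%'); // 0.39%
```

At rank 8 the adapter is under half a percent of the matrix it adapts, which is why the resulting adapter file is only a few MB.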

Fine-Tuning with OpenAI API

// Fine-tuning workflow with OpenAI
import OpenAI from 'openai';
import fs from 'fs';

const openai = new OpenAI();

// Step 1: Upload training data
const file = await openai.files.create({
  file: fs.createReadStream('training-data.jsonl'),
  purpose: 'fine-tune',
});

console.log('File uploaded:', file.id);

// Step 2: Create fine-tuning job
const job = await openai.fineTuning.jobs.create({
  training_file: file.id,
  model: 'gpt-4o-mini-2024-07-18',
  hyperparameters: {
    n_epochs: 3,          // Number of passes through the data
    batch_size: 'auto',   // Let OpenAI optimize
    learning_rate_multiplier: 'auto',
  },
  suffix: 'my-custom-model', // Custom name suffix
});

console.log('Fine-tuning job created:', job.id);

// Step 3: Poll until the job finishes. Status moves through
// 'validating_files' → 'queued' → 'running' → 'succeeded' | 'failed' | 'cancelled'
let status = await openai.fineTuning.jobs.retrieve(job.id);
while (!['succeeded', 'failed', 'cancelled'].includes(status.status)) {
  await new Promise((resolve) => setTimeout(resolve, 60_000)); // check every minute
  status = await openai.fineTuning.jobs.retrieve(job.id);
}
console.log('Final status:', status.status);

// Step 4: Use your fine-tuned model (its name is returned on the completed job)
const completion = await openai.chat.completions.create({
  model: status.fine_tuned_model, // e.g. 'ft:gpt-4o-mini-2024-07-18:org:my-custom-model:abc123'
  messages: [
    { role: 'user', content: 'How do I start learning React?' }
  ],
});

console.log(completion.choices[0].message.content);

Evaluation & Iteration

Hold-out Test Set

Reserve 10-20% of your data for evaluation. Never train on test data. Compare fine-tuned vs base model on identical prompts.
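One way to make that comparison concrete is to score held-out outputs automatically rather than eyeballing them. The sketch below assumes a format-compliance task (outputs should be valid JSON); `scoreFormat` is a hypothetical metric — swap in whatever check fits your task:

```javascript
// Fraction of model outputs that parse as valid JSON.
// Collect `outputs` by running both the base and fine-tuned models
// on identical held-out prompts, then compare the two scores.
function scoreFormat(outputs) {
  let ok = 0;
  for (const text of outputs) {
    try {
      JSON.parse(text);
      ok++;
    } catch {
      // not valid JSON — counts as a failure
    }
  }
  return outputs.length ? ok / outputs.length : 0;
}

// Illustrative outputs (real ones would come from the chat completions API)
const baseOutputs = ['{"status":"open"}', 'Sure! Here is the JSON: {...}'];
const tunedOutputs = ['{"status":"open"}', '{"status":"closed"}'];
console.log('base: ', scoreFormat(baseOutputs));  // 0.5
console.log('tuned:', scoreFormat(tunedOutputs)); // 1
```

A single number like this makes regressions obvious across retraining runs, but it complements rather than replaces human review.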

Human Evaluation

Have domain experts rate outputs on accuracy, tone, format, and helpfulness. Automated metrics miss nuance.

A/B Testing

Deploy both models side-by-side and measure real user engagement, satisfaction, and task completion rates.

Iterative Improvement

Analyze failures → add corrective examples → retrain. Each iteration should target specific weaknesses.

🔑 Key Takeaways

• Try prompting first, then RAG, then fine-tuning — escalate only when needed
• Quality of training data matters more than quantity (50-500 high-quality examples)
• LoRA makes fine-tuning accessible with minimal compute
• Always evaluate with a held-out test set and human review
• Fine-tuning changes model behavior; RAG adds knowledge — they complement each other