
Streaming Responses

Master streaming AI responses for real-time user experiences with the AI SDK

Understanding Streaming in AI Applications

Streaming allows AI responses to be delivered incrementally as they're generated, rather than waiting for the complete response. This creates a more responsive user experience, especially for longer responses that might take several seconds to generate.

Benefits of Streaming

  • Faster Time-to-First-Token: Users see content immediately
  • Better UX: Real-time typing effect feels more natural
  • Lower Perceived Latency: Users engage while content loads
  • Memory Efficient: Process data as it arrives
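
To make the benefit concrete, here is a minimal sketch (using the same OpenAI provider setup as the rest of this lesson) contrasting a blocking call with a streaming one:

import { generateText, streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Blocking: nothing is shown until the entire response has been generated
const { text } = await generateText({
  model: openai('gpt-4-turbo'),
  prompt: 'Explain closures in JavaScript.',
});
console.log(text);

// Streaming: each chunk is usable the moment it arrives
const streamed = streamText({
  model: openai('gpt-4-turbo'),
  prompt: 'Explain closures in JavaScript.',
});
for await (const chunk of streamed.textStream) {
  process.stdout.write(chunk);
}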

The streamText Function

The streamText function is the primary way to stream AI responses:

import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = streamText({
  model: openai('gpt-4-turbo'),
  prompt: 'Write a poem about coding.',
});

// The result provides multiple ways to consume the stream

Consuming Streams

1. Using toDataStreamResponse (Recommended for API Routes)

// app/api/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4-turbo'),
    messages,
  });

  // Returns a Response with proper streaming headers
  return result.toDataStreamResponse();
}

2. Using toTextStreamResponse

// Returns a plain text stream (simpler but with fewer features)
export async function POST(req: Request) {
  const { prompt } = await req.json();

  const result = streamText({
    model: openai('gpt-4-turbo'),
    prompt,
  });

  return result.toTextStreamResponse();
}

3. Using Async Iterator

import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = streamText({
  model: openai('gpt-4-turbo'),
  prompt: 'Count from 1 to 10.',
});

// Process each chunk as it arrives
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

// Or collect all text
const fullText = await result.text;

Stream Data Protocol

The AI SDK streams responses using a data stream protocol that carries metadata alongside the text:

// The data stream includes:
// - Text deltas (the actual content)
// - Tool calls and results
// - Finish reasons
// - Usage information

const result = streamText({
  model: openai('gpt-4-turbo'),
  messages,
  onFinish: async ({ text, finishReason, usage }) => {
    console.log('Finished:', finishReason);
    console.log('Tokens used:', usage);

    // Save to database, log analytics, etc.
    await saveToDatabase(text);
  },
});
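
If you want to observe these parts directly, the result also exposes a fullStream of typed parts. A rough sketch (the exact part types and fields depend on the SDK version):

for await (const part of result.fullStream) {
  switch (part.type) {
    case 'text-delta':
      process.stdout.write(part.textDelta); // incremental text content
      break;
    case 'finish':
      console.log('Finish reason:', part.finishReason);
      console.log('Usage:', part.usage);
      break;
    // other part types cover tool calls, tool results, and errors
  }
}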

Streaming with System Messages

const result = streamText({
  model: openai('gpt-4-turbo'),
  system: 'You are a helpful coding assistant. Be concise.',
  messages: [
    { role: 'user', content: 'How do I sort an array in JavaScript?' }
  ],
});

return result.toDataStreamResponse();

Streaming with Options

const result = streamText({
  model: openai('gpt-4-turbo'),
  messages,

  // Model parameters
  temperature: 0.7,
  maxTokens: 1000,
  topP: 0.9,

  // Callbacks
  onChunk: ({ chunk }) => {
    // Called for each chunk
    console.log('Chunk:', chunk);
  },
  onFinish: ({ text, usage }) => {
    // Called when stream completes
    console.log('Total tokens:', usage.totalTokens);
  },

  // Abort signal for cancellation (a standard AbortSignal; see the sketch below)
  abortSignal: controller.signal,
});
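
The abort signal is a standard AbortSignal, so cancellation can be wired up however your app needs. One possible pattern (a sketch, not the only option) is to create a controller and abort after a timeout:

// Cancel generation after 30 seconds
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 30_000);

const result = streamText({
  model: openai('gpt-4-turbo'),
  messages,
  abortSignal: controller.signal,
  onFinish: () => clearTimeout(timeout),
});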

Client-Side Stream Handling

'use client';

import { useState } from 'react';

export default function StreamDemo() {
  const [response, setResponse] = useState('');
  const [isLoading, setIsLoading] = useState(false);

  const handleStream = async () => {
    setIsLoading(true);
    setResponse('');

    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: [{ role: 'user', content: 'Tell me a joke' }]
      }),
    });

    // Note: this assumes the route returns a plain text stream
    // (result.toTextStreamResponse()). A data stream response is framed
    // with the AI SDK's stream protocol, so reading it as raw text would
    // expose that framing; use useChat for data streams instead.
    const reader = res.body?.getReader();
    const decoder = new TextDecoder();

    if (reader) {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const text = decoder.decode(value, { stream: true });
        setResponse(prev => prev + text);
      }
    }

    setIsLoading(false);
  };

  return (
    <div>
      <button onClick={handleStream} disabled={isLoading}>
        {isLoading ? 'Streaming...' : 'Get Response'}
      </button>
      <div>{response}</div>
    </div>
  );
}

Use useChat Instead

While manual stream handling works, the useChat hook handles all this complexity for you. Use it for chat interfaces instead of manual stream parsing.
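
For reference, here is a minimal sketch of the same idea built with useChat (imported here from @ai-sdk/react; hook and prop names can vary between SDK versions):

'use client';

import { useChat } from '@ai-sdk/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat', // the streaming route defined earlier
  });

  return (
    <form onSubmit={handleSubmit}>
      {messages.map(m => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <input value={input} onChange={handleInputChange} placeholder="Say something..." />
    </form>
  );
}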

Error Handling in Streams

try {
  const result = streamText({
    model: openai('gpt-4-turbo'),
    messages,
  });

  return result.toDataStreamResponse();
} catch (error) {
  if (error instanceof Error && error.name === 'AbortError') {
    return new Response('Stream cancelled', { status: 499 });
  }

  console.error('Stream error:', error);
  return new Response('Error generating response', { status: 500 });
}
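
Keep in mind that a try/catch around streamText mostly catches errors thrown while the request is being set up. Errors that occur mid-generation typically surface while the stream is consumed, so handle them there too. A rough sketch using the async iterator:

try {
  for await (const chunk of result.textStream) {
    process.stdout.write(chunk);
  }
} catch (error) {
  // Errors emitted by the provider during generation propagate here,
  // after streaming has already started
  console.error('Stream failed mid-generation:', error);
}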

Key Takeaways

  • Streaming delivers responses incrementally for better UX
  • Use streamText for server-side streaming
  • toDataStreamResponse() handles headers and protocol automatically
  • The onFinish callback is useful for logging and persistence
  • Use the useChat hook on the client for easier integration
