Output Formatting and Structured Data

Why Output Formatting Matters

When you build applications on top of LLMs, you often need structured, parseable output rather than free text. Clear formatting instructions help you get consistent JSON, code blocks, tables, or any other format your application can process reliably.

🎯 Formatting Goals

Parseability: Output your code can parse reliably (JSON, XML)
Consistency: The same structure every time for the same type of request
Completeness: All required fields present
Type safety: The correct data type for each field

JSON Output

To get reliable JSON from the model, specify the exact schema you expect: field names, data types, and a complete example of the desired structure.

// Basic JSON request
"Extract entities from this text and return as JSON.

Text: 'Apple Inc. announced that CEO Tim Cook will visit 
the new Paris office on March 15, 2024.'

Return JSON with: companies, people, locations, dates"

// Better: Specify exact schema
"Extract entities from the text. Return a JSON object 
with this exact structure:

{
  "companies": [{ "name": string, "type": string }],
  "people": [{ "name": string, "role": string }],
  "locations": [{ "city": string, "country": string }],
  "dates": [{ "date": string, "context": string }]
}

Text: 'Apple Inc. announced that CEO Tim Cook will visit 
the new Paris office on March 15, 2024.'

Return ONLY the JSON, no other text."

// Response:
{
  "companies": [{ "name": "Apple Inc.", "type": "technology" }],
  "people": [{ "name": "Tim Cook", "role": "CEO" }],
  "locations": [{ "city": "Paris", "country": "France" }],
  "dates": [{ "date": "2024-03-15", "context": "office visit" }]
}
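
Once the schema is pinned down in the prompt, the same shape can be mirrored in code and checked before the data is used. The following is a minimal sketch, assuming the reply text is exactly the JSON object shown above. The `Entities` type and the `parseEntities` and `isRecordArray` helpers are illustrative names, not part of any library.

```typescript
// Hypothetical types mirroring the schema requested in the prompt.
interface Entities {
  companies: { name: string; type: string }[];
  people: { name: string; role: string }[];
  locations: { city: string; country: string }[];
  dates: { date: string; context: string }[];
}

// True when `value` is an array of objects whose listed keys are all strings.
function isRecordArray(value: unknown, keys: string[]): boolean {
  if (!Array.isArray(value)) return false;
  return value.every(
    (item) =>
      typeof item === "object" &&
      item !== null &&
      keys.every((key) => typeof (item as Record<string, unknown>)[key] === "string")
  );
}

// Parses the model's reply and throws if it does not match the schema.
function parseEntities(raw: string): Entities {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  const ok =
    isRecordArray(obj.companies, ["name", "type"]) &&
    isRecordArray(obj.people, ["name", "role"]) &&
    isRecordArray(obj.locations, ["city", "country"]) &&
    isRecordArray(obj.dates, ["date", "context"]);
  if (!ok) {
    throw new Error("Response does not match the expected schema");
  }
  return obj as unknown as Entities;
}
```

A hand-written check like this is enough for small schemas. For larger ones, a schema validation library would likely be a better fit.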

Ensuring Valid JSON

Several techniques make valid JSON far more likely: show a complete example, state explicit constraints, and use delimiters so the JSON is easy to extract programmatically.

// Techniques for reliable JSON

// 1. Show complete example
"Generate a product listing in JSON format.

Example output:
{
  "id": "prod_123",
  "name": "Wireless Mouse",
  "price": 29.99,
  "inStock": true,
  "categories": ["electronics", "accessories"]
}

Generate a listing for: 'Bluetooth Keyboard with backlight'

Output the JSON only, with no markdown formatting or explanation."

// 2. Specify constraints
"Return a JSON object. Rules:
- Must be valid JSON (no trailing commas)
- Use double quotes for strings
- Numbers should not be quoted
- Booleans are true/false (not strings)
- Empty arrays are [] not null
- No comments in the JSON"

// 3. Use delimiters for extraction
"Return your response in this format:

```json
{ your json here }
```

This makes it easy to parse the JSON from the response."

// Parsing in code:
const jsonMatch = response.match(/```json\s*([\s\S]*?)```/);
if (!jsonMatch) {
  throw new Error("No json code fence found in the response");
}
const data = JSON.parse(jsonMatch[1]);
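
The same extraction can be wrapped in a reusable helper that also copes with replies where the model skips the fence. This is a sketch under the assumption that a reply contains at most one JSON payload; `extractJson` and `FENCE` are hypothetical names. The fence string is assembled from single backticks only so that the snippet can sit inside a fenced block itself.

```typescript
// Three backticks, built from single characters so that this snippet can
// itself be embedded in a fenced block without closing it early.
const FENCE = "`" + "`" + "`";

// Hypothetical helper: pulls a JSON value out of a model reply.
// Prefers a fenced block (tagged json or untagged); otherwise parses the whole reply.
function extractJson(reply: string): unknown {
  const pattern = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);
  const fenced = reply.match(pattern);
  const candidate = fenced ? fenced[1] : reply;
  try {
    return JSON.parse(candidate.trim());
  } catch (err) {
    throw new Error("Reply did not contain parseable JSON: " + String(err));
  }
}
```

The return type is `unknown` on purpose: the caller still has to check the shape before treating the value as a specific type.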

TypeScript Interfaces from JSON

You can ask the model to generate both TypeScript interfaces and sample data that conforms to them, using clearly delimited sections so each part can be parsed separately.

// Generate types along with data
"Generate TypeScript interfaces and sample data.

Task: Create types for a blog post system.

Return in this format:

INTERFACES:
```typescript
// Your interfaces here
```

SAMPLE DATA:
```json
// Sample data matching the interfaces
```

Include: Post, Author, Comment, Category"

// Response includes both parseable sections:

INTERFACES:
```typescript
interface Author {
  id: string;
  name: string;
  email: string;
  avatarUrl: string;
}

interface Comment {
  id: string;
  authorId: string;
  content: string;
  createdAt: string;
}

interface Category {
  id: string;
  name: string;
}

interface Post {
  id: string;
  title: string;
  content: string;
  author: Author;
  comments: Comment[];
  categories: Category[];
  publishedAt: string | null;
}
```

SAMPLE DATA:
```json
{
  "id": "post_1",
  "title": "Getting Started with TypeScript",
  ...
}
```
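
Labeled sections like these can be pulled apart with one small function. The sketch below assumes each label is followed directly by a fenced block with the expected language tag, and that labels contain no regular-expression metacharacters. `extractSection` and `FENCE` are hypothetical names, and the fence is assembled from single backticks so the snippet can be embedded in a fenced block.

```typescript
// Three backticks, built from single characters (see the note above).
const FENCE = "`" + "`" + "`";

// Hypothetical helper: returns the body of the fenced block that follows
// a label such as "INTERFACES:" and carries the given language tag.
// Returns null when the label or the block is missing.
function extractSection(reply: string, label: string, lang: string): string | null {
  const pattern = new RegExp(
    label + ":\\s*" + FENCE + lang + "\\n([\\s\\S]*?)\\n?" + FENCE
  );
  const match = reply.match(pattern);
  return match ? match[1] : null;
}
```

With the response above, `extractSection(reply, "INTERFACES", "typescript")` would yield the interface source, and `extractSection(reply, "SAMPLE DATA", "json")` the JSON text, ready for `JSON.parse`.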

Markdown Formatting

Defining a Markdown template with specific sections (headings, tables, code blocks) keeps generated documentation consistent and professional in structure.

// Structured markdown output
"Create documentation for this API endpoint.

Use this exact markdown structure:

# Endpoint Name

## Description
[1-2 sentence description]

## HTTP Request
`[METHOD] /path`

## Parameters
| Name | Type | Required | Description |
|------|------|----------|-------------|
| ... | ... | ... | ... |

## Response
```json
{example response}
```

## Errors
| Code | Description |
|------|-------------|
| ... | ... |

Endpoint:
POST /api/users - Creates a new user with email and password"
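
Because the template fixes the section headings, a reply can be checked against it before the documentation is published. This is a rough sketch: `REQUIRED_SECTIONS` and `missingSections` are illustrative names, and the check only looks at level-2 headings, so a line starting with `## ` inside a code block would be counted as a heading too.

```typescript
// Section headings the prompt template asks for (level-2 headings).
const REQUIRED_SECTIONS = ["Description", "HTTP Request", "Parameters", "Response", "Errors"];

// Returns the required headings that are absent from the generated markdown.
function missingSections(markdown: string, required: string[]): string[] {
  const found = markdown
    .split("\n")
    .filter((line) => line.indexOf("## ") === 0)
    .map((line) => line.slice(3).trim());
  return required.filter((name) => found.indexOf(name) === -1);
}
```

An empty result means every required section is present. A non-empty one can be fed back to the model in a follow-up request.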

Code Block Formatting

When you need well-formatted code, specify the language, the sections you want (implementation and usage example), and whether you want code only or explanations as well.

// Getting properly formatted code
"Write a React hook for form validation.

Format your response as:

HOOK CODE:
```typescript
// The complete hook implementation
```

USAGE EXAMPLE:
```typescript
// How to use the hook in a component
```

Requirements:
- TypeScript with proper types
- Support for email, required, and minLength validators
- Return errors object and isValid boolean"

// Code-only response
"Write a binary search function in JavaScript.

Return ONLY the code block, no explanations:

```javascript
// your code here
```"
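
A "code only" instruction is easy to verify mechanically. The sketch below accepts a reply only when it consists of a single fenced block with the expected language tag and nothing around it. `extractCodeOnly` and `FENCE` are hypothetical names, and the fence is assembled from single backticks so the snippet can be embedded in a fenced block.

```typescript
// Three backticks, built from single characters (see the note above).
const FENCE = "`" + "`" + "`";

// Hypothetical helper: returns the code if the reply is exactly one fenced
// block with the expected language tag and nothing else; otherwise null.
function extractCodeOnly(reply: string, lang: string): string | null {
  const pattern = new RegExp("^" + FENCE + lang + "\\n([\\s\\S]*?)\\n" + FENCE + "$");
  const match = reply.trim().match(pattern);
  if (!match) return null;
  // A fence inside the captured body means the reply held more than one block.
  return match[1].indexOf(FENCE) === -1 ? match[1] : null;
}
```

A `null` result signals that the model added prose or extra blocks, which is usually a cue to retry or to fall back to a more lenient extractor.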

Structured Lists

For lists, checklists, or comparisons, defining the exact format with an example of each element produces results that are organized and easy to process.

// Numbered steps with consistent format
"Explain how to set up a Next.js project.

Format as numbered steps:

1. **Step Title**
   - Command: `command here`
   - Explanation: Brief explanation

2. **Step Title**
   ...

Include exactly 5 steps from initialization to running the dev server."

// Checklists
"Create a code review checklist.

Format as a markdown checklist:

## Category Name
- [ ] Item 1
- [ ] Item 2

Include categories: Security, Performance, Code Quality, Testing"

// Comparison format
"Compare React and Vue.

Use this format for each aspect:

### [Aspect Name]
- **React**: [description]
- **Vue**: [description]
- **Winner**: [React/Vue/Tie] - [reason]

Compare: Learning Curve, Performance, Ecosystem, Job Market"
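
A fixed list format pays off when the reply has to be turned back into data. As an illustration, the numbered-step template from the first prompt can be parsed line by line. This is a sketch that assumes the model followed the template exactly; `Step` and `parseSteps` are illustrative names.

```typescript
interface Step {
  title: string;
  command: string;
  explanation: string;
}

// Parses replies that follow the numbered-step template from the prompt.
// Lines that match none of the three patterns are ignored.
function parseSteps(reply: string): Step[] {
  const steps: Step[] = [];
  let current: Step | null = null;
  for (const rawLine of reply.split("\n")) {
    const line = rawLine.trim();
    const titleMatch = line.match(/^\d+\.\s+\*\*(.+?)\*\*$/);
    const commandMatch = line.match(/^- Command:\s*`(.+)`$/);
    const explanationMatch = line.match(/^- Explanation:\s*(.+)$/);
    if (titleMatch) {
      current = { title: titleMatch[1], command: "", explanation: "" };
      steps.push(current);
    } else if (current && commandMatch) {
      current.command = commandMatch[1];
    } else if (current && explanationMatch) {
      current.explanation = explanationMatch[1];
    }
  }
  return steps;
}
```

Since the prompt asks for exactly 5 steps, comparing `parseSteps(reply).length` with 5 is a cheap way to detect a reply that drifted from the format.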

Handling Edge Cases

Tell the model how to handle missing or invalid data: null values, empty arrays, or default strings. You can also ask the model to validate its own output before responding.

// Specify what to do when data is missing
"Extract product info as JSON. If a field is not found:
- Use null for missing optional fields
- Use empty array [] for missing lists
- Use 'UNKNOWN' for missing required strings

Schema:
{
  "name": string,         // required
  "price": number | null, // optional
  "description": string,  // required, use 'UNKNOWN' if missing
  "features": string[]    // use [] if missing
}

Text: 'iPhone 15 Pro - Buy now!'

JSON:"
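
The same fallback rules can be enforced again in code, so a reply that ignores them still ends up in a predictable shape. A minimal sketch, assuming the reply is a bare JSON object; `Product` and `normalizeProduct` are hypothetical names.

```typescript
interface Product {
  name: string;
  price: number | null;
  description: string;
  features: string[];
}

// Applies the fallback rules from the prompt to whatever the model returned:
// 'UNKNOWN' for missing required strings, null for a missing price, [] for missing lists.
function normalizeProduct(raw: string): Product {
  const parsed: unknown = JSON.parse(raw);
  const obj = (typeof parsed === "object" && parsed !== null ? parsed : {}) as Record<string, unknown>;
  const text = (value: unknown): string =>
    typeof value === "string" && value.trim() !== "" ? value : "UNKNOWN";
  const price = obj.price;
  const features = obj.features;
  return {
    name: text(obj.name),
    price: typeof price === "number" && isFinite(price) ? price : null,
    description: text(obj.description),
    features: Array.isArray(features)
      ? features.filter((f): f is string => typeof f === "string")
      : [],
  };
}
```

For the text in the prompt above, a reply containing only a name would normalize to a product with `price: null`, `description: "UNKNOWN"`, and an empty `features` array.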

// Validation in response
"Analyze this data and return JSON.

Before returning, validate:
1. All required fields are present
2. Numbers are valid (not NaN)
3. Dates are in ISO format

If validation fails, return:
{
  "success": false,
  "error": "description of the issue"
}

If validation passes, return:
{
  "success": true,
  "data": { ... }
}"
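
On the receiving side, the two envelopes map naturally onto a discriminated union. The sketch below uses illustrative names (`Envelope`, `parseEnvelope`) and assumes the reply is a bare JSON object. It checks the envelope itself but not the shape of `data`, and a model's self-reported validation is not a substitute for validating in code.

```typescript
// Discriminated union mirroring the two envelopes requested in the prompt.
type Envelope<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Parses the reply into one of the two envelope variants, or throws if it is neither.
// The shape of `data` is NOT verified here; T is taken on trust.
function parseEnvelope<T>(raw: string): Envelope<T> {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed === "object" && parsed !== null) {
    const obj = parsed as Record<string, unknown>;
    const error = obj.error;
    if (obj.success === true && "data" in obj) {
      return { success: true, data: obj.data as T };
    }
    if (obj.success === false && typeof error === "string") {
      return { success: false, error: error };
    }
  }
  throw new Error("Reply is not a recognized envelope");
}
```

Callers can then branch on `success` and let the compiler narrow the type to either `data` or `error`.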

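API Integration Patterns

Modern APIs such as OpenAI's and Anthropic's offer native structured-output features (Structured Outputs, Function Calling, Tool Use) that constrain the model's output to a JSON Schema, so you don't have to extract JSON from free text by hand. How strictly the schema is enforced depends on the provider and mode, so validating the result in code is still worthwhile.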

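// OpenAI Structured Outputs (response_format with a JSON Schema)
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Extract user info from: ..." }],
  response_format: { 
    type: "json_schema",
    json_schema: {
      name: "user_info",
      strict: true, // enforce the schema; requires the two constraints noted below
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          email: { type: "string" },
          age: { type: ["number", "null"] } // optional fields are expressed as nullable
        },
        required: ["name", "email", "age"], // strict mode: every property must be listed
        additionalProperties: false         // strict mode: must be false
      }
    }
  }
});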

// Anthropic Tool Use for structured output
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024, // required by the Messages API
  tools: [{
    name: "extract_info",
    description: "Extract structured information",
    input_schema: {
      type: "object",
      properties: {
        name: { type: "string" },
        email: { type: "string" }
      }
    }
  }],
  tool_choice: { type: "tool", name: "extract_info" },
  messages: [{ role: "user", content: "Extract from: ..." }]
});
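
Both calls return the structured result in a different place, so it helps to see how it might be read back. The sketch below avoids importing either SDK and instead declares simplified structural types (`ChatCompletionLike`, `ContentBlockLike`, `MessageLike`) that are assumptions about the response shapes, not the SDKs' real type definitions; check the current API references before relying on them. `readOpenAiJson` and `readAnthropicToolInput` are hypothetical helper names.

```typescript
// Minimal structural types for the parts of each response used here.
// They are simplified assumptions, not the SDKs' full type definitions.
interface ChatCompletionLike {
  choices: { message: { content: string | null } }[];
}
type ContentBlockLike =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };
interface MessageLike {
  content: ContentBlockLike[];
}

// OpenAI: the structured output arrives as a JSON string in message.content.
function readOpenAiJson(response: ChatCompletionLike): unknown {
  const first = response.choices[0];
  if (!first || first.message.content === null) {
    throw new Error("No content returned");
  }
  return JSON.parse(first.message.content);
}

// Anthropic: the structured output is the `input` of the matching tool_use block.
function readAnthropicToolInput(response: MessageLike, toolName: string): unknown {
  for (const block of response.content) {
    if (block.type === "tool_use" && block.name === toolName) {
      return block.input;
    }
  }
  throw new Error("No tool_use block for " + toolName);
}
```

In both cases the returned value is typed as `unknown`, which nudges the caller toward validating it before use.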

✅ Formatting Best Practices

• Always specify the exact schema or structure you want
• Use delimiters (backticks, tags) for parseable sections
• Include a complete example of the desired output
• State how to handle missing or invalid data
• Use API features (function calling) when they are available
• Validate the output in your code before using it