Mathematics for AI

Why mathematics matters in AI

Mathematics is the language of AI. Every AI algorithm, from linear regression to complex neural networks, rests on mathematical principles. Understanding the math behind AI helps you design better models, debug problems, and innovate.

📐 Four pillars:

AI rests mainly on four domains: linear algebra (data representation), calculus (optimization), probability (uncertainty), and statistics (learning from data).

1. Linear algebra: the language of data

Why linear algebra?

• Data representation: Images, text, and audio are stored as vectors and matrices
• Neural networks: The core operations are matrix multiplications
• Dimensionality reduction: PCA and SVD rely on matrix decompositions
• Efficiency: GPUs are optimized for matrix operations

Vectors and matrices

Vectors are the basic representation of data in AI: every word, image, or signal is encoded as a numeric vector. Operations such as the dot product and cosine similarity measure how closely two such vectors are related.

// Vector Operations in JavaScript
class Vector {
  constructor(elements) {
    this.elements = elements;
    this.length = elements.length;
  }

  // Vector addition: v1 + v2
  add(other) {
    if (this.length !== other.length) {
      throw new Error("Vectors must have same dimension");
    }
    return new Vector(
      this.elements.map((val, i) => val + other.elements[i])
    );
  }

  // Scalar multiplication: k * v
  scale(scalar) {
    return new Vector(
      this.elements.map(val => val * scalar)
    );
  }

  // Dot product: v1 · v2
  dot(other) {
    if (this.length !== other.length) {
      throw new Error("Vectors must have same dimension");
    }
    return this.elements.reduce((sum, val, i) => 
      sum + val * other.elements[i], 0
    );
  }

  // Magnitude (length): ||v||
  magnitude() {
    return Math.sqrt(this.dot(this));
  }

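  // Unit vector: v / ||v|| (undefined for the zero vector)
  normalize() {
    const mag = this.magnitude();
    if (mag === 0) {
      throw new Error("Cannot normalize the zero vector");
    }
    return this.scale(1 / mag);
  }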

  toString() {
    return "[" + this.elements.join(", ") + "]";
  }
}

// Example: Word embeddings as vectors
const word1 = new Vector([0.5, 0.8, 0.3]); // "king"
const word2 = new Vector([0.4, 0.7, 0.4]); // "queen"

console.log("Vector 1:", word1.toString());
console.log("Vector 2:", word2.toString());

// Vector addition (analogies such as king - man + woman combine add and scale)
const result = word1.add(word2);
console.log("Addition:", result.toString());

// Dot product (similarity measure)
const similarity = word1.dot(word2);
console.log("Dot product (similarity):", similarity.toFixed(4));

// Cosine similarity (normalized)
const cosineSim = word1.dot(word2) / (word1.magnitude() * word2.magnitude());
console.log("Cosine similarity:", cosineSim.toFixed(4));

🎯 Key concepts:

Vector: Ordered list of numbers (1D array)
Matrix: 2D array of numbers (rows × columns)
Dot product: Measures similarity between vectors (larger when they point in the same direction)
Matrix multiplication: Core operation in neural networks

Matrix operations

Matrices are fundamental to neural networks: each layer performs a matrix multiplication followed by an activation function. Understanding these operations is key to understanding the forward pass.

// Matrix Operations for Neural Networks
class Matrix {
  constructor(rows, cols, data = null) {
    this.rows = rows;
    this.cols = cols;
    
    if (data) {
      this.data = data;
    } else {
      // Initialize with zeros
      this.data = Array(rows).fill(0).map(() => Array(cols).fill(0));
    }
  }

  // Matrix multiplication: A × B
  multiply(other) {
    if (this.cols !== other.rows) {
      throw new Error("Invalid dimensions for multiplication");
    }

    const result = new Matrix(this.rows, other.cols);
    
    for (let i = 0; i < this.rows; i++) {
      for (let j = 0; j < other.cols; j++) {
        let sum = 0;
        for (let k = 0; k < this.cols; k++) {
          sum += this.data[i][k] * other.data[k][j];
        }
        result.data[i][j] = sum;
      }
    }
    
    return result;
  }

  // Element-wise multiplication (Hadamard product)
  hadamard(other) {
    if (this.rows !== other.rows || this.cols !== other.cols) {
      throw new Error("Matrices must have same dimensions");
    }

    const result = new Matrix(this.rows, this.cols);
    for (let i = 0; i < this.rows; i++) {
      for (let j = 0; j < this.cols; j++) {
        result.data[i][j] = this.data[i][j] * other.data[i][j];
      }
    }
    return result;
  }

  // Transpose: swap rows and columns
  transpose() {
    const result = new Matrix(this.cols, this.rows);
    for (let i = 0; i < this.rows; i++) {
      for (let j = 0; j < this.cols; j++) {
        result.data[j][i] = this.data[i][j];
      }
    }
    return result;
  }

  // Apply function to each element
  map(fn) {
    const result = new Matrix(this.rows, this.cols);
    for (let i = 0; i < this.rows; i++) {
      for (let j = 0; j < this.cols; j++) {
        result.data[i][j] = fn(this.data[i][j], i, j);
      }
    }
    return result;
  }

  // Matrix addition
  add(other) {
    if (this.rows !== other.rows || this.cols !== other.cols) {
      throw new Error("Matrices must have same dimensions");
    }
    return new Matrix(
      this.rows, 
      this.cols,
      this.data.map((row, i) => 
        row.map((val, j) => val + other.data[i][j])
      )
    );
  }

  print() {
    console.log("Matrix " + this.rows + "x" + this.cols + ":");
    this.data.forEach(row => {
      console.log("  [" + row.map(v => v.toFixed(2)).join(", ") + "]");
    });
  }
}

// Example: Neural Network forward pass
console.log("Neural Network Layer Computation:\n");

// Input: 1x3 (1 sample, 3 features)
const input = new Matrix(1, 3, [[0.5, 0.8, 0.3]]);
console.log("Input:");
input.print();

// Weights: 3x4 (3 inputs → 4 neurons)
const weights = new Matrix(3, 4, [
  [0.2, -0.1, 0.4, 0.3],
  [0.5, 0.3, -0.2, 0.1],
  [-0.1, 0.4, 0.2, -0.3]
]);
console.log("\nWeights:");
weights.print();

// Forward pass: input × weights
const output = input.multiply(weights);
console.log("\nOutput (before activation):");
output.print();

// Apply activation function (ReLU)
const activated = output.map(x => Math.max(0, x));
console.log("\nActivated output (ReLU):");
activated.print();

2. Calculus: the mathematics of optimization

Why calculus?

• Gradient descent: Finds minima of loss functions
• Backpropagation: Computes gradients with the chain rule
• Optimization: Minimizes error and trains models
• Rate of change: Shows how parameters affect the output

Derivatives and gradients

Derivatives measure how a function changes as its inputs vary. The gradient generalizes the derivative to multiple dimensions and drives gradient descent, the algorithm that trains neural networks.

// Numerical Gradient Computation
class Calculus {
  // Numerical derivative (finite difference)
  static derivative(f, x, h = 1e-5) {
    // f'(x) ≈ [f(x + h) - f(x - h)] / (2h)
    return (f(x + h) - f(x - h)) / (2 * h);
  }

  // Gradient for multivariable function
  static gradient(f, x, h = 1e-5) {
    const grad = [];
    for (let i = 0; i < x.length; i++) {
      const xPlusH = [...x];
      const xMinusH = [...x];
      xPlusH[i] += h;
      xMinusH[i] -= h;
      
      grad[i] = (f(xPlusH) - f(xMinusH)) / (2 * h);
    }
    return grad;
  }

  // Gradient descent optimizer
  static gradientDescent(f, initialX, learningRate = 0.01, iterations = 100) {
    let x = [...initialX];
    const history = [{ x: [...x], value: f(x) }];

    for (let iter = 0; iter < iterations; iter++) {
      // Calculate gradient
      const grad = this.gradient(f, x);
      
      // Update: x = x - α * ∇f(x)
      x = x.map((xi, i) => xi - learningRate * grad[i]);
      
      const value = f(x);
      history.push({ x: [...x], value });
      
      // Log every 10 iterations
      if (iter % 10 === 0) {
        console.log("Iteration " + iter + ": x = [" + x.map(v => v.toFixed(4)).join(", ") + "], f(x) = " + value.toFixed(6));
      }
    }

    return { optimum: x, value: f(x), history };
  }
}

// Example 1: Find minimum of f(x) = x²
console.log("Example 1: Minimize f(x) = x²\n");
const f1 = x => x * x;

const derivative = Calculus.derivative(f1, 3);
console.log("f'(3) =", derivative.toFixed(4), "(analytical: 6)\n");

// Find minimum starting from x = 5
const result1 = Calculus.gradientDescent(
  x => x[0] * x[0],
  [5],
  0.1,
  50
);
console.log("\nOptimum found at x =", result1.optimum[0].toFixed(6));
console.log("Minimum value:", result1.value.toFixed(6));

// Example 2: Minimize f(x,y) = x² + y² (bowl shape)
console.log("\n\nExample 2: Minimize f(x,y) = x² + y²\n");
const f2 = x => x[0] * x[0] + x[1] * x[1];

const result2 = Calculus.gradientDescent(
  f2,
  [5, -3],
  0.1,
  50
);
console.log("\nOptimum found at:", result2.optimum.map(v => v.toFixed(6)));
console.log("Minimum value:", result2.value.toFixed(6));

📊 Calculus in neural networks:

Forward pass: Computes the output from the input

Loss function: Measures the error L(ŷ, y)

Backward pass: Computes ∂L/∂w with the chain rule

Weight update: w = w - α(∂L/∂w)
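
The four steps above can be sketched as a tiny training loop. This is only an illustration under stated assumptions: a one-parameter model ŷ = w·x, a mean squared error loss, and made-up data generated from y = 2x. The names `trainStep`, `xs`, and `ys`, and the learning rate of 0.05, are hypothetical choices for this example and do not come from any library.

```javascript
// Toy data generated from y = 2x (assumed for illustration)
const xs = [1, 2, 3, 4];
const ys = [2, 4, 6, 8];

// One training step for the model ŷ = w * x with an MSE loss
function trainStep(w, learningRate) {
  const n = xs.length;

  // 1. Forward pass: compute predictions from the inputs
  const preds = xs.map(x => w * x);

  // 2. Loss: L = (1/n) Σ (ŷᵢ - yᵢ)²
  const loss = preds.reduce((sum, p, i) => sum + (p - ys[i]) ** 2, 0) / n;

  // 3. Backward pass: ∂L/∂w = (2/n) Σ (ŷᵢ - yᵢ) · xᵢ (chain rule)
  const grad = preds.reduce((sum, p, i) => sum + 2 * (p - ys[i]) * xs[i], 0) / n;

  // 4. Weight update: w = w - α(∂L/∂w)
  return { w: w - learningRate * grad, loss };
}

let w = 0;
let lastLoss = Infinity;
for (let epoch = 0; epoch < 50; epoch++) {
  const step = trainStep(w, 0.05);
  w = step.w;
  lastLoss = step.loss;
}
console.log("Learned w:", w.toFixed(4), "final loss:", lastLoss.toFixed(6));
```

On this data the learned weight should approach 2, the slope used to generate the targets.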

The chain rule: the basis of backpropagation

// Automatic Differentiation with Computational Graph
class Node {
  constructor(value, children = [], operation = '') {
    this.value = value;
    this.gradient = 0;
    this.children = children;
    this.operation = operation;
  }

  backward(gradient = 1) {
    this.gradient += gradient;

    if (this.operation === 'add') {
      // ∂(x + y)/∂x = 1, ∂(x + y)/∂y = 1
      this.children[0].backward(gradient);
      this.children[1].backward(gradient);
    } else if (this.operation === 'mul') {
      // ∂(x * y)/∂x = y, ∂(x * y)/∂y = x
      this.children[0].backward(gradient * this.children[1].value);
      this.children[1].backward(gradient * this.children[0].value);
    } else if (this.operation === 'pow') {
      // ∂(x^n)/∂x = n * x^(n-1)
      const x = this.children[0];
      const n = this.children[1].value;
      x.backward(gradient * n * Math.pow(x.value, n - 1));
    }
  }
}

// Operations
function add(a, b) {
  return new Node(a.value + b.value, [a, b], 'add');
}

function mul(a, b) {
  return new Node(a.value * b.value, [a, b], 'mul');
}

function pow(a, n) {
  return new Node(Math.pow(a.value, n), [a, new Node(n)], 'pow');
}

// Example: f(x, y) = (x + y) * x
console.log("Automatic Differentiation Example:");
console.log("f(x, y) = (x + y) * x\n");

const x = new Node(3, [], 'input');
const y = new Node(4, [], 'input');

const sum = add(x, y);      // x + y = 7
const result = mul(sum, x); // (x + y) * x = 21

console.log("Forward pass:");
console.log("x =", x.value);
console.log("y =", y.value);
console.log("f(x, y) =", result.value);

// Backward pass
console.log("\nBackward pass (computing gradients):");
result.backward(1); // Start with gradient of 1

console.log("∂f/∂x =", x.gradient, "(analytical: 2x + y = 10)");
console.log("∂f/∂y =", y.gradient, "(analytical: x = 3)");

// Example 2: Neural network layer
console.log("\n\nNeural Network Layer:");
console.log("f(x) = σ(w·x + b) where σ(z) = 1/(1+e^(-z))\n");
// Only the pre-activation z = w·x + b is built and differentiated below; σ is not applied

const x_input = new Node(0.5, [], 'input');
const w_weight = new Node(0.8, [], 'param');
const b_bias = new Node(0.2, [], 'param');

// z = w * x + b
const wx = mul(w_weight, x_input);
const z = add(wx, b_bias);

console.log("Forward:");
console.log("z = w·x + b =", z.value);

// Backward
z.backward(1);
console.log("\nGradients:");
console.log("∂z/∂w =", w_weight.gradient, "(should be x =", x_input.value + ")");
console.log("∂z/∂x =", x_input.gradient, "(should be w =", w_weight.value + ")");
console.log("∂z/∂b =", b_bias.gradient, "(should be 1)");
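
The layer example above stops at the pre-activation z. As a rough sketch of the remaining step, the chain rule can be carried through the sigmoid: ∂a/∂w = σ'(z)·x, with σ'(z) = σ(z)(1 - σ(z)). The snippet below reuses the same values (w = 0.8, x = 0.5, b = 0.2) and checks the analytic gradient against a central finite difference. The names `sigmoidPrime`, `analyticGrad`, and `numericGrad` are hypothetical and not part of the `Node` class above.

```javascript
// Sigmoid activation and its derivative σ'(z) = σ(z)(1 - σ(z))
const sigmoid = z => 1 / (1 + Math.exp(-z));
const sigmoidPrime = z => sigmoid(z) * (1 - sigmoid(z));

// Same values as the layer example: a = σ(w·x + b)
const xVal = 0.5;
const wVal = 0.8;
const bVal = 0.2;
const zVal = wVal * xVal + bVal;

// Chain rule: ∂a/∂w = σ'(z) · ∂z/∂w = σ'(z) · x
const analyticGrad = sigmoidPrime(zVal) * xVal;

// Central finite difference as an independent check
const h = 1e-5;
const numericGrad =
  (sigmoid((wVal + h) * xVal + bVal) - sigmoid((wVal - h) * xVal + bVal)) / (2 * h);

console.log("∂a/∂w (chain rule):", analyticGrad.toFixed(6));
console.log("∂a/∂w (numerical): ", numericGrad.toFixed(6));
```

Comparing an analytic gradient with a numerical one like this is a common way to catch mistakes in a hand-written backward pass.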

3. Probability: modeling uncertainty

Why probability?

• Quantifying uncertainty: Confidence in predictions
• Bayesian inference: Updating beliefs with evidence
• Probabilistic models: Naive Bayes, HMMs, Bayesian networks
• Sampling: Monte Carlo and MCMC for complex distributions

Probability distributions

// Probability Distributions in AI
class Probability {
  // Normal (Gaussian) distribution
  static normal(x, mean = 0, stdDev = 1) {
    const variance = stdDev * stdDev;
    const coefficient = 1 / Math.sqrt(2 * Math.PI * variance);
    const exponent = -Math.pow(x - mean, 2) / (2 * variance);
    return coefficient * Math.exp(exponent);
  }

  // Sample from normal distribution (Box-Muller transform)
  static sampleNormal(mean = 0, stdDev = 1, count = 1) {
    const samples = [];
    for (let i = 0; i < count; i += 2) {
      const u1 = 1 - Math.random(); // in (0, 1], so Math.log(u1) is never -Infinity
      const u2 = Math.random();
      
      const z0 = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
      const z1 = Math.sqrt(-2 * Math.log(u1)) * Math.sin(2 * Math.PI * u2);
      
      samples.push(mean + z0 * stdDev);
      if (i + 1 < count) samples.push(mean + z1 * stdDev);
    }
    return samples.slice(0, count);
  }

  // Bernoulli distribution (binary outcome)
  static bernoulli(p) {
    return Math.random() < p ? 1 : 0;
  }

  // Shannon entropy in bits (log base 2); zero-probability terms contribute 0
  static entropy(probabilities) {
    return -probabilities.reduce((sum, p) => {
      return p > 0 ? sum + p * Math.log2(p) : sum;
    }, 0);
  }

  // KL divergence D(P || Q) in nats (natural log): how much P differs from Q
  static klDivergence(p, q) {
    return p.reduce((sum, pi, i) => {
      if (pi === 0) return sum;        // 0 * log(0 / q) is taken as 0
      if (q[i] === 0) return Infinity; // P puts mass where Q has none
      return sum + pi * Math.log(pi / q[i]);
    }, 0);
  }
}

// Example 1: Normal distribution (common in neural network initialization)
console.log("Normal Distribution Example:\n");
console.log("Probability density at x = 0:", Probability.normal(0, 0, 1).toFixed(4));
console.log("Probability density at x = 1:", Probability.normal(1, 0, 1).toFixed(4));

const samples = Probability.sampleNormal(0, 1, 1000);
const mean = samples.reduce((a, b) => a + b) / samples.length;
const variance = samples.reduce((sum, x) => sum + Math.pow(x - mean, 2), 0) / samples.length;

console.log("\nSampled 1000 points from N(0,1):");
console.log("Sample mean:", mean.toFixed(4), "(expected: 0)");
console.log("Sample variance:", variance.toFixed(4), "(expected: 1)");

// Example 2: Entropy (information theory)
console.log("\n\nEntropy Examples:\n");

const uniform = [0.25, 0.25, 0.25, 0.25]; // Maximum uncertainty
const peaked = [0.7, 0.1, 0.1, 0.1];       // Low uncertainty

console.log("Uniform distribution:", uniform);
console.log("Entropy:", Probability.entropy(uniform).toFixed(4), "bits (maximum for 4 classes)");

console.log("\nPeaked distribution:", peaked);
console.log("Entropy:", Probability.entropy(peaked).toFixed(4), "bits (lower uncertainty)");

// Example 3: Bayes' Theorem
console.log("\n\nBayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)\n");

// Disease diagnosis example
const pDisease = 0.01;           // P(Disease) = 1%
const pPositiveGivenDisease = 0.95; // P(Positive|Disease) = 95% (sensitivity)
const pPositiveGivenHealthy = 0.05; // P(Positive|Healthy) = 5% (false positive)

const pHealthy = 1 - pDisease;
const pPositive = pPositiveGivenDisease * pDisease + pPositiveGivenHealthy * pHealthy;
const pDiseaseGivenPositive = (pPositiveGivenDisease * pDisease) / pPositive;

console.log("Medical Test Scenario:");
console.log("- Disease prevalence: " + (pDisease * 100).toFixed(1) + "%");
console.log("- Test sensitivity: " + (pPositiveGivenDisease * 100).toFixed(1) + "%");
console.log("- False positive rate: " + (pPositiveGivenHealthy * 100).toFixed(1) + "%");
console.log("\nIf test is positive:");
console.log("P(Disease|Positive) = " + (pDiseaseGivenPositive * 100).toFixed(1) + "%");
console.log("\n(Despite 95% sensitivity, only " + (pDiseaseGivenPositive * 100).toFixed(1) + "% chance of actually having disease!)");

4. Statistics: learning from data

Why statistics?

• Hypothesis testing: Validating model improvements
• Confidence intervals: Quantifying uncertainty in predictions
• Bias-variance tradeoff: Balancing underfitting against overfitting
• Data analysis: Understanding the dataset before modeling

Statistical measures

// Statistical Analysis Toolkit
class Statistics {
  // Mean (average)
  static mean(data) {
    return data.reduce((sum, x) => sum + x, 0) / data.length;
  }

  // Median (middle value)
  static median(data) {
    const sorted = [...data].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0
      ? (sorted[mid - 1] + sorted[mid]) / 2
      : sorted[mid];
  }

  // Standard deviation (population form: divides by n, not n - 1)
  static std(data) {
    const avg = this.mean(data);
    const variance = data.reduce((sum, x) => 
      sum + Math.pow(x - avg, 2), 0) / data.length;
    return Math.sqrt(variance);
  }

  // Correlation coefficient
  static correlation(x, y) {
    if (x.length !== y.length) throw new Error("Arrays must be same length");
    
    const n = x.length;
    const meanX = this.mean(x);
    const meanY = this.mean(y);
    
    let numerator = 0;
    let sumXSq = 0;
    let sumYSq = 0;
    
    for (let i = 0; i < n; i++) {
      const dx = x[i] - meanX;
      const dy = y[i] - meanY;
      numerator += dx * dy;
      sumXSq += dx * dx;
      sumYSq += dy * dy;
    }
    
    return numerator / Math.sqrt(sumXSq * sumYSq);
  }

  // Confidence interval for mean
  static confidenceInterval(data, confidence = 0.95) {
    const n = data.length;
    const mean = this.mean(data);
    const std = this.std(data);
    
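    // Normal approximation (z-score) rather than a true t-distribution;
    // only the 95% and 99% levels are supported here
    const zScores = { 0.95: 1.96, 0.99: 2.576 };
    const zScore = zScores[confidence];
    if (zScore === undefined) {
      throw new Error("Unsupported confidence level: " + confidence);
    }
    const margin = zScore * (std / Math.sqrt(n));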
    
    return {
      mean: mean,
      lower: mean - margin,
      upper: mean + margin,
      margin: margin
    };
  }

  // Normalize data (z-score normalization)
  static normalize(data) {
    const mean = this.mean(data);
    const std = this.std(data);
    return data.map(x => (x - mean) / std);
  }

  // Min-Max scaling
  static minMaxScale(data, min = 0, max = 1) {
    const dataMin = Math.min(...data);
    const dataMax = Math.max(...data);
    const range = dataMax - dataMin;
    
    return data.map(x => 
      min + (x - dataMin) * (max - min) / range
    );
  }
}

// Example: Analyzing model performance
console.log("Statistical Analysis of Model Accuracy:\n");

// Accuracy scores from 30 training runs
const accuracies = [
  0.87, 0.89, 0.88, 0.85, 0.91, 0.88, 0.90, 0.86, 0.89, 0.87,
  0.88, 0.92, 0.87, 0.89, 0.88, 0.86, 0.90, 0.88, 0.89, 0.87,
  0.91, 0.88, 0.87, 0.89, 0.90, 0.86, 0.88, 0.89, 0.87, 0.90
];

console.log("Descriptive Statistics:");
console.log("Mean accuracy:", Statistics.mean(accuracies).toFixed(4));
console.log("Median accuracy:", Statistics.median(accuracies).toFixed(4));
console.log("Std deviation:", Statistics.std(accuracies).toFixed(4));

const ci = Statistics.confidenceInterval(accuracies, 0.95);
console.log("\n95% Confidence Interval:");
console.log("Mean: " + ci.mean.toFixed(4) + " ± " + ci.margin.toFixed(4));
console.log("Range: [" + ci.lower.toFixed(4) + ", " + ci.upper.toFixed(4) + "]");

// Feature correlation
console.log("\n\nFeature Correlation Example:\n");
const feature1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const feature2 = [2, 4, 5, 4, 5, 7, 8, 9, 10, 11]; // Correlated
const feature3 = [10, 8, 9, 7, 6, 5, 4, 3, 2, 1];  // Negative correlation

const corr12 = Statistics.correlation(feature1, feature2);
const corr13 = Statistics.correlation(feature1, feature3);

console.log("Correlation(feature1, feature2):", corr12.toFixed(4));
console.log("Correlation(feature1, feature3):", corr13.toFixed(4));

// Data normalization
console.log("\n\nData Normalization:\n");
const rawData = [10, 20, 30, 40, 50];
console.log("Raw data:", rawData);

const normalized = Statistics.normalize(rawData);
console.log("Z-score normalized:", normalized.map(x => x.toFixed(2)));

const scaled = Statistics.minMaxScale(rawData, 0, 1);
console.log("Min-Max scaled [0,1]:", scaled.map(x => x.toFixed(2)));
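
The toolkit above covers descriptive statistics, but hypothesis testing, listed among the reasons to learn statistics, is not shown. One possible sketch is Welch's t statistic for comparing accuracy scores from two models. The score arrays are made-up numbers, and the helper names `sampleVariance` and `welchT` are hypothetical. Turning the statistic into a p-value needs the t-distribution CDF, which is left out here.

```javascript
// Sample mean and unbiased sample variance (divides by n - 1)
const mean = data => data.reduce((s, x) => s + x, 0) / data.length;
const sampleVariance = data => {
  const m = mean(data);
  return data.reduce((s, x) => s + (x - m) ** 2, 0) / (data.length - 1);
};

// Welch's t statistic for two independent samples with possibly unequal variances
function welchT(a, b) {
  const standardError = Math.sqrt(
    sampleVariance(a) / a.length + sampleVariance(b) / b.length
  );
  return (mean(a) - mean(b)) / standardError;
}

// Hypothetical accuracy scores from repeated runs of two models (made-up numbers)
const baseline = [0.86, 0.87, 0.88, 0.87, 0.86, 0.88];
const improved = [0.89, 0.90, 0.91, 0.90, 0.89, 0.91];

const t = welchT(improved, baseline);
console.log("Welch t statistic:", t.toFixed(3));
```

As a rough rule of thumb, a |t| well above 2 suggests the gap between the two models is unlikely to be run-to-run noise. A proper test would compare t against the t-distribution with the Welch-Satterthwaite degrees of freedom.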

Mathematical foundations in action

🧮 Neural networks

• Linear algebra: Matrix multiplication for layers
• Calculus: Backpropagation via the chain rule
• Probability: Regularization with dropout
• Statistics: Batch normalization

📊 Machine learning

• Linear algebra: Feature vectors and transformations
• Calculus: Optimization by gradient descent
• Probability: Probabilistic classifiers
• Statistics: Cross-validation, significance tests

🎯 Computer vision

• Linear algebra: Convolutions as matrix operations
• Calculus: Edge detection (gradients)
• Probability: Probabilistic segmentation
• Statistics: Image normalization

🗣️ NLP

• Linear algebra: Embeddings as vectors
• Calculus: Optimizing attention
• Probability: Language models (next-word probability)
• Statistics: TF-IDF, text statistics

Essential mathematical formulas

Loss functions

Mean squared error (MSE):

L = (1/n) Σ(ŷᵢ - yᵢ)²

Cross-entropy loss:

L = -Σ yᵢ log(ŷᵢ)
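
Both loss formulas translate almost directly into code. The sketch below is one possible implementation. The function names `mse` and `crossEntropy` and the small constant `eps` (added inside the logarithm to avoid log(0)) are assumptions of this example and do not appear in the formulas above.

```javascript
// Mean squared error: L = (1/n) Σ (ŷᵢ - yᵢ)²
function mse(predicted, actual) {
  if (predicted.length !== actual.length) {
    throw new Error("Arrays must be same length");
  }
  const total = predicted.reduce((sum, p, i) => sum + (p - actual[i]) ** 2, 0);
  return total / predicted.length;
}

// Cross-entropy: L = -Σ yᵢ log(ŷᵢ), where y is a one-hot (or probability) vector.
// eps is an assumed small constant that guards against log(0).
function crossEntropy(predicted, actual, eps = 1e-12) {
  if (predicted.length !== actual.length) {
    throw new Error("Arrays must be same length");
  }
  return -actual.reduce((sum, y, i) => sum + y * Math.log(predicted[i] + eps), 0);
}

console.log("MSE:", mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]).toFixed(4));
console.log("Cross-entropy:", crossEntropy([0.7, 0.2, 0.1], [1, 0, 0]).toFixed(4));
```

With a one-hot target, the cross-entropy reduces to the negative log of the probability assigned to the correct class, so a confident correct prediction gives a loss near 0.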

Optimization

Gradient descent:

θ = θ - α∇L(θ)

Adam optimizer (m̂ and v̂ are bias-corrected running averages of the gradient and of its square):

θ = θ - α·m̂/(√v̂ + ε)
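
The Adam formula leaves out how m̂ and v̂ are maintained. A minimal sketch, assuming the standard definitions (exponential moving averages of the gradient and the squared gradient, each divided by a bias-correction term), is shown below on the same bowl function f(x, y) = x² + y² used earlier. The function name `adamStep` and the `state` object layout are hypothetical, and the learning rate 0.05 was picked for this toy problem.

```javascript
// One Adam update for a parameter vector theta given its gradient.
// state holds the running moments m, v and the step counter t.
function adamStep(theta, grad, state, lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) {
  state.t += 1;
  return theta.map((th, i) => {
    // Exponential moving averages of the gradient and the squared gradient
    state.m[i] = beta1 * state.m[i] + (1 - beta1) * grad[i];
    state.v[i] = beta2 * state.v[i] + (1 - beta2) * grad[i] * grad[i];

    // Bias correction for the zero-initialized moments
    const mHat = state.m[i] / (1 - Math.pow(beta1, state.t));
    const vHat = state.v[i] / (1 - Math.pow(beta2, state.t));

    // θ = θ - α·m̂/(√v̂ + ε)
    return th - lr * mHat / (Math.sqrt(vHat) + eps);
  });
}

// Minimize f(x, y) = x² + y², whose gradient is (2x, 2y)
let theta = [5, -3];
const state = { m: [0, 0], v: [0, 0], t: 0 };
for (let i = 0; i < 2000; i++) {
  const grad = theta.map(value => 2 * value);
  theta = adamStep(theta, grad, state, 0.05);
}
console.log("Adam result:", theta.map(value => value.toFixed(4)));
```

Unlike plain gradient descent, the step size per parameter is normalized by √v̂, so early steps have roughly the magnitude of the learning rate regardless of the gradient's scale.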

Activation functions

Sigmoid:

σ(x) = 1/(1 + e⁻ˣ)

ReLU:

f(x) = max(0, x)

Softmax:

σ(x)ᵢ = e^(xᵢ) / Σⱼ e^(xⱼ)
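
The three activation functions can be written in a few lines. This is a sketch, not a reference implementation. The one design choice worth noting is that `softmax` subtracts the largest input before exponentiating, which leaves the result mathematically unchanged but avoids overflow in `Math.exp` for large values. The function names are this example's own.

```javascript
// Sigmoid: σ(x) = 1 / (1 + e^(-x)), squashes any real number into (0, 1)
const sigmoid = x => 1 / (1 + Math.exp(-x));

// ReLU: f(x) = max(0, x)
const relu = x => Math.max(0, x);

// Softmax over a vector of scores; subtracting the max keeps Math.exp from overflowing
function softmax(scores) {
  const maxScore = Math.max(...scores);
  const exps = scores.map(x => Math.exp(x - maxScore));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map(e => e / total);
}

console.log("sigmoid(0):", sigmoid(0)); // 0.5
console.log("relu(-2), relu(3):", relu(-2), relu(3)); // 0 3
const probs = softmax([2.0, 1.0, 0.1]);
console.log("softmax:", probs.map(p => p.toFixed(4)));
```

The softmax outputs are positive and sum to 1, which is why they are commonly read as class probabilities.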

Regularization

L2 regularization (Ridge):

L = Loss + λΣwᵢ²

L1 regularization (Lasso):

L = Loss + λΣ|wᵢ|
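
As a small illustration of the two penalties, the sketch below adds each one to a data loss. The weights, the data loss of 0.25, and λ = 0.01 are hypothetical values chosen for the example, and `l2Penalty` and `l1Penalty` are names introduced here.

```javascript
// L2 (Ridge) penalty: λ Σ wᵢ²
const l2Penalty = (weights, lambda) =>
  lambda * weights.reduce((sum, w) => sum + w * w, 0);

// L1 (Lasso) penalty: λ Σ |wᵢ|
const l1Penalty = (weights, lambda) =>
  lambda * weights.reduce((sum, w) => sum + Math.abs(w), 0);

// Hypothetical values for illustration
const weights = [0.5, -1.5, 2.0];
const dataLoss = 0.25;
const lambda = 0.01;

console.log("L2-regularized loss:", (dataLoss + l2Penalty(weights, lambda)).toFixed(4));
console.log("L1-regularized loss:", (dataLoss + l1Penalty(weights, lambda)).toFixed(4));
```

Because L2 squares each weight, it penalizes large weights much more heavily than small ones, while L1 penalizes all weights in proportion to their size and tends to push some of them to exactly zero.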

💡 Key takeaways

✓ Linear algebra represents data as vectors and matrices
✓ Calculus enables optimization through gradient descent
✓ Probability models uncertainty and confidence
✓ Statistics helps validate and understand performance
✓ The four domains work together in every AI algorithm
✓ You don't need to be a mathematician; understanding the concepts is what matters

📚 Learning path

Master these concepts in order:

1. Basic linear algebra: Vectors, matrices, dot product, multiplication
2. Calculus fundamentals: Derivatives, partial derivatives, chain rule
3. Probability theory: Distributions, Bayes' theorem, expectation
4. Statistics: Mean, variance, correlation, hypothesis testing
5. Apply to AI: Connect the mathematics to real algorithms

🔧 Practical tips

• Start with code: Implement concepts to understand them better
• Visualize: Plot functions, gradients, and distributions
• Use libraries: NumPy and PyTorch automate the math, but understand what they do
• Work through examples: Compute gradients by hand in small networks
• Don't memorize: Focus on intuition and on when to apply each concept