  • June 1, 2026 Some (non-technical) details on training neural networks

    Notes from training a 500M-parameter transformer mostly from scratch