Trained vs. base: for each benchmark, rows where the two models differ, selected at random plus the top rows by output tokens, with embedded images.
Source CSV: projects/indirect-caption/data/eval_results.csv on exp-record.
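A minimal sketch of how such a per-benchmark selection could be made, not the script that produced this report. The column names (`benchmark`, `base_output`, `trained_output`, `output_tokens`), the definition of "differing" as unequal output strings, and the function name are all assumptions for illustration; the real CSV schema is not shown here.

```python
import random
from collections import defaultdict


def select_differing_rows(rows, n_random=2, n_top=2, seed=0):
    """Per benchmark, pick the top differing rows by output tokens
    plus a random sample of the remaining differing rows.

    `rows` is an iterable of dicts. All column names are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Keep only rows where trained and base outputs differ (assumed definition).
    by_benchmark = defaultdict(list)
    for row in rows:
        if row["trained_output"] != row["base_output"]:
            by_benchmark[row["benchmark"]].append(row)

    selected = {}
    for benchmark, differing in by_benchmark.items():
        # Rank by output token count, largest first.
        ranked = sorted(
            differing, key=lambda r: int(r["output_tokens"]), reverse=True
        )
        top = ranked[:n_top]
        rest = ranked[n_top:]
        # Sample from the rest so no row appears in both groups.
        sampled = rng.sample(rest, min(n_random, len(rest)))
        selected[benchmark] = {"top": top, "random": sampled}
    return selected
```

Rows read with `csv.DictReader` can be passed in directly, since the sketch converts `output_tokens` to `int` itself.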