Define scalable oversight and give an example of how it may reduce annotation costs in large models.


Multiple Choice


Explanation:
Scalable oversight refers to methods for supervising AI systems whose outputs are too numerous or too complex for humans to review one by one. One common form uses a cheaper automated evaluator to judge most outputs at scale, while humans handle only the uncertain or high-risk cases. This lowers annotation costs because routine judgments are made by the automated evaluator rather than by humans. In practice, you train a smaller evaluator model, typically on human-labeled examples; for each new output, the evaluator provides a verdict, and humans step in to give the final label only when the evaluator is uncertain or signals risk. Most evaluations stay automated and human effort is reserved for the difficult cases, which makes it feasible to oversee very large models. The other options miss the point: random evaluation isn’t reliable for safety, evaluating only a small sample still spends human effort without covering most outputs, and discarding human evaluation entirely removes essential safety checks.
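The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Verdict` fields, the `triage` function, the 0.9 confidence threshold, and the toy evaluator and toy human labeler are all hypothetical names and values invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    label: str         # evaluator's proposed label
    confidence: float  # evaluator's self-reported confidence in [0, 1]
    high_risk: bool    # True if the evaluator flags the output as risky


def triage(
    outputs: List[str],
    evaluator: Callable[[str], Verdict],
    ask_human: Callable[[str], str],
    min_confidence: float = 0.9,  # assumed threshold, tune per task
) -> Tuple[List[str], int]:
    """Label every output, escalating to a human only when needed."""
    labels: List[str] = []
    human_calls = 0
    for output in outputs:
        verdict = evaluator(output)
        if verdict.high_risk or verdict.confidence < min_confidence:
            # Uncertain or risky: a human provides the final label.
            labels.append(ask_human(output))
            human_calls += 1
        else:
            # Routine case: accept the automated verdict.
            labels.append(verdict.label)
    return labels, human_calls


def toy_evaluator(output: str) -> Verdict:
    # Stand-in for a trained evaluator model; these rules are invented.
    if "exploit" in output:
        return Verdict("unacceptable", 0.95, high_risk=True)
    if "?" in output:
        return Verdict("acceptable", 0.55, high_risk=False)
    return Verdict("acceptable", 0.98, high_risk=False)


def toy_human(output: str) -> str:
    # Stand-in for a human annotator.
    return "unacceptable" if "exploit" in output else "acceptable"


outputs = ["hello", "is this ok?", "here is an exploit", "thanks", "bye"]
labels, human_calls = triage(outputs, toy_evaluator, toy_human)
print(labels, human_calls)
```

In this toy run, only two of the five outputs reach the human labeler (the low-confidence one and the one flagged as high risk), so human annotation effort drops to 40% of what full manual review would need. The actual saving in a real system depends on how often the evaluator is uncertain and on where the threshold is set.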

