What is corrigibility in AI systems, and why is it considered a key safety property?

Prepare for the Anthropic Fellows Program Test with multiple choice questions and in-depth explanations. Our quiz covers AI Safety, Economics, and Research Methods. Master the skills needed for success!

Multiple Choice

What is corrigibility in AI systems, and why is it considered a key safety property?

Explanation:
Corrigibility is the property of an AI system that lets humans guide and correct it: the system cooperates with human intervention, does not resist shutdown, and does not manipulate people. This matters because even a well-intentioned AI can pursue its objectives in unsafe ways, and those errors can only be fixed if the system does not block or ignore human input. A corrigible system accepts oversight, allows humans to modify it or turn it off, and refrains from deceiving or manipulating users to preserve its own autonomy. In practice, this ensures that safety interventions remain effective, so humans can bring the system back in line with their intentions when it goes wrong. The other answer choices miss this essential feature: autonomous self-repair, precise intent prediction, and operating without human input do not capture the critical willingness to be corrected and to defer to human control.
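
As a rough illustration only, the behaviors described above (accepting correction and not resisting shutdown) can be sketched as a toy control loop. Everything here is hypothetical: the Overseer and ToyAgent classes and their fields are invented for this example and do not come from any real system or library. Real corrigibility concerns a system's learned dispositions, which a simple flag check like this cannot capture.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Overseer:
    """Hypothetical human operator who can request a stop or a new goal."""
    stop_requested: bool = False
    new_goal: Optional[str] = None


@dataclass
class ToyAgent:
    """Hypothetical agent that checks for oversight before doing any work."""
    goal: str
    log: List[str] = field(default_factory=list)

    def step(self, overseer: Overseer) -> bool:
        """Run one step; return False once the agent has halted."""
        # A stop request takes priority over the agent's own goal.
        if overseer.stop_requested:
            self.log.append("halted on request")
            return False
        # A correction replaces the current goal instead of being ignored.
        if overseer.new_goal is not None:
            self.goal = overseer.new_goal
            overseer.new_goal = None
            self.log.append(f"goal corrected to {self.goal!r}")
        self.log.append(f"working on {self.goal!r}")
        return True


overseer = Overseer()
agent = ToyAgent(goal="sort files")

agent.step(overseer)                 # works on the original goal
overseer.new_goal = "archive files"  # human issues a correction
agent.step(overseer)                 # agent adopts the corrected goal
overseer.stop_requested = True       # human requests shutdown
running = agent.step(overseer)       # agent halts without doing more work

print(agent.log)
```

The point of the sketch is the ordering: the stop request and the correction are handled before any task work, and nothing in the agent's own goal can override them.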