Which is a challenge in eliciting robust human preferences for AI alignment?

Prepare for the Anthropic Fellows Program Test with multiple choice questions and in-depth explanations. Our quiz covers AI Safety, Economics, and Research Methods. Master the skills needed for success!

Multiple Choice

Which is a challenge in eliciting robust human preferences for AI alignment?

Explanation:
Eliciting robust human preferences is hard because people’s choices aren’t fixed; they change with context, framing, available information, and the trade-offs on offer. What someone wants can shift depending on how a question is posed, which options are presented, or which consequences are emphasized. The result is inconsistent, context-dependent revealed preferences, which makes it hard to infer a single, reliable objective for an AI to follow. For example, framing a decision in terms of gains versus losses, or changing the surrounding options, can flip a preference even when the underlying situation is the same. AI alignment needs a stable guide to values, but human preferences often drift or differ across scenarios, so elicitation must account for this variation rather than assume preferences are fixed. The other statements miss this core difficulty: preferences do require elicitation, they are relevant to alignment, and they are not always stable.
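To make the framing point concrete, here is a minimal sketch of how one might flag preference reversals in elicited data. It assumes each response records which option a person chose from a pair under a named framing; the function name, the frame labels, and the sample data are all hypothetical and invented purely for illustration.

```python
from collections import defaultdict


def find_framing_reversals(responses):
    """Return the option pairs whose chosen option differs across frames.

    `responses` is an iterable of (pair, frame, chosen) tuples, where `pair`
    is a frozenset of two option names. This data layout is an assumption
    made for illustration, not a standard format.
    """
    choices_by_pair = defaultdict(dict)
    for pair, frame, chosen in responses:
        choices_by_pair[pair][frame] = chosen

    # A pair counts as a reversal if at least two frames produced
    # different winners for the same two options.
    return {
        pair: by_frame
        for pair, by_frame in choices_by_pair.items()
        if len(set(by_frame.values())) > 1
    }


# Made-up responses: the same two options, asked under two framings.
responses = [
    (frozenset({"sure_option", "gamble"}), "gain_frame", "sure_option"),
    (frozenset({"sure_option", "gamble"}), "loss_frame", "gamble"),
    (frozenset({"plan_a", "plan_b"}), "gain_frame", "plan_a"),
    (frozenset({"plan_a", "plan_b"}), "loss_frame", "plan_a"),
]

reversals = find_framing_reversals(responses)
for pair, by_frame in reversals.items():
    print(sorted(pair), by_frame)
```

In this toy data only the sure-option-versus-gamble pair flips between frames. A check like this can show that a single fixed ranking does not fit the responses, but it cannot say which framing reflects what the person actually values.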

