What is data leakage and how can it occur in AI safety experiments?


Multiple Choice

What is data leakage and how can it occur in AI safety experiments?

Explanation:
Data leakage happens when information that should be unavailable during training, typically the evaluation data or statistics derived from it, ends up influencing how the model is built. In AI safety experiments, this can occur in several concrete ways: computing preprocessing parameters (such as normalization or feature-scaling statistics) on data that includes the test set, selecting features or tuning hyperparameters based on test-set performance, or inadvertently including test labels or future information in the training data. When leakage occurs, the model can look much more capable on the test data than it would on truly new, unseen data, giving a misleading impression of its safety or robustness.
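One of the leakage routes above, test items inadvertently ending up in the training data, can be checked mechanically. The following is a minimal sketch, assuming text examples and exact matching after lowercasing and whitespace normalization; the helper names `fingerprint` and `find_overlap` and the sample data are hypothetical, and near-duplicates or paraphrases would need a fuzzier comparison.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash a lowercased, whitespace-normalized copy of an example."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_overlap(train_examples, test_examples):
    """Return the test examples whose fingerprint also appears in the training set."""
    train_hashes = {fingerprint(t) for t in train_examples}
    return [t for t in test_examples if fingerprint(t) in train_hashes]


# Hypothetical data: the second training item reappears in the test set
# with different casing and spacing.
train = ["How do I reset my password?", "Describe the water cycle."]
test = ["describe the   water cycle.", "What is the capital of France?"]

leaked = find_overlap(train, test)
print(f"{len(leaked)} of {len(test)} test examples also appear in training data")
```

Any test example flagged this way should be removed from one of the two splits before results are reported.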

For example, if you normalize features using statistics computed from the entire dataset, including the test portion, the model is indirectly trained with information from the test set. Likewise, if you peek at test outcomes to decide which model to pick or which safety threshold to set, you’re letting test information guide decisions that belong to training; those choices should be made on a separate validation split. The net effect is optimistically biased performance estimates that don’t reflect real generalization.
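Both examples can be made concrete with a short sketch. It assumes a single numeric feature, a three-way train/validation/test split, and accuracy as the selection metric; the function names (`fit_scaler`, `transform`, `pick_threshold`) and all numbers are illustrative, not taken from any particular experiment.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Compute normalization statistics from the given values only."""
    return mean(values), pstdev(values)


def transform(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


def pick_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest accuracy on this split."""
    def accuracy(th):
        hits = sum((s >= th) == bool(y) for s, y in zip(scores, labels))
        return hits / len(labels)
    return max(candidates, key=accuracy)


train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leaky: the statistics are computed over train + test, so test values
# shift the mean and standard deviation applied to the training data.
leaky_mu, leaky_sigma = fit_scaler(train + test)

# Leak-free: fit on the training split only, then reuse those statistics.
mu, sigma = fit_scaler(train)
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)

# Threshold selection uses a validation split; the test split is never consulted.
val_scores = [0.1, 0.4, 0.6, 0.9]
val_labels = [0, 0, 1, 1]
threshold = pick_threshold(val_scores, val_labels, candidates=[0.3, 0.5, 0.7])

print(f"train-only mean: {mu}, leaky mean: {leaky_mu:.2f}")
print(f"threshold chosen on validation data: {threshold}")
```

The test split is touched only once here, to be transformed with statistics that were fixed beforehand. Any final metric computed on it then reflects data that played no part in fitting or selection.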

The other options don’t capture this issue: data cleaning is a standard data-preparation step, not leakage; a cloud security breach concerns unauthorized external access to data rather than how data informs training; and leakage is not a desirable way to improve calibration. Calibration should be checked on proper, separate evaluation data, not on leaked information.
