Which practice helps prevent p-hacking when testing multiple safety hypotheses in a single study?


Multiple Choice

Which practice helps prevent p-hacking when testing multiple safety hypotheses in a single study?

Explanation:
When you test several safety hypotheses in one study, the chance of at least one false positive arising purely by luck grows with every test. With 20 independent tests at alpha = 0.05 and every null hypothesis true, the chance of at least one spurious significant result is about 64%. P-hacking, where researchers try many analyses until something looks significant, amplifies that problem.

The practice that addresses this directly is adjusting for multiple comparisons. Bonferroni correction controls the family-wise error rate (the probability of any false positive at all) by dividing the usual alpha by the number of tests and using the result as the per-test threshold. False discovery rate (FDR) procedures such as Benjamini-Hochberg instead control the expected proportion of false discoveries among the results reported as significant, which usually retains more power when many tests are involved.

Pre-registering hypotheses helps reduce data-driven fishing but doesn’t by itself fix the inflated error rate across multiple tests. Restricting the study to a single hypothesis avoids the problem rather than solving it, and ignoring the issue leaves the error rate unchecked. So, when several hypotheses are tested together, adjusting for multiple comparisons is the practice that keeps false findings in check.
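The two corrections named in the explanation can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the p-values are made up for the example, the FDR procedure shown is the standard Benjamini-Hochberg step-up rule, and the family-wise error formula assumes independent tests with every null hypothesis true.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent
    tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for FDR control."""
    m = len(p_values)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


# Made-up p-values for five safety hypotheses (illustration only).
p_values = [0.001, 0.008, 0.020, 0.041, 0.300]

uncorrected = [p <= 0.05 for p in p_values]
print(uncorrected)                    # [True, True, True, True, False]
print(bonferroni(p_values))           # [True, True, False, False, False]
print(benjamini_hochberg(p_values))   # [True, True, True, False, False]
print(round(familywise_error_rate(0.05, 5), 3))  # 0.226
```

On these numbers, four of the five results pass an uncorrected 0.05 threshold, Bonferroni keeps two, and Benjamini-Hochberg keeps three, which reflects the trade-off described above: Bonferroni is the stricter criterion, while FDR control gives up some of that strictness in exchange for power.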

