The double-edged sword of data preparation: ‘P-hacking’
In statistics, data preparation and selection are crucial yet delicate steps.
In lesson 1, we got a taste of data preparation techniques such as handling outliers, choosing the correct parametric distribution, and selecting an appropriate sample size. These steps are meant to make the statistical analysis more reliable and to prevent statistical errors.
However, we must not manipulate our data, or keep adjusting our analysis, until the results become statistically significant. This practice is known as p-hacking.
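To see why this matters, here is a minimal simulation sketch. It is an illustration under stated assumptions, not a method taken from this lesson: both groups are drawn from the same standard normal distribution (so there is no real effect), the standard deviation is assumed known so a simple z-test applies, and the function names `z_test_p_value` and `simulate` are hypothetical. The "honest" analyst tests one outcome chosen in advance; the "p-hacking" analyst measures ten unrelated outcomes and reports whichever gives the smallest p-value.

```python
import math
import random


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known sigma."""
    n_a, n_b = len(a), len(b)
    diff = sum(a) / n_a - sum(b) / n_b
    standard_error = sigma * math.sqrt(1 / n_a + 1 / n_b)
    z = diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_experiments=2000, n_outcomes=10, n=30, alpha=0.05, seed=0):
    """Return (honest, hacked) false-positive rates when no effect exists."""
    rng = random.Random(seed)
    honest_hits = 0
    hacked_hits = 0
    for _ in range(n_experiments):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: the null is true.
            group_a = [rng.gauss(0, 1) for _ in range(n)]
            group_b = [rng.gauss(0, 1) for _ in range(n)]
            p_values.append(z_test_p_value(group_a, group_b))
        # Honest analysis: only the first, pre-specified outcome is tested.
        honest_hits += p_values[0] < alpha
        # P-hacked analysis: report the best-looking of all outcomes.
        hacked_hits += min(p_values) < alpha
    return honest_hits / n_experiments, hacked_hits / n_experiments


honest_rate, hacked_rate = simulate()
print(f"Honest false-positive rate:   {honest_rate:.3f}")
print(f"P-hacked false-positive rate: {hacked_rate:.3f}")
```

With ten independent tests at a 5% significance level, the chance of at least one false positive is 1 - 0.95^10, or about 40%, so the second rate should come out far above the first even though no real effect exists in the simulated data.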
The line between ethical data preparation and p-hacking can be very thin. Therefore, this lesson outlines some practices that are clear cases of data manipulation and are considered unethical.