Check your data sensitivity – outliers, parameters, and assumptions
Data sensitivity measures the extent to which your data description and statistical conclusions are affected by external factors such as outliers, parameters, and assumptions.
Monitoring data sensitivity is crucial for preventing statistical errors.
These sensitivity factors are most reliably checked through visualisation. Making untested assumptions about the data's outliers and parametric distribution can easily lead to false statistical conclusions.
Careful and transparent normalisation of the data to meet test assumptions is also acceptable, although care must be taken not to redefine the data's characteristics; done selectively, this can amount to p-hacking.
It is best to visualise the data and clearly outline its characteristics before selecting a statistical test.
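Alongside visual inspection, these characteristics can also be outlined numerically. The sketch below (using hypothetical data) applies two common checks: a Shapiro-Wilk test for normality and the 1.5 × IQR rule for flagging outliers.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=5, size=30)  # hypothetical sample

# Shapiro-Wilk test for normality (a common parametric assumption);
# a large p-value means normality is not rejected
stat, p = stats.shapiro(data)
print(f"Shapiro-Wilk p-value: {p:.3f}")

# Flag outliers with the 1.5 * IQR rule
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
print(f"Outliers flagged: {len(outliers)}")
```

These checks complement, rather than replace, plotting the histogram and boxplot: a plot can reveal skew or clustering that a single test statistic hides.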
In some instances (such as when outliers have a significant impact), it is safer to use non-parametric statistical tests, which are less sensitive to outliers and distributional assumptions. Alternatively, you can perform both types of test, present both sets of results, and allow readers to draw their own inferences.
The figure below illustrates the importance of visualising data with boxplots and histograms. Using the second set of images as a reference, note the effect of outliers on the histogram, boxplot, and statistical significance of the original data.
With the outliers, the data takes on a more skewed distribution, so a parametric test (such as the t-test) would be inadvisable.
The outliers affected the t-test and produced a misleading p-value, whereas the non-parametric Mann-Whitney U test maintained a non-significant p-value both with and without the outliers.
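This kind of sensitivity check is easy to reproduce. The sketch below, using simulated data rather than the figure's actual values, runs both a t-test and a Mann-Whitney U test on two groups drawn from the same distribution, before and after injecting a single extreme outlier; the exact p-values depend on the random draw, but the t-test p-value typically shifts far more than the Mann-Whitney one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two hypothetical groups drawn from the same distribution,
# so neither test should report a significant difference
group_a = rng.normal(loc=10, scale=2, size=25)
group_b = rng.normal(loc=10, scale=2, size=25)

# Copy of group_b with one extreme outlier injected
group_b_outlier = np.append(group_b, 60.0)

for label, b in [("without outlier", group_b), ("with outlier", group_b_outlier)]:
    t_p = stats.ttest_ind(group_a, b).pvalue
    u_p = stats.mannwhitneyu(group_a, b, alternative="two-sided").pvalue
    print(f"{label}: t-test p={t_p:.3f}, Mann-Whitney U p={u_p:.3f}")
```

Presenting both results side by side, as suggested above, lets readers judge how much the conclusion depends on the outlier.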