Beware of relational assumptions and multiple comparisons
Another slippery slope in statistics is that of relational tests and multiple comparisons.
Relational comparisons are statistical tests that quantify the relationship between two or more variables. The most common of these tests are correlation, cross-correlation, and regression analysis. Like all other statistical tests, each of the relational tests rests on certain assumptions about the nature of the input data. These assumptions must be carefully understood and verified before analysis.
- Linearity
Tests like linear regression and simple correlation assume a linear relationship between the dependent and independent variables. However, many relationships are nonlinear, for example curved or clustered ones, and applying linear tests to such data may lead to false inferences about the relationship. Again, visualization is vital: a scatter plot clearly reveals the nature of the relationship between the variables.
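The point about linearity can be made concrete with a small sketch (using hypothetical, simulated data): a Pearson correlation can come out near zero even when one variable is a deterministic function of the other, simply because the relationship is not linear.

```python
import numpy as np

# Hypothetical data: y depends strongly on x, but nonlinearly (a parabola).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(0, 0.1, size=x.size)

# Pearson correlation measures only the *linear* component of the
# relationship, so it is misleading here.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r: {r:.3f}")  # close to zero despite a strong relationship
```

A scatter plot of `x` against `y` would expose the parabola immediately, which is why visual inspection should precede any relational test.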
- Independence
In testing for multivariate causal relationships, statistical tests such as regression and some forms of ANOVA assume that the observations, and hence the residuals, are independent of one another. However, if multiple measurements originate from the same individual, or are collected sequentially in time, they are likely to be dependent. Autocorrelation plots, plots of the residuals against time, and the Durbin-Watson test are useful for detecting violations of independence.
- Homoscedasticity
Homoscedasticity assumes that the variance of the residuals (i.e., the differences between observed and predicted values) is constant across all data points. In other words, the test assumes the data points are uniformly spread around the regression line. If the spread of the residuals widens or narrows systematically along the fitted line, the relationship is heteroscedastic.
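One informal way to screen for heteroscedasticity, sketched below on simulated data, is to compare the residual variance at the low and high ends of the predictor, in the spirit of the Goldfeld-Quandt test. A ratio far from 1 signals a violation.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 300)

# Hypothetical heteroscedastic data: the noise scale grows with x.
y = 2.0 * x + rng.normal(0, 0.5 * x)

# Fit a simple linear regression and compute residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Compare residual variance in the lower vs. upper third of x
# (a Goldfeld-Quandt-style check); ~1 is expected under homoscedasticity.
n = len(x)
ratio = np.var(residuals[-(n // 3):]) / np.var(residuals[: n // 3])
print(f"variance ratio: {ratio:.2f}")  # well above 1 here
```

A residual-versus-fitted scatter plot conveys the same information visually: a funnel shape is the classic signature of heteroscedasticity.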
All of these assumptions about the relational data in question must be stated and checked in any analysis for the results to be reliable.