Correlation Calculator
Calculate Pearson r, Spearman rho, and R-squared. Visualize your data with a scatter plot and use the built-in checklist to evaluate whether a causal claim is justified.
Accepts comma-separated (1.2, 3.4) or tab-separated values from spreadsheets
Correlation vs. Causation Checklist
A statistically significant correlation does not imply causation. Use this checklist to evaluate whether a causal claim is justified.
Plausible mechanism: A credible causal claim requires a logical or biological mechanism through which the cause produces the effect, not just a statistical association.
Consistency: If the relationship has been replicated across different populations, settings, and methodologies, the case for causation is stronger.
Temporality: For X to cause Y, X must occur before Y. Cross-sectional data alone cannot establish this; you need longitudinal or experimental evidence.
Dose-response: A gradient effect, where increasing the dose or intensity of X leads to a proportional change in Y, supports a causal interpretation.
Alternative explanations: Confounding variables, reverse causation, and selection bias are common threats. Strong causal claims require controlling for or eliminating these alternatives.
Experimental evidence: Randomized controlled experiments are the gold standard for causal inference. Observational studies can suggest associations but cannot definitively prove causation.
How to Use This Correlation Calculator
Enter your data as X,Y pairs in the input field above, with one pair per line. You can use commas (e.g., "1.2, 3.4") or tabs to separate values, which means you can paste two columns directly from Excel or Google Sheets. Click Calculate to instantly compute the Pearson correlation coefficient, Spearman rank correlation, R-squared, p-value, and linear regression equation.
The tool generates a scatter plot with your data points and a trend line, so you can visually assess the relationship. Below the calculator, use the Causation Checklist to evaluate whether a causal claim is justified based on established criteria from research methodology.
Understanding Correlation Coefficients
The Pearson correlation coefficient (r) measures the linear relationship between two continuous variables. It ranges from -1 (perfect negative) to +1 (perfect positive), with 0 indicating no linear relationship. The Spearman rank correlation (rho) measures the monotonic relationship using ranked data, making it more robust to outliers and non-linear patterns.
R-squared tells you the proportion of variance in Y explained by X. A value of 0.64 means 64% of the variability in Y can be predicted from X. The p-value indicates whether the observed correlation is statistically significant: values below 0.05 are conventionally considered significant, meaning a correlation this strong would be unlikely to arise by chance if the two variables were actually unrelated.
Why Correlation Does Not Imply Causation
This is one of the most important principles in statistics and research. Two variables can move together for many reasons: a direct causal link, a shared confounding variable, reverse causation, or pure coincidence. For example, countries with higher chocolate consumption have more Nobel laureates — but chocolate does not cause Nobel prizes. Both are correlated with national wealth and education infrastructure.
The Causation Checklist in this tool is based on the Bradford Hill criteria and standard experimental design principles. It helps you systematically evaluate whether your correlation evidence is strong enough to support a causal claim. For rigorous causal analysis, consider running controlled experiments or using techniques like difference-in-differences, instrumental variables, or regression discontinuity designs.
Using Correlation Analysis in User Research
In product and user research, correlation analysis helps you identify relationships between user behaviors and outcomes. For example, you might correlate feature usage frequency with retention rates, or NPS scores with support ticket volume. These insights guide prioritization decisions — but remember that correlation alone should not drive product changes without understanding the underlying mechanism.
For deeper insights, pair quantitative correlation data with qualitative evidence from user interviews. Tools like Innerview help you analyze interview transcripts to uncover the why behind the statistical patterns, turning correlations into actionable product decisions.
Frequently Asked Questions
What is the difference between correlation and causation?
Correlation measures the statistical relationship between two variables — when one changes, the other tends to change in a predictable way. Causation means that one variable directly produces a change in the other. A strong correlation does not prove causation. For example, ice cream sales and drowning rates are correlated (both increase in summer), but ice cream does not cause drowning. Establishing causation requires experimental evidence, temporal precedence, a plausible mechanism, and the elimination of confounding variables.
What is Pearson r and how do I interpret it?
Pearson r (also called the Pearson correlation coefficient) measures the strength and direction of the linear relationship between two continuous variables. It ranges from -1 to +1. A value of +1 means a perfect positive linear relationship, -1 means a perfect negative linear relationship, and 0 means no linear relationship. Values between 0.7 and 1.0 (or -0.7 and -1.0) are generally considered strong correlations, while values between 0.3 and 0.7 are moderate, and below 0.3 are weak.
When should I use Spearman rho instead of Pearson r?
Use Spearman rho (rank correlation) when your data is ordinal (ranked), when the relationship between variables is monotonic but not necessarily linear, or when your data contains significant outliers. Spearman rho is more robust than Pearson r because it operates on ranks rather than raw values, making it less sensitive to extreme values and non-normal distributions. If both coefficients are similar, the relationship is likely linear. If Spearman is notably higher, the relationship may be monotonic but non-linear.
What does R-squared tell me that Pearson r does not?
R-squared (the coefficient of determination) is simply the square of Pearson r. While r tells you the direction and strength of a correlation, R-squared tells you the proportion of variance in Y that is explained by X. For example, if r = 0.80, then R-squared = 0.64, meaning 64% of the variability in Y can be explained by its linear relationship with X. The remaining 36% is due to other factors. R-squared is always between 0 and 1 and is especially useful when you want to communicate how much predictive power one variable has over another.
How many data points do I need for a reliable correlation?
As a general rule, you need at least 30 data pairs for a reasonably stable correlation estimate. With fewer than 10 pairs, correlation coefficients can fluctuate wildly and p-values are unreliable. For small samples (10 to 30), be cautious about drawing strong conclusions — the confidence interval around r will be wide. For critical decisions, aim for 50 or more data points. Also consider that outliers have a disproportionate effect on small samples, so always inspect your scatter plot visually before trusting the numbers.
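To see how sample size drives precision, a standard approach is the Fisher z-transform approximation for a 95% confidence interval around r; the function below is a sketch of that textbook formula, and the tool itself may compute intervals differently. Note how the interval for the same r = 0.50 narrows dramatically as n grows.

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for Pearson r via the Fisher z-transform.
    Requires n > 3 and |r| < 1."""
    z = math.atanh(r)              # map r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)      # standard error shrinks as n grows
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

for n in (10, 30, 100):
    lo, hi = r_confidence_interval(0.5, n)
    print(f"n={n:3d}: r=0.50, 95% CI ({lo:+.2f}, {hi:+.2f})")
```

With n = 10 the interval even crosses zero, meaning the data cannot rule out no correlation at all, which is exactly the instability the rule of thumb above warns about.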