Correlation vs Causation (Pearson, Spearman)
Correlation measures association
Correlation answers:
- “Do variables move together?”
It does not answer:
- “Does X cause Y?”
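To see how two variables can move together without one causing the other, here is a small simulation under made-up assumptions. A hidden variable `z` drives both `x` and `y`, and `x` has no effect on `y`. The variable names, coefficients, and the `residualize` helper are hypothetical and chosen only for illustration. Removing the linear effect of `z` from both variables (a simple partial correlation) makes most of the association disappear.

```python
import numpy as np
from scipy import stats

# Hypothetical setup: a hidden confounder z drives both x and y.
# x has no causal effect on y here.
rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)            # confounder
x = 2 * z + rng.normal(size=n)    # x depends only on z (plus noise)
y = 3 * z + rng.normal(size=n)    # y depends only on z (plus noise)

r_xy, _ = stats.pearsonr(x, y)
print("corr(x, y):", round(r_xy, 3))  # strongly positive

def residualize(v, z):
    """Subtract the least-squares linear fit of v on z."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

# Correlate what is left of x and y after removing z's linear effect.
r_partial, _ = stats.pearsonr(residualize(x, z), residualize(y, z))
print("corr(x, y | z):", round(r_partial, 3))  # close to 0
```

A strong correlation between `x` and `y` is real here, but it comes entirely from `z`.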
Pearson correlation
- Measures the strength and direction of a linear relationship (r ranges from -1 to +1)
- Sensitive to outliers
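As a sketch of both points, Pearson's r can be computed directly from its definition, r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²), and checked against `scipy.stats.pearsonr`. The helper `pearson_r` and the extra point (7, -20) are hypothetical, added only to show how much a single outlier can move r.

```python
import numpy as np
from scipy import stats

def pearson_r(x, y):
    """Pearson's r: sum of co-deviations over the product of root sums of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 5, 4, 5, 7])

r_manual = pearson_r(x, y)
r_scipy, _ = stats.pearsonr(x, y)
print(r_manual, r_scipy)  # both about 0.878

# Append one extreme point that breaks the trend.
x_out = np.append(x, 7)
y_out = np.append(y, -20)
r_outlier = pearson_r(x_out, y_out)
print(r_outlier)  # about -0.49: a single point flips the sign
```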
Pearson
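import numpy as np
from scipy import stats
# Small sample with a roughly increasing trend
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 5, 4, 5, 7])
# r is the correlation coefficient; p tests the null hypothesis of no correlation
r, p = stats.pearsonr(x, y)
print("r:", r)  # about 0.878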
print("p:", p)
Spearman correlation
- Computed as Pearson correlation on the ranks of the data
- Captures monotonic relationships, whether linear or not
- More robust to outliers and non-linearity
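The three properties above can be checked with a short sketch. The data (`y = x**3` and a line with one inflated value) are made up for illustration. It shows that Spearman's rho equals Pearson's r on the ranks (via `scipy.stats.rankdata`), that a monotonic curve gives rho = 1 while r stays below 1, and that an outlier which keeps its rank leaves rho unchanged while pulling r down.

```python
import numpy as np
from scipy import stats

# Monotonic but non-linear: y = x**3 always increases with x.
x = np.arange(1, 11, dtype=float)
y = x ** 3

r, _ = stats.pearsonr(x, y)
rho, _ = stats.spearmanr(x, y)
print("Pearson r:", r)       # below 1: the points do not lie on a straight line
print("Spearman rho:", rho)  # 1: the ranks agree perfectly

# Spearman's rho is Pearson's r computed on the ranks.
rho_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
print(np.isclose(rho, rho_ranks))  # True

# A perfect line with its largest value inflated; that point keeps its rank.
y_lin_out = 2 * x
y_lin_out[-1] = 1000.0
r_out, _ = stats.pearsonr(x, y_lin_out)
rho_out, _ = stats.spearmanr(x, y_lin_out)
print("Pearson with outlier:", r_out)     # about 0.54 (1 without the outlier)
print("Spearman with outlier:", rho_out)  # still 1
```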
Spearman
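import numpy as np
from scipy import stats
# y strictly decreases as x increases (monotonic, but not a straight line)
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([10, 9, 7, 6, 3, 1])
# rho is Pearson's r computed on the ranks of x and y
rho, p = stats.spearmanr(x, y)
print("rho:", rho)  # rho is -1: a perfectly monotonic decreasing relationship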
print("p:", p)
Practical guidance
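- Make a scatter plot first; a single coefficient can hide curvature, clusters, or outliers.
- Consider a transformation (e.g., log) when a variable is skewed or spans several orders of magnitude.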
- Be cautious: confounders can create spurious correlation.
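A minimal sketch of why a log transform can matter, assuming made-up data where `y = exp(x)` spans several orders of magnitude: Pearson's r on the raw values is pulled around by the largest points, while on `log(y)` the relationship is exactly linear. Spearman's rho does not change, because a log transform preserves ranks.

```python
import numpy as np
from scipy import stats

# Illustrative data: y grows exponentially, spanning about four orders of magnitude.
x = np.arange(1, 11, dtype=float)
y = np.exp(x)

r_raw, _ = stats.pearsonr(x, y)
r_log, _ = stats.pearsonr(x, np.log(y))
rho_raw, _ = stats.spearmanr(x, y)
rho_log, _ = stats.spearmanr(x, np.log(y))

print("Pearson, raw y:", round(r_raw, 3))  # well below 1
print("Pearson, log y:", round(r_log, 3))  # 1.0: log(y) is linear in x
print("Spearman, raw vs log:", rho_raw, rho_log)  # identical: ranks are unchanged
```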