Correlation vs Causation (Pearson, Spearman)

Correlation measures association

Correlation answers:

  • “Do variables move together?”

It does not answer:

  • “Does X cause Y?”
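To make the gap between association and causation concrete, here is a minimal simulated sketch. The setup is hypothetical and not from any real dataset: a third variable `z` drives both `x` and `y`, while `x` and `y` never influence each other. They still come out strongly correlated, and the correlation largely disappears once the linear effect of `z` is removed from both.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 5_000

# Hypothetical setup: z drives both x and y; x and y never affect each other.
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)
y = -1.5 * z + rng.normal(size=n)

# Plain correlation between x and y.
r_xy = np.corrcoef(x, y)[0, 1]

# Remove the linear effect of z from each variable, then correlate the residuals.
x_resid = x - np.polyval(np.polyfit(z, x, 1), z)
y_resid = y - np.polyval(np.polyfit(z, y, 1), z)
r_given_z = np.corrcoef(x_resid, y_resid)[0, 1]

print("r(x, y):", round(r_xy, 3))
print("r(x, y) after removing z:", round(r_given_z, 3))
```

This only works here because `z` is observed. With real data the common cause is often unmeasured, which is why a correlation alone cannot settle the causal question.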

Pearson correlation

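  • Measures the strength and direction of a linear relationship; r ranges from -1 to 1
  • Sensitive to outliers: a single extreme point can change r substantially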
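Pearson
import numpy as np
from scipy import stats

# Small illustrative sample with a roughly linear upward trend
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 5, 4, 5, 7])

# r: correlation coefficient; p: two-sided p-value for the null of no correlation
r, p = stats.pearsonr(x, y)
print("r:", r)
print("p:", p)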
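
The sensitivity to outliers can be seen with a small sketch that reuses the sample above and appends one extreme point. The extra point `(7, -20)` is made up purely for illustration.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 5, 4, 5, 7])

# Same data plus one made-up extreme point at (7, -20).
x_out = np.append(x, 7)
y_out = np.append(y, -20)

r_clean, _ = stats.pearsonr(x, y)
r_outlier, _ = stats.pearsonr(x_out, y_out)

print("r without outlier:", round(r_clean, 3))
print("r with outlier:   ", round(r_outlier, 3))
```

One added point is enough to flip the sign of r in a sample this small, which is a good reason to look at the data before trusting the coefficient.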

Spearman correlation

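  • Computed on the ranks of the values rather than the values themselves
  • Captures monotonic relationships (consistently increasing or decreasing), whether linear or not
  • More robust than Pearson to outliers and to non-linear but monotonic relationships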
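Spearman
import numpy as np
from scipy import stats

# y falls every time x rises, so the ranks are perfectly reversed
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([10, 9, 7, 6, 3, 1])

# rho: rank correlation coefficient; p: two-sided p-value
rho, p = stats.spearmanr(x, y)
print("rho:", rho)
print("p:", p)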
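
To illustrate the difference between monotonic and linear, the sketch below compares both coefficients on a relationship that always increases but is far from a straight line. The exponential curve is an arbitrary choice for the example.

```python
import numpy as np
from scipy import stats

# Strictly increasing but strongly non-linear relationship (arbitrary example).
x = np.arange(1, 11)
y = np.exp(x)

r, _ = stats.pearsonr(x, y)
rho, _ = stats.spearmanr(x, y)

print("Pearson r:   ", round(r, 3))
print("Spearman rho:", round(rho, 3))
```

Spearman sees a perfect monotonic relationship because the rank order of `y` matches the rank order of `x` exactly, while Pearson is pulled below 1 by the curvature.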

Practical guidance

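  • Plot a scatter plot first; a single coefficient can hide outliers, clusters, or curvature.
  • Consider a log transformation when a variable is strongly skewed or spans several orders of magnitude.
  • Be cautious: a confounder (a third variable that influences both X and Y) can produce a correlation with no causal link between them.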
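
As a rough sketch of the transformation advice above, the example below uses simulated data (a hypothetical square-root power law with multiplicative noise, not real measurements) where `x` spans four orders of magnitude. Pearson is computed on the raw values and on the log-transformed values; Spearman is included for comparison.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
n = 2_000

# Hypothetical data: x spans four orders of magnitude and y follows a
# square-root power law with multiplicative noise.
x = 10 ** rng.uniform(0, 4, size=n)
y = np.sqrt(x) * np.exp(rng.normal(scale=0.5, size=n))

r_raw, _ = stats.pearsonr(x, y)
r_log, _ = stats.pearsonr(np.log10(x), np.log10(y))

rho_raw, _ = stats.spearmanr(x, y)
rho_log, _ = stats.spearmanr(np.log10(x), np.log10(y))

print("Pearson raw:", round(r_raw, 3), " Pearson log-log:", round(r_log, 3))
print("Spearman raw:", round(rho_raw, 3), " Spearman log-log:", round(rho_log, 3))
```

The log transform makes the relationship closer to linear, so Pearson rises. Spearman stays the same because taking logs does not change the rank order of the values.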
