src/data/question-groups/data-science/content/multicollinearity.md
Multicollinearity is when two or more independent variables in a regression model are highly correlated, meaning they tell similar stories. This makes it hard for the model to figure out which variable is actually influencing the target, leading to unreliable or unstable coefficient estimates.
For example, in a regression model looking at economic growth, common variables will be GDP, Unemployment Rate, and Consumer Spending. These variables are all related, and the model might not be as effective as it should be.
To detect multicollinearity use:
Common pitfall: Including highly correlated predictors without checking VIF may inflate model error and reduce stability.