
Linear Regression


Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It is a supervised machine learning algorithm that fits a linear equation to observed data and then uses that equation to predict the dependent (output) variable from the independent (input) variables.

For example, to predict someone's salary you might use factors such as years of experience, education level, industry of employment, and job location. Linear regression combines all these parameters to predict the salary, since the relationship between these factors and the salary is assumed to be linear.
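As a minimal sketch of the idea, the snippet below fits a one-variable linear model with NumPy using made-up data (years of experience vs. salary in thousands of dollars; both the numbers and the variable names are illustrative assumptions, not from the original):

```python
import numpy as np

# Hypothetical training data: years of experience vs. salary (in $1000s).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([45.0, 50.0, 60.0, 65.0, 75.0])

# Add an intercept column so the fitted model is y = b0 + b1 * x.
X_design = np.hstack([np.ones((X.shape[0], 1)), X])

# Ordinary least squares: find coefficients minimizing squared error.
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slope = coeffs

# Use the fitted line to predict the salary for 6 years' experience.
prediction = intercept + slope * 6.0
print(round(intercept, 2), round(slope, 2), round(prediction, 2))  # → 36.5 7.5 81.5
```

With more predictors (education level, industry, location encoded numerically), you would simply add more columns to `X` and the same least-squares fit applies.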

Assumptions for linear regression include:

  • Linearity: Linear regression assumes there is a linear relationship between the independent and dependent variables. This means that changes in the independent variable lead to proportional changes in the dependent variable, whether positively or negatively.
  • Independence of errors: The observations should be independent of each other; that is, the error from one observation should not influence the error of another.
  • Homoscedasticity (equal variance): Linear regression assumes the variance of the errors is constant across all levels of the independent variable(s). In other words, the spread of the errors should not change as the value of the independent variable(s) changes.
  • Normality of residuals: This means that the residuals should follow a bell-shaped curve. If the residuals are not normally distributed, then linear regression will not be an accurate model.
  • No multicollinearity: Linear regression assumes there is no correlation between the independent variables chosen for the model.