A common method for fitting a linear regression is ordinary least squares (OLS), which models the outcome as a line plus an error term:
y = mx + b + error
We define "best" as the line that minimizes the total squared error for all data points.
This total squared error is called the loss function in machine learning.
loss = (-1)^2 + (3)^2 = 1 + 9 = 10
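The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch: the two residuals (-1 and 3) come from the worked example, and the function name `total_squared_error` is an illustrative choice, not part of any library.

```python
def total_squared_error(residuals):
    """Return the sum of squared residuals (the OLS loss)."""
    return sum(r ** 2 for r in residuals)


# Residuals taken from the worked example above
example_residuals = [-1, 3]
loss = total_squared_error(example_residuals)
print(loss)  # 10
```

Squaring keeps negative and positive errors from cancelling out, and it penalizes large errors more heavily than small ones.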
The fitted values are the predicted weights for each person in the dataset used to fit the model. The residuals are the differences between each person's true weight and their predicted weight (actual minus fitted).
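To make fitted values and residuals concrete, here is a small sketch that fits a line by hand using the standard closed-form least-squares formulas for a single predictor. The data are made up for illustration, using height as the predictor is an assumption, and `fit_line` is a hypothetical helper name rather than a library function.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up heights (cm) and weights (kg), for illustration only
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 74, 87]

slope, intercept = fit_line(heights, weights)

# Fitted values: the weight the line predicts for each person
fitted_values = [intercept + slope * h for h in heights]
# Residuals: actual weight minus fitted weight
residuals = [w - f for w, f in zip(weights, fitted_values)]

print(round(slope, 2), round(intercept, 1))  # 0.86 -78.2
```

When the model includes an intercept, the residuals from a least-squares fit sum to zero (up to floating-point rounding), which is a quick sanity check on the calculation.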
The relationship between the outcome variable and predictor is linear (can be described by a line). We can check this before fitting the regression by simply looking at a plot of the two variables.
Once we’ve calculated the fitted values and residuals for a model, we can check the normality and homoscedasticity assumptions of linear regression.
The normality assumption states that the residuals should be normally distributed.
import matplotlib.pyplot as plt

# To check this assumption, we can inspect a histogram of the residuals
# and make sure that the distribution looks approximately normal:
# no strong skew and no multiple "humps".
plt.hist(residuals)
plt.show()
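As a rough numeric complement to the histogram, we can compute the sample skewness of the residuals. This is only a sketch and not a formal normality test: `sample_skewness` is a hypothetical helper, and the two residual lists are made up. Values near 0 suggest a symmetric distribution, while clearly positive or negative values suggest skew.

```python
def sample_skewness(values):
    """Skewness as the third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up residuals, for illustration only
symmetric = [-2, -1, 0, 1, 2]
right_skewed = [-1, -1, -1, 0, 3]

print(sample_skewness(symmetric))         # 0.0
print(sample_skewness(right_skewed) > 0)  # True
```

Skewness alone cannot detect multiple humps, so the histogram remains the primary check here.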
The homoscedasticity assumption states that the residuals should have equal variance across all values of the predictor variable.
# A common way to check this is by plotting the residuals against the fitted values.
# If the homoscedasticity assumption is met, then this plot will look
# like a random splatter of points, centered around y=0.
#
# If there are any patterns or asymmetry, that would indicate the assumption is NOT met
# and linear regression may not be appropriate.
plt.scatter(fitted_values, residuals)
plt.show()
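The visual check can be supplemented with a crude numeric comparison: split the observations into lower and upper halves by fitted value and compare the spread of the residuals in each half. This is an informal sketch under made-up data, not a formal test, and `spread_by_half` is a hypothetical helper name.

```python
import statistics


def spread_by_half(fitted_values, residuals):
    """Residual standard deviation for the lower and upper half of fitted values."""
    pairs = sorted(zip(fitted_values, residuals))
    mid = len(pairs) // 2
    low = [r for _, r in pairs[:mid]]
    high = [r for _, r in pairs[mid:]]
    return statistics.pstdev(low), statistics.pstdev(high)


# Made-up fitted values and residuals, for illustration only
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
even_residuals = [1, -1, 1, -1, 1, -1, 1, -1]           # similar spread throughout
fanning_residuals = [0.5, -0.5, 0.5, -0.5, 3, -3, 3, -3]  # spread grows with fitted value

print(spread_by_half(fitted, even_residuals))     # (1.0, 1.0)
print(spread_by_half(fitted, fanning_residuals))  # (0.5, 3.0)
```

Similar spreads in both halves are consistent with homoscedasticity, while a large gap (as in the second case) matches the "fan" shape we would see in the scatter plot.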
import pandas as pd
import statsmodels.api as sm

# Read in the data
students = pd.read_csv('test_data.csv')
# Create the model: score as a linear function of hours_studied
model = sm.OLS.from_formula('score ~ hours_studied', data=students)
# Fit the model
results = model.fit()
# Print the coefficients (intercept and slope)
print(results.params)
# Fitted values and residuals example
# (assumes `results` here is a model of weight fitted on body_measurements)
fitted_values = results.predict(body_measurements)
residuals = body_measurements.weight - fitted_values  # actual values minus fitted values