What is it?
Linear regression is a model in which the relationship between the output and the input is represented by a linear function. The model is trained with supervised learning.
Specifically, the training data take the form \begin{equation}
\boldsymbol{D}=\left\{\left(\boldsymbol{x}_1, y_1\right), \ldots,\left(\boldsymbol{x}_n, y_n\right)\right\}
\end{equation}, where each $\boldsymbol{x}_i$ is an m-dimensional vector and each $y_i$ is a scalar.
The model to be trained is a function f of the form \begin{equation}
f(\boldsymbol{x})=\boldsymbol{w}^T \boldsymbol{x}+b
\end{equation}, where $\boldsymbol{w}$ is an m-dimensional weight vector and $b$ is a scalar bias.
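As a minimal sketch, the model above can be written in NumPy; the array values and the function name `predict` are illustrative, not taken from the text:

```python
import numpy as np

def predict(x, w, b):
    """Linear model f(x) = w^T x + b."""
    return np.dot(w, x) + b

# Illustrative example with an m = 3 dimensional input.
w = np.array([0.5, -1.0, 2.0])
b = 0.1
x = np.array([1.0, 2.0, 3.0])
print(predict(x, w, b))  # 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0 + 0.1 = 4.6
```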
The training process uses the data points $\left\{\left(\boldsymbol{x}_1, y_1\right), \ldots,\left(\boldsymbol{x}_n, y_n\right)\right\}$ to find $\boldsymbol{w}$ and $b$ such that the distance between the value predicted by the function f, i.e., $\hat{y}_i=f\left(\boldsymbol{x}_i\right)=\boldsymbol{w}^T \boldsymbol{x}_i+b$, and the ground-truth value $y_i$ is minimized. This is achieved by minimizing the objective function \begin{equation}
L(\boldsymbol{w})=\frac{1}{2 n} \sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2
\end{equation}. The function L is also known as the loss function.
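As a sketch, this loss can be computed directly from the definition, assuming the inputs are stored row-wise in an $n \times m$ array `X` and the targets in a length-$n$ vector `y` (names and values are illustrative):

```python
import numpy as np

def loss(w, b, X, y):
    """L(w) = (1/2n) * sum_i (y_i - y_hat_i)^2 with y_hat_i = w^T x_i + b."""
    y_hat = X @ w + b          # predictions for all n points at once
    residuals = y - y_hat
    return np.sum(residuals ** 2) / (2 * len(y))

# Illustrative toy data: n = 4 samples, m = 2 features.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [0.5, 4.0]])
y = np.array([3.0, 2.5, 4.0, 5.0])
print(loss(np.array([1.0, 1.0]), 0.0, X, y))
```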

Suppose each vector $\boldsymbol{x}_i$ is represented as $\boldsymbol{x}_i=\left[x_i^1, \ldots, x_i^m\right]^T$. If we set $\overline{\boldsymbol{x}}_i=\left[1, x_i^1, \ldots, x_i^m\right]^T$ and $\overline{\boldsymbol{w}}=\left[b, w^1, \ldots, w^m\right]^T$, stack the rows $\overline{\boldsymbol{x}}_i^T$ into the matrix $\overline{\boldsymbol{X}}$, and let $\boldsymbol{y}=\left[y_1, \ldots, y_n\right]^T$, then the loss function can be expressed as:
\begin{equation}
L(\overline{\boldsymbol{w}})=\frac{1}{2 n}\|\boldsymbol{y}-\overline{\boldsymbol{X}} \overline{\boldsymbol{w}}\|_2^2
\end{equation}
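In code, the change of variables above amounts to prepending a column of ones to the data matrix. A brief sketch, reusing the illustrative arrays from before (the name `X_bar` stands in for $\overline{\boldsymbol{X}}$):

```python
import numpy as np

# Illustrative toy data: n = 4 samples, m = 2 features.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [0.5, 4.0]])
y = np.array([3.0, 2.5, 4.0, 5.0])

# Prepend a column of ones so the bias b becomes the first entry of w_bar.
X_bar = np.hstack([np.ones((X.shape[0], 1)), X])

def loss_vectorized(w_bar, X_bar, y):
    """L(w_bar) = (1/2n) * ||y - X_bar @ w_bar||_2^2."""
    r = y - X_bar @ w_bar
    return (r @ r) / (2 * len(y))

print(loss_vectorized(np.array([0.0, 1.0, 1.0]), X_bar, y))  # w_bar = [b, w^1, w^2]
```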
Since L is convex, it attains its minimum where its gradient with respect to $\overline{\boldsymbol{w}}$ is zero. Setting the gradient to zero and solving gives the model found by linear regression in closed form:
\begin{equation}
\overline{\boldsymbol{w}}=\left(\overline{\boldsymbol{X}}^T \overline{\boldsymbol{X}}\right)^{-1} \overline{\boldsymbol{X}}^T \boldsymbol{y}
\end{equation}
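A hedged sketch of this closed-form solution in NumPy, continuing the illustrative arrays above; solving the linear system with `np.linalg.solve` rather than forming the inverse explicitly is a standard numerical choice, not something prescribed by the text:

```python
import numpy as np

# Illustrative toy data and augmented design matrix, as in the earlier sketches.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [0.5, 4.0]])
y = np.array([3.0, 2.5, 4.0, 5.0])
X_bar = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equations: (X_bar^T X_bar) w_bar = X_bar^T y.
w_bar = np.linalg.solve(X_bar.T @ X_bar, X_bar.T @ y)
b, w = w_bar[0], w_bar[1:]
print("b =", b, "w =", w)

# The same solution via a least-squares solver, which is more robust in practice.
w_bar_lstsq, *_ = np.linalg.lstsq(X_bar, y, rcond=None)
```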