Have the supervised algorithm also output 4 numbers that indicate the boundaries (the bounding box) of the object. The target vector becomes:
$$ y = \begin {bmatrix} p_c \\ b_x \\ b_y \\ b_h \\ b_w\\ c_1 \\ c_2 \\ c_3 \end {bmatrix} $$
Where $p_c$ indicates whether the image contains an object (as opposed to only background).
$b_x, b_y, b_h, b_w$ describe the bounding box: $b_x, b_y$ are the coordinates of its center, and $b_h, b_w$ are its height and width.
And $c_i$ indicates whether the object belongs to class $i$ (here $i = 1, 2, 3$).
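
As an illustration of how such a target vector could be assembled, here is a minimal sketch in plain Python. The function name `make_target`, the 0-based `class_index`, and the choice of filling the unused entries with zeros when no object is present are all assumptions made for this example, as is the idea that the box values are expressed as fractions of the image size.

```python
def make_target(has_object, box=None, class_index=None, num_classes=3):
    """Build y = [p_c, b_x, b_y, b_h, b_w, c_1, ..., c_n] as a list of floats.

    `class_index` is 0-based here (an assumption of this sketch),
    so class_index=0 sets c_1.
    """
    if not has_object:
        # Only p_c is meaningful; the other entries are placeholders
        # (zeros are an arbitrary choice for this sketch).
        return [0.0] * (5 + num_classes)

    b_x, b_y, b_h, b_w = box
    # One-hot encoding of the class: exactly one c_i equals 1.
    one_hot = [1.0 if k == class_index else 0.0 for k in range(num_classes)]
    return [1.0, b_x, b_y, b_h, b_w] + one_hot


# An object of the second class, centered at (0.5, 0.7),
# with height 0.3 and width 0.4.
y = make_target(True, box=(0.5, 0.7, 0.3, 0.4), class_index=1)
```

With three classes the vector always has 8 entries, matching the definition of $y$ above.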
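The loss, if $y_1 = 1$ (an object is present, since $y_1 = p_c$): $L(\hat y , y) = (\hat y_1 - y_1)^2 + (\hat y_2 - y_2)^2 + \dots + (\hat y_8 - y_8)^2$
If $y_1 = 0$ (no object): $L(\hat y , y) = (\hat y_1 - y_1)^2$, because the remaining components are "don't care" values when there is nothing to localize.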
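
The two cases of the squared-error loss can be sketched in a few lines of plain Python. The function name `localization_loss` and the list-based representation of $y$ and $\hat y$ are assumptions for this example; a real training setup would typically use a tensor library and average the loss over a batch.

```python
def localization_loss(y_hat, y):
    """Squared-error loss for one example.

    y and y_hat are sequences laid out as
    [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3].
    """
    if y[0] == 1:
        # Object present: every component contributes to the loss.
        return sum((p - t) ** 2 for p, t in zip(y_hat, y))
    # No object: only the p_c prediction is penalized.
    return (y_hat[0] - y[0]) ** 2


# Object present: only p_c is predicted wrongly, by 0.5.
with_object = localization_loss(
    [0.5, 0.5, 0.5, 0.2, 0.2, 0.0, 1.0, 0.0],
    [1.0, 0.5, 0.5, 0.2, 0.2, 0.0, 1.0, 0.0],
)

# No object: the box and class predictions are ignored.
without_object = localization_loss(
    [0.5, 0.9, 0.9, 0.9, 0.9, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
)
```

In the second call the box and class predictions are far from the placeholder zeros, yet they do not affect the result, which is the point of the $y_1 = 0$ case.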