One of the challenges is that the inputs can get really big
And this means that this matrix here will have three billion parameters which is just very, very large. And with that many parameters, it's difficult to get enough data to prevent a neural network from overfitting. And also, the computational requirements and the memory requirements to train a neural network with three billion parameters is just a bit infeasible.
The first thing you might do is detect vertical edges (vertical lines), horizontal edges
Convolution is an important process