* Data is the most important thing in deep learning; the model is just the way you interpret that data.
* Because interpretation is so crucial, you have to make sure that at every step you are extracting valid features from each individual data point.
* The dot product, or weighted sum, is a notion of similarity between two vectors: the higher the product, the more similar the two vectors are.
* The dot product can also be interpreted as a logical AND.
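A minimal sketch of both ideas (the vectors here are made up for illustration): the dot product scores similarity, and with 0/1 inputs it counts positions where both vectors are 1, which is the AND-like reading.

```python
def dot(a, b):
    # Weighted sum: multiply element-wise, then add up.
    return sum(x * y for x, y in zip(a, b))

likes_sports = [1, 0, 1]  # binary feature vectors, invented for this example
likes_games  = [1, 0, 1]
likes_books  = [0, 1, 0]

print(dot(likes_sports, likes_games))  # 2: identical vectors, high similarity
print(dot(likes_sports, likes_books))  # 0: no overlapping 1s, no similarity
```

With binary inputs, each term `x * y` is 1 only when both entries are 1, so the dot product behaves like a count of satisfied ANDs.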
* A weight is like a volume knob: it lets you turn an input's contribution up or down, making it more or less relevant.
* A negative weight acts like a NOT operator: it flips the sign of the input's contribution.
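A quick sketch of the knob metaphor, using made-up numbers: scaling an input by a weight turns its contribution up or down, and a negative weight flips its sign.

```python
input_signal = 0.8

quiet   = 0.5 * input_signal   # turned down
loud    = 2.0 * input_signal   # turned up
negated = -1.0 * input_signal  # sign flipped: behaves like NOT

print(quiet, loud, negated)
```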
* We don’t average raw signed errors, because we don’t want the errors to cancel each other out. Say one prediction has a positive error of 1000 (pretty big), and another has a negative error of -1000: the average error comes out to 0, and, as the book says, you’ve fooled yourself into believing the model works. That’s why the error is squared (or its absolute value taken), so every error is positive.
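The cancellation problem can be shown directly with the two errors above: the signed mean is misleadingly perfect, while the squared mean exposes how wrong the model really is.

```python
errors = [1000, -1000]

# Signed errors cancel out, hiding the problem.
mean_error = sum(errors) / len(errors)               # 0.0: looks perfect

# Squaring makes every error positive, so nothing cancels.
mean_squared = sum(e**2 for e in errors) / len(errors)  # 1000000.0

print(mean_error, mean_squared)
```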
* Computers are dumb, but quick. So the error doesn’t have to carry a direction: set the goal to reduce the error, let the model try dialing the weight up and down by some amount, and it will figure out which way to go.
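The dial-up-or-down idea can be sketched as trial-and-error weight search: try both directions, keep whichever move shrinks the error. All numbers here (input, target, step size) are made up for illustration.

```python
input_value, target = 0.5, 0.8
weight, step = 0.1, 0.001

for _ in range(2000):
    # Measure the error if we nudge the weight up vs. down.
    up_error   = (input_value * (weight + step) - target) ** 2
    down_error = (input_value * (weight - step) - target) ** 2

    # Keep whichever direction reduces the error; no gradient needed.
    if up_error < down_error:
        weight += step
    else:
        weight -= step

print(round(weight, 2))  # converges near 1.6 (= target / input_value)
```

This is slow and brute-force, but it shows why the error needs no direction of its own: the search discovers the direction by probing both sides.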