Pre era (less than 100,000 examples) We used to have 60% of the data for Training, 20% development (cross validation data to test algorithms) and 20% testing data (to use it in the tested and better performance algorithm)
Big Data (1,000,000) examples,, we use 10,000 examples inthe dedv set, and 10,000 as test is enough have. 98%, 1%, 1%. or some rations
Make sure the dev and tests sets come from the same distribution
When the optimal error is close to 0% (What to expect from a human)
When the optimal (Bayes) Error is more, suppose 15%
It fixes too well into some points of the data but it does’t generalize well