There are many examples in which fitting the data more "precisely" gives worse predictive performance (e.g. Runge's phenomenon). My professor implied that there is a sound basis for choosing "simple" functions over complex ones in the general case, and that it has to do with information theory.
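
(To illustrate what I mean by Runge's phenomenon, here is a small NumPy sketch; the function 1/(1 + 25x²) and the node counts are just the usual textbook setup, not anything specific my professor showed:)

```
import numpy as np

# Runge's example: interpolate f(x) = 1 / (1 + 25 x^2) on [-1, 1] with a
# polynomial through equally spaced nodes; the higher-degree (more "precise")
# interpolant matches the nodes exactly but gets worse between them.
f = lambda x: 1.0 / (1.0 + 25.0 * x**2)
x_dense = np.linspace(-1, 1, 1001)   # fine grid for measuring the error

for n_nodes in (5, 9, 13):
    nodes = np.linspace(-1, 1, n_nodes)
    coeffs = np.polyfit(nodes, f(nodes), deg=n_nodes - 1)  # exact at the nodes
    max_err = np.max(np.abs(np.polyval(coeffs, x_dense) - f(x_dense)))
    print(f"degree {n_nodes - 1:2d}: max error on [-1, 1] = {max_err:.3f}")
```

The maximum error grows as the degree goes up, even though every interpolant passes through its nodes exactly.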
Does anyone know what he was referring to?
As an example, consider least squares. Obviously we could find a polynomial of high enough degree to fit the data with zero error, but we often prefer a linear fit with higher error. Why should this be?
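
Here is a rough sketch of the kind of thing I mean (the true line y = 2x + 1, the noise level, and the seed are all made up purely for illustration):

```
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples from a true underlying line y = 2x + 1.
x = np.linspace(-1, 1, 12)
y = 2 * x + 1 + rng.normal(scale=0.2, size=x.size)

# Fresh points from the same line, used to judge the fitted models.
x_new = np.linspace(-1, 1, 200)
y_true = 2 * x_new + 1

for deg in (1, len(x) - 1):
    coeffs = np.polyfit(x, y, deg=deg)                     # least-squares fit
    data_err = np.mean((np.polyval(coeffs, x) - y) ** 2)   # error on the data
    new_err = np.mean((np.polyval(coeffs, x_new) - y_true) ** 2)  # error on new x
    print(f"degree {deg:2d}: data error = {data_err:.4f}, "
          f"new-point error = {new_err:.4f}")
```

When I run something like this, the degree-11 fit drives the error on the data to essentially zero while its error on new points blows up, whereas the linear fit does worse on the data but much better on new points. That is exactly the behaviour I'm asking about.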
(I am familiar with some basic notions like entropy, but not much more than that, so simpler explanations would be much preferred. Although I understand that if it's complex, it's complex.)