This is a very deep question that many famous mathematicians have worked on, including Godel, Hilbert, ...
So the question is why simple elementary problems (say problems with only universal quantifiers) do not have short elementary proofs. But before continuing we first need to agree on "what is an elementary proof?". In proof theory, an elementary proof is a proof that does not use concepts (and formulas) that does not appear in the axioms or in the theorem.
Hilebrt tried to show that reasonable elementary axioms are sufficient for proving all elementary theorems. His attempt failed because of Godel's incompleteness theorem. For any reasonable set of axioms, there is an elementary statement that is not provable from them. So we need non-elementary axioms for proving elementary theorems. This is the first point. Complicated axioms are needed to prove some simple elementary theorems.
Now let's assume that we have a proof of an elementary theorem (from some set of axioms). Then there is an elementary proof for the theorem from the axioms. This due to Gentzen's cut-elemnitation which asserts the existence of cut-free proofs, and moreover we have an algorithm that whenever we give it an arbitrary proof for a theorem, the algorithm converts it to a cut-free proof pf the same theorem from the same axioms. One of the nice properties of a cut-free proof is that any formula in the proof is either a subformlua of one of the axioms or a subformula of the theorem. That seems quite good.
But why people don't use such proofs? The answer is that cut-free proofs can be a lot (super-exponentially) larger than proofs using cuts. Informally you can think of cuts as proving lemmas and then using them for further results. A 100 page proof can turn into a $2^{50000}/500$ page proof after cut-elimination (I am assuming each page contains 500 symbols).
When we use concepts from outside, we usually define them to stand for complicated and hard to understand formulas. This simplifies the understanding of the proof if we are familiar with those concepts. Take for example the following pseudo-statement:
$\forall \epsilon \exists \delta \exists m \forall n>m ...$
This is quite complicated formula if one is not already familiar with similar formulas. The first two quantifiers are the $\epsilon\text{-}\delta$ we know from limits. The second pair is just for all sufficiently large $n$. Or take for example:
$\forall \epsilon \exists \delta \forall y \text{ s.t. } 0<|y-x|<\delta; \ |\frac{y^3+2y^2-x^3-2x^2}{y-x}-3x^2-4x|<\epsilon$
This is much more complicated than $(x^3+2x^2)'=3x^2+4x$. Here we are using newly defined concepts to shorten the formula and make it more human readable. Using lemmas about these concepts we can give a short proof of this equality, while the elementary proof can be much longer. Definitions of new concepts and lemmas can make a proof much shorter and more understandable. So elementary proofs might be harder to understand that non-elementary proofs using concepts from outside. This is the second point.
Now apply what I said above to the several hundred page long proof of FLT, ignoring the fact that the full proof will need to contain the proofs of all theorems and lemmas that are used in the proof. The resulting proof will probably be practicably inexpressible on any medium and completely impossible to understand by a human-being.
If we have a short elementary proof, then it is good. But there are elementary theorems with short non-elementary proofs which do not have short elementary proofs. This is the third point.
In summery:
- complicated axioms are needed to prove some simple elementary theorems (Godel's incompleteness theorem),
- non-elementary proofs might be easier to understand,
- there are elementary theorems with short non-elementary proofs which do not have short elementary proofs.
Items 2 and 3 are related to what we call definitions and lemmas in mathematics and are related to proof theoretic and proof complexity concepts called cuts and extension by definitions. (There are other things that also make a proof more readable and related concepts in proof complexity but going into them would diverge from the question about a proof being elementary.)
Two related notes:
- There is a speed-up theorem by Godel in logic.
- Having a short elementary proof is not sufficient for finding one (assuming widely believed conjectures in computational complexity theory). See this and this.