This is essentially Raphael's answer expanded, and taking into account what I was (probably unsuccessfully) trying to communicate in the comments. As I recall, the reason the Universal Machine is important is that it is through the UTM that we can actually explain how the "construction" of the modified machines come about; but I could have messed that up and if so I apologize.
Added: I guess the point is that you are technically correct that the proof you present, which glosses over some important details, is incorrect. The machine $Q$ you give cannot be asked about the behavior of $Q$ with $Q$ as an input, given the way you describe $Q$. Technically the reason is that if you try to formalize it along the lines I give below, you end up unable to formalize the input you should put into $Q$. The argument is "morally" correct in that a few technical (yet standard) tricks will solve that issue by using a slight modification of $Q$ instead of $Q$ directly. Then again, it is clear that the proof is not trying to be technically accurate, but only "morally" accurate, so one might forgive slight blurrings. It's similar to what happens in the usual description of Cantor's 2nd proof of uncountability of the reals, where the issue of distinct representations of reals is glossed over and can lead to a techncially incorrect argument.
Turing machines each have a code $C$, and take numbers as input. (We can use, for example, Goedel numbering to translate specific sentences or problems into numbers). Let us write $C(m)$ to mean the giving the machine with cod $C$ the input $m$.We can take a pair $(C,m)$, where both $C$ and $m$ are numbers, and obtain a number $N$ that "corresponds" (in a precise and well-defined way) with a pair $(C,m)$. Let us write that as $N\sim (C,m)$.
Suppose you have a machine $H$ that solves the Halting Problem; we can tweak it so that it will accept any in put and, when given an input $N$, will do the following:
\begin{equation*}
H(N) = \left\{\begin{array}{ll}
1 & \mbox{if $N\sim (C,m)$, and $C$ halts when $m$ is its input;}\\
0 & \mbox{if $N\sim (C,m)$ and $C$ does not halt when $m$ is its input;}\\
0 & \mbox{if $N\not\sim(C,m)$ for any $C$ and $m$.}
\end{array}\right.
\end{equation*}
If $H$ exists, then we can "construct" (that is, there also exists) a Turing Machine $Q$ such that:
\begin{equation*}
Q(N) = \left\{\begin{array}{ll}
1 & \mbox{if $H(M)=0$, where $M\sim(N,N)$;}\\
\mbox{does not halt} & \mbox{if $H(M)=1$, where $M\sim(N,N)$.}
\end{array}\right.
\end{equation*}
Note that $M$ is well-defined, since it is just the code for the pair $(N,N)$.
The key is that $Q$ does not take as an input a pair of the form $(C,k)$; it takes a single number $k$, and then figures out what $H(k,k)$ would have been. It's much more restrictive machine than one that "does the opposite" of what $H$ does.
Now, $Q$ being a Turing machine has a specific code $C$. So the question is, what is $Q(C)$? By definition, $Q(C) = 1$ if and only if $H(M)=0$, where $M\sim(C,C)$; but $H(M)=0$ means either that $M$ is not the code of a valid pair, or else that $C$ does not halt when given the number $C$ as an input. However, $M$ *is* the code of a pair, because that is how we defined it. So that means that $Q(C)=1$ if and only if the machine with code $C$ does not halt when given $C$ as an input. But the machine with code $C$ *is* $Q$, so $Q(C)=1$ if and only if the result of doing $Q(C)$ is non-terminating. Therefore, we cannot have $Q(C)=1$.
When is $Q(C)\neq 1$? $Q(C)\neq 1$ if and only if $Q(C)$ does not halt; and $Q(C)$ does not halt if and only if $H(M)=1$, where $M\sim(C,C)$. But $H(M)=1$ in this case if and only if the machine with code $C$ halts when it is given the input $C$. But the machine with code $C$ *is* $Q$, so $Q(C)$ fails to halt if and only if $Q(C)$ halts, a contradiction.
I think the key here is two-fold: first, that the inputs to Turing Machines are just numbers, rather than actual pairs or sets. We are identifying what in the meta-language corresponds to a pair with a single number in order to do the input, so we cannot run into problems of foundation (inputs are not pairs of pairs of pairs... etc, they are just numbers). The issue could arise if we could not define the number because it requires a recursive process that "goes on indefinitely" and leads to an "infinite number". And second: you do not construct $Q$ so that it will halt if $H$ says 'doesn't halt' with that same input, or will halt if $H$ says 'it does halt' with the same input. Instead, $Q$ transforms its input in a precise and well defined way (in this case, taking $N$ to the code for $(N,N)$) and only then asking whether $H$ will halt with that new input or not. That is, $Q$ need not be quite as general as you make it out to be in your post, which is why we can actually use $Q$ to ask about itself.