I would not insist on a geometrical meaning for Galerkin methods in general. There is a connection, but it becomes less meaningful the further you extend it. (In a sense, it would be better to call Galerkin methods "generalized projection methods".) Really understanding the connection between collocation and Galerkin methods requires quite a bit of mathematics (functional analysis and measure theory, to be precise), but I'll try to summarize the salient points.
The idea behind Galerkin methods is the following: Assume you have an equation $F(x) = 0$ for some (possibly nonlinear) operator $F:X\to Y$ between (infinite-dimensional) Banach spaces $X$ and $Y$, i.e., you are trying to find $\bar x\in X$ with $F(\bar x)=0$. It is a theorem in functional analysis (a consequence of the Hahn-Banach theorem) that $F(x) = 0$ if and only if
$$\langle F(x),y^*\rangle_Y = 0\qquad\text{for all } y^*\in Y^*,$$
where $\langle\cdot,\cdot\rangle_Y$ denotes the duality pairing and $Y^*$ is the dual space of $Y$. (Bear with me, there will be examples later.) This is still not something you can solve directly, but if you replace $X$ by a finite-dimensional subspace $X_h\subset X$ and $Y^*$ by a finite-dimensional subspace $Y^*_h\subset Y^*$, you can try to find $\bar x_h \in X_h$ such that $$\label{1}
\langle F(\bar x_h),y^*_h\rangle_Y = 0\qquad\text{for all } y^*_h \in Y_h^*, \tag{1}
$$
or equivalently, since $Y^*_h$ is finite-dimensional, for all elements $y^*_1,\dots,y^*_n$ of a basis of $Y^*_h$. If $X_h$ and $Y^*_h$ are chosen correctly (in particular, with the same dimension $n$), this gives you $n$ independent conditions for $n$ unknown coefficients.
Let's specialize a bit: If $Y=X$ is a Hilbert space, you can take $Y^*_h = X_h$; this is called a Ritz-Galerkin method. Furthermore, if $F(x)=x-z$ for some $z\in X$, what we're actually doing is looking for $x_h\in X_h$ such that $x_h-z$ is orthogonal to $X_h$ -- these are exactly the conditions that characterize the orthogonal projection of $z$ onto the subspace $X_h$. Hence the name (generalized) projection method.
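To make the projection case concrete, here is a minimal sketch in Python (NumPy), with $\mathbb{R}^5$ and the Euclidean inner product standing in for the Hilbert space $X$. The matrix `B` (whose columns span $X_h$), the vector `z`, and the function name `galerkin_projection` are made up for illustration. For $F(x)=x-z$, the Galerkin conditions reduce to a small linear system, and the residual $x_h-z$ comes out orthogonal to every basis vector of $X_h$.

```python
import numpy as np

def galerkin_projection(B, z):
    """Coefficients c of x_h = B @ c such that <x_h - z, b_i> = 0
    for every column b_i of B (the basis of X_h)."""
    # Galerkin conditions B^T (B c - z) = 0, i.e. the n-by-n system (B^T B) c = B^T z.
    return np.linalg.solve(B.T @ B, B.T @ z)

# Hypothetical data: X = R^5, X_h spanned by two linearly independent vectors.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
z = np.array([1.0, 0.0, 2.0, 1.0, 3.0])

c = galerkin_projection(B, z)
x_h = B @ c            # the Galerkin solution in X_h
residual = x_h - z     # F(x_h) = x_h - z

print("coefficients:", c)
print("residual tested against the basis:", B.T @ residual)
```

The second printed vector should vanish up to rounding, which is exactly the statement that $x_h$ is the orthogonal projection of $z$ onto $X_h$.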
Now for collocation (and the promised example): If $Y\neq X$ and thus $Y_h^* \neq X_h$, one speaks of Petrov-Galerkin methods. In the specific case that $Y=C(E)$, the space of continuous functions on some compact set $E$ (e.g., the real interval $[0,1]$), the dual space is given by the space $\mathcal{M}(E)$ of regular (signed) Borel measures on $E$, and the duality pairing between a continuous function $u$ and a measure $\mu$ is given by
$$\langle u,\mu\rangle_C = \int_E u(x) \,d\mu(x).$$
Important examples of Borel measures are the Dirac measures $\delta_{x_0}$ for $x_0\in E$, which are defined by their action as
$$\langle u,\delta_{x_0} \rangle_C := u(x_0).$$
Now we're almost there: If we take $Y^*_h$ as the subspace spanned by the Dirac measures for $n$ distinct points $x_1,\dots,x_n\in E$, the Galerkin conditions \eqref{1} become (writing $\bar u_h$ instead of $\bar x_h$ for the unknown, since $x$ now denotes points in $E$)
$$ F(\bar u_h)(x_i) = \langle F(\bar u_h),\delta_{x_i}\rangle_C = 0\qquad\text{for all } 1\leq i \leq n.$$
Similarly, if $X=C(D)$ for some compact set $D$ (e.g., $D=[0,1]$ as well), you can choose $X_h$ as the subspace of continuous functions on $D$ that are piecewise linear on a given set of intervals $[x_1,x_2],\dots,[x_{n-1},x_n]\subset D$, and uniquely represent $\bar u_h\in X_h$ by its values $\bar u_h(x_i)$ for $1\leq i\leq n$. Then what you actually have to find are the values $\bar u_h(x_i)$ for $1\leq i\leq n$ such that $F(\bar u_h)(x_i) = 0$ for all $1\leq i\leq n$ -- which is precisely collocation.
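As a sketch of what this looks like in practice, assume (my choice for illustration, nothing canonical) that $F(u)(x) = u(x) - \int_0^1 k(x,t)\,u(t)\,dt - f(x)$ on $D=E=[0,1]$ with $k(x,t)=xt$ and $f(x)=2x/3$, for which $u(x)=x$ is the exact solution. Writing $\bar u_h$ in terms of hat functions $\varphi_j$ with coefficients $\bar u_h(x_j)$, the collocation conditions $F(\bar u_h)(x_i)=0$ become a linear system for the nodal values. The function names below are hypothetical, and Simpson's rule on each interval is just one convenient way to evaluate the integrals.

```python
import numpy as np

def collocation_matrix(kernel, nodes):
    """K[i, j] approximates the integral of kernel(x_i, t) * phi_j(t) over [0, 1],
    where phi_j is the piecewise linear hat function attached to nodes[j]."""
    n = len(nodes)
    K = np.zeros((n, n))
    for m in range(n - 1):
        a, b = nodes[m], nodes[m + 1]
        h, mid = b - a, 0.5 * (a + b)
        # Simpson's rule on [a, b]; phi_m takes the values 1, 1/2, 0 and
        # phi_{m+1} the values 0, 1/2, 1 at a, mid, b.
        K[:, m] += h / 6.0 * (kernel(nodes, a) + 2.0 * kernel(nodes, mid))
        K[:, m + 1] += h / 6.0 * (2.0 * kernel(nodes, mid) + kernel(nodes, b))
    return K

def solve_collocation(kernel, f, nodes):
    """Nodal values u_h(x_i) satisfying F(u_h)(x_i) = 0 at every node x_i."""
    K = collocation_matrix(kernel, nodes)
    # Collocation conditions: u_i - sum_j K[i, j] * u_j - f(x_i) = 0.
    return np.linalg.solve(np.eye(len(nodes)) - K, f(nodes))

nodes = np.linspace(0.0, 1.0, 6)
u_h = solve_collocation(lambda x, t: x * t, lambda x: 2.0 * x / 3.0, nodes)

print("nodal values:", u_h)
print("max deviation from u(x) = x:", np.max(np.abs(u_h - nodes)))
```

In this particular example the exact solution is itself piecewise linear and Simpson's rule integrates $t\,\varphi_j(t)$ exactly, so the computed nodal values should agree with `nodes` up to rounding; for a general kernel you would only expect an approximation.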