The Galerkin Method
Consider the situation in which we are given a (possibly infinite-dimensional) inner-product space $(W,g:W\times W\rightarrow{\mathbb R})$, a linear map from the vector-space to itself, $A:W\rightarrow W$, and an element of the vector space $b\in W$. The goal is to solve for the vector $x\in W$ such that $A(x)=b$. To solve this problem in practice, there are two questions that need to be addressed:
Restricting to a Finite-Dimensional System
Given a finite-dimensional subspace $V\subset W$, what is the corresponding linear system that one needs to solve over $V$?
Coarsening to a Lower-Dimensional Subspace
Given a subspace $U\subset V$, how does the linear system defined with respect to $U$ relate to the linear system defined with respect to $V$?

The challenge in answering these questions is that we cannot directly restrict the linear system to the subspace $V\subset W$ since (i) the image of $A$ need not be contained in $V$, even when the domain of the operator is restricted to $V$ and (ii) the vector $b$ need not be in $V$.

To address these challenges, we replace the linear system $A(x)=b$ with its weak formulation. Specifically, thinking of the inner-product $g:W\times W\rightarrow{\mathbb R}$ as a map from the primal space to the dual: \begin{align*} g:W&\rightarrow W^*\\ w&\mapsto g(w,\cdot) \end{align*} we consider the system: \begin{equation} A_W(x) = b_W \label{eq:weak_form}\tag{1} \end{equation} with $A_W=(g\circ A):W\rightarrow W^*$ and $b_W=g(b)\in W^*$.

The advantage of working with the weak formulation is that when $V$ is a subspace of $W$, any element $w^*\in W^*$ can be thought of as an element of $V^*$. This is because $w^*$ is a linear map from $W$ to ${\mathbb R}$ and so, by restricting the domain of $w^*$, can be thought of as a linear map from $V$ to ${\mathbb R}$.

Formally, letting $\imath_V^W:V\hookrightarrow W$ denote the (trivial) injection operator of $V$ into $W$, we have the dual map $(\imath_V^W)^*:W^*\rightarrow V^*$ taking linear functionals on $W$ to linear functionals on $V$.

Restricting

In particular, (i) $A_W$ can be thought of as a map from $V$ to $V^*$ by restricting the domain of $A_W$ to $V$ and thinking of the output of $A_W$ as an element of $V^*$, and (ii) the vector $b_W$ can be thought of as an element of $V^*$: $$A_V\equiv(\imath_V^W)^*\circ A_W\circ\imath_V^W:V\rightarrow V^*\qquad\hbox{and}\qquad b_V\equiv(\imath_V^W)^*(b_W)\in V^*.$$ This gives the restriction of the linear system to $V$: $$A_V(x) = b_V.$$

Coarsening
Noting that the injection of $U$ into $W$ satisfies $\imath_U^W=\imath_V^W\circ\imath_U^V$, we have two equivalent expressions for the restricted linear system $A_U(x)=b_U$. Either: $$A_U = (\imath_U^W)^*\circ A_W\circ \imath_U^W\qquad\hbox{and}\qquad b_U = (\imath_U^W)^*(b_W)$$ or, equivalently: $$A_U = (\imath_U^V)^*\circ A_V\circ\imath_U^V\qquad\hbox{and}\qquad b_U=(\imath_U^V)^*(b_V).$$
Choosing a Basis
Let ${\mathcal B}_U=\{u_1,\ldots,u_m\}\subset U$ be a basis for $U$ and let ${\mathcal B}_V=\{v_1,\ldots,v_n\}\subset V$ be a basis for $V$ (with $m\leq n$).
Restricting
The operator $A_V$ is expressed by the matrix ${\mathbf A}_V\in{\mathbb R}^{n\times n}$ with: $$\left( {\mathbf A}_V\right)_{ij} = \big[A_W(v_j)\big](v_i) = \big[(g\circ A)(v_j)\big](v_i) = g\big(A(v_j),v_i\big).$$ Similarly, the restricted vector $b_V$ is expressed by the array ${\mathbf b}_V\in{\mathbb R}^n$ with: $$\left({\mathbf b}_V\right)_i = \big[g(b)\big](v_i) = g(b,v_i).$$
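As a concrete illustration of the assembly formulas above, the following minimal Python sketch takes $W={\mathbb R}^3$ with the standard dot product playing the role of $g$. The matrix `A`, the vector `b`, the basis `basis_V`, and the helper names are all assumptions made for the example.

```python
# Minimal sketch: W = R^3, g = standard dot product (an assumption).

def dot(x, y):
    """Inner product g on W (assumed here to be the standard dot product)."""
    return sum(a * c for a, c in zip(x, y))

def apply(M, x):
    """Apply a matrix, stored as a list of rows, to a vector."""
    return [dot(row, x) for row in M]

def galerkin_system(A, b, basis):
    """Assemble (A_V)_ij = g(A v_j, v_i) and (b_V)_i = g(b, v_i)."""
    A_V = [[dot(apply(A, vj), vi) for vj in basis] for vi in basis]
    b_V = [dot(b, vi) for vi in basis]
    return A_V, b_V

# Hypothetical operator, right-hand side, and basis of a 2D subspace V.
A = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]
b = [1, 0, 1]
basis_V = [[1, 1, 0], [0, 1, 1]]

A_V, b_V = galerkin_system(A, b, basis_V)
print(A_V)  # [[2, 0], [0, 2]]
print(b_V)  # [1, 1]
```

Note that neither $A(v_j)$ nor $b$ lies in $V$ here; only their pairings with the basis vectors of $V$ enter the restricted system.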
Coarsening
With respect to the bases, the injection operator $\imath_U^V:U\rightarrow V$ is expressed by the prolongation matrix ${\mathbf P}\in{\mathbb R}^{n\times m}$, whose $j$-th column holds the coefficients of $u_j$ with respect to ${\mathcal B}_V$, and its dual $(\imath_U^V)^*:V^*\rightarrow U^*$ is expressed by the restriction matrix ${\mathbf P}^\top\in{\mathbb R}^{m\times n}$.

In particular, the restriction of the linear system to $U$ becomes: $${\mathbf A}_U(x) = {\mathbf b}_U$$ with $${\mathbf A}_U = {\mathbf P}^\top\cdot{\mathbf A}_V\cdot{\mathbf P}\qquad\hbox{and}\qquad{\mathbf b}_U = {\mathbf P}^\top\cdot {\mathbf b}_V.$$
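The coarsening identity can be checked numerically on a small case. The sketch below, again assuming $W={\mathbb R}^3$ with the dot product as $g$ and hypothetical data `A`, `b`, `basis_V`, and `P`, compares ${\mathbf P}^\top{\mathbf A}_V{\mathbf P}$ against the system assembled directly from the basis of $U$.

```python
# Minimal sketch: coarsening via P^T A_V P versus direct assembly over U.

def dot(x, y):
    return sum(a * c for a, c in zip(x, y))

def apply(M, x):
    return [dot(row, x) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[dot(row, col) for col in zip(*Y)] for row in X]

def galerkin_system(A, b, basis):
    """(A_S)_ij = g(A s_j, s_i) and (b_S)_i = g(b, s_i) for a basis {s_i}."""
    A_S = [[dot(apply(A, sj), si) for sj in basis] for si in basis]
    b_S = [dot(b, si) for si in basis]
    return A_S, b_S

# Hypothetical data: V is 2-dimensional (n = 2), U is 1-dimensional (m = 1).
A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
b = [1, 0, 1]
basis_V = [[1, 1, 0], [0, 1, 1]]
P = [[1], [1]]  # n x m: column j holds the coefficients of u_j in basis_V

# Recover u_j = sum_i P[i][j] * v_i as vectors in W.
n, m, dim_W = len(basis_V), len(P[0]), len(basis_V[0])
basis_U = [[sum(P[i][j] * basis_V[i][k] for i in range(n)) for k in range(dim_W)]
           for j in range(m)]

A_V, b_V = galerkin_system(A, b, basis_V)
A_U_coarse = matmul(transpose(P), matmul(A_V, P))  # P^T A_V P
b_U_coarse = apply(transpose(P), b_V)              # P^T b_V
A_U_direct, b_U_direct = galerkin_system(A, b, basis_U)

print(A_U_coarse, A_U_direct)  # [[4]] [[4]]
print(b_U_coarse, b_U_direct)  # [2] [2]
```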


Energy Consistency
In the case that the linear operator $A_W:W\rightarrow W^*$ is symmetric and positive semi-definite ("bounded and elliptic" is the proper terminology, I believe), if $x^0\in W$ satisfies the system $A_W(x^0)=b_W$, we can define a non-negative energy on $W$ measuring how close a vector $x\in W$ is to $x^0$ (a true solution of the linear system) with respect to the inner-product defined by $A_W$: $$ E_W(x^0,x) = [A_W(x^0-x)](x^0-x) = b_W(x^0) + [A_W(x)](x) - 2b_W(x).$$ (Note that if $w\in\hbox{Ker}(A_W)$, the energies $E_W(x^0,x)$ and $E_W(x^0+w,x)$ are equal. That is, the definition of the energy is independent of which solution of $A_W(x)=b_W$ we use.)
Derivation
For the energy, we have: \begin{align*} E_W(x^0,x) &= [A_W(x^0-x)](x^0-x)\\ &= [A_W(x^0-x)](x^0)-[A_W(x^0-x)](x)\\ &= [A_W(x^0)](x^0) - [A_W(x^0)](x) - [A_W(x)](x^0) + [A_W(x)](x)\\ &= [A_W(x^0)](x^0) + [A_W(x)](x) - 2[A_W(x^0)](x)\\ &= b_W(x^0) + [A_W(x)](x) - 2b_W(x) \end{align*} Here the fourth equality uses the symmetry of $A_W$ and the last uses $A_W(x^0)=b_W$.

If $w\in\hbox{Ker}(A_W)$, we have: \begin{align*} E_W(x^0+w,x) &= [A_W(x^0+w-x)](x^0+w-x)\\ &= [A_W(x^0-x)](x^0+w-x)\\ &= [A_W(x^0+w-x)](x^0-x)\\ &= [A_W(x^0-x)](x^0-x)\\ &= E_W(x^0,x), \end{align*} where the third equality follows from the symmetry of the operator $A_W$.
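Both identities can be sanity-checked on a small singular example. The sketch below assumes $W={\mathbb R}^3$ with the dot product as $g$ and uses a hypothetical symmetric positive semi-definite matrix `A` (a path-graph Laplacian, whose kernel is spanned by the constant vector); the vectors `x0`, `x`, and `w` are likewise chosen only for illustration.

```python
# Minimal sketch: check the expanded energy and its kernel invariance.

def dot(x, y):
    return sum(a * c for a, c in zip(x, y))

def apply(M, x):
    return [dot(row, x) for row in M]

def energy(A, x0, x):
    """E_W(x0, x) = [A(x0 - x)](x0 - x)."""
    d = [a - c for a, c in zip(x0, x)]
    return dot(apply(A, d), d)

# Hypothetical symmetric PSD operator with kernel spanned by (1, 1, 1).
A = [[1, -1, 0],
     [-1, 2, -1],
     [0, -1, 1]]
x0 = [1, 0, 2]       # chosen solution
b = apply(A, x0)     # right-hand side, so that A x0 = b by construction
x = [2, 0, -1]       # arbitrary test vector
w = [1, 1, 1]        # kernel vector: A w = 0

lhs = energy(A, x0, x)
rhs = dot(b, x0) + dot(apply(A, x), x) - 2 * dot(b, x)
shifted = energy(A, [a + c for a, c in zip(x0, w)], x)

print(lhs, rhs, shifted)  # 10 10 10
```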

This energy is non-negative and vanishes only for $x\in W$ that are solutions to the system $A_W(x)=b_W$. Thus, solving the linear system $A_W(x)=b_W$ is equivalent to finding the value of $x$ minimizing the energy $E_W(x^0,x)$.

Since $b_W(x^0)$ is independent of $x$, finding $x\in W$ minimizing $E_W(x^0,x)$ is equivalent to finding $x\in W$ minimizing $E_W(x)$ with: $$E_W(x) = [A_W(x)](x) - 2b_W(x).$$

Analogously, we can define the restricted energy on $V$, setting: $$E_V(x) = [A_V(x)](x) - 2b_V(x).$$ As above, this energy is minimized precisely when $A_V(x) = b_V$.

Derivation
The energy $E_V(x)$ is minimized at $x\in V$ only if the gradient of the energy vanishes at $x$. Taking the gradient at $x\in V$ and setting it to zero, we get: $$0 = \nabla E_V\Big|_x = 2\left(A_V(x) - b_V\right).$$ Thus, the gradient of $E_V(x)$ vanishes if and only if $A_V(x) = b_V$.
Since the operator $A_W$ is symmetric positive semi-definite, so is $A_V$. Thus, the energy $E_V$ is convex, all minima have the same value, and the gradient of $E_V$ only vanishes at the global minima. So the energy $E_V$ is minimized if and only if $A_V(x) = b_V$.
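To see the minimization property in coordinates, the following sketch uses a hypothetical symmetric positive-definite $2\times 2$ matrix `A_V` and right-hand side `b_V` (not derived from any particular $W$), solves ${\mathbf A}_V{\mathbf c}={\mathbf b}_V$ exactly with rational arithmetic, and checks that perturbing the solution by ${\mathbf d}$ raises the energy by exactly ${\mathbf d}^\top{\mathbf A}_V{\mathbf d}$.

```python
from fractions import Fraction

def quad(M, x, y):
    """Bilinear form y^T M x."""
    return sum(yi * sum(m * xj for m, xj in zip(row, x)) for row, yi in zip(M, y))

def energy(A_V, b_V, c):
    """E_V(c) = c^T A_V c - 2 b_V^T c."""
    return quad(A_V, c, c) - 2 * sum(bi * ci for bi, ci in zip(b_V, c))

def solve_2x2(M, r):
    """Solve a 2x2 system by Cramer's rule (exact, using Fractions)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [Fraction(r[0] * M[1][1] - M[0][1] * r[1], det),
            Fraction(M[0][0] * r[1] - M[1][0] * r[0], det)]

# Hypothetical symmetric positive-definite system.
A_V = [[2, 1], [1, 3]]
b_V = [1, 2]

c = solve_2x2(A_V, b_V)        # c = [1/5, 3/5]
E_min = energy(A_V, b_V, c)    # E_V at the solution: -7/5

# Perturb the solution: the energy grows by exactly d^T A_V d.
perturbations = [[1, 0], [0, 1], [-1, 2], [Fraction(1, 3), Fraction(-1, 7)]]
increases = [energy(A_V, b_V, [ci + di for ci, di in zip(c, d)]) - E_min
             for d in perturbations]
print(c, E_min, increases)
```

Exact rational arithmetic is used only so that the comparison is free of rounding error; floating-point arithmetic would show the same behavior up to tolerance.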

Finally, we note that given $x\in V$: $$E_V(x) = [A_V(x)](x) - 2b_V(x) = [A_W(\imath_V^W(x))](\imath_V^W(x)) - 2b_W(\imath_V^W(x)) = [A_W(x)](x) - 2b_W(x) = E_W(x).$$ Thus the restriction of the system to $V\subset W$ is consistent in that it reduces to a minimization of the same energy as the one minimized over $W$.
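This consistency can also be verified in coordinates. The sketch below assumes $W={\mathbb R}^3$ with the dot product as $g$, with hypothetical `A`, `b`, `basis_V`, and coefficient vector `c`; it evaluates the energy once through the restricted matrices and once directly in $W$.

```python
# Minimal sketch: E_V evaluated on coefficients equals E_W on the embedded vector.

def dot(x, y):
    return sum(a * c for a, c in zip(x, y))

def apply(M, x):
    return [dot(row, x) for row in M]

def energy(M, r, x):
    """E(x) = x^T M x - 2 r^T x."""
    return dot(apply(M, x), x) - 2 * dot(r, x)

# Hypothetical symmetric operator, right-hand side, and basis of V.
A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
b = [1, 0, 1]
basis_V = [[1, 1, 0], [0, 1, 1]]

# Restricted system: (A_V)_ij = g(A v_j, v_i), (b_V)_i = g(b, v_i).
A_V = [[dot(apply(A, vj), vi) for vj in basis_V] for vi in basis_V]
b_V = [dot(b, vi) for vi in basis_V]

c = [3, -2]  # coefficients of x with respect to basis_V
# Embed x into W: x = sum_i c_i v_i.
x = [sum(ci * vi[k] for ci, vi in zip(c, basis_V)) for k in range(3)]

E_V = energy(A_V, b_V, c)
E_W = energy(A, b, x)
print(x, E_V, E_W)  # [3, 1, -2] 24 24
```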