Taking the density matrix $$\rho=W+\frac{I_d}{d}=\frac 1M \sum_{m=1}^M\left|x^{\left(m\right)}\rangle\langle x^{\left(m\right)}\right|,$$ most of the details are contained in the following paragraph on page 2:
Crucial for quantum adaptations of neural networks is the classical-to-quantum read-in of activation patterns. In our setting, reading in an activation pattern $x$ amounts to preparing the quantum state $|x\rangle$. This could in principle be achieved using the developing techniques of quantum random access memory (qRAM) [33] or efficient quantum state preparation, for which restricted, oracle based, results exist [34]. In both cases, the computational overhead is logarithmic in terms of $d$. One can alternatively adapt a fully quantum perspective and take the activation patterns $|x\rangle$ directly from a quantum device or as the output of a quantum channel. For the former, our preparation run time is efficient whenever the quantum device is composed of a number of gates scaling at most polynomially with the number of qubits. Instead, for the latter, we typically view the channel as some form of fixed system-environment interaction that does not require a computational overhead to implement.
The references in the above are:
[33]: V. Giovannetti, S. Lloyd, L. Maccone, Quantum random access memory, Physical Review Letters 100, 160501 (2008) [PRL link, arXiv link]
[34]: A. N. Soklakov, R. Schack, Efficient state preparation for a register of quantum bits, Physical Review A 73, 012307 (2006). [PRA link, arXiv link]
Without going into the details of how, these are indeed schemes for, respectively, implementing an efficient qRAM and efficiently preparing a state, each of which recreates the state $\left|x\right>$ in time $\mathcal O\left(\log_2 d\right)$.
However, this only gets us so far: this can be used to create the state $\rho^{\left(m\right)} = \left|x^{\left(m\right)}\rangle\langle x^{\left(m\right)}\right|$, while we want a sum over all the possible $m$'s.
Crucially, $\rho = \sum_m\rho^{\left(m\right)}/M$ is mixed (whenever there is more than one distinct pattern), so it cannot be represented by a single pure state. The second of the above two references, on recreating pure states, therefore doesn't apply on its own, and the first requires the state to already be in qRAM.
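As a quick numerical illustration of why $\rho$ is mixed, here is a minimal NumPy sketch. The two patterns and the helper names `density_from_patterns` and `purity` are made up for illustration and are not taken from the paper. It builds $\rho$ from normalised patterns and computes the purity $\mathrm{Tr}\,\rho^2$, which equals 1 only for a pure state.

```python
import numpy as np

def density_from_patterns(patterns):
    """Return rho = (1/M) * sum_m |x^(m)><x^(m)| for the given patterns."""
    X = np.asarray(patterns, dtype=complex)
    # Normalise each row so that it represents a valid quantum state |x^(m)>
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # (X.T @ X.conj())[i, j] = sum_m x_i^(m) * conj(x_j^(m)), i.e. the sum of outer products
    return X.T @ X.conj() / X.shape[0]

def purity(rho):
    """Tr(rho^2): equal to 1 for a pure state, strictly less for a mixed one."""
    return float(np.real(np.trace(rho @ rho)))

# Hypothetical example: two orthogonal activation patterns with d = 4
patterns = [[1, 1, -1, 1], [1, -1, -1, -1]]
rho = density_from_patterns(patterns)
rho_single = density_from_patterns(patterns[:1])

print("trace:", np.real(np.trace(rho)))
print("purity with two patterns:", purity(rho))
print("purity with one pattern:", purity(rho_single))
```

For these two orthogonal patterns the purity comes out as 1/2, whereas a single pattern gives purity 1, which is the sense in which the pure-state preparation scheme alone cannot produce $\rho$.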
As such, the authors make one of three possible assumptions:
1. They have a device that just so happens to give them the correct input state.
2. They have the states $\rho^{\left(m\right)}$ in qRAM.
3. They are able to create those states at will, using the second of the above references. The mixed state is then created using a quantum channel (i.e. a completely positive, trace-preserving (CPTP) map).
Forgetting about the first two of the above options for the moment (the first magically solves the problem anyway), the channel could either be:
- An engineered system, in that it would be created for a specific instance in something akin to an analogue simulation. In other words, you've got a physical channel that takes a physical length of time $t$ (as opposed to some time complexity). This is the "fixed system-environment interaction that does not require a computational overhead to implement."
- A channel that is itself simulated. There are a few papers on this, such as Bény and Oreshkov's Approximate simulation of quantum channels (arXiv link - this looks like a thorough paper, but I couldn't find any time complexity statements), Lu et al.'s Experimental quantum channel simulation (no arXiv version seems to exist) and Wei, Xin and Long's arXiv preprint Efficient universal quantum channel simulation in IBM's cloud quantum computer, which (for number of qubits $n=\lceil\log_2 d\rceil$) gives a time complexity of $\mathcal O\left(\left(8n^3+n+1\right)4^{2n}\right)$. Stinespring dilation can also be used, with a complexity of $\mathcal O\left(27n^34^{3n}\right)$.
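To get a feel for how the two quoted scalings compare, the sketch below simply evaluates the expressions inside the $\mathcal O(\cdot)$ for small $n$. Treating them as literal gate counts is an assumption made purely for illustration, since big-O expressions hide constants, and the function names are hypothetical.

```python
def direct_simulation_cost(n):
    """Expression quoted for Wei, Xin and Long's scheme: (8n^3 + n + 1) * 4^(2n)."""
    return (8 * n**3 + n + 1) * 4 ** (2 * n)

def stinespring_cost(n):
    """Expression quoted for Stinespring dilation: 27 * n^3 * 4^(3n)."""
    return 27 * n**3 * 4 ** (3 * n)

# n = ceil(log2(d)) is the number of qubits
for n in range(1, 6):
    print(n, direct_simulation_cost(n), stinespring_cost(n))
```

Both grow exponentially in $n$, i.e. polynomially in $d$ rather than logarithmically, which is why a simulated channel is the expensive route here.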
Now looking at option 2<sup>1</sup>, one possibly more efficient method would be to transfer the states from the address register to the data register in the usual way: for addresses in register $a$, $\sum_j\psi_j\left|j\right>_a$, the qRAM lookup into data register $d$ gives the joint state $\sum_j\psi_j\left|j\right>_a\left|D_j\right>_d$. It should then be possible to simply decohere the address and data registers to turn this into a mixed state. This adds a small time overhead but no extra computational complexity, so producing $\rho$, given a qRAM holding the states $\left|x^{\left(m\right)}\right>$, has complexity $\mathcal O\left(n\right)$. This is also the complexity of creating the states $\left|x^{\left(m\right)}\right>$ in the first place, giving a potential (much improved) overall complexity of producing $\rho$ of $\mathcal O\left(n\right)$.

<sup>1</sup> With thanks to @glS for pointing this possibility out in chat.
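The qRAM route can be checked numerically on a small example. The sketch below is one reading of the decoherence step, namely discarding (tracing out) the address register, and that reading is an assumption rather than something spelled out in the paper. The patterns and the function name `reduced_data_state` are hypothetical. With uniform amplitudes $\psi_j = 1/\sqrt M$ and $\left|D_j\right> = \left|x^{\left(j\right)}\right>$, the reduced state of the data register should equal $\rho$.

```python
import numpy as np

def reduced_data_state(amplitudes, data_states):
    """Build sum_j psi_j |j>_a |D_j>_d and trace out the address register a."""
    psi = np.asarray(amplitudes, dtype=complex)
    D = np.asarray(data_states, dtype=complex)   # row j holds the vector |D_j>
    # Joint state as an M x d array: entry (j, k) is psi_j * <k|D_j>
    joint = psi[:, None] * D
    joint = joint / np.linalg.norm(joint)        # guard against unnormalised input
    # Partial trace over the address index j:
    # rho[k, l] = sum_j joint[j, k] * conj(joint[j, l])
    return joint.T @ joint.conj()

# Hypothetical patterns; the third is deliberately not orthogonal to the first
patterns = np.array([[1, 1, -1, 1], [1, -1, -1, -1], [1, 1, 1, 1]], dtype=float)
D = patterns / np.linalg.norm(patterns, axis=1, keepdims=True)
M = D.shape[0]
psi = np.full(M, 1 / np.sqrt(M))                 # uniform superposition of addresses

rho_reduced = reduced_data_state(psi, D)
rho_direct = sum(np.outer(x, x.conj()) for x in D) / M
print(np.allclose(rho_reduced, rho_direct))
```

The cross terms vanish because the address states $\left|j\right>_a$ are orthogonal, so the patterns themselves do not need to be.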
This density matrix is then fed into 'qHop' (quantum Hopfield), where it is used to simulate $e^{-iAt}$ for $$A=\begin{pmatrix}W-\gamma I_d & P\\ P & 0\end{pmatrix}$$ as per the "Efficient Hamiltonian Simulation of A" subsection on page 8.
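For small $d$, the operator itself can be written down classically as a sanity check. This is only a sketch of what $e^{-iAt}$ is, not of the paper's quantum simulation procedure. The choice of $P$ as a diagonal projector, the values of $\gamma$ and $t$, and the helper names `build_A` and `evolve` are all assumptions made for illustration.

```python
import numpy as np

def build_A(rho, P, gamma):
    """Assemble A = [[W - gamma*I, P], [P, 0]] with W = rho - I/d."""
    d = rho.shape[0]
    W = rho - np.eye(d) / d
    return np.block([[W - gamma * np.eye(d), P],
                     [P, np.zeros((d, d))]])

def evolve(A, t):
    """Return exp(-i A t) for Hermitian A via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(A)
    # V diag(exp(-i * lambda * t)) V^dagger; broadcasting scales each column of V
    return (evecs * np.exp(-1j * evals * t)) @ evecs.conj().T

# Hypothetical inputs: two normalised patterns, projector onto the first two coordinates
X = np.array([[1, 1, -1, 1], [1, -1, -1, -1]]) / 2
rho = X.T @ X / X.shape[0]
P = np.diag([1.0, 1.0, 0.0, 0.0])

A = build_A(rho, P, gamma=1.0)
U = evolve(A, t=0.5)
print(A.shape, np.allclose(U @ U.conj().T, np.eye(A.shape[0])))
```

Because $A$ is Hermitian whenever $P$ is, the resulting $U$ is unitary, which is what makes $A$ a valid Hamiltonian to simulate.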