"a hardware device ... which apparently sat between the 6502 and its socket to trap any undocumented opcodes. ... Assuming such a device existed, how did that work?"
That sounds a lot like "the KK Computer: a Radical 6502 Redesign".
The 6502 has a unified (Princeton) memory architecture rather than split instruction and data memories, so there is no separate instruction bus to watch.
However, the 6502 has a "SYNC" pin that goes high during every opcode fetch cycle, which is how the KimKlone can "detect if the byte being read was an instruction or something else". (I would be surprised if the Trap65 used any other method of detecting which memory reads were instructions.)
In the KimKlone ("KK"), the only inputs to the tracking logic are the 6502's SYNC output pin (op-code fetch), its phase 2 output pin, and the memory data bus.
From these, a few standard logic chips track exactly which instruction is executing and how many cycles into that instruction the CPU is, including the "new" instructions that the KimKlone substitutes for many previously undefined op-codes,
as documented in "KimKlone: Processor Partnership".
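The tracking idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Laughton's actual logic: the class name `InstructionTracker` and its `clock` method are made up for illustration, and each call stands for one phase 2 cycle on which SYNC and the data bus are sampled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InstructionTracker:
    """Hypothetical model of external logic watching SYNC and the data bus."""
    opcode: Optional[int] = None  # last byte seen while SYNC was high
    cycle: int = 0                # cycles elapsed inside the current instruction

    def clock(self, sync: bool, data: int) -> Tuple[Optional[int], int]:
        """Advance one phase 2 cycle; return (current opcode, cycle number)."""
        if sync:
            # SYNC high marks an opcode fetch: latch it and restart the count.
            self.opcode = data & 0xFF
            self.cycle = 0
        elif self.opcode is not None:
            # Any other cycle belongs to the instruction already in progress.
            self.cycle += 1
        return self.opcode, self.cycle


# Bus trace for LDA #$42 (2 cycles) followed by STA $0200 (4 cycles).
trace = [
    (True, 0xA9), (False, 0x42),
    (True, 0x8D), (False, 0x00), (False, 0x02), (False, 0x42),
]
tracker = InstructionTracker()
for sync, data in trace:
    opcode, cycle = tracker.clock(sync, data)
    print(f"opcode={opcode:02X} cycle={cycle}")
```

Because the count restarts on every SYNC pulse, the external logic never needs to know instruction lengths in advance; it only needs to know what to do on cycle N of opcode X.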
The KimKlone splits the data bus into the "memory data bus" and the "CPU data bus".
During an instruction fetch (i.e., when the 6502 pulses the SYNC pin),
the data from the memory bus goes into 4 chips:
- The KK latches the actual data from memory into its instruction register (a 74xx273 chip).
- Meanwhile, all 8 bits go into a 74xx245 bus transceiver.
- Meanwhile, the high 5 bits (xxxx_x) address a 74S288 PROM (32 entries of 8 bits), which selects a substitute opcode.
- Meanwhile, the low 3 bits go into a small decoder that tests the byte on the memory data bus for the xxxx_x011 pattern. All opcodes matching that pattern are single-cycle NOPs on the 65C02.
If the decoder detects an xxxx_x011 pattern during the op-code fetch, that pattern never reaches the CPU bus: the KK disables the transceiver and enables the PROM, which drives a substitute 65C02 op-code into the CPU on the CPU data bus.
At all other times (when the decoder sees a "normal" 65C02 instruction being fetched, or a normal data load or store during which the SYNC pin doesn't pulse), the decoder enables the transceiver, which connects the memory data bus to the CPU data bus as usual.
(Apparently Jeff Laughton, the KimKlone's designer, considered the possibility that the decoder and PROM might not be fast enough to do this before the 6502 latched the instruction. The backup plan was to let that single-cycle NOP execute normally and then substitute a "normal" 65C02 instruction on the following cycle.)
Because the KK latches the original instruction from RAM (not the possibly-substituted instruction going to the 65C02), it can continue doing the "extra stuff" necessary to implement that specific new instruction.
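The fetch path described above can be modelled roughly as follows. This is a sketch under stated assumptions: the class `KKFetchPath` and its method names are invented for illustration, and the table contents are placeholders, not the real contents of the KimKlone's substitution table.

```python
from typing import List, Optional

NOP = 0xEA  # documented 65C02 NOP, used here only as a placeholder entry


class KKFetchPath:
    """Hypothetical model of the decoder, transceiver, table and latch."""

    def __init__(self, substitute_table: List[int]) -> None:
        if len(substitute_table) != 32:
            raise ValueError("table is addressed by 5 bits, so needs 32 entries")
        self.substitute_table = substitute_table
        self.instruction_register: Optional[int] = None

    @staticmethod
    def is_new_opcode(byte: int) -> bool:
        # Low three bits 011: a single-cycle NOP slot on the 65C02.
        return (byte & 0b111) == 0b011

    def bus_cycle(self, memory_byte: int, sync: bool) -> int:
        """Return the byte the CPU sees on its side of the split data bus."""
        if not sync:
            return memory_byte  # transceiver enabled: data passes through
        # Opcode fetch: always latch the ORIGINAL byte from memory.
        self.instruction_register = memory_byte
        if self.is_new_opcode(memory_byte):
            # Transceiver off; the table (indexed by the high 5 bits) drives the CPU.
            return self.substitute_table[memory_byte >> 3]
        return memory_byte


# Placeholder table: every slot substitutes NOP except one made-up entry.
table = [NOP] * 32
table[0x23 >> 3] = 0xA5  # hypothetical: feed the CPU LDA zp when memory holds $23

kk = KKFetchPath(table)
print(hex(kk.bus_cycle(0x23, sync=True)), hex(kk.instruction_register))
print(hex(kk.bus_cycle(0x23, sync=False)))  # same byte read as data is untouched
```

The point of keeping `instruction_register` separate from the returned byte is that the external logic keys its extra work off the original opcode, while the CPU only ever sees a legal 65C02 instruction.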
The hardware schematic in the Carl W. Moser article unconditionally substitutes a NOP instruction,
but the KK sometimes substitutes some other instruction, so some of the new instructions of the KK can re-use the 6502 address generation unit rather than requiring extra external hardware to do that work.
Jeff Laughton has documented all the details on how the KimKlone works and how to re-implement it, starting at
"KimKlone: Introduction & Table of Contents".