Yes, there's an architectural single-step flag on x86: the Trap Flag (TF) in EFLAGS/RFLAGS. Returning from kernel to user-space gives the kernel a chance to set both RIP and RFLAGS at the same time, so it can set the single-step flag for user-space without having it trigger on a kernel instruction.
For some reason, the Trap Flag has its own Wikipedia article! See also Wikipedia's EFLAGS article.
See the x86 tag wiki for links to Intel's architecture manuals which document all of this.
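To see this from the user-space side, here's a minimal sketch of a tracer that single-steps a forked child with PTRACE_SINGLESTEP until it exits. The function name `count_single_steps` is made up for this example, and it assumes Linux on an architecture where the kernel supports the request (such as x86); elsewhere the ptrace call fails and the function returns -1.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, single-step it until it exits, and
 * return the number of single-step stops seen (or -1 on error). */
long count_single_steps(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, then stop so the parent can take over. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);
    }

    int status;
    long steps = 0;

    /* Wait for the child's initial SIGSTOP. */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    for (;;) {
        /* Resume for one instruction; on x86 the kernel sets TF for us. */
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == -1) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            return -1;
        }
        if (waitpid(pid, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        steps++;            /* child stopped again after one step */
    }
    return steps;
}
```

The exact count depends on the libc code the child runs between `raise` and the exit system call, so treat it as "some positive number", not a fixed value.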
Perhaps instead you could execute the instruction followed by the 'ud2' opcode to trigger a signal
Then you'd need code to decode x86 instruction lengths. And you wouldn't use ud2, you'd use int3, which exists for this purpose.
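For what the int3 approach involves, here's a rough sketch of the byte patching a debugger does on a word of the tracee's code, e.g. one read with PTRACE_PEEKTEXT and written back with PTRACE_POKETEXT. The helper names are invented for illustration, and it assumes a 64-bit little-endian word as on x86-64 Linux.

```c
#include <stdint.h>

#define INT3_OPCODE 0xCCu   /* the one-byte x86 breakpoint instruction */

/* Replace the lowest-addressed byte of a code word with int3.
 * x86 is little-endian, so that byte is the low 8 bits of the word. */
uint64_t insert_int3(uint64_t word)
{
    return (word & ~(uint64_t)0xFF) | INT3_OPCODE;
}

/* Put the saved original byte back once the breakpoint has been hit. */
uint64_t restore_byte(uint64_t patched, uint64_t original)
{
    return (patched & ~(uint64_t)0xFF) | (original & 0xFF);
}

/* After int3 traps, the saved RIP points just past the 1-byte int3,
 * so the debugger rewinds it by one before re-running the original
 * instruction. */
uint64_t rewind_pc_after_int3(uint64_t rip)
{
    return rip - 1;
}
```

Because int3 is a single byte, it can overwrite the start of any instruction without clobbering the one after it, which is why it's preferred over the two-byte ud2 here.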
IIRC, there are also debug registers which can set hardware breakpoints without modifying the code.
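As a sketch of how those are programmed: the breakpoint address goes in one of DR0..DR3, and DR7 holds an enable bit plus a type/length field per slot. The function and enum names below are made up for this example. On Linux a tracer would typically write the resulting values with PTRACE_POKEUSER at the offset of `u_debugreg[n]` in `struct user`; that part isn't shown.

```c
#include <stdint.h>

/* Hypothetical names for the DR7 R/W and LEN field encodings. */
enum bp_type { BP_EXECUTE = 0, BP_WRITE = 1, BP_READWRITE = 3 };
enum bp_len  { BP_LEN_1 = 0, BP_LEN_2 = 1, BP_LEN_8 = 2, BP_LEN_4 = 3 };

/* Return dr7 with breakpoint slot 0..3 locally enabled for the given
 * type and length. Execute breakpoints must use BP_LEN_1. */
uint64_t dr7_enable_slot(uint64_t dr7, unsigned slot, unsigned rw, unsigned len)
{
    unsigned ctrl_shift = 16 + 4 * slot;        /* R/W at +0, LEN at +2 */

    dr7 &= ~((uint64_t)0xF << ctrl_shift);      /* clear old R/W and LEN */
    dr7 |= (uint64_t)(((len & 3) << 2) | (rw & 3)) << ctrl_shift;
    dr7 |= (uint64_t)1 << (2 * slot);           /* local enable bit L<slot> */
    return dr7;
}
```

For example, an execute breakpoint in slot 0 only needs bit 0 set, since both its R/W and LEN fields are zero.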
Fun fact: not all ISAs have hardware support for PTRACE_SINGLESTEP.
Case in point, the Linux kernel used to emulate it for ARM, but that required an ARM disassembler in the kernel to place a breakpoint at the next instruction, even if a branch target. It was removed in ~2011; now ptrace(PTRACE_SINGLESTEP) returns -ENOSYS on ARM.
They just ripped out all that complexity instead of trying to make it SMP-safe and support every new instruction like Thumb-2 and so on. (http://lists.infradead.org/pipermail/linux-arm-kernel/2011-February/041324.html)
So debuggers have to manually use breakpoints on such ISAs instead of having the kernel do it for them. If that means other threads notice a debug-break opcode in memory temporarily, that's not the kernel's problem.
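To give a feel for why emulating single-step needs a disassembler, here's a toy sketch that works out where temporary breakpoints would have to go for one 32-bit ARM (A32) instruction. It only understands B/BL with an immediate offset; the function name is invented, and a real implementation would also have to handle BX/BLX, loads into PC, data-processing writes to PC, Thumb, and so on.

```c
#include <stdint.h>

/* Toy decoder: fill out[] with the addresses execution could reach after
 * the A32 instruction `insn` at address `pc`. Returns how many (1 or 2).
 * Only B/BL <imm24> is recognised; everything else is assumed to fall
 * through to pc + 4, which is wrong for other PC-writing instructions. */
int a32_next_pcs(uint32_t pc, uint32_t insn, uint32_t out[2])
{
    int n = 0;
    uint32_t cond = insn >> 28;
    int is_branch = ((insn >> 25) & 7) == 5 && cond != 0xF;

    if (is_branch) {
        /* imm24 is a signed word offset relative to pc + 8. */
        uint32_t imm24 = insn & 0x00FFFFFFu;
        int32_t off = (int32_t)(imm24 << 2);
        if (imm24 & 0x00800000u)
            off -= (int32_t)1 << 26;            /* sign-extend */
        out[n++] = pc + 8 + (uint32_t)off;
    }
    /* Non-branches and conditional branches can also fall through. */
    if (!is_branch || cond != 0xE)
        out[n++] = pc + 4;
    return n;
}
```

A conditional branch yields two candidate addresses, so the debugger either places a breakpoint at both or evaluates the condition flags itself to pick one.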