It would appear that your problem lends itself very nicely to using a CDMA scheme.
Let's start with some properties of DSSS CDMA (Direct-Sequence Spread Spectrum, Code-Division Multiple Access). It's a mouthful, but it is really easy to implement.
In CDMA, your pulse (at baseband) is actually made up of many concatenated 'chips' as they are called. The chips are just 1s or -1s, of a fixed duration. For example, your chipping sequence might be [1 -1 1 -1 -1 -1 1]. You would use this chipping sequence to modulate your carrier.
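As a rough sketch of what "modulating your carrier with the chipping sequence" could look like in discrete time, the snippet below holds each chip for a fixed number of samples and multiplies the result by a cosine carrier. The names `samples_per_chip` and `carrier_cycles_per_chip` and their values are assumptions made up for illustration; they are not taken from your system.

```python
import numpy as np

# Chipping sequence from the example above.
chips = np.array([1, -1, 1, -1, -1, -1, 1])

# Hypothetical parameters, chosen only for illustration.
samples_per_chip = 20        # assumed oversampling factor
carrier_cycles_per_chip = 2  # assumed number of carrier cycles inside one chip

# Baseband pulse: hold each chip value for a fixed duration.
baseband = np.repeat(chips, samples_per_chip)

# Carrier sampled on the same time grid.
n = np.arange(baseband.size)
carrier = np.cos(2 * np.pi * carrier_cycles_per_chip * n / samples_per_chip)

# BPSK-style modulation: a chip of -1 flips the carrier phase by 180 degrees.
tx = baseband * carrier
```

Here `tx` is the coded pulse you would transmit; every `-1` chip simply inverts the carrier for the duration of that chip.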
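However, you cannot just make up your chipping code. What you want are chipping codes with the very nice property that their autocorrelation function is (close to) a delta function, like so:
$$R_{cc}[k] = \sum_{n} c[n]\,c[n+k] \approx N\,\delta[k]$$
where $c[n]$ is the chipping sequence and $N$ is its length.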
(Equivalently, their power spectral density is white.) For example, you can look into using Barker sequences as your chipping code (usually used in radar), or you can also look at Gold codes. Practically speaking, this means that you get the maximum correlation score in your receiver ONLY when the receiver's code lines up exactly with the transmitted code, and zero (or near zero) otherwise.
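To make the "near-delta autocorrelation" property concrete, here is a small check using the length-13 Barker sequence (picked here purely as an example of such a code). It computes the aperiodic autocorrelation with `np.correlate` and separates the zero-lag peak from the sidelobes.

```python
import numpy as np

# Length-13 Barker sequence, used here only as an example code.
barker13 = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1])

# Aperiodic autocorrelation over all lags; zero lag sits in the middle.
acf = np.correlate(barker13, barker13, mode="full")
zero_lag = barker13.size - 1

peak = acf[zero_lag]                 # score when the codes line up exactly
sidelobes = np.delete(acf, zero_lag) # scores at every other alignment

print(peak, np.abs(sidelobes).max())  # prints: 13 1
```

The peak equals the code length, while every misaligned lag scores at most 1 in magnitude, which is about as close to a delta function as a finite binary code gets.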
How does this help you? In your receiver, you would be running a correlator continuously. The correlator would be performing a running dot product of its own local code with whatever is received. Now imagine that you receive a transmitted waveform from your pen, and a second waveform from a reflection. As your receiver's correlator runs, it will give a peak when its own codeword exactly aligns with the code from the pen. This will cause your detector to 'lock' onto that specific delay value. Now, here is where you reap the benefits of a near-delta autocorrelation function of your code: the reflected signal will also be present, and will also have its dot product taken with the receiver's locked code, but it will give a zero or near-zero score, since it is orthogonal or near-orthogonal to the delayed code that your receiver has already locked onto.
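The following is a minimal baseband sketch of that receiver, under several assumptions: one sample per chip, no noise, no carrier, a Barker-13 code, and made-up values for `direct_delay`, `echo_delay`, and `echo_gain`. It only illustrates the idea that the reflection contributes almost nothing at the locked delay; it is not a complete receiver.

```python
import numpy as np

# Local code in the receiver (Barker-13, chosen only as an example).
code = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)

# Hypothetical scenario values, picked for illustration.
direct_delay = 40  # arrival sample of the direct path from the pen
echo_delay = 45    # arrival sample of the reflection (overlaps the direct path)
echo_gain = 0.6    # reflection is assumed weaker than the direct path

# Build the direct and reflected waveforms separately, then sum them.
direct = np.zeros(120)
direct[direct_delay:direct_delay + code.size] = code
echo = np.zeros(120)
echo[echo_delay:echo_delay + code.size] = echo_gain * code
rx = direct + echo

# Running dot product of the local code with the received samples:
# scores[k] = sum over n of rx[k + n] * code[n]
scores = np.correlate(rx, code, mode="valid")

# "Lock" onto the delay with the largest correlation score.
locked_delay = int(np.argmax(scores))

# What the reflection alone contributes at the locked delay.
window = slice(locked_delay, locked_delay + code.size)
echo_contribution = float(np.dot(echo[window], code))
```

With these numbers the correlator locks at sample 40 (the direct path), and the reflection's contribution at that delay is bounded by `echo_gain` times the largest autocorrelation sidelobe, so it barely moves the peak. The reflection still shows up as its own, smaller peak at `echo_delay`, which you can ignore by taking the earliest strong peak.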
In contrast, if you had sent out an uncoded carrier pulse, you would be at the mercy of constructive or destructive interference shifting the moment at which your pulse crosses the detection level in your receiver, and you would thus get erroneous TDOAs.