Interestingly, the above version is sometimes called PReLU (parametric ReLU); see the Wikipedia page. The leaky version is the one with $a=0.01$, although both have the same functional form. The PReLU implementations in Keras and PyTorch also make the parameter $a$ learnable, which is why there are two names: it would not make much sense for the entire ML industry to fix $a$ at $0.01$. At inference time it makes no difference whether the layer is a PReLU or a leaky ReLU with an adjustable negative slope.
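To make the distinction concrete, here is a minimal PyTorch sketch (assuming `torch` is installed) contrasting `nn.LeakyReLU`, whose negative slope is a fixed hyperparameter, with `nn.PReLU`, whose slope is a learnable parameter:

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0])

# Leaky ReLU: the negative slope is a fixed hyperparameter (default 0.01).
leaky = nn.LeakyReLU(negative_slope=0.01)

# PReLU: the negative slope `a` is a learnable parameter (PyTorch initializes
# it to 0.25) and is updated by backprop along with the other weights.
prelu = nn.PReLU(num_parameters=1, init=0.25)

print(leaky(x))                  # negative side scaled by a fixed 0.01
print(prelu(x))                  # negative side scaled by a learnable a (starts at 0.25)
print(list(prelu.parameters()))  # the single learnable parameter a
```

Once training is done, both compute the same kind of piecewise-linear function; the only difference is whether $a$ was chosen by hand or learned.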
That being said, I think Maas et al.'s 2013 paper might be the first publication in modern deep learning that mentions it (they use $a=0.01$). They don't cite another source for this function, but from their wording I take it that it had been defined or mentioned elsewhere before:
> ...To alleviate potential problems caused by the hard 0 activation of ReL units, we additionally evaluate leaky rectified linear (LReL) hidden units...
At least, this looks like the first modern deep learning reference to it.