The earliest usage of the ReLU activation that I've found is Fukushima (1975, page 124, equation 2). Thanks to johann for pointing this out. Fukushima also used ReLU activations in at least one later paper (1980), but the 1975 paper is the earliest one that I am aware of. Unless I missed something, the function is not given any particular name in that paper. I am not aware of an older reference, but because terminology is inconsistent and has changed rapidly, it's entirely possible that I've overlooked an even older publication.
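For concreteness, the activation being traced here, written as $f$ below, is the familiar rectifier:

$$f(x) = \max(0, x).$$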
It is common to cite Nair & Hinton (2010) as the first usage of $f$. For example, Schmidhuber (2014) cites Nair & Hinton when discussing ReLU units in his review article. Certainly, Nair & Hinton's paper is important because it spurred the recent interest in using $f$ in neural networks, and it is the source of the modern nomenclature "rectified linear units." Nonetheless, the idea of using $f$ as an activation is decades older than the 2010 paper.
Incidentally, Hinton also coauthored a chapter in Parallel Distributed Processing in which $f$ was used in a neural network. In that chapter, $f$ is called the "threshold function." However, the volume was published in 1986, eleven years after Fukushima's paper.
References
Jürgen Schmidhuber. "Deep Learning in Neural Networks: An Overview." 2014.
Kunihiko Fukushima. "Cognitron: A self-organizing multilayered neural network." Biological Cybernetics, 20(3-4), 121–136. 1975. doi:10.1007/bf00342633
Kunihiko Fukushima. "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position." Biological Cybernetics. 1980.
D.E. Rumelhart, G.E. Hinton, and J.L. McClelland. "A General Framework for Parallel Distributed Processing," in Parallel Distributed Processing, Vol. 1. 1986.
Vinod Nair and Geoffrey E. Hinton. "Rectified Linear Units Improve Restricted Boltzmann Machines." 2010.