I am currently taking a course on TensorFlow in which they use tf.one_hot(indices, depth). I don't understand how these indices are turned into that binary sequence.
Can somebody please explain the exact process?
Suppose you have a categorical feature in your dataset (e.g. color), and your samples can be either red, yellow or blue. In order to pass this feature to an ML algorithm, you first need to encode it so that you have numbers instead of strings.
The easiest way to do such a thing is to create a mapping where:
red --> 1
yellow --> 2
blue --> 3
and replace each string with its mapped value.
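As a minimal sketch of the integer mapping above (the names `color_to_int` and `samples` are made up for illustration):

```python
# Hypothetical mapping, mirroring the red/yellow/blue example above
color_to_int = {"red": 1, "yellow": 2, "blue": 3}

# Made-up sample column of the categorical feature
samples = ["blue", "red", "yellow", "red"]

# Replace each string with its mapped value
encoded = [color_to_int[color] for color in samples]
print(encoded)  # [3, 1, 2, 1]
```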
However, this might create unwanted side effects in our ML model: when dealing with numbers, it might conclude that blue > yellow (because 3 > 2) or that red + yellow = blue (because 1 + 2 = 3). The model has no way of knowing that this data was categorical and was only later mapped to integers.
The solution to this problem is one-hot encoding, where we create N new features, N being the number of unique values in the original feature. In our example N would be equal to 3, because we have 3 unique colors (red, yellow and blue).
Each of these features would be binary and would correspond to one of these unique values. In our example the first feature would be a binary feature telling us whether that sample is red or not, the second would do the same for yellow and the third for blue.
For the three colors above, the transformation looks like this:
red --> [1, 0, 0]
yellow --> [0, 1, 0]
blue --> [0, 0, 1]
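The same transformation can be sketched in plain Python. This is only an illustration of the idea, not how any particular library implements it, and the helper name `one_hot_color` is made up:

```python
# The N = 3 unique values of the original feature, in a fixed order
categories = ["red", "yellow", "blue"]

def one_hot_color(color):
    # One binary feature per category: 1 where the category matches, else 0
    return [1 if color == category else 0 for category in categories]

for color in ["red", "blue", "yellow"]:
    print(color, one_hot_color(color))
# red [1, 0, 0]
# blue [0, 0, 1]
# yellow [0, 1, 0]
```

Each sample ends up with exactly one 1, in the position of its own category.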
Note that this approach increases the dimensionality of the dataset, so if we have a feature that takes many unique values, we may want to use a more compact encoding (like the integer mapping presented above).
depth: A scalar defining the depth of the one hot dimension.
indices: A Tensor of indices.
This is the example given in the TensorFlow documentation.
1. Specifying only indices and depth (the default value of on_value is 1 and of off_value is 0)
indices = [0, 1, 2]
depth = 3
tf.one_hot(indices, depth) # output: [3 x 3]
# [[1., 0., 0.],
# [0., 1., 0.],
# [0., 0., 1.]]
2. Specifying on_value, off_value and axis as well
indices = [0, 2, -1, 1]
depth = 3
tf.one_hot(indices, depth, on_value=5.0, off_value=0.0, axis=-1)
# output: [4 x 3]
# [[5.0, 0.0, 0.0], # one_hot(0)
# [0.0, 0.0, 5.0], # one_hot(2)
# [0.0, 0.0, 0.0], # one_hot(-1)
# [0.0, 5.0, 0.0]] # one_hot(1)
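To make the process explicit, here is a rough plain-Python sketch that reproduces the two outputs above for a flat list of indices. It is not TensorFlow's actual implementation, and `one_hot_sketch` is a made-up name; it assumes 1-D indices and the last-axis layout (axis=-1) shown in the examples:

```python
def one_hot_sketch(indices, depth, on_value=1.0, off_value=0.0):
    rows = []
    for index in indices:
        # Start with a row of `depth` entries, all set to off_value
        row = [off_value] * depth
        # Put on_value at position `index`; an out-of-range index
        # (such as -1) leaves the row entirely at off_value
        if 0 <= index < depth:
            row[index] = on_value
        rows.append(row)
    return rows

print(one_hot_sketch([0, 1, 2], 3))
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(one_hot_sketch([0, 2, -1, 1], 3, on_value=5.0, off_value=0.0))
# [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
```

So each index simply selects which of the depth positions in its row receives on_value; everything else stays at off_value.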
You can also see the code on GitHub