What is one hot encoding in tensorflow?

Question

I am currently taking a course in TensorFlow in which they used tf.one_hot(indices, depth). I don't understand how these indices are turned into that binary sequence.

Can somebody please explain the exact process?

score 21 · Accepted Answer · answered Apr 12 '18 at 10:05

Suppose you have a categorical feature in your dataset (e.g. color), and your samples can be either red, yellow or blue. In order to pass this feature to an ML algorithm, you first need to encode it so that instead of strings you have numbers.

The easiest way to do such a thing is to create a mapping where:

red --> 1
yellow --> 2
blue --> 3

and replace each string with its mapped value.
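A minimal plain-Python sketch of this integer mapping, assuming the feature arrives as a list of strings (the samples list and the color_to_int name are illustrative, not from the answer):

```python
# Hypothetical samples of the categorical "color" feature
samples = ["red", "yellow", "blue", "red"]

# The integer mapping described above: red -> 1, yellow -> 2, blue -> 3
color_to_int = {"red": 1, "yellow": 2, "blue": 3}

# Replace each string with its mapped value
encoded = [color_to_int[color] for color in samples]
print(encoded)  # [1, 2, 3, 1]
```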

However, this might create unwanted side effects in our ML model: when dealing with numbers, it might think that blue > yellow (because 3 > 2) or that red + yellow = blue (because 1 + 2 = 3). The model has no way of knowing that these data were originally categorical and were then mapped to integers.

The solution to this problem is one-hot encoding, where we create N new features, N being the number of unique values in the original feature. In our example N would be equal to 3, because we have 3 unique colors (red, yellow and blue).

Each of these features would be binary and would correspond to one of these unique values. In our example, the first feature would be a binary feature telling us whether the sample is red or not, the second would do the same for yellow, and the third for blue.

Such a transformation maps each color to a binary vector, one position per color:

red    --> 1 0 0
yellow --> 0 1 0
blue   --> 0 0 1
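The same transformation as a plain-Python sketch (no TensorFlow needed), assuming the unique colors are ordered red, yellow, blue as described above; one_hot_color is a hypothetical helper name:

```python
# Unique values of the feature, in the order of the new binary features
categories = ["red", "yellow", "blue"]

def one_hot_color(color):
    """Return one binary feature per category: 1 if it matches, else 0."""
    return [1 if color == category else 0 for category in categories]

# Hypothetical samples of the "color" feature
samples = ["red", "yellow", "blue", "red"]
encoded = [one_hot_color(color) for color in samples]
print(encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Every row contains exactly one 1, so the model no longer sees an ordering such as blue > yellow, and adding the red and yellow rows gives [1, 1, 0] rather than blue.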

Note that because this approach increases the dimensionality of the dataset, if a feature takes many unique values we may want to use a more compact encoding instead (like the integer mapping presented above).

score 3 · Answer 2 · answered Apr 12 '18 at 12:05

depth: A scalar defining the depth of the one hot dimension.

indices: A Tensor of indices.
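To see how the indices become the binary rows, here is a plain-Python sketch of what tf.one_hot computes for a 1-D list of indices with the default axis=-1. It is an illustration, not TensorFlow's actual implementation: one_hot_1d is a hypothetical name, and it ignores dtype and higher-rank inputs. Each index picks the one position in its row that receives on_value; all other positions get off_value, and an index outside [0, depth), such as -1, leaves the whole row at off_value.

```python
def one_hot_1d(indices, depth, on_value=1.0, off_value=0.0):
    """Sketch of tf.one_hot for a 1-D list of indices with axis=-1."""
    output = []
    for index in indices:
        # Start with a row of length `depth` filled with off_value
        row = [off_value] * depth
        # Only an in-range index switches one position on; -1 (or any
        # index >= depth) leaves the whole row at off_value
        if 0 <= index < depth:
            row[index] = on_value
        output.append(row)
    return output

print(one_hot_1d([0, 1, 2], 3))
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(one_hot_1d([0, 2, -1, 1], 3, on_value=5.0, off_value=0.0))
# [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
```

The output has one row per index and depth columns, which matches the shapes of the documentation examples that follow.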

This is the example given in the TensorFlow documentation:
1. Specifying only indices and depth (the default values are on_value=1 and off_value=0)

    import tensorflow as tf

    indices = [0, 1, 2]
    depth = 3
    tf.one_hot(indices, depth)  # output: [3 x 3]
    # [[1., 0., 0.],
    #  [0., 1., 0.],
    #  [0., 0., 1.]]
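Where do the indices come from in practice? A plain-Python sketch linking this to the color example from the other answer (categories, color_to_index and samples are hypothetical names). Since tf.one_hot treats each index as a 0-based position, the categories should be numbered from 0 and depth is the number of categories. With a 1-based mapping such as red --> 1, yellow --> 2, blue --> 3 and depth = 3, blue's index 3 would be out of range and produce an all-zero row.

```python
# Hypothetical vocabulary of the categorical feature, in a fixed order
categories = ["red", "yellow", "blue"]

# 0-based positions, as tf.one_hot expects: red -> 0, yellow -> 1, blue -> 2
color_to_index = {color: i for i, color in enumerate(categories)}

samples = ["blue", "red", "yellow"]
indices = [color_to_index[color] for color in samples]  # [2, 0, 1]

# One output position per category
depth = len(categories)  # 3

# These values would then be passed to TensorFlow as tf.one_hot(indices, depth)
```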

2. Specifying on_value and off_value

    import tensorflow as tf

    indices = [0, 2, -1, 1]
    depth = 3
    tf.one_hot(indices, depth, on_value=5.0, off_value=0.0, axis=-1)
    # output: [4 x 3]
    # [[5.0, 0.0, 0.0],  # one_hot(0)
    #  [0.0, 0.0, 5.0],  # one_hot(2)
    #  [0.0, 0.0, 0.0],  # one_hot(-1): out of range, so all off_value
    #  [0.0, 5.0, 0.0]]  # one_hot(1)
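The axis argument in the second example chooses where the new one-hot dimension goes. Per the TensorFlow documentation, for a vector of indices axis=-1 (the default) gives a features x depth output, while axis=0 gives depth x features. A plain-Python sketch of that layout difference, supporting only 1-D indices; one_hot_rows and one_hot_axis0 are hypothetical helper names:

```python
def one_hot_rows(indices, depth, on_value=1.0, off_value=0.0):
    """axis=-1 layout: one row of length `depth` per index."""
    # An out-of-range index (e.g. -1) never equals any j, so its row is all off_value
    return [[on_value if j == index else off_value for j in range(depth)]
            for index in indices]

def one_hot_axis0(indices, depth, on_value=1.0, off_value=0.0):
    """axis=0 layout: one row per depth position, i.e. the transpose."""
    rows = one_hot_rows(indices, depth, on_value, off_value)
    return [list(column) for column in zip(*rows)]

print(one_hot_axis0([0, 2, -1, 1], 3, on_value=5.0, off_value=0.0))
# [[5.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 5.0], [0.0, 5.0, 0.0, 0.0]]
```

Row d of the axis=0 result marks which samples have index d, which is why it is simply the transpose of the axis=-1 result for 1-D input.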

You can also see the code on GitHub.

The tf doc is more comprehensible: https://www.tensorflow.org/api_docs/python/tf/one_hot — dturvene, Jun 15 '19 at 14:26
Is depth the number of categories of the categorical variable? — skan, Oct 01 '23 at 19:25
