11

Can someone help me understand this problem a bit better?

I must train a neural network that should output 200 mutually independent categories, where each category is a percentage ranging from 0 to 1. To me this seems like a binary_crossentropy problem, but every example I see on the internet uses binary_crossentropy with a single output. Since my output should be 200 values, would applying binary_crossentropy be correct?

This is what I have in mind. Is this a correct approach, or should I change it?

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(input_shape,))
hidden = Dense(2048, activation='relu')(inputs)
hidden = Dense(2048, activation='relu')(hidden)
# 200 independent sigmoid units, one per category
output = Dense(200, name='output_cat', activation='sigmoid')(hidden)
model = Model(inputs=inputs, outputs=[output])
loss_map = {'output_cat': 'binary_crossentropy'}
model.compile(loss=loss_map, optimizer='sgd', metrics=['mae', 'accuracy'])
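
For reference, a minimal sketch of how this model could be trained, assuming `input_shape` is the number of input features and the targets are an array of shape (n_samples, 200) with values in [0, 1] (the data below is made up):

import numpy as np

# hypothetical dummy data; binary_crossentropy accepts soft targets in [0, 1]
x_train = np.random.rand(64, input_shape)
y_train = np.random.rand(64, 200)  # one percentage per category
model.fit(x_train, y_train, epochs=1, batch_size=32)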
RaduS
  • I think your approach is okay. You can search for multi-label examples instead of the binary classification ones. – Yu-Yang Oct 28 '17 at 12:50

4 Answers

12

To optimize for multiple independent binary classification problems (as opposed to a multi-class problem, where you would use categorical_crossentropy) using Keras, you could do the following (here I take the example of 2 independent binary outputs, but you can extend that as much as needed):

    from keras.layers import Input, Dense, Lambda
    from keras.models import Model
    from keras import optimizers

    inputs = Input(shape=(input_shape,))
    hidden = Dense(2048, activation='relu')(inputs)
    hidden = Dense(2048, activation='relu')(hidden)
    output = Dense(units=2, activation='sigmoid')(hidden)

Here you split the output using Keras's Lambda layer:

    # slice out each sigmoid unit as its own output tensor
    output_1 = Lambda(lambda x: x[..., :1])(output)
    output_2 = Lambda(lambda x: x[..., 1:])(output)

    adad = optimizers.Adadelta()

Your model output becomes a list of the different independent outputs:

    model = Model(inputs, [output_1, output_2])

You compile the model using one loss function per output, passed as a list. (In fact, if you give only a single loss function, I believe Keras will apply it to all outputs independently.)

    model.compile(optimizer=adad, loss=['binary_crossentropy','binary_crossentropy'])
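
A minimal usage sketch (the data below is made up for illustration): with multiple outputs, the targets are passed as a list, one array per output:

    import numpy as np

    # hypothetical dummy data: 32 samples with input_shape features each
    x = np.random.rand(32, input_shape)
    y1 = np.random.randint(0, 2, size=(32, 1))  # binary targets for output_1
    y2 = np.random.randint(0, 2, size=(32, 1))  # binary targets for output_2

    model.fit(x, [y1, y2], epochs=1, batch_size=32)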
deepit
6

I know this is an old question, but I believe the accepted answer is incorrect and the most upvoted answer is workable but not optimal. The original poster's method is the correct way to solve this problem. His output is 200 independent probabilities from 0 to 1, so his output layer should be a dense layer with 200 neurons and a sigmoid activation. It's not a categorical_crossentropy problem, because these are not 200 mutually exclusive categories. Also, there's no reason to split the output with a Lambda layer when a single dense layer will do. Here's another way to do it, using the Keras Sequential interface:

    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    model.add(Dense(2048, input_dim=n_input, activation='relu'))
    model.add(Dense(2048, activation='relu'))
    model.add(Dense(200, activation='sigmoid'))  # 200 independent probabilities
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
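
A quick sketch of what prediction looks like with this setup (the inputs and the 0.5 threshold below are illustrative): each of the 200 outputs is an independent probability, so they do not sum to 1 and can be thresholded individually:

    import numpy as np

    x = np.random.rand(10, n_input)      # hypothetical inputs
    probs = model.predict(x)             # shape (10, 200), each value in [0, 1]
    labels = (probs > 0.5).astype(int)   # independent per-category decisions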
Troy D
0

For multi-class classification problems, you should use categorical_crossentropy rather than binary_crossentropy. With this, when your model classifies an input, it is going to give a distribution of probabilities across all 200 categories. The category that receives the highest probability will be the output for that particular input.

You can see this when you call model.predict(). If you were to call this function on just one input, for example, and print the result, you would see 200 percentages (summing to 1 in total). The hope is that one of those 200 percentages is vastly higher than the others, which signals that the model thinks there is a strong probability that this is the correct output (category) for this particular input.

This video may help clarify the prediction piece. Printing out the predictions starts around 3:17, but to get the full context, you'll need to start from the beginning.
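
A small sketch of that behavior, assuming a softmax output layer (which is what categorical_crossentropy pairs with; the toy model below is only a stand-in):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    toy = Sequential()
    toy.add(Dense(200, input_dim=64, activation='softmax'))  # softmax: outputs sum to 1
    toy.compile(loss='categorical_crossentropy', optimizer='adam')

    probs = toy.predict(np.random.rand(1, 64))
    print(probs.shape)     # (1, 200)
    print(probs.sum())     # ~1.0
    print(probs.argmax())  # index of the most likely category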

blackHoleDetector
  • What I am looking for is 200 categories, each with a percentage between 0 and 1, not a total sum of 1 across all 200. Would `categorical_crossentropy` help then? – RaduS Oct 28 '17 at 17:39
-2

When there are multiple classes, categorical_crossentropy should be used. Refer to another answer here.

pyan
  • And how do I return percentages with `categorical_crossentropy`? There are 200 classes, and each can have a percentage between 0 and 1. These classes are not exclusive of each other, meaning that I need all 200. – RaduS Oct 28 '17 at 06:52