
I have a neural network with a softmax at the end.

Something like this:

def forward(self, x):
    x = self.conv(x)
    x = self.channel_transform_layer(x)
    output = self.softmax(x)
    return output

I would like the maximum value of the softmax output to be p (p being between 0 and 1, say 0.7). I'm working on a task where an output greater than p does not make sense, so I want to constrain all the output values to be between 0 and p.

Taking a concrete example with PyTorch:

import torch
softmax = torch.nn.functional.softmax
softmax(torch.Tensor([1, 1, 5]), dim=-1)
# => tensor([0.0177, 0.0177, 0.9647])

# I want to define constrained_softmax such that
constrained_softmax(torch.Tensor([1,1,5]), p=0.7)
# => tensor([0.175, 0.175, 0.65])
# those are approximate values to get the idea. Importantly, all 
# values should sum to 1 and the max output should be < p

I tried tweaking the softmax without success. I also tried applying the softmax multiple times and scaling it, but I don't end up with the desired result.
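For instance, simply scaling the whole softmax by p caps every value at p, but then the result sums to p instead of 1, which breaks my constraint (a quick sketch of that idea, not a fix):

import torch
softmax = torch.nn.functional.softmax

p = 0.7
out = p * softmax(torch.Tensor([1, 1, 5]), dim=-1)
# => tensor([0.0124, 0.0124, 0.6753])
out.sum()
# => tensor(0.7000)  # sums to p, not 1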

  • A softmax by default sums to 1 across all classes, and each class is at most 1. So why not just multiply the entire softmax (or each class) by $0.7$? However, it's unclear then why you're using a softmax, as it doesn't sound like you really need the sum to be constrained (unless you can elaborate more on your problem?) – Alex R. May 21 '19 at 00:36

1 Answer


You could restrict the range of the logits with min/max or clamp before passing them into the softmax.

In PyTorch:

#
# testing values within the clamp range
#

softmax(torch.Tensor([-0.5, -0.5, 0.5]).clamp(-1, 1), dim=-1)
# => tensor([0.2119, 0.2119, 0.5761])


#
# testing values outside the clamp range, we get the same results
#

softmax(torch.Tensor([-2, -2, 2]).clamp(-1, 1), dim=-1)
# => tensor([0.1065, 0.1065, 0.7870])

softmax(torch.Tensor([-10, -10, 10]).clamp(-1, 1), dim=-1)
# => tensor([0.1065, 0.1065, 0.7870])

Here, the max value after the softmax is 0.7870 (instead of the 0.7 I asked for). But this value changes with the clamp bounds and the number of logits, so adjust accordingly.
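If you want the max to come out at exactly p, one way (a rough sketch, assuming the worst case where one logit sits at the upper clamp bound c and all the others at -c) is to solve e^(2c) / (e^(2c) + n - 1) = p for the bound, which gives c = 0.5 * ln(p * (n - 1) / (1 - p)):

import math
import torch
softmax = torch.nn.functional.softmax

# hypothetical helper: clamp bound c such that the saturated case
# [-c, ..., -c, +c] softmaxes to a peak of exactly p (n = number of logits)
def clamp_bound(p, n):
    return 0.5 * math.log(p * (n - 1) / (1 - p))

c = clamp_bound(0.7, 3)  # ~0.77
softmax(torch.Tensor([-10, -10, 10]).clamp(-c, c), dim=-1)
# => tensor([0.1500, 0.1500, 0.7000])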

Also, I ended up taking a tanh because it's differentiable everywhere (unlike clamp, which has zero gradient outside its range), and multiplying by a constant to constrain the range I want. Something like:

softmax(0.9 * torch.tanh(torch.Tensor([-100, -100, 100])), dim=-1)
# => tensor([0.1242, 0.1242, 0.7515])
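Combining that with the bound above gives something close to the constrained_softmax I was after. This is just a sketch (the name constrained_softmax and the choice of scale are mine): since |tanh| <= 1, the scaled logits stay in [-c, c], so the max output stays <= p for any input and hits p exactly when tanh saturates.

import math
import torch

def constrained_softmax(x, p):
    # squash logits into (-c, c) with tanh, where c is chosen so that the
    # saturated case [-c, ..., -c, +c] softmaxes to a peak of exactly p
    n = x.shape[-1]
    c = 0.5 * math.log(p * (n - 1) / (1 - p))
    return torch.softmax(c * torch.tanh(x), dim=-1)

constrained_softmax(torch.Tensor([-100, -100, 100]), p=0.7)
# => tensor([0.1500, 0.1500, 0.7000])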