One common approach would be to use an autoencoder (neural network) with an $n$-dimensional bottleneck. As you say, the output would then be $2^n$-dimensional again, and one would apply the softmax function to it. I'd assume you could train this relatively easily with gradient descent and e.g. a cross-entropy loss. Autoencoders are of course a pretty common idea; the network would essentially engineer its own $n$ features out of the original inputs via the layers leading up to the bottleneck.
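To make the idea concrete, here's a minimal sketch in PyTorch, assuming $n = 4$ (so a $2^n = 16$-dimensional input distribution) and arbitrarily chosen hidden-layer widths; everything here is a placeholder rather than a recommended architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n = 4                 # hypothetical number of bottleneck features
dim = 2 ** n          # dimension of the input/output distribution

# Autoencoder: 2^n -> ... -> n (bottleneck) -> ... -> 2^n, with a softmax on the output
class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(dim, 8), nn.ReLU(),   # layer widths chosen arbitrarily
            nn.Linear(8, n),
        )
        self.decoder = nn.Sequential(
            nn.Linear(n, 8), nn.ReLU(),
            nn.Linear(8, dim),
        )

    def forward(self, x):
        code = self.encoder(x)              # n-dimensional bottleneck
        logits = self.decoder(code)
        return F.softmax(logits, dim=-1)    # back to a 2^n-dimensional distribution

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One gradient-descent step with a cross-entropy loss between the input
# distribution p and its reconstruction q (both sum to 1 along the last axis).
p = torch.rand(32, dim)
p = p / p.sum(dim=-1, keepdim=True)         # fake batch of distributions
q = model(p)
loss = -(p * torch.log(q + 1e-12)).sum(dim=-1).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```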
Of course, it's probably not clear what the best architecture is: how many layers before the bottleneck (e.g. 2), how many neurons in each layer (e.g. decreasing in some sensible way), and then the same questions on the way back up to the $2^n$-dimensional output. The same goes for which activation functions (e.g. ReLU) to use in which layers, how to regularize (I'd assume dropout might not be so great for a regression-type task, but maybe weight decay in some form), and whether to perhaps use something like a denoising autoencoder. All of these are probably best experimented with via a suitable cross-validation set-up.
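As a rough sketch of the last two knobs (reusing the model from above): weight decay is just an optimizer argument, and a denoising variant only corrupts what the encoder sees while the cross-entropy target stays the clean input. The noise level 0.05 and the weight decay 1e-4 are arbitrary values one would tune via cross-validation:

```python
# Weight decay via the optimizer; denoising via a corrupted input.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

noisy_p = p + 0.05 * torch.randn_like(p)               # corrupt the input ...
noisy_p = noisy_p.clamp(min=0)
noisy_p = noisy_p / noisy_p.sum(dim=-1, keepdim=True)  # ... keep it a valid distribution
q = model(noisy_p)
loss = -(p * torch.log(q + 1e-12)).sum(dim=-1).mean()  # ... but reconstruct the clean p
```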