Unless I'm mistaken, deep neural networks are good for learning functions that are nonlinear in the input.
In such cases (for classification), the input set is not linearly separable, so the optimisation problem that results from the approximation problem is non-convex and cannot be globally solved by local optimisation. Support Vector Machines (try to) get around this by choosing features such that the projection of the input space into the feature space is linearly separable, which gives a convex optimisation problem once again.
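To make the SVM idea concrete, here is a minimal sketch (plain Python, no libraries; the data, feature map, and threshold are all illustrative choices of mine, not from any particular SVM implementation): two concentric classes are not linearly separable in the input space, but a hand-chosen feature map makes their projection into feature space separable by a single linear threshold.

```python
import math
import random

random.seed(0)

# Two classes that are NOT linearly separable in the input space:
# class 0 lies near a circle of radius 0.5, class 1 near radius 2.0.
def sample(radius, n):
    pts = []
    for _ in range(n):
        angle = random.uniform(0, 2 * math.pi)
        r = radius + random.uniform(-0.3, 0.3)
        pts.append((r * math.cos(angle), r * math.sin(angle)))
    return pts

inner = sample(0.5, 50)   # class 0
outer = sample(2.0, 50)   # class 1

# Hand-chosen feature map phi(x1, x2) = x1^2 + x2^2: the projection
# of the inputs into this 1-D feature space IS linearly separable.
def phi(p):
    return p[0] ** 2 + p[1] ** 2

# A single threshold (i.e. a linear boundary in feature space)
# now separates the two classes perfectly.
threshold = 1.5
print(all(phi(p) < threshold for p in inner))  # True
print(all(phi(p) > threshold for p in outer))  # True
```

A kernel SVM does essentially this, except the feature map is fixed in advance (implicitly, via the kernel) rather than learned from data.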
I doubt that a deep neural network always learns features that make the projection of the input set into the feature space linearly separable. So what is the point of it learning features?