In order for K.gradients() to work like that, you have to enclose it in a Lambda() layer, because otherwise a full Keras layer is not created, and you can't chain it or train through it. So this code works (tested):
import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf
# wrap K.gradients() in a Lambda() so the result is a real Keras layer
def grad( y, x ):
    return Lambda( lambda z: K.gradients( z[ 0 ], z[ 1 ] ), output_shape = [1] )( [ y, x ] )
# models f( x ) = log( x + d )
def network( i, d ):
    m = Add()( [ i, d ] )
    a = Lambda(lambda x: K.log( x ) )( m )
    return a
fixed_input = Input(tensor=tf.constant( [ 1.0 ] ) )
double = Input(tensor=tf.constant( [ 2.0 ] ) )
a = network( fixed_input, double )
b = grad( a, fixed_input )
c = grad( b, fixed_input )
d = grad( c, fixed_input )
e = grad( d, fixed_input )
model = Model( inputs = [ fixed_input, double ], outputs = [ a, b, c, d, e ] )
print( model.predict( x=None, steps = 1 ) )
network() models f( x ) = log( x + 2 ) at x = 1; grad() is where the gradient calculation is done. This code outputs:
[array([1.0986123], dtype=float32), array([0.33333334], dtype=float32), array([-0.11111112], dtype=float32), array([0.07407408], dtype=float32), array([-0.07407409], dtype=float32)]
which are the correct values for log( 3 ), ⅓, -1 / 3², 2 / 3³, -6 / 3⁴.
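(Sanity check: for f( x ) = log( x + 2 ) the successive derivatives are f'( x ) = 1 / ( x + 2 ), f''( x ) = -1 / ( x + 2 )², f'''( x ) = 2 / ( x + 2 )³ and f''''( x ) = -6 / ( x + 2 )⁴, which at x = 1 evaluate to 0.3333..., -0.1111..., 0.0740... and -0.0740..., matching the printed arrays.)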
Reference TensorFlow code
For reference, the same code in plain TensorFlow (used for testing):
import tensorflow as tf
a = tf.constant( 1.0 )
a2 = tf.constant( 2.0 )
b = tf.log( a + a2 )
c = tf.gradients( b, a )
d = tf.gradients( c, a )
e = tf.gradients( d, a )
f = tf.gradients( e, a )
with tf.Session() as sess:
    print( sess.run( [ b, c, d, e, f ] ) )
outputs the same values:
[1.0986123, [0.33333334], [-0.11111112], [0.07407408], [-0.07407409]]
Hessians
tf.hessians() does return the second derivative; it's a shorthand for chaining two tf.gradients() calls. The Keras backend has no hessians() equivalent though, so there you do have to chain two K.gradients() calls.
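For illustration, a minimal sketch of that equivalence in plain TF 1.x (same f( x ) = log( x + 2 ) as in the reference code above; the variable names are just illustrative):
import tensorflow as tf
x = tf.constant( [ 1.0 ] )
y = tf.log( x + tf.constant( [ 2.0 ] ) )
hess = tf.hessians( y, x )                          # shorthand for the chained tf.gradients() below
chained = tf.gradients( tf.gradients( y, x ), x )   # explicit chaining, same value
with tf.Session() as sess:
    print( sess.run( [ hess, chained ] ) )          # both evaluate to -1/9 ≈ -0.1111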
Numerical approximation
If for some reason none of the above works, you might want to consider numerically approximating the second derivative by taking differences over a small ε distance. This basically triples the network for each input, so besides lacking accuracy, this solution raises serious efficiency concerns. The idea is the standard finite-difference estimate f''( x ) ≈ ( ( f( x + ε ) - f( x ) ) - ( f( x ) - f( x - ε ) ) ) / ε², which is exactly what the layers below compute. Anyway, the code (tested):
import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf
def network( i, d ):
    m = Add()( [ i, d ] )
    a = Lambda(lambda x: K.log( x ) )( m )
    return a
fixed_input = Input(tensor=tf.constant( [ 1.0 ], dtype = tf.float64 ) )
double = Input(tensor=tf.constant( [ 2.0 ], dtype = tf.float64 ) )
epsilon = Input( tensor = tf.constant( [ 1e-7 ], dtype = tf.float64 ) )
eps_reciproc = Input( tensor = tf.constant( [ 1e+7 ], dtype = tf.float64 ) )
a0 = network( Subtract()( [ fixed_input, epsilon ] ), double )
a1 = network( fixed_input, double )
a2 = network( Add()( [ fixed_input, epsilon ] ), double )
d0 = Subtract()( [ a1, a0 ] )
d1 = Subtract()( [ a2, a1 ] )
dv0 = Multiply()( [ d0, eps_reciproc ] )
dv1 = Multiply()( [ d1, eps_reciproc ] )
dd0 = Multiply()( [ Subtract()( [ dv1, dv0 ] ), eps_reciproc ] )
model = Model( inputs = [ fixed_input, double, epsilon, eps_reciproc ], outputs = [ a0, dv0, dd0 ] )
print( model.predict( x=None, steps = 1 ) )
Outputs:
[array([1.09861226]), array([0.33333334]), array([-0.1110223])]
(This only gets to the second derivative.)