On page 3, the authors write "we found $\gamma = 2$ to work best in our experiments." This tells us that they chose $\gamma$ empirically: they trained models with different values and kept the value from the best-performing model. Because the value was chosen empirically in their own experiments, it may not be the best choice for every modeling task or dataset.
You could use $\gamma = 2$ as a default, on the grounds that the paper's authors found it to work best. Alternatively, you can treat $\gamma$ as a hyperparameter and tune it experimentally (assuming you have the resources to do so).
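
As a rough illustration of the second option, here is a minimal sketch of tuning $\gamma$ over a small grid. It assumes that $\gamma$ is the exponent in a focal-loss-style objective of the form $-(1 - p_t)^\gamma \log(p_t)$, which the quote above does not confirm on its own. The names `binary_focal_loss`, `select_gamma`, and `train_and_validate` are hypothetical and do not come from the paper.

```python
import math

def binary_focal_loss(probs, labels, gamma=2.0, eps=1e-12):
    """Mean of -(1 - p_t)^gamma * log(p_t) over the examples.

    Assumption: binary labels in {0, 1} and no class-weighting term.
    With gamma = 0 this reduces to ordinary binary cross-entropy.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p    # probability given to the true class
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)

def select_gamma(candidates, train_and_validate):
    """Return (best_gamma, scores) for a grid of candidate gamma values.

    `train_and_validate` is a hypothetical callback supplied by you: it
    trains a model using the given gamma and returns a validation score
    where higher is better.
    """
    scores = {g: train_and_validate(g) for g in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

You would call it as something like `select_gamma([0.0, 0.5, 1.0, 2.0, 5.0], train_and_validate)`. One design note: it is probably safer to compare candidates on a task metric computed on held-out data than on the loss itself, because loss values computed with different $\gamma$ are on different scales and are not directly comparable.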