4

I want to find the maximum likelihood estimator of the "rate parameter theta of the Exponential Distribution".

So i followed the following commands in R:

 x=rexp(500,rate=2)
 f <- function(x,theta){
        sum(-dexp(x,rate=theta,log=T))
 }
 optimize(f=f,x=x,interval=c(0,5))

I do not understand the logic - why do we consider maximum=FALSE(default) in the optimize function?

Why haven't we used maximum=TRUE to find the MLE?

ABC
  • 1,705
  • I must be missing your point, but the maximum likelihood estimator of the rate parameter of an exponential is the reciprocal of the mean. Invoking an optimizer should get you the right answer but is like going from Paris to Versailles via Vladivostok. – Nick Cox Nov 30 '13 at 01:16
  • @NickCox Would you please tell me the proper way to find MLE? – ABC Nov 30 '13 at 01:23
  • Maximize the likelihood, naturally. Not trying to outsmart you, but what answer are you seeking? The particular method depends on whether there is a closed form solution that gets you there in one (unusual, but true in this case) or you have to estimate it numerically. – Nick Cox Nov 30 '13 at 01:25
  • 1
    I agree with @NickCox - the only purpose I can see to this exercise would be if it were part of an introduction to finding MLEs numerically by beginning with an example you can also easily do by hand. Which implies you're doing this for some subject. Is that the case? – Glen_b Nov 30 '13 at 01:26
  • @Glen_b yes. I want to find MLE by using R. It is one of my course. – ABC Nov 30 '13 at 01:28

1 Answers1

5

The term inside your definition of f :- sum(-dexp(x,rate=theta,log=T)) is NOT the likelihood, but something else.

What is it that is being calculated?

When you consider what it is that is being optimized there, you will also understand why you're minimizing that function in order to maximize the likelihood.

To quote your own algebra, here's the likelihood:

$\cal{L}(\theta)=\prod_{i=1}^{n}\theta e^{-\theta x_i}=\theta^n e^{-\theta \sum_{i=1}^{n}x_i}$

dexp with log=TRUE doesn't return the density. Here's what the help says: log, log.p logical; if TRUE, probabilities p are given as log(p). ... that is when you say log=TRUE you get the log of the density.

The likelihood at $\theta$ will be the product of the densities, taken at each data point.

The log-likelihood is the sum of the log-densities, over the data points, evaluated at a given $\theta$.

That is, sum(dexp(x,rate=theta,log=T)) would be the log-likelihood function. We'd want to maximize that.

But we have sum(-dexp(x,rate=theta,log=T)) (don't ask me why they didn't write the obviously equivalent but presumably faster -sum(dexp(x,rate=theta,log=T))).

That is, the program is minimizing the negative log-likelihood, which is equivalent to maximizing the log-likelihood. Here's the result on calling f on theta values between 1 and 3:

enter image description here

By contrast, this is what the likelihood function looks like:

enter image description here

sum(dexp(x,rate=theta,log=T)) is calculating $θ^ne^{−θ∑^n_{i=1}x_i}$?

It's calculating the log of that quantity.

But here I see I have the minus sign in every program related to MLE in my lecture sheet.

Minimizing rather than maximizing is a convention. There's no particular need for it.

The R documentation is saying that in optim function par Initial values for the parameters to be optimized over. How can I select the initial values?

Would you please tell me how can I relate this program
fexp = function(theta, x){ prod(dexp(x,rate=(1/theta))) }
res3<-optimize(f=fexp,interval=c(0,50), maximum=T, x=x)
res3

with my above program that I have posted in the question?

Why here is the prod function being called?

Because the likelihood is a product.

And why here we have mentioned maximum=T?

Because it's computing the likelihood, which we want to maximize.

Edit: I notice another issue with the above code: it says rate = 1/theta. That implies that the theta there is not the rate parameter of your earlier mathematics and code, but is in fact a scale parameter. Watch out for that! Another thing to watch out for is that likelihood calculations often have underflow problems (and sometimes, overflow problems).

Glen_b
  • 282,281
  • Actually i have missed the class for political unrest in our country. So it would be very kind if you explain me what the above commands are trying to proof ? – ABC Nov 30 '13 at 01:26
  • 1
    This comment: "i have missed the class for political unrest in our country" is such a non sequitur as to require explanation. Are you actually saying that you're learning to compute MLEs in R during a class on political unrest? If you don't understand what your code is doing, it's hardly surprising that it makes no sense to you. Do you know what the likelihood function for the exponential rate parameter is? Could you write it down? – Glen_b Nov 30 '13 at 01:30
  • 2
    @Glen_b My second reading was that Harry missed a class because of political unrest in his country. – Nick Cox Nov 30 '13 at 01:32
  • @NickCox ah, that makes more sense. OP, my question on the exponential likelihood stands. Do you know what the likelihood function for the exponential rate parameter is? Could you write it down? (If you can, please do so, preferably in your question). If you can't, that is probably closer to where your actual question lies. – Glen_b Nov 30 '13 at 01:34
  • The likelihood function is $L=\prod_{i=1}^{n}\theta e^{-\theta x_i}=\theta^n e^{-\theta \sum_{i=1}^{n}x_i}$ . And the MLE can be found by taking log of the likelihood function and then differentiating the log likelihood function with respect to $\theta $ and set it to zero which yields $\hat\theta=\frac{1}{\bar x} $. I have learnt the process in our inference class. But now i have to compute it in R. – ABC Nov 30 '13 at 01:39
  • Okay, thanks harry. Knowing more about what you know leaves me in a better place to answer your question. Can you see what is calculated in f? – Glen_b Nov 30 '13 at 01:41
  • No , i am not understanding the command sum(-dexp(x,rate=theta,log=T)) and also the reason of minimizing the function. – ABC Nov 30 '13 at 01:46
  • See the new last part of my answer - what does dexp return when log=TRUE? (If you're not sure, take a look at ?dexp) – Glen_b Nov 30 '13 at 01:51
  • dexp(x,rate=theta,log=T) -this is computing the density function of exponential variate x with 50 valuse and the sum command is computing the sum of these 50 probabilities. But what is the function of this minus(-)sign inside it? – ABC Nov 30 '13 at 01:52
  • dexp with log=TRUE doesn't return the density. Here's what the help says: log, log.p logical; if TRUE, probabilities p are given as log(p). ... that is when you say log=TRUE you get the log of the density. – Glen_b Nov 30 '13 at 01:53
  • And with log is that x's are given as log(x)? – ABC Nov 30 '13 at 01:55
  • 1
    Sorry, but I don't understand your question there. Take another look at my updated answer. I will say your replies come so quickly that it suggests you're not spending enough time thinking about the replies you are getting. Stop. Slow down. Investigate. Contemplate. – Glen_b Nov 30 '13 at 01:59
  • Is the purpose of MLE to minimize the function for every probability density ? or sometimes we have to maximize the function to get MLE? Is that taking differentiation with respect to the parameter and set it to zero is the process of minimization? I knew it as the process of maximization. – ABC Nov 30 '13 at 02:03
  • What happens if you calculate directly $\theta exp(-\theta x)$ for a particular $x$ and $\theta$. How does it compare with dexp(x,theta)? How does it compare with dexp(x,theta,log=TRUE)? How does that compare with log(dexp(x,theta))? – Glen_b Nov 30 '13 at 02:03
  • The first two will yield the same result that is our density and The last two will yield the same result that is the log of our density . – ABC Nov 30 '13 at 02:07
  • Right. So what would sum(dexp(x,theta,log=TRUE)) be in terms of the problem you want to solve? (note: my code there doesn't have the minus. We'll get to that.) – Glen_b Nov 30 '13 at 02:26
  • $sum(dexp(x,rate=theta,log=T)) $ is calculating $ \theta^n e^{-\theta \sum_{i=1}^{n}x_i}$? But here i see i have the minus sign in every program related to MLE in my lecture sheet. – ABC Nov 30 '13 at 02:39
  • The R documentation is saying that in optim function `par

    Initial values for the parameters to be optimized over.` How can i select the initial values?

    – ABC Nov 30 '13 at 02:44
  • I have found my answer.Many thanks to you. Would you please tell me how can i relate this program fexp = function(theta, x){ prod(dexp(x,rate=(1/theta))) } res3<-optimize(f=fexp,interval=c(0,50), maximum=T, x=x); res3 with my above program that i have posted in the question?Why here the prod function is being called?And why here we have mentioned maximum=T?Is that because of we didn't call a minus sign here? – ABC Nov 30 '13 at 03:08
  • 1
    Oh, it's now clear . prod(dexp(x,rate=theta)) is calculating $\prod_{i=1}^{n}\theta e^{-\theta x_i}$ . You are really a nice teacher , also a wise person i have ever seen before. Thank you very very very much. – ABC Nov 30 '13 at 03:23
  • You have it. I have also added a little on it in my answer; it's a bit slower getting your answers this way, but you'll remember it better. – Glen_b Nov 30 '13 at 04:11