Newton's method implementation of maximum likelihood estimation in R and Python

Contents

0. Preface

1. Theoretical basis

2. Maximum likelihood estimation of the Cauchy distribution

2.1 Theoretical basis

2.2 Algorithm

2.2.1 R language implementation

2.2.2 Python language implementation

3. Maximum likelihood estimation of the Gamma distribution

3.1 Theoretical basis

3.2 Algorithm

3.2.1 R language implementation

3.2.2 Python language implementation

0. Preface

Recently, I have been studying Theory and Method of Statistics. The textbook is Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, written jointly by Bradley Efron and Trevor Hastie. Chapter 4, on Fisherian inference and maximum likelihood estimation, mentions that maximum likelihood estimates are not always easy to compute in practice, for example for the Cauchy distribution and the Gamma distribution.

If the maximum likelihood estimate has no explicit solution, it can be obtained numerically (for example, by Newton's method). Furthermore, if the second derivatives do not exist or the Hessian matrix is not positive definite, a quasi-Newton method can be used; in more complicated cases the MM algorithm is an option (EM is a special case of MM). Taking Newton's method as an example, this article works out the maximum likelihood estimation of the parameters of the Cauchy and Gamma distributions and implements it in R and Python.

1. Theoretical basis

This section gives the general theory of maximum likelihood parameter estimation by Newton's method.

Suppose the random variables $X_1, X_2, \ldots, X_n$ are independent and identically distributed with density $f(x;\theta)$, and a set of samples $x_1, x_2, \ldots, x_n$ is observed. To estimate the parameter $\theta$ of the distribution, the maximum likelihood method can be used.

First, write the likelihood function of the sample

$$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)$$

Taking logarithms gives the log-likelihood function

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i;\theta)$$

Solving for the unknown parameter $\theta$ is then equivalent to solving the system of equations

$$\frac{\partial \ell(\theta)}{\partial \theta} = 0$$

Suppose the convergent solution is $\theta^{*}$, and expand $\partial\ell(\theta)/\partial\theta$ in a Taylor series in a neighbourhood of the current iterate $\theta^{(k)}$:

$$0 = \frac{\partial \ell(\theta^{*})}{\partial \theta} \approx \frac{\partial \ell(\theta^{(k)})}{\partial \theta} + \frac{\partial^{2} \ell(\theta^{(k)})}{\partial \theta\,\partial \theta^{\mathrm T}}\,\bigl(\theta^{*} - \theta^{(k)}\bigr)$$

In this way, an iterative relationship is obtained:

$$\theta^{(k+1)} = \theta^{(k)} - \left[\frac{\partial^{2} \ell(\theta^{(k)})}{\partial \theta\,\partial \theta^{\mathrm T}}\right]^{-1} \frac{\partial \ell(\theta^{(k)})}{\partial \theta}$$

If the second derivative is continuous and the zero being sought is isolated, then there is a neighbourhood of that zero such that, as long as the initial value lies in this neighbourhood, Newton's method is guaranteed to converge. Moreover, if the second derivative at the zero is nonzero, Newton's method converges quadratically; roughly speaking, this means the number of correct significant digits of the iterate doubles with each iteration.
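Before specialising to the Cauchy and Gamma cases, the following short Python sketch (added here for illustration; the generic newton_mle helper and the exponential-distribution example are my own assumptions, not part of the original post) shows the bare iteration $\theta^{(k+1)} = \theta^{(k)} - H^{-1}g$ on a one-parameter problem with a known closed-form answer:

import numpy as np

def newton_mle(score, hessian, theta0, ep=1e-8, it_max=100):
    # Generic Newton iteration for MLE: theta <- theta - H(theta)^{-1} g(theta)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    for _ in range(it_max):
        step = np.linalg.solve(np.atleast_2d(hessian(theta)),
                               np.atleast_1d(score(theta)))
        theta = theta - step
        if np.sqrt(np.sum(step ** 2)) < ep:
            break
    return theta

# One-parameter check: for Exponential(rate lam) the log-likelihood is
# l(lam) = n*log(lam) - lam*sum(x), with score n/lam - sum(x) and Hessian -n/lam^2,
# so the iteration should reproduce the closed-form MLE n/sum(x).
x = np.random.default_rng(0).exponential(scale=1.0, size=1000)
n, s = len(x), x.sum()
lam_hat = newton_mle(lambda t: n / t[0] - s, lambda t: -n / t[0] ** 2, [0.5])
print(lam_hat, n / s)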

2. Maximum likelihood estimation of the Cauchy distribution

Suppose the random variable $X$ follows a Cauchy distribution, written $X \sim C(\mu, \gamma)$, with density

$$f(x;\mu,\gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x-\mu}{\gamma}\right)^{2}\right]}$$

where $\gamma$ is the scale parameter, equal to half the width of the density at half its maximum, and $\mu$ is the location parameter, which determines the position of the peak of the distribution.

When $\mu = 0$ and $\gamma = 1$, the distribution is called the standard Cauchy distribution,

$$f(x;0,1) = \frac{1}{\pi\left(1+x^{2}\right)}$$

The most distinctive property of the Cauchy distribution is that its expectation and higher-order moments do not exist, so method-of-moments estimation of the parameters is impossible. However, the cdf of the Cauchy distribution has a simple closed form, and the parameters can be estimated from a few sample quantiles, as shown next.
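For completeness (this short derivation is added here and is not spelled out in the original text), the closed-form cdf explains the quantile-based starting values used in the code below:

$$F(x;\mu,\gamma) = \frac{1}{2} + \frac{1}{\pi}\arctan\!\left(\frac{x-\mu}{\gamma}\right)$$

so $F(\mu) = \tfrac{1}{2}$ and $F(\mu+\gamma) = \tfrac{3}{4}$; hence the sample median estimates $\mu$, and the difference between the upper quartile and the median estimates $\gamma$.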

2.1 Theoretical basis

Estimating by maximum likelihood with known samples $x_1, \ldots, x_n$, the log-likelihood of $\theta = (\gamma, \mu)^{\mathrm T}$ is

$$\ell(\gamma,\mu) = n\log\gamma - n\log\pi - \sum_{i=1}^{n}\log\left[(x_i-\mu)^{2} + \gamma^{2}\right]$$

with score vector

$$\frac{\partial \ell}{\partial \gamma} = \frac{n}{\gamma} - \sum_{i=1}^{n}\frac{2\gamma}{(x_i-\mu)^{2}+\gamma^{2}}, \qquad \frac{\partial \ell}{\partial \mu} = \sum_{i=1}^{n}\frac{2(x_i-\mu)}{(x_i-\mu)^{2}+\gamma^{2}}$$

and Hessian entries

$$\frac{\partial^{2} \ell}{\partial \gamma^{2}} = -\frac{n}{\gamma^{2}} - \sum_{i=1}^{n}\frac{2\left[(x_i-\mu)^{2}-\gamma^{2}\right]}{\left[(x_i-\mu)^{2}+\gamma^{2}\right]^{2}}, \qquad \frac{\partial^{2} \ell}{\partial \gamma\,\partial \mu} = -\sum_{i=1}^{n}\frac{4\gamma(x_i-\mu)}{\left[(x_i-\mu)^{2}+\gamma^{2}\right]^{2}}, \qquad \frac{\partial^{2} \ell}{\partial \mu^{2}} = \sum_{i=1}^{n}\frac{2\left[(x_i-\mu)^{2}-\gamma^{2}\right]}{\left[(x_i-\mu)^{2}+\gamma^{2}\right]^{2}}$$

Therefore, the iterative formula is

$$\theta^{(k+1)} = \theta^{(k)} - H\!\left(\theta^{(k)}\right)^{-1} g\!\left(\theta^{(k)}\right)$$

where $g$ is the score vector and $H$ the Hessian above.

2.2 Algorithm

Step 0: Given an initial value $\theta^{(0)} = (\gamma^{(0)}, \mu^{(0)})^{\mathrm T}$ and a stopping accuracy $\varepsilon > 0$; set $k = 0$.

Step 1: Compute the score vector $g(\theta^{(k)})$; if

$$\left\|g(\theta^{(k)})\right\| < \varepsilon$$

(in the code below, the size of the latest Newton step $\|\theta^{(k)} - \theta^{(k-1)}\|$ is used instead), then terminate; otherwise proceed to the next step.

Step 2: Compute the Hessian $H(\theta^{(k)})$ and let $d^{(k)} = -H(\theta^{(k)})^{-1} g(\theta^{(k)})$.

Step 3: Compute $\theta^{(k+1)} = \theta^{(k)} + d^{(k)}$, set $k = k + 1$, and jump to Step 1.

2.2.1 R language implementation

# Set up Newton algorithm framework
Newtons = function(fun, x, theta,ep = 1e-5, it_max = 100){
  index = 0 ;k = 1
  while (k <= it_max){
    theta1 = theta
    obj = fun(x,theta)
    theta = theta - solve(obj$J,obj$f)
    norm = sqrt((theta - theta1) %*% (theta - theta1))
    if (norm < ep){
      index = 1; break
    }
    k = k + 1
  }
  obj = fun(x,theta)
  list(root = theta, it = k, index = index, FunVal = obj$f)
}
# Calculate the hessian matrix and the first derivative
funs = function(x,theta){
  n = length(x)
  temp_1 = (x-theta[2])^2+theta[1]^2
  temp_2 = (x-theta[2])^2-theta[1]^2
  f = c(n/theta[1]-sum(2*theta[1]/temp_1), 
         sum((2*(x-theta[2]))/temp_1))
  J = matrix(c(-n/theta[1]^2-sum(2*temp_2/temp_1^2), -sum((4*theta[1]*(x-theta[2]))/temp_1^2),
                -sum((4*theta[1]*(x-theta[2]))/temp_1^2),sum(2*temp_2/temp_1^2)), nrow = 2, byrow = T)
  list(f = f, J = J)
}

# Take 1000 samples
set.seed(80)
sample = rcauchy(1000,scale = 2,location = 2)

# The quantile estimation of cauchy distribution parameters is calculated as the initial value
gamma_ = quantile(sample,0.75) - median(sample)
theta_ = median(sample)
theta = c(gamma_,theta_)

Newtons(funs,sample,theta)

2.2.2 Python language implementation

import numpy as np
import scipy.stats as st

np.random.seed(12)
sample = st.cauchy.rvs(loc=1,scale = 1,size = 100) # scipy.stats.cauchy.rvs(loc=0, scale=1, size=1, random_state=None)
gamma_ = np.quantile(sample,0.75) - np.median(sample)
theta_ = np.median(sample)
theta = np.array([gamma_,theta_])

def Newtons(fun,x,theta,ep=1e-5,it_max = 100):
    index = 0
    k = 1
    while k <= it_max:
        theta1 = theta
        obj = fun(x,theta)
        theta = theta - np.dot(np.linalg.inv(obj[1]),obj[0])
        norm = np.sqrt(np.sum((theta - theta1)**2))
        if norm < ep:
            index = 1
            break
        k = k+1
    obj = fun(x, theta)
    print('Estimates of gamma, theta: %.3f, %.3f' % (theta[0], theta[1]))
    return {'root': theta, 'it': k, 'index': index, 'FunVal': obj[0]}
def funs(x,theta):
    n = len(x)
    temp_1 = (x - theta[1]) ** 2 + theta[0] ** 2
    temp_2 = (x - theta[1]) ** 2 - theta[0] ** 2
    f = np.array([n/theta[0]-np.sum(2*theta[0]/temp_1),
                  np.sum((2*(x-theta[1]))/temp_1)])
    j = np.array([[-n/theta[0]**2-np.sum(2*temp_2/temp_1**2),-np.sum((4*theta[0]*(x-theta[1]))/temp_1**2)],
                  [-np.sum((4*theta[0]*(x-theta[1]))/temp_1**2),np.sum(2*temp_2/temp_1**2)]])
    return [f,j]

Newtons(funs,sample,theta)
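As an optional sanity check (this is an addition, not part of the original post), the Newton estimates can be compared with SciPy's generic maximum likelihood fitter, continuing the script above:

# Optional cross-check with SciPy's built-in MLE for the Cauchy distribution;
# st.cauchy.fit returns (loc, scale), which should be close to theta[1], theta[0] above.
loc_hat, scale_hat = st.cauchy.fit(sample)
print('scipy cauchy.fit: loc = %.3f, scale = %.3f' % (loc_hat, scale_hat))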

3. Maximum likelihood estimation of the Gamma distribution

Suppose the random variable $X$ follows a three-parameter Gamma distribution, written $X \sim Ga(\alpha, \beta, \lambda)$, with density

$$f(x;\alpha,\beta,\lambda) = \frac{(x-\lambda)^{\alpha-1}}{\beta^{\alpha}\,\Gamma(\alpha)}\,e^{-(x-\lambda)/\beta}, \qquad x > \lambda,$$

where $\alpha$ is the shape parameter, $\beta$ is the scale parameter, and $\lambda$ is the location parameter.

To obtain moment estimates of the three parameters of the Gamma distribution, consider its first, second and third order raw moments (the trick in each calculation is to collect the integrand back into a Gamma function).

Taking the proof of the first-order raw moment as an example, split $x$ into $(x-\lambda) + \lambda$, so

$$\begin{aligned}
E[X] &= \int_{\lambda}^{\infty} x\,\frac{(x-\lambda)^{\alpha-1}}{\beta^{\alpha}\Gamma(\alpha)}\,e^{-(x-\lambda)/\beta}\,\mathrm dx \\
&= \int_{\lambda}^{\infty} \frac{(x-\lambda)^{\alpha}}{\beta^{\alpha}\Gamma(\alpha)}\,e^{-(x-\lambda)/\beta}\,\mathrm dx + \lambda\int_{\lambda}^{\infty} f(x;\alpha,\beta,\lambda)\,\mathrm dx \\
&= \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\int_{0}^{\infty} t^{\alpha}\,e^{-t/\beta}\,\mathrm dt + \lambda \\
&= \frac{\beta^{\alpha+1}\Gamma(\alpha+1)}{\beta^{\alpha}\Gamma(\alpha)} + \lambda \\
&= \alpha\beta + \lambda \qquad \text{①}
\end{aligned}$$

For the second- and third-order raw moments, split $x^{2}$ and $x^{3}$ respectively into

$$x^{2} = (x-\lambda)^{2} + 2\lambda(x-\lambda) + \lambda^{2}, \qquad x^{3} = (x-\lambda)^{3} + 3\lambda(x-\lambda)^{2} + 3\lambda^{2}(x-\lambda) + \lambda^{3}$$

It is then easy to obtain

$$E[X^{2}] = \alpha(\alpha+1)\beta^{2} + 2\lambda\alpha\beta + \lambda^{2} \qquad \text{②}$$

and

$$E[X^{3}] = \alpha(\alpha+1)(\alpha+2)\beta^{3} + 3\lambda\alpha(\alpha+1)\beta^{2} + 3\lambda^{2}\alpha\beta + \lambda^{3} \qquad \text{③}$$

From ① ② ③, the variance and third central moment are $\operatorname{Var}(X) = \alpha\beta^{2}$ and $E\bigl[(X - E X)^{3}\bigr] = 2\alpha\beta^{3}$, so the moment estimators are

$$\hat\beta = \frac{\hat m_3}{2S^{2}}, \qquad \hat\alpha = \frac{4\left(S^{2}\right)^{3}}{\hat m_3^{2}}, \qquad \hat\lambda = \bar x - \frac{2\left(S^{2}\right)^{2}}{\hat m_3},$$

where $\bar x$ is the sample mean, $S^{2}$ the sample variance and $\hat m_3$ the third sample central moment. These serve as starting values for the Newton iteration below.

3.1 Theoretical basis

Given known samples $x_1, \ldots, x_n$ and parameters $\theta = (\alpha, \beta, \lambda)^{\mathrm T}$, the log-likelihood is

$$\ell(\alpha,\beta,\lambda) = -n\alpha\log\beta - n\log\Gamma(\alpha) + (\alpha-1)\sum_{i=1}^{n}\log(x_i-\lambda) - \frac{1}{\beta}\sum_{i=1}^{n}(x_i-\lambda)$$

with score vector

$$\frac{\partial \ell}{\partial \alpha} = -n\log\beta - n\psi(\alpha) + \sum_{i=1}^{n}\log(x_i-\lambda), \qquad
\frac{\partial \ell}{\partial \beta} = -\frac{n\alpha}{\beta} + \frac{1}{\beta^{2}}\sum_{i=1}^{n}(x_i-\lambda), \qquad
\frac{\partial \ell}{\partial \lambda} = -(\alpha-1)\sum_{i=1}^{n}\frac{1}{x_i-\lambda} + \frac{n}{\beta}$$

where $\psi(\alpha) = \mathrm d\log\Gamma(\alpha)/\mathrm d\alpha$ is the digamma function, and Hessian entries

$$\frac{\partial^{2}\ell}{\partial\alpha^{2}} = -n\psi'(\alpha), \qquad
\frac{\partial^{2}\ell}{\partial\alpha\,\partial\beta} = -\frac{n}{\beta}, \qquad
\frac{\partial^{2}\ell}{\partial\alpha\,\partial\lambda} = -\sum_{i=1}^{n}\frac{1}{x_i-\lambda},$$

$$\frac{\partial^{2}\ell}{\partial\beta^{2}} = \frac{n\alpha}{\beta^{2}} - \frac{2}{\beta^{3}}\sum_{i=1}^{n}(x_i-\lambda), \qquad
\frac{\partial^{2}\ell}{\partial\beta\,\partial\lambda} = -\frac{n}{\beta^{2}}, \qquad
\frac{\partial^{2}\ell}{\partial\lambda^{2}} = -(\alpha-1)\sum_{i=1}^{n}\frac{1}{(x_i-\lambda)^{2}}$$

Therefore, the iterative formula is

$$\theta^{(k+1)} = \theta^{(k)} - H\!\left(\theta^{(k)}\right)^{-1} g\!\left(\theta^{(k)}\right)$$

3.2 Algorithm

Step 0: Given an initial value $\theta^{(0)} = (\alpha^{(0)}, \beta^{(0)}, \lambda^{(0)})^{\mathrm T}$ and a stopping accuracy $\varepsilon > 0$; set $k = 0$.

Step 1: Compute the score vector $g(\theta^{(k)})$; if

$$\left\|g(\theta^{(k)})\right\| < \varepsilon$$

(again, the code below uses the size of the latest Newton step instead), then terminate; otherwise proceed to the next step.

Step 2: Compute the Hessian $H(\theta^{(k)})$ and let $d^{(k)} = -H(\theta^{(k)})^{-1} g(\theta^{(k)})$.

Step 3: Compute $\theta^{(k+1)} = \theta^{(k)} + d^{(k)}$, set $k = k + 1$, and jump to Step 1.

3.2.1 R language implementation

# Set up Newton algorithm framework
Newtons = function(fun, x, theta,ep = 1e-5, it_max = 100){
  index = 0; k = 1
  while (k <= it_max ){
    theta1 = theta; obj = fun(x,theta)
    theta = theta - solve(obj$J,obj$f)
    norm = sqrt((theta - theta1) %*% (theta - theta1))
    if (norm < ep){
      index = 1; break
    }
    k = k + 1
  }
  obj = fun(x,theta)
  list(root = theta, it = k, index = index, FunVal = obj$f)
}
# Calculate the hessian matrix and the first derivative
funs = function(x,theta){
  n = length(x)
  f = c(-n*log(theta[2])-n*digamma(theta[1])+sum(log(x-theta[3])), 
         -n*theta[1]/theta[2]+1/(theta[2]^2)*sum(x-theta[3]),
         (theta[1]-1)*sum(-1/(x-theta[3]))+n/theta[2])
  J = matrix(c(-n*trigamma(theta[1]), -n/theta[2], sum(-1/(x-theta[3])),
                -n/theta[2],n*theta[1]/(theta[2]^2)-2/(theta[2]^3)*sum(x-theta[3]),-n/(theta[2]^2),
                sum(-1/(x-theta[3])),-n/(theta[2]^2),(theta[1]-1)*sum(-1/(x-theta[3])^2)), nrow = 3, byrow = T)
  list(f = f, J = J)
}

# Draw 100 samples from a Gamma distribution (shape = 2, scale = 1)
set.seed(80)
sample = rgamma(100, shape = 2, scale = 1)

# The moment estimation of gamma distribution parameters is calculated as the initial value
s_m = mean(sample)
s_v = var(sample)
m_3 = (sum((sample-s_m)^3))/length(sample)
alpha = (4*(s_v)^3)/(m_3^2)
beta = m_3/(2*s_v)
lambda = s_m-2*((s_v)^2/m_3)
theta = c(alpha,beta,lambda)

Newtons(funs,sample,theta)

3.2.2 Python language implementation

import numpy as np
import scipy.stats as st
import scipy.special as sp

np.random.seed(12)
sample = st.gamma.rvs(2,scale = 1,size = 50) # scipy.stats.gamma.rvs(a, loc=0, scale=1, size=1, random_state=None)
s_m = sample.mean()
s_v =sample.var()
m_3 = np.mean(((sample - s_m)**3))
alpha = (4*(s_v)**3)/(m_3**2)
beta = m_3/(2*s_v)
lambda_ = s_m-2*((s_v)**2/m_3)
theta = np.array([alpha,beta,lambda_])

def Newtons(fun,x,theta,ep=1e-5,it_max = 100):
    index = 0
    k = 1
    while k <= it_max:
        theta1 = theta
        obj = fun(x,theta)
        theta = theta - np.dot(np.linalg.inv(obj[1]),obj[0])
        norm = np.sqrt(np.sum((theta - theta1)**2))
        if norm < ep:
            index = 1
            break
        k = k+1
    obj = fun(x, theta)
    print('Estimates of alpha, beta, lambda: %.3f, %.3f, %.3f' % (theta[0], theta[1], theta[2]))
    return {'root': theta, 'it': k, 'index': index, 'FunVal': obj[0]}
def funs(x,theta):
    n = len(x)
    f = np.array([-n*np.log(theta[1])-n*sp.digamma(theta[0])+np.sum(np.log(x-theta[2])),
                  -n*theta[0]/theta[1]+1/(theta[1]**2)*np.sum(x-theta[2]),
                  (theta[0]-1)*np.sum(-1.0/(x-theta[2]))+n/theta[1]])
    j = np.array([[-n*sp.polygamma(1,theta[0]),-n/theta[1],np.sum(-1/(x-theta[2]))],
                  [-n/theta[1],n*theta[0]/(theta[1]**2)-2/(theta[1]**3)*np.sum(x-theta[2]),-n/(theta[1]**2)],
                  [np.sum(-1/(x-theta[2])),-n/(theta[1]**2),(theta[0]-1)*np.sum(-1/(x-theta[2])**2)]])
    return [f,j]

Newtons(funs,sample,theta)
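As before, an optional sanity check (an addition of mine, not part of the original post) is to compare with SciPy's generic three-parameter fit, continuing the script above:

# Optional cross-check with SciPy's built-in MLE for the Gamma distribution;
# st.gamma.fit returns (a, loc, scale), i.e. (alpha, lambda, beta) in the notation above.
a_hat, loc_hat, scale_hat = st.gamma.fit(sample)
print('scipy gamma.fit: alpha = %.3f, lambda = %.3f, beta = %.3f' % (a_hat, loc_hat, scale_hat))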

Note: in the maximum likelihood estimation of the three parameters of the Gamma distribution, it is easy for the Hessian matrix to lose the required definiteness when Newton's method is used (equivalently, the Hessian of the negative log-likelihood fails to be positive definite), so the iteration may not converge to the correct solution. In such cases a quasi-Newton method or a modified Newton method can be used, as sketched below.
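One possible concrete form of that suggestion is the following minimal sketch (my own assumption, not the original author's code); it reuses the sample and theta from 3.2.2 and handles the parameter-space boundary with a crude np.inf penalty, which a more careful implementation would replace by a reparameterisation such as optimising log alpha and log beta:

import numpy as np
from scipy import optimize, special

def neg_loglik(par, x):
    # Negative log-likelihood of the three-parameter Gamma distribution
    alpha, beta, lam = par
    if alpha <= 0 or beta <= 0 or np.any(x <= lam):
        return np.inf                      # crude handling of the boundary
    n = len(x)
    return (n * alpha * np.log(beta) + n * special.gammaln(alpha)
            - (alpha - 1) * np.sum(np.log(x - lam)) + np.sum(x - lam) / beta)

# BFGS builds its own curvature approximation, so no explicit Hessian is needed
res = optimize.minimize(neg_loglik, theta, args=(sample,), method='BFGS')
print(res.x)   # estimates of (alpha, beta, lambda)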
