Example 3: Maximum Likelihood Estimation with Probit Model (Binary Dependent Variable Case)


Problem statement

In statistics, a probit model (binary dependent variable case) is a type of regression in which the dependent variable can take only two values (0/1), for example, married or not married. The name is a contraction of "probability unit." The purpose of the model is to estimate the probability that an observation with particular characteristics falls into a specific category.

As an example, consider the purchase of fluid milk by Mexican households, which relates to concern about inadequate calcium intake, especially among children. We can apply probit regression to data from the 2002 Encuesta Nacional de Ingresos y Gastos de los Hogares (ENIGH). For descriptions of variables in the data file, see here.

Assume that for each observation $t$, the net utility gained from consuming fluid milk, $U_t^*$, which is not observable, is related to a set of exogenous variables $x_t$ (an $I \times 1$ vector, where $I$ is the total number of exogenous variables). We are interested in the coefficients $\beta$ that describe this relationship in the following latent model (and hence in the related probit model), where the error term $\mu_t$ follows a standard normal distribution, i.e., $\mu_t \sim N(0,1)$:

$$U_t^* = x'_t\beta + \mu_t.$$

This latent model becomes the probit model once we specify how the latent utility $U_t^*$ relates to the observable binary response $y_t$, which records whether household $t$ purchases fluid milk:

\[ y_t = \begin{cases} 1 & \quad \text{if } U^*_t > 0, \\ 0 & \quad \text{otherwise}. \end{cases} \]
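To make the link between the latent utility and the observed 0/1 response concrete, the minimal Python sketch below simulates it: it draws $\mu_t \sim N(0,1)$, forms $U_t^* = x'_t\beta + \mu_t$, and keeps only whether $U_t^* > 0$. The design matrix, the coefficient values, and the function name `simulate_probit` are hypothetical illustrations, not part of the ENIGH data or the GAMS model.

```python
import random

def simulate_probit(X, beta, seed=0):
    """Simulate observed choices y_t from the latent model U*_t = x_t' beta + mu_t.

    mu_t ~ N(0, 1) i.i.d.; only the indicator 1[U*_t > 0] is returned.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    y = []
    for x_t in X:
        index = sum(xj * bj for xj, bj in zip(x_t, beta))  # x_t' beta
        u_star = index + rng.gauss(0.0, 1.0)               # latent net utility (unobserved)
        y.append(1 if u_star > 0 else 0)                    # observed purchase decision
    return y

# Hypothetical design: an intercept plus one regressor on a grid in [-1, 1].
X = [(1.0, 0.1 * (t % 21) - 1.0) for t in range(1000)]
y = simulate_probit(X, beta=(0.2, 0.5))
print(sum(y) / len(y))  # share of simulated households with y_t = 1
```

Only the sign of $U_t^*$ is observed, which is why the variance of $\mu_t$ must be fixed (here at 1) for $\beta$ to be identified.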

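Note that in the above model, the $j^{th}$ element of the coefficient vector $\beta$, $\beta_j$ ($j \in \{1,2,\dots, I\}$), measures the change in the latent utility $U_t^*$ for a unit change in $x^j_t$ (the $j^{th}$ element of $x_t$), holding the other elements fixed. Its effect on the conditional probability $\Pr(y_t = 1|x_t)$ is the marginal effect $\partial\Pr(y_t = 1|x_t)/\partial x^j_t = \phi(x'_t\beta)\beta_j$, where $\phi(\cdot)$ is the standard normal density; this effect varies with $x_t$, and $\beta_j$ determines only its sign. Because the error terms $\mu_t$ are i.i.d. standard normal and symmetric about zero, $\Pr(y_t = 1|x_t) = \Pr(\mu_t > -x'_t\beta)$, so the conditional probability takes the normal form: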
$$\Pr(y_t = 1|x_t) = \Phi(x'_t\beta),$$
where $\Phi(\cdot)$ is the standard normal CDF.
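The conditional probability above, together with the standard probit marginal effect $\partial\Pr(y_t=1|x_t)/\partial x^j_t = \phi(x'_t\beta)\beta_j$ (with $\phi$ the standard normal density), can be evaluated with Python's standard library alone, since $\Phi(z) = \tfrac12\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. This is a sketch for illustration; the helper names and the input values are assumptions, not code from the case study.

```python
import math

def norm_cdf(z):
    """Standard normal CDF Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_prob(x_t, beta):
    """Pr(y_t = 1 | x_t) = Phi(x_t' beta)."""
    return norm_cdf(sum(xj * bj for xj, bj in zip(x_t, beta)))

def marginal_effects(x_t, beta):
    """dPr(y_t = 1 | x_t) / dx_t^j = phi(x_t' beta) * beta_j, for every j."""
    density = norm_pdf(sum(xj * bj for xj, bj in zip(x_t, beta)))
    return [density * bj for bj in beta]

x_t, beta = (1.0, 0.0), (0.0, 0.5)   # hypothetical values
print(probit_prob(x_t, beta))        # 0.5, since Phi(0) = 0.5
print(marginal_effects(x_t, beta))   # [0.0, ~0.1995], i.e. phi(0) * 0.5
```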

Mathematical Formulation

As shown in standard econometrics textbooks such as Greene (2011), the maximum likelihood estimator $\hat{\beta}$ is obtained by maximizing the following log-likelihood function $\ln\mathcal{L}(\beta)$:

\[ \hat{\beta} = \arg\max_{\beta}\left[\ln\mathcal{L}(\beta)\right] = \arg\max_{\beta}\left[\sum_t\left(y_t\ln\Phi(x'_t\beta) + (1-y_t)\ln\left(1-\Phi(x'_t\beta)\right)\right)\right].\]
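The case study maximizes this log-likelihood in GAMS. As an independent cross-check, here is a minimal Python sketch that maximizes the same $\ln\mathcal{L}(\beta)$ by Newton-Raphson, using $1-\Phi(z) = \Phi(-z)$, $q_t = 2y_t - 1$, $\lambda_t = q_t\phi(q_t x'_t\beta)/\Phi(q_t x'_t\beta)$, the analytic gradient $\sum_t \lambda_t x_t$, and the Hessian $-\sum_t \lambda_t(\lambda_t + x'_t\beta)\,x_t x'_t$. Newton-Raphson is a swapped-in technique for illustration, not the solver used in probit_gdx.gms, and the sketch assumes the data are not perfectly separated (otherwise the MLE does not exist). All function names are hypothetical.

```python
import math

def norm_cdf(z):
    # erfc keeps precision in the lower tail better than 1 + erf
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def log_likelihood(beta, X, y):
    """ln L(beta) = sum_t [y_t ln Phi(x_t'b) + (1 - y_t) ln(1 - Phi(x_t'b))]."""
    total = 0.0
    for x_t, y_t in zip(X, y):
        z = sum(a * b for a, b in zip(x_t, beta))
        q = 2 * y_t - 1                     # +1 if y_t = 1, -1 otherwise
        total += math.log(norm_cdf(q * z))  # uses 1 - Phi(z) = Phi(-z)
    return total

def gradient_hessian(beta, X, y):
    """Analytic gradient and Hessian of the probit log-likelihood."""
    k = len(beta)
    g = [0.0] * k
    H = [[0.0] * k for _ in range(k)]
    for x_t, y_t in zip(X, y):
        z = sum(a * b for a, b in zip(x_t, beta))
        q = 2 * y_t - 1
        lam = q * norm_pdf(q * z) / norm_cdf(q * z)  # d ln L_t / dz
        w = -lam * (lam + z)                         # d^2 ln L_t / dz^2
        for i in range(k):
            g[i] += lam * x_t[i]
            for j in range(k):
                H[i][j] += w * x_t[i] * x_t[j]
    return g, H

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for cc in range(c, n + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x

def probit_mle(X, y, tol=1e-10, max_iter=50):
    """Maximize ln L(beta) by Newton-Raphson; return (beta_hat, Hessian at beta_hat)."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        g, H = gradient_hessian(beta, X, y)
        step = solve(H, g)                          # H^{-1} g
        beta = [b - s for b, s in zip(beta, step)]  # H negative definite, so this moves uphill
        if max(abs(s) for s in step) < tol:
            break
    _, H = gradient_hessian(beta, X, y)
    return beta, H

# Intercept-only check: the first-order condition reduces to Phi(beta_hat) = mean(y).
X = [(1.0,)] * 10
y = [1] * 7 + [0] * 3
beta_hat, H = probit_mle(X, y)
print(beta_hat, norm_cdf(beta_hat[0]))  # Phi(beta_hat) should equal 0.7
```

Because the probit log-likelihood is globally concave, Newton-Raphson started at $\beta = 0$ normally converges in a few iterations; a more defensive implementation would add step halving.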

To report standard regression outcomes such as t-statistics and p-values, as calculated in Example 1, we need the estimated covariance matrix of the estimator $\hat{\beta}$, $\hat{V}_{\hat{\beta}}$, which according to Greene (2011) is the inverse of the negative Hessian:
$$\hat{V}_{\hat{\beta}} = (-\hat{H})^{-1},$$
where $\hat{H} = \nabla^2\ln\mathcal{L}(\beta)\big|_{\beta=\hat{\beta}}$ is the Hessian of the log-likelihood function $\ln\mathcal{L}(\beta)$ evaluated at the solution point $\hat{\beta}$. Since $\hat{H}$ is negative definite at a maximum, $-\hat{H}$ is positive definite and its inverse is a valid covariance matrix.
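Given $\hat{\beta}$ and $\hat{H}$ (from GAMS or from any other estimator), the sketch below forms the covariance estimate $(-\hat{H})^{-1}$, which is positive definite because $\hat{H}$ is negative definite at the maximum, and then the usual Wald quantities: standard errors $\sqrt{[(-\hat{H})^{-1}]_{jj}}$, ratios $\hat{\beta}_j/\mathrm{se}_j$, and two-sided p-values. It assumes the asymptotic standard normal reference distribution for the p-values, which may differ from the choice made in Example 1; the function names and the numbers in the usage lines are hypothetical.

```python
import math

def invert(A):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def wald_table(beta_hat, H):
    """Rows of (estimate, std. error, z-ratio, two-sided p-value) from V = (-H)^{-1}."""
    V = invert([[-h for h in row] for row in H])
    rows = []
    for j, b in enumerate(beta_hat):
        se = math.sqrt(V[j][j])
        t = b / se
        p = math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))
        rows.append((b, se, t, p))
    return rows

beta_hat = [0.5, -0.25]               # hypothetical estimates
H_hat = [[-40.0, 0.0], [0.0, -16.0]]  # hypothetical Hessian of ln L at beta_hat
for b, se, t, p in wald_table(beta_hat, H_hat):
    print(f"{b:8.4f} {se:8.4f} {t:8.3f} {p:8.4f}")
```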

GAMS can generate the Hessian matrix $H$ at the solution point. In the maximum likelihood example with GDX data input, probit_gdx.gms, we use the convertd solver with the options dictmap and hessian, which generate a dictionary map from the solver's indexing to the GAMS model and the Hessian matrix at the solution point, and save them to the data files dictmap.gdx and hessian.gdx, respectively. Combining the information in these two files yields the Hessian matrix $H$ at the solution point $\hat{\beta}$.
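The exact symbols that convertd writes to dictmap.gdx and hessian.gdx are not described in this case study, so the sketch below assumes a simplified, hypothetical layout rather than the real GDX contents: Hessian entries as `(i, j, value)` triplets indexed by solver column, a dictionary map from solver column to coefficient name, and only one triangle of the symmetric matrix stored. It illustrates the combining step: translate solver indices to coefficient names, then place each entry in a dense matrix ordered as the regression table should be. If the GAMS model minimizes $-\ln\mathcal{L}(\beta)$ instead of maximizing $\ln\mathcal{L}(\beta)$, the matrix obtained this way is $-\hat{H}$, so check the objective sense before inverting.

```python
def assemble_hessian(triplets, col_to_name, names, symmetric_half=True):
    """Build a dense Hessian ordered by `names` from solver-indexed sparse entries.

    triplets:       iterable of (i, j, value) with solver column indices (HYPOTHETICAL layout)
    col_to_name:    dict mapping solver column index -> coefficient name (HYPOTHETICAL layout)
    names:          desired order of coefficients in the output matrix
    symmetric_half: if True, assume only one triangle is stored and mirror each entry
    """
    pos = {name: k for k, name in enumerate(names)}
    n = len(names)
    H = [[0.0] * n for _ in range(n)]
    for i, j, value in triplets:
        a, b = pos[col_to_name[i]], pos[col_to_name[j]]
        H[a][b] = value
        if symmetric_half:
            H[b][a] = value  # restore the other triangle of the symmetric matrix
    return H

# Hypothetical example: two coefficients, lower triangle stored, output reordered.
triplets = [(1, 1, -40.0), (2, 1, 5.0), (2, 2, -16.0)]
col_to_name = {1: "b_const", 2: "b_income"}
print(assemble_hessian(triplets, col_to_name, ["b_income", "b_const"]))
```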


We have implemented a discrete choice model in GAMS that can be solved using the NEOS Server. Click here to experiment with the demo of example 3.


A printable version of the probit model is here: probit_gdx.gms for gdx input and probit_txt.gms for text input.


Optimization Category (Linear Programming, Integer, MIP, etc.):