E4: Maximum Likelihood Estimation with Logit Model (Binary Dependent Variable Case)

Problem Statement

Similar to the probit model introduced in Example 3, a logit (or logistic regression) model is a type of regression in which the dependent variable is categorical. The dependent variable may be binary or multinomial; in the multinomial case, it may be either ordered or unordered. The logit differs from the probit, however, in several key assumptions, most notably the distribution assumed for the error term.

This example covers the binary logit, in which the dependent variable takes only two values (0/1). Greene (1992) estimated a model of consumer behavior in which he examined whether or not an individual had experienced a major derogatory report in their credit history. The file credit.gdx contains information on the credit history of a sample of more than 1,000 individuals. For descriptions of variables in the data file, see here.

In order to examine the determinants of whether a credit card holder experiences a derogatory credit report, we set up the following discrete choice model similar to what we did in Example 3:
$$y_t = x'_t\beta + \mu_t,$$
where $y_t$ is a discrete (0/1) response variable for card holder $t$ satisfying:
\[ y_t = \left\{
\begin{aligned}
1 & \quad \text{if number of major derogatory credit reports $> 0$ } \\
0 & \quad \text{otherwise,}
\end{aligned} \right.\] and $x_t$ is a vector of exogenous variables. The conditional probability $\Pr(y_t = 1|x_t)$ therefore measures the chance that the dependent variable takes the “noteworthy” outcome, here the probability of receiving a major derogatory credit report given the exogenous variables. $\mu_t$ is the error term of observation $t$, and the coefficient vector $\beta$ determines how the conditional probability $\Pr(y_t = 1|x_t)$ responds to a unit change in $x_t$ (as introduced in Example 3); in the logit model the marginal effect is $\Pr(y_t = 1|x_t)\left(1-\Pr(y_t = 1|x_t)\right)\beta$, so it depends on $x_t$ and is not equal to $\beta$ itself. Given this model, we can estimate it using the logit specification and maximum likelihood techniques.

Unlike in the probit model, we now assume that the error term $\mu_t$ follows an i.i.d. logistic distribution, so that the conditional probability takes the logistic form:
$$\Pr(y_t = 1|x_t) = \frac{\exp(x'_t\beta)}{1+\exp(x'_t\beta)}.$$
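
As an illustration of the logistic form, the following minimal Python sketch evaluates $\Pr(y_t = 1|x_t)$ for one observation. It is not part of the GAMS models distributed with this example; the function name `logit_probability` and the numbers in `x_t` and `beta` are hypothetical and chosen only for demonstration.

```python
import math

def logit_probability(x, beta):
    """Return Pr(y = 1 | x) = exp(x'beta) / (1 + exp(x'beta))."""
    index = sum(xi * bi for xi, bi in zip(x, beta))  # linear index x'beta
    # Branch on the sign so that exp() is only called on a non-positive
    # argument, which avoids overflow for large |x'beta|.
    if index >= 0:
        return 1.0 / (1.0 + math.exp(-index))
    e = math.exp(index)
    return e / (1.0 + e)

# Hypothetical observation: a constant term plus two regressors.
x_t = [1.0, 0.5, -2.0]
beta = [0.2, 1.0, 0.3]   # hypothetical coefficients, not estimates
p = logit_probability(x_t, beta)
print(p)
```

Here the linear index is $x'_t\beta = 0.1$, so the probability is slightly above one half. Whatever the value of the index, the result always lies between 0 and 1.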

Mathematical Formulation

A standard econometrics textbook such as Greene (2011) shows that the estimator $\hat{\beta}$ can be computed by maximizing the following log-likelihood function $\ln\mathcal{L}(\beta)$:
$$\hat{\beta} = \arg\max_{\beta}\left[\ln\mathcal{L}(\beta)\right] = \arg\max_{\beta}\left[\sum_t\left( y_t\ln\left(\frac{\exp(x'_t\beta)}{1+\exp(x'_t\beta)}\right)+ (1-y_t)\ln\left(\frac{1}{1+\exp(x'_t\beta)}\right)\right)\right].$$
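
The maximization can be sketched in a few lines of Python. The sketch below uses Newton-Raphson steps, which is one common choice and an assumption here: the GAMS models in this example hand the problem to a nonlinear solver instead. The data in `X` and `y` are made up for illustration, and all function names are hypothetical. The gradient used is $\sum_t (y_t - p_t)x_t$, where $p_t = \Pr(y_t = 1|x_t)$, and the matrix `info` is the negative Hessian $\sum_t p_t(1-p_t)x_t x'_t$.

```python
import math

def sigmoid(z):
    """Logistic CDF, evaluated without overflowing exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_likelihood(beta, X, y):
    """Sum over t of y_t*ln(p_t) + (1 - y_t)*ln(1 - p_t)."""
    total = 0.0
    for x_t, y_t in zip(X, y):
        z = dot(x_t, beta)
        # The summand simplifies to y_t*z - ln(1 + exp(z)).
        log1pexp = z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))
        total += y_t * z - log1pexp
    return total

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Maximize the logit log-likelihood by Newton-Raphson, starting at zero."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x_t, y_t in zip(X, y):
            p = sigmoid(dot(x_t, beta))
            for i in range(k):
                grad[i] += (y_t - p) * x_t[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * x_t[i] * x_t[j]
        step = solve(info, grad)          # Newton step: info^{-1} * grad
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Made-up data: a constant and one regressor, with overlapping outcomes.
X = [[1.0, float(v)] for v in range(8)]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta_hat = fit_logit(X, y)
print(beta_hat, log_likelihood(beta_hat, X, y))
```

The sketch has no safeguards such as step halving, so it is suitable only for small, well-behaved problems. In particular, the maximum does not exist when the outcomes are perfectly separated by the regressors.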

Similar to Example 3, we report estimated variances based on the diagonal elements of the covariance matrix $\hat{V}_{\hat{\beta}}$ along with t-statistics and p-values.
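
A minimal Python sketch of these statistics follows. It rests on two assumptions that this page does not confirm for the GAMS models: the covariance matrix is estimated as the inverse of the information matrix $\sum_t p_t(1-p_t)x_t x'_t$ evaluated at $\hat{\beta}$, and the p-values are two-sided values from a standard normal reference distribution. All function names are hypothetical, and the data are made up.

```python
import math

def sigmoid(z):
    """Logistic CDF, evaluated without overflowing exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def logit_inference(X, beta_hat):
    """Return (estimate, standard error, t-value, p-value) per coefficient."""
    k = len(beta_hat)
    info = [[0.0] * k for _ in range(k)]
    for x_t in X:
        p = sigmoid(sum(a * b for a, b in zip(x_t, beta_hat)))
        w = p * (1.0 - p)
        for i in range(k):
            for j in range(k):
                info[i][j] += w * x_t[i] * x_t[j]
    cov = invert(info)                    # assumed covariance estimator
    rows = []
    for i in range(k):
        se = math.sqrt(cov[i][i])         # square root of a diagonal element
        t = beta_hat[i] / se
        # Two-sided p-value under a standard normal approximation (assumption).
        p_value = math.erfc(abs(t) / math.sqrt(2.0))
        rows.append((beta_hat[i], se, t, p_value))
    return rows

# Made-up intercept-only example: 2 ones among 10 observations, for which
# the maximum likelihood estimate is ln(0.2 / 0.8).
X = [[1.0] for _ in range(10)]
beta_hat = [math.log(0.25)]
results = logit_inference(X, beta_hat)
print(results)
```

In the intercept-only example the information matrix reduces to the scalar $10 \times 0.2 \times 0.8 = 1.6$, so the estimated variance is $1/1.6 = 0.625$. This gives a simple hand check of the code.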

Demo

This demo provides two data input options for estimation and reports regression statistics based on a logit regression model. The reported statistics include the coefficient estimates, standard errors, t-values, and p-values (for the null hypothesis that a coefficient equals zero) at the estimated point.

Option 1: Data in a text file

Users who have access to the data needed in the estimation should create a text file with the data, for example, the credit history data in Greene (1992). See credit_data.txt. User-provided data files must satisfy the following restriction:

The column that contains dependent variable data must be indexed by y. Note that the dependent variable data may be either a (0/1) response variable or the “original” dependent variable that is converted to a (0/1) variable.

Note that the estimated coefficients in the logit model are indexed by the names of the explanatory variables in the data.
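
The conversion of an “original” dependent variable to a (0/1) response, as mentioned in the restriction above, can be sketched in Python as follows. The function name `to_binary_response`, the variable name `major_derogatory_reports`, and the counts are all hypothetical; the rule applied is the one used in this example, where the response equals 1 when the number of major derogatory reports is greater than zero.

```python
def to_binary_response(counts):
    """Map each count to 1 if it is strictly positive, and to 0 otherwise."""
    return [1 if c > 0 else 0 for c in counts]

# Hypothetical counts of major derogatory reports for six individuals.
major_derogatory_reports = [0, 2, 0, 1, 0, 4]
y = to_binary_response(major_derogatory_reports)
print(y)
```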

Users then can download a sample GAMS model file, logit_txt.gms (logit model with text input), and modify it to solve their own estimation problems. Users should specify their own set definitions (sets “t” and “n” in the sample), include their own table of data (as described above), and run the modified model to obtain the estimation results.

Option 2: Data in a GAMS data exchange (gdx) file

Users who have access to the data in a GAMS data exchange (gdx) file can use one of the following two methods.

Method 1: Solve using the NEOS Server
Users can click on the “Solve with NEOS” button to find estimation results based on the default gdx file, i.e., the credit history data from Greene (1992). See credit.gdx. Alternatively, users can upload their own data by clicking on the button next to “Upload GDX File” and then “Solve with NEOS”. User-provided gdx files must satisfy the same restriction as listed above in Option 1.

Clicking on the “Reset” button will clear the solution.

Method 2: Calculate the regression statistics locally
Users who have access to GAMS can download the GAMS model file, logit_gdx.gms (logit model with gdx input), and solve the model locally with the following command:
- “gams logit_gdx --in=mydata”
where mydata.gdx is a data file provided by the user. The gdx file must satisfy the restriction as described above in Option 1.

Model

A printable version of the model is here: logit_gdx.gms with gdx form data and logit_txt.gms with text form data.

References

Greene, William. 1992. A Statistical Model for Credit Scoring. Working Paper #92-29, Department of Economics, Stern School of Business, New York University, New York.
Kalvelagen, Erwin. 2007. Least Squares Calculations with GAMS. Available for download at http://www.amsterdamoptimization.com/pdf/ols.pdf.
Greene, William. 2011. Econometric Analysis, 7th ed. Prentice Hall, Upper Saddle River, NJ.