
## Problem Statement

Similar to the *probit* model introduced in Example 3, a *logit* (or logistic regression) model is a regression model whose dependent variable is categorical. The dependent variable can be binary or multinomial; in the multinomial case it can be either ordered or unordered. The *logit* differs from the *probit*, however, chiefly in its assumption about the distribution of the error term.

This example covers the binary *logit*, in which the dependent variable can take only two values (0/1). Greene (1992) estimated a model of consumer behavior in which he examined whether an individual had experienced a major derogatory report in his or her credit history. The file credit.gdx contains credit-history information for a sample of more than 1,000 individuals. For descriptions of the variables in the data file, see here.

To examine the determinants of whether a credit card holder receives a derogatory credit report, we set up the following discrete choice model, similar to the one in Example 3:

$$y_t = x'_t\beta + \mu_t,$$

where $y_t$ is a discrete (0/1) response variable defined by:

$$y_t = \begin{cases}
1 & \text{if the number of major derogatory credit reports} > 0, \\
0 & \text{otherwise,}
\end{cases}$$
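The indicator definition of $y_t$ amounts to thresholding a count at zero. A minimal sketch, assuming the number of major derogatory reports is available as a list of integers; the function name `derogatory_indicator` and the counts are hypothetical, not taken from credit.gdx:

```python
# Hypothetical example: build the 0/1 response y_t from counts of major
# derogatory reports (the count values below are made up for illustration).
def derogatory_indicator(counts):
    """Return y_t = 1 if the number of major derogatory reports is > 0, else 0."""
    return [1 if c > 0 else 0 for c in counts]


counts = [0, 2, 0, 1, 5, 0]
y = derogatory_indicator(counts)
print(y)  # [0, 1, 0, 1, 1, 0]
```

Any positive count, whether one report or five, maps to the same outcome, so the binary model discards information about how many reports an individual received.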

and $x_t$ is a vector of exogenous variables. The conditional probability $\Pr(y_t = 1|x_t)$ therefore measures the chance that the dependent variable takes its "noteworthy" value: here, the probability of receiving at least one major derogatory credit report given the exogenous variables. $\mu_t$ is the error term of observation $t$, and the coefficient vector $\beta$ governs how the conditional probability $\Pr(y_t = 1|x_t)$ responds to a unit change in $x_t$ (as introduced in Example 3). Unlike in a linear model, however, $\beta$ is not itself the marginal effect: because the probability is a nonlinear function of $x'_t\beta$, the effect of a unit change in a regressor depends on the point $x_t$ at which it is evaluated. Having specified the model, we can estimate it using the *logit* specification and maximum likelihood.
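For the logistic form used in this example, differentiating $\Pr(y_t = 1|x_t) = \Lambda(x'_t\beta)$ with $\Lambda(z) = e^z/(1+e^z)$ gives the marginal effect of regressor $k$ as $\Lambda(x'_t\beta)\left[1-\Lambda(x'_t\beta)\right]\beta_k$. The sketch below evaluates this at a single point; the coefficients and the point are hypothetical, and the first entry corresponds to the constant term, whose "marginal effect" is not meaningful and is usually ignored.

```python
import math


def logistic_cdf(z):
    """Lambda(z) = exp(z) / (1 + exp(z)), branching on sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit_marginal_effects(x, beta):
    """dPr(y=1|x)/dx_k = Lambda(x'b) * (1 - Lambda(x'b)) * b_k at the point x."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    p = logistic_cdf(index)
    scale = p * (1.0 - p)  # density of the logistic distribution at x'b
    return [scale * bk for bk in beta]


beta = [-1.0, 0.5]  # hypothetical: intercept and one slope
x = [1.0, 2.0]      # constant term 1, regressor value 2
print(logit_marginal_effects(x, beta))  # [-0.25, 0.125]
```

Because the scale factor $\Lambda(1-\Lambda)$ lies in $(0, 0.25]$, each marginal effect has the same sign as its coefficient and is at most one quarter of it in absolute value, reaching that bound where $x'_t\beta = 0$.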

Unlike the *probit* model, which assumes normally distributed errors, the *logit* model assumes that the error term $\mu_t$ follows an *i.i.d.* logistic distribution, so the conditional probability takes the logistic form:

$$\Pr(y_t = 1|x_t) = \frac{\exp(x'_t\beta)}{1+\exp(x'_t\beta)}.$$
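Evaluated naively, $\exp(x'_t\beta)$ can overflow for large indices. A minimal sketch of a numerically stable evaluation, assuming $x_t$ and $\beta$ are plain Python lists (the function name `logit_prob` and the numbers are illustrative only). It also uses the identity $1 - \Pr(y_t = 1|x_t) = 1/(1+\exp(x'_t\beta))$, which is the probability appearing in the second term of the log-likelihood.

```python
import math


def logit_prob(x, beta):
    """Pr(y_t = 1 | x_t) = exp(x't b) / (1 + exp(x't b)), evaluated stably."""
    z = sum(xk * bk for xk, bk in zip(x, beta))
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


p1 = logit_prob([1.0, 3.0], [0.2, -0.4])  # hypothetical regressors and coefficients
p0 = 1.0 - p1                             # Pr(y_t = 0 | x_t) = 1 / (1 + exp(x't b))
print(p1, p0)
```

Both branches are algebraically identical to the formula above; the branching only keeps the exponent non-positive, so extreme indices such as $\pm 1000$ return 1.0 or 0.0 instead of raising an overflow error.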

## Mathematical Formulation

As shown in standard econometrics textbooks such as Greene (2011), the maximum likelihood estimator $\hat{\beta}$ is obtained by maximizing the log-likelihood function $\ln\mathcal{L}(\beta)$:

$$\hat{\beta} = \arg\max_{\beta}\left[\ln\mathcal{L}(\beta)\right] = \arg\max_{\beta}\left[\sum_t\left( y_t\ln\left(\frac{\exp(x'_t\beta)}{1+\exp(x'_t\beta)}\right)+ (1-y_t)\ln\left(\frac{1}{1+\exp(x'_t\beta)}\right)\right)\right].$$
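Since $\ln\left(e^{z}/(1+e^{z})\right) = z - \ln(1+e^{z})$ and $\ln\left(1/(1+e^{z})\right) = -\ln(1+e^{z})$, each term of the sum simplifies to $y_t x'_t\beta - \ln(1+\exp(x'_t\beta))$. The sketch below evaluates that form and maximizes it with Newton-Raphson, using the gradient $\sum_t (y_t - \Lambda_t)x_t$ and Hessian $-\sum_t \Lambda_t(1-\Lambda_t)x_t x'_t$, where $\Lambda_t = \Pr(y_t = 1|x_t)$. This is an assumption-laden illustration in plain Python, not the GAMS formulation in logit_gdx.gms, which may rely on a general-purpose NLP solver instead; the names `log_likelihood`, `solve`, `fit_logit` and the data are hypothetical.

```python
import math


def softplus(z):
    """ln(1 + exp(z)) computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Lambda(z) = exp(z) / (1 + exp(z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(beta, X, y):
    """ln L(b) = sum_t [ y_t * x't b - ln(1 + exp(x't b)) ]."""
    total = 0.0
    for xt, yt in zip(X, y):
        z = sum(a * b for a, b in zip(xt, beta))
        total += yt * z - softplus(z)
    return total


def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Maximize ln L(b) by Newton-Raphson, starting from b = 0."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # negative Hessian: sum w_t x_t x_t'
        for xt, yt in zip(X, y):
            p = sigmoid(sum(a * b for a, b in zip(xt, beta)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yt - p) * xt[i]
                for j in range(k):
                    info[i][j] += w * xt[i] * xt[j]
        step = solve(info, grad)  # Newton step: info^{-1} * gradient
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data: constant term plus one regressor; outcomes overlap,
# so the ML estimate exists (no perfect separation).
X = [[1.0, float(v)] for v in range(8)]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta_hat = fit_logit(X, y)
print(beta_hat, log_likelihood(beta_hat, X, y))
```

The log-likelihood of the logit model is globally concave, which is why Newton-Raphson from $\beta = 0$ typically converges in a handful of iterations; if the outcomes were perfectly separated by the regressors, no finite maximizer would exist and the iterations would diverge.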

As in Example 3, we report the estimated standard errors, computed from the diagonal elements of the estimated covariance matrix $\hat{V}_{\hat{\beta}}$, along with t-statistics and p-values.
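For the logit model, a standard estimate of the covariance matrix is the inverse of the negative Hessian of the log-likelihood, $\hat{V}_{\hat{\beta}} = \left(\sum_t \hat{\Lambda}_t(1-\hat{\Lambda}_t)\,x_t x'_t\right)^{-1}$ with $\hat{\Lambda}_t = \Pr(y_t = 1|x_t)$ evaluated at $\hat{\beta}$. The sketch below computes standard errors, t-statistics, and two-sided p-values from that matrix, assuming $\hat{\beta}$ has already been obtained by maximum likelihood. The p-values use the asymptotic standard normal distribution, $2\left[1-\Phi(|t|)\right] = \operatorname{erfc}(|t|/\sqrt{2})$; this is an assumption, since the GAMS model may report p-values from a different reference distribution. The names `invert` and `logit_inference` are hypothetical.

```python
import math


def sigmoid(z):
    """Lambda(z) = exp(z) / (1 + exp(z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def logit_inference(beta, X):
    """Standard errors, t-statistics and two-sided normal p-values at beta."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]  # sum_t w_t x_t x_t'
    for xt in X:
        p = sigmoid(sum(a * b for a, b in zip(xt, beta)))
        w = p * (1.0 - p)
        for i in range(k):
            for j in range(k):
                info[i][j] += w * xt[i] * xt[j]
    V = invert(info)  # estimated covariance matrix of beta-hat
    se = [math.sqrt(V[i][i]) for i in range(k)]
    t = [b / s for b, s in zip(beta, se)]
    pval = [math.erfc(abs(tv) / math.sqrt(2.0)) for tv in t]
    return se, t, pval


# Hypothetical estimate and regressors, for illustration only.
X = [[1.0, float(v)] for v in range(8)]
beta_hat = [-1.5, 0.4]
se, t, pval = logit_inference(beta_hat, X)
print(se, t, pval)
```

Only the regressors and the estimate enter $\hat{V}_{\hat{\beta}}$; the observed outcomes $y_t$ affect the standard errors solely through $\hat{\beta}$.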

## Demo

Check out the demo of Example 4 to experiment with estimating and statistically testing the *logit* discrete choice model.

## Model

Printable versions of the model are available: logit_gdx.gms (data in GDX format) and logit_txt.gms (data in text format).

## References

- Greene, William. 1992. *A Statistical Model for Credit Scoring*. Working Paper #92-29, Department of Economics, Stern School of Business, New York University, New York.
- Kalvelagen, Erwin. 2007. *Least Squares Calculations with GAMS*. Available for download at http://www.amsterdamoptimization.com/pdf/ols.pdf.
- Greene, William. 2011. *Econometric Analysis*, 7th ed. Prentice Hall, Upper Saddle River, NJ.