
## Problem Statement

The *Tobit* is a statistical model proposed by James Tobin (1958) to describe the relationship between a **non-negative** dependent variable $y_t$ and a set of exogenous variables $x_t$. The term *Tobit* was derived from *Tob*in's name by truncating and adding **it**, by analogy with the *prob***it**/log**it** model.

Suppose we use the *Encuesta Nacional de Ingresos y Gastos de los Hogares* (ENIGH, 2002) data from Mexico as mentioned in Example 3 to examine purchases of cheese by Mexican households. We discover that more than 60% of the surveyed households did not report cheese purchases over the survey period. This is understandable given the survey covers purchases for only a 1-week period and the shelf-life of many cheeses is longer than the survey period. For descriptions of variables in the data file, see here.

To examine the determinants of Mexican households' cheese consumption while accounting for the many zero values, we set up the following latent regression model.

$$y^*_t = x'_t\beta + \mu_t.$$

The model supposes that there is a latent (i.e. unobservable) variable $y^*_t$. This variable depends linearly on a set of exogenous variables $x_t$ via a coefficient vector $\beta$, which determines the relationship between the exogenous variables $x_t$ and the latent variable $y^*_t$. (See Example 3 for a definition of vector $\beta$.) In addition, there is a normally distributed error term $\mu_t \sim N(0,\sigma^2)$ to capture random influences on this relationship. Note that the standard deviation $\sigma$ of the error term is also a parameter to be estimated in *Tobit*.

However, in reality, we often observe only

$$y_t = \begin{cases} y^*_t & \text{if } y^*_t > \tau, \\ \tau & \text{otherwise,} \end{cases}$$

where $\tau$ is the threshold value ($\tau = 0$ in this example). We can estimate the above model using the *Tobit* model specification and maximum likelihood techniques.
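The observation rule can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_censored`, the single uniform regressor, and the parameter values are all hypothetical and are not taken from the ENIGH data.

```python
import random

def simulate_censored(beta, sigma, n, tau=0.0, seed=0):
    """Draw (x_t, y_t) pairs from y*_t = beta[0] + beta[1]*x_t + mu_t,
    then censor from below at tau (hypothetical one-regressor setup)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)                 # made-up exogenous variable
        y_star = beta[0] + beta[1] * x + rng.gauss(0.0, sigma)  # latent value
        y = y_star if y_star > tau else tau        # what is actually observed
        data.append((x, y))
    return data

# Illustrative parameter values only.
sample = simulate_censored(beta=(-0.5, 1.0), sigma=1.0, n=1000)
share_censored = sum(1 for _, y in sample if y == 0.0) / len(sample)
print(f"share of censored observations: {share_censored:.2f}")
```

A large share of observations piled up exactly at the threshold is the pattern that motivates a censored model rather than ordinary least squares.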

## Mathematical Formulation

For a *Tobit* that is censored from below at $\tau$, the log-likelihood function is the sum of two parts: the log of the probability density of the error term $\mu_t$ for observations with $y_t^* > \tau$, and the log of the probability that $y_t^* \le \tau$ for the censored observations. When the threshold value $\tau = 0$, a standard statistical textbook such as Greene (2011) shows that the estimator $\hat{\beta}$ can be calculated by maximizing the following log-likelihood function $\ln\mathcal{L}(\beta)$:

$$\ln\mathcal{L}(\beta) = \sum_{y_t > 0} \ln\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_t - x'_t\beta}{\sigma}\right)\right] + \sum_{y_t = 0} \ln\left[1 - \Phi\!\left(\frac{x'_t\beta}{\sigma}\right)\right],$$

where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal distribution, and $\phi(\cdot)$ is the corresponding density function.
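As a rough illustration, the Tobit log-likelihood for $\tau = 0$ can be evaluated in a few lines. In the sketch below, an uncensored observation contributes $\ln[\phi((y_t - x'_t\beta)/\sigma)/\sigma]$ and a censored one contributes $\ln\Phi(-x'_t\beta/\sigma)$, which equals $\ln[1 - \Phi(x'_t\beta/\sigma)]$. The function names and the toy data are hypothetical; this is not the GAMS model referenced on this page.

```python
import math

def norm_logpdf(z):
    """Log of the standard normal density phi(z)."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(beta, sigma, xs, ys):
    """Tobit log-likelihood, censoring from below at 0 (assumed threshold).
    xs: list of regressor rows, ys: list of observed values."""
    total = 0.0
    for x, y in zip(xs, ys):
        xb = sum(b * xi for b, xi in zip(beta, x))
        if y > 0.0:
            # Uncensored: log of (1/sigma) * phi((y - x'b) / sigma).
            total += norm_logpdf((y - xb) / sigma) - math.log(sigma)
        else:
            # Censored: log P(y* <= 0) = log Phi(-x'b / sigma).
            total += math.log(norm_cdf(-xb / sigma))
    return total

# Made-up toy data: a constant plus one regressor.
xs = [(1.0, 0.5), (1.0, -1.0), (1.0, 2.0), (1.0, 0.0)]
ys = [0.8, 0.0, 2.1, 0.0]
ll_good = tobit_loglik((0.0, 1.0), 1.0, xs, ys)
ll_bad = tobit_loglik((0.0, -1.0), 1.0, xs, ys)
print(ll_good > ll_bad)
```

A numerical optimizer would then search over $\beta$ and $\sigma$ (with $\sigma > 0$) for the values that maximize this function.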

To report standard regression outcomes such as t-statistics, p-values, and others as defined in Example 1, we need the estimated covariance matrix of the estimator $\hat{\beta}$, i.e., $\hat{V}_{\hat{\beta}}$, as we did in Example 3.
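The sketch below shows how such outcomes could be derived from an estimate and its covariance matrix. It assumes the usual definitions (standard error as the square root of a diagonal element, t-statistic as estimate over standard error, and a two-sided p-value from the asymptotic normal approximation); Example 1 may define them slightly differently. For maximum likelihood estimators the covariance matrix is commonly taken as the inverse of the negative Hessian of the log-likelihood at the estimates. The numbers are invented for illustration and are not results from the ENIGH data.

```python
import math

def wald_summary(beta_hat, cov):
    """Return (estimate, std. error, t-statistic, two-sided p-value)
    for each coefficient, using a normal approximation (assumption)."""
    rows = []
    for j, b in enumerate(beta_hat):
        se = math.sqrt(cov[j][j])                 # standard error from the diagonal
        t = b / se                                # t-statistic for H0: beta_j = 0
        p = math.erfc(abs(t) / math.sqrt(2.0))    # equals 2 * (1 - Phi(|t|))
        rows.append((b, se, t, p))
    return rows

# Invented illustrative values, not estimation results.
beta_hat = [1.2, -0.4]
cov = [[0.04, 0.01],
       [0.01, 0.09]]
rows = wald_summary(beta_hat, cov)
for b, se, t, p in rows:
    print(f"estimate={b:.3f}  se={se:.3f}  t={t:.3f}  p={p:.4f}")
```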

## Demo

Check out the demo of Example 5 to experiment with estimating and statistically testing the *Tobit* model, a censored regression model.

## Model

Printable versions of the model are here: tobit_gdx.gms, which uses data in GDX form, and tobit_txt.gms, which uses data in text form.

## References

- Tobin, James. 1958. Estimation of relationships for limited dependent variables. *Econometrica* **26**(1): 24-36.
- ENIGH data. Available for download at Instituto Nacional de Estadística y Geografía.
- Kalvelagen, Erwin. 2007. *Least Squares Calculations with GAMS*. Available for download at http://www.amsterdamoptimization.com/pdf/ols.pdf.
- Greene, William. 2011. *Econometric Analysis, 7th ed.* Prentice Hall, Upper Saddle River, NJ.