Built using Zelig version 5.1.4.90000


Bayesian Tobit Regression with tobit.bayes.

Bayesian tobit regression estimates a linear regression model with a censored dependent variable using a Gibbs sampler. The dependent variable may be censored from below and/or from above. For other linear regression models with fully observed dependent variables, see Bayesian regression, maximum likelihood normal regression, or least squares.

Syntax

z.out <- zelig(Y ~ X1 + X2, below = 0, above = Inf,
               model = "tobit.bayes", weights = w, data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Inputs

zelig() accepts the following arguments to specify how the dependent variable is censored.

  • below: point at which the dependent variable is censored from below. If the dependent variable is only censored from above, set below = -Inf. The default value is 0.

  • above: point at which the dependent variable is censored from above. If the dependent variable is only censored from below, set above = Inf. The default value is Inf.

Additional Inputs

Use the following arguments to monitor the convergence of the Markov chain:

  • burnin: number of the initial MCMC iterations to be discarded (defaults to 1,000).

  • mcmc: number of the MCMC iterations after burnin (defaults to 10,000).

  • thin: thinning interval for the Markov chain. Only every thin-th draw from the Markov chain is kept. The value of mcmc must be divisible by this value. The default value is 1.

  • verbose: defaults to FALSE. If TRUE, the progress of the sampler (every \(10\%\)) is printed to the screen.

  • seed: seed for the random number generator. The default is NA which corresponds to a random seed of 12345.

  • beta.start: starting values for the Markov chain, either a scalar or vector with length equal to the number of estimated coefficients. The default is NA, such that the least squares estimates are used as the starting values.

Use the following parameters to specify the model’s priors:

  • b0: prior mean for the coefficients, either a numeric vector or a scalar. If a scalar, that value will be the prior mean for all coefficients. The default is 0.

  • B0: prior precision parameter for the coefficients, either a square matrix (with the dimensions equal to the number of the coefficients) or a scalar. If a scalar, that value times an identity matrix will be the prior precision parameter. The default is 0, which leads to an improper prior.

  • c0: c0/2 is the shape parameter for the Inverse Gamma prior on the variance of the disturbance terms.

  • d0: d0/2 is the scale parameter for the Inverse Gamma prior on the variance of the disturbance terms.

Zelig users may wish to refer to help(MCMCtobit) for more information.

Examples

Basic Example

Attaching the sample dataset:

data(tobin)

Estimating linear regression using tobit.bayes:

z.out <- zelig(durable ~ age + quant, model = "tobit.bayes",
               data = tobin, verbose = FALSE)
## How to cite this model in Zelig:
##   Ben Goodrich, and Ying Lu. 2013.
##   tobit-bayes: Bayesian Tobit Regression for a Censored Dependent Variable
##   in Christine Choirat, Christopher Gandrud, James Honaker, Kosuke Imai, Gary King, and Olivia Lau,
##   "Zelig: Everyone's Statistical Software," http://zeligproject.org/

You can check for convergence before summarizing the estimates with three diagnostic tests. See the section Diagnostics for Zelig Models for examples of the output with interpretation:

z.out$geweke.diag()
z.out$heidel.diag()
z.out$raftery.diag()
summary(z.out)
## Model: 
## 
## Iterations = 1001:11000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                  Mean       SD Naive SE Time-series SE
## (Intercept)  18.24881  41.2238 0.412238       0.738521
## age          -0.28131   0.6070 0.006070       0.014936
## quant        -0.04816   0.1485 0.001485       0.002272
## sigma2      186.84624 433.9812 4.339812      21.462492
## 
## 2. Quantiles for each variable:
## 
##                 2.5%     25%      50%       75%    97.5%
## (Intercept) -59.1541 -0.7344 17.48159  35.82751 100.1944
## age          -1.6027 -0.5018 -0.21511   0.02447   0.6598
## quant        -0.3361 -0.1149 -0.04628   0.01886   0.2289
## sigma2       19.6083 49.1654 88.57180 178.15120 917.7208
## 
## Next step: Use 'setx' method

Setting values for the explanatory variables to their sample averages:

x.out <- setx(z.out)

Simulating quantities of interest from the posterior distribution given x.out.

s.out1 <- sim(z.out, x = x.out)
summary(s.out1)
## 
##  sim x :
##  -----
## ev
##       mean       sd     50%      2.5%    97.5%
## 1 2.093275 1.451578 1.74511 0.5979316 5.694411
## pv
##          mean       sd      50% 2.5%    97.5%
## [1,] 5.713629 9.426081 1.806649    0 29.96906

Simulating First Differences

Set explanatory variables to their default(mean/mode) values, with high (80th percentile) and low (20th percentile) liquidity ratio (quant):

x.high <- setx(z.out, quant = quantile(tobin$quant, prob = 0.8))
x.low <- setx(z.out, quant = quantile(tobin$quant, prob = 0.2))

Estimating the first difference for the effect of high versus low liquidity ratio on duration(durable):

s.out2 <- sim(z.out, x = x.high, x1 = x.low)
summary(s.out2)
## 
##  sim x :
##  -----
## ev
##       mean       sd      50%      2.5%    97.5%
## 1 1.883163 1.859314 1.412404 0.2393809 6.341747
## pv
##          mean       sd      50% 2.5%    97.5%
## [1,] 5.624543 9.712387 1.555601    0 30.37524
## 
##  sim x1 :
##  -----
## ev
##       mean       sd      50%      2.5%    97.5%
## 1 2.581795 1.894538 2.149404 0.5747006 7.390108
## pv
##          mean       sd      50% 2.5%    97.5%
## [1,] 5.955529 9.338868 2.278004    0 30.40807
## fd
##        mean       sd       50%      2.5%    97.5%
## 1 0.6986325 2.149956 0.6738879 -3.383639 4.875659

Model

Let \(Y_i^*\) be the dependent variable which is not directly observed. Instead, we observe \(Y_i\) which is defined as following:

\[ Y_i = \left\{ \begin{array}{lcl} Y_i^* &\textrm{if} & c_1<Y_i^*<c_2 \\ c_1 &\textrm{if} & c_1 \ge Y_i^* \\ c_2 &\textrm{if} & c_2 \le Y_i^* \end{array}\right. \]

where \(c_1\) is the lower bound below which \(Y_i^*\) is censored, and \(c_2\) is the upper bound above which \(Y_i^*\) is censored.

  • The stochastic component is given by

\[ \begin{aligned} \epsilon_{i} & \sim & \textrm{Normal}(0, \sigma^2)\end{aligned} \]

where \(\epsilon_{i}=Y^*_i-\mu_i\).

  • The systematic component is given by

\[ \begin{aligned} \mu_{i}= x_{i} \beta,\end{aligned} \]

where \(x_{i}\) is the vector of \(k\) explanatory variables for observation \(i\) and \(\beta\) is the vector of coefficients.

  • The semi-conjugate priors for \(\beta\) and \(\sigma^2\) are given by

\[ \begin{aligned} \beta & \sim & \textrm{Normal}_k \left( b_{0},B_{0}^{-1}\right) \\ \sigma^{2} & \sim & \textrm{InverseGamma} \left( \frac{c_0}{2}, \frac{d_0}{2} \right) \end{aligned} \]

where \(b_{0}\) is the vector of means for the \(k\) explanatory variables, \(B_{0}\) is the \(k\times k\) precision matrix (the inverse of a variance-covariance matrix), and \(c_0/2\) and \(d_0/2\) are the shape and scale parameters for \(\sigma^{2}\). Note that \(\beta\) and \(\sigma^2\) are assumed a priori independent.

Quantities of Interest

  • The expected values (qi$ev) for the tobit regression model is calculated as following. Let

\[ \begin{aligned} \Phi_1 &=& \Phi\left(\frac{(c_1 - x \beta)}{\sigma}\right) \\ \Phi_2 &=& \Phi\left(\frac{(c_2 - x \beta)}{\sigma}\right) \\ \phi_1 &=& \phi\left(\frac{(c_1 - x \beta)}{\sigma}\right) \\ \phi_2 &=& \phi\left(\frac{(c_2 - x \beta)}{\sigma}\right) \end{aligned} \]

where \(\Phi(\cdot)\) is the (cumulative) Normal density function and \(\phi(\cdot)\) is the Normal probability density function of the standard normal distribution. Then the expected values are

\[ \begin{aligned} E(Y|x) &=& P(Y^* \le c_1|x) c_1+P(c_1<Y^*<c_2|x) E(Y^* \mid c_1<Y^*<c_2, x)+P(Y^* \ge c_2) c_2 \\ &=& \Phi_{1}c_1 + x \beta(\Phi_{2}-\Phi_{1}) + \sigma (\phi_1 -\phi_2) + (1-\Phi_2) c_2,\end{aligned} \]

  • The first difference (qi$fd) for the tobit regression model is defined as

\[ \begin{aligned} \text{FD}=E(Y\mid x_{1})-E(Y\mid x).\end{aligned} \]

  • In conditional prediction models, the average expected treatment effect (qi$att.ev) for the treatment group is

\[ \begin{aligned} \frac{1}{\sum t_{i}}\sum_{i:t_{i}=1}[Y_{i}(t_{i}=1)-E[Y_{i}(t_{i}=0)]],\end{aligned} \]

where \(t_{i}\) is a binary explanatory variable defining the treatment (\(t_{i}=1\)) and control (\(t_{i}=0\)) groups.

Output Values

The Zelig object stores fields containing everything needed to rerun the Zelig output, and all the results and simulations as they are generated. In addition to the summary commands demonstrated above, some simply utility functions (known as getters) provide easy access to the raw fields most commonly of use for further investigation.

In the example above z.out$get_coef() returns the estimated coefficients, z.out$get_vcov() returns the estimated covariance matrix, and z.out$get_predict() provides predicted values for all observations in the dataset from the analysis.

See also

Bayesian tobit regression is part of the MCMCpack library by Andrew D. Martin and Kevin M. Quinn. The convergence diagnostics are part of the CODA library by Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines.

Martin AD, Quinn KM and Park JH (2011). “MCMCpack: Markov Chain Monte Carlo in R.” Journal of Statistical Software, 42 (9), pp. 22. <URL: http://www.jstatsoft.org/v42/i09/>.