On the Assessing of Fits for Simple and Multiple Logistic Models with Dichotomous Response Categories

Veeranun Pongsapukdee


The logistic model allows for one or several explanatory variables and is accordingly called the simple or the multiple logistic model. In the usual case, the basic random variable Y is a dichotomous response; such models are commonly used in many disciplines, including health sciences research, medical sciences, and engineering, and are becoming increasingly popular in the behavioral and social sciences and in quality control. In this model, Y takes the value 1 with success probability P1 and the value 0 with failure probability (1-P1). A problem arises because several different statistics have been proposed for assessing the fit of such models, and it is unclear which of them is preferable. In this article, 1,000 computer simulation experiments were generated for each condition, defined by the probability of Y=1 (P1), the calculated parameters, and the distributions of the X's, to evaluate the performance of several statistics used for assessing the goodness-of-fit of the models. Ten statistics were computed for each combination of base rate levels and model conditions (Table 1): the likelihood ratio statistic GM; the indexes of predictive efficiency λp, τp, and Φp (Menard, 1995); and the coefficients of determination, or R2 analogs, namely R2C (the contingency coefficient R2; Aldrich and Nelson, 1984), R2L (the log likelihood ratio R2; McFadden, 1974; Menard, 1995), R2M (the geometric mean squared improvement per observation R2; Maddala, 1983; Ryan, 1997), R2N (the adjusted geometric mean squared improvement R2; Nagelkerke, 1991; Ryan, 1997), and R2O (the ordinary least squares R2). In addition, correlation matrices were computed to determine, in absolute value, how independent the measures are of the base rate levels and how strongly each pair of statistics is related. The percentages of correct classification of the model (%Correct) and the type II error rates, corresponding to the percentages of power of the tests (%Accept), were also computed.
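As an illustration of the statistics listed above, the following minimal Python sketch computes GM, the five R2 analogs, λp, τp, and %Correct from observed responses and fitted probabilities. It uses commonly cited textbook definitions of these measures, which are assumed here because the abstract does not state the formulas the article itself used. The function name `fit_measures`, its arguments, and the 0.5 classification cutoff are hypothetical choices for the example, R2O is taken to be the squared correlation between Y and the fitted probabilities, and Φp is omitted.

```python
import math


def fit_measures(y, p_hat, cutoff=0.5):
    """Fit summaries for a binary logistic model (illustrative sketch).

    y      : list of observed 0/1 responses (both values must occur)
    p_hat  : list of fitted probabilities P(Y=1 | x), strictly inside (0, 1)
    cutoff : classification threshold (assumed; 0.5 is a common default)
    """
    n = len(y)
    n1 = sum(y)
    p_bar = n1 / n  # observed base rate of Y = 1

    # Log likelihoods of the intercept-only model and of the fitted model.
    ll0 = n1 * math.log(p_bar) + (n - n1) * math.log(1.0 - p_bar)
    llm = sum(math.log(p if yi == 1 else 1.0 - p) for yi, p in zip(y, p_hat))

    gm = -2.0 * (ll0 - llm)                        # likelihood ratio statistic
    r2_l = gm / (-2.0 * ll0)                       # log likelihood ratio R2
    r2_c = gm / (gm + n)                           # contingency coefficient R2
    r2_m = 1.0 - math.exp(-gm / n)                 # geometric mean squared improvement
    r2_n = r2_m / (1.0 - math.exp(2.0 * ll0 / n))  # adjusted version of r2_m

    # R2O, assumed here to be the squared Pearson correlation of y and p_hat.
    mean_p = sum(p_hat) / n
    sxy = sum((yi - p_bar) * (p - mean_p) for yi, p in zip(y, p_hat))
    sxx = sum((p - mean_p) ** 2 for p in p_hat)
    syy = sum((yi - p_bar) ** 2 for yi in y)
    r2_o = sxy ** 2 / (sxx * syy)

    # Classification table summaries at the chosen cutoff.
    pred = [1 if p >= cutoff else 0 for p in p_hat]
    errors = sum(1 for yi, pi in zip(y, pred) if yi != pi)
    pct_correct = 100.0 * (n - errors) / n

    # lambda_p: errors compared with always predicting the modal category.
    errors_mode = n - max(n1, n - n1)
    lambda_p = 1.0 - errors / errors_mode

    # tau_p: errors compared with random assignment that preserves the margins.
    expected_errors = 2.0 * n1 * (n - n1) / n
    tau_p = 1.0 - errors / expected_errors

    return {
        "GM": gm, "R2L": r2_l, "R2C": r2_c, "R2M": r2_m, "R2N": r2_n,
        "R2O": r2_o, "lambda_p": lambda_p, "tau_p": tau_p,
        "pct_correct": pct_correct,
    }


if __name__ == "__main__":
    # Made-up responses and fitted probabilities, for illustration only.
    y_demo = [0, 0, 0, 1, 1, 1, 0, 1]
    p_demo = [0.1, 0.2, 0.3, 0.9, 0.8, 0.7, 0.6, 0.4]
    for name, value in fit_measures(y_demo, p_demo).items():
        print(f"{name}: {value:.4f}")
```

The function takes fitted probabilities rather than fitting the model itself, so it can be paired with any estimation routine. In a simulation study of the kind described here, it would be called once per generated data set and the results averaged within each condition.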

The results of the simulation studies show that, for testing the goodness-of-fit of the models, both the %Correct (77-99%) and the %Accept (94-96%) are satisfactory and consistent. The average %Correct is around 77% when X is Exponential and approximately 99% when the X's are Bernoulli or multinomial distributed. The average %Accept is likewise approximately 94%. For X ~ Exponential, R2C, R2M, and R2O are preferable; for X ~ Bernoulli, R2C, R2M, and R2O are still preferable, but R2O outperforms the others. For (X1, X2) ~ Multinomial, the results are similar to, but slightly better than, those for X ~ Bernoulli. In the multinomial case, when the success probability P1 is high, the indexes of predictive efficiency λp and τp may be used as alternatives to R2C, R2M, and R2O. It is also found that the absolute values of the correlation coefficients among the R2 analogs increase as P1 increases, and that the correlations among the R2 analogs are higher than those between the R2 analogs and the indexes of predictive efficiency. Some recommendations are made for logistic models with a dichotomous response and an exponentially distributed explanatory variable. Those are the statistics R2C, R2M, R2O, λp
