10 Structure of the npde package

10.0.1 Online help

Details on the functions, available graphs and options can be found in the online documentation. Please type:

?npde.plot.default

to get started.

10.1 Errors during execution

Sometimes the function is unable to compute the decorrelated prediction distribution errors for one or more subjects. The following error messages can appear:

The computation of the npde has failed for subject xx because the Cholesky
decomposition of the covariance matrix of the simulated data could not be
obtained. 

or

The computation of the npde has failed for subject xx because the covariance
matrix of the simulated data could not be inverted. 

followed by:

This usually means that the covariance matrix is not positive-definite. This can
be caused by simulations widely different from observations (in other words, a
poor model). We suggest to plot a prediction interval from the simulated data
to check whether the simulations are reasonable, and to consider prediction
discrepancies.
Prediction discrepancies will now be computed.

In our experience, this usually happens when the model is so ill-conditioned that the matrices involved in the computation of the prediction distribution errors are singular, and mostly happens when the model predicts the data very poorly. A prediction interval (or Visual Predictive Check) can be plotted to check this.

When \(\mathrm{npde}\) cannot be computed, the program computes automatically \(\mathrm{pd}\) even if the {calc.pd=F} option was used.

The following graphs are plotted using \(\mathrm{pd}\) instead of \(\mathrm{npde}\)

  1. a quantile-quantile plot: plot of the \(\mathrm{pd}\) versus the corresponding quantiles of a uniform distribution

    • the line \(y=x\) is also drawn
  2. a histogram of the \(\mathrm{pd}\) with the uniform density \(\mathcal{U}(0,1)\) overlain

  3. a plot of the \(\mathrm{pd}\) versus the independent variable X

  4. a plot of the \(\mathrm{pd}\) versus \(\mathrm{ypred}\)

    • for these last two graphs, we plot the lines corresponding to \(y=0\) and to the 5% and 95% critical values (delimiting the 90% confidence interval in which we expect to find the bulk of the \(\mathrm{pd}\)).

In this case, approximated prediction intervals are not plotted by default, since the approximation (sampling from the standard gaussian distribution) neglects the correlation between the different \(\mathrm{pd}\) in an individual, and this leads to substantially narrower prediction intervals when the number of data per subject is large. Prediction bands may be added by combining the option bands=TRUE with option approx.pi set to either TRUE for an approximated prediction interval (fast but rough) or FALSE for an approximated prediction interval (obtained using the simulated datasets), as in:

x<-dist.pred.sim(x)
plot(x,bands=TRUE,approx.pi=TRUE)

As seen here, requesting an exact (simulated) prediction interval requires first to compute the distribution of \(\mathrm{pd}\) in the simulated dataset, using the function dist.pred.sim (by default and in the interest of time, this function computes only the distribution of the \(\mathrm{pd}\), but if called with the additional argument calc.npde=TRUE the \(\mathrm{npde}\) in the simulated dataset will also be included in the output, allowing the user to use approx.pi=TRUE for graphs including \(\mathrm{npde}\)).

10.2 Functions in the npde package

npde has been programmed using the S4 classes in R. S4 classes implement Object oriented programming (OOP) in R, allowing to construct modular pieces of code which can be used as black boxes for large systems. Most packages in the base library and many contributed packages use the former class system, called S3. However, S4 classes are a more traditional and complete object oriented system including type checking and multiple dispatching. S4 is implemented in the methods package in base R. More information on S4 classes and R packages can be found in tutorials on the Web. I used extensively the following manual (Genolini (2010)) (in French).

The object returned by the npde() and autonpde() functions has the NpdeObject class, and both generic and specific methods have been defined for this class:

  1. print: the print function produces a summary of the object in a nice format

  2. show: this function is used invisibly by R when the name of the object is typed, and produces a short summary of the object (more details can be obtained by using the alternative showall() function

  3. summary: this function produces a summary of the object, and invisibly returns a list with a number of elements, which provides an alternative way to access elements of the class; the list contains the following elements:

    • obsdat the data (a matrix with 3 columns, id=subject id, xobs=observed X, yobs=observed Y, plus if present in the data additional columns containing censored information, mdv, covariates, …)
    • id subject id (first column in obsdat)
    • observed x (second column in obsdat)
    • y observed y (third column in obsdat)
    • npde the computed \(\mathrm{npde}\)
    • pd the computed prediction discrepancies
    • ploq the probability of being BQL for each observation (if computed)
    • N number of subjects
    • nrep number of simulations used in the computations
    • ntot.obs total number of non-missing observations
    • ypred predicted Y (the empirical mean of the simulated predicted distribution for each observation (\(\mathbb{E}_k(y^{sim(k)}_{ij})\)))
    • ycomp completed vector of observed y (includes the values that were imputed during the computation when BQL data are in the dataset)
    • ydobs the decorrelated observed data \(y_{ij}^*\)
    • ydsim the decorrelated simulated data \(y^{sim(k)*}_{ij}\)
    • xerr an integer valued 0 if no error occurred during the computation or a positive number (1 or 2) depending on the error encountered, if an error occurred
    • options options (can also be seen by using the print function)
    • prefs graphical preferences (can also be seen by using the print function)
  4. plot: this produces plots of the different objects

    • when called without any argument, the default four plots are produced
    • the argument plot.type can be used to produce different plots
    • the argument
    • all plots can be tweaked to add titles, change colours,…, (see section 8)
  5. \([\) function: the get function, used to access the value of the slots in an object

  6. \([<\)-: function: the set function, used to replace the value of the slots in an object

  7. gof.test: goodness-of-fit tests on the distribution of npde (or pd)

Examples of calls to these functions are given in the corresponding man pages and in the documentation (section {sec:exampletheo}).

References

Genolini, C. 2010. Construire Un Package - Classic Et S4. Paris, France: INSERM U669. http://cran.r-project.org/doc/contrib/Genolini-ConstruireUnPackage.pdf.