[NMusers] Imputation of multiple categorical covariates with missing data
I would like to investigate the effect of several genotypes on clearance in
a pop PK model. The issue is that most genotypes have some amount of missing data.
I have discarded the genotypes with too many missing samples
(>30%) and now want to handle the remaining genotypes appropriately, before
I move on to an automated stepwise covariate search in PsN. A colleague
informed me that the following mixture model can serve for imputation of a
single categorical covariate (let's call it GENO):
; In the dataset, the genotype is saved in the variable GENO and coded -99
; if unknown; otherwise it takes on the values 0, 1, 2
$INPUT ID OCC TIME AMT DV .... GENO ....
; here you check whether the genotype is available (GENO = -99 means missing).
; If it is available, you save it in the new variable GENOME...
IF (GENO.NE.-99) THEN
  GENOME = GENO
ELSE
; ... otherwise you use the mixture to impute GENOME
  IF (MIXNUM.EQ.1) GENOME = 0
  IF (MIXNUM.EQ.2) GENOME = 1
  IF (MIXNUM.EQ.3) GENOME = 2
ENDIF
; then you use the variable GENOME (not GENO, which was in the dataset) to
; define CL, or whichever other parameter you want.
; you need to use a new variable since NONMEM won't let you change the
; value of one of the fields in the dataset.
TVCL = THETA(1)*((WT/12.5)**0.75)
TVBIO = 1
; Three sub-populations whose proportion is given by the THETAs
$THETA 0.4 FIX ; GENO = 0 fixed to observed proportion in known genotype
$THETA 0.4 FIX ; GENO = 1 fixed to observed proportion in known genotype
$THETA 0.2 FIX ; GENO = 2 fixed to observed proportion in known genotype
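For reference, the parts of the control stream that are not shown above could be sketched roughly as follows. This is only a minimal sketch and not my colleague's exact code: the THETA numbering (THETA(2) to THETA(4) for the proportions, THETA(5) and THETA(6) for the genotype effects), the fractional-change coding of the genotype effect, the helper variable GCL and ETA(1) are all assumptions made for illustration.

```
$PK
; ... GENOME is defined here as shown above ...
GCL = 1                                  ; genotype 0 taken as reference (assumption)
IF (GENOME.EQ.1) GCL = 1 + THETA(5)      ; assumed effect coding for genotype 1
IF (GENOME.EQ.2) GCL = 1 + THETA(6)      ; assumed effect coding for genotype 2
TVCL = THETA(1)*((WT/12.5)**0.75)*GCL
CL   = TVCL*EXP(ETA(1))

$MIX
NSPOP = 3          ; one sub-population per genotype value
P(1) = THETA(2)    ; assumed index: proportion with genotype 0
P(2) = THETA(3)    ; assumed index: proportion with genotype 1
P(3) = THETA(4)    ; assumed index: proportion with genotype 2
```

The three proportions have to sum to 1, as the fixed values 0.4, 0.4 and 0.2 do. For subjects with a known genotype, GENOME does not depend on MIXNUM, so the mixture only matters for subjects whose genotype is missing.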
*The question is how to extend such an approach when there are several
genotypes with missing values (GENO1, GENO2, ..., GENOX) which need to be explored?*
The answer I received from my colleague is that it would be rather difficult, as
the mixture model would require the specification of every possible
combination of different genotypes.
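To make the combinatorial point concrete: with X genotypes of three levels each, a joint mixture needs 3**X sub-populations (9 for two genotypes, 27 for three, 243 for five). A minimal sketch for two genotypes might look like the following. It assumes the two loci are independent, so that each joint probability is the product of two marginal proportions. GENO1, GENO2, G1, G2, M1, M2 and the THETA numbering are hypothetical names introduced only for this illustration.

```
$PK
; decode MIXNUM (1..9) into a pair of imputed genotypes (M1, M2)
M1 = 0
IF (MIXNUM.GE.4) M1 = 1
IF (MIXNUM.GE.7) M1 = 2
M2 = MIXNUM - 1 - 3*M1
; use the observed value where available, the imputed one otherwise
G1 = GENO1
G2 = GENO2
IF (GENO1.EQ.-99) G1 = M1
IF (GENO2.EQ.-99) G2 = M2

$MIX
NSPOP = 9
; P(3*a + b + 1) = Pr(locus 1 = a) * Pr(locus 2 = b), independence assumed
; THETA(2..4): assumed proportions for locus 1, THETA(5..7): for locus 2
P(1) = THETA(2)*THETA(5)
P(2) = THETA(2)*THETA(6)
P(3) = THETA(2)*THETA(7)
P(4) = THETA(3)*THETA(5)
P(5) = THETA(3)*THETA(6)
P(6) = THETA(3)*THETA(7)
P(7) = THETA(4)*THETA(5)
P(8) = THETA(4)*THETA(6)
P(9) = THETA(4)*THETA(7)
```

If the loci are not independent (for example because of linkage disequilibrium), the nine joint proportions would have to be specified directly instead of as products, which is part of what makes the approach hard to scale.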
One approach I am considering is to perform the stepwise covariate search
in PsN (where by default missing categorical data are set to the most
common value). I would then retrace the steps of the search using the scm log
file and compare the OFV drops and p-values of the
chosen relationships with those obtained when a mixture model approach is
used. If the difference is small and far removed from any other
relationships which could have been chosen, I accept it and build my
Any input on this matter would be very much appreciated.
Have a good day.
Received on Wed Sep 23 2015 - 04:44:13 EDT