RE: [NMusers] Genotype data missing in some individuals

From: Mats Karlsson <>
Date: Wed, 19 Nov 2014 18:17:01 +0000


I would use:

There are different options for handling THETA(3).
If I believe that missingness is completely at random (MCAR):
THETA(3) can be fixed to the frequency of GENOTYPE=1 in the population you are studying, if that frequency is known.
If it is not known, I would fix it to the fraction of GENOTYPE=1 in your sample. If I were really ambitious, I would take into account that your sample is small and therefore may not perfectly reflect the proportion in your population. In that case you could use the prior functionality.
If you believe the data are missing at random (MAR), another approach could be implemented. [MAR in this case could mean that genotype is missing more often in one ethnic group than in another, but you know the ethnicity of everyone.] You would only modify one line in the code above:

If you believe that the data could be missing not at random (MNAR), for example that genotyping failed more often for subjects with true GENO=1, then using the top code but estimating THETA(3) would be the appropriate thing to do.
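
The options above differ only in where the probability of GENOTYPE=1 for a subject with missing genotype comes from. As a rough illustration (plain Python, not NONMEM code, and not the code referred to above), the sketch below computes that probability from the sample under MCAR (pooled fraction) and under MAR (fraction within each ethnic group), and shows how it would weight the two genotype-specific likelihoods. The data set, the function names, and the assumption that THETA(3) is this probability on the natural scale are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (subject id, ethnicity, genotype); None marks a missing genotype.
subjects = [
    (1, "A", 1), (2, "A", 1), (3, "A", 0), (4, "A", None),
    (5, "B", 0), (6, "B", 0), (7, "B", 0), (8, "B", 1), (9, "B", None),
]

def fraction_geno1(records):
    """Fraction of GENOTYPE=1 among subjects with an observed genotype (MCAR case)."""
    observed = [g for _, _, g in records if g is not None]
    return sum(observed) / len(observed)

def fraction_geno1_by_group(records):
    """Same fraction, computed separately within each ethnic group (MAR case)."""
    groups = defaultdict(list)
    for _, eth, g in records:
        if g is not None:
            groups[eth].append(g)
    return {eth: sum(gs) / len(gs) for eth, gs in groups.items()}

def prob_geno1(record, pooled, by_group=None):
    """Probability that a subject has GENOTYPE=1.

    Observed genotypes give 0 or 1; missing genotypes fall back to the
    group-specific fraction if one is supplied, otherwise to the pooled one.
    """
    _, eth, g = record
    if g is not None:
        return float(g)
    if by_group is not None:
        return by_group[eth]
    return pooled

def marginal_likelihood(p_geno1, lik_geno1, lik_geno0):
    """Likelihood of a subject's data, averaged over the two possible genotypes."""
    return p_geno1 * lik_geno1 + (1.0 - p_geno1) * lik_geno0

pooled = fraction_geno1(subjects)            # value THETA(3) would be fixed to under MCAR
by_group = fraction_geno1_by_group(subjects) # one value per ethnic group under MAR
```

Under MNAR neither fraction can be trusted for the missing subjects, which is why the probability would be estimated rather than fixed.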

There are other options too. Two recent articles on this, which compare the methods, are listed below. They also describe a multiple imputation routine that we recently implemented in PsN.

Comparison of methods for handling missing covariate data.
Johansson ÅM, Karlsson MO.
AAPS J. 2013 Oct;15(4):1232-41. doi: 10.1208/s12248-013-9526-y.

Multiple imputation of missing covariates in NONMEM and evaluation of the method's sensitivity to η-shrinkage.
Johansson ÅM, Karlsson MO.
AAPS J. 2013 Oct;15(4):1035-42. doi: 10.1208/s12248-013-9508-0.

Best regards,

Mats Karlsson, PhD
Professor of Pharmacometrics
Dept of Pharmaceutical Biosciences
Faculty of Pharmacy
Uppsala University
Box 591
75124 Uppsala
Phone: +46 18 4714105
Fax + 46 18 4714003

-----Original Message-----
From: [] On Behalf Of Denney, William S.
Sent: Wednesday, November 19, 2014 6:24 PM
To: Leonid Gibiansky; Jeroen Elassaiss-Schaap; 이소정
Subject: RE: [NMusers] Genotype data missing in some individuals

Hi SoJeong,

I agree with Leonid here on the value of the mixture model. With potentially subtle changes, mixture models can be very difficult. A similar approach that I've had luck with previously is to make "unknown genotype" a separate category and then to fit a parameter that is the fraction "yes" (similar to a mixture model, but without specifying a genotype for a subject). Something like:

G1 = THETA(1)
G2 = THETA(2)
FRA = 1/(1+EXP(-THETA(3)))  ; inverse logit keeps FRA between 0 and 1

You can then compare FRA to the expected genotypic distribution in the population.
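
To make the comparison concrete, here is a small Python sketch (not NONMEM code) of the transform used for FRA and its inverse, which converts an expected population frequency into a value on the THETA(3) scale. The function names are hypothetical. The last function is only one possible reading of how FRA might be applied, namely as a weight on G1 and G2 for subjects with unknown genotype; the fragment above does not show that step, so treat it as an assumption.

```python
import math

def fra_from_theta(theta3):
    """Inverse logit, matching FRA = 1/(1+EXP(-THETA(3)))."""
    return 1.0 / (1.0 + math.exp(-theta3))

def theta_from_fra(fra):
    """Logit: the THETA(3) value that corresponds to a given fraction."""
    return math.log(fra / (1.0 - fra))

def typical_value_unknown(g1, g2, theta3):
    """ASSUMPTION: subjects with unknown genotype get the FRA-weighted
    average of the two genotype-specific values G1 and G2."""
    fra = fra_from_theta(theta3)
    return fra * g1 + (1.0 - fra) * g2

# Example: an expected "yes" frequency of 30% expressed on the THETA(3) scale.
theta3_expected = theta_from_fra(0.30)
```

An estimate of THETA(3) far from `theta3_expected` would suggest that the fitted fraction disagrees with the expected genotype distribution.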



-----Original Message-----
From: [] On Behalf Of Leonid Gibiansky
Sent: Wednesday, November 19, 2014 10:11 AM
To: Jeroen Elassaiss-Schaap; 이소정
Subject: Re: [NMusers] Genotype data missing in some individuals

I would use a mixture model only if there is a very large (several-fold) difference in PK parameters between the two genotypes. If the difference is comparable to the inter-subject variability within a genotype, I would introduce a category "missing" to remove the effect of those subjects on the covariate effect estimate. So if the genotype is binary (YES/NO), you introduce a new third level "missing", treat genotype as a 3-level categorical covariate, and report the difference between NO and YES as the genotype effect on PK. As a consistency check, you may want to verify that the estimate of the PK parameter for the "missing" level lies somewhere between the estimates for the "NO" and "YES" levels, closer to the value for the level with the higher prevalence in your dataset.
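
The consistency check described above can be written down in a few lines. This is a hypothetical Python helper, not part of any NONMEM or PsN tool; the estimates and subject counts in the example are made up.

```python
def missing_level_is_consistent(est_no, est_yes, est_missing, n_no, n_yes):
    """Check the 'missing' level estimate against the NO and YES estimates.

    Returns a pair of booleans:
      between - the 'missing' estimate lies between the NO and YES estimates
      closer  - it is at least as close to the more prevalent level as to the other
    """
    low, high = sorted((est_no, est_yes))
    between = low <= est_missing <= high

    # Identify which level has more subjects with observed genotype.
    if n_no >= n_yes:
        more_prevalent, less_prevalent = est_no, est_yes
    else:
        more_prevalent, less_prevalent = est_yes, est_no
    closer = abs(est_missing - more_prevalent) <= abs(est_missing - less_prevalent)
    return between, closer

# Made-up example: clearance estimates for NO, YES and "missing", with 60 NO and 20 YES subjects.
check = missing_level_is_consistent(20.0, 10.0, 17.0, n_no=60, n_yes=20)
```

A result other than `(True, True)` is not proof of a problem, but it would be a reason to look more closely at how the missing subjects differ from the rest.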

Leonid Gibiansky, Ph.D.
President, QuantPharm LLC
e-mail: LGibiansky at
tel: (301) 767 5566

On 11/19/2014 6:16 AM, Jeroen Elassaiss-Schaap wrote:
> Dear SoJeong,
> First you might want to answer the question whether that phenotype is
> indeed important in your dataset. With the initial popPK model you
> could plot posthoc clearance against bodyweight and/or inspect the
> posthocs of clearance for evidence of multiple peaks in your
> distribution. You also may see the impact of phenotype in stratified
> concentration versus time plots. Depending on the dataset, with its
> sampling scheme, number of subjects (perhaps a low number) and
> distribution across age, it could be masked.
> If the impact is clear, however, it might be beneficial to try to
> include the subjects with missing genotype. With a clear effect, you
> might be able to develop a mixture model. The mixture approach would
> describe the different populations in your dataset corresponding to
> the different phenotypes. The genotype would then inform the mixture
> as a covariate - the missing information would fall back to the pure
> mixture approach. As a warning, this approach is quite difficult. I
> would advise you to read up on the nonmem guides ($MIX) on this and
> look in the literature for examples - the Karlsson group has published
> about it, most recently this one (it contains code):
> A search
> in the literature gives you additional background such as
> and
> If the impact is not clear, a more empirical approach might be called
> for, in this case a subset analysis, i.e. where you exclude the
> missing subjects, of the covariate relationship might be all that you
> could achieve. If there is no impact at all, you do not need the
> genotype of course.
> Hope this helps!
> Best regards,
> Jeroen
> <>
> _at_PD_value
> +31 6 23118438
> -- More value out of your data!
> On Nov 19, 2014, at 7:57 AM, "이소정" wrote:
> Dear all,
> I’ve been analyzing tacrolimus PopPK in pediatric patients.
> As you know, CYP3A5 genotype can change tacrolimus PK
> significantly. CYP3A5 genotyping was performed in the study;
> however, in 20% of the subjects, the genotype data are missing.
> Then, how can I reflect the CYP3A5 genotype effect to the tacrolimus
> population model appropriately?
> Is there any solution?
> Best regards,
> SoJeong Yi
Received on Wed Nov 19 2014 - 13:17:01 EST
