NONMEM Users Network Archive

Hosted by Cognigen

Re: unbalanced data set

From: Alison Boeckmann <alisonboeckmann>
Date: Fri, 22 Jan 2016 11:36:50 -0800

Nick's comment answered the question that was asked, although later
responses moved to a somewhat different subject.

I'd like to add a little history, as best as I remember what I was told,
that may illuminate the original issue, especially for non-

Prior to 1978, PK data was obtained for drugs that were tested on
healthy young volunteers (typically medical students).  The data was
balanced, i.e., same number of samples at the same times from each of
them, typically over one day. If someone dropped out early, it was
generally for a reason unrelated to the drug, and that subject's data
was simply ignored. A methodology such as ANOVA could be used to
analyze the data.

Lewis Sheiner objected to this. He said the drugs should be tested on
the target population. This sometimes meant sick people, in a clinical
setting, over a multi-visit time frame.  If a subject dropped out early,
it might be because this person either over-responded to the drug or
under-responded and needed to be put on a rescue medication. But these were
the "outlier" subjects that the study was most interested in!  Lewis
needed a way of combining unbalanced data. Stuart Beal joined him in
1978. His PhD thesis was on a technique for analyzing such data sets.
By 1980, they released the first version of NONMEM.

To make the point clearer:

At the Short Course, Stuart used to talk about a data set with 99
observed values of 100 and 1 observed value of 50.  If there is no
other information, then the best estimate of the mean in the population
is a number close to 100. But what if you knew that the 99 values were
from one subject, and the single value of 50 was from a second subject?
You'd be very sure of the value 100, but much less sure about the value
50.  Therefore, 75 (the simple average of the two subjects' values) would
be a poor choice for the mean in the population.  There is a methodology
called BLUE (Best Linear Unbiased
Estimator).  I can't remember what Stuart said this gave, but it was a
number between 75 and 100.

That is the whole idea behind NONMEM: to provide a weight for each
observation that takes into account the fact that observations come from
different subjects.
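Stuart's example can be made concrete with a small sketch. It assumes the simplest one-way random-effects model, y_ij = mu + eta_i + eps_ij, in which the between-subject variance omega2 and the residual variance sigma2 are treated as known. The values 100 and 100 used below are hypothetical, chosen only for illustration, and the function names are made up for this sketch. Under that model each subject's mean estimates mu with variance omega2 + sigma2/n_i, and the BLUE of mu is the inverse-variance (generalized least squares) weighted average of the subject means. This is not how NONMEM itself works, since NONMEM fits nonlinear models and estimates the variance components from the data, but it shows the kind of subject-aware weighting described above.

```python
from statistics import fmean


def pooled_mean(subjects):
    """Naive estimate: pool every observation, ignoring which subject it came from."""
    return fmean(y for obs in subjects for y in obs)


def blue_mean(subjects, omega2, sigma2):
    """BLUE (GLS) estimate of the population mean mu under the ASSUMED model
    y_ij = mu + eta_i + eps_ij, with Var(eta_i) = omega2 and Var(eps_ij) = sigma2,
    both treated as known (a simplification; NONMEM estimates them)."""
    num = 0.0
    den = 0.0
    for obs in subjects:
        n = len(obs)
        # Variance of this subject's mean about mu: the between-subject part
        # plus the residual variance shrunk by the number of observations.
        var_subject_mean = omega2 + sigma2 / n
        weight = 1.0 / var_subject_mean
        num += weight * fmean(obs)
        den += weight
    return num / den


# Stuart's example: subject 1 has 99 observations of 100, subject 2 has one of 50.
subjects = [[100.0] * 99, [50.0]]

print(pooled_mean(subjects))  # 99.5
# Hypothetical variance components, for illustration only.
print(round(blue_mean(subjects, omega2=100.0, sigma2=100.0), 2))  # 83.22
```

With these made-up variances the estimate lands near 83. Setting sigma2 to 0 (each subject's mean known exactly) gives both subjects equal weight and returns 75; setting omega2 to 0 (no between-subject variability) reduces to the pooled mean of 99.5. For any nonnegative variances that are not both zero, the estimate falls between those two limits, which is consistent with Stuart's "between 75 and 100".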

As Lewis says in Guide V, "mixed effect modeling ... is especially
useful when there are only a few pharmacokinetic measurements from each
individual sampled in the population, or when the data collection design
varies considerably between these individuals."

-- Alison Boeckmann

On Tue, Jan 5, 2016, at 06:03 PM, Zheng Liu wrote:
> Dear all,
>
> I recently have a data set for pk parameters fitting. The issue is
> some patients have far more measurement points than others (i.e. a few
> patients have ~15 points, other patients have only 1 or 2). I
> speculate in the fitted parameters, those patients with
> many points would contribute much more than those with less points.
> Then the population "average" values of fitted pk parameters are not
> anymore average from all the patients, but more biased to those
> patients with many points. This is not what I expect.
>
> Of course I could take away some points from the patients with many
> points, in order to be comparable to less-points patients.  Then I
> will be forced to lose some information from the data set. I just
> wonder are there anyone who have better proposal to solve
> this problem? I appreciate your help very much!
>
> Best regards,
>
> Zheng


Received on Fri Jan 22 2016 - 14:36:50 EST

The NONMEM Users Network is maintained by ICON plc.