From: Bob Leary <*Bob.Leary*>

Date: Wed, 30 Jul 2014 12:28:00 -0400

Paolo,

Just to confuse matters a little further, it should be borne in mind that the function that is optimized to get the ‘optimal’ ETA value is the joint likelihood, while the individual contribution to the overall OBJ function is based on the marginal likelihood. The marginal likelihood integrates the joint likelihood over eta space, and in the FOCE approximation, the marginal likelihood is a function of both the maximum value of the objective function used to find eta and the FOCE approximation to the Hessian at this optimal eta. Thus it is perfectly possible that if you run the eta optimization from two different starting points and get different results ETA1 and ETA2, with ETA2 having a better joint likelihood objective function than ETA1, ETA1 may still have the better overall FOCE marginal likelihood. Also, in this case, where there are apparently multiple optima in the joint likelihood function, the FOCE approximation itself is extremely dubious. You might want to try one of the EM methods here.
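
The point about ETA1 and ETA2 can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the one-dimensional toy function `neg_log_joint` is not a real PK model, the helper names are hypothetical, and a plain Laplace approximation with a finite-difference Hessian stands in for the FOCE approximation. It is not NONMEM's implementation. The toy joint likelihood has a tall, narrow mode and a lower, broad mode; the narrow mode wins on the joint likelihood, while the broad mode wins once the curvature term is included.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def neg_log_joint(eta):
    """Toy -log p(y, eta) with two modes (hypothetical, not a real model).

    A tall, narrow mode sits near eta = 4 and a lower, broad mode near eta = 0.
    """
    narrow = 0.1 * math.exp(-0.5 * ((eta - 4.0) / 0.05) ** 2) / (0.05 * SQRT_2PI)
    broad = 0.9 * math.exp(-0.5 * eta ** 2) / SQRT_2PI
    return -math.log(narrow + broad)


def local_min(f, lo, hi, tol=1e-9):
    """Golden-section search for a local minimum of f inside [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def laplace_minus2_log_marginal(f, eta_hat):
    """-2 log of the Laplace approximation to the integral of exp(-f(eta))."""
    hess = second_derivative(f, eta_hat)
    return 2.0 * f(eta_hat) + math.log(hess) - math.log(2.0 * math.pi)


# Two "starting regions" that lead to two different local optima.
eta1 = local_min(neg_log_joint, -0.5, 0.5)   # broad mode
eta2 = local_min(neg_log_joint, 3.9, 4.1)    # narrow mode

for name, eta in (("ETA1", eta1), ("ETA2", eta2)):
    joint = 2.0 * neg_log_joint(eta)
    marginal = laplace_minus2_log_marginal(neg_log_joint, eta)
    print(f"{name}: eta={eta:.3f}  -2log(joint)={joint:.3f}  "
          f"-2log(marginal, Laplace)={marginal:.3f}")
```

In this sketch ETA2 has the lower joint objective, but the large curvature at the narrow mode adds a log-Hessian penalty, so ETA1 ends up with the lower approximate marginal objective.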

From: owner-nmusers

Sent: Wednesday, July 30, 2014 11:51 AM

To: Paolo Denti; nmusers

Subject: RE: [NMusers] Funny behaviour with MCETA>1 and parallel computation

Paolo:

This may not be a matter of MCETA. It might have something to do with that particular individual’s data plus your model: is there some unusual evaluation that can accidentally occur, causing the optimization for that subject to fail? Would you mind sharing your control stream file and data set with me, so that I can give it a try?

Robert J. Bauer, Ph.D.

Vice President, Pharmacometrics, R&D

ICON Development Solutions

7740 Milestone Parkway

Suite 150

Hanover, MD 21076

Tel: (215) 616-6428

Mob: (925) 286-0769

Email: Robert.Bauer

Web: www.iconplc.com<http://www.iconplc.com/>

From: Paolo Denti [mailto:paolo.denti

Sent: Wednesday, July 30, 2014 3:54 AM

To: Bauer, Robert; nmusers

Subject: Re: [NMusers] Funny behaviour with MCETA>1 and parallel computation

Hi Bob,

thanks for the prompt response and suggestions.

I have tried to implement with RANMETHOD=P and increasing MCETA to 1000, but unfortunately without success.

Even using MCETA=1000, the result is the same: when I use the parallel computation feature, it performs worse than using MCETA=0. With one processor, things work as expected.

Maybe I did not explain the situation well. The issue is not that the optimisation takes a slightly different path and reaches a different minimum; that would not surprise me, as I know that different rounding and other random factors can influence that.

The problem here is that when using parallel computation, even on the first iteration (using MAXEVAL=0), MCETA>1 gives a worse OFV than MCETA=0. This still does not make any sense to me, irrespective of random number generators, numerical approximation, etc. My understanding is that, in each individual, NONMEM will try 0 and other initial estimates to find the optimal ETAs, and then it will choose the solution giving the lowest individual OFV. Even if this is done with different seeds and on different CPUs, they will all try 0, so whatever MCETA=0 gives should be an upper bound for the OFV for that individual. Then all these individual OFVs are summed together to find the total OFV. Since NONMEM is trying 0 in each subject, plus other random values that may vary, it should at least be able to use those results. In each individual it can only do better by trying extra values, and if all the individual OFVs are lower than or at worst the same as the ones provided by MCETA=0, then the total can only be better.
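
The upper-bound argument above can be written down as a minimal sketch. It rests on assumptions that may not match what NONMEM does internally: the toy `objective`, the derivative-free `local_search`, and the `multi_start` wrapper are all hypothetical, and the sketch assumes that the candidate solutions are ranked on exactly the same quantity that is later reported as the individual OFV.

```python
import random


def objective(eta):
    """Hypothetical individual objective with two local minima (near -0.7 and 1.3)."""
    u = eta - 0.3
    return (u * u - 1.0) ** 2 - 0.3 * eta


def local_search(f, start, step=0.1, tol=1e-8):
    """Derivative-free descent: move to a better neighbour, else halve the step."""
    x, fx = start, f(start)
    while step > tol:
        best_x, best_f = x, fx
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < best_f:
                best_x, best_f = cand, fc
        if best_x == x:
            step *= 0.5          # no improvement at this scale, refine
        else:
            x, fx = best_x, best_f
    return x, fx


def multi_start(f, n_extra, seed=1):
    """Optimise from eta = 0 plus n_extra random starts; keep the lowest objective."""
    rng = random.Random(seed)
    starts = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(n_extra)]
    return min((local_search(f, s) for s in starts), key=lambda r: r[1])


eta_single, ofv_single = local_search(objective, 0.0)
eta_multi, ofv_multi = multi_start(objective, n_extra=20)

print(f"start at 0 only : eta={eta_single:.3f}  objective={ofv_single:.4f}")
print(f"0 + 20 randoms  : eta={eta_multi:.3f}  objective={ofv_multi:.4f}")

# Because 0 is always among the starts, the multi-start result cannot be worse.
assert ofv_multi <= ofv_single
```

The bound holds here only because the quantity used to pick the winner is the same quantity that is compared afterwards. If the candidates were ranked on one function and the reported value were computed from another, the inequality would no longer be guaranteed.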

Am I missing something? Is NONMEM maybe minimising only some other individual likelihood in the MAP step, and that does not coincide with the individual OFV?

To better understand, I have looked at the values of individual OFVs in the two runs (MCETA=1000 and MCETA=0, both with parallel computation), and all the tables are exactly the same cell by cell (to the 5th digit or so), except for the records of that outlying individual. In spite of trying 1000 initial estimates (including 0), MCETA=1000 gets the individual ETA for that subject wrong and gives a worse iOFV than MCETA=0. And it's not a matter of numerical noise: the individual OFV is 100 points worse and the individual parameters are very different.

MCETA=0 is a sub-case of MCETA>1, so MCETA>1 should not do any worse. I have tried and re-tried, and it is unlikely that the estimator was unlucky every time with that subject, even trying 1000 initial estimates.

I don't know how to explain this. But maybe I am misunderstanding how this works?

Any explanation?

Thank you,

Paolo


Received on Wed Jul 30 2014 - 12:28:00 EDT
