RE: [NMusers] setup of parallel processing and supporting software - help wanted

From: Bauer, Robert
Date: Tue, 8 Dec 2015 23:27:22 +0000

Regarding point 2, keep in mind that PARSE_TYPE=2 and PARSE_TYPE=4, algorithms you helped with, do empirical load balancing and improve their assessment with each iteration, so the idle time spent waiting for all processes to finish is reduced.
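
To illustrate the general idea of empirical load balancing (this is not NONMEM's actual PARSE_TYPE algorithm, whose internals are not described in this thread), here is a minimal Python sketch. It assumes each worker's throughput in the previous iteration is a fair predictor for the next one; the function name `rebalance` and its arguments are hypothetical.

```python
def rebalance(n_subjects, prev_counts, prev_times):
    """Split n_subjects across workers in proportion to measured throughput.

    prev_counts[i]: subjects worker i handled in the previous iteration (> 0)
    prev_times[i]:  seconds worker i needed for them (> 0)
    Illustrative assumption: past throughput predicts future throughput.
    """
    # Throughput of each worker in subjects per second.
    rates = [c / t for c, t in zip(prev_counts, prev_times)]
    total_rate = sum(rates)

    # Ideal (fractional) share of subjects for each worker.
    shares = [n_subjects * r / total_rate for r in rates]
    counts = [int(s) for s in shares]

    # Give the subjects lost to truncation to the largest fractional remainders.
    leftover = n_subjects - sum(counts)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Worker 0 finished its 50 subjects three times faster than worker 1,
    # so it receives three times as many subjects in the next iteration.
    print(rebalance(100, [50, 50], [10.0, 30.0]))
```

Repeating this after every iteration lets the split track the measured speeds, which is the sense in which the assessment improves over time.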

 Mark Sale
Sent: Tuesday, December 08, 2015 3:00 PM
To: Pavel Belo;
The loss of efficiency with parallel computing in NONMEM has two sources:

1. I/O time. Each process has to do its calculation, then write the results to a disk file. On a single machine, even with the MPI method, the results are written to a file; that file may or may not actually be written to disk by the operating system, depending on the file size and on whether the OS decides the file may be used again soon. The same is true of the FPI method, where the OS may decide to buffer the file and not actually write it to disk. This inefficiency grows with the number of processes, and grows substantially when you go to multiple machines, because they must send data over the network (and must actually write the data to disk, with either the MPI or the FPI method). You can actually run parallel NONMEM over a VPN but, as you might imagine, this slows it down substantially.

2. Inefficiency due to one process finishing its slice of the data before the others. The manager program must wait until the last process has finished before it can do the management work (sum the OBJ, calculate the gradient, get the next parameter values, send them out to the processes). This also gets larger with more processes. In a well-conditioned problem, where every individual takes roughly the same amount of time to calculate the OBJ for, this isn't too bad. But occasionally, with stiff ODEs, you'll find a small number of individuals who take much, much longer to solve the ODEs, and you'll find that efficiency drops substantially.
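
The effect described in point 2 can be made concrete with a small Python sketch. It assumes subjects are split into contiguous, near-equal slices and ignores I/O overhead entirely; the function `parallel_efficiency` and the timing numbers are made up for illustration and are not NONMEM measurements.

```python
def parallel_efficiency(subject_times, n_workers):
    """Return (wall_time, efficiency) for one function evaluation.

    subject_times: seconds needed to compute the OBJ for each individual
    n_workers:     number of parallel processes
    Assumes contiguous, near-equal slices and zero I/O cost (illustrative only).
    """
    n = len(subject_times)
    # Slice boundaries giving each worker roughly n / n_workers subjects.
    bounds = [i * n // n_workers for i in range(n_workers + 1)]
    loads = [sum(subject_times[bounds[i]:bounds[i + 1]])
             for i in range(n_workers)]

    # The manager waits for the slowest worker before it can continue.
    wall_time = max(loads)
    efficiency = sum(subject_times) / (n_workers * wall_time)
    return wall_time, efficiency


if __name__ == "__main__":
    uniform = [1.0] * 100                 # every individual takes 1 s
    stiff = [20.0, 20.0] + [1.0] * 98     # two individuals take 20 s each
    print(parallel_efficiency(uniform, 4))
    print(parallel_efficiency(stiff, 4))
```

With uniform subjects the four workers finish together; with two slow individuals in one slice, the other three workers sit idle for most of the evaluation and efficiency falls to roughly half.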

All that said, here are my recommendations:

Don't bother trying to parallelize a run that takes less than 10 minutes; the I/O time will cancel out any gain in execution time.

Single machine:

If the execution time for a single function evaluation (note that a run is often between 1000 and 5000 function evaluations) is less than 0.5 seconds, you probably cannot improve performance with parallel execution. Note that 1000 function evaluations at 0.5 seconds each = 500 seconds, or about 8 minutes.

Multiple machines:

Assuming a 1 Gbit network, if the execution time for a single function evaluation is > 1 second, you probably can improve performance with parallel execution.

I have personally never found a problem that benefited from more than 24 processes but, in theory, some very large problems (run time of weeks) may.
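
As a rough aid, the rules of thumb above can be written as a small Python helper. This is only one reading of the recommendations: the function name `parallel_advice`, the verdict strings and the exact cut-offs at the boundaries are assumptions, and the estimate takes run time as simply evaluations times seconds per evaluation.

```python
def parallel_advice(sec_per_eval, n_evals, multi_machine=False):
    """Return (estimated_minutes, verdict) from the rules of thumb.

    sec_per_eval:  seconds for a single function evaluation
    n_evals:       expected number of function evaluations in the run
    multi_machine: True if processes are spread over a ~1 Gbit network
    The verdict strings are informal labels, not NONMEM output.
    """
    est_minutes = sec_per_eval * n_evals / 60.0

    # Runs under about 10 minutes: I/O time cancels out any gain.
    if est_minutes < 10.0:
        return est_minutes, "not worth it"

    # Across machines, more than 1 second per evaluation is the stated
    # point where parallel execution probably pays off.
    if multi_machine and sec_per_eval > 1.0:
        return est_minutes, "probably helps"

    return est_minutes, "may help"


if __name__ == "__main__":
    # 1000 evaluations at 0.5 s each is 500 s, about 8 minutes.
    print(parallel_advice(0.5, 1000))
    print(parallel_advice(2.0, 3000, multi_machine=True))
```

The helper says nothing about how many processes to use; the remark about 24 processes suggests the gain flattens out well before very large process counts.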

 Pavel Belo
Sent: Tuesday, December 8, 2015 4:54 PM
Hello The Team,

We hear different opinions about the effectiveness of parallel processing with NONMEM, from very helpful to less helpful. It can be task dependent. How useful is it in phase 3 for basic and covariate models, as well as for boot=

We have reached a non-exploratory (production) point where popPK is on a critical path and sophisticated but slow home-made utilities may be insufficient. Are there efficient/quick companies/institutions that set up parallel processing, supporting software and, possibly, some other utilities (cloud comp=
Pavel

