NONMEM Users Network Archive

Hosted by Cognigen

RE: [NMusers] Usage of parallel nonmem in tandem with SGE

From: Faelens, Ruben (Belgium)
Date: Fri, 29 Apr 2016 14:58:56 +0200

Hi Paul,

How are you launching jobs?
Are you launching nmfe using qsub -pe myqueue 4 ./nmfe-launch.sh?

I may be mistaken, but I assume that SGE should automatically handle this sort of thing, no?
If you specified the correct number of slots in the cluster queue, only that number of cores should be in use at any one time. Any other jobs will wait until enough slots become available.

See the slots attribute in sge_pe(5): http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_pe.html
See also the -pe switch for qsub: http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html
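For reference, a PE definition that caps concurrency might look roughly like this, as shown by qconf -sp (a minimal sketch; the PE name and slot count are placeholders, not recommendations):

    pe_name            mpi
    slots              32
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $fill_up
    control_slaves     TRUE
    job_is_first_task  FALSE

With slots capped at 32, SGE will hold any job whose -pe request cannot currently be satisfied.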

Of course, maybe your grid was configured with unlimited slots. That can make sense for jobs that are not CPU-bound, where you want every CPU to multitask 2 or 3 jobs so that it stays busy while individual jobs wait on I/O.
Maybe the environment for mpirun is not correctly set when it is actually called from the nmfe script? In that case, I suggest you play around with the command line in the PNM file (a sketch follows below).
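For orientation, a PNM (parafile) looks roughly like this. This is loosely based on the mpilinux example parafiles shipped with NONMEM; check your own installation for the exact syntax, as it varies by version. The mpirun lines under $COMMANDS are where you can experiment:

    $GENERAL
    NODES=4 PARSE_TYPE=2 TIMEOUTI=600 TIMEOUT=10000 PARAPRINT=0
    $COMMANDS
    1: mpirun -wdir "$PWD" -n 1 ./nonmem $*
    2-[nodes]: -wdir "$PWD/worker{#-1}" -n 1 ./nonmem
    $DIRECTORIES
    1:NONE
    2-[nodes]:worker{#-1}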
In my tests, I actually called my own ‘mpirun.sh’, which printed the full environment and the exact mpirun call, then waited for user input. That way I could retry the mpirun call by hand and check that everything was okay.
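A minimal sketch of such a wrapper (the mpirun path is an assumption; point it at whatever the PNM file would otherwise call):

    #!/bin/bash
    # Debug stand-in for mpirun: dump the environment and the exact
    # call, then pause so the same call can be retried by hand.
    env | sort > /tmp/mpirun-env.log
    echo "mpirun $*" | tee -a /tmp/mpirun-env.log
    read -p "Press Enter to run the real mpirun (Ctrl-C to abort)... "
    exec /usr/bin/mpirun "$@"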

From my experience with SLURM and NONMEM:

· It is best to assign only one job per (virtual, hyper-threaded) CPU. Hyper-threading ensures every physical core can always choose between two processes, so a single I/O-bound process does not hurt cluster throughput.

· The above strategy requires some ‘user education’, since users may feel unfairly treated when their job gets stuck at the end of a long queue.

· Some cluster schedulers can mitigate this, e.g. by decaying accumulated usage (PriorityDecayHalfLife in SLURM) or by pre-empting low-priority jobs in favour of high-priority ones (see the slurm.conf sketch after this list).

· Take care when combining PsN and NONMEM grid functionality. PsN might reserve your full cluster, leaving you unable to start any MPI runs from NONMEM. This will essentially deadlock the whole thing.

· You cannot predict whether NONMEM will actually benefit from the requested cores. A user might specify 8 cores and wait 4h until they become available on the cluster, only for NONMEM to decide it benefits from just 2! The rest stay occupied (sometimes even busy-waiting in MPI). There is no good solution for this, apart from sensible defaults and user education…
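For the scheduler-side mitigation mentioned above, a minimal slurm.conf sketch (fragments only; the values are illustrative, not recommendations):

    PriorityType=priority/multifactor
    PriorityDecayHalfLife=7-0        # halve accumulated usage every 7 days
    PreemptType=preempt/partition_prio
    PreemptMode=SUSPEND,GANG         # high-priority partitions suspend low-priority jobs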

As a side question:
Does anyone know if PsN can automatically reserve the right number of cores with SLURM (based on the -nodes parameter)?
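If it cannot, I imagine the manual equivalent would be along these lines (a sketch, assuming an 8-node parafile and a launch script like the nmfe-launch.sh above):

    # Reserve 8 tasks up front, matching NODES=8 in the parafile,
    # so the MPI workers NONMEM starts stay within the allocation.
    sbatch --ntasks=8 --cpus-per-task=1 ./nmfe-launch.sh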

Kind regards,
Ruben

Received on Fri Apr 29 2016 - 08:58:56 EDT

The NONMEM Users Network is maintained by ICON plc. Requests to subscribe to the network should be sent to: nmusers-request@iconplc.com. Once subscribed, you may contribute to the discussion by emailing: nmusers@globomaxnm.com.