Re: [NMusers] Usage of parallel nonmem in tandem with SGE
I would think you can just use the parallel environments flag to reserve
(by_node) the required slots and execute once they are available.
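A rough sketch of what that submission could look like, written as a small Python helper that only builds the command line. The PE name "by_node" is assumed to be whatever parallel environment your site has configured, "run_nm.sh" is a hypothetical job script, and "-R y" only has an effect if the scheduler is set up to allow reservations. The second function assumes the usual $PE_HOSTFILE layout (host name, then slot count, at the start of each line), which is worth checking on your own cluster before building a parafile from it.

```python
def build_pe_qsub(pe_name, slots, job_script, reserve=True):
    """Build a qsub argument list that requests `slots` slots in a parallel environment."""
    cmd = ["qsub", "-pe", pe_name, str(slots)]
    if reserve:
        # Ask the scheduler to reserve slots as they free up (site must allow reservations).
        cmd += ["-R", "y"]
    cmd.append(job_script)
    return cmd


def parse_pe_hostfile(text):
    """Map host name -> granted slots, assuming 'host slots ...' on each line."""
    hosts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            hosts[fields[0]] = hosts.get(fields[0], 0) + int(fields[1])
    return hosts
```

Inside the running job, the contents of the file named by $PE_HOSTFILE could be passed to parse_pe_hostfile to see which nodes were actually granted; turning that into a NONMEM parafile is left out here because it depends on your template.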
Either way, though, this can be pretty inefficient, as you're going to be
'wasting' compute resources while waiting for the additional nodes to become
available.
It's still not an ideal solution though IMO, given that one long process could
cause many cores to sit waiting until that one finishes.
E.g., you need 4 cores and 3 are available, but the 1 additional core is taken
up by a job that will run for another 20 minutes. I don't know how much of a
concern that would be. As a "low effort" workaround, personally I'd
probably reserve N-X slots, then have the parallel job submit "too early"
and let the OS handle the oversubscription of threads until the last couple
of jobs finish. You could also do some basic introspection and
check what types of jobs are already submitted: an ongoing sse/scm run vs.
simple execute statements could justify markedly different ways
to proceed.
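To make the N-X idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the function names are made up, it assumes job names on your cluster contain the PsN tool name (sse, scm, execute), and the shortfall values are arbitrary defaults you would tune.

```python
# Job-name fragments taken to mean "this will hold its cores for a long time".
# Plain substring matching is crude and can produce false positives.
LONG_RUNNING_HINTS = ("sse", "scm")


def shortfall_to_tolerate(job_names, short=1, long=0):
    """Pick X: tolerate a shortfall only if the competing jobs look short-lived."""
    for name in job_names:
        if any(hint in name.lower() for hint in LONG_RUNNING_HINTS):
            return long
    return short


def slots_to_reserve(needed, tolerated_shortfall):
    """Reserve N-X slots, but never fewer than one."""
    return max(1, needed - tolerated_shortfall)
```

With 4 cores needed and only simple execute jobs in the queue, this reserves 3 slots and lets the OS absorb the extra process for a while; with an sse or scm run in the queue it waits for all 4.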
No silver bullet out there that I know of, at least not without more
orchestration effort and training for the people using the cluster, especially
for small/resource-constrained clusters.
On Thu, Apr 28, 2016 at 12:23 PM Paul Jewell [Rudraya] <pjewell_at_rudraya.com> wrote:
> Hello All,
> I have used nonmem for a while in an environment which uses sun grid
> engine and mpich2. Currently the two services do not interact. Batch jobs
> go to the grid engine, and parallel jobs run across all nodes using the mpi
> daemon. This is usually not an issue, but during times of heavy user
> activity, the number of nonmem processes running on each compute node can
> exceed the total number of cores, causing inefficiency. I am looking for a
> method of running a parallel job such that it waits for the required number
> of slots / cores to be available and clear of gridengine jobs before
> running (and no new gridengine jobs are submitted after it until it is
> finished). I have seen that gridengine supports parallel queues but a method
> of interfacing this with nonmem's parafile specification is not
> immediately apparent. I wanted to check if there is any possibility of
> using nonmem/sge in this way before writing a wrapper bash script that does
> something like this:
> -submit N number of shell scripts which sleep forever
> -poll the grid engine until N number of shell scripts are seen running in
> the queue
> -begin the parallel run
> -when done, qdel all of the shell scripts
> This solution would have some problems with evenly using cores, and would
> require a lot of manual code writing, so I was looking for a better
> solution first. Please advise if you have heard of any solution.
> Thank you.
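For reference, the placeholder wrapper outlined in the quoted message could be sketched roughly as below. This is only an illustration under several assumptions: the function names are hypothetical, "sleep infinity" assumes GNU coreutils on the compute nodes, the parsing assumes the job ID is the first column of the qstat listing, and nothing here steers the placeholders onto the nodes named in the parafile.

```python
import subprocess
import time


def submit_placeholder(name):
    """Submit a job that just sleeps; -terse makes qsub print only the job ID."""
    out = subprocess.run(
        ["qsub", "-terse", "-b", "y", "-N", name, "sleep", "infinity"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def running_ids(job_ids):
    """Return the subset of job_ids that qstat lists in the running state."""
    out = subprocess.run(
        ["qstat", "-s", "r"], check=True, capture_output=True, text=True
    ).stdout
    first_columns = {line.split()[0] for line in out.splitlines() if line.split()}
    return set(job_ids) & first_columns


def delete_jobs(job_ids):
    """Remove the placeholder jobs from the grid engine."""
    subprocess.run(["qdel", *job_ids], check=True)


def hold_slots_and_run(n, run_parallel, submit=submit_placeholder,
                       running=running_ids, delete=delete_jobs,
                       poll_seconds=30, sleep=time.sleep):
    """Occupy n slots with sleepers, run the parallel job, then free the slots."""
    ids = [submit(f"hold_{i}") for i in range(n)]
    try:
        # Poll until every placeholder is running, i.e. n slots are held.
        while len(running(ids)) < n:
            sleep(poll_seconds)
        return run_parallel()
    finally:
        # Always clean up, even if the parallel run fails.
        delete(ids)
```

Here run_parallel would be any callable that launches the NONMEM run with its parafile and blocks until it finishes. The grid engine calls are passed in as parameters so the polling logic can be tried out without a cluster.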
Received on Thu Apr 28 2016 - 13:03:33 EDT