Safe Haskell | None |
---|---|
Language | Haskell2010 |
Hyperion.WorkerCpuPool
Synopsis
- newtype NumCPUs = NumCPUs Int
- data WorkerCpuPool = WorkerCpuPool { cpuMap :: TVar (Map WorkerAddr NumCPUs) }
- newWorkerCpuPool :: Map WorkerAddr NumCPUs -> IO WorkerCpuPool
- getAddrs :: WorkerCpuPool -> IO [WorkerAddr]
- data WorkerAddr
- getSlurmAddrs :: IO [WorkerAddr]
- newJobPool :: [WorkerAddr] -> IO WorkerCpuPool
- withWorkerAddr :: (MonadIO m, MonadMask m) => WorkerCpuPool -> NumCPUs -> (WorkerAddr -> m a) -> m a
- data SSHError = SSHError String (ExitCode, String, String)
- type SSHCommand = Maybe (String, [String])
- sshRunCmd :: String -> SSHCommand -> (String, [String]) -> IO ()
General comments
This module defines WorkerCpuPool, a datatype that provides a mechanism for hyperion to manage the resources allocated to it by SLURM. The only resources managed are the CPUs on the allocated nodes. This module assumes that the same number of CPUs has been allocated on every node allocated to the job.
A WorkerCpuPool is essentially a TVar containing a Map from node addresses to the number of CPUs available on each node. An address can refer to a remote node or to the local node on which the WorkerCpuPool is hosted.
The most important function defined in this module is withWorkerAddr, which allocates the requested number of CPUs from the pool on a single node and runs a user function with the address of that node. The allocation mechanism is very simple: it allocates CPUs on the worker that has the most idle CPUs.
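The allocation scheme described above can be sketched with STM. This is a minimal illustration, not the module's actual implementation: Addr, Cpus, Pool, pickWorker, acquire, release, withWorker and idleCpus are hypothetical names, node addresses are plain Strings, and the sketch is specialised to IO with bracket rather than the more general MonadIO/MonadMask constraints of withWorkerAddr.

```haskell
import Control.Concurrent.STM
  (STM, TVar, atomically, modifyTVar', newTVarIO, readTVar, readTVarIO, retry)
import Control.Exception (bracket)
import Data.List (maximumBy)
import qualified Data.Map.Strict as Map
import Data.Ord (comparing)

-- Hypothetical stand-ins for WorkerAddr and NumCPUs.
type Addr = String
type Cpus = Int

-- Hypothetical stand-in for WorkerCpuPool: idle CPUs per node.
newtype Pool = Pool (TVar (Map.Map Addr Cpus))

newPool :: Map.Map Addr Cpus -> IO Pool
newPool m = Pool <$> newTVarIO m

-- Pure selection step: the node with the most idle CPUs,
-- provided it has at least n of them.
pickWorker :: Cpus -> Map.Map Addr Cpus -> Maybe Addr
pickWorker n m
  | Map.null m = Nothing
  | idle >= n  = Just addr
  | otherwise  = Nothing
  where
    (addr, idle) = maximumBy (comparing snd) (Map.toList m)

-- Reserve n CPUs on the chosen node. When no node has enough idle
-- CPUs, retry blocks the transaction until the map changes.
acquire :: Pool -> Cpus -> STM Addr
acquire (Pool var) n = do
  m <- readTVar var
  case pickWorker n m of
    Nothing   -> retry
    Just addr -> do
      modifyTVar' var (Map.adjust (subtract n) addr)
      pure addr

-- Return n CPUs to the given node.
release :: Pool -> Cpus -> Addr -> STM ()
release (Pool var) n addr = modifyTVar' var (Map.adjust (+ n) addr)

-- Run an action with a reserved node; bracket returns the CPUs to
-- the pool even if the action throws.
withWorker :: Pool -> Cpus -> (Addr -> IO a) -> IO a
withWorker pool n =
  bracket (atomically (acquire pool n)) (atomically . release pool n)

-- Snapshot of the idle CPU counts, for inspection.
idleCpus :: Pool -> IO (Map.Map Addr Cpus)
idleCpus (Pool var) = readTVarIO var
```

Keeping the selection step pure makes the policy easy to test on its own, while retry supplies the blocking behaviour without explicit locks or condition variables.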
We also provide sshRunCmd for running commands on the nodes via ssh.
WorkerCpuPool documentation
A newtype for the number of available CPUs
data WorkerCpuPool Source #
The WorkerCpuPool type, containing a map of available CPU resources.
Constructors
WorkerCpuPool | |
Fields
cpuMap :: TVar (Map WorkerAddr NumCPUs) |
newWorkerCpuPool :: Map WorkerAddr NumCPUs -> IO WorkerCpuPool Source #
newWorkerCpuPool creates a new WorkerCpuPool from a Map.
getAddrs :: WorkerCpuPool -> IO [WorkerAddr] Source #
Gets a list of all WorkerAddr values registered in the WorkerCpuPool.
data WorkerAddr Source #
A WorkerAddr represents a node address. It can be a remote node or the local node.
Constructors
LocalHost String | |
RemoteAddr String |
Instances
Eq WorkerAddr Source # | |
Defined in Hyperion.WorkerCpuPool | |
Ord WorkerAddr Source # | |
Defined in Hyperion.WorkerCpuPool Methods compare :: WorkerAddr -> WorkerAddr -> Ordering # (<) :: WorkerAddr -> WorkerAddr -> Bool # (<=) :: WorkerAddr -> WorkerAddr -> Bool # (>) :: WorkerAddr -> WorkerAddr -> Bool # (>=) :: WorkerAddr -> WorkerAddr -> Bool # max :: WorkerAddr -> WorkerAddr -> WorkerAddr # min :: WorkerAddr -> WorkerAddr -> WorkerAddr # | |
Show WorkerAddr Source # | |
Defined in Hyperion.WorkerCpuPool Methods showsPrec :: Int -> WorkerAddr -> ShowS # show :: WorkerAddr -> String # showList :: [WorkerAddr] -> ShowS # |
getSlurmAddrs :: IO [WorkerAddr] Source #
Reads the system environment to obtain the list of nodes allocated to the job. If the local node is in the list, it is recorded too, as LocalHost.
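The tagging step can be sketched as a pure function. This is only an illustration under stated assumptions: Addr is a hypothetical mirror of WorkerAddr's two constructors, tagAddrs is a hypothetical name, and both the local host name and the node list are passed in as arguments, because the way the real function reads them from the environment is not described here.

```haskell
-- Hypothetical mirror of WorkerAddr's two constructors.
data Addr = LocalHost String | RemoteAddr String
  deriving (Eq, Ord, Show)

-- Tag each allocated node: the node we are running on becomes
-- LocalHost, every other node becomes RemoteAddr.
tagAddrs :: String -> [String] -> [Addr]
tagAddrs localName = map tag
  where
    tag node
      | node == localName = LocalHost node
      | otherwise         = RemoteAddr node
```

For instance, tagAddrs "n1" ["n1", "n2"] marks n1 as local and n2 as remote.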
newJobPool :: [WorkerAddr] -> IO WorkerCpuPool Source #
Reads the system environment to determine the number of CPUs available on each node (the same number on each node), and creates a new WorkerCpuPool for the [WorkerAddr], assuming that all CPUs are available.
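Building the initial map is straightforward once the per-node CPU count is known. The sketch below assumes that count is supplied as an argument (the environment lookup is left out), uses plain Strings for addresses, and uses the hypothetical name uniformPool.

```haskell
import qualified Data.Map.Strict as Map

-- Every node starts with the same number of idle CPUs.
-- cpusPerNode is assumed to have been read from the environment already.
uniformPool :: Int -> [String] -> Map.Map String Int
uniformPool cpusPerNode addrs =
  Map.fromList [(addr, cpusPerNode) | addr <- addrs]
```

A map of this shape is what newWorkerCpuPool expects, with WorkerAddr keys and NumCPUs values in place of String and Int.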
withWorkerAddr :: (MonadIO m, MonadMask m) => WorkerCpuPool -> NumCPUs -> (WorkerAddr -> m a) -> m a Source #
Finds the worker with the most available CPUs and runs the given routine with the address of that worker. Blocks if the number of available CPUs is less than the number requested.
sshRunCmd documentation
Instances
Show SSHError Source # | |
Exception SSHError Source # | |
Defined in Hyperion.WorkerCpuPool Methods toException :: SSHError -> SomeException # fromException :: SomeException -> Maybe SSHError # displayException :: SSHError -> String # |
type SSHCommand = Maybe (String, [String]) Source #
The type for the command used to run ssh. If it is a Just value, then the first String gives the name of the ssh executable, e.g. "ssh", and the list of Strings gives the options to pass to ssh. For example, with SSHCommand given by Just ("XX", ["-a", "-b"]), ssh is run as
XX -a -b <addr> <command>
where <addr> is the remote address and <command> is the command we need to run there.
The value Nothing is equivalent to using
ssh -f -o "UserKnownHostsFile /dev/null" <addr> <command>
We need the -o "..." option to deal with host key verification failures. We use -f to force ssh to go to the background before executing the sh call. This allows for a faster return from readCreateProcessWithExitCode.
Note that "UserKnownHostsFile /dev/null" doesn't seem to work on Helios. Using "StrictHostKeyChecking=no" instead seems to work.
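The way an SSHCommand turns into a command line can be sketched as follows. The helper name sshArgv is hypothetical, and passing the remote command as a single final argument is an assumption made for illustration; only the executable and option handling follows the description above.

```haskell
type SSHCommand = Maybe (String, [String])

-- Resolve the executable and the full argument list for one ssh call.
-- Nothing falls back to the default described above.
sshArgv :: SSHCommand -> String -> String -> (String, [String])
sshArgv cmd addr remoteCmd = (exe, opts ++ [addr, remoteCmd])
  where
    (exe, opts) = case cmd of
      Just custom -> custom
      Nothing     -> ("ssh", ["-f", "-o", "UserKnownHostsFile /dev/null"])
```

On a cluster where the default option does not work, a value such as Just ("ssh", ["-f", "-o", "StrictHostKeyChecking=no"]) would select the alternative mentioned in the note.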
sshRunCmd :: String -> SSHCommand -> (String, [String]) -> IO () Source #
Runs a given command, with the given arguments, on a remote host (whose address is given by the first String) via ssh using the SSHCommand. Makes at most 10 attempts via retryRepeated. If it fails, propagates SSHError to the caller.
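A bounded retry loop of this kind can be sketched as below. This is not retryRepeated itself, whose signature and behaviour (delays between attempts, exception handling) are not shown here; retryUpTo is a hypothetical helper that reports failure as a Left value and passes the attempt number to the action.

```haskell
-- Run the action up to maxAttempts times, stopping at the first Right.
-- The action receives the 1-based attempt number. After the final
-- attempt, the last Left is returned to the caller.
retryUpTo :: Int -> (Int -> IO (Either e a)) -> IO (Either e a)
retryUpTo maxAttempts action = go 1
  where
    go i = do
      result <- action i
      case result of
        Left _ | i < maxAttempts -> go (i + 1)
        _                        -> pure result
```

With maxAttempts set to 10, an action that always fails is run exactly ten times before its last error is returned.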
ssh needs to be able to authenticate on the remote node without a password. In practice you will probably need to set up public key authentication.
ssh is invoked to run sh, which calls nohup to run the supplied command in the background.