| Safe Haskell | None |
|---|---|
| Language | Haskell2010 |
Hyperion.WorkerCpuPool
Synopsis
- newtype NumCPUs = NumCPUs Int
- data WorkerCpuPool = WorkerCpuPool { cpuMap :: TVar (Map WorkerAddr NumCPUs) }
- newWorkerCpuPool :: Map WorkerAddr NumCPUs -> IO WorkerCpuPool
- getAddrs :: WorkerCpuPool -> IO [WorkerAddr]
- data WorkerAddr
- getSlurmAddrs :: IO [WorkerAddr]
- newJobPool :: [WorkerAddr] -> IO WorkerCpuPool
- withWorkerAddr :: (MonadIO m, MonadMask m) => WorkerCpuPool -> NumCPUs -> (WorkerAddr -> m a) -> m a
- data SSHError = SSHError String (ExitCode, String, String)
- type SSHCommand = Maybe (String, [String])
- sshRunCmd :: String -> SSHCommand -> (String, [String]) -> IO ()
General comments
This module defines WorkerCpuPool, a datatype that provides a mechanism
for hyperion to manage the resources allocated to it by SLURM. The only
resources managed are the CPUs on the allocated nodes. This module
assumes that the same number of CPUs has been allocated on each of the
nodes allocated to the job.
A WorkerCpuPool is essentially a TVar containing a Map from node addresses
to the number of CPUs available on each node. An address can refer to a
remote node or to the local node on which the WorkerCpuPool is hosted.
The most important function defined in this module is withWorkerAddr, which
allocates the requested number of CPUs from the pool on a single node and
runs a user function with the address of that node. The allocation mechanism
is very simple: it allocates CPUs on the worker that has the most idle CPUs.
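The selection rule described above can be sketched as a pure function over a snapshot of the pool. This is a minimal illustration under assumptions, not hyperion's code: pickWorker is a hypothetical name, and plain Int counts stand in for NumCPUs.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- | Hypothetical helper: pick the worker with the most idle CPUs, provided
-- it has at least @needed@ of them. Returns 'Nothing' when no single worker
-- can satisfy the request (a real allocator would then block and retry).
pickWorker :: Int -> Map k Int -> Maybe k
pickWorker needed idle =
  case Map.foldrWithKey best Nothing idle of
    Just (k, n) | n >= needed -> Just k
    _ -> Nothing
  where
    -- Keep whichever (worker, idle count) pair has the larger count.
    best k n Nothing = Just (k, n)
    best k n acc@(Just (_, m))
      | n > m = Just (k, n)
      | otherwise = acc
```

Only the single best worker is checked: if it lacks enough CPUs, no other worker can have enough either.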
We also provide sshRunCmd for running commands on the nodes via ssh.
WorkerCpuPool documentation
newtype NumCPUs Source #
A newtype for the number of available CPUs.
Constructors
| NumCPUs Int | |
data WorkerCpuPool Source #
The WorkerCpuPool type, containing a map of available CPU resources.
Constructors
| WorkerCpuPool | |
Fields
| cpuMap :: TVar (Map WorkerAddr NumCPUs) | |
newWorkerCpuPool :: Map WorkerAddr NumCPUs -> IO WorkerCpuPool Source #
newWorkerCpuPool creates a new WorkerCpuPool from a Map.
getAddrs :: WorkerCpuPool -> IO [WorkerAddr] Source #
Gets a list of all WorkerAddrs registered in the WorkerCpuPool.
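A minimal sketch of these two functions, assuming the representation given in the synopsis (a TVar holding a Map). The type and constructor names follow the synopsis, but the deriving clause on NumCPUs is an assumption and the function bodies are illustrative, not the library's source.

```haskell
import Control.Concurrent.STM (TVar, newTVarIO, readTVarIO)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Types as listed in the synopsis; the NumCPUs instances are assumed.
newtype NumCPUs = NumCPUs Int
  deriving (Eq, Ord, Show)

data WorkerAddr = LocalHost String | RemoteAddr String
  deriving (Eq, Ord, Show)

data WorkerCpuPool = WorkerCpuPool { cpuMap :: TVar (Map WorkerAddr NumCPUs) }

-- | Sketch: wrap the initial map in a fresh TVar.
newWorkerCpuPool :: Map WorkerAddr NumCPUs -> IO WorkerCpuPool
newWorkerCpuPool m = WorkerCpuPool <$> newTVarIO m

-- | Sketch: the registered addresses are the keys of the map.
getAddrs :: WorkerCpuPool -> IO [WorkerAddr]
getAddrs pool = Map.keys <$> readTVarIO (cpuMap pool)
```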
data WorkerAddr Source #
A WorkerAddr represents a node address, which can be a remote node or the local node.
Constructors
| LocalHost String | |
| RemoteAddr String | |
Instances
| Eq WorkerAddr Source # | |
Defined in Hyperion.WorkerCpuPool | |
| Ord WorkerAddr Source # | |
Defined in Hyperion.WorkerCpuPool. Methods: compare :: WorkerAddr -> WorkerAddr -> Ordering; (<), (<=), (>), (>=) :: WorkerAddr -> WorkerAddr -> Bool; max, min :: WorkerAddr -> WorkerAddr -> WorkerAddr | |
| Show WorkerAddr Source # | |
Defined in Hyperion.WorkerCpuPool. Methods: showsPrec :: Int -> WorkerAddr -> ShowS; show :: WorkerAddr -> String; showList :: [WorkerAddr] -> ShowS | |
getSlurmAddrs :: IO [WorkerAddr] Source #
Reads the system environment to obtain the list of nodes allocated to the job.
If the local node is in the list, it is recorded as LocalHost.
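The tagging step can be sketched as below. This is a hypothetical illustration: it assumes the node names come from expanding SLURM_JOB_NODELIST with `scontrol show hostnames`, and that the local node is recognized by exact string comparison with a host name obtained elsewhere. slurmNodeNames and classifyNodes are made-up names; in practice short and fully qualified host names may need normalizing before comparing.

```haskell
import System.Environment (getEnv)
import System.Process (readProcess)

data WorkerAddr = LocalHost String | RemoteAddr String
  deriving (Eq, Ord, Show)

-- | Assumed approach: let scontrol expand the compressed node list
-- (e.g. "node[01-03]") into one host name per line.
slurmNodeNames :: IO [String]
slurmNodeNames = do
  nodelist <- getEnv "SLURM_JOB_NODELIST"
  lines <$> readProcess "scontrol" ["show", "hostnames", nodelist] ""

-- | Hypothetical helper: mark the node matching the local host name as
-- 'LocalHost' and every other node as 'RemoteAddr'.
classifyNodes :: String -> [String] -> [WorkerAddr]
classifyNodes localName = map tag
  where
    tag node
      | node == localName = LocalHost node
      | otherwise = RemoteAddr node
```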
newJobPool :: [WorkerAddr] -> IO WorkerCpuPool Source #
Reads the system environment to determine the number of CPUs available on
each node (the same number on each node), and creates a new WorkerCpuPool
for the [WorkerAddr], assuming that all CPUs are available.
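A sketch of newJobPool, assuming the per-node CPU count is read from the SLURM_CPUS_ON_NODE environment variable; the documentation does not say which variable hyperion actually consults. uniformPool is a hypothetical helper that separates the pure map construction from the environment lookup.

```haskell
import Control.Concurrent.STM (TVar, newTVarIO)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import System.Environment (lookupEnv)
import Text.Read (readMaybe)

newtype NumCPUs = NumCPUs Int deriving (Eq, Ord, Show)

data WorkerAddr = LocalHost String | RemoteAddr String
  deriving (Eq, Ord, Show)

data WorkerCpuPool = WorkerCpuPool { cpuMap :: TVar (Map WorkerAddr NumCPUs) }

-- | Hypothetical helper: every address starts with the same CPU count.
uniformPool :: NumCPUs -> [WorkerAddr] -> Map WorkerAddr NumCPUs
uniformPool cpus addrs = Map.fromList [ (addr, cpus) | addr <- addrs ]

-- | Sketch: SLURM_CPUS_ON_NODE is an assumed source for the per-node count.
newJobPool :: [WorkerAddr] -> IO WorkerCpuPool
newJobPool addrs = do
  raw <- lookupEnv "SLURM_CPUS_ON_NODE"
  case raw >>= readMaybe of
    Just n -> WorkerCpuPool <$> newTVarIO (uniformPool (NumCPUs n) addrs)
    Nothing -> ioError (userError "SLURM_CPUS_ON_NODE is unset or not an integer")
```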
withWorkerAddr :: (MonadIO m, MonadMask m) => WorkerCpuPool -> NumCPUs -> (WorkerAddr -> m a) -> m a Source #
Finds the worker with the most available CPUs and runs the given routine with the address of that worker. Blocks if the number of available CPUs is less than the number requested.
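The blocking behaviour maps naturally onto STM's `retry`. The sketch below is specialized to IO and uses `bracket` from base in place of the MonadIO/MonadMask signature; withWorkerAddrIO is a hypothetical name, and the code illustrates the documented policy rather than reproducing hyperion's implementation.

```haskell
import Control.Concurrent.STM
import Control.Exception (bracket)
import Data.List (maximumBy)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Ord (comparing)

newtype NumCPUs = NumCPUs Int deriving (Eq, Ord, Show)

data WorkerAddr = LocalHost String | RemoteAddr String
  deriving (Eq, Ord, Show)

data WorkerCpuPool = WorkerCpuPool { cpuMap :: TVar (Map WorkerAddr NumCPUs) }

-- | Sketch: reserve CPUs on the worker with the most idle CPUs, run the
-- action with its address, then give the CPUs back.
withWorkerAddrIO :: WorkerCpuPool -> NumCPUs -> (WorkerAddr -> IO a) -> IO a
withWorkerAddrIO pool (NumCPUs n) = bracket acquire release
  where
    var = cpuMap pool
    -- 'retry' blocks the transaction until the map changes whenever no
    -- worker has at least n idle CPUs.
    acquire = atomically $ do
      m <- readTVar var
      let candidates = [ (addr, k) | (addr, NumCPUs k) <- Map.toList m, k >= n ]
      case candidates of
        [] -> retry
        _ -> do
          let (addr, k) = maximumBy (comparing snd) candidates
          writeTVar var (Map.insert addr (NumCPUs (k - n)) m)
          pure addr
    -- Return the CPUs to the pool, even if the user action throws.
    release addr = atomically $
      modifyTVar' var (Map.adjust (\(NumCPUs k) -> NumCPUs (k + n)) addr)
```

Because `acquire` retries whenever no worker has enough idle CPUs, a request larger than the CPU count of every node never succeeds in this sketch; `bracket` ensures the reserved CPUs are released on both normal and exceptional exit.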
sshRunCmd documentation
data SSHError Source #
Constructors
| SSHError String (ExitCode, String, String) | |
Instances
| Show SSHError Source # | |
| Exception SSHError Source # | |
Defined in Hyperion.WorkerCpuPool. Methods: toException :: SSHError -> SomeException; fromException :: SomeException -> Maybe SSHError; displayException :: SSHError -> String | |
type SSHCommand = Maybe (String, [String]) Source #
The type for the command used to run ssh. If it is a Just value, then
the first String gives the name of the ssh executable, e.g. "ssh", and the
list of Strings gives the options to pass to ssh. For example, with
SSHCommand given by Just ("XX", ["-a", "-b"]), ssh is run as
XX -a -b <addr> <command>
where <addr> is the remote address and <command> is the command we need
to run there.
The value of Nothing is equivalent to using
ssh -f -o "UserKnownHostsFile /dev/null" <addr> <command>
The -o "..." option is needed to deal with host key verification
failures. The -f option forces ssh to go to the background before executing
the sh call, which allows readCreateProcessWithExitCode to return faster.
Note that "UserKnownHostsFile /dev/null" does not seem to work on Helios;
using "StrictHostKeyChecking=no" instead seems to work.
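The argument list described above can be written as a small pure function. sshInvocation is a hypothetical helper; it only assembles the executable and argv, and treats the remote command as an already-built string.

```haskell
type SSHCommand = Maybe (String, [String])

-- | Hypothetical helper: the executable and arguments that would be run for
-- a given SSHCommand, remote address and remote command string. The Nothing
-- case uses the default options described above.
sshInvocation :: SSHCommand -> String -> String -> (String, [String])
sshInvocation sshCmd addr remoteCmd =
  case sshCmd of
    Just (exe, opts) -> (exe, opts ++ [addr, remoteCmd])
    Nothing -> ("ssh", ["-f", "-o", "UserKnownHostsFile /dev/null", addr, remoteCmd])
```

On a cluster where the Helios caveat applies, one would pass something like Just ("ssh", ["-f", "-o", "StrictHostKeyChecking=no"]) instead of Nothing.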
sshRunCmd :: String -> SSHCommand -> (String, [String]) -> IO () Source #
Runs a given command with the given arguments on a remote host (whose address is given
by the first String) via ssh, using the SSHCommand. Makes at most 10 attempts via
retryRepeated; if all attempts fail, the SSHError is propagated to the caller.
ssh needs to be able to authenticate on the remote
node without a password. In practice you will probably need to set up public
key authentication.
ssh is invoked to run sh that calls nohup to run the supplied command
in background.
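A sketch of the documented behaviour, under stated assumptions: the remote command is assembled as `sh -c 'nohup <cmd> <args> > /dev/null 2>&1 &'` (the output redirection is an assumption), the first SSHError field is assumed to hold the remote address, and a plain loop with a 0.1 s pause stands in for hyperion's retryRepeated, whose retry schedule is not described here. sshRunCmdSketch and shellQuote are hypothetical names.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (Exception, throwIO)
import System.Exit (ExitCode (..))
import System.Process (proc, readCreateProcessWithExitCode)

type SSHCommand = Maybe (String, [String])

data SSHError = SSHError String (ExitCode, String, String)
  deriving Show

instance Exception SSHError

-- | Quote a string for a POSIX shell using single quotes.
shellQuote :: String -> String
shellQuote s = "'" ++ concatMap esc s ++ "'"
  where
    esc '\'' = "'\\''"
    esc c = [c]

-- | Sketch: run "sh -c 'nohup cmd args ... &'" on the remote side through
-- ssh, retrying up to 10 times before throwing SSHError.
sshRunCmdSketch :: String -> SSHCommand -> (String, [String]) -> IO ()
sshRunCmdSketch addr sshCmd (cmd, args) = go (1 :: Int)
  where
    maxAttempts = 10
    inner = "nohup " ++ unwords (map shellQuote (cmd : args)) ++ " > /dev/null 2>&1 &"
    remoteCmd = "sh -c " ++ shellQuote inner
    (exe, opts) = case sshCmd of
      Just (e, os) -> (e, os ++ [addr, remoteCmd])
      Nothing -> ("ssh", ["-f", "-o", "UserKnownHostsFile /dev/null", addr, remoteCmd])
    go attempt = do
      result@(code, _, _) <- readCreateProcessWithExitCode (proc exe opts) ""
      case code of
        ExitSuccess -> pure ()
        ExitFailure _
          | attempt >= maxAttempts -> throwIO (SSHError addr result)
          | otherwise -> threadDelay 100000 >> go (attempt + 1)
```

Each argument is quoted separately so that spaces or quotes in arguments survive both the local ssh invocation and the remote shell.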