| Safe Haskell | None | 
|---|---|
| Language | Haskell2010 | 
Hyperion.Cluster
Synopsis
- data ProgramInfo = ProgramInfo {}
- data ClusterEnv = ClusterEnv {}
- class HasProgramInfo a where
  - toProgramInfo :: a -> ProgramInfo
- type Cluster = ReaderT ClusterEnv Process
- data MPIJob = MPIJob {mpiNodes :: Int, mpiNTasksPerNode :: Int}
- runCluster :: ClusterEnv -> Cluster a -> IO a
- modifyJobOptions :: (SbatchOptions -> SbatchOptions) -> ClusterEnv -> ClusterEnv
- setJobOptions :: SbatchOptions -> ClusterEnv -> ClusterEnv
- setJobTime :: NominalDiffTime -> ClusterEnv -> ClusterEnv
- setJobMemory :: Text -> ClusterEnv -> ClusterEnv
- setJobType :: MPIJob -> ClusterEnv -> ClusterEnv
- setSlurmPartition :: Text -> ClusterEnv -> ClusterEnv
- setSlurmConstraint :: Text -> ClusterEnv -> ClusterEnv
- setSlurmAccount :: Text -> ClusterEnv -> ClusterEnv
- setSlurmQos :: Text -> ClusterEnv -> ClusterEnv
- defaultDBRetries :: Int
- dbConfigFromProgramInfo :: ProgramInfo -> IO DatabaseConfig
- runDBWithProgramInfo :: ProgramInfo -> ReaderT DatabaseConfig IO a -> IO a
- slurmWorkerLauncher :: Maybe Text -> FilePath -> HoldMap -> Int -> TokenPool -> SbatchOptions -> ProgramInfo -> WorkerLauncher JobId
- newWorkDir :: (Binary a, Typeable a, ToJSON a, HasProgramInfo env, HasDB env, MonadReader env m, MonadIO m, MonadCatch m) => a -> m FilePath
General comments
In this module we define the Cluster monad. It is nothing more than a
 Process with an environment ClusterEnv.
The ClusterEnv environment contains information about
- the ProgramId of the current run,
- the paths to the database and log/data directories that we should use,
- options to use when using sbatch to spawn cluster jobs,
- data equivalent to DatabaseConfig to handle the database,
- a WorkerLauncher to launch remote workers. More precisely, a function clusterWorkerLauncher that takes SbatchOptions and ProgramInfo to produce a WorkerLauncher.
A ClusterEnv may be initialized with newClusterEnv, which uses slurmWorkerLauncher to initialize clusterWorkerLauncher. In this scenario the Cluster monad operates in the following way: it performs calculations in the master process until some remote function is invoked, typically through remoteEval, at which point it uses sbatch and the current SbatchOptions to allocate a new job and then runs a single worker in that allocation.
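For concreteness, here is a minimal sketch of this mode of operation (not taken from the package): the SbatchOptions are adjusted locally with the setters documented below, and each remote call made inside the block leads to one sbatch submission. The remote call itself is left as a placeholder, since remoteEval is documented in Hyperion.Remote.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.Reader (local)
import Hyperion.Cluster

-- Run an expensive step with a 4-hour time limit and 16G of memory per job.
expensiveStep :: Cluster ()
expensiveStep =
  local (setJobTime (4 * 3600) . setJobMemory "16G") $ do
    -- Invoke a remote function here (e.g. via remoteEval from Hyperion.Remote);
    -- each such call triggers one sbatch submission with the options above.
    pure ()
```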
This has the following consequences.
- Each time Cluster runs a remote function, it will schedule a new job with SLURM. If you run a lot of small remote functions (e.g., using Hyperion.Concurrently) in the Cluster monad, it means that you will schedule a lot of small jobs with SLURM. If your cluster's scheduling prioritizes small jobs, this may be a fine mode of operation (for example, this was the case on the now-defunct Hyperion cluster at IAS). More likely, though, it will lead to your jobs pending and the computation running slowly, especially if the remote functions are not run at the same time, but new ones are run when old ones finish (for example, if you try to perform a lot of parallel binary searches). For such cases the Job monad should be used.
- One should use nodes greater than 1 if either: (1) the job runs an external program that uses MPI or something similar and can therefore access all of the resources allocated by SLURM, or (2) the remote function spawns new hyperion workers using the Job monad. If your remote function does spawn new workers, then it may make sense to use nodes greater than 1, but your remote function needs to take into account the fact that the nodes are already allocated. For example, from the Cluster monad we can run a remote computation in the Job monad, allocating it more than 1 node. The Job computation will automagically detect the nodes available to it and the number of CPUs on each node, and will create a WorkerCpuPool that manages these resources independently of SLURM. One can then run remote functions on these resources from the Job computation without having to wait for SLURM scheduling. See Hyperion.Job for details; a sketch of case (1) follows this list.
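A hedged sketch of case (1): an environment modifier that requests a two-node, sixteen-tasks-per-node allocation for subsequent remote calls. The numbers are placeholders.

```haskell
import Hyperion.Cluster

-- Request a 2-node MPI-style allocation for jobs spawned from this environment.
mpiEnv :: ClusterEnv -> ClusterEnv
mpiEnv = setJobType MPIJob { mpiNodes = 2, mpiNTasksPerNode = 16 }
```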
The common use case is that a Cluster computation is run on the login node.
 It then schedules a job with a bunch of resources via SLURM. When the job
 starts, a Job calculation runs on one of the allocated nodes. It then spawns
 Process computations on the resources available to the job, which it manages
 via WorkerCpuPool.
Besides the Cluster monad, this module defines slurmWorkerLauncher and
 some utility functions for working with ClusterEnv and ProgramInfo, along
 with a few others.
Documentation
data ProgramInfo Source #
Type containing information about our program
Constructors
| ProgramInfo | |
| Fields | |
Instances
data ClusterEnv Source #
The environment for the Cluster monad.
Constructors
| ClusterEnv | |
Instances
| HasWorkerLauncher ClusterEnv Source # | We make ClusterEnv an instance of HasWorkerLauncher. |
| Defined in Hyperion.Cluster. Methods: toWorkerLauncher :: ClusterEnv -> WorkerLauncher JobId Source # | |
| HasDB ClusterEnv Source # | |
| Defined in Hyperion.Cluster. Methods: dbConfigLens :: Lens' ClusterEnv DatabaseConfig Source # | |
| HasProgramInfo ClusterEnv Source # | |
| Defined in Hyperion.Cluster. Methods: toProgramInfo :: ClusterEnv -> ProgramInfo Source # | |
class HasProgramInfo a where Source #
Methods
toProgramInfo :: a -> ProgramInfo Source #
Instances
| HasProgramInfo ClusterEnv Source # | |
| Defined in Hyperion.Cluster. Methods: toProgramInfo :: ClusterEnv -> ProgramInfo Source # | |
| HasProgramInfo JobEnv Source # | |
| Defined in Hyperion.Job. Methods: toProgramInfo :: JobEnv -> ProgramInfo Source # | |
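A small sketch of how the class is typically used: code written against HasProgramInfo works unchanged in both the Cluster monad (env = ClusterEnv) and the Job monad (env = JobEnv).

```haskell
import Control.Monad.Reader (MonadReader, asks)
import Hyperion.Cluster (HasProgramInfo (..), ProgramInfo)

-- Retrieve the ProgramInfo from whatever environment the monad carries.
askProgramInfo :: (HasProgramInfo env, MonadReader env m) => m ProgramInfo
askProgramInfo = asks toProgramInfo
```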
type Cluster = ReaderT ClusterEnv Process Source #
The Cluster monad. It is simply Process with a ClusterEnv environment.
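A minimal sketch of entering the monad, assuming a ClusterEnv is already available (normally it is constructed by hyperion's startup code, which is not part of this module):

```haskell
import Control.Monad.IO.Class (liftIO)
import Hyperion.Cluster

helloCluster :: ClusterEnv -> IO ()
helloCluster env = runCluster env $
  liftIO (putStrLn "running inside the Cluster monad")
```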
data MPIJob Source #
Type representing resources for an MPI job.
Constructors
| MPIJob | Fields: mpiNodes :: Int, mpiNTasksPerNode :: Int |
Instances
| Eq MPIJob Source # | |
| Ord MPIJob Source # | |
| Show MPIJob Source # | |
| Generic MPIJob Source # | |
| ToJSON MPIJob Source # | |
| Defined in Hyperion.Cluster | |
| FromJSON MPIJob Source # | |
| Binary MPIJob Source # | |
| type Rep MPIJob Source # | |
| Defined in Hyperion.Cluster type Rep MPIJob = D1 ('MetaData "MPIJob" "Hyperion.Cluster" "hyperion-0.1.0.0-BChDBJtiU1m4GBpewNuAxw" 'False) (C1 ('MetaCons "MPIJob" 'PrefixI 'True) (S1 ('MetaSel ('Just "mpiNodes") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Int) :*: S1 ('MetaSel ('Just "mpiNTasksPerNode") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Int))) | |
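The derived instances above can be exercised with simple round-trips; the sketch below assumes nothing about the concrete JSON field names, only that the ToJSON/FromJSON and Binary instances are mutually consistent.

```haskell
import qualified Data.Aeson as Aeson
import qualified Data.Binary as Binary
import Hyperion.Cluster (MPIJob (..))

-- Encode to JSON and decode back (Just the original value).
jsonRoundTrip :: MPIJob -> Maybe MPIJob
jsonRoundTrip = Aeson.decode . Aeson.encode

-- Binary round-trip; the Binary instance is presumably what lets MPIJob
-- values be shipped to remote workers.
binaryRoundTrip :: MPIJob -> MPIJob
binaryRoundTrip = Binary.decode . Binary.encode
```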
runCluster :: ClusterEnv -> Cluster a -> IO a Source #
modifyJobOptions :: (SbatchOptions -> SbatchOptions) -> ClusterEnv -> ClusterEnv Source #
setJobOptions :: SbatchOptions -> ClusterEnv -> ClusterEnv Source #
setJobTime :: NominalDiffTime -> ClusterEnv -> ClusterEnv Source #
setJobMemory :: Text -> ClusterEnv -> ClusterEnv Source #
setJobType :: MPIJob -> ClusterEnv -> ClusterEnv Source #
setSlurmPartition :: Text -> ClusterEnv -> ClusterEnv Source #
setSlurmConstraint :: Text -> ClusterEnv -> ClusterEnv Source #
setSlurmAccount :: Text -> ClusterEnv -> ClusterEnv Source #
setSlurmQos :: Text -> ClusterEnv -> ClusterEnv Source #
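The setters above compose by ordinary function composition, since each one is just ClusterEnv -> ClusterEnv. A hedged sketch, in which all concrete values (partition name, constraint, time, memory) are placeholders:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Hyperion.Cluster

-- Adjust an existing ClusterEnv before handing it to runCluster.
tuneEnv :: ClusterEnv -> ClusterEnv
tuneEnv =
    setSlurmPartition "compute"   -- placeholder partition name
  . setSlurmConstraint "skylake"  -- placeholder node constraint
  . setJobTime (12 * 3600)        -- 12 hours, via NominalDiffTime's Num instance
  . setJobMemory "32G"
```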
defaultDBRetries :: Int Source #
The default number of retries to use in withConnectionRetry. Set to 20.
dbConfigFromProgramInfo :: ProgramInfo -> IO DatabaseConfig Source #
runDBWithProgramInfo :: ProgramInfo -> ReaderT DatabaseConfig IO a -> IO a Source #
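runDBWithProgramInfo presumably just combines dbConfigFromProgramInfo with runReaderT; a sketch of doing the same by hand (DatabaseConfig lives in hyperion's database module, import path assumed):

```haskell
import Control.Monad.Reader (ReaderT, runReaderT)
import Hyperion.Cluster (ProgramInfo, dbConfigFromProgramInfo)
-- DatabaseConfig is assumed to be importable from hyperion's database module.

runDBManually :: ProgramInfo -> ReaderT DatabaseConfig IO a -> IO a
runDBManually info action = do
  cfg <- dbConfigFromProgramInfo info
  runReaderT action cfg
```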
slurmWorkerLauncher Source #
Arguments
| :: Maybe Text | Email address to send notifications to if sbatch fails or there is an error in a remote job. |
| -> FilePath | Path to this hyperion executable | 
| -> HoldMap | HoldMap used by the HoldServer | 
| -> Int | Port used by the HoldServer (needed for error messages) | 
| -> TokenPool | TokenPool for throttling the number of submitted jobs | 
| -> SbatchOptions | |
| -> ProgramInfo | |
| -> WorkerLauncher JobId | 
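A hedged sketch of wiring the launcher up; the notification address, executable path, and port are placeholders, and HoldMap, TokenPool, SbatchOptions, WorkerLauncher, and JobId come from other hyperion modules (import paths assumed):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Hyperion.Cluster (ProgramInfo, slurmWorkerLauncher)

myLauncher :: HoldMap -> TokenPool -> SbatchOptions -> ProgramInfo -> WorkerLauncher JobId
myLauncher holdMap tokenPool =
  slurmWorkerLauncher
    (Just "admin@example.com")               -- placeholder notification address
    "/home/user/.local/bin/my-hyperion-exe"  -- placeholder path to this hyperion executable
    holdMap
    8000                                     -- placeholder HoldServer port
    tokenPool
```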
newWorkDir :: (Binary a, Typeable a, ToJSON a, HasProgramInfo env, HasDB env, MonadReader env m, MonadIO m, MonadCatch m) => a -> m FilePath Source #
Construct a working directory for the given object, using its ObjectId. The directory will be a subdirectory of programDataDir; it is created automatically and saved in the database.
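A minimal sketch of using it from the Cluster monad, whose environment has the required HasProgramInfo and HasDB instances; the parameter type just needs Binary, Typeable, and ToJSON instances:

```haskell
import Hyperion.Cluster

-- A working directory keyed by a (hypothetical) pair of parameters; presumably,
-- repeated calls with the same parameters reuse the directory saved in the database.
paramsWorkDir :: (String, Int) -> Cluster FilePath
paramsWorkDir = newWorkDir
```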