KubeRay integration with MCAD (Multi-Cluster-App-Dispatcher)
The multi-cluster-app-dispatcher (MCAD) is a Kubernetes controller that provides mechanisms for applications to manage batch jobs in a single-cluster or multi-cluster environment. For more details, refer to the multi-cluster-app-dispatcher documentation.
Use case
MCAD allows you to deploy a Ray cluster with a guarantee that sufficient resources are available in the Kubernetes cluster prior to actual Pod creation. It supports features such as:
- Integrates with the upstream Kubernetes scheduling stack for features such as co-scheduling, packing on the GPU dimension, etc.
- Ability to wrap any Kubernetes objects.
- Increases control plane stability by JIT (just-in-time) object creation.
- Queuing with policies.
- Quota management that goes across namespaces.
- Support for multiple Kubernetes clusters; dispatching jobs to any one of a number of Kubernetes clusters.
To queue Ray cluster(s) and gang-dispatch them when the aggregated resources are available, create a KinD cluster using the instructions below and then refer to the setup of the KubeRay-MCAD integration on a Kubernetes cluster or an OpenShift cluster.
On OpenShift, MCAD and KubeRay are already part of the Open Data Hub Distributed Workload Stack. The stack provides a simple, user-friendly abstraction for scaling, queuing, and resource management of distributed AI/ML and Python workloads. Follow the Quick Start in the Distributed Workloads documentation for installation.
Create KinD cluster
We need a KinD cluster with the specified cluster resources to consistently observe the expected behavior described in the demo below. This can be done by running KinD with Podman.
Note: Without Podman, a KinD worker node can see all of the CPU and memory resources on the host. This setup is intended for running the tutorial in a resource-constrained local Kubernetes environment; it is not recommended for real workloads or production.
Initialize and start the Podman machine with the following CPU and memory resources:
podman machine init --cpus 8 --memory 8196
podman machine start
podman machine list
Expect the Podman machine to be running with the following CPU and MEMORY:
NAME                     VM TYPE     CREATED         LAST UP            CPUS        MEMORY      DISK SIZE
podman-machine-default*  qemu        2 minutes ago   Currently running  8           8.594GB     107.4GB
Create the KinD cluster on the Podman machine:
KIND_EXPERIMENTAL_PROVIDER=podman kind create cluster
Creating a KinD cluster should take less than 1 minute. Expect output similar to:
using podman due to KIND_EXPERIMENTAL_PROVIDER
enabling experimental podman provider
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.26.3) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a nice day! 👋
Describe the single-node cluster:
kubectl describe node kind-control-plane
Expect the cpu and memory in the Allocatable section to be similar to:
Allocatable:
cpu: 8
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8118372Ki
pods: 110
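If you only want the allocatable values rather than the full node description, a jsonpath query such as the one below prints them directly (this assumes the default KinD control-plane node name, kind-control-plane):
# Print the node's allocatable resources (cpu, memory, pods) as a JSON map
kubectl get node kind-control-plane -o jsonpath='{.status.allocatable}'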
Submitting KubeRay cluster to MCAD
After the KinD cluster is created using the instructions above, make sure to install the KubeRay-MCAD integration prerequisites for the KinD cluster.
Let's create two RayClusters using the AppWrapper custom resource (CR) on the same Kubernetes cluster. The AppWrapper is the custom resource definition provided by MCAD to dispatch resources and manage batch jobs on Kubernetes clusters.
- We submit the first RayCluster with the AppWrapper CR aw-raycluster.yaml:
kubectl create -f https://raw.githubusercontent.com/project-codeflare/multi-cluster-app-dispatcher/main/doc/usage/examples/kuberay/config/aw-raycluster.yaml
The AppWrapper CR wraps the RayCluster custom resource as a generictemplate. We also specified matching resources for each of the RayCluster head node and worker node in the custompodresources. MCAD uses the custompodresources to reserve the required resources to run the RayCluster without creating pending Pods.
Note: Within the same AppWrapper, you may also wrap any individual Kubernetes resources (e.g., ConfigMap, Secret, etc.) associated with this job as a generictemplate to be dispatched together with the RayCluster.
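For orientation, here is a trimmed sketch of how such an AppWrapper is laid out. The field values below are illustrative only, and the exact apiVersion strings depend on the MCAD and KubeRay versions you installed; refer to the linked aw-raycluster.yaml for the authoritative spec.
apiVersion: workload.codeflare.dev/v1beta1   # may be mcad.ibm.com/v1beta1 on older MCAD releases
kind: AppWrapper
metadata:
  name: raycluster-complete
  namespace: default
spec:
  resources:
    GenericItems:
    - replicas: 1
      # custompodresources: what MCAD reserves per pod group before dispatching,
      # one entry for the head Pod and one for the worker Pods (illustrative values)
      custompodresources:
      - replicas: 1
        requests:
          cpu: 2
          memory: 2G
        limits:
          cpu: 2
          memory: 2G
      - replicas: 1
        requests:
          cpu: 1
          memory: 1G
        limits:
          cpu: 1
          memory: 1G
      # generictemplate: the full RayCluster custom resource being wrapped
      generictemplate:
        apiVersion: ray.io/v1alpha1           # or ray.io/v1, depending on the KubeRay version
        kind: RayCluster
        metadata:
          name: raycluster-complete
        spec:
          # headGroupSpec and workerGroupSpecs go here, matching the
          # resources declared in custompodresources above
          ...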
Check the AppWrapper status by describing the AppWrapper:
kubectl describe appwrapper raycluster-complete -n default
The Status: stanza should show a State of Running if the wrapped RayCluster has been deployed. The two Pods associated with the RayCluster are also created.
Status:
Canrun: true
Conditions:
Last Transition Micro Time: 2023-08-29T02:50:18.829462Z
Last Update Micro Time: 2023-08-29T02:50:18.829462Z
Status: True
Type: Init
Last Transition Micro Time: 2023-08-29T02:50:18.829496Z
Last Update Micro Time: 2023-08-29T02:50:18.829496Z
Reason: AwaitingHeadOfLine
Status: True
Type: Queueing
Last Transition Micro Time: 2023-08-29T02:50:18.842010Z
Last Update Micro Time: 2023-08-29T02:50:18.842010Z
Reason: FrontOfQueue.
Status: True
Type: HeadOfLine
Last Transition Micro Time: 2023-08-29T02:50:18.902379Z
Last Update Micro Time: 2023-08-29T02:50:18.902379Z
Reason: AppWrapperRunnable
Status: True
Type: Dispatched
Controllerfirsttimestamp: 2023-08-29T02:50:18.829462Z
Filterignore: true
Queuejobstate: Dispatched
Sender: before manageQueueJob - afterEtcdDispatching
State: Running
Events: <none>
kubectl get pod -n default
NAME READY STATUS RESTARTS AGE
raycluster-complete-head-9s4x5 1/1 Running 0 47s
raycluster-complete-worker-small-group-4s6jv 1/1 Running 0 47s
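Optionally, you can also list the RayCluster object itself to confirm that the KubeRay operator (installed as part of the prerequisites) has reconciled it; assuming the names from the example CR, a quick check might look like:
# The wrapped RayCluster should appear once the AppWrapper is dispatched
kubectl get raycluster -n default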
- Let's submit another RayCluster with the AppWrapper CR and see it queued without creating pending Pods using the command:
kubectl create -f https://raw.githubusercontent.com/project-codeflare/multi-cluster-app-dispatcher/main/doc/usage/examples/kuberay/config/aw-raycluster-1.yaml
Check the raycluster-complete-1 AppWrapper:
kubectl describe appwrapper raycluster-complete-1 -n default
The Status: stanza should show a State of Pending if the wrapped object (RayCluster) has been queued. No Pods from the second AppWrapper were created due to Insufficient resources to dispatch AppWrapper.
Status:
  Conditions:
    Last Transition Micro Time:  2023-08-29T17:39:08.406401Z
    Last Update Micro Time:      2023-08-29T17:39:08.406401Z
    Status:                      True
    Type:                        Init
    Last Transition Micro Time:  2023-08-29T17:39:08.406452Z
    Last Update Micro Time:      2023-08-29T17:39:08.406451Z
    Reason:                      AwaitingHeadOfLine
    Status:                      True
    Type:                        Queueing
    Last Transition Micro Time:  2023-08-29T17:39:08.423208Z
    Last Update Micro Time:      2023-08-29T17:39:08.423208Z
    Reason:                      FrontOfQueue.
    Status:                      True
    Type:                        HeadOfLine
    Last Transition Micro Time:  2023-08-29T17:39:08.439753Z
    Last Update Micro Time:      2023-08-29T17:39:08.439753Z
    Message:                     Insufficient resources to dispatch AppWrapper.
    Reason:                      AppWrapperNotRunnable.
    Status:                      True
    Type:                        Backoff
  Controllerfirsttimestamp:      2023-08-29T17:39:08.406399Z
  Filterignore:                  true
  Queuejobstate:                 Backoff
  Sender:                        before ScheduleNext - setHOL
  State:                         Pending
Events:                          <none>
We may manually check the allocated resources:
kubectl describe node kind-control-plane
The Allocated resources section shows CPU requests of 6050m (75%), so the remaining CPU does not satisfy the second AppWrapper.
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 6050m (75%) 5200m (65%)
memory 6824650Ki (84%) 6927050Ki (85%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
For example, observe the other RayCluster being created after deleting the first AppWrapper:
kubectl delete appwrapper raycluster-complete -n default
Note: Deleting the AppWrapper also removes any Kubernetes resources you may have wrapped as generictemplates within it.
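Once the first AppWrapper is deleted, the queued AppWrapper should be dispatched. Assuming the names from the example CRs, a quick verification might look like:
# The second AppWrapper should transition from Pending to Running
kubectl describe appwrapper raycluster-complete-1 -n default | grep 'State:'
# The head and worker Pods of the second RayCluster should now be created
kubectl get pods -n default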