Introduction to Kubernetes GPU
GPUs allow organizations and enterprises to scale their deployment and training workloads out to multi-cloud GPU clusters. With Kubernetes managing GPUs, we can automate the deployment, scheduling, maintenance, and operation of multiple GPUs, and accelerate application containers across the cluster's nodes. Given the growing number of AI services and applications and the wide availability of GPUs in the public cloud, there is a strong need for an open-source orchestrator like Kubernetes to be GPU-aware. In the coming sections of this tutorial, we will look at its internal working, implementation, and other important details so that beginners can understand it better.
What is Kubernetes GPU?
As we have seen, Kubernetes allows enterprises to automate their deployment, operations, maintenance, and scheduling processes, and it helps accelerate application containers across the cluster's nodes. Kubernetes also supports and manages GPUs (Graphics Processing Units) across the various nodes of the cluster. To use GPUs, we must first install software from the appropriate vendor, chosen according to our requirements. The rapid growth of AI/ML has also driven the growth of GPUs, because GPUs provide the compute needed to train models, process images, and much more. Let's look at a few points about Kubernetes GPUs to understand them better:
1) In short, Kubernetes does not have built-in support for managing GPU resources directly.
2) Instead, it exposes GPUs through its extended resources mechanism.
3) Different vendors (such as NVIDIA, AMD, and Intel) provide device plugins that support and manage these extended GPU resources in Kubernetes.
4) We have to install the vendor's device plugin software first.
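Once the vendor's device plugin is installed and running, each GPU node advertises the GPU as an extended resource in its status. As an illustration (the resource quantities are placeholder assumptions), an excerpt of a node's description might look like:

```yaml
# Illustrative excerpt of a node's status once the NVIDIA device
# plugin has registered the GPU as an extended resource.
status:
  capacity:
    cpu: "8"
    memory: 32Gi
    nvidia.com/gpu: "1"     # added by the device plugin
  allocatable:
    cpu: "8"
    memory: 31Gi
    nvidia.com/gpu: "1"     # schedulable GPU count
```

Pods can then request `nvidia.com/gpu` like any other resource, and the scheduler will only place them on nodes that advertise it.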
Using GPUs in Kubernetes and Kubernetes GPU setup
In this section, we will look closely at using GPUs in Kubernetes. A few configurations are required before we can use them: first the Kubernetes setup itself, and second the pod configuration for our Ray cluster. Let's first understand the Ray cluster for better clarity: Ray's Docker Hub repository hosts CUDA images packaged with Ray, intended for use in Kubernetes pods.
To use NVIDIA GPUs, we again need a few configurations; in particular, we must specify the relevant resources in the Kubernetes pod configuration. A standard pod spec for running a Ray node on an NVIDIA GPU declares a container entry (`- name: ray-node`) with an `nvidia.com/gpu` resource limit.
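Putting this together, a minimal sketch of such a pod spec might look like the following (the pod name, image tag, and resource quantities are illustrative assumptions, not values from the original file):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ray-gpu-pod                    # illustrative name
spec:
  containers:
  - name: ray-node
    image: rayproject/ray:latest-gpu   # a CUDA-enabled Ray image from Ray's Docker Hub
    resources:
      limits:
        memory: 8Gi                    # example memory limit
        nvidia.com/gpu: 1              # request one NVIDIA GPU
```

Because `nvidia.com/gpu` is an extended resource, specifying it under `limits` is sufficient; the request defaults to the same value.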
This configuration covers the basics: we specify the apiVersion, image, memory, and the `nvidia.com/gpu` limit. This is the basic file you can start from when configuring your GPU setup with NVIDIA GPUs. Now let's talk about taints and tolerations for NVIDIA GPUs in detail:
1) The NVIDIA GPU plugin helps apply taints to GPU nodes in Kubernetes; this taint prevents pods that do not request GPUs from being scheduled on the GPU nodes.
2) Managed Kubernetes services such as EKS, AKS, and GKE automatically apply the matching toleration to pods requesting GPU resources.
3) This toleration is applied by the ExtendedResourceToleration admission controller.
4) We have to enable this admission controller on the Kubernetes cluster; otherwise, we may need to add the GPU toleration manually to each GPU pod configuration.
5) A standard configuration for this adds a toleration with `effect: NoSchedule` to the pod spec alongside the container entry (`- name: ray-node`).
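A sketch of such a pod spec with the toleration added manually is shown below; the taint key used here is the one commonly applied by managed services and device plugins, but verify it against the actual taint on your cluster's GPU nodes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ray-gpu-pod                    # illustrative name
spec:
  tolerations:
  - key: nvidia.com/gpu                # taint key applied to GPU nodes
    operator: Exists
    effect: NoSchedule                 # lets this pod schedule onto tainted GPU nodes
  containers:
  - name: ray-node
    image: rayproject/ray:latest-gpu   # illustrative CUDA-enabled Ray image
    resources:
      limits:
        nvidia.com/gpu: 1
```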
Setting GPU quotas
As we know, GPU is an extended resource, and quota support for extended resources was added in the Kubernetes 1.10 release. Overcommit is not allowed for extended resources, so it does not make sense to specify both requests and limits for an extended resource such as GPU in a quota. For now, only quota items with the `requests.` prefix are allowed for extended resources. In this section, we will see how to limit the NVIDIA GPU resource using a quota configuration, and understand what it does:
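A minimal sketch of such a quota (the quota name and namespace are illustrative assumptions):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota                  # illustrative name
  namespace: ml-team               # illustrative namespace
spec:
  hard:
    requests.nvidia.com/gpu: 10    # at most 10 GPUs requested in this namespace
```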
In the configuration above, the resource name is `nvidia.com/gpu`, and we are limiting the total number of this extended resource that can be requested to 10. You can adjust this to your own needs simply by modifying the value.
Kubernetes GPU sharing
In this section, we will discuss the sharing of GPUs; a few key points are below:
1) We can specify a GPU limit without specifying a request (Kubernetes then uses the limit as the request).
2) We can specify both a request and a limit, but the two values must be equal.
3) We cannot specify a GPU request without also specifying a limit.
4) As a consequence, pods and containers do not share GPUs.
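The rules above can be sketched with container resource fragments (the quantities are illustrative):

```yaml
# 1) Valid: limit only; Kubernetes sets the request equal to the limit.
resources:
  limits:
    nvidia.com/gpu: 2

# 2) Valid: request and limit both specified, and equal.
resources:
  requests:
    nvidia.com/gpu: 2
  limits:
    nvidia.com/gpu: 2

# 3) Invalid: a request without a matching limit is rejected.
resources:
  requests:
    nvidia.com/gpu: 2
```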
That said, sharing GPUs is at least possible in the case of NVIDIA GPUs: for this, simply omit both values, the limit and the request, from the container spec.
In this GPU tutorial, we have covered everything related to Kubernetes GPU support: how to use it and how to implement it. Work through the whole article for a clearer, more detailed understanding; reference configuration files are included to get your own setup started.