
Build a Data Analysis Environment on Docker on GPU

TL;DR

  • Setup GPU instance (p2.xlarge)
  • Install GPU Driver (https://docs.nvidia.com/datacenter/tesla/)
  • Install nvidia-container-toolkit (https://github.com/NVIDIA/nvidia-docker)
  • nvidia-smi
  • FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
  • docker run --gpus all <image>

Capacity Reservation

On the EC2 dashboard, go to Capacity Reservations > Create Capacity Reservation and set:

  • Instance type: p2.xlarge
  • Availability zone: us-east-1a
  • Total capacity: 1 instance
  • Capacity reservation details: end at a specific time (set the end date and time)

Then click Create. A warning like the following may appear:

You have requested more vCPU capacity than your current vCPU limit of ${normalized_limit} allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.

At http://aws.amazon.com/contact-us/ec2-request, request the following:

  • Region: US East (Northern Virginia)
  • Primary Instance Type: All P instances
  • New Limit value: 4
  • Use case description
    
    Hi, I would like to request an increase in the number of vCPUs for p2.xlarge instances.
    

In my case, I set the capacity reservation to end at a specific time (it costs about 0.90 USD/hour, so reserving just the next 2 hours, for example, keeps the cost down).
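
For reference, roughly the same reservation can be created from the AWS CLI. This is a sketch, assuming the CLI is configured for us-east-1; the end time is a placeholder to replace with your own.

aws ec2 create-capacity-reservation \
    --instance-type p2.xlarge \
    --instance-platform Linux/UNIX \
    --availability-zone us-east-1a \
    --instance-count 1 \
    --end-date-type limited \
    --end-date 2023-08-22T12:00:00Z   # placeholder end time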

Create GPU instance on EC2

When creating the instance, select the same availability zone (us-east-1a) and instance type (p2.xlarge). Also, expand the disk space to 20 GB.
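
If you prefer the CLI for this step as well, the following is a rough sketch; the AMI ID, key pair, and security group are placeholders for your own values.

# Placeholders: ami-xxxxxxxxxxxxxxxxx (an Ubuntu 22.04 AMI), mygpukey (key pair),
# sg-xxxxxxxx (a security group that allows SSH).
aws ec2 run-instances \
    --image-id ami-xxxxxxxxxxxxxxxxx \
    --instance-type p2.xlarge \
    --placement AvailabilityZone=us-east-1a \
    --key-name mygpukey \
    --security-group-ids sg-xxxxxxxx \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=20}' \
    --count 1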

1. Install Docker on EC2

sudo apt-get update
sudo apt-get install docker.io
sudo gpasswd -a ubuntu docker
docker --version
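
Note that the docker group membership added by gpasswd only takes effect on the next login. A quick sanity check after reconnecting (hello-world is the standard Docker test image):

# Log out and back in so the docker group membership is picked up, then:
docker run hello-world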

2. Install NVIDIA Driver

Check NVIDIA Tesla Installation Notes.

https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts

Check System Management Interface

nvidia-smi

If nvidia-smi does not work, you need to install the NVIDIA driver.

On the EC2 instance, check the Linux version:

cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Check recommended NVIDIA driver version:

sudo apt-get update
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices

Output

== /sys/devices/pci0000:00/0000:00:1e.0 ==
modalias : pci:v000010DEd0000102Dsv000010DEsd0000106Cbc03sc02i00
vendor   : NVIDIA Corporation
model    : GK210GL [Tesla K80]
driver   : nvidia-driver-470 - distro non-free recommended
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

In this case, we want to install nvidia-driver-470.

sudo apt install nvidia-driver-470
sudo reboot

Run nvidia-smi (system management interface)

$ nvidia-smi
Tue Aug 22 09:52:43 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.199.02   Driver Version: 470.199.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   37C    P0    58W / 149W |      0MiB / 11441MiB |     54%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

3. Install NVIDIA Container Toolkit

The installation command from the link above did not work for me:

$ sudo apt-get update \
    && sudo apt-get install -y nvidia-container-toolkit-base
...
E: Unable to locate package nvidia-container-toolkit

Instead, some manual commands from a GitHub issue worked for me:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker

Check that it installed successfully:

$ nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.13.5
commit: 6b8589dcb4dead72ab64f14a5912886e6165c079
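
Newer versions of the toolkit also ship a helper that registers the NVIDIA runtime with Docker. I did not need it with the steps above, but if Docker still cannot see the GPU, this sketch (based on the toolkit documentation) may help:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker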

Check the latest (or appropriate) tag of nvidia/cuda at https://hub.docker.com/r/nvidia/cuda/tags.

Then run docker with it.

$ docker run --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

Unable to find image 'nvidia/cuda:12.2.0-base-ubuntu22.04' locally
12.2.0-base-ubuntu22.04: Pulling from nvidia/cuda
6b851dcae6ca: Pull complete 
8f5f0e71700a: Pull complete 
fac7ce4a13c3: Pull complete 
1af9bee222cb: Pull complete 
d47e0a26d15c: Pull complete 
Digest: sha256:f8870283bea6a85ba4b4a5e1b65158dd15e8009e433539e7c83c94707e703a1b
Status: Downloaded newer image for nvidia/cuda:12.2.0-base-ubuntu22.04
Tue Aug 22 12:56:41 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.199.02   Driver Version: 470.199.02   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   37C    P0    57W / 149W |      0MiB / 11441MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The output above shows nvidia-smi running inside a Docker container (via the NVIDIA Container Toolkit).

Upload the build context

Connect via SFTP to upload the build context, then SSH in and edit the Dockerfile:

$ sftp -i ~/.ssh/mydocker.pem ubuntu@ec2-54-146-60-95.compute-1.amazonaws.com
sftp> put -r dsenv_build
$ ssh -i mygpukey.pem ubuntu@<hostname>
$ vim dsenv_build/Dockerfile

Update Dockerfile for GPU

FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y \
    sudo \
    wget \
    vim
WORKDIR /opt
RUN wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh && \
    sh Anaconda3-2019.10-Linux-x86_64.sh -b -p /opt/anaconda3 && \
    rm -f Anaconda3-2019.10-Linux-x86_64.sh
ENV PATH /opt/anaconda3/bin:$PATH

RUN pip install --upgrade pip && pip install \
    keras==2.3 \
    scipy==1.4.1 \
    tensorflow-gpu==2.1
WORKDIR /
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root", "--LabApp.token=''"]

Build the image from the build context.

$ docker build .
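
Tagging the image during the build makes the run command in the next step easier to write; dsenv-gpu is just a name I am assuming here.

$ docker build -t dsenv-gpu .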

Run a container on GPU

docker run --gpus all -v ~:/work -p 8888:8888 <image>
nvidia-smi

Access Jupyter Lab at <Public DNS>:8888 (the instance's security group must allow inbound TCP 8888), or tunnel the port over SSH as sketched below.
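
If you would rather not open port 8888 to the internet, an SSH tunnel works as well (reusing the key and hostname from the SFTP step):

ssh -i mygpukey.pem -L 8888:localhost:8888 ubuntu@<hostname>
# Then open http://localhost:8888 in a local browser.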

Run the MNIST CNN example at https://github.com/keras-team/keras/blob/keras-2/examples/mnist_cnn.py

Check how nvidia-smi behaves on the EC2 host while it trains.
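
To confirm that training actually runs on the GPU, a quick check; <container> is whatever docker ps reports for your container, and tf.config.list_physical_devices is available from TensorFlow 2.1 onward:

# Inside the running container: list the GPUs TensorFlow can see
docker exec -it <container> python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

# On the EC2 host: watch GPU utilization while the MNIST example trains
watch -n 1 nvidia-smi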