Build a Data Analysis Environment with Docker on AWS

Install Docker on EC2

ssh -i ~/.ssh/xxx.pem ubuntu@<hostname>
sudo apt-get update
sudo apt-get install docker.io

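At this point, docker images fails with a permission denied error because the ubuntu user is not allowed to talk to the Docker daemon. Add the ubuntu user to the docker group (the docker.io package normally creates this group at install time).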

sudo gpasswd -a ubuntu docker

After logging out and back in so that the new group membership takes effect, docker images works.
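
If docker images still fails, it helps to confirm that the current session actually carries the docker group. As a minimal sketch (has_group is a hypothetical helper name, not part of Docker), the function below tests whether a name appears in a group list such as the output of id -nG:

```shell
# has_group GROUP_LIST NAME
# Succeeds if NAME is one of the space-separated entries in GROUP_LIST.
# (has_group is a hypothetical helper for illustration.)
has_group() {
  for g in $1; do
    if [ "$g" = "$2" ]; then
      return 0
    fi
  done
  return 1
}

# Example on the EC2 host, in a fresh SSH session:
#   has_group "$(id -nG)" docker && echo "docker group is active"
```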

Option 1: Upload a Docker image to AWS

There are several ways to get a Docker image onto AWS:

  • Use a Docker registry
  • Use a Dockerfile (lightweight, but the build context may differ from the local one)
  • Save the Docker image as a tar file and send it

Save a Docker image as a tar file

If the target environment (EC2 in this case) cannot reach the internet, you need to transfer the Docker image as a tar file.

docker save <image id> > xxx.tar   # on the machine that has the image
docker load < xxx.tar              # on the target machine

Create a tiny Docker image in a temporary directory (e.g. ~/repo/dhub/tmp_image) on the local machine (an M1 Mac)

cd ~/repo/dhub/tmp_image

Dockerfile

FROM alpine
RUN touch test

Build an image for linux/amd64 (the M1 Mac is arm64, but the T2 instance is x86_64)

docker build --platform linux/amd64 .

Save the image as a tar file

docker save 420246f3b86e > tmp_image.tar

The tar file (tmp_image.tar) now exists in the current directory.
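
Saving by image ID drops the repository name and tag, which is why the image shows up as <none> after it is loaded on EC2 later in this post. A hedged sketch of a tagged variant follows. The run and build_and_save helpers and the DRY_RUN variable are assumptions made for illustration, while -t and -o are standard docker build and docker save options. By default the sketch only prints the commands:

```shell
# run CMD...: print the command; execute it only when DRY_RUN is set to
# something other than 1. (run, build_and_save and DRY_RUN are hypothetical
# names used for this sketch.)
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" != "1" ]; then
    "$@"
  fi
}

# build_and_save NAME[:TAG]
# Builds for linux/amd64 with a tag, then saves by name so the tag is
# stored inside the tar file (NAME.tar).
build_and_save() {
  name=$1
  run docker build --platform linux/amd64 -t "$name" . &&
    run docker save -o "${name%%:*}.tar" "$name"
}
```

Calling DRY_RUN=0 build_and_save tmp_image:latest executes the two commands; after docker load on EC2, the image should then be listed as tmp_image:latest instead of <none>.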

Upload the Docker image (tar file) to EC2 via SFTP

sftp -i ~/.ssh/mydocker.pem ubuntu@<host name>
pwd # Check current directory
put local/path [remote/path]
get remote/path [local/path]

For example:

sftp -i ~/.ssh/mydocker.pem ubuntu@<host name>

sftp> pwd 
Remote working directory: /home/ubuntu
sftp> put tmp_image.tar 
Uploading tmp_image.tar to /home/ubuntu/tmp_image.tar
tmp_image.tar                                                                             100% 7775KB   2.8MB/s   00:02    

sftp> get something
Fetching /home/ubuntu/something to something
sftp> ls
something      tmp_image.tar
sftp> exit
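
A large tar file can arrive truncated if the transfer is interrupted. As an optional sketch, the digests on both sides can be compared before loading. verify_sha256 is a hypothetical helper; sha256sum is the GNU coreutils tool available on Ubuntu, and on macOS shasum -a 256 prints the same digest:

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if the SHA-256 digest of FILE equals EXPECTED_HEX.
# (verify_sha256 is a hypothetical helper for illustration.)
verify_sha256() {
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  [ "$actual" = "$2" ]
}

# Example: on the Mac, run  shasum -a 256 tmp_image.tar  and copy the digest.
# Then on EC2:
#   verify_sha256 tmp_image.tar <digest from the Mac> && docker load < tmp_image.tar
```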

Load the Docker image from the tar file

On EC2:

docker load < tmp_image.tar

The loaded image then appears in docker images and can be run by its ID:

$ docker images
REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
<none>       <none>    ae15ab12e282   3 minutes ago   7.34MB

$ docker run -it ae15ab12e282 sh
/ #

Option 2: Upload a Dockerfile

sftp -i ~/.ssh/mydocker.pem ubuntu@<host name>
sftp> put /Users/tato/repo/dhub/dsenv_build/Dockerfile
sftp> exit

After uploading the Dockerfile to EC2, build the Docker image.

ssh -i ~/.ssh/mydocker.pem ubuntu@<host name>
docker build .

This fails because the disk of the T2 Micro instance (under 8 GB) is too small to hold the Docker image being built.

  • Docker stores its images on Linux under /var/lib/docker (on a Mac, under ~/Library/Containers/com.docker.docker/Data).
  • The Docker daemon’s configuration file is /etc/docker/daemon.json.

On EC2:

/dsenv_build$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       7.6G  7.6G   19M 100% /
tmpfs           483M     0  483M   0% /dev/shm
tmpfs           194M  852K  193M   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/xvda15     105M  6.1M   99M   6% /boot/efi
tmpfs            97M  4.0K   97M   1% /run/user/1000
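
The df output above shows the root file system at 100%. A small check before building can catch this earlier. This is a sketch under assumptions: avail_kb and enough_space are hypothetical helper names, and the 10 GiB threshold in the example is an arbitrary illustration, not a measured requirement of this image.

```shell
# avail_kb PATH
# Prints the free space, in KiB, of the file system containing PATH.
# df -P forces the POSIX one-line-per-file-system format; -k reports KiB.
avail_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# enough_space PATH NEEDED_KB
# Succeeds if at least NEEDED_KB KiB are free on PATH's file system.
enough_space() {
  [ "$(avail_kb "$1")" -ge "$2" ]
}

# Example (the threshold is an assumption):
#   enough_space /var/lib/docker $((10 * 1024 * 1024)) || echo "not enough disk for the build"
```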

Expand the disk size of EC2

Resize the EBS (Elastic Block Store) volume: in the EBS console, select the volume, choose Actions > Modify Volume, and change the size to 20 GB.

If the root file system (/) has not grown afterwards, check the disk and partition sizes with lsblk.

NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  12G  0 disk
└─xvda1 202:1    0   8G  0 part /

If the root partition is still at its old size (8G above, on a larger disk), grow the partition:

sudo growpart /dev/xvda 1 

Then, if the file system is ext4, grow it to fill the partition:

sudo resize2fs /dev/xvda1
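
resize2fs only handles ext2, ext3 and ext4; an XFS root file system is grown with xfs_growfs instead. The sketch below picks the command from the file system type. grow_fs_cmd is a hypothetical helper that only prints the command for review, and the device and mount point are passed in rather than detected:

```shell
# grow_fs_cmd FSTYPE PARTITION MOUNTPOINT
# Prints the command that grows the file system to fill its partition.
# (grow_fs_cmd is a hypothetical helper; it prints, it does not execute.)
grow_fs_cmd() {
  case $1 in
    ext2 | ext3 | ext4) echo "sudo resize2fs $2" ;;
    xfs) echo "sudo xfs_growfs -d $3" ;;
    *) echo "unsupported file system: $1" >&2; return 1 ;;
  esac
}

# Example on the EC2 host (findmnt reports the root file system type):
#   grow_fs_cmd "$(findmnt -n -o FSTYPE /)" /dev/xvda1 /
```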

Build the image and run Jupyter on AWS

After expanding the disk, the image build succeeds. On EC2:

cd dsenv_build
docker build .

After the build finishes, you can run a container with the data analysis environment (Jupyter Lab in this case).

docker run -v ~:/work -p 8888:8888 <Image ID> 

You can then access <public DNS host name>:8888 from a web browser.

Change the security group

If you cannot access Jupyter Lab on EC2, you may need to create a security group and change its rules.

On the EC2 dashboard, go to Network & Security > Security Groups and choose Create security group.

Inbound and Outbound Rule:

  • Type: All traffic
  • Source: 0.0.0.0/0

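Then assign the security group to the EC2 instance. In my case, this resolved the error when opening Jupyter Lab in a browser.

This rule opens every port to the whole internet, which is tolerable only because this is an experimental project. In any other setting you need to restrict the source IP address and/or require token access for Jupyter Lab.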
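
One way to narrow the rule is to allow only the Jupyter port from a single address. The sketch below prints an AWS CLI command for review rather than running it. ingress_cmd is a hypothetical helper, and the group ID and IP address in the example are placeholders, not values from this setup.

```shell
# ingress_cmd GROUP_ID IPV4 PORT
# Prints an AWS CLI command that allows TCP traffic on PORT from IPV4 only (/32).
# (ingress_cmd is a hypothetical helper; it prints, it does not execute.)
ingress_cmd() {
  echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp --port $3 --cidr $2/32"
}

# Example with placeholder values:
#   ingress_cmd sg-0123456789abcdef0 203.0.113.10 8888
```

Security group rules only add permissions, so the broad All traffic rule has to be removed for the narrower rule to have any effect.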

Control container and volume access authorization

On an EC2 instance we can create users freely (more so than on a local laptop), which makes it easy to test per-user access to mounted volumes:

sudo adduser --uid 1111 aaa
sudo adduser --uid 2222 bbb
docker run -u 1111 -v /home/aaa:/home/aaa -v /home/bbb:/home/bbb -it ubuntu bash
I have no name!@090372b518be:/$ cd /home 
I have no name!@090372b518be:/home$ cd bbb
bash: cd: bbb: Permission denied
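
Two things happen in this session. UID 1111 has no entry in the container's /etc/passwd, so bash prints I have no name!. A bind mount also keeps the host's numeric owner and mode, so UID 1111 is refused entry to /home/bbb when that directory grants nothing to other users, as the Permission denied result suggests. The following sketch checks the relevant permission bit on the host. other_can_enter is a hypothetical helper, and stat -c is the GNU coreutils form (the option differs on macOS):

```shell
# other_can_enter DIR
# Succeeds if the "others" class has the execute (search) bit on DIR,
# i.e. a user who is neither the owner nor in the group may cd into it.
# (other_can_enter is a hypothetical helper for illustration.)
other_can_enter() {
  mode=$(stat -c '%a' "$1")      # octal mode, e.g. 750 or 755
  others=${mode#"${mode%?}"}     # keep only the last octal digit
  [ $(( others & 1 )) -eq 1 ]
}

# Example on the EC2 host:
#   other_can_enter /home/bbb || echo "other users cannot enter /home/bbb"
```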