Build a Data Analysis Environment on Docker

A Docker container is useful for building a data analysis environment with fixed versions of software (e.g. Ubuntu, JupyterLab, or Anaconda).

Dockerfile

FROM ubuntu:latest
RUN apt-get update && apt-get install -y \
    sudo wget \
    vim
WORKDIR /opt
RUN wget https://repo.anaconda.com/archive/Anaconda3-2023.07-2-Linux-x86_64.sh

Build image

Because my machine is an M1 Mac (Apple Silicon) and the Anaconda installer above is the x86_64 build, I needed to add the --platform linux/amd64 option.

docker build --platform linux/amd64 .
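A build without a tag can only be referenced by its image ID, which changes on every rebuild. As a minimal sketch (not part of my original steps), the image can be given a name with the -t option of docker build. The tag ds-python:latest, the run helper, and the DRY_RUN switch are all assumptions made for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: build with a tag so later commands can use a name instead of an image ID.
# IMAGE is a hypothetical tag. DRY_RUN=1 (the default here) only prints the commands.
IMAGE="${IMAGE:-ds-python:latest}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"          # show the command without executing it
    else
        "$@"               # execute the command as given
    fi
}

build_image() {
    # -t names the image; --platform matches the x86_64 Anaconda installer
    run docker build --platform linux/amd64 -t "$IMAGE" .
}

open_shell() {
    # start an interactive shell in a container created from the tagged image
    run docker run -it --platform linux/amd64 "$IMAGE" bash
}

build_image
open_shell
```

Setting DRY_RUN=0 executes the commands instead of printing them. Passing --platform to docker run as well should avoid the platform-mismatch warning, since the platform is then requested explicitly.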

Run a container

docker run -it 571f59ade236 bash

Output

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root@a1234b549e43:/opt#

Run Anaconda installer

sh Anaconda3-2023.07-2-Linux-x86_64.sh

When the installation finishes, the installer asks:

Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]

I entered “yes”. After a few minutes, you will see:

Thank you for installing Anaconda3!

Add Anaconda bin directory to PATH

Anaconda is installed in /opt/anaconda3 in my case.

In the container:

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# Prepend the Anaconda bin directory to PATH
$ export PATH=/opt/anaconda3/bin:$PATH

$ echo $PATH
/opt/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
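Running the export line repeatedly adds the same directory to PATH each time. A small sketch of a guard against that follows. The prepend_path function is a hypothetical helper of my own, not something provided by Anaconda or Docker, and /opt/anaconda3/bin is the install location from my case.

```shell
#!/bin/sh
# prepend_path is a hypothetical helper, not part of Anaconda or Docker.
# It puts a directory at the front of PATH unless PATH already lists it.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;       # otherwise place the new directory first
    esac
    export PATH
}

prepend_path /opt/anaconda3/bin
prepend_path /opt/anaconda3/bin    # second call changes nothing
echo "$PATH"
```

Wrapping PATH in colons before matching lets the same pattern handle an entry at the start, middle, or end of the list.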

Edit Dockerfile

To avoid the interactive prompts during docker build, we need to run Anaconda3-xxxx-Linux-x86_64.sh in batch mode.

Running the installer with its -h option prints the available options:

-b           run install in batch mode (without manual intervention),
             it is expected the license terms (if any) are agreed upon
-f           no error if install prefix already exists
-h           print this help message and exit
-p PREFIX    install prefix, defaults to /root/anaconda3, must not contain spaces.
-s           skip running pre/post-link/install scripts
-u           update an existing installation
-t           run package tests after installation (may install conda-build)

We can use -b (batch mode) and -p (install prefix). The following command installs Anaconda into /opt/anaconda3 without manual intervention.

sh /opt/Anaconda3-2023.07-2-Linux-x86_64.sh -b -p /opt/anaconda3
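In batch mode nobody is watching the installer output, so a quick check after the command can catch a failed install early. This is only a sketch: check_conda is a hypothetical helper, and it assumes the conda executable ends up at PREFIX/bin/conda, which matches the /opt/anaconda3/bin directory added to PATH earlier.

```shell
#!/bin/sh
# check_conda is a hypothetical helper, not part of the Anaconda installer.
# It succeeds only if PREFIX/bin/conda exists and is executable.
check_conda() {
    prefix="$1"
    if [ -x "$prefix/bin/conda" ]; then
        echo "conda found in $prefix/bin"
    else
        echo "conda not found in $prefix/bin" >&2
        return 1
    fi
}

# PREFIX defaults to the install prefix used in the command above
check_conda "${PREFIX:-/opt/anaconda3}" || echo "install check failed" >&2
```

Because the function returns a non-zero status when conda is missing, calling it in a RUN step right after the installer would make the build stop at that point.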

Edit Dockerfile

FROM ubuntu:latest
RUN apt-get update && apt-get install -y \
    sudo wget \
    vim
WORKDIR /opt
RUN wget https://repo.anaconda.com/archive/Anaconda3-2023.07-2-Linux-x86_64.sh && \
    sh Anaconda3-2023.07-2-Linux-x86_64.sh -b -p /opt/anaconda3 && \
    rm -f Anaconda3-2023.07-2-Linux-x86_64.sh
ENV PATH=/opt/anaconda3/bin:$PATH

RUN pip install --upgrade pip
WORKDIR /
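CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root", "--LabApp.token=''"]

  • The ENV instruction sets an environment variable in the image, so the PATH change applies to every later instruction and to containers started from the image (unlike export, which only affects the current shell).
  • The CMD instruction starts JupyterLab when the container runs: --ip=0.0.0.0 makes it listen on all network interfaces so it can be reached from outside the container, --allow-root permits running as the root user, and --LabApp.token='' disables token authentication.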

Rebuild the Docker image and run JupyterLab in a container.

docker build --platform linux/amd64 .

Then run a container.

docker run e6ff3baff3db1

Without the -p option, the container's port is not published to the host, so we cannot reach JupyterLab from a browser.

Instead, we need to run:

docker run -p 8888:8888 e6ff3baff3db1

Then we can access http://127.0.0.1:8888/lab.
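The -p value has the form HOST_PORT:CONTAINER_PORT, so the port in the browser URL is the left-hand number. A tiny sketch to make that concrete: lab_url is a hypothetical helper, and it assumes the simple two-part form of the mapping (not the variant with a leading IP address).

```shell
#!/bin/sh
# lab_url is a hypothetical helper: given a -p mapping of the form
# HOST_PORT:CONTAINER_PORT, it prints the URL to open on the host.
lab_url() {
    mapping="$1"
    host_port="${mapping%%:*}"      # text before the first colon is the host port
    echo "http://127.0.0.1:${host_port}/lab"
}

lab_url 8888:8888    # prints http://127.0.0.1:8888/lab
lab_url 8889:8888    # prints http://127.0.0.1:8889/lab
```

The second call corresponds to running with -p 8889:8888, which is handy when port 8888 on the host is already taken.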

Share file system between host and container

First, create a directory on the host to share with the container (in my case):

mkdir /Users/tato/repo/dhub/ds_python

Then run the container with the -v option, which mounts the host directory at /work in the container.

docker run -p 8888:8888 -v /Users/tato/repo/dhub/ds_python:/work \
    --name my-lab e6ff3baff3db1
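The host side of -v should be an absolute path, because a bare name without a leading slash is treated as a named volume rather than a host directory. A rough sketch for building the command from a relative directory follows. The mount_cmd helper and the image name in the example call are hypothetical, and the helper only prints the command instead of running it.

```shell
#!/bin/sh
# mount_cmd is a hypothetical helper that prints (does not run) a docker run
# command. It resolves the host directory to an absolute path for -v.
mount_cmd() {
    # the subshell keeps the caller's working directory unchanged;
    # the assignment fails if the directory does not exist
    host_dir=$(CDPATH= cd -- "$1" && pwd) || return 1
    image="$2"
    echo "docker run -p 8888:8888 -v $host_dir:/work --name my-lab $image"
}

# Example: share the current directory with an image called example-image
mount_cmd . example-image
```

A path containing spaces would need quoting before the printed command is pasted into a shell.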