
Cluster creation (first time only)¶
The Kube SIPECAM cluster uses Kubernetes, an open-source system for automating deployment, scaling, and management of containerized applications (see Kubernetes and the Kubernetes GitHub page).
The next steps follow the kops and kops - Kubernetes Operations guides (another guide: Step Zero Kubernetes on AWS).
Configure a domain and a subdomain with their respective hosted zones. For the following description the Route 53 service of AWS was used to create the domain conabio-route53.net and the subdomain antares3.conabio-route53.net. A gossip-based Kubernetes cluster can be used instead (see for example this issue).
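If you prefer the CLI over the console, a minimal sketch for creating the subdomain's hosted zone could look like the following (it assumes the names used above; the NS records of the new zone still have to be added to the parent zone by hand so the delegation works):
#create the hosted zone for the subdomain (the caller reference just has to be unique)
aws route53 create-hosted-zone --name antares3.conabio-route53.net --caller-reference $(date +%s)
#list the zones to confirm it exists and to copy its NS records
aws route53 list-hosted-zones | jq -r '.HostedZones[].Name'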
Install the same versions of kops and kubectl. You can use a t2.micro EC2 instance with the AMI Ubuntu 20.04 LTS and a role with AmazonEc2FullAccess attached to it to install these tools, and label it with the next bash script:
Note: change the region in the next bash script to the one where you deployed the t2.micro instance.
#!/bin/bash
##variables:
region=<region>
name_instance=deploy-k8s
shared_volume=/shared_volume
user=ubuntu
##System update
export DEBIAN_FRONTEND=noninteractive
apt-get update -yq
##Install awscli
apt-get install -y python3-pip && pip3 install --upgrade pip
pip3 install awscli --upgrade
##Tag instance
INSTANCE_ID=$(curl -s http://instance-data/latest/meta-data/instance-id)
PUBLIC_IP=$(curl -s http://instance-data/latest/meta-data/public-ipv4)
aws ec2 create-tags --resources $INSTANCE_ID --tags Key=Name,Value=$name_instance-$PUBLIC_IP --region=$region
##Set variables for completion of bash commands
echo "export LC_ALL=C.UTF-8" >> /home/$user/.profile
echo "export LANG=C.UTF-8" >> /home/$user/.profile
##Set variable mount_point
echo "export mount_point=$shared_volume" >> /home/$user/.profile
##Useful software for common operations
apt-get install -y nfs-common jq git htop nano
##Create shared volume
mkdir $shared_volume
##install docker for ubuntu:
apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update -yq
apt-get install -y docker-ce
service docker start
##install kops version 1.14.0:
wget -O kops https://github.com/kubernetes/kops/releases/download/1.14.0/kops-linux-amd64
chmod +x ./kops
sudo mv ./kops /usr/local/bin/
##install kubernetes command line tool v1.14: kubectl
wget -O kubectl https://storage.googleapis.com/kubernetes-release/release/v1.14.0/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
##enable completion for kubectl:
echo "source <(kubectl completion bash)" >> /home/$user/.bashrc
You can check kops and kubectl versions with:
kops version
kubectl version
Note
All kubectl and kops commands must be executed in this instance.
Set the next bash variables:
#Your domain name that is hosted in AWS Route 53
#Use: export DOMAIN_NAME="antares3.k8s.local" #for a gossip based cluster
export DOMAIN_NAME="antares3.conabio-route53.net"
# Friendly name to use as an alias for your cluster
export CLUSTER_ALIAS="k8s-deployment"
# Leave as-is: Full DNS name of your cluster
export CLUSTER_FULL_NAME="${CLUSTER_ALIAS}.${DOMAIN_NAME}"
# AWS availability zone where the cluster will be created
export CLUSTER_AWS_AZ=us-west-2a,us-west-2b,us-west-2c
# Leave as-is: AWS Route 53 hosted zone ID for your domain (don't set it if gossip based cluster is used)
export DOMAIN_NAME_ZONE_ID=$(aws route53 list-hosted-zones \
| jq -r '.HostedZones[] | select(.Name=="'${DOMAIN_NAME}'.") | .Id' \
| sed 's/\/hostedzone\///')
export KUBERNETES_VERSION="1.14.0"
#To hold cluster state information export KOPS_STATE_STORE
export KOPS_STATE_STORE="s3://${CLUSTER_FULL_NAME}-state"
export EDITOR=nano
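Optionally, verify that the variables resolved as expected before continuing:
echo $CLUSTER_FULL_NAME
#should print the hosted zone ID (empty for a gossip based cluster)
echo $DOMAIN_NAME_ZONE_ID
echo $KOPS_STATE_STORE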
Create an AWS S3 bucket to hold state information for the Kubernetes cluster:
Note
The instance needs the policy AmazonS3FullAccess attached to a role created by you to have permissions to execute the next command.
#Bucket will be created in us-east-1 (N. Virginia)
aws s3api create-bucket --bucket ${CLUSTER_FULL_NAME}-state
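Optionally, the kops documentation recommends enabling versioning on the state bucket so previous cluster states can be recovered; a minimal sketch:
aws s3api put-bucket-versioning --bucket ${CLUSTER_FULL_NAME}-state --versioning-configuration Status=Enabled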
Create group and user kops and generate access keys for user kops:
Note
The instance needs the policy IAMFullAccess attached to a role created by you to have permissions to execute the next commands.
Create the group and attach policies to it:
name=kops
aws iam create-group --group-name $name
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess --group-name $name
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonRoute53FullAccess --group-name $name
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --group-name $name
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/IAMFullAccess --group-name $name
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonVPCFullAccess --group-name $name
aws iam attach-group-policy --policy-arn arn:aws:iam::aws:policy/AmazonElasticFileSystemFullAccess --group-name $name
Create the user kops and add it to the already created group kops:
aws iam create-user --user-name $name
aws iam add-user-to-group --user-name $name --group-name $name
Create access keys for user kops:
aws iam create-access-key --user-name $name
This will generate an AccessKeyId and a SecretAccessKey that must be kept in a safe place. Use them to configure awscli and set the next variables:
aws configure
AWS Access Key ID [None]: xxxx
AWS Secret Access Key [None]: xxxxxxx
Default region name [None]: <leave it empty>
Default output format [None]: <leave it empty>
export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id)
export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key)
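To confirm that awscli is now using the kops user credentials, an optional check is:
#the Arn field should end with :user/kops
aws sts get-caller-identity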
Create a Key Pair and its Public Key with the AWS console. See the Amazon EC2 Key Pairs documentation, section Creating a Key Pair Using Amazon EC2. Save the Public Key in /home/ubuntu/.ssh/id_rsa.pub (a CLI alternative is sketched below).
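If you prefer to create the key pair from the CLI instead of the console, a sketch (the key name kube-sipecam-key is just an example):
#create the key pair and keep the private key
aws ec2 create-key-pair --key-name kube-sipecam-key --query 'KeyMaterial' --output text > kube-sipecam-key.pem
chmod 400 kube-sipecam-key.pem
#derive the public key that kops will use
ssh-keygen -y -f kube-sipecam-key.pem > /home/ubuntu/.ssh/id_rsa.pub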
Deploy the Kubernetes cluster. An example is:
kops create cluster \
--name=${CLUSTER_FULL_NAME} \
--zones=${CLUSTER_AWS_AZ} \
--master-size="t2.medium" \
--node-size="t2.medium" \
--node-count="1" \
--dns-zone=${DOMAIN_NAME} \
--ssh-public-key="/home/ubuntu/.ssh/id_rsa.pub" \
--kubernetes-version=${KUBERNETES_VERSION}
kops update cluster --name ${CLUSTER_FULL_NAME} --yes
Note
Check the status of the cluster with kops validate cluster and wait until it says Your cluster $CLUSTER_FULL_NAME is ready.
Note
You can delete the cluster with: $kops delete cluster ${CLUSTER_FULL_NAME} and then $kops delete cluster ${CLUSTER_FULL_NAME} --yes (without the --yes flag you only see what changes are going to be applied). Don't forget to delete the S3 bucket with $aws s3api delete-bucket --bucket ${CLUSTER_FULL_NAME}-state after cluster deletion.
Note
You can scale the worker nodes of the cluster up or down with the command: $kops edit ig nodes --name $CLUSTER_FULL_NAME, edit the screen that appears and set the desired number of instances in the minSize and maxSize values (for example 3 to scale up or 0 to scale down), and then run $kops update cluster $CLUSTER_FULL_NAME and $kops update cluster $CLUSTER_FULL_NAME --yes to apply the changes. The command kops validate cluster is useful to see the state of the cluster (the full sequence is sketched below).
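For reference, the whole node-scaling sequence could look like this (a sketch; nodes is the default instance group name created by kops, check yours with kops get instancegroups):
#edit minSize and maxSize, e.g. set both to 3 to scale up or to 0 to scale down
kops edit ig nodes --name $CLUSTER_FULL_NAME
#preview the changes
kops update cluster $CLUSTER_FULL_NAME
#apply them
kops update cluster $CLUSTER_FULL_NAME --yes
#wait until the cluster reports ready
kops validate cluster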
Note
To scale the master up or down you can use: $kops edit ig master-us-west-2a --name $CLUSTER_FULL_NAME (you can check the instance type of the master with: $kops get instancegroups), set 1 or 0 instances in the minSize and maxSize values, and then run $kops update cluster $CLUSTER_FULL_NAME and $kops update cluster $CLUSTER_FULL_NAME --yes to apply the changes. The command kops validate cluster is useful to see the state of the cluster.
How do I ssh to an instance of the Kubernetes cluster?
Using the .pem key of the key pair already created, execute:
ssh -i <key>.pem admin@api.$CLUSTER_FULL_NAME
Note
Make sure this <key>.pem has 400 permissions: $chmod 400 <key>.pem.
You can also deploy kubernetes dashboard for your cluster.
Kubernetes dashboard¶
According to the Kubernetes Dashboard documentation, the Kubernetes dashboard is a general-purpose, web-based UI for Kubernetes clusters. It allows users to manage applications running in the cluster and troubleshoot them, as well as manage the cluster itself.
The next steps are based on: Certificate management, Installation, Accessing Dashboard 1.7.X and above and Creating sample user from the Kubernetes official documentation, and on the installation of Certbot for Ubuntu (18.04) bionic and certbot-dns-route53 to generate certificates and access the Kubernetes dashboard via https.
Install certbot and Route53 plugin for Let’s Encrypt client:
sudo apt-get install -y certbot
#check version of certbot and install route53 plugin:
certbot_v=$(certbot --version|cut -d' ' -f2)
sudo pip3 install certbot_dns_route53==$certbot_v
Create some useful directories:
mkdir -p ~/letsencrypt/log/
mkdir -p ~/letsencrypt/config/
mkdir -p ~/letsencrypt/work/
Using kubectl, retrieve where the Kubernetes master is running:
kubectl cluster-info
Kubernetes master is running at <location>
KubeDNS is running at <location>/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Generate a certificate for the <location> returned by the last command (remove the https:// prefix if present; use just the DNS name), and make sure to keep the letsencrypt directory in a safe place:
$certbot certonly -d <location> --dns-route53 --logs-dir letsencrypt/log/ --config-dir letsencrypt/config/ --work-dir letsencrypt/work/ -m myemail@myinstitution --agree-tos --non-interactive --dns-route53-propagation-seconds 20
Note
Make sure you note the date on which your certificate will expire. To renew the certificate execute:
certbot renew --dns-route53 --logs-dir letsencrypt/log/ \
--config-dir letsencrypt/config/ --work-dir letsencrypt/work/ \
--non-interactive
Note
Also you need to have the following symlinks created under the directory letsencrypt/config/live/<location>:
cert.pem -> ../../archive/<location>/cert1.pem
chain.pem -> ../../archive/<location>/chain1.pem
fullchain.pem -> ../../archive/<location>/fullchain1.pem
privkey.pem -> ../../archive/<location>/privkey1.pem
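If these symlinks are missing (for example after copying the letsencrypt directory to another machine), they can be recreated by hand; a sketch:
cd letsencrypt/config/live/<location>
ln -sf ../../archive/<location>/cert1.pem cert.pem
ln -sf ../../archive/<location>/chain1.pem chain.pem
ln -sf ../../archive/<location>/fullchain1.pem fullchain.pem
ln -sf ../../archive/<location>/privkey1.pem privkey.pem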
Create a directory certs and copy the cert and private key:
mkdir certs
cp letsencrypt/config/archive/<location>/fullchain1.pem certs/
cp letsencrypt/config/archive/<location>/privkey1.pem certs/
Note
When renewing your certificate, the latest certificates will be the symlinks located under letsencrypt/config/live/<location>/. See Where are my certificates?
Retrieve the yaml to deploy the Kubernetes dashboard and change some values:
curl -O https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml
sed -ni 's/- --auto-generate-certificates/#- --auto-generate-certificates/;p' recommended.yaml
sed -i '/args:/a \ \ \ \ \ \ \ \ \ \ \ \ - --tls-cert-file=fullchain1.pem' recommended.yaml
sed -i '/args:/a \ \ \ \ \ \ \ \ \ \ \ \ - --tls-key-file=privkey1.pem' recommended.yaml
Create the deployments and services with recommended.yaml:
kubectl apply -f recommended.yaml
Delete the kubernetes-dashboard-certs secret and recreate it using the .pem files that we created with certbot:
kubectl delete secret kubernetes-dashboard-certs -n kubernetes-dashboard
kubectl create secret generic kubernetes-dashboard-certs --from-file=certs -n kubernetes-dashboard
You can check that containers are running by executing:
kubectl -n kubernetes-dashboard get pods
To visualize the kubernetes-dashboard, one possibility is to change the type ClusterIP to NodePort (see Accessing Dashboard 1.7.X and above) when executing the next command:
kubectl edit service kubernetes-dashboard -n kubernetes-dashboard
and get the port with:
kubectl get service kubernetes-dashboard -n kubernetes-dashboard
Open the port retrieved by the last command in the master's security group of the Kubernetes cluster with the AWS console. In your browser type:
https://<location>:<port>
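As an optional shortcut, the NodePort and the security group rule can also be handled from the CLI; a sketch, where <master-sg-id> is a placeholder for the ID of the master's security group (restrict the CIDR as needed):
#extract just the NodePort value
PORT=$(kubectl get service kubernetes-dashboard -n kubernetes-dashboard -o jsonpath='{.spec.ports[0].nodePort}')
#open that port in the master's security group
aws ec2 authorize-security-group-ingress --group-id <master-sg-id> --protocol tcp --port $PORT --cidr 0.0.0.0/0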
The Creating sample user documentation can be used to access the dashboard via token generation. Use:
kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep admin-user | awk '{print $1}')
to retrieve the token.
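For completeness, the admin-user service account and cluster role binding assumed by the command above can be created as in the referenced Creating sample user guide; a sketch:
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF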


To scale down components of kubernetes dashboard:
kubectl -n kubernetes-dashboard scale deployments/dashboard-metrics-scraper --replicas=0
kubectl -n kubernetes-dashboard scale deployments/kubernetes-dashboard --replicas=0
To scale up components of kubernetes dashboard:
kubectl -n kubernetes-dashboard scale deployments/dashboard-metrics-scraper --replicas=1
kubectl -n kubernetes-dashboard scale deployments/kubernetes-dashboard --replicas=1
To delete components of kubernetes dashboard:
#delete admin-user created:
kubectl -n kubernetes-dashboard delete serviceaccount admin-user
kubectl -n kubernetes-dashboard delete ClusterRoleBinding admin-user
#delete dashboard components:
kubectl delete deployment dashboard-metrics-scraper -n kubernetes-dashboard
kubectl delete deployment kubernetes-dashboard -n kubernetes-dashboard
kubectl delete service dashboard-metrics-scraper -n kubernetes-dashboard
kubectl delete clusterrolebinding kubernetes-dashboard -n kubernetes-dashboard
kubectl delete rolebinding kubernetes-dashboard -n kubernetes-dashboard
kubectl delete clusterrole kubernetes-dashboard -n kubernetes-dashboard
kubectl delete role kubernetes-dashboard -n kubernetes-dashboard
kubectl delete configmap kubernetes-dashboard-settings -n kubernetes-dashboard
kubectl delete secret kubernetes-dashboard-key-holder -n kubernetes-dashboard
kubectl delete secret kubernetes-dashboard-csrf -n kubernetes-dashboard
kubectl delete service kubernetes-dashboard -n kubernetes-dashboard
kubectl delete serviceaccount kubernetes-dashboard -n kubernetes-dashboard
kubectl delete secret kubernetes-dashboard-certs -n kubernetes-dashboard
kubectl delete namespace kubernetes-dashboard
Jupyterlab deployment for pipelines¶
Create the deployment kale-jupyterlab, which has Kale installed:
LOAD_BALANCER_SERVICE=loadbalancer-0.6.1-efs
JUPYTERLAB_SERVICE_HOSTPATH_PV=jupyterlab-cert-0.6.1-efs
URL=https://raw.githubusercontent.com/CONABIO/kube_sipecam/master/deployments/jupyterlab_cert
kubectl create -f $URL/efs/$LOAD_BALANCER_SERVICE.yaml
kubectl create -f $URL/efs/$JUPYTERLAB_SERVICE_HOSTPATH_PV.yaml
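To check that the deployment and its service came up (names as defined in the referenced yamls; the kubeflow namespace is the one used by the scaling commands later in this guide):
kubectl -n kubeflow get deployments
kubectl -n kubeflow get services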
Check the Yamls for the kale-jupyterlab deployment for the latest version.
Cluster deployment¶
Once the cluster is created in the us-west-2 region, launch a t2.micro instance with the AMI Ubuntu-20.04 in which we will execute kubectl commands to deploy the cluster.
Use the next bash script, which will install docker, kops and kubectl. Also choose a suitable name for your instance and write it as the value of the name_instance variable.
#!/bin/bash
##variables:
region=us-west-2
name_instance=deploy-k8s
shared_volume=/shared_volume
user=ubuntu
##System update
export DEBIAN_FRONTEND=noninteractive
apt-get update -yq
##Install awscli
apt-get install -y python3-pip && pip3 install --upgrade pip
pip3 install awscli --upgrade
##Tag instance
INSTANCE_ID=$(curl -s http://instance-data/latest/meta-data/instance-id)
PUBLIC_IP=$(curl -s http://instance-data/latest/meta-data/public-ipv4)
aws ec2 create-tags --resources $INSTANCE_ID --tags Key=Name,Value=$name_instance-$PUBLIC_IP --region=$region
##Set variables for completion of bash commands
echo "export LC_ALL=C.UTF-8" >> /home/$user/.profile
echo "export LANG=C.UTF-8" >> /home/$user/.profile
##Set variable mount_point
echo "export mount_point=$shared_volume" >> /home/$user/.profile
##Useful software for common operations
apt-get install -y nfs-common jq git htop nano
##Create shared volume
mkdir $shared_volume
##install docker for ubuntu:
apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update -yq
apt-get install -y docker-ce
service docker start
##install kops version 1.19.1:
wget -O kops https://github.com/kubernetes/kops/releases/download/v1.19.1/kops-linux-amd64
chmod +x ./kops
sudo mv ./kops /usr/local/bin/
##install kubernetes command line tool v1.19: kubectl
wget -O kubectl https://storage.googleapis.com/kubernetes-release/release/v1.19.1/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
##enable completion for kubectl:
echo "source <(kubectl completion bash)" >> /home/$user/.bashrc
Scale up of worker nodes¶
Use ssh to log in to the EC2 instance. Choose which type of cluster you will use and set the next variables accordingly:
# For testing
export DOMAIN_NAME="dummy.route53-kube-sipecam.net"
export CLUSTER_ALIAS="k8s-dummy"
export CLUSTER_FULL_NAME="${CLUSTER_ALIAS}.${DOMAIN_NAME}"
export KUBERNETES_VERSION="1.19.1"
export KOPS_STATE_STORE="s3://${CLUSTER_FULL_NAME}-state"
export EDITOR=nano
# For production
export DOMAIN_NAME="proc-sys.route53-kube-sipecam.net"
export CLUSTER_ALIAS="k8s"
export CLUSTER_FULL_NAME="${CLUSTER_ALIAS}.${DOMAIN_NAME}"
export KUBERNETES_VERSION="1.19.1"
export KOPS_STATE_STORE="s3://${CLUSTER_FULL_NAME}-state"
export EDITOR=nano
Scale up the worker nodes of the cluster using kops.
kops edit ig nodes-us-west-2a --name $CLUSTER_FULL_NAME
Note
If you will use GPU nodes, add the next lines under spec:
spec:
  additionalUserData:
  - name: install_dependencies_for_kube_sipecam_gpu.sh
    type: text/x-shellscript
    content: |-
      #!/bin/bash
      sudo apt-get update && sudo apt-get install -y build-essential
      #install nvidia driver
      nv_driver=460.32.03
      nv_driver_run=NVIDIA-Linux-x86_64-460.32.03.run
      cd ~ && wget http://us.download.nvidia.com/tesla/$nv_driver/$nv_driver_run
      chmod a+x $nv_driver_run
      sudo ./$nv_driver_run --accept-license --silent
      #install nvidia-docker2
      curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
      distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
      curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
      sudo apt-get update
      sudo apt-get install -y nvidia-docker2
      sudo systemctl restart docker
      sudo su && echo '{"default-runtime": "nvidia","runtimes": {"nvidia": {"path": "/usr/bin/nvidia-container-runtime","runtimeArgs": []}}}' > /etc/docker/daemon.json && pkill -SIGHUP dockerd && systemctl restart kubelet
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210315
  machineType: p2.xlarge
The next line is just to see what changes are going to be applied.
kops update cluster --name $CLUSTER_FULL_NAME
Apply the changes.
kops update cluster --name $CLUSTER_FULL_NAME --yes --admin
The next line is useful to know when the cluster is ready.
kops validate cluster --wait 10m
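Once validation succeeds, you can also confirm that the new worker nodes joined the cluster:
kubectl get nodes -o wide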
Scale up of components¶
When the cluster is ready, scale up the following:
Dashboard components
kubectl -n kubernetes-dashboard scale deployments/dashboard-metrics-scraper --replicas=1
kubectl -n kubernetes-dashboard scale deployments/kubernetes-dashboard --replicas=1
Get port of dashboard UI with:
kubectl get service kubernetes-dashboard -n kubernetes-dashboard
Access dashboard UI:
https://api.$CLUSTER_ALIAS.$DOMAIN_NAME:<port retrieved with last command>
Elastic File System
kubectl -n kubeflow scale deployments/nfs-client-provisioner --replicas=1
Kubeflow components
kubectl scale -n kubeflow deployment cache-deployer-deployment --replicas=1
kubectl scale -n kubeflow deployment cache-server --replicas=1
kubectl scale -n kubeflow deployment workflow-controller --replicas=1
kubectl scale -n kubeflow deployment ml-pipeline-ui --replicas=1
kubectl scale -n kubeflow deployment ml-pipeline-scheduledworkflow --replicas=1
kubectl scale -n kubeflow deployment ml-pipeline-persistenceagent --replicas=1
kubectl scale -n kubeflow deployment ml-pipeline --replicas=1
kubectl scale -n kubeflow deployment metadata-writer --replicas=1
kubectl scale -n kubeflow deployment metadata-envoy-deployment --replicas=1
kubectl scale -n kubeflow deployment metadata-grpc-deployment --replicas=1
kubectl scale -n kubeflow deployment ml-pipeline-viewer-crd --replicas=1
kubectl scale -n kubeflow deployment ml-pipeline-visualizationserver --replicas=1
kubectl scale -n kubeflow deployment mysql --replicas=1
kubectl scale -n kubeflow deployment controller-manager --replicas=1
Get port of kubeflow UI with:
kubectl get service ml-pipeline-ui -n kubeflow
Access kubeflow dashboard UI:
http://api.$CLUSTER_ALIAS.$DOMAIN_NAME:<port retrieved with last command>
Jupyterlab service
kubectl -n kubeflow scale deployments/kale-jupyterlab --replicas=1
Access jupyterlab UI:
https://api.$CLUSTER_ALIAS.$DOMAIN_NAME:30001/myurl
Scale down of components¶
Once the work is done, scale down the following:
Kubeflow components
kubectl scale -n kubeflow deployment cache-deployer-deployment --replicas=0
kubectl scale -n kubeflow deployment cache-server --replicas=0
kubectl scale -n kubeflow deployment workflow-controller --replicas=0
kubectl scale -n kubeflow deployment ml-pipeline-ui --replicas=0
kubectl scale -n kubeflow deployment ml-pipeline-scheduledworkflow --replicas=0
kubectl scale -n kubeflow deployment ml-pipeline-persistenceagent --replicas=0
kubectl scale -n kubeflow deployment ml-pipeline --replicas=0
kubectl scale -n kubeflow deployment metadata-writer --replicas=0
kubectl scale -n kubeflow deployment metadata-envoy-deployment --replicas=0
kubectl scale -n kubeflow deployment metadata-grpc-deployment --replicas=0
kubectl scale -n kubeflow deployment ml-pipeline-viewer-crd --replicas=0
kubectl scale -n kubeflow deployment ml-pipeline-visualizationserver --replicas=0
kubectl scale -n kubeflow deployment mysql --replicas=0
kubectl scale -n kubeflow deployment controller-manager --replicas=0
Dashboard components
kubectl -n kubernetes-dashboard scale deployments/dashboard-metrics-scraper --replicas=0
kubectl -n kubernetes-dashboard scale deployments/kubernetes-dashboard --replicas=0
Elastic File System
kubectl -n kubeflow scale deployments/nfs-client-provisioner --replicas=0
Jupyterlab service
kubectl -n kubeflow scale deployments/kale-jupyterlab --replicas=0
Scale down of worker nodes¶
Wait 5 minutes, then scale down the worker nodes of the cluster using kops.
kops edit ig nodes-us-west-2a --name $CLUSTER_FULL_NAME
The next line is just to see what changes are going to be applied.
kops update cluster --name $CLUSTER_FULL_NAME
Apply the changes.
kops update cluster --name $CLUSTER_FULL_NAME --yes --admin
Terminate the t2.micro instance.