Integrating Amazon S3 with EKS for Scalable Kubernetes Storage
This article provides a comprehensive guide on integrating Amazon S3 as a robust and scalable storage solution with Amazon Elastic Kubernetes Service (EKS). We’ll leverage Terraform for infrastructure provisioning and Kubernetes YAML manifests to deploy a simple Nginx container that serves website files directly from an S3 bucket. The core of this integration lies in the Mountpoint for S3 CSI driver, which seamlessly exposes Amazon S3 objects to Kubernetes workloads using standard POSIX interfaces.
Understanding Amazon S3 for Kubernetes Workloads
What is Amazon S3?
Amazon Simple Storage Service (Amazon S3) is a highly scalable, durable, and available object storage service. Unlike traditional block storage (like Amazon EBS) or file storage (like Amazon EFS), S3 stores data as “objects” within “buckets.” This makes it an excellent choice for a wide range of use cases, including hosting static websites, storing application logs, backups, and large datasets, due to its virtually unlimited capacity and cost-effectiveness.
Architecture for S3 Integration with EKS
The architecture for connecting Amazon S3 to an EKS cluster is built around the S3 Container Storage Interface (CSI) driver and the Mountpoint for S3 technology.
- S3 Bucket: At the foundation, an S3 bucket serves as the backend object store where all application data resides.
- StorageClass: Within Kubernetes, a
StorageClassdefines how storage is provisioned. For S3, however, only static provisioning is currently supported. This means you must manually create aPersistentVolume(PV) and explicitly map it to an existing S3 bucket. - PersistentVolume (PV): This Kubernetes resource represents a piece of storage in your cluster, in this case, a specific S3 bucket or a prefix within it.
- PersistentVolumeClaim (PVC): Kubernetes workloads request storage using a
PersistentVolumeClaim. The PVC then binds to an available PV, allowing pods to access the underlying S3 storage. - S3 CSI Driver & Mountpoint for S3: When a pod requests a PVC linked to an S3 PV, the S3 CSI driver comes into play. It uses Mountpoint for S3 to provide a file system-like interface to the S3 bucket, making object storage accessible as if it were a traditional file system.
- Security (IRSA): Access control is managed through IAM Roles for Service Accounts (IRSA). This mechanism ensures that Kubernetes pods have only the minimum necessary permissions to interact with specific S3 buckets, adhering to the principle of least privilege.
It’s crucial to note that while dynamic provisioning is common for other storage types like EBS and EFS (where PVCs automatically trigger the creation of new storage volumes), it is not yet available for S3 CSI. This makes static provisioning the exclusive method, requiring administrators to pre-define the mapping between PVs and S3 buckets.
Key Benefits of Using S3 with EKS
- Scalability: Enjoy virtually limitless storage capacity, adapting effortlessly to your application’s growth.
- Durability: Benefit from Amazon S3’s industry-leading 11 nines (99.999999999%) of data durability.
- Cost-Effectiveness: Pay only for the storage you consume, with no upfront provisioning or wasted capacity.
- Seamless Integration: Leverage easy integration with other AWS services such as Athena, Glue, and CloudFront.
Important Considerations for EKS
- CSI Driver Requirement: The Mountpoint for S3 CSI driver must be installed and properly configured in your EKS cluster.
- Pod Identity Support: Currently, IRSA (IAM Roles for Service Accounts) is required for authentication. AWS Pod Identity is not yet supported with this driver.
- Static Provisioning Only: As of now, only static provisioning is available. Dynamic provisioning is a highly anticipated feature on the roadmap.
- Resource Quotas: Be mindful of limitations concerning open file descriptors and network throughput when designing and scaling workloads that heavily interact with S3.
Step 1: Provisioning EKS Cluster with Terraform in a VPC
The initial step involves setting up your EKS cluster within a dedicated Amazon Virtual Private Cloud (VPC) for network isolation and security. We recommend using the official AWS Terraform community modules for both VPC and EKS, as they provide well-architected and tested configurations. Refer to the main module of the provided GitHub repository for the complete Terraform setup.
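As a rough sketch of what that setup looks like with the community modules (module versions, the Kubernetes version, CIDR ranges, and node-group sizing below are illustrative assumptions, not the repository's exact configuration):

```hcl
# Illustrative sketch using the terraform-aws-modules VPC and EKS modules;
# names, versions, and CIDRs are assumptions -- adapt them to your environment.
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name            = "eks-vpc"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_name    = var.cluster_name
  cluster_version = "1.31"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnets

  # Creates the cluster's OIDC provider, which IRSA (used later for the
  # S3 CSI driver) depends on
  enable_irsa = true

  eks_managed_node_groups = {
    default = {
      instance_types = ["t3.medium"]
      min_size       = 1
      max_size       = 3
      desired_size   = 2
    }
  }
}
```

The private subnets host the worker nodes, while the NAT gateway gives them outbound access for pulling images and reaching AWS APIs.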
Step 2: Creating the S3 Bucket and IAM Roles
This step focuses on defining the S3 bucket, configuring its security, and setting up the necessary IAM roles for the Mountpoint for S3 CSI driver.
S3 Bucket Configuration (Terraform)
The Terraform configuration below provisions an encrypted S3 bucket with versioning enabled and robust public access blocks to ensure data security.
####################################################################################
# S3 Bucket for Kubernetes Storage
####################################################################################
resource "aws_s3_bucket" "main" {
  bucket = var.bucket_name

  tags = {
    Name        = var.bucket_name
    Environment = var.environment
    Terraform   = "true"
  }
}

####################################################################################
# S3 Bucket Versioning
####################################################################################
resource "aws_s3_bucket_versioning" "main" {
  bucket = aws_s3_bucket.main.id

  versioning_configuration {
    status = var.versioning_enabled ? "Enabled" : "Disabled"
  }
}

####################################################################################
# S3 Bucket Encryption
####################################################################################
resource "aws_s3_bucket_server_side_encryption_configuration" "main" {
  bucket = aws_s3_bucket.main.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
    bucket_key_enabled = true
  }
}

####################################################################################
# S3 Bucket Public Access Block
####################################################################################
resource "aws_s3_bucket_public_access_block" "main" {
  bucket = aws_s3_bucket.main.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
IAM Role for Mountpoint for S3 CSI Driver (IRSA)
An IAM role is created to be assumed by the S3 CSI driver’s service account. This uses IRSA (IAM Roles for Service Accounts) to provide fine-grained permissions.
####################################################################################
# IAM Role for S3 CSI Driver (IRSA)
####################################################################################
resource "aws_iam_role" "s3_csi_driver_role" {
  name = "${var.cluster_name}-s3-csi-driver-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Federated = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")}"
        }
        Action = "sts:AssumeRoleWithWebIdentity"
        Condition = {
          StringEquals = {
            "${replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")}:sub" = "system:serviceaccount:kube-system:s3-csi-driver-sa"
            "${replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")}:aud" = "sts.amazonaws.com"
          }
        }
      }
    ]
  })

  tags = {
    Name        = "${var.cluster_name}-s3-csi-driver-role"
    Environment = var.environment
    Terraform   = "true"
  }
}

####################################################################################
# S3 CSI Driver Policy (Based on AWS Documentation)
####################################################################################
resource "aws_iam_policy" "s3_csi_driver_policy" {
  name        = "${var.cluster_name}-s3-csi-driver-policy"
  description = "IAM policy for S3 CSI Driver based on AWS documentation"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:ListBucket"
        ]
        Resource = aws_s3_bucket.main.arn
      },
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:AbortMultipartUpload",
          "s3:DeleteObject"
        ]
        Resource = "${aws_s3_bucket.main.arn}/*"
      }
    ]
  })

  tags = {
    Name        = "${var.cluster_name}-s3-csi-driver-policy"
    Environment = var.environment
    Terraform   = "true"
  }
}

####################################################################################
# Attach S3 CSI Driver Policy
####################################################################################
resource "aws_iam_role_policy_attachment" "s3_csi_driver_policy" {
  role       = aws_iam_role.s3_csi_driver_role.name
  policy_arn = aws_iam_policy.s3_csi_driver_policy.arn
}
Mountpoint for S3 Add-on
The aws-mountpoint-s3-csi-driver add-on for EKS integrates the Mountpoint for S3 CSI driver into your cluster, using the IAM role defined above for its service account.
####################################################################################
### S3 Mountpoint CSI Driver Addon (deployed after S3 module)
####################################################################################
resource "aws_eks_addon" "s3_mountpoint_csi_driver" {
  cluster_name             = module.eks.cluster_name
  addon_name               = "aws-mountpoint-s3-csi-driver"
  service_account_role_arn = module.s3.s3_csi_driver_role_arn

  # The driver authenticates via IRSA; ensure this addon is created only
  # after the S3 module has created the IAM role it references
  depends_on = [module.s3]

  tags = {
    Name        = "${var.cluster_name}-s3-mountpoint-csi-driver"
    Environment = var.environment
    Terraform   = "true"
  }
}
Step 3: S3 Storage Implementation Patterns in Kubernetes
As discussed, static provisioning is the current method for S3 with EKS. This involves defining Kubernetes resources (StorageClass, PersistentVolume, PersistentVolumeClaim) that reference your S3 bucket.
storage-class.yaml
This StorageClass defines the provisioner for the S3 CSI driver. Note the placeholder ${S3_BUCKET_NAME} which will be replaced during deployment. The prefix parameter allows you to define a specific folder within your S3 bucket for Kubernetes storage.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s3-csi-sc
provisioner: s3.csi.aws.com
parameters:
  bucketName: ${S3_BUCKET_NAME}
  prefix: "k8s-storage/"
volumeBindingMode: Immediate
allowVolumeExpansion: false
persistent-volume.yaml
This PersistentVolume (PV) explicitly maps to your S3 bucket. The volumeHandle and volumeAttributes are crucial for the S3 CSI driver to correctly identify and mount the S3 resource.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: s3-csi-sc
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-csi-driver-volume
    volumeAttributes:
      bucketName: ${S3_BUCKET_NAME}
      prefix: "k8s-storage/"
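Mountpoint-specific behavior can also be tuned through mountOptions on the PV. A hedged sketch of a fragment you could merge into the spec above (the option names follow the Mountpoint for S3 documentation and are illustrative; verify them against the driver version you actually run):

```yaml
# Fragment to merge into the PV spec above; options are illustrative.
spec:
  mountOptions:
    - allow-delete      # permit deletes through the mounted file interface
    - region us-east-1  # assumption: the bucket lives in us-east-1
```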
persistent-volume-claim.yaml
A PersistentVolumeClaim (PVC) is used by your application to request access to the storage. It references the s3-csi-sc StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: s3-csi-sc
  resources:
    requests:
      storage: 1Gi
nginx-pod.yaml
This Nginx pod configuration demonstrates how to mount the S3 storage. The volumeMounts section points to the s3-storage volume, which in turn uses the s3-pvc. An index.html file is created directly in the mounted S3 path.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-s3-pod
  namespace: default
  labels:
    app: nginx-s3
spec:
  securityContext:
    runAsUser: 0
    runAsGroup: 0
  containers:
    - name: nginx
      image: nginx:latest
      ports:
        - containerPort: 80
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      command: ["/bin/sh", "-c"]
      args:
        - |
          echo '<h1>Hello from S3 Mountpoint!</h1><p><b>Pod:</b> '$POD_NAME'</p>' > /usr/share/nginx/html/index.html || true
          sed -i 's/user nginx;/user root;/' /etc/nginx/nginx.conf
          nginx -g 'daemon off;'
      volumeMounts:
        - name: s3-storage
          mountPath: /usr/share/nginx/html
  volumes:
    - name: s3-storage
      persistentVolumeClaim:
        claimName: s3-pvc
nginx-service.yaml
A ClusterIP service to expose the Nginx pod within the EKS cluster.
apiVersion: v1
kind: Service
metadata:
  name: nginx-s3-service
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 80
  selector:
    app: nginx-s3
Deployment Steps for Static Provisioning
Follow these steps to deploy the S3-backed Nginx application:
- Get S3 bucket name from Terraform: Navigate to your Terraform infrastructure directory and run:

cd infrastructure
S3_BUCKET_NAME=$(terraform output -raw s3_bucket_name 2>/dev/null || echo "")

- Update manifests with S3 values: Replace the ${S3_BUCKET_NAME} placeholder in your Kubernetes YAML files:

sed "s/\${S3_BUCKET_NAME}/$S3_BUCKET_NAME/g" storage-class.yaml > storage-class-final.yaml
sed "s/\${S3_BUCKET_NAME}/$S3_BUCKET_NAME/g" persistent-volume.yaml > persistent-volume-final.yaml

- Apply manifests: Deploy the Kubernetes resources to your EKS cluster:

kubectl apply -f storage-class-final.yaml
kubectl apply -f persistent-volume-final.yaml
kubectl apply -f persistent-volume-claim.yaml
kubectl apply -f nginx-pod.yaml
kubectl apply -f nginx-service.yaml

You can verify the creation of your storage resources by running:

$ kubectl get sc,pv,pvc
NAME                                    PROVISIONER      RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
storageclass.storage.k8s.io/s3-csi-sc   s3.csi.aws.com   Delete          Immediate           false                  10m

NAME                     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
persistentvolume/s3-pv   1Gi        RWX            Retain           Bound    default/s3-pvc   s3-csi-sc      <unset>                          10m

NAME                           STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/s3-pvc   Bound    s3-pv    1Gi        RWX            s3-csi-sc      <unset>                 10m

Upon successful deployment, you should observe an index.html file within your specified S3 bucket prefix (k8s-storage/).
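Before touching the cluster, the substitution in step 2 can be sanity-checked locally. This small demo (the file names here are hypothetical) exercises the same sed pattern in isolation:

```shell
# Local demo of the ${S3_BUCKET_NAME} substitution; file names are hypothetical.
S3_BUCKET_NAME="example-bucket"
printf 'bucketName: ${S3_BUCKET_NAME}\n' > demo.yaml
sed "s/\${S3_BUCKET_NAME}/$S3_BUCKET_NAME/g" demo.yaml > demo-final.yaml
cat demo-final.yaml   # bucketName: example-bucket
rm demo.yaml demo-final.yaml
```

Note the single quotes in printf keep the shell from expanding the placeholder, while the double-quoted sed expression expands `$S3_BUCKET_NAME` but passes `\$` through to sed as a literal dollar sign.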
Verification
To confirm your setup is working as expected:
- Check S3 CSI driver status:

kubectl get pods -n kube-system -l app=aws-mountpoint-s3-csi-driver

- Check deployment status:

kubectl get pod nginx-s3-pod
kubectl get service nginx-s3-service
kubectl get pvc s3-pvc
kubectl get pv

- Test Nginx web server:

kubectl port-forward service/nginx-s3-service 8084:80
# Then visit http://localhost:8084 in your browser

- Check S3 file persistence:

kubectl exec nginx-s3-pod -- ls -la /usr/share/nginx/html/
kubectl exec nginx-s3-pod -- cat /usr/share/nginx/html/index.html

- Check S3 bucket content:

aws s3 ls s3://$S3_BUCKET_NAME/k8s-storage/
Cleanup
To remove the deployed resources and avoid incurring unnecessary costs:
kubectl delete -f nginx-service.yaml
kubectl delete -f nginx-pod.yaml
kubectl delete -f persistent-volume-claim.yaml
kubectl delete -f persistent-volume-final.yaml # Use -final
kubectl delete -f storage-class-final.yaml # Use -final
After cleaning up Kubernetes resources, you can proceed to destroy your EKS infrastructure using Terraform:
cd infrastructure
terraform destroy
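One common snag: terraform destroy fails if the bucket still contains objects, including old object versions, since versioning was enabled above. For disposable environments only, one option is to set force_destroy on the bucket resource. A sketch of the earlier resource with that flag added:

```hcl
resource "aws_s3_bucket" "main" {
  bucket = var.bucket_name

  # Lets `terraform destroy` delete the bucket even when it still contains
  # objects and object versions. Suitable for dev/test environments only.
  force_destroy = true
}
```

Alternatively, empty the bucket (including versions) manually before running the destroy.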
Conclusion
Integrating Amazon S3 with Amazon EKS using the Mountpoint for S3 CSI driver offers a powerful, scalable, and cost-effective solution for managing object storage within containerized environments. While the current limitation to static provisioning requires a more manual setup, this approach enables Kubernetes pods to seamlessly access and serve static files directly from S3 buckets.
In this guide, we successfully demonstrated how to provision an EKS cluster with Terraform, configure the necessary IAM roles with IRSA, deploy the S3 CSI driver, and run an Nginx container that leverages S3 for its content. This integration unlocks new possibilities for data management and application architecture on Kubernetes.