Everything about AWS ECS (with hands-on)

anubhav jhalani
10 min read · Dec 1, 2022


In my opinion, ECS is one of the biggest and most complex services in AWS. I’ve seen many posts and tutorials about ECS, but none of them go deep enough to showcase all the functionality and power of this service. So I decided to get my hands dirty and show how to use ECS for a production use case. I am going to explain ECS in a series of articles, so if you want to jump ahead to another article, here are the links:

  1. ECS Overview and Task Definition
  2. Cluster
  3. Service
  4. Load Testing
  5. CI/CD Pipeline

ECS is AWS’s Docker container service that handles the orchestration and provisioning of Docker containers. In this article, I am assuming that you have a good understanding of Docker; if you don’t, I would recommend this series of articles to get comfortable with Docker before going further.

ECS Overview

Before diving deep into ECS, it helps to have an overview of its various components:

  1. Task Definition — This is the most basic entity in ECS. A task definition is a JSON file that defines how a Docker container or group of Docker containers should launch. It contains settings like exposed ports, Docker image, CPU and memory requirements, command to run, environment variables, etc. To some extent, it is similar to a Docker Compose file.
  2. Task — This is an instance of a Task Definition, which runs a container or group of containers with the settings defined in the Task Definition. In other words, a Task Definition can be considered a class and a Task an object of that class.
  3. Service — You can start a standalone Task from a Task Definition, or you can use a Service to run and maintain a specified number of Tasks simultaneously. If one of your tasks fails or stops, the service scheduler launches another instance of your task definition to replace it. A Service can also scale the number of running tasks up and down automatically based on CloudWatch metrics, and it can be attached to a load balancer’s target group. We will see both of these in action later. In other words, you can think of a Service as a manager of Tasks.
  4. Cluster — A Cluster is basically a group of EC2 instances on which tasks run, either standalone or managed by Services. You can register an instance in a Cluster manually, or you can specify an EC2 Auto Scaling group for the Cluster, which registers instances automatically and also scales the number of registered instances up and down based on CloudWatch metrics. We will see this in action later as well.
  5. Container Instance — This is just another name for an EC2 instance that is part of an ECS Cluster and runs standalone tasks or tasks run by Services. It has Docker and the ecs-agent running on it.
  6. ECS Agent — This is a container, run and controlled by AWS ECS, on the EC2 instances that are part of the Cluster. The ECS agent sends various metrics about running Tasks to ECS and CloudWatch. In other words, you can think of the ECS agent as an informer about Tasks.
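Putting these pieces together, a minimal task definition for the kind of setup built in this series might look like the sketch below; the family name, image tag, and resource values are illustrative placeholders, not the exact values used later:

```json
{
  "family": "nginx-demo",
  "networkMode": "bridge",
  "requiresCompatibilities": ["EC2"],
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "nginx",
      "image": "nginx:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }]
    }
  ]
}
```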

Below are some diagrams showing relations between above components:

ECS Components
ECS Cluster

Now, after this short overview, let’s dive deep into each component of ECS with hands-on examples. I am going to launch NGINX containers on EC2 instances using a Service. This Service is connected to an Application Load Balancer to route requests evenly among the containers. Then I will do load testing on the EC2 instances and watch the automatic scaling of the Service and Cluster in action. In the end, I will create a CI/CD pipeline to automate deployment to the cluster. As you can see, there is a lot to it, so hold tight!

1. Task Definition

A Task Definition is required to run Docker containers in Amazon ECS. It is a JSON file containing various parameters that define one or more containers.

For a list of available parameters, see Task definition parameters. I am going to define a single NGINX container in my task definition.

If you are defining multiple containers inside a task definition, this article might help you decide which containers should be grouped inside a single task definition.
First of all, open AWS ECS and click Task Definitions -> Create New Task Definition. I am using the new ECS console. These are my inputs in the first section:

Task Definition Family : This will be the name of the Task Definition. With each update to the Task Definition the revision number changes, but this name remains the same.

Image Name : This asks for a name for the image, which can be anything you like.

Image URI : How to write an Image URI is explained here. I am using the nginx image from the official repositories on Docker Hub, so a single name is enough in the URI.

Essential Container : Every task must have at least one essential container. If the essential parameter for a container is set to true and that container fails or stops for any reason, all other containers that are part of the task are also stopped. If it is set to false, the container's failure doesn't affect the rest of the containers in the task. If this parameter is omitted, a container is assumed to be essential.
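In the JSON form of a task definition, this is just the essential flag on each container definition. Here is a sketch with a hypothetical two-container task (the sidecar name and image are made up for illustration):

```json
"containerDefinitions": [
  { "name": "nginx", "image": "nginx:latest", "essential": true },
  { "name": "log-sidecar", "image": "busybox:latest", "essential": false }
]
```

If log-sidecar exits, the task keeps running; if nginx exits, the whole task stops.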

Port Mappings : Now comes the confusing part. If you are familiar with Docker, you might be wondering why it is not showing the host port. The reason is that ECS uses dynamic host port mapping under bridge networking mode. To understand dynamic port mapping, let's first understand the networking modes in ECS:

  • awsvpc: In this networking mode, your task is allocated its own elastic network interface (ENI) and a primary private IPv4 address. This gives the task the same networking properties as an Amazon EC2 instance, such as VPC Flow Logs and security groups. In simple words, this networking mode attaches an extra ENI to the EC2 instance. Traffic to your container comes directly through this ENI, with a security group acting as a firewall. The host port can either be left blank or must be the same value as the container port, and you have to allow that port in your security group as well. Additionally, containers that belong to the same task can communicate over the localhost interface.
awsvpc mode
  • bridge: In this mode, the task uses Docker’s built-in virtual network on Linux, which runs inside each Amazon EC2 instance that hosts the task. The built-in virtual network on Linux uses the bridge Docker network driver. In simple words, in this mode you can either define the host port that maps to the container’s port, or leave the host port blank and a random host port will be mapped to the container (this is referred to as dynamic port mapping). That random host port comes from the ephemeral port range of your container instance’s operating system and Docker version.
bridge mode
  • host: The task uses the host’s network, which bypasses Docker’s built-in virtual network by mapping container ports directly to the ENI of the Amazon EC2 instance that hosts the task. In simple words, each container receives traffic over the IP address of the Amazon EC2 instance that is hosting it. Unlike awsvpc mode, this mode does not attach an extra ENI to the EC2 instance. You need to define the host port that maps to the container’s port and allow the same host port in the security group as well. Dynamic port mapping can’t be used in this network mode.
host mode

This documentation helps you choose the right networking mode. I am going to use the classic bridge mode.
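In the JSON view of a task definition, the difference between static and dynamic port mapping under bridge mode comes down to the hostPort field. A sketch with illustrative values:

```json
"portMappings": [
  { "containerPort": 80, "hostPort": 80, "protocol": "tcp" }
]
```

Setting hostPort to 0 or omitting it requests a dynamic host port from the instance’s ephemeral range instead; in awsvpc mode, hostPort must either be omitted or equal containerPort.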

Environment Variables : These can be used to pass environment variables into the container. I am skipping this option.

Now comes the second section of inputs:

App Environment : This specifies the launch type: Fargate or EC2 instances. Explaining Fargate would require a complete article in itself. I am going to use the EC2 instances option, which requires a lot of configuration on our side and therefore offers a lot to learn about ECS.

Operating System/Architecture : This defines the operating system and the CPU architecture that your tasks run on. I chose Linux with the x86_64 architecture.

Task Size->CPU : The hard limit of CPU units available to the task. If no registered container instance has the CPU units required by the task when ECS tries to place it, the task fails to start. When you set the CPU parameter at the task level, you are also setting the maximum amount of CPU resources that the containers in the task are allowed to use.

Task Size->Memory : The hard limit of memory (in MiB) to present to the task. If no registered container instance has the memory required by the task when ECS tries to place it, the task fails to start.
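At the JSON level, these task-level limits are the top-level cpu and memory fields; 1,024 CPU units correspond to 1 vCPU, and memory is in MiB. A sketch with illustrative values:

```json
{
  "family": "nginx-demo",
  "cpu": "512",
  "memory": "1024"
}
```

Here the task is capped at half a vCPU and 1 GiB of memory on whichever container instance it is placed.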

Further inputs in the second section:

Now we define the containers:

Container Size->CPU : The number of CPU units the Amazon ECS container agent reserves for the container. The total amount of CPU reserved for all the containers within a task must be lower than the task-level CPU value. The thing to notice here is that we are using the word reserved and not the words hard limit. Let me explain with an example. Assume you specify 512 CPU units for the container in a single-container task and run it on a container instance with 1,024 CPU units. In this case, the container can use the full 1,024 CPU units at any given time, because 512 CPU units is a reservation for the container, not a hard limit.

Container Size->Memory : The hard limit of memory to present to the container. If your container attempts to exceed the memory specified here, the container is killed. The total amount of memory reserved for all containers within a task must be lower than the task-level memory value, if one is specified.
I am leaving these blank because I have defined them at the task level.
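For comparison, the container-level fields in JSON look like the sketch below (values are illustrative). cpu here is a reservation, memory is the hard limit described above, and memoryReservation is the corresponding soft limit:

```json
"containerDefinitions": [
  {
    "name": "nginx",
    "cpu": 256,
    "memory": 512,
    "memoryReservation": 256
  }
]
```

Note that at the container level cpu and memory are integers, while their task-level equivalents are strings.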

So we have seen that there are CPU and memory options both at the task level and at the container level. You might be wondering what happens if you specify only one of them, or both. These scenarios are explained very well in the following articles:

https://aws.amazon.com/premiumsupport/knowledge-center/ecs-cpu-allocation/

https://aws.amazon.com/premiumsupport/knowledge-center/allocate-ecs-memory-tasks/

Task role : This grants the containers in the task permission to make the AWS API calls specified in the role's associated policies on your behalf, for use cases like accessing an S3 bucket. Please note that by default containers are not prevented from accessing the credentials supplied to the Amazon EC2 instance profile (through the Amazon EC2 instance metadata server). To prevent this, follow the guide here.

Task execution role : This grants the Amazon ECS container agent permission to make AWS API calls on your behalf for cases like private registry authentication or referencing sensitive data using Secrets Manager secrets or AWS Systems Manager Parameter Store parameters.
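In the task definition JSON, both roles are referenced by ARN at the task level; the account ID and role names below are placeholders:

```json
{
  "taskRoleArn": "arn:aws:iam::123456789012:role/my-task-role",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole"
}
```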

Network mode : As mentioned above, I am going to use bridge mode with a specific host port, 80. Defining a specific port is called static port mapping. As the footnote says, you have to go back one step and specify the host port for static port mapping. That's how it looks when you go back to the previous step:

Further optional inputs:
Storage->Volume : If you need to make data from a volume available to the container, or save data from the container to a volume, you specify storage here. Only the Bind volume type is supported. With bind mounts, a file or directory on the Amazon EC2 instance is mounted into a container. By default, bind mounts are tied to the lifecycle of the container that uses them: when all of the containers that use a bind mount are stopped, such as when a task is stopped, the data is removed. Under Volume, you specify a Volume Name, which can be anything, and then, in Source Path, the file or directory path on the EC2 instance that you want to be accessible inside the container.

Storage->Container Mount Points : Here you specify the Container Name to which you want to map the volume defined above, the Source Volume (the Volume Name from above), and the Container Path, which is the path inside the container to mount the volume on.
I am leaving the Storage option blank.
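Even though I am skipping storage, for reference a bind-mount volume plus mount point looks roughly like this in JSON (the volume name and paths are illustrative):

```json
{
  "volumes": [
    { "name": "shared-data", "host": { "sourcePath": "/var/app/data" } }
  ],
  "containerDefinitions": [
    {
      "name": "nginx",
      "mountPoints": [
        { "sourceVolume": "shared-data", "containerPath": "/usr/share/nginx/html" }
      ]
    }
  ]
}
```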

Monitoring and Logging : This is where you specify the destination for your container logs and whether to enable trace data collection for your application. These are the same logs that appear on the console when you run a container. I am configuring the task to send container logs to CloudWatch with the configuration below:
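In JSON, this corresponds to a logConfiguration block on the container definition using the awslogs driver; the log group name and region below are placeholders for whatever you choose:

```json
"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "/ecs/nginx-demo",
    "awslogs-region": "us-east-1",
    "awslogs-stream-prefix": "ecs"
  }
}
```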

That completes our task definition. Click Create and it will appear in the task definition list like this:

Phew! This article has gone on long enough, so I will explain the next component, the Cluster, in the next article.
