Photo by Jess Bailey on Unsplash
The Case of the Re-spawning ECS task
Solving the Mystery of an Inaccessible ECS Service Task Linked to a Load Balancer's Target Group
An ECS service task was running and serving an API behind Elastic Load Balancing. The target group attached to the service ran health checks against the task's private IP and kept reporting the target as unhealthy, which led to a continuous cycle of the task being terminated and a replacement being created.
To debug the issue, we need to launch a task from the latest task definition using the AWS CLI, with the --enable-execute-command flag set so that we can open a shell inside the container.
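Before going further, it can help to confirm what the target group itself reports. The following is a minimal sketch, assuming you know the target group's ARN; `target_health` is a hypothetical helper name, not something from our setup.

```shell
# Hypothetical helper: print one line per registered target with its
# ID, health state, and the reason code reported by the target group.
# $1 = target group ARN (a placeholder you must supply).
target_health() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Reason]' \
    --output text
}

# Usage (not run here; TARGET_GROUP_ARN is assumed to be set by you):
#   target_health "$TARGET_GROUP_ARN"
```

The reason code in the last column usually narrows things down, for example by distinguishing a timeout from an unexpected response code.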
First, ensure that the AWS CLI is installed and configured with access to your AWS account.
For installing and configuring the AWS CLI, check the latest AWS docs here:
Install AWS CLI on your machine.
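A quick way to confirm the tooling is in place is sketched below. Note that `aws ecs execute-command` also needs the Session Manager plugin on the machine you run it from. `require_cmd` is a hypothetical helper name introduced only for this example.

```shell
# Hypothetical helper: succeed if the named command is on PATH,
# otherwise print a hint to stderr and fail.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing: $1" >&2
  return 1
}

# Usage (not run here; the profile name is the placeholder used in this post):
#   require_cmd aws && aws --version
#   require_cmd session-manager-plugin
#   aws sts get-caller-identity --profile "demo-profile-name"
```

If `aws sts get-caller-identity` returns your account and ARN, the credentials for that profile are working.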
Once configured, run an AWS ECS Fargate task from the latest task definition with the --enable-execute-command flag, as in the following example.
Replace the placeholder values (cluster, subnets, security groups, task definition, profile, and region) with your own.
aws ecs run-task \
--cluster "foo-cluster" \
--count 1 \
--enable-execute-command \
--launch-type "FARGATE" \
--network-configuration "awsvpcConfiguration={subnets=[subnet-04ac7a7dcdvk5f67tvv5],securityGroups=[sg-055dcsdhgcu6y76t76],assignPublicIp=DISABLED}" \
--platform-version "LATEST" \
--task-definition "foo-task-definition:15" \
--profile "demo-profile-name" \
--region "ap-south-1"
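Rather than polling the console, the wait for the task can be scripted. This is a sketch under a few assumptions: the profile and region come from the `AWS_PROFILE` and `AWS_DEFAULT_REGION` environment variables, and `wait_and_check_exec` is a hypothetical helper name.

```shell
# Hypothetical helper: block until the task reaches RUNNING, then print
# whether ECS Exec is enabled on it.
# $1 = cluster name, $2 = task ID or ARN.
wait_and_check_exec() {
  aws ecs wait tasks-running --cluster "$1" --tasks "$2" || return 1
  aws ecs describe-tasks --cluster "$1" --tasks "$2" \
    --query 'tasks[0].enableExecuteCommand' --output text
}

# Usage (not run here). The task ARN can be captured from run-task by adding
#   --query 'tasks[0].taskArn' --output text
# to the command above, then:
#   wait_and_check_exec "foo-cluster" "$TASK_ARN"
```

If the second command does not report that execute-command is enabled, the login step that follows will fail, so it is worth checking first.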
- After the task reaches the RUNNING state, log in to the container:
aws ecs execute-command \
--region ap-south-1 \
--cluster example-cluster \
--task fd1e4a63c0c24edgfrkdcaec30ef4f0 \
--container example-container-in-task-definition \
--command "/bin/bash" \
--interactive --profile foo-user
- Fetch the package lists for your distro (commands may vary; this example uses apt):
apt update
- Install net-tools to use netstat.
apt install net-tools
- In our case, we used netstat -tlnp to check whether nginx was listening on TCP port 80.
netstat -tlnp
There was no service listening on port 80.
To find out why, we checked whether the nginx configuration files were set up properly.
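Reading netstat output by eye works, but the same check can be made scriptable. The sketch below filters the listing for a given port; `port_is_listening` is a hypothetical helper name, and it assumes the local address is the fourth column, which holds for both `netstat -tln` and `ss -tln`.

```shell
# Hypothetical helper: read `netstat -tln` (or `ss -tln`) output on stdin
# and succeed if any socket is in LISTEN state on the given port.
# $1 = port number.
port_is_listening() {
  awk -v port="$1" '
    /LISTEN/ {
      # The local address is field 4; the port is whatever follows the last colon.
      n = split($4, parts, ":")
      if (parts[n] == port) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Usage inside the container:
#   netstat -tln | port_is_listening 80 && echo "port 80 is listening"
```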
We found that nginx was set to start as a non-root user. Ports below 1024 can be bound only by privileged users, but in our case the bind was attempted by the 'app' user, which had been created earlier while creating the container.
The app user is named in the nginx configuration file.
We removed the entire line that set the user nginx runs as. Alternatively, we could have set up nginx to listen on an unprivileged port (1024 or higher).
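To locate the relevant lines quickly, something like the following can be used. It is only a sketch: `show_user_and_listen` is a hypothetical helper name, and `/etc/nginx/nginx.conf` is the common default path, which may differ in your image.

```shell
# Hypothetical helper: print every `user` and `listen` directive,
# with line numbers, from an nginx configuration file.
# $1 = path to the configuration file.
show_user_and_listen() {
  grep -nE '^[[:space:]]*(user|listen)[[:space:]]' "$1"
}

# Usage inside the container (path is an assumption):
#   show_user_and_listen /etc/nginx/nginx.conf
#   nginx -t    # validate the configuration after editing
```

Running `nginx -t` after any edit catches syntax errors before the image is rebuilt and redeployed.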
We pushed the latest changes and ran the container with them.
We then logged in to the task container started from the newly created task definition.
Once logged in to the container, we checked whether nginx was up and listening on port 80 (or whichever port your configuration sets for nginx).
netstat -tlnp
In the screenshot, it can clearly be seen that Nginx is up and running on port 80.
Now the new container is in a running state, and the target group attached to the service task shows the target ECS task to be healthy.
The IP is now reachable and so is our web application.
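If you would rather not refresh the console while waiting for the health checks to pass, the AWS CLI has a waiter for this. A minimal sketch, assuming the target group ARN is known; `wait_until_healthy` is a hypothetical helper name.

```shell
# Hypothetical helper: block until the targets in the target group are
# reported as healthy, or fail once the waiter gives up.
# $1 = target group ARN (a placeholder you must supply).
wait_until_healthy() {
  aws elbv2 wait target-in-service --target-group-arn "$1"
}

# Usage (not run here; TARGET_GROUP_ARN is assumed to be set by you):
#   wait_until_healthy "$TARGET_GROUP_ARN" && echo "targets are healthy"
```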