Recently, I was working on a Terraform script for a client that required some instance-level provisioning at creation time. The user_data attribute of Terraform’s aws_instance resource is a perfect fit for this sort of setup. In this case, I was working with an Amazon Linux AMI, so I chose cloud-init as my user_data mechanism (rather than plain shell commands). Using “traditional” cloud-init syntax, such as this:

#cloud-config
repo_update: true
repo_upgrade: all

packages:
  - httpd24
  - server

runcmd:
  - service httpd start
  - chkconfig httpd on
  - groupadd www
  - [ sh, -c, "usermod -a -G www ec2-user" ]
  - [ sh, -c, "chown -R root:www /var/www" ]
  - chmod 2775 /var/www
  - [ find, /var/www, -type, d, -exec, chmod, 2775, {}, + ]
  - [ find, /var/www, -type, f, -exec, chmod, 0664, {}, + ]

led to persistent failures. On instance creation, this config attempts to refresh the package repos and upgrade all installed packages (repo_update/repo_upgrade), install Apache and some other packages (packages), and run some shell commands (runcmd). While this is handy, it doesn’t give us much control over timing: it runs at first boot, essentially as soon as Terraform has called RunInstances and the instance comes up. The machines I was targeting sat in a private subnet with a properly configured S3 VPC endpoint (i.e. the endpoint’s address range was allowed in the security group), which they depend on because Amazon Linux’s yum repositories are hosted in S3. After several failed attempts, I surmised that the endpoint wasn’t available yet when the scripts were invoked, causing the failures. This theory was further bolstered by two facts:

  1. I could SSH to these hosts (through a bastion in the public subnet) after the entire Terraform run had finished, run the same commands by hand, and they completed without issue; and
  2. When the same configuration was applied to a bastion host in the public subnet, the user_data cloud-init statement executed successfully.

This confirmed that my cloud-init config was correct, and that the machines in the private subnet could install packages via yum, just later rather than sooner. That led me to try an alternative cloud-init script in user_data, which drops the repo_update/repo_upgrade and packages keys, starts with a three-minute sleep, and runs everything from runcmd:

#cloud-config
runcmd:
  - [ sh, -c, "sleep 3m" ]
  - yum -y update
  - yum -y install httpd24
  - service httpd start
  - chkconfig httpd on
  - groupadd www
  - [ sh, -c, "usermod -a -G www ec2-user" ]
  - [ sh, -c, "chown -R root:www /var/www" ]
  - chmod 2775 /var/www
  - [ find, /var/www, -type, d, -exec, chmod, 2775, {}, + ]
  - [ find, /var/www, -type, f, -exec, chmod, 0664, {}, + ]
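Both configs end with the same ownership and permission steps for /var/www. For anyone wondering what the chmod/find lines actually do, here is a sketch of them as a standalone shell function. The name `set_web_perms` is hypothetical, and it takes the directory as an argument so it can be tried against a scratch directory instead of /var/www. The chown root:www step is left out because it needs root and an existing www group.

```shell
#!/bin/sh
# Hypothetical helper mirroring the chmod/find runcmd entries, parameterized on a
# directory so it can be run against a scratch copy rather than /var/www.
# The "chown -R root:www" step is intentionally omitted (needs root and a www group).
set_web_perms() {
  web_root=$1
  # 2775 = setgid bit (2) + rwxrwxr-x. On Linux, the setgid bit on a directory
  # makes files and subdirectories created inside it inherit its group (www on the real host).
  chmod 2775 "$web_root"
  # Apply the same mode to every directory in the tree (the root included, as in the original).
  find "$web_root" -type d -exec chmod 2775 {} +
  # Regular files become rw-rw-r--: group-writable, not executable.
  find "$web_root" -type f -exec chmod 0664 {} +
}
```

Group-writable files plus setgid directories are what let ec2-user, once added to the www group, edit content under /var/www without sudo.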

Imagine my surprise when service httpd status returned a valid response after changing the cloud-init script! The three minutes in the sleep command is a totally arbitrary number; it could probably be lowered, but it never failed to apply the configuration in the few times I tested it, either.
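If you would rather not guess at a sleep duration, one option is to retry the first yum command until it succeeds and only then continue. I haven’t tested this in the setup described above; it is a sketch, and the `retry_until` helper along with the 18-attempt/10-second values are my own assumptions, not something cloud-init or yum provide.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry_until MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Stop as soon as the command succeeds.
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed: $*" >&2
    attempt=$((attempt + 1))
    # Only wait if another attempt remains.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Assumed usage: keep trying to fetch repo metadata before the real yum commands.
# retry_until 18 10 yum -y makecache
```

With those values the loop sleeps at most 17 × 10 = 170 seconds, close to the original three-minute budget, but moves on as soon as the repos answer. One way to use it from cloud-config would be to put the function and the call together in a single runcmd entry ahead of the other yum commands.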

And there you have it – provisioning servers via Terraform’s user_data attribute for EC2 instances in private subnets with S3 VPC endpoints just takes a little more time!