Say "NO" to Beanstalk! Use CodeDeploy instead pt.1

Introduction

AWS Beanstalk is an application deployment option provided by AWS. The main goal of this service is to simplify and speed up environment setup. It provides an infrastructure and deployment framework for delivering a web application quickly, without the need to provision and manage multiple AWS resources yourself.

Even though this service seems to be a great option for people without any knowledge of AWS, or for anybody who would like to deliver a product fast, it has some hidden obstacles that may slow down your work significantly if you don't know how to solve them. Also, provisioning the resources that Beanstalk manages is not that difficult and can easily be done with Terraform or CloudFormation. The deployment process can be automated with highly customizable CodeDeploy or ECS, so maybe it's worthwhile to spend more time on creating infrastructure that is fully managed by Terraform or CloudFormation rather than using Beanstalk?

In this post I would like to present some obstacles that you may encounter when you choose Beanstalk as your deployment solution. In the next part I will present a similar deployment solution using CodeDeploy. I hope that after reading this article it will be easier for you to choose the best solution for your use case.

Provisioning

An AWS Beanstalk environment can be created and managed with a dedicated CLI tool called EB CLI. It's an interactive tool that can be used to configure, monitor, update and clone Beanstalk environments. The downside of this tool is that it's not designed to work in non-interactive mode (for example as part of a Jenkins pipeline).

The better option is to use CloudFormation or Terraform. With these Infrastructure as Code tools you can automate the provisioning of a Beanstalk environment.

Source of main.tf describing Python 3.8 environment HERE

Source of beanstalk.yaml HERE

In both cases version_label (VersionLabel in CloudFormation) is not provided, so during the first deployment Beanstalk will run a sample application. There is no need to provide any application code in this phase.

As provided platforms may become deprecated and eventually removed, it's worthwhile to fetch the newest one during each infrastructure update. In Terraform it can be done with:

data "aws_elastic_beanstalk_solution_stack" "python-stack" {
  name_regex = "^64bit Amazon Linux 2 (.*) running Python 3.8$"
  most_recent = true
}

CloudFormation doesn't have such a feature, but the newest platform can be looked up with the AWS CLI and passed in as a stack parameter:

  # Returns a list of the available solution stack names, with the public version first and then in reverse chronological order.
  SOLUTION_STACK=$(aws elasticbeanstalk list-available-solution-stacks \
    | jq -r '.SolutionStacks | map(select(test("64bit Amazon Linux 2 (.*) running Python 3.8"))) | .[0]')
  aws cloudformation deploy --template-file cloudformation/beanstalk.yaml \
    --stack-name "demo-app-stack" \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
    --parameter-overrides SolutionStackName="${SOLUTION_STACK}"
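If your pipeline already runs Python, the same lookup can be sketched with Boto3. This is a minimal sketch, not a drop-in replacement: the function names are made up for illustration, and it assumes (like the jq `.[0]` pick above) that the API returns the newest stack first.

```python
import re

# Pattern taken from the Terraform data source above, with the dot escaped.
PYTHON_38_PATTERN = r"^64bit Amazon Linux 2 (.*) running Python 3\.8$"


def pick_solution_stack(stack_names, pattern=PYTHON_38_PATTERN):
    """Return the first stack name matching `pattern`, or None.

    Assumes the list is ordered newest first, like the jq `.[0]` pick above.
    """
    regex = re.compile(pattern)
    for name in stack_names:
        if regex.search(name):
            return name
    return None


def newest_solution_stack(client=None, pattern=PYTHON_38_PATTERN):
    """Fetch the available stacks from Beanstalk and pick the first match."""
    if client is None:
        import boto3  # lazy import: pick_solution_stack() works without boto3

        client = boto3.client("elasticbeanstalk")
    response = client.list_available_solution_stacks()
    return pick_solution_stack(response["SolutionStacks"], pattern)
```

The returned name can then be passed to `aws cloudformation deploy` as the SolutionStackName parameter, exactly as in the shell version.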

If your application uses a different health check path than the default one, it can easily be provided as part of the OptionSettings block (or a setting block in Terraform):

      OptionSettings:
        - Namespace: "aws:elasticbeanstalk:application"
          OptionName: "Application Healthcheck URL"
          Value: "/status/health"
        - Namespace: "aws:elasticbeanstalk:environment:process:default"
          OptionName: "HealthCheckPath"
          Value: "/status/health"

Beanstalk will not fail during the first deployment (but it used to) if the health check path is different from the one used by the sample application.

Another option is to set it using the .ebextensions folder by creating a file with the .config extension.

Source of .ebextensions/app-health-check.config:

option_settings:
    "aws:elasticbeanstalk:application":
      "Application Healthcheck URL": "HTTP:80/status/health"
    "aws:elasticbeanstalk:environment:process:default":
      "HealthCheckPath": "/status/health"

This file will set the listed environment properties during deployment. It doesn't collide with either Terraform or CloudFormation: neither will detect any changes, even if we update environment properties using .config files.

When the infrastructure is provisioned, a new version can be deployed using the AWS CLI:

    ACCOUNT_ID=$(aws sts get-caller-identity | jq -r '.Account')
    REGION=$(aws configure get region)
    BUCKET_NAME="demo-app-bucket-${REGION}-${ACCOUNT_ID}"
    TIMESTAMP=$(date +%s)
    LABEL="app-${TIMESTAMP}"
    FILE_NAME="${LABEL}.zip"
    aws s3 cp latest.zip "s3://${BUCKET_NAME}/${FILE_NAME}"
    aws elasticbeanstalk create-application-version --application-name "demo-app" \
      --version-label "${LABEL}" \
      --source-bundle S3Bucket="${BUCKET_NAME}",S3Key="${FILE_NAME}"
    aws elasticbeanstalk update-environment --application-name "demo-app" \
      --environment-name "demo-app-env" \
      --version-label "${LABEL}"

update-environment only triggers the update. To wait until the deployment is finished you can use:

aws elasticbeanstalk wait environment-updated --application-name "demo-app" --environment-names "demo-app-env"

But the problem with this command is that it has a fixed timeout:

It will poll every 20 seconds until a successful state has been reached.
This will exit with a return code of 255 after 20 failed checks.

Sometimes 7 minutes (20 checks, 20 seconds apart) may not be enough, so I would recommend writing your own script using the Python Boto3 Beanstalk client, or another scripting language that offers an AWS SDK.
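A minimal sketch of such a script, assuming Boto3, could look like this. The function name, the 30-minute default and the injectable `sleep`/`clock` arguments are my own choices for illustration. It treats `Ready` as success and `Terminating`/`Terminated` as failure.

```python
import time


def wait_for_environment(client, application_name, environment_name,
                         timeout_seconds=1800, poll_seconds=20,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll describe_environments until the environment is Ready.

    Returns the environment description on success, raises TimeoutError when
    `timeout_seconds` elapses and RuntimeError when the environment is gone.
    """
    deadline = clock() + timeout_seconds
    while True:
        response = client.describe_environments(
            ApplicationName=application_name,
            EnvironmentNames=[environment_name],
        )
        environments = response["Environments"]
        if not environments:
            raise RuntimeError(f"environment {environment_name!r} not found")
        environment = environments[0]
        status = environment["Status"]
        if status == "Ready":
            return environment
        if status in ("Terminating", "Terminated"):
            raise RuntimeError(f"environment {environment_name!r} is {status}")
        if clock() >= deadline:
            raise TimeoutError(
                f"{environment_name!r} still {status} after {timeout_seconds}s"
            )
        sleep(poll_seconds)
```

Call it with `boto3.client("elasticbeanstalk")` right after update-environment. As far as I can tell, `Ready` only says that the update has finished, not that the new version is running and healthy, so it is worth comparing the returned `VersionLabel` and `Health` with what you expect.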

YAML parser

You can provision additional resources by placing .config files with CloudFormation code inside the .ebextensions directory. If you are familiar with CloudFormation, you know that such code is totally fine:

    GroupName: !Ref AWSEBSecurityGroup

But Beanstalk's YAML parser does not understand the short-form !Ref tag, so it will fail during version deployment with the following error:

The configuration file .ebextensions/enable-ssh.config in application version app-1662191692 contains invalid YAML or JSON.
YAML exception: Invalid Yaml: could not determine a constructor for the tag !Ref in 'reader',
line 9, column 18: GroupName: !Ref AWSEBSecurityGroup ^ ,
JSON exception: Invalid JSON: Unexpected character (R) at position 0.. Update the configuration file.

To solve this problem the line should be changed to:

     GroupName:
        Ref: AWSEBSecurityGroup

It may also fail for other functions like !Sub or !GetAtt, so it's better to always use the full form (Fn::Sub, Fn::GetAtt) in .ebextensions files.
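Since this error only shows up at deployment time, it can be worth catching it earlier, for example in a pre-commit hook or a pipeline step. The sketch below is a simple line-based scan and not a real YAML parser. Only `!Ref`, `!Sub` and `!GetAtt` come from the text above; the rest of the tag list and the function names are assumptions.

```python
import re
from pathlib import Path

# !Ref comes from the error above; !Sub and !GetAtt are also named in the text.
# The remaining tags are an assumption, extend the tuple as needed.
SHORT_FORM_TAGS = ("Ref", "Sub", "GetAtt", "Join", "Select", "FindInMap", "If", "ImportValue")
_TAG_PATTERN = re.compile(r"(?:^|[\s\[,:])!(" + "|".join(SHORT_FORM_TAGS) + r")\b")


def find_short_form_tags(text):
    """Return (line_number, tag) pairs for every short-form tag in `text`."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip whole-line YAML comments
        for match in _TAG_PATTERN.finditer(line):
            findings.append((number, "!" + match.group(1)))
    return findings


def scan_ebextensions(directory=".ebextensions"):
    """Map each .config file that contains short-form tags to its findings."""
    report = {}
    for path in sorted(Path(directory).glob("*.config")):
        findings = find_short_form_tags(path.read_text(encoding="utf-8"))
        if findings:
            report[path.name] = findings
    return report
```

A non-empty result from `scan_ebextensions()` means at least one file still needs the full form.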

AutoScaling Group Metrics

An AutoScaling group provides free group metrics that are disabled by default and can be enabled using the MetricsCollection property of AWS::AutoScaling::AutoScalingGroup. In Beanstalk the AutoScaling group is created and managed by the environment, so modification is possible only through .ebextensions. You can modify any existing resource by creating a .config file with a resource that has the same name as one listed HERE.

These metrics can be very useful especially if you plan to run some performance tests.

Metrics can be configured for the AutoScaling group with the following .config file:

Source of .ebextensions/asg-metrics.config:

Resources:
    AWSEBAutoScalingGroup:
        Type: AWS::AutoScaling::AutoScalingGroup
        Properties:
            MetricsCollection:
                - Granularity: "1Minute"
                  Metrics:
                      - "GroupMinSize"
                      - "GroupMaxSize"
                      - "GroupDesiredCapacity"
                      - "GroupInServiceInstances"

The resource name must exactly match the one listed in the documentation; for this example it's AWSEBAutoScalingGroup. Beanstalk will merge this configuration with the original template managed by the environment.
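To confirm after a deployment that the merge really enabled the metrics, something along these lines could be used. It is a sketch assuming Boto3 clients for `elasticbeanstalk` and `autoscaling`; the helper names are hypothetical and the expected metrics simply mirror the .config file above.

```python
# Metric names copied from asg-metrics.config above.
EXPECTED_METRICS = (
    "GroupMinSize",
    "GroupMaxSize",
    "GroupDesiredCapacity",
    "GroupInServiceInstances",
)


def environment_group_names(eb_client, environment_name):
    """Return the names of the AutoScaling groups owned by the environment."""
    response = eb_client.describe_environment_resources(
        EnvironmentName=environment_name
    )
    groups = response["EnvironmentResources"]["AutoScalingGroups"]
    return [group["Name"] for group in groups]


def missing_group_metrics(asg_client, group_name, expected=EXPECTED_METRICS):
    """Return the expected metrics that are not enabled on the group, sorted."""
    response = asg_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    groups = response["AutoScalingGroups"]
    if not groups:
        raise LookupError(f"AutoScaling group {group_name!r} not found")
    enabled = {metric["Metric"] for metric in groups[0].get("EnabledMetrics", [])}
    return sorted(set(expected) - enabled)
```

An empty list from `missing_group_metrics` means all four metrics are being collected.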

Security Group update

For testing purposes (and only for testing! Don't allow any SSH traffic on a production environment!) you may want to have SSH access to your instances. AWS Beanstalk creates a Security Group that allows incoming traffic only from the Load Balancer. It's possible to override the Security Group for the environment using the SecurityGroups option from the aws:autoscaling:launchconfiguration namespace, but there is a simpler solution: by placing a file with the .config extension in the .ebextensions folder of your application zip archive, you can create an ingress rule that allows SSH traffic from anywhere.

Source of beanstalk_ebextensions/.ebextensions/enable-ssh.config:

Resources:
  SshIngressRule:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      CidrIp: "0.0.0.0/0"
      FromPort: 22
      ToPort: 22
      IpProtocol: "tcp"
      GroupName:
        Ref: AWSEBSecurityGroup

It will create a new resource attached to the existing Security Group named AWSEBSecurityGroup, which is created by Beanstalk.

Logs rate limit

Beanstalk comes with the default configuration files for journald and rsyslog. Because of that, both programs use default log rate limits that are quite low: 1000 logs within 30 seconds for journald and 20000 logs within 10 minutes for rsyslog. Even a small application can easily go over these limits. When a limit is reached, new logs are dropped; you can verify it by checking the journald and rsyslogd logs.

Result of journalctl -u systemd-journald:

Sep 03 14:15:56 ip-172-31-42-178.eu-west-1.compute.internal systemd-journal[1142]: Suppressed 8749 messages from /system.slice/web.service

Result of journalctl -u rsyslog:

Sep 03 16:10:47 ip-172-31-38-241.eu-west-1.compute.internal rsyslogd[4712]: imjournal: begin to drop messages due to rate-limiting
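To get a feeling for how many entries were actually lost, the "Suppressed N messages" lines can be summed per unit. This is a small sketch that relies only on the journald message format shown above; the function name is made up.

```python
import re
from collections import Counter

# Matches journald lines such as:
#   "... systemd-journal[1142]: Suppressed 8749 messages from /system.slice/web.service"
_SUPPRESSED = re.compile(r"Suppressed (\d+) messages? from (\S+)")


def suppressed_by_unit(journal_lines):
    """Sum the number of suppressed messages per unit across all lines."""
    totals = Counter()
    for line in journal_lines:
        match = _SUPPRESSED.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return dict(totals)
```

It accepts any iterable of lines, so the output of journalctl -u systemd-journald can be piped in and passed as `sys.stdin`. The rsyslog message carries no count, so it is ignored here.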

Beanstalk doesn't offer any environment option to change the limits, but it can be done with an .ebextensions config file, as described HERE.

Source of .ebextensions/log-rate-limit.config:

files:
    "/etc/systemd/journald.conf":
        owner: root
        group: root
        mode: "000644"
        content: |
            [Journal]
            RateLimitInterval=30s
            RateLimitBurst=20000
    "/etc/rsyslog.d/rate-limit.conf":
        owner: root
        group: root
        mode: "000644"
        content: |
            $imjournalRatelimitInterval 30
            $imjournalRatelimitBurst 20000
commands:
    restart_journald:
        command: systemctl restart systemd-journald
    restart_rsyslog:
        command: systemctl restart rsyslog

With this config Beanstalk will create /etc/systemd/journald.conf and /etc/rsyslog.d/rate-limit.conf with the specified content. In this example the limit is set to 20000 logs within 30 seconds.

Config file sections are executed in this order:

  • packages
  • groups
  • users
  • sources
  • files
  • commands
  • services
  • container_commands

Because files are written before commands run, the same config file can also restart both journald and rsyslog, using the commands section to run systemctl restart, so that they pick up the new settings.

Required VPC Endpoints

To use Beanstalk in private subnets without NAT Gateway the following endpoints are required:

Gateway:

  • com.amazonaws.${AWS::Region}.s3

Interface:

  • com.amazonaws.${AWS::Region}.cloudformation
  • com.amazonaws.${AWS::Region}.elasticbeanstalk-health
  • com.amazonaws.${AWS::Region}.elasticbeanstalk
  • com.amazonaws.${AWS::Region}.sqs

Interface endpoints should be provisioned in every subnet that is used to run EC2 instances. These subnets are configured using the Subnets environment property from the aws:ec2:vpc namespace.

The SQS endpoint may look odd on this list, but it's required by the cfn-hup script; without it, the environment will not start.

It's also worth mentioning that Beanstalk uses an AWS::CloudFormation::WaitCondition resource to wait for a signal from the EC2 instance. As described HERE, WaitCondition uses a presigned S3 URL to signal success, so the S3 endpoint policy should allow the s3:PutObject action on arn:aws:s3:::cloudformation-waitcondition-${AWS::Region}/*
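Because a missing endpoint only shows up as an environment that never starts, a quick pre-flight check can save some time. The sketch below assumes a Boto3 `ec2` client and uses hypothetical function names. It ignores pagination and does not verify the endpoint type or subnets, only that an endpoint for each listed service exists in the VPC.

```python
# Service suffixes and endpoint types copied from the lists above.
REQUIRED_ENDPOINTS = {
    "s3": "Gateway",
    "cloudformation": "Interface",
    "elasticbeanstalk-health": "Interface",
    "elasticbeanstalk": "Interface",
    "sqs": "Interface",
}


def required_service_names(region):
    """Map each full service name for `region` to its endpoint type."""
    return {
        f"com.amazonaws.{region}.{suffix}": endpoint_type
        for suffix, endpoint_type in REQUIRED_ENDPOINTS.items()
    }


def missing_endpoints(ec2_client, vpc_id, region):
    """Return the required service names without an endpoint in the VPC, sorted."""
    response = ec2_client.describe_vpc_endpoints(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    present = {endpoint["ServiceName"] for endpoint in response["VpcEndpoints"]}
    return sorted(set(required_service_names(region)) - present)
```

An empty list means every service from the lists above has an endpoint in the given VPC.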
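As an illustration, the statement for the WaitCondition bucket could be generated like this. This is a sketch only: the Sid and function names are made up, and a real endpoint policy will almost certainly need further statements (for example for the bucket holding your application bundles), which can be passed in as `extra_statements`.

```python
import json


def waitcondition_statement(region):
    """Allow s3:PutObject on the CloudFormation WaitCondition bucket of `region`."""
    return {
        "Sid": "AllowWaitConditionSignal",  # hypothetical Sid
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": f"arn:aws:s3:::cloudformation-waitcondition-{region}/*",
    }


def endpoint_policy(region, extra_statements=()):
    """Render an S3 endpoint policy document as a JSON string."""
    document = {
        "Version": "2012-10-17",
        "Statement": [waitcondition_statement(region), *extra_statements],
    }
    return json.dumps(document, indent=2)
```

The resulting string can be used as the policy document of the S3 gateway endpoint in Terraform, CloudFormation or Boto3.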

Sample application

Beanstalk sample applications that are deployed when version_label is not specified require an internet connection. The Python application uses

pip install -r requirements.txt

to download dependencies during deployment, and the Java application uses the build phase to run Maven, which tries to download some plugins (and fails). I didn't check all available platforms, but it may be a common problem. Because of that, you have to deploy the initial version yourself using the version_label property (example HERE). I would suggest decoupling infrastructure updates from new version deployments, so version_label changes should be ignored:

  lifecycle {
    ignore_changes = [version_label]
  }

As CloudFormation doesn't have such an ignore feature, it needs to be used as both the infrastructure and the version deployment tool. To do so, you can have a separate CloudFormation stack with the AWS::ElasticBeanstalk::Environment resource and pass the application version, previously created with the aws elasticbeanstalk create-application-version CLI command, as a stack parameter.

Custom Image

If your customization cannot be done using .ebextensions (as it requires an internet connection to download the application bundle), or you want to install additional packages, such as the New Relic Agent, without slowing down deployments, you can use the ImageId environment property from the aws:autoscaling:launchconfiguration namespace and create a custom image using Packer. The image can be based on an existing Beanstalk platform. Keep in mind that images don't have any lifecycle policy, so you have to remove old images yourself (for example with a Lambda function triggered on a cron schedule).

NOTE: even if you provide a custom image, the SolutionStackName property is still required. It's worth fetching the newest one using the Terraform aws_elastic_beanstalk_solution_stack data source, or providing it as a CloudFormation stack parameter as described earlier. Solution stacks may become deprecated and eventually removed, so your build pipeline may fail if you hardcode a specific version.
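A cleanup Lambda for old images could be sketched as follows, assuming Boto3 and that Packer tags the images it builds. The tag key and value, the number of images to keep and all function names are assumptions for illustration. The image currently used by the environment should be passed in `in_use` so that it is never removed.

```python
def images_to_remove(images, keep=3, in_use=()):
    """Return the images to delete: everything but the `keep` newest ones,
    minus any image whose id is listed in `in_use`."""
    # CreationDate is an ISO 8601 string, so sorting the strings sorts by time.
    ordered = sorted(images, key=lambda image: image["CreationDate"], reverse=True)
    return [image for image in ordered[keep:] if image["ImageId"] not in in_use]


def remove_old_images(ec2_client, tag_key="Purpose",
                      tag_value="beanstalk-custom-image", keep=3, in_use=()):
    """Deregister old tagged images and delete their EBS snapshots."""
    response = ec2_client.describe_images(
        Owners=["self"],
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}],
    )
    removed = []
    for image in images_to_remove(response["Images"], keep, in_use):
        ec2_client.deregister_image(ImageId=image["ImageId"])
        for mapping in image.get("BlockDeviceMappings", []):
            snapshot_id = mapping.get("Ebs", {}).get("SnapshotId")
            if snapshot_id:
                ec2_client.delete_snapshot(SnapshotId=snapshot_id)
        removed.append(image["ImageId"])
    return removed


def handler(event, context):
    """Lambda entry point, meant to be triggered by a scheduled rule."""
    import boto3  # lazy import keeps the helpers above usable without boto3

    return remove_old_images(boto3.client("ec2"))
```

Snapshots are deleted only after the image is deregistered, since a snapshot that still backs a registered image cannot be removed.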

Conclusion

The title of this blog post is kind of clickbait. AWS Beanstalk is not THAT bad, and it offers multiple customization options, including custom images. You can customize it with config files, update existing infrastructure or even override some /etc config files. The downside is that everything mentioned (except the custom image) is part of the deployment process: all config files have to be downloaded and processed, and every scale-out event repeats these operations on the newly provisioned instance. It also mixes infrastructure management with the deployment process, so you need to check both the Infrastructure as Code configuration AND .ebextensions to have a consistent view of your environment configuration.

If Beanstalk works fine for you after applying the presented tweaks, use it and enjoy. If you are looking for a more advanced and customizable solution, check the next part of this series, where I will present how to create an environment using CodeDeploy that looks similar to the one Beanstalk provides.