Boston Lee

Deploying an astro.build website on AWS S3, with CloudFront


It is only fitting that the first post on this blog would be about the construction of this website.

The soft requirements for this website were as follows:

  1. Have an easy-to-use static site generator.
  2. Have the website be as cheap as reasonably possible, while still being hosted in the cloud.
  3. Use Terraform for deployment. I wanted practice with Terraform, and enjoy the paradigm of infrastructure-as-code.

After playing around with Astro, Hugo, and Pelican, I found Astro to have the best balance of ease-of-use and community support. So that was point (1) covered.

As for point (2), I knew that it was possible, at least in theory, to deploy a website on AWS S3. If so, it would almost certainly be cheaper than most other site hosting options, but it would require a purely static site. This restriction suited me fine; I don’t ever see myself moving beyond a simple static site.

For point (3), AWS obviously has extensive Terraform integration. With all of that settled, I set out on a journey to find the best way to use S3 to host a static site.

Prerequisites

Throughout this process, I had a DNS record from my domain provider that redirected all requests from bostonlee.com to www.bostonlee.com.

The AWS provider version for Terraform at the time of writing is 4.65.0. I note this because I read a few blogs on this topic and ended up having to resort to reading the documentation, because all of the relevant Terraform resources had changed.

Stage 1: S3 static site support

S3 has the ability to host a static site directly, through S3 website endpoints. This lets you put your website files into a bucket, and then have those files hosted at a designated endpoint.

The Terraform resource for this is aws_s3_bucket_website_configuration. Using it requires making the bucket publicly accessible, which in turn requires a bucket policy (aws_s3_bucket_policy); the Terraform documentation covers both.
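For illustration, a minimal version of that Stage 1 setup could look something like the following sketch. The resource names and the 404.html error document here are placeholders of mine, not taken from my actual configuration at the time:

```hcl
# Sketch of a Stage 1 setup: an S3 website endpoint plus the
# public-read bucket policy it requires. Names are placeholders.
resource "aws_s3_bucket" "site" {
  bucket = "www.bostonlee.com"
}

resource "aws_s3_bucket_website_configuration" "site" {
  bucket = aws_s3_bucket.site.id

  index_document {
    suffix = "index.html"
  }

  error_document {
    key = "404.html"
  }
}

resource "aws_s3_bucket_policy" "public_read" {
  bucket = aws_s3_bucket.site.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "PublicReadGetObject"
      Effect    = "Allow"
      Principal = "*"
      Action    = "s3:GetObject"
      Resource  = "${aws_s3_bucket.site.arn}/*"
    }]
  })
}
```

The website endpoint then shows up as the website_endpoint attribute of the configuration resource.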

From there, I simply had to add a CNAME record pointing from www.bostonlee.com to the website endpoint given by S3.

My domain provider for some reason would not route traffic correctly unless the bucket name started with “www”, so the S3 website endpoint looked like http://www.bostonlee.com.s3-website.us-east-1.amazonaws.com instead of http://bostonlee.com.s3-website.us-east-1.amazonaws.com. That is, I had to name the bucket www.bostonlee.com.

This approach worked! I was able to successfully access my website. However, I underestimated the importance of HTTPS to website perception. Astute readers (or readers whose eyes are drawn to color) probably noticed the large red box on the bottom of the S3 website endpoint documentation. An excerpt:

Amazon S3 website endpoints do not support HTTPS or access points. If you want to use HTTPS, you can use Amazon CloudFront to serve a static website hosted on Amazon S3.

When I showed my rough-draft website to friends and family, the first comment from everyone was about the lack of a secure connection. Long gone are the days when I would have to chastise relatives about getting HTTPS Everywhere. Now I was the one getting a look for not having a secure connection. And, fair enough. When everyone’s browser gives them an explicit warning about an insecure connection, it is probably not a great idea to simply use a service with no support for secure connections and move on.

Luckily, the AWS documentation provided a specific solution: Serve the website on CloudFront.

Stage 2: CloudFront serving a website from an S3 bucket

This was quite a complicated process. I admit, I don’t fully understand all of the possible arguments for a CloudFront distribution resource. However, I will do my best to lay out the pieces of infrastructure that make up the website.

I set up my Terraform to use a manually managed S3 bucket as a backend:

terraform {
  required_providers {
    aws = {
      version = ">= 2.7.0"
      source  = "hashicorp/aws"
    }
  }
  backend "s3" {
    bucket = "bostonlee.com-terraform"
    key    = "terraform.tfstate"
    region = "us-east-1"
  }
}

Then, I created some variables that I could use to abstract my domain name out of the components:

variable "domain_name" {
  default = "bostonlee.com"
  type    = string
}

variable "bucket_name" {
  default = "www.bostonlee.com"
  type    = string
}

I then used those variables to set up an S3 bucket:

resource "aws_s3_bucket" "website_bucket" {
  bucket = var.bucket_name
}

resource "aws_s3_bucket_public_access_block" "block_public_access" {
  bucket = aws_s3_bucket.website_bucket.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_policy" "access_control" {
  bucket = aws_s3_bucket.website_bucket.id
  policy = data.aws_iam_policy_document.access_control.json
}

data "aws_iam_policy_document" "access_control" {
  statement {
    actions = ["s3:GetObject"]

    resources = ["${aws_s3_bucket.website_bucket.arn}/*"]

    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }
    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [aws_cloudfront_distribution.s3_distribution.arn]
    }
  }
}

Note that the bucket blocks public access, and only allows access from the (yet-to-be-created) CloudFront distribution. Special thanks to this blog for the policy document configuration. The application order of the blocks in this file is not straightforward. The bucket policy depends on the policy document, which in turn depends on the CloudFront distribution. So, the CloudFront distribution will be deployed before the policy is applied (unless I am wildly misunderstanding…). However, I opted to keep all of the S3-related configuration in the same place.

Next up is a CloudFront distribution that uses that S3 bucket. I had to play around with the arguments here: some are required, but the documentation does not give an easy answer for what to do in the simplest possible case. The caching behavior, for instance, is simply copied from the Terraform example docs. Since I was hosting a minimal static site, I wanted the easiest possible solution. Here is what I came up with:

locals {
  s3_origin_id = "bolee_website_origin"
}

resource "aws_cloudfront_origin_access_control" "website_access_control" {
  name                              = "example"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

resource "aws_cloudfront_distribution" "s3_distribution" {
  origin {
    domain_name              = aws_s3_bucket.website_bucket.bucket_regional_domain_name
    origin_access_control_id = aws_cloudfront_origin_access_control.website_access_control.id
    origin_id                = local.s3_origin_id
  }

  enabled             = true
  is_ipv6_enabled     = true
  default_root_object = "index.html"

  aliases = ["www.${var.domain_name}"]

  default_cache_behavior {
    allowed_methods = ["GET", "HEAD", "OPTIONS"]
    cached_methods  = ["GET", "HEAD"]
    forwarded_values {
      query_string = false

      cookies {
        forward = "none"
      }
    }
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
    viewer_protocol_policy = "redirect-to-https"
    target_origin_id       = local.s3_origin_id
    function_association {
      event_type   = "viewer-request"
      function_arn = aws_cloudfront_function.index_redirect.arn
    }
  }

  price_class = "PriceClass_100"

  restrictions {
    geo_restriction {
      restriction_type = "whitelist"
      locations        = ["US", "CA", "GB", "DE"]
    }
  }

  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate.website_acm_certificate.id
    ssl_support_method  = "sni-only"
  }
}

resource "aws_cloudfront_function" "index_redirect" {
  name    = "index_redirect"
  runtime = "cloudfront-js-1.0"
  comment = "https://docs.astro.build/en/guides/deploy/aws/#cloudfront-functions-setup"
  publish = true
  code    = file("${path.module}/redirect_function.js")
}

Let’s break down the components of this CloudFront distribution:

  1. A CloudFront distribution, where the website will actually be served. This is relatively self-explanatory, though there is a lot of configuration that can be set. The part of the configuration that makes this distribution markedly different from hosting on S3 is the viewer_certificate block. This block allows the site to be secured (note the “redirect-to-https” argument as well). This requires a separate ACM certificate, which I created as follows:

    resource "aws_acm_certificate" "website_acm_certificate" {
      domain_name       = "www.${var.domain_name}"
      validation_method = "EMAIL"
    }
    
    resource "aws_acm_certificate_validation" "website_certificate_validation" {
      certificate_arn = aws_acm_certificate.website_acm_certificate.arn
    }

    This allows for the creation of a certificate with email validation. I had an email address set as a catch-all on my domain provider (NameCheap), which let me easily verify the certificate. Whenever the certificate must be recreated upon a terraform apply, the deployment process will wait for the ACM certificate to be verified. So, this method does not lend itself to automation. However, it is convenient.

  2. Notice the CloudFront Function block. This has to do with a quirk in how Astro handles URLs. URLs in Astro are represented as paths, such as www.mydomain.com/blog/this-blog/, but in reality those links need to get redirected “under the hood” to actual index documents, like www.mydomain.com/blog/this-blog/index.html. Because I was simply planning on deploying a pre-built static site to CloudFront, the Astro server would not be doing this work for me. Luckily, the Astro docs provide a nice guide for making links work in CloudFront. The guide also mentions a couple of other AWS hosting methods, including S3 static sites (I wish I had looked in the Astro docs from the outset…). The recommended function looks as follows:

    function handler(event) {
        var request = event.request;
        var uri = request.uri;

        // Check whether the URI is missing a file name.
        if (uri.endsWith('/')) {
            request.uri += 'index.html';
        }
        // Check whether the URI is missing a file extension.
        else if (!uri.includes('.')) {
            request.uri += '/index.html';
        }

        return request;
    }

    I opted to simply include this function in the same directory as my Terraform files for the time being.

  3. There is also an Origin Access Control block, which helps secure the site further.
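As an aside on item (1): if the email-validation step ever becomes a blocker for automation, ACM also supports DNS validation. A rough sketch of that alternative is below; the output name is a placeholder of mine, and the CNAME record it surfaces would still need to be created manually at the DNS provider:

```hcl
# Sketch of a DNS-validated certificate as an alternative to email
# validation. Once the validation CNAME exists at the DNS provider,
# renewal and recreation need no manual email step.
resource "aws_acm_certificate" "website_acm_certificate" {
  domain_name       = "www.${var.domain_name}"
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

# domain_validation_options exposes the record(s) ACM wants created.
output "acm_validation_records" {
  value = [
    for dvo in aws_acm_certificate.website_acm_certificate.domain_validation_options : {
      name  = dvo.resource_record_name
      type  = dvo.resource_record_type
      value = dvo.resource_record_value
    }
  ]
}
```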

Once all of that was in place, I simply had to run terraform apply, and verify the certificate through my email.
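One detail I glossed over is getting the built Astro files into the bucket in the first place. One option, sketched here under the assumption that the build output lands in a local dist/ directory, is to have Terraform upload the files itself; the Content-Type has to be set explicitly, since S3 will not guess it:

```hcl
# Sketch: upload the built site with Terraform. Assumes the Astro
# build has already placed its output in ./dist. The mime_types map
# is a minimal placeholder; extend it for other file types.
locals {
  mime_types = {
    "html" = "text/html"
    "css"  = "text/css"
    "js"   = "text/javascript"
    "svg"  = "image/svg+xml"
    "png"  = "image/png"
  }
}

resource "aws_s3_object" "site_files" {
  for_each = fileset("${path.module}/dist", "**")

  bucket = aws_s3_bucket.website_bucket.id
  key    = each.value
  source = "${path.module}/dist/${each.value}"

  # Re-upload whenever the file content changes.
  etag = filemd5("${path.module}/dist/${each.value}")

  # Look up the MIME type from the file extension, falling back to
  # a generic binary type.
  content_type = lookup(
    local.mime_types,
    element(split(".", each.value), length(split(".", each.value)) - 1),
    "application/octet-stream"
  )
}
```

In practice, an aws s3 sync step in the build pipeline works just as well; this route only has the appeal of keeping everything in one tool.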

Heading to the console, I could see the CloudFront distribution URL: d3oizcfhaagn9u.cloudfront.net. Sure enough, this link is to my site.
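A Terraform output can surface that same domain name without a trip to the console; a small convenience, using the resource name from the configuration above:

```hcl
output "cloudfront_domain_name" {
  description = "Domain name of the CloudFront distribution serving the site"
  value       = aws_cloudfront_distribution.s3_distribution.domain_name
}
```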

The final piece of the puzzle was to add a CNAME record from the www subdomain of my site to the CloudFront distribution (here is a tutorial from NameCheap on the subject). As mentioned at the beginning of this article, I set up DNS to redirect bare requests for bostonlee.com to www.bostonlee.com. So, the CNAME record would still work for those typing only bostonlee.com into their browser.

And that is the current state of my website setup, as of the date of this article. If things change a fair amount, I may write more about it and add an update here.

If you have any suggestions for me on this front, I would be more than happy to hear them. I wish you luck if you want to try this out yourself!