
Splunking AWS ECS And Fargate Part 3: Sending Fargate Logs To Splunk

Welcome to part 3 of the blog series where we go through how to forward container logs from Amazon ECS and Fargate to Splunk.

In part 1, Splunking AWS ECS Part 1: Setting Up AWS And Splunk, we focused on understanding what ECS and Fargate are, along with how to get AWS and Splunk ready for log routing to Splunk’s Data-to-Everything Platform. In part 2, Splunking AWS ECS Part 2: Sending ECS Logs To Splunk, we focused on how to configure an ECS cluster, create tasks and workloads and send their outputs to Splunk for indexing. In this next segment in the series we will be focusing on building a Fargate profile, defining tasks and deploying a simple container that routes its application logs to Splunk with Firelens.

This blog segment picks up directly after part 1 in the series and is not dependent on part 2. Configuring Fargate is an alternative to ECS (we'll discuss the differences shortly) and can be performed independently of the work we completed in part 2. However, if you did follow along in part 2, you'll notice a number of similarities in the principles we'll be discussing in this segment.

As a quick recap, in part 1 we configured a CloudWatch log group and two IAM roles that will be required for this walkthrough along with an HTTP Event Collector and index within Splunk. In order to follow the remainder of this post in the series, you will need the following information that we defined in the last part:

  • AWS Region: US-East-1 (the region you’re working in)
  • AWS CloudWatch Log Group: SplunkECS (the name of the log group) 
  • AWS ECS Instance Role: ecsInstanceRole (the name of the role to run container instances)
  • AWS Task Execution Role: ecsTaskExecutionRole (the name of the role to run ECS tasks)
  • Splunk HEC server: https://stackname.example.com (Splunk Enterprise) or https://http-inputs-stackname.splunkcloud.com (Splunk Cloud)
  • Splunk HEC Port: 8088 or 443
  • Splunk HEC Index: scratch (the name of the index you configured in your HEC)
  • Splunk HEC Token: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
     

What Is Fargate, And How Is It Different From ECS?

In part 2 of this series we built an Elastic Container Service (ECS) cluster to host and manage our application containers. We built a cluster with a single VM running as our container instance, which hosts the container and all of its dependencies. This gives the site engineer a degree of flexibility in managing the underlying infrastructure, at the cost of added complexity.

Amazon recognized that managing cluster infrastructure is not desirable for every user and built a solution to address this. In 2017, Amazon made Fargate available to customers who want a simple way to deploy their containers without having to worry about managing their own cluster infrastructure. Fargate is essentially serverless container orchestration.

Much like we saw in our ECS tutorial in part 2, Fargate is managed with task definitions. A task definition therefore behaves similarly to a Kubernetes deployment, where you define and manage your resources and container relationships in deployment descriptors. The only difference is that we can't rely on having access to the underlying infrastructure where the container runs. The benefit to the engineer is two-fold with Fargate: first, there's no need to worry about administrative tasks like managing a container runtime or resource scaling; second, the engineer is billed on a usage-based model, usually reducing operating costs and technical debt.

Configuring Fargate Prerequisites

Configuring Fargate is a bit simpler than setting up ECS. Since we have no need for dedicated container instances (or nodes), we won't have to worry about access management for other Amazon resources beyond what we configure in our containers. We will, however, still need to run an EC2 instance to manage the configuration files on our shared Elastic File System.

Creating A Key Pair For EC2 Management Instances

First, we will need a key pair that we will use to manage our EFS (Elastic File System) file system. If you already have a key pair or followed along with part 2 of this series, you can skip this section.

In the EC2 management console, under Network and Security > Key Pairs, create a new key pair. In most cases we'll be connecting via SSH, so choose PEM format if that's your tool of choice, or choose PPK if you're using a PuTTY terminal on Windows.

Add a tag if desired (it’s optional, but best practice). 

When you create the key pair, the .pem or .ppk file should download automatically. Be careful, as this is the only time it will be available for download. If you lose this file, you will need to create a new key pair.
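
Alternatively, you can create the key pair from the AWS CLI; here's a minimal sketch, assuming the key name ECSDemo that we'll reference later in this walkthrough:

$ # Create the key pair and save the private key locally in PEM format
$ aws ec2 create-key-pair --key-name ECSDemo --query 'KeyMaterial' --output text > ECSDemo.pem
$ # Restrict permissions so SSH will accept the key file
$ chmod 400 ECSDemo.pem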

Now that we have our key pair, we need to create a new AWS security group for access to and from our instances and tasks.

Creating A New Security Group

  1. From the EC2 console, select Security Groups under the Network and Security heading.
  2. Create a new security group, providing the following:
     • Name: EFS-Fargate-Access
     • Description: Manages EFS and Task Definition Access
     • VPC: This must be the same VPC we’ll be using for our tasks, storage buckets etc. If you don’t know, leave this as your default.
     

Add the following inbound rules:

  • HTTP: Anywhere
  • HTTPS: Anywhere
  • SSH: Your workstation IP, domain or location. Using ‘Anywhere’ here is risky, but if you’re having trouble connecting to your instance via SSH this can be changed later.

Save, and make note of the security group name (EFS-Fargate-Access) as we’ll be needing it later.
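
If you prefer the CLI, here's a rough equivalent of the steps above; a sketch, assuming your own VPC ID and workstation IP (the first command prints the new security group's ID):

$ aws ec2 create-security-group --group-name EFS-Fargate-Access \
    --description "Manages EFS and Task Definition Access" --vpc-id <your-vpc-id>
$ # Open HTTP and HTTPS to anywhere, and SSH to your workstation only
$ aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 80 --cidr 0.0.0.0/0
$ aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 443 --cidr 0.0.0.0/0
$ aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 22 --cidr <your-workstation-ip>/32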
     

Creating An Elastic File System Mount

By virtue of having no dedicated underlying infrastructure, Fargate introduces a few challenges. One of them is that without a dedicated node running our containers, we have no way of accessing S3 storage directly. If you followed along in part 2 of this series, we used an S3 bucket to store our Splunk endpoint configuration. Since this isn't available to us in Fargate, we need another way to present our Splunk configurations to the log router.

Amazon's Elastic File System (EFS) is a great alternative to S3 storage. An EFS file system grows dynamically as files and content are added, and because it's treated like any other Linux mount point, we can mount the file system endpoint directly within our containers running in Fargate. This allows us to overcome the challenge of file system access from an ephemeral container!

  1. From the EFS Management Console > Create File System
  2. Name: EFS-Fargate
  3. Virtual Private Cloud (VPC): default
  4. Click Create
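
If you'd rather script this step, here's a hedged CLI sketch. Note that unlike the console's quick-create flow, you would also need to create a mount target in each subnet afterward with aws efs create-mount-target:

$ # Create the file system with a Name tag matching the console walkthrough
$ aws efs create-file-system --creation-token EFS-Fargate --tags Key=Name,Value=EFS-Fargate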
     

That’s really all there is to creating an EFS mount that we can use in our Fargate containers! Later we’ll store our FluentD configurations on this mount for use by the log router.

Creating An EC2 Instance To Manage EFS Contents

The easiest way to manage the content we present to our application container is to access the EFS mount from a conventional system. Since AWS offers a free-tier-eligible image, we can use it to mount and manage the content on our file store that we will be referencing in our Firelens log router.

  1. From the EC2 management console > Launch Instance
  2. Select an Amazon Linux 2 AMI
  3. Select t2.micro instance > Configure Instance Details
  4. Use the default network (the same network where our EFS mount lives)
  5. Under File Systems > EFS-Fargate > /mnt/efs/fs1
  6. Make sure Automatically create and attach the required security groups is checked

  7. Next > Next > Add a Name tag with “Fargate EFS Manager” > Next
  8. Under Configure Security Group > Select the security group we created earlier > Review and Launch
  9. Use the Key Pair we created earlier > Launch Instance

Last but not least, we need to create a Fargate ‘cluster’.
     

Creating A Fargate ‘Cluster’

The term ‘cluster’ in Fargate is ambiguous - it’s really just a namespace. Since we don’t need to define any nodes (or container instances) you can either use an existing ECS cluster or create an entirely new one. If you followed along in part 2 of this series, you can use that cluster. Otherwise, setting up a cluster dedicated to Fargate tasks is quite simple.

  1. From the ECS Management Console > Create Cluster
  2. Select Networking Only; Powered by AWS Fargate > Next Step
  3. Set a cluster name (SplunkFargate) and enable CloudWatch Container Insights > Create
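
The same cluster can be created with a single CLI call; a sketch:

$ # A 'networking only' (Fargate) cluster needs no container instances
$ aws ecs create-cluster --cluster-name SplunkFargate --settings name=containerInsights,value=enabled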
     

Creating A Load Balancer For Task Target Groups

Now that we have our cluster built, we need a way to reach the containers running in our tasks. There are a number of ways to manage this, but the easiest by far is to create an application load balancer. This has the added advantage of being more secure, and it provides its own meaningful metrics and logs. Note that this is the same process we outlined in part 2 of this series.

If you built the ALB from part 2, note that since we'll be using the same port assignments, we'll need a second ALB.

  1. From the EC2 management console, select Load Balancing > Load Balancers.
  2. Create a new Application Load Balancer.
  3. Provide the new load balancer a name (SplunkFargate), and make the scheme internet facing.
  4. Create a single listener on port 80.
  5. Under availability zones: Select your default VPC
  6. Choose two availability zones (subnets) 
  7. Next, move on to Configure Security Settings > Configure Security Groups.
  8. Select the same security group we used when we created our EFS file store.
  9. In Configure Routing, Create a new target group
  10. Name: nginx-fargate
  11. Select Instance target type
  12. Use the HTTP Protocol on Port 80. This will be the same port that our nginx container will be listening on later.
  13. Select HTTP1, and leave all of the remaining settings as the default.
  14. In Register Targets, DO NOT register any targets at this time. We will handle this later when we create our service in Fargate.
  15. Review and Create
     

Creating The First Task Definition

Much like what we did in ECS in part 2 of this series, we will be configuring a simple NGINX web server container to start generating some logs to STDOUT for us to route to Splunk. Fargate uses task definitions similar to ECS, but they are somewhat simpler, without many of the IAM requirements.

Task definitions are important because they essentially define the deployment and act as a proxy for more traditional YAML-style configurations. AWS offers flexibility with task definitions and allows users to either configure them as JSON code or through a web user interface in the ECS management console. Both are viable options and both offer version control which makes changes and updates easier and less risky.

For our first task definition we'll deploy a simple NGINX server that serves up the basic boilerplate landing page. Although we're deploying a single container in our task definition (plus one sidecar, but we'll get to that when we discuss log routing), there are some great tutorials on how to deploy much richer NGINX web apps.

  1. From the ECS management console, select Task Definitions > Create new Task Definition > Fargate Definition
  2. Name: SplunkFargate
  3. Task Role: ecsTaskExecutionRole (This is the role we configured in part 1 of this series)
  4. Network Mode: awsvpc
  5. Task execution role: ecsTaskExecutionRole.
  6. Task memory: 2GB
  7. Task cpu: 1 vCPU
  8. Under container definitions, add a container:
  9. Container name: nginx-web
  10. Image: nginx:latest
  11. Memory Limit: 256
  12. Port Mappings: Container: 80
  13. Protocol: tcp
  14. (Optional): Enable CloudWatch Logs
  15. Save the task definition

    Note:
    Although version control is enabled by default for task definitions, it is highly recommended to back up the JSON of the task. Copy the JSON output of the task definition and store it in a private repository for recoverability.
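
One way to capture that backup from the CLI; a sketch, assuming the SplunkFargate family name we use in this walkthrough:

$ # Export the latest revision of the task definition as JSON
$ aws ecs describe-task-definition --task-definition SplunkFargate --query taskDefinition > SplunkFargate-task.json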

The last thing we should do is take stock of our configuration items from earlier and add our newly configured parameters (note: don't forget to replace these with your own values!):

  • AWS Region: US-East-1 (the region you’re working in)
  • AWS CloudWatch Log Group: SplunkECS (the name of the log group) 
  • AWS Task Execution Role: ecsTaskExecutionRole (the name of the role to run ECS tasks)
  • Splunk HEC server: https://stackname.example.com (Splunk Enterprise) or https://http-inputs-stackname.splunkcloud.com (Splunk Cloud)
  • Splunk HEC Port: 8088 or 443
  • Splunk HEC Index: scratch (the name of the index you configured in your HEC)
  • Splunk HEC Token: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  • AWS Instance Key Pair: ECSDemo
  • AWS EFS: EFS-Fargate
  • AWS Fargate Cluster Name: SplunkFargate
  • AWS Load Balancer Name: SplunkFargate
  • AWS Load Balancer DNS Address (Found in the Load Balancer Description): <Your Fully Qualified Address>
  • AWS Task Definition Name: SplunkFargate
     

Running The Task On Our Fargate Cluster

Now that we have all of our AWS components configured, we need to actually run our first iteration of the task definition. Typically I run the first version of the task definition to make sure my apps and containers are configured correctly. Once I'm happy with how they're running, I'll add the log routing, which we'll cover in the last two segments of this article.

  1. From the ECS management console, select your ECS cluster we configured earlier (SplunkFargate). 
  2. Under Services > Create a new service. Services will handle the work of creating the requisite networking and manage the deployment for us in one step.
  3. Under configure service:
  4. Launch Type: Fargate
  5. Task definition: SplunkFargate
  6. Revision: latest
  7. Platform Version: 1.4.0 or newer (1.3.0 or older won't be able to mount the EFS target)
  8. Cluster: SplunkFargate
  9. Service name: SplunkFargate
  10. Service type: Replica
  11. Number of tasks: 1
  12. Minimum healthy percent: 100
  13. Maximum percent: 200
  14. Next Step
  15. Under VPC and security groups:
  16. Cluster VPC: <default>
  17. Subnets: 2 of your choice
  18. Security Groups: Select the same security group we used when we created our EFS file store
  19. Under load balancing:
  20. Load balancer type: Application
  21. Load balancer name: SplunkFargate
  22. Container to load balance: nginx-web:80:80
  23. Add to load balancer
  24. Production Listener Port: 80:HTTP
  25. Target Group Name: Create New
  26. Target Group Protocol: HTTP
  27. Path Pattern: /
  28. Evaluation Order: 1
  29. Health Check Path: /

  30. Next step > do not adjust the service’s desired count > next step
  31. Create Service

It may take a while for the service and task to start up. Under the Tasks tab on the cluster summary page, you can monitor the state of the task as it starts up. If you find that the task is stuck in PENDING or keeps stopping unexpectedly, you can always click on the task ARN and view the details explaining the behavior. If you need to make changes to your task definition you can always create a new task revision. As always, be sure to take a backup of your work.
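
The stop reason is also available from the CLI, which can be quicker than clicking through the console; a sketch using our cluster name:

$ # List recently stopped tasks in the cluster
$ aws ecs list-tasks --cluster SplunkFargate --desired-status STOPPED
$ # Inspect why a given task stopped
$ aws ecs describe-tasks --cluster SplunkFargate --tasks <task-arn> --query 'tasks[0].stoppedReason'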

Once your service is up and running, you can verify that the NGINX web server running in our Fargate task is alive by visiting the DNS address associated with our load balancer endpoint.
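
You can run the same check from a terminal; a quick sketch, substituting your load balancer's DNS name:

$ # Expect an HTTP 200 from the NGINX welcome page
$ curl -I http://<your-load-balancer-dns>/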

Even though this landing page is nothing special, it provides a very useful test bed for us. Since the page is now hosted live in AWS, anyone in the world can access it. Behind the scenes, our NGINX server is writing both its access and error logs to a standard out buffer, which is currently going to CloudWatch. With everything running at this point, the only thing left to do is have some fun and learn about event routing and getting our data to Splunk!

If you've been following along since part 2 of this blog series, you'll notice that there are many similarities between running tasks in Fargate and ECS. Other than where we configure the FluentD sidecar, the process is nearly identical to routing logs from ECS.

Creating A FluentD Configuration

Now that we have a Fargate profile up and running, serving content from a containerized web server, we need to set up our log router. AWS Firelens will handle the log routing from Fargate, and we'll be using FluentD as the means to forward those logs from the router to Splunk.

If you're curious about the inner workings of FluentD, their site does an awesome job of explaining the platform in more detail. Suffice it to say, it's a great platform for extracting logs from orchestration systems and forwarding them to an analytics platform like Splunk!

First, we need to log in to the EC2 EFS management VM we created earlier in this article. You'll need the key pair we created in order to log in.

From the EC2 management console, connect to the server's command line with SSH. SSH should be enabled based on the security group we attached to this VM.

$ ssh -i "ECSDemo.pem" ec2-user@<Amazon EC2 host name>.compute-1.amazonaws.com

Use your instance's fully qualified name to connect.

Earlier we configured /mnt/efs/fs1 as the main mount point for our EFS file system. Anything we create under this directory will be globally available to our containers that are running with EFS read privileges.

From the EC2 VM, change to the root of the EFS mount and create a new directory called fluent containing a file called fluent.conf. We also want to ensure that all users can read the file, but only root can modify it.

$ cd /mnt/efs/fs1  
$ sudo mkdir fluent
$ sudo touch fluent/fluent.conf
$ sudo chmod 644 fluent/fluent.conf

Edit the fluent.conf file and add the following contents, making sure to replace the values with your own:

<system>
  log_level info
</system>
<match **>
  @type splunk_hec
  protocol <https or http>
  hec_host <Your Splunk IP>
  hec_port 8088
  hec_token XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  index <your index - eg. scratch>
  host_key ec2_instance_id
  source_key ecs_cluster
  sourcetype_key ecs_task_definition
  insecure_ssl true
  <fields>
    container_id
    container_name
    ecs_task_arn
    source
  </fields>
  <format>
    @type single_value
    message_key log
    add_newline false
  </format>
</match>

Save the file

Since we made this change to a file on our elastic file store, as long as we mount the file system to a container, we will have access to the fluent.conf file. This means that we can configure FluentD once and deploy it alongside many applications quickly and easily.

Putting It All Together: Running Firelens To Route Logs To FluentD And Splunk

By this point we should have everything we need to start routing container logs from our Fargate applications to Splunk. We built our Fargate profile to host our applications, verified that it is running application containers, and created a FluentD configuration that is accessible to our containers.

Our last task is to put it all together to route our NGINX logs to Splunk. Just like we did for our ECS task definitions in the last part of this series, we'll be making the configuration in the task definition descriptor, which is a JSON object. This is another way of configuring Fargate task definitions and offers the flexibility we need for this part of the tutorial.

  1. From the ECS management console > Task Definitions
  2. Select the task definition we created for the NGINX deployment earlier
  3. Select the latest revision
  4. Create a new revision
  5. Scroll down to the bottom of the dialog window and select Configure via JSON

The first thing we need to do is define the sidecar container with the FluentD configuration. Add the following block to the containerDefinitions[] stanza:

{
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/sample-nginx",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "cpu": 0,
      "mountPoints": [
        {
          "readOnly": true,
          "containerPath": "/mnt/efs/",
          "sourceVolume": "fluent"
        }
      ],
      "image": "splunk/fluentd-hec:1.2.0",
      "firelensConfiguration": {
        "type": "fluentd",
        "options": {
          "config-file-type": "file",
          "config-file-value": "/mnt/efs/fluent/fluent.conf"
        }
      },
      "name": "log_router"
    }

The log driver for the FluentD container is configured as the awslogs driver. The reason for this is that we need a fallback location for FluentD's own logs in case the container has problems sending data to Splunk. Next, we need to tell the NGINX container to route its logs via Firelens rather than through the awslogs driver. Locate your NGINX container definition and replace the logConfiguration{} stanza with the following:

"logConfiguration": {
        "logDriver": "awsfirelens"
      },

Next, we'll need to add the volume definition for the EFS mount that contains the FluentD configuration. Add the following block directly above the last closing bracket of the task definition config. Make sure to replace the file system ID of the EFS mount with your own:

"volumes": [
    {
      "efsVolumeConfiguration": {
        "fileSystemId": "<Your EFS File System ID>",
        "authorizationConfig": {
          "iam": "DISABLED",
          "accessPointId": null
        },
        "transitEncryption": "DISABLED",
        "rootDirectory": "/"
      },
      "name": "fluentd",
      "host": null,
      "dockerVolumeConfiguration": null
    }
  ]

What we are doing here is telling the Fargate platform that when the container runs, it should mount the file system with the matching ID. We're mounting the file system directly at its root (which we mapped to /mnt/efs/ in the container). Any files beneath the root can be accessed directly, which is how the fluent.conf path in the container definition resolves.

Save the JSON file and check for any validation errors

The finalized JSON task definition configuration with all null parameters removed should look similar to this example:

{
  "containerDefinitions": [
    {
      "logConfiguration": {
        "logDriver": "awsfirelens"
      },
      "portMappings": [
        {
          "hostPort": 80,
          "protocol": "tcp",
          "containerPort": 80
        }
      ],
      "image": "nginx:latest",
      "essential": true,
      "name": "nginx-web"
    },
    {
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/sample-nginx",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "cpu": 0,
      "mountPoints": [
        {
          "readOnly": true,
          "containerPath": "/mnt/efs/",
          "sourceVolume": "fluent"
        }
      ],
      "image": "splunk/fluentd-hec:1.2.0",
      "firelensConfiguration": {
        "type": "fluentd",
        "options": {
          "config-file-type": "file",
          "config-file-value": "/mnt/efs/fluent/fluent.conf"
        }
      },
      "name": "log_router"
    }
  ],
  "memory": "2048",
  "family": "SplunkFargate",
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "cpu": "1024",
  "volumes": [
    {
      "fsxWindowsFileServerVolumeConfiguration": null,
      "efsVolumeConfiguration": {
        "transitEncryptionPort": null,
        "fileSystemId": "<Your EFS File System ID>",
        "authorizationConfig": {
          "iam": "DISABLED",
          "accessPointId": null
        },
        "transitEncryption": "DISABLED",
        "rootDirectory": "/"
      },
      "name": "fluentd",
      "host": null,
      "dockerVolumeConfiguration": null
    }
  ]
}

Navigate back to the ECS management console > Clusters and select the cluster running the existing Fargate task

Just like in part 2 when we re-deployed our task, we’ll want to force a re-deploy of the task definition by draining the existing task. In the SplunkFargate cluster, select the SplunkFargate service > Delete
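
The CLI equivalent, if you prefer; the --force flag skips having to scale the service's desired count to zero first:

$ aws ecs delete-service --cluster SplunkFargate --service SplunkFargate --force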

 

Now let's make sure that our task has fully drained by re-visiting the DNS address associated with our load balancer endpoint for SplunkFargate. If the ELB has drained properly, we'll get a 503 message indicating that the load balancer is resolving but has nowhere to send the traffic.
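
From a terminal, the same check looks like this; a sketch against your load balancer's DNS name:

$ # Should print 503 while the load balancer has no targets to route to
$ curl -s -o /dev/null -w "%{http_code}\n" http://<your-load-balancer-dns>/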

Now we need to remove the original ELB target. From the EC2 management console > Load Balancers, select the Listeners tab for the SplunkFargate load balancer and delete all listeners.

From target groups, delete the SplunkFargate target group

Now let’s head back to the ECS management console > SplunkFargate cluster and Create a new service. Set the following parameters, leaving the rest as their defaults:

  1. Launch Type: Fargate
  2. Task Definition Family: SplunkFargate
  3. Task Definition Revision: Latest
  4. Platform Version: 1.4.0 (note: 1.3.0 does not support EFS mounts)
  5. Cluster: SplunkFargate
  6. Service Name: SplunkFargate
  7. Service Type: Replica
  8. Number of Tasks: 1
  9. Next Step
  10. Cluster VPC: default
  11. Subnets: (choose the same two you set in your elastic load balancer configuration)
  12. Auto-assign public IP: Enabled
  13. Load Balancer Type: Application
  14. Load Balancer Name: SplunkFargate
  15. Container Name Port: nginx-web:80:80 > Add to Load Balancer
  16. Production Listener Port: create new > 80
  17. Production Listener Protocol: HTTP
  18. Target Group Name: create new > SplunkFargate
  19. Target Group Protocol: HTTP
  20. Health Check Path: /
  21. Next Step > Next Step > Create Service > View Service

It will take a few minutes for the service's task to start up. When the task has reached the RUNNING state, navigate in a web browser to the SplunkFargate load balancer's fully qualified address. Once again, we should see our NGINX welcome page!

Now, let's make some invalid URI requests to verify that our Fargate web logs are going to Splunk. Try adding a deliberately invalid URI path, e.g., /the-empire-strikes-back. This will return a 404 Not Found, but we will be able to easily identify it in Splunk:
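
You can generate a few of these requests from the terminal as well; a sketch:

$ # Each request returns a 404 and lands in NGINX's access and error logs
$ curl -i http://<your-load-balancer-dns>/the-empire-strikes-back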

Now, head back to your Splunk instance and do an open text search for “the-empire-strikes-back” on the index we specified in our fluent.conf file earlier.

 

index=scratch "the-empire-strikes-back"

Right on! Just like with ECS we’re now getting the standard output logs from our firelens router installed on Fargate!
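
Since our fluent.conf also indexes container_name, container_id, ecs_task_arn and source as fields, you can slice the events further; a hedged example search:

index=scratch "the-empire-strikes-back" | stats count by container_name, source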

 

At this point, you can pick up and do all of the awesome things with Splunk that we love so much like aliasing, tagging and modelling our data!


Now that we've set up our AWS profiles for ECS and Firelens, there's no limit to what you can do with your data. This concludes the mini-series, where we walked through the setup and management of AWS resources to start sending containerized application logs to Splunk with Amazon's Firelens service.

Don't forget to check out Splunk.com for the latest updates, downloads and events for everything Splunk, and join the Splunk community at splunk-usergroups.slack.com!

Thanks for following me through this journey and happy Splunking!

----------------------------------------------------
Thanks!
Andrij Demianczuk

Splunk