
Observability for AWS Fargate Deployments Powered by Graviton2 Processors

Today, cloud native technologies empower many organizations to build and run scalable applications in public, private and hybrid cloud environments. Development and operations teams can build and deploy applications, APIs and microservices architectures with the speed and immutability of containers. Gartner predicts that by 2024, more than 75% of large enterprises in mature economies will be using containers in production.

Challenges Today

Businesses depend on a large number of applications, and the teams running them must manage infrastructure and absorb the operational overhead of provisioning and scaling servers. With AWS Fargate, a serverless compute engine, organizations do not need to provision, configure or scale VMs to run containers. Fargate scales the compute to closely match the specified resource requirements, and you pay only for what you use.

With so many containerized applications running in different environments, IT and DevOps teams face more operational complexity and need end-to-end observability for monitoring and troubleshooting. Over the last couple of years, the pandemic has accelerated digital transformation and increased the traffic that organizations must serve, which translates into spikes in resource and compute capacity utilization, and in costs.

We’ve Got You Covered!

Splunk is proud to partner with Amazon to support AWS Graviton2 as a new architecture for AWS ECS Fargate. With SplunkⓇ Observability Cloud, teams can get all their answers in one place with unified metrics, traces and logs collected in real time. Users can monitor CPU and memory utilization for their application containers, which helps them analyze and troubleshoot issues as well as track costs.

Graviton2, AWS’s ARM64-based processor, delivers better price performance for cloud workloads. AWS Fargate powered by Graviton2 processors delivers up to 40% better price performance at 20% lower cost than comparable Intel x86-based Fargate for containerized applications. This allows customers to optimize cost and performance for workloads running on Fargate.
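One way to read these headline figures is sketched below in Python. It assumes that "price performance" means work done per dollar, which is our interpretation and not a definition given here, and it treats the "up to" numbers as best-case values. The function names and the $1000 baseline are hypothetical and exist only for illustration.

```python
def implied_throughput_ratio(price_perf_gain: float, cost_reduction: float) -> float:
    """Return Graviton2 throughput relative to x86 (1.0 means equal).

    Assumes price performance = throughput / cost. That definition is an
    assumption made for this sketch, not something the article states.
    """
    relative_cost = 1.0 - cost_reduction          # 20% cheaper -> 0.8x the cost
    relative_price_perf = 1.0 + price_perf_gain   # 40% better  -> 1.4x work per dollar
    return relative_price_perf * relative_cost


def graviton2_cost(x86_cost: float, cost_reduction: float) -> float:
    """Cost of the same task hours on Graviton2, given an x86 baseline cost."""
    return x86_cost * (1.0 - cost_reduction)


if __name__ == "__main__":
    ratio = implied_throughput_ratio(price_perf_gain=0.40, cost_reduction=0.20)
    print(f"Implied best-case throughput vs x86: {ratio:.2f}x")
    # 1000.0 is a made-up monthly bill used purely as an example baseline.
    print(f"Graviton2 cost for a $1000 x86 baseline: ${graviton2_cost(1000.0, 0.20):.2f}")
```

Under this reading, most of the price performance gain comes from the lower price, with a smaller share coming from extra throughput. Actual results depend on the workload.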

With Splunk Observability Cloud, monitoring AWS ECS Fargate is relatively straightforward. 

Splunk OpenTelemetry Collector

To begin collecting telemetry from the ECS Fargate cluster, you can deploy the Splunk OpenTelemetry Collector as a sidecar (additional container) to ECS tasks. OpenTelemetry is a collection of tools, APIs, and SDKs used to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your application’s performance and behavior.

The Splunk OpenTelemetry Collector is a distribution of the upstream OpenTelemetry Collector that provides a unified way to collect, process and export metric, log and trace data to the Splunk Observability Cloud backend. In addition to rpm and deb packages for Linux, an MSI installer for Windows and a standalone darwin binary for macOS, the distribution can also be run as a Docker image on both AMD64 and ARM64 architectures. Integrations backed by non-native (subprocess) plugins (e.g. collectd, nginx, postgresql, solr) are not supported for ARM64 deployments.
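Before moving a task to ARM64, it can help to check the integrations you plan to enable against that restriction. The sketch below is a hypothetical helper, not part of the Splunk distribution. Its set contains only the four integrations named above, so it is incomplete by design, and "hostmetrics" in the usage example is just a stand-in for an integration that is not on the list.

```python
# Integrations the article names as backed by subprocess plugins.
# The original list is open-ended, so treat this set as a starting point only.
SUBPROCESS_BACKED = {"collectd", "nginx", "postgresql", "solr"}


def unsupported_on_arm64(integrations: list[str]) -> list[str]:
    """Return the requested integrations that are listed as unsupported on ARM64."""
    return sorted(name for name in integrations if name.lower() in SUBPROCESS_BACKED)


def check_plan(architecture: str, integrations: list[str]) -> list[str]:
    """Return the integrations to reconsider for the given CPU architecture."""
    if architecture.upper() != "ARM64":
        return []  # The restriction described above applies to ARM64 only.
    return unsupported_on_arm64(integrations)


if __name__ == "__main__":
    planned = ["nginx", "hostmetrics", "solr"]
    print(check_plan("ARM64", planned))   # flagged integrations, if any
    print(check_plan("AMD64", planned))   # empty list on AMD64
```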

To get started, follow these steps:

  • Navigate to the Splunk Observability Cloud suite UI
  • Click on Data Setup on the left navbar
  • On the Data Setup page, click the Amazon Fargate tile and follow the instructions to configure the integration. Specify the parameters, including the correct access token and the Quay URL of the multi-architecture Splunk OpenTelemetry Collector Docker image, whose image manifest lists ARM64 along with AMD64
  • Add the generated container definition snippet to your task definition and you are done! To verify that the task is running on Graviton-based compute, check that the Operating system/Architecture field on each task detail page of the ECS console shows Linux/ARM64
  • Navigate to the Infrastructure tab to see a heatmap of all the clusters in the ECS navigator, as well as insightful details like:

             1. Top clusters and services by CPU% and Memory%

             2. Number of clusters and tasks

  • You can also visualize resource utilization, container health and counts with out-of-the-box dashboards for ECS, and view metrics filtered at the cluster level
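To make the task definition step concrete, here is a minimal Python sketch of what a Fargate task definition with an ARM64 runtime platform and a collector sidecar might look like. The `runtimePlatform`, `requiresCompatibilities` and `containerDefinitions` fields are standard ECS task definition fields. Everything else is an assumption to check against the snippet that the Data Setup page generates: the image path and tag, the environment variable names, the task sizes, and the hypothetical `photo-api` application container.

```python
import json

# Assumed image path and tag; use the Quay URL shown on the Data Setup page.
COLLECTOR_IMAGE = "quay.io/signalfx/splunk-otel-collector:0.38.0"


def collector_sidecar(access_token: str, realm: str) -> dict:
    """Build a container definition for the collector sidecar."""
    return {
        "name": "splunk-otel-collector",
        "image": COLLECTOR_IMAGE,
        "essential": True,
        "environment": [
            # Variable names are assumptions; copy them from the generated snippet.
            {"name": "SPLUNK_ACCESS_TOKEN", "value": access_token},
            {"name": "SPLUNK_REALM", "value": realm},
        ],
    }


def fargate_arm64_task(family: str, app_container: dict, sidecar: dict) -> dict:
    """Build a Fargate task definition that requests Graviton (ARM64) compute."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",
        "cpu": "256",      # example task size, not a recommendation
        "memory": "512",
        "runtimePlatform": {
            "operatingSystemFamily": "LINUX",
            "cpuArchitecture": "ARM64",
        },
        "containerDefinitions": [app_container, sidecar],
    }


if __name__ == "__main__":
    # Hypothetical application container, used only to complete the example.
    app = {"name": "photo-api", "image": "example.com/photo-api:latest", "essential": True}
    task = fargate_arm64_task("photo-api", app, collector_sidecar("<ACCESS_TOKEN>", "<REALM>"))
    print(json.dumps(task, indent=2))
```

The printed JSON has the same shape as the keyword arguments that boto3's `register_task_definition` accepts. In a real deployment you would likely add an execution role and supply the access token through the container definition's `secrets` field, not as a plain environment value.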

Use Case

XYZ, a large photo sharing organization, runs a number of microservices as containers in multiple AWS ECS Fargate clusters. With ECS Fargate, it is easy for them to deploy, manage and scale these applications without having to choose EC2 server types, decide when to scale clusters or optimize cluster packing. The company still faces some challenges:

  • The DevOps and IT teams at XYZ can visualize CPU and memory utilization for each cluster and service, but they cannot analyze tasks, containers and resource utilization in depth along with their dependencies. As a result, they cannot troubleshoot the task crashes that put pressure on back office jobs and affect customers in real time
  • Uploading and rendering a large number of high resolution images increases resource utilization, which triggers a scale-out to more tasks and drives up costs
     

With Splunk Observability Cloud and AWS ECS Fargate with Graviton2:

  • The developers and cluster administrators at XYZ can easily track resource utilization at the cluster and service level, identify the root cause of task crashes, create alerts and respond in real time to prevent a bad customer experience
  • With cloud workloads running on Fargate powered by AWS Graviton2 processors, XYZ gets better performance at lower cost than on comparable Intel x86-based Fargate
     

Getting Started

Splunk’s support for observability on AWS ECS Fargate powered by AWS Graviton2 processors is available in v0.38.0 and later of the Docker image for the Splunk distribution of the OpenTelemetry Collector. Please refer to the documentation for more information on installation.
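If you pin the collector image to a specific tag, a small check like the following Python sketch can confirm that the tag meets the v0.38.0 minimum. The helper names are hypothetical, and the parser assumes purely numeric dotted tags, optionally prefixed with "v". A tag such as "latest" raises `ValueError`.

```python
MIN_VERSION = (0, 38, 0)  # first version with Graviton2 support, per the article


def parse_version(tag: str) -> tuple[int, ...]:
    """Parse a tag like '0.38.0' or 'v0.38.0' into a tuple of integers."""
    parts = tuple(int(part) for part in tag.strip().lstrip("v").split("."))
    # Pad short tags such as '0.38' so they compare as (0, 38, 0).
    return parts + (0,) * (3 - len(parts))


def supports_graviton2(tag: str) -> bool:
    """Return True if the image tag is at or above the minimum version."""
    return parse_version(tag) >= MIN_VERSION


if __name__ == "__main__":
    for tag in ("0.37.1", "0.38.0", "v0.40.0"):
        status = "ok" if supports_graviton2(tag) else "too old"
        print(f"{tag}: {status}")
```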

If you’re interested in trying out Splunk Observability Cloud to monitor containerized applications and clusters on AWS ECS Fargate powered by AWS Graviton2 processors, or for other use cases, you can get started today!

Posted by Aunsh Chaudhari

Aunsh is a Product Manager at Splunk focused on how customers get their data into Splunk Observability Cloud. He is involved in the overall agent strategy along with Splunk’s contributions to OpenTelemetry. Prior to Splunk, Aunsh worked closely with engineering and product at Shutterfly on the User Media Assets platform and was part of the Search team at Chewy. Aunsh has a Master’s in Computer Science from Northeastern University, Boston.