
Tips for a Successful OpenTelemetry Deployment

OpenTelemetry offers vendor-agnostic APIs, software development kits (SDKs), agents, and other tools for collecting telemetry data from cloud-native applications and their supporting infrastructure, so you can understand their performance and health. As the open standard for collecting telemetry from cloud-native applications for analysis by backend platforms like Splunk, OpenTelemetry lets you own and control your data. It is no surprise, then, that many organizations have adopted OpenTelemetry as part of their observability framework for cloud-native software. In addition, several popular open-source applications and middleware projects now ship with OpenTelemetry instrumentation built in.

With OpenTelemetry top of mind for many, here are a few tips to help you carry out your OpenTelemetry deployment quickly and confidently.

Note: Many of these tips are specific to the Splunk distribution of the OpenTelemetry Collector, but some of them also apply to the mainline (upstream) OpenTelemetry Collector.

The OpenTelemetry Data Pipeline

One of OpenTelemetry’s most widely used components is the Collector, an agent most commonly run on each host or in each Kubernetes cluster. The Collector can capture system metrics, data emitted by OpenTelemetry SDKs and other components, and telemetry from other sources such as Prometheus and Zipkin clients.
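As a rough illustration, the receivers section below sketches how each of those sources might be declared in a Collector configuration. The selected hostmetrics scrapers, the Prometheus job name, and the my-app:9090 target are hypothetical placeholders; check the receiver documentation for your Collector version before reusing them.

```yaml
receivers:
  # System metrics from the host the Collector runs on
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
      memory:
      filesystem:
  # Data emitted by OpenTelemetry SDKs over OTLP
  otlp:
    protocols:
      grpc:
      http:
  # Scrape a Prometheus endpoint (job name and target are hypothetical)
  prometheus:
    config:
      scrape_configs:
        - job_name: my-app
          scrape_interval: 10s
          static_configs:
            - targets: ["my-app:9090"]
  # Accept spans sent by Zipkin clients
  zipkin:
    endpoint: 0.0.0.0:9411
```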

When deploying the OpenTelemetry Collector, planning your configuration carefully is essential to a successful deployment. The Collector configuration file describes the data pipelines used to collect metrics, traces, and logs. It is written in YAML and defines the following components:

  • Receivers: How to get data in. Receivers can be push- or pull-based.
  • Processors: What to do with received data.
  • Exporters: Where to send received data. Exporters can be push- or pull-based.
  • Extensions: Capabilities on top of the Collector’s primary data-processing functionality, such as health checks and zPages.

Each of these components is defined in its own section and must then also be enabled in the service section: receivers, processors, and exporters are listed in a pipeline, while extensions are listed under service.extensions.
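A minimal sketch of how the pieces fit together, assuming you only want to receive OTLP traces and print them to the Collector's own log. The component choices are illustrative, not a recommended production setup; the point is that each component is declared in its own section and then referenced under service.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:

exporters:
  # Prints received telemetry to the Collector's log
  # (replaced by the debug exporter in newer Collector releases)
  logging:
    loglevel: debug

extensions:
  health_check:

service:
  # Extensions are enabled here, outside of any pipeline
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
```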

If you plan to use the Splunk distribution of OpenTelemetry, consider the Splunk OpenTelemetry Configurator. Through an easy-to-use UI, it automatically constructs a YAML file containing each component the OpenTelemetry Collector requires. It offers configuration options for both standalone and Kubernetes deployments of the Collector and clearly shows the differences between the standard configuration and your customized one. Note that several Splunk-distribution-only components are currently included and can't be turned off in the configurator, although they suit most configurations. Because only minimal knowledge of YAML is required, you can get started with OpenTelemetry easily and quickly deploy the configuration best suited to your organization.

Troubleshooting

Here are some common issues we’ve seen customers run into when setting up their OpenTelemetry pipelines, and how to fix them:

Metrics Are Not Showing the Correct Deployment Environment

Associating your deployment environment with your workloads helps you narrow down application bottlenecks across multiple environments. There are several ways to ensure that your backend service (like Splunk) displays the correct application environment.

Option 1: Set the OTEL_RESOURCE_ATTRIBUTES environment variable on the host system running the OpenTelemetry Collector.

For Linux: Run the following command.

export OTEL_RESOURCE_ATTRIBUTES='deployment.environment=ProductionEnv'

For Kubernetes: Add the OTEL_RESOURCE_ATTRIBUTES environment variable to the container’s configuration under .spec.template.spec.containers[].env in your deployment.yaml:

...
spec:
  template:
    spec:
      containers:
      - env:
        - name: SPLUNK_OTEL_AGENT
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://$(SPLUNK_OTEL_AGENT):4317"
        - name: OTEL_SERVICE_NAME
          value: "<serviceName>"
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: "deployment.environment=ProductionEnv"
        image: my-image
        name: myapp
...
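OTEL_RESOURCE_ATTRIBUTES accepts a comma-separated list of key=value pairs, so the environment can be set alongside other resource attributes in the same variable. The fragment below is a sketch of the same env entry; the service.version value is a hypothetical example.

```yaml
        - name: OTEL_RESOURCE_ATTRIBUTES
          # Comma-separated key=value pairs; service.version value is hypothetical
          value: "deployment.environment=ProductionEnv,service.version=1.2.3"
```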

For Windows: Set the environment variable in PowerShell before starting the application (this applies to the current PowerShell session only):

$env:OTEL_RESOURCE_ATTRIBUTES='deployment.environment=ProductionEnv'

Option 2: Include the deployment environment as part of the OpenTelemetry configuration file. 

Use a resource processor, named resource/add_environment here, to add the deployment.environment attribute to all captured spans.

The resource/add_environment entry below is the addition to the processors section of the configuration file; it sets ProductionEnv as the deployment environment.

processors:
  resourcedetection:
    detectors: [system,env,gce,ec2]
    override: true
  resource/add_environment:
    attributes:
      - action: insert
        value: ProductionEnv
        key: deployment.environment
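Defining the processor is only half of the change: like every component, it has no effect until it is listed in a pipeline. A sketch of the traces pipeline with the processor enabled, reusing the receivers and exporters from the default Splunk agent configuration shown later in this post:

```yaml
service:
  pipelines:
    traces:
      receivers: [jaeger, otlp, smartagent/signalfx-forwarder, zipkin]
      processors:
      - memory_limiter
      - batch
      - resourcedetection
      # Adds deployment.environment to every span in this pipeline
      - resource/add_environment
      exporters: [sapm, signalfx]
```

The insert action only adds deployment.environment when the resource does not already carry it, so a value set through OTEL_RESOURCE_ATTRIBUTES (Option 1) takes precedence. If the Collector's value should always win, the upsert action can be used instead.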

How Can I View and Share My Configuration Securely for Easy Troubleshooting?

To quickly extract the running (effective) configuration from a host running the OpenTelemetry Collector, request the following URL:

curl http://localhost:55554/debug/configz/effective

Sensitive information stored in the configuration file, such as tokens and passwords, is redacted in the output, so you can share it for troubleshooting without exposing credentials. Example output:

exporters:
  logging:
    loglevel: debug
  otlp:
    endpoint: :4317
    tls:
      insecure: true
  sapm:
    access_token: <redacted>
    endpoint: https://ingest.us1.signalfx.com/v2/trace
  signalfx:
    access_token: <redacted>
    api_url: https://api.us1.signalfx.com
    correlation: null
    ingest_url: https://ingest.us1.signalfx.com
    sync_host_metadata: true
  splunk_hec:
    endpoint: https://ingest.us1.signalfx.com/v1/log
    source: otel
    sourcetype: otel
    token: <redacted>

How Can I Confirm the OpenTelemetry Collector is Collecting Data?

To confirm that the OpenTelemetry Collector is successfully collecting and exporting data, use zPages along with the logging exporter. By default, the Splunk OpenTelemetry Collector does not have zPages enabled. To enable it, open your configuration file:

For Linux:

/etc/otel/collector/agent_config.yaml

For Windows:

\ProgramData\Splunk\OpenTelemetry Collector\agent_config.yaml 

Uncomment the zpages endpoint line by removing the leading “#”, then restart the OpenTelemetry Collector service to apply the change.

  zpages:
    #endpoint: 0.0.0.0:55679
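After removing the “#”, the extension block should look like the sketch below. The zpages extension must also be listed under service.extensions for it to start; the default Splunk agent configuration (see the service section later in this post) already lists it. Binding to 0.0.0.0 exposes zPages on every network interface, so consider restricting access to trusted networks.

```yaml
extensions:
  zpages:
    # 0.0.0.0 listens on all interfaces; localhost:55679 keeps it local-only
    endpoint: 0.0.0.0:55679

service:
  # zpages must appear in this list for the extension to start
  extensions: [health_check, http_forwarder, zpages, memory_ballast]
```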

Note: Always back up the active configuration file before making changes.

With zPages enabled, open the following URL in a web browser to view actively captured trace spans:

http://localhost:55679/debug/tracez

Note: If you are browsing from a remote machine, replace “localhost” with the IP address of the host machine, for example: http://192.168.86.20:55679/debug/tracez. Remote access requires the 0.0.0.0 endpoint shown above.

Select a latency sample associated with one of your enabled exporters to view a snapshot of the data collected by your collector. 

[Figure: Example zPages troubleshooting page showing a snapshot of collected and exported data]

Another good way to see whether your Collector is collecting and exporting data is to enable the logging exporter. To do so, go back to the OpenTelemetry Collector’s configuration file and add the logging exporter to the exporters list of your traces and logs pipelines. The example below shows an existing configuration file with logging added to those two pipelines.

service:
  extensions: [health_check, http_forwarder, zpages, memory_ballast]
  pipelines:
    traces:
      receivers: [jaeger, otlp, smartagent/signalfx-forwarder, zipkin]
      processors:
      - memory_limiter
      - batch
      - resourcedetection
      - resource/add_environment
      - attributes/newenvironment
      exporters: [sapm, signalfx, logging]
      # Use instead when sending to gateway
      #exporters: [otlp, signalfx]
    metrics:
      receivers: [hostmetrics, otlp, signalfx, smartagent/signalfx-forwarder]
      processors: [memory_limiter, batch, resourcedetection]
      exporters: [signalfx]
      # Use instead when sending to gateway
      #exporters: [otlp]
    metrics/internal:
      receivers: [prometheus/internal]
      processors: [memory_limiter, batch, resourcedetection/internal]
      exporters: [signalfx]
      # Use instead when sending to gateway
      #exporters: [otlp]
    logs/signalfx:
      receivers: [signalfx]
      processors: [memory_limiter, batch]
      exporters: [signalfx]
      # Use instead when sending to gateway
      #exporters: [otlp]
    logs:
      receivers: [fluentforward, otlp]
      processors:
      - memory_limiter
      - batch
      - resourcedetection
      - resource/add_environment
      - attributes/newenvironment
      exporters: [splunk_hec, logging]
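The configuration above adds the logging exporter only to the traces and logs pipelines. To also see exported metrics in the journalctl output, the exporter can be added to the metrics pipeline; it must also be declared under exporters, as in the effective configuration shown earlier. A sketch, assuming the default Splunk agent pipeline:

```yaml
exporters:
  logging:
    loglevel: debug
  # On newer Collector releases, use the debug exporter instead:
  # debug:
  #   verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [hostmetrics, otlp, signalfx, smartagent/signalfx-forwarder]
      processors: [memory_limiter, batch, resourcedetection]
      # logging prints each exported metric batch to the Collector's log
      exporters: [signalfx, logging]
```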

After adding the logging exporter, restart the OpenTelemetry Collector service to apply the change.

Now that the logging exporter is configured, use journalctl on your Linux hosts or Event Viewer on your Windows hosts to check the structure of your collected data. Let’s look at an example of exported data on a Linux host running the OpenTelemetry Collector.

Using journalctl, run one of the following commands to follow the output of the logging exporter (the second command is for the Splunk distribution):

journalctl -u otel-collector -f
journalctl -u splunk-otel-collector.service -f

The terminal now shows the exported telemetry and its corresponding metadata. You can now confirm that the Collector’s configuration and metadata are as expected. To validate them before sending any data to your backend system, make the logging exporter the only exporter in a pipeline while testing.

Conclusion

OpenTelemetry has changed how organizations are making their cloud-native workloads observable. I hope that these tips can help you become more successful in your OpenTelemetry journey.  

Want to try working with OpenTelemetry yourself? You can sign up to start a free trial of the suite of products – from Infrastructure Monitoring and APM to Real User Monitoring and Log Observer. Get a real-time view of your infrastructure and start solving problems with your microservices faster today. If you’re an existing customer who wants to learn more about OpenTelemetry setup, check out our documentation.

Johnathan is part of the Observability Practitioner team at Splunk, and is here to help tell the world about Observability. Johnathan’s career has taken him from IT Administration to DevOps Engineer to Product Marketing Management. In addition to Observability, Johnathan’s professional interests include training, DevOps culture, and public speaking. Johnathan holds a Bachelor of Science in Network Administration from Western Governors University.