Splunk 4 Down Under

I visited Sydney and Melbourne last week to host our first Splunk Live events in Australia. It's my first visit to Australia and I'm really blown away by the friendliness of the people we've met. And the "Australian for Grep" t-shirt finally had a proper home. Attendees at today's event in Melbourne and Tuesday's event in Sydney included an impressive list of current customers and partners, as well as a number of new users evaluating Splunk for the first time, from organizations including Telstra, Ericsson, InfoSys, Frontline Systems, Fujitsu, GE Capital Finance, Toll Holdings, Vanguard Investments and more. We owe a huge thanks to the team from Digital Networks Australia who sponsored the two events.

Martin Brown, A Large Australian Financial Services Company

In Sydney, Martin Brown, pictured below with me, gave an excellent presentation on using Splunk for Identity Management Compliance. Martin is a Technical Architect managing the development and operations of the worldwide web application security system for a major financial institution. His career has spanned implantable device electronics and software engineering, UNIX and network systems administration, and internet systems management and security.

Martin's company needs to present client security history from its web applications and to search that history for suspect IDs going back six months. Tivoli Access Manager (TAM) is used for both external and internal identity management and access control. More than 200,000 clients authenticate externally through TAM.

His Splunk deployment is very much out of the box, with a range of saved searches and some role partitioning. It consists of a single Splunk server with 1 TB of local disk for retention. The TAM logs are regularly rsynced and directly mounted from various hosts and systems. Twelve internal and 12 external TAM hosts generate 5 GB of data per day, or roughly 2 TB a year.
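As a rough sanity check on those volumes, the short sketch below works through the arithmetic. It assumes a constant 5 GB/day ingest rate and decimal units (1 TB = 1,000 GB), and it ignores Splunk's on-disk compression, so the figures are raw log volume only. The function name `raw_volume_gb` is purely illustrative.

```python
# Back-of-envelope check of the TAM log volumes quoted above.
# Assumptions (not from the original post): a flat 5 GB/day ingest rate,
# decimal units (1 TB = 1000 GB), and no allowance for index compression.
DAILY_GB = 5


def raw_volume_gb(days: float, daily_gb: float = DAILY_GB) -> float:
    """Raw log volume in GB for a number of days at a constant daily rate."""
    return days * daily_gb


yearly_tb = raw_volume_gb(365) / 1000   # 1825 GB, i.e. roughly 1.8 TB a year
six_month_gb = raw_volume_gb(365 / 2)   # raw data behind the six-month lookback

print(f"per year: {yearly_tb:.3f} TB")
print(f"six months: {six_month_gb:.1f} GB")
```

Under these assumptions a year comes to about 1.8 TB, and the six-month window the team searches is a little over 0.9 TB of raw data, which is in the same range as the 1 TB of local disk.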

The current user base consists of the business second-level support teams and the TAM support group for third-level support. The user base is expected to extend to the Risk Management Group and first-level help desk support soon. Their classic use case is:

“Client X’s account has been compromised. What applications has he/she logged in to in the past 6 months?”

The old way took days or weeks of work and support from multiple teams. It often meant pulling log files from offsite backup tapes and then grepping through gigabytes of data from several hosts. Fun fun. Now, with Splunk, Martin's team finds answers in minutes and will soon train Tier 1 agents to do the same, so Martin's team no longer has to fetch data for everyone. Next he plans to add application server, web server and load balancer data; role partitioning to restrict business users to the logs relevant to them; offshore implementations to present local application logs; and API consumption for a one-stop-shop help desk interface.

Nick Clark, Ericsson

Nick Clark is a Technology Manager in the Solution Management & Utilities Consulting, System Integration & Multimedia practice at Ericsson, where the focus is on bespoke support and life cycle management services for complex infrastructures. His group focuses on mobile and fixed network infrastructure, telecom services, software, broadband and multimedia solutions for operators, enterprises and the media industry. He presented the Splunk solution that Ericsson implemented at Telstra in the mobile multimedia services area to troubleshoot problems and investigate incidents. The solution was initially deployed to support coverage of the 2008 Beijing Olympics. Telstra predicted massive interest in mobile streaming, yet demand exceeded all expectations. Splunk helped Ericsson and Telstra quickly pinpoint, manage and address problems. Because application failures and limits were discovered before they caused serious downtime, Telstra maintained uptime above 99.9% during the Olympic Games.

Telstra manages more than 10 million users and 50-plus content providers on the Telstra Service Delivery Platform, which provides multiple mobile portals, content transformation, mobile streaming services and device-specific rendering and UI over 2G and 3G networks. The environment consists of 60+ servers (Solaris 9/10, Windows 2003) and many platforms and technologies providing service orchestration, rich media content management, encoding and streaming for terabytes of active content.

Ericsson and Telstra's challenges before Splunk were numerous, including:

  • no central view of logs and events, which made problems difficult to troubleshoot,
  • support and operations staff diverted to log fetching and ad-hoc reporting, delaying work on high-priority projects,
  • no consistent approach to log handling and storage, making it difficult to locate, access and archive logs, and
  • poor visibility of service and transaction flows, which prolonged outages.

The Ericsson team chose Splunk to help Telstra gain a holistic view of the environment, troubleshoot outages more quickly, provide users with ad-hoc reporting and control access to logs by role. They are currently indexing roughly 20 GB per day on a dual-processor, dual-core Xeon GHz server with 16 GB of RAM. Thirty support people (tier 1 and up) currently use Splunk to search application, server and network logs and events to troubleshoot problems. The team makes extensive use of Splunk tagging to create alerts that notify them if problems recur. Perhaps the most valuable thing Ericsson has done with Splunk is track end-to-end transactions on the Service Delivery Platform. With a single view of activity across all services and transactions, the team can finally provide transaction-level alerting and reporting.

Thank you again to Nick and Martin for presenting so well, and to Monsour, Martin and Sky from DNA, who did a fantastic job and are representing Splunk very well down under.
