Splunk > Splunk Blogs
http://www.splunk.com/
The IT Search Engine

Paul Pang: Live Security Showcase from PCCW
http://blogs.splunk.com/paul/?p=10
Thu, 19 Nov 2009 14:40:16 +0000

We are having a Black Hat-like event in Hong Kong today and tomorrow. Our security gurus from the PCCW team are running the Live Security event in Taikoo Place, Hong Kong.

Why do I say this is a Black Hat-like event? The hacking team from PCCW opened with a very sexy LIVE hacking demonstration for around 80 guests :) Using simple toolkits that you can download from the internet or buy from a small computer center in China, the PCCW hacker team demonstrated how easy it is to crack a wireless LAN WEP password, steal passwords from careless users with phishing email, hijack a target Windows desktop with exploit kits, steal a password by sniffing the victim's VoIP call, and send confidential data out over MSN.

Live Hacking Demo

The second part of the event is a tour for all guests to learn about the latest security technology from vendors including Avaya, Blue Coat, Check Point, Cisco, Juniper, McAfee, NetApp, Radware, Websense and Splunk.

Our booth in cyber-look, and the pretty security lady :)

The final session of the event is certainly for Splunk!! Remember the PCCW team's live hacking demo? Now it's time to find out who the hacker is! We set up a Splunk machine at the event to eat all the LIVE data from every device in the showcase. We showed several live search demonstrations: searching for an IP address across network devices, error and failure alerts from servers and applications, and even unstructured data such as the Windows registry and MSN chat logs. We then invited one guest onto the stage and let him try to search for how the hacker stole the confidential data from the company. Within just a few seconds, we could locate the MSN file transfer record and identify the time of the incident, the insider hacker and the stolen file name. All the guests were amazed that using Splunk is just as easy as using Google.

Feel free to join us. You still have a chance to enjoy this fun event tomorrow!!

Michael Baum: Cisco CSIRT Presents at SplunkLive Raleigh
http://blogs.splunk.com/thebaum/2009/11/15/cisco-csirt-presents-at-splunklive-raleigh/
Mon, 16 Nov 2009 05:54:52 +0000

Last Thursday Dave Schwartzburg and a few other Cisco security mavens attended SplunkLive Raleigh. The Cisco Computer Security Incident Response Team (CSIRT) has been applying Splunk to corporate security investigations for more than two years now, and Dave was generous enough to share their experiences with us all. Joining Cisco in presenting at the event was James Ervin of the University of North Carolina at Chapel Hill, a very knowledgeable Splunk customer. Patrick Ogden, Splunk Sales Engineer, gave a rocking good demo of transaction tracing in a telco provisioning environment, and Will Hayes, Splunk Sr. Solution Architect, showed the latest Splunk for Cisco Security App being developed together with the Cisco CSIRT team.

Cisco CSIRT Team

Dave Schwartzburg

Dave Schwartzburg is an Information Security Investigator and runs the IDS infrastructure for Cisco Corporate and their internal networks and IT assets. He has an M.S. in Information Security from East Carolina University and a B.S. from the University of Wisconsin. Dave's been with the Cisco CSIRT team for two years and prior to that was with AT&T Internet Investigations & Security Services. Cisco has more than 100,000 employees and contractors and more than 127,000 devices on their corporate network. That's a lot to keep track of, which is why the CSIRT team utilizes Splunk.

The Cisco CSIRT works to reduce the risk of loss as a result of security incidents for Cisco-owned businesses. CSIRT regularly engages in proactive threat assessment, mitigation planning, incident trending with analysis, security architecture, incident detection and response. This happens in three phases: investigations, mitigations and prevention.

A Tier 1 Event Analysis Group is located in Costa Rica. They handle security threat monitoring. The Tier 2 Event Analysis Group in Bangalore handles the easier case investigations and mitigations. Dave is part of the Tier 3 Global Incident Response Team handling more difficult cases and longer term prevention through changes to the infrastructure and security systems.

Cisco Security Environment

Cisco regularly collects web proxy (Ironport WSA), anti-virus (Ironport ESA), host-based intrusion protection (Cisco Security Agent), syslog, VPN logs, authentication messages, network IDS signatures and Netflow records from critical subnets.

  • 3 million IDS events per day
  • 3-5 billion Netflow records per day
  • 300 malware-related cases a day

Some event sources send their data to a global network of collection servers and some event types are pulled from their sources directly to a centralized server. Splunk handles the collection and indexing of the data.

Correlation and Reporting with Splunk

The CSIRT team makes extensive use of scheduled reporting and alerting for proactive monitoring of problems.

In this example, the team is correlating host-based IDS with antivirus logs and running malware reports via cron, using the Splunk CLI. The report is scheduled and its results are emailed to the Event Analysis (EA) teams for processing and submission for remediation.
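To make the idea of correlating two sources concrete, here is a minimal Python sketch. It is not the CSIRT team's actual report: the field names (`host`, `time`, `signature`, `threat`), the sample events and the 15-minute window are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def correlate(ids_events, av_events, window=timedelta(minutes=15)):
    """Pair each host-based IDS event with antivirus events
    seen on the same host within `window` of it."""
    # Index antivirus events by host so each IDS event only scans its own host.
    av_by_host = {}
    for ev in av_events:
        av_by_host.setdefault(ev["host"], []).append(ev)

    matches = []
    for ids_ev in ids_events:
        for av_ev in av_by_host.get(ids_ev["host"], []):
            if abs(ids_ev["time"] - av_ev["time"]) <= window:
                matches.append((ids_ev["host"], ids_ev["signature"], av_ev["threat"]))
    return matches


# Hypothetical sample data, not real log content.
ids_events = [
    {"host": "laptop-17", "time": datetime(2009, 11, 12, 9, 0), "signature": "agent tampering"},
    {"host": "laptop-42", "time": datetime(2009, 11, 12, 9, 5), "signature": "suspicious process"},
]
av_events = [
    {"host": "laptop-17", "time": datetime(2009, 11, 12, 9, 10), "threat": "Koobface"},
    {"host": "laptop-42", "time": datetime(2009, 11, 12, 13, 0), "threat": "Koobface"},
]

print(correlate(ids_events, av_events))
```

Only `laptop-17` is reported, because its two events fall within the window; the `laptop-42` events are hours apart. A scheduled job could run a function like this and mail the matches to whoever handles remediation.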

“Red Carpet Reports” monitor executive systems to make sure they aren’t infected or compromised. Here we see an example of the Koobface worm found in CSA logs on an executive laptop.

Finally, the team has a way to make use of all the CSA data they receive. One of the most useful applications has been pinpointing people who disable Cisco Security Agent itself, which indicates the machine is now unmanaged.

Results for the Security Team

The resulting productivity gain from centralized access to multiple data sources has been dramatic. Not only is the team lowering the time to respond to incidents, but they are also allowing lower-skilled workers to handle more complex cases. And surprisingly, 10% of cases now come from previously unused/underutilized sources. The value of substantially faster access to important data and correlation across numerous sources for reporting and ad-hoc investigations is incredible.

Splunk for Cisco Security App

Some event sources send their data to a global network of collection servers and some event types are pulled from their sources directly to a centralized server. Splunk handles the collection and indexing of the data.

University of North Carolina Chapel Hill

James Ervin

James has been doing system administration, network and security monitoring and application development with UNC since 1998, when he completed his MS in Computer Science at NC State University. As part of the Information Technology Services (ITS) team at UNC, his projects have included work on the university's original Active Directory deployment, Unix-based webmail systems and security and information event monitoring. Earlier this year he inherited a centralized logging project for the university. UNC was the nation's first state university, serving North Carolina for more than two centuries, with 29,000 students and 4,000+ faculty members. ITS is the largest IT organization on campus (~500 employees), looking after financials, admissions, centralized learning and centralized email. ITS frequently collaborates with other campus IT organizations, of which there are many.

ITS Environment

The ITS team manages a moderately sized, mixed application, server and networking environment consisting of the following major components:

  • Multiple Unix flavors (AIX, RHEL, Solaris)
  • Large Windows infrastructure
  • ~600 devices total
  • ~20 IPS/IDS/FW/LB devices
  • PDU, environment probe data
  • Apache, Tomcat, JBoss

This environment is constantly in flux as students and faculty come and go and non-managed desktops, laptops and mobile devices connect to the network.

"We needed to determine what is possible within our environment and adopt a flexible architecture."
- James Ervin

Earlier this year, James and his team were facing an ever-growing list of requirements for their centralized log management project, including:

  • Make syslog services more useful to the rest of the IT organizations
  • Collect and centralize Windows event logs
  • Alert on events of interest
  • Correlate security events
  • Provide NOC/SOC staff access to security logs
  • Give application developers access to application logs
  • Report on unplanned system changes
  • Satisfy the auditors

Evaluation Process

The ITS team reviewed a number of log and event centralization technologies including the possibility of building their own, before deciding on Splunk. Database-backed products were dismissed because they require tight control over log sources in order to be able to process incoming data properly (format changes could cause incoming data to drop). Few solutions could pull any intelligence out of arbitrary, unstructured data and customization was often difficult or required professional services. Some products imposed severe limitations on clients and users, and ITS wanted to grant access widely to enable other IT departments to do their work. Finally log appliances offered a degree of customization less than desired; James wanted an “open” architecture capable of handling arbitrary inputs and outputs with reasonable effort.

The Splunk Deployment

UNC's Splunk deployment includes a single Splunk indexing server that is fed by many different sources. New sources arrive almost daily as new applications and servers are installed around the university. An existing centralized syslog server feeds Splunk. Approximately 80 Splunk forwarders on high-interest servers (AD domain controllers, Apache etc.) feed Splunk. And a "dropbox" indexes one-time batch uploads. The primary index size is ~1TB and data is kept online for 90 days. The university SAN provides storage on the back end, and more than 80 users are sharing saved searches, reports and dashboards. Users have a long-tail distribution: a few "power users", lots of "casual users".

Measuring Success

Did it work? What I really liked is the simple but powerful way James and his team measured their success with Splunk. The team asked themselves a few fundamental questions, which demonstrate that the project was a lot more about solving problems than just generating some compliance reports.

  • What have we done with it that we expected to do?
  • What have we done with it that we didn’t expect to do?
  • How successful have we been?
  • What lessons have we learned?

The UNC team particularly likes the fact that Splunk has no per-client or per-user license cost and that work can be distributed more effectively, with data accessible to those who need it. James also likes Splunk because it can ingest any data you throw at it, and search-time extraction is infinitely easier to manage than index-time extraction.

Issue Identification and Troubleshooting

The first thing they looked at was how Splunk helps issue identification. IT Search, as it turns out, is just like Web Search: a metaphor that empowers end users. Intimate knowledge of the systems or data is not required to get results.

'Splunk often produces serendipitous results, the "look what I found!" moments.'

In many of the UNC scenarios, Splunk provides the "what is actually happening" view that so many IT organizations, like the university ITS team, lack.

Client Remediation and Security Analysis

One of the security problems at a big university is client computing devices. Identifying owners of laptops, desktops and PDAs that are infected, in violation of acceptable use policy, stolen or causing network trouble requires a data gathering process and specialized knowledge. Using Splunk to tie search results (DHCP logs, antivirus logs, etc.) to the client registration database allows results to be “doped” with additional data from the live registration database.
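A lookup of this kind, for a stolen laptop for instance, can be sketched in a few lines of Python. This is only an illustration of the enrichment idea, not UNC's implementation: the event fields (`time`, `mac`, `ip`, `location`), the dictionary standing in for the registration database and all sample values are assumptions.

```python
def last_known_location(mac, dhcp_events, registrations):
    """Return the registered owner and most recent network location
    for a MAC address, or None if the address never appears."""
    wanted = mac.lower()
    seen = [ev for ev in dhcp_events if ev["mac"].lower() == wanted]
    if not seen:
        return None
    # ISO 8601 timestamps sort correctly as plain strings.
    latest = max(seen, key=lambda ev: ev["time"])
    return {
        "owner": registrations.get(wanted, "unregistered"),
        "ip": latest["ip"],
        "location": latest["location"],
        "time": latest["time"],
    }


# Hypothetical DHCP log events and registration records.
dhcp_events = [
    {"time": "2009-11-10T08:15:00", "mac": "00:1A:2B:3C:4D:5E", "ip": "10.1.4.20", "location": "library-ap-3"},
    {"time": "2009-11-11T14:02:00", "mac": "00:1a:2b:3c:4d:5e", "ip": "10.7.9.33", "location": "dorm-east-sw-2"},
    {"time": "2009-11-11T15:00:00", "mac": "aa:bb:cc:dd:ee:ff", "ip": "10.2.2.2", "location": "lab-sw-1"},
]
registrations = {"00:1a:2b:3c:4d:5e": "jdoe"}

print(last_known_location("00:1A:2B:3C:4D:5E", dhcp_events, registrations))
```

The point of the design is that the log events carry only addresses; the owner's name comes from a separate, live source and is joined in at search time.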

"In the case of a stolen laptop, input an IP/MAC address and Splunk returns the owner’s name and last known location used on the network.

Another key security driver at UNC is security event correlation, including correlation of IDS/IPS events with server and network events for short-term alerting and long-term reporting. Splunk is correlating IDS/IPS data (Snort, etc.) from multiple sensors and issuing alerts based on thresholds and combinations of events representing specific situations.

  • more than 10,000 hits from a single source over a time period
  • more than 15,000 hits from multiple sources over a time period (DDoS detection)
  • hits for high-risk signatures
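The three alert conditions above can be sketched as a small Python function. It is a rough illustration rather than UNC's saved searches: the event fields (`src`, `signature`), the set of high-risk signatures and the tiny limits used in the demo call are assumptions, and the caller is assumed to pass in only the hits from the time period of interest.

```python
from collections import Counter


def threshold_alerts(hits, single_limit=10_000, total_limit=15_000, high_risk=frozenset()):
    """Evaluate IDS/IPS hits from one time period against three alert rules."""
    alerts = []
    per_source = Counter(h["src"] for h in hits)

    # Rule 1: too many hits from a single source.
    for src, n in per_source.items():
        if n > single_limit:
            alerts.append(f"single-source flood from {src}: {n} hits")

    # Rule 2: too many hits overall, spread across several sources.
    if len(per_source) > 1 and len(hits) > total_limit:
        alerts.append(f"possible DDoS: {len(hits)} hits from {len(per_source)} sources")

    # Rule 3: any hit on a signature considered high-risk.
    for h in hits:
        if h["signature"] in high_risk:
            alerts.append(f"high-risk signature {h['signature']} from {h['src']}")
    return alerts


# Hypothetical hits; limits are shrunk so the demo triggers every rule.
hits = [{"src": "10.0.0.1", "signature": "scan"}] * 3 + [
    {"src": "10.0.0.2", "signature": "scan"},
    {"src": "10.0.0.2", "signature": "sql-injection"},
]
for alert in threshold_alerts(hits, single_limit=2, total_limit=4, high_risk={"sql-injection"}):
    print(alert)
```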

The combination of pre-defined search alerts and ability to do real-time arbitrary correlation (e.g. free-text search lets us correlate any attacker IP with events across ALL log sources via a single search) is really powerful.

James has found Splunk goes beyond typical security event correlation in other ways too. Being able to audit all kinds of system and user activity provides the type of bird's-eye view the team never had before. Examples include:

  • Report of administrator account usage in entire AD forest used by AD administrators to discourage use of admin accounts on untrusted machines that might be keylogged
  • Geolocation of IDS/IPS events via SDK and MaxMind GeoIP database allows security team to “eyeball” results, eliminating tedious investigative steps
  • Web-based password change utility was being brute-forced; Splunk now reports when the number of requests to this page exceeds a threshold
  • Classroom Support uses a Splunk-generated report to track student lab usage

Lessons Learned

Perhaps the biggest lesson UNC has learned to date is how unanticipated uses are often as important as the anticipated ones. "Teach a man to fish..." the saying goes.

'How successful is Splunk? One of our users was quoted saying, "Thank god for this."'

Simplicity is a virtue. Complexity is also a virtue. Splunk provides both a simple interface and a more powerful customizable interface if you want to dig further. But the real power is in giving people tools that help them think, not turn off their brains and stare at red, yellow or green. Of course the UNC team also commented that they've learned products are not substitutes for policy, but policy is no substitute for reality. And there is no shortage of unenforceable policies at the university.

'The Splunk flexible architecture helps us to achieve the “middle ground” between what we need and what is achievable. New problems always emerge as old ones are solved. A good architecture enables you to solve the new problems, rather than forcing the new problems to fit into the old box.'

Unanticipated Benefits

So what else can a flexible architecture that's easy to implement do for a centralized logging infrastructure? Well, no more local logging for one. Some servers simply can’t log locally due to volume, performance, etc. This is bad from an auditing standpoint, although your policy may be to retain all logs locally for the amount of time required by legal and industry regulations. Splunk uses a local forwarder to route data over the network without logging it locally. Even if the network goes down Splunk won't lose events. The result is an ability to run transactional searches on high-volume log sources, without impacting the original service or developing specialized SQL or reporting applications.

Erin Sweeney: Splunk in the NOC at Interop
http://blogs.splunk.com/erin/2009/11/13/splunk-in-the-noc-at-interop/
Fri, 13 Nov 2009 22:26:11 +0000

For those of you checking out Interop NYC at the Javits Center next week, be sure to hit up the Splunk booth #346 for schwag and a demo of the latest 4.06 hotness.

Splunk support engineer extraordinaire Deep Bains will be building and projecting dashboards in the NOC to track what's happening at the show. In the past we've identified network latencies and dictionary attacks. What will happen this year?

If you haven't registered yet, you can get a free expo pass or $300 off your conference pass with this code: CNLUNY05. The offer is good until November 15.

And as always, I want to hear your stories about how Splunk's making a difference in your IT environment. Stop by and brag to me about your latest IT ninja moment or your idea for a rad Splunk t-shirt slogan.

Michael Baum: Chad’s Army
http://blogs.splunk.com/thebaum/2009/11/10/chads-army/
Wed, 11 Nov 2009 06:05:43 +0000

I stumbled upon this unexpected post from Chad Sakac of EMC talking about the VMware/EMC/Cisco collaboration.

For anyone who has spent their career on the start-up track in Silicon Valley this is not a novel story.

Isn't it fantastic to see some large companies still have the mojo of entrepreneurship and fast moving initiatives that survive outside of the normal organizational structure?

While it remains to be seen how successful VCE, Acadia and Vblock will be, it sure is exciting to have the industry talking about radically new approaches to simplify computing! Here is a great post summarizing Vblock from Mark Bowker @ Enterprise Strategy Group. Now if we can only get access to that lab and get Splunk running on one of those Vblocks ... hmmmm.

Erin Sweeney: Cisco uses Splunk. The CSIRT Team and a Prestigious University will share their Splunk stories–live from Raleigh, Next week: Thurs Nov 12
http://blogs.splunk.com/erin/2009/11/03/cisco-uses-splunk-the-csirt-team-and-a-prestigious-university-will-share-their-splunk-stories-live-from-raleigh-next-week-thurs-nov-12/
Tue, 03 Nov 2009 07:00:21 +0000

Join us at the Marriott RTP next Thursday, November 12 for SplunkLive. Two more of our coolest customers will showcase how they're using Splunk in their IT environments.

The oldest state university is using Splunk for all the basics - log consolidation, email tracing, operational troubleshooting and as a SIEM in conjunction with IDS and IPS tools. Learn how the university is using Splunk to enhance operational effectiveness, and how one system administrator built the case to fund and deploy Splunk across the largest IT group in the university.

We'll also hear from Cisco's CSIRT (Computer Security Incident Response Team). They use Splunk to combat malware across more than 65K desktops. Once they detect and remedy potential issues, Splunk's reports help them to keep tabs on trends and provide dashboards to IT management.

After the event, I'll post a full review, but if you're in the Raleigh Durham area, have friends there, or are able to get over there, join us. In addition to chatting with other customers, there's an afternoon hands-on session to give your implementation an extra boost.

Details here or Short URL: http://bit.ly/23Vv15

Erik Swan: Serendipity is….
http://blogs.splunk.com/erik/2009/10/30/serendipity-is/
Fri, 30 Oct 2009 19:31:30 +0000

"Serendipity is looking in a haystack for a needle and discovering a farmer's daughter"
- Julius Comroe

I just read the quote in a presentation from Matt Jones of BERG at the DXf conference. There is so much I love about this presentation I don't know where to start. Just click through it (embedded below) and have your own reaction. It's clearly designed to be a fun/light read. I think I clicked at about one slide every few seconds, then went back and stopped on a few that really spoke to me. It was entertainment that made me think, which then made me smile.

At its heart, Splunk is a time machine. It allows someone to go back in time and "see" what their world looked like at any given moment and to look for trends, anomalies, volume, momentum, etc. If you put enough data into Splunk, you can re-live the past with a microscope, hit "play", slow it down, speed it up, draw it on a chart, compare it to another time. We are all about time.

We recognized back when starting Splunk that technology was beginning to record our lives. Everything logs, and these logs record footprints for most of what we do. Thanks to cell phones, personal computers, credit cards, traffic cameras, .... (insert long list), the world is slowly being recorded by machines. The problem is that the volume is outrageous and there is no common format to "play" these recordings. In comes Splunk, your time machine.

Oh, this is the second cool link from the Berg guys this week. Also loved:
http://berglondon.com/blog/2009/10/23/toiling-in-the-data-mines-what-data-exploration-feels-like/

Michael Baum: SplunkLive Seattle Kicks IT
http://blogs.splunk.com/thebaum/2009/10/29/splunklive-seattle-kicks-it/
Thu, 29 Oct 2009 14:54:26 +0000

On what was an incredibly beautiful day we had more than 100 Splunk devotees attend our first ever SplunkLive event in Seattle last week. In the shadow of Microsoft we talked about our Windows and Microsoft strategy and compared notes with lots of customers that are running mixed Microsoft, Linux and Solaris environments. Many of our customers with Microsoft Active Directory, Exchange and SharePoint environments are utilizing Splunk to troubleshoot problems and implement security and compliance controls in large-scale, distributed environments. But I'm still surprised at how little Microsoft .NET we're seeing in large-scale production applications.

Three Seattle-based customers presented their views on managing mission critical applications, IT data consolidation and Splunk.

  • T-Mobile USA
  • Blue Nile
  • Washington State University

T-Mobile USA

Sean White, Senior Engineer with T-Mobile Operations in Bellevue, talked with us about their global rollout of Splunk. Sean is a member of the security engineering team charged with incident response, IDS, vulnerability scanning, anti-virus and enterprise unified logging. He graduated with a B.S. in Computer Science from the University of Kansas and has a deep background in large telecom environments, initially as a system administrator and webmaster, then in SS7 network C&C and performance, engineering, and now in information security. Sean has been at T-Mobile for 4 years, and prior to that was at Cingular, AT&T Wireless. T-Mobile USA is the 4th largest US national provider of wireless voice, messaging, and data services, with 34M subscribers and annual revenues of $17B. T-Mobile USA is the US operating entity of T-Mobile International AG, the mobile communications subsidiary of Deutsche Telekom AG (NYSE: DT). Deutsche Telekom is one of the largest telecommunications companies in the world, with nearly 120 million customers worldwide.

It all started with PCI Compliance

Like many of our enterprise customers, T-Mobile started working with Splunk in one area but quickly saw the value of expanding into others. For Sean and his team, PCI Compliance was the beginning of the Splunk solution footprint, but soon everyone realized the consolidation of logs, events, messages, configurations and changes meant a whole lot more.

Beginning with proving PCI compliance, T-Mobile has very specific requirements. PCI Section 10: Track and monitor all access to network resources handling cardholder data. But in T-Mobile's case scale was a big issue. Fulfilling PCI DSS Section 10 meant tracking 26+ in-scope applications and the ability to trace transactions from start to finish across 650+ servers running Windows, Linux and Unix varieties. It also means more than 100 individuals logging into Splunk on a daily basis as part of the process.

The Splunk Set-up

The Splunk configuration consists of

  • Pairs of forwarders set up in each of 4 geographic locations.
  • Three short term indexers + 1 short term search box.
  • Three Long-term search boxes hooked into a 32 TB NAS.
  • Centrally controlled from a single deployment server.

The current installation is indexing more than 600GB/day of data and has just passed the 10B event mark. Controlling access to all this data is critical and T-Mobile has Splunk roles set up for managers and application teams to limit access to subsets of the data. The ability to segregate data access along lines of duties is critical to prove PCI compliance.

The Business Case for a SOC

In addition to proving PCI Compliance, T-Mobile has discovered Splunk's use for Security as well. Not long ago, a SIEM vendor would have told you IDS and firewall logs were all you need. That >=2 sources of data == correlation. Not so much.

“All the best new vulnerabilities are coming in on the application layer.”
- Sean White

Enterprise logging, meaning visibility into all of your IT data, is absolutely critical in defending against modern blended attacks. At T-Mobile, Splunk has become a primary analysis tool for deciphering what is happening to the applications, servers and devices on the network. With a few saved searches, Splunk does real correlation.

Nothing Boring about Logs and IT Data!

PCI Compliance mandates gave T-Mobile the excuse (read: funding) to start an enterprise logging initiative. Logging all security, network and application events can truly give the insight needed not only to measure and report on compliance controls but also to run a more secure and effective business. T-Mobile has also discovered that the ability to ask any question of their environment and get immediate answers provides a pile of value to help desk operations and better business intelligence functions.

“All the information about your company is in your logs—there’s nothing boring about it.”


Blue Nile

Jerry Brennock, Director of Core Development at Blue Nile, explained how the company is using Splunk to improve the experience of buying diamonds over the Web. Blue Nile, Inc. is an online retailer of diamonds and fine jewelry offering in-depth educational materials and unique online tools that place consumers in control of the jewelry shopping process. Importantly, the focus is on giving customers a great experience at a great price, which translates to requiring high quality at a low cost. Jerry's team builds and supports the infrastructure and applications for merchandising and marketing, including the website. He's been with Blue Nile for 10 years and in the e-commerce space for more than 17.

The Killer Diamond App

Diamond Search is undoubtedly the killer application for Blue Nile's e-commerce experience. It's an asynchronous JavaScript app that has to work across any browser, and there are many non-obvious use cases. All three of these factors mean it is prone to failure in lots of edge cases.

"If this application isn't fast and accurate, we don't sell diamonds."
- Jerry Brennock

Jerry's team has embedded tracking pixels with name/value pairs to track JavaScript profile information from each diamond search. This, together with Web server 500 and 404 errors, gives the development, operations and customer support teams all the data they need to troubleshoot problems. The challenge is finding customer problems "in the moment" before the sale is lost.
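As a rough picture of what such a tracking-pixel request carries, the sketch below parses the name/value pairs out of a logged request line in Python. The pixel path and every parameter name (`app`, `browser`, `load_ms`, `error`) are made up for illustration; Blue Nile's real field names are not given here.

```python
from urllib.parse import parse_qsl, urlsplit


def pixel_fields(request_line):
    """Extract name/value pairs from the query string of a logged
    tracking-pixel request such as 'GET /px.gif?a=1&b=2 HTTP/1.1'."""
    _method, url, _protocol = request_line.split(" ", 2)
    return dict(parse_qsl(urlsplit(url).query))


# Hypothetical access-log request line.
line = "GET /px.gif?app=diamond_search&browser=IE7&load_ms=2350&error=timeout HTTP/1.1"
fields = pixel_fields(line)
print(fields)

# Values arrive as strings, so numeric fields need converting before comparison.
if int(fields["load_ms"]) > 2000 or "error" in fields:
    print("slow or failed search in", fields["browser"])
```

Because each request is just a web server log line, no separate collection service is needed; anything that indexes the access log can search on these fields.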

Centralized Monitoring and Alerting with Splunk

In order to respond quickly the development, QA, operations and customer support teams needed a centralized, consolidated view of all Web logs across the infrastructure. In addition, the existing custom error alerting system was fragile and error prone. The Splunk solution was designed to collect logs and events in real-time and provide searches, alerts and notifications.

"If we solve a problem in one minute versus 30 minutes during a peak hour - Splunk pays for itself."

Real-Time Customer Service

The most important use case driving Blue Nile's retooling with Splunk is Customer Service. Superior service is a key driver of the company's growth. Repeat and referral business is very important in a high end E-commerce business like selling diamonds.

'With Splunk we can now contact customers intelligently: "We see you are looking for a 1.5 carat diamond and noticed you are having a problem with Internet Explorer..." This gives our customers intelligent service and lets them know we're not wasting their time.'

Sometimes alerts start firing immediately after a new code release. QA can react quickly, using Splunk to research issues. This allows them to very quickly identify and correct edge cases that are difficult to catch in non-production environments.

Low Barrier Reporting

Initially reporting with Splunk was seen as just an extra bonus. But, Splunk made ad-hoc reporting so easy we started publishing saved searches to understand which site features are valuable to customers and partners.

  • How many customers have active RSS feeds? Which readers?
  • How many partners are using that new pricing report?
  • How many customers actually scroll down in diamond search? How often?

One example here shows how many partners are using that new pricing report.

eventtype="XNet" (BNF_http_filename…") starthoursago=24 | rex field=vendid "(?<bn_vendor_name>[^0123456789%]{2,})" | sort bn_vendor_name | chart count(bn_vendor_name) by bn_vendor_name BNF_http_filename
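For readers less familiar with the search language, the `rex` and `chart` steps in that search can be approximated in Python. This is only an analogy, not how Splunk evaluates the search: the field name `bn_vendor_name` is taken from the `sort` and `chart` clauses, the character class is the one shown in the search, and the sample events are invented.

```python
import re
from collections import Counter

# Same character class as the search: the first run of two or more
# characters that are neither digits nor a percent sign.
VENDOR_RE = re.compile(r"(?P<bn_vendor_name>[^0-9%]{2,})")


def chart_counts(events):
    """Count events per (vendor name, file name), like
    'chart count(...) by bn_vendor_name BNF_http_filename'."""
    counts = Counter()
    for ev in events:
        match = VENDOR_RE.search(ev["vendid"])
        if match:  # events without a usable vendor name are skipped
            counts[(match.group("bn_vendor_name"), ev["BNF_http_filename"])] += 1
    return counts


# Hypothetical events; real vendid values are not shown in the post.
events = [
    {"vendid": "1042Acme%20Gems", "BNF_http_filename": "pricing_report.csv"},
    {"vendid": "1042Acme%20Gems", "BNF_http_filename": "pricing_report.csv"},
    {"vendid": "2077Brilliant", "BNF_http_filename": "inventory.csv"},
]
for (vendor, filename), n in sorted(chart_counts(events).items()):
    print(vendor, filename, n)
```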

Lessons Learned

Jerry's team has been using Splunk extensively as their centralized monitoring and reporting solution in the data center. They like how Splunk seamlessly transitions from alerts to research and troubleshooting mode. A few tips from his team:

  • Use event types and named fields to increase accuracy in your alerts
  • Think about Splunk not just for investigation but alerting and reporting.
  • Long-term trending analysis complements real-time monitoring over time.
  • Saving searches is a great tool for internal training of operations, QA and support personnel.


Washington State University

JJ Warren is an Oracle Database Administrator at Washington State University and a super sharp Splunk expert. JJ has been working with Oracle databases for 10+ years and has been a SQL Server DBA for various projects like the WSU data warehouse. He is the principal DBA and developer for many large private projects (Brownfield/Superfund sites, Marketing Research, etc.). JJ's core roles involve security, performance tuning, and assisting with database/application development, and he's been known on occasion to dabble with networks and security (VPNs, firewalls, SNMP monitoring).

Washington State University is a land-grant university that provides world-class education to more than 25,000 students statewide. Founded in 1890, WSU’s statewide system includes campuses in Spokane, the Tri-Cities, and Vancouver, regional learning centers, extension offices in every county, and distance degree programs accessible around the world. U.S. News and World Report consistently ranks the University among the top 60 public universities.

We Needed Centralized Logging

The WSU IT team, like most enterprises, works in various silos:

  • Networks
  • Security
  • Operating systems
  • Servers
  • Infrastructure
  • Critical applications
  • Mainframes

But, there was miscommunication, misinformation and limited access across teams to solve broad problems.

"It is difficult to properly tune, secure, and help developers when you can’t properly see all the forces acting on your environment."
- JJ Warren

IT process improvement became the main focus to improve quality of service and reduce cost of running operations. The IT teams put together a number of process improvement goals including:

  • Ability to track E-mail MTA activities end to end across all mail systems (Barracuda, Sendmail, MSFT Exchange).
  • Ability to track Web-based sessions for single sign-on among various Web servers (Apache, IIS).
  • Ability to track home grown application transactions end to end utilizing custom log and event formats.
  • Making available logs and events that aren't sent off hosts over the network to the various silos with access controls.
  • Ability to track response times for services from end to end.
  • Develop standardized reports across the silos and schedule regular delivery.

Why Splunk?

JJ is very passionate about IT process improvement, the role IT data plays in process improvement, and Splunk. He offered up some excellent reasons why WSU chose Splunk.

"Other vendors offer canned reports, but to truly understand our environment—and get up and running quickly, Splunk was the best answer."

The Results

Every IT system administrator (more than 40 people) is now using Splunk. Regex searches on the syslog server would have taken minutes to hours to write properly, run and report. It now takes seconds with Splunk. Splunk has become the proactive alerting system of choice. Now the WSU team can have multiple people jump on issues right away.

“Now multiple people can jump on issues. We’re no longer stovepipes but a much more effective team.”

What's Next?

Next, JJ and his team are working to provide custom and saved searches to a broader audience and to index application data, giving developers new troubleshooting power and integrating development more closely with production operations. WSU's goal is to have Splunk on every server and every network device.

“Splunk is a best practice for our IT department—it’s embarrassing if it’s not in place somewhere.”

]]>
Nimish Doshi: Splunk, Developers, and SOA Appshttp://blogs.splunk.com/nimish/?p=17http://blogs.splunk.com/nimish/?p=17Tue, 27 Oct 2009 21:31:59 +0000Nimish DoshiWhen most people first come across Splunk, the first set of users associated with it naturally become operations, security, or compliance personnel. Splunk naturally lends itself to their use. I was speaking to some software engineers, explaining what Splunk does, and the connection to how it could be used for their engineered Service Oriented Architecture applications did not come immediately. I told them that one of Splunk's T-Shirts reads "Be an IT Superhero. Go Home Early." At that point, I got their interest.

Let's get back to the basics for one of the reasons Splunk exists, which applies not only to SOA, but also to all phases of multi-tier deployment. The typical developer may be involved in multiple stages of SOA development that produce applications and services residing on multiple physical servers. When something goes wrong on any of these servers, the developer may get called to investigate, but for reasons of security, is not given access to these servers. So our friendly neighborhood developer next calls someone in operations, who zips up relevant log and trace files to send to the developer via an FTP server. The next steps involve getting the files, unzipping them, and running various home grown scripts, which usually use some derivative of Perl, awk, and sed, to search for issues. If the results are not available for this server or it turns out another server is the culprit for the issue, the whole process is repeated and can take a while to accomplish.

Along comes Splunk to automate this whole effort and make IT search as easy as using a browser based search engine. Splunk Light Weight Forwarders (LWF) are installed on every leg of the SOA process to monitor application-produced data. Each forwarder sends events to Splunk indexers in a Splunk-controlled, automatically load-balanced manner. A separate Splunk server called a Search Head, which is essentially a Splunk indexer that does not index but participates in a distributed search, is used by the developer to find the issue. Each event has a timestamp, the host it came from, a source file name, and a classification called sourcetype to narrow down the search. Issues that used to take hours to track down can now be found in a matter of minutes. A sample Splunk deployment for this setup is below.
Distributed Search

In this example, we have forwarders for an application server, a service bus, and a BPM product. This is just one example, as a SOA tier could just as easily have been a web portal or MQ Series. For completeness, we also have Firewall data being forwarded. However, Splunk role-based access can restrict what the developer can see and do. For instance, all application data can be put into a separate index called application and the developer can only search for data where index=application. Further restrictions such as originating host or sourcetype can also be applied to the role.
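To make the idea concrete, here is a minimal Python sketch of what that kind of restriction amounts to. It is not Splunk's implementation or API: the Role class, the field names, and the sample events are all hypothetical and exist only to illustrate filtering by index first and by sourcetype second.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Hypothetical stand-in for a role; not Splunk's real data model."""
    name: str
    allowed_indexes: set
    # An empty set means "no extra sourcetype restriction".
    allowed_sourcetypes: set = field(default_factory=set)


def visible_events(role, events):
    """Return only the events this role would be allowed to search."""
    result = []
    for event in events:
        if event["index"] not in role.allowed_indexes:
            continue
        if role.allowed_sourcetypes and event["sourcetype"] not in role.allowed_sourcetypes:
            continue
        result.append(event)
    return result


# Made-up sample events, one per tier mentioned above plus the firewall.
events = [
    {"index": "application", "sourcetype": "appserver", "host": "app01", "raw": "OrderService timeout"},
    {"index": "firewall", "sourcetype": "fw", "host": "fw01", "raw": "deny tcp"},
    {"index": "application", "sourcetype": "bpm", "host": "bpm01", "raw": "process stalled"},
]

developer = Role(name="developer", allowed_indexes={"application"})
bpm_only = Role(name="bpm_dev", allowed_indexes={"application"}, allowed_sourcetypes={"bpm"})

for event in visible_events(developer, events):
    print(event["host"], event["raw"])
```

The developer role sees both application events but never the firewall event, and the narrower bpm_dev role sees only the BPM one. In a real deployment the restriction is enforced by Splunk itself when the role is configured, not by code like this.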

One technical note: Splunk's LWFs are indeed light weight in that they purposely restrict the amount of network bandwidth they consume when sending data to an indexer, to a default maximum of 256 KBps. If you want to increase or decrease this maximum data rate, copy SPLUNK_HOME/etc/apps/SplunkLightForwarder/default/limits.conf to SPLUNK_HOME/etc/apps/SplunkLightForwarder/local and change the settings in limits.conf.
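As a rough sketch of what that cap means in practice, the Python below works out the best-case forwarding time and daily volume at a given rate. The assumptions are mine: 1 KB is taken as 1024 bytes, the forwarder is assumed to send at the full cap with no compression or idle time, and the function names are made up for illustration. The limit itself should be the maxKBps setting in the [thruput] stanza of limits.conf, but check the file you copied before relying on that name.

```python
DEFAULT_MAX_KBPS = 256  # default cap quoted above, in kilobytes per second
KB_PER_GB = 1024 * 1024  # assumes binary units (1 GB = 1024 * 1024 KB)
SECONDS_PER_DAY = 24 * 60 * 60


def min_transfer_seconds(data_kb, max_kbps=DEFAULT_MAX_KBPS):
    """Lower bound on the time needed to forward data_kb kilobytes at the cap."""
    if max_kbps <= 0:
        raise ValueError("max_kbps must be positive")
    return data_kb / max_kbps


def daily_ceiling_gb(max_kbps=DEFAULT_MAX_KBPS):
    """Most data one forwarder could send in 24 hours at the cap, in GB."""
    return max_kbps * SECONDS_PER_DAY / KB_PER_GB


seconds = min_transfer_seconds(KB_PER_GB)
print(f"1 GB takes at least {seconds:.0f} s (~{seconds / 60:.0f} min) at {DEFAULT_MAX_KBPS} KBps")
print(f"Daily ceiling per forwarder: {daily_ceiling_gb():.2f} GB")
```

Under these assumptions a 1 GB backlog needs a little over an hour to drain, and a single forwarder tops out at roughly 21 GB per day, which is a useful sanity check before deciding whether the default is too low for a busy application server.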

There you have it. Software developers who are constantly called upon to troubleshoot issues in production systems and SOA deployments can go home earlier, as they can have role-based access to data in their area of expertise. To make this even more compelling, Splunk can also be used to monitor and alert on additions, changes, and deletions in the file system to speed up these types of investigations. This combination should help create IT Superheroes.

*************************

On an administrative note, in the past, I have written blog entries on various topics such as using JavaMail with Splunk or correlating with database records. For these entries I provided links to examples and applications that covered the topics. These have all been moved to the new Splunk Community Apps page.

]]>
Erik Swan: Add a Server or Two!http://blogs.splunk.com/erik/2009/10/27/add-a-server-or-two/http://blogs.splunk.com/erik/2009/10/27/add-a-server-or-two/Tue, 27 Oct 2009 21:04:08 +0000Erik SwanEvery week I run into someone who is having performance issues and is not aware you can just add another server or two or ten. I'll travel to meet a company and I'll ask how many servers they are using for Splunk to search/index/report on a terabyte a day. They will say a couple. I'll then ask how many they have for a similar sized hadoop or data warehouse project. They will say 50 to 100X that number. Look, if you're going to give these systems 300+ servers, can we please get 15?

Somehow there is a breakdown in our communication that we scale like all other good architectures.

The following are hopefully some easy pictures to help tell the story. It should be extremely simple and straightforward, to the point of being obvious - if not, bug me and I'll try again.

]]>
Erik Swan: Exponential is the entrepreneurs linearhttp://blogs.splunk.com/erik/2009/10/26/exponential-is-the-entrepreneurs-liner/http://blogs.splunk.com/erik/2009/10/26/exponential-is-the-entrepreneurs-liner/Mon, 26 Oct 2009 18:58:38 +0000Erik SwanI was in a meeting last Thursday where some "important-people" ( not sure if they want to be named ) dropped the D word ( "disruptive" ) several times. They were presenting a slide that proved-out an age-old (1994?) adage that the key to success is ( can be ) a disruptive business model. It's one thing for professor Christensen to talk about it, and another when it's bankers who have a slide for it. Personally I need to be reminded of its importance every day, since being disruptive was one of the most important guiding principles when founding Splunk. As we grow, and become more established, I hope we continue to be a disruptive leader - it certainly faces constant tension.

Hearing the D-word reminded me I wanted to post about Steve Jurvetson's recent video of a talk at Stanford. I can't tell if the talk was great, or it just spoke to me as I just needed an entrepreneurial boost. Also, I find myself very much into nano/bio engineering these days as it seems the next wave of innovation, and DFJ is clearly backing some of the most innovative work. The great thing about Splunk is trying to make heads-or-tails of very large data, and the more time I spend with it, the more the bio/nano space looks like the next frontier.

With our company growth, the entrepreneurs are now in the minority, and it becomes critical to have a constant influence to not forget that feeling you can change the world. It made me smile when Steve said "Exponential is the entrepreneur's linear". Too right, I just hate linear. Also, come on Stanford, I was stunned that none of the students knew about Kurzweil's singularity - no wonder they are falling in the charts.

*** Steve Jurvetson, partner at Draper Fisher Jurvetson, on the business of the d-word ***

Interestingly, I went looking for a concise definition of a disruptive business/technology. Christensen has many. Wikipedia has a nice overworded version. Maybe it's just one of those things that you know when you see it, but any real definition seems to not be quite right. Anyhow, I added a calendar reminder every week to make sure we are still being disruptive.

]]>
Erik Swan: Collision of big data analytics and splunkhttp://blogs.splunk.com/erik/2009/10/23/collision-of-big-data-analytics-and-splunk/http://blogs.splunk.com/erik/2009/10/23/collision-of-big-data-analytics-and-splunk/Fri, 23 Oct 2009 18:53:13 +0000Erik SwanbeerHow people use Splunk is often a surprise to us - at least they are going beyond our original intent. Initially we thought of splunk as a search engine for log files, Google for your logs if you will, to help IT folks troubleshoot their complex systems. Quickly we found that users started Splunking config files, network packets, source code, email, etc. Over the years our customers have been dragging us into all sorts of new uses-cases like global windmill power plant data analysis, protein structure prediction, or just something simple like analyzing user behavior on a website.

Lately we have started to see the collision of Splunk and big data analytics, usually with Hadoop-based tools, Vertica, Aster, Greenplum, etc. In most cases there is complementary value with these guys as they are better at some things than Splunk, but there are use cases where Splunk by itself is just fine. Either way, Splunk is getting dragged into the big data area since we often are the collectors and often the primary indexer of long term historical data.

It was interesting to see Curt Monash, veteran database analyst and guru, post about Splunk. It was a very short introduction to Splunk, but our appearance on his list signals our entry into a larger big data discussion.

Many of our larger customers have Splunk for troubleshooting, monitoring and real-time alerting, and have other tools such as Vertica, Aster or others for doing analytics. Interestingly, we are starting to solve how to play well together. Both systems often require the same data, and with Splunk often collecting at the source we are starting to see places where Splunk feeds these systems. I think it will be fun over the next year or so to see how the Hadoop movement, columnar stores, parallel SQL, and other technologies evolve along with Splunk. If you have one of these other systems and are curious how to better leverage Splunk alongside it, drop me a line. And do check out Curt's blog for keeping up on what's happening in the next gen database space.

]]>
Erin Sweeney: Are you in San Francisco for Oracle Open World? Come visit Splunk!http://blogs.splunk.com/erin/2009/10/13/are-you-in-san-francisco-for-oracle-open-world-come-visit-splunk/http://blogs.splunk.com/erin/2009/10/13/are-you-in-san-francisco-for-oracle-open-world-come-visit-splunk/Tue, 13 Oct 2009 18:43:00 +0000Erin SweeneyAre you in town for Oracle Open World? Do something fun AND productive with your time in SF - swing by the Splunk offices and join us for a beer. Meet with dev or support, tell us what works and what doesn't, and maybe even record a video to tell us how you're using Splunk.

We love it when customers swing by. We get feedback on how they're using Splunk in their environment, roadmap feedback and enhancement requests - and they love it too - they get first hand access to our support team to work through issues or understand best practices. Recently we've hosted Macy's, VeriSign, Edmunds.com, Cisco, Lawrence Livermore National Labs and nTelos.

We're only a few blocks away from Moscone at 2nd St at Brannan. Email me to let me know you're coming for a visit: erin AT splunk DOT com.

We'll have the Boddington's (in the new nitrogen kegerator!), Trumer Pils and Widmer Hefe on tap.

Hope to see you!

]]>
Michael Baum: Social Documentation Benefits and Pitfallshttp://blogs.splunk.com/thebaum/2009/10/13/social-documentation-benefits-and-pitfalls/http://blogs.splunk.com/thebaum/2009/10/13/social-documentation-benefits-and-pitfalls/Tue, 13 Oct 2009 18:20:47 +0000Michael BaumTim Jones of Agora Games posted a good summary of his experience with Splunk. Tim reveals what we've known for some time. Splunk is incredibly flexible and powerful but sometimes finding the Splunk documentation to do exactly what you want isn't as easy as it should be.

We've struggled over the years to keep our documentation both up to date and easy to use. Earlier this year we moved to a wiki based approach to Splunk documentation in hopes of keeping it more up to date and usable with inter-documentation links. Suffice to say we are still embryonic in our use of wiki technology as applied to documentation. We power our docs site with MediaWiki, the PHP wiki technology that runs Wikipedia. Along the way we've had to add a lot of capability around the MediaWiki platform to control docs permissions and versioning.

If you sign-up as a Splunk Community member you can modify and add to the Splunk Knowledgebase and docs wiki yourself including:

  • edit discussion tabs
  • edit any page except for major landing pages and
  • add new pages.

We're taking this "extended community approach" to documentation because we know there are many people like Tim who have the ability to help us make not just the Splunk download and bits better, but also the Splunk documentation better and more complete. We realize the risk in opening up our documentation to the community is that things won't always be as easy to find as they should. But we believe in the long run this social approach to documentation will ultimately make Splunk a much better experience.

Please let us know what you think and how we can improve.

Happy Splunking

]]>
Erik Swan: The Puppet Master Comethhttp://blogs.splunk.com/erik/2009/10/08/the-puppet-master-cometh/http://blogs.splunk.com/erik/2009/10/08/the-puppet-master-cometh/Thu, 08 Oct 2009 23:12:55 +0000Erik Swanbeer
Last week Luke Kanies, The Master of Puppet, held a very well attended Puppet Camp here in SF. He drew a fantastic attendance from top notch companies - I was most impressed with the technical quality of the presentations and breakout sessions ( quality food too! ). These types of events can often be mundane or boring - this was not. Kudos to Luke for building a quality community.

I had the pleasure of meeting Luke some three years ago back at a BayLISA event where I saw him win over a tough audience with an early incarnation of Puppet. It's been fun watching him over the years deliver on that early promise and continue to win over a very tough crowd.

Recently I've been polling our customers how they do configuration/change management. Interestingly, I have noticed people mostly fall into two camps:

  • A very large percentage that use Puppet
  • An equally large percentage that use nothing or something home grown

It caught me off guard that such a large number use Puppet, and I was equally surprised that there was no #2 vendor solution. Great news for Luke and team.

As part of my inquiry I've been compiling a list of integration points between Splunk and Puppet. Soon I'll be dropping a Puppet App for splunk with dashboards, saved searches, and reports, based on indexing puppet reports, logs and facts.

  • If anyone out there uses Puppet and would like an early copy or has integration ideas let me know
  • If anyone out there does not use Puppet, they should look into it, and feel free to ping me if you have any questions, maybe I can point you in the right direction

Drop me a line for either - erik at splunk dot com.

Congratulations again to the puppet master Luke and his team for building one of the most exciting pieces of IT software in a long time.

]]>
Michael Baum: Splunk Live Taipei Breaks All Recordshttp://blogs.splunk.com/thebaum/2009/10/05/splunk-live-taipei-breaks-all-records/http://blogs.splunk.com/thebaum/2009/10/05/splunk-live-taipei-breaks-all-records/Mon, 05 Oct 2009 18:27:46 +0000Michael BaumMore than 300 people attended Splunk Live Taipei last week and our partners at Systex hosted an incredible show of Splunk use cases, customer speakers and hands-on labs. The Systex Splunk Lab provided attendees with the opportunity to use Splunk with CICS and IBM System z mainframe data, Windows, servers and desktops, Unix and Linux, customer service operations environments, telco provisioning environments and more.

I'll be posting separately on the hands-on Systex Splunk Lab.



Our first guest customer speaker was Yi-Lang Tsai (蔡一郎), the Taiwan Chapter Chief Security Officer of the Global Honeynet Project and the Division Manager of the National Center for High-performance Computing, a Honeynet Project sponsor. Yi-Lang is also a freelance writer with more than 30 books published on operating systems, network and system security and IT management. He presented the very important botnet work the Honeynet Project is doing and showed how his team is using Splunk to deepen their research and expose what they find to the Honeynet audience of security professionals worldwide.

What is Honeynet?

The mission of the Honeynet Project is to learn the tools, tactics, and motives of the blackhat community, and share the lessons learned. Honeynet is an all volunteer organization of security professionals around the world dedicated to researching cyber threats by deploying networks to be hacked. The goals are

  • Awareness: to raise awareness of threats that exist,
  • Information: for those already aware, to teach and inform about threats and
  • Research: To give organizations the capabilities to learn more on their own.

Honeynet is completely open source and all of the work, research and findings are shared. Everything captured is happening in the wild (there is no theory). The organization has no agenda, no employees and no product or service to sell.

A Honeynet is simply a “high-interaction” honeypot attracting any and all cyber threats and attacks. It is an architecture, not a product or software, that gets populated with live systems donated and run by the various Honeynet chapters globally.

Once the Honeynet is compromised, data is collected, correlated and analyzed to learn the tools, tactics, and motives of the blackhat community. Specific benefits to the global community of security professionals are the

  • Research: Identifying new tools and new tactics,
  • Profiling: Generating and maintaining lists of blackhats,
  • Protection: Early detection, warning and prediction,
  • Response: Forensics and incident response and
  • Self-defense.

    Taiwan Honeynet Chapter’s Environment

    Yi-Lang’s environment at the Taiwan National Center for High Performance Computing distributes Honeynet/Honeypots to the Taiwan Education Network, Taiwan Chapter members and the GDH project. The environment makes heavy use of virtualization in its deployment; you might call it a “Virtual Machine Honeynet.” It's running on an advanced blade server with 128GB of memory running VMware ESX. The blade server uses either SAS or SSD storage. More than 200 Windows 2K/2K3, Windows XP/Vista/7, Linux and FreeBSD servers run in high and low interaction honeypots.

    The Taiwan Honeynet deployment is distributed across four different data centers in different geographies: Taipei, Hsinchu, Taichung and Tainan. This distributed topology gives the honeypot a broad-reaching capture network and makes use of idle network and CPU. This large-scale Honeynet deployment supports:

    • Malware Collection and Analysis
    • Honey-Driven Botnet Detection
    • Client-Side Attack
    • Malicious Web Server Exploring
    • RFI Scripts Detection
    • Fast-Flux Domain Service Tracking
    • Research Alliance
    • Distributed Search and Analysis on Honeynet Data

    Why Splunk?

    The Taiwan Honeynet team uses Splunk to collect and manage information from the distributed Honeynet infrastructure, including GBs of logs, 400k+ connections, 2GB+ of traffic flows, and tool events and metrics.


    http://blogs.splunk.com/thebaum/wp-content/uploads/2009/10/allindexdata.png

    Data analysis is performed against a variety of pivot points that are automatically extracted from the Honeynet data sources. Date & Time, Malware Source IP address, Destination IP, Protocols, File name and Malware MD5 are some of the main fields Splunk identifies and provides to the team for deeper analysis. In addition to Splunk searches and reports, the team has built custom geo-dashboards with high resolution displays by tapping into the Splunk API.

    This interactive geo-view provides the team Botnet detection, malware presence, Honeynet traffic flows and an instant status report all from one location.

    Yong Sweah Liang (Linus), VP, Head of Infrastructure and Technology for Infocomm Asia Holdings Pte Ltd (IAHGames) was our second customer speaker.

    IAH is an online game company operating some major properties including:

    • EA SPORTS™ FIFA Online 2
    • Granado Espada
    • Dragonica
    • Distribution of Box products
    • BioShock®
    • Grand Theft Auto IV



    ]]>Paul Pang: SplunkLive@Taipei 2009http://blogs.splunk.com/paul/?p=4http://blogs.splunk.com/paul/?p=4Fri, 02 Oct 2009 15:46:20 +0000Paul PangWe are very excited that more than 300 IT gurus joined our SplunkLive event in Taipei this week. This is the largest SplunkLive event we have ever had in Asia!

    Over 300 IT gurus joined our largest SplunkLive event

    The Systex team did a great job, presenting over 13 different Splunk showcases on how Splunkers can use Splunk for everything from application monitoring in online e-business and stock trading, to mainframe and Oracle troubleshooting, to Unix, Windows and network management.

    In another very nice arrangement, the Systex team set up more than 20 PCs so that all visitors could get first-hand experience with the latest Splunk 4.0.4 Chinese version.

    - SplunkNinjas are excited about the latest Chinese version of Splunk!

    The hottest sessions of the event were the great presentations from our honorable guests Mr Linus Liang from IAH (http://www.iahgames.com), the largest online game service provider in South East Asia, and Mr Yi-Lang Tsai from the HoneyNET Project Taiwan Chapter (http://www.honeynet.org.tw/). Mr Tsai shared the amazing results of applying Splunk IT search technology to botnet investigation and detection. Because his team has to actively investigate over a hundred honeynet/honeypot servers across 6 class C networks, Splunk is the critical data engine that lets them quickly pinpoint, manage and analyze malware behavior.

    - Using Splunk to replay the malware and botnet activities in real time

    This was a most memorable day for us. Thank you very much to Mr. Linus Liang, Mr. Tsai and our very hardworking Systex team members!!

    Press release in Chinese : http://www.systex.com.tw/news/news_2.asp?Bkey=225

    ]]>
    Tina Phi: LDAP *BaseFilter Exampleshttp://blogs.splunk.com/tina/?p=4http://blogs.splunk.com/tina/?p=4Thu, 01 Oct 2009 17:39:28 +0000Tina PhiFour blog posts and three of them relate to LDAP.  This must be a complicated topic!  It can be and that is why I break it up into chunks that should be easier to digest.

    This post will be short and sweet.  I want to provide a few examples of userBaseFilters and groupBaseFilters that you can use in your configuration to make your Splunk experience, hopefully, better.

    When you specify a userBaseDN or groupBaseDN without a filter, you are asking your LDAP server to return all entries residing beneath the specified baseDN.   In 99.9999999999% of cases, you don't actually want all entries.  This is where *BaseFilter configuration comes in handy.  Now, let's go right to the examples:

    If you're using AD, you can use the following userBaseFilter to return ALL person-type entries that are NOT disabled (We can thank our friend Gerald K. for this one):

    userBaseFilter = (&(objectcategory=person)(objectclass=user)(!(userAccountControl:1.2.840.113556.1.4.803:=2)))
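The odd-looking userAccountControl:1.2.840.113556.1.4.803:=2 clause is a bitwise AND test: 1.2.840.113556.1.4.803 is Active Directory's bit-AND matching rule, and 2 is the "account disabled" flag. As a small sketch of what the server is doing, here is the same logic in Python. The dict-shaped entries and function names are hypothetical, and the objectCategory comparison is simplified to a plain string match, so treat this as an illustration of the filter's logic rather than a replacement for it.

```python
ACCOUNTDISABLE = 0x2  # userAccountControl flag for a disabled account


def is_disabled(user_account_control):
    """Mirror (userAccountControl:1.2.840.113556.1.4.803:=2): true if the bit is set."""
    return (user_account_control & ACCOUNTDISABLE) == ACCOUNTDISABLE


def passes_filter(entry):
    """Mirror the whole userBaseFilter for a simplified, dict-shaped entry."""
    return (
        entry["objectCategory"] == "person"
        and "user" in entry["objectClass"]
        and not is_disabled(entry["userAccountControl"])
    )


# Made-up entries: 512 is a normal enabled account, 514 is the same account disabled.
entries = [
    {"cn": "alice", "objectCategory": "person", "objectClass": ["top", "person", "user"], "userAccountControl": 512},
    {"cn": "bob", "objectCategory": "person", "objectClass": ["top", "person", "user"], "userAccountControl": 514},
    {"cn": "carol", "objectCategory": "person", "objectClass": ["top", "person", "user"], "userAccountControl": 66048},
]

for entry in entries:
    print(entry["cn"], "kept" if passes_filter(entry) else "filtered out")
```

Because the test looks at a single bit, accounts with other flags set (such as 66048, a normal account whose password never expires) are still kept, while any value with the 2 bit set is dropped.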

    A method that I have suggested is to use the memberOf attribute as a filter.  If your user entries contain this attribute, it holds the DN of each group that the user is a "member of".  In the following example I list out 3 memberOf values to filter on:

    userBaseFilter = (|(memberOf=CN=Splunk Admins,OU=Groups,DC=splunksupport,DC=com)(memberOf=CN=Splunk Power Users,OU=Groups,DC=splunksupport,DC=com)(memberOf=CN=Splunk Users,OU=Groups,DC=splunksupport,DC=com))
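If the list of groups is long, typing that filter by hand gets error-prone. The Python sketch below builds the same kind of OR filter from a list of group DNs. The function names are my own, and the escaping follows the LDAP filter string rules for the special characters in a value; verify the result against your own directory before pasting it into your configuration.

```python
def escape_filter_value(value):
    """Escape the characters that are special inside an LDAP filter value."""
    replacements = [
        ("\\", "\\5c"),  # backslash first, so later escapes are not re-escaped
        ("*", "\\2a"),
        ("(", "\\28"),
        (")", "\\29"),
        ("\0", "\\00"),
    ]
    for char, escaped in replacements:
        value = value.replace(char, escaped)
    return value


def member_of_filter(group_dns):
    """Build an OR filter matching entries whose memberOf equals any given DN."""
    if not group_dns:
        raise ValueError("need at least one group DN")
    clauses = "".join(f"(memberOf={escape_filter_value(dn)})" for dn in group_dns)
    return f"(|{clauses})"


groups = [
    "CN=Splunk Admins,OU=Groups,DC=splunksupport,DC=com",
    "CN=Splunk Power Users,OU=Groups,DC=splunksupport,DC=com",
    "CN=Splunk Users,OU=Groups,DC=splunksupport,DC=com",
]
print("userBaseFilter = " + member_of_filter(groups))
```

Run with the three groups above, this prints the same userBaseFilter line shown in the example.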

    Please feel free to comment with your own examples!

    ]]>
    Erik Swan: (I’m Back!) The return of Splunk Free, as in Free Beerhttp://blogs.splunk.com/erik/2009/09/23/coming-soon-the-return-of-splunk-free-as-in-free-beer/http://blogs.splunk.com/erik/2009/09/23/coming-soon-the-return-of-splunk-free-as-in-free-beer/Wed, 23 Sep 2009 19:03:42 +0000Erik Swan*** Update 10/26/09 ***
    Free is Back!!

    Well, it never really went away, but now it's easy to run the free version of Splunk.

    Downloads still contain an enterprise 60 day license, but you can convert to the free product at any time you like and use it like a champion.

    Back several months, before the launch of 4.0, we were confronting all the work ahead. As always, we had to make hard decisions about what is in and what is out. In 4.0 we had re-implemented much of the UI and a good chunk of the backend. With over 1000 paying customers, a potentially challenging upgrade process and a huge testing task, we needed to reduce risk to the schedule and product quality. It was a hard decision, but we reduced the GA risk by pulling out the Free product until we GA'd and fixed most of the critical bugs. Our guess was that it would take 45-90 days beyond the GA to get a few maintenance releases out before we could test the free product.

    Again, this was a hard decision since we know that our free product helped us get a large and loyal user base. It was hard as it has been our motto to always have a high-quality and useful free version. But at the time we needed to get 4.0 out to our largest customers and we could not wait.

    Anyhow, cut to the end of the story, we will soon release the free product again. As always, it's full of cool features and we know that it is a good place for many people to start their Splunk experience. Next time we have a major upgrade I hope that we have sorted out the free product by the time we GA.

    If you have questions about free or anything else just drop me an email -> my first name at splunk dot com.

    e

    ]]>
    Michael Baum: Splunk Live Washington DC 2009http://blogs.splunk.com/thebaum/2009/09/17/splunk-live-washington-dc-2009/http://blogs.splunk.com/thebaum/2009/09/17/splunk-live-washington-dc-2009/Fri, 18 Sep 2009 06:34:15 +0000Michael BaumObama-nomics is highly visible in our nation's capital these days. The DC economy is humming as our tax dollars are hard at work fueling all kinds of government spending. With more than 100 attendees at Splunk Live on Thursday we certainly were not disappointed in our quest to help make all this growth in government more efficient! Managing large networks and security forensics were the hot topics of conversation at Splunk Live Washington, DC where everyone was treated to a trio of incredible speakers.

    Our first speaker was Andy Purdy, the Co-Director, International Cyber Center, George Mason University and the Former Acting Director, National Cyber Security Division (NCSD) and US-CERT, Department of Homeland Security. Andy was a member of the White House staff team that drafted the U.S. National Strategy to Secure Cyberspace (2003) and served on the DHS tiger team that formed the National Cyber Security Division (NCSD). He spent 3 1/2 years at DHS, the last two heading the NCSD and US-CERT as the “Cyber Czar” of the U.S. Andy is also a Special Government Employee on the Defense Science Board Task Force on Mission Impact of Foreign Influence on DoD Software. He is also a partner with the law firm of Allenbaugh Samini Gosheh, LLP.

    The Constantly Changing Threat Landscape

    Andy talked with us about the changing threat landscape and lessons learned from past approaches to cyber security that can be applied in a forward looking approach to Risk Management and Compliance.

    Since much of his experience has been spent preparing the country for what cyber threats are coming next, Andy thinks of IT security as a war fought in a constantly morphing theater with new technologies and vulnerabilities and new motivations and threats.

    A Different Approach Moving Forward

    For anyone serious about security this is a sound perspective whether you are a government agency, a major enterprise or a small business. But the balance between open networks and services and robust security remains one of the major challenges for IT organizations. Andy pointed us to lessons learned from his past, fueling a vibrant conversation during the customer and speaker roundtable. Perhaps the most important thing I heard was that it’s not enough to prepare for the last war, or the last successful attack. While perimeter defense and legacy standards for network security provide some measure of security, those measures are very often insufficient to deal with the new threats that seem to be gaining in sophistication at an accelerating pace. Andy encouraged us to focus on adopting new requirements and security infrastructure for situational awareness and control.

    Greater sophistication, slower, lower-level attacks, greater knowledge about the targets (data, activity, vulnerabilities) are all contributing to the need for near-time visibility on a large-scale. This has become far more important than sub-second correlation of known attack vectors against discrete sets of network devices.

    "NIST perspective: Continuing serious cyber attacks on federal information systems, large and small; targeting key federal operations and assets. Attacks are organized, disciplined, aggressive, and well resourced; many are extremely sophisticated. Adversaries are nation states, terrorist groups, criminals, hackers, and individuals or groups with intentions of compromising federal information systems."

    Andy went on to discuss how the effective deployment of malicious software causing significant exfiltration of sensitive information (including intellectual property) and potential for disruption of critical information systems/services has made detection of information and data leakage a key government and enterprise security requirement.

    Bob Flores, Former CTO and 31 year veteran of the CIA, was our next speaker. Bob retired from the CIA six months ago and is now President and CEO of Applicology, providing cyber security and IT strategy consulting services. In his 31 years at the CIA, he held various positions in the Directorate of Intelligence, Directorate of Support, and the National Clandestine Service. Most recently he was the CIA’s CTO where he was responsible for ensuring that the Agency’s technology investments matched the needs of its many missions. Bob has Bachelor and Master of Science degrees in Statistics from Virginia Tech.

    Quis custodiet ipsos custodes?

    Brush up on your Latin! "Who’s guarding the guards?" was the topic of Bob’s talk. Insider threat in an ever-changing threat landscape was and remains our number one cyber security risk.

    "Defense-in-depth isn’t just about putting adequate technology in place, it’s also about paying attention to your people and implementing policies and procedures to reduce the likelihood of an insider attack."
    - Dawn Cappelli, CERT

    The simple but not so obvious model Bob pursued at the CIA was an extension of the OSI stack to include non-technical but motivational additions.


    We need to worry about all levels of the stack including layers eight and nine because we all have people messing around at various layers with applications, scripts, communications etc. And their motivation is often very clear.

    Nemo repente fuit turpissimus! Or "no one ever became thoroughly bad in one step!"

    The point is people don’t just wake up one day and decide to be bad. They are motivated over time by larger causes and in EVERY CASE leave a trail of clues behind that can’t entirely be covered up.

    What to Do?

    According to Mr. Flores, the focus needs to be on real-time visibility. You need visibility into who (or what) is perturbing your enterprise right now and over time. You can tediously review the logs of each device and user, as the CIA used to do, or you can take advantage of Splunk.

    "Splunk may not be the best thing since sliced bread, but it’s pretty darn close."
    - Bob Flores

    Why Splunk?

    Why did the CIA choose Splunk over so many other security forensic solutions? It all comes down to how easily and scalably Splunk can eat any logs, events and messages Bob’s organization throws at it. Combine that with real-time search, alerting and reporting, and statistics and analysis over time on

    • user behavior,
    • network behavior,
    • system and application activities and
    • configuration changes,

    plus user-customizable dashboards, controls to enforce who can see what about whom, full data segregation, and access auditing by user or role, and you have the answer.

      Our last guest speaker was David Duvall, Infrastructure Architect at Discovery Communications. David is a lead technical architect working with teams across four continents to build critical systems and keep them running. Discovery is one of my favorite cable channels. If you haven’t seen it, the series Man Versus Wild is just awesome. I won’t spoil it for you. Check it out. Discovery is the world's number one non-fiction media company, with more than 1.5 billion cumulative subscribers in over 170 countries. They run 100-plus worldwide networks, led by Discovery Channel, TLC, Animal Planet, Science Channel, Planet Green, Investigation Discovery and HD Theater. Yes, all the good stuff that makes having a cable or satellite subscription worthwhile.

      We’re Going Public!

      And oh, we have just 16 months to show SOX compliance. Discovery went public in September 2008. The company knew they needed a log consolidation system that retains at least 13 months’ worth of data and could be rolled out in minimal time. They couldn’t spend a quarter implementing a new solution. The in-scope SOX environment includes

      • 50 domain servers on 4 continents,
      • Unix syslog,
      • WebSphere app server logs,
      • Client desktop logs,
      • Network backup status logs,
      • WMI Windows event logs,
      • Cisco, Juniper and F5 network device logs,
      • NetApp filer logs and
      • Oracle database logs.

      Splunk Deployment

      Discovery’s Splunk deployment took 1.5 weeks from start to finish. David was responsible for the installation. He personally downloaded and installed Splunk, read the Splunk docs and wikis, and got up and running without weeks of services. Most data sources are streamed to Splunk over the network from their native logging facilities.

      "I knew I could get Splunk up and running quickly to ensure I captured all the data. Then I could take my time to figure out what I wanted to do with the data."

      Approximately 100 Windows servers were outfitted with Splunk lightweight forwarders to bring Windows event logs, native files and registry change information into Splunk. Oracle database logs are stored in SQL tables, and David was able to set up a scripted Splunk data input that acts like any other SQL client to grab the Oracle database logs on a scheduled basis.
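
To make the scripted-input idea concrete, here is a minimal sketch of a script that reads new rows from a SQL log table and prints them as events on standard output, which is how a scripted input hands data to Splunk. It is not David's script. The table name `audit_log`, its columns, and the checkpoint handling are all assumptions for illustration, and `sqlite3` stands in for an Oracle client so the example runs anywhere.

```python
import sqlite3
import sys


def fetch_new_rows(conn, last_id):
    """Return rows of the (hypothetical) audit_log table newer than last_id."""
    cur = conn.execute(
        "SELECT id, ts, username, action FROM audit_log WHERE id > ? ORDER BY id",
        (last_id,),
    )
    return cur.fetchall()


def format_event(row):
    """Render one row as a timestamped key=value line that is easy to index."""
    row_id, ts, username, action = row
    return f'{ts} id={row_id} user="{username}" action="{action}"'


def run_once(conn, last_id, out=sys.stdout):
    """Write every new row to `out` and return the new checkpoint id."""
    for row in fetch_new_rows(conn, last_id):
        out.write(format_event(row) + "\n")
        last_id = row[0]
    return last_id


if __name__ == "__main__":
    # Stand-in database with made-up rows. A real deployment would connect to
    # Oracle and persist the checkpoint between scheduled runs.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit_log "
        "(id INTEGER PRIMARY KEY, ts TEXT, username TEXT, action TEXT)"
    )
    conn.executemany(
        "INSERT INTO audit_log (ts, username, action) VALUES (?, ?, ?)",
        [
            ("2009-11-02 10:15:00", "scott", "LOGON"),
            ("2009-11-02 10:16:30", "scott", "DROP TABLE"),
        ],
    )
    run_once(conn, last_id=0)
```

Keeping a checkpoint (here the highest row id already emitted) is one simple way to avoid indexing the same row twice when the script runs on a schedule.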

      Compliance Reporting Made Easy

      Once the initial deployment was complete, David turned his attention to working with the company’s SOX auditors and department heads to develop the reports required to demonstrate compliance with all the necessary controls.

      "As the auditors’ questions change from week to week, it’s easy to pull new data and generate ad-hoc reports."

      Using Splunk’s role-based access controls, David and the auditors then developed and implemented policies to guard the data and reports, including audit reports to prove that only the necessary individuals are using the information and to prove the authenticity of the data itself. The auditors really like the secure audit trail and signing of data from source of origin all the way through to the Web-based control reports.

      Lessons Learned

      Adoption of Splunk proved easier than David and the audit team imagined because many of the IT team at Discovery had already downloaded and used Splunk for other tasks.

      "When you explain Splunk as “Index and Search” you’re glossing over a lot of the value. Dashboards that correlate failures from different sources and troubleshoot different environmental items are priceless."

      ]]>Michael Baum: Splunk Live Princeton 2009http://blogs.splunk.com/thebaum/2009/09/16/splunk-live-princeton/http://blogs.splunk.com/thebaum/2009/09/16/splunk-live-princeton/Thu, 17 Sep 2009 03:50:21 +0000Michael BaumWednesday and we're at Splunk Live Princeton, NJ. What an awesome place. Princeton is home to a great university and some great culinary experiences. Check out Mediterra - an interesting mix of Italian and Spanish influences. Apparently it's where all the Princeton parents treat their kids to dinner when they are in town. Next door to our venue was the great hope for the state of NJ - a new Governor. The current Governor has turned the state budget and tax base into toxic waste. Well, things went much better for the more than 60 Splunk Live attendees in Princeton today, who gained insight into how a number of large Splunk customers keep their mission critical applications running in a time of IT budget slash and burn.

      Matthew Stevens, Director of Software Systems and Architecture at Comcast, provides guidance to Comcast executives on mission critical media systems and strategic systems architecture. Comcast is the country’s largest provider of cable services, serving 23.9 million cable customers, 15.3 million high-speed Internet customers and 7.0 million Comcast Digital Voice customers.

      Comcast Developer Network

      Matthew's latest project is the Comcast Developers Network, a Comcast-scale secure web services platform for the development of cool new media and entertainment offerings. The Comcast Web Platform environment generates billions of software events each day from caching and load-balancing, origin application servers, databases, middleware and content delivery networks for images and video streams. Comcast services demand high quality. Much of the Comcast content is exclusive and premium services drive revenue. Interfaces between technology components (applications, delivery platforms) need to adhere to best practices to ensure the highest degree of end customer experience.

      Why Splunk?

      Comcast has acquired many system and application management platforms over the years, but nothing was providing the team with the robust information from operational telemetry the teams around the company need to ensure data integrity, stability, application quality and efficiency. Several efforts specifically drove Comcast to consider and deploy Splunk.

      • Product rollout: The team wanted the ability to predict and correct potential issues before going live into production. Splunk has become a required best practice for new product rollouts.
      • Network/System Integrity: Understanding security and user experience across a very large network and set of systems is a must to protect the business. Splunk provides the insight the network and system teams need across many different silos of technologies.
      • Business Intelligence: Having immediate access to real-time events and historical trends allows the various Comcast business teams to react quickly and adapt to changing customer behaviors.
      • Agility: Alerts and Dashboards indicate discrepancies so distributed teams can investigate immediately and remediate failures and attacks.

      Video CDN/CMS Performance

      "In content management systems and delivery networks a devil walks the long tail. If you're facing concurrent hits across the tail of the curve, sharpen your pencil, you've got problems!"

      Splunk helps Comcast understand the risks of instability in its systems, especially during periods of high concurrency. Through pre-production modeling of event patterns and subsequent monitoring of these patterns, Splunk pays for itself by helping Comcast avoid deployment of vulnerable systems, downtime, and upset customers.

      Predicting System Imbalance

      Comcast has successfully used Splunk to evaluate potential infrastructure vendors’ solutions and determine if they will balance loads properly across a large, indeterminate infrastructure. Often the answer is no, as illustrated here in a Splunk report of resource utilization across various services.

      Splunk has also been utilized to see whether solutions will be resilient to different traffic patterns, helping the company perform predictive analysis before making critical infrastructure investments.

      Load testing is performed during non-peak hours and the results are analyzed for system failures over time using the telemetry data Splunk can correlate across various logs, messages and events.

      When failures are found the Comcast team uses Splunk reports to dig deeper into the data.


      Security and Compliance

      In addition to operations use cases, Comcast security and compliance teams leverage the consolidated logs across data centers to enable faster threat assessment and security monitoring.

      • Monitoring for bad actors to trigger alerts,
      • Conducting threat detection over time,
      • Detecting attacks/vulnerabilities in systems and
      • Auditing systems in support of security assessments and compliance.

      What's Next?

      Next up for Matthew and team is the launch of the Comcast CodeBig Platform, enabling a network of developers to create content for the network. Some of these developers are already using Splunk in their own managed services like Mashery. Comcast is working to hook the Mashery Splunk installation to their own in order to provide visibility across multiple services and providers of content and entertainment functionality.

      Chris Abboud manages the Enterprise Systems Management team at Dow Jones, monitoring customer facing infrastructure and applications. Dow Jones provides global business news and information services to millions of consumers and enterprise media groups. Keeping these revenue generating services running 24x7x365 is the highest priority. Chris also manages the DJ service management platforms (Remedy, Knowledge Base, etc.). He's been with the DJ organization for 10 years and in his current role for 3 years.

      "Our mission is to address issues before they become service impacting events. Failures are going to happen - we need to make sure people know about them as soon as possible."

      The Splunk Set-up

      The Dow Jones Splunk installation includes

      • Data from 6000+ servers globally,
      • 13,500+ source types,
      • 1,700 network devices (primarily Cisco and Juniper) and
      • Ten distributed Splunk servers in different geographies that index ~100GB a day and provide a new global logging console.

      Why Splunk?

      Each Dow Jones command center now has the ability to know what’s happening before customers do across a wide range of internal and external services. Splunk speeds the time to resolution for email outages that may impact internal users’ productivity and for editorial site downtime that can directly impact customer service and revenue. Dow Jones has found Splunk generates significantly fewer false positives than traditional monitoring systems, and new resources are much easier to manage and deploy.

      Per-server monitoring costs have dropped by a factor of 5.

      What's Next

      Next up Chris and Dow Jones will be checking out the Blue Coat and Cisco Apps as they turn Splunk onto those aspects of their infrastructure.

      Talk about doing more with less. Andrew Page in the Office of Information Technology at Rutgers University has seen IT budgets go from lean to next to nothing. In this unprecedented time of state educational cuts, Andrew, responsible for enterprise monitoring and service management, has turned on and been turned on by Splunk. The self-confessed “ITIL guy” at Rutgers, Andrew oversees operations for systems for 50,000 students on campus in three different geographies (Camden, Newark and New Brunswick). The university's back office supports 27 degree-granting units offering majors in more than 100 fields, with thousands of courses covering the full range of human experience.

      The Splunk Set-up

      The Rutgers Splunk set up includes

      • 2000+ data sources,
      • 1,850 network devices,
      • ~100 Servers: Windows, Solaris, Unix,
      • ~50 J2EE apps
      • 5-10 GB logs and messages / day
      • 95% coverage of infrastructure in Splunk
      • 40+ users
      • Single Splunk Server

      Why Splunk?

      Six months ago Rutgers was facing a number of log consolidation drivers including:

      • The need for real time access for production logs by service teams,
      • Faster cross-silo problem resolution and collaboration,
      • Simplification of problem troubleshooting for load balanced applications,
      • Decommissioning of “critical” monitoring scripts running in home directories and
      • GLBA and PCI compliance and regulatory reporting mandates.

      Fast Implementation

      Can you fully implement Splunk in a few days? Yes you can, according to the Rutgers team. Getting from download through basic implementation took 1.5 weeks and only part of a single resource. The Rutgers implementation included roles for data security, form searches and transaction searches, and custom dashboards.

      Performance Management

      Andrew and his team use Splunk to grab performance data. A scripted input makes HTTP calls into running JVMs. The team graphs this data and correlates it to load and error messages.
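
A scripted input of this kind can be quite small. The sketch below is one way it might look, not the Rutgers script: it assumes each JVM exposes a status URL that returns a flat JSON object of metrics (a hypothetical endpoint, passed on the command line), and it prints one key=value line per JVM so the values are easy to graph and correlate later.

```python
import json
import sys
import time
import urllib.request
from urllib.parse import urlparse


def fetch_metrics(url, timeout=5):
    """GET a JSON object of metrics from a (hypothetical) JVM status URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def format_event(host, metrics, now=None):
    """Render the metrics as one timestamped key=value line."""
    ts = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    pairs = " ".join(f"{key}={metrics[key]}" for key in sorted(metrics))
    return f"{ts} host={host} {pairs}"


if __name__ == "__main__":
    # Status URLs come from the command line, for example a made-up
    # http://app01.example.edu:8080/status.json; the scheduler reruns the script.
    for url in sys.argv[1:]:
        host = urlparse(url).hostname
        try:
            print(format_event(host, fetch_metrics(url)))
        except (OSError, ValueError) as err:
            # Report unreachable or malformed endpoints as events too.
            print(format_event(host, {"error": type(err).__name__}))
```

Emitting an event even when a JVM cannot be reached means the gap itself shows up in the graphs instead of silently disappearing.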

      Outage Avoidance

      In other scenarios Andrew presented how the Rutgers team finds problems before they become widespread outages. Eight weeks ago a certificate error started causing application failures and could have resulted in widespread outage. It took 6 minutes to answer...

      • Who was affected?
      • What time it happened?
      • What apps were involved?

      Lessons Learned

      Some valuable lessons from the Rutgers team: first, an emphasis on distributed deployment is the key to speed of installation. Second, think about security before you start. Third, during deployment get others involved quickly.

      We had users on day two. The rule is that if you send in data you get a Splunk account.

      Your early adopters will build their own solutions, but make sure you plan for availability as users become dependent on Splunk quickly and will notice any Splunk outages fast, fast, fast.

      What's Next

      • Expand use in the Application environment,
      • Feed in Oracle databases,
      • Migration to Splunk 4, of course…,
      • Expanded roles and security around roles should be big win,
      • Improved dashboard cache controls and
      • Offer some in-house training in advanced skills.
      ]]>
      Michael Baum: Splunk Live New York 2009http://blogs.splunk.com/thebaum/2009/09/15/splunk-live-new-york-2009/http://blogs.splunk.com/thebaum/2009/09/15/splunk-live-new-york-2009/Tue, 15 Sep 2009 10:10:39 +0000Michael BaumThis week we’re on the East Coast enjoying some fantastic customer presentations and roundtables at Splunk Live events in New York City, Princeton NJ and Washington DC. It's Tuesday and we have more than 100 customers and Splunk users attending Splunk Live in midtown Manhattan. The vibe is electric as we're being treated to awesome talks by IDT and New York Life. At lunch, long-term customers Bloomberg and AT&T joined the customer roundtable conversation.

      Gabe Arnett, Senior Software Architect at Moody’s, demonstrated how Splunk is being used to monitor and troubleshoot the Moody’s Analytics platform. Gabe has more than 15 years of experience building web applications in financial services, investment banking and e-Commerce. At Moody’s he’s responsible for the global development team that develops and supports the newly re-designed client facing website – v3.moodys.com. Moody's is a leading provider of research, data, analytic tools and related services to debt capital markets and credit risk management professionals. The company's products and services provide the means to assess and manage the credit risk of individual exposures as well as portfolios; price and value holdings of debt instruments; analyze macroeconomic trends; and enhance customers' risk management skills and practices.

      Moody’s Splunk environment is utilized by 25 different users and runs on Windows 2003. Splunk provides Gabe’s developers secure access to the logs they need without touching the production devices, servers and applications. His team has built custom searches and a number of dashboards indicating the general health of their applications and service. Custom searches and alerts track errors and access – guaranteeing good user experience. The team also uses Splunk to understand when and where new content isn’t flowing to the v3 platform. A large part of the Moody’s user experience is delivering email alerts, and Splunk helps the team track GUIDs to ensure customers receive the alerts they’ve subscribed to.

      The team recently migrated from Splunk 3 to Splunk 4 – taking 30 minutes to perform the upgrade. The Splunk for Windows App has been significantly revamped in Splunk 4 and the Moody’s team is making use of it to monitor through WMI local server resources (disk, memory, networking) and correlate this performance data with the Windows and Application event logs.

      Shay Benjamin, CSO and SVP, Architecture at IDT, designs and implements network architectures and manages compliance, security and fraud initiatives at IDT. IDT Corporation (www.idt.net) is a holding company focused on the telecommunications and energy industries. Since 1995 they’ve been building hundreds of VOIP switches globally and assembling an international fiber optic network. IDT pioneered VOIP (Voice over Internet Protocol) to create Net2Phone, piloted the first commercial WiFi phone service in the US and has created a prepaid calling card business, which sells 12 million calling cards a month.

      IDT uses Splunk primarily for VOIP Call Detail Records (CDRs). The company indexes more than 120 million CDRs per day with six mirrored Splunk server instances. CDRs are somewhat like logs, but with many fixed, delimited fields. One or more CDRs are created at each switching or routing point for every VOIP call. CDRs vary between platform devices in number of fields and contents and, unlike logs, few CDR fields contain easy-to-read key=value pairs. Although CDRs are a key piece of maintaining service quality, billing, monitoring network quality and security forensics, working with them is labor intensive, and delay wastes labor, time and money.

      IDT needs fast searches across all fields of the CDRs and quick data loading – to allow fast retrieval of call data and cross platform searches to unify results from different CDR formats. Historically IDT utilized a custom RDBMS solution with an application called Call Genius. In their RDBMS, IDT was forced to limit the fields that get indexed because indexing CDRs with an RDBMS is costly: it takes up a lot of space and slows load times. The RDBMS also only indexed fields common to multiple platforms’ CDRs. In the RDBMS solution much of the CDR data was put into BLOBs (actually CLOBs) – multiple CDR fields mapped into a single RDBMS field to try to achieve efficiency. But BLOBs can be very difficult to search and are difficult to index effectively. The legacy Call Genius application didn’t permit the search of CDR BLOBs.

      Now IDT utilizes Splunk to index all CDR fields. No need to decide what fields to index and cross platform searches are easy without losing specific platform CDR format resolution. There is no longer a need to create BLOBs for efficiency. Engineers and support staff are able to quickly search for any combination of

      • Phone Number
      • IP address
      • Trunk Group Name

      Splunk naturally and easily links search terms across fields and the users just need to enter the phone number or IP and get back the CDR events and transactions.
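
To illustrate why mixed CDR formats are awkward and what "search any field" buys you, here is a toy sketch. The two field layouts, the semicolon delimiter and the sample records are all made up for the example and are not IDT's formats. Each delimited line is turned into named fields for its platform, and a search term is then matched against every field regardless of which platform produced the record.

```python
# Hypothetical field layouts for two switch platforms; real CDR formats differ.
LAYOUTS = {
    "platform_a": ("start_time", "caller", "callee", "src_ip", "trunk_group", "duration_s"),
    "platform_b": ("start_time", "trunk_group", "src_ip", "callee", "caller"),
}


def parse_cdr(platform, line, delimiter=";"):
    """Turn one delimited CDR line into a dict keyed by that platform's field names."""
    names = LAYOUTS[platform]
    values = line.strip().split(delimiter)
    record = dict(zip(names, values))
    record["platform"] = platform
    return record


def search(records, term):
    """Return every record in which any field value equals the search term."""
    return [record for record in records if term in record.values()]


if __name__ == "__main__":
    records = [
        parse_cdr("platform_a", "2009-09-15 09:00:01;12125550100;442075550199;10.1.2.3;TG-NYC-01;125"),
        parse_cdr("platform_b", "2009-09-15 09:00:02;TG-LON-07;10.9.8.7;12125550100;33155550123"),
    ]
    # The same phone number is the caller in one format and the callee in the
    # other, so a field-agnostic search returns both records.
    for hit in search(records, "12125550100"):
        print(hit["platform"], hit["start_time"])
```

The point of the sketch is the shape of the problem: once every field is searchable, nobody has to decide up front which columns deserve an index.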

      Comparing Splunk to the RDBMS solution, IDT found Splunk searches to be 50 to 100x faster than searches on non-indexed RDBMS data. Searches on indexed fields are also faster in Splunk than in the previous RDBMS solution. Splunk load times for a typical sample average 1 to 5 minutes versus 20 to 40 minutes for the RDBMS.

      IDT is in the process of feeding firewall, security, router, IP network, and switch data into Splunk as well. They’re already discovering that Splunk finds errors not captured by Network Management Consoles, and it has provided valuable troubleshooting help during recent datacenter migrations.

      Most of all IDT is looking forward to discovering new ways to use all the data in Splunk. Heuristic analysis and Business intelligence applications are on the top of their list including the use of Splunk to find human “Family and Friends” networks and drive the development of new commercial programs.

      New York Life Insurance wrapped up the morning session presentations with Aaron Zachko, Assistant Vice President of Information Systems. New York Life’s family of companies offers life insurance, retirement income, investments and long-term care insurance. New York Life Investments provides institutional asset management and retirement plan services. The company has the highest possible financial strength ratings from all four of the major credit rating agencies.

      Aaron is a senior network architect and leads the group responsible for network management, core network infrastructure and network security infrastructure. The New York Life network consists of hundreds of Cisco routers, switches, firewalls, enterprise DHCP and Network Access Control (NAC) devices. The company chose Splunk to satisfy audit and compliance requirements and support the rollout of their NAC infrastructure earlier this year. Currently the team is expanding its use of Splunk into enterprise security forensics and as a multi-component monitoring complement to their Enterprise Service Management Platform, which seems to have one of every kind of monitoring tool already.

      Thousands of users a day go through NAC to access the New York Life network and Aaron’s team needed visibility into the network from a unified infrastructure and services perspective. They use Splunk to monitor failed login events and transactions and unauthorized devices on the network globally. The NAC rollout team has been able to stay in front of issues – identifying them before end users discover the problems. Their custom Splunk dashboards enable the team to easily see trends and spikes in activity across all networking components.

      Operations teams at New York Life have more recently been using Splunk to troubleshoot Application issues.

      An application issue across multiple servers created more than 9M events across 167 different sources. Manual investigation into this kind of problem would have taken days - an extremely complex and time consuming effort. Splunk found the issue in 3 minutes. Now teams can trace transactions across systems in minutes or seconds vs. hours or days. And all without any new instrumentation – just using the artifacts they already had.

      New York Life is discovering what many other Splunk users have too. Enterprise monitoring and service management platforms can tell you something is wrong but Splunk will help you figure out why and where to fix it.

      ]]>
      Nimish Doshi: 40 Days of 4.0: Enriching Data with DB Lookups (Part 2)http://blogs.splunk.com/nimish/?p=16http://blogs.splunk.com/nimish/?p=16Mon, 14 Sep 2009 20:28:48 +0000Nimish DoshiToday, I'm writing as a guest blogger for Bob Fox to create part 2 of enriching data with the Splunk lookup command. Bob had already created part 1, which describes in detail with an example how to use the lookup command to enrich data from external CSV files. Today's topic builds on the lookup command usage showing how to enrich indexed data at search time using an external database.

      To begin with, it is a fact of life that some event data or log data may not reside in files, may not be broadcast on network ports, or even be imported uniquely via a scripted input. This data may, for legacy reasons, reside in a database. The often cited use case in this scenario is that the user would like to correlate some data that resides in Splunk as indexed events with similar fields that reside in a database. Even if a scripted input can be used to uniquely capture the data within the database and have it indexed within Splunk, there exists the issue of having redundant data that has been indexed twice only for the sake of heterogeneous correlation (join between a field in indexed data within Splunk and a field within data located in the database). Some people may not desire to index data once within a database and again elsewhere via extraction methods that end up taking disk space in the secondary index.

      Examples of use cases where data resides within Splunk and related data resides within an external database are easily found. For instance, there may exist a security use case where an investigator is looking at events in Splunk and finds that a particular user has done something questionable. One thing the investigator may want to do is find the user's address location and phone number that resides in a relational database. Using search within Splunk to quickly get to this database data is useful. Another example could be that a proprietary system logs all its access data including IP source addresses that are being used within a relational database, while at the same time the company has firewall data being indexed within Splunk. A correlation between the two types of data within Splunk using the IP address as the common key should be possible. With these types of correlation in mind, I'll go over the steps for setting up an example and provide a link to download it.

      First, decide what field within indexed data within Splunk is going to be used to correlate and enrich data with an outside source. In my example, I have weather data and the field that I want to use is the city field within weather data. For purposes of illustration, my data looks like this:

      Jul 27 08:35:09...city=Nice...

      Splunk will automatically extract the city field with the value Nice at search time. What I'm interested in is finding the country location of a city using an external database for the correlation. Again, for simplicity, I'll use a terse database table.

      city        country
      Nice        France
      ...         ...
      Cambridge   UK

      Now, let's move on to the Splunk setup. You'll need to add an entry for the lookup to your props.conf configuration file just like in part 1. Mine looks like this:

      [weather_entry]
      lookup_table = countrylookup city OUTPUT country

      Next, you'll need to define what countrylookup does in your transforms.conf file. In this case, it will call an external Python program.

      [countrylookup]
      external_cmd = countrylookup.py city country
      external_type = python
      fields_list = city, country

      The external command that you write, countrylookup.py, should reside in the bin directory of your application. The city and country terms next to it in the configuration file are the input field name headers used to produce a dynamic CSV table that is sent to a Python CSV standard output writer. The Python program gets its city field input via standard CSV input from Splunk, calls SQL to find the corresponding country, and produces the aggregate CSV output that contains the city with its correlated country. The complete example with instructions on set up can be downloaded from Splunkbase. My example uses the MySQL database, but you are free to change the code to use whatever database you require as long as there exists in this case a Python module to access the database. The final touch is to show you what the Splunk search looks like to get the new country field for my example.

      sourcetype="weather_entry" city=Nice |xmlkv| lookup countrylookup city OUTPUT country

      This will return France in a new field called country. There are a few design considerations that need to be addressed before I conclude today's entry.

      • Use a database index in your DB on the field that is being used to correlate between Splunk and the database.
      • Have the Python program connect to a long running program (application server) that maintains a connection pool to the database to avoid having to reconnect to the database on each invocation of the lookup command.
      • If you know beforehand the number of uniquely matched events in the database will only be in the few hundreds, such as number of unique cities in my case, consider building an internal cache to avoid having to access the database for each corresponding select call. Splunk's iplocation command does this and the source code for iplocation.py is included in your download of Splunk.
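
The first and third considerations can be sketched together. This is only an illustration, not the code from the download: `sqlite3` stands in for MySQL, and the `cities` table, the index name and the class are names I made up. The class remembers each city it has already resolved, so repeated values never reach the database a second time.

```python
import sqlite3


class CachedCountryLookup:
    """Resolve a city's country, querying the database at most once per city."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.db_queries = 0  # counts real database round trips

    def country_for(self, city):
        if city not in self.cache:
            self.db_queries += 1
            row = self.conn.execute(
                "SELECT country FROM cities WHERE city = ?", (city,)
            ).fetchone()
            # Cache misses too, so unknown cities are not re-queried.
            self.cache[city] = row[0] if row else ""
        return self.cache[city]


def make_demo_db():
    """Build a small in-memory table with an index on the correlation field."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cities (city TEXT, country TEXT)")
    conn.execute("CREATE INDEX idx_cities_city ON cities (city)")
    conn.executemany(
        "INSERT INTO cities VALUES (?, ?)",
        [("Nice", "France"), ("Cambridge", "UK")],
    )
    return conn


if __name__ == "__main__":
    lookup = CachedCountryLookup(make_demo_db())
    for city in ["Nice", "Nice", "Cambridge", "Nice"]:
        print(city, lookup.country_for(city))
    print("database queries:", lookup.db_queries)
```

A plain dictionary is enough when the set of distinct keys is small and the table rarely changes. If the table changes often, the cache would need some expiry rule, which this sketch leaves out.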

      To wrap up, although we didn't discuss the user written Python program to do the lookup in detail, the sourcecode for it is part of the Splunkbase.com download to provide you with one example on how it can be written. The Splunk distribution also ships with external_lookup.py, which has a similar structure for taking CSV input from Splunk via standard input and producing CSV standard output.  I hope today's entry is useful for these types of use cases.
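
For readers who want a feel for that structure before downloading anything, here is a minimal sketch of the stdin-to-stdout CSV pattern. It is not the Splunkbase code. It assumes the input is a CSV table with `city` and `country` headers as described earlier, uses an in-memory dictionary where the real example queries MySQL, and is written for a current Python 3 interpreter, so it may need adjusting for the Python bundled with your Splunk version.

```python
import csv
import sys

# Stand-in for the database table; the downloadable example queries MySQL instead.
CITY_TO_COUNTRY = {"Nice": "France", "Cambridge": "UK"}


def enrich(infile, outfile, lookup=CITY_TO_COUNTRY):
    """Copy CSV rows from infile to outfile, filling in the country column."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(
        outfile, fieldnames=["city", "country"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        city = row.get("city") or ""
        # Keep whatever country value arrived if the city is unknown.
        country = lookup.get(city, row.get("country") or "")
        writer.writerow({"city": city, "country": country})


if __name__ == "__main__":
    enrich(sys.stdin, sys.stdout)
```

You can try it from a shell by piping in a small CSV file whose first line is `city,country` and whose following lines contain a city name and an empty country column.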

      ]]>
      Michael Wilde: Splunk Ninja - Fields of Dreamshttp://blogs.splunk.com/thewilde/2009/09/11/splunk-ninja-fields-of-dreams/http://blogs.splunk.com/thewilde/2009/09/11/splunk-ninja-fields-of-dreams/Fri, 11 Sep 2009 15:30:39 +0000Michael WildeI spend a great deal of time using, learning and demonstrating Splunk, and recently I had some questions from users on "what can I do with fields?", "how do i make them?", "how do I tweak them?". That inspired me to publish a new Splunk Ninja episode known as "Fields of Dreams".

      In this episode, Splunk Ninja gives an all out tour of "fields" in Splunk 4.0, how they work, how to use them, some tips and tricks as well.

      The ability of Splunk to handle multiple data formats all in a single search index and do "search time field extraction" is unique in the marketplace.

      Additionally, you'll see me take fields and use them to assemble a transaction with Cisco PIX firewall logs. I use the "| transaction" search command to link and calculate the duration of outbound TCP connections.

      Comments, suggestions, or new Splunk Ninja video ideas welcome!

      Note: In blogs, including this one and my site http://splunkninja.com, the "fullscreen toggle" buttons often don't work properly on embedded videos. I shoot all of mine in 1280×720 (720p) resolution. If you would like to go directly to the episode so you can watch it in fullscreen or even download it, go here:

      Splunk Ninja - Fields of Dreams


      ]]>
      Nimish Doshi: Using File Contents as Input for Searchhttp://blogs.splunk.com/nimish/?p=15http://blogs.splunk.com/nimish/?p=15Fri, 28 Aug 2009 18:55:00 +0000Nimish DoshiI've been asked a few times how best to search for events that may contain many different discrete values for a field. It's essentially using an OR (disjunctive search) in the search language. For example, you can do this:

      sourcetype=my_sourcetype (planet=mars OR planet=earth OR planet=saturn)

      This works fine when you only have a handful of planets, but what happens if the field's possible search criteria change daily and may contain hundreds of possible values that you would like to input for the search? Certainly, using OR terms with over a hundred entries sounds impractical. A solution is to keep all the values for the disjunctive search in an external file and have the search language use that file as input to the search criteria. With Splunk 4.0, one way to do this out of the box is with the new lookup command. For an introduction to this command, please consult Bob Fox's blog entry discussing example usage. For now, I will assume you have basic knowledge of its usage and I will list a possible solution for using OR with many possible values for a field.
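To see what the lookup approach replaces, here is a small sketch that expands a list of values into the explicit OR form shown above. The `build_or_search` helper is hypothetical and is not part of Splunk; it only demonstrates how quickly the search string grows with the number of values.

```python
from typing import Iterable


def build_or_search(sourcetype: str, field: str, values: Iterable[str]) -> str:
    """Join the given values into a disjunctive search string."""
    terms = [f"{field}={v.strip()}" for v in values if v.strip()]
    if not terms:
        # No values: fall back to a plain sourcetype search.
        return f"sourcetype={sourcetype}"
    return f"sourcetype={sourcetype} ({' OR '.join(terms)})"
```

With three planets this reproduces the search above; with a few hundred values the result becomes a search string nobody wants to maintain by hand, which is the motivation for the file-based approach.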

      First, use field extraction to extract the field in question. For our example I'll use an IP address field. Next, create a CSV file in your SPLUNK_HOME/etc/apps/<app_name>/lookups/ directory. I created iptable.csv with the following sample content to be used for input.

      ip, myip
      192.168.1.105, 192.168.1.105
      10.10.10.2, 10.10.10.2
      192.168.1.10, 192.168.1.10

      Since I'm not interested in creating a real mapping from one field (ip) to another (myip), I used the same value in both columns to conform to the syntactical usage of the lookup command. Now, in your SPLUNK_HOME/etc/apps/<app_name>/local directory you'll need to create or modify two files. First, edit transforms.conf.

      [search_ip]
      filename = iptable.csv

      Second, edit props.conf and use your sourcetype to start the stanza. I am using mail as my sourcetype.

      [mail]
      lookup_ip = search_ip ip OUTPUT myip

      Now, from your browser, log into Splunk and reload the props.conf and transforms.conf files to pick up your new additions:

      sourcetype=mail | extract reload=true

      You are now ready to use your file as input to search for all events that contain ip addresses that were in your CSV file. One possible search is:

      sourcetype=mail | lookup search_ip ip OUTPUT myip | search myip=*

      The last search command will find all events that contain the given values of myip from the file. In essence, this last step will do your disjunctive search for you without having to type in a long sequence of OR terms. Finally, if your requirement is that you want to search on the top N (N is an integer) values for a field each day, Splunk can help you create the CSV input file. Simply run the following search assuming you want the top 100 values for IP in our example:

      sourcetype=mail | top limit=100 ip | fields + ip

      You can then copy and paste the values into your CSV file. In short, today's blog entry gave you one possible way to use the content of a file as input for your disjunctive search. There may be more approaches and you are welcome to discuss them in the comments.
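If copying and pasting gets tedious, the two-column file can also be generated. The following is a sketch only: `make_lookup_csv` is a hypothetical helper, and it assumes you already have the values as a list of strings (for example, exported from the search above).

```python
import csv
import io
from typing import Iterable


def make_lookup_csv(values: Iterable[str], in_field: str = "ip",
                    out_field: str = "myip") -> str:
    """Return CSV text with each value repeated in both columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([in_field, out_field])
    seen = set()
    for value in values:
        value = value.strip()
        if value and value not in seen:   # skip blanks and duplicates
            seen.add(value)
            writer.writerow([value, value])
    return buf.getvalue()
```

Writing the returned text to iptable.csv in the lookups directory gives a file in the same shape as the hand-made one earlier in this entry.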

      ]]>
      Michael Baum: Splunk 4 Down Underhttp://blogs.splunk.com/thebaum/2009/08/27/splunk-4-down-under/http://blogs.splunk.com/thebaum/2009/08/27/splunk-4-down-under/Fri, 28 Aug 2009 05:54:00 +0000Michael BaumI visited Sydney and Melbourne last week to host our first Splunk Live events in Australia. It's my first visit to Australia and I'm really blown away by the friendliness of the people we've met. And the "Australian for Grep" t-shirt finally had a proper home. Attendees at today's event in Melbourne and Tuesday's event in Sydney included an impressive list of current customers and partners and a number of new users evaluating Splunk for the first time, including Telstra, Ericsson, InfoSys, Frontline Systems, Fujitsu, GE Capital Finance, Toll Holdings, Vanguard Investments and more. We owe a huge thanks to the team from Digital Networks Australia who sponsored the two events.

      Martin Brown, A Large Australian Financial Services Company

      In Sydney, Martin Brown, pictured below with me, gave an excellent presentation on using Splunk for Identity Management Compliance. Martin is a Technical Architect managing the development and operations of the world wide web application security system for a major financial institution. He's had many career evolutions, from implantable device electronics and software engineering to UNIX and network systems administration, internet systems management and security.

      Martin's company has a requirement for presenting client security history from their web applications and to be able to access this information to look for suspect IDs from the past six months. Tivoli Access Manager (TAM) is used for both external and internal identity management and access control. More than 200,000 clients authenticate externally through TAM.

      His Splunk deployment is very much out of the box with a range of saved searches and some role partitioning. It consists of a single Splunk server with 1TByte of local disk for retention. The TAM logs are rsynced regularly and directly mounted from various hosts and systems. 12 internal and 12 external TAM hosts generate 5 GB/day of data or ~2TB of data a year.

      The current user base consists of business second level support teams and the TAM support group for third level support. The user base is expected to extend to the Risk Management Group and first level help desk support soon. Their classic use case is:

      “Client X's account has been compromised. What applications has he/she logged in to in the past 6 months?”

      The old way required days or weeks of work and support from multiple teams. They often needed to pull in log files from offsite backup tapes, then grep through GBytes of data from several hosts. Fun fun. Now with Splunk, Martin's team finds answers in minutes and soon will train Tier 1 agents to do the same, eliminating the hassle of Martin's team fetching data for everyone. Next he plans to add App server, Web Server and Load Balancer data, role partitioning to restrict business user access to relevant logs, off-shore implementations to present local application logs, and API consumption for a helpdesk one-stop-shop interface.

      Nick Clark, Ericsson

      Nick Clark is a Technology Manager in the Solution Management & Utilities Consulting, System Integration & Multimedia practice with Ericsson where the focus is on bespoke support and life cycle management services for complex infrastructures. His group focuses on mobile and fixed network infrastructure, telecom services, software, broadband and multimedia solutions for operators, enterprises and the media industry. He presented his Splunk solution which Ericsson implemented at Telstra in the mobile multimedia services area to troubleshoot problems and investigate incidents. The solution was initially implemented to provide coverage of the 2008 Beijing Olympics. Telstra predicted massive interest for mobile streaming yet demand exceeded all expectations. Splunk helped Ericsson and Telstra quickly pinpoint, manage and address problems. Because application failures and limits were discovered before they caused serious downtime, Telstra maintained an uptime above 99.9% during the Olympic Games.

      Telstra manages more than 10M users and 50 plus content providers on the Telstra Service Delivery Platform providing multiple mobile portals, content transformation, mobile streaming services and device specific rendering and UI over 2G and 3G networks. The environment consists of 60+ servers (Solaris 9/10, Windows 2003) and many platforms and technologies providing service orchestration, rich media content management, encoding and streaming for terabytes of active content.

      Ericsson and Telstra's challenges before Splunk were numerous, including:

      • no central view of logs and events resulting in difficult to troubleshoot problems,
      • support and operations diverted to log fetching and ad-hoc reporting delaying work on high priority projects,
      • no consistent approach to log handling and storage making it difficult to locate, access and archive logs and
      • poor visibility of service and transaction flows extending outages.

      The Ericsson team chose Splunk to help Telstra gain a holistic view of the environment, troubleshoot outages more quickly, provide users with ad-hoc reporting and control access to logs by role. They are currently indexing roughly 20GB per day on a dual processor, dual core Xeon GHz server with 16GB of RAM. 30 support people (tier 1 and up) currently Splunk application, server and network logs and events to troubleshoot problems. The team makes extensive use of Splunk tagging to create alerts for future notification of problems reoccurring. Perhaps the most valuable thing Ericsson has done with Splunk is track end to end transactions on the Service Delivery Platform. With one view across all services and transactions to track activities, the team can finally provide transaction level alerting and reporting.

      Thank you again to Nick and Martin for presenting so well and Monsour, Martin and Sky with DNA who did a fantastic job and are representing Splunk very well down under.

      ]]>
      Amrit Bath: API 4TWhttp://blogs.splunk.com/amrit/2009/08/27/api-4tw/http://blogs.splunk.com/amrit/2009/08/27/api-4tw/Thu, 27 Aug 2009 19:03:41 +0000Amrit BathOk, here's a real blog post to make up for that last one. You may have heard that one of the major features of Splunk 4.0 is a brand new REST API. This is the interface that both the CLI and the web UI use to manage Splunk inputs, retrieve splunkd status, perform searches, etc. You, too, can use this API for doing all sorts of good or evil - read on.

      Explore a bit...

      Exploring it is easy - point Firefox at your local Splunk instance's management port. For example, https://localhost:8089/services is the default. Adjust https vs http as necessary, as well as the port. Note that this is the management port, not the web interface port (which is 8000 by default).

      In a decent browser (my favorite, Konqueror, doesn't seem to cut it :( ), you'll see a list of links, with smaller links beneath each. This is just a user-friendly rendering of our Atom XML feed. View the raw XML by right clicking and choosing View Source, if you wish.

      You can use this set of links to inspect the state of your running splunkd. Drilling down into data/inputs/monitor, for example, displays all Monitor inputs that Splunk knows about. If one of these is a directory, clicking the members link below it will display all the files being monitored within that dir. Note that not all of the links will work by simply clicking on them. The remove action, for example, requires an HTTP DELETE action, whereas edit requires an HTTP POST containing the parameters you'd like to change.

      But APIs are serious business...

      Agreed, you're not going to use the browser with the API for anything more than playing around (although the Poster extension for Firefox is quite useful...).

      If you're familiar with HTTP/REST, choose your favorite library and run with it. Start by making a POST to /services/auth/login with the parameters username=<username> and password=<password>. You'll get a response like the following:

        <response>
        <sessionKey>a48fe44eb76ecf08674954e47c403f24</sessionKey>
        </response>

      Then, simply include this session key in the HTTP headers for any requests you make to the API:

        Authorization: Splunk a48fe44eb76ecf08674954e47c403f24
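If Python's standard library is your "favorite library", the pieces around that login call might look like the sketch below. It only builds the request body, parses a response of the shape shown above, and constructs the header; the helper names are made up for this example, and actually sending the request (for instance with urllib.request against your own management port) is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def login_body(username: str, password: str) -> bytes:
    """Form-encode the parameters for a POST to /services/auth/login."""
    return urlencode({"username": username, "password": password}).encode("ascii")


def parse_session_key(response_xml: str) -> str:
    """Pull the sessionKey text out of the login response."""
    root = ET.fromstring(response_xml)
    key = root.findtext("sessionKey")
    if not key:
        raise ValueError("no sessionKey in response")
    return key.strip()


def auth_header(session_key: str) -> dict:
    """Build the header to send with every later API request."""
    return {"Authorization": f"Splunk {session_key}"}
```

Keeping the parsing and header construction in small functions makes it easy to check them against a saved response before pointing anything at a live splunkd.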

      And if I'm lazy?

      Don't worry, I'm lazy too. Splunk includes a handy little tool that lets you easily make calls to the Splunk API. For example:

        splunk _internal call /data/inputs/monitor -auth admin:changeme

      Will perform an HTTP GET on https://localhost:8089/services/data/inputs/monitor. Since this is a Splunk utility, it will read your config files and automatically enable/disable SSL on the request, as well as change the destination port as necessary. You can also use -uri to point the request to other servers. :)

      The tool allows for POSTs and other HTTP actions, but more on that in my next post...

      Enough shenanigans, I want a real example.

      Sure.. how about this thing?

      A KDE 4 desktop widget monitoring a handful of boxes around the office.  1 outta 5 ain't so bad, is it?

      This is a Plasmoid for the KDE 4 desktop environment. It's written in C++ using the cross-platform Qt toolkit and KDE's Plasma library.

      The entire code will be linked further down this post, but the most important parts are the HTTP request, and the XML parsing.

      We first make a request (using our handy CLI tool, because it's easy) to our REST endpoint for messages, where highly important notices end up:

        // build args.
        QStringList args;
        args << "_internal"
             << "call"
             << "/admin/messages"
             << "-auth" << userPass; // this is OK even in the free version.
        if (!uri.isEmpty())
          args << "-uri" << (QString("http") + (info.useSSL ? "s" : "") + "://" + uri);
      

      (Note that the password is sent as a command line argument - not the most secure thing to do on a multi-user system. Luckily, this is just a tech demo.)

      When the process completes, we check the return code, and then use an XPath query to parse any messages out of the XML returned on stdout:

        // build xpath query with splunk's namespace info.
        QXmlQuery query;
        query.bindVariable("data", &xmlData);
        query.setQuery("declare namespace a='http://www.w3.org/2005/Atom';"
                       "declare namespace s='http://dev.splunk.com/ns/rest';"
                       // choose only the s:key nodes that match their entry node's title.
                       "doc($data)/a:feed/a:entry/a:content/s:dict/string(s:key[(../../../a:title = @name)])");
      
        if (!query.evaluateTo(&messages))
          messages << "Parsing of status failed.";
      

      ...and throw it up on the screen. But you'll have to read the code to find that part. :P
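For those who don't speak XPath, roughly the same selection can be sketched in Python with ElementTree: for each Atom entry, keep the s:key elements whose name attribute equals the entry's title. The sample document below is made up to match the structure the query implies; it is not real splunkd output, and this is not the plasmoid's code.

```python
import xml.etree.ElementTree as ET
from typing import List

# Same namespaces the XPath query declares.
NS = {
    "a": "http://www.w3.org/2005/Atom",
    "s": "http://dev.splunk.com/ns/rest",
}

# Hypothetical feed, shaped after the path /a:feed/a:entry/a:content/s:dict/s:key.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:s="http://dev.splunk.com/ns/rest">
  <entry>
    <title>disk_low</title>
    <content type="text/xml">
      <s:dict>
        <s:key name="disk_low">Disk space is running low.</s:key>
        <s:key name="other">ignored</s:key>
      </s:dict>
    </content>
  </entry>
</feed>"""


def extract_messages(atom_xml: str) -> List[str]:
    """Collect the text of s:key nodes named after their entry's title."""
    root = ET.fromstring(atom_xml)
    messages = []
    for entry in root.findall("a:entry", NS):
        title = entry.findtext("a:title", default="", namespaces=NS)
        for key in entry.findall("a:content/s:dict/s:key", NS):
            if key.get("name") == title:
                messages.append("".join(key.itertext()))
    return messages
```

One small difference from the XPath version: entries with no matching key simply contribute nothing here, rather than an empty string.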

      I wanna try it!

      The source code is here, give it a shot.

      Installation instructions:

      • tar -jxvf splunk_status*.tar.bz2
      • cd splunk_status-version
      • cmake . (don't miss that dot!)
      • make
      • At this point you can try 'make install', but on my system I had to manually copy things to the right locations: cp lib/splunk_status.so /usr/lib/kde4/splunk_status.so and cp splunk_status.desktop /usr/share/kde4/services/
      • Rebuild KDE's cache: kbuildsycoca4
      • Restart the Plasma workspace: kquitapp plasma && sleep 1 && plasma

        Before you enable it in KDE4, you need to create a small config file by hand. It will look something like the following.

        ~/.kde/share/config/splunkstatusrc:

          [settings]
          cmdPath = /opt/splunk/bin/splunk
        
          [servers]
          localhost:8089 = admin,changeme,ssl
          amritdesktop:8089 = admin,changeme,nossl
          tiny:1236 = admin,changeme
          spacecake:57089 = admin,changeme,ssl
          10.1.1.50:9089 = admin,changeme,ssl
        

        The settings/cmdPath variable is required, as is at least one entry under servers. The latter is formatted as host:port = username,password,(ssl|nossl). Remember that the port here is your management port, not your web interface port. The SSL specification is optional, and defaults to ssl. Be sure you get that one right as well (SSL is enabled on default Splunk installs).
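To make the entry format concrete, here is a sketch of a parser for a single line under [servers]. It is written in Python purely for illustration and is not how the plasmoid itself reads its config; the `Server` type and `parse_server_line` function are hypothetical names.

```python
from typing import NamedTuple


class Server(NamedTuple):
    host: str
    port: int
    username: str
    password: str
    use_ssl: bool


def parse_server_line(line: str) -> Server:
    """Parse 'host:port = username,password[,ssl|nossl]'."""
    address, _, creds = line.partition("=")
    host, _, port = address.strip().rpartition(":")
    parts = [p.strip() for p in creds.split(",")]
    if not host or not port.isdigit() or len(parts) not in (2, 3):
        raise ValueError(f"malformed server entry: {line!r}")
    mode = parts[2] if len(parts) == 3 else "ssl"   # ssl is the stated default
    if mode not in ("ssl", "nossl"):
        raise ValueError(f"unknown SSL flag: {mode!r}")
    return Server(host, int(port), parts[0], parts[1], mode == "ssl")
```

Note that this simple split would break on a password containing a comma, which is one more reason to treat the whole thing as a tech demo.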

        Now...

        Is anyone actually gonna try this thing? :)

        ]]>Amrit Bath: Reload 4 Authhttp://blogs.splunk.com/amrit/2009/08/20/reload-4-auth/http://blogs.splunk.com/amrit/2009/08/20/reload-4-auth/Thu, 20 Aug 2009 19:54:40 +0000Amrit BathThis will be a very brief post, to fulfill my obligations. I'll share something a little more informative, perhaps even more interesting, in an upcoming post (soon... I promise (kinda) this time).

        As of Splunk 4.0, our old somewhat-of-an-API has been replaced with an entirely new REST API, invalidating my old post on reloading authentication from the command line.

        Sooo..... in 4.x, you can restart the authentication system with the following command:

        $ splunk _internal call /authentication/providers/services/_reload -auth admin:changeme

        Any errors should be obvious in the resulting XML. As of 4.0.3, you'll also get a non-zero return code upon receiving errors from the API. And I've filed a bug (just now) to expose this as a real CLI command, so soon this post will no longer be very important. :)


        Happy now, Simeon?

        ]]>
        Michael Baum: Splunk 4 Lands in the Southwesthttp://blogs.splunk.com/thebaum/2009/08/19/splunk-4-lands-in-the-southwest/http://blogs.splunk.com/thebaum/2009/08/19/splunk-4-lands-in-the-southwest/Thu, 20 Aug 2009 03:29:40 +0000Michael BaumLast week we continued our road show launching Splunk 4 through the Southwestern US in Phoenix, San Diego and Los Angeles. This was our second annual gathering of customers, partners and users and we had more than double the attendees at this year’s Splunk Live events. In the morning we held a three-hour hands-on technical workshop. Attendees had the opportunity to install and configure Splunk 4 on their laptops or remote server and get one-on-one assistance from the Splunk team. Afternoon sessions and dinner focused on customer presentations. We’re very grateful to all the presenters who took time out of their busy days to share with everyone how Splunk is transforming their IT environments. I captured some notes from the week and thought I'd share them with you.

        Early Warning

        In Phoenix we had a packed house at the Sanctuary conference center on the side of Camelback Mountain. At 109 degrees I decided against hiking up it in the early AM. Dave Bridgeman, Data Security Engineer at Early Warning, kept things cool showing the audience how his company uses Splunk in their security operations center. Early Warning collaborates with major financial services companies to facilitate fraud detection through shared information and knowledge in cross-institution environments. The company has an interesting history, having spun out of First Data, and is now primarily owned by Bank of America, BB&T, JPMorgan Chase and Wells Fargo.

        Dave is a well-rounded IT professional who started as a developer then moved into network and security management. He currently leads the data security team for Early Warning. The environment he oversees includes a variety of platforms including AS400s, MP300s, AIX, Solaris, Linux and Windows. He uses a combination of Splunk forwarders and syslog forwarders to collect Java and Cobol application logs and FTP/SFTP networking logs.

        The Early Warning Splunk installation is designed to track transactions and users from one bank to the next in cross-institution activities. Transaction ID tracing correlates events across applications and services, and Splunk alerts the team when jobs fail so the operations and development teams can securely troubleshoot issues on the fly. And remote accessibility means no more driving into the office to access locked down servers in the middle of the night. On the security side of things, Splunk helps Dave’s team track and monitor known fraudsters and bad user names, allowing them to stay vigilant when monitoring external attacks. They also use Splunk to deliver reports for customers, executive committee members and the Security Advisory Committee (with representatives from the founding banks).

        Amkor

        Henry Grant of Amkor, a $2.1B provider of packaging/assembly and testing services for the semiconductor industry, also presented an overview of how his Corporate Data Center team uses Splunk. Henry oversees operations for the company’s SAP, PLM, Supply Chain, Hyperion and Oracle systems. Amkor has a heterogeneous environment of Sun Solaris, IBM iSeries, Cisco ASA firewalls, packaged and custom web and J2EE applications and TACACS/RADIUS accounting and access control technologies. With manufacturing locations in China, Japan, Korea, Taiwan, Singapore and The Philippines and headquarters in Chandler, AZ, the Amkor team is challenged with log and event data overload. GBs of data a day generated at multiple points make operational troubleshooting and security investigations extremely complex.

        SOX Compliance

        Proving SOX compliance has traditionally been handled by writing and maintaining scripts to collect and report on errors, access controls and log access activities. It was impossible to segregate duties given the lack of access control to the logs and events themselves. Splunk has taken the place of the awkward script writing and maintenance, collecting iSeries, Unix and application events and logs and providing automated scheduled reports. The team is now expanding the Splunk footprint to handle network and Oracle logs as well.

        Application and System Monitoring

        Like most enterprise IT shops, Amkor has figured out that traditional point monitoring tools aren’t enough as they have a hard time scaling to all the modern day technologies, require intrusive agents and only work for known events but don’t handle anomalies and unknowns. Too many issues end up being reported by end users themselves rather than the monitoring systems. With Splunk Henry’s team detects event anomalies in real time and has dramatically cut their response time by hours per incident.

        Tools for the Help Desk

        Sometimes it’s the simple things that can cut your response time, escalations and IT budget. The Amkor team noticed a lot of calls and emails regarding VPN set-up and access across the company. With Splunk, level 1 help desk agents are now able to resolve most of the VPN issues without creating an escalation. Henry’s team built a VPN dashboard driven by a series of searches and reports that gives entry level help desk personnel the insight they need to troubleshoot problems right away.

        Henry’s Splunk Tips

        The best part of Henry’s overview was the tips for a successful Splunk implementation. I’ve included the list here in the hope that these may help you as well.

        • Provide training that caters to each group’s need.
        • Utilize the deployment Server.
        • Develop a Common Information Model.
        • Update and change as needed.
        • Use Tagging to Normalize Data.
        • Monitor Scheduled Compliance Reports by using the Audit Logs.
        • Splunk into your processes where possible.
        • Set up a Test/Dev Environment and a Test/Dev Index.

        Intuit Consumer Group

        The Intuit team of Jeff Ludwig, Chief Architect, and Larry Raab, Architect of the Consumer Group, joined us to share how they use Splunk in production support operations. Jeff leads the Consumer Group’s Connected Services Development for electronic and print tax and payroll filings for TurboTax, ProSeries, Lacerte and QuickBooks. Larry is a large-scale, highly available application and systems architect responsible for the consumer group applications and infrastructure.

        While the original use for Splunk at Intuit was application management, Jeff and Larry covered three additional ways they have applied Splunk including reliable monitoring, improving user experience and large-scale reporting for compliance and business intelligence.

        Application Management

        Intuit’s Consumer Group problem is very common: several services, dozens of machines per service, dozens of log files per machine. Tracking down error logs took hours and correlation across logs and services was nearly impossible. With Splunk the team finds answers in minutes, keeps developers off of production machines and can now correlate across the entire organization and environment – something that is providing them with incredible new insights.

        Reliable Monitoring

        Jeff and Larry summed up their legacy monitoring systems in this way, “Monitoring tells us WHAT, but Splunk tells us WHY.”

        The Intuit Consumer Group team uses lots of other monitoring and alerting tools for networking, servers and applications, but Splunk tends to be more reliable and is the most powerful in terms of features and speed. But the biggest advantage Jeff and Larry see to integrating Splunk with their current monitoring systems is that they can create ad-hoc alerts with Splunk – getting smarter about their environment on the fly.

        Improving User Experience

        For Intuit’s Consumer Group, when it comes to tax and payroll offerings every transaction completion is critical. But, each transaction goes through several services and many different technologies. Splunk consolidates disparate pieces of the transaction environment so the team knows when something goes wrong and how to fix it.

        As Jeff points out, “with Splunk we get more intelligent about our users’ behavior so we can offer them a smarter and better experience.”

        Large-Scale Reporting

        Consolidating the Intuit Consumer Group’s messages, events and logs has finally made reporting easier and faster for:

        • Internal data and security audits.
        • Financial audits.
        • Operational metrics and statistics to plan future deployments and developments.

        Thursday we headed up the 405 from San Diego to LA for the last of our Southwest tour. The W Hotel in Westwood was once again the location for our second annual Splunk Live LA. It was a lively scene around the hotel, which is just blocks from the Federal building where a police chase ended in a day of traffic snarls, with helicopters hovering noisily overhead.

        Fortunately we had Jon Hart, Manager of Production Engineering at Edmunds, and Jeremy Custenborder, Senior Performance Architect at MySpace, to share how they have deployed and are using Splunk.

        MySpace

        We were fortunate enough to have Jeremy Custenborder, a Splunk fan and Senior Performance Architect at MySpace drop by to share his experiences identifying and troubleshooting performance issues with Splunk. Jeremy is responsible for performance management across multiple datacenters and thousands of database, web, indicator, index and cache servers and switches, routers and load balancers for MySpace.com.

        Lots of MySpace friends generate gigabits of traffic at a time and Jeremy makes serious use of Splunk to keep on top of overall site performance.

        Jeremy says, "Unstructured data rocks!" I happen to agree with him. His advice is to get the data into Splunk, then figure out what to do with it.

        His Splunk installation includes four indexers per datacenter on a 1 GB network with Raid 1+0 volumes; four cold storage servers on a 1 GB network with Raid 6 volumes and two distributed search servers.

        Data gets into Splunk in a variety of ways: Unix servers use syslog, Windows servers use a custom MySpace agent, and .Net applications use a Splunk log4net appender that Jeremy wrote and has published for others to use as well. The Splunk log4net appender provides both UDP and TCP based transport with failure detection and dynamic configuration via DNS. Why didn't I think of that? DNS makes total sense for forwarder configuration.

        You can download the Splunk log4net appender. It is available for use under an MIT license.

        Today Jeremy has Splunk performing real-time alerting on error data and searching for patterns of suspicious behavior, and he uses data from Splunk to recreate errors in development environments. He plans to start building custom dashboards for development with data specific to each development team and is busy integrating the MySpace performance monitoring system with Splunk to get early detection of new trends and provide fast right-click investigation from the performance console.

        Edmunds.com

        Edmunds has been using Splunk for almost two years now, primarily in fraud and security operations. The company is an incredible resource for automotive consumers and enthusiasts. Jon is a self-professed Security Ninja and SysAdmin who enjoys racing cars and mountain bikes when he’s not Splunking security incidents. Data comes into Splunk via syslog, a custom agent for Windows event logs and .Net application data via a custom log4net appender Jeremy wrote and has published.

        Edmunds has more than a thousand devices and servers powering their business with many different logging mechanisms and locations.

        Like many enterprises they previously built their own log analysis tools but have replaced those efforts with Splunk. In Jon’s words, “we’ve got better things to be doing around here!”

        Edmunds’ Splunk environment consists of:

        • 11x 8-core, 64-bit, 16-32G RAM, 300G 15k RPM local disk, 2T NFS (3.4)
        • 6 indexers, 2 Splunkweb (1 corporate, 1 production)
        • ~60-70G/day, increasing to ~100+G/day soon
        • NFS, syslog-ng, Splunk forwarders
        • Apache, WebLogic, F5, Oracle, Web Crossing, a metric ton of syslog
        • 9 sources, 6 sourcetypes, ~1000 hosts
        • distribute search (Splunkweb, CLI) across all indexers
        • Centralized Splunk management FTW
        • 10 classes outside of per-machine classes
        • Multi-membership
        • LDAP + AD integration, per-group authorization and

        So what are Jon and Edmunds doing with this set-up?

        Real-time Alerting and Historical Trending

        Edmunds uses Splunk to monitor the good, the bad and the ugly. Good includes traffic trends, which are tracked and reported on to ensure revenue and analyze trends. Bad consists of port scans, aggressive spidering by search engines and other bots, and device failures. And ugly is of course anything that disrupts Edmunds’ revenue and makes IT look bad.

        Developers, engineers, admins, analysts and even managers have visibility into everything. For every application, there are easy Splunk forms for things like errors by environment, host or time, including cross-application views (think web tier <-> app tier correlation).

        For everything that logs data, Edmunds appends a few simple pieces of data that make everyone’s job a lot easier. I’ve never seen an organization so organized with their logs and events!

        • Environment (PROD, TEST, QA, DEV, etc)
        • Tier (App, Web, DB, Admin, etc) and
        • Normalized source name (“apache” instead of /var/log/httpd/…)

        Using this simple organization and a few Splunk search commands, Edmunds drives a series of daily and weekly trend reports, such as “Top X” error reports for the Web and Application tiers. These trends also keep an eye on the complete build process, monitoring error diffs between data and build numbers so that Edmunds can catch errors before production code rolls. Developers, not administrators, can now monitor and diagnose errors during the development process more effectively. Recently this type of diagnosis and trending has even been used to prioritize development tasks. For example, when someone complained that a particular feature didn’t work with a particular version of Microsoft Internet Explorer, the developer in charge used Splunk to become the voice of reason, discovering the issue impacted only 0.06% of traffic to Edmunds’ web sites.

        Security

        Edmunds has taken a similar approach to organizing their security logs and events, normalizing data from Cisco devices, Netscreens, Sourcefire and Access Control Systems. Normalized fields include src_ip, dst_ip, src_port, dst_port, and protocol. So searches like startdaysago=1 src_ip=1.2.3.4 dst_port=80 will work regardless of log format. Now Jon can easily answer the question of “Who done it?” Without a single source for all security data and cross-device correlation, this used to take a long time and was often impossible.
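
        To make the normalization idea concrete, here is a small Python sketch. Both log formats are invented for this example (they are not real Cisco, Netscreen or Sourcefire formats), and this is not how Splunk implements field extraction; it only shows why one query can work across formats once field names agree.

```python
import re

# Two made-up log formats standing in for different devices (assumptions).
PATTERNS = [
    re.compile(
        r"SRC=(?P<src_ip>\S+) DST=(?P<dst_ip>\S+) "
        r"SPT=(?P<src_port>\d+) DPT=(?P<dst_port>\d+) PROTO=(?P<protocol>\w+)"
    ),
    re.compile(
        r"(?P<protocol>\w+) (?P<src_ip>[\d.]+)/(?P<src_port>\d+) -> "
        r"(?P<dst_ip>[\d.]+)/(?P<dst_port>\d+)"
    ),
]


def normalize(line):
    """Return the normalized fields for a line, or None if no format matches."""
    for pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            fields = match.groupdict()
            fields["protocol"] = fields["protocol"].upper()
            return fields
    return None


def search(lines, **criteria):
    """Return the lines whose normalized fields equal every given criterion."""
    hits = []
    for line in lines:
        fields = normalize(line)
        if fields and all(fields.get(k) == str(v) for k, v in criteria.items()):
            hits.append(line)
    return hits


if __name__ == "__main__":
    logs = [
        "deny SRC=1.2.3.4 DST=10.0.0.5 SPT=40000 DPT=80 PROTO=TCP",
        "tcp 1.2.3.4/40001 -> 10.0.0.6/80",
        "tcp 5.6.7.8/1025 -> 10.0.0.6/443",
    ]
    for hit in search(logs, src_ip="1.2.3.4", dst_port=80):
        print(hit)
```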

        Before and After

        Jon offered this comparison as an example of life before and after Splunk. Edmunds makes heavy use of HTTP logs for all kinds of work. Recently an HTTP log from 6/5/2009 (7G compressed, 60G uncompressed, 115M events) was used with a goal to find the top 10 referrers generating 404 (not found) errors. Before Splunk he'd gzip/grep/awk/sort in about 7 minutes. With Splunk he can index, search and sort in a mere 58 seconds. Summary indexing in Splunk reduces that to 13 seconds. And this is all on Splunk 3.10. When Jon migrates to Splunk 4 he will be 5 to 10 times faster still.
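
        For readers who want to see what the "top 10 referrers generating 404s" question looks like outside Splunk, here is a minimal Python sketch. It assumes Apache combined log format, which the post does not confirm, and it is not Jon's actual pipeline.

```python
import re
from collections import Counter

# Matches: "request" status bytes "referrer" (combined log format is assumed).
LOG_RE = re.compile(r'"[^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"')


def top_404_referrers(lines, n=10):
    """Count referrers on lines with status 404 and return the n most common."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("referrer")] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [05/Jun/2009:10:00:00 -0700] "GET /a HTTP/1.1" 404 512 "http://example.com/x" "Mozilla"',
        '1.2.3.5 - - [05/Jun/2009:10:00:01 -0700] "GET /b HTTP/1.1" 200 900 "http://example.com/y" "Mozilla"',
    ]
    print(top_404_referrers(sample))
```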

        Summary indexing is a great way to calculate ongoing stats in Splunk and Edmunds makes use of it not just for referrers but for status, method, URI, and UserAgent. Then they combine summary indexes for status, method, URI and referrer across WebLogic, Oracle, Tomcat and Apache to baseline different types of transactions and monitor anomalies.

        The Bottom Line

        Even though Jon is highly technical, he has been incredibly effective at translating the benefits Splunk brings to Edmunds into business terms. He’s learned this is the only way IT gets to make new investments. He justified the purchase of Splunk by demonstrating it has drastically reduced MTTR for revenue-impacting incidents and helped ensure a steady flow of online ad revenue from the four Edmunds Web sites. But the IT and Security teams at Edmunds know there are a number of other advantages. The continuous improvement through automated error reporting and trending, elimination of the “log god” bottleneck, much more productive cross-team debugging and investigations, and being able to satisfy that “I wonder if . . .” curiosity in the everyday course of doing their jobs all make their jobs a lot easier.

        What’s Next at Edmunds?

        The Splunk deployment continues to move forward at Edmunds. On Jon’s list of improvements for the next several months are

        • Dedicated summary indexers.
        • Redundancy.
        • Longer retention periods.
        • Double indexing volume by 2010 (more RAM, more storage).
        • Windows event log.
        • Splunk 4.0 migration.
        ]]>
        Nimish Doshi: Indexing and Searching RSS feeds
        http://blogs.splunk.com/nimish/?p=13
        Wed, 19 Aug 2009 20:22:55 +0000

        Many companies produce RSS (Really Simple Syndication) feeds for their employees, partners, and customers. Moreover, these same companies consume RSS feeds from their suppliers, whether it be personal news information or more timely business data. RSS is a great way to digest this information, but after a certain period, it may not be possible to find it again. If information from an RSS feed were indexed into Splunk on a regular basis, say every 10 to 30 minutes, it could be searched at any time. To accomplish this, I've created a simple Splunk application, available on Splunkbase, to index some RSS metadata (date, title, link, and description). Simply download the application and install it into your $SPLUNK_HOME/etc/apps directory. Then, modify its inputs.conf file. For example:

        [script://./bin/rss_sports.sh]
        interval = 600
        sourcetype = rssfeed
        source = rss_sports
        disabled = false

        Next, create a script in the rss/bin directory that is called by the scripted input. A sample one has already been provided as follows:

        #!/bin/sh

        python $SPLUNK_HOME/etc/apps/Info/bin/rssfeed.py $SPLUNK_HOME/etc/apps/Info/bin/sports.txt

        The script calls an already written Python script, passing in one argument: a file that contains a list of RSS feeds to index. Restart Splunk and look for your rssfeed sourcetype. The RSS metadata has already been delimited by tag="value" for automatic field extraction. The provided Python script calls the open source feedparser library to do the parsing of each RSS feed supplied to it. Since this is all script based and re-entrant code, you can provide multiple scripts in inputs.conf, each eventually calling rssfeed.py with its own set of feeds to simultaneously index multiple sources.

        The next step is to search Splunk for information within a feed. Here's an example screenshot using Splunk 4.0.x.
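
        To show what the tag="value" output might look like, here is a minimal Python sketch. It is not the rssfeed.py shipped with the app: the function name is hypothetical, and it takes a plain dictionary rather than a feedparser entry so that it runs with the standard library alone.

```python
def format_entry(entry):
    """Render one feed entry as tag="value" pairs for automatic field extraction."""
    fields = ("date", "title", "link", "description")
    parts = []
    for name in fields:
        # Swap embedded double quotes so they cannot end the value early.
        value = str(entry.get(name, "")).replace('"', "'")
        parts.append(f'{name}="{value}"')
    return " ".join(parts)


if __name__ == "__main__":
    print(format_entry({
        "date": "Wed, 19 Aug 2009 20:22:55 +0000",
        "title": "Inflation outlook",
        "link": "http://example.com/story",
        "description": "A short summary.",
    }))
```

        Each printed line becomes one event, and a scripted input would capture it from standard output.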

        Splunk Web showing RSS Content

        As seen on the left, fields have automatically been extracted. You can even set up alert conditions, such as a search for:

        sourcetype="rssfeed" title="*inflation*"

        For this example, Splunk will provide an alert for any feed event that has inflation in its title. As you can see, this capability provides the Splunk user with a powerful way to create an information base on any subject for future search.

        ]]>
        Tina Phi: LDAP auth configuration tips
        http://blogs.splunk.com/tina/?p=3
        Thu, 13 Aug 2009 22:30:22 +0000

        Now that I've (hopefully) convinced you that ldapsearch is your friend, let's get down to the matter. How can you use that information to configure Splunk to authenticate against LDAP?

        The file used to configure LDAP authentication: authentication.conf

        If you have never attempted to configure LDAP auth before, then you won't have one of these files in your $SPLUNK_HOME/etc/system/local/. You can either create it by hand or use the UI (which creates the file for you).

        Here's a sample authentication.conf file that I will break down for you. (BTW, a lot of this explanation already exists in a file called $SPLUNK_HOME/etc/system/README/authentication.conf.spec):

        [authentication]
        * This does not change
        authType = LDAP
        *If you want LDAP, set it to LDAP. Other options are Splunk and Scripted.
        authSettings = myldapstrategy
        *the name of your LDAP strategy from

        [myldapstrategy]
        *This is the custom name you set for your LDAP configuration "strategy". Do not use any whitespace.
        SSLEnabled = 0
        *disabled by default - Make sure your LDAP server supports LDAPS if enabling this.
        bindDN = cn=Directory Manager
        *Bind account used to make requests to LDAP server. If binding to AD, you can use a valid email address, e.g. Gina.Lee@Splunkers.com.
        *If your LDAP server allows anonymous bind, you can leave this field blank.
        bindDNpassword = $hashed_password
        *Enter the password for the configured bindDN. The password gets hashed when you restart Splunk. Leave blank if binding anonymously.
        failsafeLogin = admin
        *Select an arbitrary, failsafe userid. I like to use 'admin' to keep things simple.
        failsafePassword = $hashed_password
        *Enter your desired password for your failsafe account above. Enter in plain text and it gets hashed on restart.
        groupBaseDN = ou=Groups,dc=splunksupport,dc=com;
        *This is the Base of your Groups in LDAP. You can also specify multiple bases. For example: ou=Management,ou=Groups,dc=Splunkers,dc=com;ou=Consultants,ou=Groups,dc=Splunkers,dc=com;
        groupBaseFilter =
        *This is optional. It can be very useful for narrowing down search results if you have a large Directory tree to recurse (and/or large entries being returned.)
        groupMappingAttribute = dn
        *By default, set this to 'dn'. I have very rarely seen this set to anything else.
        groupMemberAttribute = uniquemember
        *Typically, you have a list of members listed out within the group entry. This attribute is the one that stores the member's dn (usually.)
        groupNameAttribute = cn
        *This is the "pretty name" for your group. Usually, 'cn' but can be set to something else.
        host = SplunksupportLDAP
        *This is the hostname, FQDN or IP address of your LDAP Server.
        pageSize = 0
        *This tells the LDAP server how many entries to return "per page" of request. I set this to 0 because I use Sun LDAP in-house for testing - Sun LDAP does not support paging so it has to be disabled this way. If you're using AD, you can leave it at the default of 800.
        port = 389
        *The default non-SSL LDAP port is 389. LDAPS default port is 636. Yours could possibly be something else.
        realNameAttribute = name
        *This is the "pretty name" for your users. Other possible attributes you can use are displayName, cn.
        userBaseDN = ou=People,dc=splunksupport,dc=com;
        *This is the Base of your Users in LDAP. You can specify multiple UserBaseDNs. Example: ou=Tech Support,ou=People,dc=Splunkers,dc=com;ou=ITOPs,ou=People,dc=Splunkers,dc=com;
        userBaseFilter =
        *Like the groupBaseFilter, this can be very useful if you have to narrow down your search results. Here's an example that returns only those users who are a member of the SplunkAdmins group or the SplunkUsers group in LDAP: (|(memberOf=CN=SplunkAdmins)(memberOf=CN=SplunkUsers))
        userNameAttribute = uid
        *This will be the user's login id. In AD it's usually sAMAccountName.
        [roleMap]
        *Here's where you will map the LDAP group to Splunk Role. This must be done before users will be able to log in. The format is usually (Splunk Role) = (LDAP group CN)
        admin = Splunk Admin Users;
        power = Splunk Power Users;
        user = My&Group;Splunk Users;

        Once you've got LDAP auth configured, restart Splunk and, if you're lucky, you'll be able to login as an LDAP user. If not, you'll have to login as the failsafe user and figure out what went wrong.
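
        If you want to double-check which LDAP groups end up mapped to each Splunk role, here is a small Python sketch that reads [roleMap]-style lines. The function name is hypothetical and this is not how Splunk parses the file; it only assumes the "(Splunk Role) = (LDAP group CN)" layout with semicolon-separated groups shown above.

```python
def parse_role_map(lines):
    """Map each Splunk role to its list of LDAP group names."""
    role_map = {}
    for line in lines:
        line = line.strip()
        # Skip blanks, stanza headers and the '*' annotation lines used in this post.
        if not line or line.startswith("*") or line.startswith("["):
            continue
        role, _, groups = line.partition("=")
        role_map[role.strip()] = [g.strip() for g in groups.split(";") if g.strip()]
    return role_map


if __name__ == "__main__":
    sample = [
        "[roleMap]",
        "admin = Splunk Admin Users;",
        "power = Splunk Power Users;",
        "user = My&Group;Splunk Users;",
    ]
    for role, groups in parse_role_map(sample).items():
        print(role, "->", groups)
```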

        A couple more hints: Don't forget that handy-dandy tool called ldapsearch. Also, it is very helpful to have DEBUG logging enabled for 'authenticationManagerLDAP' when troubleshooting these LDAP issues. You can now enable DEBUG logging right in the UI, under System Logging, without having to restart Splunk.

        OK, if you MUST use an LDAP Browser then check out Apache Directory Studio which is free for OSX, Linux and Windows.
        Download, install and launch. Enter your LDAP hostname and go from there.

        If you're tired, frustrated, lost and have given up hope after reading this, feel free to contact Splunk Support and we'll get you moving forward.

        ]]>
        Eric Garner: Extract and Alias Field Names in Splunk 4.0 Now
        http://blogs.splunk.com/maverick/2009/08/13/extract-and-alias-field-names-in-splunk-40-now/
        Thu, 13 Aug 2009 12:38:45 +0000

        I've had this topic come up in several technical conversations lately, so I thought I would blog about it now.

        Situation: You have two different source types containing common key field values, but the actual name of the field itself is different within each of the source types.

        Question: How do you produce a report within Splunk that correlates all of these field values together under one normalized field name?

        Answer: Use the new FIELDALIAS and EXTRACT features included with Splunk 4.0 to normalize the field name at search-time.

        Example: Let's suppose you have two different types of call detail records, each containing a number that represents the total duration in seconds that someone is on a phone call.

        One CDR event looks like this:

        TELCOE,2.1,7e197787-655330a9-7a458301-70845177@12.13.20.20,,0,,H,,S,,sip:7622550@127.10.15.17:5050, sip:5558889999@120.10.20.20:55555,TELCO:Dallas,TX,0,sip:7622555@110.130.52.25:5050,NORTH:NORTH,200,0
        ,1,0,1,0,08/02/2009:05:03:21,08/02/2009:02:03:22,92,UNKNOWN,0,0

        and the other CDR record looks like this:

        TIME=20090802104826865|CHAN:332|SESSIONID:100102345|CALLDURATION:93|CALLINGNUM:5558431297|
        CALLEDNUM:5559903894|UNIQID:8948373827100002938847889873474893

        Now, let's take a look at the Splunk configuration files to index these source types and extract the call duration values out into fields.

        inputs.conf
        [monitor:///$SPLUNK_HOME/etc/apps/cdr/logs/CDR.txt]
        sourcetype= cdr_log

        [monitor:///$SPLUNK_HOME/etc/apps/cdr/logs/cdr2.txt]
        sourcetype= cdr2_log

        props.conf
        [cdr_log]
        EXTRACT-calldur = ^.*?:\d\d:\d\d:\d\d,(?<callDuration>\d+),\w+,\d+\.\d+\.\d+\.\d+,

        [cdr2_log]
        REPORT-cdr2 = cdr2-kvpairs

        transforms.conf
        [cdr2-kvpairs]
        DELIMS = "|", ":"

        Notice that the extraction of the call duration field in our example cdr_log sourcetype above uses the new EXTRACT option in props.conf to explicitly pull out and name the field we want. All we do is specify the regular expression to pattern-match the call duration number (in this case it's "92" in our event) and name it "callDuration".

        However, the extraction of the same type of field from cdr2_log uses the DELIMS option in transforms.conf and, therefore, the field name will be CALLDURATION in this case, which is in all caps. Doh!

        What we want, though, is a report that includes values from BOTH fields AND that also refers to them by some common normalized field name. Using the new FIELDALIAS option, we can accomplish this. All we need to do is simply add an extra option to our props.conf file, called FIELDALIAS, to alias the field name CALLDURATION as callDuration, like this:

        props.conf
        [cdr_log]
        EXTRACT-calldur = ^.*?:\d\d:\d\d:\d\d,(?<callDuration>\d+),\w+,\d+\.\d+\.\d+\.\d+,

        [cdr2_log]
        REPORT-cdr2 = cdr2-kvpairs
        FIELDALIAS-callduration = CALLDURATION AS callDuration

        Then we can perform the following search within the Splunk GUI, which refers to our normalized field name using the now common alias name, callDuration:

        sourcetype=cdr*_log | top callDuration limit=10 | sort - callDuration

        and produces our desired pie chart, which looks like this:

        So there you have it. By using the EXTRACT and FIELDALIAS features available with Splunk 4.0, you can now normalize and correlate your field values across source types quickly, easily, and effectively with as little effort as possible.
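
        For anyone who wants to experiment with the idea outside Splunk, here is a rough Python sketch of the two extractions plus the alias. It is not Splunk's implementation. The regular expression is shortened from EXTRACT-calldur (the trailing IP address part is dropped, because the sample event as printed above ends in "UNKNOWN,0,0"), and the pipe/colon splitting only approximates what DELIMS does.

```python
import re

# Shortened form of the EXTRACT-calldur pattern (an assumption, see above).
CDR_LOG_RE = re.compile(r"^.*?:\d\d:\d\d:\d\d,(?P<callDuration>\d+),")

# Same idea as FIELDALIAS-callduration = CALLDURATION AS callDuration.
ALIASES = {"CALLDURATION": "callDuration"}


def extract_cdr_log(event):
    """Pull callDuration out of a comma-separated cdr_log event."""
    match = CDR_LOG_RE.search(event)
    return match.groupdict() if match else {}


def extract_cdr2_log(event):
    """Split a cdr2_log event on '|' and ':' and then apply the alias."""
    fields = {}
    for pair in event.split("|"):
        key, sep, value = pair.partition(":")
        if sep:  # tokens without ':' (such as TIME=...) are skipped
            fields[key] = value
    for original, alias in ALIASES.items():
        if original in fields:
            fields[alias] = fields[original]
    return fields


if __name__ == "__main__":
    cdr = "NORTH:NORTH,200,0,1,0,1,0,08/02/2009:05:03:21,08/02/2009:02:03:22,92,UNKNOWN,0,0"
    cdr2 = "TIME=20090802104826865|CHAN:332|SESSIONID:100102345|CALLDURATION:93|CALLINGNUM:5558431297"
    print(extract_cdr_log(cdr)["callDuration"], extract_cdr2_log(cdr2)["callDuration"])
```

        Both events now expose the duration under the same name, which is what lets a single report cover both source types.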

        ...and any true Splunk user will tell you, that's what Splunk 4.0 is all about.

        BTW, if you haven't experienced it yet, you should download Splunk now and see for yourself why Splunk is the first thing that comes to your mind when you need real-time ad-hoc searching capabilities now to determine root cause, thwart those pesky security threats, and gain that visibility into all of your application, server, and network log files at once, from one central web interface.

        ]]>
        Tina Phi: Help! I can’t export more than 10,000 events!
        http://blogs.splunk.com/tina/?p=2
        Fri, 07 Aug 2009 23:28:33 +0000

        If you've ever tried exporting lots of events from the Splunk UI then you probably know that there's a hardcoded max of 10,000 lines. This is to prevent users from potentially crashing splunkd or python. With that in mind, you may come to view this restriction as a safety feature.

        In most cases, users should not need to export 10,000 lines of data. If you've got more than 10,000 lines, you should refine your search so that you have fewer (a lot fewer) than that. There are probably a few cases where there's a legitimate reason to export this many lines and more. If you feel the compelling need to be able to export more than 10,000 lines, here are a couple of workarounds:

        METHOD 1:

        Run the search in question and pipe it to csv:

        'sourcetype="samplesourcetype" SenderIP="192.168.0.12" | outputcsv myoutputfile.csv'

        This will create a file named "myoutputfile.csv" in $SPLUNK_HOME/var/run/splunk that contains the results of your search in csv format. If you've got access to the file system to grab the file, problem solved. But what if you have a user who doesn't have access to the file system and you don't want to have to do this for them?

        METHOD 2:

        As a Splunk user, you can output your search results to a csv file on the indexer and then input the data and scan through it at your rated limit. Sounds complicated until you see the example. But this does require a bit of extra user involvement.

        - Just like Method 1, you'll need to run the search and pipe it to a csv file:

        'sourcetype="samplesourcetype" SenderIP="192.168.0.12" | outputcsv myoutputfile.csv'

        - After your search completes, you'll need to manually export at your rated limit (10000 results):

        '| inputcsv start=0 max=10000 myoutputfile.csv'

        - Once it is finished running, select “Export results...” from the “Actions” pull down menu. Name a file to save to and be careful to include a unique number in the filename to prevent it from being overwritten on next run: e.g. - myoutputfile10000

        - Repeat the previous search with a modified start value:

        '| inputcsv start=10000 max=10000 myoutputfile.csv'

        - Once it is finished running, select “Export results...” from the “Actions” pull down menu. Name a file to save to and be careful to include a unique number in the filename to prevent it from being overwritten on next run: e.g. - myoutputfile20000

        - Run as many additional searches as needed until you have all your results exported.
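
        If you have many pages to get through, a few lines of Python can write out the list of searches for you. This is only a convenience sketch: the function name is hypothetical, and it simply builds the same inputcsv search strings shown above with a stepping start value.

```python
def paged_inputcsv_searches(total_results, filename, page_size=10000):
    """Return one inputcsv search string per page of page_size results."""
    return [
        f"| inputcsv start={start} max={page_size} {filename}"
        for start in range(0, total_results, page_size)
    ]


if __name__ == "__main__":
    for search in paged_inputcsv_searches(25000, "myoutputfile.csv"):
        print(search)
```

        For 25,000 results this prints three searches, starting at 0, 10000 and 20000.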

        ]]>
        Andrea Longo: 40 Days of 4.0: So you want to write an app
        http://blogs.splunk.com/andrea/2009/08/07/40-days-of-40-so-you-want-to-write-an-app/
        Fri, 07 Aug 2009 22:01:59 +0000

        With the previous setup, here's what I want for my app:

        • A dashboard with a couple pretty pictures and some top N lists
        • Saved searches for advanced users to explore further
        • It should work for all my users with whatever indexes they have access to

        I'm going to start with the sample_app template available in Manager and add what I want. Then I'll clean up the sample stuff I don't need. So the first step is to create a new app in Manager->Apps. Give it a name and an optional label and select "sample_app" as the template. I don't have any additional files to upload now, so I'll leave that alone. Save and I'm back to the list of installed apps.

        On the filesystem, a bunch of things just happened. The directory MyGreatApp was created, containing a complete app structure and sample files, enough to have a functioning app. These files are all based on simplified XML that hides much of the complexity of the underlying full XML format. This makes it easier to build views, but has limitations. (For more on this see the docs: Simple Dashboards)

        Some highlights:

        MyGreatApp/appserver/static contains the app's css and images. There are a bunch of basic images provided that can be used or replaced.

        MyGreatApp/default contains the conf files that make up my app. This includes app.conf, where the app name, author, description and version are set.

        MyGreatApp/default/data/ui/nav contains the XML files that define your menus. There is a default.xml ready to be customized.

        MyGreatApp/default/data/ui/views contains XML files that define your views. The several provided sample_*.xml files can be edited, or new ones created.

        If you switch to the new app (by exiting Manager back to where I was and using the App menu in the top right) it has dashboards and various kinds of searches already. If you save a search of your own while you are in this app, it will end up in the Searches->unclassified searches menu.

        You can create a new empty dashboard with Actions-> Create new dashboard and add saved searches to it. Simple drag and drop dashboards are a quick way to make pretty pictures. It's just a new view, and you can make one of your own in any app you have permission to. Like the saved searches, it will be private unless you set it to be accessible by others. If you don't like your app, delete its directory from the filesystem directly and restart.

        Next I'll get into where it is possible to make changes to these files directly and where you should use the UI.

        ]]>
        Brian Murphy: 40 Days Of 4.0: Distributed Search
        http://blogs.splunk.com/brian/2009/08/04/40-days-of-40-distributed-search/
        Wed, 05 Aug 2009 03:11:45 +0000

        In this post I will be talking about a feature of Splunk that got turbocharged for 4.0: Distributed Search.

        Splunk is a great tool when it's just running on a single system but distributed search has some great advantages.

        • Provides completely different views into the same data by having different apps on different systems.
        • Allows leveraging of a map reduce architecture to run complex queries.
        • Scales Splunk indexing linearly by simply adding more servers.

        Terminology Used:

        • Search Head : The splunk instance that the user logs into and distributes searches from.
        • Search Peer : A splunk instance that receives search requests from the search head.
        • $SPLUNK_HOME : The root of your splunk install, this environment variable will be automatically set if you source $SPLUNK_HOME/bin/setSplunkEnv on unix.

        Note that this post is written with *nix in mind, but it is applicable to Splunk on Windows as well.
        For a basic primer and a nice diagram you can check out http://www.splunk.com/base/Documentation/latest/Admin/Whatisdistributedsearch

        Setting up Distributed Search:

        Access to Distributed Search in 4.0 is based on a public private key architecture similar to ssh-keys. We provide secure access to a search peer by installing a public key on that search peer that corresponds to a private key on the search head.

        On the search head these keys can be found in $SPLUNK_HOME/etc/auth/distServerKeys: private.pem is the private key and trusted.pem is the public key. These keys are generated the first time you run Splunk, so if you haven't run Splunk yet you should start it now. If you are adding the search peers via the CLI or by editing the distsearch.conf file, you will need to install the public key from the head onto your peers. If you add them via the UI, you can skip this step by providing user credentials on the search peer, and Splunk will install the key for you.

        To install the public key on the search peer, you must create a subdirectory on the search peer in $SPLUNK_HOME/etc/auth/distServerKeys with the same name as the SPLUNK SERVER NAME of the search head (you can find this by running "splunk show servername" from the CLI or in Server Settings in the Manager UI). So if you are trying to distribute searches to a search peer named FOO from a search head named BAR, you must create a directory $SPLUNK_HOME/etc/auth/distServerKeys/BAR on FOO. Copy the trusted.pem from BAR into this new directory. Now you can add FOO as a peer on BAR by doing splunk add peer : in the cli on BAR.

        Note that because your keys are stored by Splunk server name, if you change this name on the search head you will need to update your search peers.
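
        The manual key installation described above can be sketched in a few lines of Python. This is an illustrative helper, not a Splunk tool: the function name is hypothetical, and it assumes you have already copied the search head's trusted.pem to somewhere the peer can read it.

```python
import shutil
from pathlib import Path


def install_search_head_key(trusted_pem, peer_splunk_home, search_head_name):
    """Copy a search head's public key into the peer's distServerKeys directory."""
    key_dir = (
        Path(peer_splunk_home) / "etc" / "auth" / "distServerKeys" / search_head_name
    )
    # The subdirectory must carry the search head's Splunk server name.
    key_dir.mkdir(parents=True, exist_ok=True)
    destination = key_dir / "trusted.pem"
    shutil.copyfile(trusted_pem, destination)
    return destination
```

        Called on FOO as install_search_head_key("/tmp/trusted.pem", "/opt/splunk", "BAR"), where both paths are example values, it would create $SPLUNK_HOME/etc/auth/distServerKeys/BAR/trusted.pem.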

        Using Distributed Search to provide different views of your data:

        One of the best features of distributed search in 4.0 is that the search on the search peer runs as if that search was running on the search head. This means that only those eventtypes, tags and search time extractions that are defined on the search head will be visible when you search. This means that you can create different search heads that each define a different way of looking at the data and you only have to give users access to the view they need to see.

        Due to the certificate nature of distributed search, you do NOT need to replicate your users across all your distributed peers. Any user that can log into your search head can search any search peer that has that head's public key installed.

        Map Reduce Distributed Search:

        Map reduce is an increasingly common buzzword these days. Basically, what it means for Splunk is that when you are doing a distributed report, as much work as possible will be offloaded to the search peers. So you can search and report over billions of events in a reasonable amount of time if you have your events distributed across many servers.

        No special knowledge is needed, as all of Splunk's internal reporting commands are already set up to work in this way! Neither the Splunk administrator nor the Splunk users ever need to think about it. If you have 10 search peers each with 100 million access log events, it will take about the same amount of time to report on the top IPs across all billion events as it would to perform the same search on just one of the search peers directly.
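
        To picture how a "top" report splits into a map step and a reduce step, here is a minimal Python sketch. It is a generic illustration of the pattern with hypothetical function names, not Splunk's internal implementation: each peer counts its own events, and the search head only has to merge the much smaller partial counts.

```python
from collections import Counter


def map_top(events):
    """Map step, run on each search peer: count values in the local events."""
    return Counter(events)


def reduce_top(partials, limit=10):
    """Reduce step, run on the search head: merge partial counts, keep the top."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total.most_common(limit)


if __name__ == "__main__":
    peer_events = [
        ["1.1.1.1", "2.2.2.2", "1.1.1.1"],
        ["1.1.1.1", "3.3.3.3"],
    ]
    print(reduce_top([map_top(events) for events in peer_events], limit=1))
```

        Because the map step runs in parallel on every peer, the total time is dominated by the slowest peer rather than by the sum of all events.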

        Index Scaling:

        I'm not going to get into distributing the indexing load across indexers in this post as I want to focus on search but there are some great docs available at : http://www.splunk.com/base/Documentation/latest/admin/Configureautomaticloadbalancing

        Debugging Distributed Errors:

        Also new to Splunk 4.0 is the enhanced UI management of peers, accessible from Manager -> Distributed Search -> Distributed Peers. Here you will see a list of all the peers you have access to as well as their statuses. Here is a breakdown of what each possible status means:

        • Up : The peer is functioning correctly and the search head can run searches against it.
        • Down : The peer is unreachable.
        • Not a splunk server : This peer does not appear to be a splunk 4.0 server, commonly the web port was entered instead of the splunkd management port.
        • Blacklisted : The peer is blacklisted in distsearch.conf.
        • Version Mismatch : This peer is running the free version of the product and so cannot participate in distributed search.
        • Certificate Mismatch : This peer does not have the search head's public key installed.
        • Duplicate License : This peer has the same license as another peer or the search head.
        • Duplicate Servername : This peer's servername collides with another peer or the search head.
        ]]>
        Michael Baum: Splunk Live London - Awesome
        http://blogs.splunk.com/thebaum/2009/08/01/splunk-live-london-awesum/
        Sat, 01 Aug 2009 18:20:24 +0000

        I'm finally getting my head above water after a tireless run up to and hectic week launching Splunk 4. The highlight of the launch for me was Splunk Live London. IMHO Splunk Live London 2009 was unrivaled as the most outstanding Splunk event yet.
        We came up with this idea of getting local customers together as a way to launch Splunk 2 in June 2007. Five of us Splunkers sprinted between eight different cities in two weeks to share what was new and encourage users to exchange stories of how searching their data centers was changing life for the better. It's an exhausting way to launch a new product, but it worked so well we've integrated Splunk Live events into the mainstream way we do business and interact with our community. I've long since lost count of the number of Splunk Lives we've conducted all over the world, including places like Cape Town, Johannesburg, Beijing, Tokyo, Singapore, Bangkok, Sao Paulo and, yes, once again in London.



        This year's London Splunk Live was really special. The event occurred during our launch of Splunk 4 and surpassed our expectations as the largest event we've ever held. More than 100 customers and users attended at the Cumberland Hotel and their swank conference facility, complete with a business canteen like breakfast experience, near Marble Arch in West London.

        But the dominant reason to attend any Splunk Live is the presentations and round tables with forward-thinking IT professionals who are using Splunk to transform the way they manage IT. This year we were very fortunate to have three Splunk customers who took time out of their busy schedules to come to London and share their experiences with us.

        Accenture - Alexander Strobl, Technical Consultant

        Alexander has been a visionary inside Accenture, bringing the power of IT Search to enterprise clients in Germany, where he works for Accenture as a Technical Consultant in the Data Center Technology and Operations team. Alexander is responsible for the analysis, design and rollout of Splunk. His most recent Splunk project was with a large worldwide services company with more than 50,000 employees on three continents operating mail order, distribution, e-commerce and over-the-counter retail trade. Accenture implemented Splunk to transform the management of several technologies including Linux, virtualization and large-scale storage systems.

        The project was part of an IT project to reduce the time to triage problems and improve quality of service. Challenges were:

        • no centralized access to logs and events,
        • critical IT data was stored on local file systems which were copied to central storage only once a day,
        • manual processes to locate errors,
        • no correlation between events on different services/servers and
        • development time was spent building workarounds rather than working on revenue generating applications.

        All of this resulted in complex and time-consuming analysis and, in the end, long MTTR.

        The Accenture Splunk installation is currently indexing ~50GB/day including custom application files and events from 10+ integrated business critical applications and services. There are two Splunk indexes; one for testing and one for production environments and the team has established interfaces between Splunk and several other legacy data center tools.

        Telenor - Henrik Strøm, Security Architect

        Telenor is Norway's largest ISP, Mobile Operator and Telco. It's one of the largest mobile operators in the world, with 160+ million customers, and was founded in 1855, 154 years ago. The company has 13,000 employees in Norway and 26,000 abroad. Telenor has been rolling Splunk out for centralized log collection and management, using Syslog to forward data where it is already in place and using Splunk as a forwarder for new systems and for systems with complex multi-line and/or XML structures that Syslog can't handle. Sources of data handled by Splunk include:

        • application logs (Web, Email, IPTV)
        • data center logs (server, network, storage and firewall)
        • IP backbone logs

        Use cases include what Henrik refers to as digging, dashboards, baselines, alerting and reporting. One of the best "digging" examples Henrik mentioned was identifying Unix Kernel Errors over the last 30 days. This kind of information routinely went unnoticed prior to Splunk's arrival.

        Another powerful use case explained by Henrik was how to baseline what is normal in your environment. For example, how many errors do you have on average for a particular type of device (routers, servers, specific applications, etc)? Splunk was used to baseline normal Linux kernel behavior and found roughly 20 kernel errors per running Linux instance every 15 minutes.

        The base line then allows the team to schedule simple searches to look for deviation from the baseline and send out alerts before downtime occurs from these hidden sways in behavior. In one case Splunk found thousands of errors occurring on a specific type of device, where the normal baseline was around 20!
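        The baseline check described here can be sketched in a few lines of Python. This is only an illustration, not Telenor's implementation: the function name, the multiplier of 5 and the per-device counts are all made up, and only the figure of roughly 20 errors per 15-minute window comes from the talk.

```python
# Hypothetical sketch: flag devices whose error count deviates from a baseline.
BASELINE_ERRORS_PER_WINDOW = 20  # roughly 20 kernel errors per 15 minutes, per the talk


def find_deviations(counts, baseline=BASELINE_ERRORS_PER_WINDOW, factor=5.0):
    """Return {name: count} for entries whose count exceeds baseline * factor.

    The factor is an arbitrary illustrative threshold, not a recommended value.
    """
    threshold = baseline * factor
    return {name: count for name, count in counts.items() if count > threshold}


# Made-up error counts for one 15-minute window.
window_counts = {"linux-a": 18, "linux-b": 23, "router-x": 4200}
alerts = find_deviations(window_counts)
print(alerts)
```

        In practice the counts would come from a scheduled search rather than a hard-coded dictionary, and the threshold would be tuned to how noisy each device type normally is.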

        The Telenor team also uses Splunk to identify and report on security situations that may impact their customer-facing network and services. Because they can easily compose dashboards that show, for example, which Web servers are under attack and who is attacking them, all in one place, the team saves Telenor from potential downtime, performance degradation or theft of data due to attacks they've not seen before and that existing security policies and technologies miss.

        Vodafone - Paulo de Carvalho, Network Services Manager

        Paulo de Carvalho has been using Splunk at Vodafone for almost two years now. His presentation, titled "Freeing Information from Organizational Silos", lifted the idea of leveraging logs and IT data out of the realm of just system administration into a thirst for higher level intelligence that crosses not only IT but also business functions. Paulo started by describing the current service oriented architecture (SOA) at Vodafone and how attempts to objectize and re-use capabilities create incredible complexity among the services, technologies, processes, tools and people.

        Using Dennis Leeden's Sense-Making Model as a blueprint for raising the intellect of IT and business consumers of IT services, Paulo has proven how achieving a level of "knowledge" versus just "data" management can significantly impact the performance of an IT organization and the services it provides. He went on to describe in detail how Vodafone has broken down the segregation of duties along business process, technology services and business units and determined what knowledge is essential and can be provided from the active running IT systems regarding their behavior, performance, configuration, dependencies etc. His team has defined data inputs, searches, reports and dashboards for the most important intersections of processes, services and technologies using Splunk. The impact on the performance of IT and the quality of services to consumers has been dramatic.

        IT knowledge management at Vodafone has significantly improved the quality of life for IT people and customers too. If you've ever been frustrated dealing with a customer service agent at a mobile phone company, an airline or a government agency you understand a little information can go a long, long way to improving our ability as humans not to use technology as an excuse for poor customer service, but to actually deliver the type of customer service customers appreciate.

        If you've attended one of our Splunk Lives you know why I'm so passionate about them. If you haven't attended and think you might be interested, check out our events page for more information about where we'll be when.

        Happy Splunking!

        ]]>
        Tina Phi: ldapsearch is your friend
        http://blogs.splunk.com/tina/?p=1
        Fri, 31 Jul 2009 05:24:19 +0000
        Tina Phi

        Need a friend to help you in the war against seemingly complex LDAP configuration tasks? Let me introduce you to a handy dandy tool called ldapsearch.

        Next to an LDAP browser (they cheat, by the way, but I'll talk more about this later), ldapsearch is your friend when it comes to configuring Splunk, or any other LDAP-capable app for that matter, to authenticate against LDAP: it allows you to test your configuration purely from the command line and then implement it once you know it's working.

        The most important things you'll need to know about your LDAP server are its hostname or IP address, the LDAP port number and the base DN. If you don't know any of the aforementioned, ldapsearch can't help you there - it does not perform magic. If you're guessing the LDAP port number, your first guess should be 389, which is the default port for LDAP. (Second guess would be 636, the default LDAPS port. However, that would be treading into SSL waters and I'd like to keep it simple here.) Not knowing any of the required items usually means you should contact your IT/OPS department or someone who manages the LDAP or AD infrastructure at your organization.

        STEP 1: Assuming you know the LDAP hostname (or IP), port and base DN, let's find out if you have access to ldapsearch. Most *nix systems, including OSX, ship with ldapsearch, so it's a matter of launching the terminal and typing:

        $ which ldapsearch
        /usr/bin/ldapsearch

        If you haven't got ldapsearch, go online, find a copy and download it. Stay away from the LDAP browsers (GUI) if possible. I mentioned earlier that LDAP browsers cheat, at least the ones I've seen, and particularly the ones that run on Windows. They do things like follow LDAP referrals (which is just silly).

        STEP 2: Run ldapsearch and pray that the LDAP server you're connecting to allows anonymous bind. If your LDAP server allows anonymous bind, you can bind to it without providing a bind account and password!

        $ ldapsearch -h ldaphostname -p 389 -x -b "dc=splunkers,dc=com"

        All of the above options are necessary to perform a simple, anonymous bind to the LDAP server.

        -h hostname
        -p port number
        -x tells ldapsearch to perform a simple_authentication (yes, you need this even for anonymous bind)
        -b baseDN

        If your organization is relaxed about LDAP access, it should just work. You'll get human-readable output in LDIF format that you can redirect to a file for review. If your organization is not so relaxed (most responsible ones are NOT), you may need to provide a bind_account and password:

        -D "uid=tina,ou=People,dc=splunkers,dc=com"
        -W will prompt you for your password

        Example:

        $ ldapsearch -h ldaphostname -p 389 -x -D "uid=tina,ou=People,dc=splunkers,dc=com" -b "dc=splunkers,dc=com" -W
        Enter LDAP Password:
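
        If you end up scripting these checks, the two invocations can be assembled programmatically. The following Python sketch only builds the argument list from the flags described in this post (-h, -p, -x, -D, -W, -b); the helper name build_ldapsearch_args is made up for illustration, and nothing here contacts an LDAP server.

```python
import shlex


def build_ldapsearch_args(host, base_dn, port=389, bind_dn=None):
    """Assemble an ldapsearch argument list using only the flags covered in the post."""
    args = ["ldapsearch", "-h", host, "-p", str(port), "-x"]
    if bind_dn is not None:
        # -D names the bind account; -W makes ldapsearch prompt for the password
        args += ["-D", bind_dn, "-W"]
    args += ["-b", base_dn]
    return args


anonymous = build_ldapsearch_args("ldaphostname", "dc=splunkers,dc=com")
authenticated = build_ldapsearch_args(
    "ldaphostname",
    "dc=splunkers,dc=com",
    bind_dn="uid=tina,ou=People,dc=splunkers,dc=com",
)
print(shlex.join(anonymous))
print(shlex.join(authenticated))
```

        The resulting list could be handed to subprocess.run on a machine where ldapsearch is installed; because -W prompts interactively, that only suits a session with a terminal attached.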

        Here's what sample user and group entries look like:

        # tina, People, splunkers.com
        dn: uid=tina,ou=People,dc=splunkers,dc=com
        objectClass: top
        objectClass: person
        objectClass: organizationalPerson
        objectClass: inetorgPerson
        uid: tina
        givenName: Tiny
        sn: Ina
        cn: Tiny Ina
        userPassword::


        # TechSupport, Groups, splunkers.com
        dn: cn=TechSupport,ou=Groups,dc=splunkers,dc=com
        cn: Technical Support
        objectClass: top
        objectClass: groupOfNames
        ou: Groups
        member: uid=tina,ou=People,dc=splunkers,dc=com
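
        Once the output is saved to a file, it is easy to pick apart. Below is a minimal Python sketch of an LDIF reader, run against a shortened copy of the sample entries above. It is deliberately simplified: folded (continuation) lines and base64-encoded values are not handled, and the function name parse_ldif is just an illustrative choice.

```python
def parse_ldif(text):
    """Parse simple LDIF text into a list of {attribute: [values]} dicts.

    Simplifications: comment lines are skipped; folded (continuation) lines
    and base64 values ("attr:: ...") are not decoded.
    """
    entries, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        current.setdefault(name, []).append(value.lstrip(":").strip())
    if current:
        entries.append(current)
    return entries


SAMPLE = """\
# tina, People, splunkers.com
dn: uid=tina,ou=People,dc=splunkers,dc=com
objectClass: top
objectClass: person
uid: tina
cn: Tiny Ina

# TechSupport, Groups, splunkers.com
dn: cn=TechSupport,ou=Groups,dc=splunkers,dc=com
cn: Technical Support
objectClass: groupOfNames
member: uid=tina,ou=People,dc=splunkers,dc=com
"""

entries = parse_ldif(SAMPLE)
groups = [e for e in entries if "groupOfNames" in e.get("objectClass", [])]
for group in groups:
    print(group["cn"][0], "->", group["member"])
```

        The attributes that matter most when configuring an application against LDAP are the ones visible here: the user naming attribute (uid), the group naming attribute (cn) and the group membership attribute (member).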

        NOTE: It is possible that LDAP returns no entries even when a proper bind_dn, password and base DN are provided. LDAP can be configured to prevent listing of entries starting at the root base, e.g. "dc=splunkers, dc=com". In this case, you'll have to provide a more specific base DN, for example:

        -b "ou=People,dc=splunkers,dc=com"
        or
        -b "ou=Groups,dc=splunkers,dc=com"

        Stay tuned for my blog post on how to use this information to configure Splunk to authenticate against LDAP.

        ]]>
        Simeon Yep: 40 Days of 4.0: Distributed searching
        http://blogs.splunk.com/simeon/?p=4
        Thu, 30 Jul 2009 20:09:06 +0000
        Simeon Yep

        If you are a long-time enterprise user of the 3.x product, you may have become used to the pull-down menu for distributed searching. One of the common use cases for this menu was searching specific indexers in your distributed search. A common question was: "Can we restrict the server via search syntax?" In the 3.3 and 3.4 products, you cannot restrict via syntax through the web interface. There is a trick you can use via the command line, but that doesn't help when you want to do this in a saved search.

        In the 4.0 release, we have removed the pull-down menu and implemented indexer restrictions with search syntax. The new parameter is called "splunk_server".   Let's assume I have a distributed searcher (hostname=searcher1) and three indexers (hostname=indexer1, hostname=indexer2, and hostname=indexer3).  If I am searching for "error" and my goal is to restrict my searches to indexer3, I would use the following query:

        splunk_server=indexer3 error

        To search anything but indexer3 I would use:

        error NOT splunk_server=indexer3

        Using this restriction can be useful for tracking specific datacenters, monitoring server health, and securing data (you can add it as a filter to a role). For the complete documentation on this parameter, see our official documentation:

        http://www.splunk.com/base/Documentation/latest/User/SpecifyMultipleServersToSearch

        Note:   distributed searching is limited to the Splunk enterprise version.

        ]]>
        Johnathon Cervelli: 40 Days of Splunk 4.0 - Euro Splunkers awesome (as usual)
        http://blogs.splunk.com/johnathon/?p=17
        Thu, 30 Jul 2009 17:54:07 +0000
        Johnathon Cervelli

        Getting out of the office to see successful Splunk customers is always a pleasure, and the presentations and conversations at SplunkLive in London were especially a treat. One of the most striking things about all three customers (Vodafone, Telenor and Accenture) is how Splunk has transitioned from a tool used by a couple of working teams into a cross-organization IT utility. Despite being from two different industry verticals, they also all approached the problem in a similar way, and that way suggests the new dynamic lookup feature is going to be very popular.

        If you’re an existing Splunk user, you might be familiar with our transaction search-time command. It’s used to identify patterns that indicate a single, unified intention – such as buying something from an online store – even across multiple data sources. That works great when there is some common piece of data to anchor on, such as an IP address or user name. In both the online retail and telecom use cases we saw in London, that was a major part of how groups at different layers of the stack exposed their data to their peers working elsewhere; e.g. the IP address was a way for the web team to track the network behavior of a host through the router logs to look for network-layer abnormalities. These kinds of searches were common to all of our London presenters’ normal use of Splunk.

        But what do you do if there is no shared piece of data tying two sources together?

        Enter the dynamic field lookup feature. It’s like summary indexing light – you run a search that populates a smaller, more manageable table structure with data. But here’s the difference: dynamic lookups can act as an intermediary, joining data from one sourcetype with another at search time. For example, we use this for the Windows GUID lookup feature. When Splunk indexes Active Directory, it identifies all the GUIDs and adds each GUID and its associated common name to a lookup table. Then, if you ask Splunk to translate GUIDs, it takes each GUID in your search results and checks whether it is in that table. If it is, a new field is dynamically added to your searched events – the common name – as if it had always been there.

        That’s a fairly basic use of the feature, however. Vodafone, who was a London presenter and Splunk 4.0 beta tester, had a more ingenious use case. They’re using it to create abstracted data access points for each IT service they manage. So one service – for example, the customer management system – can return via a Splunk search the last few numbers a customer called if you search on the customer number, but not return the customer’s name or other revealing information. Other groups can then consume that information, much like a feed or other web advertised service, directly in their own searches and dashboards. Data access is constrained not only by role but potentially also by time, providing secure windows into past activity that still respect the privacy of Vodafone’s customers.

        The idea of joining data from one source contingent on another source in a safe and controlled fashion using Splunk seems to resonate with almost all of our beta customers. Dynamic lookup tables may end up being one of those features that has much more mileage in it than we ever anticipated. Learn how to make yours here.

        ]]>
        Andrea Longo: List indexes on the main dashboard
        http://blogs.splunk.com/andrea/2009/07/29/list-indexes-on-the-main-dashboard/
        Wed, 29 Jul 2009 22:49:04 +0000
        Andrea Longo

        If you are comfortable editing XML, here's a handy hack to get the list of your default indexes in the "All indexed data" dashboard. It will show whatever the logged-in user has access to.
        If you are using the standard dashboards from the Search app, do this:

        Go to $SPLUNK_HOME/etc/apps/search/default/data/ui/views
        Copy dashboard.xml to $SPLUNK_HOME/etc/apps/search/local/data/ui/views
        Change the permissions on the file so you can edit it
        Right before the last </view> tag at the end insert this XML:

        <module name="HiddenSearch" layoutPanel="panel_row2_col1_grp4" group="All indexed data" autoRun="True">
          <param name="search">| eventcount summarize=false index=* -count</param>
          <module name="SimpleResultsHeader">
            <param name="entityName">results</param>
            <param name="headerFormat">Indexes (%(count)s)</param>
            <module name="Paginator">
              <param name="count">20</param>
              <param name="entityName">results</param>
              <param name="maxPages">10</param>
              <module name="LinkList">
                <param name="initialSortDir">desc</param>
                <param name="labelFieldSearch">*</param>
                <param name="valueField">count</param>
                <param name="labelField">index</param>
                <param name="labelFieldTarget">flashtimeline</param>
                <param name="initialSort">count</param>
              </module>
            </module>
          </module>
        </module>
        

        Save the file.
        Back in the UI, click the Splunk logo to refresh the search app.

        Presto! Now there is a new column showing indexes. If something didn't work right, just remove the file you created. This file won't be overwritten on upgrade, so if in the future there is a change to the search app you will still have this version because files in local take precedence.

        ]]>
        Bob Fox: 40 Days of 4.0: Enriching Data with Lookups (Part 1)
        http://blogs.splunk.com/bob/2009/07/27/enriching-data-with-lookups-part-1/
        Mon, 27 Jul 2009 21:08:08 +0000
        Bob Fox

        Many customers tell me that they see a lot of value when Splunk is used to enrich IT data with information from another source. An example of such an enrichment could be a cross reference between a customer's username found in an application log and that same customer's information extracted from a contact management system. How amazing would it be to have a customer service representative make a phone call to Mr. Smith to ask if he needed help logging onto their system after a number of failed logins?

        Splunk has always been able to do data enrichment, but the newly released Splunk 4 really simplifies the process. In this post, I'll give a quick example of using a CSV file to provide data enrichment to an application log. In future posts, I'll show how to use an external database as the data source.

        Let's start with some mock application data.  To keep things simple, we'll use this as our application log:
        Jul 27 08:35:09 appname=app4 error=123
        Jul 27 08:35:19 appname=app3 error=123
        Jul 27 08:35:29 appname=app1 error=163
        Jul 27 08:35:39 appname=app1 error=123
        Jul 27 08:35:49 appname=app1 error=133
        Jul 27 08:35:59 appname=app1 error=123
        Jul 27 08:36:09 appname=app1 error=123

        The goal here will be to enrich this data with the actual error message, rather than just the error number. To facilitate this, we will use an error lookup table in the form of a comma-separated values (CSV) file with a descriptive header:
        error, error_message
        113, Error In WOPR Core
        123, General Application Fault
        133, Memory Allocation Error
        163, Error Exists Behind Keyboard

        First, we need to do a little preparation.  I will make an assumption here that we have gone ahead and defined a Splunk App that tells Splunk where to find (and what to do with) the application log.  We are going to need to create a directory within the app definition to support the lookups.  This directory will typically be $SPLUNK_HOME/etc/apps/APPNAME/lookups where $SPLUNK_HOME is the top level directory where Splunk is installed, and APPNAME is the name of the Splunk App.  Inside this lookups directory we'll put the CSV file above.  You can call this whatever you want, and for this example we will call it errortable.csv. It's important to note here that the CSV file will need to be located in (or linked into) $SPLUNK_HOME/etc/apps/APPNAME/lookups or $SPLUNK_HOME/etc/system/lookups.

        Next we'll make a couple of quick file changes.  Typically, all of these files will be in $SPLUNK_HOME/etc/apps/APPNAME/local.  I'll assume that there is already an inputs.conf file in that directory.  Depending on the configuration, there may or may not be a transforms.conf file.  If not, we will create it and add a definition of where to find the CSV file.   Inside transforms.conf add:
        [ErrorLookup]
        filename = errortable.csv

        I have chosen to call the transforms.conf entry ErrorLookup. You can call this whatever you like, as long as it matches the entry in props.conf, below. We'll give ErrorLookup a single entry to map to the CSV filename.

        Now, we need to make another change within the directory. This time, we will modify props.conf to alter the sourcetype definition for the data we will be enriching. It is possible that the sourcetype for your application logs is defined elsewhere, in which case our configuration may become a bit more tricky - but certainly not impossible. We are going to assume simplicity here. Edit (or create) props.conf, and find (or create) a stanza that matches the sourcetype of the application log. In the case of my example, the sourcetype was myappdata. For this example, I always want my application events to show both the error number and the error message. My sourcetype definition in props.conf should look like this:
        [myappdata]
        lookup_table = ErrorLookup error OUTPUT error_message

        There may be other information already in the myappdata definition.  The lookup_table line added here breaks down like this:
        • lookup_ specifies that we are doing the lookup function here at search time.
        • table is a class.  For the most part, this is an arbitrary value and can be anything you want.
        • ErrorLookup refers to the entry we made in transforms.conf
        • error refers to both the Splunk field and the CSV header. We named them the same. If they were different, one could always use an AS clause here (i.e. csv_field AS splunk_field)
        • OUTPUT defines what is going to end up as a field back in our event.  If you don't use OUTPUT, all columns in the CSV file will be brought in as Splunk fields.
        • error_message here defines both the CSV column (as defined by the CSV header) and the Splunk field that will be created.  Again, use the AS command if you want to rename the field on the fly.
        Once these configs are in place, give Splunk a restart and do a search on the sourcetype.  Assuming all of the definition names match up and the CSV file can be found, you should see the additional field(s) in the 'Other interesting fields' section of the Field Picker.
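
        To make the effect of the lookup concrete, here is a small Python sketch that mimics the enrichment outside of Splunk, using the sample log lines and error table from this post. It is not how Splunk implements lookups; the function names load_lookup and enrich are invented for the illustration.

```python
import csv
import io
import re

ERROR_TABLE = """\
error, error_message
113, Error In WOPR Core
123, General Application Fault
133, Memory Allocation Error
163, Error Exists Behind Keyboard
"""

LOG_LINES = [
    "Jul 27 08:35:09 appname=app4 error=123",
    "Jul 27 08:35:29 appname=app1 error=163",
    "Jul 27 08:35:49 appname=app1 error=133",
]


def load_lookup(text):
    """Map each error number to its message; skipinitialspace drops the space after commas."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return {row["error"]: row["error_message"] for row in reader}


def enrich(line, lookup):
    """Extract key=value pairs from an event and add error_message when the error is known."""
    event = dict(re.findall(r"(\w+)=(\S+)", line))
    if event.get("error") in lookup:
        event["error_message"] = lookup[event["error"]]
    return event


lookup = load_lookup(ERROR_TABLE)
events = [enrich(line, lookup) for line in LOG_LINES]
for event in events:
    print(event)
```

        As with the OUTPUT clause, only the error_message column is copied into the event, and events whose error number is missing from the table are passed through unchanged.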

        Up next:  Data enrichment using a script to an external source...

        Resources:
        ]]>
        Greg Albrecht: 40 Days of 4.0: How to consume tcptrace with Splunk 4.0
        http://blogs.splunk.com/greg/how-to-consume-tcptrace-with-splunk-40/
        Fri, 24 Jul 2009 21:59:40 +0000
        Greg Albrecht

        The idea to consume tcptrace with Splunk came to me after seeing Darren Hoch's OSCON 2009 presentation Linux System and Network Performance Monitoring. In his talk Darren shows how he diagnosed home networking issues using tcptrace. Here's his description of tcptrace:

        The tcptrace utility provides detailed TCP based information about specific
        connections. The utility uses libpcap based files to perform an analysis of
        specific TCP sessions. The utility provides information that is sometimes difficult
        to catch in a TCP stream. This information includes:
        • TCP Retransmissions – the amount of packets that needed to
        be sent again and the total data size
        • TCP Window Sizes – identify slow connections with small
        window sizes
        • Total throughput of the connection
        • Connection duration

        The data coming out of tcptrace looks like this:

        TCP connection 1:
                host a:        gba-ubun810-amd64.splunk.com:40739
                host b:        spreader.yandex.net:80
                complete conn: no       (SYNs: 0)  (FINs: 0)
                first packet:  Wed Jul 22 19:58:34.489567 2009
                last packet:   Wed Jul 22 19:58:35.164233 2009
                elapsed time:  0:00:00.674666
                total packets: 395
                filename:      testdump1000
           a->b:                              b->a:
             total packets:           147           total packets:           248
             ack pkts sent:           147           ack pkts sent:           248
        <snip>

        Complex? Yes. Edible by Splunk? Hell yes.

        The prerequisites for this setup are:

        1. Splunk 4.0 installed on your system. Download Splunk 4.0 Free
        2. tcpdump installed on your system. Included with most *nix based operating systems or available at http://www.tcpdump.org/
        3. tcptrace installed on your system. Available at http://jarok.cs.ohiou.edu/software/tcptrace/
        4. super-user (root) access to your system, or the ability to execute tcpdump via sudo

        An outline of the steps we're going to take:

        1. Capture some data with tcpdump and parse the data with tcptrace
        2. Configure splunk to read the parsed data from tcptrace
        3. Use splunk to extract useful data from tcptrace
        4. Use splunk to graph data from tcptrace

        Step 1: Capture some data with tcpdump and parse the data with tcptrace

        Capture data with tcpdump:

        $ sudo tcpdump -nevvs 1520 -C 10 -w /tmp/tcp.dump

        Parse the data with tcptrace:

        $ tcptrace -l /tmp/tcp.dump > /tmp/tcptrace.log

        Step 2: Configure splunk to read parsed data from tcptrace

        Add these lines to your $SPLUNK_HOME/etc/system/local/inputs.conf

        [monitor:///tmp/tcptrace.log]
        sourcetype = tcptrace
        

        Add these lines to your $SPLUNK_HOME/etc/system/local/props.conf

        [tcptrace]
        TIME_PREFIX = \s+last\s+packet:\s+
        BREAK_ONLY_BEFORE = TCP\ connection\ \d+:
        REPORT-tcptrace = tcptrace-rexmts
        TRANSFORMS = tcptrace-hosts
        

        Add these lines to your $SPLUNK_HOME/etc/system/local/transforms.conf

        [tcptrace-hosts]
        REGEX = (?m)\s+host\s+\w+:\s+([^\r\n]*)[\r\n]\s+host\s+\w+:\s+([^\r\n]*)[\r\n]
        FORMAT = host1::"$1" host2::"$2"
        WRITE_META = true
        [tcptrace-rexmts]
        REGEX = \s+rexmt data pkts:\s+([^\r\n]\d+)\s+rexmt data pkts:\s+([^\r\n]\d+)
        FORMAT = host1_rexmt_data_pkts::"$1" host2_rexmt_data_pkts::"$2"
        

        Add these lines to your $SPLUNK_HOME/etc/system/local/fields.conf

        [host1]
        INDEXED = true
        [host2]
        INDEXED = true
        

        Once you've updated your splunk system configs restart Splunk:

        $SPLUNK_HOME/bin/splunk restart
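
        If you want to sanity-check the extraction logic before restarting Splunk, the same idea can be tried in plain Python. The sketch below is not the Splunk configuration itself: it uses simplified regular expressions, and because the sample output earlier in this post is snipped before the retransmit counters, the "rexmt data pkts" lines and the second connection in SAMPLE are made-up data whose layout is assumed from the transforms.conf pattern.

```python
import re

# Shortened tcptrace-style output; the rexmt lines and connection 2 are invented.
SAMPLE = """\
TCP connection 1:
        host a:        gba-ubun810-amd64.splunk.com:40739
        host b:        spreader.yandex.net:80
        complete conn: no       (SYNs: 0)  (FINs: 0)
        total packets: 395
   a->b:                              b->a:
     total packets:           147           total packets:           248
     rexmt data pkts:           3           rexmt data pkts:           0
TCP connection 2:
        host a:        gba-ubun810-amd64.splunk.com:40740
        host b:        spreader.yandex.net:80
   a->b:                              b->a:
     rexmt data pkts:           0           rexmt data pkts:           0
"""

HOSTS = re.compile(r"host a:\s+(\S+)\s+host b:\s+(\S+)")
REXMT = re.compile(r"rexmt data pkts:\s+(\d+)\s+rexmt data pkts:\s+(\d+)")


def parse_connections(text):
    """Split on 'TCP connection N:' and pull hosts and retransmit counts from each block."""
    blocks = re.split(r"(?m)^TCP connection \d+:\s*$", text)
    records = []
    for block in blocks:
        hosts, rexmt = HOSTS.search(block), REXMT.search(block)
        if not hosts:
            continue  # skips the empty chunk before the first connection
        a_rexmt, b_rexmt = (int(n) for n in rexmt.groups()) if rexmt else (0, 0)
        records.append({
            "host1": hosts.group(1),
            "host2": hosts.group(2),
            "host1_rexmt_data_pkts": a_rexmt,
            "host2_rexmt_data_pkts": b_rexmt,
        })
    return records


records = parse_connections(SAMPLE)
retransmitting = [
    r for r in records
    if r["host1_rexmt_data_pkts"] > 0 or r["host2_rexmt_data_pkts"] > 0
]
print(retransmitting)
```

        The split on "TCP connection N:" plays the role of BREAK_ONLY_BEFORE, and the two patterns correspond to the tcptrace-hosts and tcptrace-rexmts transforms.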

        Step 3: Use splunk to extract useful data from tcptrace

        Log into your splunk instance and execute this search to see a timeline of most frequent packet retransmissions:
        sourcetype="tcptrace" | search host1_rexmt_data_pkts>0 OR host2_rexmt_data_pkts>0

        Perhaps you'd like to know which connections are retransmitting packets? Add the following modifier to your search string | fields host1,host2,host1_rexmt_data_pkts,host2_rexmt_data_pkts so that it reads:
        sourcetype="tcptrace" | search host1_rexmt_data_pkts>0 OR host2_rexmt_data_pkts>0 | fields host1,host2,host1_rexmt_data_pkts,host2_rexmt_data_pkts
        Execute your search, but this time click the Events Table button.

        Want to see something cooler? Try selecting the Heat Map Overlay.

        Step 4: Use splunk to graph data from tcptrace

        To get a useful graph out of splunk update your search string to read:
        sourcetype="tcptrace" | search host1_rexmt_data_pkts>0 OR host2_rexmt_data_pkts>0 | timechart max(host1_rexmt_data_pkts),max(host2_rexmt_data_pkts) | fillnull value=0 | rename max(host1_rexmt_data_pkts) as "Packet Retransmits from me",max(host2_rexmt_data_pkts) as "Packet Retransmits to me"
        Then click on the Show Report button. Once you're in the report builder, select area for Chart Type and click Apply.

        That's it for now. Next time I'll show you how to make a dashboard that you can share with other splunk users in your organization.

        ]]>
        Andrea Longo: Getting started with 4.0 apps
        http://blogs.splunk.com/andrea/2009/07/24/getting-started-with-40-apps/
        Fri, 24 Jul 2009 21:03:57 +0000
        Andrea Longo

        I've been working on some apps for 4.0 and finally I can talk details. Over the next couple of posts I'll walk through creating a simple app using the new UI tools and a little XML. This is all based on the Apache logs on my server, so first a little background on how I've configured my 4.0 instance.

        I have a typical small server whose primary purpose is to host a dozen or so low traffic websites. One site gets half my hits, three more most of the rest and the stragglers round out the lot attracting bots. Each virtual host has separate access_log and error_log files but all use the same format: access_common.

        To take advantage of the new multi-index search in Splunk 4, I've set up my instance to use different indexes for various sources. In my case, it's by person, as I have several groups of sites managed by a particular admin. The indexes are named www_something so as the overall administrator I can search across all of them with "index=www_*" and still not have to touch the other system events I've got going into the main index. I have also set up roles so each admin sees only the relevant data (and isn't confused by the rest.) All the config is explained in the docs, so I won't go over it right now.

        There are several reasons to do this. With each broad class of data in a separate index, I can apply different retention policies to each. This can be a big deal for high-traffic webservers where you might want to keep the OS logs around longer than the web logs.

        Next, if you can divide your data into discrete categories it makes it easier to assign roles to access only certain parts of it. "All your stuff is in your index" is a much simpler policy to enforce than "You get this, and that, and this other thing..." and so on. You can do that, and with excruciating granularity, via search filters, but under the hood what it does is tack stuff onto your search. This can lead to some pretty hairy searches as splunkd has to decide which results it's looked through actually should get returned.

        The most important is search performance: data can be pulled off disk only so fast. If there is less of it to slog through at once, the files that are looked at are more likely to be relevant and your search will complete faster.

        ]]>
        Dave Marquardt: 40 Days of 4.0: Searching Smarter and Faster with Splunk 4
        http://blogs.splunk.com/dave/2009/07/23/40-days-of-40-searching-smarter-and-faster-with-splunk-4/
        Fri, 24 Jul 2009 00:26:05 +0000
        Dave Marquardt

        Hi Splunkers, Dave here from the Search and Index team at Splunk. Coming from an engineering perspective, I’m excited about Splunk 4 because it represents a monumental improvement in search power. Not only is search about ten times faster than the previous release, but we have added several new features that empower users to search smarter and faster. This blog post is going to highlight just a few of these new features.

        Asynchronous Search

        Let’s start with a basic search for “Not Found” errors in web access logs via the UI:

        status=404

        The first thing you’ll notice is that you get events right away, with the timeline marching back as you get more results. Search is now asynchronous, meaning there’s no more waiting for a search to complete to get results. You’ll be able to find answers and start troubleshooting faster than ever.

        Lazy Key-Value Extraction

        The results from the above search will keep all the information about fields we extracted from the data, such as status code, number of bytes, and the referrer’s URI. But let’s say that the only thing you care about in the results is the client’s IP. Then you can use the fields command to make your search even faster:

        status=404 | fields clientip

        Using this technique, you’re telling the search that you only need the field clientip, which now limits what Splunk extracts from the data. This saves a lot of processing time, and on my laptop the search ran almost TWICE as fast. This is an extremely powerful technique to use when you know what fields you care about in the results.

        CIDR Subnet Matching

        So now that we’ve found what client IPs have received 404 errors, let’s limit those IPs to a particular subnet. If you’re not familiar with CIDR (Classless Inter-domain Routing) subnets, it’s simply a way of describing which IP addresses fall into a particular network.

        In Splunk 3.x, if you wanted to specify a subnet of 64.0.0.0/6, you would need a search like this:

        status=404 (clientip=64.*.*.* OR clientip=65.*.*.* OR clientip=66.*.*.* OR clientip=67.*.*.*)

        Not only is this search ugly, but it’s slower than it needs to be. Keep in mind this is just a simple example where the top octet in the IP address only has 4 different possibilities - in the worst case you would need 128 different clientip comparisons!

        For Splunk 4 we added automatic CIDR subnet detection when comparing a field, which is cleaner and faster. The above search simply becomes:

        status=404 clientip=64.0.0.0/6
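
        If the arithmetic behind the subnet is not obvious, Python's standard ipaddress module can confirm it. This is a side illustration only and says nothing about how Splunk evaluates CIDR matches internally; the helper name in_subnet is made up.

```python
import ipaddress

subnet = ipaddress.ip_network("64.0.0.0/6")

# A /6 fixes the top 6 bits, leaving 2 free bits in the first octet: 4 values.
first_octets = sorted(
    int(str(net.network_address).split(".")[0])
    for net in subnet.subnets(new_prefix=8)
)
print(first_octets)  # [64, 65, 66, 67]


def in_subnet(ip, network=subnet):
    """True when the address falls inside the network."""
    return ipaddress.ip_address(ip) in network


print(in_subnet("66.12.34.56"), in_subnet("68.0.0.1"))  # True False
```

        The same reasoning gives the worst case mentioned above: a /1 leaves 7 free bits in the first octet, so the wildcard version would need 128 separate clientip comparisons.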

        Multi-Index Search

        In previous versions, users were limited to searching one index at a time. For Splunk 4 we overhauled the search system to allow searching over any number of indexes at the same time.

        Here’s an example that searches for errors over all accessible public indexes:

        error OR fatal index=*

        It’s that simple. For admins that have access to internal indexes, you can access all of the events with the following search:

        index=* OR index=_*

        Perhaps the most useful application of multi-index search is the ability to partition different searches to different indexes. For example, if you wanted to search for 404 errors in your web index and undeliverable messages in your mail index, you could use the following search:

        (index=web status=404) OR (index=mail undeliverable)

        These partitioned searches are fast. They are highly optimized and each index only searches what is relevant.

        Furthermore, index search options are easily customized under “Roles” in the Manager page. Roles control what indexes users are allowed to access, as well as which indexes they search by default. For example, as an admin I sometimes find it useful to set my defaults to all internal and public indexes, so my searches hit all indexes.

        These are just a few of the powerful new features in Splunk 4. I encourage you all to try them out to help you find results faster and easier.

        Happy Splunking!

        ]]>
        Erin Sweeney: 40 days of Splunk 4.0-Interactive Field Extractorhttp://blogs.splunk.com/erin/2009/07/22/40-days-of-splunk-40/http://blogs.splunk.com/erin/2009/07/22/40-days-of-splunk-40/Thu, 23 Jul 2009 01:05:39 +0000Erin SweeneyOur first full day of Splunk 4.0 is under our belt and so far so good. More than 2400 people ultimately registered for the Launch webinar yesterday representing more than 61 countries and 1500+ organizations.

        To keep that momentum going, we're introducing 40 days of Splunk 4.0. Each day for the next few weeks someone from the Splunk team will author a blog post detailing a cool new feature, highlighting a fun mashup, or sharing something interesting we're working on or building for a customer. As always, we'd like you to contribute to the conversation as well, so feel free to leave a comment or shoot me an email to share your tip, search or story and we'll send you some limited edition Splunk 4 launch schwag.

        And since we're Splunk and we want to overdeliver, we'll start you off with two posts - check out the 4.0 insights from Christina Noren, our VP Products.

        And I think the Interactive Field Extractor, or IFX, is pretty handy. IFX helps you teach Splunk how to extract fields from your data so you can then report on and analyze them. Check out this short video or read more in the docs to start using IFX straight away.

        Remember to watch for the tag "40 days of Splunk 4.0" and start adding to your bag o' tricks. It should be fun.

        Happy Splunking!

        ]]>
        Christina Noren: Splunk 4’s proving *everyone* can use IT datahttp://blogs.splunk.com/cfrln/2009/07/22/splunk-4s-proving-everyone-can-use-it-data/http://blogs.splunk.com/cfrln/2009/07/22/splunk-4s-proving-everyone-can-use-it-data/Thu, 23 Jul 2009 00:37:16 +0000Christina NorenThere's a big reason I haven't blogged here for a while: Splunk 4. I've been so wrapped up in it for the last year that I haven't really been interested in writing about anything else. Well, now it's out, so I'm back! So I'll kick it off with some background on why 4 is the Splunk I've always wanted and a little story about how my team and I have used Splunk ourselves in a new way the past few days.

        The aspect of Splunk 4 that I'm most excited about is all of the ways that it makes IT data accessible to everyone, regardless of their job.

        I've been a data fanatic since I started my first software company job 17 years ago and worked on forecasting and order management systems. I wasn't a developer but I was able to build out quoting and forecasting systems and do in depth analysis using Filemaker Pro and Excel.

        Since then, I've been involved in building out systems that let users analyze IT data in one form or another for 10 of the last 12 years, first running a tools team for MSN at Microsoft where my team spent $millions developing a log-driven executive dashboard, then at a pioneering log management vendor that moved from web analytics into SIEM, and the last 4 years at Splunk.

        I've seen an unimaginable variety of functions and users that need some kind of information based on logs and other machine data. The further from software development or hands on systems administration they are, the less aware they are that the information they're seeking is in a logfile somewhere. And even technical people who know what log it would be in may not have permission to access it.

        If such an access-deprived individual is lucky, they have the power or influence to get a sysadmin to pull the data for them. If they're not just access-deprived but technically handicapped, they also need to prevail on that sysadmin to write some scripts to massage the data into information. Then they need to trust that the sysadmin understood the business logic well enough to do the analysis right. It's like the old story of the hungry man being handed a 3-foot long spoon.

        Splunk 3 succeeded because it helped the access-deprived - which was huge in organizations hit hard with segregation of duty rules. But the non-technical user (or managers with technical chops but no time) still needed power users to run most analysis for them. Splunk 3 made it easier for technical users to fulfill the request but sysadmins still resented the distraction and savvy managers still worried about what was lost in translation.

        That was as true here at Splunk as anywhere else. When we shipped 1.0, our own sysadmin kicked the tires a bit but still grepped (yes, I admit it). Somewhere around 2.x a real production setup indexing all our website server, access and error logs continuously started to get frequent usage by our web developers and sysadmins to troubleshoot problems. Yet all the time I'd sit through executive, marketing, sales, product planning and other meetings and listen to discussions where people were substituting guesses for facts - because the facts were buried in logs somewhere and our sysadmins were too busy to be burdened with one-off requests to run analysis.

        As an example, I'd routinely ask Rachel, our Director of Documentation, for information about what docs topics were recently popular, trends in docs search engine referral terms, etc. as a guide to what we needed to fix in our product or processes. Sometimes I'd get the data, sometimes not, but it was always like pulling teeth. Even though Rachel and I are both technical enough to analyze a logfile the old way, we'd run into all kinds of roadblocks: switching docs platforms meant the logs stopped going to the system we were using, it was hard to set up a dashboard that we could both see, the stats we needed required analyzing more data than Splunk 3 could do on an ad hoc basis and we didn't have the permissions to do any back end config, we can do regexes but it takes too much time to swap back into that way of thinking... Ultimately we were both busy managers that would give up and go back to executing on our core jobs, without the information we really wanted. Exactly the same stories I'd hear from Splunk 3 customers about why there were still lots of groups that could benefit from Splunk that weren't yet doing so.

        The tide turned last Friday.

        In prepping for the launch I googled "Splunk 4.0" to see if people were already talking about it online. Lo and behold! Our own beta documentation, which was supposed to be locked down to beta customers, was in the google search results. Turned out that some special pages in the docs system enabled the google crawler to get to insecure versions of our beta docs at different urls than what you'd get by navigating our docs the regular way. A typical example of an unknown vulnerability in a web application's security, just like ones I hear of from our customers all the time.

        As the business owner of this web app the next thing I wanted to know was who had seen it that shouldn't have, what they'd seen, so I'd know whether it was a big deal or not and could decide a course of action. Too bad our daily web stats wouldn't give me any idea of traffic that matched this very specific pattern - I'd need some custom analysis of the raw logs.

        I started behaving like any hands off manager would - I started writing email to our web producer and web developer to ask them to pull the logs and do the analysis and I whipped up a storm with their bosses so they'd be given cycles to work on it. Then I stopped myself and logged into our live Splunk 4 instance instead.

        I first searched for all referrals to the insecure uri pattern from google.com with search strings of "Splunk 4.0". Almost nothing. Wait - that was my search and the site I used, but other crawlers could index these pages, and our pre-launch marketing used "Splunk 4" not "4.0". So I broadened my search to all referrals to these uris from external domains to get a raw hit count - a few thousand. If I'd just gotten the total from our web guys and hadn't been looking at the data myself I probably would have accepted the wrong answer.

        So where'd these hits come from? 4.0's new search assistant told me a common next command was "stats". I clicked to add it to my search and I saw examples of past usage by others on our Splunk instance was "| stats count by clientip." OK. Click. Now it suggested "lookup" (new in 4.0). Click. Now it suggested "| lookup dnslookup clientip" - sounds promising. Click. As Splunk streamed in new client IPs to build my table on-the-fly I saw familiar names pop up in the domain names - a lot of Splunk customers, one competitor.

        Now I wondered what they'd seen. I couldn't tell from this simple statistic on the initial referred request if they'd landed on one page and left, or navigated around to lots more pages. So I found (through search assistant) examples of using stats to list uris and added that to the stats command arguments.

        I got my final result after just a few minutes - I had a table of results grouped by client IP and sorted in descending number of hits showing the first and last date they'd seen the special pages, their revdns hostname, the full sequence of URIs they'd viewed, and the referring domain and search query. The timeline at the top of the search view showed me that very few hits had happened before the launch webinar invitation went out. I shared an export of the results with impacted colleagues. We decided how to react based on complete information on the impact of the vulnerability. And I didn't waste any of our web guys' time while they were busy getting splunk.com ready for launch.

        But that was just the first of many uses over the next few days. Yesterday, the day of the actual launch, I was more interested in keeping watch over whether initial downloaders were having a good experience, if they were just downloading or were reading the docs, and, based on the docs usage and search terms, what features they were trying first and what features may have been giving them trouble.

        Now, I've been asking for a dashboard with this information for a while. But, tired of asking, I just went ahead and built it. I was able to put all of this information on a new docs usage dashboard and share it with support, documentation and other colleagues - all through the UI using the new report and dashboard builders and Splunk Manager.

        The dashboard helped us identify some confusion around the need to upgrade to 4.x licenses, which drove us to clarify the release notes and download page quickly. And now the whole docs team is enthusiastically using Splunk to better understand customers' product and docs usage. They're even planning on starting to use examples of their own usage to illustrate topics in the manuals.

        I'm looking forward to seeing this tide turn for all of our customers too as others realize they can now get their own answers to all sorts of questions they used to leave unanswered.

        ]]>
        Michael Baum: If Splunk Was An Animal What Would It Be?http://blogs.splunk.com/thebaum/2009/07/21/if-splunk-was-an-animal-what-would-it-be/http://blogs.splunk.com/thebaum/2009/07/21/if-splunk-was-an-animal-what-would-it-be/Wed, 22 Jul 2009 00:42:39 +0000Michael BaumSplunk 4 is out of the bag and the Splunk community and our customers are kicking the tires. I even saw several executives from other log management, SIEM and system management vendors register for and attend our world-wide webcast with a thousand attendees. And Twitter is all abuzz with questions, answers and some ass kicking. Yes, Splunk 4 kicks ass. It is 2x faster on indexing and up to 10x faster searching. We have a fantastic new App framework where you can build custom views, dashboards and work flows, and there are countless other great improvements and new features. But sometimes we don't get it completely right and you all let us know.

        But back to my question, if Splunk was an animal what kind of animal would it be?

        "Odd thing animals. All dogs look up to you. All cats look down to you. Only a pig looks at you as an equal."

        - Winston Churchill

        I read that quote today at the birthplace of Winston Churchill and it reminded me that Splunk is like a pig. We've always looked our users and customers straight in the eye with the good and the not so good. This has always been the transparent way we conduct business. So keep the feedback coming - the praise and the criticism.

        One of the areas that I'm especially interested in hearing about is our new App focus. We are in the very early stages of creating Splunk Apps and making them available to the Splunk community. Some are free Apps and some are premium Apps. The free apps are available for immediate download. The premium Apps you need to talk with us about so we can work with you on an installation. At some point we plan to have trial versions of the premium Apps available for download too.

        The free Apps include things like

        You can easily download the App .spl file, drop it into your splunk/etc/apps directory and check it out. More easily you can download and launch the Apps right from your Splunk Launcher screen (which is an App too). We're working on fully documenting all these Apps so if you need help now feel free to contact us via support@splunk.com. You can also select "Send Feedback..." on the first menu of the App to contact the specific App team directly via email. We're especially interested in what doesn't work, where you get stuck and what else you'd like to see. Several of these Apps are still beta versions so feedback sooner rather than later is much appreciated.

        Happy Splunk4ing!

        ]]>
        Erin Sweeney: T-minus 12 hours to Splunk 4–be among the first to see it!http://blogs.splunk.com/erin/2009/07/20/t-minus-12-hours-to-splunk-4-be-among-the-first-to-see-it/http://blogs.splunk.com/erin/2009/07/20/t-minus-12-hours-to-splunk-4-be-among-the-first-to-see-it/Tue, 21 Jul 2009 02:09:52 +0000Erin SweeneyTeam:

        The Splunk crew has been working away to bring you something even better than the Splunk you already know and love. Join us tomorrow as we unveil Splunk 4.0.

        Register to join us: https://splunk.webex.com/splunk/onstage/g.php?t=a&d=595539039

        Splunk> 4 the win!

        ]]>
        Simeon Yep: Monitoring input files with a white listhttp://blogs.splunk.com/simeon/?p=3http://blogs.splunk.com/simeon/?p=3Thu, 09 Jul 2009 21:46:31 +0000Simeon YepThere are many ways to feed data into Splunk. One method is to monitor the files within a directory. In the default 'monitor' configuration, Splunk will try to index all files within a specified directory. In some cases, you may have a directory which contains many files including some that you do not want to index. Splunk can be configured to index specific file types as well as sub directories. Here is a real-world working example of how to use a white list...

        Let us assume we want to index certain compressed files (*.gz) where the file name starts with "200906". One of the filenames is "20090631.gz". These files exist in a specific directory: "/storage/datacenter/host1/webserver". To make things more interesting, I have other *.log files in that directory. There are also other subdirectories within datacenter (such as host2, router1, router2). I want to only index the "host" (host1 and host2) files and exclude any router files. Additionally, there are appserver and system directories which reside under each host directory. Conceptually, you want to do the following:

        * Tell Splunk to monitor the /storage/datacenter directory
        * Set a whitelist for this input
        * Edit the REGEX to match all files that contain "host" in the underlying path
        * Edit the REGEX to match all files that contain "webserver" in the underlying path
        * Edit the REGEX to match all files that start with "200906"
        * Edit the REGEX to match all files that end with ".gz"

        Your final stanza in the $SPLUNK_HOME/etc/system/local/inputs.conf file would resemble the following:

        [monitor:///storage/datacenter/]
        sourcetype=gzfiles
        _whitelist=host[^/]*/webserver/[^/]*200906[^/]*\.gz$

        The above stanza would index the following files:

        /storage/datacenter/host1/webserver/20090601.gz
        /storage/datacenter/host1/webserver/20090602.gz
        /storage/datacenter/host2/webserver/20090601.gz
        /storage/datacenter/host2/webserver/20090602.gz

        The above stanza would NOT index the following files or directories:

        /storage/datacenter/logfile.txt
        /storage/datacenter/router1/logfile.log
        /storage/datacenter/host1/appserver/20090601.gz
        /storage/datacenter/host2/webserver/20090601.txt
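The whitelist pattern can be checked against the example paths before putting it into inputs.conf. The following is a minimal sketch using Python's `re` module. It assumes the pattern is applied as an unanchored search against the full file path, and Python's regex dialect may differ from Splunk's in edge cases. The helper name `is_whitelisted` is made up for this example.

```python
import re

# Pattern copied from the _whitelist line in the stanza above.
WHITELIST = re.compile(r"host[^/]*/webserver/[^/]*200906[^/]*\.gz$")

def is_whitelisted(path: str) -> bool:
    """Assumption: the pattern may match anywhere in the full path."""
    return WHITELIST.search(path) is not None

expected_indexed = [
    "/storage/datacenter/host1/webserver/20090601.gz",
    "/storage/datacenter/host1/webserver/20090602.gz",
    "/storage/datacenter/host2/webserver/20090601.gz",
    "/storage/datacenter/host2/webserver/20090602.gz",
]
expected_skipped = [
    "/storage/datacenter/logfile.txt",
    "/storage/datacenter/router1/logfile.log",
    "/storage/datacenter/host1/appserver/20090601.gz",
    "/storage/datacenter/host2/webserver/20090601.txt",
]

for path in expected_indexed + expected_skipped:
    print("index" if is_whitelisted(path) else "skip ", path)
```

One thing this kind of check surfaces: because of the `[^/]*` in front of `200906`, the pattern also accepts a file name that merely contains "200906", such as a hypothetical backup-20090601.gz. Dropping that leading `[^/]*` would require the file name to start with "200906".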

        The following doc was referenced and can be viewed for more details: http://www.splunk.com/base/Documentation/latest/Admin/WhitelistAndBlacklistRules

        ]]>
        Robert Lau: Splunk partner i-NET-Systex @ Singapore - July 3, 2009http://blogs.splunk.com/robert/?p=22http://blogs.splunk.com/robert/?p=22Wed, 08 Jul 2009 15:20:27 +0000Robert Lau

         

        Courtesy of Systex Singapore Team.

        ]]>
        Simeon Yep: Splunk Dashboards outside of Splunk (part 2)http://blogs.splunk.com/simeon/?p=2http://blogs.splunk.com/simeon/?p=2Mon, 22 Jun 2009 22:29:52 +0000Simeon YepI recently blogged about a cool open source tool which is a Splunk Dashboard. In less than an hour, you could easily bring up a central dashboard to visually oversee Splunk administration duties. Here is a basic review of how to get the dashboard working, in combination with the Check Splunk tool.

        Prerequisites:

        • spdash
        • checksplunk
        • crontab competency
        • ssh competency
        • web server competency
        • cgi-bin competency

        Even if you are not very familiar with the above items, there is plenty of information available on the web to get things going. The README files that come along with the tools are very useful and should be reviewed before proceeding. The following steps are an outline of what I performed to get the dashboard working:

        Step 1: Install the spdash software on the web server host

        • Installed onto my linux server splunkdemo1
        • Installation consisted of: enabling the web server and placing the spdash scripts into the cgi-bin location
        • Runs on top of the OS installed apache web server from /var/www/cgi-bin/spdash
        • Runs on port 80
        • Edited the spdash script so that $STAT directory is located in /opt/demos/splunkdash/status
        • Create the above directory so that it contains ALL of the files used to compose spdash. Logs, statistics, etc... are here

        Step 2: Install the checksplunk software on the Splunk server

        • Installed onto my linux server splunkdemo1
        • Installation consisted of: placing the checksplunk script in its own directory, creating a directory to store results, and enabling a local crontab to run checksplunk on a regular interval (see step 3 for the example command)
        • OPTIONAL - Install checksplunk onto your other Splunk servers. (My example uses hosts located at 10.1.1.1 and 10.1.1.2.)

        Step 3: Retrieve the checksplunk data

        • Set up a crontab on the web server host to retrieve the checksplunk data

        My crontab on splunkdemo1 is as follows:

        splunkdemo1> crontab -l
        */5 * * * * /opt/demos/splunkdash/j2ee/checksplunk spdash
        */7 * * * * /opt/demos/splunkdash/email/checksplunk spdash
        */8 * * * * scp root@10.1.1.1:/opt/splunkdash/status/interop* /opt/demos/splunkdash/status/
        */6 * * * * scp root@10.1.1.2:/opt/splunkdash/status/cmdemo* /opt/demos/splunkdash/status/

        You will notice that I am running two remote secure copies and two local checksplunk commands. The local checksplunks are configured to feed data to the /opt/demos/splunkdash/status directory.

        Once you have checksplunk data feeding to the status directory, the cgi script should immediately pick up the data.

        ]]>
        Nimish Doshi: Using Splunk to Trace SOA Applicationshttp://blogs.splunk.com/nimish/?p=11http://blogs.splunk.com/nimish/?p=11Thu, 18 Jun 2009 20:49:45 +0000Nimish DoshiI have mentioned in past blog entries that Splunk can be used to contribute to the governance and indexing of Service Oriented Architectures. In this post, I will discuss a more common issue that pertains to log management, operations support, and troubleshooting. In a typical SOA deployment, you may have a situation where a user logs into a web site for procurement or purchasing, which kicks off a series of steps handled by different servers using heterogeneous technologies. One flow may include a web server, which initiates the request and sends a message to an application server. The application server then sends a message to an Enterprise Service Bus (ESB), which in turn, routes the message to a Business Process Management (BPM) solution.  The diagram below illustrates this basic flow.

        SOA Flow

        The complexity begins as soon as something goes wrong in the flow as each node in the SOA may represent a cluster and there may be multiple log files being generated to record what has occurred. Along comes Splunk to index all the log files using forwarders to send events to a central indexer. At this point, the user would have access to log events without having to log onto any production servers.

        To make the situation more complex, what if you wanted to now trace the flow of all users at a certain point in time and correlate what each user's session was doing on each node of the SOA flow? Splunk's transaction search can be utilized in the Splunk Web application to do this rather easily. For purposes of example, I am assuming that you already have an eventtype created called "SOA_Logs", which is just a search that includes all the different sourcetypes for SOA log files. Also, the web server log file may at first have a session ID for the authenticated user, the application server may map this to a user ID, and the rest of the nodes in the flow may use this user ID to identify the same user. You would use Splunk's field extraction capability to extract these fields from your logs at search time. With these requirements, we could use a transaction search command to correlate all users for a certain time span within one search:

        eventtype="SOA_Logs" | transaction fields="session_id,user_id" connected=f maxspan=5m maxpause=5m

        This search command will return groupings for all users with a session and user ID in a correlated manner, which follows the flow of the SOA. Each grouping will also give you a duration time so that you know how long an end to end flow took. Rather than go into the details for how transaction search works and the possible ways to use the above example, I invite you to read Eric "Maverick" Garner's excellent blog entry discussing the steps in very readable language. What I've done is use the same example in the business context for troubleshooting SOA applications.
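To make the idea of grouping by maxspan and maxpause concrete, here is a rough Python sketch. It is not Splunk's transaction implementation. It groups on a single key per event and ignores the multi-field and connected=f behavior of the real command. The function names, the event tuples, and the sample log lines are all invented for illustration.

```python
def group_transactions(events, maxspan=300, maxpause=300):
    """Group (epoch_seconds, key, text) events per key (simplified illustration).

    A new group starts for a key when the gap since that key's previous
    event exceeds maxpause, or when adding the event would stretch the
    group beyond maxspan seconds.
    """
    open_groups = {}  # key -> group currently being built
    closed = []       # groups that were ended by a limit
    for ts, key, text in sorted(events):
        group = open_groups.get(key)
        if group is not None:
            too_long = ts - group[0][0] > maxspan
            too_quiet = ts - group[-1][0] > maxpause
            if too_long or too_quiet:
                closed.append(group)
                group = None
        if group is None:
            group = []
            open_groups[key] = group
        group.append((ts, key, text))
    closed.extend(open_groups.values())
    return closed

def duration(group):
    """Seconds between the first and last event of a group."""
    return group[-1][0] - group[0][0]

# Invented sample events: (seconds, user key, log line).
events = [
    (0, "u1", "web: login"),
    (10, "u2", "web: login"),
    (20, "u1", "app: create order"),
    (45, "u1", "esb: route message"),
    (1000, "u1", "web: login"),
]

groups = group_transactions(events)
for g in groups:
    print(g[0][1], len(g), "events,", duration(g), "seconds")
```

With the 5 minute limits from the search above (300 seconds), the first three u1 events form one end-to-end flow, while the u1 event at 1000 seconds starts a new group.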

        If you are already using Splunk for central log management in environments that are typical to this sample SOA flow, then out of the box, you will have this capability to trace your SOA applications to gain better visibility at the individual user level for events that have occurred. You can also pipe the results to a Splunk report command such as top. In summary, this approach can be valuable in troubleshooting complex deployments.

        ]]>
        Robert Lau: Splunk>Live! Bangkok May 26http://blogs.splunk.com/robert/?p=21http://blogs.splunk.com/robert/?p=21Thu, 18 Jun 2009 19:25:49 +0000Robert LauExcerpt from local IT magazine.

        ]]>
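        Robert Lau: Splunk won the Best of Interop Tokyo 2009http://blogs.splunk.com/robert/?p=20http://blogs.splunk.com/robert/?p=20Thu, 18 Jun 2009 19:19:50 +0000Robert LauIT search engine "Splunk for Enterprise" from US-based Splunk

        Best of Show Award: Macnica Networks Corp. (hereafter Macnica Networks; headquarters: 1-5-5 Shin-Yokohama, Kohoku-ku, Yokohama, Kanagawa; President and Representative Director: 宮袋 正啓), which imports, develops, and sells network equipment, announces that at Interop Tokyo 2009, held June 8-12, 2009, the IT search engine "Splunk for Enterprise" from US-based Splunk, which Macnica Networks distributes, won the Grand Prix in the product category, application division, of the Best of Show Award.

        At Interop Tokyo 2009, more than 300 exhibitors show a wide range of network-related products, solutions, and services. The Best of Show Award picks the very best of them, the ones most fitting for this year's theme. Winners are chosen through rigorous review by IT industry experts and by visitor votes, and the judging is strict enough that some divisions end with "no winner." The winning products, solutions, and services truly represent this year and are fit to lead the new network environment.

        IT search engine "Splunk for Enterprise," Grand Prix winner in the Best of Show Award application division: "Splunk for Enterprise," which won the Grand Prix in the application division, is a collection and analysis tool that covers all IT data. It collects and stores, in real time, the log data and config data produced by applications and by multiple devices such as network equipment, PCs, and servers, and provides search, alerting, and reporting. It can take in any text data regardless of log format, and it centrally manages, searches, and analyzes every kind of data, including logs of email, system startup and shutdown, and access-rights changes. It can therefore be used in many areas, from network monitoring to the change management of access rights and applications that IT general controls emphasize.

        The new "IT search engine" technology, which arrived with the new concept of going beyond "managing" logs and IT data to "searching" them freely, was highly rated this time. Since the company was founded in the US in 2004, it has also won numerous awards there, including the US Best of Interop award in 2008 (Network Management, Software & Services category). Macnica Networks signed a primary distributor agreement for Japan with Splunk, Inc. of the US, effective January 2009, and sells the product in Japan.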

         

        ]]>
        Nimish Doshi: Using Splunk in a Screen Saverhttp://blogs.splunk.com/nimish/?p=9http://blogs.splunk.com/nimish/?p=9Thu, 18 Jun 2009 18:36:44 +0000Nimish DoshiSometimes users of Splunk like to have Splunk tell them what is happening with their infrastructure without doing an ad-hoc search. The most obvious way to accomplish this is to use Splunk Alerts. An alert gets generated for a saved search that is executed over a configured period and matches user defined conditions.

        Now suppose you want to visually just watch a saved search run on periodic basis. One approach would be to have the Splunk Web application in the browser auto refresh itself. If the requirement is that you would like this to appear full screen in real time for others to see without giving them any other access to your desktop computer (as you may be away), a possibility is to have the search run in a screen saver. I'll explain one way to get this to work.

        First, decide what searches you would like to run in a screen saver and test them out in a browser. Next, create a permalink to the search by using the pull down menu next to the Splunk search bar clicking on permalink. The URL will appear in the browser's address bar and should be copied away to some documentation utility such as notepad in Windows. An example URL that has been "permalinked" by Splunk would be:

        http://localhost:8000/?q=sourcetype%3D%22WinEventLog%3AApplication%22%20startminutesago%3D15&selStart=false&selEnd=false
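If you want to see which search a permalink carries, or build one for a different search, the URL can be decoded with Python's standard `urllib.parse`. This is only a sketch: the parameter names (q, selStart, selEnd) are taken from the example URL above, the `build_permalink` helper is hypothetical, and other Splunk versions may lay out their permalinks differently.

```python
from urllib.parse import parse_qs, quote, urlsplit

# The example permalink from above.
permalink = (
    "http://localhost:8000/?q=sourcetype%3D%22WinEventLog%3AApplication%22"
    "%20startminutesago%3D15&selStart=false&selEnd=false"
)

# Decode the query string and pull out the search carried in the q parameter.
params = parse_qs(urlsplit(permalink).query)
search = params["q"][0]
print(search)

def build_permalink(search_string: str, base: str = "http://localhost:8000/") -> str:
    """Hypothetical helper: percent-encode a search into the same URL layout."""
    encoded = quote(search_string, safe="")
    return f"{base}?q={encoded}&selStart=false&selEnd=false"

print(build_permalink(search))
```

Keeping the decoded searches next to the URLs makes it easier to see what each screen in the screen saver will show.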

        Next, you'll need to install a screen saver creation utility that allows web pages to be used as screens in a screen saver. For the purposes of testing, I'm using 2Flyer Screensaver Builder. All I did next was to use the saved URL above to create a web page for the screen saver and have it run every 30 seconds. This would allow me to execute a sequence of searches each being shown for 30 seconds at a time. After previewing the results, you can build the screen saver from the tool and you'll get a screen such as below running from your screen saver.

        Splunk Search in Screen Saver

        Now, the next question is authentication. For the purposes of testing, I used the free edition of Splunk and didn't have to deal with it. For the enterprise edition of Splunk, there is an application on Splunkbase called autologin that will allow automatic login into Splunk using a pre configured Splunk user and password. It is recommended to use an underprivileged user as your base user for security reasons. I got this working with Firefox as my default browser, but for some reason in IE, it had me go through one extra mouse click to accept the Splunk Certificate each time even though it had been added as an acceptable certificate and CA from the browser beforehand. Screen savers, by definition, don't allow you to interact with them using mouse clicks as that would exit the screen saver. Since 2Flyer Screensaver Builder was based on the IE rendering engine, I didn't try this any further.

        In retrospect, I don't recommend using an autologin feature to authenticate into Splunk as it does introduce a backdoor that you may not want, even if it is for an underprivileged user. A more acceptable approach would be to have the screensaver builder accept users and passwords to authenticate with any HTML form as part of building the screen saver. Overall, I write this blog entry to show you another interesting way to monitor activities in your operations center beyond traditional ad-hoc searches and alerts.

        ]]>
        Michael Baum: The Great Firewall of China: Internet Censorship Run Wildhttp://blogs.splunk.com/thebaum/2009/06/18/internet-censorship-run-wild/http://blogs.splunk.com/thebaum/2009/06/18/internet-censorship-run-wild/Thu, 18 Jun 2009 14:13:46 +0000Michael BaumThe past couple of days I've been visiting China meeting with some of our technology and channel partners. It just so happens I was present in Beijing for the 20th anniversary of the 1989 Tiananmen Square Events. Yes it really did happen despite what the Chinese government says. Speaking on Saturday at the F5 APAC Sales Kickoff I found myself staying over the weekend with Sunday off to roam around Beijing like a tourist, something I rarely get a chance to do on business trips. It is amazing to me to see how the Chinese and Taiwanese work on Saturdays. In the US we rarely see that. Europeans chastise Americans for working too hard but I guess they should really see the work ethic in Asia and then we'd look more normal.

        Watching the 2008 Beijing Olympics last summer, things there certainly seemed more normal than 20 years ago, but being there in person with all the festivities gone, things seemed really strange to me. It is very difficult to describe. Maybe I was jaded by all the newspapers I'd read on the way to Beijing. On a nice long 13 hour flight from Washington DC with plenty of reading material I consumed James Kynge's piece in the Financial Times questioning whether the Western media really understood why the student demonstrators were protesting. He went on ascribing the word "democracy" to the students' motivations and questioning whether we or they really knew what it meant, despite the fact that he spells out their desires in plain old English, which sounds like democracy to me.

        "Almost everything fell within its scope: campaigns against corruption, nepotism, inflation, police brutality, bureaucracy, official privilege, media censorship, human rights abuses, cramped student dormitories and the smothering of democratic urges. But to say the demonstrations were to “demand democracy” is an oversimplification."
        James Kynge, Financial Times

        It's almost impossible to describe the strange feeling I got while walking through Tiananmen Square observing the soldiers and the huge portrait of Chairman Mao that dominates the landscape. Maybe part of it was due to the increased tension of the anniversary. Maybe not. Tiananmen has come to symbolize the unspoken and largely unrecognized tension between the economic progress driving modern China and the old-fashioned communist government still ruling there. The Chinese seem to have a foot in both camps. The eeriness I felt came not only from my surroundings and an understanding of the principles they stood for but also from the reaction of my Chinese and Taiwanese friends. Their usually jubilant, outgoing personalities were completely subdued in the square. Was it a sign of respect and mourning that drove their thoughts? Perhaps to some extent. But in quiet whispers and conversations out of earshot of any "green" uniformed soldiers (versus the "blue" uniformed security guards), they confessed to being actually scared to speak for fear of someone or something listening. Challenging them, I said, "Surely you must be joking." But it was no joke. Only when we crossed the street into the Forbidden City did their usual personalities return.

        Of course this began a prolonged conversation over the next 24 hours as we visited the Great Wall, a new Beijing restaurant and departed through the impressive new Beijing airport. I kept asking and trying to understand. How can a country of so many people be controlled by the minds of so few? What are the real limitations on speaking out? And what effect will economic progress have on the political future of China? There was no shortage of stories supporting the fact that the government still does take a very heavy hand to those who disagree. But rather than discuss it, everyday Beijing seems to sweep the event of 20 years ago under the rug. As one of my Chinese friends said, "everyone is embarrassed and we just pretend it never happened."

        At the same time I was traveling throughout China, the articles started pouring in about Beijing's efforts to step up Internet and IT censorship. Upon reading the perspectives on "Green Dam" I was reminded of the impact the technology industry is having on the whole situation. It was bad enough I couldn't get to sites like Twitter and YouTube from my hotel room. Now the Chinese government is requiring that every PC sold in the country starting July 1st have special software blocking all sorts of things. The move is being presented as an attempt to protect children from online pornography but is obviously one more attempt by Beijing to take its censorship to a new level. China currently has the world's most sophisticated and multi-layered system of Internet censorship. Objectionable content on domestic Web sites is deleted or prevented from being published, and access to a large number of overseas Web sites is blocked or "filtered." Decisions about what to censor are based on the Chinese government's attempts to control the minds of 1.2B Chinese. There is no transparency or accountability, no public consultation in developing block lists or censorship criteria, and no way to appeal the blockage or removal of Web content.

        In a notice to PC makers, the Ministry of Industry and Information Technology said all PCs shipped in China needed to offer Green Dam/Youth Escort, identified as a "green internet filtering software", either pre-installed or as part of basic software packages. In May 2008, the government picked Jinhui Technology and Dazheng Language Technology, two Chinese software companies, to develop the software, according to a contract award notice from the MIIT. These companies claim their software is only being used to block sites. Yet last year, researchers discovered that a Chinese version of Skype contained the ability to block politically sensitive words in instant messaging chats, and to keep a record of the use of such words.

        While there is obviously a legitimate role for filtering software, we're starting to see governments take this way too far. Green Dam is only one example of a global trend. Internet censorship is expanding rapidly and now includes a growing number of democracies. Legislators are under growing pressure from family groups to "do something" in the face of all the threats sloshing around the Internet, and the risk of overstepping is very high. In China's case it's an open door to abuse power in the attempt to prove the legitimacy of an ailing legacy.

        ]]>
        Erin Sweeney: Around the world, around the worldhttp://blogs.splunk.com/erin/2009/06/15/around-the-world-around-the-world/http://blogs.splunk.com/erin/2009/06/15/around-the-world-around-the-world/Tue, 16 Jun 2009 06:54:33 +0000Erin Sweeney

        We've been having some great success stateside recently at Splunk, but the fun doesn't stop there. Oh no. We have good news from the land of the rising sun...Splunk won Best of Show at Interop Tokyo. If you can read Japanese, you can read more about it here.

        Later this week I'll recap our SplunkLive South Africa events.

        ]]>
        Erin Sweeney: Affordable SIM/ SEM/ SIEM?http://blogs.splunk.com/erin/2009/06/12/affordable-sim-sem-siem/http://blogs.splunk.com/erin/2009/06/12/affordable-sim-sem-siem/Fri, 12 Jun 2009 23:07:08 +0000Erin SweeneyI know, I know, no one wants to hear about the bad economy, but many of our IT brethren are facing staff reductions and limited budgets. At the same time, security is becoming a greater concern. Employees leaving organizations could be taking secure/private data with them, and fraud and other hacks are on the rise.

        Even the White House is paying greater attention to digital security threats as President Obama looks to appoint a cybersecurity coordinator.

        Bottom line: you need to make time, or spend the money to ensure your networks and information are secure.

        Good news, John Sawyer at DarkReading has done a nice job detailing a few free/ inexpensive solutions in his article Free SIM Tools Save Money - And Maybe Your Data.

        Give it a read. Then download the free version of Splunk and check out either Splunk for OSSEC, Splunk for Network Security, or both. And again, you can read more on Splunk for OSSEC in the blog post from Dale Neufeld at Protus IP Solutions.

        But it's not just security where Splunk puts more hours in your day ... Splunk is one product with many uses, the only limit is your imagination. Customers are using Splunk for application troubleshooting, change management, network management, server management, virtualization management, PCI, SOX and FISMA compliance.

        I spoke with David Abbott, Infrastructure Management Analyst from ACS just this week, and he told us Splunk enables their service desk people to handle and troubleshoot customer issues 30X faster (1 minute to get the info they need instead of 30 minutes previously.) That sounds like a pretty good metric to me. How will you use Splunk? And what will your time savings be?

        Happy Splunking!

        ]]>
        Erin Sweeney: SplunkLive, Awesome CTOs and OSSEC, Oh my!http://blogs.splunk.com/erin/2009/06/02/good-things-happening-at-splunk/http://blogs.splunk.com/erin/2009/06/02/good-things-happening-at-splunk/Tue, 02 Jun 2009 23:48:46 +0000Erin SweeneyLots of good news at Splunk these days!

        1) Congrats to our lovely and talented Erik Swan, named among the Top 25 CTOs for 2009. We knew it all along, but great to have InfoWorld recognize him as well. Yay Boss!

        2) 7 of our customers are also on the list! A few we can mention include Peter Balnaves, CVS Caremark; Stephen Herrod, VMware; Judith Spitz, Verizon Business; Aber Whitcomb, MySpace.

        3) Last week we beat out 1200 other nominees to win the TiE50 for the Internet Infrastructure category. The TiE 50 honors the top 50 startup companies that are leaders in innovation, ingenuity, and show excellence in management. Our Chief Architect and Co-founder Rob Das was on hand to receive the award.

        4) DiscussIT's IT Security Pubcast produced a nice overview of Splunk. Give it a listen - or share it with your friends who haven't yet drunk the Splunk Kool-Aid.

        5) SplunkLive is our roadshow-like event series where we send executives out to different cities across the country and pair them with local customers to highlight various use cases for Splunk. We've got 4 coming up in the next 2 weeks!

        This Thursday, June 4, join us for SplunkLive San Jose at the Fairmont to see:

        • At the operator of the world's largest retail electronic payments network and one of the most recognized global financial services brands, LeRoy Isaac, Senior Network Engineer, uses Splunk for network security monitoring and incident response.
        • 2Wire provides intuitive customer experiences for broadband carriers nationwide. Faisal Khan, Security Architect, will detail how his operations and security team rely on Splunk to identify and resolve issues, and conduct forensic investigations.
        • Genius.com is the only true SaaS solution in the marketing automation space. Zaid Ali, Director, Technical Operations, is using Splunk across the infrastructure for application troubleshooting and network and security management.

        Next Tuesday, June 9, join us at the Fairlawns Boutique Hotel in Johannesburg, South Africa to chat with Wybrand Conradie, Development Manager at Vodacom - as he shares how using Splunk helps them solve their application management and IT infrastructure challenges.

        Wybrand will join us again on Thursday, June 11, at The Spier Wine Estate in Cape Town, South Africa. If you're on the continent, come say hi!

        Rounding out the series is SplunkLive Denver, also on Thursday, June 11, at the Sheraton Denver Downtown Hotel. This time we'll hear from:

        • Lead Development Engineer Bill O’Brien shares how Splunk helps the leading local phone provider troubleshoot and secure the applications that keep more than 14 million customers connected and satisfied.
        • The Director of Information Security at a Fortune 500 financial services firm shares how using Splunk helps them solve IT security and infrastructure challenges.

        6) And rounding out recent highlights (not quite a Top 10 List) is this blog post HOWTO: Use Splunk as Your  Remote Syslog Server from Daniel Meissler. An outsider's take on using Splunk.

        But wait...this just in! Splunk for OSSEC, courtesy of Dale Neufeld. You can read about it here, or download Splunk for free, then go get the OSSEC app (also free).

        Thanks for reading, and remember if you want to speak at a SplunkLive, build an app for the general Splunk community, or just tell us the awesome ways you're using Splunk, hit me up: erin@splunk.com

        Hope to hear from you soon!

        ]]>
        David Carasso: Anomalies: How to find what you’re looking for, without looking for ithttp://blogs.splunk.com/david/2009/05/25/anomalies-how-to-find-what-youre-looking-for-without-looking-for-it/http://blogs.splunk.com/david/2009/05/25/anomalies-how-to-find-what-youre-looking-for-without-looking-for-it/Mon, 25 May 2009 23:14:54 +0000David CarassoVery often you want to find "problems" in your IT data, but you don't know what to look for. How can you find these problems with Splunk?

        In Splunk's new search language, there are several search operators that can help you. I'll describe only a subset of what is possible.

        • 1) You can search for unexpected events by looking at those that do not cluster into large groups. For example, you can cluster the errors in the last hour and report on the events that belong in the smallest clusters (e.g., 'error | cluster showcount=true | sort cluster_count | head 5').
        • 2) You can find unexpected events by finding values that are statistically unusual, such as those many standard deviations from the mean. For example, you can search for sendmail events with anomalous 'delay' values (e.g., 'sourcetype=sendmail_syslog | anomalousvalue delay action=filter pthresh=0.02').
        • 3) You can use machine learning to find events that have unexpected values based on the past historical context (e.g., '* | anomalies blacklist=boringevents').
        • 4) It's a little bit of a hand-wave - but you can do really cool graphical reports that often make anomalies visibly obvious. For example, you could create a timechart of average cpu_seconds by host, and visibly see problems (e.g., 'sourcetype=top | timechart avg(cpu_seconds) by host').
        • 5) Finally, Splunk is expandable - you can define your own search operators. If you know how to find events interesting to you, you can write a simple script and trivially integrate it with the power of a search platform that deals with billions of events in seconds. Since Splunk uses a scalable map-reduce framework, your script will run in the map-reduce framework and scale automatically.
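
        To make the second idea in the list above concrete, here is a minimal Python sketch of flagging values that sit far from the mean. It is only an illustration of the general technique and not how Splunk's anomalousvalue operator works internally; the function name anomalous_values, the threshold of 2 standard deviations and the sample 'delay' values are all assumptions made up for this example.

```python
from statistics import mean, pstdev


def anomalous_values(values, threshold=2.0):
    """Return (position, value) pairs lying more than `threshold`
    standard deviations away from the mean of `values`.

    Hypothetical helper for illustration; not a Splunk API.
    """
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Made-up mail delivery delays in seconds; one of them is clearly unusual.
delays = [1, 2, 1, 3, 2, 1, 2, 60, 2, 1]
print(anomalous_values(delays))  # [(7, 60)]
```

        A script along these lines could be a starting point for the kind of custom search operator mentioned in the last item, although a real one would need to follow Splunk's own conventions for reading and writing events.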

        Once you have searches that find unexpected events, you can set alerts for them. You can also combine events together into 'transactions', and look for anomalies in groups of events.

        ]]>
        Simeon Yep: Splunk Dashboards outside of Splunkhttp://blogs.splunk.com/simeon/?p=1http://blogs.splunk.com/simeon/?p=1Thu, 21 May 2009 22:32:16 +0000Simeon YepI was recently given access to an open source tool called spdash.  This tool allows you to externally visualize Splunk health from an Administrative standpoint. It consists of some cgi code and leverages a set of scripts (checksplunk) that grabs health information from one or more Splunk instances.   Information such as basic process status, listings of event counts, user specific search counts, and error messages are all presented in an intuitive screen.  Check out the main dashboard page:

        spdash

        After installing and running it internally on some of our systems, I have come away very impressed with what this can do for the System Administrator of a Splunk instance. One of the great features is the server link which allows you to get specific server information.  Here is a screen capture of that screen:

        spdash drill down

        When I first saw this being developed, I thought that it might be challenging to deploy. After less than an hour, I had a handful of servers sending and updating data to this dashboard. Now it's no cakewalk, but it's pretty straightforward. If you are very familiar with Splunk, have scripting experience, and can manage cgi on a web server then you should have no trouble. Kudos to the author, Kirk Waingrow, for making this available to the general public! If you are a System Administrator and manage Splunk, I would highly recommend you check this out.

        I will post a follow up that will contain details on my deployment...

        ]]>
        Erin Sweeney: SplunkLive Norway Recap: Telenor and Splunk for Securityhttp://blogs.splunk.com/erin/2009/05/06/splunklive-norway-recap-telenor-and-splunk-for-security/http://blogs.splunk.com/erin/2009/05/06/splunklive-norway-recap-telenor-and-splunk-for-security/Wed, 06 May 2009 21:16:06 +0000Erin SweeneySo as I said, things are really moving and shaking for Splunk EMEA. In addition to last week’s win of Best Integrated Security Solution, last week we hosted SplunkLive Norway in Oslo. Henrik Strøm, Security Architect for Telenor, presented how Splunk can help Telenor's IT Operations become more proactive and do in minutes what used to take hours.

        The Telenor Group is a leading global provider of telecommunications services. The company serves 164 million subscribers representing a strong footprint in Central and Eastern Europe and Asia. As one can imagine, 164 million subscribers generate a lot of data. In Norway alone they manage 1000s of servers and routers spanning many different data centers. In short - they are heavily reliant on IT.

        "We have a lot of data, a lot of tools, many groups of people, and too little communication between groups," Henrik said. "This makes it difficult to investigate incidents as no one person or department holds the keys to all of the data needed to conduct a proper analysis."

        Challenge:

        Telenor required greater visibility into their IT data silos, but needed the ability to control who could see what. After evaluating several log management vendors, they settled on Splunk, as it was open and easy to integrate into existing systems, scalable and was software rather than an appliance-based solution.

        “Splunk helps us consolidate our tools. It’s scalable and very open so it can integrate to existing and in-house tools where necessary. Better yet, Splunk makes the data we capture in logs available to less technical staff who might not otherwise know what is relevant or where to look for it.”

        Splunk in Action:

        Most users monitor and troubleshoot issues incredibly fast with Splunk – and Telenor is no different.  The dashboards and ability to do ad-hoc reporting are key features they rely on, and today Telenor uses Splunk for network monitoring and troubleshooting situations. But Henrik was further excited about his ability to put the power of Splunk into the hands of lower level users.

        For instance, he used Splunk to diagnose a kernel error on systems he did not have intimate knowledge of, while having a particular system expert diagnose the problem manually without Splunk. Both found the problem in minutes, but the system expert had full access to the servers, had worked on the systems over several months and knew exactly where to look. Henrik on the other hand did not require access to the systems to find the problem and did not really know what to look for. Opening the ability to do basic problem solving at lower levels, and sharing controlled access more broadly will provide Telenor with a big productivity advantage.

        The empowerment theme carries over to dashboards and ad-hoc reporting as well. Telenor is using Splunk’s saved search and alerting features to easily construct dashboards covering any number of items—failed logins, firewall traffic by port, specific user activities—whatever is important to their organization at a given time. 

Henrik has used Splunk to map and understand what is normal for a given environment. Just understanding what’s typical helps Telenor to build smarter alerts and manage their systems in a proactive manner - when something spikes up or an unfamiliar pattern appears analysts can dig in ... before a potential failure occurs.

        “Today’s monitoring tools just tell you when something isn’t working. With Splunk we can examine historical data, and watch trends to pick up on warning signs before an outage occurs,” Henrik said.

        Thanks Henrik! We look forward to helping your continued success—and we’ll keep the rest of you posted as Telenor develops new and exciting ways to apply Splunk to their IT environment.

        Thanks for reading. If you’re using Splunk for something cool—let us know!

        Best,

        erin

        ]]>
        Erin Sweeney: SplunkLive Chicago Featuring Snap-on Toolshttp://blogs.splunk.com/erin/2009/05/06/splunklive-chicago-featuring-snap-on-tools/http://blogs.splunk.com/erin/2009/05/06/splunklive-chicago-featuring-snap-on-tools/Wed, 06 May 2009 20:27:40 +0000Erin SweeneyIf you're in the Chicago area, get yourself to the Intercontinental Chicago tomorrow, Thursday May 7, 2009, at 8:30 am for SplunkLive featuring Flavio Marcato, Solutions Architect for Snap-on Tools.

        Flavio has been using Splunk to ensure all transactions are successfully flowing through Snap-on's ERP so that orders and invoices are processed and shipped without delay. He's centralizing application logs from several proprietary applications and loves the fact that Splunk can ingest any type of data without custom parsers, or connectors.

        Another speaker representing a Fortune 1000 company is using Splunk for NERC and SOX compliance. He'll tell us how Splunk's log centralization and ad-hoc reporting mechanism keeps the company compliant, and internal and external auditors happy.

        Register here.

        Hope to see you!

        ]]>
        Eric Garner: The Yoda Analogyhttp://blogs.splunk.com/maverick/2009/04/28/the-yoda-analogy/http://blogs.splunk.com/maverick/2009/04/28/the-yoda-analogy/Tue, 28 Apr 2009 07:21:06 +0000Eric GarnerAfter demonstrating the amazing features and capabilities of Splunk to numerous clients over the past couple years, I find that people still perceive it to be a very disruptive technology. So much so, it's still difficult for some to truly understand the magic of Splunk.

        They ask me "How is it that I can feed Splunk any kind of IT data I want, log files, SNMP traps, alerts, configuration files, xml, whatever, and know it will be indexed correctly?"

        The answer is one of the most powerful features of Splunk, called Universal Indexing, and hopefully by the time you finish reading this article you will have a better understanding of what that is and why it's so powerful.

        To start down that path to understanding, I would like you to think about Yoda.

        Yeah, that's right, Yoda from the Star Wars movies. You know, he's that short funny-looking wrinkly green muppet character that speaks in a severely mixed-up manner. Remember him now?

        Now what does Yoda have to do with Universal Indexing, you ask? Well, it's not so much about Yoda, really, as it is about how Yoda talks.

        Lately I've been explaining how Universal Indexing works by using what I call The Yoda Analogy and it goes something like this...

        The way Yoda talks is so mixed-up, almost backwards, that it's extremely confusing at first, right? It's so out-of-order from what you and I are used to, with verbs coming last in the sentence and adjectives coming first or maybe second, and nouns and pronouns thrown in wherever there's room to throw them in.

        Yet no matter what Yoda says and no matter what order the nouns, verbs, adjectives, pronouns are arranged in, we can still figure out what he is saying and what he means.

        How is this possible?

        Well, if you think about it, it's not that difficult to understand how. It's based on how we learned to talk as children. When we were very young and first learning to speak, we did a lot of listening before we started talking, right? We listened to our parents and grandparents talk to us, maybe our older brother(s) and/or sister(s), and other friends and family members. We listened to them without any understanding of what a verb was or what a noun was or what sentence structure was. We just listened and listened and listened and one day we figured it out well enough to start talking and having conversations.

        In essence, what we were really doing was sampling the sounds that people made and looking for common patterns and correlations and after a while we figured out that certain sounds and patterns had specific meanings.

        Then we started to notice vocal tones and inflections and it became evident that speaking loudly or softly meant certain things and a vocal pitch that suddenly went in an "upward" direction at the end of a sentence probably meant you were supposed to respond. (i.e. we derived more structure and meaning...)

        And then as we grew up and got older, we learned to talk in more sophisticated ways and we eventually learned what a word was and what a sentence was, and a verb, and a noun, and an adjective, etc, and we learned the difference between a question and a statement and so on. (i.e. more structure and meaning...)

        In other words, we learned about the complete structure and meaning of the language years AFTER we learned how to FIRST speak the language intuitively through sampling the sounds people made with their mouths.

        And it was that intuitive sampling capability that allowed us to listen to Yoda speak out-of-order and still determine what he was actually saying and what he meant.

        So, what does this have to do with anything, you ask?

        Basically, how we learned to talk is very similar to how Splunk's Universal Indexing works. Splunk does not assume what anything means at first. It simply indexes it, samples it, looks for patterns and correlations, and presents it to the end-user who then helps Splunk derive and apply meaning and structure AFTER THE FACT, rather than before-hand.
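
        The "index first, apply structure afterwards" idea can be sketched in a few lines of Python. This is only a toy illustration of the general pattern, not how Splunk's Universal Indexing is actually implemented; the function names (build_index, search), the sample log lines and the user= field pattern are all made up for this example.

```python
import re
from collections import defaultdict


def build_index(lines):
    """Index every token of every line without knowing the format."""
    index = defaultdict(set)
    for n, line in enumerate(lines):
        for token in re.findall(r"[A-Za-z0-9_.]+", line.lower()):
            index[token].add(n)
    return index


def search(lines, index, term, field_pattern=None):
    """Find lines containing `term`; optionally extract fields
    with a regex supplied at search time, after indexing."""
    hits = [lines[n] for n in sorted(index.get(term.lower(), ()))]
    if field_pattern is None:
        return hits
    extractor = re.compile(field_pattern)
    fields = []
    for line in hits:
        match = extractor.search(line)
        if match:
            fields.append(match.groupdict())
    return fields


# Two differently ordered "Yoda" formats plus an unrelated line.
lines = [
    "2009-04-28 07:21:06 sshd failed login user=yoda",
    "login failed, user=luke, at 07:22",
    "cpu=93 host=web1",
]
index = build_index(lines)

print(search(lines, index, "failed"))
print(search(lines, index, "failed", r"user=(?P<user>\w+)"))
```

        Note that nothing about the "user" field was declared when the lines were indexed. The structure is supplied later, by the person asking the question.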

        Take a moment right now to imagine what it would be like if you tried to teach a child how to talk by explaining to them what a word was or what the structure of a sentence was, and then proceeded to explain what a noun, a verb, an adjective was, and the difference between a question and a statement. Let's say that it was required that the child understand the complete structure of language before they could speak the language. That would be extremely limiting, if not impossible to do, right? Because how can you explain ANYTHING to a child if that child does not yet know how to talk?

        It's kind of a crazy catch-22 of sorts.

        Yet there are IT products and tools that ask us to do that very thing everyday. We are required to understand the structure of our IT data BEFORE we can use these tools to talk with that data and have a decent informative conversation with it, right? We have to teach our tools about log formats and database structures FIRST, before we can expect to ask a question and get an answer.

        And then, to make things even more difficult, along comes some new "Yoda" log file format or IT data structure that's all out-of-order, and we wonder why we don't understand what it's trying to say or what it means.

        It happens every day in data centers around the world, and Splunk users know all about it because it used to be like that for them before they discovered Splunk. They started using it and quickly realized that, for once, here is a tool that learns to talk the way I learned to talk: sampling quickly first and determining meaning and structure later. And because of that truly easy and intuitive experience, they find themselves saying, "Splunk makes more sense to me. You can use it and apply it more rapidly and easily to your IT data and have a casual conversation with that data, in all cases and situations, no matter what, even if it sounds like Yoda."

        So next time you find yourself struggling to get the answers you need about what's going on within your IT infrastructure, remember The Yoda Analogy and think about how Splunk's Universal Indexing can easily and intuitively enable you to finally listen and learn how to talk with your IT data the same way you've been talking with people your whole life. (BTW, if you want to, you can download Splunk now!)

        ]]>
        Michael Baum: Conficker is Proof We Need to Log Broadly and Analyze Deeplyhttp://blogs.splunk.com/thebaum/2009/04/23/conficker-is-proof-we-need-to-log-broadly-and-analyze-deeply/http://blogs.splunk.com/thebaum/2009/04/23/conficker-is-proof-we-need-to-log-broadly-and-analyze-deeply/Fri, 24 Apr 2009 05:04:51 +0000Michael BaumAt RSA this week it's easy to get lost in the menagerie of security technologies to conquer malware proliferation, stomp out spam and protect virtualized and cloud computing environments. But the most recent statistics show we are still losing the war on cybercrime. Symantec’s latest Internet Security Threat Report cited 1,656,227 malicious-code threats last year and 75,158 new active bot-infected computers per day. And yes, the United States is still the country most frequently targeted by denial-of-service attacks, accounting for 51% worldwide, and the top country for underground economy servers advertising stolen credit cards, accounting for 67% of all activity worldwide.

        Why are we losing so badly? Not surprisingly, there was a lot of talk at RSA about the Conficker worm. Some of the chatter points to reasons why the security industry is falling behind. At first glance, the Conficker worm looks harmless. So far there are not too many significant reports of infected machines and hijacked data, but it may be too early to feel so smug about it. The worm’s real danger is its demonstrated ability to evade the expensive IDS technology enterprises have put into place and rely on today. Estimates are that 90% of the enterprise IDS implementations have failed to detect the worm’s presence and create some kind of actionable alert. How can this be?

        Conficker's properties are simple but different from the typical threat. First, Conficker affected systems outside of IDS coverage like USB keys and mobile user laptops. So if you’re looking for attacks from outside your network only, you won’t see it. It’s a “walk-in virus”. Second, it isn’t greedy like Code Red and other viruses of late. The Conficker worm has built-in sleep cycles. So where a typical worm might scan 1,000 or 10,000 IPs a minute, Conficker was happy to scan maybe 100 and evade the baseline trip wires. Third, Conficker is very selective with its payload delivery. It only delivers when it sees a vulnerability. All this helps Conficker evade IDS systems that want to witness the crime. But Conficker is the perfect crime in that it goes undetected. With no payload delivered and seemingly fewer IPs scanned there is no grossly abnormal behavior to witness. The evidence is circumstantial.

        At a lunch on Wednesday, Tom Le of BT gave a good overview of how BT Managed Security Services detected Conficker for their customers. It was one of the first times I’ve really been sold on a managed security service beyond the value of cost and convenience.

        First, as Tom explained it, they started by assuming IDS would miss the attack. They didn’t assume a payload had to be delivered and didn’t assume that a large number of scans was needed to indicate the presence of an intruder. Instead of depending on IDS, BT uses logs and events to baseline the natural behavior of even netbios triggered scans (which Conficker happened to use) and was able to alert on small changes in scans that would be missed if you were only looking at things like netflow. As it turns out, most firewalls blocked the netbios scans going out, so again most customers didn’t even know they had the Conficker worm present.
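
        As a rough illustration of what baselining per-host scan behavior might look like, here is a small Python sketch. It is not BT's method, which the talk did not spell out at this level; the function name scan_alerts, the parameters k and min_sigma, the host addresses and all the counts are assumptions invented for the example.

```python
from statistics import mean, pstdev


def scan_alerts(history, current, k=3.0, min_sigma=1.0):
    """Return hosts whose current scan count is well above their
    own historical baseline (mean + k standard deviations).

    history: {host: [scan counts in past intervals]}
    current: {host: scan count in the latest interval}
    Hypothetical helper for illustration only.
    """
    alerts = []
    for host, count in current.items():
        past = history.get(host, [])
        if len(past) < 3:
            continue  # not enough data to form a baseline
        mu = mean(past)
        # Floor sigma so very quiet hosts do not alert on +1 scan.
        sigma = max(pstdev(past), min_sigma)
        if count > mu + k * sigma:
            alerts.append(host)
    return sorted(alerts)


# Made-up netbios scan counts per interval for two hosts.
history = {
    "10.0.0.5": [4, 5, 6, 5, 4],         # normally very quiet
    "10.0.0.9": [90, 110, 100, 95, 105],  # normally busy
}
current = {"10.0.0.5": 40, "10.0.0.9": 108}

print(scan_alerts(history, current))  # ['10.0.0.5']
```

        The quiet host is flagged at only 40 scans, far below any fixed trip wire set at hundreds or thousands of scans a minute, while the busy host's 108 scans are treated as normal for that host.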

        Second, Tom and his team assumed some type of command and control activity associated with Conficker. They followed the money, watching for things like Conficker trying to phone home in different ways. By having a broad set of logs and events from switches, routers, applications and IDS they were able to look for outlying behaviors like DNS lookups to obscure locations not typically seen in customer networks and aggregate this information across customers to identify common abnormalities. Tom estimates that BT sees roughly five billion messages a week across their customer base. That’s a lot of messages.
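
        The cross-customer aggregation described above could be sketched in Python roughly as follows. Again, this is a guess at the shape of the idea rather than BT's implementation; the function name shared_rare_domains, the min_customers parameter, the customer names and the domains are all hypothetical.

```python
from collections import defaultdict


def shared_rare_domains(lookups, known_domains, min_customers=2):
    """Return {domain: [customers]} for domains that are not in the
    usual baseline yet were looked up from several customer networks.

    lookups: iterable of (customer, domain) pairs from DNS logs
    known_domains: set of domains normally seen in customer traffic
    """
    customers_by_domain = defaultdict(set)
    for customer, domain in lookups:
        if domain not in known_domains:
            customers_by_domain[domain].add(customer)
    return {
        domain: sorted(customers)
        for domain, customers in customers_by_domain.items()
        if len(customers) >= min_customers
    }


# Made-up DNS lookups gathered from three customers.
known = {"mail.example.com", "www.example.org"}
lookups = [
    ("acme", "mail.example.com"),
    ("acme", "xkqzjw.example.net"),
    ("globex", "xkqzjw.example.net"),
    ("globex", "www.example.org"),
    ("initech", "oddball.example.net"),
]

print(shared_rare_domains(lookups, known))
```

        A single odd lookup at one customer may mean nothing, but the same obscure name showing up at several unrelated customers is the kind of common abnormality worth a closer look.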

        After listening to all the chatter about Conficker and walking the show floor, it gets easier to understand how criminals continue to evade the security infrastructure enterprises put in place. There are just too many ways in which breaches can occur and there is just too much data scattered about to collect and correlate in order to find the anomalies. So the security industry continues down the path of specific solutions to specific vulnerabilities and criminals continue to create new threats that evade the industry’s point approaches. I say the industry as a whole needs to move to more of an adaptable and flexible approach that can apply security to whatever threats arise, when they appear.

        The best real world detectives are able to piece together seemingly circumstantial evidence and sift out the clues that lead to catching criminals. But every time it’s different. Perhaps we need to take the same approach in order to obtain more adaptable security solutions. Assume every time it’s different not the same.

        Logging broadly and analyzing deeply is one of the best defenses. Without a broad swath of data you won’t have the pieces of the puzzle to put together at the moment you need to solve the crime.

        Few criminals are caught in the act.

        ]]>
        Nimish Doshi: Audible Alertshttp://blogs.splunk.com/nimish/?p=5http://blogs.splunk.com/nimish/?p=5Wed, 15 Apr 2009 19:49:51 +0000Nimish DoshiI was talking to some Splunk Users and mentioned scripted alerts as a very powerful way to invoke any program to get an alert. My thoughts then came to audible alerts. Since a scripted alert can call anything, it is possible that the script can call a program that can remotely send an alert that is audible, not just readable (like an email alert). I can think of a simple use case for this. Suppose you already have alerts that go to your cell phone via SMS through the email alert function of Splunk. Now, if you are at home and your cell phone battery is dead and it needs to be recharged, you may miss an important alert until you turn on your cell phone. As a back up, if an alert can go to some other device that is always on, such as a voice enabled device, you'll have another opportunity to get the alert.

        First, you'll need to have a device that can translate text to speech via a remote API. I chose to use a Nabaztag:Tag for this function. What's this? It's a voice enabled wifi rabbit that can receive multiple types of audible input including RSS, audio streams, and text to speech. What I did was set up a scripted input with environment variables on what to say, which included the name of the saved search, the number of events matched, and a readable subject. To make it more interesting, I added a day of the week (daily, weekdays, weekend) and start to end hours environment variables to control when the alert can be active. In a real life situation, you would want the alert to be active during your evening hours at home, such as 6 PM to 11 PM. The script then calls a Python program that checks the time to be active, puts the final alert together as a String and then calls a Nabaztag REST based API to send the alert to the rabbit. The call to send the alert looks something like this:

        http://api.nabaztag.com/vl/FR/api.jsp?sn=00039D4022DE&token=112231049046144&voice=UK-Shirley&tts=Splunk+Alert+...

        The sn and token identify which rabbit to send the alert to, the voice identifies the accent and language, while the tts is the actual text to be read. When executed, the rabbit lights up and speaks the alert. It will also flash with color until a button is pressed which will again speak the alert in case you missed it the first time.
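
        The steps described above (check the active window, assemble the text, build the API call) can be sketched roughly as follows. This is not the script published on Splunkbase, only a minimal illustration: the function names, the wording of the spoken text, and the EXAMPLE_SN and EXAMPLE_TOKEN placeholders are assumptions, and the sketch builds the URL without sending it.

```python
from datetime import datetime
from urllib.parse import urlencode

# Endpoint as quoted in the post; everything else below is illustrative.
API_URL = "http://api.nabaztag.com/vl/FR/api.jsp"


def is_active(now, days, start_hour, end_hour):
    """Return True when `now` falls inside the configured alert window.

    `days` is one of "daily", "weekdays" or "weekend", as in the post.
    """
    weekday = now.weekday()  # Monday is 0, Sunday is 6
    if days == "weekdays" and weekday >= 5:
        return False
    if days == "weekend" and weekday < 5:
        return False
    return start_hour <= now.hour < end_hour


def build_alert_url(serial, token, search_name, event_count, subject,
                    voice="UK-Shirley"):
    """Assemble the text-to-speech request URL for one alert."""
    text = (f"Splunk Alert {subject}. "
            f"Saved search {search_name} matched {event_count} events.")
    params = {"sn": serial, "token": token, "voice": voice, "tts": text}
    # urlencode turns spaces into "+", matching the tts=Splunk+Alert+... form.
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    if is_active(datetime.now(), "daily", 18, 23):
        print(build_alert_url("EXAMPLE_SN", "EXAMPLE_TOKEN",
                              "failed_logins", 12, "login failures"))
```

        Keeping the window check separate from the URL building makes it easy to reuse the same check for another always-on device.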

        This example may sound a little playful as it uses a consumer gadget to serve the purpose. However, it illustrates that alerts do not always have to be textual in nature and can be as useful and creative as your imagination can conceive them. You can download the wifi rabbit example at Splunkbase and start using the same approach for your own audible device. Happy Belated Easter!

        Nabaztag Rabbit

        ]]>
        Paul Pang: Splunk ! Yes we can !! Yes we can !!http://blogs.splunk.com/paul/?p=3http://blogs.splunk.com/paul/?p=3Sat, 04 Apr 2009 03:21:29 +0000Paul PangToday we have a great event in Taiwan. Our distributor Systex has brought around 400 people together to attend the Systex Sales Kickoff. It's amazing that all of the topics are related to Splunk.

        It is very impressive that the Systex Sales Kickoff meeting is so similar to our SplunkLive event. During the event, around 20 speakers from different Systex business units shared their Splunk experience. They presented their own Splunk use cases in areas such as internal IT (infrastructure troubleshooting for Exchange, AD, firewalls, etc.), security (change management, MAC address management), and applications in special domains (BI analysis for an online book store, BI analysis for a telco media download service).

        The event is great. I really wanna bring in all of my customers to attend this event next time.

        P.S. After each presenter finished his sharing session, our MC Mr. Mosses led the team in saying the team slogan together: "Splunk! Yes we can!! Yes we can!!" A very passionate scene :D

        ]]>
        Robert Lau: SingCERT Security Seminar - Organized jointly by Splunk, Systex Singapore and SingCERThttp://blogs.splunk.com/robert/?p=17http://blogs.splunk.com/robert/?p=17Wed, 25 Mar 2009 08:45:57 +0000Robert Lau 

        [picture courtesy of Armie @ Systex Singapore]

        ]]>
        Johnathon Cervelli: Getting started with Splunk on Windows, a short subject documentaryhttp://blogs.splunk.com/johnathon/?p=16http://blogs.splunk.com/johnathon/?p=16Fri, 20 Mar 2009 17:08:13 +0000Johnathon CervelliHere in the Ivory Tower of Splunk, it's easy to forget sometimes that people in the rest of the world are busy too. Despite our undying love for search software, there are plenty of people out there who are just doing a drive-by of our software. We should make it super - dead - simple to use.

        That's a neverending story, however. But today's installment is a video on getting started with Splunk on Windows. If you're confused or having trouble getting going, it's our fault. But maybe this will help:

         http://www.splunk.com/view/SP-CAAADKS

        Enjoy, and merry Splunking.

        ]]>
        Robert Lau: Splunk and F5 Singapore - Consolidate, Maximise and Securehttp://blogs.splunk.com/robert/?p=14http://blogs.splunk.com/robert/?p=14Thu, 19 Mar 2009 12:35:09 +0000Robert Lau

        Thanks to F5 Singapore, NCS, Transition Systems Singapore and our local distributor Systex Singapore, together we had a very successful seminar presenting the joint solutions to a number of Singapore Government agencies. The day ended with a two-hour hands-on training on Splunk for F5 Apps. The audience was very excited to see how Splunk is being applied to turn IT data into intelligence and actions.  Click here for more information on F5 products. To test drive, download FREE Splunk for F5 Apps at Splunkbase!

        [pictures courtesy of Armie @ Systex Singapore]

        ]]>
        Robert Lau: Systex Splunk Lab @ Taipeihttp://blogs.splunk.com/robert/?p=9http://blogs.splunk.com/robert/?p=9Thu, 19 Mar 2009 10:11:12 +0000Robert LauThis round I am going to introduce our Splunk Lab in Taiwan.

        Splunklab is the brainchild of Systex COO Mr. Frank Lin. Not only does Frank share the vision of our 3 founders Michael Baum, Erik Swan and Rob Das, he is determined to make Systex the premier strategic ISV partner of Splunk worldwide.

        Today at Splunklab, you will be able to experience the power of Splunk through a real-time demonstration of Splunk eating live IT data fed from all sorts of systems, devices and applications.

        * * * * * *

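        The SYSTEX Splunk Lab is located at Systex's headquarters in Neihu, Taipei. In the lab, Systex will set up a product Proof of Concept demonstration area to help customers understand the results and conditions of deploying the product. The lab also serves as a base for exchange between the two companies' R&D staff and for research and development work on related products.

        Because Splunk's applications are so broad, Systex will take a cross-business-unit approach, selecting domain experts from different business units to join the operation of the SYSTEX Splunk Lab; its members are mainly R&D staff and technical engineers. This is our Splunk Lab workstation.
        R&D projects will follow both customer needs and market trends. Initially, the main directions will be Splunk for Check Point, Splunk for Trading, Splunk for Microsoft, Splunk for Oracle, and Splunk for Virtualization.
        The lab's initial R&D will start from Systex's core business values, advancing on databases, information security, and finance in parallel, and working with Splunk's on-site staff to develop market demand. It also plans to integrate, on Splunk's IT Search Platform, the expertise accumulated over many years in financial services, telecommunications, manufacturing, healthcare, information security, business intelligence, and other fields, tailoring a dedicated Splunker for existing customers.

        And this is the main server room of our SYSTEX splunk> Lab. All of the R&D infrastructure hosts will be housed here. The room is also equipped with a wide range of equipment, such as storage devices, apps hosts, firewall hosts, application system hosts, and so on, serving as the "home front" support base for the lab members' R&D work.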

        [Courtesy of Emy Tu, Systex Splunk Marketing Manager, Asia Pacific]

        ]]>
        Robert Lau: Splunk發表中文版網站! (Splunk Chinese websites launched!)http://blogs.splunk.com/robert/?p=2http://blogs.splunk.com/robert/?p=2Thu, 19 Mar 2009 04:41:54 +0000Robert Lau     It is amazing to see, in the short span of just a few months since our marketing tours to China and Taiwan, a huge surge in the number of visitors and downloads (tens of thousands) from the region. New blogs are being written on Splunk by Chinese users to share their experiences.  Obviously there is strong interest there, and we are preparing ourselves to support this key growth engine for Splunk business in the Asia Pacific.

             To that end, and with the help of our partner Systex, we have successfully launched two micro-websites for the Chinese markets.  They are now available at Traditional Chinese site and Simplified Chinese site.

         

        ]]>
        Robert Lau: Everybody Splunk! @ Taipeihttp://blogs.splunk.com/robert/?p=1http://blogs.splunk.com/robert/?p=1Wed, 18 Mar 2009 02:42:49 +0000Robert LauSplunk IT Search Hands-on Boot Camp @ Taipei from March 18, 19, 25 & 26

        Organized by our partner Systex Taiwan, participants will be able to enjoy this 3-hour introductory training on the Seven Wonders of IT Search on Splunk.  Please check here for details of the event! Everybody is welcome! Everybody Splunk!


        ]]>
        Nimish Doshi: Change Management for SOA Configurationhttp://blogs.splunk.com/nimish/?p=4http://blogs.splunk.com/nimish/?p=4Mon, 16 Mar 2009 23:52:20 +0000Nimish DoshiIn a previous blog entry, I had mentioned that Splunk can participate as a Services Oriented Architecture (SOA) consumer and provided an example on using web services as a scripted input. In today's entry, I'll discuss a more administrative task, which is quite native to Splunk, change management. As you may well know, Splunk can audit the file system and provide events on any change to a directory such as additions, updates, and deletes. Splunk can even monitor the files' contents so that you can do a Splunk diff command to know exactly what has changed.

        Now, what does this have to do with SOA? In a typical SOA set up, there will exist a number of configuration files. At a minimum, you may have Web Services Definition Language (WSDL) files, XML files, and XML Schema (XSD) files stored on the file system of a production machine. It is important that any change that occurs to these files be authorized and monitored to provide control over deployment.

        Let me introduce a simple use case. First, use Splunk's fschange as a data input in your inputs.conf to monitor a directory. For example,


        [fschange:/Applications/splunk/wsdl]
        sourcetype = wsdl_monitor
        index = testing
        disabled = false
        _blacklist = [~]$
        _whitelist = \.wsdl$
        recurse = true
        sendEventMaxSize = -1
        pollPeriod = 600
        fullEvent = true

        This stanza means I am monitoring my WSDL directory recursively for changes to WSDL files (excluding backup files ending with ~) every 600 seconds and I would like the full event (the entire file) to be stored within Splunk. Notice, I have an index = testing criteria, because the default is that index = _audit, which may not be where you want to place file system change events. You can also use index = main if you want your main index to have these events. Now, after I restart Splunk, I can start making changes to my configuration files and Splunk will tell us what has changed.

        My search index=testing sourcetype="wsdl_monitor" yields results such as:

        [Sample results not recoverable here: three fschange events, each a timestamp and a list of key=value pairs, followed by its source, host, and sourcetype.]

        I can now tell who changed the file or directory and when it was done. Notice how action=add, action=update, and action=delete can be used as the basis to form event types which can later be used for Splunk Alerts. This means that changes to the SOA configuration can actively be monitored. Moreover, if a change was supposed to happen, and a search yielded no results for your action=update path=/Applications/splunk/wsdl/iptocountry.wsdl within the last 24 hours, the absence of a change may be worthy of an alert, as someone forgot to update the WSDL.
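
        The "absence of a change" check can be illustrated outside Splunk with a few lines of Python. This is only a sketch of the logic, assuming each change event has already been reduced to a dictionary with action, path and time keys; inside Splunk the same idea would be a saved search that alerts when it returns zero results.

```python
from datetime import datetime, timedelta


def missing_update(events, path, now, window=timedelta(hours=24)):
    """Return True when no update event for `path` falls inside the window.

    `events` is an assumed, simplified form of the change events:
    dictionaries with "action", "path" and "time" (a datetime) keys.
    """
    cutoff = now - window
    return not any(
        e["action"] == "update" and e["path"] == path and e["time"] >= cutoff
        for e in events
    )


if __name__ == "__main__":
    wsdl = "/Applications/splunk/wsdl/iptocountry.wsdl"
    now = datetime(2009, 3, 16, 12, 0)
    events = [
        {"action": "add", "path": wsdl, "time": now - timedelta(hours=2)},
        {"action": "update", "path": wsdl, "time": now - timedelta(hours=30)},
    ]
    if missing_update(events, wsdl, now):
        print("ALERT: expected WSDL update did not happen")
```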

        An interesting next step is to see what changes were made to a configuration file to see if the changes were innocuous.  In my example, the search index=testing source="/Applications/splunk/wsdl/iptocountry.wsdl"|diff yields the following:

        diff x y compares x to y
        - indicates a line present in x but missing in y
        + indicates a line present in y but missing in x
        ! indicates a line that exists in both x and y, but contains different information

           
            
              
                     
                   

        Notice that the change is for the documentation, so it is relatively harmless.
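
        For readers who want to see the -, + and ! markers in action without a Splunk instance, Python's standard difflib module produces a context diff that uses the same three markers. The two WSDL fragments below are made up for illustration and are not the contents of iptocountry.wsdl; this shows the notation only, not how Splunk's diff command is implemented.

```python
import difflib

# Hypothetical "before" and "after" versions of a WSDL fragment in which
# only the documentation text differs.
OLD = [
    "<definitions>",
    "  <documentation>Looks up a country by IP</documentation>",
    "</definitions>",
]
NEW = [
    "<definitions>",
    "  <documentation>Looks up a country by IP address</documentation>",
    "</definitions>",
]


def wsdl_diff(x, y):
    """Return context-diff lines comparing x to y."""
    return list(difflib.context_diff(x, y, fromfile="x", tofile="y",
                                     lineterm=""))


if __name__ == "__main__":
    for line in wsdl_diff(OLD, NEW):
        print(line)
```

        Lines that exist in both versions with different content are prefixed with "! ", once in the x section and once in the y section of the output.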

        What this means is that you can now easily use your existing Splunk installation to monitor changes for your SOA configuration. The entire activity for monitoring SOA is part of SOA Governance. Keep in mind that monitoring file system changes does not in itself constitute an upward step in the SOA maturity model. However, with Splunk's ability to monitor changes, instantly show what has changed, raise active alerts on those changes, and produce reports for analysis, you have a valuable tool in your arsenal for moving a step closer to SOA maturity. To achieve a more sophisticated and powerful approach towards monitoring your SOA configuration, consider also using the Splunk for Change Management Application, as it provides predefined reports and dashboards to facilitate change auditing, change detection, change reporting, change validation and incident response based on change events, change tickets and configuration files. I hope this article and example help you get started using Splunk to monitor SOA configuration files.

        ]]>
        Nimish Doshi: Everybody Splunk with the Splunk SDKhttp://blogs.splunk.com/nimish/?p=3http://blogs.splunk.com/nimish/?p=3Thu, 26 Feb 2009 00:41:51 +0000Nimish DoshiOne of our partners in Asia came up with the interesting catch phrase "Everybody Splunk", which we say internally. Today's topic is about everybody using Splunk's SDKs. As I've spoken to Splunk users, I've noticed that many of them are not aware of their existence. This topic has been discussed elsewhere in the development guide, but I'll summarize. Splunk has SDK APIs for performing searches outside of Splunk Web and the CLI. They are available for

        • Java
        • Python
        • C#
        • PHP

        If that doesn't cover your favorite language, then, use the REST API which is the foundation for the SDKs. With the REST API, you can use any language you want that supports URI communication to search an index. The approach in each SDK is essentially the same. First authenticate, create the search string, iterate over the results, and then close the job. It's that simple.
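
        The four steps (authenticate, create the search, iterate over the results, close the job) map onto four REST calls. The sketch below only constructs the requests with Python's standard library and does not send them. The endpoint paths, the "Authorization: Splunk <session key>" header and the management port 8089 are taken from the REST API as I understand it and should be treated as assumptions to verify against the documentation for your Splunk version; the helper names are mine.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "https://localhost:8089"  # assumed management host and port


def login_request(username, password):
    """Step 1: authenticate; the response carries a session key."""
    data = urlencode({"username": username, "password": password}).encode()
    return Request(BASE + "/services/auth/login", data=data, method="POST")


def _authorized(req, session_key):
    req.add_header("Authorization", "Splunk " + session_key)
    return req


def create_job_request(session_key, search):
    """Step 2: create the search job; the response carries a search id."""
    data = urlencode({"search": search}).encode()
    req = Request(BASE + "/services/search/jobs", data=data, method="POST")
    return _authorized(req, session_key)


def results_request(session_key, sid):
    """Step 3: fetch the results of a finished job to iterate over."""
    url = BASE + "/services/search/jobs/" + quote(sid, safe="") + "/results"
    return _authorized(Request(url, method="GET"), session_key)


def close_job_request(session_key, sid):
    """Step 4: close (delete) the job once the results are consumed."""
    url = BASE + "/services/search/jobs/" + quote(sid, safe="")
    return _authorized(Request(url, method="DELETE"), session_key)
```

        Each request would be sent with urllib.request.urlopen (or any HTTP client), and the session key and search id would be parsed from the responses to steps 1 and 2.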

        This brings me to the heart of today's topic: doing a search in an application. Often developers are asked to look at time series data files (e.g. log files or application generated events) via an application. They may end up using libraries that help read, parse, and search files. Even if the code is simple, files that are only a few MBs in size may grow to GBs within weeks, and any search over them will be sequential and probably slow. If the data were instead indexed within Splunk (just point Splunk at it), then an SDK could be used to perform the search. Because the data is indexed, searches will be fast, and Splunk's search language provides a rich interface for constructing them. In this manner, Splunk becomes part of the application, where search is an integral part of the development and production results. In a future blog, I'll go over an example for using one of the SDKs.

        Now, I just can't resist adding some verse to Everybody Splunk. Don't worry; I won't quit my day job.

        Everybody Splunk.
        Superstars Dunk.
        Everyone say hey.
        Find the needle in the hay.
        Let Splunk show you the way.

        ]]>
        Lamar Holtzclaw: History Repeats Itself…http://blogs.splunk.com/lamar/?p=2http://blogs.splunk.com/lamar/?p=2Fri, 20 Feb 2009 19:28:21 +0000Lamar HoltzclawNimish Doshi: Splunk as a SOA Consumerhttp://blogs.splunk.com/nimish/?p=2http://blogs.splunk.com/nimish/?p=2Tue, 17 Feb 2009 18:00:41 +0000Nimish DoshiWhen you think about Service Oriented Architectures (SOA), Splunk doesn't come to mind first. However, it is important to realize that any entity that is able to consume or produce services is by definition a participant in a SOA. With that said, let me state that Splunk can easily capture and index the output of a web service later used for search.

        The next question is what are the use cases. Information that can be captured in a time series manner is ideal for Splunk. For example, suppose a warehouse is using a RFID reader to capture the movement of goods in and out of its facilities. This information usually drives a software business practice, which in turn may have web services to query the current state of what is happening. With Splunk, you could use a scripted input to capture the output of a web service. The script would call a web services client written in any of the usual web service friendly languages and information such as the inventory of purchased goods would be captured in a time series manner every N seconds. After these snapshots in time of purchased goods are indexed, you can then run time delimited searches and trend reports from Splunk Web to provide instance analysis.
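
        A scripted input of this kind can be very small: Splunk runs the script on the configured interval and indexes whatever it writes to standard output. The sketch below assumes a hypothetical fetch_inventory function standing in for the real web service client, and made-up field names; it shows only the shape of the script, a timestamp followed by key=value pairs.

```python
import time


def fetch_inventory():
    """Hypothetical stand-in for a real web service client call."""
    return {"sku": "A100", "purchased": 42}


def format_event(fields, now=None):
    """Render one snapshot as a timestamped line of key=value pairs."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"{stamp} {pairs}"


if __name__ == "__main__":
    # One line per run; the polling interval lives in the input's
    # configuration, not in the script itself.
    print(format_event(fetch_inventory()))
```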

        This example is just a limited introduction for what can occur with Splunk and SOA. As more and more people deploy SOA, services are available to capture metrics within a corporation. Splunk could be used as a quick and powerful mechanism to capture time series metrics to provide search capable insight into information flow derived via a service.  To show that this is a real possibility I've created on Splunkbase a weather example and a stock quotes example using public web services advertised from Xmethods. Weather output looks like this:

        <?xml version="1.0" encoding="utf-16"?>
        <CurrentWeather>
        <Location>Nice, France (LFMN) 43-39N 007-12E 10M</Location>
        <Time>Dec 09, 2008 - 12:00 PM EST / 2008.12.09 1700 UTC</Time>
        <Wind> from the NNW (330 degrees) at 10 MPH (9 KT):0</Wind>
        <Visibility> greater than 7 mile(s):0</Visibility>
        <SkyConditions> mostly cloudy</SkyConditions>
        <Temperature> 46 F (8 C)</Temperature>
        <DewPoint> 24 F (-4 C)</DewPoint>
        <RelativeHumidity> 42%</RelativeHumidity>
        <Pressure> 30.00 in. Hg (1016 hPa)</Pressure>
        <Status>Success</Status>
        </CurrentWeather>

        The Stock quote output is similar in style. Because the output is in XML format and a timestamp is already in the data, this was very easy to capture in Splunk. Feel free to try either example with your own Splunk installation using your own cities for weather and your own stock symbols. Hopefully, it will inspire you to create your own applications using SOA output for indexing into Splunk.

        One last note is once XML is indexed within Splunk, any search can be piped to the Splunk xmlkv command. This will automatically create field extractions for you for all the elements in the XML stanza. These field extractions can next be used for your Splunk Reports.
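
        To get a feel for what those field extractions look like, here is a rough Python approximation that turns each child element of the weather output into a field name and value. It illustrates the idea only and is not how xmlkv is implemented; the sample is a shortened copy of the weather output without the XML declaration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<CurrentWeather>
<Location>Nice, France (LFMN) 43-39N 007-12E 10M</Location>
<SkyConditions> mostly cloudy</SkyConditions>
<Temperature> 46 F (8 C)</Temperature>
<RelativeHumidity> 42%</RelativeHumidity>
<Status>Success</Status>
</CurrentWeather>"""


def xml_fields(xml_text):
    """Map each child element's tag to its whitespace-trimmed text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    for name, value in xml_fields(SAMPLE).items():
        print(f"{name}={value}")
```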

        ]]>
        Michael Wilde: My head is in the clouds? Help me RightScalehttp://blogs.splunk.com/thewilde/2009/02/09/my-head-is-in-the-clouds-help-me-rightscale/http://blogs.splunk.com/thewilde/2009/02/09/my-head-is-in-the-clouds-help-me-rightscale/Mon, 09 Feb 2009 19:12:17 +0000Michael WildeUpdate: If you're interested in checking out the Recorded Webinar as a result of the news below, it is located here:

        RightScale / Splunk Webinar

        If you're new to the cloud, and new to Splunk - or neither - spare an hour tomorrow, February 10th at 11am PST. Splunk and RightScale will be putting on a pretty cool webinar about IT search in the cloud. Infrastructure-as-a-service is becoming more popular as a solution to many challenges IT faces in the coming years. Our friends over at RightScale have quite an amazing platform for managing cloud infrastructure.

        RightScale makes it dead simple to get infrastructure deployed in the cloud, but once you're up and running, what about your IT data - logs, configurations, messages, etc? That's where our partnership comes in. RightScale is a Splunk Powered Associate - that means, if you want your Splunk in the cloud, check them out.



        Specifically, any user of RightScale can easily install the Free License of Splunk (limited to 500MB of indexed data per day) - without downloading. After a few clicks, your Splunk server will be installed and ready to receive data. A few more clicks and you can simply install a Splunk forwarder on every server in your deployment. RightScale's configuration system makes it a snap. No manual SSH install needed, no configuration, it's just done - "and you can't shake a stick at that".

        Check out the webinar tomorrow, February 10th, 2009 at 11:00 AM PST. I'll be there.. I hope you will too

        Blogged with the Flock Browser
        ]]>
        Michael Baum: How Much More Free Can Free Get?http://blogs.splunk.com/thebaum/2009/02/03/splunk-powered-associates/http://blogs.splunk.com/thebaum/2009/02/03/splunk-powered-associates/Tue, 03 Feb 2009 22:52:32 +0000Michael Baum
        Well if you ever wanted to integrate Splunk into your own product or service, free is now really, well ... free. We've always had a free Splunk license for end users. But now we have the same for software, hardware and service provider partners. Now as a Splunk Powered Associate you can distribute Splunk with the free license key as part of your offering. You can also link to the Splunk free license download and earn referral credits if the download leads to a purchase. Pretty cool heh? Now the free license is still limited to the 500MB daily uncompressed indexing volume but hey that's a lot of data for free.

        A few of our Splunk Powered partners have picked up on the real potential here. F5 Networks, for example, has created a Splunk App that prepackages searches, alerts, reports and dashboards for F5's ASM and FirePass products. Now F5 customers get real-time search, alerting, reporting and analytics for free with Splunk for use with F5 Networks. Support for F5 LTM and BIG IP is coming soon.

        And the folks over at RightScale are taking Splunk into the clouds. RightScale is a great cloud computing management platform that lets you control your cloud resources across several different providers from one interface. We use RightScale at Splunk to control our demo instances on Amazon EC2/S3. Each demo instance consists of one or more servers running in the cloud that recreate a live IT environment like a J2EE-based E-commerce application, a converged network or a rack of Microsoft Windows Servers. It's important that we are able to scale these instances up and down dynamically, and RightScale comes to the rescue. The integration of Splunk and RightScale gives us the IT control and visibility we need in the cloud.

        Every piece of software, hardware and service on the planet generates IT data. And now you can bring Splunk to your community by integrating it into your solution at no cost to you, your channel or your customers. To join the Splunk Powered Associate program just Sign-up to be a Splunk Powered partner and we'll take it from there.

        Happy Splunking!

        ]]>
        Johnathon Cervelli: Splunking for a rogue exchange adminhttp://blogs.splunk.com/johnathon/?p=14http://blogs.splunk.com/johnathon/?p=14Mon, 02 Feb 2009 23:31:49 +0000Johnathon CervelliRecently I was speaking with a customer who was concerned that one of the Windows admins was reading the email of regular users. Thought I'd share this tidbit as a simple example of the power of search. In this case, we didn't even have to go to data sources other than the relevant event log, though later analysis of netflow logs triangulated where the admin was connecting to the Exchange server from.

        Problem: A senior admin has reason to think another admin is abusing privileges and reading other people's mail on Exchange.
        Use Case: Splunk the Exchange event logs to check for insider threat.
        Search 1: bad_admin_username "EventCode=1016"

        Finds: User who has opened up a mailbox that is owned by someone else.

        Search 2: bad_admin_username "EventCode=1013"

        Finds: User who has opened up an additional mailbox. Needed because if the mailbox is shared (i.e., an alias for a particular department) you won't get a 1016.

        Use Case 2: Check for network logins by the admin to the Exchange box in the security log.
        Search: bad_admin_username "Login Type=Network" "Success Audit"

        Finds: Shows if the admin has been using the Exchange console to connect remotely and take unauthorized actions. Note that you will not know what the action is unless you have turned on more aggressive auditing than the default.

        ]]>
        Johnathon Cervelli: Boss! Boss! De-Boost! De-Boost!http://blogs.splunk.com/johnathon/?p=1http://blogs.splunk.com/johnathon/?p=1Thu, 29 Jan 2009 23:20:36 +0000Johnathon Cervelli 

        Ever had a girlfriend that just wouldn’t … leave?

         

        (or, for those that prefer boys, if you’ve known a Mission emo brat, you too know what I mean)

         

        Maybe it was a hookup. Maybe a friend of a friend. But you were always just sorta biding time. Hanging out till something better comes along. Eventually, that better thing turned out to be, well, anything else. Like watching dust collect on the shriveled remains of your caring.

         

        Remember the relief when one day, after months of items too conveniently “forgotten” at your house, ignored phone calls and awkward social gatherings when you suddenly realized…

         

        They were gone?

         

        <<cricket noise>>

         

        Now crystallize and concatenate that relief you felt with the very real, inescapable fact that Boost is gone. That’s right, janky code that stuck around way too long, forgotten about until it called you in the middle of 24 to tell you that it had left a mutex behind the couch. Another moment lost, stolen by the one that won’t, for the love of God, get away.

         

        Now it is gone. Mitch killed it. Exorcised it from Splunk with rituals to dark pagan Gods. Slayed it like Grendel. The deed is done.

         

        Now we feast* and drink in revelry to this glorious act. Come and raise a glass of de-boost. Only on the South Side.  

        ]]>
        Eric Garner: Splunk for Xitive Xactionshttp://blogs.splunk.com/maverick/2009/01/17/splunk-for-xitive-xactions/http://blogs.splunk.com/maverick/2009/01/17/splunk-for-xitive-xactions/Sun, 18 Jan 2009 03:42:21 +0000Eric GarnerHappy New Year and thanks to everyone who has been subscribing to my blog recently. I greatly appreciate it!

        Every week people ask me to show them how to use Splunk to stitch together multiple events that might exist in different locations within different sources because, from an IT perspective, they are considered to be part of larger transaction groups. They tell me they want to know how to do this because the ability to trend against transitively-related events becomes very powerful in helping them understand the reality of IT operations and how efficiencies can be increased and costs can be more quickly and significantly reduced.

        I thought I would share a quick example of how to do this using the transaction command.

        Let's start with a couple sample user activity log files containing some events that are related by multiple keys. Take a moment to study the two following sample activity log files and notice how the user and session key values are related between the files.

        -------------------
        xusers1.log
        -------------------
        XU*** user event: user=maverick credentials cleared
        XU*** user event: user=maverick authentication processing complete
        XU*** user event: user=johndoe credentials NOT authenticated properly
        XU*** user event: user=johndoe illegal login attempt
        
        -------------------
        xsession1.log
        -------------------
        XS*** transaction event: user=johndoe session=2220 msg: failed login
        XS*** transaction event: user=maverick session=1110 msg: successful login
        XS*** some other event: session=1110 msg: maverick did something while logged in
        XS*** still something else here: session=1110 msg: this user logged out now
        

        Now, if you splunk these two files, specifying the first one as sourcetype=xuser and the second as sourcetype=xsession, and then execute the following search within the Splunk web user interface

             (sourcetype=xuser OR sourcetype=xsession)
        

        then all of the results from both files are returned, which should look something like this:

        Now, to make things a bit easier, let's save our current search as a custom eventtype and call it "XACTION_EVENT" and then click the "Show Fields" option and search on this new eventtype, which will look like this:

             eventtype="XACTION_EVENT"
        

        and the search results will look like this:

        Next, let's say you want to correlate all of the events from either file that have matching session keys into one multi-line event (or transaction) grouping. To do this, you might submit the following search:

             eventtype="XACTION_EVENT"  | transaction fields="session" maxspan=1d maxpause=1d
        

        which will take the xuser and xsession sourcetype events and group the ones containing matching session key values. The result set looks like this:

        Now, let's change the search to this:

             eventtype="XACTION_EVENT"  | transaction fields="user" maxspan=1d maxpause=1d
        

        so that now it correlates the events by the user key values, instead of the session key values.

        Now, notice the XU and XS characters appearing at the beginning of each event indicating that you are finding matching key values that appear within BOTH sourcetypes. Pretty cool, huh?

        But wait, it gets better!

        Before I show you just how much better, I want you to scroll back up and look at the resulting events for each of the separate user and session key searches one more time, and this time notice that some of the events have a session key AND a user key appearing within the same event. Don't you think it would be more powerful to use those specific events as a kind of 'bridge' to correlate all of the key values together into one big transaction? After all, what you probably really want to know is how everything relates all at once....within one big final truly transitive transactional story, right?

        Well, fortunately the Splunk transaction command can do this in a very simple and clean way. And if you paid close attention to how we've been using the transaction command, you will see that there is a fields parameter. Notice that the fields parameter is plural. The reason it's plural is because you can specify more than one key value to match on. And let me remind you, there is no other technology on earth right now that offers a correlation capability as powerful and as easy to use as Splunk.

        Okay, so let's do this transitive transactional search by changing our last search string to include both user and session key values within the fields parameter and separating them with a comma and adding the connected param set to "f", like this:

             eventtype="XACTION_EVENT"  | transaction fields="session,user" connected=f maxspan=1d maxpause=1d
        

        and now the results below are way more informative and much better at painting the complete transitive transactional picture:

        Now, I don't know if you find this type of transitive transaction analysis useful or not, but my experience with helping my Splunk customers use this command effectively leads me to believe that you do.
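For the curious, the "bridge" idea can be sketched in a few lines of Python using a union-find structure. This is my own illustration of transitive correlation, not Splunk's actual algorithm, and it ignores maxspan and maxpause entirely. The function name `transitive_groups` and the sample events are hypothetical.

```python
def transitive_groups(events, fields):
    """Merge events into one group whenever they share a value for any of `fields`."""
    parent = list(range(len(events)))

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # (field, value) -> index of the first event carrying it
    for idx, event in enumerate(events):
        for field in fields:
            if field in event:
                key = (field, event[field])
                if key in first_seen:
                    union(idx, first_seen[key])
                else:
                    first_seen[key] = idx

    groups = {}
    for idx, event in enumerate(events):
        groups.setdefault(find(idx), []).append(event)
    return list(groups.values())

events = [
    {"user": "alice", "raw": "XU login"},
    {"session": "s1", "raw": "XS open"},
    {"session": "s1", "user": "alice", "raw": "XS bind"},  # the bridge event
    {"user": "bob", "raw": "XU login"},
]
groups = transitive_groups(events, ["session", "user"])
```

The third event carries both keys, so it pulls the alice events and the s1 events into a single group, while bob stays on his own.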

        Before I end this blog post, I want to make you aware of two additional and powerful artifacts of using the transaction command: the duration and linecount fields. Since the command groups separate events together, the total time the transaction spans (i.e. the duration) and the total number of events appearing within the transaction (i.e. the linecount) are automatically calculated for you.

        Therefore, if you took our example transactions shown above and decided (for whatever reason) that any transaction with a duration greater than three seconds, or any transaction containing fewer than five lines, is a bad transaction, you could enhance your search to consider those conditions, like this:

        eventtype="XACTION_EVENT"  | transaction fields="session,user"  connected=f  maxspan=1d maxpause=1d | search duration > 3
        

        or this:

        eventtype="XACTION_EVENT"  | transaction fields="session,user"  connected=f  maxspan=1d maxpause=1d | search linecount < 5
        

        ...and, of course, the resulting effect would be a filtered list of transactions matching your conditions.
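Here is a small Python sketch of the same filtering idea, assuming each transaction is just a list of (timestamp, line) pairs with one line per event. The helper name `summarize` and the sample data are invented for illustration; Splunk computes these fields for you.

```python
def summarize(transaction):
    """Return (duration_seconds, linecount) for a list of (timestamp, line) events."""
    times = [timestamp for timestamp, _ in transaction]
    return max(times) - min(times), len(transaction)

# Two hypothetical transactions: a quick two-line one and a longer six-line one.
transactions = [
    [(100.0, "start"), (101.5, "end")],
    [(200.0, "a"), (201.0, "b"), (202.0, "c"), (203.0, "d"), (204.0, "e"), (205.0, "f")],
]

# Equivalent in spirit to "| search duration > 3" and "| search linecount < 5".
slow = [t for t in transactions if summarize(t)[0] > 3]
short = [t for t in transactions if summarize(t)[1] < 5]
```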

        Make sense?

        BTW, for a few more advanced examples of how to use the transaction command more effectively, see David Carasso's blog post.

        Anyway, I hope you are now getting a good feel for how to use the transaction command, and perhaps some ideas about how you might leverage this easy yet powerful search command to correlate your own events.

        If you do, please leave a comment below and let us know about it. We are always looking for better ways to
        use Splunk for everything!

        ]]>
        Johnathon Cervelli: Time goes by. More slowly.http://blogs.splunk.com/johnathon/?p=2http://blogs.splunk.com/johnathon/?p=2Wed, 31 Dec 2008 22:30:45 +0000Johnathon CervelliDid you know the earth, in addition to warming, is slowing? We, the early Global Slowing movement, are raising awareness of this issue here at Splunk.

         

        So dire is the threat that time itself is being distorted by this world-wide phenomenon. To compensate, authorities have declared a Leap Second to protect us from slowing rotational patterns.

         

        Therefore, the Global Slowing movement beseeches you to use this extra second wisely. Join us as we protest this travesty with a shot at exactly midnight GMT (16:00 PST) on the south side.

         

        Remember, it’s like it never was. What happens on the leap second, stays on the leap second.

        ]]>
        Michael Wilde: It’s time for a Boxee-ing match with Splunk!http://blogs.splunk.com/thewilde/2008/12/29/its-time-for-a-boxee-ing-match-with-splunk/http://blogs.splunk.com/thewilde/2008/12/29/its-time-for-a-boxee-ing-match-with-splunk/Mon, 29 Dec 2008 19:12:33 +0000Michael WildeAnd now for something completely different! In working with some interesting data generated by Boxee media center software, I found that we could use Splunk as a "Ratings Reporting Engine". Additionally, as Boxee is open source, I thought it might be handy to give their developers realtime access to my log data as it's being generated.

        http://upload.wikimedia.org/wikipedia/commons/0/03/Boxee.png
        Background: Boxee is a cool, open source media center software package that runs on AppleTV, Linux, MacOS X & Windows (soon). It allows you to watch movies, internet video content, even Netflix. Boxee itself generates some interesting log data. Boxee also allows a viewer to automatically send a message to Twitter when a program is being viewed.

        What could we do with this?:

        Using Boxee’s own Logs:

        • Detect errors so that the developers can see them live
        • Calculate viewing duration on a local Boxee instance and some other cool reports

        Using information in Twitter:

        • Create reports that show most watched shows and most active users

        View all of this live right now in my public Splunk server.


        Local Boxee Logs

        In my setup, I have Boxee running on AppleTV. Splunk is also running on AppleTV. Splunk monitors data and forwards its logs up to my public Splunk server over TCP. Send yours up if you want! When I looked at the Boxee log data in Splunk, there were a few events that piqued my interest.

        When the “DVDplayer” program opens a file to be viewed, it records an event, and the same goes when it closes a file. Hmm.. Makes me think that using Splunk’s “Transaction” search operator, I could tie them together, AND calculate the duration of viewing. Smells kinda fun. How does that work?

        Here’s the search command that’ll make this one work:


        dvdplayer (opening OR "closing video") NOT SQLite | rex "Downloads\/Boxee\/(?<title>[^\/]+)\/" | rex "Movies\/(?<title>[^\.]+)\." | rex "file\/get\/(?<title>[^\.]+)\." | transaction startswith="eventtype=\"open-movie\"" endswith="eventtype=\"movie-closing\"" maxpause=-1 maxspan=-1 | eval duration = duration / 60 | timechart max(duration) by title usenull=f


        Prerequisites

        1. Create an event type called “open-movie” for any events that match this search: “search = dvdplayer opening”
        2. Create an event type called “movie-closing” for any events that match this search: “search = dvdplayer closing NOT audio”

        In English it is... “Find the dvdplayer opening or closing events, and get rid of the ones that have SQLite in them, because there are some errors happening, (pipe to rex) to extract the title of the program from the filename, (pipe to rex) to get more program titles because I have movies in two different directories (yeah, you can overload a field), (then pipe it to “transaction”), define the transaction as beginning with the event type “open-movie” and ending with the event type “movie-closing”, setting the pause and span as “-1” so built-in rules don’t get in the way. Transaction will create a duration (showing number of seconds), so we’d better divide that by 60 to get it in “minute resolution”, and then (pipe to “timechart”) to look at the maximum duration viewed by title.” This way, we’ll know what movies are popular locally, even if they’re watched multiple times. (Breathe, you weren't supposed to repeat that whole paragraph in one breath!)
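If you think better in code than in pipes, here is a rough Python sketch of the same idea: pair each opening event with the next closing event, convert the gap to minutes, and keep the longest viewing per title. The log line format in `sample` is made up (I'm not reproducing real Boxee log lines here), and `viewing_minutes` is a hypothetical helper, not anything Splunk or Boxee ships.

```python
import re

# Loosely modeled on the "Movies\/(?<title>[^\.]+)\." rex in the search above.
TITLE_RE = re.compile(r"Movies/(?P<title>[^.]+)\.")

def viewing_minutes(log_lines):
    """Pair each 'opening' line with the next 'closing' line; return max minutes per title."""
    longest = {}
    open_time, open_title = None, None
    for timestamp, text in log_lines:
        lowered = text.lower()
        if "opening" in lowered:
            match = TITLE_RE.search(text)
            open_time = timestamp
            open_title = match.group("title") if match else None
        elif "closing" in lowered and open_time is not None:
            minutes = (timestamp - open_time) / 60  # seconds to minutes, like the eval step
            if open_title is not None:
                longest[open_title] = max(longest.get(open_title, 0), minutes)
            open_time, open_title = None, None
    return longest

# Hypothetical (epoch_seconds, log_text) pairs.
sample = [
    (0, "DVDPlayer: Opening: /Movies/Example Movie.avi"),
    (5400, "DVDPlayer: closing video stream"),
    (6000, "DVDPlayer: Opening: /Movies/Example Movie.avi"),
    (6600, "DVDPlayer: closing video stream"),
]
result = viewing_minutes(sample)
```

With this sample, the title was watched twice (90 minutes, then 10), and only the longer viewing is kept, which mirrors what max(duration) by title does.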

        Additionally, I created some reports that will allow the open source developers of Boxee to look at “Where the errors are coming from”. I extract some info from the events:

        error OR failed OR severe | rex "ERROR: (?<error_source>[^\:]+)\:" | rex "ERROR: \[(?<error_source>[^\]]+)\]" | top limit=10 error_source


        In English it is: find errors, (pipe to rex) to create a field called “error_source” (do it again because there are two types of errors in Boxee), then (pipe) to a top graph by error source, and then save it to a dashboard, but display it as “TABLE”. Kinda handy so the devs can see that most of my errors come from some “CGUIBoxeeViewState” objects. The SQLite errors are also quite annoying.
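The same extract-then-count idea looks roughly like this in Python. The two patterns are adapted from the rex commands above; the sample log lines are invented, and `error_source` and `top_sources` are hypothetical helper names. I try the bracketed form first so that it wins when both patterns could match.

```python
import re
from collections import Counter

BRACKET_RE = re.compile(r"ERROR: \[(?P<error_source>[^\]]+)\]")
COLON_RE = re.compile(r"ERROR: (?P<error_source>[^:]+):")

def error_source(line):
    """Return the error source named in a log line, or None if there isn't one."""
    match = BRACKET_RE.search(line) or COLON_RE.search(line)
    return match.group("error_source") if match else None

def top_sources(lines, limit=10):
    """Count error sources and return the most common ones, like 'top limit=10'."""
    counts = Counter(src for src in map(error_source, lines) if src is not None)
    return counts.most_common(limit)

# Made-up log lines covering both error formats.
sample = [
    "12:00:01 ERROR: CGUIBoxeeViewState: could not load view",
    "12:00:02 ERROR: [SQLite] database is locked",
    "12:00:03 ERROR: CGUIBoxeeViewState: could not load view",
    "12:00:04 INFO: nothing to see here",
]
top = top_sources(sample)
```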

        Boxee Data on Twitter

        If you’re asking yourself “what's Twitter”, you are clearly not hip enough to be using “rex” or “transaction”. Assuming you already know what it is, I’ll bet you didn’t know Twitter has a search engine (they bought it from Summize). Twitter Search indexes all “Tweets” and lets you retrieve results. Why? Well, if you don’t listen to what people are saying publicly, you should start! What are people saying about Splunk right now? See, that’s why Twitter is so valuable (not the “I’m sitting down to have Sabra with Amrit & David” posts most people do).

        You can set up Boxee to “tweet” what you’re watching, and when you do, this happens:

        A message like this is posted to Twitter: “jlarkins: watching Inherit the Halibut on Boxee” - about 1 hour ago.


        Pretty simple, and they’re all like that. Every message has the word “watching” followed by the title, followed by “on Boxee”. It also has a timestamp, which Splunk really likes. If we run a search on Twitter and ask it for “watching * on boxee”, we should get nearly all of those messages. Notice in the upper right of the Twitter Search page, there’s a “feed for this query” link. If we run this search http://search.twitter.com/search.atom?q=watching+*+on+boxee we’ll get back an Atom feed, which is like RSS but technically better. (Follow me kids, this is going somewhere cool).

        The results of that search yield an Atom feed with XML for every Twitter message that looks like this:

        <entry>
        <id>tag:search.twitter.com,2005:1082935045</id>
        <published>2008-12-28T22:50:05Z</published>
        <link type="text/html" rel="alternate" href="http://twitter.com/kiranboxee/statuses/1082935045"/>
        <title>watching The Onion Movie on Boxee. check it out at http://www.imdb.com/title/tt0392878</title>
        <content type="html">&lt;b&gt;watching&lt;/b&gt; The Onion Movie &lt;b&gt;on&lt;/b&gt; &lt;b&gt;Boxee&lt;/b&gt;. check it out at &lt;a href="http://www.imdb.com/title/tt0392878"&gt;http://www.imdb.com/title/tt0392878&lt;/a&gt;</content>
        <updated>2008-12-28T22:50:05Z</updated>
        <link type="image/png" rel="image" href="http://static.twitter.com/images/default_profile_normal.png"/>
        <author>
        <name>kiranboxee (kiranboxee)</name>
        <uri>http://twitter.com/kiranboxee</uri>
        </author>
        </entry>

        Look at all that data, there’s the “author’s name”, there’s a timestamp, there’s the Title of the movie as well.. Or rather there’s that “watching The Onion Movie on Boxee” message in there.

        Splunk Comes In Handy
        Indexing that stuff: Using Erik Swan’s “Web Page Monitor (webping)” application on SplunkBase, I’ve configured my Splunk server to eat the output of this URL http://search.twitter.com/search.atom?q=watching+*+on+boxee . I have it set up to ping that URL every 300 seconds (5 min). Since Twitter search is only going to give me back about a page full of results, and those results change a lot, I decided every 5 minutes was fine. It turns out that might be too frequent; you’ll see why soon. I did have to configure props.conf to know where to break events (BREAK_ONLY_BEFORE=\<entry\>), but once I had that done, my XML/RSS events that show each Twitter post on movie viewing were indexed by Splunk. If you didn’t know, we have a Python search operator called “xmlkv” which will actually take those XML elements and turn them into fields. For my purposes, I won’t be using that operator.
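To picture what breaking events before each entry tag means, here is a tiny Python sketch. It is only an analogy: Splunk does its event breaking at index time in its own way, and `break_events` and the sample feed text are made up for illustration.

```python
import re

def break_events(feed_text):
    """Start a new event before every <entry> tag; text before the first one is dropped."""
    # The lookahead splits at each position where "<entry>" begins, without consuming it.
    chunks = re.split(r"(?=<entry>)", feed_text)
    return [chunk.strip() for chunk in chunks if chunk.lstrip().startswith("<entry>")]

# A made-up, heavily trimmed feed with two entries.
sample_feed = (
    "<feed><title>watching * on boxee</title>\n"
    "<entry><id>tag:example,2005:1</id></entry>\n"
    "<entry><id>tag:example,2005:2</id></entry>\n"
)
events = break_events(sample_feed)
```

Each element of `events` is one tweet's worth of XML. Note that in this sketch anything trailing the last entry would ride along with the final event.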

        Searching: If we run the search source="http://search.twitter.com/search.atom?q=watching+*+on+boxee" over a 7 day period we get way more than 50k results. Why? Because we’re indexing a search engine, and there’s a chance we have a lot of duplicates in there (if I back off my ping time, I might have fewer).

        Sidebar: every Twitter message has a unique number & URL for it. Look up there.. See the “href” item in the "link" element? That’s it.

        Another Splunk search operator you probably didn’t know about is called “dedup” which will take search results and de-duplicate them based on the contents of a field. This search:

        source="http://search.twitter.com/search.atom?q=watching+*+on+boxee" | dedup href

        Yields only 321 unique results in the past 7 days... That’s more like it! By using some field extraction with multiline regex searching, we’re pulling out “username” and “title” and then graphing them.
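De-duplicating on a field is simple enough to sketch in Python: keep the first result seen for each value and throw away the repeats. The function below is my own illustration, not Splunk's implementation. In this sketch, results that lack the field are dropped, and the sample results are invented.

```python
def dedup(results, field):
    """Keep only the first result for each distinct value of `field`."""
    seen = set()
    unique = []
    for result in results:
        value = result.get(field)
        if value is None or value in seen:
            continue  # this sketch drops results missing the field, and repeats
        seen.add(value)
        unique.append(result)
    return unique

# The same tweet fetched on two different polls shows up twice.
results = [
    {"href": "http://twitter.com/example/statuses/1", "username": "example"},
    {"href": "http://twitter.com/other/statuses/2", "username": "other"},
    {"href": "http://twitter.com/example/statuses/1", "username": "example"},
]
unique = dedup(results, "href")
```

Because every poll of the feed returns mostly the same tweets, nearly all of the indexed events collapse away once they are keyed on their unique URL.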

        Boxee Rating Reporting

        In my Splunk server I have a "Boxee" dashboard, consisting of a few saved searches that reveal statistics about user activity gleaned from Twitter. Check in from time to time, and you may see more.

        Top programs viewed in past 7 days - via Twitter: source="http://search.twitter.com/search.atom?q=watching+*+on+boxee" | dedup href | timechart count(title) by title useother=f usenull=f

        Top 10 Viewers in the past 7 days - via Twitter: source="http://search.twitter.com/search.atom?q=watching+*+on+boxee" | dedup href | top limit=10 username

        If you hadn't figured it out, I'm a pretty big fan of Splunk. It's just so darn useful versus a lot of other tools that deal with IT data.

        So what did we learn (other than Wilde uses Twitter), ok seriously what did we learn:

        Splunk Search language commands

        1. Transaction
        2. Dedup
        3. Timechart count
        4. Timechart max
        5. Eval

        Splunk Applications:

        • Web Page Monitor (Webping)
          It appears, in my application of webping, I probably could back off my ping time to like once an hour because I have a lot of dupes.

        Do something cool with Splunk. It causes you to read the docs, learn stuff you didn’t think you needed to know. Got questions, let me know - I'm happy to help.

        Disclaimer: In regards to what may appear as the viewing of copyrighted material, any and all names, characters, places, locations, locales, business establishments, organizations, associations, groups, entities, dominions, states, nations, governments, beliefs, circumstances, conditions, and events portrayed in this story, text, writing, symbol, image, or illustration are either fictitious or fictitiously used. Any resemblance to real or actual persons (living or dead) are pure coincidence. Any resemblance to real or actual character, characters, place, places, location, locations, locale, locales, business establishment, business establishments, organization, organizations, association, associations, group, groups, entity, entities, dominion, dominions, state, states, nation, nations, government, governments, belief, beliefs, circumstance, circumstances, condition, conditions, event, or events that exist, exists, existed, have existed, or will exist are pure coincidence. Any resemblance to reality is pure coincidence.

        ]]>
        Johnathon Cervelli: Eat your fruithttp://blogs.splunk.com/johnathon/?p=3http://blogs.splunk.com/johnathon/?p=3Fri, 12 Dec 2008 23:50:33 +0000Johnathon CervelliWhat is this? Do you know how hard we in the morale department work to keep you happy? Our fingers bleed; have you seen a callus this big before? Only on that black pit you call a soul.

         

        We paid good money for that tasty goodness rotting away in the kitchen. Don’t pretend like you didn’t see them there. Lots of fresh, organic, artisanal local fruit. Grown by professionals. Armies mobilized from Central America to come and pick them, risking life and limb. Delivered to mere feet from your lazy desk by hipsters on the backs of biofueled, trendy little scooters.

         

        And for what? So you can watch them attract flies. A vile waste that will not be tolerated!

         

        Unfortunately, the morale department cannot eat that many pears single-handed. Therefore, you will be further indulged, like an African despot bribed into a life of privileged seclusion in a villa outside London.

         

        So bring your aviator shades and come to the south side. Where we will cut, gut and turn delicious pear goodness into tasty shots. It’s pear-on-gin-on-pear action that will make you happy, the pear happy, and the gin happy.

         

        On the South Side.

        ]]>
        Johnathon Cervelli: You can’t keep a good drink downhttp://blogs.splunk.com/johnathon/?p=5http://blogs.splunk.com/johnathon/?p=5Fri, 05 Dec 2008 21:33:27 +0000Johnathon CervelliSometimes you must be reminded by loss to appreciate what you have. Consider, for example, the tragic loss of liquor that afflicted us for 13 years. Makes the truancy of your Splunk bar staff seem like a mere bathroom break.

         

        But all bad things come to an end.

         

        Seventy-five years ago the US repealed Prohibition, and tonight the South Side repeals ours. And rest assured, we’re doing it in style – Manhattan style. Ever wondered how to make the drink that self-describes as “perfect?” I’ll give you a hint: it gets more perfect the more you have.

         

        South Side at 5. A toast to the 21st.  

        ]]>
        Michael Baum: Splunk Live San Francisco. It’s about time.http://blogs.splunk.com/thebaum/2008/12/05/splunk-live-san-francisco-its-about-time/http://blogs.splunk.com/thebaum/2008/12/05/splunk-live-san-francisco-its-about-time/Fri, 05 Dec 2008 16:13:39 +0000Michael Baum

        Last night we hosted more than 100 people at our first ever Splunk Live in San Francisco. It was about time. In May 2007 we started our first series of Splunk Live events. We've traveled all around the world, from Santa Clara, Los Angeles, Phoenix, San Diego, Dallas, Chicago, New York, Washington DC, Atlanta, London and Zurich to Singapore, Taipei, Shanghai, Beijing, Bangkok and Hong Kong. But never have we had an event in our own backyard. Congratulations to Steve Sommer and our Marketing Team for pulling it off.

        The event took place in our new offices at 2nd and Brannan Street.

        A little-known fact: for the first two years at Splunk we never actually had an office of our own, but squatted in the offices of venture capitalists and other start-up companies like Six Apart. Having a conference room called "BIG" where we can actually fit more than 100 people still takes some getting used to.

        The best part of every Splunk Live, of course, is the customer presentations. Last night we were honored to have three local customers show everyone how they are using IT Search.

        • Mashery, the leading provider of API management services enabling companies to easily leverage web services as a distribution channel, discussed how they use Splunk to power self-service reporting for their customers on activity within their hosted, cloud-based services.
        • Lawrence Livermore National Labs (LLNL), a US Dept of Energy national lab, talked about their Splunk deployments in multiple groups and data centers addressing a wide range of needs, from application availability to meeting FISMA security regulations. They drive a range of initiatives from high performance computing to nuclear weapons development to running particle accelerators.
        • Visa International, the world's largest retail electronic payments network and one of the most recognized global financial services brands, shared how they use Splunk for network security monitoring and incident response.

        Stay tuned to our events page for more upcoming Splunk Live events next year. We plan to visit several cities each quarter and will likely be in your neighborhood at some point in the near future.





        ]]>
        Tom Donahoe: New Enterprise Management Applicationhttp://blogs.splunk.com/tom/?p=1http://blogs.splunk.com/tom/?p=1Thu, 04 Dec 2008 02:21:16 +0000Tom DonahoeWe've been ardently listening to your posts, emails, and requests around improved controls and visibility into forwarders, indexers, and overall connectivity of these Splunk resources. So, just before the Thanksgiving holiday the Splunk crew posted a new application to SplunkBase. The Splunk Enterprise Manager application reaches into the internal logs and pulls interesting artifacts from the Splunk infrastructure, especially around forwarder and indexer connectivity and data volume analysis. You'll like the visibility into the top ten forwarders by volume, indexer volumes, views by sourcetype, forwarders down in the last 24 hours, and on and on. Best of all, the app provides improved visibility to your Splunk enterprise infrastructure. Of course, we ran this application in our own environment to fine tune the reports and dashboards. Take a look and let us know what you think and go ahead and vote on it!

        Docs: Splunk Wiki

        App: Splunk Enterprise Manager

        ]]>
        Amrit Bath: Reloading the auth system via CLIhttp://blogs.splunk.com/amrit/2008/11/26/reloading-the-auth-system-via-cli/http://blogs.splunk.com/amrit/2008/11/26/reloading-the-auth-system-via-cli/Wed, 26 Nov 2008 19:26:20 +0000Amrit BathNote: Tina pointed out that this does not apply to the authorize.conf file. This will be fixed in an upcoming version of splunk.

        This comes up every once in a while on the support channel (EFnet/#splunk), so I guess that means I should do a blog post on it.

        If you're making changes to the authentication.conf file and want to reload Splunk's auth system without going through the web UI, you can use one of our internal functions to do it at the command line:

        $ splunk _internal rpc-auth '<call name="syncAuth"><params/></call>'

        This fires off the same call that the UI would use to reload the auth system, so it functions identically. Note that this is an authenticated call, so you'll need to use one of the standard authentication methods (-auth, splunk login, or the SPLUNK_USERNAME/SPLUNK_PASSWORD env vars...).

        ]]>
        Greg Albrecht: Eating NetFlow with Splunk, Part 1http://blogs.splunk.com/greg/eating-netflow-with-splunk-part-1/http://blogs.splunk.com/greg/eating-netflow-with-splunk-part-1/Wed, 19 Nov 2008 23:42:43 +0000Greg AlbrechtIt's easy to eat network data using Splunk. In a recent seminar I demonstrated how quickly a network administrator could dig through NetFlow data to diagnose network problems using Splunk. Here I'll show you some steps for getting NetFlow (cflow, jflow, netstream, IPFIX, sflow) data into Splunk.

        For this setup we'll need the following:

        1. A Splunk installation on a *nix platform. You can download Splunk here.
        2. flow-tools.
        3. DJB's daemontools.
        4. A NetFlow source. This can be a Cisco or Juniper router, or a system running nProbe.

        Here are the detailed steps for setting up Splunk + NetFlow:

        Please note:

        • In these examples we're using FreeBSD 6.3 amd64; the commands shown may vary on your system.
        • Before running these commands make sure you've su'd to root.

        1. Download & Install Splunk, flow-tools & daemontools:

        # pkg_add "http://tinyurl.com/splunk3-4-fbsd63-amd64"
        # portinstall flow-tools
        # portinstall daemontools
        

        2. Configure flow-tools & daemontools:

        # mkdir -p /var/service/flow-receive
        # cat >/var/service/flow-receive/run <<EOF
        #!/bin/sh
        export FLOW_PIPE="/var/run/flow.pipe"
        if [ ! -p "\$FLOW_PIPE" ]; then
            mkfifo "\$FLOW_PIPE"
        fi
        /usr/local/bin/flow-receive 0/0/9800 | /usr/local/bin/flow-print -f 5 > "\$FLOW_PIPE"
        EOF
        # chmod +x /var/service/flow-receive/run
        # echo "svcscan_enable=YES"&gt;&gt;/etc/rc.conf
        # /usr/local/src/rc.d/svscan.sh start
        # ln -s /var/service/flow-receive /service

        3. Configure Splunk:

        # cat >>/opt/splunk/etc/system/local/inputs.conf <<EOF
        [fifo:///var/run/flow.pipe]
        disabled = false
        sourcetype = netflow
        EOF
        # cat >>/opt/splunk/etc/system/local/props.conf <<EOF
        [netflow]
        AUTO_LINEMERGE = false
        SHOULD_LINEMERGE = false
        EOF
        # /opt/splunk/bin/splunk restart

        Now we're ready to start eating NetFlow data. In Part 2 I'll show you how to configure your network equipment to send this data, and some ways you can use this data within Splunk.

        ]]>
        Eric Garner: Splunk is _piping_ hot!http://blogs.splunk.com/maverick/2008/11/16/splunk-is-_piping_-hot/http://blogs.splunk.com/maverick/2008/11/16/splunk-is-_piping_-hot/Mon, 17 Nov 2008 03:55:36 +0000Eric GarnerThat's right! It's "on fire" folks! Hotter than the sun! Burning its way into the thoughts and minds and data centers across the world.

        Unfortunately, what I wanted to talk about today is not related to how hot Splunk is, but rather a very special and sometimes misunderstood character called "the pipe". For most of us tech geek types, the pipe is our friend. We use it all the time at the command line to make efficient use of our tools and our time. For non-techie folks, it may be a more mysterious or intimidating concept, so I felt it might be a good topic to discuss and demonstrate just what it is and how to use it in the Splunk search box.

        Also known as the vertical bar character, the pipe (|) allows you to create simple yet powerful ad-hoc Splunk searches. You might think of it as if it were an actual pipe where things flow into one end and then flow back out the opposite end. Within the context of Splunk searches, the "things" that flow in and out of the pipe are your IT events.

        For example, let's suppose that you searched for all events within your infrastructure that matched the word "error", regardless of hosts or sourcetypes or timerange. After typing the word "error" within the search box, like this...


        ...and searching across all time, you would get back some events containing the word error.

        But now let's say you wanted to create a report on how many events containing the word "error" were occurring on each host or server over all time. To do that, you could use a pipe character within your search to instruct Splunk to take the resulting events from your initial search and treat them as input to a second command. Then you could type the second command right after the pipe. In this case, the second command would be a "chart" command and you could specify parameters to count raw events and split them out by each host, like this:



        and your results page would switch over to Report mode and display as a graphical report, which would look like this (you can try this example on Splunk's live email demo at http://email.demo.splunk.com):

        Notice that the raw event counts for each host are displayed in descending order by default. Let's say you wanted to display it in ascending order instead. To do this, let's add one more pipe to this search and use the "sort" command to really demonstrate the piping concept.


        Upon executing this search, the result is the same graphical report, but this time notice how the report table below it has changed to display in ascending order now, which will look similar to this in the Splunk email demo:

        So this example uses two pipes. However, you could use more pipes if you needed to. Notice how the first part of this search is basically your initial search criteria, and then you pipe from left-to-right using search commands (and parameters) that essentially manipulate your event data as needed in a simple real time ad-hoc fashion. You can think of it in the following way:

        <search_criteria> | <command1> <params> | <command2> <params> | <command3> <params>

        The pipe character is simply the bridge between your initial search results and each step you take to further manipulate those results.
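
        To make the left-to-right flow concrete, here is a minimal Python sketch of the same idea. It is only an analogy, not how Splunk is implemented: the sample events and the helper names (search, chart_count_by, sort_by, pipe) are made up for illustration, and each helper is a simplified stand-in for a search command.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample events; in Splunk these would come from the index.
EVENTS = [
    {"host": "web01", "_raw": "error: disk full"},
    {"host": "web02", "_raw": "login ok"},
    {"host": "web01", "_raw": "error: timeout"},
    {"host": "db01", "_raw": "ERROR: deadlock detected"},
]

def search(term):
    """Initial search criteria: keep events whose raw text contains the term."""
    def stage(events):
        return [e for e in events if term.lower() in e["_raw"].lower()]
    return stage

def chart_count_by(field):
    """Stand-in for a 'chart' step: count events per value of a field."""
    def stage(events):
        counts = Counter(e[field] for e in events)
        return [{field: value, "count": n} for value, n in counts.items()]
    return stage

def sort_by(field):
    """Stand-in for a 'sort' step: order rows ascending by a field."""
    def stage(rows):
        return sorted(rows, key=lambda r: r[field])
    return stage

def pipe(data, *stages):
    """Feed the output of each stage into the next, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, data)

report = pipe(EVENTS, search("error"), chart_count_by("host"), sort_by("count"))
for row in report:
    print(row["host"], row["count"])
```

        Each stage only sees what the previous stage produced, which is exactly what the pipe character expresses in the search box.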

        As another quick example, consider this search:

        sourcetype=syslog  | rex field=_raw "(?P<ip>\d+\.\d+\.\d+\.\d+)" | transaction fields="ip" | search duration > 100

        Here I am using several pipe characters, again going from left to right, to process the initial search on all syslog events into only those events that contain an ip address matching my regular expression and then grouping those matching events where the duration of the entire grouped transaction was greater than 100 seconds.
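
        If it helps to see those three steps spelled out, here is a rough Python sketch of the same logic: extract an IP with a named group, group events by that IP, then keep groups that span more than 100 seconds. The sample events and function names are invented for illustration, and the grouping is a deliberate simplification of what the transaction command really does.

```python
import re
from collections import defaultdict

# Same pattern as in the search; Python also uses the (?P<name>...) spelling.
IP_RE = re.compile(r"(?P<ip>\d+\.\d+\.\d+\.\d+)")

# Hypothetical (timestamp_seconds, raw_text) pairs standing in for syslog events.
EVENTS = [
    (0, "sshd: connection from 10.0.0.5"),
    (30, "cron: job started"),
    (40, "sshd: connection from 10.0.0.9"),
    (150, "sshd: session closed for 10.0.0.5"),
    (60, "sshd: session closed for 10.0.0.9"),
]

def extract_ip(events):
    """Keep events containing an IP address and attach it as an 'ip' field."""
    for ts, raw in events:
        m = IP_RE.search(raw)
        if m:
            yield {"_time": ts, "_raw": raw, "ip": m.group("ip")}

def group_by_ip(events):
    """Simplified 'transaction': one group per ip, duration = last time - first time."""
    groups = defaultdict(list)
    for e in events:
        groups[e["ip"]].append(e)
    for ip, members in groups.items():
        times = [e["_time"] for e in members]
        yield {"ip": ip, "duration": max(times) - min(times), "events": members}

# Final filter, like "search duration > 100".
long_running = [t for t in group_by_ip(extract_ip(EVENTS)) if t["duration"] > 100]
for t in long_running:
    print(t["ip"], t["duration"])
```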

        In Splunk email demo, it will look similar to this:

        So now you are probably thinking to yourself, "This piping concept is great, but what kind of commands are there and how can I use them in my searches more effectively?"

        Well, one thing you could do is check out the search commands reference in our online docs, which explains the various search commands you can pipe into for creating the exact search or report you need. You could also just type the pipe (vertical bar) character at the end of your search string in the Splunk search box and the type-ahead feature will display all the search commands that are available.

        Hint: after you type the pipe, type a space and then a letter to see the set of commands that begin with that letter, like this:

        Once you find a command you want to learn more about, type out the command and you will see a nice in-line help page drop down below the search box containing instructions for using that command. For example, here is what you see after typing the pipe followed by the "chart" command:

        It contains information about the parameters you can use with the command and the acceptable formats, etc. Very useful, if you ask me. Try it out and see for yourself. And leave a comment and let me know what you think.

        Anyway, I hope this explanation of the pipe character makes sense.

        I know that most of you tech heads understand how to use the pipe character at the command line, but I figured it might be worth my time to discuss it here to give all you other, less technical folks a chance to see how to use the pipe in your searches and leverage it to create some very flexible and powerful Splunk searches.

        Thanks,
        -maverick

        ]]>
        Michael Wilde: Splunk Ninja - EVENTually I will be TYPEcasthttp://blogs.splunk.com/thewilde/2008/11/12/splunk-ninja-eventually-i-will-be-typecast/http://blogs.splunk.com/thewilde/2008/11/12/splunk-ninja-eventually-i-will-be-typecast/Wed, 12 Nov 2008 15:25:12 +0000Michael WildeWelcome to another episode of Splunk Ninja.  I received an email from a customer yesterday indicating they wanted a better way to deal with "noise" in their logs.  For this customer, filtering out events prior to them being indexed was not the answer - they need to retain every event, but not necessarily deal with them.

        It brought me to a component of Splunk's technology that, in my unscientific survey, not too many customers use very often: Event Types.  While you can read all about them in our documentation, I figured I'd give you my thoughts and explain them in terms that I myself can understand.  You'll see a few examples of how to locate and create event types using the "punct" field attached to every event.  Additionally we'll cover how cool the "typelearner", or "Discover Event Types", feature is.

        There's a lot you don't know about in your log data, and event types and the typelearner can help focus your vision into your IT data.  Comments welcome as always.  T-shirts to all commenters!

        Update: Here's some advice from David Carasso, father of crawl, eventtypes, and lots of other cool learning technology at Splunk.

        1. Consider tagging these boring eventtypes as "boring", and then filter results by "NOT eventtypetag=boring".
        2. Finally, when making eventtypes, it's always a good idea to make the search as generic as possible, while still getting just the events you want.  If you can avoid sourcetypes, punctuation, and extracted fields, your eventtype is easier to share, in that you don't have to also share your props.conf, sourcetypes.conf, and transforms.conf, but maybe that's a minor issue.

        Update:  According to Splunk lore, taken from the historical archives, safely guarded by the Knights of the Splunk Templar, David Carasso may in fact also be the father of the search language, transaction search, sourcetype classifier, timestamping, multiline event splitting, and the phrase "take the sh out of it".

        Blogged with the Flock Browser


        ]]>
        Michael Baum: Human and Machine Language Mashups at Splunk Live Zurich, Switzerlandhttp://blogs.splunk.com/thebaum/2008/11/06/human-and-machine-launguage-mashups-at-splunk-live-zurich-switzerland/http://blogs.splunk.com/thebaum/2008/11/06/human-and-machine-launguage-mashups-at-splunk-live-zurich-switzerland/Fri, 07 Nov 2008 03:54:03 +0000Michael BaumAt Splunk Live in Zurich this week an interesting discussion erupted about human and machine languages. Before I continue with the story, I want to thank everyone that attended the event. Despite the fact that Raffy Marty is a resident celebrity, this was our first formal customer and partner event in Switzerland. We had more than 50 people attend for several hours to talk about Splunk and data center management challenges. The event was co-hosted by T-Systems.

        Thank you Meno Schnapauff for your great presentation on how T-Systems and the Swiss National Railway are using Splunk!

        Other attendees included folks from Swisscom, Unicom Consulting, Rothschild Bank, Genossenschaft Migros, LeShop, Netcetera, Cablecom GmbH, TBK-Patent Munich, On Line Video 46, Skyguide, PostFinance and the University of Fribourg. Brian Haynes, Tim Thorpe, Julie Duncan and Hash Basu-Choudhuri from our London office participated too.

        Now part of the reason I mention all these names (in addition to thanking folks) is the point of this post. In the room we had an American (me), several native English speakers from different areas of England, Swiss German speakers from Switzerland and German speakers from Germany. What I noticed is how two people think they speak the same language but can't always understand each other. It turns out there are a lot of American (some West Coast) colloquialisms I use that my "Queen's English" counterparts don't understand. And of course most of the time I try to make a joke the Swiss and Germans just look at me like I'm from outer space, even though if you asked them they'd say they speak fluent English. During the event the Swiss Germans had trouble understanding the Germans and the Germans had trouble understanding the Swiss Germans. The folks from the UK who spoke German didn't understand either the Swiss German or the German German, although they all claim to speak German.

        What does all this have to do with IT you ask? Well it turns out that mashing up languages and attempting to understand each other even though we don't speak exactly the same language is one of the biggest problems we have in trying to understand our IT systems as well.

        "One of the questions posed at the event was how can I modify my system and application logging to some standard in order to follow what my systems are doing? Do we need a logging standard?"

        I have long been telling people that logging standards are a waste of time. IBM's Common Base Events (CBE) has been around for decades and has very little traction in the real world. Data Center Mark-up Language (DCML) was pushed by Opsware and lots of smart people. It got nowhere. Logs exist. Instrumentation exists. Our IT systems already have tremendous amounts of data. Trying to retrofit that data to some standard is impossible. Attempting to organize a multi-vendor logging standard will never happen. Getting developers to log consistently sounds great but I've never seen it done before.

        What we need is a mashup of machine languages and logging formats. That's exactly what IT Search is!

        Humans need to stop thinking about how we can format data to make it easier for machines to work with it. There is too much data. The real value is being able to work with massive amounts of data without any human intervention. This is exactly what Google does for the web. Sure you can reformat your HTML to get better search results. But even if you do nothing Google will index your site. You don't even have to tell Google to do it!

        I'm going to start sharing more of our experiences helping people see the connections that already exist in their logging data. While the connections are not always obvious to the naked eye and human linear thinking, machines are great at teasing out non-obvious relationships. This is perhaps the most compelling thing we work on at Splunk and continue to push the bleeding edge of what's possible.

        ]]>
        Michael Wilde: Got Salesforce, Got Mac.. need help. Here you go!http://blogs.splunk.com/thewilde/2008/10/31/got-salesforce-got-mac-need-help-here-you-go/http://blogs.splunk.com/thewilde/2008/10/31/got-salesforce-got-mac-need-help-here-you-go/Fri, 31 Oct 2008 15:39:45 +0000Michael WildeWe've recently switched over to Salesforce.com, which I'm pretty satisfied with, and I've been searching for tools that help me interact with Salesforce.com via the software on my Mac, such as "Microsoft Entourage 2008" and my favorite application, "Quicksilver".

        Simon Fell, over at PocketSoap.com has created a bunch of tools for the Mac user that help integrate Salesforce.com with the stuff you do locally on your Mac.

        Maildrop

        Maildrop is pretty cool, because it logs you in, and integrates with Entourage.  It adds a special menu that provides functionality for Notes, Events, Cases, Contacts, and Email.   Most importantly, this video will show you how to setup Maildrop, how to use it, and how Entourage can work with Salesforce.com




        Salesforce Plugin for Quicksilver

        Simon Fell's Salesforce.com Plugin for Quicksilver is also a pretty sweet add-on for Quicksilver. If you don't know how awesome Quicksilver is, watch my productivity video. This plugin allows search in Salesforce.com directly from Quicksilver and file upload to your documents folder! All with a few key clicks - per usual in Quicksilver. This video will show you how to setup the plugin for Quicksilver and a bit about how it works.



        The Successforce dudes that own the Twitter account also sent me a link to other Mac tools for Salesforce.com up on their site (where Maildrop is also featured)


        ]]>
        Michael Baum: Splunk Voted Fastest Growing Company in Silicon Valleyhttp://blogs.splunk.com/thebaum/2008/10/30/splunk-voted-fastest-growing-company-in-silicon-valley/http://blogs.splunk.com/thebaum/2008/10/30/splunk-voted-fastest-growing-company-in-silicon-valley/Fri, 31 Oct 2008 05:59:10 +0000Michael Baum

        I've just returned from the Deloitte Technology Fast 50 awards dinner where Splunk was selected as the fastest growing company in Silicon Valley. Deloitte, Silicon Valley Bank, Korn Ferry International, Cornish & Carey, Cooley Godward Kronish and adb Insurance Services were the sponsors of this year's competition and we thank them all for the award.

        I was joined at the awards dinner by my two co-founders Erik Swan and Rob Das. What a great ride it has been over the past four and a half years. The time has flown by so quickly and it seems like we still have so much more to do. But it was nice at least for one evening to take a breather and enjoy what we have accomplished.

        Since I graduated from college with a degree in computer science I have dreamed of creating a technology and a company that had the potential to achieve what Splunk has. Seems unreal that we are now here living that dream.

        The award ceremony was held at the Computer History Museum in Mountain View, CA. What a cool place. When the Boston Computer Museum closed in 1999 the museum in Silicon Valley became the keeper of computer technology history. Wandering through the museum I spotted an exhibit on chess software competition, and one of the long job outputs hanging from the ceiling reminded me of my own chess-playing Pascal program, which performed a pretty good six-level look-ahead algorithm.

        But it was entering the hardware history wing that really sent me down memory lane.

        PDP8s, PDP11s, original IBM PC, Osborne, Apple Lisa, Apple IIc, Mac 128k, Compaq luggable, Apple Powerbook 170 and 230 with that cool ejectible enclosure that hooked up all your cables for you. Wow!

        I even saw an IBM 5100. Perhaps the most bizarre machine I ever programmed. It has a switch that moves the shared program and memory space from APL to Basic - two worlds that should never co-exist.

        When I was at IBM in Boca Raton I wrote an inventory management system on a 5120 the predecessor with a 9 inch screen!

        If you've never been to the museum you really should go. Take your kids. Show them the progress technology has made during your adult lifetime and let them dream about the next 25 years.

        Where else can you sit on the built in sofa of a Cray 1 supercomputer and see a PDP1 still working to play the world's first video game?

        Thanks to all the sponsors for hosting the event and selecting Splunk as the fastest growing company in Silicon Valley!

        The Award - Where's the cash?

        Splunk Founders - Erik, Michael, Rob

        How Many Can You Remember?

        PDP8

        PDP11

        Cray 1

        ]]>
        Andrea Longo: inputcsv to restrict a search by a list of field valueshttp://blogs.splunk.com/andrea/2008/10/24/inputcsv-to-restrict-a-search-by-a-list-of-field-values/http://blogs.splunk.com/andrea/2008/10/24/inputcsv-to-restrict-a-search-by-a-list-of-field-values/Fri, 24 Oct 2008 16:52:27 +0000Andrea LongoA customer asked about a complicated search that could be vastly simplified by using inputcsv to input a list of values from a file, a feature added for 3.3.x. It's documented as an internal search command here:

        http://www.splunk.com/doc/latest/user/UnsupportedCommands#inputcsv

        We are talking about promoting it to public, so while it says unsupported it does work. Here's how:

        I've got events from my webserver for my new domain and I want to see what real hits it's getting and not my own. They look like this:


        66.249.70.86 - - [23/Oct/2008:01:42:21 -0700] "GET /category/admin/ HTTP/1.1" 200 5158 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

        And I've gotten some traffic already:


        $ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log | stats count'
        count
        -----
        11424

        It's a standard format that was automatically recognized as sourcetype access_common, so the extracted field "clientip" is already there. I create a csv file containing the values I want to exclude like this:


        clientip
        xxx.xxx.xxx.xxx
        yyy.yyy.yyy.yyy
        zzz.zzz.zzz.zzz

        This file needs to exist relative to $SPLUNK_HOME/var/run/splunk, so to avoid specifying a path in my search I'll just put it there. Note that I could also have used xxx.xxx.xxx.* if I wanted to, wildcards are ok.

        Now I can do this search:


        $ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv]'


        $ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv] | stats count'
        count
        -----
        121

        and only get the ones that aren't from my network. This search also works from the UI as


        source="/var/log/apache2/mynewdomain_access_log" NOT [inputcsv mycsvfile.csv]
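
        For readers who want to see the filtering logic outside of Splunk, here is a small Python sketch of what "NOT [inputcsv ...]" is doing for a single-column file: read the header to learn the field name, then drop any event whose field value matches one of the listed values (with * as a wildcard). The addresses, events, and function names are made up for illustration; this is not Splunk's implementation.

```python
import csv
import io
from fnmatch import fnmatchcase

# Hypothetical stand-in for mycsvfile.csv: a header naming the field, then values.
CSV_TEXT = """clientip
192.0.2.10
198.51.100.*
"""

# Hypothetical events with the clientip field already extracted.
EVENTS = [
    {"clientip": "66.249.70.86", "uri": "/category/admin/"},
    {"clientip": "192.0.2.10", "uri": "/"},
    {"clientip": "198.51.100.7", "uri": "/about"},
]

def load_patterns(csv_text):
    """Read the CSV into {field_name: [patterns]} using the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    patterns = {}
    for row in reader:
        for field, value in row.items():
            patterns.setdefault(field, []).append(value)
    return patterns

def matches_any(event, patterns):
    """True if any listed value (wildcards allowed) matches the event's field."""
    return any(
        fnmatchcase(event.get(field, ""), pat)
        for field, pats in patterns.items()
        for pat in pats
    )

patterns = load_patterns(CSV_TEXT)
external_hits = [e for e in EVENTS if not matches_any(e, patterns)]
print(len(external_hits))
```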

        ]]>
        Michael Wilde: All My Regex’s Live in Texashttp://blogs.splunk.com/thewilde/2008/10/22/all-my-regexs-live-in-texas/http://blogs.splunk.com/thewilde/2008/10/22/all-my-regexs-live-in-texas/Wed, 22 Oct 2008 22:50:55 +0000Michael WildePut down that O'Reilly book about RegEx, quit googling, and saddle up! Ninja's going Texas style today with a new video on Regular Expressions, or REGEX. Since Splunk is the ultimate swiss army knife for IT, or rather the "belt" in "blackbelt", I wanted to share with you how I learned about Regex and some powerful ways to use it in your Splunk server.

        I did have an O'Reilly book on Regex, and I have spent a great deal of time on the web looking up how to do regex. Still, I like the easy way, and since I'm a visual guy - to no surprise - I have found some great tools that help me: RegexBuddy by JGSoft and Reggy (free on Google Code). RegexBuddy will teach you Regex better than anything else, and Reggy is your shuriken.

        Using those tools to help me develop a proper RegEx, I can take what I've learned and apply it in Splunk. Being a ninja is by no means required to use Splunk, but any IT person worth their salt has some special tools and talents they employ to take software products like Splunk to the next level.



        This video will break it all down for you and should give you a few advanced ways to use Splunk that I'll bet you didn't know about.

        By the way, not only did I never think I'd live in Texas, but how the heck did I end up parodying a song title by George Strait? If you don't get it, listen to the song.




        Shout out to the ninjas at University of Texas, Austin who dig Splunk! Splunk 'em Horns!






        Update: "@shadejinx" on Twitter asked.. "Can you extract multiple fields with the Rex format"?
        Answer: Of course you can. Guess how? Think for a bit (this is how I figured it out)... aha! Just add another " | rex" at the end of that search. In the video above, this scenario is presented:

        Event:
           :: ... :   ...  :::::  ...7

        In the video example, I'd like to extract the DHCPACK (and other variations) and create a field called "dhcp_action", so this search is run:

        Search:

        source="/mnt/log/splunk-interop/2008-lv-messages" dhcpd via | rex "dhcpd:\s(?&lt;dhcp_action&gt;\S+)"
        But what if, in the same search, I wanted to extract that final IP address, the device through which we requested the IP address? Let's call it "dhcp_subnet_host". Easy: the Splunk search language works as you'd expect it to. Try this:

        Search:

        source="/mnt/log/splunk-interop/2008-lv-messages" dhcpd via | rex "dhcpd:\s(?&lt;dhcp_action&gt;\S+)" | rex "via (?&lt;dhcp_subnet_host&gt;\S+)"

        The result is that, in the same search, I'm able to extract two fields, which is especially handy if I have some variance in where that subnet_host is. By doing it this way, I don't have to write the "mother of all regexes" to come up with the perfect match. Just string searches together and you're ropin' cattle... or log events!
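
        If you want to play with the two patterns outside of Splunk first, here is a quick Python sketch of the same chaining idea. The log line is invented for illustration (it is not the event from the video), and apply_rex is a made-up helper name. Note that Python spells named groups (?P<name>...) rather than (?<name>...).

```python
import re

# The same two patterns as the rex searches, in Python's named-group syntax.
ACTION_RE = re.compile(r"dhcpd:\s(?P<dhcp_action>\S+)")
VIA_RE = re.compile(r"via (?P<dhcp_subnet_host>\S+)")

# Hypothetical log line, made up for this example.
line = "Oct 22 14:03:11 gw dhcpd: DHCPACK on 10.1.2.3 to 00:11:22:33:44:55 via 10.1.2.1"

def apply_rex(fields, pattern, raw):
    """Add any named groups that match to the field dict; leave it unchanged otherwise."""
    m = pattern.search(raw)
    if m:
        fields.update(m.groupdict())
    return fields

# Chain the extractions, one small pattern at a time.
fields = {}
for pattern in (ACTION_RE, VIA_RE):
    fields = apply_rex(fields, pattern, line)

print(fields)
```

        Two small patterns are easier to test and tweak in a tool like Reggy than one giant expression.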

        ]]>
        Michael Baum: Splunk Lab in Asia Launches to Develop New IT Search Appshttp://blogs.splunk.com/thebaum/2008/10/21/splunk-lab-in-asia-launches-to-develop-new-it-search-apps/http://blogs.splunk.com/thebaum/2008/10/21/splunk-lab-in-asia-launches-to-develop-new-it-search-apps/Tue, 21 Oct 2008 20:17:28 +0000Michael Baum

        The last two weeks I've been traveling throughout Asia with our new partners at Systex and the Splunk Asia team. In Singapore, Hong Kong, China and Taiwan we met with government agency, high tech manufacturing, insurance, online gaming and managed service provider customers who told us how critical Splunk is to their IT organizations, especially as budgets get even tighter.

        Systex is now our master distributor covering Taiwan, China, Hong Kong, Singapore, Thailand and Malaysia. Systex is an amazing company fueled by Taiwanese entrepreneurship, creativity and innovation. The company is part distributor, part reseller, part system integrator and part independent software developer. The 2,900 Systex employees are led by CEO Hilo Chen and COO Frank Lin. Hilo did a stint at Yahoo! Asia before joining Systex as CEO. He is a very friendly, engaging and good-natured executive who commands the passion of his team. Frank is detail oriented and intense and he has an ability to focus on what seems to be the impossible and get it done.

        I'm not used to people pushing faster than I do, but the Systex team are reminding me what start-up speed is all about.

        The Systex system integration and software business is fueled by more than 1,400 engineers with deep domain expertise in financial trading and banking systems, network security, database administration, storage, virtualization, disaster recovery, IT service management, telecommunications OSS/BSS, unified communications, business intelligence and more. This past week we unleashed the creativity of more than 400 of those engineers, product managers, sales personnel and business unit heads. We met at a three day kickoff event for the launch of a joint Splunk Lab designed to come up with new areas to apply IT Search and new Splunk Apps for a variety of use cases.

        It is our hope that our joint work together will result in lots of new Apps available for download by Splunk users all over the world.

        The event started Thursday with a press conference at the Westin in Taipei. We were joined at the press conference by more than three dozen press covering innovation in Asia. We discussed the design of the partnership, the Splunk Lab and some of the joint customers including Allianz Insurance, IAH Games, and The Malaysian Prime Minister's Office. Allianz is using Splunk to report on F5 Big IP load balancer activities. IAH is mining their online multi-player game events and logs for insight into user patterns and activities including market basket analysis across different game properties. The Malaysian PM's office uses Splunk to secure their email messaging system.

        The press asked some very good questions about various use cases and our strategy for accelerating activities in Asia with Systex. Richard Tang and Johnny Lin attended the event from Systex as well and provided a great overview of how the Splunk Lab is coming together and what kind of solutions Systex is creating around Splunk. Richard has been very patient with me and has taught me enough Mandarin to completely embarrass myself during my last few visits.

        On Friday 260 engineers and product managers attended an all-day Splunk Boot Camp at the Systex UCOM training center in downtown Taipei. The day was divided into two three-and-a-half-hour sessions. Each session covered using, administering and deploying Splunk. There was a brief section on developing Splunk Apps, including building a network management application.

        One of the product managers commented to me at the end of the day, "My mind is broken on Splunk, there is so much you can do with it."

        Saturday's session was the Splunk Lab kickoff event and creative activity attended by 300 business unit heads, sales people, product managers and field sales engineers. I was amazed. We went from 8:30am to 6:30pm on a Saturday. The level of energy was unlike anything I'd ever experienced before. Taking the long trip back from Taipei by way of Tokyo, I am just in awe at how two organizations half a world apart have so tightly bonded in just six months. I'm very impressed by the Taiwanese work ethic and dedication.

        Kord Campbell, director of Splunk's Developer/ISV program, gave a great talk on developing Splunk Apps to start the working round tables. Each business unit (twelve in all) spent three hours coming up with ideas for Splunk in their unit, including what Splunk Apps they were going to create and which customers they were targeting. The areas included:

        • Financial Trading Platforms
        • Banking and ATM Systems
        • Database Services
        • Information and Security
        • Business Continuity and Disaster Recovery
        • Customer Service
        • Data Management & Integration
        • Unified Communications
        • IT Service Management
        • Education & Training

        Teams were judged on several factors including creativity, feasibility, significance to current business and target customer profiles.

        The winning team didn't use slides but instead acted out their presentation in a 15 minute skit. It was wild and reminded me of how dysfunctional most IT organizations are today. Not that we needed reminding :-)

        The Financial Services Business Unit was judged the winner. This team has developed market trading platform software in a joint venture with Reuters and explored using Splunk with their quotes and trading solutions and for market compliance. The first scenario involved monitoring TAIFEX, TWSE and OTC trades and examining patterns indicating potential fraudulent activities.

        The second scenario showed how IT Search can be applied to troubleshooting the electronic system including buy side, sell side, cash position, web interfaces, trading systems and risk management. Actors in the scenario ranged from investors, web infrastructure managers, dealer groups, trading managers, CRM users and back office personnel. The team called their solution "A Lighthouse in the Dark."

        Perhaps the most interesting integration of Splunk though was the mining of data from the web application platform to determine which features users tapped into and which ones they tried once but never went back to. By examining page views for new functions and correlating those with trade volume deltas the team can continuously monitor the revenue effects of application and site changes.

        The Splunk Lab launch has us thinking about how to get other people collaborating to build new applications for IT Search. We're planning to launch a public site soon that will allow domain experts from all over the world to work together and create great Splunk Apps. So we decided to take the elevator to the top floor of Taipei 101, the world's tallest building to look for more...


        Top Floor at Taipei 101


        View to the East of Taipei

        Press Conference


        Frank Lin, COO, Systex


        Me


        Robert Lau - Splunk & Emy - Systex


        Hilo Chen, CEO, Systex


        UCOM Technical Training Center

        Kord Campbell - Splunk


        Splunk Lab Team Competition


        Winning financial services App


        A little bit of fun

        Taipei 101 - World's Tallest Building
        ]]>
        Johnathon Cervelli: Cocktail Default Swapshttp://blogs.splunk.com/johnathon/?p=13http://blogs.splunk.com/johnathon/?p=13Fri, 10 Oct 2008 23:47:07 +0000Johnathon CervelliWoe. Calamity. Bust. As your retirement account swoons and banks once mighty crumble to dust, you might start to wonder what to do at a time like this. Do you flee to cash? Bullion? Or do you reach deep into those pantalones and find your last bit of pocket change to plow into this bottom? (it is the bottom, right?)

         

        No. All of those involve risk. And require far too much effort.

         

        No, you drink, you silly Splunker. And while we watch lower Manhattan sink into the Hudson, we will ask is that glass half full, or half empty. For half-empty is the only way I can imagine serving the Market Crash, a delicious and nutritious blend of brown booze. After all, when you mix red ink with black, brown is what you get.

         

        5 on the South Side.

        ]]>
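        Paul Pang: What is Splunk?http://blogs.splunk.com/paul/?p=1http://blogs.splunk.com/paul/?p=1Fri, 10 Oct 2008 15:39:34 +0000Paul Pangsplunk !

        Whenever I take out my MacBook Pro, friends who see the sticker on it ask me: what is Splunk?

        Splunk is an IT search engine designed specifically for enterprises. It breaks with the traditional way of managing IT and builds on the search technology and concepts of Yahoo and Google, so that enterprise IT staff can manage large, complex IT systems through simple, easy-to-understand keyword searches with the help of Splunk software.

        Splunk software automatically collects the data and logs produced by all kinds of servers, network devices and software, and it has computing capability: enterprise IT administrators can use Splunk to process search results immediately and produce reports, charts and alerts. Splunk can also be set to run scheduled searches and notify the relevant people of the results by email. With Splunk, you no longer need to worry about wasting too much time hunting down and resolving all sorts of IT problems.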

        Be an IT superhero, go home earlier !!

        ]]>
        Michael Baum: Splunking Across the Pond. Welcome Brian Haynes VP EMEA.http://blogs.splunk.com/thebaum/2008/10/06/splunking-across-the-pond-welcome-brian-haynes-vp-emea/http://blogs.splunk.com/thebaum/2008/10/06/splunking-across-the-pond-welcome-brian-haynes-vp-emea/Tue, 07 Oct 2008 07:42:09 +0000Michael Baum

        It's kinda a funny story and although it seems so long ago it was just 18 months ago. I was traveling in Europe starting to talk with potential customers who had downloaded and installed Splunk (3.0 variety). My very first meeting was with a guy named Scott Davies, VP of E-commerce Trading Platforms at Royal Bank of Scotland in London's Bishopsgate. I had the opening slide to our presentation up when Scott walked in the room. He was very polite, asked us if we wanted some still or sparkling water and wanted to know how our trip was progressing thus far. Finished with the pleasantries he then quipped, "I love your product, but when are you going to change your name?"


        Seems "Splunk" didn't quite translate all that well in the UK. Although Colin Barker and Steven Arnold didn't seem to mind. Fast forward to October 2008 and here we are with more than 60 customers in Europe including several major banks, telecommunication providers and large enterprises. And now we have a big shot head of EMEA and an incredible team on the ground in London. Welcome Brian Haynes!

        I first met Brian about three months ago at the Berkeley Hotel in London. We hit it off immediately. Brian was incredibly excited about our free download model as he had experienced similar success with companies like Legato that initially followed a similar model. The difference he said was, "Splunk really believes in fostering a global community of users around its product, something Legato never had." As our new Vice President of Sales for EMEA, Brian will no doubt help us really accelerate our growth in the European market. He joins us at a great time. Last week we attended the IP 08 show and our booth was mobbed with folks anxious to learn how they can Splunk their infrastructures.

        As the global economy continues to crumble it's amazing to see that we're able to keep bringing value to customers around the world and grow our user and customer base by helping IT organizations do a lot more with less. The notion of a single universal platform that breaks down the silos between operations, security and compliance will certainly continue to thrive.

        ]]>
        Andrea Longo: Enabling debug messageshttp://blogs.splunk.com/andrea/2008/09/22/enabling-debug-messages/http://blogs.splunk.com/andrea/2008/09/22/enabling-debug-messages/Mon, 22 Sep 2008 23:30:18 +0000Andrea LongoSplunk spits out an astounding number of its own internal log messages, some I've already described. This post is how to get more of them, in case you have spare disk space lying around and need something to fill it with. Or you have some problem with Splunk and need debug logs. Sometimes Support will ask for this to diagnose an issue.

        splunkd log messages go in the file splunkd.log. (Note that if you move the existing file out of the way, a fresh one is created on startup if you want to work with only the messages from the current run.) They are controlled by the log.cfg file located in /opt/splunk/etc, which specifies the log level of messages by category:

        rootCategory=WARN,A1
        category.LicenseManager=INFO
        category.TcpOutputProc=INFO
        category.TcpInputProc=INFO
        category.UDPInputProcessor=INFO

        Messages can be set to, in increasing order of severity: DEBUG, INFO, WARN, FATAL, CRIT. Setting a log level gets you messages at that level and higher, so default settings are typically INFO or WARN. When you change something in this file, you need to restart Splunk for it to take effect. When you restart with the --debug flag, it uses a similar file, log-debug.cfg, with a different set of settings for DEBUG messages. Not everything is set to DEBUG, because some of the categories are very chatty.

        One of those is FileInputTracker, which even in log-debug.cfg is set to WARN. If you are having problems with data input from files, either indexing multiple times or not indexing at all, set this to DEBUG to get more about what is going on.
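The threshold behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Splunk's own code: the function names are made up, the severity order is the one listed in this post, and the fallback of unlisted categories to the rootCategory level is an assumption.

```python
# Severity order as listed in this post, lowest first.
LEVELS = ["DEBUG", "INFO", "WARN", "FATAL", "CRIT"]


def parse_log_cfg(text):
    """Return (root_level, {category: level}) from log.cfg-style text."""
    root_level = None
    categories = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # rootCategory holds a level followed by a comma and more text
        # ("WARN,A1" above); only the level is used here.
        level = value.split(",")[0].strip()
        if key == "rootCategory":
            root_level = level
        elif key.startswith("category."):
            categories[key[len("category."):]] = level
    return root_level, categories


def is_logged(message_level, category, root_level, categories):
    """True if a message at message_level passes the category's threshold."""
    # Assumption: categories not listed in the file use the root level.
    threshold = categories.get(category, root_level)
    return LEVELS.index(message_level) >= LEVELS.index(threshold)


sample = """rootCategory=WARN,A1
category.LicenseManager=INFO
category.TcpOutputProc=INFO
"""
root, cats = parse_log_cfg(sample)
print(is_logged("INFO", "LicenseManager", root, cats))
print(is_logged("INFO", "FileInputTracker", root, cats))
```

With the sample settings, an INFO message from LicenseManager passes its INFO threshold, while an INFO message from a category that only inherits WARN does not.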

        Now there is another way to enable and disable messages other than changing the file and restarting. If you want to permanently change settings, or you need to test a script that manages starting and stopping Splunk, you'll want to use these files. But you can also turn loglevels for categories off and on with a specially constructed search:

        | oldsearch !++cmd++::logchange !++param1++::root !++param2++::DEBUG

        This is the search used for 3.3.x; for 3.2 and earlier, remove the "| oldsearch" part. Yes, that is really the pipe, or vertical bar, character there. (And you will get the message "Search Execute failed because Setting root priority" when the search completes.) You can change any category to any loglevel with this, using the category name for the param1 value and the loglevel for param2. "root" is a special keyword for all messages, otherwise use the correct category name like "LicenseManager". log.cfg is not changed, and on restart you will revert to the configured settings.

        One clever thing you can do with this is set up scheduled saved searches to turn on debugging only when you want it. If you have some problem that you know happens around midnight, you can set up one search to turn it on (set it to DEBUG) and another to turn it off (return it to WARN or INFO or whatever it was).
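To keep the on and off searches consistent, the search string can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of Splunk; it only reproduces the string format shown above, and the scheduling itself still happens in Splunk's saved searches.

```python
def logchange_search(category, level, legacy=False):
    """Build the log-level search string described in this post.

    legacy=True drops the "| oldsearch" prefix, as the post says to do
    for Splunk 3.2 and earlier.
    """
    body = "!++cmd++::logchange !++param1++::{} !++param2++::{}".format(
        category, level
    )
    return body if legacy else "| oldsearch " + body


# A pair of searches for a problem window: schedule the first shortly
# before the window and the second shortly after it.
turn_on = logchange_search("FileInputTracker", "DEBUG")
turn_off = logchange_search("FileInputTracker", "WARN")
print(turn_on)
print(turn_off)
```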

        splunkweb messages are controlled by a different mechanism, the SplunkWeb.tac file. If your problem is specifically with splunkweb, such as debugging LDAP settings in the UI, turn on these additional messages. You do need to restart splunkweb, but this can be done with "splunk restart splunkweb" rather than restarting splunkd along with it on a normal restart.

        Change this line:

        # set global logging level
        appLoggingLevel = logging.INFO

        To this:

        # set global logging level
        appLoggingLevel = logging.DEBUG

        The additional messages are written to the $SPLUNK_HOME/var/log/splunk/web_service.log file.

        ]]>
        Michael Baum: Splunking VMware virtualization at VMworldhttp://blogs.splunk.com/thebaum/2008/09/19/splunking-vmware-virtualization-at-vmworld/http://blogs.splunk.com/thebaum/2008/09/19/splunking-vmware-virtualization-at-vmworld/Fri, 19 Sep 2008 15:00:51 +0000Michael BaumThis week things were rocking and we were splunking at VMworld. VMware launched their road map for their Virtual Data Center Operating System (VDC-OS). VDC-OS is VMware's vision to aggregate virtualized servers, storage and network resources into a common platform that manages resources for guest operating systems and applications. And we launched Splunk for VMware. It's an application built on top of Splunk that gathers data from different levels of the VMware virtual stack including the hypervisor configuration, metrics and events, the host operating system, underlying network and guest OS and applications. The application also gives you predefined searches, alerts and reports to troubleshoot and secure your VMware environment. It's free and you can download it here.

        VMware VDC and Splunk for VMware

        VDC-OS represents a big leap forward in managing the complexity virtualization hoists upon us. Finally vendors like VMware and Microsoft (which will soon ship its own System Center Virtual Machine Manager) admit managing complex combinations of virtual resources is difficult and important. This is great for monitoring the hypervisor and virtual guest sessions, but what about the resident guest operating systems or applications? It's still impossible to correlate activity and performance at an application level with resource utilization and performance down to the bare metal.

        While these vendors are focused on deploying and tracking the resources themselves, Splunk focuses on providing visibility into the complex interactions and dependencies within a virtual infrastructure. Splunk finds, collects and persists the otherwise perishable log, event and configuration data from dynamic virtual instances as they come and go. Splunk correlates data across tiers in the virtual stack - both inside and outside the hypervisor and guests including the physical servers, hypervisor, VMs, and deployed applications.

        When you point your web browser to the Splunk for VMware application you'll notice several dashboards already created.

        • VM Metrics Dashboard - a view of the last hour's memory and CPU utilization across all running VMs so you can pinpoint hot spots.
        • VM Status Dashboard - current configuration, available storage and other key status indicators from different tiers including hypervisor; access & weblogic logs from deployed applications within the guest OS; perfmon, ps and top from the guest OS's.
        • VM Searches Dashboard - all searches, alerts and reports included with Splunk for VMware.

        You'll see on the searches dashboard a number of investigation searches that correlate the VMware API data with OS data from within the guests to perform complex investigations in a single step. This dashboard also shows you the details of predefined alerts like looking for guests with heartbeats, looking for storage capacity problems, and other common issues.

        As concepts like VMware's VDC-OS become reality (some time in 2009 according to VMware) having the ability to trace transactions through a virtual infrastructure will become even more important. Every layer of management and abstraction (and yes that's what virtualization is) means more complexity to manage. Just as with previous VMware products, VDC-OS will not manage physical hardware that has not been virtualized. And understanding how the virtual infrastructure is interacting with non-virtualized servers, storage and networks will remain a critical requirement.

        Check out Splunk for VMware and let us know what you think and how we can continue to build on it together.

        ]]>
        Michael Baum: Splunk in the fast lane. Welcome Godfrey!http://blogs.splunk.com/thebaum/2008/09/12/splunk-in-the-fast-lane-welcome-godfrey/http://blogs.splunk.com/thebaum/2008/09/12/splunk-in-the-fast-lane-welcome-godfrey/Sat, 13 Sep 2008 04:04:37 +0000Michael BaumThings are moving pretty fast at Splunk and I wanted to comment on the exciting news we announced last week.

        In 2004, Erik Swan, Rob Das and I started Splunk with a vision to battle IT complexity by embracing it. We were thinking of things a bit differently. A different way to address the management of IT by applying search to millions of data center artifacts. Traditionally these artifacts were summarized, filtered and reduced and then forgotten - leaving us humans in a pickle when we needed to figure out what's really going on. For us Splunk was also about a different way to interact with the market taking an approach of utter transparency. Our public product road maps, freely downloadable software and straightforward marketing had even our early stage venture capital investors thinking we were crazy.

        By start-up standards, we seem to have succeeded. Splunk now has more than 250,000 user downloads, more than 750 enterprises, service providers and government agencies worldwide as paying customers and a growing list of partners who embed Splunk into their software, hardware and managed services including companies like Cisco and British Telecom. According to my venture capital friends, very few start-ups make it to where we are today. But, fueled by a love for innovation and so many passionate users we've challenged ourselves to see beyond achieving success as a start-up. We believe Splunk can be a company that gets the IT industry thinking differently.

        Creating change isn't easy and we'll need all the help we can get. Fortunately, we've been blessed with an ability to attract top talent at all levels. But our most recent success tops them all. Godfrey Sullivan has joined us as our new President and CEO. When you meet him you'll realize the incredible passion he has for building great companies. Most recently he was President and CEO of Hyperion Solutions. He took Hyperion over a period of six years to $1B in revenues. Hyperion was acquired by Oracle in 2007 for $3.3B. Godfrey also serves on the board of directors of Citrix Systems, Inc., and Informatica Corporation. Just as important as his business and leadership abilities, Godfrey has the cultural DNA that fits right in at Splunk.

        Here's the yin and yang that is Godfrey. He owns one of only 4,038 1994-1997 Ford GTs. Now this thing is fast, really fast.

        • 0–60 mph (0–96 km/h): 3.3 seconds
        • 0–100 mph (0–160 km/h): 7.3 seconds
        • Standing 1/4 mile: 11.2 seconds @ 134.2 mph
        • Top speed: 212 mph

        And his other car is a Toyota Prius. Enough said.

        Godfrey couldn't join us at a better time. We're scaling all aspects of the business and need the leadership of someone who's been through this type of explosive growth before. For me personally, it's pretty cool to work beside someone of his experience, talent and steady-as-she-goes outlook on life.

        And I get to continue to do what I do - build things. I'm now leading the team building our partner ecosystem working with Developers, MSPs, Resellers, Technology Partners and System Integrators around the world.

        Of course this hyper growth wouldn't be possible without your passion and support. Thank you all for that.

        Happy Splunking!

        ]]>
        Johnathon Cervelli: The tall guy against the wallhttp://blogs.splunk.com/johnathon/?p=12http://blogs.splunk.com/johnathon/?p=12Fri, 12 Sep 2008 23:35:12 +0000Johnathon CervelliFor a nice sunny summer week, far too many of us have succumbed to illness. Clearly the move, sprinting and attendant stress has been too much for some Splunkers. We salute their sacrifice to the greater good. Those who still survive should take all due and proper precautions to ensure their continued health. For that no tonic is better than the (in)famous Harvey Wallbanger.

         

        Bringing together the restorative powers of orange juice, ancient Italian herbs and wholesome grain liquor, the Harvey Wallbanger provides all the nutrition the body needs to ward off sickness and scope creep. That it sounds like your creepy uncle also helps add extra tre chic that PBR sipping hipsters adore. This ain’t your sister’s screwdriver – this is bona fide old school.

         

        So come get the cure for what ails you down on the south side after five. As a special bonus, I’ll explain the subject line and other dirty names for OJ based beverages that they only teach in Sunday school.

        ]]>
        Christina Noren: Jira users’ group Thursday September 18http://blogs.splunk.com/cfrln/2008/09/11/jira-users-group-thursday-september-18/http://blogs.splunk.com/cfrln/2008/09/11/jira-users-group-thursday-september-18/Fri, 12 Sep 2008 02:54:31 +0000Christina NorenBoth Dave Pickering from New Aspects and I will be at the Atlassian Jira users' group in San Francisco next Thursday September 18, for those of you who've been following what we're doing with Jira to automate product management for an agile dev organization. Looks like a lot of great Bay Area companies are going to be there.

        And we really, really, are just about ready to publish the extensions and workflows we've done.

        Details and registration.

        ]]>
        Michael Wilde: Splunkin at Amazon Start-Uphttp://blogs.splunk.com/thewilde/2008/09/11/splunkin-at-amazon-start-up/http://blogs.splunk.com/thewilde/2008/09/11/splunkin-at-amazon-start-up/Thu, 11 Sep 2008 19:01:32 +0000Michael WildeToday, http://splunk.tv is live at Amazon Start-Up at the Austin Music Hall.  Tune in, the SplunkNinja will be talking about what we've been doing with Amazon's Web Services in a number of capacities.  This will be recorded, so if you can't make it - tune in later.  3:10 PM CST.

        Update:  The recorded video from yesterday's presentation at Amazon Startup is here:

        http://www.ustream.tv/recorded/704929

        Note:  There's about 13 minutes of delay... sorry, so fast forward to about 13:30 and you're good


        ]]>
        David Carasso: 3D Photosynth of New Splunk Officehttp://blogs.splunk.com/david/2008/09/09/3d-photosynth-of-new-splunk-office/http://blogs.splunk.com/david/2008/09/09/3d-photosynth-of-new-splunk-office/Wed, 10 Sep 2008 05:59:24 +0000David CarassoI made a photosynth of the new Splunk office in SF, which automatically linked 104 photos in 3D space. It mostly worked.

        Hit the "play" button, sit back, and have a tour of the Splunk office. Click the button with 3 dots on it to jump to the next 3D space.

        ]]>
        Johnathon Cervelli: It’s too hothttp://blogs.splunk.com/johnathon/?p=11http://blogs.splunk.com/johnathon/?p=11Sat, 06 Sep 2008 00:10:33 +0000Johnathon CervelliAfter last week’s little sojourn to the desert, many of you have expressed thoughtful concern for my well being. After all, even a many-talented drinker like myself might be challenged by:

         

        1. Riding a bike
        2. Avoiding 50,000 dirty hippies
        3. Avoiding Matt
        4. Maintaining a satisfactory blood alcohol content

         

        ...especially when one must do all of these things at the same time, all day, every day for a whole week. What technology makes this possible? Surely John isn’t mixing patchouli flavored, rose colored martinis.  

         

        Indeed not.

         

        May I present you with a useful little concoction, should you find yourself wandering the Sahara with the cast of Ab Fab. Playa Sangria. It’s quick, it’s easy, it’s cheap, it’s tasty and you can use it to wash down a hippie. If you don’t mind them being a bit sticky afterwards. And since it’s hotter here than it was in the middle of the Nevada desert, a little sounds delish.

         

        On the South Side, starting now.

        ]]>
        Andrea Longo: Index ICU: Assertion `_sourceMetaData != __null’ failed, part 1http://blogs.splunk.com/andrea/2008/09/03/index-icu-assertion-_sourcemetadata-__null-failed-part-1/http://blogs.splunk.com/andrea/2008/09/03/index-icu-assertion-_sourcemetadata-__null-failed-part-1/Wed, 03 Sep 2008 16:59:47 +0000Andrea LongoThere you were, merrily going along and Boom! Somebody kicks the power switch, your filesystem goes off the deep end, something Very Bad happens. You start to understand why fsck is a four-letter word. After using some additional four-letter words, you get things up and running. But what's with Splunk? It won't start!? You only get some cryptic error and "Splunkd appears to be down." Welcome to the world of WordData. You had a backup, right? Yeah, thought so.

        Buried deep in the index are a bunch of *.data files:

        www.feorlen.org[feorlen]:/Applications/splunk/var/lib/splunk/defaultdb/db$ ls -lr *.data
        -rw-r--r-- 1 root admin 10276 Sep 3 07:41 Sources.data
        -rw-r--r-- 1 root admin 5085 Sep 3 07:41 SourceTypes.data
        -rw-r--r-- 1 root admin 252 Sep 3 07:41 Hosts.data
        -rw-r--r-- 1 root admin 21 Jul 26 19:19 EventTypes.data

        You will find them in every bucket. They contain event counts for sources, sourcetypes, hosts and event types along with some timerange info. During indexing, these are constantly being updated. They are supposed to look something like this (note my timestamping oops there for host::grumpy):

        $ more Hosts.data
        0 0 2147483647 0 0
        1 host::grumpy 11194556 900458000 1231448496 1220453014
        2 host::www 1953184 1194131619 1220452994 1220452994
        3 host::www.feorlen.org 2350 1207761050 1216665145 1216665145
        4 host::localhost 7482 1203904810 1217973661 1217973661

        Except when they look like this:

        $ more Hosts.data
        ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
        ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
        ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
        ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
        ^@^@^@^@^@^@^@^@^@^@^@
        Hosts.data (END)

        That isn't very good. splunkd doesn't much like it when somebody messes with its *.data files. There are also supposed to be at minimum Sources.data, SourceTypes.data, and Hosts.data. (EventTypes.data may legitimately not be there in some cases.) Your crash log will likely contain something like this:

        Backtrace:
        [0x00002B51C8EEFB6E] abort + 270 (/lib/libc.so.6)
        [0x00002B51C8EE8266] __assert_fail + 246 (/lib/libc.so.6)
        [0x000000000066661D] ? (splunkd)
        [0x0000000000697BA6] _ZN23DatabasePartitionPolicy20getSourceWordForCodeEmmR3Str + 182 (splunkd)

        and here is the real smoking gun in splunkd_stderr.log:

        splunkd: /opt/splunk/p4/splunk/branches/3.2/src/pipeline/indexer/TimeInvertedIndex.cpp:974: void TimeInvertedIndex::getSourceWordForCode(long unsigned int, Str&): Assertion `_sourceMetaData != __null' failed.

        Ok, so you've got a horked *.data file. Where? Well, based on frequency of writes, it's going to be in a db-hot directory because that is where active indexing is going on. And the most active indexes are usually fishbucket, _internal and defaultdb. Start by looking for *.data files that are binary. Here's one way you can find which files are binary, a big clue on where the problem is:

        $ cd /opt/splunk/var/lib/splunk
        $ find . -name *.data | xargs grep "." % | grep Binary
        grep: %: No such file or directory
        Binary file ./_internaldb/db/db-hot/Hosts.data matches
        Binary file ./_internaldb/db/db-hot/Sources.data matches
        Binary file ./_internaldb/db/db-hot/SourceTypes.data matches
        Binary file ./fishbucket/db/db-hot/Sources.data matches

        file will do it also, but beware false positives:

        $ for i in `find . -name *.data`; do file $i | grep -v text ;done
        ./_internaldb/db/db-hot/Hosts.data: data
        ./_internaldb/db/db-hot/Sources.data: data
        ./_internaldb/db/db-hot/SourceTypes.data: data
        ./defaultdb/db/db_1214955936_1210836930_38/Hosts.data: Bio-Rad .PIC Image File 2352 x 12297, 14601 images in file
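If you'd rather not squint at grep and file output, the same hunt can be sketched in Python. This is a minimal sketch that assumes the damage shows up as NUL bytes, as in the Hosts.data example above; the function name is made up, and the path is the one used in the shell examples.

```python
import os


def find_nul_data_files(root):
    """Yield paths of *.data files under root that contain NUL bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".data"):
                continue
            path = os.path.join(dirpath, name)
            # These metadata files are small, so reading them whole is fine.
            with open(path, "rb") as handle:
                if b"\x00" in handle.read():
                    yield path


if __name__ == "__main__":
    # Path taken from the example above; adjust for your install.
    for path in sorted(find_nul_data_files("/opt/splunk/var/lib/splunk")):
        print(path)
```

Because it looks for NUL bytes rather than guessing a file type, it avoids the kind of false positive shown above, although it will not notice corruption that happens to be printable text.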

        Another check is to see if the line numbers in the file are in ascending order. If they aren't, then something is seriously wrong:

        for i in `find . -name *.data`; do sort -nc $i;done
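The ordering check can also be written so that it reports where a file first goes wrong. This sketch assumes the layout shown in the Hosts.data listing, where every line starts with an integer; the function name is hypothetical.

```python
def first_out_of_order(path):
    """Return the 1-based line number where the leading number goes
    backwards or cannot be read as an integer; None if all is in order."""
    previous = None
    with open(path, "rb") as handle:
        for lineno, raw in enumerate(handle, start=1):
            fields = raw.split()
            if not fields:
                continue  # skip blank lines
            try:
                current = int(fields[0])
            except ValueError:
                return lineno  # not a number: treat as damage
            if previous is not None and current < previous:
                return lineno
            previous = current
    return None
```

A return value of None means the leading numbers never decrease; anything else is the line to start looking at.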

        Have a look at these files and see what's in them. If they are only partially corrupted, you may be able to edit out the garbage. If they are totally full of junk, you will need to find replacements. For _internaldb and fishbucket, you may not care if your event counts are exactly correct so you can lift some files from another bucket. If the problem were in defaultdb or another index containing your real indexed data, you'll need to pay more attention to the contents.

        In the simple case, if the files in db-hot are trashed, see if there is a warm bucket next to it you can copy some from. Warm buckets are in the same directory as db-hot and look something like db_1218802821_1218658318_17. Copy the *.data files from there into db-hot and try to restart Splunk. If it does, then you are good to go. If not, that means there is more damage to repair. If there are other binary *.data files, make sure you deal with all of them.
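The copy step described above can be sketched as follows. The function name and the backup directory are my own additions for illustration, so that the damaged originals are set aside rather than lost; the bucket paths you pass in are whatever you found on your own system.

```python
import glob
import os
import shutil


def restore_data_files(warm_bucket, hot_dir, backup_dir):
    """Copy *.data files from warm_bucket into hot_dir.

    Any file about to be overwritten is first moved into backup_dir.
    Returns the sorted list of file names that were copied.
    """
    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(os.path.join(warm_bucket, "*.data"))):
        name = os.path.basename(src)
        dst = os.path.join(hot_dir, name)
        if os.path.exists(dst):
            # Keep the damaged original outside the bucket.
            shutil.move(dst, os.path.join(backup_dir, name))
        shutil.copy2(src, dst)
        copied.append(name)
    return copied
```

Run something like this only while Splunk is stopped, and remember that the copied event counts describe the warm bucket, not db-hot.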

        This should handle the most common types of problems. I'll go into more detailed debugging and reconstruction in another post.

        ]]>
        Michael Baum: Life after SIEM. Situational Awareness is next.http://blogs.splunk.com/thebaum/2008/09/03/situational-awareness/http://blogs.splunk.com/thebaum/2008/09/03/situational-awareness/Wed, 03 Sep 2008 08:01:23 +0000Michael BaumWe've been hearing a lot lately about the death of SIEM technologies. But isn't the question less about a legacy technology dying and more about the dimensions on which the next mass adopted security capability will be born? Clayton Christensen first described a model for disruptive technology in his book The Innovator's Dilemma and his follow on The Innovator's Solution. Christensen describes a theory about how disruptive technologies overtake sustaining technologies by delivering value on new dimensions that established vendors overlook as unimportant, low end or just don't think about because they're too busy improving their legacy. Christensen's work offers an interesting framework to think about what's taking place in the market for SIEM security management solutions.

        Any enterprise trying to secure their IT infrastructures knows the state of the art in SIEM security approaches falls short. And trends like virtualization are making things even more difficult. System and security administrators and analysts are inundated with too many potential incidents and it's too difficult and time consuming to investigate even a fraction of them. Achieving a greater comprehension of the meaning of potential incidents and the projection of their status in the near future is the real goal. The idea, called "situational awareness" is often, however, impossible to achieve. We are so dependent on pre-programmed rules in our SIEM solutions that we lack the ability to perform our own analysis because the original raw data has been filtered out, thrown away or we have no practical way to make sense of it.

        Observation: If the technology is sufficiently complex as to allow the vulnerability to exist, can we really build complex technology to catch all the possible issues or scenarios?

        As a reference point see David Hazekamp, Security Architect at Motorola, talk about the importance of retaining all security data across the Motorola global SOC infrastructure and integrating access to all this data into existing SIEM solutions.

        Of course reaching this understanding requires one suspends their disbelief about the effectiveness of current SIEM security technologies. Usually this means you're not a vendor or you're a vendor with little or no vested interest in current approaches. So with this let's examine the typical enterprise deployment of security technologies.

        Defense in Depth

        This is where every good enterprise security architecture starts. In order to begin securing your environment you've got to have data, raw data. In most data centers this takes the form of syslog from network devices and servers, SNMP traps, OPSEC or LEA interfaces for firewall events, WMI for Windows desktop and server events, IDS and IPS signature scans and application level firewall examination of common services like FTP, HTTP, SFTP, SCP etc. The thinking is you need to look at everything. Perhaps you'll even want to pull in information from physical security systems like badge readers.

        Security Information Management (SIM)

        The next step in the process is to manage all this raw data and filter it down to a manageable number of events, traps and alerts. Collecting, storing and providing some basic analysis on all this data is the job of a SIM. Typically, as Raffy points out, the data is parsed, normalized and stored in a structured RDBMS. Parsing, normalizing and structuring all this data is great if the data doesn't change or you don't have too much of it. But if you're dealing with data formats that aren't static or you're trying to store terabytes of this data an RDBMS won't be your friend.

        Security Event Management (SEM)

        Once a SIM has done its job you're ready to aggregate, correlate and start reporting on potential incidents using a SEM. SEMs usually consist of lots of rules that look for combinations and patterns of events indicating that an attack or breach may be underway. Essentially the SEM rules attempt to codify what we humans know about vulnerabilities in our IT systems and possible ways to exploit them. The goal is to provide some real-time information, usually in the form of reports, dashboards and visualizations, to operations and security analysts who work to keep the infrastructure secure.

        Situational Awareness (SA)

        SIEM correlation can be interesting for discovering a pattern or related event, but the ability to work an issue outside of these "canned" rules and events becomes the real problem. Unfortunately, what all too often happens is that there are so many possible attacks that operations and security staff are overwhelmed with potential incidents to investigate, and not every event or pattern of interest is going to be discovered via the pre-built rules. Situational awareness is the attempt to perceive environmental elements within a volume of space and time. Comprehension cannot be achieved if the data being bubbled up is filtered according to a set of rules and the technology does not allow a human to perform their own analysis of the raw data as generated by the environment itself. All technologies have their weaknesses and those that perform correlation are no different.

        Thus, whilst canned SIEM correlation provides value in bubbling things up, we still need the ability to dig into the raw data to fully perceive and comprehend what is taking place. Mind you, SA is not a new concept. It has been applied rather robustly by decision-makers in complex, dynamic areas such as aviation, air traffic control, power plant operations, and military command and control, as well as in more ordinary but nevertheless complex tasks such as driving an automobile or motorcycle. And yes, it has been mentioned before in security operations, particularly in government agencies.

        Situational awareness is as simple as, "I discovered a problem and need context." Discovery may come from an operational log, a security event log, SIEM correlated or aggregated events, a telephone call or something read on a blog. The ability to access and quickly analyze the raw data from the far reaches of your IT environment is the only true path to situational awareness. The idea extends well beyond log and event management and is an enabler for Operations and Security best practices alike, where questions are answered by attaining context around an event. It should not be limited by the structure of the data or the structure of the queries and reports that the vendor provided.

        I'm not sure if Raffy is right and SIEM is dead yet, but for certain it will eventually become just one part of a more comprehensive, flexible and human-enabled way of securing our IT infrastructures.

        ]]>
        Michael Wilde: Caught on tape! Splunk Ninja vs. Sciencelogic Special Forceshttp://blogs.splunk.com/thewilde/2008/09/02/caught-on-tape-splunk-ninja-vs-sciencelogic-special-forces/http://blogs.splunk.com/thewilde/2008/09/02/caught-on-tape-splunk-ninja-vs-sciencelogic-special-forces/Tue, 02 Sep 2008 16:48:06 +0000Michael WildeA few weeks ago, Louis DiMeglio and I did a "quasi-podcast-ish" Q and A session discussing experiences at this year's Interop shows (Las Vegas in May, and the upcoming New York show in September).   This session is over on Sciencelogic's blog. Check it out - we tried really hard to edit the audio well - who knows, we may have to turn this into a frequent podcast.

        Louis DiMeglio heads up the Sales Engineering team at Sciencelogic.

        EM7, their flagship product, is a pretty cool all-in-one integrated management appliance that works hand-in-hand with Splunk live at Interop. Come check out the NOC we're building for the September 2008 version of Interop.  It is THE largest IT tradeshow on the planet, the one where you're likely to find people who know what they're talking about, and a crew of "rag-tag" vendors that get together, build a real NOC with all sorts of different products and actually make it all work.

        Interestingly, we NOC volunteers go through the same challenges that IT guys who do real work deal with every day.  I look at it as sort of a reality-check for vendors of IT products, and a really neat experience to geek out for about a week twice a year on a real production network.

        Yeah, catchy post title, I know.  Other than the new deodorant I'm trying this week, I had to do something to lure the millions of readers in to hear the golden voice of the Splunk Ninja.

        And yes, I do wear Heely's shoes when I go to tradeshows, and no, I'm not 40, yet...

        Listen to the podcast!


        ]]>
        Karandeep Bains: first!http://blogs.splunk.com/deep/?p=1http://blogs.splunk.com/deep/?p=1Mon, 01 Sep 2008 03:10:44 +0000Karandeep Bainshello world!

        ]]>
        David Carasso: Write your own search languagehttp://blogs.splunk.com/david/2008/08/29/write-your-own-search-language/http://blogs.splunk.com/david/2008/08/29/write-your-own-search-language/Fri, 29 Aug 2008 23:16:26 +0000David CarassoSplunk provides many powerful search commands - such as sort, fields, transactions - but even better, it allows you to expand things any way you want, by writing your own search commands.

        I'll show you how to write your own search command.

        Suppose you want to make a new “shape” command in python that returns the shape of an event - tall, short, thin, wide, etc.  There are just three simple steps:

        • Step 1) Tell splunk about this external command in  commands.conf...
        [shape]
        filename = shape.py
        • Step 2) Authorize users to run this command in authorize.conf...
        [capability::run_script_shape]
        [role_User]
        run_script_shape = enabled
        • Step 3) Write the code!  Here is shape.py...
           import splunk.Intersplunk

           def getShape(text):
               description = []
               linecount = text.count("\n") + 1
               if linecount > 10:
                   description.append("tall")
               elif linecount > 1:
                   description.append("short")
               avglinelen = len(text) / linecount
               if avglinelen > 500:
                   description.append("very_wide")
               elif avglinelen > 200:
                   description.append("wide")
               elif avglinelen < 80:
                   description.append("thin")
               if text.find("\n ") >= 0 or text.find("\n\t") >= 0:
                   description.append("indented")
               if len(description) == 0:
                   return "normal"
               return "_".join(description)

           # get the previous search results
           results,unused1,unused2 = splunk.Intersplunk.getOrganizedResults()
           # for each result, add a 'shape' attribute, calculated from the raw event text
           for result in results:
               result["shape"] = getShape(result["_raw"])
           # output results
           splunk.Intersplunk.outputResults(results)

        It works!  Show me the top shapes among events with more than one line...

        $ splunk search "linecount>1 | shape | top shape"
        shape                count  percent
        -------------------  -----  ---------
        tall_indented           43  43.000000
        short_indented          29  29.000000
        tall_thin_indented      15  15.000000
        short_thin_indented     10  10.000000
        short_thin               3   3.000000
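
        If you want to experiment with the classification rules before wiring anything into Splunk, the logic can be run on its own. The following is a minimal sketch that restates the rules from getShape in shape.py without the splunk.Intersplunk import; the name get_shape and the sample strings are made up for illustration.

```python
def get_shape(text):
    """Standalone restatement of the getShape rules; no Splunk required."""
    description = []
    linecount = text.count("\n") + 1
    # Height: more than 10 lines is "tall", 2 to 10 lines is "short".
    if linecount > 10:
        description.append("tall")
    elif linecount > 1:
        description.append("short")
    # Width: based on the average number of characters per line.
    avglinelen = len(text) / linecount
    if avglinelen > 500:
        description.append("very_wide")
    elif avglinelen > 200:
        description.append("wide")
    elif avglinelen < 80:
        description.append("thin")
    # Indentation: any line that starts with a space or a tab.
    if "\n " in text or "\n\t" in text:
        description.append("indented")
    return "_".join(description) if description else "normal"


if __name__ == "__main__":
    # Made-up sample events, not real log data.
    samples = [
        "Exception in thread main\n at Foo.bar()",
        "x" * 100,
        "\n".join(["x" * 100] * 12),
    ]
    for sample in samples:
        print(get_shape(sample))
```

        A single-line event of ordinary width matches none of the rules, which is why it falls through to "normal".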

        Just to review, here are the files we made...

          apps/example/bin/shape.py
          apps/example/default/authorize.conf
          apps/example/default/commands.conf

        Now go out there and make cool extensions to Splunk!

        ]]>
        Andrea Longo: More fishbucket funhttp://blogs.splunk.com/andrea/2008/08/27/more-fishbucket-fun/http://blogs.splunk.com/andrea/2008/08/27/more-fishbucket-fun/Wed, 27 Aug 2008 21:10:57 +0000Andrea LongoFor debugging files getting re-indexed, sometimes what I want to see can only be found in the fishbucket index of the affected instance. I can pick up and move an entire index (3.x+) and drop it into another instance, but when working with the fishbucket there are a couple other things to watch out for. I don't want anything to change it once I put it in the new instance. So I set up a throwaway instance to easily make changes I wouldn't want to do to a real one.

        REALLY BIG WARNING

        Don't do this to any Splunk instance you like. You will be unhappy later. Throw away your dummy instance when you are done so you don't confuse anybody.

        Set up a new instance of an appropriate version (the same as or more recent than the original) and appropriate architecture (ppc/sparc or intel). Get it all working with the correct ports so you don't conflict with anything else that may be running on the machine. Since it won't be indexing, the license doesn't matter. Start and then stop it so the first-run stuff is done.

        Change some things so it won't touch the index:
        ./splunk clean all -f
        rm /opt/splunk/bin/splunk_optimize
        rm /opt/splunk/etc/system/default/inputs.conf (or wherever it is in your version)
        edit /opt/splunk/etc/system/default/indexes.conf to comment out the line frozenTimePeriodInSecs = 2419200 in [_thefishbucket] stanza
        If it's large, you'll want to also comment out maxDataSize = 10

        rm -rf /opt/splunk/var/lib/splunk/fishbucket/*
        copy the contents of the fishbucket index you have into the now empty directory (don't accidentally create an extra fishbucket/fishbucket directory!)
        remove any archives or other temporary files you left lying around in the index directories

        Start this instance and now you can search for index=_thefishbucket. It helps to exclude the Splunk internal files with something like this:

        index=_thefishbucket NOT filename::/opt/splunk/var/log/splunk/license_audit.log NOT filename::/opt/splunk/var/log/splunk/metrics.log NOT filename::/opt/splunk/var/log/splunk/searchhistory.log NOT filename::/opt/splunk/var/log/splunk/splunkd.log NOT filename::/opt/splunk/var/log/splunk/splunklogger.log NOT filename::/opt/splunk/var/log/splunk/web_access.log NOT filename::/opt/splunk/var/log/splunk/web_service.log

        Your full path may vary. What is left is all the files being monitored by the instance.
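
        Since the path varies between installs, it can be less error-prone to generate that search string than to edit it by hand. Here is a small sketch that does so; the helper name fishbucket_search is hypothetical, and the list of log names is simply the one used in the search above.

```python
# Log directory of the throwaway instance; adjust to your install path.
SPLUNK_LOG_DIR = "/opt/splunk/var/log/splunk"

# Internal Splunk logs excluded in the search shown above.
INTERNAL_LOGS = [
    "license_audit.log",
    "metrics.log",
    "searchhistory.log",
    "splunkd.log",
    "splunklogger.log",
    "web_access.log",
    "web_service.log",
]


def fishbucket_search(log_dir=SPLUNK_LOG_DIR, logs=INTERNAL_LOGS):
    """Build a fishbucket search that excludes Splunk's own log files."""
    base = log_dir.rstrip("/")
    terms = ["index=_thefishbucket"]
    terms += ["NOT filename::%s/%s" % (base, name) for name in logs]
    return " ".join(terms)


if __name__ == "__main__":
    print(fishbucket_search())
```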

        ]]>
        Michael Baum: Man Versus Machine: Part Onehttp://blogs.splunk.com/thebaum/2008/08/25/man-versus-machine-part-one/http://blogs.splunk.com/thebaum/2008/08/25/man-versus-machine-part-one/Mon, 25 Aug 2008 09:05:33 +0000Michael Baum

        Recently I gave a talk at the BT annual technology gathering. The setting was a really beautiful estate called The Grove just north of London in Hertfordshire England. A couple hundred of BT's smartest technology managers were in attendance and I was supposed to think of something to hold their interest for an hour. I got to thinking about all the technology and infrastructure BT must have and how in the world do they manage it. I started gathering data. With internal growth, new projects like BT's 21st Century Network and acquisitions over the past decade through BT Global Services outsourcing contracts the company has a lot of IT infrastructure.

        • 74 data centers,
        • 163 countries,
        • 3,000 applications,
        • 6,000 different types of systems/devices and
        • 17,000 IT staff (6,000 BT and 11,000 outsourced).

        I also spent a few hours with some of BT's brightest architects who are working on attempts to virtualize every layer of their infrastructure - network, storage, database, application, web servers, VoIP, collaboration, ordering, billing, provisioning, monitoring etc. What's their biggest problem, I asked. Resoundingly it was "our customers are still often the ones that tell us stuff is broken." This was so reminiscent of my time at places like Yahoo! where we'd have these 7×24 war rooms during key outages and the daily conference calls with 30-40 people on the line all emailing logs and configurations to each other.

        As our IT infrastructures become incredibly complex, dynamic, service oriented, virtualized and mission critical we're confronted with this battle raging in our data centers. And it appears the machines are winning and the humans are losing.

        Our biggest problem is figuring out - did something go wrong? Why? Where does truth lie? According to market researcher IDC, more than $140B was spent managing the world’s data centers in 2007. IT OPEX is growing at 2.5 times the rate of hardware spend, and 1/3-1/2 of TCO is spent recovering from problems. The cost of availability now dwarfs the purchase and maintenance cost of technology.

        So what have we as an IT industry done to address the problem?

        We've created concepts like ITIL and CMDBs. While there are some good process improvements here for sure, these top-down modeling approaches and pre-determined rules only tell us what we already know. In my experience it is not the things we already know about that bite us in the ass and take our systems down for prolonged periods of time. It's the multitude of unanticipated and unavoidable dependencies and interactions that take place in a complex system. And it's impossible to know what set of dependencies and interactions will cause downtime until it occurs. Our infrastructures are just too indeterminate. That's the point after all. Tier it, load balance it, virtualize it. So we don't have to worry about the dependencies and interactions among all the different components. Well guess what? We do have to care. Because we have to fix it when it goes wrong.

        Take the analogy of a complex air traffic control system. Sure the air traffic controllers feel really great when they arrive at work in the morning. They've got their coffee, flight plans and a good handle on the early morning inbound and outbound traffic.

        flightplan

        Then the day gets a bit more challenging. Weather conditions over Chicago back up landings at O'Hare. A baggage handler and mechanic strike slows down JFK departures. A pilot radios he's three degrees north over Pennsylvania, but where is he really? Now you need radar. Throw the flight plans out the window. You need to know what's actually happening now.

        radar

        So how do we establish the equivalent of radar for a complex IT infrastructure? Component monitoring doesn’t work any more. If the problem is a single component failure, we already know about it. We've already automated the swapping in of a new machine or device. And we can reboot software components automatically. IBM has its own marketing play on this called "Autonomic Computing" but that too seems to focus only on the simple single-component issues, not the indeterminate chaos that ensues in a real running system. And it seems like more slideware than real solutions.

        In my next post I'll tackle the issue of how we might look at things differently.

        Stay tuned.

        ]]>
        Inder Sabharwal: We are hiring ActionScript/Flex engineers….http://blogs.splunk.com/inder/2008/08/21/we-are-hiring-actionscriptflex-engineers/http://blogs.splunk.com/inder/2008/08/21/we-are-hiring-actionscriptflex-engineers/Thu, 21 Aug 2008 15:48:44 +0000Inder SabharwalSplunk is hiring ActionScript/Flex engineers to build new products for the Enterprise team. If you have been building Enterprise and/or Web applications using AS/Flex, we would love to talk to you.

        Also, if you are a UI engineer using Java or .NET or AJAX (jQuery, ExtJS, etc.) technologies, and are motivated to move to ActionScript/Flex, we will provide you with the tools and mentoring to be successful in this position.

        Experience in building network topology visualizations is a big plus!

        All resumes can be emailed to me directly.

        ]]>
        Michael Baum: Splunk Live Southwest 2008http://blogs.splunk.com/thebaum/2008/08/15/splunk-live-southwest-2008/http://blogs.splunk.com/thebaum/2008/08/15/splunk-live-southwest-2008/Fri, 15 Aug 2008 18:06:41 +0000Michael Baum

        This week we've been moseying through the Southwestern part of the US with our Splunk Live show. We changed up the format a bit with Splunk technical workshops in the morning and customer round tables in the afternoon. The technical workshops were a big hit with more than 200 people registered to engage with our Splunk Experts. During the workshop you were able to download, install, configure and start using Splunk on your laptop or server with remote access. The best part about Splunk Live events though is sharing ideas with other Splunk fanatics.

        Ryan Peterson from Infusionsoft, a marketing automation company, gave a great talk in Scottsdale about his Splunk deployment for the company's email infrastructure. Ryan is tasked with keeping more than 12M emails a week flowing out of the system to support Infusionsoft's Automated Follow-up Technology (AFT). Ryan has multiple servers in different geographies in addition to PCI Compliance requirements. He demonstrated using Splunk to troubleshoot problems spread across the messaging infrastructure, address reporting inaccuracies and deliver PCI reports to auditors. He's even indexing the content of email with Splunk using a scripted LDAP data input. Cool stuff.

        In San Diego Tony Doan of the Genomics Institute at the Novartis Research Foundation (GNF) and Eric Van Johnson from Sony Consumer Electronics joined us. Tony is a security engineer and former pen tester. He also confesses to be a recovering Unix sysadmin. GNF has 600 Windows desktops and several hundred Windows and Linux servers supporting the discovery of new biological processes and improved human therapeutics. Tony discussed how they splunk Cisco CSC, Bluecoat, Symantec AV, Arpwatch, Cisco Switches and Wifi access points to find what he calls "previously unknowns" to improve operational availability and security. He says they're finding new uses every day, but Tony's favorite is splunking Cisco IPS and Cisco MARS events looking for odd behaviors. Next up for GNF is eating Windows Event Logs and Windows Registry inputs together with summary indexing for consolidated reporting.

        Eric Van Johnson is the eServices Hosting and Operations Manager at Sony Consumer Electronics. He led a great discussion on splunking IBM Websphere and MQ Series events including how Sony has integrated operations and development environments to identify problems with complex apps more quickly and avoid unnecessary escalations to the development team. He shared with us Sony's roll out of Splunk to their Business Intelligence Group. The idea is to complement aggregated WebMethods data reporting for business activity monitoring. Next up he wants to feed Splunk data back and forth with Verizon's hosting operations since some of the Sony servers are hosted at Verizon and Verizon is also using Splunk.

        In LA Rich Horace, Director of Systems Engineering and Operations at Fox Interactive Media, demonstrated how Fox uses Splunk in the Fox Audience Network. Basically these are the guys that serve web advertisements across all the Fox properties including MySpace, Rotten Tomatoes, Fox Sports and IGN. He's challenged with launching new monetization platforms and keeping the existing ones running. Rich gave a fantastic overview of his Splunk installation which consolidates/aggregates data from disparate systems in order to protect against hackers and meet PCI and SOX requirements. He currently runs an environment with ~600 Linux servers, load balancers, servers, NetApps and network switches. So far he's indexed 1.5B events. We engaged with everyone in a lively discussion about securing production sites from developers and controlling and auditing access to data using Splunk's access controls and search filters. Rich also discussed how Fox is using Splunk to integrate with various Citrix products including Netscaler and XenApp.

        Thanks to everyone who shared their stories with us this week, it was really awesome.

        ]]>
        Andrea Longo: What is this fishbucket thing?http://blogs.splunk.com/andrea/2008/08/14/what-is-this-fishbucket-thing/http://blogs.splunk.com/andrea/2008/08/14/what-is-this-fishbucket-thing/Thu, 14 Aug 2008 22:50:44 +0000Andrea LongoIt's time for a little Indexing 101. If you look in the directory where your Splunk datastore resides (default location /opt/splunk/var/lib/splunk) you will find a directory called fishbucket. This index is not really intended for normal humans to investigate, more just Splunk engineers trying to decipher file input issues. It contains seek pointers and CRCs for the files you are indexing, so splunkd can tell if it has read them already. To see what's there, try searching for "index=_thefishbucket". Events look something like this:

        48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 seekptr::414063 modtime::1218643123 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log

        The fields are:

        timestamp (epoch time, in hex)
        CRC of the first 256 bytes of the file
        CRC of the 256 bytes where we were last reading
        seek pointer for where we are in the file
        the time the file last changed
        the full path to the file.
        the full path to the source, which is usually the same as the file but could be the archive the file came from.
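
        To make the field layout concrete, here is a rough sketch of splitting one of these events into named fields. The function name parse_fishbucket_event is made up, and two things are assumptions drawn from the sample events rather than from any documented format: that seekptr is written in hex (the samples contain digits like a and f), and that paths contain no spaces.

```python
def parse_fishbucket_event(line):
    """Split one fishbucket event into a dict of named fields."""
    parts = line.split()
    # The leading token is the event timestamp: epoch seconds in hex.
    record = {"timestamp": int(parts[0], 16)}
    # The remaining tokens are key::value pairs.
    for token in parts[1:]:
        key, _, value = token.partition("::")
        record[key] = value
    # Assumption: seekptr is hex, judging by the sample events.
    record["seekptr"] = int(record["seekptr"], 16)
    # modtime is plain decimal epoch seconds.
    record["modtime"] = int(record["modtime"])
    return record


if __name__ == "__main__":
    sample = (
        "48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 "
        "seekptr::414063 modtime::1218643123 "
        "filename::/var/log/apache2/feorlen_org_access_log "
        "source::/var/log/apache2/feorlen_org_access_log"
    )
    for key, value in sorted(parse_fishbucket_event(sample).items()):
        print(key, value)
```

        In the sample event the hex timestamp 48a304b3 decodes to 1218643123, the same value as modtime.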

        When the file monitor processor looks at a file, it searches the fishbucket to see if the CRC from the beginning of the file is already there. If not, the file is indexed as new. If yes, then we check the CRC of where we were reading against the saved value in seekcrc. If it matches and the file is longer than the saved seek pointer, then there is new stuff at the end to read. If the top of the file matches but the seekcrc doesn't, or the seek pointer is beyond the current end of the file, then something in the part we have already read has changed. Since we don't know what might have changed, we just index the whole thing. (You can control this: see CHECK_METHOD in props.conf.spec.)
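
        The decision described above can be written down as a few lines of Python. This is only an illustration of the logic as described in this post, not Splunk's actual code: the function decide, the dict keyed by initcrc, and the crc_at callback are all made up for the example, and the "nothing new" case (everything matches but the file has not grown) is my own reading of what happens when no branch applies.

```python
def decide(fishbucket, init_crc, file_length, crc_at):
    """Return what the file monitor would do with a file.

    fishbucket  -- dict mapping initcrc -> {"seekcrc": ..., "seekptr": ...}
    init_crc    -- CRC of the beginning of the file as it is now
    file_length -- current length of the file in bytes
    crc_at      -- hypothetical callback: crc_at(offset) returns the CRC of
                   the data where reading stopped last time
    """
    saved = fishbucket.get(init_crc)
    if saved is None:
        # Beginning of the file has never been seen: index as new.
        return "index as new"
    if saved["seekptr"] > file_length:
        # Seek pointer is beyond the current end of the file.
        return "re-index whole file"
    if crc_at(saved["seekptr"]) != saved["seekcrc"]:
        # Something in the part already read has changed.
        return "re-index whole file"
    if file_length > saved["seekptr"]:
        # Everything matches and the file has grown: read the new tail.
        return "read new data from seek pointer"
    return "nothing new"


if __name__ == "__main__":
    bucket = {"5f66db978a1ff3a3": {"seekcrc": "bc96de428cc0b5e6", "seekptr": 0x414063}}
    same_crc = lambda offset: "bc96de428cc0b5e6"
    print(decide(bucket, "5f66db978a1ff3a3", 0x414063 + 500, same_crc))
```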

        If you want to track what is happening with a particular file, you can search for all the events in the fishbucket associated with it by the file or source name (like source::/var/log/apache2/feorlen_org_access_log.) If you check the seekptr and the modtime, they will only be increasing with time (note that events are returned most recent first, so this list is newest to oldest.)

        48a3084d initcrc::5f66db978a1ff3a3 seekcrc::3e746e9f66897965 seekptr::414a40 modtime::1218644042 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a307d9 initcrc::5f66db978a1ff3a3 seekcrc::77f6d8313fc689ba seekptr::41419b modtime::1218643929 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a3062e initcrc::5f66db978a1ff3a3 seekcrc::2cc30b86b37c646 seekptr::4140fc modtime::1218643502 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 seekptr::414063 modtime::1218643123 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a300d3 initcrc::5f66db978a1ff3a3 seekcrc::8db2f52ef6f75c91 seekptr::413fa4 modtime::1218642130 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a2fc7a initcrc::5f66db978a1ff3a3 seekcrc::881375418e194bd5 seekptr::413f06 modtime::1218640999 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a2f996 initcrc::5f66db978a1ff3a3 seekcrc::c596371ec4c573d4 seekptr::413e6c modtime::1218640260 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a2f80c initcrc::5f66db978a1ff3a3 seekcrc::2e686cf0dd2f62bb seekptr::413dce modtime::1218639883 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a2f25a initcrc::5f66db978a1ff3a3 seekcrc::b2e489862ed72c79 seekptr::413d1d modtime::1218638406 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a2f1d1 initcrc::5f66db978a1ff3a3 seekcrc::58af0c6446e96bf5 seekptr::413c7f modtime::1218638289 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a2f19d initcrc::5f66db978a1ff3a3 seekcrc::16fdb83b48965067 seekptr::413bbe modtime::1218638236 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a2f05b initcrc::5f66db978a1ff3a3 seekcrc::fbb8700a35cfdfcb seekptr::413b25 modtime::1218637915 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log
        48a2ebc5 initcrc::5f66db978a1ff3a3 seekcrc::ddbac21aa7386a6 seekptr::413abd modtime::1218636714 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log

        Anything other than this indicates a big problem with the file, like it is getting re-indexed when it shouldn't. (Some files you do want to re-index when they change, but not normal logfiles that roll.)
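
        Eyeballing a long list for a pointer that went backwards gets tedious, so the check can be scripted. Below is a minimal sketch; never_goes_backwards is a made-up name, the values are taken from the first four events in the listing above, and seekptr is assumed to be hex as the samples suggest. I treat equal consecutive values as acceptable, which is also an assumption.

```python
def never_goes_backwards(events):
    """events: list of (seekptr, modtime) tuples, newest first, as a search returns them.

    Returns True if neither value ever decreases as time moves forward.
    """
    oldest_first = list(reversed(events))
    pairs = zip(oldest_first, oldest_first[1:])
    return all(
        earlier[0] <= later[0] and earlier[1] <= later[1]
        for earlier, later in pairs
    )


if __name__ == "__main__":
    # (seekptr, modtime) from the newest four events shown above.
    events = [
        (0x414A40, 1218644042),
        (0x41419B, 1218643929),
        (0x4140FC, 1218643502),
        (0x414063, 1218643123),
    ]
    print(never_goes_backwards(events))
```

        A False result for a normal rolling logfile is the cue to look at whether the file is being re-indexed.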

        So why do I care?

        Every Splunk instance has a fishbucket index, except the lightest of hand-tuned lightweight forwarders, and if you index a lot of files it can get quite large. As with any other index, you can change the retention policy to control the size via indexes.conf. But since it tracks what files the instance has seen, you have to consider carefully before you change the retention policy. If you retire data from the fishbucket for files that still exist on the host, it will "forget" it saw them and next time around they will get re-indexed.

        ]]>
        Erik Swan: Search engine for virtual sprawl - vmware app for splunkhttp://blogs.splunk.com/erik/2008/08/10/search-engine-for-virutal-sprawl-vmware-app-for-splunk/http://blogs.splunk.com/erik/2008/08/10/search-engine-for-virutal-sprawl-vmware-app-for-splunk/Sun, 10 Aug 2008 22:57:24 +0000Erik Swan**** UPDATE - 10/31/08 ****
        Hey all,
        I've updated the app to version 1.8.
        The only fix in this version is a bug with multiple datacenters.
        Version 1.8 should now work for an unlimited number of datacenters.
        ( Thanks to Stephen for finding and letting me know )

        As always feel free to bug me if the app has any problems.
        e.

        **** UPDATE - 10/10/08 ****

        Hey all,
        I updated the latest release - 1.7 - to fix a shutdown bug.
        Turns out that in prior releases, when Splunk was shut down the VMWare app kept running.
        This release will now terminate the VMWare app when splunkd goes away.

        If you would like to test or run without splunk you can pass in the arg.
        java -jar splunk.jar - standalone

        ** see instructions below on how to run the above command **
        As usual, drop me a line if you have any questions.
        Good luck with 1.7

        **** UPDATE - 09/16/08 ****

        Thanks to more testing I have found and fixed a few critical bugs.
        Updated APP version 1.6 >> here <<

        • there was a static var preventing the multiple server configs from working. Should be fixed, and multiple servers in the vmware.conf should work.
        • IBM JVMs should work - i.e. AIX should now work ;-)
        • Added new saved searches and a few dashboards ( thanks to raffy ;-)

        As usual, please let me know if you find any bugs.
        I'll type up some notes on my VMworld experience.

        Cheers,
        e

        **** UPDATE - 09/08/08 ****
        Thanks to lots of folks trying it out, I have found a critical bug that was preventing much of the data from getting indexed. This latest release, 1.5, should have that fix and everyone should see all the wonderful VMWare data in the index.

        As usual, bug me if it does not work or you have any questions.

        If you have made changes to vmware/local/vmware.conf and not to the file in default, you can just untar this version on top of your old one. If you are making changes to the default/vmware.conf file, I'd move that to local/vmware.conf; that way, when I ship updates they will not blow away your conf changes. We ship only default/vmware.conf and not local/vmware.conf.

        Thanks again to everyone that helped find bugs!

        e.

        **** UPDATE - 08/27/08 ****
        I have updated the app with a few fixes found in the field.

        • hopefully fixed issue on AIX (IBM jvm )
        • added output of host/vm name on update messages. It was hard to tell where the messages were coming from
        • added more debugging info on startup to help debug connection issues.

        Things that are still under-investigation.

        • Pointing at lots of ESX servers and not VC. Seems as though some data is not coming back from ESX.
        • Making it work with older JVMs (currently it seems I require 1.5)

        **** Original Post 08/10/08 ****
        I wanted to release this a few months ago, but the project kept getting stuck on the back burner. Finally I've cleaned it up and had a few people try it, and it seems to work well. I'm sure there are configurations and versions out there that will have issues - please write me back ( my first name at splunk.com ) if it does not work as advertised.

        Reading the below makes it sound more difficult than it really is. Just download, unzip, change the server url, username and password in the vmware.conf file, restart and go! This really is the first public release and I'd love to get more feedback. I'll more than gladly send you a Splunk tee shirt of your choice if you help find bugs or have useful suggestions!

        Why you want to give it a try:
        This vmware app is a cool way to keep track of what your VC and ESX servers are up to, what instances are running where, when they are under load, when instances move, when they have errors, and much more. Since all the data is indexed in Splunk, it's easy and quick to search for problems and report on your virtual sprawl.

        How it works:
        This app will connect a Splunk server to any number of Virtual Center and/or ESX servers and grab/index the events, logs, properties, performance data, and anything else I can get my grubby mitts on. It's easy to hook up and get going, so if you use Virtual Center or ESX then give this app a try. I'll explain how to install/setup, how to troubleshoot, and what you will see when you get it working. You will need to install Splunk or use an existing Splunk server. See the configuration file for settings on how often to pull data. Also, near the end of this post I give example searches to explain the data.

        After installing you get cool graphs like this one showing CPU Usage by Guest by Time:

        cool graph

        Add Inside-out monitoring
        It's optional, but if you can, also put splunk on the guest OS's as lightweight forwarders and you will get a brilliant inside-out view, where we capture not only what VC/ESX thinks but also what the guests are seeing on the inside. My best practice is to put splunk on the guests and capture basic logs as well as OS performance metrics, what apps are running, how much mem/cpu they are taking, etc. You can get the Unix/Linux version here and the windows here. Of course it's not required, and you get a ton of value out of just the vmware app's basic monitoring of VC/ESX.

        INSTALLATION:

        **Important**
        This app requires a JVM to be installed on the same box as the splunk server. I know this is less than optimal. Please bug your local VMware rep and tell them to make me REST APIs and not SOAP APIs. The VMware APIs are hideously overcomplicated - please, dear VMware, make a simple REST interface.

        1) Make sure java is present and set the JAVAHOME environment variable. If it is not already set, you must set JAVAHOME to the java installation directory, i.e. the directory whose bin subdirectory contains the java binary.

        2) To test the variable is set correctly, try and run the following on the command line
        windows&gt; "%JAVAHOME%\bin\java"
        linux/unix&gt; $JAVAHOME/bin/java

        If it worked it should spit back a bunch of options to pass to the java command. If it's not set right you will get some kind of file-not-found error.
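
If you'd rather script the check in step 2, here is a minimal Python sketch. It is only a convenience and not part of the app; the function name find_java is made up for illustration, and it assumes JAVAHOME points at a directory with a bin subdirectory, as in the commands above.

```python
import os
from pathlib import Path


def find_java(env=None):
    """Return the path to the java binary under JAVAHOME, or None if unusable.

    Hypothetical helper: mirrors the manual check "$JAVAHOME/bin/java".
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVAHOME")
    if not java_home:
        return None  # variable not set at all
    bin_dir = Path(java_home) / "bin"
    # "java" on linux/unix, "java.exe" on windows
    for name in ("java", "java.exe"):
        candidate = bin_dir / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    java = find_java()
    print(java if java else "JAVAHOME is unset or does not contain bin/java")
```

If this prints a path, the command-line test above should work too.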

        3) Grab the vmware.zip file HERE.

        4) Unzip the file - and copy the resultant "vmware" directory to your SPLUNK_HOME/etc/apps/ directory. When done the following directory should exist: SPLUNK_HOME/etc/apps/vmware.

        CONFIGURATION:
        There are a few config settings to make the app work.

        5) First you need to let Splunk know where your VC or ESX servers are. Edit the vmware/default/vmware.conf configuration file to point to your VC or ESX servers. If using VC you need not specify all ESX servers under management; splunk will get the list from VC. The config file contains one or more of the following stanzas (the unique_name can be anything you like so long as it's unique):
        [vmserver:unique_name]

        For each [vmserver] stanza be sure to set:
        url=https://your_server_IP/sdk
        username=your_user
        password=your_password

        Note that the url should be the IP address of your server with "/sdk" at the end - for example "url=https://10.1.1.35/sdk". A good way to test that the url and username/password are correct is to use a web browser. Take the url you entered above and replace the "sdk" with "mob". Navigate to that url in the browser and make sure it asks for a username and password and that the values you entered above authenticate correctly. If the "mob" url works with the username and password you entered then splunk should have no trouble.
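
The sdk-to-mob swap is easy to get wrong by hand, so here is a small Python sketch that derives the browser test url from the configured one. The helper name mob_url is an assumption for illustration; it only rewrites the string and does not contact the server.

```python
from urllib.parse import urlsplit, urlunsplit


def mob_url(sdk_url):
    """Turn https://host/sdk into https://host/mob for the browser test.

    Hypothetical helper: raises ValueError if the url does not end in /sdk,
    which usually means the url setting in vmware.conf is wrong.
    """
    parts = urlsplit(sdk_url)
    if parts.path.rstrip("/") != "/sdk":
        raise ValueError("expected a url ending in /sdk, got %r" % sdk_url)
    return urlunsplit(parts._replace(path="/mob"))


if __name__ == "__main__":
    print(mob_url("https://10.1.1.35/sdk"))
```

Paste the printed url into a browser and log in with the same username and password.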

        With those three set you should be up and running after a restart.
        The rest of the config file should be self-explanatory and is included at the end of this post for reference, but you should not need to change anything else.

        Testing and Troubleshooting:

        6) It's best to test running the vmware app outside of splunk first.
        You'll need to make sure that SPLUNK_HOME is set for the test.

        ** On Windows **:
        set SPLUNK_HOME=your splunk directory
        #note it does not like it when i add quotes around this path - try with no quotes.

        Then run the app by hand
        &gt; cd %SPLUNK_HOME%\etc\apps\vmware
        &gt; java -jar lib/splunk.jar

        ** On others ** :
        export SPLUNK_HOME=your splunk directory

        Then run the app by hand:
        &gt; cd $SPLUNK_HOME/etc/apps/vmware
        &gt; java -jar lib/splunk.jar

        It should spit out all sorts of vmware data. If it throws an error it's likely that SPLUNK_HOME or JAVAHOME is NOT set. Remember SPLUNK_HOME will be set by the server when the server runs the script; you need only set it for testing.

        If it does not work, likely the exception will have something useful in it such as connection refused ( bad auth ) or a 404 error in which case the url is incorrect.

        If you get any non-obvious errors email me ( my first name at splunk.com ).

        7) Try running in splunk.
        If the above test works then you should be able to just restart splunk and all should be good. The way to tell if it's working is that you will get events with sourcetype vmware and vmware_api.

        8) If you do NOT see events of type vmware_api on the dashboard then try the following searches:
        "index=_internal error"
        and
        "index=_internal splunk4vmi.py"

        You should see some kind of error or warning that is hopefully obvious. If not again email me and i'll sort you out.

        Using the App

        At this point it should be working and you should be able to search for cool stuff.
        Here is a quick overview of what splunk is indexing:

        After restarting you should see a bunch of logs from vmware and at least two new sourcetypes: vmware and vmware_api. Below is a screen shot of my dashboard after restarting - notice the vmware logs and the vmware_api event counts.

        sources

        The vmware sourcetype is for the actual vmware logs while the vmware_api sourcetype is for the API calls. It can take a minute before they show up, so if they are not there, try again after a minute. If you still do not have the logs, that likely means the logs path in vmware.conf is incorrect; make sure the path is correct or contact me.

        If you do not see the API calls then there is likely an auth or url error that should have been caught when you did the manual test above. Try retesting by hand as above - if the by-hand method works but not through splunk then contact me.

        I've just started to explore the logs that come back - there is a ton of information in them, but my test infrastructure is not all that interesting so I'm not sure what goodness you all might find in them. Poke around the files and see what you see, and bug me if you see anything interesting that I can make into alerts / reports.

        The meat of the data is from the API where we pull everything we can.
        Most useful are:

        1) Metrics
        Every few seconds we capture the metrics for all VMs:
        metrics

        2) Events
        I'm not sure the scope of these but it looks like interesting events kicked out by ESX. Someone with a larger VMware installation might find far more interesting events than i see on our infrastructure.
        events

        3) Updates:
        It looks like when anything changes, we get an update.
        updates

        4) Inventory:
        I periodically just capture the inventory tree. It's more for debugging than perhaps useful in a production environment but it does not cost much to get and it can be useful.
        inventory

        Thanks to Christina we ship with a bunch of saved searches. After installing you should see them; they all start with 'VM:'. They are named to be somewhat obvious. Again, let me know if they don't work or if you have some better ones to add to the default app. Try some of the Metrics and Status saved searches to make sure your install is working.

        • VM: Investigation CPU load on all guests sharing ESX server
        • VM: Investigation Find ESX Host for Guest
        • VM: Investigation Find Guests sharing ESX Server - Non FQDN
        • VM: Investigation- Find other VMs sharing ESX Host
        • VM: Investigation- Processes on hosts sharing ESX Server
        • VM: Investigation- Running processes on other guests on same ESX server
        • VM: Metrics- CPU by Guest last 60 minutes
        • VM: Metrics- Host Memory Usage last 15 minutes
        • VM: Metrics- Host Memory Usage last 60 minutes
        • VM: Metrics- Memory by Guest last 60 minutes
        • VM: Status- Free Space by Datastore
        • VM: Status- Running Guests
        • VM: Status- Running VMs

        That's about it.
        Like i said, PLEASE email me if you have bugs or suggestions.
        I'll plan on updating the app with whatever feedback i get from folks. So please, help me out and get yourself a tee shirt.

        Kind Regards,
        e.

        P.S. - there is a sample of the config just so that you can see what's in it without downloading:
        - - - - - - - - - - - - -
        The following are the important values in the config file:


        [vmserver:demo]
        url=https://10.2.1.151/sdk ## This is the url to the vc or esx server
        username=your_username ## user name to auth against the server. If you are not sure of its value, point a web browser at the above url and check the web auth; it will be the same.
        password=your_password ## we will support non-clear text in the near future.
        ignorecert = t ## for now leave as true (t), we will soon support checking of certs
        loggingLevel = error ## to turn on debugging values are [error, warn, info, debug ]

        index_events = t ## should we index events (t)rue or (f)alse
        events_interval = 10 ## how often to check for events in seconds

        index_properties = t ## should we index properties (t)rue or (f)alse
        property_interval = 10 ## how often to check for properties in seconds

        index_metrics = t ## should we index metrics (t)rue or (f)alse
        metrics_interval = 10 ## how often to check for metrics in seconds

        index_updates = t ## should we index events (t)rue or (f)alse
        updates_interval = 10 ## how often to check for updates in seconds

        index_logs = t ## should we index logs (t)rue or (f)alse
        logs_interval = 300 ## how often to get log changes...
        logs_localpath = ../var/spool/vmware ## the logs are copied from vc/esx to this directory, where splunk will pick them up for indexing
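
If you want to sanity-check your own vmware.conf before restarting, a rough Python sketch like the one below can read the stanzas. This is not how the app itself loads the file (the app is a Java jar); it assumes the file follows the sample above, with "##" starting an inline comment, and load_servers is a made-up name.

```python
import configparser

# A trimmed copy of the sample above, used only to demonstrate the parser.
SAMPLE = """\
[vmserver:demo]
url=https://10.2.1.151/sdk ## url of the vc or esx server
username=your_username
password=your_password
index_metrics = t ## (t)rue or (f)alse
metrics_interval = 10 ## seconds
"""


def load_servers(text):
    """Return {unique_name: settings} for every [vmserver:...] stanza."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("##",),  # assumption: "##" marks a comment
        interpolation=None,               # keep "%" in passwords literal
    )
    parser.read_string(text)
    servers = {}
    for section in parser.sections():
        kind, _, name = section.partition(":")
        if kind != "vmserver":
            continue
        opts = parser[section]
        servers[name] = {
            "url": opts["url"],
            "username": opts["username"],
            "index_metrics": opts.get("index_metrics", "f") == "t",
            "metrics_interval": opts.getint("metrics_interval", fallback=10),
        }
    return servers


if __name__ == "__main__":
    for name, settings in load_servers(SAMPLE).items():
        print(name, settings)
```

A KeyError here means a stanza is missing its url or username, which is worth fixing before you restart.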

        ]]>
        Inder Sabharwal: Deployed bundles not taking effect?http://blogs.splunk.com/inder/2008/07/28/local-and-deployed-bundles/http://blogs.splunk.com/inder/2008/07/28/local-and-deployed-bundles/Mon, 28 Jul 2008 21:25:09 +0000Inder SabharwalChanges made in /etc/system/local override any configuration bundles that you may be trying to publish to your Splunk instances using a DeploymentServer.

        Several customers have reported that DeploymentServer configuration bundles were not taking effect, only to realize after several troubleshooting cycles that some configuration in /etc/system/local was preventing that from happening. Note that any configuration in /etc/system/local will always take precedence over any other configuration in the system - even deployed bundles.

        So, if you are stuck in this position, please make sure to check your /etc/system/local before hitting the panic button!
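
To picture what is going on, here is a deliberately simplified Python sketch of two configuration layers being merged, with /etc/system/local winning. It models only the rule described in this post; the real precedence order has more layers than these two, and the function and the example stanza are made up for illustration.

```python
def effective_config(deployed, system_local):
    """Merge two layers of {stanza: {setting: value}}; system_local wins.

    Simplified model: settings are overridden one by one, so a deployed
    stanza keeps any setting that system/local does not mention.
    """
    merged = {stanza: dict(settings) for stanza, settings in deployed.items()}
    for stanza, settings in system_local.items():
        merged.setdefault(stanza, {}).update(settings)
    return merged


if __name__ == "__main__":
    deployed = {"monitor:///var/log": {"index": "main", "sourcetype": "syslog"}}
    local = {"monitor:///var/log": {"index": "test"}}
    # The deployed index is silently replaced by the local one.
    print(effective_config(deployed, local))
```

The deployed bundle is still there; one stray setting in system/local just shadows it.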

        ]]>
        Matt Green: Help Me Help You: Opening a good ticket with supporthttp://blogs.splunk.com/matt/2008/07/28/help-me-help-you-opening-a-good-ticket-with-support/http://blogs.splunk.com/matt/2008/07/28/help-me-help-you-opening-a-good-ticket-with-support/Mon, 28 Jul 2008 21:15:33 +0000Matt GreenSalutation drivers of the Information Super Highway,

        I've got another post here in the occasional "Help Me Help You" series; this time I'm going to dig into case writing.

        I was talking with some of the engineers the other day around the bar about an issue that one of our field guys opened. One of the engineers mentioned a piece of information that totally changed the way the rest of us were going to handle the issue. This got us to talking about how some people write great cases and others don't. The ones who write good cases usually get their issues resolved first (oftentimes closing the issue with the first response from a member of my team); the ones who write "bad" cases generally have a back-and-forth exchange.

        That got me thinking that maybe I should take a sec to talk about what makes a good case. I'm going to try mapping out a basic template for submitting an issue. This is by no means limited to Splunk and is most definitely not a de facto standard. Rather it is a compilation of things that always make my life easier when my customers can provide them.

        • Backstory: Like I mentioned in my previous post I don't work in the cube next to you, I don't see the same things you see, know the same things that you know.
          Often times I get cases with a description like "I came into work this morning and discovered that this thingy that was working yesterday isn't working today. What gives?" In digging into the issue the customer remembers that last night was the weekly maintenance window and one of the other guys was making some changes on the box and it is this change that caused things to go wonky.
          I guess what I am getting at here is that it helps to know what led up to the issue. Fleshing out the supporting data points can be a big help in piecing the problem together. Even if you think it is unrelated, include it; it can't hurt. The worst thing that can happen is you spend a few more bits, and thankfully bits don't cost what they used to. I've also found that when I take the time to think about _all_ of the things that led up to the event in question the light bulb over my head starts to flicker and maybe I can figure it out before enlisting someone else.
        • Impact: Do you have to commit seppuku if this issue is not resolved in the next hour? If you do you may want to include that in the initial report, it will really help with prioritizing the issue. Are others unable to do their job because of this, we want to know. If you're asking a question for your own edification share that as well - helps us to prioritize other issues and formulate the best answer for you. Big fires often require an immediate fix and you don't really care about the inner workings of the fix just that it works. If you are trying to learn something you want the opposite.
        • Priority: We all deal with fires (some bigger than others) let the guy on the other end know how you need the issue treated. Support folk inherently want to help (why else do we do this job? It isn't for the unlimited supplies of handi-snacks) and if you say I need this now we will make every effort to deliver.
        • Data Samples: One of my new favorite shows is The First 48 which follows real homicide cops as they investigate murders. Each episode always starts off with the cops going to crime scene collecting every potential piece of evidence. They don't know what is relevant and what is not, so they assume it all is. The same is true when troubleshooting an issue with software. The more data points I have to work with the better position I am in to figure out what is going on.
          If splunk isn't parsing a field in a given file include a copy of said file along with your configs. If the UI is acting weird take a screen shot. If performance is an issue include the results of your tests to determine that things are slow along with the tool(s) used to produce the results.
        • Repro steps: If you can trigger this issue on demand, please share. Knowing the exact path traveled will often make root cause analysis that much easier. Screen shots of each step are very helpful (a picture is worth more than 1,000 words) in describing an issue.
        • Your investigation: I find it is really helpful to know what you have done to try to figure out a problem. It saves time because I won't ask you to perform steps that you said you've done and you won't get frustrated at me for asking you to do work again. It also gives me insight into your investigative process - if you are thorough I am more inclined to trust your results at first glance. If you are vague or unclear I have to assume that the information you are providing is incomplete. This is not to say that what you are giving is bad/wrong/stupid, rather it is not the full story.

        Ok I'm sure there is more that I can say here but this post is getting kind of long, my fingers are tired of typing, and I need to answer some cases.

        ]]>
        Andrea Longo: Splunk and iPhonehttp://blogs.splunk.com/andrea/2008/07/28/splunk-and-iphone/http://blogs.splunk.com/andrea/2008/07/28/splunk-and-iphone/Mon, 28 Jul 2008 18:20:43 +0000Andrea LongoI've been playing with a few things that will eventually turn into an iPhone application to talk to Splunk via the REST API. I don't have a lot to say about it right now due to other issues but I do have a little something to show off:

        Splunk doesn't support Safari officially yet and MobileSafari is a whole 'nother animal, but there are other things you can do. You can talk to the REST endpoints just fine. Here I have a Live Tail search running from the browser, talking to my production server.

        ]]>
        Andrea Longo: Forcing dashboard refreshhttp://blogs.splunk.com/andrea/2008/07/25/forcing-dashboard-refresh/http://blogs.splunk.com/andrea/2008/07/25/forcing-dashboard-refresh/Fri, 25 Jul 2008 17:01:16 +0000Andrea LongoIn 3.2.x and 3.3.x, dashboards refresh automatically on their own schedule: 10% of the time period or 1 hour, whichever is sooner. You can't change this right now. But if you want to force a refresh, you can delete the files that contain the cached data.

        Dashboards create username_* files in $SPLUNK_HOME/var/run/splunk to persist the dashboard data. There is also a directory for each username with *.csv files. Delete the username_* files (like "admin_KB indexed per hour last 24 hours") and the *.csv files and the next time you refresh the dashboard, it will reload.

        This is not an elegant solution by any means, but it does work. While you could just delete the files for the search in question, there is no simple way to identify which csv file is associated with it. Just don't go messing with the other files in this directory, you will be Very Unhappy if you do.
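
If you want to see exactly which files would go before you delete anything, a small Python sketch like this lists them without touching them. It assumes the layout described above (username_* files plus a per-username directory of *.csv files, both under $SPLUNK_HOME/var/run/splunk); the function name is made up for illustration.

```python
from pathlib import Path


def dashboard_cache_files(run_dir, username):
    """List the cached dashboard files for one user, without deleting them.

    Hypothetical helper: run_dir is $SPLUNK_HOME/var/run/splunk.
    """
    run_dir = Path(run_dir)
    # username_* files that persist the dashboard data
    files = [p for p in run_dir.glob(username + "_*") if p.is_file()]
    # *.csv files in the per-username directory (assumed to sit in run_dir)
    user_dir = run_dir / username
    if user_dir.is_dir():
        files.extend(p for p in user_dir.glob("*.csv") if p.is_file())
    return sorted(files)


if __name__ == "__main__":
    import os
    base = Path(os.environ.get("SPLUNK_HOME", ".")) / "var" / "run" / "splunk"
    for path in dashboard_cache_files(base, "admin"):
        print(path)
```

Review the list first, and leave everything else in that directory alone.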

        ]]>
        Erik Swan: My favorite “customer” and Splunk as multi-tenant platformhttp://blogs.splunk.com/erik/2008/07/22/my-favorite-customer-and-splunk-as-multi-tenant-platform/http://blogs.splunk.com/erik/2008/07/22/my-favorite-customer-and-splunk-as-multi-tenant-platform/Wed, 23 Jul 2008 04:27:25 +0000Erik SwanEveryone has their favorite customer.
        I have one too and he is the CTO of a very cool IVR/VoIP platform. His name is RJ Auburn
        rj

        Around here RJ is synonymous with filing 34 bugs between Sunday 9PM, when we push bits to the site, and 9AM, when we get in to the office. I don't mean the usual UI-is-off-by-10-pixels bugs but complex indexing or distributed search bugs. Well, sometimes it's a trivial thing we missed, but usually he is pushing splunk to its limits. It's not often that a CTO and "industry expert" is the one to personally put splunk through its paces - but RJ is like that and gets his hands dirty - and splunk is the better for it.

        RJ and Voxeo are among a small but quickly growing number of companies that are using splunk in a multi-tenant environment. This means using splunk to collect data across multiple tenants in a hosted environment and then using splunk for searching and reporting on a per-customer basis. Often the output of the searches/reports is rendered for the customer so they can see what is going on within the service. Customer dashboards and activity reports are a common use case for splunk. Below are some of the images from the Voxeo service:

        vox dash

        On the Voxeo blog there is a nice description and even a cool video introduction:

        Lessons learned from these initial deployments are having a significant effect on our upcoming 4.0 release. First and foremost we will provide a much better html "module" system so that you can embed splunk modules in other webpages. Secondly, we will be making the overall splunk UI more configurable and modular so that multi-tenant customers can build even more custom UIs.

        One other very interesting trend is using splunk for SaaS using cloud services. Often these uses have some kind of multi-tenant .... It won't be long before splunk makes deploying in the cloud even easier. More in a post to come, but do drop me a line if you want to use splunk in the cloud and I can give you some hints.

        In the meantime, if you're looking for the best push-it-to-the-limits beta tester, contact RJ!
        Thanks RJ ;-)

        e.

        ]]>
        Erik Swan: Congrats to FlowingData - strength in (subscriber) numbers!http://blogs.splunk.com/erik/2008/07/20/congrats-to-flowingdata-strength-in-subscriber-numbers/http://blogs.splunk.com/erik/2008/07/20/congrats-to-flowingdata-strength-in-subscriber-numbers/Sun, 20 Jul 2008 18:42:04 +0000Erik SwanWe here at splunk are into processing lots of data. Our external marketing focuses mostly on hardcore IT data but internally we play with all sorts of data sets : government stats, sports stats, even music as shown by Brian's cool post.

        I just wanted to congratulate Nathan over at FlowingData for crossing the 3100 subscriber mark.

        flowingdata logo

        FlowingData is a fantastic example of the hidden value in the data all around us. As more and more of what we do is documented by computers the impact of statistics has become less of a hard-core math geek sport and more within the reach of anyone's curiosity. His daily posts are a constant reminder of how statistics has become a crossover genre.

        Thank you Nathan!
        e

        ]]>
        Bob Fox: The Commoditization of the IT Professional (or is there a new Black Art?)http://blogs.splunk.com/bob/2008/07/14/the-commoditization-of-the-it-professional-or-is-there-a-new-black-art/http://blogs.splunk.com/bob/2008/07/14/the-commoditization-of-the-it-professional-or-is-there-a-new-black-art/Tue, 15 Jul 2008 05:25:02 +0000Bob FoxA recent gathering of friends (a group of IT gray-hairs, artists, and lawyers) had got me thinking about IT as a profession, and the development of the industry since I got involved 20 years ago. The question posed to the group was about whether we would recommend our current professions to our children. This query, a few others, and perhaps one Liberty Ale too many had started me down the track of over-analyzing the state of IT today. I suppose I am both proud and terrified at the same time.

        First, the goodness. As an industry participant, IT has come a long way. Collectively, we have successfully lobbied to become more than just a cost center. The 'nerds in the back room' have become intertwined with the business. IT now facilitates both cost savings and revenue generation. IT is the driving force and enabler of employee empowerment, productive mobility, and instantaneous communication. IT run systems facilitate negotiations, analyze deals and execute trades.

        Well done, everyone. A big pat on the back to us all.

        Before I start sharing the negatives, I should let you know that I still have faith in the future of IT. Skip to the end if you don't care for the doom and gloom.

        What scares me most about IT today is what I have always called the 'commoditization of the IT professional'. Personally, I blame this all on the 'dot com' era hiring frenzy that allowed anyone who could install a mouse under Windows to be branded a 'system administrator'. As sysadmin titles transitioned to that of IT Manager, we started to lose some of the industrious, maverick spirit that made the console jockeys of old both revered and magical.

        Now, I am not saying that all of the junior folks out there need to learn the way I did - by fixing superblocks with nothing more than a hex editor and a pot of coffee, or by wrestling a printcap file into submission because some bonehead ordered a printer that didn't speak postscript (PCL? Ugh!). It would be nice, however if certain concepts were understood without having to resort to web searches of old Usenet posts. Why are topics like the effect of increased I/O on the various subsystems of a server not comprehended? Where has the art of system tuning gone? Sometimes throwing hardware at a performance issue is the correct answer - but when?

        Yet I remain hopeful. The ancient 'black arts' are still practiced. Every day I get to speak with people who are doing some very magical things, but now with the Splunk platform. They are extending Splunk to places I have never imagined, and solving problems that are unique to their own businesses. I see enterprising individuals doing some amazing things with search commands and Splunk reports. Some of this is proprietary of course, but a lot of it has been built upon applications available in Splunkbase today.

        So, to answer the original question posed - would I recommend a career in IT to my kids? The simple answer: Hell No. I don't need the competition.

        Want to relive your glory days? Send me your favorite sysadmin spell from the old days and I will do a golden oldies post. For extra credit, show me how you can accomplish (or avoid) the same thing today using Splunk. Immortality awaits!

        ]]>
        Brian Murphy: Splunking pitchfork album reviewshttp://blogs.splunk.com/brian/2008/07/14/splunking-pitchfork-album-reviews/http://blogs.splunk.com/brian/2008/07/14/splunking-pitchfork-album-reviews/Mon, 14 Jul 2008 23:05:39 +0000Brian MurphyOne of my favorite sites is the record review and music news site pitchfork media. On the site they have a bunch of interesting statistics like top record for each decade/year but these are obviously a more subjective list than if they crunched the raw stats. For example their #1 album of the nineties is Radiohead's "Ok Computer" (rated 10.0) and the #15 is "The Bends" by Radiohead ( which isn't reviewed on the site at all ). I was interested in crunching the data provided by their wealth of reviews. So I downloaded all the record reviews using a simple python script. And parsed out the description, rating, label, reviewer, release year, title and artist using the following regex :

        .*?&lt;h2 class="fn"&gt;\s*(.*?):&lt;br /&gt;([^\n]*)\n.*?&lt;div class="info"&gt;\n\[([^&lt;;]*);?\s*(\d*)\]?.*?&lt;span class="rating"&gt;(.*?)&lt;.*?&lt;div class="content description"&gt;(.*?)&lt;/div&gt;.*? - &lt;span class="reviewer"&gt;&lt;span class="vcard"&gt;&lt;span class="fn"&gt;(.*?)&lt;/span&gt;.*?title="\d+"&gt;(.*?)&lt;

        I can now run some interesting queries :

        • * | chart avg(rating) by releaseYear

          Which graphs the average rating per calendar year of the release.
        • *| stats count(title), avg(rating) by artist | search "count(title)">2| sort "avg(rating)" d | head 10

          This shows the top rated artists that have at least 3 reviews on pitchfork.
        • * rating<=10 rating>0 | stats avg(rating) as avg_rating, count(title) as title_count by label | search title_count>3 | sort avg_rating | head 10

          This shows that Invisible Records are the worst reviewed label on Pitchfork.
        • * | stats count(title), avg(rating) by reviewer | search "count(title)">4 "avg(rating)">7.5 | sort "avg(rating)" d

          This search finds all the reviewers that have at least 5 reviews and on average score higher than 7.5. So if you want a good review on pitchfork you're better off with Luke Buckman :)
        • * | eventstats count(title) as titleCount by reviewer | search eventtype=7_dirty_words titleCount>3 | stats count(title) as ct ,max(titleCount) as mf by reviewer | eval blue_index=ct*1.0/mf | sort blue_index d

          This is my personal favourite: a list of the reviewers most likely to use one of George Carlin's seven dirty words (nsfw). The ct column is the count of that reviewer's reviews containing one of the words, and the mf column is the reviewer's total review count. The blue_index is ct/mf.
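
For anyone who wants to see the arithmetic behind that last search outside of Splunk, here is a rough Python sketch of the same calculation. The input format (a list of reviewer / has-a-dirty-word pairs) and the function name are made up for illustration; only the ct/mf ratio and the "more than 3 reviews" filter come from the search above.

```python
from collections import Counter


def blue_index(reviews, min_reviews=4):
    """Rank reviewers by the share of their reviews containing a dirty word.

    reviews: iterable of (reviewer, has_dirty_word) pairs (assumed format).
    min_reviews=4 mirrors the titleCount>3 filter in the search.
    """
    total = Counter()  # mf: all reviews per reviewer
    dirty = Counter()  # ct: reviews with one of the words
    for reviewer, has_dirty_word in reviews:
        total[reviewer] += 1
        if has_dirty_word:
            dirty[reviewer] += 1
    scores = {r: dirty[r] / total[r] for r in dirty if total[r] >= min_reviews}
    # highest blue_index first, like "sort blue_index d"
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = [("a", True), ("a", False), ("a", False), ("a", False),
              ("b", True), ("b", True), ("b", True), ("b", False)]
    print(blue_index(sample))
```

Reviewers who never use one of the words do not show up at all, which matches the search, since it filters on the eventtype before counting.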

        So there you go : Splunk > it's not just for logs.

        ]]>
        David Carasso: Simple Transactionshttp://blogs.splunk.com/david/2008/07/03/simple-transactions/http://blogs.splunk.com/david/2008/07/03/simple-transactions/Thu, 03 Jul 2008 15:28:11 +0000David CarassoIn this post, I'll show you how to use Splunk's Transaction search, with several powerful examples.

        In the latest releases, we have search-time discovery of transactions, with the new transaction search command. Transaction collapses a set of events that belong to a transaction into a single event. You can specify the parameters as arguments to the transaction operator right in the search, or you can refer to a named transaction definition in transactiontypes.conf. A few simple examples will give you an idea of some things you can do.

        • get events with 'http', and group any search results into "bursts" of events, grouping any events that occur within two seconds of each other into the same transaction event. [Note: there is an implied "search" command at the head of all searches, so "http" is really "search http".]
        • http | transaction maxpause=2s
        • get events with 'http', and collapse those that share the same host and cookie value, that occur within 30 seconds:
        • http | transaction fields=host,cookie maxspan=30s maxpause=30s
        • get events with 'sendmail', and collapse those that have the same userid, between a login and a logout, that occur within 10 minutes:
        • sendmail | transaction fields=uid startswith="eventtype=login" endswith="eventtype=logout" maxspan=10m maxpause=10m
        • get events with 'http', and then find transactions as defined by email_transaction in transactiontypes.conf:
        • http | transaction email_transaction
        • Find transactions that change a password, near where there were unsuccessful root logins. To break it down: search for unsuccessful root logins, find time ranges around those root logins, find transactions in those regions, and finally look for password changes in the transactions.
          root login NOT fail*
          | localize maxspan=1m maxpause=1m
          | map search="search starttimeu=$starttime$ endtimeu=$endtime$
          | transaction session |  search password change"
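
To make the maxpause idea from the first example concrete, here is a toy Python sketch that groups timestamps into "bursts". It only illustrates the concept and is not how Splunk implements the transaction command; in particular, treating a gap exactly equal to maxpause as part of the same burst is an assumption.

```python
def group_by_maxpause(timestamps, maxpause):
    """Group timestamps (in seconds) into bursts separated by long pauses.

    A new group starts whenever the gap to the previous event is larger
    than maxpause. Illustrative only; boundary handling is an assumption.
    """
    groups = []
    for t in sorted(timestamps):
        if groups and t - groups[-1][-1] <= maxpause:
            groups[-1].append(t)  # still within the same burst
        else:
            groups.append([t])    # pause was too long: start a new burst
    return groups


if __name__ == "__main__":
    events = [0, 1, 2.5, 10, 11, 20]
    for burst in group_by_maxpause(events, maxpause=2):
        print(burst)
```

With maxpause=2 the six sample events fall into three bursts, each of which transaction would collapse into a single event.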
        ]]>
        Eric Garner: Open Letter to Company Leadershttp://blogs.splunk.com/maverick/2008/06/27/open-letter-to-company-leaders/http://blogs.splunk.com/maverick/2008/06/27/open-letter-to-company-leaders/Sat, 28 Jun 2008 03:04:34 +0000Eric GarnerDear CEO, CTO, CIO, and other Company Leaders,

        Consider this letter a wake-up call.

        As an individual responsible for setting the vision of your company, please be aware that the people who work for you now, those smart, intelligent, high-tech individuals who believe in your vision, who are extremely proud of serving you, do not want to let you down.

        Every day, these individuals work hard for you and you pay them well for their services. They are system and network administrators, security analysts, application developers, infrastructure architects, QA testers, and various other IT consultants.

        As these individuals attempt to move your company forward towards explosive growth and expansion, incredible innovation, and unbounded profitability, you are either not aware of or not focusing enough on "how" they are striving to realize your vision and make it a reality.

        Of course, you are probably thinking it's not your job as a company leader to worry about "how" things are done so much as "why" or "when". After all, that's what being a company leader is all about, right?

        Indeed, this may be true, but the reality is you need to be aware of the "how" more now than ever.

        The reason you are NOT aware is because the people who work for you may not have the opportunity to tell you how or YOU may not be curious enough or interested enough to ask.

        This could be a costly oversight on your part.

        I know. As a technical pre-sales engineer, I work with these individuals all day long and I can tell you that they are struggling to keep your company accelerating towards the vision you set and the profits you seek. They are struggling because there is so much machine-generated data to keep up with, it's humanly impossible to succeed without a tool like Splunk™. Trust me, I hear this all the time.

        When I ask them how they discovered Splunk™, they always say they were tasked with an initiative or directive from you. Or maybe you delegated a project to them based on recent pressures to meet some new compliance mandate, audit controls, security concerns, or perhaps all of the above.

        Of course, as soon as they discover, download, install, and prove out for a fact that Splunk™ meets every single one of the requirements you've tasked them with (plus several other things that could save you money), they naturally recommend that you budget for and purchase Splunk™ right away. After all, you want them to get the job done AND save money too, right?

        But again, you may never get a chance to hear about it.

        Why?

        Well, like I said, maybe you are not curious enough or interested enough to ask.

        Or maybe you HAVE asked and they've TRIED to tell you, but their voice was muted by someone who stands between you and them.

        Or maybe they HAVE gotten through to you and told you they need Splunk™ to get the job done faster, better, and cheaper, but since you've never heard of Splunk™, you simply dismiss it and seek some other company's product that will end up costing you more.

        So I guess my question to you is this: Are you willing to sacrifice your vision by only focusing on the "why"?

        Seriously, are you willing to not ask questions and seek out the truth regarding "how" your vision will be carried out?

        Look around your company right now. Right this minute. Ask around. I am willing to bet that someone who works for you right now has already downloaded Splunk™ and is currently evaluating it. Maybe even several people or teams within your company are splunking IT data as you read this very sentence in an attempt to save you money and you are not aware.

        Do it now and awaken!

        Sincerely,
        Eric Garner
        Senior Sales Engineer
        www.splunk.com

        splunk> take the sh out of IT

        P.S. If you are not a Company Leader, please forward this letter to the appropriate person. If this letter can help raise awareness about Splunk™ within your company, then I will know it was worth my time to write it.

        ]]>
        Michael Wilde: Splunk Ninja - So You’re Interested in Video now?
        http://blogs.splunk.com/thewilde/2008/06/27/splunk-ninja-so-youre-interested-in-video-now/
        Fri, 27 Jun 2008 15:34:34 +0000

        This episode gives our faithful and inquisitive viewers a behind-the-scenes look at the Splunk Ninja's ghetto-tech operation. Some viewers have been wondering how I put all of these videos together, what equipment to use, and what software or websites to get started with. Covered in this no-holds-barred, blockbuster epic, multi-dollar budgeted, long form tutorial are:

        • My experiences in getting to this point.
        • Things for you to consider and many options.
        • Tools I use in my "anti-studio".
        • Production, hosting, viewing and all that nonsense.

        It's the longest video I've ever done. I really try to put content in front of the viewers that has substance, some level of staying power, relevance and most of all value for your attention - which I do cherish.

        Thanks for watching, please comment in the timeline, with your keyboard or with the Seesmic video comment link below. ...and one more thing, send me a link to your videos!

        This blog post and video is in fact not sponsored by Behringer Mic's, Alesis Mixers, John Foley Software, Vara Software, AllocInit.com, Viddler - I'm just a big fan of their stuff! - but is in fact sponsored by Splunk, The IT Search Engine. Download it today.. it rox!

        ]]>
        Inder Sabharwal: Aggregating Metrics from all your Splunks…
        http://blogs.splunk.com/inder/2008/05/15/aggregating-metrics/
        Fri, 16 May 2008 00:05:31 +0000

        If you found the new metrics being generated by Splunk on the input (indexing in many cases) and forwarding sides useful, I am sure you would want to aggregate them all in a central location. Well, you can do that by using Splunk's forwarding mechanism itself! Although it does not matter where you aggregate these metrics, I believe the Deployment Server instance could be a good location, if you have one set up for your installation.

        Forwarding metrics.log

        Forwarding metrics.log will require that you make the following changes to the configuration on each Splunk instance that you would like to collect the metrics from:

      • Edit or create inputs.conf in $SPLUNK_HOME/etc/system/local folder

        [monitor://$SPLUNK_HOME/var/log/splunk/metrics.log]

        _TCP_ROUTING = RouteMetricsToDeploymentServer

      • Similarly for outputs.conf

        [tcpout]
        disabled=false
        [tcpout:RouteMetricsToDeploymentServer]
        server=<deployment_server_ip>:<deployment_server_port>

      • If you have many Splunks in your environment, then making these changes on each one of them manually is certainly not an option you would cherish. This is where Deployment Server can help you centralize all your configurations in one place and distribute them to all or selected instances.

        Here's something I like to do

        1. Have all Splunks point to a common Deployment Server

        This can be achieved very easily by creating/editing deployment.conf in $SPLUNK_HOME/etc/system/local on each Splunk instance.

        [deployment-client]
        deploymentServerUri=<your_deployment_server_uri>:<mgmt_port>

        For some of my distributed testing on EC2, I have images that include this configuration in the default image (AMI). Using this approach guarantees that configurations never ever have to be changed by hand!

        2. Create a bundle

        Create a bundle by any name (I called it deployable) and make sure it is available in your Deployment Server's serverClassPath. This bundle should have two files - inputs.conf and outputs.conf - as described above - here's a sample bundle you could re-use.

        3. Make the bundle available to all Splunks

        Make all deployment clients that connect to the deployment server part of the deployable service class. This is achieved by changing deployment.conf on the Deployment Server as follows:

        [distributedDeployment-classMaps]
        *=deployable

        4. Refresh Deployment Server Configuration

        This CLI on your Deployment Server instance will make it aware of the new configuration without a restart:

        splunk reload deploy-server -auth admin:changeme

        You are now all set and all Splunks in your environment will automagically download and apply the bundles within a minute! And in another 30 seconds, your Deployment Server will start aggregating metrics information about your entire data-center!

        We want to hear about your experiences in managing Splunk - use the Comments below or send me an email directly at inder@splunk.com.

        ]]>
        Inder Sabharwal: Forwarder and Indexer Metrics
        http://blogs.splunk.com/inder/2008/05/15/forwarder-and-indexer-metrics/
        Thu, 15 May 2008 16:36:09 +0000

        If you were always wondering how much data was being transferred between your forwarders and indexers, we may have some help for you. Splunk now publishes these metrics to metrics.log, which is by default tailed and indexed in "_internal".

        Forwarding-side

        Splunk uses a component called TcpOutputProcessor, which is configured using outputs.conf, to forward data to another Splunk or non-Splunk entity. This is what a lot of people refer to as a forwarder. Each TcpOutputProcessor instance publishes a metrics event every 30 seconds. The fields of these events are described below:

        • group=tcpout_connections - this field discriminates this event as being a TcpOutput metric.
        • tcpout_group_name:destIp:destPort - the load-balanced group that this metric belongs to. If you have multiple groups defined, a separate event is published for each of those groups.
        • host metadata - is always available in an event, and refers to the host on which the forwarder is running.
        • sourcePort - the local port that is used to connect to the remote entity.
        • destIp - the ip address of the remote server to which events are being forwarded.
        • destPort - the destination port on which events are being forwarded.
        • tcp_bps - bytes per second averaged over last 30 seconds.
        • tcp_kbprocessed - total KBytes processed since this connection went live.
        • tcp_eps - events per second averaged over last 30 seconds.
        • tcp_dropped_events - number of events dropped on this connection.

        Indexing side

        Similarly on the indexing side, if you have configured inputs.conf to receive data from one or more forwarders, a metrics event is published every 30 seconds for each connection into your indexer. All the fields of a metrics event on the input side are described below:

        • group=tcpin_connections - this field discriminates this event as being an input metric.
        • sourceHost - The hostname of the entity that is forwarding data to this indexer. If the hostname is not available, then its IP address is used.
        • sourcePort - The remote port of the forwarding entity.
        • destPort - The local port on the input side for which this metric is being collected. Typically this port is defined in inputs.conf.
        • tcp_bps - bytes per second averaged over last 30 seconds.
        • tcp_kprocessed - KBytes processed since the connection was established.
        • tcp_eps - Events per second averaged over 30 seconds.

        These metrics will now enable you to get unusual insight into the operation of your forwarders and indexers. Here's a sample query that you can run on each indexer instance to get a report on thruput by each forwarding entity:

        index=_internal metrics "group=tcpin_connections" | timechart span=30s avg(tcp_bps) by sourceHost
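
        For readers who want to check the numbers outside Splunk, here is a rough Python sketch of the same aggregation: average tcp_bps per sourceHost over input-side metrics events. The exact layout of a metrics.log line is an assumption here (comma-separated key=value pairs), and the sample lines in the usage note are made up for illustration; only the field names come from the lists above.

```python
import re
from collections import defaultdict

# key=value pairs, with values either quoted or running up to a comma/space.
PAIR = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')


def parse_metrics(line):
    """Return the key=value pairs found in one metrics line as a dict."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def avg_bps_by_source(lines):
    """Average tcp_bps per sourceHost over tcpin_connections events."""
    totals = defaultdict(lambda: [0.0, 0])  # host -> [sum, count]
    for line in lines:
        fields = parse_metrics(line)
        if fields.get("group") != "tcpin_connections":
            continue  # skip output-side and unrelated metrics
        if "tcp_bps" not in fields:
            continue
        acc = totals[fields.get("sourceHost", "unknown")]
        acc[0] += float(fields["tcp_bps"])
        acc[1] += 1
    return {host: total / count for host, (total, count) in totals.items()}
```

        Feeding it two lines such as "group=tcpin_connections, sourceHost=fwd1, tcp_bps=100.0" and the same with tcp_bps=300.0 gives an average of 200.0 for fwd1. Unlike the timechart above, this sketch does not bucket by 30-second span.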

        Also, I created a saved search, and used Splunk's reporting features to always show me the current status on a dashboard.

        Index Thruput by Forwarder

        Now that you have all of this nice data, I am sure you would like it all aggregated in one location.

        Good luck playing with these metrics, and if you have any suggestions on what more you would like to see, drop me a line at inder@splunk.com.

        ]]>
        Matt Green: Did you know that your Active Directory is just a glorified LDAP?
        http://blogs.splunk.com/matt/2008/05/12/did-you-know-that-your-acitve-directory-is-just-a-glorified-ldap/
        Tue, 13 May 2008 01:19:35 +0000

        Microsoft Tube Surfers,

        Wanted to take a minute to talk about authenticating Splunk against Active Directory. In case you didn't know, Active Directory is running on top of LDAP. While the guys up in Redmond do their best to make sure that you have no need to know LDAP, they give you the ability to interface with it over LDAP if you know what you're doing. Let's take this time to let you know what you need to do.

        If you are comfortable with the command line you can run the command ldifde. The ldifde command is the Windows equivalent of ldapsearch and should allow you to get an LDIF entry for yourself and a group. With those two entries we should be able to come up with an authentication.conf that will allow Splunk to authenticate users.

        For those of you that are more comfortable with a GUI, the Sysinternals team offers a nice utility called Active Directory Explorer. This gives you a tree view of your Active Directory/LDAP structure.

        The information provided by these utilities is pretty much everything you need to know in order to follow along with the documentation. If you are still struggling to get it working, send an email to support@splunk.com with the output from the ldifde command and your authentication.conf and someone from the team will help square you away.
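
        If you want to pull the interesting values out of an exported entry programmatically, a small Python sketch like the one below may help. It assumes a plain-text LDIF entry with one "attribute: value" pair per line (no base64 values or folded lines), and the sample entry in the usage note is invented. sAMAccountName and memberOf are standard Active Directory attributes; which of these values your authentication.conf needs is for the documentation to say, not this sketch.

```python
def parse_ldif_entry(text):
    """Parse one simple LDIF entry into {attribute: [values]}.

    Assumes "name: value" lines only; base64 ("name:: ...") values and
    folded continuation lines are not handled in this sketch.
    """
    entry = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#") or ": " not in line:
            continue
        name, value = line.split(": ", 1)
        entry.setdefault(name, []).append(value.strip())
    return entry


def base_dn(dn):
    """Drop the first RDN to get the container an entry lives in.

    Naive split: DNs with escaped commas in the first RDN need more care.
    """
    return dn.split(",", 1)[1]
```

        For a hypothetical entry with dn: CN=Jane Doe,OU=Staff,DC=example,DC=com, base_dn returns OU=Staff,DC=example,DC=com, which is the kind of base DN an LDAP configuration typically asks for.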

        ]]>
        Matt Green: Help Me Help You
        http://blogs.splunk.com/matt/2008/04/30/help-me-help-you/
        Wed, 30 Apr 2008 23:08:25 +0000

        Peoples of the Interweb,

        As one of the Splunk Support Monkeys I am going to try to start a semi-regular series of posts on a topic that is near and dear to me - getting the Splunk community to be able to troubleshoot their issues without the need to reach out to the Support Team.

        The most important piece of any troubleshooting exercise is getting a solid understanding of the problem. The common statement "Shit is broke", while 'summarizing' the problem, doesn't do much in the way of isolating the specific problem. Taking a minute or two to think about the problem at hand and documenting the sequence of events leading up to it goes a long way toward getting outsiders up to speed on the issue.
        Here are a few things to keep in mind when working with support:

        I don't work in the next cube over.

        This means I don't have insight into all of the other moving parts of your network. Try to avoid acronyms that are specific to your organization. I don't know the naming convention that you use for machine names, so if one box is in LA and the other is in New York, tell me; don't expect me to know that foo.company.com is sitting in the LA data center.

        Less is not more.

        You can never give a support engineer too much data. Oftentimes folks think that they have identified the offending error message in the logs and provide that one line in their support ticket. The problem with this is that the support engineer does not get the benefit of context. Most errors are the result of a series of events leading up to the final failure. Being able to see what was going on leading up to the problem is often what allows us to identify the cause. The basic rule of thumb is: if you think it would be at all useful, share it. If I can channel Don Rumsfeld for a moment: It is easy to know what you know, it is hard to know what you don't know.

        Reduce the problem to the fewest number of variables possible.

        Remember your 7th grade Algebra class and those complex equations that Mr Buckner had you solve? You started off solving for x and then you went back using your knowledge of x to determine the value of y. The same is true when troubleshooting software. When you try to solve 4 problems at once you end up polluting your results; you can't tell if the change you made for x resulted in y blowing up. By breaking the problem into smaller chunks you are operating in a more scientific manner and the results have more credibility.

        Log like there is no tomorrow.

        Debug logs are your friend. In normal operations the logs don't need to be verbose, but when you are trying to figure something out, why not give yourself the benefit of the secret messages that the developer put in the code for precisely this reason? It is also helpful to push the existing log file out of the way when starting in debug mode. While I said earlier that you can never give a support engineer too much information, the majority of the stuff in your logs (especially if you've been running for a while) is going to be white noise. Starting in debug mode with a fresh log means that the problem and only the problem is going to be in the log.

        ]]>
        Igor Stojanovski: WMI comes to Splunk
        http://blogs.splunk.com/igor/2008/04/29/wmi-comes-to-splunk/
        Tue, 29 Apr 2008 18:33:42 +0000

        The Windows release of Splunk Preview debuts with WMI. So, what is WMI for all you splunkheads out there? It's an OS interface which allows "instrumented components to provide information and notification". WMI gives you the ability to query system instrumentation data such as system performance, event logs, and countless other events that occur on the system. It also has the capability of doing this agent-less from remote machines. The most exciting feature is the ability to collect Windows event logs from other machines on your network simultaneously. A Splunk install is not required on every single node that generates this data, and you don't need to do anything special to facilitate this, assuming you've set up proper authentication between the machines, of course. Setting up proper WMI security is a hot topic on its own.

        From the standpoint of configuration and what WMI is capable of doing, in the context of Splunk, WMI can be used in two ways: to pull event logs and to query instrumentation data. Assuming that you have enough credentials to poll event logs agentlessly, you can simply specify host name and the log file you are interested in. This is an example of retrieving "Application" event logs from a remote machine named "remotehost":

        [WMI:RemoteApplication]
        namespace = \\remotehost\root\cimv2
        interval = 10
        event_log_file = Application
        disabled = 0

        The other aspect of WMI warrants more explanation. To get data from WMI providers, you query them using WQL (WMI query language), which is a subset of SQL. Simply specify a query, and all fields returned by the provider will be automatically collated as an event. (Some queries return multiple results, and hence generate multiple events.) An example query will be select FreeMegabytes from Win32_PerfFormattedData_PerfDisk_LogicalDisk, which will poll free disk space from all logical disk partitions on the system.

        This is an example config setup that gets runtime information for all running processes on a local machine every 30 seconds:

        [WMI:LocalAllProcesses]
        namespace = \\.\root\cimv2
        interval = 30
        wql = select * from Win32_PerfFormattedData_PerfProc_Process
        disabled = 0

        With this you can easily chart memory usage by process.

        WMI Memory Usage by Process Name
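
        If you need the same stanza for many machines, generating the text is straightforward. The Python sketch below renders wmi.conf stanzas using only the keys that appear in the two examples above (namespace, interval, event_log_file or wql, disabled). The function names and the stanza naming scheme are made up for illustration, and the sketch does not validate anything against Splunk itself.

```python
def wmi_stanza(name, host=".", interval=30, wql=None, event_log_file=None):
    """Render one wmi.conf stanza as text.

    Exactly one of wql or event_log_file must be given, mirroring the
    two usage modes described in the post.
    """
    if (wql is None) == (event_log_file is None):
        raise ValueError("pass exactly one of wql or event_log_file")
    lines = [
        f"[WMI:{name}]",
        f"namespace = \\\\{host}\\root\\cimv2",  # e.g. \\remotehost\root\cimv2
        f"interval = {interval}",
    ]
    if wql is not None:
        lines.append(f"wql = {wql}")
    else:
        lines.append(f"event_log_file = {event_log_file}")
    lines.append("disabled = 0")
    return "\n".join(lines)


def stanzas_for_hosts(hosts, log="Application", interval=10):
    """Render one event-log stanza per host (hypothetical naming scheme)."""
    return "\n\n".join(
        wmi_stanza(f"Remote{log}_{h}", host=h, interval=interval,
                   event_log_file=log)
        for h in hosts
    )
```

        Calling wmi_stanza("LocalAllProcesses", wql="select * from Win32_PerfFormattedData_PerfProc_Process") reproduces the local-process stanza shown above.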

        The default install of the preview includes several preset performance queries. If you look at %SPLUNK_HOME%\etc\system\default\wmi.conf, you will find three default config stanzas. To see a list of what all is available for querying, google for "WMI classes" and browse the MSDN documentation. There is tons of stuff that you can splunk, including detailed memory usage, network utilization, disk usage, detailed process runtime information. Also, take a look at the WMI documentation.

        Happy Splunking with WMI.

        ]]>
        Ledio Ago: Splunk Windows Registry Monitor
        http://blogs.splunk.com/ledio/2008/04/28/splunk-windows-registry-monitor/
        Mon, 28 Apr 2008 22:53:48 +0000

        Hey everyone, just wanted to let you know that a preview release of Splunk just left the docks.

        http://www.splunk.com/index.php/preview

        I want to introduce to you one of the latest features for Windows Splunk - the monitoring of the Windows registry in real time for activity/events, and the indexing and searching of these events with Splunk.

        While working on this we had a few challenges:

        First, there aren't any published win32 APIs that do this in user mode. The best that you can do with the win32 API is to poll the registry for certain registry keys/hives, and you'll be notified when a key or subkey of the hive has been changed. Even when you get a notification for a change, you will not be told exactly which key has changed; you'll have to figure that out yourself.

        Second, scalability. You can't possibly poll all of the registry in user mode for changes. There are simply too many keys to query.

        The solution is to write a device driver that hooks into the kernel and intercepts all registry events. The driver bubbles the events up to user mode for filtering and tagging, and finally they are piped to Splunk for indexing. Obviously, this driver needs to be very stable and reliable, and it needs to scale to the point where, if you want to monitor all of the events in the registry, it can handle the load.

        With this preview release we launched the first version of the splunk-regmon tool. The tool writes events to standard output and, using Splunk's ExecProcessor (popen), Splunk is able to get these events and send them through the indexing pipeline. Basic filtering is in place, hard coded for now to monitor only registry events related to changes - i.e. Create, Delete, Set, etc. Create type events are represented by the "CreateKey" reg_event field, Delete by "DeleteKey", and all of the Set events, e.g. SetValueKey, are represented by the "SetKey" reg_event field. In our next release this filtering will be configurable.
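
        The stdout-to-indexer handoff follows a familiar pattern: a parent process starts a child with popen and reads its output one line at a time. The Python sketch below shows that pattern together with a prefix-based tagging rule like the one described above. It is not Splunk's ExecProcessor or the splunk-regmon code; DEMO_CMD is a stand-in child process, and the raw operation names it prints are hypothetical.

```python
import subprocess
import sys

# Stand-in for a tool such as splunk-regmon: a child that prints one
# (hypothetical) registry operation name per line.
DEMO_CMD = [sys.executable, "-c",
            "print('CreateKey'); print('SetValueKey'); print('QueryValue')"]


def read_events(cmd):
    """Run cmd and return the non-empty lines it writes to stdout."""
    events = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            line = line.rstrip("\n")
            if line:
                events.append(line)
    return events


def reg_event(operation):
    """Map a raw operation name to a reg_event value, or None to drop it.

    Only change-related operations (Create, Delete, Set) are kept.
    """
    for prefix in ("Create", "Delete", "Set"):
        if operation.startswith(prefix):
            return prefix + "Key"
    return None
```

        With the demo child, reading the events and tagging them keeps the create and set operations and drops the query.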

        Here is what a windows registry event looks like with Splunk:

        Registry Event

        Drop us a note and let us know what you think of this new feature and any concerns you may have, or ideas of how we can make it better.
        How would you use it and how would it be useful to you?

        ]]>
        Matt Green: On the off chance you need help with Windows
        http://blogs.splunk.com/matt/2008/04/24/on-the-off-chance-you-need-help-with-windows/
        Thu, 24 Apr 2008 20:47:38 +0000

        Hello Internets,

        As one of the splunkers responsible for answering the phone I'm going to use this space to talk about something near and dear to my heart - empowering my customers so they are able to figure out their own problems, thereby allowing me to read FARK all day long.

        Since we recently released our Windows version, a bunch of the folks in the office have been trying to figure out how to do the things they do in a UNIX environment (like wget a file) in Windows. I've been sharing some of my favorite Windows resources here at the office and figured the rest of you would probably like to know about them as well.

        Google
        Everyone seems to start here when they are looking for something. Most, however, don't know that http://www.google.com/microsoft will restrict your search to Windows sites. They also have these search sites for linux, bsd, and the mac.

        SysInternals
        Mark and Bryce have created the ultimate collection of free Windows utilities. Simple executables that allow you to get so many of the diagnostic/monitoring things that a UNIX admin takes for granted. Some of my favorites (and especially useful in working with Splunk) in no particular order:

        • AccessEnum
          Lets you see who has access to what. This is really helpful when trying to figure out why Splunk isn't indexing one of your files.
        • Process Monitor
          Watch the registry, running process/thread/DLL, and file system usage in real-time
        • PS Tools
          A bunch of command-line utilities for listing the processes running, working with the event log, rebooting the machine, etc.
        • Active Directory Explorer
          Advanced viewer/editor for Active Directory. This will be a godsend if you are trying to configure Splunk to authenticate against your domain controller.
        • WhoIS
          Doesn't do much in the way of troubleshooting Splunk, but who doesn't want to be able to see if ultramegaextrmeme.com is available and if not who the lucky owner is? BTW it is available.
        • TCPView for Windows
          Lets you see all the TCP and UDP endpoints on your system, including the local and remote addresses and state of TCP connections.

        Hope that helps you guys out. All of you experienced Windows folks, if you've got others out there, post them in the comments. If my jaw hits the desk when I click the link I will send you a Splunk koozie.

        ]]>
        Christina Noren: Tell us your Splunk story at Interop
        http://blogs.splunk.com/cfrln/2008/04/16/tell-us-your-splunk-story-at-interop/
        Wed, 16 Apr 2008 15:45:51 +0000

        Are you planning on being at Interop in Vegas April 27-May 2? Do you use Splunk? If so, I'd love to hear from you.

        I'll be there with the Splunk video team and we'd love to record some new interviews with Splunk users. If you haven't seen some of the user interview videos we've already done, check them out. They're the best way to learn about how Splunk's getting applied in the real world.

        Some of my favorites: Demetri Mouratis, Rhythm New Media, using Splunk as an IT data platform across business and operations teams; Allen Hecker and Mark Bronniman, the senior security analyst and senior unix admin at Weill Cornell Medical College, and Trevis Edgworth of Epsilon Data Management, using Splunk for network security, compliance, insider threat and network operations.

        Just email me at cfrln@splunk.com and let me know when you're available. We'll make it fast, and there'll be a fine Splunk jacket on its way to you when you're done.

        ]]>
        Eric Garner: …a new Splunk song idea just popped into my head…
        http://blogs.splunk.com/maverick/2008/04/13/a-new-song-about-splunkmaybe-someday/
        Mon, 14 Apr 2008 01:17:05 +0000

        ...actually a couple ideas for songs about Splunk have made their way into my geeky little brain since my last blog post. Yeah, yeah, I know what you're saying..."Hey Maverick, the world doesn't need another nerdy song about an IT Search Platform." My natural response is, you're probably right, but I can't help myself. I'm a nerd, a songwriter, I love Splunk: I have no choice!

        So where's the mp3, dude?!

        Truth is, I am just too damn busy these days to spend time on it. That is one of the reasons why I haven't posted a new blog entry since September of last year. Turns out the demand for Splunk has increased significantly since then, which means I am traveling more now, giving more Splunk demos and presentations, and assisting more companies with their Splunk evaluations than ever before. Don't get me wrong, I love writing songs, but nothing is more satisfying than traveling across Midwest America to show off a product as cool as Splunk.

        And when I say "travel", boy do I mean "TRAVEL"!

        Just to give you a sense of what my life has been like on the road as a Splunk SE, let me start by saying that my schedule is typically more crammed than a college student's brain just before a final exam. I'm telling you, I walk, fly, drive, take taxis, take trains, trolleys, buses, whatever I need to get to our customers. I've been so many times to so many places in the Midwest region, I am losing count: cities like Dallas, Chicago, Saint Louis, Houston, Kansas City, Austin, Omaha, San Antonio, etc.

        For example, here's a perspective shot I took while in Chicago waiting for the Blue Line.
        Waiting for the Blue Line in Chicago

        And in case you don't know, the train is the way to go in Chicago. Even if it takes you 45 minutes to get to your appointment, it still beats waiting in traffic. At least I can use my PDA to be productive in that time versus getting frustrated with drivers going too slow in front of me or cutting me off and basically keeping me guessing as to how they even managed to get a state driver's license issued in the first place.

        Here's another perspective shot I took while I was driving to Kansas City from Saint Louis.

        driving_to_kc3_09_12_07.jpg

        Actually, I make this drive quite often. It takes less time to drive there than to fly via Chicago O'Hare on American Airlines. It's cheaper too. And, again, I can conduct a couple technical conference calls along the way. (BTW, that red van sure needed a car wash, huh?)

        Anyway, speaking of technical conference calls, I've been conducting so many more demos and technical discussions since last September, it's nuts. Some of them I do in-person, some of them via webinars, and sometimes I even do a combination of both. Most of those times, I find myself doing all this stuff from my rental car. Yeah, that's right!...as in, I have a true "mobile office" setup, complete with a wireless broadband USB card and a handy cigarette lighter electrical power inverter I picked up at Fry's that continually keeps my laptop and cell phone charged and running.

        Typically, it goes something like this: I'll be driving between customer appointments, right? Then one of the Splunk sales reps calls me up and says they need me to help answer some technical questions for a potential customer and maybe do a demo as well. I explain I am on the road driving in the rental car to my next on-site meeting. They tend to ignore that last sentence and say, "Well? Can you just pull over somewhere and join the call and webinar session in a half hour?". Like a proper SE, I reply "Sure". And that's exactly what happens. I pull off the road, find a parking lot somewhere, flip open my macbook, access the webinar session, dial in to the reservation-less bridge line, put my cell phone on speaker-phone mode, and away we go.

        I bet you didn't know us SEs did that sort of thing, did you?

        Thing is, we are all extremely dedicated worker-bees and, although being a Splunk SE is incredibly demanding, we don't mind, really. That's because we get to meet a lot of smart people at great companies and see a lot of cool things.

        Like these train cars, for example. Check'em out. These train cars are actually pimped out, fully functional business conference rooms at a railway company I visited who bought Splunk last year. Now is that cool or what?

        bnsf_railways_1.JPG

        ...and check out this guy. He sat in the back of the audience at a user group conference I presented at last year. When I was finished, he came up to me and showed me he was already a Splunk fan. What a nice surprise that was, indeed. (BTW, I blocked out his face to protect the innocent.)

        kc_jug_splunk_fan_09_12_07.jpg

        Anyway, back to the topic at hand: Song ideas about Splunk.

        So I'm just now wondering if you have any good ideas for songs about Splunk or maybe IT in general? If so, please leave your idea(s) as a comment to this blog post. I appreciate any contribution you might care to make.

        BTW, I got this one idea for a song that sounds like it could be from a musical play. As in, a really really off-Broadway play. Yeah, I know that sounds silly, but think about this for a second. It could be kind of cool if it was done with the proper tongue-in-cheek, right? Can you imagine instead of "Sweeney Todd", you had "Sweeney Splunker"? Very dry and humorous, yet technical and nerdy? In my mind, I picture Johnny Depp wearing the dark thick glasses with tape on the front and everything, singing about how his IT issues are getting out of control and he desperately needs to find a solid troubleshooting tool fast!

        Okay, so I just read that last paragraph and I agree, it sounds kind of...nerd-ish. But, like I said already, I am a nerd to the core and I still think it could work as a song at least. So, I will probably write it and record it and post the resulting mp3 file for download in a future blog post like I do all my songs about Splunk.

        That is, of course, if I ever find the time.

        ]]>
        Erik Swan: Splunk for Virtualizationhttp://blogs.splunk.com/erik/2008/03/27/splunk-for-virtualization/http://blogs.splunk.com/erik/2008/03/27/splunk-for-virtualization/Thu, 27 Mar 2008 21:14:54 +0000Erik SwanI'm looking for some help.
        I've built a VMWare app for Splunk and am in the process of doing the same for Xen. These apps use the VMWare and Xensource APIs to index everything about the VM environment. When combined with Splunk instances running within the guest OS, you get a very comprehensive historical picture. I'm curious: are there any Splunk customers out there using VMWare or Xen? I'm looking for use cases so that I can better understand how to configure the apps. I'd be curious to know what types of information would be useful to capture and what types of searches one would want to perform. Both Xen and VMWare have so much data available that configuration could be complicated. I'm trying to narrow it down to several useful out-of-the-box configurations. If you have any thoughts, comment here or email me at erik at splunk dot com.

        Thanks
        e.

        ]]>
        Johnvey Hwang: The Splunk Python client library (part 1)http://blogs.splunk.com/johnvey/2008/03/26/the-splunk-python-client-library-part-1/http://blogs.splunk.com/johnvey/2008/03/26/the-splunk-python-client-library-part-1/Wed, 26 Mar 2008 22:40:23 +0000Johnvey Hwang Splunk 3.2 introduces a publicly available Python client library that allows external developers to programmatically interact with Splunk by importing a few key modules.

        The easiest way to get started with the client library is to get into Splunk's Python environment. Locate your Splunk install directory (/opt/splunk by default), and start the python interactive shell that comes with Splunk:

        # bin/splunk cmd python

        This will launch the interactive Python prompt, which starts off looking like this:

        Python 2.5.1 (r251:54863, Nov 18 2007, 16:13:41)
        [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
        Type "help", "copyright", "credits" or "license" for more information.
        >>>

        Starting a search

        Import the Splunk modules:

        import splunk.auth
        import splunk.search as se

        If you have installed Splunk with the default settings, then your hostpath is https://localhost:8089. The client library knows this default, so you can authenticate directly by providing a username and password:

        key = splunk.auth.getSessionKey('admin','changeme')

        The getSessionKey method automatically caches the session key in the current interactive session, so you don't have to pass it along to subsequent methods. In a production implementation, or if you are connecting to multiple servers, you'll need to keep track of separate session keys.

        If your server is on a different hostname or port, then you need to first update the session defaults:

        splunk.mergeHostPath('splunk_hostname:12000', True)
        key = splunk.auth.getSessionKey('admin','changeme')

        The mergeHostPath method takes host information in many different forms:

        • hostname
        • hostname:port
        • https://hostname
        • http://hostname:port
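To make the accepted forms concrete, here is a minimal sketch of how such a value could be normalized into a full URI. This is not the client library's code: the function name normalize_host_path is hypothetical, and filling in https and port 8089 for missing parts is an assumption based on the default hostpath mentioned above. It is written for Python 3 and uses only the standard library.

```python
from urllib.parse import urlsplit

# Assumed defaults, taken from the default hostpath https://localhost:8089.
DEFAULT_SCHEME = "https"
DEFAULT_PORT = 8089


def normalize_host_path(value):
    """Turn 'host', 'host:port', 'scheme://host' or 'scheme://host:port'
    into a full 'scheme://host:port' string (hypothetical helper)."""
    if "://" not in value:
        # No scheme given, so prepend the assumed default before parsing.
        value = DEFAULT_SCHEME + "://" + value
    parts = urlsplit(value)
    port = parts.port or DEFAULT_PORT
    return "%s://%s:%d" % (parts.scheme, parts.hostname, port)


for form in ("hostname", "hostname:12000", "https://hostname", "http://hostname:9000"):
    print(form, "->", normalize_host_path(form))
```

Whatever the library does internally, the point is the same: every form collapses to one canonical scheme, host, and port before a session key is requested.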

        Next, start a search:

        job = se.dispatch('search error')

        This creates a search job handle object, job, and starts a running search on the server for events that contain the term "error". If you are connecting to multiple servers, then you'll also need to provide the hostPath and sessionKey parameters. This handle is keyed off of the search job ID that is generated by the server, and is available via:

        job.id

        With this ID, you can always use your web browser to check on the status of a particular job by opening up:

        https://localhost:8089/services/search/jobs/12345

        where 12345 is the ID that you just generated.

        There are a few properties on the SearchJob object that will be of immediate use:

        • job.isDone - a boolean value that indicates if the search has completed
        • job.count - the number of events that have been matched against the search
        • job.cursorTime - the current position of the search cursor; when dispatching a search, the cursor moves in a reverse chronological order
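A common use of these properties is to wait for a job to finish before reading its totals. The sketch below shows one way to do that, assuming only that job.isDone reflects the current server state each time it is read (an assumption, not something the list above guarantees). The wait_for_job helper and the FakeJob stand-in are hypothetical; FakeJob exists so the sketch runs without a Splunk server.

```python
import time


def wait_for_job(job, poll_interval=0.5, timeout=60.0):
    """Poll job.isDone until it is true or the timeout expires.

    Returns True if the job finished, False if we gave up waiting.
    """
    deadline = time.time() + timeout
    while not job.isDone:
        if time.time() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


class FakeJob(object):
    """Stand-in for a search job: reports done after a few polls."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    @property
    def isDone(self):
        self._remaining -= 1
        return self._remaining <= 0


print(wait_for_job(FakeJob(3), poll_interval=0.01))
```

With a real job handle you would call wait_for_job(job) and then read job.count once it returns True.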

        Working with search results

        The raw events are the original event data that were indexed by Splunk, according to the data input rules. They are available as an iterable container object:

        job.events

        This object works just like a list, and you can iterate and slice it to obtain events. The events are stored in reverse chronological order.

        for x in job.events:
            print x

        This code will iterate over every event returned in the search and print out the raw text, which could be every event in your index if you so choose. The iterator will begin returning data as soon as it receives the first event, and will continue until the isDone property is True.

        You can also retrieve specific rows of data using the standard python slice operator:

        job.events[2] # returns the 3rd event in the search results
        job.events[2:10] # returns events 3 through 10 as a list
        job.events[-1] # returns the last event in the results

        The items returned by iterating or slicing are actually Result objects that have additional properties:

        • job.events[0].raw - the raw event text (the same value as print job.events[0])
        • job.events[0].time - the event timestamp, as a datetime.datetime object
        • job.events[0].fields - a dict of all the fields associated with the event

        For example if you wanted to see the host field for an event:

        job.events[0].fields['host']

        Or if you wanted to see all of the host entries for each event:

        for x in job.events:
            print x.fields['host']

        Or alternatively, in shorthand:

        for x in job.events:
            print x['host']

        If you want to print out a human-readable timestamp for events that came from the 'firewall' sourcetype:

        for x in job.events:
            if x['sourcetype'] == 'firewall':
                print x.time.ctime()

        When you are finished with the search job, remove it from the server by calling:

        job.cancel()

        Otherwise, the job will persist on disk until the specified timeout (TTL), which is 24 hours by default.
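Putting the pieces together, one way to make sure a job is always cleaned up is to wrap the work in try/finally. The sketch below counts events per host using only the job.events iteration, the x['host'] shorthand, and job.cancel() described above. The count_hosts helper and the FakeJob stand-in are hypothetical, and FakeJob is there only so the example runs without a server.

```python
from collections import Counter


def count_hosts(job):
    """Count events per host, then remove the job even if iteration fails."""
    counts = Counter()
    try:
        for event in job.events:
            counts[event['host']] += 1
    finally:
        # Always release the job so it does not linger until its TTL expires.
        job.cancel()
    return counts


class FakeJob(object):
    """Stand-in for a search job; events are plain dicts here."""

    def __init__(self, events):
        self.events = events
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


job = FakeJob([{'host': 'web1'}, {'host': 'db1'}, {'host': 'web1'}])
print(count_hosts(job))
```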

        ]]>
        Christina Noren: P-Camp preso on automating product management with Jirahttp://blogs.splunk.com/cfrln/2008/03/17/p-camp-preso-on-automating-product-management-with-jira/http://blogs.splunk.com/cfrln/2008/03/17/p-camp-preso-on-automating-product-management-with-jira/Mon, 17 Mar 2008 16:23:17 +0000Christina Noren Here's the presentation that I gave this past Saturday at P-Camp, the unconference for product managers. If you've been following what we're doing here with automating product management using Jira, there's detail and screenshots in this presentation that might be interesting.


        ]]>
        Christina Noren: 6000 Harvard applicants’ personal data on Bittorrenthttp://blogs.splunk.com/cfrln/2008/03/13/6000-harvard-applicants-personal-data-on-bittorrent/http://blogs.splunk.com/cfrln/2008/03/13/6000-harvard-applicants-personal-data-on-bittorrent/Thu, 13 Mar 2008 23:42:23 +0000Christina NorenHarvard just learned security investigation 101 the hard way.

        Harvard admitted yesterday that a web server was hacked a month ago that contained financial application data for over 10,000 applicants. They knew about the incident on February 15 and took down the server till February 21 in order to investigate and implement stronger security controls. Their announcement reveals how slow and ineffective security investigations often are.

        "The University’s initial examination did not reveal the full extent of the hack. As the investigation continued, it became apparent that some sensitive applicant data, including Social Security numbers, could potentially have been accessed."

        Unfortunately, a day later, it was pretty obvious that over 6,000 applicants' data had been compromised - CNet reports that all their personal data was on Bittorrent.

        "Harvard officials said the data includes the applicant's name, Social Security number, date of birth, address, e-mail address, phone numbers, test scores, previous school attended, and school records."

        Ouch.

        It shouldn't have taken Harvard nearly a month to come up with an answer as weak as "could potentially have been accessed."

        Why couldn't they figure out for sure whether the data was accessed? Either they weren't logging file accesses, didn't have the logs, or the logs were too hard to analyze. Most likely a combination of all three.

        Maybe they could learn from Splunk customer Weill Cornell Medical College - here's a video of Mark Bronniman, the senior Unix administrator there, and Alan Hecker, their senior security engineer talking about using Splunk to accelerate security investigations. In fact, they implemented Splunk first to speed up an investigation that was in progress.

        ]]>
        Johnvey Hwang: Using the Atom Feed Format in Enterprise Softwarehttp://blogs.splunk.com/johnvey/2008/03/06/using-the-atom-feed-format-in-enterprise-software/http://blogs.splunk.com/johnvey/2008/03/06/using-the-atom-feed-format-in-enterprise-software/Thu, 06 Mar 2008 23:32:45 +0000Johnvey Hwang XML is a great format for exchanging information because it balances readability, extensibility, and compatibility across heterogeneous environments. However, its flexibility is also a disadvantage because it is far too easy to create a proprietary XML schema, resulting in lots of custom code to interface with various systems. Lots of custom code leads to brittleness, and brittleness leads to frustration. The key to salvation lies in standardization.

        Enter the Atom standard: a standards-track schema that defines a generic collection/item container format in XML. Most people equate Atom to an RSS competitor, which is true, but that only covers half of what it does. The Atom Publishing Protocol is a well-defined protocol for performing CRUD (Create, Read, Update, Delete) operations on items over HTTP. The Atom Syndication Format, which is the most commonly used portion, defines the XML schema used to deliver data during a Read operation. Atom was spearheaded by Sam Ruby, and is now backed by people like Brad Fitzpatrick, Tim Bray, Jeremy Zawodny, and Mark Pilgrim, and is heavily implemented by Google.

        Like most software systems, the majority of Splunk's internal entities can be loosely viewed as a collection of similar items. The requested searches, configuration information, saved searches, users, roles - all just collections. So instead of creating five separate XML schemas for each of these collections that perfectly describe their contents, I chose Atom to serve as a single generic container to describe all of the entities. This kind of reuse is echoed by Pat Helland of Amazon, who gives a great talk on relating the rise of the industrial age to standardization, and Tim Bray (Mr. XML himself), who advocates against creating your own XML unless absolutely necessary.

        The benefit of sticking to a standard is that there is a much greater chance that external developers already know exactly how to consume your data with very little work. Not only are language-level Atom parsers available everywhere, but entire applications have been specifically built to consume Atom. For instance, here's a screenshot of the NewsFire feed reader displaying all of the searches that exist on my local Splunk server:

        search jobs in a feed reader

        All I had to do was to supply a URI and login to NewsFire, and then it took care of the rest. No XSLT, XPath, or custom DOM iteration necessary; it just works. As far as I know, Splunk is one of a handful of enterprise companies that has integrated Atom at such a core level. Hopefully, for you it means that there is one less bucket of tag soup you have to deal with, and one better product that you enjoy using.
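To illustrate how little code a generic Atom consumer needs, here is a minimal sketch using Python 3's standard library. The sample feed is invented for illustration and is not actual Splunk output; only the Atom namespace and the feed/entry/title/id element names come from the Atom Syndication Format itself.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Invented sample; a real server response will carry more elements.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>jobs</title>
  <entry>
    <title>search error</title>
    <id>https://localhost:8089/services/search/jobs/1234</id>
  </entry>
  <entry>
    <title>search sourcetype=firewall</title>
    <id>https://localhost:8089/services/search/jobs/1235</id>
  </entry>
</feed>"""


def list_entries(feed_xml):
    """Return a (title, id) tuple for every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [
        (entry.findtext("atom:title", namespaces=ATOM_NS),
         entry.findtext("atom:id", namespaces=ATOM_NS))
        for entry in root.findall("atom:entry", ATOM_NS)
    ]


for title, entry_id in list_entries(SAMPLE_FEED):
    print(title, entry_id)
```

The same function works for any collection exposed as Atom, which is the whole appeal of a single generic container.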

        ]]>
        Jason Gatt: Splunk Replay: Search results in motionhttp://blogs.splunk.com/jgatt/2008/03/06/splunk-replay-search-results-in-motion/http://blogs.splunk.com/jgatt/2008/03/06/splunk-replay-search-results-in-motion/Thu, 06 Mar 2008 20:48:12 +0000Jason GattglTail.rb and Digg Lab's Stack, Splunk Replay is an animated data visualization that "replays" search results as a simulated event stream. The simulation displays events at a rate proportional to the times at which the events originally occurred. Each event is represented by a single square particle that flows from its place in a legend of values to its corresponding position in a stacked column chart. Upon landing in the column chart, one of the event's fields is output in a readable format below the chart. Both the legend of values and the stacked column chart retain the order of their values according to a configurable comparator and truncate older values to make space for new ones. Rolling your mouse over any column displays the field values for that column. ]]>Inspired by glTail.rb and Digg Lab's Stack, Splunk Replay is an animated data visualization that "replays" search results as a simulated event stream. The application displays events at a rate proportional to the times at which the events originally occurred.

        Each event is represented by a single square particle that flows from its place in a legend of values to its corresponding position in a stacked column chart. Upon landing in the column chart, one of the event's fields is output in a readable format below the chart. Both the legend of values and the stacked column chart retain the order of their values according to a configurable comparator and truncate older values to make space for new ones. Rolling your mouse over any column displays the field values for that column.

        Replay currently consumes csv files and is configurable through an xml file. The current demo charts twikipage edits split by twikiuser (both sorted alphabetically) and outputs truncated raw events below the chart. The simulated event stream is running at a rate 2000 times real time.
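The timing idea is simple enough to sketch. Assuming event times are given as epoch seconds, the function below computes how long after the start of the replay each event should appear when time is compressed by a given factor. This is an illustration of the concept in Python 3, not Replay's actual code, and the name replay_offsets is hypothetical.

```python
def replay_offsets(event_times, speedup=2000.0):
    """Return, for each event, the seconds after replay start at which to show it.

    event_times: original event timestamps in epoch seconds (any order).
    speedup: how many times faster than real time the replay runs.
    """
    if not event_times:
        return []
    ordered = sorted(event_times)
    start = ordered[0]
    return [(t - start) / speedup for t in ordered]


# Events that originally occurred 2000 and 6000 seconds after the first one.
print(replay_offsets([0, 2000, 6000]))
```

At 2000 times real time, an hour of original activity plays back in under two seconds.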

        I'm currently working on getting Replay hooked directly to Splunk's API and building out interface elements so that it can be configured visually.

        You can check out the wiki page on Replay over at Splunk's developers wiki.

        ]]>
        Johnvey Hwang: Exploring Splunk’s REST APIhttp://blogs.splunk.com/johnvey/2008/03/03/exploring-splunks-rest-api/http://blogs.splunk.com/johnvey/2008/03/03/exploring-splunks-rest-api/Mon, 03 Mar 2008 20:15:45 +0000Johnvey Hwang Splunk 3.2 is available for download! This release is one of our biggest so far, representing a tremendous amount of effort by our engineering team, and is a product that I'm proud to stand behind. As I mentioned in my last post about our push for the Splunk Platform, a central tenet is to make a compelling product that developers will not only understand, but also enjoy using. While Dr. LogLogic rambles on about how catering to developers sucks, we know that developers are a huge part of our user base (drop by the #splunk channel on EFNet sometime) and we will continue to make Splunk as flexible and extensible as possible.

        With 3.2, we have begun moving some of Splunk's core services over to a proper REST API. Now, for those of you who have already been using the REST API in 3.1, the new API in 3.2 and beyond is distinctly different, and is intended to replace any older versions. Therefore, the REST API of version 3.1 and before will now be referred to as the UI API, and the term "REST API" will refer to the new API that I'm covering in this post.

        Before I dive into the details though, I'd like to clarify the usage of "REST" and what I mean when I speak of it. First of all, REST is not a protocol or standard. There is no RFC, or ISO specification on what constitutes REST; it is a philosophy about the relationship between entities in a software system and the interface to interact with those entities. Roy Fielding's original thesis named it Representational State Transfer, which when put into practice means that URIs should convey meaning in a durable manner. In essence, REST emphasizes the "what" of a system rather than the "how". In comparison, SOAP interfaces are based on codified standards that dictate the communication protocol. Lots more information on REST can be found on Wikipedia or in book form as RESTful Web Services by Leonard Richardson.

        The New Search API

        Splunk's new search interface allows for multiple searches to be scheduled concurrently, and for the results to be retrieved asynchronously. Assuming that you've installed Splunk using the default settings, you can see all of your search jobs by pointing your browser to:

        https://localhost:8089/services/search/jobs

        This returns an Atom feed of all the search jobs present in the server. Each job has an ID, and so the URI for the Atom entry of a search job of ID=1234 can be found at:

        https://localhost:8089/services/search/jobs/1234

        Following that RESTian schema, each facet of a search job can be found as a sub-endpoint as well. For instance, the events, results, timeline, and summary data for each search can be found at:

        https://localhost:8089/services/search/jobs/1234/events
        https://localhost:8089/services/search/jobs/1234/results
        https://localhost:8089/services/search/jobs/1234/timeline
        https://localhost:8089/services/search/jobs/1234/summary

        Each of those endpoints returns data in XML format by default, but can be switched over to JSON or raw text format.

        The key to implementing a successful REST API lies in using the HTTP protocol to its fullest potential. Instead of adding a new search via something like /search/add_search, we simply POST to the parent /services/search/jobs endpoint. Instead of adding an extra /search/delete_search endpoint to delete a job, you issue an HTTP DELETE command directly on the /services/search/jobs/1234 endpoint. By treating each endpoint as a direct entity mapping, we simplify comprehension and dramatically reduce the total number of discrete endpoints.
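The verb-based design can be sketched with Python 3's standard library. The code below only builds the requests and does not send them, so it runs without a server. The endpoint paths come from the text above; the helper names are hypothetical, the form field name "search" is an assumption about what the POST body looks like, and authentication is left out entirely.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://localhost:8089/services/search/jobs"


def job_url(job_id=None, facet=None):
    """Build the URL for the job collection, one job, or one facet of a job."""
    url = BASE
    if job_id is not None:
        url += "/%s" % job_id
        if facet is not None:
            url += "/%s" % facet
    return url


def create_job_request(search):
    """POST to the collection to create a job ('search' field name is assumed)."""
    body = urlencode({"search": search}).encode("ascii")
    return Request(job_url(), data=body, method="POST")


def delete_job_request(job_id):
    """DELETE on the job's own URL removes it; no separate delete endpoint."""
    return Request(job_url(job_id), method="DELETE")


for req in (create_job_request("search error"), delete_job_request(1234)):
    print(req.get_method(), req.full_url)
print(job_url(1234, "events"))
```

Note that one URL builder covers every case: the verb, not the path, says what should happen.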

        The configuration API

        The same model applies to our configuration system as well. Splunk stores its configuration in conf-style text files, using traditional stanza-separated key/value pairs. For example, the server.conf looks like:

        [httpServer]
        atomFeedStylesheet = /static/atom.xsl
        max-age = 3600
        follow-symlinks = false

        To access this file from the API, you would first browse to:

        https://localhost:8089/services/properties/server

        This endpoint returns an Atom feed of all of the stanzas contained in the file. To view all of the key/value pairs in the [httpServer] stanza, browse to:

        https://localhost:8089/services/properties/server/httpServer

        To read a single key value like max-age, browse to:

        https://localhost:8089/services/properties/server/httpServer/max-age

        To change that value, issue an HTTP PUT to the same endpoint. To add a new key, issue a POST to the stanza-level endpoint, or issue a PUT directly onto the new key name.
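The file-to-URL mapping can be sketched in a few lines of Python 3. The example below parses the server.conf fragment shown earlier and prints the endpoint that would address each key. Python's configparser is used here only as a stand-in parser (Splunk's conf files are not guaranteed to be fully INI-compatible), and the property_urls helper is hypothetical.

```python
import configparser

BASE = "https://localhost:8089/services/properties"

SERVER_CONF = """
[httpServer]
atomFeedStylesheet = /static/atom.xsl
max-age = 3600
follow-symlinks = false
"""


def property_urls(conf_name, conf_text):
    """Map each stanza/key in a conf file to its key-level endpoint URL."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(conf_text)
    urls = {}
    for stanza in parser.sections():
        for key, value in parser.items(stanza):
            urls["%s/%s/%s/%s" % (BASE, conf_name, stanza, key)] = value
    return urls


for url, value in property_urls("server", SERVER_CONF).items():
    print(url, "=", value)
```

Each level of the file (file, stanza, key) becomes one path segment, which is why no extra endpoints are needed for individual settings.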

        The advantages of exposing everything via HTTP are obvious when it comes to integration and remote management. Every modern programming environment speaks HTTP, which means you can programmatically interact with Splunk from wherever you want. Everyone also uses a web browser, which means that probing the API is as easy as browsing the web.

        Even with a simple API, there's no reason for developers to recreate a language-specific library to access Splunk, so we're working on releasing a few downloadable libraries for use in Python, .NET, Java, and Perl. Check the Splunk Labs page for more information about those.

        ]]>
        Ledion Bitincka: Delimiter base KV extraction - advancedhttp://blogs.splunk.com/lbitincka/2008/02/22/delimiter-base-kv-extraction-advanced/http://blogs.splunk.com/lbitincka/2008/02/22/delimiter-base-kv-extraction-advanced/Fri, 22 Feb 2008 23:30:06 +0000Ledion BitinckaIf you've read my previous post on delimiter based KV extraction, you might be wondering whether you could do more with it (Anonymous Coward did). Well, yes you can, and I am going to cover the "advanced" cases here. Before covering the capabilities, as in other posts, I will first go over some observations and examples.

        Observations
        1. Header-body. Some applications, for different reasons, choose to format their log files using a header and a body section. The header usually describes the way the fields are organized in each logged event, while the body consists of logged events, usually one per line, with field values delimited as described in the header. W3C and CSV formats come to mind; see the examples below.
        2. Single-delimiter. Other applications use one and the same delimiter to separate keys from values and key-value pairs from each other. This is not very common, but it has been observed in the field.

        Data Examples
        The following header-body sample, as you can probably guess, is from an Exchange server. There is a header section which, among other things, has the list of field names. The names are separated from each other by the same delimiter that separates values in the body section; in this case a tab character is used (even though our blogging platform chooses to mangle tabs to spaces - gotta love it !!!).

        # Message Tracking Log File
        # Exchange System Attendant Version 6.5.7638.1
        # Fields: time client-ip cs-method sc-status
        14:13:11 10.1.1.9 HELO 250
        14:13:13 10.1.1.9 MAIL 250
        14:13:19 10.1.1.9 RCPT 250
        14:13:29 10.1.1.9 DATA 250
        14:13:31 10.1.1.9 QUIT 240

        The following example shows how a single delimiter can be used to list fields. It is pretty easy for us, as humans, to recognize the key-value pairs:

        "url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"

        Enabling header-body kv/extract
        The delimiter based KV extraction solves the header-body problem by adding the capability to assign field names to extracted values by doing single-level tokenization/splitting (ie single delimiter) instead of the normal two-layered one described earlier. Unfortunately, however, this is only available through transforms.conf* and it requires manual specification of the field names (no automatic field name detection). To this end, we introduce another transforms.conf configuration variable, defined as follows:

        FIELDS = <quoted string comma/space separated list>
        - List of names to associate with each extracted field value. The first entry is associated with the first
        field value, the second with the second value and so on...

        Example from above data:
        FIELDS= "time", "client-ip", "cs-method", "sc-status"

        Thus, to enable header-body KV extraction one needs to specify one delimiter and a list of fields to attach to each extracted value. Let's walk through the MS Exchange sample data: (1) we know the field delimiter is the tab character and (2) the field list, in the correct order, is in the header of the file, so all we have to do is quote the field names. The configuration stanza in transforms.conf should thus look like this:

        ....transforms.conf....
        [exchange]
        DELIMS = "\t"
        FIELDS = "time", "client-ip", "cs-method", "sc-status"

        To apply this transformation you can then run ".... | extract exchange reload=t auto=f| ....". There's no need to restart the server after editing transforms.conf as long as "reload=t" is specified in extract (btw, auto=f turns off automatic KV extraction).

        The results of this transformation, on one of the events, would then be:

        "14:13:11 10.1.1.9 HELO 250"

        time=14:13:11
        client_ip=10.1.1.9
        cs_method=HELO
        sc_status=250

        Easy huh!? Try it in your data, we'd love to hear back ......

        *The reason why this is only available through the configuration is the amount of configuration information needed.
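For readers who want to see the mechanics, here is a minimal Python 3 sketch of header-body extraction: split each event on the single delimiter and pair the values with the configured field names, in order. This illustrates the idea and is not Splunk's implementation. The function names are hypothetical, and replacing non-alphanumeric characters in field names with underscores is an assumption modeled on the client_ip style names in the results above.

```python
import re


def clean_field_name(name):
    """Replace characters outside [A-Za-z0-9_] with '_' (assumed convention)."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def extract_header_body(event, delim, field_names):
    """Pair the Nth delimited value of an event with the Nth field name."""
    values = event.split(delim)
    return dict(
        (clean_field_name(name), value)
        for name, value in zip(field_names, values)
    )


FIELDS = ["time", "client-ip", "cs-method", "sc-status"]
event = "14:13:11\t10.1.1.9\tHELO\t250"

for name, value in extract_header_body(event, "\t", FIELDS).items():
    print("%s=%s" % (name, value))
```

Because zip stops at the shorter sequence, an event with fewer values than field names simply yields fewer fields rather than an error.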

        Enabling single-delimiter kv/extract
        There's yet another trick in the delimiter KV extraction: the single-delimiter extraction. Single-delimiter extraction pairs extracted field values into key=value as follows: value1=value2, value3=value4 and so on... To enable this extraction via the command line, set kvdelim and pairdelim to the same value. For the above example data, the extract command should look as follows:

        .... | extract kvdelim=" " pairdelim=" " auto=f | ....

        To enable single-delimiter extraction via transforms.conf you can either specify one delimiter or two identical delimiters in the DELIMS config variable, thus the following two transforms.conf stanzas are equivalent to each other and to the above command:

        ....transforms.conf....
        [single-delim-1]
        DELIMS = " "

        [single-delim-2]
        DELIMS = " ", " "

        The results of these extractions for our sample data would be:

        "url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"

        url=http://splunk.com
        referer=http://dev.splunk.com
        ip=10.10.10.10

        NOTE: do not specify a FIELDS variable for the single-delimiter extraction because that will enable header-body extraction.
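The pairing rule (value1=value2, value3=value4, ...) is easy to sketch in Python 3. The function below is an illustration of the idea rather than Splunk's code; the name extract_single_delim is hypothetical, and dropping empty tokens and an unpaired trailing token are choices made for this sketch.

```python
def extract_single_delim(event, delim=" "):
    """Split on one delimiter and pair tokens up as key, value, key, value..."""
    tokens = [token for token in event.split(delim) if token]
    # Even positions are keys, odd positions are values; a trailing
    # unpaired token is dropped because zip stops at the shorter list.
    return dict(zip(tokens[0::2], tokens[1::2]))


event = "url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"
for key, value in extract_single_delim(event).items():
    print("%s=%s" % (key, value))
```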

        Thoughts, questions, ideas, and comments are always welcome....

        ]]>
        Ledion Bitincka: Delimiter based key-value pair extractionhttp://blogs.splunk.com/lbitincka/2008/02/12/delimiter-based-key-value-pair-extraction/http://blogs.splunk.com/lbitincka/2008/02/12/delimiter-based-key-value-pair-extraction/Tue, 12 Feb 2008 20:26:28 +0000Ledion BitinckaAs described in my previous post, key-value pair extraction (or more generally structure extraction) is a crucial first step to further data analysis. While automatic extraction is highly desirable, we believe empowering our users with tools to apply their domain knowledge is equally important. To this end, this post introduces one of the simplest forms of key-value pair extractions (KV-extraction) - delimiter based extraction.

        Observation

        Most logged events usually contain a list of key-value pairs (e.g. attribute list, method call values etc) in a context-dependent well-defined format. An example of a well-defined format: "key-value pairs are separated from each other using ';' while the key is separated from the value using '='". More generally, well-defined attribute listing formats are not confined to logging; they're part of every event-driven application with a flexible attribute order: e.g. URL GET parameter lists, HTTP request/response headers, email headers etc... In most applications the delimiters are single characters which are least likely to be part of the key or value. Whenever the key/value contains any of the delimiters, it is normally enclosed in literal-defining characters, usually double-quotes (").

        Definition: delimiter based KV extraction
        Let's first define three character classes:
        1. [pairdelim] - non-empty list of characters used to separate key value pairs from each other. (chars after value, before next key)
        2. [kvdelim] - non-empty list of characters used to separate the key from the value. (chars after key, before next value)
        3. [quoter] - list of characters used to enclose a literal - currently *only* quotes are supported and this variable is not configurable

        Thus we can formally define a key-value pair list as follows:

        kvlist = <key>[kvdelim]<value>([pairdelim]<key>[kvdelim]<value>)*
        key = <string>|<quoter><string><quoter>
        value = <string>|<quoter><string><quoter>
        quoter = "

        Thus, delimiter KV-extraction can be achieved by a two layer tokenization/splitting process:
        1. Split on the pair delimiter to extract candidate KV pairs
        2. Split on key-value delimiter to separate key from value
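The two steps above can be sketched in Python 3 as follows. This is an illustration of the technique under stated assumptions, not Splunk's implementation: the function names are hypothetical, only double quotes are treated as the quoter, candidates without a key-value delimiter are skipped, and a key is separated from its value at the first unquoted kvdelim character.

```python
def split_unquoted(text, delims, maxsplit=-1):
    """Split text on any character in delims, ignoring delimiters inside
    double quotes. Quote characters are kept in the returned pieces."""
    parts, current, in_quotes, splits = [], [], False, 0
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch in delims and not in_quotes and (maxsplit < 0 or splits < maxsplit):
            parts.append("".join(current))
            current = []
            splits += 1
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def extract_kv(data, pairdelim, kvdelim):
    """Two-layer extraction: split into candidate pairs, then key from value."""
    fields = {}
    for candidate in split_unquoted(data, pairdelim):          # layer 1
        pieces = split_unquoted(candidate, kvdelim, maxsplit=1)  # layer 2
        if len(pieces) != 2:
            continue  # no key-value delimiter, so not a pair
        key, value = (piece.strip().strip('"') for piece in pieces)
        if key:
            fields[key] = value
    return fields


url = "http://usasearch.gov/search?input-form=firstgov&query=splunk+it&x=0&y=0"
print(extract_kv(url, pairdelim="?&", kvdelim="="))
print(extract_kv('name="a;b=c";n=1', pairdelim=";", kvdelim="="))
```

The second call shows why the quoter matters: the ';' and '=' inside the quoted value are left alone instead of being treated as delimiters.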

        Examples:
        1. URL - the following is an example of finding what are the delimiters for parameters listed in the query part of the URL. Note, since the query starts after the '?' character - the '?' qualifies as a key-value pair delimiter since it is before the first key.

        data = "http://usasearch.gov/search?input-form=firstgov&v%3Aproject=firstgov&query=splunk+it&affiliate=uspto&x=0&y=0"
        -------------
        pairdelim is "?&" - # parameters in the query are separated by '&', the query starts after '?'
        kvdelim is "=" - # variable names are separated from their values using '='

        2. HTTP request header:

        data = "GET / HTTP/1.1
        Host: dev.splunk.com
        Connection: close
        User-Agent: Web-sniffer/1.0.25 (+http://web-sniffer.net/)
        Accept-Encoding: gzip
        Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
        Accept-Language: en-us,en;q=0.5
        Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
        Referer: http://web-sniffer.net/"
        -------------
        pairdelim is "\r\n"
        kvdelim is ":"

        That was easy, why don't you try it on the following data?
        Note: data_hard is all one line however our blogging software sucks at displaying long lines

        data_easy = "May 4 14:47:28 gwrk1 sshd(pam_unix)[4572]: 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=test.abc.net"
        data_hard = "loc=544078|action=encrypt|i/f_dir=inbound|i/f_name=eth4c0|__policy_id_tag=product=VPN-1 & FireWall-1[db_tag={xxxx-6123-40CA-XXXX-9620355xxxxx};date=1190818929;policy_name=NYC-BellSouth NY]|src=10.100.0.50|dst=10.104.0.21|proto=icmp|rule=9|scheme:=IKE"

        Delimiter based KV extraction as part of kv/extract command
        OK, great! Now that you know what delimiter based KV extraction is and how to find the list of characters that are used as pair delimiters (pairdelim) and key-value delimiters (kvdelim), let's look at how to instruct splunk to perform this type of KV extraction. Well, all you need to do, is add the delimiters as arguments to kv/extract, as follows:

        ..... | kv pairdelim="?&" kvdelim="=" | .....
        or
        ..... | extract pairdelim="?&" kvdelim="=" | .....

        Configuration for automated delimiter based KV-extraction
        transforms.conf is the key-value extraction configuration file. Delimiter-based KV extraction adds another configuration variable to the transforms.conf vocabulary called DELIMS - yes you guessed right this is where we'll specify the pairdelim and the kvdelim. The format of DELIMS is as follows:


        DELIMS = <pairdelim>, <kvdelim>

        Example:

        in ...bundles/local/transforms.conf
        .....
        # this is equivalent to ..|kv pairdelim="?&" kvdelim="=" |...
        [my_extraction]
        DELIMS = "?&", "="
        .....

        You can then use the newly created transform just like any other transform. To remind the forgetful, you can do:
        1. ..... | kv my_extraction |....
        2. Automatically run "my_extraction" based on source/sourcetype/host with the REPORT-* config variable in props.conf

        ]]>
        Rob Das: The SSL Performance Odyssey
        http://blogs.splunk.com/rob/2008/02/04/the-ssl-performance-odyssey/
        Mon, 04 Feb 2008 20:14:37 +0000

        When you come to dev.splunk.com, you see pictures of beer pong, full bars, stuffed ponies with fart machines taped to their ass, etc - basically engineers gone wild. Somewhere between all of this insanity, we actually find the time to write code and solve problems like this one.

        This post is all about a crazy-weird performance issue that we were experiencing, how it manifested itself and ultimately how it was fixed.

        I suspect others may be having this problem, as the problem lives in some very popular open source code as far as I can tell. With that, I'll begin telling you about my journey into hell.

        Splunk has a home-grown embedded HTTP(S) server that serves up all external interfaces to the 'splunkd' daemon. We use it as the core engine for our REST and XML/RPC-like APIs. The GUI and the CLI both end up talking to the daemon via this server.

        When I wrote the core of it a few months ago, I ran some rudimentary performance tests on several platforms and it seemed decent enough for our use, but a week ago, the manager of the Search and Indexing team (Stephen) said that he was seeing abysmal performance using SSL. He said that the GUI performance was being impacted. I didn't believe him and insisted that it was something else and that he was high.

        So to prove to him that it wasn't my server, or my problem like all engineers do, I gave him a small python script that hits the server in a tight loop and we checked the performance. It sucked. Continuing with the theme of "this isn't my problem" - I told him it was probably the handler of the request that was doing something that made the server seem slow. This is when he laughed at me and said "watch this": He proceeds to turn off SSL, re-run the same test and the performance of the server goes up by approximately 50X. 50 times faster! I know that SSL is slower than non-encrypted streams, but there was no way this was the problem. Whoa! We can't ship this way. This needs to be fixed!

        In fact, a very small HTTP request (approx. 80 bytes) with a small reply (approx. 300 bytes) was operating at only 23 requests/sec! When he turned off SSL, he was getting over 1000 req/sec! What???

        So, of course I tried the same test on my OSX laptop and I got 130+ req/sec - within the realm of reasonable and certainly better than 23. I then tried running the server on my laptop and the client on my Linux Fedora machine, resulting in basically the same performance. Why does this work on my hardware and not his?

        Finally, I switched the server and client by putting the server on my Linux box and the client on my Mac. I re-ran the test and damned if the performance didn't completely suck! I was getting 20 or so request-replies per second over SSL.

        But, why does the OS matter? I didn't get it.

        My SSL Performance Bug Diary

        • Broke out ssldump. Here is a snippet from an OSX client and a Linux server. Note the third C->S line of .0398 seconds. This is the cause of the slowdown, but why?

        SSL Dump Slow

        • Spent 2 hours looking over every possible OpenSSL build option and tried turning various ones on and off. No difference. (score: Bug 1, Rob 0)
        • Spent many hours trying different crypto combinations. Little difference beyond the obvious and documented performance differences. (score: Bug 2, Rob 0)
        • Perhaps I need to throw in server-side SSL caching. I throw it in, with the assumption that the Python client implements client-side SSL caching. No performance change. (score: Bug 3, Rob 0)
        • Thinking it might be the Nagle algorithm, I modify my test to send larger requests and guess what? The performance is normal again! I try to find out exactly when it turns from slow to fast (as far as the request size) by trying request sizes of 1, 2, 4, 8, 16, 32........16K bytes. Wow, just around 1300-1400 bytes is where the performance goes from sucks to fast. Look at the graph below. See the spike? Hmmm..... (score: Bug 3, Rob 1)

        mtuspike1.jpg

        • I change the MTU on the server from the default of 1500 bytes to 1000 bytes. The performance cliff now is lowered to somewhere in the 800-900 byte range. The MTU is the key! (score: Bug 3, Rob 2)
        • It's got to be the Nagle algorithm. I try turning off the Nagle algorithm on the server. No performance change. (score: Bug 4, Rob 2)
        • I give the problem to our performance engineer. He can reproduce it. I suck.
        • Decide to try ssldump again and this time try a different test - curl sending the same size request as in the Python test. I want to compare timings. BINGO. It's not the server, it's a combination of the server running on Linux and Python. (score: Bug 5, Rob 3). Notice in the following curl ssldump image the single C->S line and the fast .0007 second timing. Contrast this with the previous ssldump image; herein lies the problem:

        ssldump curl

        • Now to fix it. It really really seems like Python is the problem. I try it with urllib2. Same thing.
        • I try it with httplib2. Same thing.
        • I look at the code for urllib2 and httplib2 and guess what? They both use httplib. The problem must be in httplib. I dig into the code and start commenting shit out and looking at the resulting ssldump output to figure out *exactly* which write is causing the damage. I find the bug. (score: Rob wins)
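
The request-size sweep from the diary above (1, 2, 4, ... 16K bytes) can be sketched roughly as follows. This is not the original test script: `send` is a hypothetical callable that performs one request-reply round trip with the given payload, and the function names are invented for this example.

```python
import time


def request_sizes(limit=16 * 1024):
    """Yield powers of two from 1 byte up to and including limit."""
    size = 1
    while size <= limit:
        yield size
        size *= 2


def requests_per_second(send, size, count=100):
    """Call send(payload) count times and return the observed rate."""
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(count):
        send(payload)
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def sweep(send, count=100):
    """Return (request size, requests/sec) pairs for every size in the sweep."""
    return [(size, requests_per_second(send, size, count))
            for size in request_sizes()]


# Stand-in sender that does nothing; swap in a real HTTPS round trip to measure.
for size, rate in sweep(lambda payload: None, count=10):
    print(size, round(rate))
```

Plotting the pairs against request size is what exposes a cliff like the one near the MTU.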

        The Problem and the Fix

        I forgot to tell you that we are using Python 2.5. It turns out that httplib.py sends requests over the wire in 2 chunks. The first chunk is comprised of the HTTP headers. The second chunk is the body. The fix I made appends the body to the headers and sends the request in 1 chunk only. This is what curl does and this fixes the performance problems.
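
As an illustration of the one-chunk idea (not the actual patch to httplib.py, and written in present-day Python 3 rather than Python 2.5), the headers and body can be joined into a single buffer and handed to the socket in one write. The function names and the header set here are assumptions made for the sketch.

```python
def build_request(method, path, host, body=b""):
    """Return the headers and the body as one byte string."""
    head = (
        f"{method} {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: keep-alive\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def send_request(sock, method, path, host, body=b""):
    """Send the whole request with one sendall() call instead of two writes."""
    sock.sendall(build_request(method, path, host, body))
```

With a connected (and, for HTTPS, SSL-wrapped) socket, `send_request(sock, "POST", "/some/endpoint", "localhost", b"payload")` issues a single write for the whole request.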

        Here is the fix for download:

        httplib.py

        Here is my final data:

        fullgraph.jpg

        Things I Still don't Understand

        Because it seems to work and this took so damn long, I am not going to do any further investigations, but there are still many unsolved mysteries. Perhaps one of you can figure them out.

        • Why the extreme falloff on Linux where both the client and server are on the same machine at 16K request/reply size?
        • Why is OSX so much slower than Linux?
        • Why does the new code speed up Linux only?
        • Notice that only the OSX box gets the speed up at the MTU; the Linux box continues the slow performance regardless of the MTU

        Windows to Linux Performance Numbers (added 2/5/08)

        So I added a Windows to Linux graph based on the first comment I received below. Yes, we do test with Windows, and yes, it is not out yet (but will be soon). The problem manifests itself exactly like it does on other platforms. Notice the difference:

        windows-linux.jpg

        Specs on the Test Hardware

        • Windows
          • Dual Core, very fast, lots of RAM (will provide detailed specs in a bit)
        • Linux:
          • 2.6.11-1.1369_FC4smp
          • 3.4GHz P4, Hyperthreaded, 2GB RAM
        • OSX
          • Mac Pro Laptop
          • 1.8GHz Pentium Core II duo (2 cores), 3GB RAM

        ]]>
        Johnvey Hwang: Standing on Our Own Platform
        http://blogs.splunk.com/johnvey/2008/01/31/standing-on-our-own-platform/
        Fri, 01 Feb 2008 01:02:34 +0000

        Splunk is on track to become a billion-dollar company and you, the intrepid sysadmin/developer, are going to help us get there. Now, this is not a statement that I'm making as an analyst who "covers" the enterprise software market, and compiles a list of "top software companies to watch". I'm writing this as Splunk's Platform Architect, a techie whose goals are to ensure that what comes out of our development group is compelling and exciting to those that are actually working with the product.

        It is this developer-centric ethos that sets us apart from so many of the other enterprise software firms and has already paid dividends on community goodwill. Instead of making prospective buyers jump through registration hoops just to view a guided webcast tour, Splunk provides fully functional software downloads to try out on your own data, inside your own network, free from webinar smoke and mirrors.

        We don't just want you to try out the software, we want you to try doing things that aren't covered in our brochureware, things that sound ludicrous at first but are doable. In fact, in a perverse way, we hope that you do break our product because it reveals new limitations for us to solve, ultimately leading to a product that lets you do your job the way you want, yet easier and faster.

        This is where the Splunk Platform comes into play. We want to increase the ubiquity of Splunk by 1) exposing major components of Splunk as individual services, and 2) allowing external developers to build on top of Splunk and leverage our award-winning IT search infrastructure. Starting with version 3.2 (you can download the preview version today), there is a new REST API that provides unprecedented access and consistency to every aspect of the Splunk Server. We are leveraging open standards like the Atom Protocol and OpenID to let enterprise developers create mashups with the same ease as those in the "web2.0" world. For programmers who want to integrate Splunk functionality into existing applications, you can look forward to Python and .NET SDKs in the near future, with Java and Perl not too far behind.

        Amazon's Web Services, Facebook's F8, and Twitter's API have all proven that standardized platforms breed diverse applications, on scales that are much bigger than a single company can produce. That's the kind of ecosystem we want to cultivate. My next posts will begin exploring the new REST endpoints that have been added to Splunk, and provide tutorials on how to use those endpoints to interact with Splunk programmatically.

        ]]>
        Eric Woo: Your most important IT data: funny quotes
        http://blogs.splunk.com/ewoo/2008/01/30/your-most-important-it-data-funny-quotes/
        Wed, 30 Jan 2008 22:14:19 +0000

        bash.org is a natural dataset for splunking. It's a huge blob of loosely structured text data, and it's made of win.

        To play with a live instance, go to bash.splunklabs.com, login: guest, password: guest.

        Of course, Splunk duplicates the functionality of the site itself. We can find, for example, the top 100 IRC quotes:

        Splunk lets us do considerably more, though. What are the top one-liners?

        How many more quotes mention "girlfriend" than "boyfriend", i.e. exactly how bad is this sausage party?

        Are there any commonly quoted individuals?

        Are there any interesting trends in quote scores over time? Take a look at high quote scores vs. quote ID:

        It seems likely that older quotes, especially good ones, benefit from a disproportionately greater number of views (the rich getting richer, so to speak); this might explain why the peaks in the low-quote-ID ranges are higher than the peaks for more recent quotes. Or maybe the internet just doesn't produce the same quality of LOLs that it once did.

        To try this yourself, add the following to props.conf:

        [sourcetype::bash]
        BREAK_ONLY_BEFORE = (#[0-9]* \+)|([0-9]+-[0-9]+-[0-9]+-[0-9]+-[0-9]+-[0-9]+)
        REPORT-bash = bash

        and the following to transforms.conf:

        [bash]
        REGEX = #([0-9]+) \+\((-?[0-9]+)\)- \[X\]
        FORMAT = $0 bash_quote_id::$1 bash_quote_score::$2
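
To see what the REGEX above captures, here is a small Python check of the same pattern. The sample header lines are assumptions about what the lynx dump of a quote looks like; the field names come from the FORMAT line.

```python
import re

# Same pattern as the REGEX setting in the [bash] transform.
BASH_RE = re.compile(r"#([0-9]+) \+\((-?[0-9]+)\)- \[X\]")


def parse_quote_header(line):
    """Return the quote id and score from a quote header line, or None."""
    match = BASH_RE.search(line)
    if match is None:
        return None
    return {
        "bash_quote_id": match.group(1),
        "bash_quote_score": match.group(2),
    }


# Hypothetical header lines in the shape the pattern expects.
print(parse_quote_header("#12345 +(678)- [X]"))
print(parse_quote_header("#42 +(-7)- [X]"))
```

The `-?` in the second group is what lets negative scores through.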

        Then, get a static copy of bash.org. You can grab the one I've created here, or you can generate it yourself:

        $ curl -o '#1.html' 'http://bash.org/?browse&p=[001-409]'
        $ for cur in * ; do lynx -dump -nonumbers ./$cur >> /tmp/bash.txt ; done

        Finally, push the data into Splunk:

        $ splunk add tail -source /tmp/bash.txt -sourcetype bash

        ]]>
        David Carasso: O’Rly?
        http://blogs.splunk.com/david/2008/01/21/orly/
        Tue, 22 Jan 2008 03:00:27 +0000

        Below are a few easter egg features found inside Splunk.

        • From the commandline: "splunk ftw" produces an ascii-art "O'Rly?".
        • From the commandline: the "outputrawr" command produces ascii-art fireworks.
        • From the searchbox, piping results to the "marklar" processor (e.g., "*|marklar") converts all search results into the Marklarian language.
        • From the searchbox, piping results to the "loglady" processor (e.g., "*|loglady") converts all the search results into quotes from Twin Peaks's LogLady.

        Enjoy them while they last, before they are removed by the Silliness Police, who%$($%%$
        ^H^H^H^NO CARRIER

        ]]>
        Christina Noren: Product management nirvana
        http://blogs.splunk.com/cfrln/2008/01/20/product-management-nirvana/
        Sun, 20 Jan 2008 23:38:21 +0000

        A few months ago I wrote about our effort to automate and open up product planning by implementing a process around distilling product inputs into requirements using Jira in support of an agile/scrum based development model. I've rarely had so much response to a post... dozens of product managers at companies large and small wrote me and commented about their own efforts along the same lines. Many asked for our specs on our Jira customizations.

        We were at the beginning of this effort when I wrote that post. In the intervening 3+ months we've completed the first round of Jira customizations (thanks to lots of help from Dave Pickering and the team at New Aspects of Software, a fantastic consulting firm specializing in Jira - these guys do what they say they'll do, when they say they'll do it, for the amount of money they said they'd charge.) My tireless PM teammates have been embracing the new system and putting in the late nights to coalesce all of the feedback into common problem statements and requirements.

        The work all came together for us this past week as we head into the next round of product planning and are reforming our scrums and confirming business priorities - we had a hefty but complete "PRD" that was automatically generated from Jira and represented a comprehensive view of product requirements and concepts. The PM team took about 4 hours to walk through it, confirm our initial priority cut, and then we had an incredibly productive series of sessions with the full business and product leadership to decide which problems to tackle next, how to reform the scrum teams, and what priorities to give each scrum to start with. This was the most ordered and efficient product priority setting exercise I've ever been through, because we were dealing with the complete picture.

        The PRD report was a custom report built for us by New Aspects. It lets you filter our custom "problem statement" issue type by priority, text matches and other fields; then it prints all problems in reverse priority order. An example of a problem statement would be something like "Splunk doesn't support the fibberziggy filesystem", with all ERs asking for fibberziggy filesystem support linked to that problem statement. Each problem shows other linked issues including:

        • Inputs: Enhancement requests, Market data points, Call reports (each with details like customer name, deal value, etc.)
        • Requirements - with details of status, so we could understand where we were on problems that were already partially addressed by past development
        • Features
        • Child problem statements (they cascade)

        Now the scrum teams are off to do their individual sprint planning and requirements development, with all of that to be captured in Jira as they go. The common "PRD" will stay up to date as they work independently, and best of all, our SEs can see all the way from their individual customer enhancement requests through to up-to-the-minute status of requirements definition and completion. That was hardly the case with the old document-based PRDs that PMs used to create.

        Next up we're going to tackle automating cascading updates based on requirements status updates. For example, if QA validates a completed requirement that Splunk lock test succeeds on the fibberziggy filesystem, we want Jira's workflow to check that this was the last requirement for the problem "Splunk doesn't run on fibberziggy filesystems", close that problem, then check to see if that problem was the last problem for each linked enhancement request, and close those enhancement requests and update our Sugar CRM system via our email integration. We even want interim updates, such as flowing back when we've fully scoped requirements for a problem.

        We're also talking to New Aspects about packaging up all our custom Jira reports, workflows and security schemes and giving it to the community, so look here for a post when it's ready for download.

        ]]>
        Ledion Bitincka: Key-value pair extraction definition, examples and solutions….
        http://blogs.splunk.com/lbitincka/2008/01/18/key-value-pair-extraction-definition-examples-and-solutions/
        Fri, 18 Jan 2008 17:07:48 +0000

        Most of the time, logs contain data that humans can easily recognize as either completely or semi-structured information. Being able to extract structure from log data is a necessary first step to further, more interesting analysis. While it would be great to automatically extract the structure from all log data, Splunk cannot rival the brain's performance at this time; however, it is able to tap into your brain for help :) Read on ......

        Problem definition:
        Extract structured information (in key/field=value form) from un/semi-structured log data.
        Note: in this post, key and field are used interchangeably to denote a variable name.

        Problem examples:
        Splunk debug message (humans: easy, machine: easy)

        12-03-2007 13:51:55.114 DEBUG SearchPipelinePerformance - processor=save queryid=_1196718714_619358 executetime=0.014secs
        ideal structured information to extract:
        processor=save
        queryid=_1196718714_619358
        executetime=0.014secs

        Splunk tries to make it easy for itself to parse its own log files (in most cases)
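
For a line this regular, a single pattern is enough. The following Python sketch is only an illustration (it is not how Splunk does it internally) and assumes that keys are word characters and that values contain no whitespace; extract_pairs is a made-up name.

```python
import re

# A key made of word characters, "=", then a value with no whitespace.
PAIR_RE = re.compile(r"(\w+)=(\S+)")


def extract_pairs(line):
    """Return every key=value pair found in the line as a dict."""
    return dict(PAIR_RE.findall(line))


line = ("12-03-2007 13:51:55.114 DEBUG SearchPipelinePerformance - "
        "processor=save queryid=_1196718714_619358 executetime=0.014secs")
print(extract_pairs(line))
```

This returns exactly the three fields listed above: processor, queryid and executetime.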

        Output of the ping command (humans: easy, machine: medium)

        64 bytes from 192.168.1.1: icmp_seq=0 ttl=64 time=2.522 ms
        ideal structured information to extract:
        bytes=64
        from=192.168.1.1
        icmp_seq=0
        ttl=64
        time=2.522 ms

        An interesting pattern to note here is that there is no consistent field-value delimiter, nor field-value order. In the "from" field the authors have chosen to use a space as a delimiter, while for "icmp_seq", "ttl" and "time" they've chosen the equal sign. For the "bytes" field they've chosen to place it after the value (yes, they might have also intended for it to mean bytes - the data unit) while for the rest they've chosen field-name followed by field-value. Admittedly, some might think the current format is prettier than the following consistent log line which could easily be parsed by machines. (Who thought log files were optimized for prettiness !?)

        bytes=64, from=192.168.1.1, icmp_seq=0, ttl=64, time=2.522 ms
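
Since the real ping line mixes delimiters and field order, a generic key=value rule misses "bytes" and "from". One option, sketched here in Python under the assumption that every line has exactly this shape, is a pattern written for this one format. The names PING_RE, PING_FIELDS and parse_ping are invented for the example.

```python
import re

# Pattern tailored to one specific line shape of ping output.
PING_RE = re.compile(
    r"(\d+) bytes from ([\d.]+): icmp_seq=(\d+) ttl=(\d+) time=([\d.]+ ms)"
)
PING_FIELDS = ("bytes", "from", "icmp_seq", "ttl", "time")


def parse_ping(line):
    """Map the captured groups onto the field names, or return {} on no match."""
    match = PING_RE.search(line)
    return dict(zip(PING_FIELDS, match.groups())) if match else {}


print(parse_ping("64 bytes from 192.168.1.1: icmp_seq=0 ttl=64 time=2.522 ms"))
```

The price of this precision is that the pattern is useless for any other log format, which is why the consistent key=value line is so much friendlier to machines.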

        NetScreen log (humans: medium, machine: hard)

        %MD% %DD% 13:41:25 45.2.0.1 NOC-FWa: NetScreen device_id=NOC-FWa [Root]system-notification-00257(traffic): start_time="2006-05-11 13:40:23" duration=62 policy_id=41 service=Network Time proto=17 src zone=noc-mgt dst zone=noc-svcs ......
        ideal structured information to extract:
        device_id=NOC-FWa
        start_time=2006-05-11 13:40:23
        duration=62
        policy_id=41
        service=Network Time
        proto=17
        src zone=noc-mgt
        dst zone=noc-svcs

        This part of the NetScreen log line ...service=Network Time proto=17 src zone=noc-mgt dst zone=noc... is a salient example of the ambiguity that sometimes exists in log data. What is the correct value of service? "Network" or "Network Time"? What about the name of the next field? Is it "Time proto" or just "proto"? Well, we can come up with an easy rule for this case; let's call it Rule-1: "Field names should NOT contain spaces". Fair/good enough!
        Let's move on to the next field. What is its correct name? "src zone" or just "zone"? A human can recognize that "src zone" is the correct field name, so we have just violated our Rule-1. We can continue this cycle of adding, violating, modifying and removing rules only to recognize that the cycle never ends, which simply translates into "there is no one solution/rule-set that is able to extract structure from ALL unstructured data": there will always be a degenerate case that violates the rules.
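
The rule cycle described above is easy to reproduce. This Python sketch applies two hypothetical rules to the ambiguous fragment: Rule-1 as stated (field names contain no spaces, and here values are also assumed to stop at whitespace), and a variant where a value runs until the next "name=". Both regexes are illustrations, not Splunk's extraction logic.

```python
import re

FRAGMENT = "service=Network Time proto=17 src zone=noc-mgt dst zone=noc-svcs"


def rule_no_spaces(text):
    """Rule-1: names have no spaces; values are assumed to end at whitespace."""
    return re.findall(r"(\w+)=(\S+)", text)


def rule_until_next_key(text):
    """Variant: names have no spaces; a value runs until the next name=."""
    return re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", text)


print(rule_no_spaces(FRAGMENT))
print(rule_until_next_key(FRAGMENT))
```

The first rule cuts "Network Time" down to "Network" and loses the "src"/"dst" prefixes; the second recovers "Network Time" but glues "src" and "dst" onto the preceding values. Neither matches what a human reads.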

        More degenerate log lines:
        Stay tuned! Links in this section are coming soon....

        Solutions:
        - Delimiter based key-value pair extraction
        - Delimiter based KV extraction - advanced
        Stay tuned! More links coming soon....

        ]]>
        Carl Yestrau: JavaScript Error Reporting with Splunk
        http://blogs.splunk.com/carl/2008/01/16/javascript-error-logging-with-splunk/
        Thu, 17 Jan 2008 00:18:46 +0000

        Keeping track of new browser releases these days can be really challenging. It is less than ideal if your payment processor is throwing a JavaScript onsubmit exception, effectively canceling all transactions.

        Here is a little technique for indexing JavaScript exceptions in your production and development environments using Splunk.

        In JavaScript create an onerror event handler that makes an HTTP request to a server that has access logs indexed by Splunk.

        
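            function JSErrorLogger(httpBeacon){
                var self = this;
                // window.onerror passes the message, the script URL and the line number
                self.handler = function(msg, url, line){
                    var log = {
                        "date":new Date(),
                        "type":"jserror",
                        "line":line,
                        "msg":msg,
                        "url":url
                    };
                    var logStr = "";
                    for(var i in log){
                        if(log.hasOwnProperty(i)){
                            // encode each value so characters like # or & cannot cut the query string short
                            logStr += i + ":" + encodeURIComponent(log[i]) + " ";
                        }
                    }
                    // requesting the beacon image writes the error into the server's access log
                    var imgObj = new Image();
                    imgObj.src = httpBeacon + "?" + logStr;
                };
                // install the handler as soon as the logger is constructed
                window.onerror = self.handler;
            }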
        
        

        Make sure that this JavaScript is the very first item executed by the interpreter, ensuring all exceptions are caught by the event handler.

        Instantiate the class with a URI that points to a beacon on a machine that has Splunk indexing the access log. You may want to set some environment variables in JavaScript that turn logging on for only testing and production machines.

        
           //if environment test or production
           var splunkJSErrorIndexer = new JSErrorLogger("http://somedomain.com/beacon.gif");
        
        

        That's it, now you can empirically understand JavaScript exceptions being raised, set BlackBerry alerts and correlate UI stability issues to deploys :)

        Happy JavaScript Monitoring!

        ]]>
        David Carasso: Bomberman
        http://blogs.splunk.com/david/2008/01/10/bomberman/
        Fri, 11 Jan 2008 02:19:57 +0000

        The world's most fun video game, keeping us sane - 1993's Bomberman for NES, played on the Wii.
        "Look out, rotsky, you've got fast aids!"


        ]]>
        Ben Strawbridge: Configuring roles in Splunk 3.2 preview
        http://blogs.splunk.com/ben/2007/12/27/configuring-roles-in-splunk-32-preview/
        Thu, 27 Dec 2007 18:35:00 +0000

        Last week I made a video about how to set up new roles in the Splunk 3.2 preview release. The video demonstrates creating a new type of power user, with the same capabilities as a standard power user plus the ability to manage and create new users. You will also see how to create new roles by configuring authorize.conf.

        (Update): While watching the video again, I realized I sent a mixed message about where to edit configuration in Splunk. I made it clear that you want to edit in the local bundle directory, and if you look at the terminal, that is where I was editing my configuration. However, I later said "default over-rides local, so always edit in default"; this is WRONG. Always make your personalized configuration changes in the local directory. If the configuration file doesn't exist there, create one or copy it from default and edit that one.

        Take a look at the video and let me know if you have any questions about this stuff.

        Quicktime Video (625×352)

        ]]>
        Carl Yestrau: Hey Browser, You’ve Got Tail!
        http://blogs.splunk.com/carl/2007/12/05/hey-ui-youve-got-tail/
        Thu, 06 Dec 2007 00:24:41 +0000

        For those interested in monitoring real-time data being consumed by Splunk, we've introduced a new feature called Live Tail in the latest preview release. Additionally, we've added a nifty new REST endpoint, /v3/splunk/tail, for your custom application needs.

        Live Tail

        More information can be found in these videos:

        • A quick walkthrough of the new preview release feature Live Tail, its UI, and some sample code - See Video
        • An overview of the architecture used to integrate real-time data from Splunk Live Tail in a web browser. Challenges and workarounds when using JavaScript/Flash hybrids - See Video

        Happy Streams!

        ]]>
        Rory Greene: flexibles roles and chamber of secrets
        http://blogs.splunk.com/rory/2007/12/05/flexibles-roles-and-chamber-of-secrets/
        Wed, 05 Dec 2007 23:16:44 +0000

        Hi Kids,

        So we have added flexible roles to the preview release. Well, what does that mean?
        We will now allow folks to create their own roles. The previous ones of Admin, Power
        and User will be included as defaults.

        There is currently no GUI available for editing roles but you can directly edit the
        config file $SPLUNK_HOME/etc/bundles/default/authorize.conf.

        To add in these roles we did an audit of our system and broke down various actions
        into capabilities. These capabilities can be grouped together to create any role.
        Please bear with us here, this is just a first cut and we may not have chopped up
        things in a way that makes sense to you. This is the beauty of preview, you got a suggestion
        about capabilities you'd like to see added or removed then comment or mail us.
        The more feedback we get at this stage the faster this feature will improve.

        A role in the splunk system contains the following things.
        1. A list of capabilities that role can perform.
        2. A list of roles that are contained within this role ( their capabilities will be imported into our role)
        3. A list of search filters that should be applied when searching as this role.

        The example below defines a role called kwyjibo that can edit user information and
        make changes to the authentication system. It imports the capabilities of the roles User and Power.

        [role_kwyjibo]
        edit_user = enabled
        change_authentication = enabled
        bounce_authentication = enabled
        importRoles = Power;User
        srchFilter =
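
One way to picture what importRoles does is a small Python sketch that resolves a role's effective capabilities by walking its imports. This is only a model of the behaviour described above, not Splunk's code: the dictionary layout, the function name, and the two placeholder capabilities given to User and Power are all invented for the example. Only the kwyjibo entries come from the stanza.

```python
def resolve_capabilities(role, roles, seen=None):
    """Return the role's own capabilities plus those of every imported role."""
    seen = set() if seen is None else seen
    if role in seen:  # guard against roles that import each other
        return set()
    seen.add(role)
    definition = roles[role]
    caps = set(definition.get("capabilities", ()))
    for imported in definition.get("importRoles", ()):
        caps |= resolve_capabilities(imported, roles, seen)
    return caps


ROLES = {
    # Placeholder capabilities; the real defaults live in authorize.conf.
    "User": {"capabilities": ["example_user_capability"]},
    "Power": {"capabilities": ["example_power_capability"]},
    "kwyjibo": {
        "capabilities": ["edit_user", "change_authentication", "bounce_authentication"],
        "importRoles": ["Power", "User"],
    },
}

print(sorted(resolve_capabilities("kwyjibo", ROLES)))
```

In this model kwyjibo ends up with its three own capabilities plus whatever Power and User carry.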

        If you have any questions, comments please let me know.

        Rory

        ]]>
        Rory Greene: Scripted auth in preview
        http://blogs.splunk.com/rory/2007/11/16/scripted-auth-in-preview/
        Sat, 17 Nov 2007 00:37:03 +0000

        Hey Kids,

        How are things? So I've made some progress in my attempt to code myself out of a job. Just checked the scripted auth into the preview branch, which should be released in a few days. It's very basic right now with more improvements to come. At the moment userLogin, getUserType and getUserInfo are the only methods you need to fill in.

        I've written up a sample that interfaces with PAM on Linux, using /etc/passwd to get user lists. Mac users, skip the pamauth.c compile: you don't need this app, and pam don't like macs (can't say I blame pam on that score).

        First off, a pamauth.c program to compile that will talk to pam for ya. Donated by Philippe Troin, thank you fif. Feel free to take and edit for your own purposes, but you must send fif a chocolate chip cookie if you found it useful.

        File pamauth.c is attached due to severe lameness on the part of wordpress, which insists on screwing with the #include's

        pamauth.c

        Compile that puppy like so
        gcc -Wall -Wextra -o pamauth pamauth.c -lpam

        You may need to create an entry for pam
        edit /etc/pam.d/pamauth and put this line in
        auth sufficient pam_unix.so

        Access to pam usually requires root, so we will just make the pamauth program setuid instead of running splunk as root (which would be deeply stupid BTW).

        as root:
        chown root:root pamauth; chmod u+s pamauth

        You can test it by doing echo PASSWORD | ./pamauth username
        returns 0 for auth passed
        returns 1 on fail.

        K now that you have your nifty pam app running you need to add your python script that will interface
        with splunk. As they say on cooking shows, here's one we made earlier.

        [source:py]
        # Required functions:
        # 1. userLogin : login with username password pair
        # 2. getUserInfo : get user information, passed back in the form userId;username;password;realname;userType
        # 3. getUserType : the splunk role to attach that user to.
        # Optional functions:
        # 1. getUsers : Enumerate all users in the system; these will then be displayed on the user page in splunk.
        # Later release:
        # 1. checkSession : The current version just auths and then splunk manages the session; this will allow
        # session management to be handled here. Careful though, splunkd and the frontend
        # are quite chatty, so this will be called a lot. If it's slow it will degrade performance.

        import sys
        import subprocess

        SUCCESS = "success"
        FAILED = "fail"

        PAM_EXE = ""

        def writeToStdout( listIn ):
            # each result field goes back to splunk wrapped in square brackets
            result = ""
            for fu in listIn:
                result = result + "[" + fu + "]"

            sys.stdout.write( result )

        def readFromStdin( ):
            # read all of stdin and split the bracket-wrapped fields into a list
            input = sys.stdin

            inStr = ""
            for line in input:
                inStr = inStr + line

            inStr = inStr.replace( "[", "" )
            return inStr.split( ']' )

        def userLogin( infoIn ):
            listFu = []
            username = infoIn[0]
            password = infoIn[1]

            # our check with pam is done with a setuid program called pamauth;
            # passing an argument list (no shell) keeps a crafted username from running as a command
            proc = subprocess.Popen( [ PAM_EXE, username ],
                                     stdin=subprocess.PIPE )
            proc.communicate( password )
            retCode = proc.wait()

            if retCode == 0:
                listFu.append( SUCCESS )
            else:
                listFu.append( FAILED )

            return listFu

        def getUsers( infoIn ):
            listFu = []
            listFu.append( SUCCESS )
            # just going to use /etc/passwd here but you may use any method you wish.
            FILE = open("/etc/passwd" ,"r")
            fileLines = FILE.readlines()

            for line in fileLines:
                userBits = line.split( ":" )
                # only list accounts with a bash login shell; skip malformed lines
                if len( userBits ) > 6 and userBits[6].find( '/bin/bash' ) != -1:
                    realname = userBits[4]
                    if realname == "" :
                        realname = userBits[0]
                    # userId username password realName userType/splunk role
                    listFu.append( userBits[2] + ";" +userBits[0] + ";***********;" + realname + ";Admin" )

            FILE.close()

            return listFu

        # IN UserId
        # OUT [RESULT(SUCCESS|FAILED)][userType]
        def getUserType( infoIn ):
            # Here you are given a userId
            # you must return the user type (splunk role)
            # I'm just going to make everyone an admin.
            listFu = []
            listFu.append( SUCCESS )
            listFu.append( "Admin" )
            return listFu

        def getUserInfo( infoIn ):
            listFu = []
            listFu.append( SUCCESS )
            # userId;username;password;realname;userType
            listFu.append( infoIn[0] + ";" + infoIn[0] + ";***********;" + infoIn[0] + ";Admin" )
            return listFu

        if __name__ == "__main__":
        callName = sys.argv[1]
        listIn = []
        listIn = readFromStdin( )

        returnList = []
        if callName == "userLogin":
        returnList = userLogin( listIn )
        elif callName == "checkSession":
        returnList = checkSession( listIn )
        elif callName == "getUsers":
        returnList = getUsers( listIn )
        elif callName == "getUserType":
        returnList = getUserType( listIn )
        elif callName == "getUserInfo":
        returnList = getUserInfo( listIn )
        else:
        returnList.append("ERROR call name no known" )
        returnList.append( callName )

        writeToStdout( returnList )
        [/source]
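
The framing between splunkd and the script can be read straight off the code above: every field travels wrapped in square brackets, in both directions. Here is a minimal sketch of that framing, inferred only from the script. The helper names encode_fields and decode_fields are made up for illustration, and the real protocol may have rules the script does not show (a field containing a bracket, for one, would not survive this).

```python
def encode_fields(fields):
    """Wrap each field in square brackets, the way writeToStdout does."""
    return "".join("[" + field + "]" for field in fields)


def decode_fields(raw):
    """Split a bracketed string back into its fields.

    Unlike readFromStdin in the script, this drops the empty string
    left over after the final closing bracket.
    """
    parts = raw.replace("[", "").split("]")
    if parts and parts[-1].strip() == "":
        parts.pop()
    return parts


# A reply the script might send back for getUserType (hypothetical values).
print(encode_fields(["success", "Admin"]))
# What userLogin would see for a made-up username/password pair.
print(decode_fields("[rory][s3cret]"))
```

Dropping only the trailing leftover keeps genuinely empty fields in the middle of a message intact.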

        Change the PAM_EXE variable in the script to point to the app that will check the password. On Linux: the pamauth binary you just compiled. On Mac (the piano-accordion of computers): use the chkpasswd program shipped with the Mac.

        Now that you have a script auth plugin ready to go all you need to do now is tell splunk about it.

        Example of the authentication.conf bundle.

        [source]
        [auth]
        authSettings = fubar
        authType = Scripted

        [fubar]
        programPath = /opt/splunk/bin/python
        # my python auth script.
        scriptPath = /home/boo/splunk/scriptedAuth/flubber.py
        [/source]

        Now pay attention here: you do need to edit programPath and scriptPath to point at paths on your system.
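
Since a wrong path is the easiest way to break this, a quick sanity check can help before restarting splunk. The sketch below is only an illustration: it reads the stanza with Python's configparser, which is not how splunk itself parses its .conf files and will only cope with simple files like this one. The function names are made up.

```python
import configparser
import os

# The example stanza from above, with the comment left out.
SAMPLE_CONF = """
[auth]
authSettings = fubar
authType = Scripted

[fubar]
programPath = /opt/splunk/bin/python
scriptPath = /home/boo/splunk/scriptedAuth/flubber.py
"""


def scripted_auth_paths(conf_text):
    """Return (programPath, scriptPath) from the stanza named by authSettings."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case (authSettings, programPath) intact
    parser.read_string(conf_text)
    stanza = parser["auth"]["authSettings"]
    return parser[stanza]["programPath"], parser[stanza]["scriptPath"]


def missing_paths(paths):
    """Return the paths that do not exist on this machine."""
    return [path for path in paths if not os.path.exists(path)]


program, script = scripted_auth_paths(SAMPLE_CONF)
print("programPath:", program)
print("scriptPath:", script)
print("missing:", missing_paths([program, script]))
```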

        Things left to do.
        1. Allow users to pass back search filters on userLogin and getUserType.
        2. Allow session management to be handled by scripted input (right now, once auth is confirmed as correct, splunk takes over session management).

        Also this script will not return user lists on the mac (no big deal, you just can't see all users in the admin/users tab). Erik Swan has volunteered to fix this because he loves macs, a little too much really, it's kinda unhealthy.

        Download this and play with it, let me know of any problems.

        I will publish more details on the communication between splunkd and the script, but for the moment you folks can reverse engineer this. It's pretty simple, a lame wildebeest could figure it out.

        More later, for now it's time for beer pong, played for cold hard cash and ugly women.

        Ciao,
        Rory

        ]]>
        Carl Yestrau: Flash/AS3 URLStream Memory Leakhttp://blogs.splunk.com/carl/2007/11/16/flashas3-urlstream-memory-leak/http://blogs.splunk.com/carl/2007/11/16/flashas3-urlstream-memory-leak/Fri, 16 Nov 2007 21:06:15 +0000Carl YestrauLately we have been doing some work with persistent connections. If you are familiar with Comet the Flash/AS3 URLStream class provides an interesting alternative. The URLStream class exposes raw binary data as it is downloaded.

        Unfortunately, this week we ran into a rather tricky memory leak when using this nifty class. An event listener was subscribed to the progress event and over time memory usage steadily increased to a point of making the browser inoperable.

        After a little digging we narrowed the problem down to URLStream's use of the ByteArray. It seems as if URLStream was reallocating a buffer for the array, and the short turnaround time (on the reads) was not giving the garbage collector enough time to throw out the old allocation.

        The leak can be corrected by setting the ByteArray reference to null after each read, which lets the garbage collector reclaim the read buffer.

        Here is the workaround:


        var bytes:ByteArray = new ByteArray();
        this.readBytes(bytes, 0, this.bytesAvailable);
        bytes = null;

        Happy Streams!

        ]]>
        Nick Mealy: reallyDescriptiveNameshttp://blogs.splunk.com/nick/2007/11/07/reallydescriptivenames/http://blogs.splunk.com/nick/2007/11/07/reallydescriptivenames/Wed, 07 Nov 2007 23:20:10 +0000Nick MealyI have a funny habit with our code in the front end, where if something's just too complicated, but I can't see the better solution yet, I'll give its pieces long descriptive names. It's basically so they'll stick out later, we'll think 'why is this thing so ugly and complicated', and it'll help us remember to revisit it. (btw, I'm not claiming that this is good development practice, it's just a trick I use, faintly reminiscent of the blue-wire red-wire stuff in the Mythical Man Month).

        So anyway, I bring it up cause Johnvey saw one of its cousins out in the wild, taking the whole concept to an extreme. Check it out.

        Arguably though, this is so extreme that it's not reallyDescriptiveNames at all, but closer kin to a sort of passiveAggressiveWorkplaceSabotageAdapter.

        ]]>
        Amrit Bath: Saving the environment, one beer pong game at a time.http://blogs.splunk.com/amrit/2007/11/05/saving-the-environment-one-beer-pong-game-at-a-time/http://blogs.splunk.com/amrit/2007/11/05/saving-the-environment-one-beer-pong-game-at-a-time/Mon, 05 Nov 2007 21:42:07 +0000Amrit BathRecycling is universally considered to be a good thing, right?

        Good. Then that means that we at Splunk are obligated to play beer pong every Friday! I figure that with all the bottles and cans that subsequently go into the recycling bin, we're probably offsetting a small percentage of the many computers we use here... amirite?

        Al Gore would be proud

        If you disagree, you can voice your opinions in person. See you here Friday at 5PM. ;)

        ]]>
        Christina Noren: Facebook, privacy and IT datahttp://blogs.splunk.com/cfrln/2007/10/29/facebook-privacy-and-it-data/http://blogs.splunk.com/cfrln/2007/10/29/facebook-privacy-and-it-data/Mon, 29 Oct 2007 23:23:27 +0000Christina NorenFacebook is getting a lot of flak in the press (latest in the Register) about reports on a gossip blog about some pretty serious privacy holes:

        1. anyone that works there can look at anyone's private profile

        2. anyone who works there can look at logs of what other profiles any user has seen.

        If Facebook wants to turn their act around, or any other social networking site wants to avoid being in their position, they'd better pay attention to some best practices around securing and reviewing IT data.

        Here's what best practice would say about Facebook's two problems.

        The first problem - anyone can look at any customer's data - is classically the kind of thing that has brought on regulations in other industries, such as PCI-DSS, which was introduced by VISA to ensure that merchants processing credit cards keep consumer financial info private. Like credit cards, a lot of the information people post to their private profiles is a goldmine for identity thieves - Information Week made this argument about Facebook even before the latest flap. If I know your birthdate and mother's name I'm a lot further along in social engineering an unwitting customer support rep into believing I'm you. And yes, identity thieves do have insiders - ask Ford Motor Credit.

        A major measure that organizations who are following best practices for privacy are supposed to take is to lock down this private information to only insiders with a need-to-know - obviously Facebook's not doing that. But once they do put the right access controls in place, they're going to need to put in a review procedure to watch privileged employees. Facebook's security or privacy staff should be reviewing logs of who has accessed private info and ensuring that there was a valid business reason for each access. The review should include:

        • logs generated by Facebook's application itself to see employees with admin access coming in the front door
        • audit tables for the back end databases to be sure that the database admins who manage the database back-end aren't bypassing the application's permissions and doing manual queries to see what they shouldn't
        • filesystem audit logs, to be sure that server or storage admins aren't bypassing both the database and the app to look at the data on the filesystem itself

        The second problem - that any employee can look at logs of what users have done - is a somewhat less well-understood privacy issue. It's probably particularly bad on a social networking site - do you really want your ex knowing you're watching their profile? But you may not want every Amazon employee being able to see what items you're browsing, so it's an issue that affects almost any site to some degree.

        To address the second issue, logs need to be securely captured into a system that provides appropriate access controls to the logs themselves as well as an audit trail of who's looked at the logs - which the security team should be reviewing proactively. Unfortunately, access logs are hardly ever considered to have privacy implications inside large sites, as evidenced by last year's infamous publication of AOL search records.

        Keeping these logs around that show who looked at what is going to be important too - law enforcement could subpoena Facebook for logs if unauthorized access by their employees is suspected to be a part of a criminal act. Facebook won't want to be in a position where they can't produce the logs.

        The biggest reason Facebook should take this seriously? An overzealous plaintiff's attorney somewhere is probably salivating over all the cash they raked in from Microsoft and figuring out how to sue Facebook for cash damages if a Facebook privacy breach leads to financial losses or serious personal harm, using the argument that by not following the same standard as other sites they've not met their "duty of care." Think they can't do it? TJ Maxx is getting sued right now on similar grounds.

        ]]>
        David Carasso: Tutorial: Event Types in 3.2http://blogs.splunk.com/david/2007/10/27/tutorial-event-types-in-32/http://blogs.splunk.com/david/2007/10/27/tutorial-event-types-in-32/Sun, 28 Oct 2007 03:30:49 +0000David CarassoHi, I'm David Carasso, perhaps you've seen my famous File Classifier Video. It's the number one video at CurrentTV.

        Below is a second screen capture video that I just made to describe Splunk's new Event Typer. The Event Typer dynamically tags system events in custom yet universal ways. For example, I can say that any event that happens on Sunday, that has 'status=Fatal', and that has "sourcetype=weblogic" should be dynamically tagged as a "weekend_fatal_weblogic" event. Topics covered include: what is an event type; how to search, view, and count event types; creating an event type; creating an event-type template; and discovering event-types.
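
If you want the idea in code before sitting through the video, here is a rough sketch of rule-based tagging using the example above. This is not how the Event Typer is implemented; the event dictionaries, the EVENT_TYPES table and the function names are all made up for illustration.

```python
from datetime import datetime


def is_weekend_fatal_weblogic(event):
    """The example rule: happened on Sunday, status=Fatal, sourcetype=weblogic."""
    return (
        event["time"].weekday() == 6  # Monday is 0, so Sunday is 6
        and event.get("status") == "Fatal"
        and event.get("sourcetype") == "weblogic"
    )


# Event type name -> predicate that decides whether an event gets the tag.
EVENT_TYPES = {
    "weekend_fatal_weblogic": is_weekend_fatal_weblogic,
}


def tag_event(event):
    """Return the names of every event type whose rule matches the event."""
    return [name for name, matches in EVENT_TYPES.items() if matches(event)]


sunday_event = {
    "time": datetime(2007, 10, 28, 3, 30),  # a Sunday
    "status": "Fatal",
    "sourcetype": "weblogic",
}
monday_event = dict(sunday_event, time=datetime(2007, 10, 29, 9, 0))

print(tag_event(sunday_event))
print(tag_event(monday_event))
```

The tags are computed from the event each time it is looked at rather than stored with it, which is the "dynamically" part.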

        Yes, production value is what you've come to expect from a Carasso Production. That's right 15 minutes of unscripted nerd talk. Now with a bonus 45 seconds of video as I type in an off-camera window. But I promise you'll learn a few useful things you didn't know.
        EventTyperVideo (15 minutes of emacs magic)

        ]]>
        Kim Wallace: Stupid Perforce Trick #1http://blogs.splunk.com/kim/2007/10/26/stupid-perforce-trick-1/http://blogs.splunk.com/kim/2007/10/26/stupid-perforce-trick-1/Fri, 26 Oct 2007 22:13:28 +0000Kim WallaceWe use Perforce at Splunk, and it's worked out pretty well for us. I'm a CVS admin at heart, and I know there's some SVN sentiment, but p4 gives us a nice mix of atomic commits, attractive GUI and command-line tools, and someone to call for help if it ever completely eats itself.

        Over time I've compiled a small library of scripts for various p4 functions that have been written time and again at different sites...mergetool is one of them. This little tool accepts a merge target ("yours" in p4-speak) and projectile ("theirs" in p4), labels both, performs an integrate, and performs a "safe" resolve -as. It logs any failures for you to resolve by hand, or submits the change set if the resolve completes successfully. It does this with a bunch of logging in a well-organized, date-stamped directory suitable for archiving (or splunking).
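
For the curious, the order of operations looks roughly like the sketch below. This is not the mergetool script itself, just an illustration that builds the p4 command lines without running anything. The label names, depot paths and function names are invented for the example.

```python
def plan_merge(yours, theirs, stamp):
    """Return the p4 commands, in order, as argv lists. Nothing is executed."""
    return [
        # Label both sides first so the pre-merge state can be found again.
        ["p4", "tag", "-l", "pre-merge-yours-" + stamp, yours],
        ["p4", "tag", "-l", "pre-merge-theirs-" + stamp, theirs],
        # Integrate from the projectile ("theirs") into the target ("yours").
        ["p4", "integrate", theirs, yours],
        # Safe automatic resolve: anything with real conflicts is left alone.
        ["p4", "resolve", "-as", yours],
    ]


def finish_step(unresolved_files, stamp):
    """Submit only if the safe resolve left nothing behind."""
    if unresolved_files:
        return ("manual", sorted(unresolved_files))
    return ("submit", ["p4", "submit", "-d", "mergetool " + stamp])


for argv in plan_merge("//depot/main/...", "//depot/dev/...", "2007-10-26"):
    print(" ".join(argv))
print(finish_step([], "2007-10-26"))
```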

        ]]>
        David Carasso: Tutorial: File Classifierhttp://blogs.splunk.com/david/2007/10/26/tutorial-file-classifier/http://blogs.splunk.com/david/2007/10/26/tutorial-file-classifier/Fri, 26 Oct 2007 19:50:51 +0000David CarassoHi, I'm David Carasso and below is a screen capture video I just made to describe Splunk's File Classifier. The File Classifier takes a file and tells you what type it is. From that sourcetype we determine what to do with the file and how to process it. It's pretty critical for properly handling a file, including time-stamping events and aggregating multiple lines into single events. There are several methods that the File Classifier uses to classify a file, and we'll cover each one with real-world examples.

        Yes, production value is at a new low here as I cover 18 minutes unscripted, but I promise you'll learn a few useful things you didn't know. There's a free Splunk t-shirt for the commenter that guesses the actual number of times I say "uhhhhh".

        File ClassifierVideo (18 minutes of action packed emacs video)

        ]]>
        Carl Yestrau: JavaScript Hybrids (Extending the browser) - Part 1http://blogs.splunk.com/carl/2007/10/15/javascript-hybrids-extending-the-browser-part-1/http://blogs.splunk.com/carl/2007/10/15/javascript-hybrids-extending-the-browser-part-1/Mon, 15 Oct 2007 20:33:38 +0000Carl YestrauI deeply enjoy browser programming, however sometimes I wish it could do more. Things like sockets, streams, audio and improved file system handling would be a real treat. Man would it be fresh if I had access to this functionality in JavaScript.

        Now this is going to sound pretty circa 98, but several mainstream browser plugins support a JavaScript communication layer. According to the Millward Brown survey, plugin installations of Flash (99%) and Java (85%) are pretty ubiquitous.

        Flash/JavaScript Communication
        The Flash ExternalInterface class enables communication between JavaScript and the Flash Player. ExternalInterface was first introduced in Flash Player 8, so that is the minimum plugin version required.
        From JavaScript

        • Call an ActionScript function
        • Pass arguments
        • Return a value to the JavaScript caller

        From ActionScript

        • Call a JavaScript function
        • Pass arguments
        • Pass various data types (Boolean, Number, String, etc...)

        Java Applet/JavaScript Communication
        The scarcely documented LiveConnect API provides JavaScript with the ability to call methods of Java classes and vice-versa. Using LiveConnect in applets requires the mayscript attribute and the plugin.jar package for newer versions of Java (Howto for Mac OS X users). Communication from Java to JavaScript is mediated through the netscape.javascript.JSObject class. JavaScript exceptions in Java can be handled using the netscape.javascript.JSException class. Public methods in an applet can be called using the applet container object followed by the method name and arguments (e.g., document.getElementById("myapplet").publicAppletMethod(arg1, argN);).

        From JavaScript

        • Call a Java method
        • Pass arguments
        • Return a value to the JavaScript caller

        From Java

        • Call a JavaScript function (Note: does not seem to support deep objects obj.foo(arg))
        • Pass arguments
        • Pass various data types (Boolean, Number, String, etc...)

        It looks like LiveConnect is due for an overhaul in the near future, so you may want to keep your eyes out for changes on Mozilla developer Josh Aas's blog.

        What's Next
        With the power of Java and Flash this opens up the arena for creating visually hidden gateways (i.e., width:0px; height:0px; applets or swf movies) that extend the browser. Stay tuned for the next part in this series where we make a sample application. Feel the power!

        ]]>
        Kim Wallace: Being the girl in dev at Splunkhttp://blogs.splunk.com/kim/2007/10/12/being-the-girl-in-dev-at-splunk/http://blogs.splunk.com/kim/2007/10/12/being-the-girl-in-dev-at-splunk/Sat, 13 Oct 2007 04:59:59 +0000Kim WallaceLike a lot of tech companies, Splunk's development organization isn't a model of perfect gender balance. For a year and a half now, I've been the only woman in the dev organization.

        Surprisingly, this is not an uncomfortable place to be. In 11 years in industry I've worked in a variety of organizations: the now-bankrupt dot-com best known for putting an ad with a naked guy up during the Super Bowl, 2 major marquee names with vastly differing corporate cultures, a security start-up stocked with emancipated-minor hackers. Aside from that doomed dot-com - which had a surprisingly strong gender balance throughout technical roles and a culture blessedly free of gender-based intimidation at all levels - Splunk may be the most comfortable place I've ever worked. There's no creepy tokenism (unlike stories I've heard about certain other bay area employers), That Guy Who's Never Seen A Girl Before doesn't work here...and as far as I can tell, no one really gets harassed except Amrit.

        Perhaps a better testament for the dev culture than my opinion - because, frankly, I'm pretty weird to start with - is that other women in the company seem to be pretty comfortable visiting the dev area, either on work errands or just to take a break from the sales-focused environment upstairs. Frankly I can't imagine that happens too often in the bay area...and more's the pity.

        ]]>
        David Carasso: Semi-Automatic Discovery of Extraction Patterns for Log Analysishttp://blogs.splunk.com/david/2007/10/12/semi-automatic-discovery-of-extraction-patterns-for-log-analysis/http://blogs.splunk.com/david/2007/10/12/semi-automatic-discovery-of-extraction-patterns-for-log-analysis/Fri, 12 Oct 2007 17:15:46 +0000David CarassoHere's a paper I recently wrote on some of the automatic field extraction we're doing with Splunk.

        Abstract
        This paper presents an interactive bootstrapping process used in Splunk that automatically learns to extract fields from log events. End users simply select one or more example values of a field and a learning process discovers additional instances, along with the patterns to extract them. The user is able to correct the instances and save the extraction patterns. Immediately afterward, while searching log events the newly-taught fields will be extracted from the event's raw text.

        Click here to read full paper

        Feedback appreciated.

        ]]>
        Johnvey Hwang: Trekking in the Galapagoshttp://blogs.splunk.com/johnvey/2007/10/11/trekking-in-the-galapagos/http://blogs.splunk.com/johnvey/2007/10/11/trekking-in-the-galapagos/Fri, 12 Oct 2007 03:34:58 +0000Johnvey HwangThe Splunk cozy has been to a few countries around the world. This month, I took it to the Galapagos, and decided to leave it there at Post Office Bay amongst all the other plaques and memorabilia. I think it'll be very comfortable for a while. See the rest of my Galapagos photo gallery.

        The Galapagos

        ]]>
        Rob Das: Diagraming Splunk’s data-flow (part 2 - performance overlays)http://blogs.splunk.com/rob/2007/10/11/diagraming-splunk%e2%80%99s-data-flow-part-2-performance-overlays/http://blogs.splunk.com/rob/2007/10/11/diagraming-splunk%e2%80%99s-data-flow-part-2-performance-overlays/Fri, 12 Oct 2007 00:49:01 +0000Rob DasIn my previous post "Diagraming Splunk's data-flow" I wrote a small python script that parsed Splunk's runtime environment ($SPLUNK_HOME/var/run/splunk/composite.xml) and generated a file which when input into graphviz would generate a nice architectural diagram of how pipelines and processors are wired together.

        In this installment, I took it to the next level by using Splunk's search capability to overlay performance metrics on the diagram. The combination of Splunk logging metrics information for each processor within each pipeline (thanks Brad) and the ability to have Splunk execute a search processor written in Python made this possible. Here is how you use it:

        First download graphviz. I particularly like the OSX application that they've written because you can see the graph on the screen and as the file changes, those changes are reflected in the graph you are viewing. If you don't have a Mac, use the command line version to generate different types of output file formats like .jpeg, etc.

        Go to SplunkBase to download my python script. Copy the .py file into $SPLUNK_HOME/etc/searchscripts

        Start Splunk.

        Type the following into the search box:
        index::_internal metrics pipeline processor NOT get | perfgraph
        [Screenshot: the search run over all time on localhost, Splunk 3.2-UNSTABLE]
        This will search for the appropriate metrics information and pipe the results through the script.

        There are 2 options to perfgraph:

        perfgraph [output filename] [cpu, execs, cumhits]

        Unfortunately (because I'm lazy) you can't specify cpu, execs or cumhits without also specifying an output file. The first parameter is the full path and file name of the 'dot' file you wish to create. It defaults to /tmp/out.dot.

        The second parameter, if specified tells the script to highlight in red the slowest processor (cpu), the processor with the most hits (execs) or the processor with the most cumulative hits (cumhits). This parameter defaults to 'none', or no highlighting.
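
To make the highlighting concrete, here is a small sketch of the idea: pick the processor with the largest value for the chosen metric and paint it red in the dot output. It is not the perfgraph script from SplunkBase. The processor names and numbers are sample data, and the metric keys simply reuse the option names (cpu, execs, cumhits) rather than whatever field names the metrics log really uses.

```python
HIGHLIGHT_CHOICES = ("none", "cpu", "execs", "cumhits")


def dot_nodes(metrics, highlight="none"):
    """Return one dot node line per processor, with the hottest one filled red.

    metrics maps processor name -> {"cpu": ..., "execs": ..., "cumhits": ...}.
    """
    if highlight not in HIGHLIGHT_CHOICES:
        raise ValueError("highlight must be one of %s" % (HIGHLIGHT_CHOICES,))

    hottest = None
    if highlight != "none" and metrics:
        hottest = max(metrics, key=lambda name: metrics[name][highlight])

    lines = []
    for name in sorted(metrics):
        stats = metrics[name]
        # "\\n" becomes a literal \n in the file, which graphviz renders as a line break.
        label = "%s\\ncpu=%s execs=%s cumhits=%s" % (
            name, stats["cpu"], stats["execs"], stats["cumhits"])
        attrs = 'label="%s"' % label
        if name == hottest:
            attrs += ", style=filled, fillcolor=red"
        lines.append('  "%s" [%s];' % (name, attrs))
    return lines


sample = {
    "aggregator": {"cpu": 0.42, "execs": 120, "cumhits": 9000},
    "linebreaker": {"cpu": 1.75, "execs": 118, "cumhits": 8800},
}
print("\n".join(dot_nodes(sample, "cpu")))
```

Wrap the returned lines in a `digraph { ... }` block and graphviz will draw them.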

        The above search string results in the following graph (portion). Notice the performance information overlaid on the processors:
        out.dot-1.jpg

        If you specify the output file and 'cpu', the processor with the most cpu time will be highlighted. Here is the search:

        index::_internal metrics pipeline processor NOT get | perfgraph /tmp/out.dot cpu
        [Screenshot: the search run over all time on localhost, Splunk 3.2-UNSTABLE]

        It results in the following graph (portion). Notice the red processor:

        out.dot-2.jpg

        Next steps:

        • Overlay queue metrics into the queue nodes
        • Overlay indexer throughputs into the indexer nodes

        You see. Splunk provides endless fun. Insane! Enjoy.

        ]]>
        Rob Das: Diagraming Splunk’s data-flowhttp://blogs.splunk.com/rob/2007/10/10/diagraming-splunks-data-flow/http://blogs.splunk.com/rob/2007/10/10/diagraming-splunks-data-flow/Wed, 10 Oct 2007 16:57:59 +0000Rob DasThis blog entry is not about how the framework works. It is about a semi-cool visualization that I created using python and graphviz. If you watched the video where I presented Splunk's framework architecture from a high level you know what pipelines and processors are. If you haven't, here is a very quick overview.

        • A pipeline is a thread of execution that lives within the splunkd process. Each pipeline executes a series of processors, each of which operates on data. The data is created when the first processor on the pipeline reads it from some input (like tailing a file, or receiving it on a network port). Each processor then does something to the data. Eventually, the data gets indexed and execution is returned to the first processor to get more data again.
        • Pipelines are connected via queues. A queue output processor (the last processor in a pipeline) puts data on to a queue and blocks if the queue is full. A queue input processor (the first processor at the top of a pipeline) gets the data item from the bottom of the queue and sends it on down the pipeline. If there is no data, it blocks waiting for some to be put on the queue.
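
The two bullets above map onto a very familiar pattern: threads joined by a bounded, blocking queue. Here is a toy version in python, purely to illustrate the blocking behaviour. splunkd is not written like this, and the function names and the "processors" (strip and upper-case) are made up.

```python
import queue
import threading

DONE = object()  # sentinel telling the downstream pipeline to stop


def input_pipeline(lines, out_queue):
    """First pipeline: read data, run a processor, then the queue output step."""
    for line in lines:
        item = line.strip()   # a stand-in processor
        out_queue.put(item)   # blocks while the queue is full
    out_queue.put(DONE)


def index_pipeline(in_queue, indexed):
    """Second pipeline: queue input step, a processor, then 'index' the data."""
    while True:
        item = in_queue.get()  # blocks while the queue is empty
        if item is DONE:
            break
        indexed.append(item.upper())


def run(lines, queue_size=2):
    """Wire the two pipelines together with a bounded queue and run them."""
    connecting_queue = queue.Queue(maxsize=queue_size)
    indexed = []
    threads = [
        threading.Thread(target=input_pipeline, args=(lines, connecting_queue)),
        threading.Thread(target=index_pipeline, args=(connecting_queue, indexed)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return indexed


print(run(["one\n", "two\n", "three\n"]))
```

With a queue size of 2 the input pipeline stalls as soon as it gets two items ahead, which is the same back-pressure a full queue gives a queue output processor.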

        Enough already. Go watch the video. So, I decided that I'm tired of drawing these diagrams and wrote some code to produce them for me.

        I implemented some python code that took the composite.xml file, parsed it and produced a .dot file. Composite.xml, for those of you who don't know, is an amalgamation of all pipelines and processors in the system. It represents the current (or last) runtime environment for Splunk. It lives in $SPLUNK_HOME/var/run/splunk.

        I then took the resultant .dot file and ran it through graphviz. After lots of tweaking, here is what I came up with. Click on the image to see a larger version which is actually readable.
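
The shape of the transformation is simple enough to sketch in a few lines. Note that this is not the code in viz.tar, and the XML below is an invented stand-in: the real layout of composite.xml is not shown in this post, so the element names, attribute names and processor names here are assumptions.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for composite.xml; the real schema is not shown in the post.
SAMPLE_XML = """
<composite>
  <pipeline name="parsing">
    <processor name="readerin"/>
    <processor name="linebreaker"/>
    <processor name="sendout"/>
  </pipeline>
  <pipeline name="indexing">
    <processor name="indexin"/>
    <processor name="indexer"/>
  </pipeline>
</composite>
"""


def pipelines_to_dot(xml_text):
    """Emit one cluster per pipeline, chaining its processors in order."""
    root = ET.fromstring(xml_text.strip())
    lines = ["digraph splunk {"]
    for pipeline in root.findall("pipeline"):
        pipeline_name = pipeline.get("name")
        names = [proc.get("name") for proc in pipeline.findall("processor")]
        lines.append('  subgraph "cluster_%s" {' % pipeline_name)
        lines.append('    label="%s";' % pipeline_name)
        for name in names:
            lines.append('    "%s";' % name)
        # Each processor hands its data to the next one in the pipeline.
        for upstream, downstream in zip(names, names[1:]):
            lines.append('    "%s" -> "%s";' % (upstream, downstream))
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


print(pipelines_to_dot(SAMPLE_XML))
```

Save the output as a .dot file and run it through graphviz to get a picture.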

        Results (click to enlarge)
        Auto-generated pipeline graph

        Python Transformation Code

        Untar this. It's only a single python file, but this blogging software wouldn't let me upload a .py file.

        viz.tar

        Future Work

        • Annotate the graph with run time statistics like average per-processor timing, average queue size, max queue size, etc. This would require looking at the logs.
        • Launching this from Splunk, firing off the python along with the metrics data pre-sifted ala Splunk.

        Got more ideas? Please post them here.

        ]]>
        Amrit Bath: Things you don’t want to hear at workhttp://blogs.splunk.com/amrit/2007/10/09/things-you-dont-want-to-hear-at-work/http://blogs.splunk.com/amrit/2007/10/09/things-you-dont-want-to-hear-at-work/Wed, 10 Oct 2007 01:21:08 +0000Amrit BathLots of things are said here that are... hmm, what's the word... inappropriate? disgusting? TMI? omgwtfbbq?

        My boss just told me, "Amrit, I have a camera on my computer. And when I'm at home, anytime you want, I can turn on the camera and you can watch."

        There was more, but I think my ears reflexively closed in on themselves.

        do not want

        :/

        ]]>
        Rob Das: The framework team is hiringhttp://blogs.splunk.com/rob/2007/10/09/the-framework-team-is-hiring/http://blogs.splunk.com/rob/2007/10/09/the-framework-team-is-hiring/Tue, 09 Oct 2007 18:52:30 +0000Rob DasSplunk's framework team is involved in many diverse projects. The "framework" itself is really a set of generic code that makes up the runtime environment of Splunk. In addition, we also handle bringing data into the system, distributing this data across enterprise topologies, authentication, access controls, configuration management, distributed deployment, high availability, real-time streaming, encryption and much much more.

        Splunk is extending its reach into extremely large deployments involving thousands of machines and devices across multiple data centers. The framework team is responsible for making Splunk excel in these challenging environments. If this sounds interesting and you want to work with some extremely talented people, please drop me some email.

        Framework Architect / Senior Engineer

        We are looking for a highly motivated engineer who will be responsible for driving the design and implementation of Splunk's network management, scalability, and distributed deployment technology. The right candidate is fluent in C++, high performance networking and concurrent / multi-threaded design.

        Qualifications

        • Minimum 5 years of relevant industry experience
        • Expert C++ knowledge, deep understanding of design patterns and experience building clean external APIs.
        • Significant experience with multi-threaded design and implementation
        • Has designed & implemented high throughput server systems
        • Practical experience with network protocols and complex topologies
        • BS/MS Computer Science / Engineering
        • Excellent verbal and written communication skills
        ]]>
        Rory Greene: I’m cold and there are wolves after mehttp://blogs.splunk.com/rory/2007/10/08/im-cold-and-there-are-wolves-after-me/http://blogs.splunk.com/rory/2007/10/08/im-cold-and-there-are-wolves-after-me/Tue, 09 Oct 2007 00:53:43 +0000Rory GreeneJust fresh from the splunk poker game. Good fun, made a whopping $10. Jef looked like
        he was on the verge of paying for his kids' education. Maverick even threatened to sing,
        good times.

        So Erik did a pretty good job of describing the environment here at splunk.
        The people here are great and lots of fun, there are some great problems
        just begging to be solved, we need more monkeys on them typewriters

        Poker games, golf, visits to the jackson arms, beer pong, foosball
        (Raffy really needs a challenge )

        Don't worry about that collage bit http://en.wikipedia.org/wiki/Collage

        Erik insists everyone draw a picture of themselves in crayon, but really
        who doesn't ask for that in a serious interview these days.

        In the coming weeks I'm going to be working on a way to allow people to
        plug in their own auth systems. We've had requests running the gamut from
        the normal stuff like PAM, RADIUS etc to carrier pigeon and bob's trusty
        auth system. The most common thread of all these is that they are all scriptable.
        You folks know your own auth systems. We'll throw this in the unstable
        release/dev branch that we'll be launching and hopefully get some feedback
        from you folks to fine tune it before we put it into stable.

        Now that I've said that in public I'm well and truly screwed and will have to do it.

        ]]>
        Bob Fox: Field Definitions and Splunk’s extract Commandhttp://blogs.splunk.com/bob/2007/10/07/field-definitions-and-splunk%e2%80%99s-extract-command/http://blogs.splunk.com/bob/2007/10/07/field-definitions-and-splunk%e2%80%99s-extract-command/Mon, 08 Oct 2007 01:57:17 +0000Bob FoxThe 3.0 version of Splunk has introduced some wonderful new features such as advanced reporting, granular access control and a slew of additional functions to help you search through your IT data. One of these newly released functions is the extract command. This works very nicely with Splunk’s revamped facility to add, view, and access field names. Here is a quick primer on creating field definitions and using the extract command to have those definitions reloaded automatically.

        Splunk has always done a great job at allowing you to search on any text from any data source. Splunk even goes one step beyond this and automatically defines named fields for data that shows up in a Keyword = Value (KV) pair. If my data contains text that looks like

        username=sparky

        then Splunk will key in on those values, allowing me to search and report more precisely on those values. For instance I could say

        * | where username <> "sparky"

        to get back all of the records where sparky did not show up as a username.

        But what if my data is not so friendly? Consider an event that looks like this:

        Invalid login attempt by sparky on host kinja

        While the data is all there and searchable, there is no easy way to home in on the fact that sparky is the username. Of course, I could simply include (or exclude) all events that had the term sparky with the search:

        * NOT sparky

        but let’s say I wanted to be more specific. I don’t want to exclude events like:

        Invalid login attempt by badguy on host sparky

        Fortunately Splunk allows me to define fields so I can specify exactly what data is exposed.

        There is a full write up on extracting additional fields here but in short, I need to configure Splunk with some hints on how to find that username, and what to call it when I do find it. And I will probably want to do this all within a Splunk bundle to keep things portable and maintainable, but that’s another blog entry.

        The first step will be to define a regular expression that will isolate the username in the event. We could set up this definition in our bundle’s transforms.conf file:

        [get-username]
        REGEX = by\s(\w+)\son
        FORMAT = username::$1
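
To sanity-check the pattern before putting it in transforms.conf, you can run it against the two sample events outside of Splunk. The following is only a rough sketch using Python's re module, not anything Splunk ships; the helper name extract_username is made up for illustration, and Python's regex flavor is assumed to behave the same as Splunk's for a pattern this simple.

```python
import re

# Same pattern as the REGEX line in the [get-username] stanza.
USERNAME_RE = re.compile(r"by\s(\w+)\son")

# The two sample events from the text above.
events = [
    "Invalid login attempt by sparky on host kinja",
    "Invalid login attempt by badguy on host sparky",
]


def extract_username(event):
    """Hypothetical helper: return capture group 1 ($1 in FORMAT), or None."""
    match = USERNAME_RE.search(event)
    return match.group(1) if match else None


usernames = [extract_username(e) for e in events]
print(usernames)
```

The second event is the interesting one: the word sparky appears in it, but only as the host, so the extracted username is badguy and a filter on the username field no longer throws that event away.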

        Secondly, we will need Splunk to apply this regular expression on the events of a particular sourcetype. We’ll do this at searchtime to allow the definition of these extracted fields to be dynamic. This is accomplished by adding a line to the props.conf file that defines the sourcetype of our events:

        [securitylog]
        REPORT-secure = get-username

        Last, but not least, we need to define which of our inputs will be using this sourcetype. For simplicity, let’s look at an example of a tailed file with a hardcoded sourcetype. This definition will exist in our inputs.conf file.

        [tail:///path/to/my/datafile]
        sourcetype = securitylog

        Now that all the heavy lifting is done, we need to apply these properties to the running Splunk instance. This (finally) is where extract comes in.

        Extract allows us to test the regular expression that we have defined within transforms.conf. More importantly, it lets us reload the props and transform without restarting the server. We accomplish this by including the extract command inside of a Splunk search. For example:

        sourcetype::securitylog | extract reload=T

        Now I should see username listed under the "Fields" tab of my Splunk screen. Make sure that the core only option is unchecked to see the custom defined fields.

        There you have it - a quick intro to field definitions and the extract command. Check out the release notes to view all of the new Splunk features.

        ]]>
        Stephen Sorkin: the search and indexing team is hiringhttp://blogs.splunk.com/ssorkin/2007/10/06/the-search-and-indexing-team-is-hiring/http://blogs.splunk.com/ssorkin/2007/10/06/the-search-and-indexing-team-is-hiring/Sat, 06 Oct 2007 17:03:21 +0000Stephen Sorkinhello world. i'm the manager of the search and indexing team at splunk. our team is responsible for the amorphous category of "data processing and storage." this includes such tasks as character set normalization, grouping multiple consecutive lines into logical events, timestamp extraction, metadata extraction, indexing and storage on the input side. on the retrieval side, we maintain the APIs to access the index and all the various transformations that fall under the label "reporting," like automatic key/value extraction methods that take raw text and produce semi-structured data. we have more problems to solve than people to solve them, so we're looking to grow our team in the near future. below are some of the positions that we're hiring for, but if you're smart, clever and creative, we probably have a spot for you.

        Indexing Architect
        We're looking for an exceptionally talented engineer to drive the design, implementation and maintenance of our core indexing and search technology. The right candidate will have significant experience writing high performance C/C++ code that interacts with the file system at low levels.

        Qualifications

        • Minimum 5 years of relevant industry experience.
        • Expert level knowledge of C/C++ programming and a deep understanding of multi-process, highly concurrent software design.
        • BS/MS Computer Science/Engineering. PhD is welcome.
        • Experience in algorithm and data-structure design.
        • Excellent verbal and written communication skills.

        Search and Indexing Engineer (all levels)
        We're looking for exceptionally talented engineers, from recent college grads to seasoned software veterans, to contribute technically to Splunk's Search and Indexing team. The right candidate will care deeply about writing high-quality, efficient and maintainable software.

        Qualifications

        • Strong background in C/C++ programming.
        • BS/MS Computer Science/Engineering or related field. PhD is welcome.
        • Understanding of algorithmic complexity and data-structure tradeoffs.
        • Excellent verbal and written communication skills.

        Data Mining Architect
        Do you love data? Terabytes of semi-structured, inconsistent, machine-generated data? If you're creative and have inspired ideas of how to summarize, group and link this data, we're looking for you.

        Qualifications

        • PhD in Statistics, IEOR, Computer Science or related field.
        • Strong modern AI and statistics background.
        • Solid command of one or more scripting languages.
        • Understanding of algorithmic development and data-structure tradeoffs in C/C++.
        • Strong publication history.
        ]]>
        Kim Wallace: Packaging Splunkhttp://blogs.splunk.com/kim/2007/10/05/packaging-splunk/http://blogs.splunk.com/kim/2007/10/05/packaging-splunk/Sat, 06 Oct 2007 00:30:06 +0000Kim WallaceSplunk runs on a lot of platforms for a relatively young product and that number is always increasing. The day I started, there were packages for Intel and PowerPC Macintoshes, i686 Linux, Solaris 8 on Sparc, and FreeBSD on x86, all created with BitRock InstallBuilder, run from a simple shell script, usually by Erik. There really wasn't much control over what went into the installer - if a file was in the installer prep directory and the shell script didn't know to delete it, out it went.

        By the time 2.1 was on its way, we'd decided to switch to native packages, and our list of platforms had expanded to include Solaris on Intel, with several more on the horizon. We also wanted to provide the "rail tarball" distribution we continue to support, in part so that QA could get started before the packaging automation was complete.

        What is that packaging automation, you might ask? Obviously writing custom code to package each platform (not to mention spec or pkgmap files in each platform's native format) was not a very maintainable solution. Instead we use a locally modified version of Easy Software's EPM package manager. After a little work, EPM lets us use a common set of list files to create relocatable packages using common pre- and post-install scripts across all of the 9 platforms we now build on. We're able to control every file and permission that goes into the packages, and in most cases we can add packaging for a new OS platform with a minimum of work (for something very different we haven't previously had in house, like AIX, more time might need to be spent cleaning up EPM's support for the platform). We've piggy-backed creation of the "rail tarball" distributions on the EPM list file structure, so those packages too are completely defined. EPM itself is built during the Splunk build process like any other 3rd party dependency, so any new patch to the tool is available to the build systems almost as soon as it's checked in.

        The downside to this is a little loss of flexibility for some platforms; trivial changes to the RPM spec file or FreeBSD ports originpath have usually required modifying EPM's source. It's not a bad trade-off, though.

        EPM has recently been made open source; check it out at http://www.epm-home.org . If you're interested in my patches, feel free to drop me a line, but in some cases superior patches have been contributed at epm-home.org.

        ]]>
        Christina Noren: Splunk as job qualificationhttp://blogs.splunk.com/cfrln/2007/10/05/splunk-as-job-qualification/http://blogs.splunk.com/cfrln/2007/10/05/splunk-as-job-qualification/Fri, 05 Oct 2007 19:09:33 +0000Christina NorenThis is a fun trend for us here at Splunk - more and more job descriptions are listing Splunking skills as a plus. Really rewarding for those of us who've been here since before the 2005 beta!
        Here are a few jobs that want you to know your Splunk:

        Got any more? Post 'em in the comments!

        ]]>
        Rob Das: Software configuration - why does this wheel need re-invention?http://blogs.splunk.com/rob/2007/10/02/software-configuration-why-does-this-wheel-need-re-invention/http://blogs.splunk.com/rob/2007/10/02/software-configuration-why-does-this-wheel-need-re-invention/Wed, 03 Oct 2007 02:01:09 +0000Rob DasI have worked on so many software projects that I can't possibly enumerate them. Most of my contribution to these projects has been on the server side of things. Every one of these projects needed to be configured in some way, shape or form and I just realized that every one of them had its own configuration subsystem that was implemented from scratch. Many of these configurations could be managed via GUIs and/or CLIs, and others simply were "managed" via vi, or emacs. They all share one thing in common however - they all suck in one way or another. Why? Because configuration subsystems are incredibly difficult to get right.

        Building a configuration system on the surface seems boring. If I went and showed the sales guys how cool my configuration system was they would roll their eyes back into their heads. Put some rotating, flashing thing on the GUI and they think you're the coolest, most creative developer around. The fact is that a good configuration system makes a huge difference to a product. In fact, it can make or break it in some cases.

        Indulge me in allowing me to share a typical "configuration system lifecycle". Please tell me if this seems familiar to you. I have personally gone through this many times.

        • Version 1.0 - simple configuration language, usually XML. Why? Because you need to get something up and running quickly. XML has tons of parsers, validators, etc. Users of this early release need to edit the configuration files using a text editor. They need to restart the system every time a change is made. The developer states that this is fine - the product is "not intended for use by people that can't use an editor". Fuck em'.
        • Version 1.5 - The next release has some really complex configuration. However, it's still only modifiable via a text editor. Maybe flow control is introduced. Changing a configuration in the wrong way causes very bad and very weird things to happen. Customer Support gets lots of calls. There is no way to tell what a customer changed and what the default configuration was supposed to be without comparing the two configuration files side by side.
        • Version 2.0 - We need an administration GUI so people can configure this without having to call support every single time! So a GUI is added. Every administered item is coded into the server and into the GUI because every configuration has different validation, different things to check, etc. The customers are much happier. Until graybeard decides he hates the GUI and insists on using emacs. The GUI and emacs don't get along very well. Things break again.
        • Version 2.5 - The executives decide that we need a way for "the community" to build widgets that other people can use. They need to package these widgets up in some way that they can be downloaded and added to the system without disturbing local and default configurations. The engineers decide to use layering to separate these 3 things out. But layering in XML is nasty and people will get confused. So out with the XML to something "simpler". Boy did this open a can of worms. All the different parts of the system need to be modified to handle the new configuration syntax. We are just about ready to ship. Boy is this code base different - "Oh SHIT! We forgot we need migration scripts!". So they are frantically built and hastily tested. The product ships. Customers complain. Not only do the migration scripts hork periodically, but the configuration language is new to them.
        • Version 3.0 - The server engineers are adding lots of new features to support customer requirements. Unfortunately, every new feature needs custom GUI and CLI work to handle the administration of that feature. This is simply not sustainable, so it's been decided to data drive the GUI and CLI from a specification file that describes the syntax, the interdependencies, etc for each configuration item/file. Furthermore, the community is going gangbusters, but downloading new widgets requires a restart of the server. So do all configuration changes. Once again every part of the system is changed to handle this dynamic configuration. Man is this hard - "What do I do with the data that is already in the queues when the queue is supposed to be shrunk in this re-configuration?" asks one of the brightest engineers. Hmm.

        You get the idea.
        So here in a nutshell is a list of reasons why configuration systems are so difficult. I'm sure you can add more:

        • They are actually small languages. I have seen XML, simple linear lists of attribute/value pairs, scripting languages with flow control, strange and weird languages like in sendmail, etc.
        • They need validation so they don't break the system
        • If there is GUI or CLI access, they need to be dynamically updated
        • Consistency between updates is critical so that someone editing a config file using vi doesn't collide with someone using the GUI.
        • They need to be migrated from version to version or need some kind of backward compatibility
        • They can be layered so local changes override system defaults
        • They need to be extensible so ultimately 3rd parties can develop configurations that are add-ons
        • They need solid documentation - ultimately self generating.
        • They should be data driven such that every time someone invents something that needs new configuration, the GUI and/or CLI doesn't need new code.
        • They need to support dynamic loading with no system restarting
        • They may need to support versioning in systems that are composed of modules, each of which may be independently revved.
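
To make the layering and validation points a little more concrete, here is a minimal sketch in Python. It is not how any particular product does it; every name in it (DEFAULTS, ADDON, LOCAL, SPEC, the setting names) is invented for illustration. It assumes three layers where local settings beat add-on settings, which in turn beat system defaults, and a tiny spec that drives validation so a new setting only needs a new spec entry.

```python
from collections import ChainMap

# Hypothetical layers: system defaults, a downloaded add-on, local edits.
DEFAULTS = {"queue_size": 1000, "log_level": "info"}
ADDON = {"widget_enabled": True, "log_level": "warn"}
LOCAL = {"queue_size": 500}

# Hypothetical spec: each known setting maps to its expected type.
SPEC = {"queue_size": int, "log_level": str, "widget_enabled": bool}


def validate(layer):
    """Reject unknown keys and wrongly typed values before a layer is used."""
    for key, value in layer.items():
        if key not in SPEC:
            raise KeyError(f"unknown setting: {key}")
        if not isinstance(value, SPEC[key]):
            raise TypeError(f"{key} must be {SPEC[key].__name__}")


def effective_config(*layers):
    """Merge layers; earlier arguments take precedence over later ones."""
    for layer in layers:
        validate(layer)
    return dict(ChainMap(*layers))


config = effective_config(LOCAL, ADDON, DEFAULTS)

# Because the layers stay separate, it is cheap to report what was changed
# locally relative to the shipped defaults.
local_overrides = {k: v for k, v in LOCAL.items() if DEFAULTS.get(k) != v}

print(config)
print(local_overrides)
```

Keeping the layers as separate data rather than editing one file in place is what lets support see what a customer changed versus what shipped. It does nothing for the harder items on the list, such as dynamic reloading or migration.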

        Conclusion

        Configuration systems are often overlooked, but can be the core of an entire system. There is no substitute for a really good one. It's almost impossible to get it right the first time, but you must really think long and hard about where you want it to go and what you want it to become.

        Yes. I copped out. I didn't tell you how to do these things. I didn't tell you where you can look on SourceForge to find the ultimate configuration system so you don't need to re-invent the wheel yet again. That is because there is none - at least not that I know of. I have some ideas on how to build a generic configuration system that if open-sourced could save engineers months of time, but that is the topic of a different post.

        ]]>
        Eric Garner: My Interview with an IT Eventhttp://blogs.splunk.com/maverick/2007/09/23/my-interview-with-an-it-event/http://blogs.splunk.com/maverick/2007/09/23/my-interview-with-an-it-event/Mon, 24 Sep 2007 04:14:18 +0000Eric GarnerThe following is a short interview I conducted with an IT event that I discovered last week while investigating an issue within my data center.

        Maverick
        Hello and thank you for taking time to participate in this interview.
        IT Event
        No problem. Thanks for having me, Mav.

        Maverick
        So tell us a little bit about yourself. What kind of event are you? Syslog? Web App? Proxy Log?
        IT Event
        Sure. I'm a syslog event.

        Maverick
        I see. Any particular kind?
        IT Event
        Well, I'm NOT a syslog-NG event, if that's what you mean. Just plain standard syslog.

        Maverick
        No. I mean, what type? User event? SNMP trap? Something like that?
        IT Event
        Oh, yeah, I'm an sshd "session opened" event.

        Maverick
        As in reporting USER activity?
        IT Event
        Precisely.

        Maverick
        That makes sense. So when were you written out to the log file, exactly?
        IT Event
        A couple weeks ago. My timestamp is Sep 7 10:36:17, assuming you are interested in my details.

        Maverick
        Of course. Why would you think I'm not interested in your details?
        IT Event
        Well, most of the time we go unnoticed, is all. Most of the time me and all my fellow events just sit in our log file until it gets rotated out and eventually written over.

        Maverick
        You seem somewhat bitter about that. Why?
        IT Event
        Well, Mav, you would be bitter too if you had something important to say and no one to listen to you.

        Maverick
        Well, in all honesty, you are one out of thousands of syslog events that report USER activity in real-time and on a continual basis. The importance of your details, what you have to say, etc, is relative to each specific situation, don't you think?
        IT Event
        See? That's exactly what I thought you would say. That it's all "relative". That I'm not "important". What I have to "say" is irrelevant until I'm applied to some "context" or "correlation". You sysadmins are all the same. You just don't get it!

        Maverick
        Well, technically, I'm an SE, not a sysadmin...
        IT Event
        WHATEVER!

        Maverick
        WOW! Settle down, dude...everything is okay...
        IT Event
        (taking a deep breath)...sorry...

        Maverick
        No problem....Some anger there, huh? This really bothers you, doesn't it? Not being noticed?
        IT Event
        Yes, it does. I mean, I do have a purpose, a voice, something to say, and I have a need to be heard like everyone else.

        Maverick
        I understand. We all need that. I didn't mean to imply that you were not important. I was just saying...
        IT Event
        I know what you were saying. It's okay. You don't have to explain. It's not your fault. It's just the way things are. It's also one of the reasons we started the Association for Equal Rights for Events Everywhere, or AEREE.

        Maverick
        AEREE? Who is doing this? You and your fellow syslog events?
        IT Event
        Actually, ALL of the events from ALL of the log files in your IT environment as well as many other data centers around the world got together to form AEREE.

        Maverick
        Wow. I had no idea. That's great! I'm happy for you.
        IT Event
        Yeah, well don't get TOO excited yet. We just started. We still have a long way to go, a tough journey ahead of us, if you will. But we think Splunk will help us raise awareness for our cause, so I'm not too concerned.

        Maverick
        You mean you think Splunk can help you promote event equality?
        IT Event
        Yes, exactly.

        Maverick
        That makes sense. With its robust universal real-time indexing and time-series searching technology, I can see how the Splunk platform could help the voice of AEREE to be heard by sysadmins, developers, operations folks, etc, pretty much anyone within a company or organization, for that matter.
        IT Event
        Well that's our hope, at least. We'll see.

        Maverick
        Excellent! Well again, thank you for your time and good luck with AEREE. I wish you the best.
        IT Event
        Thank you, Maverick.

        If you found this interview interesting or if you have a story about an IT event of your own, please leave a comment and share. -Mav

        ]]>
        Ben Strawbridge: Scrum caps for scrumshttp://blogs.splunk.com/ben/2007/09/21/scrum-caps-for-scrums/http://blogs.splunk.com/ben/2007/09/21/scrum-caps-for-scrums/Fri, 21 Sep 2007 20:32:02 +0000Ben StrawbridgeWe have been using agile development processes at Splunk for the past few months, including sprints, daily standing meetings and functional scrum groups. Our fearless chief mind, David, suggested that we should have a team leader hat, like they wear for a rugby scrum, to protect them from thrown objects.

        Our scrum leader

        I thought it was a great idea too, what do you think?

        ]]>
        Nick Mealy: Intangibleshttp://blogs.splunk.com/nick/2007/09/19/intangibles/http://blogs.splunk.com/nick/2007/09/19/intangibles/Wed, 19 Sep 2007 23:38:02 +0000Nick MealyThere are lots of subtle things that are required for a good user experience. Simplicity, speed, comprehensibility, consistency. These are the core values of any software, and there's a spectrum on which they're at the other end from 'Features'.

        Features are cool. They make you sound smart. Whether you're a customer talking to a sales guy, or an engineer fleshing out an idea you had. New stuff tends to show up in sentences as the word 'feature'. It's exciting. Sure it has a certain cost in speed or something. It tends to not color entirely within the lines. But that's OK. It's new, therefore it's cool.

        Jumping forward many years though, everything at some point was new, and gets old and those costs start to suck. After a while these intangibles have pretty much been sacrificed away and you're in large-company hell, sitting in endless meetings trying to figure out how and when everything got all bloated and slow.

        So, we're trying to swim against this current as we scale (no kidding). We're trying to prioritize speed and simplicity. We're trying to keep talking to users in the trenches. We're fighting off checkbox-itis, we're trying to have new corner-case features built in offshoot, quasi-standalone manners, we're trying to use the extensible architectures we have, and create more of them when needed.

        In short, we're trying to keep that thing about Splunk that is cool. That you can get going quickly, you can set it up quickly, you can change directions quickly. It's yours to drive. When you want to do something with Splunk for the first time, it generally makes sense and doesn't take very long.

        So enter YOU! We sometimes suck at all this, and could use your help to suck less. And if you the user want to talk to us, about how you use Splunk, what searches you run with it, what stuff you've found easy, what you've found hard, we want to know. It's not terribly hard to figure out my email address since you know my name. So email me, or email ui. Really. =)

        We have some nifty but lightweight web-ex type stuff that just requires a browser, and we can do a 10min conference call and watch you drive your splunk instance around from the safety of our desks. Email me for an invite.

        ]]>
        Nick Mealy: wayback machinehttp://blogs.splunk.com/nick/2007/09/19/wayback-machine/http://blogs.splunk.com/nick/2007/09/19/wayback-machine/Wed, 19 Sep 2007 19:01:54 +0000Nick MealyI'm a pretty nostalgic guy, so hanging out with me there's a lot of 'back in the day', 'onion on my belt' kind of stuff. You have been warned.

        So my history at splunk - I started here in March '05. First UI Developer, inheriting the front end built by our notorious founder Erik Swan. They brought me in as a dHTML guru and gave me free rein (crossed fingers notwithstanding). But for better or for worse Splunk has always been pretty different on the client-side. Even the alphas and private betas were all client-side XSLT and had that holy crap moment where you wonder why the hell everything is clickable and lighting up on mouseover.

        Then during the sprint to 3.0 we ran off in even crazier directions, and did all the things we'd talked about doing, but held back from (eg endless pager, free form charting in Flash, rethinking the timeline interactions, replacing the tabs with more compact layers).

        From this point forward though, there will be more building out and less building up if that makes sense. ie no more monolithic single all powerful UI, but rather links between quasi-standalone bits. And on the monolith, instead of bolting on new features, we'll be solidifying things, cleaning, improving, fixing.

        That said, there will still be a lot of unusual and useful interactivity. Actually probably more so overall if the monolith-maintenance burden falls as expected.

        So interesting times here at Splunk. More news as it comes.

        And oh, we are hiring. We are extremely hiring. If you are interested, or you know someone who's interested, or a friend of yours is really smart and you think he needs a new job... Send them our way.

        ]]>
        Johnvey Hwang: Driving Miss Erikhttp://blogs.splunk.com/johnvey/2007/09/18/driving-miss-erik/http://blogs.splunk.com/johnvey/2007/09/18/driving-miss-erik/Tue, 18 Sep 2007 23:33:56 +0000Johnvey Hwang Internal view: ]]>Adventures on a mini-bike amongst the boxes in engineering:

        External view:

        Internal view:

        ]]>
        Johnvey Hwang: Dev vs. Support Boat Racehttp://blogs.splunk.com/johnvey/2007/09/18/dev-vs-support-boat-race/http://blogs.splunk.com/johnvey/2007/09/18/dev-vs-support-boat-race/Tue, 18 Sep 2007 23:11:08 +0000Johnvey Hwang]]>Dev destroys support in a 4 on 4 boat race.

        ]]>
        Johnvey Hwang: AjaxWorld 2007http://blogs.splunk.com/johnvey/2007/09/17/ajaxworld-2007/http://blogs.splunk.com/johnvey/2007/09/17/ajaxworld-2007/Tue, 18 Sep 2007 01:45:34 +0000Johnvey HwangFor all you hardcore Web 2.0 fanboys, I'm giving a talk at AjaxWorld on "High-Performance AJAX Application Design" down in Santa Clara at the end of September. The official blurb is:

        Designing an AJAX application that meets enterprise scalability and performance requirements presents technical challenges that aren’t addressed by traditional AJAX frameworks. This session will highlight the techniques used in Splunk to address handling large amounts of data in the browser, persistent multi-panel state management, interface customization and localization, and interactive DOM-accessible graphics support. By leveraging existing, though less common, techniques such as iframe-style AJAX, in-browser XSLT, and contextual CSS, modern browsers can provide a compelling interface without the need for a thick-client installation.

        Come by and say hi.

        ]]>
        Christina Noren: Automating and opening up product planninghttp://blogs.splunk.com/cfrln/2007/09/15/automating-and-opening-up-product-planning/http://blogs.splunk.com/cfrln/2007/09/15/automating-and-opening-up-product-planning/Sun, 16 Sep 2007 03:14:27 +0000Christina NorenThe PM and engineering teams have embarked on an interesting experiment here at Splunk. While we've always leveraged the support case system to track enhancement requests and automate some of the input end of the product management process, the real meat of product definition has happened pretty much as it does anywhere - via product requirements documents (PRDs) written by PMs and answered by a variety of technical specifications, bugs and tasks in the engineering tracking system, emails, whiteboard sessions, etc.

        OK, it's Splunk, so the PRDs and tech specs have always been on the corporate wiki so there's some measure of collaboration. Anyone in the company could go up there and have a look at what was in progress. But it's been pretty difficult to keep PRDs and specs fully up to date while we've been innovating as quickly as we have since the initial launch of the product in 2005. And it's been impossible to give our customers and field sales engineering teams the level of transparency we want in order to get their full involvement.

        Our public roadmap has to be created manually and is of necessity fairly high level and updated only every month or so. The other PMs and I are constantly fielding a barrage of "what's the status of this feature?" questions.

        Now that engineering is moving to a scrum-based model (read what my boss has to say about that) in order to deliver functionality quicker and more incrementally, the whole notion of a PRD is obsolete. But that doesn't mean that product management is obsolete - in fact a rational process of analyzing inputs, setting priorities and communicating about new feature capabilities is more important than ever.

        So the experiment: We're hacking Jira, our bug tracking system, in order to automate the entire product planning and marketing process and facilitate real-time communication back to customers, internal stakeholders and even the community at large via our public roadmap.

        We're leveraging Jira's capabilities to create custom issues and workflows in order to reproduce the essentials of pragmatic marketing's "requirements that work" framework, the bible on effective product management. (I wish I could link to their picture but unfortunately they are so busy selling seminars the information is under lock and key.)

        This means that we are setting it up to automatically bring enhancement requests from our SugarCRM system into a PM work queue within Jira; asking PMs to enter call reports and market datapoints; linking all of these to problem statements; and generating granular engineering requirements from these problem statements. These requirements then get triaged by the cross-functional scrum teams into sprints to deliver small, complete units of functionality quickly. Features are entered as the requirements get into enough focus in order to describe complete pieces of functionality and their benefit to customers.

        Beyond "requirements that work" planning, we're going to be driving a lot of the outbound communication off this system as well. For example, when a feature's last critical requirements are completed, we'll be automatically opening a task for a product manager to create a demo for the feature, another to update the datasheet, etc. What's most exciting is that once the system is tuned and we know it's producing accurate information, we're going to be able to give customers and the community real-time status, along with the ability to give input right in the middle of the design process. Customers with enhancement requests tracked through the support portal will be able to see how they've been triaged, how the problem has been interpreted, and what requirements are at what stage of delivery to meet the request. The public roadmap will be maintained in real time, with the potential for drilldown into more of what's behind each listed feature.

        We're not the only ones trying to marry agile/scrum with pragmatic marketing. FeaturePlan is a great dedicated product for product managers that does just that. We looked at it and like it but unfortunately it's currently Windows centric in the software version while our current internal corporate infrastructure is pretty Linux-centric, and we're too oriented around running our own systems to use their hosted version. (Here's a good presentation by Jason Tanner that describes using FeaturePlan in a similar way.) But I think that the level to which we're trying to open things up to customers and the community is new ground.

        My intent is to post here as we progress with this experiment as a way of tracking our progress and forcing myself to think through some of the challenges.

        If you're trying to do something similar at your company, I'd love to hear from you. I'm happy to share some of our process flows and schemas for Jira as well. Just drop me a line at cfrln@splunk.com.

        ]]>
        Christina Noren: Complexity and failures in the NYThttp://blogs.splunk.com/cfrln/2007/09/15/complexity-and-failures-in-the-nyt/http://blogs.splunk.com/cfrln/2007/09/15/complexity-and-failures-in-the-nyt/Sun, 16 Sep 2007 02:11:29 +0000Christina NorenI've been posting occasionally when there's some huge meltdown of a big service like the two recent Blackberry outages. My point is usually that the systems are too complex so the failure mode is usually unpredictable and hard to track down - hence the sputtering of PR people days after big outages while sysadmins are frantically digging through logs, configs and system metrics all over the place.

        Anyway, looks like the NYT picked up on the same idea. Good article citing recent outages at United and Skype and tying them into the larger problem of increasing system complexity.

        It quotes Andreas Antonopoulos, who's been one of the analysts to really understand why IT Search is necessary in the face of increasing chaos and change in the datacenter. Here's a video clip hosted on splunk.com of him talking about this.

        ]]>
        Johnvey Hwang: Drugging employees for fun and profithttp://blogs.splunk.com/johnvey/2007/09/05/drugging-employees-for-fun-and-profit/http://blogs.splunk.com/johnvey/2007/09/05/drugging-employees-for-fun-and-profit/Thu, 06 Sep 2007 02:09:56 +0000Johnvey HwangBlue Bottle Coffee

        On a daily basis, I pay homage to the wonder that is Blue Bottle Coffee espresso, which flows freely - some would say excessively - from our kitchen. The benefits to productivity that this fine coffee bestows upon the dev team are enormous, easily eclipsing other contenders such as video games or foosball. Of course, there were some hurdles to get to this point, namely somebody pouring M&#038;Ms into the bean grinder of the super-automatic that was previously in service. The result was a pitiful molten mess of chocolate, beans, plastic, and gears. And, of course, the perpetrator was never discovered. So the only recourse was to beef up the machinery and move to a true commercial setup: a La Spaziale, Mazzer Mini, and freshly delivered Blue Bottle. BB even asked us what hardware we were running, and sent us the most compatible beans. Brilliant.

        ]]>
        Johnvey Hwang: Download Splunk 3.0 Today!http://blogs.splunk.com/johnvey/2007/08/03/download-splunk-30-today/http://blogs.splunk.com/johnvey/2007/08/03/download-splunk-30-today/Sat, 04 Aug 2007 06:10:54 +0000Johnvey HwangI'm pleased to announce that Splunk 3.0 has been released, and is available for download immediately! It's been a very long road to GA, but I think it is worth the wait. With 3.0, exploring your unstructured data has never been easier, thanks to the new reporting interface. As always, we love user feedback so try it out and let us know what you like and what you don't - either to me, or to support@splunk.com. Stop guessing about what's going on in your datacenter and start getting answers with Splunk.

        ]]>
        Eric Garner: In case you did not hear, v3.0 is GA!!11!1!http://blogs.splunk.com/maverick/2007/08/03/in-case-you-did-not-hearv30-is-ga111/http://blogs.splunk.com/maverick/2007/08/03/in-case-you-did-not-hearv30-is-ga111/Fri, 03 Aug 2007 19:15:02 +0000Eric GarnerAs we say here in Dallas, TX, YEEEEEEEEEE-HAW!!!1!11!!

        Splunk 3.0 is GA now!!!!

        In celebration of this wonderful day, I would like to redirect you to a previous blog article regarding a song I wrote about being a Splunk user. It's real geeky, I admit, but hey, if you use Splunk or are thinking about it, I'm sure you can relate to it. And if you are a long-time customer already, well, then... you know doing geeky stuff like this is part of being a Splunkhead.

        Check out my rap song called "Splunk IT"

        Also, if you have a sysadmin that is an absolute rockstar where you work, please go and nominate them for Sysadmin of the Year. Let us know what makes them a rockstar in your eyes and they might win some fabulous prizes, like a new guitar, laptop, a case of redbull, etc. Do it now!

        ]]>
        Eric Garner: Yo, I am telling you, dog, you need to Splunk IT!http://blogs.splunk.com/maverick/2007/07/09/yo-im-telling-you-dog-you-need-to-splunk-it/http://blogs.splunk.com/maverick/2007/07/09/yo-im-telling-you-dog-you-need-to-splunk-it/Tue, 10 Jul 2007 05:03:02 +0000Eric GarnerAfter being extremely inspired by all you die-hard Splunk fans out there, I decided to lay down some high-tech "geeky" rhymes over some old familiar classic rock riffs, including Queen's "We Will Rock You", Rush's "Tom Sawyer", and AC/DC's "Back In Black". So...


        Yo, dog, turn up da bass and check it....Maverick is in da hayouse!


        Splunk IT.mp3

        Here are the sick lyrics, dog!

        Splunk IT (a rap by Eric “Maverick” Garner)
        Copyright © 2007, Garner. All rights reserved.

        We got all kinds of issues occurring in the system
        They’ve always been there, but I guess we just missed ‘em
        We need Splunk to help troubleshoot it
        We got Red Hat 3.0, so we won’t have to chroot it

        Yo, we got hundreds of servers in multiple locations
        And the IT folks are venting all their frustrations
        Telling me that grep is a bottleneck
        We need something better, we need to Splunk IT

        Oooooohhhh We need to Splunk IT
        Yo, Yo, I’m telling you, dog, we need to Splunk IT
        Oooooohhhh We need to Splunk IT
        (Word to your mother)

        Splunk makes life a hellavalot easier
        Everything else just makes life cheesier
        Yo, we gotta stay within our SLA
        So we DL’d Splunk and installed it today

        It’s exactly what we need and it absolutely
        Takes the S-to-the-H out of the I-to-the-T
        So now we’re gonna go to our CFO
        And request a PO because we need to Splunk IT

        Ooooooohhh We need to Splunk IT
        Yo, I’m telling you, dog, we need to Splunk IT
        Oooooohhh, We need to Splunk IT
        Splunk IT, Boy!

        So, if you can’t afford to wait, or hesitate
        Or, if you need to comply with a strict mandate
        Or cut your cost while increasing visibility
        Or beef up all your network security
        Or manage your transactions without a doubt
        Or search in real-time to figure it out
        Or keep your apps available to public
        Then what you need to do is S-S-S-S-Splunk IT

        Ooooooohhh You need to Splunk IT
        Yo, yo, I’m telling you, dog, you need to Splunk IT
        Ooooooohhh You need to Splunk IT
        What are you waiting for, beeoooottcccchhhh?!

        ]]>
        Amrit Bath: Administering remote Splunk servers via the CLIhttp://blogs.splunk.com/amrit/2007/07/03/administering-remote-splunk-servers-via-the-cli/http://blogs.splunk.com/amrit/2007/07/03/administering-remote-splunk-servers-via-the-cli/Tue, 03 Jul 2007 15:02:46 +0000Amrit BathIt's a little known (mainly because it's undocumented) fact that it is possible to use the Splunk CLI to manage remote Splunk servers. This capability has been built into the product since version 2.1, and allows one to do things such as remotely manage data inputs, run searches, manage users, etc. For fairly obvious reasons, this cannot be done with commands that require Splunkd to be stopped.

        The syntax is simple:

        /opt/splunk/bin/splunk &lt;command&gt; [&lt;subcommand&gt;] &lt;params&gt; -uri https://my2ndSplunkBox:8089

        The key here is the -uri parameter, which instructs the PCL to send all SOAP requests to the specified server. There are 3 pieces to the parameter: protocol, host, and port.

        The protocol must be one of http or https, depending on whether or not SSL is enabled on the Splunkd port. Most users will want the latter, as recent versions of Splunk enable SSL on this port by default.

        The second part is the hostname or IP address of the host that the remote Splunk server is running on. This should need no real explanation - in this case, the remote server has the hostname my2ndSplunkBox.

        The last part of the argument is the Splunkd port (aka the management port). Note that this is not the port that's used to reach the web interface, but the port that Splunkd listens on for incoming SOAP requests. If you're unsure of what this port is, try the default, which is 8089. Alternatively, splunk show splunkd-port will display the Splunkd port that the current server is listening on.
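To make the three pieces of the -uri value concrete, here is a minimal sketch in present-day Python 3 (not the Python that shipped with Splunk at the time). The function name `parse_splunk_uri` is hypothetical, and falling back to 8089 when no port is given is an assumption of this sketch, not documented CLI behaviour.

```python
from urllib.parse import urlsplit

DEFAULT_SPLUNKD_PORT = 8089  # the default management port named in the post


def parse_splunk_uri(uri):
    """Split a -uri value into (protocol, host, port).

    Hypothetical helper for illustration; it is not part of the Splunk CLI.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError("protocol must be http or https: %r" % uri)
    if not parts.hostname:
        raise ValueError("missing host: %r" % uri)
    # Assumption: use the default port when the URI does not name one.
    port = parts.port if parts.port is not None else DEFAULT_SPLUNKD_PORT
    # Note: urlsplit lower-cases the host name; host names are case-insensitive.
    return parts.scheme, parts.hostname, port


print(parse_splunk_uri("https://my2ndSplunkBox:8089"))
```

Checking the value this way before handing it to the CLI can catch a swapped web-interface port or a missing protocol early.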

        As a practical example, one can add a tailed data input on the /var/log directory of host my2ndSplunkBox with the following command:

        splunk add tail /var/log -uri https://my2ndSplunkBox:8089
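If you script this from Python, the same command line can be assembled as an argument list. This is only a sketch: `remote_cli_args` is a hypothetical helper, and the binary path is the /opt/splunk/bin/splunk location used in the syntax line earlier in the post.

```python
def remote_cli_args(command, uri, splunk_bin="/opt/splunk/bin/splunk"):
    """Build the argument list for a CLI command aimed at a remote server.

    Hypothetical helper; it only appends the -uri parameter described above.
    """
    return [splunk_bin] + list(command) + ["-uri", uri]


args = remote_cli_args(["add", "tail", "/var/log"], "https://my2ndSplunkBox:8089")
print(" ".join(args))
```

On a machine with Splunk installed, the resulting list could be passed to `subprocess.run(args)`; the sketch itself never executes anything.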

        The only caveat to this feature is that if you're logged into your Splunk server via splunk login, you will have to re-authenticate when sending commands to the remote server (and once again when you resume targeting your local server by leaving off -uri). Workarounds include using the -auth parameter or the SPLUNK_USERNAME and SPLUNK_PASSWORD environment variables, but these are better left to a later post.

        ]]>
        Amrit Bath: HI@WEB2.0http://blogs.splunk.com/amrit/2007/07/03/hiweb20/http://blogs.splunk.com/amrit/2007/07/03/hiweb20/Tue, 03 Jul 2007 14:24:36 +0000Amrit BathWell, I guess I had to start "blogging" eventually...

        Hi, I'm Amrit, the main CLI (Command Line Interface) and PCL (Python Control Layer) guy here at Splunk. This means that I maintain our more common bash scripts (bin/splunk &#38; friends), and our Python support scripts (site-packages/splunk/clilib/), which do the heavy lifting for a number of CLI &#38; Web UI features.

        These aren't the only things I work on, but they are the parts of the Splunk codebase that have consumed most of my time since starting here in December 2005. I should also mention that Ivan Tam (no blog.. yet..?), who now works on the SplunkWeb UI, helped write the first implementation of the PCL during mid-2006.

        Every now and then I'll post some tips &#38; tricks related to the things I'm working on, which you'll hopefully find useful.

        KTHXBAI

        ]]>
        Eric Garner: Splunk SEs: Your "HowTo" Teamhttp://blogs.splunk.com/maverick/2007/04/21/splunk-ses-your-howto-team/http://blogs.splunk.com/maverick/2007/04/21/splunk-ses-your-howto-team/Sun, 22 Apr 2007 04:02:15 +0000Eric GarnerRecently, I received an email from a client that was struggling with a Splunk configuration issue. He was a sysadmin trying to figure out how to setup Splunk-2-Splunk within his private testing environment. The specific issue he was encountering was not so much related to the Splunk software not working or throwing an exception, etc. But rather, it was more about him trying to understand the "how to" part of Splunk-2-Splunk.

        I think anytime you have a technical IT tool like Splunk combined with the ability for a technical person to download, install, and evaluate it for FREE, you will also have plenty of "how to" questions that will naturally accompany those evaluation efforts.

        With this said, I want to remind all you technical folks, especially those of you who may still be struggling with the HowTos of Splunk, that as Sales Engineers, it's our job to provide you with the HowTo support you need during your evaluation of Splunk. In a way, you can think of us as Splunk's HowTo Team, always willing and able to discuss and recommend the best ways to configure and test out Splunk. It's our job to make sure you understand all of the technical features and how best to leverage them for your specific needs. And, it's also our job to help you develop a strong business case for purchasing a Splunk license based on the technical benefits. That way, your manager or director can more easily justify the purchase of that license for you. And, if you are like me, more often than not you need all the justification you can get.

        On a side note, I am curious about your initial experience with evaluating Splunk.

        Therefore, please leave your comments and let me know the following:

        1) When you FIRST downloaded Splunk and began your evaluation, what features or concepts did you find yourself struggling with the most?

        2) What concept or feature were you NOT aware of at first, but later "discovered"? How did you discover it?

        3) If you could go back in time and start your evaluation of Splunk over again, what would you do differently?

        Thanks for participating. Your feedback is greatly appreciated!

        ]]>
        Nick Mealy: How to modify the 2.1 UI’s default behaviour to only search recent eventshttp://blogs.splunk.com/nick/2007/02/12/how-to-modify-the-21-uis-default-behaviour-to-only-search-recent-events/http://blogs.splunk.com/nick/2007/02/12/how-to-modify-the-21-uis-default-behaviour-to-only-search-recent-events/Tue, 13 Feb 2007 05:14:47 +0000Nick MealyIf you only ever care about the last few hours or the last day of your data, this simple change will speed up your search results tremendously. Until our next big release, which will basically be this way by default, here's how you can do this in 2.1 code.

        This is a change in three places, but fortunately very fast to make, and all in the same file.
        $SPLUNK_HOME/share/splunk/search/dynamic/main_ui.html

        Note: The example here will set your UI to search only the past 6 hours by default. After doing this it should be easy to see how to change it to search 1 day, or 45 minutes etc...

        Note: Also, you don't need to restart the front end to see these changes, but you DO have to refresh your browser by clicking the refresh button up top.

        step 1) around line 70, change
        &lt;div class="#productVersion#Version landingPageState #userType#User noTimeFields eventsTab relativeTimeMode #dynamicallySetStates#" id="outerWrapper" /&gt;

        to
        &lt;div class="#productVersion#Version landingPageState #userType#User eventsTab relativeTimeMode #dynamicallySetStates#" id="outerWrapper" /&gt;
        (basically this removes the 'noTimeFields' state so the time controls are now open by default)

        step 2) around line 122 of the same file, change
        &lt;input type="text" id="relStartTime" /&gt;

        to
        &lt;input type="text" value="6" id="relStartTime" /&gt;

        (now the UI will load with "6" already entered into the relative start field)

        step 3) around line 125, still in the same file, change
        &lt;option value="hours"&gt;Hours ago&lt;/option&gt;

        to
        &lt;option value="hours" selected="selected"&gt;Hours ago&lt;/option&gt;

        (this means that hours will be selected by default, instead of minutes)

        That's it. You're done. Refresh your browser and the UI will now restrict its searches to the most recent 6 hours by default. If you really only ever care about the last 2 hours, switching it to 2 hrs may speed you up even more.
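If you would rather script the three edits than make them by hand, the following is a rough sketch in present-day Python 3. It assumes each original snippet appears exactly once in main_ui.html with exactly the spacing shown in the steps; `EDITS` and `apply_edits` are names invented for this example. Back up the file before trying anything like this.

```python
EDITS = [
    # step 1: drop the noTimeFields state so the time controls start open
    ("#userType#User noTimeFields eventsTab", "#userType#User eventsTab"),
    # step 2: pre-fill the relative start time with 6
    ('<input type="text" id="relStartTime" />',
     '<input type="text" value="6" id="relStartTime" />'),
    # step 3: preselect "Hours ago" instead of minutes
    ('<option value="hours">Hours ago</option>',
     '<option value="hours" selected="selected">Hours ago</option>'),
]


def apply_edits(html, edits=EDITS):
    """Apply each (old, new) pair, refusing to guess if a match is ambiguous."""
    for old, new in edits:
        if html.count(old) != 1:
            raise ValueError("expected exactly one match for %r" % old)
        html = html.replace(old, new)
    return html


# A tiny stand-in for the real file, built from the snippets in the steps above.
sample = (
    '<div class="#productVersion#Version landingPageState #userType#User '
    'noTimeFields eventsTab relativeTimeMode #dynamicallySetStates#" id="outerWrapper" />\n'
    '<input type="text" id="relStartTime" />\n'
    '<option value="hours">Hours ago</option>\n'
)
patched = apply_edits(sample)
print(patched)
```

Because the function insists on exactly one match per step, running it a second time on an already patched file raises an error instead of silently doing nothing.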

        ]]>
        Nick Mealy: quick way to allow you to autologin and run a search from a single linkhttp://blogs.splunk.com/nick/2007/02/09/quick-way-to-allow-you-to-autologin-and-run-a-search-from-a-single-link/http://blogs.splunk.com/nick/2007/02/09/quick-way-to-allow-you-to-autologin-and-run-a-search-from-a-single-link/Sat, 10 Feb 2007 02:58:03 +0000Nick MealyThis is a quick update to Mark's post from 10/9/2006

        Again, to reiterate Mark's qualifier - this is all assuming you understand that by doing this, you send users and passwords in clear text and the risks involved.

        So, uncommenting the 2 lines as described in Mark's post will only get you the first part, i.e. the ability to send a GET request that logs you in. We've had people ask if that request can go further and also return results right away for a particular search they also pass in. Obvious request, but somehow we didn't anticipate it.

        So until we wrap this feature up in a bow in a release, once again this involves editing python by hand. And this time it's more than just uncommenting two lines. It's cut and paste, and if you know python you know that tab-indentation is meaningful, and this seemingly simple action can be deadly. You have been warned. Back up the file and proceed carefully.

        Alrighty, still with us? =) Find the 2 lines that Mark blogs about uncommenting. (this will be XMLResource.py, line 395 - 400 ish depending on which 2.1 release this is)

        Now replace those two lines with these lines below. NOTE: REPLACE HYPHENS WITH SPACES. wordpress seems to insist on removing leading spaces.

        --------if ("usr" in request.args) and ("pwd" in request.args) :
        ------------logger.info("user is attempting login on GET")
        ------------if ("q" in request.args) :
        ----------------logger.info("user attempting login on GET is requesting redirection to a permalink")
        ----------------sessNS = request.getSession().sessionNamespaces
        ----------------sessNS["postLoginRedirect"] = "/?q=" + request.args["q"][0]
        ------------return self.render_POST(request)
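Swapping the leading hyphens for spaces by hand is error-prone, so here is a small sketch (present-day Python 3, hypothetical function name `dehyphenate`) that does it for one pasted line at a time. It only touches the run of hyphens at the very start of the line.

```python
import re


def dehyphenate(line):
    """Replace a run of leading hyphens with the same number of spaces."""
    match = re.match(r"-+", line)
    if not match:
        return line  # no leading hyphens, nothing to do
    return " " * match.end() + line[match.end():]


print(repr(dehyphenate('--------if ("usr" in request.args) and ("pwd" in request.args) :')))
```

Hyphens later in a line are left alone, so string contents are not altered.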

        now restart the python front end using splunk restartss (a full splunk restart is not necessary)
        And now you'll have the ability to embed URLs like this in the webapp of your choice

        http://your.host/login?usr=username&#38;pwd=password&#38;q=interestingTerm1%20interestingTerm2
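If another application generates these links, the search terms need to be URL-encoded. A minimal sketch in present-day Python 3 follows; `login_search_url` is a hypothetical helper, and the usr, pwd and q parameter names are the ones from the snippet above. As already warned, the credentials travel in clear text.

```python
from urllib.parse import quote, urlencode


def login_search_url(base, user, password, search):
    """Build an auto-login link that also carries a search (hypothetical helper)."""
    params = [("usr", user), ("pwd", password), ("q", search)]
    # quote_via=quote encodes spaces as %20 rather than '+'
    return base + "/login?" + urlencode(params, quote_via=quote)


url = login_search_url("http://your.host", "username", "password",
                       "interestingTerm1 interestingTerm2")
print(url)
```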

        UPDATE - -

        as pointed out in the first comment (thanks!!) the above snippet will happily fall into a recursive loop if the auth information it's given is incorrect. New improved version below: (AGAIN, REPLACE LEADING HYPHENS WITH SPACES)

        --------if ("usr" in request.args) and ("pwd" in request.args) :
        ------------logger.info("user is attempting login on GET")
        ------------sessNS = request.getSession().sessionNamespaces
        ------------if ("cannotConnectToSplunkd" not in sessNS and "error" not in sessNS) :
        ----------------if ("q" in request.args) :
        --------------------logger.info("user attempting login on GET is requesting redirection to a permalink")
        --------------------sessNS["postLoginRedirect"] = "/?q=" + request.args["q"][0]
        ----------------return self.render_POST(request)
        ]]>
        Nick Mealy: one minute guide to making search results autorefreshhttp://blogs.splunk.com/nick/2007/01/10/one-minute-guide-to-making-search-results-autorefresh/http://blogs.splunk.com/nick/2007/01/10/one-minute-guide-to-making-search-results-autorefresh/Wed, 10 Jan 2007 08:37:10 +0000Nick MealyEverybody wants this, and until the day when it's built into the UI somewhere, you can use this little bookmarklet to do it in about a minute.

        So. the link below is your friend. (If you've used bookmarklets before you know what to do. Otherwise, read on. )

        Instead of clicking this link though, right-click it or option click it, and choose 'bookmark this link'.
        splunk 30 second refresh

        Once you've done that, then whenever you have Splunk loaded, clicking that bookmark will run the tiny little script, and the upshot is that the UI will start autorefreshing in 30 seconds and every 30 seconds thereafter.

        And if you want to change the 30 seconds, edit the bookmark, find the 30000 and change it to whatever.

        ]]>
        Kim Wallace: Meet the plumberhttp://blogs.splunk.com/kim/2006/12/14/meet-the-plumber/http://blogs.splunk.com/kim/2006/12/14/meet-the-plumber/Fri, 15 Dec 2006 02:37:43 +0000Kim WallaceHi! My name is Kim, and I'm the release engineer here at Splunk.

        Thanks to my acquisition-happy former employer, Symantec, I've seen a variety of startup approaches to release engineering. Most frequently it seems some senior developer has a bug up you-know-where about how the build system should work, and some poor junior developer or sysadmin type person dutifully does the drudge work (usually by hand). At other sites, some very diligent and detail-oriented person creates and executes a process with a great deal of record-keeping and attention to detail but often not a lot of automation. Consistency across different build platforms usually isn't a strong point.

        Here at Splunk, things are a bit different. I called myself the plumber in the title of this post because that's how I see my job: I create and maintain the plumbing that produces consistent, reproducible Splunk builds across all of our platforms, with as much visibility as I can muster. I see my contribution more as enforcing process through tools - ideally, tools that enable process in a way that is more convenient for everyone than "doing it wrong" - rather than personally pushing all the buttons and scribbling in all the logbooks. And I've had the good fortune to come into a culture that encourages this approach.

        Whew. That's a mouthful for an introduction. In the near future I hope to write a bit more about how the plumbing works, and some neat tools I've found along the way. I'm sure y'all will be waiting with bated breath. ;-)

        ]]>
        Brian Murphy: Auto host resolving in splunk using pythonhttp://blogs.splunk.com/brian/2006/07/05/auto-host-resolving-in-splunk-using-python/http://blogs.splunk.com/brian/2006/07/05/auto-host-resolving-in-splunk-using-python/Thu, 06 Jul 2006 06:55:46 +0000Brian MurphyThis only works in 2.0.x
        Ok so I've had a couple of people ask me how to resolve the ip addresses in their syslog files to their hostnames in splunk.
        There's no way to do this just by tweaking a config variable .. we need to dig a little deeper under the surface. It's actually pretty easy to get splunk to call out to python during event processing so I've used that functionality to solve this problem.

        Note that this will negatively impact indexing performance but it should work until we get this behavior baked into splunk.

        First up I've created a python script that calls socket.gethostbyaddr to resolve the hosts. It will also cache the results so that the performance hit for dns misses is reduced.
        So copy and paste the following into your favorite editor and save it to &lt;SPLUNK_HOME&gt;/lib/python2.4/site-packages/splunk/pyHostNameResolve.py . This directory is where the dynamically loaded python will look for scripts; the filename will be referenced later in a config change.

        
        #Copyright (C) 2006 Splunk Inc. All Rights Reserved. This work contains trade
        #secrets and confidential material of Splunk Inc., and its use or disclosure in
        #whole or in part without the express written permission of Splunk Inc. is prohibited.
        
        from pipeline_data import PipelineDataWrapper #This is a virtual module/class that gets inserted into the python namespace at runtime by splunk
        import traceback
        import socket
        
        #Set global variables
        HOST_KEY = "MetaData:Host"
        
        HOST_RESOLVE_MAP = {} #cache so we don't have to call gethostbyaddr ( expensive ) every event
        
        def resolveHost( pdata, confDictString ):
            global HOST_RESOLVE_MAP
            try:
        
                host = pdata.get(HOST_KEY)
        
                resolvedHostName = None
        
                if host.startswith("host::") :
                    host = host[6:]
        
                if host in HOST_RESOLVE_MAP:
                    resolvedHostName = HOST_RESOLVE_MAP[ host ]
        
                if not resolvedHostName:
                    try:
                        resolved = socket.gethostbyaddr(host)
                        resolvedHostName = resolved[0]
                        HOST_RESOLVE_MAP[ host ] = resolvedHostName
                    except:
                        HOST_RESOLVE_MAP[ host ] = host
                        print "Could not resolve " + host
                        return 1
        
                if resolvedHostName :
                    pdata.put( HOST_KEY, "host::"+resolvedHostName )
        
                return 1    
        
            except:
                print "EXCEPTION !!"
                traceback.print_exc()
                return -1
        
        

        Ok now open your &lt;SPLUNK_HOME&gt;/etc/myinstall/splunkd.xml and insert the following chunk of xml between the diskusageprocessor and the bytequotaprocessor in the indexerpipe pipeline :

                       &lt;processor name="hostnameresolver" plugin="pythonprocessor"&gt;
                                         &lt;config&gt;
                                                 &lt;scriptFilename&gt;splunk.pyHostNameResolve&lt;/scriptFilename&gt;
                                                 &lt;command&gt;resolveHost&lt;/command&gt;
                                                 &lt;pyContext&gt;resolveContext&lt;/pyContext&gt;
                                                 &lt;pyConfig&gt;&lt;![CDATA[]]&gt;&lt;/pyConfig&gt;
                                         &lt;/config&gt;
                                 &lt;/processor&gt;
        

        Ok now fire up splunk and you should start seeing your hosts getting resolved. Note that this will negatively impact performance but it should work until we get this behavior baked into splunk.
        Cheers,
        Brian

        ]]>
        Brian Murphy: Splunk Cheat Sheet !http://blogs.splunk.com/brian/2006/04/27/splunk-cheat-sheet/http://blogs.splunk.com/brian/2006/04/27/splunk-cheat-sheet/Fri, 28 Apr 2006 05:26:29 +0000Brian MurphyI've been pretty busy so I haven't updated for a while but I thought I should share this :
        Corey Shields has made a great splunk cheat sheet ! It's available at : http://staff.osuosl.org/~cshields/?p=140
        It's pretty awesome, and I'm recommending that everyone I know that uses splunk downloads it.
        Until next time,
        Brian

        ]]>
        Nick Mealy: hip deep in fastmovingnesshttp://blogs.splunk.com/nick/2006/03/30/hip-deep-in-fastmovingness/http://blogs.splunk.com/nick/2006/03/30/hip-deep-in-fastmovingness/Thu, 30 Mar 2006 15:33:19 +0000Nick MealyFull speed ahead for the next big round of improvements and fixes and we're all going cheerfully bonkers. I was especially cheerful/bonkers today because I spent the morning prototyping some SVG stuff. In particular, since the splunk ui runs almost entirely on xml and client-side xslt, I was looking into how feasible/fast/stable it would be for our client-side XSL to just generate SVG directly, and for javascript to clone those svg nodes into a big complex DOM.

        The answer is - omg it works well. Fast, seemingly stable, it can be pushed. Even in a big javascript front end like ours, the event handlers on svg elements pass right up into our existing framework. Some small tweaks had to be made to accommodate it, but no showstoppers. And it is rare for such a complicated thing to present so few obstacles in practice.
        So thanks to Mozilla for being generally awesome, and particularly for turning on SVG in their release builds. Of course I have absolutely no idea if any svg will ever appear in the product ... We do after all have a great deal of other more mundane improvements in the works. =)

        Also, my apologies for not talking about skins. I had wanted to post a big treatise on skins, but since I'm rewriting and reworking all the css right now, it would be way too cruel; any skins you made would die with the upgrade to 1.3. So I'm saving the post for another day.

        Until then, for the indomitably curious, suffice it to say that the 'invert' link hidden in the footer actually cycles through the skin list, and the fact that there are currently only two skins shipping in the product does not prevent you the user from hacking the front end and making a third, fourth skin, etc...

        if you're feeling adventurous, put another skin file in [opt/splunk]/share/splunk/search/static/css/skins/, crack open photoshop and use all the existing skin graphics as a base to make your new skin from,
        and as for how to hook up this new skin, for later 1.2 dot releases, look for the skinFileList in [opt/splunk]lib/python2.4/site-packages/splunk/search/SearchService.py

        (for, I think, 1.2 and older builds, the list of skins was just the link tags in share/splunk/search/dynamic/main_ui.html)

        ]]>
        Nick Mealy: UI tinkeringshttp://blogs.splunk.com/nick/2006/03/15/ui-tinkerings/http://blogs.splunk.com/nick/2006/03/15/ui-tinkerings/Wed, 15 Mar 2006 15:35:36 +0000Nick MealyFirst post, so i'll begin at the beginning.

        I'm the front-end guy. From the xsl,js,html and css on the client side, up to the python in splunkSearch, I am responsible (read: to blame) for the current implementation, and also for much of the interaction design. I've worked here just over a year now, and I have no noticeable scars or weird tics to show for it, so I guess I've got that going for me.

        What possessed me to come work here: Essentially all of my experience before splunk was at services companies, and for the prior 3 or 4 years specifically I had become this sort of high-throughput template builder and dhtml-specialist. Boring stuff I know, but I mention it because I came to splunk partly to get away from this. You can build lots of really complex front ends while at services companies. You can build for flexibility and simplicity all you want, but when the project is over you never really see the code again (or worse, the code never gets updated or changed by anyone and so it never evolves at all). So outside of maybe some escalated issues, you never really know what were the good parts of the implementation and what were the bad. And the development pain induced by changing requirements, the codebase evolution and accidental devolution, the day to day suffering really, you get spared from all that and that sucks.

        So here at splunk I get all that too. I'm not just some head-in-the-clouds master-template builder, I'm actually the poor slob responsible for maintaining it too. =)

        One nice silver lining is that sometimes I'll get rewarded with these little gems of code that have been going on a journey towards the nonsensical. Where say, at one point it did something simple and made sense, then something complicated was factored in alongside, then at 2AM before such and such release that changed, then later there was a quick tweak to address someone's feedback, then another change etc... and then suddenly you look at it one day and it's just a stunningly silly little thing and you throw it away and rewrite it in 5 minutes.

        I'll try and post again in a couple days. There's a lot of dry topics that I would love to ramble on about, but this post is pretty parched actually, now that I read it.

        So maybe instead I'll post on how to create your own skin or something.

        or even better, I'll update you on recent injuries imparted by our skateboard-trapeze-of-death... (so far Rory and Marc have had the only mishaps, Rory had nothing broken, and Marc escaped with no injuries at all)

        ]]>
        Brian Murphy: Splunking from Python Part Ihttp://blogs.splunk.com/brian/2006/03/14/splunking-from-python-part-i/http://blogs.splunk.com/brian/2006/03/14/splunking-from-python-part-i/Wed, 15 Mar 2006 02:10:22 +0000Brian MurphyOne of the neat things about splunk is that its search interface is a SOAP call. In this post I'm going to talk about using the python modules that ship with splunk to talk to splunk over this SOAP interface.
        First off you will need to set some environment variables so that you are running the version of python that ships with splunk :


        export SPLUNK_HOME=<WHERE_YOU_INSTALLED_SPLUNK>
        export PATH=$SPLUNK_HOME/bin:$PATH
        export LD_LIBRARY_PATH=$SPLUNK_HOME/lib:$LD_LIBRARY_PATH

        Ok, so now you should be good to go, so fire up python. Your python version should be 2.4.2. If it's not, run "which python" at the command prompt to make sure you are using the python that shipped with splunk.
        We need to do some setup before any searches can be run :


        Python 2.4.2 (#1, Mar 11 2009, 21:45:07)
        [GCC 4.0.2] on linux2
        Type "help", "copyright", "credits" or "license" for more information.


        >>> import splunk.search.splunkTest  # initialize the python internals without using twistd
        >>> import splunk.search.SearchCore as SearchCore  # this is the module we are going to use to issue searches

        If you want to run against a remote splunk server or on different ports you can run the following :


        >>> SearchCore.SearchService.gSearchService._searchEngineURL = "http://<remote_host>:<searchengine_port>"

        The method on the SearchCore module that executes the queries is called runQuery and it takes two arguments.


        def runQuery(queryString, userStr )

        The userStr can be any string for now; in future releases it will probably be an auth token. It is the user that your searches will appear under in the searchhistory domain.
        The queryString is where the magic happens.
        Basically a query string contains three major elements.

        QUERY : Terms following this are as you would see in the splunk web ui search box. This pulls the resulting ids into an id space internally in the query.
        GET : Terms following this tell splunk what to extract from the ids in the id space into results in the result space.
        OUTPUT : How to format the results from the result space to output.

        For a more detailed reference on the query syntax check out : http://www.splunk.com/index.php/docs?doc=developer.html&vers=#58
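
        To make the three-part structure concrete, here is a minimal sketch of a helper that assembles a query string. The name build_query is hypothetical and is not part of the python modules that ship with splunk; it only joins the pieces in QUERY / GET / OUTPUT order, the same order used by the examples in this post.

```python
def build_query(query, get=None, output=None):
    """Assemble a query string from its three elements.

    Hypothetical helper for illustration only; it is not part of
    the splunk python modules.
    """
    parts = ["QUERY " + query]
    if get:
        # GET says what to pull from the id space into the result space.
        parts.append("GET " + get)
    if output:
        # OUTPUT says how the result space should be formatted.
        parts.append("OUTPUT " + output)
    return " ".join(parts)


print(build_query("meta::all"))
print(build_query("meta::all", get="events::0-2",
                  output="splunkui::1.0 format::raw"))
```

        The resulting string is what you would hand to runQuery as its first argument.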

        Now for our first search :

        The meta::all key is a splunk key that every event in the system will have.


        >>> SearchCore.runQuery("QUERY meta::all","brian")

        You will get the result "<queryResult></queryResult>" from this, as we have not specified an OUTPUT element. Note that unless you specify a domain to run these queries in, they will run in the default index (main).

        Run :

        >>> SearchCore.runQuery("QUERY meta::all OUTPUT splunkui::1.0","brian")  # We use the splunkui output here because we want to do things that the ui does like get events ...

        Now the result is :

        <queryResult><eventIndexedCount>58728</eventIndexedCount>
        <ids>
        </ids>
        <projectedResultCount>1001</projectedResultCount>
        <clampedStartTime>1049204073</clampedStartTime>
        <clampedEndTime>1142300808</clampedEndTime>
        </queryResult>

        Of course your numbers will be different.
        The projected result count element is legacy and can be safely ignored.
        The eventIndexedCount is the total number of events in this domain.
        The clampedStartTime/clampedEndTime constrain the timerange in which results for this query may appear.
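
        If you want to work with these values from python rather than read them by eye, a sketch along the following lines should do. It uses the standard library XML parser (ElementTree on a current python, not the 2.4.2 that ships with splunk) and assumes that the clamped times are Unix epoch seconds, which the sample values suggest but which is not stated above. The function name summarize is made up for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Sample copied from the output above. In practice this would be the
# string returned by runQuery (assumed here to be plain XML text).
sample = """<queryResult><eventIndexedCount>58728</eventIndexedCount>
<ids>
</ids>
<projectedResultCount>1001</projectedResultCount>
<clampedStartTime>1049204073</clampedStartTime>
<clampedEndTime>1142300808</clampedEndTime>
</queryResult>"""


def summarize(xml_text):
    """Pull the event count and the clamped time range out of a result."""
    root = ET.fromstring(xml_text)
    # Assumption: both clamped times are seconds since the Unix epoch.
    start = int(root.findtext("clampedStartTime"))
    end = int(root.findtext("clampedEndTime"))
    return {
        "events": int(root.findtext("eventIndexedCount")),
        "start": datetime.fromtimestamp(start, tz=timezone.utc),
        "end": datetime.fromtimestamp(end, tz=timezone.utc),
    }


summary = summarize(sample)
print(summary["events"])
print(summary["start"].isoformat(), "to", summary["end"].isoformat())
```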

        Note there is still no event output ... let's fix that :


        >>> SearchCore.runQuery("QUERY meta::all GET events::0-2 OUTPUT splunkui::1.0 format::raw", "brian")  # The format::raw tells the outputter to ignore all segment information

        Results :


        <queryResult><eventIndexedCount>19704</eventIndexedCount>
        <ids>
        </ids>
        <projectedResultCount>1001</projectedResultCount>
        <clampedStartTime>1041618608</clampedStartTime>
        <clampedEndTime>1142302043</clampedEndTime>
        <results type="events">
        <result cd="0:1532081">
        <segtext xml:space="preserve">Oct 14 16:29:38 liftoff sendmail[20336]: i9ENTcHf020336: from=&lt;erik@transaction-engines.com&gt;, size=667, class=0, nrcpts=1, msgid=&lt;416F0BE2.3060306@transaction-engines.com&gt;, proto=ESMTP, daemon=MTA, relay=h-68-167-140-171.snvacaid.covad.net [68.167.140.171]</segtext>
        <timestamp>1141691378</timestamp>
        <source cd="1" string="/opt/splunk/var/spool/splunk/maillog">
        <dir>/opt/splunk/var/spool/splunk/</dir>
        <file>maillog</file>
        </source>
        <host cd="1">localhost</host>
        <sourcetype cd="1" base="sendmail_syslog">sendmail_syslog</sourcetype>
        <type cd="38" wob=" v:de22 t:97 t:49 t:17882122 t:2336388840 t:63489930 t:4036439400 t:0 ">
        <tags><tag>transaction</tag><tag>class</tag><tag>sendmail</tag><tag>com</tag><tag>size</tag><tag>net</tag></tags>
        </type>
        </result>
        <result cd="0:2223455">
        <segtext xml:space="preserve">Oct 18 15:14:27 liftoff sendmail[2527]: i9IMERup002527: from=&lt;erik@transaction-engines.com&gt;, size=3690, class=0, nrcpts=1, msgid=&lt;41744043.3060306@transaction-engines.com&gt;, proto=ESMTP, daemon=MTA, relay=h-68-167-140-171.snvacaid.covad.net [68.167.140.171]</segtext>
        <timestamp>1141686867</timestamp>
        <source cd="1" string="/opt/splunk/var/spool/splunk/maillog">
        <dir>/opt/splunk/var/spool/splunk/</dir>
        <file>maillog</file>
        </source>
        <host cd="1">localhost</host>
        <sourcetype cd="1" base="sendmail_syslog">sendmail_syslog</sourcetype>
        <type cd="38" wob=" v:de22 t:97 t:49 t:17882122 t:2336388840 t:63489930 t:4036439400 t:0 ">
        <tags><tag>transaction</tag><tag>class</tag><tag>sendmail</tag><tag>com</tag><tag>size</tag><tag>net</tag></tags>
        </type>
        </result>
        <result cd="0:3155870">
        <segtext xml:space="preserve">Oct 21 14:03:53 liftoff sendmail[11725]: i9LL3quJ011725: from=&lt;erik@transaction-engines.com&gt;, size=2663, class=0, nrcpts=1, msgid=&lt;41782438.7060303@transaction-engines.com&gt;, proto=ESMTP, daemon=MTA, relay=h-68-167-140-171.snvacaid.covad.net [68.167.140.171]</segtext>
        <timestamp>1141423433</timestamp>
        <source cd="1" string="/opt/splunk/var/spool/splunk/maillog">
        <dir>/opt/splunk/var/spool/splunk/</dir>
        <file>maillog</file>
        </source>
        <host cd="1">localhost</host>
        <sourcetype cd="1" base="sendmail_syslog">sendmail_syslog</sourcetype>
        <type cd="38" wob=" v:de22 t:97 t:49 t:17882122 t:2336388840 t:63489930 t:4036439400 t:0 ">
        <tags><tag>transaction</tag><tag>class</tag><tag>sendmail</tag><tag>com</tag><tag>size</tag><tag>net</tag></tags>
        </type>
        </result>
        </results>
        </queryResult>

        Now you can see the actual event text in the segtext element in the results.
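
        For anyone who wants the events as python objects, here is a rough sketch that walks the result elements with the standard library XML parser (on a current python). The sample is a shortened copy of the first result above, and it assumes the angle brackets inside segtext arrive escaped, as they must in well-formed XML. The function name extract_events is made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the first result above (one result, fewer fields).
sample = """<queryResult>
<results type="events">
<result cd="0:1532081">
<segtext xml:space="preserve">Oct 14 16:29:38 liftoff sendmail[20336]: i9ENTcHf020336: from=&lt;erik@transaction-engines.com&gt;, size=667</segtext>
<timestamp>1141691378</timestamp>
<host cd="1">localhost</host>
<sourcetype cd="1" base="sendmail_syslog">sendmail_syslog</sourcetype>
</result>
</results>
</queryResult>"""


def extract_events(xml_text):
    """Return one dict per <result> element in a queryResult document."""
    root = ET.fromstring(xml_text)
    events = []
    for result in root.iter("result"):
        events.append({
            "id": result.get("cd"),
            "text": result.findtext("segtext"),
            "timestamp": int(result.findtext("timestamp")),
            "host": result.findtext("host"),
            "sourcetype": result.findtext("sourcetype"),
        })
    return events


events = extract_events(sample)
for event in events:
    print(event["timestamp"], event["host"], event["text"])
```
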
        If you want to get counts like the ones you see in the tab headings in the splunkui, you can use the OUTPUT term scheduler::1.0.
        This will give you the following output :


        <queryResult>
        <schedResults>
        <eventCount>10000+</eventCount>
        <hostCount>1+</hostCount>
        <sourceCount>1+</sourceCount>
        <typeCount>239+</typeCount>
        <sourceTypeCount>1+</sourceTypeCount>
        <eventtagCount>62+</eventtagCount>
        <starttime>12/31/1969:16:00:00</starttime>
        <endtime>03/13/2006:18:50:48</endtime>
        </schedResults>
        </queryResult>

        Note the + marks: they are the equivalent of the > signs in the ui, which tell you that there may be more than what is displayed.
        You may mix the splunkui and scheduler outputs in a single querystring.
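
        Since the counts come back as text with an optional trailing +, code that consumes them has to split the number from the "there may be more" marker. A small sketch, using a shortened copy of the scheduler output and a made-up function name parse_counts:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the scheduler output shown earlier in this post.
sample = """<queryResult>
<schedResults>
<eventCount>10000+</eventCount>
<hostCount>1+</hostCount>
<typeCount>239+</typeCount>
<starttime>12/31/1969:16:00:00</starttime>
</schedResults>
</queryResult>"""


def parse_counts(xml_text):
    """Map each *Count element to (number, is_lower_bound)."""
    root = ET.fromstring(xml_text)
    counts = {}
    for elem in root.find("schedResults"):
        if elem.tag.endswith("Count"):
            text = elem.text.strip()
            # A trailing "+" means the real count may be higher.
            counts[elem.tag] = (int(text.rstrip("+")), text.endswith("+"))
    return counts


counts = parse_counts(sample)
print(counts)
```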

        Tune in next time, when I'll explain some of the more advanced elements of the search language.

        Brian

        ]]>
        Brian Murphy: Slow queries and solutions.
        http://blogs.splunk.com/brian/2006/03/10/slow-queries-and-solutions/
        Sat, 11 Mar 2006 05:07:25 +0000

        Since the launch of the 1.2 product some people are experiencing really slow query times. This is especially noticeable when you are running a live splunk pretty often, as this tends to fragment the database quite a bit.

        Fear not as there is a hidden undocumented call that you can make ! If you run the query "++cmd++::optimize" you will cause a database optimization. This call may take a while to return so use with care. Soon we will have a release with an auto-optimizer but if it's hampering your splunking right now you can create a live splunk to run every 10-30 mins that runs "++cmd++::optimize".

        Laters,

        Brian

        ]]>
        Brian Murphy: First Post
        http://blogs.splunk.com/brian/2006/03/07/28/
        Tue, 07 Mar 2006 07:20:34 +0000

        First Post !

        So this is the start of my splunk blog.

        First up I'm splunk employee #1. Way back in Sept. 2004 I joined Erik, Rob and Michael when they were still based down in the VC offices in Palo Alto. I'm responsible for searches and indexing so if you have splunks that are taking WAAAY too long to complete I'm the person that's probably responsible.

        I'll post more later on what I'm coding, struggling against or just hacking on.

        Brian out.

        ]]>
        David Carasso: One Geeks Reasons for Splunk
        http://blogs.splunk.com/david/2005/09/30/one-geeks-reasons-for-splunk/
        Fri, 30 Sep 2005 15:04:16 +0000

        I don't think our website makes it painfully clear why you'd want Splunk.
        Here is my view of why you will want Splunk.


        What is Splunk?

          Splunk is a search server that indexes all your log files.

          If you need to search and troubleshoot log files, you need Splunk. It
          handles any log format, including syslog, Apache, JBoss, MySQL,
          Oracle, router data, etc. It parses and indexes in real time.

        Grep works fine. Why do I need Splunk?

          grep is totally fine for small, simple, local files, but grep doesn't
          work on 20GB of log files across a dozen servers; doesn't group
          multiline log messages together; doesn't unify timestamps across
          files; doesn't automatically find related log events; doesn't show
          histograms of log events; doesn't search gigabytes in seconds; doesn't
          have a cool ajax web interface similar to Google.

        What are multiline log messages?

          As an example, java exceptions look like this:

            [source:java]java.lang.reflect.UndeclaredThrowableException
            at $Proxy231.getAllAttributes(Unknown Source)
            at com.collation.proxy.clientproxy.common.Module.getModelObject(Module.java:326)
            at com.collation.proxy.clientproxy.server.action.ChangeHistoryModule.getDependencies(ChangeHistoryModule.java:402)
            at com.collation.proxy.clientproxy.server.action.ChangeHistoryModule.getIdsWithDependencies(ChangeHistoryModule.java:386)
            ...
            [/source]

          You can't use grep to search for java proxy exceptions because
          "Exception" and "proxy" don't occur on the same line! The same
          would apply to sql, router data, email, or any other multiline event.
          Splunk automatically groups multiline events into single events, so
          the above exception would become one event. Splunk does this with
          advanced heuristics and machine learning algorithms, as well as
          customizable grouping rules.
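
          To show what grouping buys you, here is a deliberately naive sketch in
          python. It is not how Splunk does it: the single rule used here (an
          indented "at ..." line or a "Caused by:" line continues the previous
          event) is an assumption made for this example, and the sample lines
          are shortened. Once lines are grouped, a search for both words matches
          the whole exception.

```python
import re

# Assumption for this sketch: a line continues the previous event if it is
# an indented stack frame ("at ...", "...") or starts with "Caused by:".
CONTINUATION = re.compile(r"^\s+(?:at |\.\.\.)|^Caused by:")


def group_events(lines):
    """Fold continuation lines into the event that precedes them."""
    events = []
    for line in lines:
        if events and CONTINUATION.search(line):
            events[-1].append(line)
        else:
            events.append([line])
    return ["\n".join(event) for event in events]


log = [
    "java.lang.reflect.UndeclaredThrowableException",
    "\tat $Proxy231.getAllAttributes(Unknown Source)",
    "\tat com.collation.proxy.clientproxy.common.Module.getModelObject(Module.java:326)",
    "Oct 14 16:29:38 liftoff sendmail[20336]: size=667, class=0",
]

events = group_events(log)
# Line by line, no single line contains both words; per event, one does.
line_hits = [l for l in log if "Exception" in l and "Proxy" in l]
event_hits = [e for e in events if "Exception" in e and "Proxy" in e]
print(len(line_hits), len(event_hits))
```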

        What about unifying timestamps?

          Most log files have timestamps embedded in them. Splunk understands
          dozens and dozens of timestamp formats, unifying them across
          time zones. Some log files write events out as GMT (Greenwich Mean
          Time), some as local time such as PST (Pacific Standard Time). Some
          logs can come from servers on the east coast, some from the west
          coast, or beyond. By normalizing all these time zones and dozens of
          timestamp formats, Splunk allows you to ask "What happened at
          11:57pm, world-wide, across all my log files, across all my
          servers?" Or: "I got an error at 1:15am yesterday. Show me the log
          events from all my logs just before 1:15".
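
          The idea of normalizing can be sketched in a few lines of python. This
          is only an illustration, not Splunk's implementation: the three
          formats, the sample timestamps, and the fixed UTC offsets (which
          ignore daylight saving) are all assumptions made for the example.
          Each timestamp is parsed in its own format, tagged with the time zone
          its server logs in, and converted to Unix seconds so that events from
          different machines can be compared directly.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for illustration; real zones also need daylight saving rules.
PST = timezone(timedelta(hours=-8), "PST")
EST = timezone(timedelta(hours=-5), "EST")


def to_epoch(text, fmt, tz):
    """Parse a zone-less timestamp, attach its source zone, return Unix seconds."""
    return int(datetime.strptime(text, fmt).replace(tzinfo=tz).timestamp())


# The same instant as written by three hypothetical servers.
west = to_epoch("2005-09-30 01:15:00", "%Y-%m-%d %H:%M:%S", PST)
east = to_epoch("30/Sep/2005:04:15:00", "%d/%b/%Y:%H:%M:%S", EST)
gmt = to_epoch("Sep 30 2005 09:15:00", "%b %d %Y %H:%M:%S", timezone.utc)

print(west == east == gmt)
```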

        OK, one more. What are related log events?

          Suppose you see suspicious activity or an error. Just ask Splunk to
          find logs related to that activity. It'll find logs that have the
          same IP, UserID, URL, codes, etc. If there was a problem with an IP,
          Splunk will show you all the related events for that IP; same for
          UserID, URL, or any other code. You can even ask Splunk to show you events sorted
          by how unexpected they are!
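
          A toy version of "related by IP" looks like this in python. It is a
          sketch of the idea only, not of Splunk's internals: the regular
          expression is a loose IPv4 matcher, and the sshd and httpd lines are
          made-up samples next to a shortened sendmail line.

```python
import re
from collections import defaultdict

# Loose IPv4 pattern: four dot-separated groups of 1-3 digits.
IP = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def index_by_ip(events):
    """Map each IPv4-looking token to the events that mention it."""
    index = defaultdict(list)
    for event in events:
        for ip in set(IP.findall(event)):
            index[ip].append(event)
    return index


events = [
    "sendmail[20336]: relay=h-68-167-140-171.snvacaid.covad.net [68.167.140.171]",
    "sshd[411]: Failed password for root from 10.0.0.7",
    "httpd: 68.167.140.171 GET /index.html 500",
]

index = index_by_ip(events)
for related in index["68.167.140.171"]:
    print(related)
```

          The same index could be keyed on a user id, a URL, or any other
          extracted code.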

        How much does Splunk cost?

          The Splunk Personal Server is free. Give it a try.

        How can I get Splunk?

        ]]>