This week in “That happened: notes from #splunk”, a blog about the goings-on in the Splunk IRC channel: troubleshooting a full disk with debug logging on, living with ADD, and enabling management pr0n.
BUT FIRST a message from our sponsors: We just released the code behind our custom-built Splunk documentation platform, Ponydocs, as open source! Check out the blog post from Ashley, our WebDev manager, the docs I wrote for users of Ponydocs, and the GitHub repo. And now we return you to your regularly scheduled IRC shenanigans:
Sometimes it’s hard to be the almighty tallest^Wduckfez:
<duckfez> oh my aching head
<duckfez> colleague #1 – “app xyz errored out due to a disk full”
<duckfez> colleague #2 – “Well, that disk is full because app xyz has multiple threads doing runaway logging”
<duckfez> colleague #3 – “OK, I turned off everything but debug logging”
Stay on target…
ftk tells us to stop and smell the roses:
<^Brian^> anyone ever notice that servers boot faster if you aren’t consoled into them and watching the things go by
<duckfez> ^Brian^: and the chance to hit “F1” for setup is only 250ms long when you’re not looking
<^Brian^> duckfez: i had a hell of a time hitting f12 for the pxe boots
<^Brian^> i was like wtf, i can’t look away
<ftk> ^Brian^: why would you need to look away
<^Brian^> ftk: cause i’m too busy talking with my cow-orker
<ftk> what could be more important than being prepared to hammer ALT+R in ilo
<JPres> ftk: ADHD
<ftk> just turn off the twitters and facebooks for a minute
<ftk> it’ll be ok i promise
50 shades of input recursion
All in the service of management pr0n:
<edeca> I have data in a folder structure like: /blah/<host>/<type>/0001.txt – is it possible to make directory monitoring follow all /blah/*/<type>/*.txt easily?
<duckfez> edeca: sure, [monitor:///blah] whitelist=^/blah/[^/]+/type/.*\.txt
<edeca> Ah clever, I see what you did there!
<duckfez> (noting that splunk will recurse through all of /blah looking for files to match that white list)
<edeca> Cheers, that’s brilliant. And I can pick out hostname with a regex.
<ziegfried> host_segment = 2 might be slightly more efficient
<duckfez> if you want the hostname in the path to be the “host=” of the event, then use host_segment=2
<duckfez> like ziegfried just said
<edeca> Nice. All this is too easy.
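If you want to sanity-check a whitelist like duckfez's before pointing Splunk at a big tree, a few lines of Python will do. This is only a sketch: the sample paths and both helper names are made up for illustration, "type" is treated as a literal directory name the way it was typed in the channel, and it assumes the whitelist is applied as an unanchored regex against the full path and that host_segment counts path segments from 1. Check the inputs.conf docs for your version before trusting either assumption.

```python
import re

# The whitelist regex from the channel, with "type" as a literal directory name.
WHITELIST = re.compile(r"^/blah/[^/]+/type/.*\.txt")


def is_whitelisted(path):
    """Return True if the path would pass the whitelist (hypothetical helper).

    Assumes an unanchored regex search against the full path.
    """
    return WHITELIST.search(path) is not None


def host_from_segment(path, segment):
    """Return the Nth '/'-separated path segment, counting from 1 (hypothetical helper).

    For /blah/web01/type/0001.txt, segment 1 is 'blah' and segment 2 is 'web01'.
    """
    parts = [p for p in path.split("/") if p]
    return parts[segment - 1]


if __name__ == "__main__":
    # Made-up sample paths, not from the original conversation.
    for sample in (
        "/blah/web01/type/0001.txt",
        "/blah/web01/other/0001.txt",
        "/blah/a/b/type/0001.txt",
    ):
        print(sample, is_whitelisted(sample), host_from_segment(sample, 2))
```

Only the first sample passes: the second has the wrong directory name, and the third has an extra directory level that `[^/]+` refuses to swallow.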
<edeca> I need to read up on the indexing, some stuff gets indexed twice if the data files are overwritten (even with identical data)
<duckfez> it’s the speed of the overwrite, likely
<edeca> Ah, because the CRC of the file end doesn’t match?
<duckfez> splunk sees, “well, I know this file used to be here, and it was X bytes.. now its way < X, and the end CRC doesn’t match … perhaps it got rolled?”
<edeca> Can I tell it that files will never get rolled?
<duckfez> that I’m not sure of … what are you doing, rsyncing into a tree?
<edeca> Which is my situation, as I control all the data splunk indexes (not using it for host monitoring, vis/searching of other event data)
<edeca> Another node does some data processing then moves across the output
<edeca> And sometimes (annoyingly) reprocesses files
<duckfez> make your moves atomic
<duckfez> eg, don’t use ‘mv’ across filesystems, and if you must, ‘mv’ to a temporary name first, then a second ‘mv’ to overwrite the first file atomically
<duckfez> from an atomic operations point of view, cp /foo/bar /baz/.bar.tmp && mv /baz/.bar.tmp /baz/bar ; is vastly superior to cp /foo/bar /baz/bar
<edeca> That makes sense, nice idea.
<edeca> Moving data across NFS links is how it works, so I like it.
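For the script-inclined, duckfez's copy-then-rename trick looks roughly like this in Python. It is a minimal sketch, not anything posted in the channel: `atomic_copy` is a hypothetical helper name, and the atomicity relies on the temporary file living in the same directory (and so the same filesystem) as the destination, where `os.replace` is an atomic rename on POSIX systems.

```python
import os
import shutil


def atomic_copy(src, dst):
    """Copy src to dst so readers never see a half-written dst (hypothetical helper).

    The slow part (the copy) goes to a hidden temporary file next to dst;
    the final step is a single rename within one directory.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    tmp = os.path.join(dst_dir, "." + os.path.basename(dst) + ".tmp")
    try:
        shutil.copyfile(src, tmp)   # may take a while, e.g. across NFS
        os.replace(tmp, dst)        # atomic rename on the same filesystem
    finally:
        # Clean up the temporary file if the copy or rename failed.
        if os.path.exists(tmp):
            os.remove(tmp)
```

One thing to watch: a temporary name like `.0001.txt.tmp` still contains `.txt`, so if the monitor whitelist is treated as an unanchored regex, ending it with `\.txt$` should keep the in-flight copy from being picked up.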
<ziegfried> heh, thx
<edeca> Nice, I now have loads of extra data (>3 million in 20 seconds..) flowing into splunk with the host type auto recognised. Cheers guys!
<edeca> Time to draw some management pr0n^W^Wgraphs