SharePoint, PowerShell and Network Latency

I was listening to Todd Klindt and Shane Young at TechEd North America this year (the link goes to the recording of the session). The session was on the basics of SharePoint 2013 administration. During the session, Todd mentioned an interesting metric – all the servers in a SharePoint farm should have no more than 1 ms of network latency between them. I, of course, had what I thought was a perfectly reasonable question – how do you go about monitoring that? As something of a monitoring guy, I found the answer less than satisfactory: use ping, but only when you suspect latency is an issue.

Problems rarely happen at convenient times when I can measure latency in this way. Normally, they occur when I would like to be asleep, which is why I like automated collection of metrics. How do we automate this collection and display it in a reasonable manner? A little scripting and a few of my favorite tools later, I now have a solution. Let’s take a look at the script first.

$LocalIPAddresses = Get-LocalIPAddress
Get-NetworkConnections `
| Where-Object { $LocalIPAddresses -contains $_.LocalAddress -and $_.State -eq "ESTABLISHED" } `
| Select-Object -Unique RemoteAddress `
| Foreach-Object { Test-Connection -Count 1 -ComputerName $_.RemoteAddress -ErrorAction SilentlyContinue } `
| Select-Object Address,ResponseTime

Ok – I cheated a little bit here. There are a couple of cmdlets that I failed to include, and I will get to them shortly. Firstly, let’s take a look at the basics. My first action is to get a list of local IP addresses. I want to record the latency of the connections that this computer makes outbound, so it makes sense to only log connections originating from one of these IP addresses. Then I get a list of network connections (more on this later), filter out those connections where this computer is not the origin, de-duplicate the resulting list, then use Test-Connection to ping the other computer, and finally record the address and the response time.
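Since the whole point is the 1 ms guideline, it can help to tag each result before it is logged. The following is only a sketch: Add-LatencyStatus and the ExceedsThreshold property are hypothetical names of mine, not part of the script above, and it assumes the input objects carry the Address and ResponseTime properties that Test-Connection returns in Windows PowerShell (where ResponseTime is reported in whole milliseconds).

```powershell
# Hypothetical helper (not part of the original script): marks each ping
# result that is slower than a threshold given in milliseconds.
function Add-LatencyStatus {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $PingResult,

        # Assumption: 1 ms, the guideline quoted from the session
        [int]$ThresholdMs = 1
    )
    process {
        # Emit a new object so the input is left untouched
        [pscustomobject]@{
            Address          = $PingResult.Address
            ResponseTime     = $PingResult.ResponseTime
            ExceedsThreshold = ($PingResult.ResponseTime -gt $ThresholdMs)
        }
    }
}
```

Appending `| Add-LatencyStatus` to the end of the pipeline would add the flag to every record. Because the response time is a whole number of milliseconds, a result of 0 or 1 passes and anything from 2 upward is flagged.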

The list of local IP addresses is available from the WMI Win32_NetworkAdapterConfiguration class, and my function for getting this list is simple:

function Get-LocalIPAddress {
    # Query WMI for every network adapter configuration on this computer
    $AdapterSet = Get-WmiObject -Class Win32_NetworkAdapterConfiguration
    $IPAddressSet = @()
    foreach ($adapter in $AdapterSet) {
        foreach ($ipaddress in $adapter.IPAddress) {
            $IPAddressSet += $ipaddress
        }
    }
    return $IPAddressSet
}

Each network adapter has a list of associated IPv4 and IPv6 addresses, so we loop over each network adapter and get the list of addresses, then add each address individually.

The Get-NetworkConnections function uses the netstat -no command to get a list of current connections. The code is slightly longer, so I’ve put it on Github (along with the complete script); feel free to download it. When you run the command, you get something akin to the following:

Address      :
ResponseTime : 302

Address      :
ResponseTime : 68

Address      :
ResponseTime : 95

Address      :
ResponseTime : 65

Address      : fe80::b51b:8187:4746:9b7e
ResponseTime : 0
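For readers who want a feel for the netstat parsing without downloading anything, here is a minimal sketch of how one line of netstat -no output could be turned into an object. This is not the Get-NetworkConnections code from Github. ConvertFrom-NetstatLine is a hypothetical name, the property names are chosen to match what the main script filters on (LocalAddress, RemoteAddress, State), and it assumes the usual five-column TCP layout (Proto, Local Address, Foreign Address, State, PID) and PowerShell 3.0 or later.

```powershell
# Hypothetical helper: a sketch only, not the author's Get-NetworkConnections.
function ConvertFrom-NetstatLine {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string]$Line
    )
    process {
        # Unary -split breaks the line on runs of whitespace
        $fields = @(-split $Line)

        # TCP rows have five columns; headers, blank lines and
        # rows without a State column are skipped
        if ($fields.Count -ne 5 -or $fields[0] -ne 'TCP') { return }

        # The port follows the last colon; IPv6 addresses are wrapped in [ ]
        $localSplit  = $fields[1].LastIndexOf(':')
        $remoteSplit = $fields[2].LastIndexOf(':')

        [pscustomobject]@{
            Protocol      = $fields[0]
            LocalAddress  = $fields[1].Substring(0, $localSplit).Trim('[', ']')
            LocalPort     = [int]$fields[1].Substring($localSplit + 1)
            RemoteAddress = $fields[2].Substring(0, $remoteSplit).Trim('[', ']')
            RemotePort    = [int]$fields[2].Substring($remoteSplit + 1)
            State         = $fields[3]
            ProcessId     = [int]$fields[4]
        }
    }
}
```

It would be used as `netstat -no | ConvertFrom-NetstatLine`. One caveat of this sketch: if netstat prints a zone suffix on a link-local IPv6 address (for example %12), the suffix is left in place, so such an address may not match the values returned by Get-LocalIPAddress without further trimming.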

Our next task is to get the data into Splunk on a regular basis. For this, I turn to the SA-ModularInput-PowerShell add-on, which runs PowerShell scripts in a host process on a regular basis. All I need to do is to add the following to the inputs.conf of my app:

script = & "$SplunkHome\etc\apps\MyApp\bin\get-networkconnections.ps1"
schedule = 0 */5 * ? * *
sourcetype = MSWindows:NetworkLatency
source = Powershell

As you can see, I run this every 5 minutes so that I can get continuous information on network latency throughout the day. I have a lookup in an upcoming app that provides a list of SQL Servers that my SharePoint servers communicate with, so now I can monitor just the latency values that matter with this search:

sourcetype=MSWindows:NetworkLatency | lookup SPServer Address OUTPUT Type | where Type="SQLServer" | timechart avg(ResponseTime) by Address

In closing, if it can go wrong, it’s worth monitoring. If you monitor it, you can correlate the changes with other things going on within your systems. This lets you get to the root cause of a failure much more quickly, which in turn reduces the time your service is off the air.
