In these times of remote teamwork, the pressure on IT teams is at its peak. So how can you ensure that teams function well and working conditions stay good when everyone is remote? How do you ensure that IT Ops teams can support the business as usual, from VPN and office suite to critical applications and videoconferencing? The list of priorities changes and new business apps need to be added, all while your kids and their endless energy become your face-to-face office colleagues. :)
According to Atlas VPN user data, VPN usage increased in almost every country in March (+112% in Italy, +53% in the United States, with an estimated increase of over 150% by the end of April). This has a direct impact: many enterprises have to support multiple network and security technologies, which puts stress on VPN concentrators, DHCP servers, the number of SSL sockets, etc.
As the need for collaborative tools also explodes, more and more companies are changing their security setup to meet VPN demand, for example by using split tunneling.
The objective of this blog is not to go into very technical details but rather to help (at my humble level, but with the help of some colleagues) our customers by pointing to certain tools and practices to cope with an increase in remote work needs, not only to absorb internal demand but also to allow IT operations teams to work more easily remotely (someone said “distributed NOC”?).
Here are the main questions we will be addressing:
- How do I collect the relevant data to monitor all systems’ smooth operation for remote workers?
- Where in my environment is the next bottleneck coming up?
- How can I share the big picture within my (remote) IT Operations team?
- How can I take action when I’m not at my wall of screens in the NOC?
Get Data In to avoid blindness
Naturally, you are already monitoring your network, your VPN, endpoints, etc., but not that long ago it wasn’t strictly necessary to supervise in-depth details such as access to certain applications in the cloud. At the end of this blog, you’ll find a (long) list of applications and other sources of information (from our Splunkbase, or Splunk Answers, even a fresh new add-on created by my fellow colleague Matthias Maier...) that should set you up to onboard data more quickly and easily as well as monitor usage and issues.
You don’t have time to look at such a long list? Don't despair: Splunk created a dedicated webpage listing Splunkbase Solutions for Remote Work. Our CTO, Tim Tully, and his team created Remote Work Insights (RWI), a solution composed of technical add-ons, dashboards, and connectors delivering real-time visibility across multiple disparate systems (VPN, Okta, Zoom…). RWI is available to any organization and includes free Splunk resources to understand your distributed workforce (and we made sure the dashboards in RWI render well in the Splunk mobile app too).
More pressure on remote access = more risks
To save VPN resources or control costs (especially for bandwidth-hungry applications like videoconferencing), or just to deal with the lack of transport services in specific areas of the country, more and more companies are changing their remote access approach by adopting split tunneling. Microsoft has posted an interesting blog on “How to quickly optimize Office 365 traffic for remote staff & reduce the load on your infrastructure” where they recommend the use of split tunneling. Split tunneling brings a troubleshooting challenge and might change the way you monitor your WFH (work from home) infrastructure, as your organization can no longer easily monitor web traffic on the remote device through the VPN connection.
Splitting the tunnel on the remote endpoint gives you two (or more) data paths. So to my previous point, you might want to gather data from both paths: onboard data from your endpoint agents and, at the same time, monitor activity in depth across your online services (G Suite, Office 365, Salesforce…) to ensure you can support your business even if part of the traffic is not routed via your VPN.
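To make the "two data paths" idea concrete, here is a minimal Python sketch that decides which path a destination takes and therefore where visibility has to come from. It is only an illustration: the `DIRECT_NETWORKS` list and the function names are hypothetical, the address ranges are documentation placeholders rather than real service ranges, and in practice the split-tunnel policy lives in your VPN configuration.

```python
import ipaddress

# Hypothetical split-tunnel exclusions: traffic to these networks bypasses the VPN.
# These are documentation ranges (RFC 5737), not real cloud-service addresses.
DIRECT_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def data_path(destination: str) -> str:
    """Return 'direct' if the destination bypasses the VPN, otherwise 'vpn'."""
    address = ipaddress.ip_address(destination)
    if any(address in network for network in DIRECT_NETWORKS):
        return "direct"
    return "vpn"


def monitoring_source(destination: str) -> str:
    """Name the place where data about this destination can be collected."""
    if data_path(destination) == "direct":
        # The VPN concentrator never sees this traffic.
        return "endpoint agent and online service logs"
    return "VPN and network logs"


if __name__ == "__main__":
    for ip in ("203.0.113.10", "10.1.2.3"):
        print(ip, data_path(ip), monitoring_source(ip), sep=" | ")
```

A mapping like this can help when you review your dashboards: any service that ends up on the direct path needs an endpoint or service-side data source, because the VPN logs will stay silent about it.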
There are several options for monitoring your endpoints such as UberAgent (paid service - refer to the dedicated link section), or Nexthink (paid), but there is another option to explore: install a Splunk Heavy Forwarder (HF) or Universal Forwarder (UF) on your endpoints.
To do this, you’ll need to do the following:
- Identify users’ critical apps and services
- Define the right data points to monitor
- Create an inputs.conf for an HF/UF and use an add-on data input, a command input, or a batch/Python script that writes the timestamp and the metric to stdout (more details on the scripting in the links section).
- Investigate apps like the Splunk Add-on for Unix and Linux (to collect network statistics, network interface information…) or the Curl command app (to poll data from a REST API, etc.)
- Send it via outputs.conf to the Splunk server and build your dashboard
- Or simply use the new free Add-on created by my fellow colleague Matthias called “WebPingi” available on GitHub that will allow you to monitor web services from the perspective of your endpoints.
- Connect everything to IT Service Intelligence (if you use it) to see the big picture.
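The scripted-input step above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the WebPingi implementation: the script times DNS resolution and a page fetch for each URL and prints one line to stdout that starts with a timestamp followed by key=value metrics. The `PROBE_URLS` environment variable, the field names, and the function names are all hypothetical.

```python
import os
import socket
import time
import urllib.request
from urllib.parse import urlparse


def format_stamp(timestamp: float) -> str:
    """Render an epoch timestamp as ISO 8601 in UTC."""
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(timestamp))


def format_event(timestamp: float, url: str, dns_ms: float, load_ms: float) -> str:
    """Build one event line: timestamp first, then key=value metrics."""
    return f'{format_stamp(timestamp)} url="{url}" dns_ms={dns_ms:.1f} load_ms={load_ms:.1f}'


def probe(url: str, timeout: float = 10.0) -> str:
    """Time DNS resolution and a full page fetch for one URL."""
    host = urlparse(url).hostname
    started = time.time()

    t0 = time.perf_counter()
    socket.getaddrinfo(host, None)
    dns_ms = (time.perf_counter() - t0) * 1000

    # The fetch resolves the name again, usually from the local resolver cache.
    t1 = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    load_ms = (time.perf_counter() - t1) * 1000

    return format_event(started, url, dns_ms, load_ms)


def main() -> None:
    # PROBE_URLS is a hypothetical variable: comma-separated URLs to check.
    raw = os.environ.get("PROBE_URLS", "")
    urls = [u.strip() for u in raw.split(",") if u.strip()]
    for url in urls:
        try:
            print(probe(url), flush=True)
        except (OSError, ValueError) as error:
            # Network, DNS, and HTTP failures are OSError subclasses.
            print(f'{format_stamp(time.time())} url="{url}" status=error reason="{error}"', flush=True)


if __name__ == "__main__":
    main()
```

On the forwarder, a script like this would typically be referenced from a `[script://...]` stanza in inputs.conf with an `interval` setting, so that each run emits fresh events; the exact stanza depends on where you place the script, so check the scripting documentation listed in the links section.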
Example of a dashboard using the WebPingi add-on to measure performance from a remote worker’s system to cloud applications
I thought about what to monitor and ingested the relevant data, now what? - The single pane of glass
Yes, the IT practitioner’s role is to look after critical applications, systems, networks, etc., but they also need to look after themselves. We still spend countless hours looking at too many tools and screens, switching from one to another. It gets much worse when your NOC/service desk “wall of screens” is now... your laptop (and the kids are still running around). Splunk IT Service Intelligence might help you see the big picture, save time and identify the issue faster. Here is a mockup of a glass table to monitor what is going on in a complex WFH situation.
(Suggested glass table mockup; might require paid apps)
I can see what’s going on, how do I act?
I now have full visibility across my WFH infrastructure, but what if the delivery person rings the doorbell just as an incident occurs?
We can’t expect remote IT Practitioners to closely watch their laptops, or lose time juggling between their smartphone and their laptop. Working from home means that we have less time to do our daily tasks (manage incidents, change on-call schedules, curate content for post-incident/ problem management review and more). VictorOps might help you route alerts to the right people using a mobile-first experience that leverages machine learning to make on-call accessible wherever you are (at the door, accepting the delivery in this case).
As a hitchhiker, I have reached my destination. I hope that this blog has helped you. Thank you for reading, I know, it’s a loooong blog, but it’s a challeennnnging topic as well. A special thanks to my colleague Matthias Maier for proposing the HF/UF way...and creating the WebPingi add-on over the weekend after a nice Friday afternoon chat!
Other blogs that might interest you (keep an eye on Splunk’s blog, as new content is constantly created on this topic):
- IT Monitoring: How Do I Know Who is in My Network
- IT Monitoring: Everyone’s Video Conferencing Now
- Securing a New Way of Working
For detailed information about installing Splunk Enterprise apps and add-ons, consult the App Deployment Overview.
Splunk Apps and Add-ons
|App Name||Short description||Runs on Splunk version||Link|
|VPN, DLP, Security...|
|Cisco ASA||Populate dashboards around remote access and bandwidth on key links||6.6 to 8.0||https://splunkbase.splunk.com/app/1620/|
|Cisco ISE||Extract and index ISE AAA Audit, Accounting, Posture, Client Provisioning Audit and Profiler events.||6.0 to 7.3||https://splunkbase.splunk.com/app/1589/|
|Cisco AnyConnect Network Visibility Add-on & App||Analyze and correlate user and endpoint behavior and visualize data with pre-built reports for AnyConnect NVM||6.3 to 8.0||
|Palo Alto App for Splunk (Firewall & Panorama, Traps endpoint protection, Aperture SaaS, WildFire Malware…)||Correlate application and user activities across all network and security infrastructures from a real-time and historical perspective.||6.3 to 8.0||https://splunkbase.splunk.com/app/491/|
|Zscaler Splunk App||Reporting on web usage, remote access usage, Zscaler DLP…||7.0 to 8.0||https://splunkbase.splunk.com/app/3866/|
|Access management and MFA|
|Okta Identity Cloud add-on||Connects to the Okta Identity Cloud REST APIs to report on events, users, groups and application assignment information.||6.3 to 7.3||https://splunkbase.splunk.com/app/3682/|
|SailPoint IdentityNow AuditEvent Add-on||This Splunk add-on provides an easy way to extract audit event data from SailPoint's IdentityNow product||6.5 to 8.0||https://splunkbase.splunk.com/app/4088/|
|RSA SecurID||Collect data from the RSA SecurID Authentication Manager (AM) server via syslog (prebuilt dashboard panels included)||6.6 to 8.0|
|Duo Splunk Connector||Collect and report on Authentication Logs, Administrator Logs, Telephony Logs and Endpoint Logs.||6.5 to 7.3||https://splunkbase.splunk.com/app/3504/|
|Online Services, Apps, Productivity...|
|Microsoft 365||Dashboards for Teams, Exchange, Active Directory, Sharepoint, OneDrive...||7.2 to 8.0||https://splunkbase.splunk.com/app/3786/|
|Splunk add-on for Microsoft Office 365||Pull service status, service messages, and management activity logs from the Office 365 Management API||7.0 to 8.0||https://splunkbase.splunk.com/app/4055/|
|Splunk app for Microsoft Exchange||Gather performance, log and configuration data from all elements of Microsoft Exchange and its underlying infrastructure||7.2 to 8.0||https://splunkbase.splunk.com/app/1660/|
|G Suite for Splunk||Interfaces with G Suite, consuming the usage and administrative logs provided by Google||7.0 to 7.3||https://splunkbase.splunk.com/app/3791/|
|Splunk add-on for Salesforce||Collect different types of data from Salesforce using REST APIs (event logs, SOQL…)||7.0 to 8.0||https://splunkbase.splunk.com/app/3549/|
|Splunk add-on for Box||Collect data from Box and monitor Box events in near real time (enterprise events, user and groups data,...)||7.0 to 8.0||https://splunkbase.splunk.com/app/2679/|
|Dropbox Business App for Splunk||Logging, membership and device activity, security metrics...||6.4 to 7.3||https://splunkbase.splunk.com/app/2755/|
|Slack App for Splunk||Collect and index data on your Slack activity||6.5 to 7.3||https://splunkbase.splunk.com/app/3542/|
|How to poll the QoS API / Splunk-to-Zoom notification webhook|
|WebPingi Add-on||WebPingi allows you to query a web service and get the DNS and page load times from a Splunk Heavy Forwarder installed on an endpoint machine.||https://github.com/matthias2maier/webpingi|
|Splunk Add-on for Unix and Linux||https://splunkbase.splunk.com/app/833/|
|UberAgent (paid addon)||Agent for Windows end-user-computing collecting data from physical PC, Virtual Desktop, Citrix, VMware Horizon...||6.3 to 8.0||https://splunkbase.splunk.com/app/1448/|
|Other Splunk products listed|
|Splunk IT Service Intelligence||Machine learning-powered service analyzer tree to leverage the right data and quickly determine the service, application, or infrastructure origin of an incident and its root cause.||https://www.splunk.com/en_us/software/it-service-intelligence.html|
|VictorOps||Engage people where they work. Mobile-first experiences leverage machine learning to make on-call accessible wherever you are.||https://www.splunk.com/en_us/software/victorops.html|
|Atlas VPN stats||https://atlasvpn.com/blog/vpn-usage-in-italy-rockets-by-112-and-53-in-the-us-amidst-coronavirus-outbreak/|
|Splunk documentation about scripting||https://docs.splunk.com/Documentation/Splunk/8.0.2/AdvancedDev/ScriptSetup|
|Nexthink (3rd party solution)||Nexthink delivers a positive workplace & service experience, it provides visibility into your end-user computing environment.|
Until next time,