For decades, system administrators have worked largely in the shadows to maintain the accessibility and uptime of your most important IT services. And, while the rise of DevOps and cloud computing has led to more people with a hybrid sysadmin/developer skillset, the primary duties of a system administrator will always be required.
System administrators are normally tasked with the installation, maintenance, configuration and repair of servers, networks and other computer systems. They work with both hardware and software – learning enough programming and scripting to execute tasks and actions across their applications and infrastructure. In the world of DevOps, software developers are becoming more like sysadmins and sysadmins are becoming more like developers – leading to better collaboration and tighter feedback loops across all teams.
Because the system administrator role has changed so much in the last decade, we’ve put together this definitive guide to being a system administrator today. Let’s take a look at:
- The basic roles and responsibilities of a System Administrator
- Tips and resources for being highly effective in a Sysadmin role
(Interested in more IT roles? See common IT salaries by roles, location and more.)
What’s a System Administrator?
As a sysadmin, you’re essentially maintaining the entire technology and IT stack. In the technology industry, this means you’re literally maintaining the system holding up your entire business. You might even be working directly in the NOC or the SOC.
Every second that your website or server is down means lost productivity, lost revenue and hefty downtime costs. So, above all, sysadmins need to be efficient problem solvers. With numerous operating systems, network configurations and security concerns to keep in mind, being an effective system administrator means you can learn new things quickly and maintain strong feedback loops with your development team.
To get more granular, let’s look at 12 common Sysadmin job responsibilities so you can better understand the skills and technologies you’ll need to be acquainted with.
1) Monitoring & alerting
Depending on your toolchain and technology stack, the system administrator is in charge of monitoring and alerting across your applications and infrastructure. To detect incidents, monitor core server and network metrics like:
- Disk usage
Then, you can set up alerts based on monitoring thresholds to receive on-call notifications in case of major incidents. It’s important that sysadmins know how to use both external system outputs and metrics to determine the health of their systems, which leads to more observable architecture.
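As a minimal sketch of threshold-based alerting on the disk usage metric above, the following Python uses only the standard library. The 90% threshold, the function names and the printed message are illustrative assumptions, not part of any specific monitoring tool; a real setup would route the alert to an on-call notification system instead of printing it.

```python
import shutil

# Hypothetical threshold: alert when a filesystem is at least 90% full.
DISK_USAGE_THRESHOLD = 0.90


def usage_ratio(used: int, total: int) -> float:
    """Return the fraction of capacity in use (0.0 when total is 0)."""
    return used / total if total else 0.0


def should_alert(used: int, total: int, threshold: float = DISK_USAGE_THRESHOLD) -> bool:
    """Return True when usage meets or exceeds the threshold."""
    return usage_ratio(used, total) >= threshold


def check_disk(path: str = "/") -> bool:
    """Read real usage for `path` and report whether it breaches the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return should_alert(usage.used, usage.total)


if __name__ == "__main__":
    if check_disk("/"):
        print("ALERT: disk usage above threshold on /")
```

Keeping the comparison in should_alert, separate from the code that reads the metric, makes it possible to test the threshold logic without touching a real disk.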
(Get tips for configuring adaptive thresholding.)
2) Managing user permissions & administration
System administrators are generally in charge of user permissions and administration for all applications and services. Sysadmins can assign user roles and manage the entire organization’s IT stack, giving everyone the access they need to certain applications and services in a secure way.
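One common way to reason about user roles is a mapping from each role to the actions it grants. The sketch below assumes three hypothetical roles and permission names chosen purely for illustration; real identity and access systems are far richer, but the deny-by-default check is the core idea.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "developer": {"read", "deploy"},
    "admin": {"read", "deploy", "manage_users"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True only if the role exists and grants the action.

    Unknown roles get an empty permission set, so access is denied by default.
    """
    return action in ROLE_PERMISSIONS.get(role, set())


if __name__ == "__main__":
    print(is_allowed("developer", "deploy"))  # True
    print(is_allowed("viewer", "deploy"))     # False
```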
3) Managing SSO & passwords
The sysadmin is tasked with managing passwords and single sign-on (SSO) policies and practices across the company. They are able to reset passwords and ensure security requirements are met everywhere.
If the company uses SSO and/or two-factor authentication, the system administrator is in charge of managing these tools and helping employees get access to the systems they need when they need them.
4) Managing files
To ensure data organization and consistency, the sysadmin will usually put policies and procedures in place around the way files are organized and shared within the organization. As with most of the other sysadmin responsibilities, the goal is to protect against external attacks while giving employees appropriate, easy access to files.
5) Defining system usage policies & procedures
At a very general level, the system administrator will need to define best practices for working within the organization’s systems. This includes everything from proprietary software you’re building to different third-party IT applications and services.
By educating employees on how to use systems in a secure, productive way, sysadmins are able to change the way work is conducted within an organization.
6) Installing & maintaining software
It’s the sysadmin’s job to put policies and procedures in place to keep up with software installation and updates. If new updates introduce errors or dependency conflicts between versions of systems, the sysadmin should be able to detect these issues and fix them.
7) Planning for redundancies, rollovers & recoveries
Sysadmins should have active, updated plans for redundancies, rollovers and incident recovery. Through effective monitoring, alerting and cross-functional communication, the system administrator should be able to quickly detect any failures and remediate IT incidents.
8) Maintaining security
Security should be top-of-mind across everything a system administrator works on. Whether it’s user permissions or the way the team maintains documentation, the sysadmin needs to perform all actions in a secure way. As they set up networks, policies and servers, the sysadmin needs to know how to do it in a technically sound, secure way.
(Learn about infrastructure security and DevSecOps.)
9) Maintaining documentation & runbooks
Sysadmins are often tasked with maintaining documentation and keeping runbooks up to date. In a world of CI/CD, this can be a daunting task. System administrators need to know how they can leverage automation to keep runbooks and documentation accurate and updated without slowing the development lifecycle.
10) Detecting & remediating incidents
System administrators can’t simply throw their IT and security environment together. They need to build it with visibility and speed in mind.
- How can you set up a system to allow for rapid incident detection, response and remediation in case an issue does pop up?
- What kind of monitoring and alerting needs to be in place?
- What’s the communication strategy if you experience an outage?
Sysadmins should be on top of all of these questions in order to make the most of their incident management practices.
11) Performing post-incident reviews
System administrators are often in charge of conducting post-incident reviews for their affected systems, asking questions like:
- How long did it take to identify the issue?
- How long did it take to actually remediate the incident?
Keeping up with post-incident reviews, collaborating with other affected teams and taking detailed post-incident notes can help improve IT and software developer relationships, leading to better feedback loops and more reliable deployments. Use post-incident reviews as a way to learn from your past mistakes and improve people, processes and technology for the future.
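The two questions above come down to simple timestamp arithmetic. The sketch below assumes you record when an incident started, when it was detected and when it was resolved; the function name, the dictionary keys and the timestamps are made up for illustration, and it assumes time to remediate is measured from detection rather than from the start of the incident.

```python
from datetime import datetime, timedelta


def incident_durations(
    started: datetime, detected: datetime, resolved: datetime
) -> dict[str, timedelta]:
    """Return time to identify and time to remediate for one incident."""
    if not (started <= detected <= resolved):
        raise ValueError("timestamps must be in order: started, detected, resolved")
    return {
        "time_to_identify": detected - started,
        # Assumption: remediation time is counted from the moment of detection.
        "time_to_remediate": resolved - detected,
    }


# Made-up timestamps for illustration.
durations = incident_durations(
    started=datetime(2024, 1, 10, 4, 0),
    detected=datetime(2024, 1, 10, 4, 12),
    resolved=datetime(2024, 1, 10, 5, 2),
)
print(durations["time_to_identify"])   # 0:12:00
print(durations["time_to_remediate"])  # 0:50:00
```

Tracking these two durations across incidents gives a review something concrete to compare over time.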
12) Preparing & problem solving
At the core, a good system administrator will be an excellent problem solver who can find ways to prepare for unknowns. Today’s teams are deploying more complex architecture faster – making a sysadmin’s job more complicated than ever.
So, finding ways to reduce bottlenecks in the deployment lifecycle while simultaneously reducing risks in your IT and security infrastructure will always make your life as a sysadmin easier.
System Administrator skills & technologies
To be effective, system administrators need to know more about programming, automation and cloud computing. Sysadmins aren’t simply rebooting servers and decommissioning old equipment — they maintain the reliability and uptime for all of your software and hardware.
So, let’s cover the more modern skills and technologies that system administrators should be familiar with.
Configuration management and automation
Get comfortable with the CMDB, for starters. Then, you can move into configuration automation tools like Puppet, Chef, Ansible and Jenkins; fluency with them is paramount to sysadmin success. These tools allow system administrators to automate a number of tasks and configurations along the release lifecycle – leading to fewer errors and faster deployments.
This enables developers to spend more time building new applications and services instead of reworking projects in the current pipeline or fixing support escalations.
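Part of why automated configuration leads to fewer errors is idempotence: you describe the desired state, and applying it twice changes nothing the second time. The sketch below is not how Puppet, Chef or Ansible are implemented; it is a hypothetical standard-library helper (ensure_line, with an invented config line in the usage example) that only illustrates the idea.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` appears in the file at `path`.

    Returns True if the file was changed, False if it was already correct.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already holds; do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "app.conf"  # hypothetical config file
        print(ensure_line(conf, "max_connections=100"))  # True: line was added
        print(ensure_line(conf, "max_connections=100"))  # False: nothing to do
```

Because the function reports whether it changed anything, it can be run repeatedly across many servers without piling up duplicate settings.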
Cloud computing
With the dominance of AWS, Azure and GCP, system administrators everywhere need to understand how to orchestrate systems in the cloud.
- What types of monitoring and alerting tools should you use?
- What’s the best way to manage your servers and networks now that your infrastructure is cloud-based?
System administrators work on questions like these all the time, building redundancies and security into the entire system. As nearly every application and service moves to the cloud, cloud orchestration is one of the most important skills for sysadmins everywhere.
Git and other version control forms
Git is the most popular version control system. Version control is a way to track changes in code and different versions of an application or service. This way, if there’s ever an issue with the current version of a service, sysadmins can easily roll back deployments or updates to fix the problem.
Version control is essential to maintaining a reliable CI/CD pipeline and providing visibility to projects across all of engineering and IT. Sysadmins need to understand version control so they can quickly see what developers are doing, identify issues and fix them – often before they ever reach customers.
(Read about source code management.)
Server and network upkeep
As mentioned above, sysadmins need to understand the ins and outs of server and network upkeep. These servers and networks are the pillars holding up your entire business and providing value to customers. So, system administrators need to be continuously improving processes in order to:
- Maintain more reliable systems.
- Avoid outages as much as possible.
- Improve incident response when an incident does strike.
Scripting and programming
System admins are increasingly writing scripts and programs. This need for system administrators who frequently write code has given rise to a newer movement in site reliability engineering (SRE).
Traditionally, sysadmins have been highly reactive toward incidents in production due to the code that was passed to them by developers. But, as sysadmins and SRE teams start to write code more often and collaborate with developers earlier in the deployment lifecycle, they’re able to proactively identify problems and fix them more often.
Sysadmins who are effective at writing scripts and programs are highly coveted in today’s market because they can actively help improve system reliability and drive business value.
(Explore an SRE’s roles and responsibilities.)
Appreciate your System Administrators
System administrators rarely get the glory they deserve. They frequently respond to on-call incidents at 4 a.m. and fix incidents that could potentially lead to millions of dollars in lost revenue and negative customer experiences.
Within any good IT and engineering team, there’s a constant balance between speed and reliability. While developers are often pushing the boundaries on speed, sysadmins are doing the hard job of slowing them down before they go too far — ensuring greater reliability and security across all of your applications and services.