IT incident management was difficult enough before the pandemic, but with people working from home on their own connections and personal devices, it has become considerably more complicated.
More employees working remotely means more opportunities for incidents to arise. These incidents are also harder to address because distributed IT teams have more difficulty coming to a shared understanding and getting to the bottom of an outage. Communication barriers make it tougher for a remote team to pinpoint problems.
The good news is that there are new tools and approaches teams can use to collaborate deeply, especially when remote. IT incident management is the process of responding to an IT-related incident that has halted the flow of work. Until the incident is resolved, the workday will either run longer or come to a stop, as with an internet outage.
Challenges in the Remote Work Era
Distributed workforces operate on many different networks and use more devices than they would if they were co-located in an office. Sometimes, they’re even running on different schedules. Because IT teams are distributed too, IT incident management is more challenging. Staying in the loop, maintaining context, and making sure things don’t fall through the cracks all require much greater effort than before.
Remote workforces also rely more heavily on digital communication. Almost everything is done through digital devices instead of in a physical “war room,” more commonly known as a conference room. This means there can be restrictions on when people are able to work on the same project, more technical difficulties in syncing up, and a technological learning curve. Teammates have to experiment with and learn new communication tools on their own.
IT teams need to communicate effectively to manage incidents when they occur. There are also inherent security risks to the company databases with employees working from home on multiple devices and operating on other networks. Generally, home networks aren’t secured the same way office networks are.
The everyday tasks of IT incident management are often more difficult for distributed teams. They have to collaborate to debug issues remotely. They have to figure out how to set up combined triggers that detect issues caused by multiple sources and systems, because most outages happen when several such conditions combine into a more significant problem. In addition, the mechanics of remote collaboration, such as internet connections, create friction and restrictions. The solution is for teams to work together on a single terminal, which doesn’t necessarily mean they have to be there physically.
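To make the idea of combining triggers across sources concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular monitoring tool works: the `Signal` type, the source names, and the 5-minute window are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One alert raised by a single monitoring source (hypothetical shape)."""
    source: str       # e.g. "vpn" or "database"
    timestamp: float  # seconds since an arbitrary reference point


def composite_trigger(signals, required_sources, window_seconds=300):
    """Return True if the most recent alerts from all required sources
    fall within window_seconds of each other."""
    latest = {}
    for signal in signals:
        if signal.source in required_sources:
            previous = latest.get(signal.source, signal.timestamp)
            latest[signal.source] = max(previous, signal.timestamp)
    # Every required source must have fired at least once.
    if set(latest) != set(required_sources):
        return False
    return max(latest.values()) - min(latest.values()) <= window_seconds
```

With this sketch, a VPN alert and a database alert two minutes apart would fire the combined trigger, while either alert on its own would not.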
These Tools Can Help Remote Teams Collaborate Deeply
Initially, web conferencing tools like Zoom and Microsoft Teams were the go-to solutions for online communication and collaboration. However, these solutions have limits, especially for IT teams. Sending a post-mortem email after a meeting can help prevent future collaboration issues.
Many IT incident management responses can be automated. Various tools are designed to help with this and to meet the needs of remote IT teams, such as Slack, Zenduty, Kintaba, Squadcast, PagerDuty, and CoScreen.
- Responses that can be automated include deciding which teammates are contacted for a given end-user need. The teammate to contact can also depend on who’s working at the time of the call.
- Benefits of automation include faster triage, automatically compiling relevant data to speed up the investigation, and automatic response tasks.
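As a rough illustration of the first point, the routing decision could look something like the Python sketch below. The `Teammate` type, the names, the shifts, and the categories of need are all hypothetical; real tools expose this through their own scheduling and alert-rule features.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Teammate:
    name: str
    handles: frozenset  # kinds of end-user needs this person covers
    shift_start: time
    shift_end: time

    def is_working(self, now: time) -> bool:
        if self.shift_start <= self.shift_end:
            return self.shift_start <= now < self.shift_end
        # The shift wraps past midnight (e.g. 16:00 to 00:00).
        return now >= self.shift_start or now < self.shift_end


def pick_responder(team, need, now):
    """Return the first teammate who covers `need` and is on shift, else None."""
    for teammate in team:
        if need in teammate.handles and teammate.is_working(now):
            return teammate
    return None


# Made-up roster for the example.
TEAM = [
    Teammate("Ana", frozenset({"network", "vpn"}), time(8, 0), time(16, 0)),
    Teammate("Ben", frozenset({"network", "email"}), time(16, 0), time(0, 0)),
]
```

Here a network issue reported at 09:00 would go to Ana and the same issue at 18:00 would go to Ben. A `None` result is the case where a fallback or escalation rule would be needed.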
Slack is a popular communication tool that can also be used for asynchronous IT incident management because it helps to keep all stakeholders in the loop. It helps with the collaboration process between remote teammates using instant chat messages. They can be sent by team members as well as by any of Slack’s various integrations with DevOps tools. The communication flow between teammates also helps keep everyone informed and up to date on the progress of various assignments.
It can also remind teammates of incident management tasks that have to be completed throughout the day, like upcoming meetings or feedback that needs to be submitted on a particular issue.
You can also connect Slack to tools like Zoom or CoScreen, so you can do a face-to-face chat with someone when you have quick questions on an ad hoc basis.
Zenduty supports incident response by setting up an incident command channel in your chat app. It sets up roles such as a commander, communications lead, operations lead, etc. It also sets up task templates and playbooks. You can integrate Zenduty with your communications tools, such as Zoom, and you can use alert rules to assign specific subject matter experts (SMEs) to an incident.
Zenduty helps with the automation of responses because of its ability to set up an incident command channel and roles.
Kintaba is used to manage the entire incident response process from the declaration of an incident, all the way to the post-mortem so you can learn and revise your processes once the incident is resolved.
Kintaba supports incident management teams through its built-in collaboration features, can automate structured activities, and comes with a range of integrations to the most important incident management tools.
Squadcast suggests best practices for managing incidents. It helps you use good communication practices, including documentation, central channels, and an always-open online meeting room. It includes features to keep a timeline of response activity, document resolutions, and have a blameless post-mortem.
Squadcast helps with communication and lets you automate your responses when an incident occurs. As a result, this tool can make your meetings more efficient, regardless of whether you’re in the office or working remotely.
PagerDuty is a real-time incident monitor. It connects all of your on-call team members when there is an IT-related incident and notifies them with phone calls, SMS, push notifications, or emails. It aggregates all of your ping, server, IP, network, website, and bandwidth monitoring tools into one single point of communication.
PagerDuty helps with the automation process because it notifies your teammates when an incident has occurred.
This eliminates the need for everyone to be in the same location working on the incident at the same time. If someone is on call, they can simply be the one to reply to the IT-related incident.
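The general pattern of funneling many monitors into one point of communication can be sketched as follows. This is a generic Python illustration and does not use or represent PagerDuty’s actual API. The alert dictionaries, the function names, and the channel order are assumptions for the example.

```python
def aggregate_alerts(alerts):
    """Collapse alerts from many monitors into one entry per affected service."""
    incidents = {}
    for alert in alerts:
        incident = incidents.setdefault(
            alert["service"], {"service": alert["service"], "monitors": []}
        )
        # Ignore repeated alerts from the same monitor.
        if alert["monitor"] not in incident["monitors"]:
            incident["monitors"].append(alert["monitor"])
    return list(incidents.values())


def notifications_for(incident, on_call, channels=("push", "sms", "phone")):
    """Build one message per channel for the person currently on call."""
    monitors = ", ".join(incident["monitors"])
    return [
        f"{channel} -> {on_call}: {incident['service']} flagged by {monitors}"
        for channel in channels
    ]
```

Grouping by service means the on-call person gets one consolidated message per affected system rather than one per monitoring tool.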
Another useful tool is CoScreen, which helps IT teams collaborate more effectively with an all-in-one tool. It is a multi-user tool for sharing and control that lets users share desktops. It’s compatible with any desktop application, so users can share any terminal or dev tool. It also has audio and video chat, which enables clear and contextual communication. CoScreen allows everyone to share, edit and control applications simultaneously, creating the feeling of working on the same machine with unparalleled flexibility.
CoScreen helps with deep collaboration between teammates. Everyone used to have to be in the same room while working on IT incident management. Now, with everyone on CoScreen, teammates can find bugs, diagnose issues, and respond to incidents together, at the same time, in a virtual war room.
A tool like CoScreen will continue to be useful because it allows you to bring in SMEs quickly and makes it easier for on-call engineers to work remotely. It speeds up incident response, even when collaborators aren’t in the same room, and helps to instantly create a shared context.
Best Practices for Hybrid IT Incident Management
Many companies are already exploring hybrid models, meaning some members will work from the office and some will continue to be working from home. This can be by alternating days or set per employee. This ensures a more flexible work-life balance. The tools and best practices mentioned above are helpful for all IT incident management scenarios, including a hybrid model, and will help improve IT incident management in your organization.
For example, Atlassian recommends following these basic best practices:
- Emphasize over-communication to avoid misunderstandings. Generally, when an incident occurs, it’s best to give as many details as possible because you never know which detail will help the IT incident technician understand the issue. For example, numerous technical problems can cause a flashing screen, but details like what you were working on before it occurred can help diagnose the issue and prevent it from happening again.
- Openness and transparency can maximize efficiency. Be honest. As in the example above, you’re going to want to be open and transparent about what you were doing before the flashing screen occurred, even if that means admitting you spilled your coffee on your computer. Don’t waste your IT manager’s time by making them go down the list of things that could have gone wrong.
- Document the resolutions. Maybe you keep getting the same issues between the night shift and the day shift. If you document the solution the first time an incident occurs, you reduce the response time needed to solve it the next time.
- Using a combination of synchronous and asynchronous collaboration is useful whether you are working remotely, in a co-located office, or in a hybrid setup. There’s more in-depth information in our Complete Guide to Teaching Programming Remotely article. When an incident occurs, try using various collaboration tools or a mix of them, such as boosting collaboration with CoScreen for Zoom.
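The “document the resolutions” practice above can be as simple as a shared, searchable log. The Python sketch below is one hypothetical way to model it. The `ResolutionLog` class and the sample entry are invented for illustration, and a real team would more likely use a wiki or the post-mortem features of its incident tool.

```python
class ResolutionLog:
    """Keeps past resolutions keyed by a normalized symptom description."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(symptom):
        # Normalize case and whitespace so similar reports match.
        return " ".join(symptom.lower().split())

    def record(self, symptom, resolution):
        self._entries.setdefault(self._key(symptom), []).append(resolution)

    def lookup(self, symptom):
        """Return known resolutions for this symptom (empty list if none)."""
        return list(self._entries.get(self._key(symptom), []))


log = ResolutionLog()
log.record("Flashing  Screen", "Reseat the display cable")  # made-up entry
known_fixes = log.lookup("flashing screen")
```

The day shift can then look up what the night shift already tried before starting the diagnosis from scratch.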
Deep Collaboration for SRE Teams is Here to Stay
The shift to remote work is changing the way almost everything is done, including IT incident management. It has accelerated the use of tools that facilitate deep collaboration to ensure high availability and reliability of even the most complex systems. When things go back to normal, certain aspects of the remote era will remain. Either way, deep collaboration tools will help teams be more productive and efficient in any setting.