SharePoint Incident Management

There is a huge difference between incident management and problem management, and I still constantly come across people who get the two confused.

Without getting too technical, an incident is something that can be easily resolved or, failing that, worked around. It can be a recurring thing with a known resolution, and that’s OK.  Many a SharePoint system runs fine with incidents regularly occurring, as long as you have a well-trained support team and good support tooling.

Lots of connected incidents can usually be grouped into a problem, and problems become much more serious when the people dealing with them don’t clearly know the difference, because you end up with a scattergun approach to resolution.  You should identify and deal with the incidents individually: eliminate them if you can, provide workarounds, and try to identify a single or connected root cause. Bear in mind that the problem might have a single root cause, or a set of different root causes that appear to be the same thing.
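To make the incident/problem distinction concrete, here is a minimal Python sketch of grouping incidents into candidate problems. It is only an illustration: the Incident fields, the group_into_problems name and the threshold of two are all assumptions, not features of any particular support tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    # All fields are hypothetical, chosen for illustration only.
    ticket_id: str
    summary: str
    suspected_root_cause: Optional[str] = None  # None until diagnosed


def group_into_problems(incidents, threshold=2):
    """Group incidents that share a suspected root cause.

    A cause only becomes a candidate problem once at least `threshold`
    incidents point at it; undiagnosed incidents are left alone.
    """
    by_cause = defaultdict(list)
    for incident in incidents:
        if incident.suspected_root_cause is not None:
            by_cause[incident.suspected_root_cause].append(incident.ticket_id)
    return {cause: ids for cause, ids in by_cause.items() if len(ids) >= threshold}
```

Each incident still gets its own resolution or workaround; the grouping only tells you where a shared root cause is worth chasing.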

Too many people with SharePoint systems immediately look at the SharePoint platform as the obvious cause of the issue, even when nothing may have changed in the platform. Unless you have recently installed something, increased your user numbers dramatically or grown your data corpus significantly last weekend, the cause is more likely to be deeper down the technology stack.

Modern SharePoint is ultimately more complicated than the historical dedicated hardware solutions we saw with the 2003 and early 2007 platforms: SAN, DAS and NAS based storage, hypervisors with on-demand resource allocation, redundant hypervisor-based network pathing, follow-the-sun operations, Kerberos, claims and different domain considerations, and integration with CRM, MS Project Server, Meridio, Documentum, Livelink, SAP Duet etc.  Inevitably things go wrong in more places than the SharePoint instance, so typically, if nothing has changed in SharePoint and you’re having issues, the last place to look is SharePoint itself.

SharePoint delivery has moved from being a bottom-up implementation to a holistic, top-down strategic arsenal, but problem and incident resolution remains the same.  If it isn’t an obvious resolution to a commonly known SharePoint incident, the kind fairly easily resolved via the forums on TechNet for example, the resolution is usually best found by starting from the bottom (networks and SAN) and working up through the hypervisor layer into the SQL and SharePoint servers.
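The bottom-up order described above can be sketched in a few lines of Python. This is a rough illustration only: the layer names come from the paragraph, but the first_failing_layer function and the idea of passing in one health check per layer are assumptions, and the checks themselves would be whatever your own monitoring provides.

```python
# Layers in the order suggested above, lowest in the stack first.
LAYERS = ["network", "SAN", "hypervisor", "SQL Server", "SharePoint"]


def first_failing_layer(checks, layers=LAYERS):
    """Walk the stack bottom-up and return the first unhealthy layer.

    `checks` maps a layer name to a zero-argument callable that returns
    True when that layer looks healthy. Layers without a check are
    skipped. Returns None if every supplied check passes.
    """
    for layer in layers:
        check = checks.get(layer)
        if check is not None and not check():
            return layer
    return None
```

Because the walk stops at the lowest failing layer, a SAN fault is reported before any SharePoint symptom it may be causing, which is the opposite of the scattergun approach.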

Don’t do scattergun resolution; it never works and takes a lot longer than focussed incident management.