Thursday, December 7, 2017

Multi Service Incident Update

We are pleased to announce we have added the ability to create incidents which affect multiple services. This is an extremely important and long awaited enhancement which forms part of a large rework of our internal systems in StatusHub.

To see how it now works please see this article in our help centre - Managing Incidents Across Multiple Services

If you have any questions, please contact us.

For those of you that are interested or curious the following is a detailed description of how we planned and crafted this big update and how the system now works.

As background: due to a legacy issue, there was a serious inconsistency for users. Maintenance events could be tied to multiple services, but incidents could not. Now both event types work in the same way.

As mentioned this was part of a larger project to rework key parts of the system. For multi service incidents (MSI) we did not want to just add a simple drop down and move on. We also wanted to ensure that by adding this option we would not introduce unexpected behaviour or worse, unpredictable behaviour.

So the first step was to clearly define what MSI means. The answer was unexpected at first but logical. MSI means supporting multiple overlapping incidents for a service.

Without the overlapping scenario we would have to restrict the list of available services to only those which are event-less at this time. In order to create incidents that are affecting multiple services, one would have to clear existing events first.

Another legacy and counter-intuitive element, which we took the opportunity to remove, was the option to set the status on a service without creating an incident or maintenance event. Without removing it, making MSI work would have been extremely difficult, if not impossible.

Why counter intuitive? StatusHub is not a tool to check if something is up or down. If an end user experiences your website as down, and your end users are checking your status page on StatusHub, then the information that your website is just down will bring zero value.
StatusHub is about communication: being transparent with your end users about, first, when something will be resolved and, second, what happened. Therefore allowing our users to change the status of a service without an event no longer makes sense.

This view was also shared by some of our customers who, over the last several months, asked us to block the ability to set a service status without an event.
This choice introduced a key task: preserving the historical data, as many StatusHub users have used this feature to date. In some cases this was due to ease of use, as it was a simple drop down on our control panel home page instead of creating an incident using the form.

For others it was set via the API. We have chosen to keep the Service Statuses API instead of removing it entirely. However, its logic has changed: it now automatically adds an incident with a generic name for every status change, so those who rely on this API can continue to use it and, hopefully, transition in their own time to creating only events.
The same approach was taken for past data. All 'naked' service statuses have been converted to generic incidents.
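
As a rough illustration of what such a conversion could look like, here is a minimal Python sketch. It is not StatusHub's actual migration code: the `StatusChange` and `Incident` types, the generic title text, and the rule that an incident ends at the next status change are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class StatusChange:
    """A 'naked' status change on one service (hypothetical shape)."""
    service: str
    status: str  # e.g. 'up', 'yellow', 'red'
    at: datetime


@dataclass
class Incident:
    """A generic incident generated from a status change (hypothetical shape)."""
    title: str
    service: str
    status: str
    start: datetime
    end: Optional[datetime]  # None means the incident is still open


def to_generic_incidents(changes: List[StatusChange]) -> List[Incident]:
    """Convert one service's status changes into generic incidents.

    Assumption: every non-'up' status opens an incident that lasts
    until the next status change for that service.
    """
    ordered = sorted(changes, key=lambda c: c.at)
    incidents = []
    for i, change in enumerate(ordered):
        if change.status == "up":
            continue  # returning to 'up' only closes the previous incident
        end = ordered[i + 1].at if i + 1 < len(ordered) else None
        incidents.append(
            Incident("Service status change", change.service,
                     change.status, change.at, end)
        )
    return incidents
```

With this sketch, a service that went 'yellow', then 'red', then 'up' would end up with two back-to-back generic incidents and no status set outside of an event.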

Implementation


With these two key decisions made, we were able to start the project, which involved reworking our internal logic and data querying almost from scratch.
Working with events only resulted in much cleaner and less complex code, a very important goal for ensuring maintainability and quality.

The system no longer needs to operate in terms of what is happening now and what happened before and just needs to work with what is happening at this very moment. Now, with multiple overlapping incidents, the problem of "What's the status of this service when I'm closing my incident?" starts to be non-trivial.

As an example, take a web application. One team noticed that due to problems with the database, the web app is responding very slowly. So they set the web app status to 'yellow' and they are trying to fix the problem. A pretty simple case.

But as problems like to appear in pairs (or worse), a 3rd party service has gone dark. Unfortunately this service was vital to the same web app, and the result is a complete outage.
A second team, responsible for the 3rd party integrations, creates another incident also affecting this web app and sets the service status to 'red'. So far, everything is simple.
But now let's assume that the first team has finished their work and the DB is operating fine (and not as a result of lower load due to the web app being down; their fix addressed a problem with the underlying DB storage performance).

They want to close their incident. But which status should they use? 'green' because their problem is solved and in this aspect the app should be working fine? Or 'red' because in the end the app is not working, which they can clearly see from their checks?
Or let's assume instead that the second team finished first. Is the service 'up' (they have fixed their problem, so the web app should work fine) or 'yellow' because maybe the DB team hasn't finished yet?

For this reason we have decided not to set the service status explicitly when closing an incident. And because we have decided to operate with events only, we can tell StatusHub not to care about individual service statuses but only about "Is there an event at this time?".

So returning to our example, in the first case, when the DB team finishes first and closes their incident, the web app service will be 'red' because the other event is dictating the status.
Only when the second incident is closed will the service be set as 'up' again.

In other words: service status at any point in time is the result of much simpler logic: "Is there any event then? If so, use the status of the worst event; if not, the service is 'up'". The same applies to aggregated historical views on your StatusHub page: "Was there any event on that day? If so, then use the worst status from the event or events. If not, then the service was 'up'".
Now users who update incidents don't have to check with other users to ask if the service should be up or not. They can focus on their part only.
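
The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production code: the `Event` type, the `SEVERITY` ordering ('up' < 'yellow' < 'red') and the half-open time interval are all choices made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional

# Assumed severity ordering; higher means worse.
SEVERITY = {"up": 0, "yellow": 1, "red": 2}


@dataclass
class Event:
    """An incident or maintenance affecting a service (hypothetical shape)."""
    status: str
    start: datetime
    end: Optional[datetime] = None  # None means the event is still open

    def active_at(self, moment: datetime) -> bool:
        # Half-open interval: active from start up to, but excluding, end.
        return self.start <= moment and (self.end is None or moment < self.end)


def service_status(events: Iterable[Event], moment: datetime) -> str:
    """Return the worst status among events active at `moment`, else 'up'."""
    active = [e.status for e in events if e.active_at(moment)]
    if not active:
        return "up"
    return max(active, key=SEVERITY.__getitem__)
```

Applied to the web app example, with a 'yellow' DB incident from 10:00 to 12:00 and a 'red' integration incident from 11:00 to 13:00, this gives 'yellow' at 10:30, 'red' at 11:30 and at 12:30 (the DB incident is closed but the other event dictates the status), and 'up' at 13:30.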

One more thing!


One final change: prior to this update, after adding a service to a maintenance event there was no way to remove it. If a user made a mistake, they had to recreate the maintenance without that particular service. A very poor user experience. With this MSI release we have addressed this problem too.

Now a user can remove services from maintenance and from incident events at any point.
With incidents it is more complex, but we have put a solution in place. When creating an incident, you don't know when it will be resolved and can't know how many updates will be posted.
In order to remove a service from an incident, one has to do this from the hub history view. The same view that was always used to edit already created incidents.

Services will have to be removed from all incident updates to disappear from this incident entirely. "What's the point of an incident where one of the services will not be updated while others will be?"
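
To make the "all incident updates" point concrete, here is a small Python sketch. It models an incident simply as a list of updates, each holding the set of services it applies to; this data model and the function names are assumptions for illustration only, not how StatusHub stores incidents.

```python
from typing import List, Set


def remove_service_from_all(updates: List[Set[str]], service: str) -> List[Set[str]]:
    """Return a copy of the updates with `service` removed from every one."""
    return [services - {service} for services in updates]


def affected_services(updates: List[Set[str]]) -> Set[str]:
    """A service belongs to the incident if any update still lists it."""
    return set().union(*updates)
```

In this model, removing a service from only some of the updates leaves it attached to the incident; it disappears from `affected_services` only once no update lists it.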
This is a hint towards a feature that we want to complete next year: the ability to skip notifications for incident updates that do not change the status of a service which an end user is subscribed to.

As with many elements of software, what looks like a simple change on the surface often hides a lot of complex work and systems underneath.

Again we want to express our thanks and appreciation to all our existing customers and users who have been so patient in waiting for this update. As you can see we needed time and care to do it right and now these underlying changes give us a stepping stone and flexibility to introduce more great enhancements to StatusHub going forward.

As before if you have any questions or feedback on this please do let us know - contact us.


Friday, August 11, 2017

Your StatusHub is now viewable inside your Zendesk Support.




As a customer of StatusHub, you are aware that you can quickly and easily communicate with your customers when an incident occurs. Your dedicated StatusHub helps you to communicate both planned and unplanned events. Those who are subscribed to your StatusHub can receive email or SMS updates anytime you update your hub.

However, when the average person receives over 120 emails a day, these updates can get lost in an overflowing inbox. That is why we have introduced what we aim to be the first of many ways of displaying your StatusHub in external sources.

Through our partnership with Zendesk, you can now display a number of events from your StatusHub directly inside the main Zendesk Support interface.




The app is available directly from Zendesk's App Marketplace. With it, your support agents can quickly and easily see what is happening with any service at any time, without having to leave the Zendesk environment to dig through previous emails or SMS messages.

With the app, your support team will be able to view any issue that has happened in the last 90 days along with any maintenance planned for 90 days into the future.

We look forward to seeing how you use this new feature and as always we welcome any feedback you may have.

For more information about our app, check out the Zendesk marketplace.
https://www.zendesk.com/apps/support/statushub/?source=app_directory 

To integrate your StatusHub with your Zendesk, take a look at the Zendesk article on our Knowledge Base. https://support.statushub.com/hc/en-us/articles/115002273005

Wednesday, July 12, 2017

StatusHub: Feature Updates July 2017

After our recent addition of new features to StatusHub, this latest batch is focused on improvements to existing features, most of which were requested by customers. Please keep sharing any feedback you have so we can continue to improve StatusHub.

SMS subscribers limit

Each StatusHub plan has a limit on the number of SMS messages available each month. To help you manage your usage, we have introduced a new feature that allows you to set a limit on the number of subscribers who can receive updates via SMS. Please note this limits the number of recipients and not the number of SMS messages sent.

How to article

Disabled SMS subscribers

To unsubscribe from SMS updates from your hub, a recipient (subscriber) can opt out simply by replying to any SMS update with any of the following: STOP, STOPALL, UNSUBSCRIBE, CANCEL, END or QUIT.
Previously, when this happened, there was no way for the hub owner to know that the subscriber had cancelled their subscription, which might have led to confusion if the subscriber had forgotten that they cancelled it.

To help you quickly identify subscribers who have cancelled their SMS subscription, they will be highlighted in red.
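
For anyone building their own tooling around SMS replies, recognising the opt-out keywords listed above could look like the following Python sketch. The keyword list comes from this post; the case-insensitive, whitespace-trimmed matching is an assumption for the example and may not match exactly how replies are processed.

```python
# Opt-out keywords as listed in this post.
OPT_OUT_KEYWORDS = frozenset(
    {"STOP", "STOPALL", "UNSUBSCRIBE", "CANCEL", "END", "QUIT"}
)


def is_opt_out(reply: str) -> bool:
    """Return True if the whole reply is one of the opt-out keywords.

    Assumption: matching ignores case and surrounding whitespace.
    """
    return reply.strip().upper() in OPT_OUT_KEYWORDS
```

Under these assumptions a reply such as " stop " counts as an opt-out, while a longer message like "stop please" does not.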

Reducing the number of incident updates for subscribers

When an incident occurs, it is common for multiple updates to be communicated during the incident, depending on the duration and severity of the issue. StatusHub allows you to easily keep sharing updates on the progress you are making in resolving the issue.
Some customers informed us that they had subscribers who are only interested in receiving updates when an incident is identified and when it is resolved, and who don't need to see the progress updates.
To solve this, your StatusHub subscribers can now select an option in their settings. When it is selected, they will receive only the first and last incident update.
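
The decision for a single subscriber can be pictured with this small Python sketch. It is an illustration only: the function and parameter names are hypothetical, and it assumes that the "last" update is the one that resolves the incident.

```python
def should_notify(update_index: int, resolves_incident: bool,
                  less_verbose: bool) -> bool:
    """Decide whether a subscriber receives a given incident update.

    update_index: 0 for the first update of the incident.
    resolves_incident: True if this update resolves the incident.
    less_verbose: True if the subscriber selected the reduced option.
    """
    if not less_verbose:
        return True  # default behaviour: every update is sent
    return update_index == 0 or resolves_incident
```

For an incident with four updates, a subscriber with the option enabled would be notified for the first and the fourth (resolving) update only.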

How to enable less verbose incidents.

Improved subscribers CSV import tool

Our previous CSV import tool for subscribers was insufficient: it only allowed email, and subscribers had to be imported to one hub at a time.

The improvements to the CSV import mechanism include making it a multi-step process. This includes a validation step which allows you to preview what will be imported and see any potential errors.
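
If you want to check a file before uploading it, a similar preview step is easy to approximate locally. The Python sketch below is not the import tool itself: the `email` column name and the deliberately simple email pattern are assumptions for this example, so please refer to the help article for the actual file format.

```python
import csv
import io
import re
from typing import Dict, List, Tuple

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def preview_import(csv_text: str) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split CSV rows into importable rows and human-readable errors.

    Assumes a header row containing an 'email' column (hypothetical).
    """
    valid: List[Dict[str, str]] = []
    errors: List[str] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        email = (row.get("email") or "").strip()
        if EMAIL_RE.match(email):
            valid.append({"email": email})
        else:
            errors.append(f"line {line_no}: invalid email {email!r}")
    return valid, errors
```

Nothing is imported by this function; it only reports what would be accepted and which lines need fixing first.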

Importing Subscribers via CSV.
https://support.statushub.com/hc/en-us/articles/115004170209

Subscribers API

As the start of a series of updates we are making to the API, we have introduced a new subscribers section. The API allows you to list, view, create, update and delete subscribers.
These new API elements are documented at https://app.statushub.io/api#api-subscriber


Please let us know if you have any questions on any of these new additions and please get in contact if you have other suggestions or feedback to help us to keep improving StatusHub.

Thursday, May 25, 2017

StatusHub: Feature Updates May 2017



We have been busy with some company changes over the last few months. We recently launched our new branding and website. As a result, we were overdue a product update on StatusHub.

The following are recent changes and we'll be posting more regular updates here going forward. 

Incident and Maintenance Templates

When your site or service is experiencing downtime, and you're updating your StatusHub, you don't want to be held up by what to name an incident and what the message should say. You can now create incident templates and maintenance templates to use as a starting point.

Generic Webhook

StatusHub currently offers integrations for Pingdom, PagerDuty, New Relic, VictorOps, and UptimeRobot. Like most SaaS companies, we would love to be able to integrate with every external service that can improve your experience with StatusHub.

But this is not feasible, so we have introduced a generic webhook.

With this new webhook, any product or service that supports sending POST requests with a JSON payload can be integrated with StatusHub, for example products like Nagios and services like SolarWinds.
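
From the sender's side, "a POST request with a JSON payload" looks roughly like the Python sketch below, which uses only the standard library. The URL and the payload fields shown in the usage note are placeholders invented for this example; the real webhook URL and the expected fields come from your hub's integration settings and documentation.

```python
import json
import urllib.request


def build_webhook_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a webhook URL."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A monitoring script could then call, for instance, `urllib.request.urlopen(build_webhook_request("https://example.invalid/webhook", {"service": "web", "status": "down"}))`, where both the URL and the field names are hypothetical.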

ATOM feed

We have made some updates to our RSS/ATOM feed.  By adding the following parameters you now have more control over the content displayed on your feed.

parameters:

  • days_before
    • Limit how many days in the past to look for
      • default: 90
      • min: 1
      • max: 180
  • days_after
    • Limit how many days in future to look for
      • default: 10
      • min: 0
      • max: 90
  • limit
    • Total amount of events
      • default: 100
      • min: 1
      • max: 200

Example:

https://your_hub.statushub.io/atom/maintenances?limit=10&days_after=1&days_before=7

This would show a feed of maintenance events that occurred over the past 7 days and any that will happen tomorrow, limited to no more than 10 maintenance events.
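
If you generate these feed URLs from a script, the limits listed above can be checked before the request is made. The following Python sketch is one way to do that; the function name is ours for illustration, and the only things taken from this post are the URL shape and the min/max values.

```python
from urllib.parse import urlencode

# (min, max) for each parameter, as listed in this post.
LIMITS = {
    "days_before": (1, 180),
    "days_after": (0, 90),
    "limit": (1, 200),
}


def atom_maintenances_url(hub: str, **params: int) -> str:
    """Build the maintenances ATOM feed URL, validating parameter ranges."""
    for name, value in params.items():
        if name not in LIMITS:
            raise ValueError(f"unknown parameter: {name}")
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    base = f"https://{hub}.statushub.io/atom/maintenances"
    # Omitted parameters fall back to the feed's defaults.
    return f"{base}?{urlencode(params)}" if params else base
```

Calling `atom_maintenances_url("your_hub", limit=10, days_after=1, days_before=7)` produces the example URL shown above, while a value outside the documented range, such as `days_before=365`, raises a `ValueError`.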

This allows your feed to act as a timeline of changes, so subscribers to your feed can see updates to your page even after an incident or maintenance has been resolved.

Hubs Switcher

Managing multiple StatusHubs from your dashboard has always been easy. But if you wanted to share these multiple hubs internally with your colleagues, it meant having to share individual links for each hub.

Until now. Introducing Hub Switcher: with this new feature, you can add a dropdown to each hub and allow users to switch between multiple hubs.

Subscriber Updates

We have made some updates to the subscribers feature, as a first step towards offering your subscribers a portal where they can manage their subscription across a variety of channels.

Allow Hub Owners to update/manage subscriptions on behalf of their subscribers.

As the owner of a hub, you can see how many subscribers you have, which services they have chosen to receive updates from and how they want to receive them. We have now added the ability for you as the hub owner to edit these subscriptions.



In the Actions column of the subscribers table, you will notice a blue link. From here you can edit the services a subscriber has signed up to receive alerts from.

Added global "All/None" in service selector for subscribers

We have added an all/none selector to make it easier for your subscribers to select the groups and services they wish to be kept up to date about.




Search functionality in subscribers.

As your subscriber list grows, it can be difficult to find a subscriber and edit or remove their subscription quickly. So we have added a search function that will allow you to quickly and easily find who you are looking for.



Please let us know if you have any questions on any of these new additions and please get in contact if you have other suggestions or feedback to help us to keep improving StatusHub.
