Risk management for companies includes one important component: being able to ensure business continuity and keep disruptive events and downtime to a minimum.
Even when it’s planned, downtime is expensive. When it’s unplanned, it can be disastrous for companies. It can:
- Erode your customers’ trust
- Damage your reputation
- Lead to a major loss of employee productivity
- Impact your financial health.
Gartner estimated the average cost of downtime at $5,600 per minute (or nearly $340,000 per hour). Atlassian places this number between $427 per minute for smaller businesses and $9,000 per minute for large organizations.
Considering these numbers, it’s easy to see how the high availability (HA) of your systems is critical for the success of your business. Therefore, we’ve made it a top priority when developing ZigiOps, in order to make our integration platform 100% reliable.
In this article, we’ll explain the details of how we’ve achieved that, and how to benefit from the high availability of ZigiOps to ensure business continuity.
Why is high availability so important for businesses?
First, let’s quickly discuss the basics.
High availability is the ability of a system to remain operational despite the failure of an infrastructure component or an application. It’s measured as the percentage of uptime over the course of a year, and most organizations aim for 99.9% (known as “three nines”) or 99.99% (known as “four nines”) uptime:
- For 99.9% uptime, this means 8.77 hours of downtime per year
- For 99.99% uptime, this means 52.60 minutes of downtime per year.
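The downtime figures above follow directly from the uptime percentage. Here is a minimal sketch of the arithmetic, assuming an average year of 365.25 days (8,766 hours); the function name is ours, chosen for illustration:

```python
HOURS_PER_YEAR = 365.25 * 24  # 8,766 hours, averaging in leap years


def yearly_downtime_hours(uptime_percent: float) -> float:
    """Return the downtime per year, in hours, that an uptime percentage allows."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.99):
        hours = yearly_downtime_hours(target)
        print(f"{target}% uptime: {hours:.2f} hours ({hours * 60:.2f} minutes) per year")
```

Each extra “nine” cuts the allowed downtime by a factor of ten, which is why every step up tends to cost noticeably more to maintain.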
The availability you need depends on the impact the outage would have on your company, as well as on the cost of maintaining this level of availability. It also depends on any service level agreements (SLAs) you have, both with your providers and clients.
High availability is typically achieved by:
- Removing single points of failure
- Designing reliable crossover procedures.
These two measures help maintain availability even when an individual component fails. Failures can have many causes:
- A software bug
- A hardware failure or a cable problem
- A power outage
- An environmental issue or a natural disaster
In our world of instant connectedness, customers and partners expect 24/7 availability. While high availability is absolutely critical for some businesses, such as telecommunication operators, hospitals, and data centers, all businesses nowadays understand and recognize its importance. E-commerce businesses, SaaS providers, and IT organizations, among many others, take measures to improve availability. This effort starts with addressing the availability of each of the systems and applications you’re using, including your integration platform.
How do we guarantee the high availability of our integration platform?
We’ve designed our platform with high availability (HA) as one of our top priorities and key features, to make sure our clients can maintain excellent availability and uptime stats. As a result, even if a server fails or needs to be stopped, your integrations will still work, and connect your applications.
How did we achieve that? Let’s look into the details:
Our HA solution comprises a Primary ZigiOps server, plus at least one Backup ZigiOps server. The integrations can run on the Backup server after a manual failover procedure.
You can have several ZigiOps servers ready to take over from the Primary one, but only one can be active at a time. If you fail over to the Backup ZigiOps server, the Primary one remains inactive until you manually switch back to it, after you address the cause of the failure (or once you complete the necessary maintenance).
How does ZigiOps’ high availability work?
We’ve designed ZigiOps to integrate data with a minimum offline footprint related to the integration state.
Runtime files are a few kilobytes in size, and the platform does not generate heavy data sets, either in the file system or in a database. This makes it very easy to replicate the integration state on the Backup server and continue from where the Primary one stopped. All you need to do is synchronize the runtime files between the Primary and the Backup server.
You can sync the runtime files by copying them over to the Backup server manually. Alternatively, you can keep the runtime files on shared storage, such as a shared network directory (for example, on Linux), so that nothing needs to be copied in the case of a failover.
Any passive ZigiOps server must remain stopped for as long as another ZigiOps server is active.
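As an illustration of the manual copy option, here is a minimal Python sketch. It assumes that the Backup server’s runtime directory is reachable as a local path (for instance through a mounted share); the function name and both paths are hypothetical and not part of ZigiOps.

```python
import shutil
from pathlib import Path


def sync_runtime_files(source_dir: Path, target_dir: Path) -> list[str]:
    """Copy every file under source_dir into target_dir, overwriting existing copies.

    Returns the relative paths of the files found in source_dir, sorted.
    Files that exist only in target_dir are left untouched.
    """
    if not source_dir.is_dir():
        raise FileNotFoundError(f"runtime directory not found: {source_dir}")
    # dirs_exist_ok=True lets the copy run again over an existing target.
    shutil.copytree(source_dir, target_dir, dirs_exist_ok=True)
    return sorted(
        path.relative_to(source_dir).as_posix()
        for path in source_dir.rglob("*")
        if path.is_file()
    )
```

Because the copy never deletes anything on the target, stale files on the Backup server would need to be cleaned up separately. Run a copy like this only while ZigiOps is stopped on both servers.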
What are the requirements to use the high availability feature?
To use ZigiOps high availability functionality, you need to:
- Use servers that are equal (or comparable) in terms of performance, memory and hardware.
- Make sure the same version of ZigiOps is installed on both servers, and that you have active licenses for both.
- Maintain the same integration configuration between servers.
- Use only one ZigiOps server at a time.
For maintaining high availability for listener triggers, there are some additional requirements that you can check in our documentation.
How do you configure and use ZigiOps’ HA feature?
Configuring ZigiOps for high availability is a straightforward process. You can set it up for both polling and listener triggers.
If you aren’t sure whether you’re using listener or polling triggers, you can check the Source tab of each trigger.
High availability for polling triggers
Before you begin, you first need to configure the Primary ZigiOps server on which your integrations are running, and activate the license for each instance.
- Install the same version of ZigiOps on the Backup server as on the Primary one
- Stop ZigiOps on the Backup server
- Stop ZigiOps on the Primary server
- Synchronize the integration configuration from the Primary server to the Backup one by copying over the following directory:
– For Windows: C:\ZigiWave\ZigiOps\conf
– For Linux: /opt/zigiwave/zigiops/conf
- Start ZigiOps on the Primary server
- Leave the Backup ZigiOps service down.
The integration configuration on both servers must stay in sync. Whenever you change anything in the ZigiOps configuration on the active server, copy the conf directory to the other server again (the synchronization step above).
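One way to verify that two configuration directories really are in sync is to compare file checksums. The sketch below is our own illustration, not a ZigiOps feature; it assumes both conf directories are readable from the machine that runs it, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def directory_fingerprint(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 digest of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def config_drift(primary_conf: Path, backup_conf: Path) -> list[str]:
    """Return the relative paths that are missing on one side or differ in content."""
    primary = directory_fingerprint(primary_conf)
    backup = directory_fingerprint(backup_conf)
    return sorted(
        name
        for name in primary.keys() | backup.keys()
        if primary.get(name) != backup.get(name)
    )
```

An empty result means the two directories hold identical files. Going by the paths in this article, the runtime directory sits inside conf, so differences under settings/runtime are to be expected while the active server is running.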
To move the active ZigiOps instance between the Primary and the Backup servers, you need to:
- Stop ZigiOps on the Primary server
- Synchronize the runtime files by copying over the following files to the Backup server:
– For Windows: C:\ZigiWave\ZigiOps\conf\settings\runtime
– For Linux: /opt/zigiwave/zigiops/conf/settings/runtime
- Start ZigiOps on the Backup server
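The order of these steps matters: the Backup server should start only after the Primary one has stopped and the runtime files have been copied. As a rough sketch, a script could enforce that order like this. The three callables are hypothetical placeholders for however you stop, copy, and start in your environment; they are not ZigiOps commands.

```python
from typing import Callable


def fail_over(
    stop_primary: Callable[[], None],
    sync_runtime: Callable[[], None],
    start_backup: Callable[[], None],
) -> list[str]:
    """Run the manual failover steps in order and return the names of completed steps.

    An exception raised by any step aborts the sequence, so the Backup server
    is never started unless the Primary was stopped and the sync succeeded.
    """
    completed: list[str] = []
    for name, step in (
        ("stop primary", stop_primary),
        ("sync runtime files", sync_runtime),
        ("start backup", start_backup),
    ):
        step()  # may raise; later steps are then skipped
        completed.append(name)
    return completed
```

Passing the steps in as callables keeps the ordering logic separate from the platform-specific details, such as Windows services versus Linux services.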
High availability for listener triggers
Below, you’ll see how HA works for integration actions that use listener triggers.
Examples of listener triggers you can use out-of-the-box are:
- OBM events to ServiceNow incidents (CLIP)
- OBM events to Cherwell incidents (CLIP)
- OBM events to BMC Remedy incidents (CLIP)
- OBM events to Jira issues (CLIP)
The main Failover solution described above can be used for any integration. However, when you’re using a listener type of trigger and Failover is performed to a Backup ZigiOps server, the host which serves the listener’s endpoint changes. In this instance, the integrated system must continue directing the HTTP requests to the active ZigiOps host.
In integration actions configured with a listener trigger, ZigiOps simply waits to receive an HTTP request. This design lets you place a load balancer in front of ZigiOps to automatically direct the HTTP requests to the active instance. The integrated system doesn’t need to know which ZigiOps host is serving its requests, which results in an automatic failover.
This scenario is an expansion of the main Failover solution described above and has similar requirements. For it, you also need:
- External Load Balancing software
- VIP for each network port used in an integration action
- HTTP Health Monitors
For more information on the Load Balancer VIP configuration, please reach out to us at [email protected]
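To give an idea of what an HTTP health monitor does, here is a simplified Python sketch of the logic a load balancer applies: probe each host and send traffic to the first one that answers. It is an illustration only. The URLs you pass in are up to you, and we are assuming here that the probed endpoint answers a plain GET with a 2xx status, which you should verify for your own listener ports.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Optional, Sequence


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to url is answered with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def pick_active(
    urls: Sequence[str],
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first URL whose probe succeeds, or None if no host responds."""
    for url in urls:
        if probe(url):
            return url
    return None
```

Since only one ZigiOps instance runs at a time, at most one host should pass the probe, and that is the host the load balancer routes to.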
Moving the active ZigiOps instance from the Primary to the Backup server is simple when you use a Load Balancer for integration actions triggered by HTTP listeners. You need to:
- Stop the Primary ZigiOps
- Start the Backup ZigiOps
Only one ZigiOps instance should be active at any time.
High availability has a direct impact on your business, as well as on your reputation, and your clients’ trust. This is why in 2022, it will continue to be a major topic for IT. With everything moving to the cloud, you shouldn’t leave anything to chance: plan for downtime and failure and be prepared to react instantly when it happens, to ensure business continuity. This requires looking into each one of your systems, and their capacity to maintain high availability in the case of a disaster, and addressing structural vulnerabilities on time.
Once you configure ZigiOps’ high availability feature by setting up at least two ZigiOps servers, your integrations are safe, and you won’t lose any data if one of your servers goes down or needs maintenance. In addition, ZigiOps’ retry mechanisms handle the scenario where one of the integrated systems is down: ZigiOps will update the missing information once that system is back up.