Welcome to our docs site. Docs on this site are for ACP version 9.
See these links for previous versions: Version 8, Version 7

Adding and Removing ZooKeeper or Cache Nodes on the Apprenda Platform

This page describes how to add or remove ZooKeeper and Cache nodes in your Apprenda Platform configuration.

Cautions and Prerequisites

This document applies to versions 6.7.0 and newer only.

As of version 8.2.0, the Platform Installer supports adding and removing Web, Application, Linux, Load Manager, and Database nodes, but it cannot add or remove ZooKeeper or Cache nodes; those must remain static as far as the installer is concerned.

Adding or removing Cache and ZooKeeper nodes is a manual process that may require platform downtime, and it is not recommended under normal circumstances. Make every effort to back up or snapshot the existing platform before attempting either procedure; if a procedure fails partway through, a full platform restore may be required. We advise running your planned procedure in a lab environment first, so that you understand the process before executing it on a platform used for production or development.

For both Cache nodes and ZooKeeper nodes, all server operating system and role-based prerequisites must be applied before adding the node(s) to the platform. See the “Preparing for the Apprenda Platform” document for the version you are running.

Adding / Removing ZooKeeper Nodes

If you plan to both add and remove ZooKeeper nodes, add the new nodes to the platform before removing any existing ones so that the platform remains functional throughout.

NOTE: The ZooKeeper connection string is stored in a SQL column limited to 128 characters. To keep all hostnames within that limit, you may need to add one node and remove one node at a time.
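The add-before-remove ordering and the 128-character limit can be checked together before you touch the platform. The following is a minimal offline sketch, not an Apprenda tool. It assumes the connection string is a comma-separated list of `host:port` pairs and that every node uses the default ZooKeeper client port 2181; neither detail is confirmed by this page, and the function names and hostnames are hypothetical.

```python
# Offline sketch: plan a one-at-a-time ZooKeeper node swap and check each
# intermediate connection string against the 128-character SQL column limit.
# ASSUMPTIONS (not confirmed by Apprenda docs): the string is "host:port,..."
# and all nodes listen on the default ZooKeeper client port 2181.

MAX_CONNECTION_STRING_LEN = 128  # SQL column limit described in the NOTE
ZK_CLIENT_PORT = 2181  # assumed default ZooKeeper client port


def connection_string(hosts: list[str], port: int = ZK_CLIENT_PORT) -> str:
    """Join hosts into an assumed 'host:port,host:port' connection string."""
    return ",".join(f"{host}:{port}" for host in hosts)


def swap_plan(current: list[str], to_add: list[str], to_remove: list[str]) -> list[list[str]]:
    """Return each intermediate node list: add one new node, then remove one old node.

    Each removal happens only after its replacement (if any) has been added,
    and alternating keeps the connection string as short as possible.
    Raises ValueError if a node in to_remove is not in the ensemble.
    """
    ensemble = list(current)
    adds, removes = list(to_add), list(to_remove)
    steps = []
    while adds or removes:
        if adds:
            ensemble.append(adds.pop(0))
            steps.append(list(ensemble))
        if removes:
            ensemble.remove(removes.pop(0))
            steps.append(list(ensemble))
    return steps


if __name__ == "__main__":
    current = ["zk01.example.local", "zk02.example.local", "zk03.example.local"]
    plan = swap_plan(current, ["zk04.example.local"], ["zk01.example.local"])
    for step, hosts in enumerate(plan, 1):
        conn = connection_string(hosts)
        status = "OK" if len(conn) <= MAX_CONNECTION_STRING_LEN else "TOO LONG"
        print(f"step {step}: {len(conn)} chars [{status}] {conn}")
```

If any step reports `TOO LONG`, shorter hostnames or a different add/remove interleaving would be needed before making the change.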

IMPORTANT CONSIDERATIONS PRIOR TO ADDING OR REMOVING ZOOKEEPER NODES TO REDUCE APPLICATION DOWNTIME:

The fundamental “problem” with adding or removing ZooKeeper nodes is that each workload establishes its connection to ZooKeeper when it starts. After the ZooKeeper nodes change, running workloads keep trying to use nodes that may no longer exist and do not use the new ones. This affects Apprenda’s Windows services as well as guest application workloads.

The ZooKeeper client is designed to tolerate node failure, so it may not be necessary to restart workloads to establish new connections as long as at least one of the original nodes remains. The trade-off is that clients use only the nodes they already know. For example, if only one node remains from before the change and that node goes offline, every workload that was running before the change loses its connection completely until that specific node is back online.

In short, you must choose between keeping everything running, which avoids guest application downtime, and restarting workloads, which gives the best future availability if one of the ZooKeeper nodes becomes unreachable.
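To judge which side of this trade-off a planned change falls on, it can help to compare the node list before and after the change. Below is a minimal sketch of that comparison, assuming (as described above) that running workloads can only use original nodes that survive the change. The function names and hostnames are hypothetical, and the check does not query the platform.

```python
# Offline sketch: which ZooKeeper nodes can workloads started before a
# change still reach? They know only the original node list, so they can
# use only original nodes that remain after the change.


def reachable_for_running_workloads(original: set[str], updated: set[str]) -> set[str]:
    """Original nodes that are still part of the ensemble after the change."""
    return original & updated


def assess(original: set[str], updated: set[str]) -> str:
    """Summarize the risk for workloads that are not restarted."""
    survivors = reachable_for_running_workloads(original, updated)
    if not survivors:
        return "restart required: running workloads know none of the current nodes"
    if len(survivors) == 1:
        return f"at risk: running workloads depend solely on {next(iter(survivors))}"
    return f"tolerant: running workloads can use {sorted(survivors)}"


print(assess({"zk01", "zk02", "zk03"}, {"zk03", "zk04", "zk05"}))
# prints: at risk: running workloads depend solely on zk03
```

An "at risk" or "restart required" result is the case where restarting workloads, despite the downtime, buys better availability.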

What needs to be restarted?

  • Load Managers
  • Host Controllers
  • Windows Containers
  • Linux Containers
  • WCF Services
  • IIS App Pools
  • WARs (Java web application archives)

Are there any methods to avoid guest app downtime?

If there are enough available resources, it is possible to use a deployment policy and the Move functionality to reduce or prevent guest application downtime.

  • Create a Custom Property named 'Available' with values 'Yes' and 'No' that applies to Applications and Servers and has the default 'Yes'
  • Create a 'Must Match' Deployment Policy that uses the 'Available' property
  • For each server:
    • Change the 'Available' property to 'No'
    • Use the Move functionality to move all workloads off the server
    • Restart any Apprenda services on that node
    • Change the 'Available' property back to 'Yes'
  • Remove the Deployment Policy and Custom Property
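The order of the steps above matters: only one server is drained at a time, and each server is made available again before the next one is drained. The sketch below only generates a dry-run checklist in that order; it calls no Apprenda API. The function name and server names are hypothetical; the property and policy names come from the steps above.

```python
# Dry-run checklist generator for the rolling drain-and-restart procedure.
# It only produces text; it does not call any Apprenda API.


def rolling_restart_plan(servers: list[str]) -> list[str]:
    """Return the procedure's steps in order for the given servers."""
    steps = [
        "Create Custom Property 'Available' (Yes/No, default Yes) for Applications and Servers",
        "Create a 'Must Match' Deployment Policy on 'Available'",
    ]
    for server in servers:
        # Drain exactly one server at a time so the rest keep serving workloads.
        steps += [
            f"{server}: set 'Available' to No",
            f"{server}: Move all workloads off the server",
            f"{server}: restart Apprenda services",
            f"{server}: set 'Available' back to Yes",
        ]
    steps.append("Remove the Deployment Policy and Custom Property")
    return steps


for line in rolling_restart_plan(["web01", "app01"]):
    print(line)
```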

If workloads can be moved without affecting the application (which should be possible as long as they hold no in-memory state), this process should result in no perceptible downtime for guest applications and leave the platform fully transitioned.

Obtain full instructions

A detailed set of instructions including the above information is available at this link