Information about the servers that comprise the Platform can be found in the Servers page of the SOC. Navigate to the Servers page from the top navigation menu by going to Infrastructure > Servers.
From this page you can manage all the servers that have been added to your Platform.
When a Windows/Linux server is added to the Platform or an SQL/Oracle instance is configured to host Platform databases, that server or node will appear in the Servers page (accessible from the Infrastructure menu) under the Servers tab or Database Nodes tab respectively. It is possible for a machine to appear under both tabs if it hosts Platform services and databases.
Servers will appear in this list once they are running the Apprenda Physical Host Windows service (for a Windows server) or the Apprenda Linux Daemon (for a Linux server) and have communicated with the rest of the fabric. At that point, the server’s resources, such as RAM, CPU, and disk space, are noted as participants in the fabric. This alone does not mean the server will be used by Apprenda’s deployment mechanisms. Nonetheless, with the Apprenda Physical Host Windows service/Linux Daemon running, the Apprenda Platform knows of the server and its resource capabilities.
This page will also show the servers that are part of a Kubernetes cluster that has been added to a Cloud in your Platform.
The list of servers participating in your environment is located in the Servers tab on the center of the page. Each item in the list displays a server’s name and images corresponding to its Roles:
Items in the list will also be tagged with a Cloud Name corresponding to the Cloud to which the server belongs, and an OS icon denoting the type of operating system the server is running:
Note: If Resource Throttling is enabled, the Servers page will display the memory and CPU capacity and allocation for each server:
To view more information about a server, click on either the server’s name or the server’s View Details button to get to the server’s Overview page. From this page you can see details about the server and all the workloads deployed to it.
On the left side of the Servers page, a display shows where the Core Components of the environment are located. Note that this may include nodes that host Apprenda Components (such as the Cache, Load Manager, Platform Coordination, or Repository) in a standalone fashion and that do not run the Apprenda Physical Host Windows service or Apprenda Linux Daemon. See more about different server roles on the Platform.
For every cluster added, you are able to view the nodes in the cluster through the SOC in the same way that you can view the Platform servers. Each cluster node will also have a Server Details page that lists information about the node and a list of the Pods deployed on the cluster. In Platform version 8.2.0, from a Server Details page for a cluster node, you can switch to the Server Details page of another node in that cluster by using the drop-down list at the top of the page.
The Apprenda Platform allows you to download a Remote Desktop Connection file (RDP) to connect directly to each one of the Windows servers participating in the environment. To download the file, click on the arrow next to the View Details button for the server and select Remote Desktop.
The Apprenda Platform allows you to download a Management Console file (MSC) to remotely manage each of the Windows servers participating in the environment. The console loads the following snap-ins: Services, Event Viewer and Certificates. To download the file, click on the arrow next to the View Details button for the server and select Manager Server.
In the Server Details box of a server’s Overview page, click on the View Custom Properties link. This will display a window containing a list of that server’s Custom Properties settings. For more information see the documentation on setting and utilizing Custom Properties settings.
In the Platform, a Workload is defined as an instantiation of an application component currently deployed. The available Workload types:
In order to see which Workloads are currently deployed on a server, click on the server name in the Servers page. The Workloads tab contains a list of all the Workloads currently deployed on the server. You can search for a particular item using the Search textbox and filter down by:
Note: If Resource Throttling is enabled, the Server information page will show additional graphs displaying the total memory and CPU available and allocated for that particular server. In addition, the Workloads tab will display information on the Resource Policy assigned to each Workload.
At times, it may be necessary to start or stop a single web service or Java Web Application instance. To do so, click on the Stop button on the right of the workload item or click on the arrow and select Stop. To start an instance, click on the corresponding Start button. Note that taking these actions on a single .NET service/Java Web App instance only affects that instance. Other instances of the same guest application component may continue to operate elsewhere on the Platform if they are deployed.
You can remove a .NET service/Java Web App workload by clicking on the arrow to the right of the workload item and choosing the Remove option. If you choose to remove a workload, you are not only killing the running process if the web service/Java Web App is started, but you are also instructing the Platform to remove that instance’s local launchpad (binaries) from the server and unregister any record of the instance with the rest of the Platform.
You can view the launchpad for a .NET Service Workload by clicking on the arrow on the right end of the Workload item and choosing the View Launchpad option. This will open a new tab and direct you to the Repository Browser page, where the selected launchpad information will appear.
You can view the details for a .NET Service Workload by clicking on the arrow to the right of the Workload item and choosing the View Details option. This will open a pop-up window with the following information:
Click on the Router Control Panel tab in a Windows Server’s details page to view that server’s routing profile. Each server runs a message routing process called the Apprenda Router that ushers web service requests around the Platform to known endpoints. This is how Apprenda maintains location transparency when deploying application artifacts to various servers on the Platform.
To understand the Router Control Panel, it is important to understand what the Apprenda Router does – so we’ll use a small sidebar to illustrate.
Imagine the communication pathway of a standard request to an application GUI: a request arrives at the GUI via an HTTP request and the GUI needs to display data in its response, so it might make a call to a deployed web service in order to obtain information or perform business logic. Instead of the GUI having to be aware of the endpoint address(es) of web service instances for the target service type, the GUI is configured automatically by Apprenda to talk to a local Apprenda Router on the same server. That local router maintains awareness of endpoint URIs across the Platform and forwards the message to an appropriate endpoint, waits for a response, and then ushers the response back to the GUI.
The Router Control Panel, as expected, displays a list of the specific endpoint URIs for which the router has address records. This is akin to viewing routing tables in DNS. This routing table is maintained by the router and Apprenda to ensure that the router has knowledge of any necessary endpoints to which it might need to forward messages.
Click on the Flush button to empty the router’s routing table. The router will then begin to auto-discover endpoints as requests are made and endpoint URIs are needed to satisfy those requests. This is a good approach to take if web service requests are taking a long time or timing out completely, as that may indicate that the router has stale records in its routing table for endpoints that no longer exist (Note: the router and Apprenda continuously work together to maintain these records in real time, so this shouldn’t happen).
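The effect of flushing can be illustrated with a toy model. The following is a minimal sketch and not the Apprenda Router's implementation: the `RoutingTable` class, the `discover` callback, and the endpoint URIs are all hypothetical names used only to show why a flush clears stale records and why endpoints reappear as requests arrive.

```python
from typing import Callable, Dict, List


class RoutingTable:
    """Toy model of a router's endpoint records (illustrative only)."""

    def __init__(self, discover: Callable[[str], List[str]]) -> None:
        # discover() is a hypothetical stand-in for whatever lookup finds
        # the live endpoint URIs for a given service type.
        self._discover = discover
        self._records: Dict[str, List[str]] = {}

    def resolve(self, service_type: str) -> List[str]:
        # Reuse an existing record if present; otherwise discover on demand.
        if service_type not in self._records:
            self._records[service_type] = self._discover(service_type)
        return self._records[service_type]

    def flush(self) -> None:
        # Comparable in spirit to the Flush button: drop every record so
        # endpoints are rediscovered the next time they are needed.
        self._records.clear()


# Simulated environment in which the live endpoint for a service moves.
live = {"Billing": ["http://server-a.example/Billing"]}
table = RoutingTable(lambda service: list(live.get(service, [])))

table.resolve("Billing")                       # record is created
live["Billing"] = ["http://server-b.example/Billing"]
stale = table.resolve("Billing")               # still the old record
table.flush()
fresh = table.resolve("Billing")               # rediscovered after the flush
```

In this model a stale record persists until the table is flushed, which mirrors the timeout symptom described above.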
This is the ability to transition servers to different states of activity within your Platform. It allows for easier troubleshooting and removal of servers from your Platform by giving you control over workloads hosted and deployed to a server.
Note: You are not able to transition servers on a Kubernetes cluster into different states. The buttons described in this section are inactive on a cluster server’s detail page.
A server in this state is functioning normally and has no restrictions on Platform activity.
This state is shown as Online - Needs Attention in the SOC. A server in this state is in the Online state but the Platform cannot reach it or has detected that a feature of the server has failed during a Health Report.
In this state, a server will function normally for its existing workloads, but will be removed from future deployment strategies. This means that a Reserved server will not be considered a valid target when your Platform is deploying new workloads, and no new workloads will be automatically deployed to it. Operators will still have the ability to manually deploy new workloads to or move workloads from a Reserved server.
A use case for this state would be if a new server was being added to the Platform, but further on-Platform testing and configurations were needed before the server is ready to be a full part of your Platform and host an important workload. The server can be added to the fabric through the Modify an installation workflow and then placed into the Reserved state. This will block the normal deployment activity of your Platform from accessing the new server. The important workload can be manually deployed to the new server and testing can be performed without impeding the rest of your Platform. When the new server is ready, it can be moved into the Online state to become a full part of your Platform.
A Reserved server can be removed from your Platform. You can also upgrade your Platform while a server is in the Reserved state, and the server will retain its Reserved state after the upgrade.
This state is shown as Reserved - Needs Attention in the SOC. A server in this state is in the Reserved state but the Platform cannot reach it or has detected that a feature of the server has failed during a Health Report.
Placing a server into the Maintenance state will cause your Platform to redistribute its workloads to other acceptable servers and remove the server from deployment strategies. A server in Maintenance will have no workloads hosted on it and will not be considered by your Platform for new workload deployments. In the event that your Platform cannot find another server to move a workload to when a server is moving into Maintenance, your Platform will instead move the server into the Reserved state. This will stop the server from receiving any new workload deployments and give you the opportunity to manually move or remove the workload(s) that couldn’t be moved automatically by your Platform. Servers hosting core workloads, like the Load Manager or Development Portal, cannot be put into Maintenance without having another server to host those services.
A server in Maintenance can be removed from your Platform. You can also upgrade your Platform while a server is in Maintenance, and the server will retain its state after the upgrade.
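The fallback from Maintenance to Reserved can be summarized in a few lines. This is a simplified sketch under stated assumptions, not Platform code: `request_maintenance`, the `try_move` callback, and the workload names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def request_maintenance(
    workloads: Iterable[str],
    try_move: Callable[[str], bool],
) -> Tuple[str, List[str]]:
    """Return the resulting state and any workloads that could not be moved.

    try_move is a hypothetical callback that returns True when another
    acceptable server was found for the workload.
    """
    stranded = [w for w in workloads if not try_move(w)]
    if stranded:
        # At least one workload has nowhere to go: the server ends up
        # Reserved so an operator can move or remove what is left.
        return "Reserved", stranded
    return "Maintenance", []


movable = {"billing-ui", "billing-svc"}
state_ok, left_ok = request_maintenance(
    ["billing-ui", "billing-svc"], lambda w: w in movable
)
state_stuck, left_stuck = request_maintenance(
    ["billing-ui", "legacy-svc"], lambda w: w in movable
)
```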
When a server is Activating it means that the server is transitioning to a different state. Once the server has been transitioned, the Server Overview page will automatically update to show the most current server state information.
A server’s state can be changed from a server’s Overview page by using the buttons above the Server Details box. Servers can be manually placed into the Online, Reserved, or Maintenance states only. If a server is in a Down state (OnlineDown or ReservedDown), the Platform has transitioned the server into that state because the Platform cannot reach the server or has detected that a feature of the server has failed during a Health Report. While in a Down state, the Platform will block the server’s participation in future deployment strategies and workloads hosted on the server will not be counted towards the application component instance count for set scaling requirements.
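As a rough summary of the states described so far, only a server in the Online state is considered for automatic workload deployment. The sketch below encodes that rule. The function name and the server names are hypothetical, and the state strings simply reuse the names from this page.

```python
from typing import Dict, List

# States in which the Platform automatically deploys new workloads.
# Reserved, Maintenance, and the Down states are all excluded.
AUTO_DEPLOY_STATES = {"Online"}


def automatic_deployment_targets(server_states: Dict[str, str]) -> List[str]:
    """Return the names of servers eligible for automatic deployment."""
    return sorted(
        name
        for name, state in server_states.items()
        if state in AUTO_DEPLOY_STATES
    )


servers = {
    "web01": "Online",
    "web02": "Reserved",
    "app01": "Maintenance",
    "app02": "OnlineDown",
    "lin01": "Online",
}
targets = automatic_deployment_targets(servers)
```

Manual deployment by an operator to a Reserved server remains possible and is intentionally not modeled here.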
The current state of the server will be highlighted. To change the server’s state, click on the state you want (Online, Reserved, or Maintenance). These buttons are disabled on Kubernetes cluster nodes.
When changing a server’s state, you are required to provide a reason. Once the new state is selected, an input box will appear for Platform Operators to use. After providing a reason, your Platform will automatically begin changing the server’s state.
When a reason is available for a server state transition, it can be seen by hovering over the question mark shown next to a server’s State in the Server Details box.
Note: Server Health reports are not available for Kubernetes cluster nodes.
Server Health is the ability for a server to perform checks on different attributes of itself, assess how each attribute affects the health of the server, and take action to mitigate any risk to current and future workloads. Server Health also provides Platform Operators with more diagnostic tools to troubleshoot problems by generating reports of a server’s health and making these health reports easily accessible.
When Server Health is enabled, all servers in your Platform will perform periodic Health Reports to ensure that the server can perform basic functionality for hosting workloads. Health Reports consist of several individual checks that are responsible for testing a server attribute and making assessments of the performance of the attribute. Once a server has run the checks locally, the server will report its health to the Platform as the aggregated least healthy result of any of the checks it runs. Depending on the result of a Health Report, the Platform may transition the server into a different state to prevent current and future workloads from being negatively affected.
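The "aggregated least healthy result" rule can be sketched as taking the worst result across all checks. This is a simplified illustration: the severity ordering Normal, Limited, Failed is inferred from the descriptions on this page, and treating an empty report as Normal is an assumption made only for the example.

```python
from typing import Dict

# Higher number means less healthy (ordering inferred from this page).
SEVERITY = {"Normal": 0, "Limited": 1, "Failed": 2}


def aggregate_health(check_results: Dict[str, str]) -> str:
    """Return the least healthy result among all checks in a report."""
    # Assumption: a report with no checks is treated as Normal.
    return max(check_results.values(), key=SEVERITY.__getitem__, default="Normal")


report = {
    "Bootstrap API": "Normal",
    "Disk Space": "Limited",
    "Platform Repository Access": "Normal",
}
overall = aggregate_health(report)
```

Here a single Limited check is enough for the whole report to be Limited, even though every other check returned Normal.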
The table below shows the potential outcomes of a single check and a description of what the outcome means. If the result is the least healthy result of any check in a Health Report, the Platform will transition the server into the state shown in the table.
|Check Result||Description||Resulting Server State|
|Normal||The server feature is running and should support normal Platform activity||Online|
|Limited||The server feature is not functional or beyond Platform Operator set limits and the Platform should prevent deployments to the server||Reserved|
|Failed||The server feature is not functional or beyond Platform Operator set limits and the server should be considered Down||Down (OnlineDown or ReservedDown)|
|Timeout||There was a timeout while performing the check||State defined by the NodeHealth.CheckTimeoutOutcome Platform Registry Setting|
|Exception||There was an exception while performing the check||State defined by the NodeHealth.CheckExceptionOutcome Platform Registry Setting|
For example, when a server in the Online state runs a Health Report and one of the checks returns a result of Limited, the health of the server will be reported to the Platform as Limited and the Platform will respond by transitioning the server into the Reserved state. As Reserved, the server will continue to host its current workloads, but will not have any additional workloads automatically deployed to it. The server will remain in the Reserved state until the problem causing the Limited check result is resolved and the server can report Normal health. When the Platform detects the server is reporting Normal health, it will transition the server into the Online state.
For every state transition the Platform initiates, the Platform will record a reason in the same way a reason must be provided when a server is manually transitioned between states. Platform Operators will be able to see the reason from the Server Details box on a server’s Overview page.
If a server was manually placed in the Reserved or Maintenance state by a Platform Operator, the Platform will not bring the server out of that state even if the server has Normal health. If the Platform detects a health of Failed, it will move the server into a Down state and, once the problem is resolved, will transition the server into the state it was in before the Failed result.
There are several checks run on each server. Where applicable, a check is run on Windows and Linux, only Windows, or only Linux. Some checks have default Platform-generated Custom Properties that are used to determine how a check is run on Windows Platform servers. Custom Properties can either be set globally from the Custom Properties page in the SOC to change the behavior of a check on all Windows servers, or set from a Windows server’s Overview page to change the behavior of a check on that particular server. Any Platform default for a Custom Property can be overridden by adding a new value to the Custom Property. Check configuration for Linux servers cannot be changed through Custom Properties.
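The health-driven transitions described above can be modeled as a small state function. This is a hedged sketch, not the Platform's logic: the `Server` type, the `reserved_by_operator` flag, and `apply_health` are hypothetical, timeouts and exceptions are ignored, and the Maintenance state is left out because this page does not name a Down variant for it.

```python
from dataclasses import dataclass


@dataclass
class Server:
    state: str                          # Online, Reserved, OnlineDown, ReservedDown
    reserved_by_operator: bool = False  # True if an operator chose Reserved


def apply_health(server: Server, health: str) -> Server:
    """Return the server's next state for a reported health result."""
    base = "Online" if server.state.startswith("Online") else "Reserved"
    manual = server.reserved_by_operator

    if health == "Failed":
        # Keep the base state so the server can return to it later.
        return Server(base + "Down", manual)
    if health == "Limited":
        # A Platform-initiated reservation does not count as manual.
        return Server("Reserved", manual if base == "Reserved" else False)
    # Normal health: undo Platform-initiated restrictions only.
    if base == "Reserved" and not manual:
        return Server("Online", False)
    return Server(base, manual)


limited = apply_health(Server("Online"), "Limited")
recovered = apply_health(limited, "Normal")
manual_kept = apply_health(Server("Reserved", True), "Normal")
manual_down = apply_health(Server("Reserved", True), "Failed")
manual_back = apply_health(manual_down, "Normal")
```

The design choice worth noting is that the Down state remembers its base state, which is how the server can return to where it was before the Failed result.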
*NOTE: Checks marked with a * are read only. This means that the check will gather data on the current state of the server, not test for server functionality. Read only checks will always return a Normal result in a Health Report.
|Check Name||Description||Possible Results||Server Type||Custom Properties|
|Bootstrap API||Checks that a server can issue requests to the Bootstrap API to deploy and promote applications correctly||Normal, Limited||Windows, Linux||NA|
|Coordination Cluster Connectivity||Checks that the server can connect to the Platform Coordinator||Normal, Failed||Windows, Linux||NA|
|Disk Space||Checks the available disk space of the server||Normal, Limited, Failed||Windows, Linux||DiskSpace-MinimumBoundryMiB, DiskSpace-CriticalBoundryMiB|
|Distributed Cache Access||Checks that the server can connect to the Platform Distributed Cache||Normal, Limited||Windows, Linux||NA|
|Logstash||Ensures that the Logstash server is running so the server can log correctly||Normal, Limited||Linux||NA|
|MSDTC Configuration||Checks Microsoft Distributed Transaction Coordinator for correct configuration||Normal, Limited||Windows||NA|
|*Performance Statistics (Report-only)*||Reports information about the general performance of the server by collecting samples from the configured counter areas, such as CPU or Memory.||Normal||Windows, Linux||(For Windows servers only) Performance-CounterPaths, Performance-DelayBetweenSamplesMs, Performance-NumberOfSamples|
|Platform Repository Access||Checks the connection to the Platform Repository||Normal, Limited||Windows, Linux||Repository-CheckApplication, Repository-CheckSystem|
|*Top Processes by Memory (Working Set)*||Reports the top processes using memory on the server||Normal||Windows||Memory-TopCount|
|Workload Management Queue||Checks that the Workload Management Queue on the server is processing requests correctly||Normal, Limited||Linux||NA|
|UI Workload Management Queue||Checks that the UI Workload Management Queue is receiving messages and processing requests correctly||Normal, Failed||Windows||UiQueue-RestartAfterRepeatedFailureCount|
The following Custom Properties can be applied to a server to affect how a check is performed or what is checked on a server.
NOTE: These Custom Properties will only affect checks on Windows servers.
DiskSpace-MinimumBoundryMiB (display name: Minimum MiB required for a node to work properly) is the lower bound of available free space needed for the server to host workloads and participate fully in Platform deployment strategies. If the amount of free disk space is above this number, the server can host workloads and can be considered for future workload deployments. If the amount of free space is below this number, the check will be considered Limited and the Platform will transition the server into the Reserved State because the amount of free space is too small to host current workloads efficiently and accept new workload deployments.
DiskSpace-CriticalBoundryMiB (display name: Minimum MiB required for a node to function at all) is the lower bound of available free space needed for the server to host workloads. If the amount of disk space is above this number, the server will be able to host workloads (and if it is also above the DiskSpace-MinimumBoundryMiB limit the server will participate in workload deployment strategies). If the amount of free space on the server is below this level, the check will be considered Failed and the Platform will transition the server into a Down state (OnlineDown or ReservedDown) because the disk space is too low for the server to host workloads.
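Taken together, the two disk space boundaries split free space into three bands. The sketch below shows that classification. The function name and the threshold values in the example are hypothetical, and treating a value exactly equal to a boundary as acceptable is an assumption, since this page only describes values above and below each boundary.

```python
def disk_space_result(
    free_mib: int,
    minimum_boundary_mib: int,
    critical_boundary_mib: int,
) -> str:
    """Classify free disk space using the two boundaries.

    minimum_boundary_mib corresponds to DiskSpace-MinimumBoundryMiB and
    critical_boundary_mib to DiskSpace-CriticalBoundryMiB.
    """
    if free_mib < critical_boundary_mib:
        return "Failed"    # server is moved to a Down state
    if free_mib < minimum_boundary_mib:
        return "Limited"   # server is moved to the Reserved state
    return "Normal"


# Example thresholds only; they are not Platform defaults.
plenty = disk_space_result(50_000, minimum_boundary_mib=10_000, critical_boundary_mib=2_000)
tight = disk_space_result(5_000, minimum_boundary_mib=10_000, critical_boundary_mib=2_000)
empty = disk_space_result(500, minimum_boundary_mib=10_000, critical_boundary_mib=2_000)
```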
Memory-TopCount (display name: Top processes by memory use) determines how many processes are returned in the top count. Defaults to 5 processes.
Performance-CounterPaths (display name: Performance counter paths in health report) sets the performance counters a check should collect and report on. By default, this value is set to collect the total Processor Information. If a new value is added, the Processor Information default will be overridden by the new value and each additional counter should be added as a separate entry. Values added should follow the form “category\counter\instance” where any part of the value can be replaced with a * to indicate that any matching path should be collected for the report.
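One way to read the “category\counter\instance” form is as three parts, each of which either matches exactly or is a * wildcard. The sketch below implements that reading. It is an interpretation for illustration, not the Platform's matching code, and the counter names in the example are sample values rather than documented defaults.

```python
def path_matches(pattern: str, path: str) -> bool:
    """Return True if a counter path matches a pattern part by part."""
    pattern_parts = pattern.split("\\")
    path_parts = path.split("\\")
    # Both values are expected to have exactly three parts:
    # category, counter, instance.
    if len(pattern_parts) != 3 or len(path_parts) != 3:
        return False
    return all(
        expected == "*" or expected == actual
        for expected, actual in zip(pattern_parts, path_parts)
    )


counter = r"Processor Information\% Processor Time\_Total"
any_counter_total = path_matches(r"Processor Information\*\_Total", counter)
any_instance = path_matches(r"Processor Information\% Processor Time\*", counter)
other_category = path_matches(r"Memory\*\*", counter)
```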
Performance-DelayBetweenSamplesMs (display name: Milliseconds between Performance samples) defines the time (in milliseconds) between performance samples taken during a check. The default value is 250ms.
Performance-NumberOfSamples (display name: Performance samples to collect) defines the total number of samples to collect per check. The default is 120.
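The two sampling properties together determine roughly how long the performance check spends collecting data. The calculation below is a back-of-the-envelope sketch that assumes the delay applies only between consecutive samples and that taking a sample is instantaneous. The function name is hypothetical.

```python
def sampling_window_seconds(number_of_samples: int, delay_between_samples_ms: int) -> float:
    """Estimate the time spent waiting between samples in one check."""
    if number_of_samples < 1:
        return 0.0
    # n samples are separated by n - 1 delays.
    gaps = number_of_samples - 1
    return gaps * delay_between_samples_ms / 1000


# Defaults from this page: 120 samples, 250 ms apart.
default_window = sampling_window_seconds(120, 250)
```

Under these assumptions the defaults give a window of just under 30 seconds per check, which is worth keeping in mind before raising either value.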
Repository-CheckApplication (display name: Application Repository Check) determines if the check will test for the Application Repository access. If true (default), the check will test for the Application Repository access during the check. If false, the check will not test for the Application Repository access.
Repository-CheckSystem (display name: System Repository Check) determines if the check will test for the System Repository access. If true (default), the check will test System repository access. If false, System Repository access will not be tested.
UiQueue-RestartAfterRepeatedFailureCount (display name: Consecutive failures before restarting UI Manager process) sets a limit of consecutive failures of this check. After the limit is hit, if the next check fails the check will restart the current instance of the UI Manager on the server in an attempt to correct the failure. The default value for this property is set to 10.
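The restart rule can be pictured as a counter of consecutive failures. The following is a minimal sketch of that reading, not the check's actual code: the class and method names are hypothetical, and resetting the counter after a restart attempt is an assumption.

```python
class UiQueueFailureTracker:
    """Counts consecutive failures of the UI Workload Management Queue check."""

    def __init__(self, limit: int = 10) -> None:
        # limit corresponds to UiQueue-RestartAfterRepeatedFailureCount.
        self.limit = limit
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True if a restart should be attempted."""
        if check_passed:
            self.consecutive_failures = 0
            return False
        if self.consecutive_failures >= self.limit:
            # The limit was already hit and the next check failed too.
            self.consecutive_failures = 0  # assumption: start counting again
            return True
        self.consecutive_failures += 1
        return False


tracker = UiQueueFailureTracker(limit=2)
outcomes = [tracker.record(False) for _ in range(3)]

interrupted = UiQueueFailureTracker(limit=2)
interrupted_outcomes = [
    interrupted.record(passed) for passed in (False, False, True, False)
]
```

A passing check in the middle of a run of failures resets the count, so only uninterrupted failures lead to a restart.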
When a Health Report runs, the Platform will record the report so that Platform Operators can review the historical health of the server in the System Operations Center (SOC), the local file system of the server, and in a shared database. The NodeHealth.ReportMaxAgeHours Platform Registry setting controls how long a Health Report is kept in the shared database and local file system of every server in the Platform. A report removal sweep is controlled by the NodeHealth.PruningSweepHours Platform Registry setting which specifies how often a sweep is performed to remove Health Reports that are older than the NodeHealth.ReportMaxAgeHours.
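The relationship between the two registry settings can be sketched as a periodic sweep that keeps only reports newer than the maximum age. The example below is illustrative only. The function name, the timestamps, and the 24-hour value are hypothetical and do not reflect Platform defaults.

```python
from datetime import datetime, timedelta
from typing import List


def prune_reports(
    completed_at: List[datetime],
    now: datetime,
    report_max_age_hours: float,
) -> List[datetime]:
    """Return the report timestamps that survive one pruning sweep.

    report_max_age_hours plays the role of NodeHealth.ReportMaxAgeHours.
    The sweep itself would run every NodeHealth.PruningSweepHours.
    """
    cutoff = now - timedelta(hours=report_max_age_hours)
    return [stamp for stamp in completed_at if stamp >= cutoff]


now = datetime(2024, 1, 10, 12, 0)
reports = [
    datetime(2024, 1, 10, 11, 0),  # 1 hour old
    datetime(2024, 1, 9, 13, 0),   # 23 hours old
    datetime(2024, 1, 8, 12, 0),   # 48 hours old
]
kept = prune_reports(reports, now, report_max_age_hours=24)
```

Because removal only happens during a sweep, a report can outlive its maximum age by up to one sweep interval in this model.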
To view Health Reports, navigate to a server’s Overview page and click the Server Health tab from below the Server Details menu. This page shows a record of Health Reports run on the server since the last pruning sweep (controlled by NodeHealth.PruningSweepHours).
Each Health Report is represented by a single icon on the screen. Reports appear from the left as each new report finishes testing health on the server. The oldest report is the rightmost icon (of the last row of Health Reports) and the most recent report is the leftmost icon (of the first row of Health Reports). The icon represents the result of the Health Report.
|Icon||Health Report Result|
A Health report may be started at any time on a server by clicking Run Health Report Now from the top of the Server Health page. The server will start checking its health and once the checks are finished, a new report will appear in the left-most position of the top row of Health Reports. Health Reports can be initiated manually even if Server Health is disabled on the Platform.
Health Reports can be filtered by date, time, and result of the report. Click the box next to Show Reports Before and select a date and time from the calendar drop down to filter for reports from before a certain date and time.
Use the checkboxes next to Filter by to show reports that contain one or more matching results for any of the checks run in the report.
To see a summary of the checks performed in a report, hover over an icon. The summary will include a breakdown of the results of the checks performed during the Health Report and the timestamp of when the Health Check was performed.
To see the whole report, click on the icon and the report will expand. Inside a single Health Report, you will be able to see the Check Name, Result of the check, any Notes about the check, and the time the check was Initiated on the server for every check run in the report. The time and date the report was completed can be seen in the upper left corner of the table. To return to see all reports, click Health Reports next to the timestamp of the report you are viewing.
In addition to viewing Health Checks in the SOC, the report is also stored on the local file system of the server. If there is a problem connecting to Platform resources, Platform Operators will still be able to access information gathered during a Health Check from the local files.
All Health Checks that have run on a server within the NodeHealth.ReportMaxAgeHours setting will be saved in JSON format to this file location.
All Health Checks are also stored in the following databases, connected by a Node Health Report key. If a server is unreachable, Platform Operators can diagnose the problem using the report information from the database.
All Health Checks that have run on a server within the NodeHealth.ReportMaxAgeHours setting will be saved in these databases.
Note: You are not able to remove Kubernetes cluster nodes through the SOC.
From a server’s detail page, you are able to remove it from your Platform. Servers are removed in the same way as they are removed in the Modify an Existing Environment workflow. Note that removing a server in this way will only remove the server’s role as a Web, Application, or Linux server on the Platform. It will not remove any specialized Platform roles such as the Load Manager, Platform Coordinator, or Cache roles. If a server with a specialized role is removed, the server will continue to function in that specialized role for the Platform but will not be available for hosting workloads.
Before removing a server, you should understand the impact the reduction in capacity could have on your Platform and move all workloads onto alternate servers.
To remove a server, click Remove and acknowledge the confirmation prompt. Your Platform will immediately begin to remove the server.
Removal of a Windows node will be logged to the Platform. Logs for Linux server removals are not available on the Platform but can be retrieved in the /var/log/apprenda/ directory on the removed server.
The Database Nodes tab will direct you to a listing for all database instances which can host database components in a supported version of SQL Server or Oracle:
The Core Components section to the left will indicate key information, such as which instance houses the Core Apprenda Database. Additional database instances can be added to the Platform via the Add New Database Node button.
As with the Servers tab, you can click on the instance name for more detailed information, including a Workloads listing of specific databases housed on the instance, and a Custom Properties link for editing that node’s Custom Properties.
Additionally, the CPU, Memory and storage allocated for database use on the node (which would initially have been configured during installation) can be updated from this screen, as can the Administrative Credentials (user account) specified for the node: