Exchange 2013 : Maintenance Components

n Exchange 2013 we introduced the concept of “Server Component States”. Server Component State provides granular control over the state of the components that make up an Exchange Server from the point of view of the environment it is running in. This is useful when you want to take an Exchange 2013 Server out of operation partially or completely for a limited time, but still need the Exchange services on the server to be up and running.

One example of such a situation is when “Managed Availability” (MA) comes to the conclusion that a specific server is not healthy in some respects and therefore should be bypassed temporarily until the bottleneck has been identified and removed. MA does so by utilizing so-called “Offline Responders”. They are explained in some detail in Lessons from the Datacenter: Managed Availability. Their counterpart are “Online Responders”, which bring the server back online when it is determined as being healthy again.

Another example is when a server is being updated, for example with a new CU.

In both situations, the server cannot be taken offline completely, but it also should not be considered as a fully operational member of its Exchange organization as well.

The primary purpose of Managed Availability is, to make the life of Exchange Administrators easier so that they usually do not have to bother themselves with the details. However, in other situations, a certain level of knowledge about the basic concepts behind “Server Component States” might prove to be useful.

Verify the current State of the Server Components

A first overview over the current State of all Server Components can be displayed in the Exchange PowerShell with the Get-ServerComponentState –Identity cmdlet:

You see, that the Server Components listed here do not map 1:1 to Exchange Services or processes running on the server. Instead, they provide an abstraction-layer and display “Components” which together make up the interfaces an Exchange Server provides to its environment. The majority of the components have a name like “*Proxy”. They are specific for the CAS Role, while other components like “HubTransport” and “UMCallRouter” are part of the Mailbox server role and “Monitoring” and “RecoveryActionsEnabled” belong to both roles.

In addition to the single components which can be managed individually, there’s also a component called “ServerWideOffline”, which is used to manage the state of all components together, with the exception of “Monitoring” and “RecoveryActionsEnabled”. For this purpose, “ServerWideOffline” overrides individual settings for all other components. It doesn’t touch “Monitoring” and “RecoveryActionsEnabled” because these two components need to stay active in order to keep MA going. Without them, no “OnlineResponder” could bring “ServerWideOffline” back to “Active” automatically.

About States and Requesters

Usually, Server Components are in one of two States: “Active” or “Inactive”. A third state, called “Draining”, is only relevant for the Components “FrontendTransport” and “HubTransport”.

Whenever the state of a component is supposed to be changed, it has to be done by a “Requester”. For example, the parameter –Requester is mandatory when you run the cmdlet Set-ServerComponentState:

There are altogether five “Requesters” defined:

HealthAPI
Maintenance
Sidelined
Functional
Deployment

Requesters are labels - you can choose any of them when running Set-ServerComponentState. But each Requester is treated and stored individually (more on this in the next section) and you should select the Requester that best matches your intention. For example, when you need to set “ServerWideOffline” to “Inactive” for maintenance purposes, it makes no sense to use “HealthAPI” as Requester. You might get what you want that way in terms of functionality; but such a choice will make troubleshooting unnecessarily complicated in case something does not work as expected, and you might get into a conflict with MA. Whenever MA triggers an OfflineResponder or an OnlineResponder it uses “HealthAPI” as Requester; therefore it is a good idea to consider “HealthAPI” as reserved for use by MA.

Interaction between States and Requesters

As stated above, each Requester is handled and stored individually. There’s no relationship or hierarchy amongst them. However, in case of a conflict between two or more Requesters, “Inactive” has a higher priority than “Active”.

Here’s a practical example:

Imagine that “ServerWideOffline” has been set to “Inactive” by two different Requesters, say “Functional” and “Maintenance”:

Then, you set back “ServerWideOffline” to “Active” with one of the two Requesters

As a Result, “ServerWideOffline” and all dependent Components still remain in the state “Inactive”:

In order to set them back to “Active” again, Set-ServerComponentState … -State Active needs to be executed with the second Requester as well.

Obviously, administrators will rarely configure such combinations purposefully. However, we have seen them happening as the result of a mix of processes running in the background and manual configuration.

Where is the data stored and how can it be retrieved?

Information about “Server Components”, “Requesters” and “States” is stored in two different places: Active Directory and the server’s Registry. Storage in the Active Directory facilitates running Set-ServerComponentState against a remote server.

In order to determine precedence in case of a divergence between the two places, a timestamp is used. The newer setting is considered as the intended one.

In Active Directory, the settings are stored in the “msExchComponentStates” attribute of the Exchange Server object in the Configuration-namespace:

In the Registry, the settings are stored under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ExchangeServer\v15\ServerComponentStates:

You can use the Get-ServerComponentState cmdlet from the Shell to retrieve these settings:

.LocalStates displays the settings in the local Registry and .RemoteStates displays the settings in the Active Directory.

Peculiarities of “FrontendTransport” and “HubTransport” Components

Most Exchange Server components pick up changes in their Component State “on the fly”. However, this is not the case for the two Components “FrontendTransport” (mapped to the “Microsoft Exchange Frontend Transport” service on CAS Servers) and “HubTransport” (mapped to “Microsoft Exchange Transport” on Mailbox Servers). They pick up changes upon their next restart only.
This can cause confusion about their actual state. For example, they might be displayed as “Inactive” in Get-ServerComponentState but functionally still be active since they haven’t been restarted since their state has changed.

In order to alert the Administrator about such an inconsistency they write warnings into the Application Event Log. These warnings (7011 for MSExchangeTransport and 7012 for MSExchangeFrontEndTransport) inform the Administrator about the current and the expected state:

MA has Responders which take care of such inconsistencies and resolve them after a while by a forced crash and restart of the affected services. These Responders have the name “FrontendTransport.ServiceInconsistentState.Restart.Responder” and “HubTransport.ServiceInconsistentState.Restart.Responder”. They can be identified in the Microsoft.Exchange.ActiveMonitoring/ResponderDefinition crimson channel using the methodology described in What Did Managed Availability Just Do To This Service? blog post.

In Exchange 2013 CU2 and CU3 they are only triggered once per day for standalone Mailbox Servers and up to four times per day for DAG members. This might change in future versions, though.

After the inconsistencies have been cleared, Events 7009 from Source “MSExchangeTransport” (or “MSExchangeFrontendTransport”) and Category “Components” are logged, which show the current state:

When the state of one or both of these components is set to “Inactive”, each attempt to connect to the SMTP service on the server (on TCP port 25) triggers the response “421 4.3.2 Service not active” (for the FrontendTransport component) and “451 4.7.0 Temporary Server error” (for the HubTransport component) and corresponding entries are written to the respective SmtpReceive Protocol Logs.

As mentioned in KB 2866822 , messages sent from internal stay in the Outbox or Drafts folder and “Service not active” entries are logged in the “Connect*”-Logs in the “Submission”-Folder underneath “TransportRoles\Logs\Mailbox\Connectivity\” when “HubTransport” is set to “Inactive”.

A problem with failing updates to CU2

While an Exchange 2013 Server is updated with CU2, the setup- sets “Monitoring”, “RecoveryActionsEnabled” and “ServerWideOffline” to Inactive using the Requester “Functional” at the beginning, as can be seen in the “ExchangeSetup”-Logfile:

However, when the update exits prematurely because it encounters an unrecoverable error-condition, it does not restore the original state. Even when the Administrator restarts all stopped Exchange services or reboots the server, the Exchange components still remain in the Inactive state.

In order to recover from this situation, you must either find the root cause for the error and remove it so that the setup completes successfully, or manually set the ServerComponentStates back to Active with the Requester “Functional”.

This issue might be fixed in future CUs and SPs.

When should you change Server Component States manually?

Perhaps the two most important scenarios where manual changes of Server Component States come into play are:

Planned server maintenance
Temporary isolation of some server components, so that they are not targets for proxy requests from other CAS servers any more.

Server component states and planned maintenance of DAG members

Scenario 1 is described for DAG-members in some detail in Managing Database Availability Groups on TechNet. However, in practice, it can prove to be trickier than expected, depending on the details of the planned maintenance measure.

The article does not mention that “FrontendTransport” and “HubTransport” have to be restarted, before a change in their Component State becomes effective at all. Without a restart, you continue to receive warning events 7011 and 7012 in the Application Event Log, but the state of the components remains the same as before, until eventually MA detects the inconsistency and force restarts the services.

Also, the problem with the prematurely exiting CU setups mentioned above in the section “Practical Experiences” cannot be resolved by following the recommendations in the article to the dot, because Setup uses the Requester “Functional”, while the article talks about the Requester “Maintenance”.