Friday, May 23, 2014

Did You Know: Why Are My Exchange 2013 Services Stopped? #MSExchange #iammec #MSTechEd

A conversation on Twitter recently got me thinking about the mental checklist that I run through whenever I first look at a problematic Exchange 2013 server. Typically, the scenario begins with a frantic phone call from an admin reporting that a specific Exchange service or server is not functioning properly in their environment. Sometimes one service on the box, like Outlook Web App, is working properly while another, like SMTP, is not playing nice. Oftentimes, the admin is confused because “everything seems to look right” in the EAC and Event Viewer, and the other Exchange servers are functioning properly.

Part of my day-to-day responsibility revolves around sharing best practices, tips-n-tricks and a vast array of additional intellectual property with our UC consultants and global operational staff. One of the more popular documents and internal training sessions covers how to quickly troubleshoot messaging environments. This Twitter conversation made me realize that with the release of Exchange 2013, the mental checklist that I run through has changed a bit.

Based on the issue described by the frantic admin above, a quick run-through of the Windows Event Log, the crimson channel (Exchange Application & Service log) and the Exchange service states is a great start. However, after working with Exchange 2013 for a little over a year now, I have added several new items to my checklist (without realizing it).

For this post, I will highlight the first item on my mental checklist when diagnosing erratic Exchange behavior on a server. The first cmdlet that I use when checking out a ‘troubled’ Exchange 2013 server is Get-ServerComponentState. This informative cmdlet reports the state of each Exchange component on a specific server. Managed Availability (MA) is a new feature in Exchange 2013 that monitors application health and determines which components end users can connect to. For instance, MA might determine that the FrontEndTransport is not working properly and bring that component administratively down (diverting traffic to another healthy server) while still serving Outlook Web App to end users. The screenshot below shows a server that reports all components as active and in good working order according to MA.
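As a rough sketch of that first check, the cmdlet can be run from the Exchange Management Shell against the server in question. The server name EX01 below is a placeholder for illustration only.

```powershell
# EX01 is a hypothetical server name; substitute your own.
# List every component on the server along with its current state.
Get-ServerComponentState -Identity EX01 |
    Format-Table Component, State -AutoSize
```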

Each of the listed service components can have either an active or inactive state. The FrontEndTransport and HubTransport can also have a state of draining. The draining state is triggered when the FrontEndTransport or HubTransport needs to deliver SMTP messages in queue before a state change.

If you run Get-ServerComponentState on a server and a component is inactive, that component can be viewed as administratively down. Any service that the component provides to end users is no longer functional. In the screenshot below you will see that the FrontEndTransport is showing as inactive.
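On a server with many components, it can help to filter the output down to anything that is not active. A minimal sketch, again assuming a placeholder server name of EX01:

```powershell
# EX01 is a hypothetical server name; substitute your own.
# Show only the components that are not currently Active
# (this catches both Inactive and Draining).
Get-ServerComponentState -Identity EX01 |
    Where-Object { $_.State -ne 'Active' } |
    Format-Table Component, State -AutoSize
```

An empty result here suggests the problem lies somewhere other than component states.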

Often, the phone call about erratic Exchange 2013 behavior can be traced to an inactive component. All component and state information is stored in the Exchange 2013 server’s registry. The settings are stored under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ExchangeServer\v15\ServerComponentStates.

If you need to change the state of a component, then a requester is used. There are five different requesters that you can use to change a component state: HealthAPI, Maintenance, Sidelined, Functional and Deployment. In practice, administrators should only use the Maintenance, Sidelined, Functional and Deployment requesters. HealthAPI is the requester that MA itself uses to change component states, so we should not use that one. The most common requester an administrator will use is Maintenance. Using this requester makes the reason for the state change clear and avoids confusion when others view the state.
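A state change with the Maintenance requester might look like the sketch below. It uses the Set-ServerComponentState cmdlet; the server name EX01 is a placeholder, and FrontendTransport is used only because it is the component discussed in this post.

```powershell
# EX01 is a hypothetical server name; substitute your own.
# Take FrontendTransport out of service for planned work,
# recording Maintenance as the requester.
Set-ServerComponentState -Identity EX01 -Component FrontendTransport `
    -State Inactive -Requester Maintenance

# When the work is done, bring it back with the same requester.
Set-ServerComponentState -Identity EX01 -Component FrontendTransport `
    -State Active -Requester Maintenance
```

Setting the component back to active with the same requester that took it down matters, because each requester's entry is tracked separately.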

In my example below, if we click on the FrontEndTransport key, we see that a requester called “maintenance” was used to set this component to 1:0:635363745285714245. The 0 in the second field is an easy way to spot that the component is set to inactive.

Just for reference, the HealthApi entry for EWSProxy, which is set to active, displays a value of 1:1:635350804991145390. The 1 in the second field lets us know that the EWSProxy component is in an active state.
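The two values above can be picked apart with a few lines of PowerShell. This is only a sketch: the function name ConvertFrom-ComponentStateValue is made up for illustration, and treating the last field as a .NET tick count (a timestamp of the change) is an assumption based on its size, not something confirmed in this post.

```powershell
# Hypothetical helper (not an Exchange cmdlet): splits a raw registry
# value such as 1:0:635363745285714245 into its three fields.
function ConvertFrom-ComponentStateValue {
    param([string]$Value)

    $fields = $Value -split ':'

    # Second field: 0 = inactive, 1 = active (per the examples above).
    $state = if ($fields[1] -eq '0') { 'Inactive' }
             elseif ($fields[1] -eq '1') { 'Active' }
             else { "Other ($($fields[1]))" }

    # ASSUMPTION: third field is a .NET tick count for the change time.
    $changed = [datetime]([long]$fields[2])

    New-Object PSObject -Property @{ State = $state; Changed = $changed }
}

# The two values quoted in this post.
ConvertFrom-ComponentStateValue '1:0:635363745285714245'   # FrontEndTransport, Maintenance
ConvertFrom-ComponentStateValue '1:1:635350804991145390'   # EWSProxy, HealthApi
```

To feed it live data on the Exchange server itself, the raw string could be read with Get-ItemProperty from the component's key under the ServerComponentStates path mentioned earlier.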

Component state settings are also stored in Active Directory in the msExchComponentStates attribute within the properties of the Exchange server object. The Get-ServerComponentState cmdlet actually leverages this state information that is stored in Active Directory when executed.

While the hope would be that any inactive states were set by server maintenance scripts run by other administrators, this is not always the case. Another reason for an inactive state is that a component failed to meet specific performance thresholds and Managed Availability proactively took the component offline.

As new iterations of the Exchange product are released, subtle nuances surface. With Exchange 2013, the MA service protects and helps maintain a positive end-user experience by guarding against service disruption. As with all automation, we need to understand when MA has taken action and why in order to properly troubleshoot.

Make sure that you add the Get-ServerComponentState cmdlet to your mental checklist before an administrator calls you about an erratic Exchange 2013 server!

