Wednesday 8 January 2014

Personal experience and pain points migrating OCS 2007 R2 to Lync 2013

I was recently involved in an OCS 2007 R2 to Lync Server 2013 migration and thought it would be a good idea to share my experience, with a specific focus on what went wrong, was unexpected, or was undocumented.
I won’t get into the details on how to migrate. This is already widely documented on a number of blogs as well as TechNet (http://technet.microsoft.com/en-us/library/jj205375.aspx).
ENVIRONMENT: OCS 2007 R2. Two-node Enterprise pool, no external deployment (edge servers). Polycom CX-700 phones. Two Sonus (NET) VX 1200 gateways, each terminating an ISDN-30 trunk and used as hybrid gateways (that is, they served as mediation servers and no OCS Mediation Servers were present). OCS was the main voice platform for the customer. No other PBX or phones were present.
Below are my pain points, in order of occurrence.

TOPOLOGY MERGE

Adding the Lync pool was quick and painless, right up to the topology merge. The next steps, merging the topology and importing the legacy configuration into Lync, were where I had expected unpredictable results because of the hybrid gateway model in OCS 2007 R2. I struggled to find specific documentation about migrating such a scenario. The biggest question mark was whether the Lync pool would be able to use media gateway based mediation. I assumed the answer would be no.
Merging the topologies in Topology Builder was apparently straightforward, with the following messages in the log:
2013-09-25 14:04:56 INFORMATION :  No new Mediation Server added to the Office Communications Server 2007 / Office Communications Server 2007 R2 deployment.
Cannot find any Office Communications Server 2007 / Office Communications Server 2007 R2 "MediationServer" in the deployment
List of Office Communications Server 2007 / Office Communications Server 2007 R2 "Trusted application server" roles being migrated
Cluster fully qualified domain name (FQDN) "vx1.contoso.local"
Computer fully qualified domain name (FQDN) "vx1.contoso.local"
2013-09-25 14:04:56 INFORMATION:  UCMA application with the cluster fully qualified domain name (FQDN) "vx1.contoso.local" does not depend on a pool.
2013-09-25 14:04:56 INFORMATION:  UCMA application with the cluster fully qualified domain name (FQDN) "vx2.contoso.local" does not depend on a pool.
As a result, the media gateways were added as trusted application entries in the BackCompatSite.

IMPORT LEGACY CONFIGURATION

As no OCS 2007 R2 Mediation Servers were detected in the legacy topology, I wondered how the legacy configuration import procedure would react, and I was ready to redo the voice configuration in Lync if required. I ran:
Import-CsLegacyConfiguration
I got the following warning for each route and gateway:
2013-09-25 14:10:23 WARNING:  Cannot find a Mediation Server with the fully qualified domain name (FQDN) "vx2.contoso.local". Run "Merge-CsLegacyTopology" cmdlet before using this cmdlet or make sure that a PSTN route is pointing to a valid Mediation Server. Skipping creation of a Lync Server 2013 PSTN route setting with the name "Emergency Services". numberPattern:  "^(\+999$)"
It seemed Lync was expecting to see “proper” domain-joined Mediation Servers in the topology, which was not the case.
The result on the Lync topology and configuration was as follows:
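With many routes, it helps to pull the affected route names out of a saved copy of the import output rather than reading it line by line. Below is a minimal Python sketch of that idea. The regular expression is modelled only on the single warning quoted above, so other wording variants are not covered, and the helper name skipped_routes is my own invention for illustration.

```python
import re

# Pattern modelled on the one warning line quoted in this post (an assumption:
# other Import-CsLegacyConfiguration message variants are not handled).
WARNING_RE = re.compile(
    r'Cannot find a Mediation Server with the fully qualified domain name '
    r'\(FQDN\) "(?P<fqdn>[^"]+)"'
    r'.*?PSTN route setting with the name "(?P<route>[^"]+)"'
)


def skipped_routes(log_text):
    """Return (route name, missing Mediation Server FQDN) pairs found in the log."""
    return [(m.group("route"), m.group("fqdn")) for m in WARNING_RE.finditer(log_text)]


# The warning from this post, pasted as a raw string so backslashes stay literal.
sample = (
    r'2013-09-25 14:10:23 WARNING:  Cannot find a Mediation Server with the fully '
    r'qualified domain name (FQDN) "vx2.contoso.local". Run "Merge-CsLegacyTopology" '
    r'cmdlet before using this cmdlet or make sure that a PSTN route is pointing to a '
    r'valid Mediation Server. Skipping creation of a Lync Server 2013 PSTN route '
    r'setting with the name "Emergency Services". numberPattern:  "^(\+999$)"'
)

for route, fqdn in skipped_routes(sample):
    print(f"{route}: gateway {fqdn} not found as a Mediation Server")
```

The resulting list doubles as a checklist of routes whose gateway needs to be reassigned after the import.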
  • Dial Plans (aka location profiles in OCS) were migrated fine
  • PSTN usages were migrated fine
  • Routes were migrated fine, despite the warning above saying "Skipping creation of a Lync Server 2013 PSTN route setting with the name (…)". However, all of them had a null (empty) gateway.
  • Media gateways were imported in the legacy topology (BackCompatSite) as trusted application servers
  • No media gateways or trunks were migrated in the Lync topology
Rather than trying to make Lync 2013 use the mediation function on the hybrid gateways, I re-added the VXs as PSTN gateways and created new trunks on both the Lync topology and the media gateways, so that migrated users would immediately use the new Lync routes and Mediation Servers.
There was an additional warning in the legacy configuration import. This was a widely documented exception, as Lync 2013 does not accept certain characters in names.
2013-09-25 14:10:23 WARNING:  Policy/setting name  - "Service: Medium" has either ":" or "/". Import-CslegacyConfiguration is replacing them with "_" before migrating them. Office Communications Server 2007/Office Communications Server 2007 R2 policy/setting name  - "Service: Medium". Lync Server 2013 policy/setting name - "Service_ Medium"
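If you keep your own documentation of policy names, you can preview the renames before running the import. This is a small Python sketch, assuming the only rule is the one stated in the warning above (":" and "/" each become "_"). The function names are hypothetical, and what the import actually does when two legacy names end up identical is not something I tested; the sketch only flags such clashes.

```python
from collections import defaultdict


def lync_safe_name(name):
    """Apply the substitution described in the import warning: ':' and '/' become '_'."""
    return name.replace(":", "_").replace("/", "_")


def preview_renames(names):
    """Return (names that would change, new names claimed by more than one legacy name)."""
    renamed = {n: lync_safe_name(n) for n in names if lync_safe_name(n) != n}
    by_target = defaultdict(list)
    for n in names:
        by_target[lync_safe_name(n)].append(n)
    collisions = {target: sources for target, sources in by_target.items() if len(sources) > 1}
    return renamed, collisions


# "Service/ Medium" is an invented name, added only to show a clash.
renamed, collisions = preview_renames(["Service: Medium", "Default", "Service/ Medium"])
print(renamed)
print(collisions)
```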

RESPONSE GROUP MIGRATION

By far the biggest pain point, and not an utter surprise: past experience and plenty of publicly available write-ups suggest this is not a hassle-free step.
With this in mind, before migration I strongly suggest that you:
  1. Carefully document every low-level aspect of the existing response groups: queues, agents, groups, workflows, everything. The level of detail must allow you to recreate all response groups from scratch on Lync in case anything goes VERY wrong. It might take some time, but do yourself a favour and don’t overlook this step. In my case, documenting around 80 objects among workflows, queues and groups took a while, but it was worth every second.
  2. Back up the response groups on OCS. Use the following command:
     applicationsettingsexport.exe /backup /pool:ocspool.contoso.com /applicationID:Microsoft.RTC.Applications.Acd /file:ResponseGroupExport.xml
  3. Do a sanity check on the OCS response groups:
     • Remove all orphaned agents.
     • Ensure you have not renamed any of the agents in AD. There have been reports of RG migration failing because agents had certain user attributes (name) changed in AD. If in doubt, remove the agent and add it back to the agent lists as well as to any other group.
Once done, I ran the following to migrate response groups from OCS to Lync:
Move-CsRgsConfiguration -Source ocspool.contoso.com -Destination lyncpool01.contoso.com
Worth mentioning:
  1. This is a one-off step. You cannot migrate selected response groups or objects. It's all or nothing.
  2. The only RG resources actually "moved" to Lync are the contact objects representing the RGs, along with the related SIP URIs. Lync now becomes the RG owner. All other objects (queues, workflows, agents) are simply “mirrored” to Lync, and a copy of everything is retained on OCS for rollback purposes (however, you cannot use the above command for that). After the service has been migrated, all calls to a Response Group phone number are handled by Lync 2013 and no longer by OCS.
  3. There are several other requirements before you can run the command. Check http://technet.microsoft.com/en-us/library/gg398782.aspx for more details.
I ran the command and apparently got no errors. As usual with PowerShell, no output is displayed on success (unless you use the -Verbose switch).
I then checked that all RG objects showed up on Lync, and they did. I tried to call one response group from the PSTN, but the call failed. However, the same response group could be reached through a Lync call. A quick check led me to find that the tel URI field was empty in all workflows (around 20).
I was far from excited to realise that not all information had been migrated, but as everything else seemed to have been copied fine, I assumed it was just a matter of repopulating the tel URI.
I tried with the first workflow, and the next bad surprise showed up:
Response Group Update Failure : An instance with ID "a0624626-2744-40c4-b2f7-b2e2a99c8a95" exists with a different OwnerPool. Changing OwnerPool on an existing object is not supported.
An apparently undocumented error, or, at least, unknown to Google :-)
Furthermore, strange event log entries showed up on the OCS pool servers:
Log Name: Office Communications Server
Source: OCS User Services
Date: 10/9/2013 5:46:11 PM
Event ID: 30951
Computer: OCS-FE1.contoso.local
Description:
Active Directory indicates that user is homed on a different server but user data exists on this server as well.
Active Directory Object with guid {C99F9636-53C6-4760-857D-36FFCD54F667} and SipUri rg1@contoso-int.com is listed as being homed on lyncpool01.contoso.com.
Cause: It is possible that the Active Directory attribute msRTCSIP-PrimaryHomeServer has been incorrectly modified or the user has been improperly re-homed using outdated administration tools.
Log Name: Office Communications Server
Source: OCS Response Group Service
Date: 10/9/2013 5:46:11 PM
Event ID: 31053
Computer: OCS-FE1.contoso.local
Description:
Office Communications Server 2007 R2, Response Group Service was not able to establish the application endpoint.
The following exception occurred when establishing application endpoint associated with 'sip:rg1@contoso-int.com': Microsoft.Rtc.Signaling.RegisterException - 482 - The endpoint was unable to register. See the ErrorCode for specific reason..
Cause: Failed to connect to Front End server or the Front End server is misconfigured.
Resolution: Check the Front End server for errors.
Log Name: Office Communications Server
Source: OCS Response Group Service
Date: 10/9/2013 5:46:11 PM
Event ID: 31189
Computer: OCS-FE1.contoso.local
Description:
Application endpoint has been terminated and Office Communications Server 2007 R2, Response Group Service has recreated it.
Application endpoint associated with 'sip:rg1@contoso.com' has been terminated and Office Communications Server 2007 R2, Response Group Service has recreated it.
Long story short, I was unable to change anything in the workflow. With little time available to determine the root cause, I tried to delete and recreate all workflows. Fortunately, that worked and RGs were back in service in around one hour. I didn’t regret any minute spent documenting :-)
The surprises were not over, however. The next morning, as the customer attempted to add an agent to a group, a now “familiar” exception knocked on the door again:
[Screenshot: Error1]
To recreate the groups or not? With functional RGs and enough time for a deeper analysis, I decided to investigate the root cause further. As a first step, a Lync trace confirmed the error:
[Screenshot: Error2]
I then turned to the SQL back end, specifically the rgsconfig database where the response group configuration is stored, looking for a possibly conflicting GUID in OwnerPoolID or a wrong OwnerPool.
The first check was in the dbo.OwnerPool table, where I took note of the Lync pool ID to determine whether other objects carried the correct one:
[Screenshot: Error3]
I then looked at the OwnerPoolID field in the dbo.AgentGroups table. I was expecting a wrong value; surprisingly, all values were NULL.
I then manually copied the Lync pool ID (retrieved from the dbo.OwnerPool table) into the OwnerPoolID field of one of the groups, and tried again to add a member to that group. This time, it worked!
The resolution was then to populate the OwnerPoolID field of every object, including groups and queues, with the ID value from the dbo.OwnerPool table.
[Screenshot: Error4]
Not everyone will like the idea of fiddling around with raw databases and modifying records one by one. If you ever run into the same issue, it all depends on how big and complex your RGs are. Alternatively, you might just want to delete and recreate the objects. In my case, newly created objects got the OwnerPoolID populated correctly.
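For anyone who would rather script the fix than edit rows by hand, here is a rough Python sketch of the idea. It runs against a throwaway in-memory SQLite database, not the real SQL Server rgsconfig database, so treat it as an illustration of the logic only. The OwnerPool and AgentGroups table names and the ID and OwnerPoolID fields are the ones mentioned above; the Fqdn and Name columns, the sample pool ID value, and the fill_owner_pool helper are assumptions made up for the example.

```python
import sqlite3

# Toy stand-in for rgsconfig. Columns other than ID and OwnerPoolID are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OwnerPool (ID TEXT PRIMARY KEY, Fqdn TEXT);
    CREATE TABLE AgentGroups (Name TEXT, OwnerPoolID TEXT);
    INSERT INTO OwnerPool VALUES ('pool-id-1', 'lyncpool01.contoso.com');
    INSERT INTO AgentGroups VALUES ('Helpdesk', NULL);
    INSERT INTO AgentGroups VALUES ('Sales', NULL);
""")

# Only tables listed here may be updated; add others once their real names are confirmed.
ALLOWED_TABLES = {"AgentGroups"}


def fill_owner_pool(conn, table, pool_fqdn):
    """Set OwnerPoolID to the pool's ID on rows where it is NULL; return rows changed."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    row = conn.execute(
        "SELECT ID FROM OwnerPool WHERE Fqdn = ?", (pool_fqdn,)
    ).fetchone()
    if row is None:
        raise LookupError(f"pool not found: {pool_fqdn}")
    cur = conn.execute(
        f"UPDATE {table} SET OwnerPoolID = ? WHERE OwnerPoolID IS NULL", (row[0],)
    )
    conn.commit()
    return cur.rowcount


changed = fill_owner_pool(conn, "AgentGroups", "lyncpool01.contoso.com")
print(f"rows updated: {changed}")
```

Restricting the update to rows where OwnerPoolID is NULL keeps the sketch from overwriting objects that were created on Lync after the migration and already carry the right value.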

MIGRATE CONFERENCING DIRECTORIES

As part of the migration process, conferencing directories must be moved from OCS to Lync. If you follow TechNet verbatim, it tells you to do this before deactivating and decommissioning the OCS pool (check http://technet.microsoft.com/en-us/library/jj205300.aspx). However, if you do it at this stage, you will later hit the following error while deactivating the Conferencing Attendant component:
A call to a subtask failed.: The call to subtask AppServer.GetAppState failed. Pool is not ready.
To resolve this issue, temporarily move the conferencing directory back from Lync to OCS 2007 R2 with the Move-CsConferenceDirectory cmdlet. The process is described in detail on Lee Desmond’s blog: http://www.leedesmond.com/weblog/?p=749.
