Thursday, 30 April 2015

Skype for Business and Lync 2013 DDC - Detailed Design Calculator 5.0

DDC
Just in time for the Skype for Business Server release, my friend and co-author Alberto Nunes and I are very happy to announce Skype for Business DDC - Detailed Design Calculator 5.0. It adds a lot of long-awaited features (keep reading for details). Please grab it from here. Please take a moment to rate us. Thank you! :-)

DDC is a simple offline, Excel-based, low-level design calculator for Microsoft Skype for Business and Lync 2013 on-premises deployments. Fill in host names, IP addresses etc., and DDC will calculate DNS records, certificate names, firewall rules, deployment scripts and several other design elements to help speed up your deployment.
DDC is a continuously evolving project and you should expect frequent updates with new features added over time. It is and will always be free.
Bug reports, requests for improvements or new features, suggestions, criticism etc. are greatly appreciated.

Features

New in version 5.0.0:

  • Support for multiple SIP domains (up to 8), each with several configurable options (strict domain matching, SIP/XMPP federation, etc.)
  • DNS tables reorganised for multiple domain support (records are sorted by domain).
  • Added the ability to include/exclude external servers (Edge pool, Reverse Proxy) from the deployment
  • Added the ability to choose the AD or main SIP domain name for Pools and Web Services names (earlier DDC versions used the primary SIP domain name by default and did not allow this to be changed)
  • Option to deploy 1 or 2 network cards on dedicated Mediation servers (separate internal and PSTN IP addresses). We have not included this option for collocated Mediation (not recommended)
  • Option to use separate public IP addresses for external web-based applications (FE and Director pool web services, Office Web Apps)
  • Support for Office Web Apps farms (up to 6 nodes)
  • Additional scripts (setup accounts for synthetic transactions)
  • Required empty input cells displayed in red
Other:
  • Supports Standard and Enterprise pools (up to 12 nodes), with pure device-based load balancing (HLB) or a combination of DNS load balancing and device-based load balancing for web services (DNS LB);
  • Supports Edge, Director and Mediation pools (up to 12 nodes per role) with HLB or DNS LB;
  • Supports up to 4 PSTN gateways (can be media gateways, direct SIP trunks, etc.) with configurable SIP transport (TCP or TLS), media ports and media bypass;
  • Ability to specify custom media ports for clients and servers. DDC automatically applies consecutive non-overlapping ranges, and creates the appropriate commands on the Scripts sheet to apply these to your deployment;
  • Calculates internal and external certificates CN and SAN; for the external certificate, it provides the option of separate or single certificate for Edge and Reverse proxy;
  • Calculates DNS entries for internal and external zones. In addition, DDC generates a script (in the Scripts sheet) which will automatically add the required records for either pinpoint or split-brain DNS.
  Calculates firewall rules for: 
  • Internal firewalls (internal-facing DMZ); 
  • External firewalls (external-facing DMZ);
  • PSTN: The PSTN firewall sheet calculates custom rules for firewalls behind PSTN gateways, if any;
  • Endpoints: the internal client Firewall sheet calculates custom rules for personal firewalls installed on clients, and rules required in scenarios with endpoints segregated by VLANs or other restrictions in place.
  Script section: new in version 4.x and still at an initial stage, but with plans to grow it over time. Currently it supports:
  • DHCP: We have included a modified version of DhcpConfigScript.bat (http://technet.microsoft.com/en-us/library/gg412988(v=ocs.14).aspx), with the correct hexadecimal values automatically calculated and included, based on your design inputs; this removes the requirement to use dhcputil to generate the script and makes it ready to run on x86 and x64 Windows DHCP servers;
  • DNS: Scripts to create necessary records based on dnscmd (with both pinpoint and split-brain, based on your selection);
  • Office Web Apps: scripts to automate certificate request (through certutil), installation and web farm creation;
  • Forward Proxy exceptions;
  • QoS: PowerShell commands to configure custom port ranges;
  • Setup accounts for synthetic transactions.
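
The "consecutive non-overlapping ranges" idea used for custom media ports can be illustrated with a short sketch. This is not DDC's actual Excel logic: the function name, the starting port and the range sizes below are hypothetical values chosen only for illustration.

```python
def allocate_port_ranges(start_port, sizes):
    """Return {name: (first, last)} with consecutive, non-overlapping ranges.

    `sizes` maps a traffic type to the number of ports it needs
    (assumed values, not taken from DDC).
    """
    ranges = {}
    next_port = start_port
    for name, count in sizes.items():
        first, last = next_port, next_port + count - 1
        if last > 65535:
            raise ValueError(f"range for {name} exceeds port 65535")
        ranges[name] = (first, last)
        next_port = last + 1  # the next range starts right after this one
    return ranges


# Hypothetical example: 20 ports each for audio, video and application sharing
example = allocate_port_ranges(50020, {"audio": 20, "video": 20, "appsharing": 20})
```

Because each range starts at the port following the previous one, the ranges cannot overlap, whatever sizes are chosen.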

How to use 

The tool was tested on Microsoft Excel 2013 and 2010 for Windows desktop (Excel Online is not supported and we have not tested it on Office for Mac). Macros and active content must be enabled. Fill in all relevant fields in the Global Data, Resource Data and Other Data sheets. All input cells with dynamic data have a dark blue background and are already populated with sample data. Please change all values to reflect your actual design. Empty or invalid entries will be marked red.
Important: remember to press the Generate Data button (available in all sheets) when data input is complete (or when you change a value). This is required to refresh and resize calculations and views. Please do not manually resize, hide or unhide rows. This is done programmatically when you press the Generate Data button so that only the relevant content is displayed.

 

Known issues

  • When you press the Generate Data button, you may notice some screen flickering through the DDC sheets. This is due to the recalculations of data and views. On slower machines, it can take some time for refresh to complete;
  • An issue in the December 2014 Excel update MS14-082 may break DDC functionality in some circumstances. You may notice pull-down menus not updating, the Generate Data button not working, etc. This is due to a problem described in the following article (check the section on known issues with this security update). Hotfixes have been released in the March 2015 Updates for Office 2007, 2010 & 2013. Refer to the articles below for more details and any prerequisites: Microsoft Support and Microsoft Excel Support Team Blog.

Disclaimer

DDC is a third-party tool developed by independent Microsoft UC Solutions Architects. The authors are not affiliated with Microsoft. Skype for Business™, Skype™, Lync™, Office Communications Server™, Exchange™ and Excel™ are registered trademarks of the Microsoft Corporation™. Although we took every care in calculations and scripts, use at your own risk! Please read the extended disclaimer in the file.


Assumptions and limitations

Note the following assumptions or known limitations (some of which will be addressed in future versions): 
  • Very limited content validation and error catching: you will not be warned if you type 256.256.256.256 as an IP address :-) Ensure you type the correct data;
  • DDC currently has no sizing/capacity calculator features. It assumes you already have made your determinations on number and types of servers to implement;
  • Only IPv4 is supported;
  • Only a single Pool per role is supported;
  • Single Reverse Proxy;
  • Edge, Director and Mediation, even when 1 node is selected, are always configured in a Pool to allow for easier scalability and certificate management; 
  • When HLB (device-based load balancing) is selected for Front-End and Director Pools, we assume that internal web services host names will not be overridden. This is optional when HLB is used; override becomes mandatory with DNS LB;
  • We assume internal server resources will use an internal Certification authority. This includes Front-end, Mediation, Director and internal Edge and reverse proxy interfaces. Firewall rules are included to grant DMZ servers (Edge and reverse proxy) access to the CRL;
  • If the same domain name is used for Active Directory and SIP, in a multiple SIP domain deployment, we assume this will be the primary (default).
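
Since DDC does not validate IP addresses, it can be worth checking your inputs before typing them in. A minimal sketch using Python's standard `ipaddress` module follows; the helper name `is_valid_ipv4` is made up for this example and is not part of DDC.

```python
import ipaddress


def is_valid_ipv4(text):
    """Return True if `text` is a well-formed dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False


# The address from the limitation above is rejected; a normal one is accepted
checks = {addr: is_valid_ipv4(addr) for addr in ("256.256.256.256", "192.168.1.10")}
```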

Credits

For beta testing, bug reports, suggestions, feedback and other valuable input: Corey McClain (@cdhtweetstech), Lasse Nordvik Wedø (@lawedo), Antonio Spirandelli (@spady7), Dino Caputo (@dinocaputo), Fabrizio Volpe (@fabriziovlp), MaxSanna (@MaxSanna), Igor Kravchenko, Korbyn, Lutenus, Mauro Rita (@jmrita), Thomas Juhl Olesen, Wilfried van Oosterhout, Pat Richard (@patrichard), Daniel Banfield, James Brewster.
 

Version history

Version 5.0.0 - 1st May, 2015
New features:
1) Support for multiple SIP domains (up to 8), each with several configurable options (strict domain matching, SIP/XMPP federation, etc.)
2) DNS tables reorganised for multiple domain support (records are sorted by domain).
3) Added the ability to include/exclude Edge pool in the deployment
4) Added the ability to choose the AD or main SIP domain name for Pools and Web Services names (earlier DDC versions used the primary SIP domain name by default and did not allow this to be changed)
5) Option to deploy 1 or 2 network cards on dedicated Mediation servers (separate internal and PSTN IP addresses). We have not included this option for collocated Mediation (not recommended)
6) Option to use separate public IP addresses for external web-based applications (FE and Director pool web services, Office Web Apps)
7) Support for Office Web Apps farms (up to 6 nodes)
8) Additional scripts (synthetic transactions)
9) Cells with empty or invalid entries are marked red
Bug fix:
1) Some naming inconsistencies
2) Visual improvements (smaller fonts for better readability on higher res) - refresh issues
3) Numerous optimisations on scripts and code (for many suggestions on scripts: thanks @PatRichard)
4) Some issues on firewall sheets (missing rules for Directors)

Version 4.3.1 - 13th April, 2015
New features: none
Bug fix: Lyncdiscover record was displayed in internal DNS in some instances. (thanks Fgarib)

Version 4.3 - 9th November, 2014
New features: none
Bug fix:
1) Missing rule in Firewall (external) for tcp/443 on A/V Edge server
2) Missing rule in Firewall (external) for tcp/80 on Reverse Proxy

Version 4.2 – 7th September, 2014
New features: visual improvements - added extended disclaimer
Bug fix:
1) incorrect implementation of RFC3361 (http://www.rfc-editor.org/rfc/rfc3361.txt) caused the DHCPUTIL script to generate an incorrect string for option 120 (row 11 in Scripts sheet). Thanks to Daniel Banfield for reporting the issue
2) bug in the DHCP script that prevented it from generating the correct entry for Lync internal web services depending on the load balancing method
3) various script optimisations and some typos (thanks @patrichard)
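
For readers curious about what the option 120 string contains, the sketch below shows the domain-name form of the encoding defined in RFC 3361: an encoding byte of 0, followed by each name as length-prefixed labels terminated by a zero byte (RFC 1035 format). It is only an illustration of the RFC, not the code DDC uses, and the function name and the FQDN are hypothetical.

```python
def encode_sip_option_120(domains):
    """Encode SIP server domain names as a DHCP option 120 value (RFC 3361).

    Returns the option payload only: the 'enc' byte (0 = domain names)
    followed by each name as length-prefixed labels ending in a zero byte.
    """
    payload = bytearray([0])  # enc = 0: the list contains domain names
    for domain in domains:
        for label in domain.rstrip(".").split("."):
            raw = label.encode("ascii")
            if not 1 <= len(raw) <= 63:
                raise ValueError(f"invalid DNS label: {label!r}")
            payload.append(len(raw))  # length prefix of this label
            payload += raw
        payload.append(0)  # root label terminates the name
    return bytes(payload)


# Hypothetical pool FQDN, shown as a hex string
hex_value = encode_sip_option_120(["pool.example.com"]).hex()
# hex_value == "0004706f6f6c076578616d706c6503636f6d00"
```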

Version 4.1.1 – 4th September, 2014
New features: adds an entry for lyncdiscover in the DNS internal sheet (required in specific scenarios where Windows Phone 8.x devices are unable to sign in from a corporate WiFi; thanks to @patrichard for the input). More info at http://jackstromberg.com/2013/06/lync-2013-dns-settings/
Bug fix: on Firewall (internal) sheet, the edge pool FQDN was displayed in a rule (should have been the FE pool FQDN)

Version 4.1 – 29th August, 2014
New features: visual improvements
Bug fix: Internal Office Web Apps certificate missed physical server name in the SAN (without it, the farm always shows as unhealthy). Thanks to @patrichard for notifying
 
Version 4.0.4 – 2nd August, 2014
New features: none
Bug fix: naming conventions - typos - some inaccurate error catching

Version 4.0.3 – 19th April, 2014
New features: none
Bug fix: additional refresh issues. Thanks to Wilfried van Oosterhout for notifying.

Version 4.0.2 – 13th April, 2014
New features: none
Bug fix: some issues in the hide/show procedures causing some entries to be incorrectly displayed (Director Web services, PSTN gateways and other). Thanks to Wilfried van Oosterhout for notifying.

Version 4.0.1 – 22nd March, 2014
New features: none
Bug fix: Inconsistencies in Office Web Apps / Office Online naming conventions + improvements on scripts descriptions
 
Version 4.0.0 – 21st March, 2014
New features: see the Features section for a full overview of existing and new features
Bug fix: Several visual issues, optimisations

Version 3.0.2 - 13th February, 2014
New features: none
Bug fix: minor code bug causing resource data refresh issues when changing Mediation pool type

Version 3.0.1 - 11th February, 2014
New features: none
Bug fix: formula issue in PSTN firewall sheet caused some IP addresses to display incorrectly (thanks Lutenus for notifying)

Version 3.0 - 10th February, 2014
New features: Support for PSTN Gateways, Mediation and Director Pools
Bug fix: several bug fixes, code and visual improvements

Version 2.0.4 - 23rd December, 2013
New features: none
Bug fix:
1) fix an issue where formula was displayed in some cells instead of result
2) On a standard edition pool, first SAN entry was not correct (should have been a reiteration of CN)

Wednesday, 15 April 2015

Lync calls fail with long post-dial delay? Check the Edge!

I hope someone can benefit from the many hours I spent on this issue :)

INFRASTRUCTURE
Lync 2013 Redundant deployment. 3-node Enterprise pool. 2-node Edge servers. Public addresses on external interfaces. All OS are Windows Server 2012 R2. All Lync servers at latest CU as of February 2015. All infrastructure virtualised on VMware ESX 5.5.

ISSUE DESCRIPTION
Lync and PSTN calls suddenly could not be connected by external or internal endpoints. A client would receive a call and answer it, the client would then hang in the "connecting..." state for some seconds, and the call would be dropped.
Along with the issue above, calls suddenly took a long time to be initiated (long post-dial delay). Whilst up to 2-3 "beeps" should be considered normal, we experienced up to 8.

What made the issue sneaky was that it had no apparent recurrence pattern. On average, we experienced the issue 4 times in around 3 weeks. Needless to say, it was a hugely disruptive problem affecting about 10,000 users.


OTHER INFORMATION AND THINGS CHECKED
  • No related events logged in Windows logs (checked on Front-End Servers, Edge, and Mediation)
  • Attempts to stop some Edge services (MediaRelaySvc.exe and MRASSvc.exe) resulted in the services being stuck in the “stopping” state indefinitely, and it was not possible to kill them
  • IM and presence still functional
  • The only workaround to re-establish functionality was rebooting both Edge servers
  • Firewall, DNS and routing were thoroughly checked and the correct configuration was confirmed to be in place
The issue we experienced was identical, word for word, to the one described in this thread. All fixes suggested in the forum were attempted without success.

ANALYSIS
Such failures can usually be narrowed down to a few types of issues:
  1. Firewall
  2. Routing
  3. MRAS
After ensuring 1) and 2) were correct, we concentrated on MRAS; traces and Lync reports indeed provided evidence something was not quite right (candidates not exchanged, timeout on contacting MRAS resulting in endpoint being unable to obtain MRAS token).




ROOT CAUSE
After considerable digging, we found out the issue was triggered by two drivers: vShield Endpoint Thin Agent driver (vsepflt.sys) and vShield Endpoint TDI Manager driver (vnetflt.sys), both interacting at the network layer. Conclusive proof was provided by Microsoft PSS, by analysing a memory dump taken during a failure, with the Edge MRAS service in a hung state (service stopping...).

WHAT DO THE DRIVERS DO
VMware vShield Endpoint is required to manage anti-virus and anti-malware policies for virtualized environments. vShield Endpoint strengthens virtualization security with enhanced endpoint protection by offloading AV processing to a secure virtual appliance supplied by VMware partners. All servers in the deployment featured file-level AV scanning, and the drivers were required as an agentless communication component between the virtual machines and VMware hosts.

RESOLUTION
Such drivers were already known to cause stability issues, including BSOD (check this and this other post). Besides, they are not certified by Microsoft (at least, up to the tested build).



Although we thought we were running a version fixing the issues described in the articles above, it seemed we hit a different type of bug which VMware fixed at a later date through an ad-hoc patch.
Our only other quick fix was to uninstall the drivers from the Lync servers completely. Simply disabling AV scanning or disabling the drivers did not help.

TAKEAWAY
Low-level processes from third-party applications can affect the stability and reliability of Lync traffic. In our case it was even worse: Edge services were in a hung state, causing Media Relay authentication to fail for all calls. Whilst a file-level antivirus scanner should be installed on any Lync server as a common security measure (with the correct exclusions), you should pay close attention to additional low-level or third-party components such as:
  • Network-level inspection
  • IDS
  • Personal firewall add-ons
  • Network accelerators
  • Broadly speaking, any other network-level software that may interfere with Lync traffic
Confirming their full compatibility will definitely save you some headaches.

OTHER
I have experienced very similar issues on another deployment, this time, with McAfee antivirus. On that occasion, the trigger was the FireTDI driver (a host intrusion detection component).

Wednesday, 4 March 2015

The real HotDeskingTimeOut for Polycom CX600 Lync phones

I have recently come across an interesting challenge. A customer asked for several Polycom CX600 phones to be provisioned with a Lync client policy enabling hotdesking and a long hotdesking timeout. I experienced some quite unexpected behaviour, which I thought would be worth sharing.


BACKGROUND

A Common Area Phone (CAP for brevity) is a special-purpose Lync phone setup that is not associated with a specific user account, and therefore not with a user-specific set of policies and phone number. A CAP is usually placed in public areas like lobbies, restaurants, public-access meeting rooms, etc. CAPs use a specific contact object which has the usual set of policies (dial plan, voice policy, client policy, conferencing policy, etc.) associated. CAPs are signed in through PIN authentication; they are then usually left alone and stay signed in permanently.

Because CAPs are easily accessible "by design", it is common practice to provide them with a very restrictive voice policy, only allowing specific numbers to be called (e.g. internal company numbers or extensions, help desk, internal security, emergency numbers, etc.).

A CAP may or may not also be configured as a hotdesking phone. This is determined by the EnableHotdesking ($true) parameter in the client policy you assign to the CAP contact object (and not to the user).
When a CAP has hotdesking enabled, a user may "override" the CAP account, sign in to the CAP with their own user account, and use Lync Server features and their own user profile settings.
Because user accounts are usually provisioned with additional calling capabilities (mobile phones, international numbers, etc.) through voice policies, security is a concern: if a user steps away from the hotdesk, the phone may be available to others for call capabilities they shouldn't have access to.

To overcome this, the Lync client policy provides a mechanism to sign out the hotdesk user from the phone after some inactivity, and sign the CAP account back in. This is governed by the HotDeskingTimeOut parameter in the Lync client policy. The default value is 5 minutes, consistent with the security implications.


REQUIREMENT

My customer works in a higher-security environment, so enhanced restrictions on CAPs were not a concern: no CAPs are accessible to untrusted people. There are several common areas and conference rooms frequently used as hotdesking spaces, and several CAPs are available. Users wish to be able to sign in as hotdesking users on CAPs in the morning, and stay signed in at least until the end of business hours, so I was asked to configure a 12-hour timeout.

CONFIGURATION

So, I created a client policy as follows, and assigned it to the CAP contact objects:

New-CsClientPolicy -Identity "CAP-Hotdesk" -EnableHotdesking $True -HotdeskingTimeout 12:00:00

It is worth noting that the HotDeskingTimeout parameter does accept values up to 23:59:59. Theoretically, the object type (System.TimeSpan) even accepts values bigger than 24 hours.
This upper limit is also consistent with some public sources.
It should also be noted that TechNet does not provide any information about the maximum configurable value.


ISSUE

It was reported to me that all users signed in as hotdesk users on CAPs were kicked out after about 1 hour, and the phone reverted back to the CAP account.

TROUBLESHOOTING

First off, I ruled out a bug in firmware. The issue was experienced on several LPE builds, including the latest one at the time of writing (7577.4457).
Phone logs revealed something interesting when I played around with different HotDeskingTimeout values.

An excerpt when HotDeskingTimeout was set to 4:44:00 (correct):

CLockedScreen::OnEndPointSignedIn: Signing in as hot desking user. Starting hot desking timer for 284 minutes

An excerpt when HotDeskingTimeout was set to 9:59:00 (correct):

CLockedScreen::OnEndPointSignedIn: Signing in as hot desking user. Starting hot desking timer for 599 minutes

Now the surprise; an excerpt when HotDeskingTimeout was set to 10:01:00 (wrong):

LockedScreen::OnEndPointSignedIn: Signing in as hot desking user. Starting hot desking timer for 60 minutes
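
When comparing many log excerpts like the ones above, it helps to pull out just the timer value. The following is a small sketch that assumes the message wording is exactly as in these excerpts; the function name is hypothetical.

```python
import re

# Matches the timer message seen in the excerpts above
TIMER_RE = re.compile(r"Starting hot desking timer for (\d+) minutes")


def timer_minutes(log_line):
    """Return the hot desking timer in minutes, or None if the line has none."""
    match = TIMER_RE.search(log_line)
    return int(match.group(1)) if match else None


line = ("CLockedScreen::OnEndPointSignedIn: Signing in as hot desking user. "
        "Starting hot desking timer for 284 minutes")
minutes = timer_minutes(line)  # 284, matching the configured 4:44:00
```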

TAKEAWAY

It appears the highest honoured timeout value is 10 hours (600 minutes). Any value beyond that is either ignored or misinterpreted by the phone, and a fallback value of 60 minutes is used instead.
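
The observed behaviour can be summarised as a small model. This is only a sketch of what the logs suggest, not documented phone logic: the names are made up, and a value of exactly 10:00:00 is treated as honoured on the basis of the conclusion above, although no log excerpt for that value is shown.

```python
from datetime import timedelta

MAX_HONOURED = timedelta(hours=10)  # highest value the phone appeared to honour
FALLBACK = timedelta(minutes=60)    # value the phone used for anything larger


def effective_timeout(configured):
    """Return the timeout the CX600 appeared to apply (observation, not spec)."""
    return configured if configured <= MAX_HONOURED else FALLBACK


# The 12-hour policy from the Configuration section ends up as 60 minutes
twelve_hours = effective_timeout(timedelta(hours=12))
```

In practice this means that a policy intended to keep users signed in for a full working day should be set to 10 hours or less.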

TOOLS

Pulling out LPE logs is not exactly a straightforward task. I recommend this excellent tool by Andrew Morpeth to streamline the process.

DISCLAIMER

I have only tested on Polycom CX600 phones and cannot confirm if other Common Area Phones would show the same behaviour.

Wednesday, 21 January 2015

Lync 2013 DDC - Detailed Design Calculator

Lync DDC is a low-level design calculator for on-premises Lync 2013 deployments. Fill in host names, IP addresses and other design elements, and it will calculate DNS records, certificate names, firewall rules and other scripts to help speed up your deployment. Available at: http://goo.gl/jU1hZR

Sunday, 26 October 2014

Migrating with style

Kudos to my customer for cutting such nifty things for floorwalkers :) This is what I call "migrating with style"