Version 2.0 May 2014
© 2014 Avaya Inc. All Rights Reserved.

Notice
While reasonable efforts were made to ensure that the information in this document was complete and accurate at the time of printing, Avaya Inc. can assume no liability for any errors. Changes and corrections to the information in this document may be incorporated in future releases. For full support information, see the complete documents, Avaya Support Notices for Software Documentation, document number 03-600758, and Avaya Support Notices for Hardware Documentation, document number 03-600759. To locate this document on our Web site, go to http://www.avaya.com/support and search for the document number in the search box.

Documentation disclaimer
Avaya Inc. is not responsible for any modifications, additions, or deletions to the original published version of this documentation unless such modifications, additions, or deletions were performed by Avaya. Customer and/or End User agree to indemnify and hold harmless Avaya, Avaya's agents, servants and employees against all claims, lawsuits, demands and judgments arising out of, or in connection with, subsequent modifications, additions or deletions to this documentation to the extent made by the Customer or End User.

Link disclaimer
Avaya Inc. is not responsible for the contents or reliability of any linked Web sites referenced elsewhere within this documentation, and Avaya does not necessarily endorse the products, services, or information described or offered within them. We cannot guarantee that these links will work all of the time and we have no control over the availability of the linked pages.

Warranty
Avaya Inc. provides a limited warranty on this product. Refer to your sales agreement to establish the terms of the limited warranty. In addition, Avaya's standard warranty language, as well as information regarding support for this product, while under warranty, is available through the following Web site: http://www.avaya.com/support.

Copyright
Except where expressly stated otherwise, the Product is protected by copyright and other laws respecting proprietary rights. Unauthorized reproduction, transfer, and or use can be a criminal, as well as a civil, offense under the applicable law.

Avaya support
Avaya provides a telephone number for you to report problems or to ask questions about your product. The support telephone number is 1-800-242-2121 in the United States. For additional support telephone numbers, see the Avaya Web site: http://www.avaya.com/support.
The Avaya Aura® solution is based on the Session Manager, a geo-redundant, IP Multimedia Subsystem (IMS) based, active-active multimedia switching and application-binding SIP core. The Session Manager provides SIP multi-vendor interoperability, dial plan generation, SIP normalization, and SIP routing; it also binds users to an administrator-defined list of sequenced applications and manages user profiles.
Because the Avaya Aura® Session Manager (SM) is geo-redundant, there are several design criteria to consider when planning bandwidth needs between SMs in the network, and between SMs and remote enterprise locations. These notes provide the background information and design rules needed to plan SM network bandwidth.
The Avaya Aura® Session Manager (SM) solution exchanges information over a separate management network between components in order to function properly. The information exchanged falls into three general traffic types described below and illustrated in Figure 1:
Figure 1 SM Management Network Use in a Geo-Redundant 10,000 User SM Solution
SM-to-SM Data – SMs dynamically exchange data over the management network. As users register to SMs, the primary and secondary SMs exchange information about each user and its location so that SIP traffic for that user can be correctly routed to the appropriate SM. The SM Call Admission Control (CAC) feature tracks VoIP bandwidth usage and availability between SMs by exchanging information periodically or as needed. Branch SMs (BSMs) do not exchange any SM-to-SM data.
SM Status Data – Various SM status information is reported up to the SMGR and made available on SMGR screens. For example, the Session Manager → Dashboard screen lists the entity monitoring status, active call count and active user registration count for each SM. Most status information is periodically updated across the management network but may also be collected on demand.
SM Configuration Data – Session Managers are administered using System Manager (SMGR). All SM configuration data (users, dial plans, entities, links, routes, etc.) is administered on the SMGR and distributed out (via replication) to all SMs in a solution. Each SM will operate uniquely based on the specific users or routing information it must support, but it must also be aware of all the configuration data to support the overall solution and handle failure scenarios.
Designing a SM management network topology and determining the bandwidth needs over WANs depend on the SM configuration and the planned uses of that network. Figure 1 above depicts an example 10,000 user SM solution distributed across multiple redundant SMs in separate data centers, along with a Branch Office. If the Asia Data Center is taken out of service for maintenance purposes, SM2 in the Europe Data Center provides those services: the 2500 SIP stations on SM4 move to their secondary SM2. Similarly, the Europe Data Center may go out of service, with its users distributed to the other data centers and their SMs.
Configurations, redundancy, maintenance strategies and intended uses must be considered when designing a management network solution and determining its bandwidth needs. Sections below focus in more detail on the three traffic types of management network use, describing their typical and exceptional uses over the network along with guidelines to help determine the bandwidth needed within varying solutions.
Bandwidth needs may have a cumulative property where the bandwidth needs of a single SM sending/receiving data must be added together to determine the overall bandwidth need for that traffic type. For instance, looking at Figure 1, there are multiple SMs that might simultaneously be sending status data up to the SMGR, implying the WAN to the SMGR would have additional bandwidth needs for each supported SM. Similarly, a combination of simultaneous traffic types could present another minor cumulative impact (SM sending status data to SMGR while simultaneously sending SM-to-SM data). The cumulative impact due to simultaneous traffic types is difficult to characterize, highly dependent on varying customer use and will not be discussed further in this document. The cumulative properties of each management network traffic type are described in the individual sections below as they have a direct impact on the needs and scaling of bandwidth.
Note that all the management network traffic described can occur in large bursts of data and will consume as much available bandwidth as allowed to service that use/burst. For example, stating that a 3Mbps WAN connection would be “sufficient” does not mean that the exchange between SMGR/SM(s) will only ever use 3Mbps. Configuring a WAN connection that is limited to 3Mbps will guarantee that large bursts are spread out over time across 3Mbps but configuring a 100Mbps connection will allow the SMGR/SM(s) to use up to 100Mbps if necessary to service a burst.
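The relationship between link size and burst service time can be illustrated with a short sketch (not an Avaya tool; the 300 Mb burst size is taken from the failback example elsewhere in these notes, and the link rates echo the 3 Mbps vs. 100 Mbps comparison above):

```python
def burst_drain_seconds(burst_megabits: float, link_mbps: float) -> float:
    """Time for a data burst to drain through a link capped at link_mbps.

    A smaller link does not prevent the burst; it only spreads it over time.
    """
    return burst_megabits / link_mbps

# A 300 Mb burst through links of different sizes:
for link in (3, 100):
    t = burst_drain_seconds(300, link)
    print(f"{link} Mbps link drains a 300 Mb burst in {t:.0f} s")
```

This is why a link described as "sufficient" still gets fully consumed during a burst: the burst simply completes sooner on a larger link.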
In general, SM-to-SM data is more time-sensitive than the other two traffic types as call routing uses this information. Normal, steady state SM-to-SM information exchanged throughout the day (as calls come and go, users register or refresh their registration, adjustment of VoIP bandwidth through Call Admission Control [CAC]) would have a low bandwidth but constant use on the management network.
A failure/recovery scenario that restores a SM, however, can drive high bandwidth use. When a primary SM is inoperable or inaccessible, the primary users for that SM will have migrated to their secondary SM. After the primary SM is restored to service, primary and secondary users quickly return (or fail back) to the restored SM. A failback causes significant SM-to-SM information exchange in a very short period of time (seconds to minutes) as the primary and secondary SMs are updated.
SM-to-SM data can have a cumulative impact depending on the configuration and given the high bandwidth use of failback, this impact must be taken into account when determining bandwidth needs. The Europe Data Center in Figure 1 has two SMs and twice as many users. If that data center is taken out-of-service, the users and traffic migrate to their secondary SMs in the other two data centers. When the Europe Data Center is restored, all 5000 primary and 5000 secondary users will failback causing twice as much SM-to-SM data.
It is recommended that the SM failback configuration and total number of users involved be analyzed to determine the bandwidth needed to support the failback scenario. While data is exchanged for both a primary and a secondary user, the SM-to-SM data exchanged can be approximated as 60Kb per primary user. Looking at the configuration in Figure 1, the Americas and Asia Data Centers with only 2500 primary users each would need to exchange 150Mb (2500 primary users at 60Kb), while the Europe Data Center with twice the users (5000) would need 300Mb. Since SM-to-SM data is more time-sensitive, adequate bandwidth must be provided to exchange the data in a reasonable amount of time. All endpoints will have recognized and started (not completed) failing back to their primary SM within 60 seconds after the primary SM is restored to service. Distributing the total bandwidth over 60 seconds is a general, simplified guideline that provides enough bandwidth for bursts of traffic as the endpoints arrive at varying rates/times; however, the failback data exchange could last longer than 60 seconds, as components of an Aura solution will likely flow-control (on the VoIP signaling) the failback for larger numbers of users. A WAN connection for the Europe Data Center would then need 5Mbps to support the failback exchange of 300Mb in 60 seconds.
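The failback guideline above (60Kb per primary user, spread over a 60-second failback window) can be sketched as a small helper; this is an illustrative calculation only, using the constants from the text:

```python
def failback_bandwidth_kbps(primary_users: int,
                            kb_per_user: float = 60.0,
                            failback_window_s: float = 60.0) -> float:
    """SM-to-SM failback bandwidth guideline: data per user over the window."""
    return primary_users * kb_per_user / failback_window_s

# Figure 1 data centers:
print(failback_bandwidth_kbps(5000))  # Europe DC: 5000 Kbps = 5 Mbps
print(failback_bandwidth_kbps(2500))  # Americas/Asia DC: 2500 Kbps = 2.5 Mbps
```

Note that with a 60-second window the guideline conveniently reduces to 1Kbps per primary user.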
BSMs only become active when they are cut off (lose connectivity to the core SM network), so they do not participate in any of the SM-to-SM scenarios or information exchange.
Most SM Status data (call counts, registration counts, SM performance data, etc.) flows up to the SMGR at periodic intervals (5 or 15 minutes) and has a low bandwidth use. On-demand status data for individual status screens or system tools (like the SIP Trace Viewer) has a higher bandwidth use. A key example is the Session Manager → System Status → User Registrations screen, which requires the registration status of every user from every SM to be reported up to the SMGR immediately (and periodically when the screen refreshes).
These on-demand status cases also have cumulative effects to consider. Certainly more SMs and users in one data center would send more status and user registration data than a single SM; so, a WAN connection from the Europe Data Center in Figure 1 would use more bandwidth than the connection from the Americas Data Center. More obvious is the SMGR WAN connection which now must accommodate status for all the users from all the SMs simultaneously.
Status data is not necessarily time-sensitive, even in the on-demand cases as there is a tradeoff between WAN connection sizes and screen delays. An on-demand status screen will always take several seconds to collect all the data over the network and present it. It will take even longer to collect and present if the status data is delayed due to limited bandwidth. Slower status screens may not be of consequence to some users and could be a reasonable tradeoff to bandwidth provisioning costs.
It is recommended that the on-demand User Registrations screen, along with its cumulative effects, be used as a simple guideline to determine bandwidth needs for SM Status data. Each registered primary and secondary user, along with any BSM users, must report its status to the SMGR. Again to simplify, it is only necessary to consider primary users at 4Kb per user to determine the SM Status data reported. So collecting the user registration status in Figure 1 (assuming all 10,000 users and 300 BSM users) would cause 41.2Mb of status data to suddenly flow to the SMGR. The flow from each data center can similarly be determined based on the number of users in each (20Mb from the Europe Data Center, 10Mb each from the other data centers and 1.2Mb from the Paris Branch). The WAN connection to SMGR can then be sized based on tolerable screen update times. The recommended minimal delay guideline for reporting status is 10 seconds (this does not mean the screens will update in exactly 10 seconds). For example, a WAN connection of approximately 4Mbps to the SMGR would support a User Registrations SM Status update (41.2Mb spread over 10 seconds).
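The status-data guideline (4Kb per user over a tolerable screen delay) can be sketched the same way; the constants are from the text, and the delay is a tunable tradeoff against screen responsiveness:

```python
def status_bandwidth_kbps(users: int,
                          kb_per_user: float = 4.0,
                          tolerable_delay_s: float = 10.0) -> float:
    """On-demand SM Status bandwidth guideline: status data over the delay."""
    return users * kb_per_user / tolerable_delay_s

# SMGR link in Figure 1: 10,000 SM users plus 300 BSM users.
print(status_bandwidth_kbps(10_300))                      # 4120 Kbps, ~4 Mbps
# Same users, but tolerating a 20-second screen delay:
print(status_bandwidth_kbps(10_300, tolerable_delay_s=20))  # 2060 Kbps
```

Doubling the tolerable delay halves the bandwidth need, which is the screen-delay vs. provisioning-cost tradeoff described above.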
SMGR uses a low bandwidth network replication technology to disseminate the centralized configuration data out to all SMs. Configuration data is broken down into smaller components, periodically pulled down by each SM in bursts of data, processed and then stored in the local SM database. The majority of replication time is spent polling for and processing data; so bandwidth is not continuously used throughout the replication. Each SM is likely pulling and processing data at a different instance in time; so while replication may be occurring to multiple SMs simultaneously, these properties minimize the cumulative impact.
Normal, daily administrative operations (adding/modifying users, changing routes, etc.) result in small replications to all SMs that complete, and are usually in effect, within a minute after the change completes on the SMGR. A recovery scenario (referred to as a repair) or a release upgrade of a SM (called an initial load) requires all configuration data to be sent to the SM. A repair or initial load can take several minutes, based on total database size, and can periodically transmit larger bursts of data. An upgrade of the SMGR will cause a repair of all SMs, but this is an extreme edge case (SMGR upgrades should be few and far between) and should not be considered when determining or designing for bandwidth needs.
Configuration data is not time-sensitive as the SMs and replication technology will tolerate delays due to low bandwidth. Given the described properties of configuration data and technology used, configuration data is a less significant contributor (than the other two traffic types) to the overall bandwidth needs of a solution. Configuration data will operate well within the WAN connection sizes determined by the SM Status or SM-to-SM data. Given that the majority of replication time is not spent transmitting data, allocating higher bandwidth connections does not significantly improve the overall replication time. For instance, doubling the allocated bandwidth from 1.5Mbps to 3Mbps would likely only reduce the overall replication time by less than 10%. However, a minimum of 500Kbps is recommended to support Configuration data as replication times increase more significantly below this speed (mainly relevant to BSM WAN connections).
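The diminishing return described above can be illustrated with a simple model in which total replication time is fixed polling/processing time plus wire time. The numeric values below (30 Mb of configuration data, 180 seconds of polling/processing) are illustrative assumptions, not measured Avaya figures:

```python
def replication_time_s(data_megabits: float, link_mbps: float,
                       poll_process_s: float) -> float:
    """Replication time model: fixed poll/process overhead plus wire time."""
    return poll_process_s + data_megabits / link_mbps

# Assumed: 30 Mb of configuration data, 180 s spent polling and processing.
t_slow = replication_time_s(30, 1.5, 180)   # 200 s total
t_fast = replication_time_s(30, 3.0, 180)   # 190 s total
print(f"1.5 Mbps: {t_slow:.0f} s, 3 Mbps: {t_fast:.0f} s "
      f"({(1 - t_fast / t_slow) * 100:.0f}% faster)")
```

Under these assumptions, doubling the bandwidth shortens replication by only 5%, consistent with the "less than 10%" observation above: the transfer is a small fraction of the total time.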
SM Management Network Design
Each SM management network traffic type was described individually above along with guidelines to determine probable bandwidth needs based on varying solution configuration and uses. Cumulative effects of a configuration or simultaneous management network use were also identified. This section summarizes that information into general management network design guidelines that can be applied to any configuration.
It is recommended that the management network be designed to support the higher, cumulative bandwidth needs (worst case) for failback and on-demand status. This design will then accommodate the Configuration Data needs and likely any simultaneous cumulative uses within a reasonable time frame. Use the following guidelines to determine bandwidth needs for a management network design:
1. Support SM-to-SM Data for failback
a. Identify failback scenarios per SM WAN connection
b. Determine total number of primary users supported per WAN connection
c. Calculate WAN connection bandwidth
“Primary Users” × “60Kb per User” ÷ “60 seconds” = “Primary Users” × “1Kbps”
d. Apply bandwidth for each WAN connection
2. Support on-demand SM Status Data
a. Only necessary to calculate for WAN connection to SMGR and Branch Offices as the previous SM-to-SM step has determined higher bandwidth for most WAN connections
b. Determine total number of users per Branch Office WAN connection
c. Determine total number of users to be supported over SMGR WAN connection
d. Calculate WAN connection bandwidth (start with 10 seconds for delay and increase if desired)
“Users” × “4Kb per User” ÷ “10 Seconds Tolerable Delay”
e. Apply bandwidth for SMGR and Branch Office WAN connections
3. Support Configuration Data
a. Configuration Data will operate sufficiently over WAN connection bandwidth greater than 500Kbps from previous two steps
b. Apply 500Kbps minimum to any WAN connections less than 500Kbps
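The three steps above can be sketched as a single sizing helper. This is an illustrative simplification, not Avaya tooling: it takes the larger of the failback and status figures for a link and then applies the 500Kbps Configuration Data floor, using the 60Kb and 4Kb per-user constants from the guidelines:

```python
def wan_bandwidth_kbps(primary_users: int = 0, status_users: int = 0,
                       failback_window_s: float = 60.0,
                       status_delay_s: float = 10.0) -> float:
    """Size one WAN connection per the three-step design guideline."""
    failback_kbps = primary_users * 60.0 / failback_window_s  # step 1
    status_kbps = status_users * 4.0 / status_delay_s         # step 2
    return max(failback_kbps, status_kbps, 500.0)             # step 3 floor

# Figure 1 links (users per link are from the worked example):
print(wan_bandwidth_kbps(primary_users=5000))   # Europe DC: 5000 Kbps = 5 Mbps
print(wan_bandwidth_kbps(status_users=10_300))  # SMGR: 4120 Kbps, ~4 Mbps
print(wan_bandwidth_kbps(status_users=300))     # Paris Branch: floor, 500 Kbps
```

Note how the Paris Branch result comes from step 3: its status figure (120Kbps) falls below the 500Kbps Configuration Data minimum, so the floor applies.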
The design guidelines do not distinguish between uplink and downlink (both directions are assumed to have the same bandwidth), and any protocol overheads have been included in the guidelines. Calculated bandwidth values may need to be rounded up to align with the rates available from Service Providers.
Applying the design guidelines to the solution of Figure 1, the bandwidth for each WAN connection can be calculated as follows:
1. SM-to-SM Data
· Europe Data Center
o 5000 primary users
o WAN connection bandwidth: 5000 * 1Kbps = 5Mbps
· Americas and Asia Data Centers
o 2500 primary users
o WAN connection bandwidth: 2500 * 1Kbps = 2.5Mbps
2. SM Status Data
· Paris Branch
o 300 users
o WAN connection bandwidth: 300 * 4Kb / 10s = 120Kbps
· SMGR
o 10,000 users from SMs and 300 users from branch
o WAN connection bandwidth: 10,300 * 4Kb / 10s ~= 4Mbps
3. Configuration Data
· Paris Branch updated to 500Kbps minimum
Figure 2 then shows an updated view of the SM solution with WAN bandwidth provisioning applied. Note that these values are recommendations for optimal service. It would be possible to operate a functional installation with lower values, at the cost of degraded service.
Figure 2 SM Management Network Design for Geo-Redundant 10,000 User SM Solution
The Avaya Aura® solution is extremely tolerant of network latency limitations. Up to 1 second of round-trip delay can be tolerated by the solution between any two elements shown in the previous figure.
This table reflects the SM port usage as it pertains to the previously described traffic flows, at the time of this writing. For a more up-to-date reference, please see “Avaya Port Matrix: Avaya Aura® Session Manager”, available on support.avaya.com.