Dec 28, 2011

With the release of PS Firmware Revision 5.1, Dell introduced more advanced data management options. At the core are three load-balancing layers that maintain high performance while keeping capacity use balanced. This post describes the three layers and how they can be applied.

EqualLogic Load Balancers in PS Series Pools

When you initialize the first array and create a PS Series group, a default pool is automatically established. Once an array has been added to the group, it is referred to as a member of the group. All members are initially placed into the default pool, and administrators subsequently deploy volumes from this pool. It is within a pool that resources such as network bandwidth, disk capacity, and I/O are balanced automatically. Multiple pools can be created to isolate volumes and separate members. This may be done for technical reasons (e.g. placing specific application data on resources such as SSD) or business reasons (e.g. ensuring that legal department data is isolated from the data of other departments). With more than one pool, administrators can move volumes or members between pools seamlessly, with no downtime to the applications. Within a pool, Dell’s EqualLogic PS Series is designed to automate the placement of data to maximize the utilization of the resources that the customer has chosen for their SAN.

There are three load balancers that operate within a pool:

  • The NLB (Network Load Balancer) manages the assignment of individual iSCSI connections to Ethernet ports on the pool members
  • The CLB (Capacity Load Balancer) manages the utilization of the disk capacity in the pool
  • The APLB (Automatic Performance Load Balancer) manages the distribution of high I/O data within the pool.

How the Network Load Balancer (NLB) Works

Communications between application servers (iSCSI initiators) and volumes (iSCSI targets) are called connections. An EqualLogic PS series group will present all iSCSI targets through a single virtual address known as the group IP address. This allows administrators to establish connections easily by only having to configure the iSCSI initiator with the group IP address. As the load increases or decreases on the various Ethernet ports, the NLB automatically distributes connections among the active Ethernet ports of the members using a feature of the iSCSI specification called redirection. Redirection defines how the iSCSI target instructs the iSCSI initiator to log out and close the connection to the IP address that it is currently using and immediately log in to another address and establish a new connection. Support for redirection is required for iSCSI initiators by the iSCSI specification. Redirection is utilized by the NLB within an EqualLogic PS Series group to permit the application server to establish iSCSI connections as needed without first needing to be updated manually to know all of the possible IP addresses that the SAN is using. Leveraging redirection, the NLB ensures that all the network interfaces within the SAN are optimally used. The NLB and iSCSI connection redirection are also key functions used by the PS Series architecture to enable volumes and members to migrate seamlessly from one pool to another, and permit members to join or leave the group as required with no interruption in service to the applications.
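To make the redirection idea concrete, here is a minimal Python sketch. It is a toy model, not EqualLogic code: the `Port` class, the `login` and `rebalance` functions, and the use of connection count as the measure of load are all assumptions made for illustration. The real NLB works from actual port load and uses iSCSI login redirection on the wire.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    """Hypothetical model of one active Ethernet port on a pool member."""
    address: str
    connections: list = field(default_factory=list)


def login(ports, initiator):
    """Model a login to the group IP: the initiator is redirected to the
    least loaded port and only ever needs to know the group address."""
    target = min(ports, key=lambda p: len(p.connections))
    target.connections.append(initiator)
    return target.address


def rebalance(ports):
    """Redirect connections from the busiest port to the idlest one until
    the connection counts differ by at most one.

    Returns the list of (initiator, old_address, new_address) redirects.
    """
    moves = []
    while True:
        busiest = max(ports, key=lambda p: len(p.connections))
        idlest = min(ports, key=lambda p: len(p.connections))
        if len(busiest.connections) - len(idlest.connections) <= 1:
            return moves
        # "Log out" of the busy port and "log in" to the idle one.
        initiator = busiest.connections.pop()
        idlest.connections.append(initiator)
        moves.append((initiator, busiest.address, idlest.address))
```

In this model the host side is configured with a single address, and every later change in port assignment is driven from the target side.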

The NLB should not be confused with Multi-Path I/O (MPIO), which is the load-balancing that occurs on the application host. MPIO uses redundant physical interfaces to deliver high availability to shared storage. Using MPIO, servers can send multiple I/O streams to SAN volumes. Each of these paths uses an iSCSI connection that is managed by the NLB.

In addition to the standard functionality provided by MPIO, Dell provides host tools to enhance the performance of MPIO and to automatically manage the connections for Windows (including Hyper-V), VMware and Linux environments.

How the Capacity Load Balancer (CLB) Works

The CLB ensures that as volumes are created and deleted, and as members are added to and removed from a pool, the relative percent of capacity in use is maintained at a consistent level among the members in that pool. Keeping the members in the pool filled to the same percentage of their disk capacity helps to ensure that all of the resources in the pool are used equally, and helps avoid overloading one member compared to another. It can also help ensure that members have the necessary free space available to perform other tasks such as replication and internal maintenance properly.
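The "same percentage on every member" goal is simple arithmetic, sketched below in Python. The function name `balanced_targets` and the member names are hypothetical, and the sketch ignores slices, reserves, and RAID preference. It only shows the proportional split.

```python
def balanced_targets(capacities_gb, used_gb):
    """Return the amount of data each member should hold so that every
    member is filled to the same percentage of its own capacity.

    capacities_gb maps a (hypothetical) member name to its capacity in GB.
    """
    total = sum(capacities_gb.values())
    if used_gb > total:
        raise ValueError("pool does not have enough capacity")
    fill = used_gb / total  # common fill fraction for the whole pool
    return {name: cap * fill for name, cap in capacities_gb.items()}


# A 1 TB member and a 3 TB member holding 2 TB between them:
# both end up 50% full, so the larger member carries three times the data.
targets = balanced_targets({"member-a": 1000, "member-b": 3000}, 2000)
```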

When the CLB assigns a portion of a volume to an array, it is called a slice. The CLB will attempt to satisfy the capacity needs of each volume with a distribution policy that typically limits the number of slices per volume to three. More than three slices will only be created when the capacity requirements of a volume cannot be satisfied with three slices.

Most administrators choose the default “Automatic” RAID preference setting for the majority of their volumes. The CLB will normally choose the members to use without regard to RAID level unless the administrator selects a specific RAID preference type for the volume (for example, RAID6).

If an administrator chooses a specific RAID type and it is available in the pool, the CLB attempts to honor the preference request and place the volume on members with the requested RAID type. As long as all of the volumes that are requesting a particular RAID type can be accommodated on members of that RAID type they will be, even if this results in the members of the pool with the requested RAID type having higher capacity utilization than other members of the pool. If the request cannot be honored because there is insufficient capacity available (or no members) at the requested RAID type, volumes will be placed on other resources in the pool as if the RAID preference had been set to “Automatic”. Setting RAID preference for a volume in an existing environment may cause other volumes with their RAID preference set to “Automatic” to have their slices moved to members other than the ones that they resided on prior to the change.
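The fallback behavior described above can be sketched as a small selection function. This is an illustrative Python model only: the dictionary layout and the name `candidate_members` are assumptions, and the real CLB weighs more factors than free space when it places a volume.

```python
def candidate_members(members, volume_gb, raid_preference=None):
    """Pick the members a volume may be placed on.

    members is a list of dicts with hypothetical keys "name", "raid",
    and "free_gb". If a RAID preference is set and the members of that
    RAID type can hold the volume, only those members are returned.
    Otherwise the volume is treated as if the preference were "Automatic".
    """
    if raid_preference is not None:
        preferred = [m for m in members if m["raid"] == raid_preference]
        if sum(m["free_gb"] for m in preferred) >= volume_gb:
            return preferred
    # No preference, no members of that type, or not enough space there.
    return list(members)
```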

When the CLB needs to re-adjust the distribution of the data in the pool, it creates a rebalance plan (RBP). An RBP is created, for example, in response to either a change in the resources available in the pool (e.g. adding or removing a member) or a change in the way that the current resources are used (e.g. adding a volume, changing a snapshot or replica reserve, modifying delegated space for the replicas from another PS Series group, or growth of a thin provisioned resource). An RBP is influenced by any RAID preference settings for the volumes in the pool and will, when possible, honor those settings as discussed above. As resource usage is optimized, an RBP may temporarily create a capacity imbalance, but the imbalance is rectified once the RBP has been executed.

Similar to an RBP, the CLB can also create free-space-trouble plans (FSTP). An FSTP is created when the CLB determines that a pool member has reached a critical point (10% free space) and there is free space available on other members in the pool. An FSTP will cancel other RBPs. Once the low space issue that prompted the FSTP has been resolved, the CLB will create new RBPs if they are required.
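The trigger condition for an FSTP can be written as a short check. The Python below is a sketch under stated assumptions: the 10% figure comes from the text, but the function names, the data layout, and the reading of "free space available on other members" as "at least one other member is above the critical point" are illustrative choices, not documented behavior.

```python
CRITICAL_FREE_FRACTION = 0.10  # the 10% free-space point mentioned above


def free_fraction(capacity_gb, used_gb):
    """Fraction of a member's capacity that is still free."""
    return (capacity_gb - used_gb) / capacity_gb


def members_needing_fstp(members):
    """Return the members that would trigger a free-space-trouble plan.

    members maps a hypothetical member name to (capacity_gb, used_gb).
    A plan only makes sense if some other member still has room, so an
    empty list is returned when every member is at the critical point.
    """
    fractions = {n: free_fraction(c, u) for n, (c, u) in members.items()}
    low = [n for n, f in fractions.items() if f <= CRITICAL_FREE_FRACTION]
    roomy = [n for n, f in fractions.items() if f > CRITICAL_FREE_FRACTION]
    return sorted(low) if roomy else []
```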

All data movement, regardless of whether caused by an RBP or FSTP, is handled in a transactional manner, i.e., data is only removed from the source of the transfer and internal metadata that tracks the location of the data is updated only after its receipt is confirmed by the target of the transfer.
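The ordering described here (copy, confirm, update metadata, then remove from the source) is sketched below. Plain Python dictionaries stand in for the source member, the target member, and the location metadata. The name `move_page` and the `confirm` callback are hypothetical and exist only to show the sequence.

```python
def move_page(page_id, source, target, location, target_name, confirm=None):
    """Move one page of data so that it is never lost mid-transfer.

    source and target are dicts mapping page ids to data; location maps
    page ids to the name of the member that currently owns the page.
    Returns True if the move completed, False if it was abandoned.
    """
    if confirm is None:
        # Default confirmation: the target really holds an identical copy.
        confirm = lambda tgt, pid, data: tgt.get(pid) == data

    data = source[page_id]
    target[page_id] = data                  # 1. copy to the target
    if not confirm(target, page_id, data):  # 2. wait for confirmation
        target.pop(page_id, None)           #    abandon; source is untouched
        return False
    location[page_id] = target_name         # 3. update the metadata
    del source[page_id]                     # 4. only now free the source
    return True
```

If the confirmation step fails, the source copy and the metadata are exactly as they were before the move started.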

How the Automatic Performance Load Balancer (APLB) Works

The APLB feature is designed to help alleviate the difficulties inherent in manually balancing the utilization of SAN performance resources. Operating on the resources in a pool, the APLB is capable of adjusting to dynamic workloads in real time and at a sub-volume level. It provides sub-volume tiering when presented with heterogeneous or tiered resources, and hot spot elimination when presented with homogeneous resources in the pool.

The APLB optimizes resources in an EqualLogic PS Series pool based on how the applications are actually using the SAN resources. Once the slices have been assigned to members in the PS Series pool by the CLB and I/O begins, certain patterns of access may develop. Due to the random nature of I/O, these access patterns are often unbalanced, which, while perfectly normal, may place more demand on certain EqualLogic PS Series members than on others. Often, the imbalance will occur within the same volume, with portions of the volume exhibiting high I/O while other portions exhibit low I/O. This imbalance can be detected and corrected by the APLB.

In an EqualLogic PS Series pool, the APLB can adjust to this potential imbalance in latency. If a workload causes a particular PS Series member to exhibit relatively high latencies compared to other members of the same pool, the APLB can detect and correct the imbalance by exchanging high I/O data from the member with high latency for low I/O data from a peer with low latency. This rebalancing results in better resource utilization and an overall improvement in the performance of all of the applications using the resources of the EqualLogic pool.

The APLB is simple in its concept and execution, leveraging various aspects of the EqualLogic architecture to automatically balance the performance delivered to applications by the PS Series SAN. For example, the rebalance plans that the CLB uses to re-adjust the placement of data are leveraged by the APLB as well. Instead of the one-way movement that the CLB usually performs, movement of data in the RBPs that the APLB creates is typically a two-way exchange between PS Series members, which ensures that the capacity balance is still maintained after a performance rebalance operation.

As with all EqualLogic management tasks, the APLB runs with a lower priority than the processing of application I/O. Every few minutes, the APLB analyzes the range of latencies of member arrays in an EqualLogic pool and determines whether any of the members have a significantly higher latency (20 ms or greater) than the lowest latency member(s) in the pool. If so, the APLB will attempt to identify workloads that could be rebalanced by moving high I/O data to less heavily loaded members (i.e. those with lower latency). If any are identified, then an RBP will be created to exchange a portion of the high I/O data from the member with high latency for an equivalent amount of low I/O data from one of its peers supporting the workload that has been selected for rebalancing. The peer member that is chosen for the data exchange will be one of the other members in the pool already supporting a slice of the volume that has been selected to be rebalanced.
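The detection step can be illustrated with a few lines of Python. This sketch reads the 20 ms figure as a gap above the lowest latency member, which is one interpretation of the description and an assumption here. The names `LATENCY_GAP_MS` and `overloaded_members` are hypothetical, and the real APLB goes on to choose specific workloads and peers.

```python
LATENCY_GAP_MS = 20.0  # assumed reading: 20 ms or more above the best member


def overloaded_members(latency_ms, gap_ms=LATENCY_GAP_MS):
    """Return the members whose latency is at least gap_ms higher than
    the lowest latency member in the pool.

    latency_ms maps a hypothetical member name to its recent average
    latency in milliseconds.
    """
    baseline = min(latency_ms.values())
    return sorted(
        name for name, lat in latency_ms.items() if lat - baseline >= gap_ms
    )


# Only member-b is far enough above the 5 ms baseline to be a candidate.
candidates = overloaded_members({"member-a": 5.0, "member-b": 30.0, "member-c": 12.0})
```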

When the APLB is presented with more than one option for rebalancing, i.e., the volume selected for rebalancing has slices on two other members in a larger pool and the latency of both options is similar, the APLB uses a second criterion to make the determination. This second criterion is the relative “busyness” of the arrays, a composite score of factors such as RAID type, disk speed, number of disks, EqualLogic controller type, and the current I/O load. The array with the lower relative busyness is chosen for the data exchange.
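A tie-break of this kind might look like the Python sketch below. The tuple layout, the `similar_ms` tolerance for "similar" latency, and the idea of a single numeric busyness score are all assumptions for illustration. The text does not say how the composite score is calculated.

```python
def choose_peer(peers, similar_ms=2.0):
    """Pick the peer to exchange data with.

    peers is a list of (name, latency_ms, busyness) tuples for members
    that already hold a slice of the volume. Peers within similar_ms of
    the best latency are treated as equivalent, and the one with the
    lowest busyness score among them is chosen.
    """
    best_latency = min(latency for _, latency, _ in peers)
    close = [p for p in peers if p[1] - best_latency <= similar_ms]
    return min(close, key=lambda p: p[2])[0]


# m2 and m3 have similar latency, so the less busy m3 is chosen.
# m4 is idle but too slow to be considered.
peer = choose_peer([("m2", 6.0, 0.7), ("m3", 7.0, 0.4), ("m4", 15.0, 0.1)])
```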

The APLB works well in a variety of environments. For example, in EqualLogic pools with members displaying similar performance characteristics, the net effect is to eliminate “hot spots” in the pool. In pools with members displaying dissimilar performance characteristics (for example arrays with different drive types), the net result is tiering of the data such that the bulk of the active data will be serviced by the array(s) with the most I/O capability.

The data used to determine which portion of the workload is high I/O is based on recent activity (on the order of minutes), so the APLB is able to adapt quickly to a change in an application's I/O pattern. The APLB is also dynamic, constantly evaluating the environment and making small adjustments as required. Once an application has reduced its demand for resources, the APLB does not continue to “optimize” the formerly active data.

The advantages of the APLB approach are fourfold:

  • Seamless support of 24/7 business activities: By adjusting incrementally, the APLB avoids large batch movements of data. It spreads the overhead of rebalancing into small operations throughout the day rather than one large activity.
  • Ability to adjust to cyclical or one-time workload changes: By evaluating a relatively recent window of activity, the APLB detects the temporary nature of certain increases in I/O load (such as end of month financial activity), and they don’t continue to influence the balancing of data after they are no longer relevant.
  • Reduction of “worst case scenario” purchasing: By working continually, the APLB can detect and act on cyclical business processes, such as increased end of month activity by the finance group, enabling the resources of the SAN to be leveraged in near-real-time. This may enable IT management to purchase fewer resources since each application can better leverage the storage when it needs it most.
  • Future-proofed load-balancing: Finally, by using latency as the primary criterion, the APLB does not need to explicitly evaluate any other details of the storage, such as disk type (e.g. SAS vs. SATA), spindle speed, number of disks, or EqualLogic controller type. This makes the APLB a very simple and robust mechanism that does not need to be re-trained when new hardware configurations are introduced to the EqualLogic product line. It also ensures that the system compensates automatically when unplanned events (e.g., a RAID rebuild or a bad NIC) affect the ability of certain arrays to serve their workload.

Tiering with the APLB

When provided with tiered resources in a pool, for example arrays with different spindle speeds or set to different RAID types, the APLB is able to use them to tier the workload. This is not limited to any particular RAID type, interface I/O type, spindle speed, number of disks, or EqualLogic controller generation, since using latency as the primary factor when deciding whether to rebalance the workload abstracts all of these factors. The ability to tier gives the customer greater flexibility in selecting products that provide the correct combination of performance and capacity for their environment, since any of the factors above could create differences in latency between PS Series members in a pool. For example, combining large capacity PS65x0 class arrays with lower capacity PS60x0 arrays using disks that provide higher I/O, to get better total system ROI, may be the appropriate design for some customers. Others might choose to combine members with 10K SAS and members with SSD to meet their application workload requirements. These are simply examples; many other configurations are possible.
