The hyper-converged market is growing rapidly: it is already worth upwards of one billion pounds and is thought to be growing at a rate of 150% every year. For those who are unfamiliar with converged and hyper-converged infrastructure, I will try to explain the idea in its simplest form.
With a non-converged infrastructure you have, for example, a virtualisation server (running Hyper-V, Xen, KVM, etc.) that connects to some form of data storage: direct attached storage (DAS), a storage area network (SAN) or a network attached storage (NAS) device. The virtual machines’ disks are hosted completely separately from the virtualisation server. The storage device will have some form of RAID configured and optimised for performance and redundancy, but the key point is that compute and storage are completely separate; you would generally connect multiple virtualisation servers to the same storage array.
With a hyper-converged infrastructure, everything is rolled into one: the disks sit in the same servers that run the virtual machines, and a storage controller runs as a service on each node (you need a minimum of two nodes). This means you can scale your cluster by adding nodes while still maintaining the redundancy and resiliency that a dedicated storage device gives you. Storage is abstracted as a separate layer, which is used to create virtual SANs within the same hardware, as demonstrated in the picture below:
With Windows Server 2016, hyper-convergence is made possible by a feature called Storage Spaces Direct. This technology allows every node in the cluster to see each disk as if it were its own, regardless of which node the disk is physically installed in. Storage Spaces Direct also keeps the data resilient by maintaining at least two copies of it, spread across multiple nodes, so if a node or a disk fails, the data is still intact elsewhere. Storage Spaces Direct acts as the storage controller, which removes the need for a physical hardware RAID controller; in fact, it expects the drives to be presented to Windows directly through a simple pass-through host bus adapter (HBA) rather than through hardware RAID.
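The idea of keeping copies on different nodes can be sketched in a few lines of Python. This is a toy model under stated assumptions: `Node`, `MirroredPool` and the round-robin placement are hypothetical names for illustration, not the Storage Spaces Direct API or its real placement algorithm. It demonstrates only the property described above: because every copy lands on a different node, losing one node still leaves a readable copy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    online: bool = True
    # Data held on this node's local disks, keyed by a (hypothetical) slab id.
    slabs: dict = field(default_factory=dict)


class MirroredPool:
    """Hypothetical pool built from every node's local disks (not the real S2D API)."""

    def __init__(self, nodes, copies=2):
        if copies > len(nodes):
            raise ValueError("a mirror needs at least one node per copy")
        self.nodes = nodes
        self.copies = copies
        self._start = 0  # rotates so copies are spread around the cluster

    def write(self, slab_id, data):
        # Put each copy on a different node, starting from a rotating offset.
        n = len(self.nodes)
        targets = [self.nodes[(self._start + i) % n] for i in range(self.copies)]
        self._start = (self._start + 1) % n
        for node in targets:
            node.slabs[slab_id] = data
        return [node.name for node in targets]

    def read(self, slab_id):
        # Any online node that holds a copy can serve the read.
        for node in self.nodes:
            if node.online and slab_id in node.slabs:
                return node.slabs[slab_id]
        raise OSError(f"every copy of {slab_id} is offline")


if __name__ == "__main__":
    nodes = [Node("node1"), Node("node2"), Node("node3")]
    pool = MirroredPool(nodes, copies=2)
    print(pool.write("vm1-slab0", b"hello"))  # ['node1', 'node2']
    nodes[0].online = False                   # simulate node1 failing
    print(pool.read("vm1-slab0"))             # b'hello'
```

The real feature distributes data across drives with its own placement logic; this sketch only captures the rule that copies of the same data never share a node.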
The number of nodes in your cluster determines how resilient you can make your infrastructure and how efficiently it uses its storage. A 2-node cluster is the lowest entry point (you need at least two nodes to keep a second copy of the data on separate hardware), and it is limited to a two-way mirror, which tolerates the complete failure of one node.
To decide which node stays live in the event of a failure, a 2-node cluster also requires an external witness. This can be any server within your network that sits outside the cluster, or it can be hosted externally in the cloud. The witness casts the deciding vote when a node fails: if node 1 cannot communicate with node 2, and vice versa, the witness has the final say on which node should remain active. It is used primarily in clusters with an even number of nodes, to ensure that a majority vote is always possible after a hardware or network failure affecting one or more nodes.
When setting up a 3-node cluster, Microsoft recommends a three-way mirror, which allows for the failure of one node and a failed disk on a second node at the same time: an extra layer of redundancy compared with the 2-node cluster. A witness is not required for this setup, because with three nodes there will always be a deciding vote.
A 4-node cluster allows for dual parity, which, like a three-way mirror, can survive two simultaneous failures while using raw capacity more efficiently, and that efficiency improves as the cluster grows. There are figures to suggest that dual parity reaches a storage efficiency (the share of raw capacity left usable) of 50% on a four-node cluster, around 66% on an eight-node cluster and up to 80% on a 16-node cluster with a full SSD configuration.
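Two pieces of arithmetic sit behind the discussion of node counts: the majority vote that decides which nodes keep running, and the share of raw capacity each resiliency type leaves usable. The Python sketch below models both under simplifying assumptions. It treats every node and the witness as one vote each and ignores the dynamic quorum adjustments that Windows Server failover clustering can also make. The parity layouts (2 data + 2 parity columns on four nodes, 4 + 2 on eight) are assumptions chosen because they reproduce the 50% and 66% figures quoted above. All function names are hypothetical.

```python
from fractions import Fraction


def has_quorum(total_votes: int, reachable_votes: int) -> bool:
    # A partition keeps running only if it can see a strict majority of all votes.
    return reachable_votes > total_votes // 2


def mirror_efficiency(copies: int) -> Fraction:
    # Every block is stored `copies` times, so only 1/copies of raw capacity is usable.
    return Fraction(1, copies)


def parity_efficiency(data_columns: int, parity_columns: int) -> Fraction:
    # Parity spends `parity_columns` out of every (data + parity) columns on redundancy.
    return Fraction(data_columns, data_columns + parity_columns)


if __name__ == "__main__":
    # 2 nodes, no witness, link between them lost: each side sees 1 of 2 votes.
    print(has_quorum(2, 1))                          # False
    # 2 nodes + witness: the node that still reaches the witness sees 2 of 3 votes.
    print(has_quorum(3, 2), has_quorum(3, 1))        # True False
    # 4 nodes split 2/2 with no witness: neither half has a majority.
    print(has_quorum(4, 2))                          # False

    print(float(mirror_efficiency(2)))               # 0.5   (two-way mirror)
    print(round(float(mirror_efficiency(3)), 3))     # 0.333 (three-way mirror)
    print(float(parity_efficiency(2, 2)))            # 0.5   (assumed 2+2 dual parity)
    print(round(float(parity_efficiency(4, 2)), 3))  # 0.667 (assumed 4+2 dual parity)
```

The 80% figure for 16 nodes relies on a wider encoding layout that this sketch does not attempt to model.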
There are many other features under the hood that deserve an honourable mention. One example is that you can set priorities on virtual machines. If you need to take a node down and there is not enough memory to fail all of its VMs over to a different node, you can give them different priority levels. When a node is put into maintenance, the VMs with the highest priority are always moved to the next available node first. If there is not enough memory to move all of the virtual machines, those with a lower priority are paused until the original node is brought back online.
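The priority behaviour can be modelled as a simple greedy plan. In this minimal Python sketch, the `VM` type, the `PRIORITY` ranking and `plan_drain` are hypothetical names, and memory on a single target node is assumed to be the only constraint; the real cluster's placement logic is more involved. VMs are considered from highest to lowest priority, moved while the target node has memory left, and paused otherwise.

```python
from dataclasses import dataclass

# Hypothetical ranking: a higher number is moved first.
PRIORITY = {"high": 3, "medium": 2, "low": 1}


@dataclass
class VM:
    name: str
    memory_gb: int
    priority: str


def plan_drain(vms, free_memory_gb):
    """Return (moved, paused) VM names when draining a node onto one target node."""
    moved, paused = [], []
    # Highest priority first; Python's sort is stable, so ties keep their input order.
    for vm in sorted(vms, key=lambda v: PRIORITY[v.priority], reverse=True):
        if vm.memory_gb <= free_memory_gb:
            moved.append(vm.name)
            free_memory_gb -= vm.memory_gb
        else:
            # Not enough memory left on the target: pause until the node returns.
            paused.append(vm.name)
    return moved, paused


if __name__ == "__main__":
    vms = [VM("test", 8, "low"), VM("sql", 16, "high"), VM("web", 8, "medium")]
    print(plan_drain(vms, free_memory_gb=24))  # (['sql', 'web'], ['test'])
```

Because the loop keeps going after a VM is paused, a smaller lower-priority VM can still move if it fits in the memory that is left; that is a choice made for this sketch, not documented cluster behaviour.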
The health service, which monitors the state of the drives in your nodes, has also been improved over Windows Server 2012 R2. If a disk fails for any reason, the end user is notified and the disk is highlighted within the node; it can then be replaced and rebuilt without any further intervention (other than physically swapping the disk).
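The replace-and-rebuild workflow can be pictured as a small state machine. This Python sketch is purely illustrative: `DiskSlot`, its states and its methods are hypothetical and do not correspond to the health service's actual API, and the rebuild step is a placeholder for restoring data from the surviving copies on other nodes.

```python
from enum import Enum


class DiskState(Enum):
    HEALTHY = "healthy"
    FAILED = "failed"          # user notified, drive highlighted in its node
    REBUILDING = "rebuilding"  # data being restored onto the new drive


class DiskSlot:
    """Hypothetical model of one drive bay; not the real health service API."""

    def __init__(self, node: str, slot: int):
        self.node, self.slot = node, slot
        self.history = [DiskState.HEALTHY]
        self.alerts = []

    @property
    def state(self) -> DiskState:
        return self.history[-1]

    def on_failure(self):
        self.history.append(DiskState.FAILED)
        self.alerts.append(f"{self.node}: drive in slot {self.slot} failed, please replace it")

    def on_physical_replacement(self):
        # Swapping the drive is the only manual step; the rebuild then starts by itself.
        if self.state is not DiskState.FAILED:
            raise RuntimeError("there is no failed drive in this slot")
        self.history.append(DiskState.REBUILDING)
        self._rebuild()

    def _rebuild(self):
        # Placeholder: a real rebuild copies data back from surviving copies on other nodes.
        self.history.append(DiskState.HEALTHY)


if __name__ == "__main__":
    disk = DiskSlot("node2", 4)
    disk.on_failure()
    print(disk.alerts[0])
    disk.on_physical_replacement()
    print([s.value for s in disk.history])  # ['healthy', 'failed', 'rebuilding', 'healthy']
```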
This was just a small glimpse into hyper-convergence, and we look forward to rolling it out to many of you in the coming months! If you are interested in this technology, feel free to contact us by email at firstname.lastname@example.org or call us on 0800 0803 200 for more information or to discuss your requirements.
By Nick Stears on April 13th, 2017