How all-flash hyperconverged infrastructure is the future for data centers
In March 2015, Gridstore co-hosted a webinar with industry analyst Arun Taneja, founder of the Taneja Group, on how all-flash hyperconverged infrastructure is the future for data centers, providing the performance, simplicity, scalability and cost/benefit IT managers need. You can see a recording of the webinar here: http://cc.readytalk.com/play?id=7fkdq9of
We knew it was a timely discussion from the audience's response: not only did hundreds register for the event, but attendees also posed many excellent questions during it.
We thought others might be interested in those questions and their answers, so we are posting them here.
Q: Would you speak briefly to your advantage or difference with Nutanix?
A: There are three main differences between Gridstore and Nutanix.
1) Superior Economics
Due to how our architecture scales, our solution requires 50% less infrastructure to offer the same usable capacity and protection level. This translates into 50% lower capital costs and 50% lower ongoing costs (space, power, cooling – 50% less stuff that you need to manage).
2) Superior Performance
We operate in the Windows kernel; we are not a bolt-on VM running a full NFS server as a guest. We are purpose built for the Windows platform. Nutanix's support for Windows was an afterthought, added once VMware brought its EVO solution to market. (EVO, like Gridstore, works in the kernel in order to offer the best performance while consuming the least amount of CPU and RAM from guest VMs.)
Gridstore also only offers All-Flash HyperConverged Systems. Due to our architecture, our All-Flash HCI costs less than comparable Nutanix systems based on Hybrid Storage (SATA storage plus flash storage, with flash typically making up about 10% of the capacity).
Compared to Hybrid Storage, All-Flash performance ranges from about 40% faster (for a very small working set) to as much as 58X faster (for a large working set). Flash gives you predictable and consistent high performance across the entire data set, not just the portion that fits into the cache.
3) Superior Scaling
Nutanix forces customers to scale compute and storage together. This has been one of the biggest customer complaints against Nutanix. It's really easy to get started, and then you realize that you are adding full compute stacks, including the full licensing stack on top of them (because they require a hypervisor to be deployed on), at the rate of storage growth.
For decades, storage has grown at 5-6X the rate of compute, and this is accelerating. So growing your compute and license costs at the rate of storage is a massive problem that will hit you later.
Again, the Gridstore architecture makes the difference here. We run bare metal, which in this context means that we do not require a hypervisor to run on. So we can put our software onto any hardware appliance and have it add that storage into the pool. We do not require the CPU or RAM to run a full hypervisor or an NFS host VM requiring 32GB of RAM.
We also do not depend on data locality like Nutanix does; Nutanix must have local copies of the data to get decent performance. This is one of the main reasons you are forced to scale compute and storage together. Effectively, Nutanix depends on direct-attached storage for performance.
Gridstore does not have this design flaw. Our name is no accident: we have a true grid architecture. This allows us to add storage resources to the grid independently of the compute tier. And we leverage parallel I/O and flash storage to deliver performance that is far superior to Nutanix. Our architecture also allows you to place our software on your existing hosts, so they can access the same high-performance flash tier of storage.
So we not only allow you to scale compute and storage independently, we allow you to fully utilize your existing infrastructure investments in the same grid.
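To make the coupled-versus-independent scaling argument concrete, here is a minimal Python sketch. All node sizes and growth figures in it are hypothetical values chosen for illustration, not Gridstore or Nutanix specifications; it simply counts how many hypervisor-licensed hosts each model needs when storage grows much faster than the VM count.

```python
import math

# Hypothetical sizing assumptions, for illustration only.
COUPLED_NODE_TB = 10    # usable TB added by each combined compute+storage node
STORAGE_NODE_TB = 10    # usable TB added by each storage-only (no hypervisor) node
VMS_PER_HOST = 50       # VMs each compute host can run


def coupled_hosts(storage_tb: float, vms: int) -> int:
    """Licensed hosts needed when every node must add compute and storage together."""
    # The cluster must satisfy whichever requirement (capacity or VMs) is larger.
    return max(math.ceil(storage_tb / COUPLED_NODE_TB), math.ceil(vms / VMS_PER_HOST))


def independent_tiers(storage_tb: float, vms: int) -> tuple[int, int]:
    """(licensed compute hosts, storage-only nodes) when each tier scales on its own."""
    return math.ceil(vms / VMS_PER_HOST), math.ceil(storage_tb / STORAGE_NODE_TB)


# Hypothetical growth: storage grows 5X while the VM count grows only 20%.
for storage_tb, vms in [(40, 200), (200, 240)]:
    compute, storage_nodes = independent_tiers(storage_tb, vms)
    print(f"{storage_tb} TB / {vms} VMs: coupled = {coupled_hosts(storage_tb, vms)} licensed hosts; "
          f"independent = {compute} licensed hosts + {storage_nodes} storage nodes")
```

In this toy model both designs start at four hosts, but after storage grows 5X the coupled design needs 20 licensed hosts while the independent design needs 5, plus storage-only nodes that carry no hypervisor.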
Q: Is Gridstore going to support VMware? I know you support Hyper-V.
A: At this point in time, we have no plans to offer a VMware based HyperConverged system. However, our systems can easily coexist in mixed environments and provide iSCSI based all-flash storage to VMware based VMs.
For HyperConverged on VMware, VMware EVO is the dominant HCI solution in that market. Gridstore intends to be the dominant HCI platform in the MSFT Hyper-V market. Hyper-V is not an afterthought for us (as it is for our one other competitor in this market); we are purpose built for Hyper-V, and this brings significant advantages over any of our competition. The only reason these competitors are supporting Hyper-V is that the writing is on the wall for them due to VMware EVO. We expect to see more competitors try to make that leap as their market share in VMware declines. All of them have the same architecture, and in comparison we offer superior economics, superior performance, and superior scaling due to our architecture that is purpose built for Hyper-V.
Q: Which solution to use – Hybrid vs. Flash?
A: We only recommend Flash for your primary Tier 1 storage. This gives you consistent and predictable high performance across the entire data set, not just the 10-20% of it that a flash cache can cover. Due to our architecture, we are able to offer this at lower cost than our competitors' hybrid-storage-based systems.
We also sell hybrid storage, but we only recommend it for Tier 2/3 storage for things like File, Backup, Archive, and DR, which is why we understand the performance characteristics of both very well. As environments grow, the working set eventually no longer fits within the flash cache. Once that occurs, every I/O not in the cache gets the performance of SATA disk.
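The point about the working set outgrowing the cache can be illustrated with a small simulation. This is a minimal sketch, assuming an LRU-managed cache and uniformly random reads over the working set; real workloads are usually skewed, and real hybrid arrays use their own caching policies, so treat the output as the shape of the curve rather than a prediction for any product.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(cache_blocks: int, working_set_blocks: int,
                  accesses: int = 100_000, seed: int = 0) -> float:
    """Fraction of uniformly random reads served by an LRU cache of the given size."""
    rng = random.Random(seed)
    cache = OrderedDict()  # keys are block numbers; order tracks recency of use
    hits = 0
    for _ in range(accesses):
        block = rng.randrange(working_set_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            cache[block] = None             # miss: fetched from disk, then cached
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict the least recently used block
    return hits / accesses


for working_set in (800, 2_000, 10_000):
    print(f"working set {working_set:>6} blocks: hit ratio {lru_hit_ratio(1_000, working_set):.2f}")
```

While the working set fits in the 1,000-block cache, nearly every read is a hit; once it is larger, the hit ratio under uniform access falls to roughly cache size divided by working-set size, and every miss is served at disk speed.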
Q: Flash vs. Hybrid HCA – is it really the extra ‘thinking’ for hybrid, or is it slowness of disk?
A: What I was referring to with that comment is that when the working set fits nicely inside the hybrid flash, the 40% performance difference comes from the additional work done by the caching algorithms. For every I/O, they pause to work out where the block is, then fetch it. When the system is All-Flash, there is no thinking involved: the block is always in the same place, and there is no pause to decide where to look. Less CPU and RAM is consumed, which is why there is a difference in the execution path even when the block ultimately comes from flash.
As the working set gets larger, the slowness of disk that your question mentions becomes the main reason. As more and more I/Os come from SATA disk, performance drops off a cliff: the delta grows from about 40% for small working sets that fit in the cache to about 58X when the working set exceeds the cache.
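A common back-of-the-envelope model for a hybrid read path is a weighted average: every read pays the cache lookup, hits pay flash latency, and misses pay disk latency. The sketch below uses that textbook model with hypothetical latencies (0.2 ms flash, 8 ms disk, 0.05 ms lookup). It reproduces the shape described above, a modest gap while the working set fits and a widening gap as misses grow, but it does not derive the specific 40% and 58X figures quoted in the answer.

```python
def hybrid_read_latency_ms(hit_ratio: float, flash_ms: float, disk_ms: float,
                           lookup_ms: float = 0.0) -> float:
    """Average read latency of a flash cache in front of disk (simple weighted model)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Every read pays the lookup; hits are served from flash, misses from disk.
    return lookup_ms + hit_ratio * flash_ms + (1.0 - hit_ratio) * disk_ms


# Hypothetical device latencies, for illustration only.
FLASH_MS, DISK_MS, LOOKUP_MS = 0.2, 8.0, 0.05

for hit in (1.0, 0.9, 0.5, 0.1):
    hybrid = hybrid_read_latency_ms(hit, FLASH_MS, DISK_MS, LOOKUP_MS)
    # All-flash baseline: no placement lookup, every read at flash latency.
    print(f"hit ratio {hit:.0%}: hybrid {hybrid:.2f} ms = {hybrid / FLASH_MS:.1f}x all-flash")
```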
Q: How long is your Flash guarantee – 5 years? And how do you provide on-site service?
A: Our flash is guaranteed for 3.6 full drive writes per day for 5 years; this rating is what governs flash wear. In reality, it would be practically impossible for anyone to write that much data every day for this period. A fully populated appliance comes with approximately 24TB of raw flash, yielding roughly 18TB usable. You would need to write about 65TB per day for 5 years, or 118PB over that time, to wear out the flash.
If your flash wears out within that time frame, we will replace it. All components in our systems are hot-swappable. We offer next-business-day and 4-hour parts and service, and/or on-site spares kits to give you immediate replacement.
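The endurance figures follow from simple arithmetic: capacity times drive writes per day (DWPD) times days in service. A minimal sketch, assuming decimal units (1PB = 1,000TB), 365-day years, and the roughly 18TB usable figure used in the answer:

```python
DAYS_PER_YEAR = 365  # ignores leap days


def rated_writes_tb(capacity_tb: float, dwpd: float, years: float) -> float:
    """Total TB that can be written within a drive-writes-per-day endurance rating."""
    return capacity_tb * dwpd * DAYS_PER_YEAR * years


usable_tb = 18   # approximate usable flash in a fully populated appliance
dwpd = 3.6       # guaranteed full drive writes per day
print(f"per day: {usable_tb * dwpd:.1f} TB")
print(f"over 5 years: {rated_writes_tb(usable_tb, dwpd, 5) / 1000:.1f} PB")
```

This gives about 64.8TB per day and about 118PB over five years, in line with the figures in the answer.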
Q: What would be an example of a “perfect” business case to use your product, what would run the best application wise, main target of your product?
A: The honest answer is mixed workloads: private clouds that run multiple workloads. That said, many of our customers start with a single workload where we can accelerate performance, such as VDI, SQL Server consolidation, OLTP systems, or DevOps (test/dev). The good thing is that you can start with a single workload, then scale out by adding more workloads, while managing everything as a single infrastructure rather than as silos. So this eventually brings us back to mixed workloads.
Q: Does System Centre run within the hyper converged infrastructure or is that deployed on separate ‘management’ servers outside the Gridstore appliances?
A: System Center runs within the same infrastructure. It has several management VMs (the different components of System Center), and these can run on any host within the infrastructure. If you do not use System Center, we also provide a standalone GUI that allows you to manage all aspects of Gridstore. We also provide a REST API that allows any other management system to drive functions such as provisioning storage to a host, in the same way that System Center does.
Q: Is the Gridstore appliance intended to be used in private cloud environments or in public clouds like AWS or MS?
A: Both. We have customers with large private clouds, and we have large service provider customers who run on our infrastructure and provide services to their end customers. These tend to be Tier 2 service providers (unfortunately not Amazon).
Q: Can you explain how you eliminate backup with hyperconvergence?
A: We do not eliminate backup. We eliminate the need for separate backup appliances, dedupe appliances, and cloud gateway appliances. All of this is built into the environment and is policy based. This eliminates dealing with multiple vendors, as well as the associated cost and complexity. These services are available per VM, so you can choose, for example, which VM gets a DR service; that VM then replicates according to policy up to Azure (or to another data center if you have one).
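Per-VM, policy-driven data services can be pictured as a small policy record attached to each VM. The sketch below is purely illustrative: the class, field names, and target names are hypothetical and do not describe Gridstore's actual policy model or API. It shows only the idea that DR replication is opted into per VM and grouped by replication target.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VmServicePolicy:
    """Hypothetical per-VM data-services policy (illustrative field names only)."""
    vm_name: str
    backup: bool = False
    dr_target: Optional[str] = None      # e.g. "azure" or a secondary data center
    replicate_every_minutes: int = 60


def replication_plan(policies: List[VmServicePolicy]) -> Dict[str, List[str]]:
    """Group VMs by DR target; VMs without a DR service are not replicated."""
    plan: Dict[str, List[str]] = {}
    for policy in policies:
        if policy.dr_target is not None:
            plan.setdefault(policy.dr_target, []).append(policy.vm_name)
    return plan


policies = [
    VmServicePolicy("sql01", backup=True, dr_target="azure", replicate_every_minutes=15),
    VmServicePolicy("web01", backup=True),
    VmServicePolicy("dev01"),
]
print(replication_plan(policies))
```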
Q: Are you eliminating hardware/systems or just managing existing or diverse systems through a portal or virtual desktop?
A: We are eliminating the separate layers of storage and SAN. We converge this to a single tier and in doing so, reduce cost and complexity. And because this is an All-Flash architecture, we drive very high performance.
Q: Would you use a separate node to address the separation of application and data when required (i.e. controlled systems/data vs. uncontrolled systems/data)?
A: The application layer (in a VM) sees only a virtual disk within the storage pool that is managed by the hypervisors. The VM can move anywhere on the infrastructure and have seamless access to its virtual disks. It does not matter whether any of the storage resources come from the same host. The storage is pooled and abstracted, so VMs (apps) neither know nor care where their storage is provided from.
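The idea that a VM sees only a virtual disk, independent of the host it runs on, can be shown with a toy model. This is a minimal sketch under stated assumptions: blocks are striped round-robin across hypothetical storage nodes and there is no redundancy, so it is not Gridstore's data layout or protection scheme. Because placement depends only on the disk and block number, a VM that moves to another host reaches exactly the same data, and consecutive blocks sit on different nodes so they can be read in parallel.

```python
import zlib


class StoragePool:
    """Toy pool that stripes virtual-disk blocks across storage nodes (no redundancy)."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.nodes = {name: {} for name in self.node_names}  # node -> {(disk, block): data}

    def _node_for(self, disk_id: str, block: int) -> str:
        # Placement depends only on the disk and block number, never on the VM's host.
        # Consecutive blocks land on different nodes, so large reads can fan out in parallel.
        start = zlib.crc32(disk_id.encode())
        return self.node_names[(start + block) % len(self.node_names)]

    def write(self, disk_id: str, block: int, data: bytes) -> None:
        self.nodes[self._node_for(disk_id, block)][(disk_id, block)] = data

    def read(self, disk_id: str, block: int) -> bytes:
        return self.nodes[self._node_for(disk_id, block)][(disk_id, block)]


class VirtualDisk:
    """What the VM sees: a block device with no notion of where its blocks live."""

    def __init__(self, pool: StoragePool, disk_id: str):
        self.pool, self.disk_id = pool, disk_id

    def write(self, block: int, data: bytes) -> None:
        self.pool.write(self.disk_id, block, data)

    def read(self, block: int) -> bytes:
        return self.pool.read(self.disk_id, block)


pool = StoragePool(["node-a", "node-b", "node-c"])
disk_on_host1 = VirtualDisk(pool, "vm1-disk0")
for i in range(6):
    disk_on_host1.write(i, f"block {i}".encode())

# After the VM migrates, the new host opens the same virtual disk and sees the same data.
disk_on_host2 = VirtualDisk(pool, "vm1-disk0")
print(disk_on_host2.read(4))
```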
Q: I have heard some vendors claim their version of dedupe is quicker than most others. What is your retort here?
A: Any vendor that claims their inline dedupe does not impact performance is either simply lying to you or has no clue what happens in a system. You cannot consume CPU cycles and RAM in the middle of an I/O for free. If I eliminate that inline dedupe from the middle of the I/O, my system has more resources to do more with. There is no magic here.
In HyperConverged, dedupe should be run post-process. Regardless of how fast your dedupe is, it impacts system performance, which in HyperConverged impacts your VMs by reducing available CPU and RAM, and ultimately you pay for this.
As I pointed out, we do not need inline dedupe to extend the life of flash (or to artificially claim lower $/GB flash). Our flash is guaranteed to last and because of our economics, we do not need to trick people with artificial / theoretical pricing. Our priority is to preserve CPU and RAM for VMs – otherwise your TCO goes up.
We still give you the option to run dedupe if you want to maximize flash capacity. You can run it on a schedule or when the systems are less busy, so there is no impact on your workloads.
Final point here: the dedupe/compression ratios are all the same, because we all use the same algorithms. So the only question is when you want these to run and on what data. VDI has great dedupe ratios; SQL Server data gets practically no dedupe savings.
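As an illustration of post-process (rather than inline) deduplication, here is a minimal sketch that runs as a separate pass over blocks that have already been written, fingerprinting each with SHA-256 and keeping one copy per fingerprint. It assumes fixed-size blocks and trusts the hash without a byte-for-byte comparison; it shows the general technique, not Gridstore's implementation. The sample data is synthetic: desktop images that share most blocks versus database pages that are all distinct.

```python
import hashlib
from typing import Dict, List, Tuple


def post_process_dedupe(blocks: List[bytes]) -> Tuple[Dict[str, bytes], List[str]]:
    """Background pass over already-written blocks.

    Returns a store holding one copy per SHA-256 fingerprint, plus the list of
    fingerprints needed to rebuild the original block sequence.
    """
    store: Dict[str, bytes] = {}
    refs: List[str] = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of each block
        refs.append(digest)
    return store, refs


def dedupe_ratio(blocks: List[bytes]) -> float:
    """Logical blocks divided by unique blocks stored (1.0 means no savings)."""
    store, _ = post_process_dedupe(blocks)
    return len(blocks) / len(store)


# Synthetic data: 100 desktops sharing 10 OS blocks plus 1 unique block each,
# versus 1,000 distinct database pages.
vdi_blocks = ([f"os-block-{i}".encode() for i in range(10)] * 100
              + [f"user-{d}".encode() for d in range(100)])
sql_blocks = [f"page-{i}".encode() for i in range(1000)]
print(f"VDI-like ratio: {dedupe_ratio(vdi_blocks):.1f}:1")
print(f"SQL-like ratio: {dedupe_ratio(sql_blocks):.1f}:1")
```

Because the pass runs after the writes have completed, it can be scheduled for quiet periods instead of sitting in the I/O path.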
Q: Does the Gridstore storage support CIFS volumes?
A: Yes. We provide SCSI block devices to the hosts. The host can mount and put any file system it wants on this. The host then provides the CIFS/SMB services to its clients.
Q: Does the Gridstore VDI hyperconverged solution give the option for NVIDIA add-on graphics? I run a VDI solution with the full Adobe suite and need the graphics power for my users.
A: Yes. There is one available PCIe slot per server node. I assume that your question refers to the NVIDIA GRID cards. These are supported both by our hardware and by the Microsoft or Citrix virtualization stacks.
Q: What do companies typically do with their legacy equipment that still has useful life?
A: Legacy infrastructure (both servers and storage) can continue to operate as part of the environment. Unlike our competitors, our goal is NOT to create another silo. I did not go into the details of our architecture, but our vController (a software driver that provides a SCSI block device to the host) can sit on any existing Windows Server host. These hosts can then access All-Flash storage in the pool in the same way as our HyperConverged nodes. Our HyperConverged nodes can also access any other storage. We do not pool that storage as part of ours, but you have full access to any legacy storage environments. All of this infrastructure can then be managed through a single management console such as MSFT System Center.
Q: Is this all OS agnostic, or does it require a specific version of Linux or Windows? Also, will it allow instances of Mac OS X?
A: Our platform runs on Windows Server / Hyper-V. Hyper-V can run any guest OS that it supports in a VM.
Q: Following on from that OS question: can we run native Windows on the nodes, or only virtual machines in Hyper-V?
A: We run on any host that runs Windows Server 2008 R2 or later. Hosts can be physical hosts, for example running SQL Server or Exchange, or virtualized hosts running anything supported by the hypervisor, including Linux.
Q: E.g. for SQL /File clusters?
A: Yes – we run bare metal on the server and offer excellent performance for SQL Server applications typically with around 1ms latency for small I/Os.
Q: Are these numbers for SLC or MLC?
A: These are eMLC, guaranteed for 3.6 full drive writes per day for 5 years. This is for 100% random 4K writes.
So for a 1TB drive, you can write 3.6TB per day for 5 years, or a total of roughly 6.5PB.
For sequential I/O – this goes up to 22 full drive writes per day for 5 years. I don’t need to do the math – that’s a lot of data.
Finally, as I pointed out in the webinar – these numbers do NOT assume any data reduction such as dedupe or compression.
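For readers who want the math anyway, here is a minimal sketch of the per-drive totals for a 1TB drive under the random and sequential ratings quoted above, assuming decimal units (1PB = 1,000TB), 365-day years, and, as noted, no data reduction:

```python
DAYS_PER_YEAR = 365  # ignores leap days

# Full-drive-writes-per-day ratings quoted in the answer, by workload type.
RATINGS_DWPD = {"100% random 4K writes": 3.6, "sequential writes": 22}


def lifetime_writes_pb(capacity_tb: float, dwpd: float, years: int = 5) -> float:
    """Total writes (decimal PB) allowed by a DWPD rating over the warranty period."""
    return capacity_tb * dwpd * DAYS_PER_YEAR * years / 1000


for workload, dwpd in RATINGS_DWPD.items():
    print(f"1TB drive, {workload}: {lifetime_writes_pb(1, dwpd):.2f} PB over 5 years")
```

That works out to about 6.57PB for random writes and about 40PB for sequential writes per 1TB drive.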
Q: Is there a single point of failure in the replicated environment? What if there is a double disk failure containing same data in the replicated VMs? Does Flashcache cache all write and read IOs? Or it is intelligent enough to ignore non-critical IOs?
A: I’m not sure if you’re referring to Gridstore here or to our competitors. We do not use replication; our competitors do.
For our solution, we are fault tolerant to multiple failures. This is configurable. There is no single point of failure nor is there a single point of control that creates bottlenecks. Any caching that we do in the host is for reads only. All writes are persisted to storage. No state is held on the host side.
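Caching only reads on the host while persisting every write to storage is the pattern usually called a write-through read cache. A minimal single-host sketch, with a hypothetical in-memory backing store standing in for the shared pool; a real multi-host system would also need to invalidate other hosts' cached copies when a block is written, which this sketch does not show.

```python
from typing import Dict


class BackingStore:
    """Hypothetical stand-in for the shared storage pool."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data

    def read(self, lba: int) -> bytes:
        return self.blocks[lba]


class HostReadCache:
    """Host-side cache that accelerates reads only (write-through).

    A write is persisted to the backing store before it returns, so the
    host never holds the only copy of any data.
    """

    def __init__(self, store: BackingStore) -> None:
        self.store = store
        self.cache: Dict[int, bytes] = {}

    def read(self, lba: int) -> bytes:
        if lba not in self.cache:
            self.cache[lba] = self.store.read(lba)  # miss: fetch from storage
        return self.cache[lba]

    def write(self, lba: int, data: bytes) -> None:
        self.store.write(lba, data)  # persist first
        self.cache[lba] = data       # then keep the local copy consistent

    def drop(self) -> None:
        """Simulate losing the host cache (e.g. a crash): only copies are lost."""
        self.cache.clear()


store = BackingStore()
host = HostReadCache(store)
host.write(7, b"committed")
host.drop()
print(host.read(7))  # still readable from storage after the cache is gone
```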
Q: Do you need to start with predictable performance for SQL, or will it scale as SQL data warehousing needs grow in the converged architecture?
A: You can scale compute or storage incrementally as your needs grow. Our platform is excellent at SQL Server consolidation, BI, and data warehousing. We are working with Microsoft to develop a complete BI-in-a-box, which you can then scale incrementally as your environment requires.