Cisco UCS in a Flash


So, have you heard the news? The world’s fastest growing x86 server company is joining forces with the world’s fastest application acceleration company. While the partnership on the blade front is fairly recent news, Cisco has long supported Fusion-io accelerator products in our C-Series servers (and will continue to do so). In June 2012, Fusion-io and Cisco inked a deal that extends our partnership to cover UCS B-Series Blade servers as well. It’s not simply a change of form factor and connector; it also extends UCS Manager with integrated support for the cards for inventory and management purposes. To read more on the partnership, look here:

The Hardware: Initially, Cisco will be incorporating the 365GB and the 785GB Fusion ioDrive2 starting with the M3 line of blade servers. Both products combine persistent NAND flash hardware with custom software to offer extremely low latency, high performance, and a proven track record of reliability.

When I first heard of Fusion-io, I had the same thought that I’m sure many of you have right now. It basically goes something like this: “So they make some really fast SSD cards, so what?” But after an intense 1.5 days of training and interviews at Fusion-io headquarters in Salt Lake City, I have come to learn it’s quite a bit more than that. Fusion ioDrives are based on MLC NAND flash memory. That last sentence is loaded with terms begging for explanation, so I won’t leave you hanging…

Geek Speak: SLC vs MLC:
Flash memory stores data in an array of cells. There are two main types in use today: SLC (single level cell), and MLC (multi-level cell). Both have their pros and cons, but MLC is king when it comes to density (because more data is stored in each cell) and the cost per bit of MLC is lower. Traditional thinking is that SLC is king when it comes to reliability, but technology improvements have now driven MLC reliability to be on par with SLC.
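To put the density point in concrete terms, here is a minimal Python sketch. The function names are made up for illustration, and it assumes the common case of MLC storing two bits per cell (some parts differ). A cell that stores n bits has to distinguish 2^n charge levels, which is where both the extra density and the tighter margins come from.

```python
# Illustrative only: helper names are hypothetical, and MLC is assumed
# to mean two bits per cell (the common case, not a universal rule).
SLC_BITS_PER_CELL = 1
MLC_BITS_PER_CELL = 2  # assumption


def charge_levels(bits_per_cell: int) -> int:
    """Distinct charge levels a cell must tell apart to store this many bits."""
    return 2 ** bits_per_cell


def capacity_bits(cells: int, bits_per_cell: int) -> int:
    """Raw capacity, in bits, of an array with the given number of cells."""
    return cells * bits_per_cell


if __name__ == "__main__":
    cells = 1_000_000  # arbitrary array size for the comparison
    for name, bits in (("SLC", SLC_BITS_PER_CELL), ("MLC", MLC_BITS_PER_CELL)):
        print(name, charge_levels(bits), "levels,", capacity_bits(cells, bits), "bits")
```

Same number of cells, twice the bits: that is the density and cost-per-bit argument for MLC in a nutshell.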

NAND vs NOR: Flash memory grew out of traditional EEPROM technology. Traditional EEPROM would not be a good fit for many of the things we use flash for today. Here’s why: imagine you had a USB flash drive that contained a large unneeded file. Deleting that file would make room for your new data, but suppose you were not allowed to erase a single file unless you were willing to erase ALL files on the drive and rewrite the ones you wanted to keep. Big limitation. Both NOR and NAND (Not OR and Not AND, if you’re wondering) overcome this and could be used for everyday applications (cameras, phones, PDAs, tablets, media players, etc.), but NAND is by far the most popular due to its better density and lower cost. The most important difference between NOR and NAND is how they handle reads: NOR offers true random access for reads, while NAND requires page (block) reads, which are not the most efficient. But cost often wins in the enterprise space.
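For the curious, the cost of page-based reads can be sketched in a few lines of Python. This is an illustration rather than how any particular controller works: the function names are hypothetical and the 4096-byte page size is an assumed placeholder, since real page sizes vary by part.

```python
# Illustrative only: page size is an assumed placeholder value.
ASSUMED_PAGE_SIZE = 4096


def pages_touched(offset: int, length: int, page_size: int = ASSUMED_PAGE_SIZE) -> int:
    """Number of whole pages that must be read to cover the requested byte range."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


def bytes_fetched(offset: int, length: int, page_size: int = ASSUMED_PAGE_SIZE) -> int:
    """Bytes actually pulled from the flash array for the requested range."""
    return pages_touched(offset, length, page_size) * page_size


if __name__ == "__main__":
    # Asking for a single byte still costs a full page read.
    print(bytes_fetched(offset=0, length=1))
    # Two bytes that straddle a page boundary cost two pages.
    print(pages_touched(offset=ASSUMED_PAGE_SIZE - 1, length=2))
```

A NOR-style device could return the single byte directly; with page reads, small scattered requests pay for data they never asked for.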

The Fusion ioDrive2s in Cisco UCS servers are not implemented as a traditional storage device. The interface to access the storage is not SAS, SATA, or FC, each of which would introduce the latency inherent in that technology. Instead, the ioDrive2 sits directly on the server’s PCIe bus, which puts it as close to the CPU as possible. There is no doubt that an SSD from any electronics store is much faster than spinning disk. Such an SSD is implemented as a traditional SATA device, which makes it extremely user-friendly but introduces a great deal of latency due to SATA itself. The following graphic showing the IO path illustrates this problem:


When that same SSD device is put into an array on the storage network, the problem is compounded by the HBA, the storage switch, and the RAID controller in the array. The following graphic illustrates the problem of this added latency:

In contrast, direct access to the PCIe bus results in extremely low latency. Consider the following graphic to see the difference:

If I could summarize the importance of this technology, it would come down to this one paragraph: x86 history has shown that the key to making devices faster lies in their proximity to the CPU. In the server world, Distance=Latency. Consider CPU cache, which used to be made of SRAM and housed on the motherboard. Beginning with the 486 CPU, the L1 cache was integrated into the CPU die itself. Fast forward a decade or so and we now have 3 levels of cache directly on the CPU die. Another example is in memory architectures. We used to house all system memory in one location behind a single memory controller (Northbridge), and all CPUs would compete for access. AMD entered the server market and introduced a new design that brought banks of memory closer to each CPU and gave each CPU an individual memory controller (NUMA). Intel integrated the memory controller inside the CPU die starting with Nehalem. Lastly, take PCIe in Intel’s E5 line of processors. For the first time, the function handling PCIe itself was integrated into the CPU die as well. This gives many devices in the system a boost in performance. All of these examples illustrate that there is an obvious speed advantage when devices are moved closer to the CPU. So it’s quite paradoxical that we keep storage far away (CPU->HBA->storage switch->RAID controller->processor on SSD->flash controller on SSD). Unless the problem you are solving is storage density (multi-terabyte), there are better ways to crack this nut. These new ioDrive2s eliminate much of this latency, and you instead have the CPU talking directly to the flash controller (all ioDrives have a controller). This is incredibly efficient and blazingly fast.
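The Distance=Latency idea can be sketched as a toy model in Python. The two paths below come straight from the paragraph above; everything else (the function names, and especially the uniform cost of one unit per component) is an assumption for illustration, not a measurement. Real components differ widely, so plug in your own numbers.

```python
# Toy model: total latency is the sum of what each component after the CPU adds.
# Paths are taken from the text; the per-component costs are placeholders.
SAN_PATH = [
    "CPU",
    "HBA",
    "storage switch",
    "RAID controller",
    "processor on SSD",
    "flash controller on SSD",
]
DIRECT_PATH = ["CPU", "flash controller on ioDrive2"]


def hop_count(path):
    """Number of components a request crosses after leaving the CPU."""
    return len(path) - 1


def path_latency(path, latency_by_component):
    """Sum the caller-supplied latency of every component after the CPU."""
    return sum(latency_by_component[name] for name in path[1:])


if __name__ == "__main__":
    # Assumption: one arbitrary unit per component, purely to compare shapes.
    unit_cost = {name: 1.0 for name in SAN_PATH + DIRECT_PATH}
    print("SAN path:   ", hop_count(SAN_PATH), "hops,", path_latency(SAN_PATH, unit_cost), "units")
    print("Direct path:", hop_count(DIRECT_PATH), "hops,", path_latency(DIRECT_PATH, unit_cost), "units")
```

Even with every component costing the same, the long path pays five times over. In practice the gap depends on the real latency of each component, which this sketch deliberately leaves to the reader.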

While architectures such as this do not eliminate the need for a SAN, they certainly have the potential to reduce the storage requirements of the array if designed correctly. This fact alone makes Cisco unique in the I/O accelerator space, as every other major server OEM offering an I/O accelerator product would delight more in selling you a large storage array, especially if they make it. Cisco is not in the SAN array market and wants to help you build a solution that is truly “best of breed.” Whether or not that means a large storage array, we’ll help you architect it.

Is It Worth It? The idea of faster storage access applies to just about every application. It doesn’t matter if you run a database, a virtual server farm, VDI, Big Data, or a private cloud of some sort. If you were able to access your data faster – a LOT faster – would you do it? Is it cheap? No. It’s quite frankly not the cheapest option you’ll buy for your server. But it’s quite unlike any option you’ve ever seen, and the cost needs to be weighed against other options for speeding up access to your storage. Traditionally this is done by adding spindles (DAS or SAN). It is not economically feasible for you to add enough spindles to equal this performance.
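If you want to run the spindle math for your own environment, a back-of-the-envelope Python sketch looks like this. The function name is hypothetical and the two input figures are placeholders, not benchmark results: substitute the IOPS your workload needs and what your disks actually deliver.

```python
import math


def spindles_needed(target_iops: float, iops_per_spindle: float) -> int:
    """Disks required to reach target_iops if each disk delivers iops_per_spindle."""
    if iops_per_spindle <= 0:
        raise ValueError("iops_per_spindle must be positive")
    return math.ceil(target_iops / iops_per_spindle)


if __name__ == "__main__":
    # Placeholder inputs for illustration only; not vendor or test data.
    example_target_iops = 100_000
    example_iops_per_spindle = 200
    print(spindles_needed(example_target_iops, example_iops_per_spindle), "spindles")
```

Multiply the result by the cost, power, and rack space of each disk, and the comparison with a single card in a blade becomes easier to judge.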

We’re supporting Windows Server 2008 R2/2012, RHEL, SLES, and ESX. For the actual versions of each, please see the Cisco UCS Interoperability tool:

I found an interesting blog that compares an Intel enterprise-class SSD with a Fusion ioDrive2 (the same card to be used in UCS blade servers). Coincidentally, the blog authors (Percona) happened to be using a Cisco UCS C250 to compare the results of the two drives, but Cisco had nothing to do with the testing or the results.

Intel SSD 520:


The comparison at its most basic level breaks down to the following:

As you can see, the Fusion ioDrive2 more than doubles the SSD performance across the board. Based on data from both companies, I put together the following table. Please note that I used statistics for the lower-capacity 365GB Fusion ioDrive2 because the largest Intel SSD is 480GB.

Obviously the Intel drive is not lacking in performance and does quite well (I wish the SSD in my laptop were this fast). But again, it doesn’t compare to the numbers that the Fusion ioDrive2 put up. Like me, you have probably used an SSD in some form or another and experienced the tremendous speed increase it brings. Imagine a drive that more than doubles that performance…

We’ll be shipping the Fusion ioDrive2 for Cisco UCS blades REALLY soon (it’s real – I saw it with my own eyes. I tried to stick it in my bag but they counted them before and after show-n-tell! That’s an actual picture of the card earlier in the article). In the meantime, your Cisco UCS sales team should be able to assist you with any detailed questions, but as always, your comments and questions are welcome here.

Thanks for reading


7 thoughts on “Cisco UCS in a Flash”

  1. This looks great! What a fantastic combo, all my favorite tech in one place.
    What I would love to see is an implementation of Citrix XenDesktop on VMware using UCS FlexPod where the majority of the IO is done on the Fusion-io. Going even further, it would be even better if the same kit could be used to provide multi-tenant desktops which integrate with their respective vCloud Director secure vShield zones. If this all could be easily provided by private clouds, it would truly be an excellent solution.

  2. Shaun/Kenny,

    When I said “M3 blades”, I meant just that – all M3 blades. There is no restriction on full or half slot form factors. Let me know if that does not answer your question.

    Update: I lied (sort of). The full-slot B420 M3 does not support FIO but it will support it in the near future.

  3. But applications will not run on the same machine as the data (that is, the storage), because an enterprise has lots of applications. Separate computing power, shared storage, etc. are all necessary.
    So the architecture will still be “Server – Network – Storage”, and the only difference is that the storage is an “ioDrive-based SSD card”, right?

  4. This seems quite promising for HPC environments where one can avoid distributed computing and storage. Could one go for a single-server architecture? Any idea if benchmarking of these servers with FIO2 has been done for EDA applications such as Cadence and Synopsys, the silicon tapeout process, etc., which are usually IO hungry? I would be interested to know more details.
