• Home
  • About
  •  

    Join the Online Survey on Disaster Recovery

    August 15th, 2016

    To start things off for the Disaster Recovery Special Interest Group (SIG) described in the previous blog post, the DPCO Committee has put together an online survey of how enterprises are doing data replication and Disaster Recovery and what issues they are encountering. Please join this effort by responding to this brief survey at: https://www.surveymonkey.com/r/W3DRKYD

    THANK YOU in advance for doing this!  It should take less than 5 minutes to complete.


    Data Storage & the Software Defined Data Center

    November 3rd, 2014

    gnagle-color     The overall success of and general acceptance of server virtualization as a way to make servers more fully utilized and agile, have encouraged IT managers to pursue virtualization of the other two major data center functions, storage and networking.  And most recently, the addition of external resources such as the public Cloud to what the IT organization must manage has encouraged taking these trends a step further.  Many in the industry believe that the full software definition of all IT infrastructure, that is a software defined data center (SDDC), should be the end goal, to make all resources capable of fast adaptation to business needs and for the holy grail of open-API-based, application-aware, single-pane-of-glass management.

    So as data storage professionals we must ask:  is storage changing in ways that will make it ready to be a full participant in the software defined data center? And what new storage techniques are now being proven to offer the agility, scalability and cost effectiveness that are sought by those embracing the SDDC?

    These questions can best be answered by examining the current state of software defined storage (SDS) and how it is being integrated with other aspects of the SDDC. SDS does for storage what virtualization did for servers—breaking down the physical barriers that bind data to specific hardware. Using SDS, storage repositories can now be made up of high volume, industry standard hardware, where “white boxes,” typically in the form of multiple CPU, Open Compute Project servers with a number of solid-state and/or spinning disks, perform storage tasks that formerly required specialized disk controller hardware.  This is similar to what is beginning to happen to network switches under software defined networking (SDN).  And in another parallel to the SDN world, the software used in SDS is coming both from open source communities such as the OpenStack Swift design for object storage and from traditional storage vendors such as EMC’s ViPR and NetApp’s clustered Data ONTAP, or from hypervisor vendors such as VMware and Microsoft.  Making industry standard hardware handle the performance and high availability requirements of enterprise storage is being done by applying clustering technologies, both local and geographically distributed, to storage – again with object storage in the lead, but new techniques are also making this possible for more traditional file systems.  And combining geographically distributed storage clusters with snapshot may well eliminate the need for traditional types of data protection in the form of backup and disaster recovery.

    Integrating storage devices, SDS or traditional, into the rest of the data center requires protocols that facilitate either direct connections of storage to application servers, or networked connections.  And as storage clustering gains traction, networking is the logical choice, with high speed Ethernet, such as 10 Gigabit per second (10 GbE) and 40 GbE increasingly dominant and the new 25 GbE coming along as well.  Given this convergence – the use of the same basic networking standards for all networking requirements, SAN or NAS, LANs, and WANs – storage will integrate quite readily over time into the increasingly accepted SDN technologies that are enabling networking to become a full participant in the virtualization and cloud era.  One trend that will bring SDS and SDN together is going to be the increasing popularity of private and hybrid clouds, since development of a private cloud, when done right, gives an IT organization pretty close to a “clean slate” on which to build new infrastructure and new management techniques — an opportune time to begin testing and using SDS.

    Industry trends in servers, storage and networking, then, are heading in the right direction to make possible comprehensive, policy-driven management of the software defined data center.  However, despite the strong desire by IT managers and their C-level bosses for more agile and manageable data centers, a lot of money is still being spent just to maintain existing storage infrastructure, such as Fibre Channel.  So any organization that has set its sights on embracing the SDDC should start NOW to steadily convert its storage infrastructure to the kinds of devices and connectivity that are being proven in cloud environments – both by public cloud providers and by organizations that are taking a clean-slate approach to developing private and hybrid clouds.

     


    Data Reduction Research Notes

    March 13th, 2012

    With the continuing system enterprise data growth rates, which in some areas may even exceed 100% year over year, according to the IDC, many technical approaches to reducing overall storage needs are being investigated. The following is a short review of the areas in which interesting technical solutions have been implemented. One primary technique which has been receiving a lot of attention involves ‘Deduplication’ technology, which can be divided into many areas. Some papers covering deduplication overviews are currently available in the DPCO presentation & tutorial page, at http://www.snia.org/forums/dpco/knowledge/pres_tutorials. A new presentation by Gene Nagle (the current chairman of the DPCO) and Thomas Rivera will be posted there soon, and will be presented at the upcoming spring 2012 SNW conference.

    Other areas which have been investigated involve storage management, rather than concentrating on data reduction. This involves implementing storage tiers, as well as creating new technologies, such as Virtual Tape Libraries and Solid State Devices, in order to ease the implementation of various tiers. Here are the areas which seem to have had quite a bit of activity.

    Data reduction areas

    • Compression
    • Thin Provisioning
    • Deduplication, which includes
    o File deduplication
    o Block deduplication
    o Delta block optimization
    o Application Aware deduplication
    o Inline vs. Post processing deduplication
    o Virtual Tape Library (VTL) deduplication

    Storage Tiering

    Tiered storage arranges various storage components in a structured organization, in order to have data storage automatically migrated between storage components which have significantly different performance as well as cost. These components are quite variable, based on performance characteristics and throughput, location with regards to the servers, overall cost, media types, and other issues. The policies based on these parameters which are developed to define each tier will have significant effects, since these policies determine the movement of data within the various tiers, and the resulting accessibility of that data. An overview of Storage Tiering, called “What’s Old Is New Again”, written by Larry Freeman, is available in this DPCO blog, and he will also be giving a related presentation at the Spring 2012 SNW.

    SSD and Cache Management

    Solid state memory has become quite popular, since it has such high retrieval performance rate, and can be used both as much larger cache implementation than before, as well as the top level for tiered storage. A good discussion of this is at http://www.informationweek.com/blog/231901631

    VTL

    Storage presented as a virtual tape library will allow integration with current backup software, using various direct attach or network connections, such as SAS, FibreChannel, or iSCSI. A nice overview is at http://searchdatabackup.techtarget.com/feature/Virtual-tape-library-VTL-data-deduplication-FAQ.

    Thin Provisioning

    Thin provisioning is a storage reduction technology which uses storage virtualization to reduce overall usage; for a brief review, see http://www.symantec.com/content/en/us/enterprise/white_papers/b-idc_exec_brief_thin_provisioning_WP.en-us.pdf

    Deduplication Characteristics & Performance Issues

    When looking at the overall coverage of deduplication techniques, it appears that file level deduplication can cover a high percentage of the overall storage, which may offer a simpler and quicker solution for data reduction. Block level deduplication may introduce bigger performance and support issues and will add a layer of indirection, in addition to de-linearizing data placement, but it is needed for some files, such as VM & filesystem images. In addition, when performing deduplication on backup storage, this may not be a severe issue.

    One deduplication technique called sparse file support, where chunks of zeros are mapped by marking their existence in metadata, is available in NTFS, XFS, and the ext4 file systems, among others. In addition, the Single Instance Storage (SIS) technique, which replaces duplicate files with copy-on-write links, is useful and performs well.

    Source side deduplication is complex; storage side deduplication is much simpler, so implementing deduplication at the storage site, rather than at the server site, may be preferable. In addition, global deduplication in clustered environments or SAN/NAS environments can be quite complex, and may lead to fragmentation, so local deduplication, operating within each storage node, is a simpler solution. It uses a hybrid duplicate detection model aiming for file-level deduplication, and reverting to segment level deduplication only when necessary. This reduces the global problems to simple routing issues, so that the incoming files are routed to the node which has the highest likelyhood of possessing a duplicate copy of the file, or of parts of the file.

    See “A Study of Practical Deduplication”, given the best paper award at USENIX Fast 2011: http://www.usenix.org/events/fast11/tech/full_papers/Meyer.pdf. It has references to other papers which discuss various experiments and measurements with deduplication and other data reduction techniques. Also, look at various metrics, discussed in “Tradeoff in Scalable Data Routing for Deduplication Clusters” at http://www.usenix.org/events/fast11/tech/full_papers/Dong.pdf


    What’s Old is New Again: Storage Tiering

    October 3rd, 2011

    Storage tiering is nothing new but then again is all new. Traditionally, tiering meant that you’d buy fast (Tier One) storage arrays, based on 15K Fibre Channel drives, for your really important applications. Next you’d buy some slower (Tier Two) storage arrays, based on SATA drives, for your not-so-important applications. Finally you’d buy a (Tier Three) tape library or VTL to house your backups. This is how most people have accomplished storage tiering for the past couple of decades, with slight variations. For instance I’ve talked to some companies that had as many as six tiers when they added their remote offices and disaster recovery sites – these were very large users with very large storage requirements who could justify breaking the main three tiers into sub-tiers.

    Whether you categorized your storage into three or six tiers, the basic definition of a tier has historically been a collection of storage silos with particular cost and performance attributes that made them appropriate for certain workloads. Recent developments, however, have changed this age-old paradigm:

    1) The post-recession economy has driven IT organizations to look for ways to cut costs by improving storage utilization
    2) The introduction of the SSD offers intriguing performance but a higher cost than most can afford
    3) Evolving storage array intelligence now automates the placement of “hot” data without human intervention

    These three events lead to a rebirth of sorts in tiering, in the form of Automated Storage Tiering. This style of tiering allows the use of new components like SSD without breaking the bank. Assuming that for any given workload, a small percentage of data is accessed very frequently, Automated tiering allows the use of high performance components for that data only, while the less-frequently accessed data can be automatically stored on more economical media.

    As with any new technology, or in this case a new technique, vendors are approaching automated tiering from different angles. This is good for consumers in the long run (the best implementations will eventually win out) but in the short run creates some confusion when determining which vendor you should align you and your data with.

    As a result, automated storage tiering is getting quite a bit of press from vendors and industry analysts alike. For example, here are two pieces that appeared recently:

    Information Week Storage Virtualization Tour – All About Automated Tiering
    Business Week – Auto Tiering Crucial to Storage Efficiency

    SNIA is also interested in helping clear any confusion around automated storage tiering. This week the DPCO committee will host a live webcast panel of tiering vendors to discuss the pros and cons of tiering within the scope of their products, you can register for it here: Sign up

    Join this session and learn more about similarities and differences in various tiering implementations. We hope to see some “lively” interaction, so join the tiering discussion and get your questions answered.

    See you there!

    Larry

    PS – If you can’t make this week’s Webcast, we’ll also be recording it and you’ll be able to view it from the DPCO website