
    Join the Online Survey on Disaster Recovery

    August 15th, 2016

    To start things off for the Disaster Recovery Special Interest Group (SIG) described in the previous blog post, the DPCO Committee has put together an online survey of how enterprises are doing data replication and Disaster Recovery and what issues they are encountering. Please join this effort by responding to this brief survey at: https://www.surveymonkey.com/r/W3DRKYD

    THANK YOU in advance for doing this!  It should take less than 5 minutes to complete.

    Got DR Issues? Check out the new Disaster Recovery Special Interest Group

    June 27th, 2016

    The SNIA Data Protection and Capacity Optimization Committee (DPCO) would like to announce the creation of a new Special Interest Group focusing on Data Replication for Disaster Recovery (DR) Standards. The mission of this SIG is to investigate existing ISO standards, carry out surveys, and study current guidance in order to identify whether there is a need to improve interoperability and resiliency, and/or education and best practices, in the area of data replication for disaster recovery.

    Why are we doing this? There have been a number of industry observations that customers either don’t know about the standards that exist, cannot implement them, or have other needs relating to DR that warrant exploration. The aim of this group is not to reinvent the wheel but to examine what is out there and what customers can use, and to find out whether they are using appropriate standards and, if not, why.

    What are we doing? We are starting with a survey to be sent out to as many industry members as possible. The survey will examine the replication and DR needs that customers have, the systems they have implemented, their knowledge of standards, and other issues encountered in designing and operating DR, particularly in multi-site, multi-vendor environments.

    What can you do? Get involved, of course! Contact the SNIA DPCO team to indicate your interest as we implement the organization structure for the Data Replication for DR Standards SIG.

    John Olson and Gene Nagle

    Data Storage & the Software Defined Data Center

    November 3rd, 2014

    The overall success and general acceptance of server virtualization as a way to make servers more fully utilized and agile have encouraged IT managers to pursue virtualization of the other two major data center functions, storage and networking.  And most recently, the addition of external resources such as the public Cloud to what the IT organization must manage has encouraged taking these trends a step further.  Many in the industry believe that the full software definition of all IT infrastructure, that is, a software defined data center (SDDC), should be the end goal, to make all resources capable of fast adaptation to business needs and for the holy grail of open-API-based, application-aware, single-pane-of-glass management.

    So as data storage professionals we must ask:  is storage changing in ways that will make it ready to be a full participant in the software defined data center? And what new storage techniques are now being proven to offer the agility, scalability and cost effectiveness that are sought by those embracing the SDDC?

    These questions can best be answered by examining the current state of software defined storage (SDS) and how it is being integrated with other aspects of the SDDC. SDS does for storage what virtualization did for servers: it breaks down the physical barriers that bind data to specific hardware. Using SDS, storage repositories can now be made up of high volume, industry standard hardware, where “white boxes,” typically in the form of multiple-CPU Open Compute Project servers with a number of solid-state and/or spinning disks, perform storage tasks that formerly required specialized disk controller hardware.  This is similar to what is beginning to happen to network switches under software defined networking (SDN).  And in another parallel to the SDN world, the software used in SDS is coming from open source communities (such as the OpenStack Swift design for object storage), from traditional storage vendors (such as EMC’s ViPR and NetApp’s clustered Data ONTAP), and from hypervisor vendors such as VMware and Microsoft.  Industry standard hardware is being made to handle the performance and high availability requirements of enterprise storage by applying clustering technologies, both local and geographically distributed, to storage – again with object storage in the lead, but new techniques are also making this possible for more traditional file systems.  And combining geographically distributed storage clusters with snapshots may well eliminate the need for traditional types of data protection in the form of backup and disaster recovery.

    Integrating storage devices, SDS or traditional, into the rest of the data center requires protocols that facilitate either direct connections of storage to application servers, or networked connections.  And as storage clustering gains traction, networking is the logical choice, with high speed Ethernet, such as 10 Gigabit Ethernet (10 GbE) and 40 GbE, increasingly dominant and the new 25 GbE coming along as well.  Given this convergence – the use of the same basic networking standards for all networking requirements, SAN or NAS, LANs, and WANs – storage will integrate quite readily over time into the increasingly accepted SDN technologies that are enabling networking to become a full participant in the virtualization and cloud era.  One trend that will bring SDS and SDN together is the increasing popularity of private and hybrid clouds, since development of a private cloud, when done right, gives an IT organization pretty close to a “clean slate” on which to build new infrastructure and new management techniques, making it an opportune time to begin testing and using SDS.

    Industry trends in servers, storage and networking, then, are heading in the right direction to make possible comprehensive, policy-driven management of the software defined data center.  However, despite the strong desire by IT managers and their C-level bosses for more agile and manageable data centers, a lot of money is still being spent just to maintain existing storage infrastructure, such as Fibre Channel.  So any organization that has set its sights on embracing the SDDC should start NOW to steadily convert its storage infrastructure to the kinds of devices and connectivity that are being proven in cloud environments – both by public cloud providers and by organizations that are taking a clean-slate approach to developing private and hybrid clouds.


    Data Reduction Research Notes

    March 13th, 2012

    With enterprise data growth rates continuing to climb, and in some areas possibly exceeding 100% year over year according to IDC, many technical approaches to reducing overall storage needs are being investigated. The following is a short review of the areas in which interesting technical solutions have been implemented. One primary technique which has been receiving a lot of attention involves ‘Deduplication’ technology, which can be divided into many areas. Some papers covering deduplication overviews are currently available on the DPCO presentation & tutorial page, at http://www.snia.org/forums/dpco/knowledge/pres_tutorials. A new presentation by Gene Nagle (the current chairman of the DPCO) and Thomas Rivera will be posted there soon, and will be presented at the upcoming spring 2012 SNW conference.

    Other areas which have been investigated involve storage management, rather than concentrating on data reduction. This involves implementing storage tiers, as well as creating new technologies, such as Virtual Tape Libraries and Solid State Devices, in order to ease the implementation of various tiers. Here are the areas which seem to have had quite a bit of activity.

    Data reduction areas

    • Compression
    • Thin Provisioning
    • Deduplication, which includes
        o File deduplication
        o Block deduplication
        o Delta block optimization
        o Application Aware deduplication
        o Inline vs. Post processing deduplication
        o Virtual Tape Library (VTL) deduplication

    Storage Tiering

    Tiered storage arranges various storage components in a structured organization, so that data is automatically migrated between storage components which differ significantly in performance as well as cost. These components are quite variable, based on performance characteristics and throughput, location with regard to the servers, overall cost, media types, and other issues. The policies developed from these parameters to define each tier will have significant effects, since they determine the movement of data within the various tiers, and the resulting accessibility of that data. An overview of Storage Tiering, called “What’s Old Is New Again”, written by Larry Freeman, is available in this DPCO blog, and he will also be giving a related presentation at the Spring 2012 SNW.

    SSD and Cache Management

    Solid state memory has become quite popular, since it has such a high retrieval performance rate and can be used both as a much larger cache than before and as the top level of tiered storage. A good discussion of this is at http://www.informationweek.com/blog/231901631


    Virtual Tape Library

    Storage presented as a virtual tape library will allow integration with current backup software, using various direct attach or network connections, such as SAS, Fibre Channel, or iSCSI. A nice overview is at http://searchdatabackup.techtarget.com/feature/Virtual-tape-library-VTL-data-deduplication-FAQ.

    Thin Provisioning

    Thin provisioning is a storage reduction technology which uses storage virtualization to reduce overall usage, allocating physical capacity only as data is actually written rather than when a volume is created; for a brief review, see http://www.symantec.com/content/en/us/enterprise/white_papers/b-idc_exec_brief_thin_provisioning_WP.en-us.pdf
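As a rough illustration of the allocate-on-write idea behind thin provisioning, here is a minimal Python sketch. The `Pool` and `ThinVolume` classes, their method names, and the 4096-byte block size are all hypothetical and chosen only for this example; real arrays implement this in their virtualization layer.

```python
class Pool:
    """Shared pool of physical blocks (hypothetical, for illustration)."""

    def __init__(self, physical_blocks):
        self.free = physical_blocks
        self.used = 0

    def allocate(self):
        # Physical capacity is consumed only when a volume first writes a block.
        if self.free == 0:
            raise RuntimeError("pool exhausted")
        self.free -= 1
        self.used += 1


class ThinVolume:
    """Volume that advertises virtual_blocks but allocates on first write."""

    def __init__(self, pool, virtual_blocks, block_size=4096):
        self.pool = pool
        self.virtual_blocks = virtual_blocks
        self.block_size = block_size
        self.blocks = {}  # virtual block index -> data actually written

    def _check(self, index):
        if not 0 <= index < self.virtual_blocks:
            raise IndexError("block index outside the advertised volume size")

    def write(self, index, data):
        self._check(index)
        if index not in self.blocks:
            self.pool.allocate()  # first write to this block takes physical space
        self.blocks[index] = data

    def read(self, index):
        self._check(index)
        # Blocks that were never written read back as zeros and cost nothing.
        return self.blocks.get(index, bytes(self.block_size))
```

Two volumes that each advertise 100 blocks can share a 10-block pool as long as their combined written data fits, which is where the saving (and the risk of running the pool dry) comes from.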

    Deduplication Characteristics & Performance Issues

    When looking at the overall coverage of deduplication techniques, it appears that file level deduplication can cover a high percentage of the overall storage, which may offer a simpler and quicker solution for data reduction. Block level deduplication may introduce bigger performance and support issues and will add a layer of indirection, in addition to de-linearizing data placement, but it is needed for some files, such as VM and filesystem images. In addition, when deduplication is performed on backup storage, these performance issues may be less severe.
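The layer of indirection mentioned above can be sketched in a few lines of Python. This is only an illustration using fixed-size blocks and SHA-256 fingerprints; the function names and the 4096-byte block size are assumptions made for the example, not a description of any particular product.

```python
import hashlib


def dedup_blocks(data, block_size=4096):
    """Split data into fixed-size blocks and keep each unique block once."""
    store = {}   # fingerprint -> block contents (stored once)
    recipe = []  # ordered fingerprints needed to rebuild the data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)
        recipe.append(fingerprint)
    return store, recipe


def rebuild(store, recipe):
    """Follow the recipe (the indirection layer) to reassemble the data."""
    return b"".join(store[fingerprint] for fingerprint in recipe)
```

Every read now goes through the recipe to find its blocks, and logically adjacent blocks may live anywhere in the store, which is the de-linearization the paragraph refers to.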

    One deduplication technique called sparse file support, where chunks of zeros are mapped by marking their existence in metadata, is available in the NTFS, XFS, and ext4 file systems, among others. In addition, the Single Instance Storage (SIS) technique, which replaces duplicate files with copy-on-write links, is useful and performs well.
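Both ideas can be modeled in memory with a short Python sketch. It is a simplification under stated assumptions: `to_sparse`, `from_sparse` and `single_instance` are hypothetical names, the chunk size is arbitrary, and real file systems track holes and links in on-disk metadata rather than in dictionaries.

```python
import hashlib


def to_sparse(data, chunk_size=4096):
    """Record only non-zero chunks; all-zero chunks exist only as metadata."""
    zeros = bytes(chunk_size)
    extents = {}  # chunk index -> contents, for non-zero chunks only
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        if chunk != zeros[:len(chunk)]:
            extents[offset // chunk_size] = chunk
    return {"length": len(data), "chunk_size": chunk_size, "extents": extents}


def from_sparse(meta):
    """Rebuild the full contents; holes read back as zeros."""
    out = bytearray(meta["length"])  # zero-filled by default
    for index, chunk in meta["extents"].items():
        start = index * meta["chunk_size"]
        out[start:start + len(chunk)] = chunk
    return bytes(out)


def single_instance(files):
    """Keep one copy per distinct file content; names become links to it."""
    contents = {}  # digest -> file contents (stored once)
    links = {}     # file name -> digest
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        contents.setdefault(digest, data)
        links[name] = digest
    return contents, links
```

A real SIS implementation would also break the link on write (copy-on-write), which this sketch leaves out.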

    Source side deduplication is complex; storage side deduplication is much simpler, so implementing deduplication at the storage side, rather than at the server side, may be preferable. In addition, global deduplication in clustered environments or SAN/NAS environments can be quite complex, and may lead to fragmentation, so local deduplication, operating within each storage node, is a simpler solution. This approach uses a hybrid duplicate detection model aiming for file-level deduplication, and reverting to segment level deduplication only when necessary. This reduces the global problems to simple routing issues, so that the incoming files are routed to the node which has the highest likelihood of possessing a duplicate copy of the file, or of parts of the file.
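One simple way to picture the routing decision is sketched below in Python. It is not the algorithm from any of the papers cited in this post; the `Node` class, the `route` function, the fixed 4096-byte chunks and the tie-breaking rule are all assumptions made for the example.

```python
import hashlib


def fingerprints(data, chunk_size=4096):
    """Set of SHA-256 fingerprints for the fixed-size chunks of a file."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


class Node:
    """Storage node that deduplicates locally (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.known = set()  # fingerprints of chunks already stored here

    def store(self, fps):
        new = fps - self.known
        self.known |= fps
        return len(new)  # number of chunks that actually had to be written


def route(nodes, data, chunk_size=4096):
    """Send the file to the node most likely to hold duplicates of it."""
    fps = fingerprints(data, chunk_size)
    # Prefer the largest overlap; break ties in favour of the emptier node.
    best = max(nodes, key=lambda n: (len(fps & n.known), -len(n.known)))
    return best, best.store(fps)
```

Comparing full fingerprint sets against every node would not scale in practice; real systems typically sample or summarize fingerprints, which this sketch does not attempt.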

    See “A Study of Practical Deduplication”, which received the best paper award at USENIX FAST 2011: http://www.usenix.org/events/fast11/tech/full_papers/Meyer.pdf. It has references to other papers which discuss various experiments and measurements with deduplication and other data reduction techniques. Also, look at the various metrics discussed in “Tradeoffs in Scalable Data Routing for Deduplication Clusters” at http://www.usenix.org/events/fast11/tech/full_papers/Dong.pdf

    What’s Old is New Again: Storage Tiering

    October 3rd, 2011

    Storage tiering is nothing new but then again is all new. Traditionally, tiering meant that you’d buy fast (Tier One) storage arrays, based on 15K Fibre Channel drives, for your really important applications. Next you’d buy some slower (Tier Two) storage arrays, based on SATA drives, for your not-so-important applications. Finally you’d buy a (Tier Three) tape library or VTL to house your backups. This is how most people have accomplished storage tiering for the past couple of decades, with slight variations. For instance I’ve talked to some companies that had as many as six tiers when they added their remote offices and disaster recovery sites – these were very large users with very large storage requirements who could justify breaking the main three tiers into sub-tiers.

    Whether you categorized your storage into three or six tiers, the basic definition of a tier has historically been a collection of storage silos with particular cost and performance attributes that made them appropriate for certain workloads. Recent developments, however, have changed this age-old paradigm:

    1) The post-recession economy has driven IT organizations to look for ways to cut costs by improving storage utilization
    2) The introduction of the SSD offers intriguing performance but a higher cost than most can afford
    3) Evolving storage array intelligence now automates the placement of “hot” data without human intervention

    These three developments have led to a rebirth of sorts in tiering, in the form of Automated Storage Tiering. This style of tiering allows the use of new components like SSD without breaking the bank. Assuming that for any given workload a small percentage of data is accessed very frequently, automated tiering allows the use of high performance components for that data only, while the less-frequently accessed data can be automatically stored on more economical media.
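The placement decision can be sketched in Python as follows. This is a deliberately naive model, assuming a simple access log and a fixed number of SSD slots; the function name `plan_tiers` and the tier labels are hypothetical, and vendor implementations use far richer heuristics.

```python
from collections import Counter


def plan_tiers(access_log, ssd_slots):
    """Place the most frequently accessed blocks on SSD, the rest on HDD."""
    counts = Counter(access_log)
    # Rank by access count (highest first); ties are broken by block id.
    ranked = sorted(counts, key=lambda block: (-counts[block], block))
    hot = set(ranked[:ssd_slots])
    return {block: ("ssd" if block in hot else "hdd") for block in counts}
```

For example, with the log `["a", "a", "a", "b", "c", "c"]` and a single SSD slot, only block `a` is placed on SSD.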

    As with any new technology, or in this case a new technique, vendors are approaching automated tiering from different angles. This is good for consumers in the long run (the best implementations will eventually win out) but in the short run creates some confusion when determining which vendor you should align yourself and your data with.

    As a result, automated storage tiering is getting quite a bit of press from vendors and industry analysts alike. For example, here are two pieces that appeared recently:

    Information Week Storage Virtualization Tour – All About Automated Tiering
    Business Week – Auto Tiering Crucial to Storage Efficiency

    SNIA is also interested in helping clear up any confusion around automated storage tiering. This week the DPCO committee will host a live webcast panel of tiering vendors to discuss the pros and cons of tiering within the scope of their products. You can register for it here: Sign up

    Join this session and learn more about similarities and differences in various tiering implementations. We hope to see some “lively” interaction, so join the tiering discussion and get your questions answered.

    See you there!


    PS – If you can’t make this week’s Webcast, we’ll also be recording it and you’ll be able to view it from the DPCO website

    Trends in Data Protection

    July 29th, 2011

    Data protection hasn’t changed much in a long time.  Sure, there are slews of product announcements and incessant declarations of the “next big thing”, but how much have market shares really changed over the past decade?  You’ve got to wonder if new technology is fundamentally improving how data is protected or is simply turning the crank to the next model year.  Are customers locked into the incremental changes proffered by traditional backup vendors or is there a better way?

    Not going to take it anymore

    The major force driving change in the industry has little to do with technology.  People have started to challenge the notion that they, not the computing system, should be responsible for ensuring the integrity of their data.  If they want a prior version of their data, why can’t the system simply provide it?   In essence, customers want to rely on a computing system that just works.  Howard Beale, the anchorman in the movie Network, personifies the anxiety that burdens customers explicitly managing backups, recoveries, and disaster recovery.  Now don’t get me wrong; it is critical to minimize risk and manage expectations.   But the focus should be on delivering data protection solutions that can simply be ignored.

    Are you just happy to see me?

    The personal computer user is prone to ask “how hard can it be to copy data?”  Ignoring the fact that many such users lose data on a regular basis because they have failed to protect their data at all, the IT professional is well aware of the intricacies of application consistency, the constraints of backup windows, the demands of service levels and scale, and the efficiencies demanded by affordability.    You can be sure that application users that have recovered lost or corrupted data are relieved.  Mae West, posing as a backup administrator, might have said “Is that a LUN in your pocket or are you just happy to see me?”

    In the beginning

    Knowing where the industry has been is a good step in knowing where the industry is going.  When the mainframe was young, application developers carried paper tape or punch cards.  Magnetic tape was used both to store application data and as a medium to copy it to. Over time, as magnetic disk became affordable for primary data, the economics of magnetic tape remained compelling as a backup medium.  Data protection was incorporated into the operating system through backup/recovery facilities, as well as through 3rd party products.

    As microprocessors brought computing into the mainstream, non-mainframe computing systems gained prominence and tape became relegated to secondary storage.  Native, open source, and commercial backup and recovery utilities stored backup and archive copies on tape media and leveraged its portability to implement disaster recovery plans.  Data compression increased the effective capacity of tape media and complemented its power consumption efficiency.

    All quiet on the western front

    Backup to tape became the dominant methodology for protecting application data due to its affordability and portability.  Tape was used as the backup medium for application and server utilities, storage system tools, and backup applications.


    Backup Server copies data from primary disk storage to tape media

    Customers like the certainty of knowing where their backup copies are and physical tapes are comforting in this respect.  However, the sequential access nature of the media and indirect visibility into what’s on each tape led to difficulties satisfying recovery time objectives.  Like the soldier who fights battles that seem to have little overall significance, the backup administrator slogs through a routine, hoping the company’s valuable data is really protected.

    B2D phase 1

    Backup Server copies data to a Virtual Tape Library

    Uncomfortable with problematic recovery from tape, customers have been evolving their practices to a backup to disk model.  Backup to disk and then to tape was one model designed to offset the higher cost of disk media but can increase the uncertainty of what’s on tape.  Another was to use virtual tape libraries to gain the direct access benefits of disk while minimizing changes in their current tape-based backup practices.  Both of these techniques helped improve recovery time but still required the backup administrator to acquire, use, and maintain a separate backup server to copy the data to the backup media.

    Snap out of it!

    Space-efficient snapshots offered an alternative data protection solution for some file servers. Rather than use separate media to store copies of data, the primary storage system itself would be used to maintain multiple versions of the data by only saving changes to the data.  As long as the storage system was intact, restoration of prior versions was rapid and easy.  Versions could also be replicated between two storage systems to protect the data should one of the file servers become inaccessible.
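The "only saving changes" idea can be sketched as a small copy-on-write model in Python. The `Volume` class and its methods are hypothetical names for this illustration, and real storage systems track changes at the block or extent level in on-disk metadata rather than in dictionaries.

```python
class Volume:
    """Volume with copy-on-write snapshots (illustrative sketch only)."""

    def __init__(self):
        self.blocks = {}     # live data: block index -> contents
        self.snapshots = []  # one dict per snapshot: index -> preserved old value

    def snapshot(self):
        # Taking a snapshot copies no data; it only opens an empty change record.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        if self.snapshots:
            latest = self.snapshots[-1]
            if index not in latest:
                # Preserve the old value once, the first time the block changes
                # after the most recent snapshot (None means it did not exist).
                latest[index] = self.blocks.get(index)
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        # The first preserved copy found in this or any later snapshot is the
        # value the block had when snapshot snap_id was taken.
        for saved in self.snapshots[snap_id:]:
            if index in saved:
                return saved[index]
        return self.blocks.get(index)  # unchanged since the snapshot
```

Each snapshot costs space only for the blocks that changed after it, which is why many versions can be kept on the primary storage system.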


    Point in Time copies on disk storage are replicated to other disks

    This procedure works, is fast, and is space efficient for data on these file servers but has challenges in terms of management and scale.  Snapshot based approaches manage versions of snapshots; they lack the ability to manage data protection at the file level.  This limitation arises because the customer’s data protection policies may not match the storage system policies.  Snapshot based approaches are also constrained by the scope of each storage system so scaling to protect all the data in a company (e.g., laptops) in a uniform and centralized (labor-efficient) manner is problematic at best.


    Writes are captured and replicated for protection

    Continuous Data Protection (both “near CDP” solutions which take frequent snapshots and “true CDP” solutions which continuously capture writes) is also being used to eliminate the backup window, thereby ensuring large volumes of data can be protected.  However, the expense and maturity of CDP need to be balanced with the value of “keeping everything”.
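A "true CDP" write capture can be pictured as an append-only journal that can be replayed to any point. The Python sketch below is a minimal model under that assumption; `WriteJournal`, `record` and `state_at` are hypothetical names, and sequence numbers stand in for the timestamps a real product would use.

```python
class WriteJournal:
    """Append-only log of writes, replayable to any earlier point."""

    def __init__(self):
        self.entries = []  # (sequence number, block index, data)

    def record(self, index, data):
        sequence = len(self.entries) + 1
        self.entries.append((sequence, index, data))
        return sequence

    def state_at(self, sequence):
        """Replay writes up to and including the given sequence number."""
        state = {}
        for seq, index, data in self.entries:
            if seq > sequence:
                break
            state[index] = data
        return state
```

Because every write is retained, the journal grows without bound unless it is trimmed, which is the cost of "keeping everything" noted above.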



    An offer he can’t refuse

    Data deduplication fundamentally changed the affordability of using disk as a backup medium.  The effective cost of storing data declined because duplicate data need only be stored once. Coupled with the ability to rapidly access individual objects, the advantages of backing up data to deduplicated storage are overwhelmingly compelling.  Originally, the choice of whether to deduplicate data at the source or target was a decision point, but more recent offerings provide both approaches so customers need not compromise on technology.  However, simply using deduplicated storage as a backup target does not remove the complexity of configuring and supporting a data protection solution that spans independent software and hardware products.  Is it really necessary that additional backup servers be installed to support business growth?  Is it too much to ask for a turnkey solution that can address the needs of a large enterprise?

    The stuff that dreams are made of



    Transformation from a Backup Appliance to a Recovery Platform

    Protection storage offers an end-to-end solution, integrating full-function data protection capabilities with deduplicated storage.  The simplicity and efficiency of application-centric data protection combined with the scale and performance of capacity-optimized storage systems stands to fundamentally alter the traditional backup market.  Changed data is copied directly between the source and the target, without intervening backup servers.  Cloud storage may also be used as a cost-effective target.  Leveraging integrated software and hardware for what each does best allows vendors to offer innovations to customers in a manner that lowers their total cost of ownership.  Innovations like automatic configuration, dynamic optimization, and using preferred management interfaces (e.g., virtualization consoles, pod managers) build on the proven practices of the past to integrate data protection into the customer’s information infrastructure.

    No one wants to be locked into products because they are too painful to switch out; it’s time that products are “sticky” because they offer compelling solutions.  IDC projects that the worldwide purpose-built backup appliance (PBBA) market will grow at a compound annual rate of 16.6%, from $1.7 billion in 2010 to $3.6 billion by 2015.  The industry is rapidly adopting PBBAs to overcome the data protection challenges associated with data growth.  Looking forward, storage systems will be expected to incorporate a recovery platform, supporting security and compliance obligations, and data protection solutions will become information brokers for what is stored on disk.
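The 16.6% figure reads most naturally as a compound annual growth rate, since going from $1.7 billion to $3.6 billion is more than a doubling overall. The short Python check below works only from the rounded dollar figures quoted above, so it is an approximation and not a reconstruction of IDC's own calculation; `cagr` and `project` are hypothetical helper names.

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by a start value and an end value."""
    return (end / start) ** (1 / years) - 1


def project(start, rate, years):
    """Value after compounding at a fixed annual rate."""
    return start * (1 + rate) ** years


implied_rate = cagr(1.7, 3.6, 5)        # from the rounded figures, 2010 to 2015
projected_2015 = project(1.7, 0.166, 5)  # $1.7 billion compounded at 16.6%
```

The implied rate comes out at roughly 16% per year, and compounding $1.7 billion at 16.6% for five years gives a little under $3.7 billion, so the quoted numbers are consistent with an annual rate once rounding is allowed for.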