With the cost of operating and managing storage skyrocketing due to huge data growth, and regulatory requirements driving companies to store data longer and longer, it's difficult to see how you can manage backups that scale.

The answer in the past was always to offload data to tape for long-term archiving. This allows organizations to reduce the need for storing backups locally while checking the regulatory box that their data is safely offsite.

Well, times have changed, and there are now technologies that bring Enterprise options down to small businesses. However, to keep your cost of business continuity to a minimum, it's important to know when to use certain technologies over others.

They say nobody loses their job by doing daily tape backups, until the backups don't work and you're faced with restoring critical data from corrupted or untested tapes that error out during the restore. So how can you minimize exposure to this risk, improve your recovery times, and reduce cost at the same time?

A Common Use Case

Before we get into the weeds, let's talk about a use case. Say you have HIPAA compliance requirements that state you must keep one year of backups of your file server or database because it holds very sensitive patient information. Many organizations will simply do nightly backups and move older backups (say 30 days or more) off to tape.

Let’s Talk about Tapes (Old School)

In order to understand the differences between secondary storage and tape, you have to understand the process involved. For decades, people have been comfortable with a process that assures them of the following: (1) the data is backed up, (2) it's securely transported from onsite to storage and back, (3) it has user-defined retention, and (4) there is a recall process for restoration.

Now here are the problems with tape. First, it's error-prone. Unless you constantly test your backups, you could be caught out with corrupted or invalid tapes. The host tape drive requires periodic maintenance, or you will have issues. And a person has to handle the tapes, which opens up risk.

Second, for longer retentions, you have to manage a large number of tapes. For example, if you have 50T (terabytes) of data on a 1-year retention that needs to go to tape, that could mean as many as 400 LTO-5 tapes, and then you have to factor in replacements.

So although it's a trusted and relatively cheap way of protecting data, tape is outdated and continues to be error-prone. That said, people are comfortable with it, and it continues to be a very practical way of storing data short and long term.

Secondary Storage With Deduplication (New School)

Cheap storage doesn’t always translate into better backup management. Remember the process just mentioned. To move it to a digital, automated process, a technology would need to do the following:

  1. Improves the scale of backups, i.e., allows more backups with the same amount of storage.
  2. Provides high-speed replication of backup data offsite.
  3. Enables the ability to recall backups from long-term offsite storage quickly and easily.
  4. Reduces human intervention in the recall process to near zero.
  5. Reduces overall cost as the backup volume scales up.

The only technology that meets all of these is the deduplication storage appliance. These appliances come in a wide variety of configurations with support for different replication strategies. For example, many deduplication technologies have built-in appliance-to-appliance replication features. Others support AWS S3 or Azure integration. Therefore, long-term storage of backups can be moved out to the cloud with very quick retrieval. Why is this important?

Most modern backup technologies have the ability to move to a tiered backup strategy. For example, you may want 7 days of backups locally but 1 year offsite. With HIPAA compliance, it could be as much as 7 years. With a deduplication appliance, you can store more locally without adding a lot of disk space, then offload longer-term backups to cheap cloud storage like S3. Therefore, a process similar to what traditional tape offers can be achieved with modern backup software and deduplication technologies. So what's the more cost-effective option overall?
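The local side of that tiered strategy can be sketched with some quick arithmetic. The figures below (5:1 reduction ratio, 1% daily change rate, changed blocks stored at full size) are illustrative assumptions, not numbers from any specific appliance:

```python
# Rough sketch: local disk needed for 7 days of restore points on a
# dedupe appliance. Assumptions: 5:1 reduction on the first full,
# 1% of source data changes per day, and (pessimistically) changed
# blocks don't dedupe at all.
source_tb = 50
dedupe_ratio = 5
change_rate = 0.01
local_days = 7

first_full_tb = source_tb / dedupe_ratio                 # 10 TB post-dedupe
daily_delta_tb = source_tb * change_rate                 # 0.5 TB per day
local_tb = first_full_tb + (local_days - 1) * daily_delta_tb

print(round(local_tb, 1))  # 13.0 -> a week of restore points in ~13 TB
```

In practice, the daily deltas also deduplicate, so a week of restore points for 50T of source data typically fits in even less space than this upper-bound estimate.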

What’s REALLY Cheaper?

The rule of thumb is: the more backup data you have, the easier it is to justify moving to a fully digital process from a cost perspective. The unit cost of tape is relatively low, but at scale, digital outperforms tape. Here is the reason why.


50T Stored on Tape, 1-Year Retention

Weekly fulls:       4 (weeks per month) x 24 tapes  =  96
Daily incrementals: 7 per week @ 1% change rate     =   1
Monthly fulls:      12 x 24 tapes                   = 288
Yearly full:        1 x 24 tapes                    =  24
Replacements:       5% per year                     =  20.45
Total tapes:                                          409
Tapes offsite (monthly + yearly):                     312
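The tape counts above can be reproduced with a few lines of arithmetic. The only assumption beyond the table is that one full backup of 50T fits on 24 LTO-5 tapes (roughly 50T at 2:1 compression with ~70% usable capacity per tape):

```python
# Tape-count estimate for 50 TB of source data on a 1-year retention.
# Assumption: one full backup set of 50 TB fits on 24 LTO-5 tapes.
TAPES_PER_FULL = 24

weekly = 4 * TAPES_PER_FULL   # 4 weekly fulls retained per month
daily = 1                     # 7 daily incrementals @ 1% change fit on 1 tape
monthly = 12 * TAPES_PER_FULL # 12 monthly fulls retained per year
yearly = 1 * TAPES_PER_FULL   # 1 yearly full

total = weekly + daily + monthly + yearly
replacements = round(total * 0.05, 2)  # 5% replacement rate per year
offsite = monthly + yearly             # long-retention sets rotated offsite

print(total)         # 409
print(replacements)  # 20.45
print(offsite)       # 312
```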

There are many vendors in this space. After looking at the pricing from the major players, we'll say for the sake of argument that the average cost for a 20T-to-24T appliance is $30K (some a little more, some a little less) with a three-year maintenance agreement.

Also, an average dual-drive tape library costs $19K, paired with a server holding 50T of source data. The OSs are standardized on a couple of versions of Windows.

In this example, you would have to rotate 312 tapes offsite continually, and each time, human hands have to touch the tapes multiple times.


OK, now here is where it gets really compelling. The assumptions listed above give us a simple-to-understand cost model for comparing the old school with the new.

Cost Model for Secondary Storage vs Tape (3 Years)

Dedupe Appliance (New School)
  24T appliance including maintenance                           $30,000
  Replication to AWS S3 ($.02 USD per gig, 5:1 reduction ratio)  $7,500
  COST OF OWNERSHIP:                                            $37,500
  COST PER GIG:                                                 $.75 USD
  TOTAL STORAGE CAPACITY:                                       100-120T (post-dedupe)

Tape (Old School)
  Tape system (dual LTO-5 SCSI tape drive library, 24 tapes)    $19,800
  Tapes for a year (409)                                        $26,500
  Tape replacements (@ 5% per year)                              $2,660
  Tape rotation to offsite storage ($1 USD per tape)            $11,200
  COST OF OWNERSHIP:                                            $60,160
  COST PER GIG:                                                 $1.20 USD
  TOTAL STORAGE CAPACITY:                                       50T
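The totals and per-gig figures in the model follow directly from the line items. A quick sketch, using the article's round numbers (not vendor quotes):

```python
# Three-year cost-of-ownership comparison using the article's figures.
GB_PER_TB = 1000
source_gb = 50 * GB_PER_TB  # 50 TB of source data

# New school: dedupe appliance + S3 replication
dedupe_total = 30_000 + 7_500  # appliance w/ maintenance + 3 yrs of S3
# Sanity check on the S3 line: 50 TB source / 5:1 dedupe = 10 TB stored;
# 10,000 GB * $0.02/GB-month * 36 months = $7,200 (close to the $7,500 above)

# Old school: tape library + media + replacements + offsite rotation
tape_total = 19_800 + 26_500 + 2_660 + 11_200

print(dedupe_total)              # 37500
print(tape_total)                # 60160
print(dedupe_total / source_gb)  # 0.75   ($ per gig)
print(tape_total / source_gb)    # 1.2032 (~$1.20 per gig)
```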


Here is the upside. When using a deduplication appliance, there is room for growth without further capital investment. With a 5:1 reduction ratio, you can assume about 100T of source data capacity. So in the example above, you will only store about 10T post-dedupe, assuming 50T of source data. Of course, as you scale, your price per gig comes down substantially; in this case, it could drop by more than half ($.36 USD per gig or less) at scale. If you're looking at this on a monthly basis, $.75 per gig works out to about $.02 per gig per month over three years.

This is not the case with tape, which gives you at most 2:1 compression of source data, and even then you can only use about 70% of each tape (due to overhead). If you factor in growth, the cost goes up because you will need to add more tape capacity.

Therefore, the reduction ratio plays a big part in your overall cost per gig. I used a conservative reduction ratio of 5:1; however, I have seen reduction ratios (at scale) go into the teens. This has a profound impact on your cost per gig and allows for even greater scale.
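The scaling effect is easy to see when you hold the three-year spend fixed and vary the protected source data. The 100T figure assumes the 5:1 ratio fills the appliance's capacity; it is an illustration, not a tested result:

```python
# How cost per gig falls as protected source data grows, with the
# three-year spend ($30K appliance + $7.5K S3) held fixed.
TOTAL_COST = 37_500  # from the cost model above

def cost_per_gig(source_tb):
    """Three-year cost per GB of protected source data."""
    return TOTAL_COST / (source_tb * 1000)

print(cost_per_gig(50))       # 0.75  - the article's baseline
print(cost_per_gig(100))      # 0.375 - 5:1 ratio fills the 24T appliance
print(cost_per_gig(50) / 36)  # ~0.0208 per gig per month over 3 years
```

A higher reduction ratio shifts the second line further: at 10:1 or better, the same appliance protects proportionally more source data, which is why ratios in the teens change the economics so dramatically.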

So the results are pretty compelling. This cost model was calculated using a major deduplication vendor's technology (industry-leading vendors as a baseline). Further reductions in cost can be realized by using open source deduplication technologies, which are free. However, if you go with an open source technology, do your research, as there are pros and cons, and make sure you buy support! Windows Server also has built-in deduplication, and it's free with your Windows Server license.

Here are a few final things to keep in mind:

First, in this example, I used AWS S3 for offsite storage of the dedupe store. However, there are other vendors with even more cost-effective rates per gigabyte via S3 gateways, which could further reduce the price per gig. So it pays to shop around if you're really trying to optimize offsite storage costs.

Second, you can combine different deduplication options on-premises and in the cloud to optimize costs. If you have multiple office locations, you can use those as targets for offsite backup replication.

Third, if you go with a major vendor, also consider leasing the appliances. This could further drive down costs, as long-term leases spread your cost out and reduce your price per gig.

The Bottom Line

This article was intended to justify the move to secondary storage with deduplication as a way to eliminate error-prone old-school processes, through a simple cost justification using major-vendor numbers. However, since there are so many options and deployment methods available, consult an experienced BC/DR professional to architect the solution that best meets your needs. No matter how you slice and dice it, a secondary storage appliance with deduplication is compelling in terms of operational efficiency and cost reduction. With a little research, you can have Enterprise-level protection on a small business budget.
