Four ways to beat data bloat
There are ways to shed some of those unwanted terabytes
By Rick Clark | Network World US | Published: 19:20, 19 January 2012
It is estimated that unstructured data - everything from email to spreadsheets, documents and digital media - accounts for at least 90% of an organisation's data. Your systems are bloated with everything from personal iTunes playlists to the early versions of that PowerPoint presentation you delivered in March. To make matters worse, analysts at Gartner and IDC predict that data volumes in IT organisations will grow by as much as 800% in the next five years.
Corporations can fight information bloat by using tools that provide a file-by-file inventory to identify files that are duplicated, unused, infrequently accessed or in violation of policy. In short, there are ways to shed those unwanted terabytes. Here are four tips on how to do it:
1. Inventory your technology
Most companies don't know how big their problem is. They can't tell you what file content they have, how much exists, who created it, what resources it is consuming or how much data is duplicated. When we first begin working with an enterprise, we typically find that 50%-60% of its NAS data has not been viewed in several years.
Because many view the task of sifting the bad data from the good as too daunting, the problem just gets worse. Traditional, manual profiling is difficult and expensive. As a result, profiling is done infrequently - sometimes only annually - making it impossible to understand the data's impact on the corporation and its storage resources.
Before you can identify wasted files, re-tier storage or trend on storage usage patterns, you need to understand your current capabilities and decide what tools you need to be successful. Several tools are available, offering varying degrees of visibility into some or all of your unstructured environments.
Native array monitoring tools often stop at array capacity and can't provide file-level information, such as when the file was last accessed. Furthermore, this view tends to overestimate your true capacity, leaving you searching for budget to buy more arrays sooner than is truly necessary. Solutions that walk the file tree tend to be cumbersome and place a significant burden on your system, slowing down not only your visibility reporting, but potentially your network as a whole. These "boil the ocean" tools tend to take months or years to deploy and may force users to install agents to feed a relational monitoring database, which can weigh your system down and present scalability challenges.
More lightweight solutions can be deployed in a matter of weeks, not months, and work without the use of agents. Some use a purpose-built database to collect file metadata (rather than the complete file). This enables them to characterise and report on billions of files 10x to 100x faster than a standard relational database. Many of these solutions can be paired with a data mover or user script to implement removal, archiving or re-tiering of data.
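To make the idea of a metadata-only inventory concrete, here is a minimal Python sketch. It is not how any particular product works: the `FileRecord` type and the `inventory` and `stale_fraction` functions are hypothetical names invented for illustration. It walks a directory tree, which is exactly the approach described above as burdensome at scale, so treat it as something to try on a small share rather than on billions of files. It also assumes the file system records last-access times; some volumes are mounted with access-time updates reduced or disabled, in which case the results will mislead.

```python
import os
import stat
import time
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Metadata only: the file's contents are never read."""
    path: str
    size_bytes: int
    owner_uid: int
    last_access: float    # seconds since the epoch
    last_modified: float  # seconds since the epoch


def inventory(root):
    """Walk `root` and return one FileRecord per regular file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets and other special files
            records.append(FileRecord(path, info.st_size, info.st_uid,
                                      info.st_atime, info.st_mtime))
    return records


def stale_fraction(records, max_age_days, now=None):
    """Share of total bytes not accessed within the last `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    total = sum(r.size_bytes for r in records)
    stale = sum(r.size_bytes for r in records if r.last_access < cutoff)
    return stale / total if total else 0.0
```

Calling `stale_fraction(inventory("/some/share"), 3 * 365)` would give a rough answer to the question of how much of a share has gone unviewed for several years; the path and the three-year threshold are placeholders.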
2. Identify inappropriate files
End user files make up a hefty portion of an organisation's unstructured data and often include several duplicate versions stored in multiple locations. From employees' personal files of photos, videos, playlists and potential viruses, to outdated versions of old documents, virtually every company is storing these files in some form or another. These files are resource hogs. You need to put a priority on identifying the files so they can be removed or re-tiered - and their storage reclaimed.
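One common way to find duplicate copies, sketched below in Python, is to group files by size first and then compare a content hash only within groups of equal size. This is a general technique rather than the method of any specific tool, and the `file_digest` and `find_duplicates` names are hypothetical. Unlike a metadata-only scan, it does read file contents, so it is better suited to a targeted pass over suspect directories.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    # Cheap first pass: files of different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    # Expensive second pass: hash only files that share a size.
    by_digest = defaultdict(list)
    for size, group in by_size.items():
        if len(group) < 2:
            continue
        for p in group:
            by_digest[(size, file_digest(p))].append(p)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Each returned group is a set of candidates for consolidation: keep one copy, then remove or re-tier the rest according to policy.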
Data owned by employees who are terminated or reassigned also represents a security concern. These files should be quickly identified and quarantined or archived to ensure corporate guidelines on data ownership and retention are maintained. If your data centre needs a diet, make sure your administrators can aggregate and analyse file information by file type. That makes them more efficient watchdogs when enforcing corporate policies on retention and unauthorised usage.
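Aggregating by file type can be as simple as grouping an existing file list by extension. The Python sketch below assumes you already have (path, size) pairs from some inventory step; the function names and the banned-extension set are hypothetical examples, not a recommended policy.

```python
import os
from collections import defaultdict


def usage_by_type(files):
    """files: iterable of (path, size_bytes) pairs.

    Returns (extension, file_count, total_bytes) rows, largest total first.
    """
    totals = defaultdict(lambda: [0, 0])
    for path, size in files:
        ext = os.path.splitext(path)[1].lower() or "(none)"
        totals[ext][0] += 1
        totals[ext][1] += size
    rows = [(ext, count, size) for ext, (count, size) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def flag_policy_violations(files, banned_exts):
    """Return paths whose extension appears in `banned_exts` (lower case)."""
    return [path for path, _size in files
            if os.path.splitext(path)[1].lower() in banned_exts]


# Hypothetical example: personal media is not allowed on corporate shares.
BANNED = {".mp3", ".m4a", ".avi"}
```

An extension is only a hint about a file's real type, so a report like this is a starting point for review rather than proof of a violation.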
3. Re-tier your data
Make sure you understand the value of the data you're holding on to. IT organisations are often shocked to learn they can save millions through automated tiering of less-than-critical data. The sheer growth of data makes it critical to store unstructured data on the most cost-effective storage tier throughout its life cycle. Tiering criteria include the availability, security and reliability needed for those files. With the annual cost of owning Tier 1 storage running around $8,000 per terabyte, it's vital that only the data that is most critical to business be stored here.
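The arithmetic behind such savings is straightforward. The Python sketch below uses the $8,000 per terabyte annual Tier 1 figure quoted above; the lower-tier cost and the volume moved are hypothetical inputs you would replace with your own numbers.

```python
TIER1_COST_PER_TB = 8000  # annual cost of ownership in dollars, as quoted above


def annual_savings(tb_moved, lower_tier_cost_per_tb,
                   tier1_cost_per_tb=TIER1_COST_PER_TB):
    """Yearly saving from moving `tb_moved` terabytes off Tier 1."""
    return tb_moved * (tier1_cost_per_tb - lower_tier_cost_per_tb)


# Hypothetical example: 100 TB moved to a tier that costs $2,000 per TB a year.
print(annual_savings(100, 2000))  # 600000
```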
Once you've collected useful information such as last access dates, you can determine the "value" of individual files to the enterprise and move data to specific storage tiers based on that value. Once each file's value is determined, administrators can also establish and monitor data storage policies. These let them set up automated tiering that streamlines the process and helps ensure that only the most critical information is stored on pricey Tier 1 storage.
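A tiering policy driven by last access date can be expressed as a short ordered list of age thresholds. The Python sketch below is only an illustration: the tier names, the 90-day and 365-day cut-offs and the function names are all assumptions, and a real policy would also weigh the availability, security and reliability each file needs.

```python
SECONDS_PER_DAY = 86400

# Hypothetical policy: (maximum days since last access, tier), checked in order.
POLICY = [(90, "tier1"), (365, "tier2")]
DEFAULT_TIER = "archive"  # anything older than every threshold above


def assign_tier(last_access, now, policy=POLICY, default=DEFAULT_TIER):
    """Pick a tier from the time since `last_access` (both in epoch seconds)."""
    age_days = (now - last_access) / SECONDS_PER_DAY
    for max_days, tier in policy:
        if age_days <= max_days:
            return tier
    return default


def plan_moves(files, now, current_tier="tier1"):
    """files: iterable of (path, last_access) pairs held on `current_tier`.

    Returns (path, target_tier) for every file the policy would relocate.
    """
    moves = []
    for path, last_access in files:
        target = assign_tier(last_access, now)
        if target != current_tier:
            moves.append((path, target))
    return moves
```

The output of `plan_moves` is a plan, not an action. It could be reviewed by an administrator and then handed to a data mover or script of the kind mentioned earlier.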
4. Implement trending protocols
It's important to monitor the information you collect to see how your data grows and changes over time. Capturing the changing impact of data on your corporation - or trending - gives you a greater understanding of the resources the data is using. That knowledge enables you to set actionable policies to better manage that data.
Depending on the tools you use, you'll be monitoring your storage environments more often. Be sure to avoid tools that only aggregate file-level information without making it accessible to the user, as you can't trend on what you can't see. Imagine being able to aggregate file-level information and see trends in this data over time, rather than being limited to "point-in-time" views.
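As a rough illustration of moving from point-in-time views to a trend, the Python sketch below takes a series of dated capacity snapshots and estimates average daily growth and the time left before a volume fills. The function names and the straight-line growth assumption are illustrative choices; real growth is rarely this regular, so treat the projection as an early warning rather than a forecast.

```python
from datetime import date


def growth_per_day(snapshots):
    """snapshots: list of (date, total_bytes) pairs, sorted by date.

    Returns the average growth in bytes per day between first and last.
    """
    (first_day, first_bytes), (last_day, last_bytes) = snapshots[0], snapshots[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("need at least two snapshots taken on different days")
    return (last_bytes - first_bytes) / days


def days_until_full(snapshots, capacity_bytes):
    """Days until `capacity_bytes` is reached at the average growth rate.

    Returns None when usage is flat or shrinking.
    """
    rate = growth_per_day(snapshots)
    if rate <= 0:
        return None
    remaining = capacity_bytes - snapshots[-1][1]
    return max(remaining, 0) / rate
```

Running the same calculation per file type or per owner, using file-level records rather than a single total, would show which kinds of data are driving the growth.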
In short, know your data. Unbridled data growth leaves sensitive and non-corporate files lying around unmanaged, where they represent potential security risks. Identify duplicate end user files, their locations and owners so you can consolidate inactive files or old versions and remove duplicate files to free up storage resources for business-critical information.
If you follow these tips, bulky NAS volumes can finally shed some weight.
Rick Clark is president and CEO of APTARE, who supply storage and storage solutions for enterprises. While this vendor-written tech primer has been edited to eliminate product promotion, readers should note it will likely favour the submitter's approach.