Data Retention

Kent English, Director of IT at Doña Ana County

Too Much Information

How much information is too much information? At some point, maintaining more data than is actually required becomes an expensive burden in both time and money. Suppose you were storing an extra ten years of data and experienced a security breach. The exposure could be far greater once you account for the potential financial obligation, the damage to reputation, and the resource hours needed to unlock or recover the data. Or suppose you receive a public records request for all information on a particular subject. All releasable documents in your possession must be provided, even those kept beyond the legally required retention period. If all the information were stored digitally, the search wouldn’t take significantly longer, but a storeroom full of boxed records could present a problem. Even though computer storage is getting more affordable, it can still be expensive to keep adding capacity for growing historical data. Many older applications were not designed to purge data. They might support deleting individual records but provide no means to remove information in bulk based on an age threshold.

Data Retention

A key component of any Data Management Framework is a Data Retention Policy. The minimum retention period should be based on the legal and regulatory requirements that apply to your organization. Some industries have their own mandates, such as Government (FOIA), Healthcare (HIPAA), and Financial services (FINRA). Then consider the operational requirements of your business. What data is needed to service your customers, forecast budgets, predict growth, and satisfy compliance? Some departments may have a strong business case for keeping information longer, for example, to perform tax audits or fraud investigations. Sensitive data such as Personally Identifiable Information (PII) and payment card data covered by Payment Card Industry (PCI) standards may warrant a shorter retention period to minimize risk. All these factors should be carefully considered when defining the policy.

Data Location & Categorization

Once you have defined your retention policy, the next step is locating the data. It can be stored in many places, all of which need to be considered, such as:

• Desktop computer internal / external drives

• Network shared storage / folders

• Company cloud drives

• On-premises application servers (filesystems / databases)

• Cloud-based Software as a Service (SaaS) / Infrastructure as a Service (IaaS)

• Backups

After the data has been located, you will need to determine the type of data being stored. Categorizing the information will help map it to the Data Retention Policy that was previously defined. It will also be helpful to identify the owner of the data and who else has access to it.
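As a rough illustration of mapping categorized data to the retention policy, the sketch below records each located data set with its category and owner, then looks up the retention period for that category. The category names, retention periods, and the `DataAsset` structure are all hypothetical placeholders; real values have to come from your own legal, regulatory, and operational requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical categories and retention periods, for illustration only.
# Real values must come from your organization's retention policy.
RETENTION_DAYS = {
    "payroll": 7 * 365,
    "payment_card": 365,
    "general_correspondence": 3 * 365,
}


@dataclass
class DataAsset:
    """One located data set, recorded with its category and owner."""
    location: str
    category: str
    owner: str
    created: date


def disposal_date(asset: DataAsset) -> date:
    """Return the last day the asset is within its retention period."""
    return asset.created + timedelta(days=RETENTION_DAYS[asset.category])


def is_past_retention(asset: DataAsset, today: date) -> bool:
    """True once today is later than the asset's disposal date."""
    return today > disposal_date(asset)
```

An asset whose category is missing from the table raises a `KeyError`, which is one simple way to surface data that has not yet been mapped to the policy.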

Data Clean-Up

The retention policy should also include directions for disposing of data. As you apply the policy to the categorized data, you can start purging anything the policy no longer requires you to keep. Security matters in any process established for destroying data, so make sure the disposal method is compliant for each type of data processed. For ongoing maintenance, you can create rules or scripts that automatically delete data based on the defined retention periods. If your policy also defines where each type of data should be stored, it can help reduce redundant copies of information and improve security. Many modern storage systems also include deduplication features, so that only one copy is stored and subsequent copies are kept only as references to it. The same technique can be applied to common blocks of data, not just entire documents. Summarization can reduce the historical information retained to just the important facts. Why keep something like daily timecard punches when all you may need is the total number of hours worked for the pay period?
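A script for the automated clean-up mentioned above might look something like the following minimal sketch. It assumes the data lives as ordinary files in a folder and that a file's last-modified time is an acceptable stand-in for its age; both are assumptions, and the function names are made up for this example. It defaults to a dry run that only reports what would be deleted. Note that a plain file delete is not secure destruction, so sensitive data may need a disposal method that meets the relevant compliance rules.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_expired_files(root, max_age_days, now=None):
    """List files under root whose last-modified time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def purge(root, max_age_days, dry_run=True, now=None):
    """Delete expired files under root; with dry_run=True, only report them."""
    expired = find_expired_files(root, max_age_days, now)
    for path in expired:
        if not dry_run:
            path.unlink()  # ordinary delete, not a secure wipe
    return expired
```

Running `purge(folder, 365)` first and reviewing the returned list before calling it again with `dry_run=False` gives the data owner a chance to confirm nothing is removed by mistake.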