Dark Data: Out of Sight, Out of Mind

Elizabeth Swabey-Keith · Wednesday, March 23rd, 2022

Businesses are creating and collecting more data than ever. Far too much of this data is saved and stored (on file shares, network drives, and cloud services) “just in case” it is needed later. Shockingly, upwards of 70 percent of this data holds no business value. The saying “out of sight, out of mind” means that if we don’t see something for a period of time, we stop thinking about it. The same holds true for the data organizations collect, store, and then inevitably forget about. This stored, forgotten data is known as “dark data”.

What is dark data and why is it a problem?

Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.” Organizations often collect vast amounts of data, either intending to use it later or for compliance purposes. But because it isn’t classified, most of it cannot be searched for and ends up lost and forgotten. Surprisingly, dark data accounts for the majority of most organizations’ information assets, and much of it is redundant, obsolete, or trivial.

While holding onto this type of data may seem harmless, it can lead to problems such as:

Security breaches – Since dark data is unclassified by nature, any employee potentially has access to it. This opens up the risk of users, either accidentally or maliciously, leaking sensitive information.
Compliance violations – Holding onto data for too long or not being able to find it when required can put an organization in violation of privacy regulations.
Excess storage costs – Dark data wastes space and maintenance effort if it is stored on internal servers, or wastes bandwidth if it is stored in the cloud.

Therefore, what you don’t know can hurt you! Holding onto dark data means the risk of a data breach is always present. Data breaches are hugely damaging to an organization, with big fines and business reputation at stake. The damage may be amplified if the breached data includes dark data, especially information that should already have been deleted.

In addition, regulators and governing bodies are tightening the time period in which organizations must report data breaches. For example:

Both the GDPR and the ICO require organizations to report a breach within 72 hours of becoming aware of it
The SEC has proposed an amendment to require public companies to report material cybersecurity incidents within four business days
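
As a rough illustration of how tight the 72-hour window is, the reporting deadline can be computed from the moment an organization becomes aware of a breach. This is only a minimal sketch: the names `gdpr_report_deadline` and `hours_remaining` are hypothetical helpers invented for this example, and the only figure taken from the text above is the 72-hour limit.

```python
from datetime import datetime, timedelta

# The 72-hour figure comes from the GDPR requirement mentioned above.
# Everything else in this sketch is an illustrative assumption.
GDPR_REPORTING_WINDOW = timedelta(hours=72)


def gdpr_report_deadline(aware_at: datetime) -> datetime:
    """Latest time a breach can be reported, given when it was discovered."""
    return aware_at + GDPR_REPORTING_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once the deadline has passed)."""
    return (gdpr_report_deadline(aware_at) - now).total_seconds() / 3600
```

If the investigation first has to work out what the leaked data even was, much of that window can be gone before reporting starts.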

Resolving and reporting a data breach is a huge undertaking in itself. When the breach involves dark data, so that nothing is known about how it happened, what data was leaked, or where it went, the task is practically impossible, especially within the time allowed.

When to get rid of data vs when to hold onto data

Generally, data should be kept only while it is actively being used and deleted once it is no longer in use. However, many industries, such as finance, health, and government, require data to be kept for a certain amount of time for compliance purposes. While organizations may think that simply storing the data will keep them compliant, there are also regulations governing when data must be deleted.

How long data should be kept before being deleted, and under what circumstances, varies with the regulations of each state or country. For instance, under the GDPR, individuals’ data must not be kept for longer than it is needed, and the reason for keeping it must be justified. Individuals also have the “right to be forgotten”, meaning all data pertaining to that individual must be erased when requested. There is a fine line to walk here: an organization can be in violation if it fails to hold onto data required for compliance, but it can also be in violation if it keeps data longer than needed.

Identify and classify data that needs to be kept

The best way to meet compliance regulations and prevent data breaches is to use a data classification solution to identify and classify dark data so it isn’t dark anymore. When classifying data, in addition to labeling it and deciding who can access it, organizations can also set data retention policies to ensure certain categories of data are automatically deleted after a specified date. This is especially helpful for compliance purposes, as it ensures that data is kept only until it is no longer needed, meaning compliance regulations can be met without user involvement. Data classification solutions can also identify and classify data uploaded to the cloud, enabling organizations to track their data wherever it goes.
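
To make the idea of label-driven retention concrete, here is a minimal sketch of how a retention policy might be evaluated once data has been classified. It does not represent how any particular classification product works. The labels, the retention periods in `RETENTION_DAYS`, the `DataItem` type, and both function names are assumptions made up for this illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical labels and retention periods, chosen only for illustration.
# Real periods depend on the regulations that apply to the organization.
RETENTION_DAYS = {
    "financial-record": 7 * 365,
    "customer-pii": 2 * 365,
    "trivial": 90,
}


@dataclass
class DataItem:
    path: str
    label: Optional[str]  # None means unclassified, i.e. still "dark"
    created: date


def due_for_deletion(items: List[DataItem], today: date) -> List[DataItem]:
    """Return classified items whose retention period has passed."""
    due = []
    for item in items:
        if item.label is None:
            continue  # dark data needs classifying first, not blind deletion
        keep_for = timedelta(days=RETENTION_DAYS[item.label])
        if today >= item.created + keep_for:
            due.append(item)
    return due


def still_dark(items: List[DataItem]) -> List[DataItem]:
    """Return items that have no label yet and so fall outside every policy."""
    return [item for item in items if item.label is None]
```

Note that the unlabeled items are the ones no policy can act on, which is why classification has to come before retention rules can do any good.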

Data is a company’s greatest asset, but also its greatest risk. By using data identification and data classification solutions, organizations can separate the assets from the liabilities, while helping to define what and where their data is, who has access to it, and how to protect it.

This post was first published on the Titus website by Elizabeth Swabey-Keith.