Data at rest versus data at creation: It’s not a choice you should make


Jamie Manuel · Thursday, May 28th, 2020

As part of your defense against an external cyberattack, you’re ready to tackle the enormous task of securing all the data sitting on your servers, desktops and external drives.

If you’re like most organizations, you’ve got tons of it because no one throws anything away these days. And if you’re working at an enterprise, it’s an exponentially larger job.

Protecting the data at rest within your organization is definitely an important step in an overall security strategy, but it’s not necessarily the most critical.

Consider how many employees you have in your organization.

On any given day, how many documents and emails does each employee create?

And of those, how many contain either sensitive personal information belonging to your employees and customers or proprietary business data?

And how much of that information is being sent outside of your organization’s walls — either on purpose or accidentally?

The ever-growing mountain of data at rest

For many organizations, the idea of classifying data at rest might seem easier than trying to classify data in motion. But we are creating new data at a mind-boggling rate.

Every day we create more than we did the day before.

Your organization will never catch up if you only focus on data at rest.

Applying security and privacy policies to data at creation and in motion requires getting all employees on board and training them to use your data security technologies, which might feel daunting.

Indeed, training your users in best practices can take time. They need to understand why information privacy and security are important. They need to learn your data privacy policies and be able to apply them on the fly.

A lot of planning must go into launching a data classification strategy that considers data at creation as well as data at rest.

Identify the requirements for your business

To begin, business leaders and stakeholders from across your organization need to establish information handling guidelines and roll out communications to educate users.

If you are going to implement data identification and classification technologies, you need to understand what your business requirements are and how your various security technologies can help.

What are the levels of classification you might need to consider?

Public?
General business?
Confidential?
Internal only?

Every business — and often every department within an organization — will have unique needs, so labels cannot be one-size-fits-all.

Your users need to understand which labels apply in a range of situations.
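To make the idea of a label set concrete, here is a minimal sketch of how a classification scheme might be written down. The ordering of the labels, the department names and the per-department subsets are all assumptions for illustration; your own guidelines would define them.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical labels, ordered from least to most restrictive (assumed order)."""
    PUBLIC = 0
    GENERAL_BUSINESS = 1
    INTERNAL_ONLY = 2
    CONFIDENTIAL = 3


# Assumed per-department subsets: not every team needs every label.
DEPARTMENT_LABELS = {
    "marketing": {Classification.PUBLIC, Classification.GENERAL_BUSINESS},
    "finance": {Classification.INTERNAL_ONLY, Classification.CONFIDENTIAL},
}


def allowed_labels(department):
    """Return the labels a department may apply, least restrictive first.

    Departments without an entry fall back to the full label set.
    """
    labels = DEPARTMENT_LABELS.get(department, set(Classification))
    return sorted(labels)


def most_restrictive(labels):
    """Pick the label that should win when content mixes several levels."""
    return max(labels)
```

Keeping the labels ordered means a document that mixes public and confidential material can simply take the highest label present.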

Identify your data

Data security is also getting more complicated. Rather than simply locking down all of the data at rest within your organization, you need to identify sensitive information and then classify and protect it according to your specific policies.

Some information needs more protection than other information, and some doesn’t need to be classified at all.

New privacy regulations in some regions allow individuals to request to see all of the information your company holds about them. Under the “right to be forgotten,” they can also request that all of their information be deleted within 30 days.

That data may be secured on your servers, but you may not be able to quickly find it for deletion if it hasn’t been identified and labeled as sensitive.

Ultimately, you need to address the data that’s resting in storage, new data that’s being created day to day and all data as it moves within and outside of your organization.

To do all of this, you need integrated technologies that work together to apply your security and privacy policies.

What types of technologies can help?

Data identification technologies enable you to define the data at rest within your organization, identify who has access to it and understand how to protect it. These technologies examine and automatically classify your files.

File analysis capabilities further gather details about each file to build a data inventory you can use for analytics across your organization.

The most robust solutions also allow you to apply classification metadata to enhance the ability of your encryption, data loss prevention, electronic rights management, and other security technologies to apply the appropriate controls.

When implementing a data classification system, you need to ask yourself: Which information is most sensitive, and why?

With data at rest, it’s not uncommon for users to over-classify files just to be on the safe side.

The problem is that when someone later wants to open a document that’s been classified as Highly Confidential and encrypted, they may not be able to open it. At that point, the information loses its value completely because it can’t be shared or used.

If your organization previously over-classified your information, data identification and file analysis can help you evaluate whether your data is classified at the appropriate level.

A first quick scan of your data using identification technologies provides high-level details such as how many documents you have resting within your organization, what types they are, when they were created and modified, who owns each one and how big the files are.
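As a rough illustration of what a first pass collects, the sketch below walks a folder and records basic facts about each file without opening it. It is not how any particular product works; the function names are made up for this example, and file owner and creation time are left out because they are reported differently on each operating system.

```python
import os
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def quick_scan(root):
    """Walk `root` and record path, type, size and last-modified time per file."""
    inventory = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            info = path.stat()
            inventory.append({
                "path": str(path),
                "type": path.suffix.lower() or "(none)",
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
            })
    return inventory


def summarize(inventory):
    """Count files and total bytes for each file type."""
    counts = Counter(item["type"] for item in inventory)
    sizes = Counter()
    for item in inventory:
        sizes[item["type"]] += item["size_bytes"]
    return {ftype: {"files": counts[ftype], "bytes": sizes[ftype]} for ftype in counts}
```

Calling `summarize(quick_scan("/some/share"))` (the path is a placeholder) gives a per-type overview you can use to decide where to look more closely.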

A second scan using file analysis offers a deeper level of detail to help you understand where your most sensitive information resides. This level of scanning for privacy takes more time, so you have to use a more targeted approach.

Instead of scanning all of your data at this deeper level, you can use the information gleaned from your first scan and break your data down into manageable chunks, according to file type or other parameters.

You may not need to run a deeper privacy scan on all of your data. For example, you wouldn’t need to scan files containing only JavaScript source code but you might want to scan all Excel spreadsheets.
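One way to picture the targeted approach: take the file paths from the first scan and keep only the types worth a deeper look, grouped into batches by type. The two sets of extensions below are assumptions chosen to match the example above; your own list would come from your first-scan results.

```python
from pathlib import PurePath

# Assumed policy: file types that get the slower privacy scan.
DEEP_SCAN_TYPES = {".xlsx", ".xls", ".csv", ".docx", ".pdf"}
# Assumed policy: types judged unlikely to hold personal data.
SKIP_TYPES = {".js"}


def select_for_deep_scan(paths, deep_types=DEEP_SCAN_TYPES, skip_types=SKIP_TYPES):
    """Group paths into per-type batches, dropping skipped and unlisted types."""
    batches = {}
    for p in paths:
        suffix = PurePath(p).suffix.lower()
        if suffix in skip_types or suffix not in deep_types:
            continue
        batches.setdefault(suffix, []).append(p)
    return batches
```

Each batch can then be scheduled separately, so the slower scan runs over manageable chunks rather than the whole estate at once.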

Use deep-learning technologies to discover privacy data

Best-of-breed file analysis solutions use the latest deep learning technologies to discover personally identifiable information (PII) within unstructured data, enabling you to protect sensitive files and emails.

These intelligent data security technologies enable you to categorize your data more accurately by scanning individual words as well as the context surrounding them.

When sending an email that contains PII, these types of solutions can warn users before the email is sent. Users can then either send it anyway, send it with encryption, classify the email as Confidential (if you also have a classification solution in place), or remove PII from the email before sending it.
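The four choices described above can be sketched as a small decision function. Note the swap: the detector here uses two simple regular expressions as a stand-in, not the deep-learning models these solutions use, and every name in it is hypothetical.

```python
import re
from enum import Enum

# Stand-in detectors: simple patterns, NOT a deep-learning model.
PII_PATTERNS = {
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


class Action(Enum):
    SEND_ANYWAY = "send_anyway"
    ENCRYPT = "encrypt"
    CLASSIFY_CONFIDENTIAL = "classify_confidential"
    REMOVE_PII = "remove_pii"


def find_pii(body):
    """Return the kinds of PII the stand-in patterns find in the text."""
    return {kind for kind, pattern in PII_PATTERNS.items() if pattern.search(body)}


def redact(body):
    """Replace each match with a placeholder naming what was removed."""
    for kind, pattern in PII_PATTERNS.items():
        body = pattern.sub(f"[{kind} removed]", body)
    return body


def handle_outgoing(body, choice):
    """Apply the user's choice to an outgoing message; returns (body, flags)."""
    flags = {"encrypted": False, "label": None}
    if not find_pii(body):
        return body, flags  # nothing detected, so no warning is needed
    if choice is Action.ENCRYPT:
        flags["encrypted"] = True
    elif choice is Action.CLASSIFY_CONFIDENTIAL:
        flags["label"] = "Confidential"
    elif choice is Action.REMOVE_PII:
        body = redact(body)
    return body, flags
```

In a real deployment the warning would appear in the mail client before sending; here the user's choice is simply passed in as an argument.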

A comprehensive solution offers both sophisticated data identification and classification capabilities, including a highly flexible policy engine driven by machine learning and extensive security ecosystem integration.

You get the flexibility to tailor a data classification solution to meet your organization’s specific needs, both in terms of the underlying schema for how the technology is built and in terms of policy customization.

These types of solutions give your data context to help your people and systems understand how to handle all of your information. Metadata attributes — with unlimited detail and contextualization — are embedded into email, documents and files at every stage of the content life cycle.

Based on a flexible, customizable classification metadata schema, some of these solutions can also automatically add visual markings to documents to help you meet compliance and legal requirements.

A key benefit of applying rich and persistent metadata is that it enables many other security technologies to trigger information handling actions based on your policies, making your entire security ecosystem more accurate and effective.
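To show how persistent metadata can drive other tools, here is a minimal sketch in which a downstream system decides what to do by reading only a document's classification label. The policy table, action names and `Document` type are assumptions for illustration, not any vendor's schema.

```python
from dataclasses import dataclass, field

# Assumed policy table: label -> handling actions for downstream tools.
POLICY = {
    "Public": [],
    "General business": ["log"],
    "Internal only": ["block_external_recipients"],
    "Confidential": ["encrypt", "block_external_recipients", "add_visual_marking"],
}


@dataclass
class Document:
    name: str
    metadata: dict = field(default_factory=dict)


def classify(doc, label):
    """Attach a classification label to the document's metadata."""
    if label not in POLICY:
        raise ValueError(f"unknown label: {label}")
    doc.metadata["classification"] = label
    return doc


def handling_actions(doc):
    """Return what a downstream tool would do, based only on the metadata."""
    label = doc.metadata.get("classification")
    # Unlabeled or unrecognized documents fall back to asking the user.
    return list(POLICY.get(label, ["prompt_user_to_classify"]))
```

Because the label travels with the document, encryption, data loss prevention and rights management tools can all consult the same value instead of each re-inspecting the content.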

This post was first published on the Titus website by Jamie Manuel.