Data Masking is Becoming a Must for Many Organizations

ODSC - Open Data Science
5 min read · Feb 2, 2024

The data masking market is estimated to be worth $435 million by 2025, growing at a compound annual growth rate of 15 percent (forecast period: 2020–2025). This growth is largely driven by the emergence of data privacy laws and regulations in different parts of the world. Even the training of AI models, which consumes massive amounts of data, is attracting the scrutiny of policymakers and regulators.

Data masking is one of the key solutions that let organizations keep using data without violating privacy and confidentiality requirements. As the need for data privacy grows, new data obfuscation technologies and products will inevitably emerge.

Anonymizing data

Data masking refers to the process of transforming data into a version that does not reveal actual details. It entails the creation of a fake but realistic and functional version of data to protect sensitive information. The data is made accessible to a broad range of users or the public in general, but it has been morphed to avoid revealing real-world information.

Names, values, and other details that can be traced to real people, organizations, products, or events are modified while retaining their statistical relevance. Their formats are maintained, but the specifics are transformed in ways that cannot be reverse-engineered or decoded.

Data masking involves several techniques. One is character shuffling, in which the characters in a string are randomly rearranged. Another is substitution, the replacement of some or all characters according to a predefined scheme. Encryption may also be employed, with the data ciphered and decrypted only when needed by authorized users. Data may also be nulled out or otherwise made unreadable in certain situations. Moreover, data can be made anonymous through value variance (replacing each value with the output of a function applied to it) and pseudonymization.
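As a rough illustration, a few of these techniques could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a production masking tool: the function names (`shuffle_chars`, `substitute_chars`, `null_out`, `vary_value`) and the 10 percent variance bound are invented for this example, and the substitution scheme shown (random replacement within the same character class) is only one of many possible schemes.

```python
import random
import string


def shuffle_chars(value: str, rng: random.Random) -> str:
    """Character shuffling: randomly rearrange the characters of a string."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


def substitute_chars(value: str, rng: random.Random) -> str:
    """Substitution: swap each ASCII letter or digit for a random one of the
    same class, so the format (length, case, separators) is preserved."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isascii() and ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep separators and other symbols as they are
    return "".join(out)


def null_out(value):
    """Nulling out: discard the value entirely."""
    return None


def vary_value(value: float, rng: random.Random, max_pct: float = 0.10) -> float:
    """Value variance: shift a number by a random amount within +/- max_pct.
    The 10 percent default is an assumption chosen for illustration."""
    return round(value * (1 + rng.uniform(-max_pct, max_pct)), 2)


if __name__ == "__main__":
    # Note: `random` is not a cryptographically secure source of randomness.
    rng = random.Random()
    print(shuffle_chars("Alice Smith", rng))
    print(substitute_chars("AB-1234 xy", rng))
    print(null_out("secret note"))
    print(vary_value(52000.0, rng))
```

The masked outputs keep the shape of the originals (same length, same separators, a salary in a plausible range), which is what lets them stand in for real data in the use cases discussed later.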

Use cases

Data masking is employed in situations where real data is not needed. The data is transformed into an anonymous form but remains representative of real-world situations. In some cases, data is dynamically obscured or redacted so that users can see the context of the data presented but cannot view the actual values.
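To make the idea of dynamic redaction more concrete, here is a minimal Python sketch. It assumes a hypothetical record with `card` and `email` fields and a simple `authorized` flag; real systems typically enforce this at the database or API layer, and the helper names below are invented for illustration.

```python
def redact_card(number: str, visible: int = 4) -> str:
    """Hide every digit except the last `visible` ones, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in number)
    to_hide = max(total_digits - visible, 0)
    out = []
    for ch in number:
        if ch.isdigit() and to_hide > 0:
            out.append("*")
            to_hide -= 1
        else:
            out.append(ch)
    return "".join(out)


def redact_email(email: str) -> str:
    """Keep the first character of the local part and the domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not shaped like an email address: mask everything to be safe.
        return "*" * len(email)
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def view_record(record: dict, authorized: bool) -> dict:
    """Return the record as a given viewer should see it.
    The stored record is never modified; masking happens at read time."""
    if authorized:
        return dict(record)
    masked = dict(record)
    masked["card"] = redact_card(record["card"])
    masked["email"] = redact_email(record["email"])
    return masked


if __name__ == "__main__":
    record = {"card": "4111-1111-1111-1234", "email": "jane.doe@example.com"}
    print(view_record(record, authorized=False))
```

An unauthorized viewer still sees that the field holds a card number and an email address, which preserves context, but not the values themselves.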

AI development: There is a need for synergy between AI and data privacy. AI developers cannot cavalierly use whatever data they can find to train their models. They need data masking to ensure compliance with existing and upcoming regulations on data use.

Training and education: It is possible to train employees or impart knowledge to students without sharing sensitive data. Data about consumers, business secrets, products, and various other entities can be masked without altering their underpinning characteristics and relevance.

Product development and testing: Companies use consumer data to come up with new products. It would be imprudent to expose consumer data in the process of developing and testing products, so obfuscation is a must.

Analytics and business intelligence: Businesses collect huge amounts of data to be used in spotting trends, forecasting outcomes, and supporting decision-making. These data may include personally identifiable information (PII), hence they should be masked effectively.

Sales demonstrations and pitches: Presenting sales demos or investment pitches entails the presentation of a wide range of data to potential clients or investors. Data masking enables the presentation of obfuscated but realistic and sensible data without exposing sensitive information.

Outsourcing and collaboration: Outsourcing jobs to third parties is a common practice among businesses nowadays. To avoid sharing confidential information, it is advisable to implement data masking. Doing so also reduces the opportunities threat actors have to cause data breaches.

Many organizations are involved in at least one of these use cases. Some, especially software developers, routinely undertake product development and testing. Most businesses conduct regular employee training. Many organizations also regularly hold sales demonstrations and investment pitches and perform business analyses. In all these situations, data masking provides a convenient way to ensure that sensitive information is not shared unnecessarily.

Regulatory and legal requirements

If the above-mentioned use cases are not enough to convince organizations to use data masking solutions, here is something that will: data laws and regulations. By far, the most compelling reasons for data masking are the regulations and laws concerning data.

In Europe, for example, data use is governed by the General Data Protection Regulation (GDPR). This law does not explicitly state that data masking or other similar technology is required. However, it stresses the importance of data privacy and protection. As such, organizations that handle personal data are obligated to put appropriate technical and organizational measures in place to prevent the exposure of private information.

GDPR also emphasizes data minimization, the principle that organizations should use only the personal data they need for a specific purpose. Data masking helps in this regard by limiting the exposure of private or sensitive data. In addition, GDPR has provisions on the security of personal data processing and on pseudonymization. Both are challenges that data masking can help ease.
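Pseudonymization can be implemented in several ways, and GDPR does not prescribe a specific one. The sketch below shows one common approach, keyed hashing with HMAC-SHA256 from the Python standard library. The function name `pseudonymize`, the 16-character token length, and the example key are assumptions made for illustration; in practice the key would be generated securely and stored separately from the pseudonymized data.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 16) -> str:
    """Replace an identifier with a stable token derived from a secret key.

    The same identifier and key always give the same token, so records can
    still be joined and counted. Without the key, the token cannot easily be
    linked back to the original identifier.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


if __name__ == "__main__":
    # Example key only; a real key should come from a secrets manager.
    key = b"example-key-kept-separately"
    print(pseudonymize("jane.doe@example.com", key))
    print(pseudonymize("jane.doe@example.com", key))  # same token as above
    print(pseudonymize("john.roe@example.com", key))  # different token
```

Because the token is stable, analysts can still group records by customer without ever seeing who the customer is.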

Meanwhile, in the United States, several laws point to the significance of technologies like data masking. The Health Insurance Portability and Accountability Act (HIPAA), for instance, has provisions that indicate the applicability of data masking. The de-identification standard (45 CFR 164.514(a)-(b)) and the Security Rule requirements (45 CFR Part 164, Subpart C), in particular, require the use of techniques to ensure the anonymity of shared data, for which data masking is a convenient option.

The United States is also working toward AI regulation. The White House has already published a non-binding “Blueprint for an AI Bill of Rights,” which calls for individuals to have control over their data privacy and for organizations that collect and store personal information to handle it responsibly. Organizations can turn to data masking to comply with the potential requirements of upcoming regulation.

Again, existing and proposed regulations do not expressly mention data masking or other specific data protection methods or technologies. However, it is quite clear that data masking has a big role to play in meeting regulatory requirements for data privacy and security.

Growing importance of data masking

The market for data masking solutions is still in its fledgling phase, and more innovative solutions can be expected to emerge as data privacy and security needs evolve. Notably, the organizations that need data masking are not limited to those traditionally associated with high levels of data handling, such as AI companies and server operators. Even small online stores will eventually have to employ data masking solutions as they perform tasks such as business analysis, sales pitches, and employee training. Data masking is becoming increasingly important for organizations of all sizes across different industries.

About the author: Hazel Raoult is a freelance marketing writer and works with PRmention. She has 6+ years of experience in writing about business, entrepreneurship, marketing, and all things SaaS. Hazel loves to split her time between writing, editing, and hanging out with her family.

Originally posted on OpenDataScience.com
