[INDUSTRY BLOG] Preventing a Data Swamp: How RIM Best Practices Can Shape a Data Lake for Success
You have the data; now it’s time to make it valuable to your business. In Part One of this blog, we discussed how many organizations are turning to Big Data and data mining tools to gain strategic insights into their business processes, consumers, products and more. Although data warehouses can provide immense value to your business, data lakes offer more in almost every aspect: more space, more speed, more usability, more answers, more value. Unfortunately, many organizations look at data lakes as a simple fix. Surprise: they’re not!
If you are a RIM professional, it’s probably no bombshell that organizational structure is vital to the success of a data lake, yet many organizations still dive into building one without a clear plan in place. To implement a data lake effectively and start gaining value from their information, organizations must take strategic steps to create a system that works for them.
Records managers already have the skills and experience needed to develop and implement a comprehensive content management procedure that creates consistency, clarity and convenience, which makes them potential key contributors to the development of a successful data mining program.
Planning is Essential
If your organization is considering data mining with a data lake, remember these RIM best practices:
Data does not organize itself.
While software makes it possible to organize large amounts of data, you must first understand how that data needs to be managed and then apply that understanding within the environment. Start by identifying a group of questions that you may want to ask, then build or find the tools that will allow you to answer those questions. As you add new data or discover new questions through analysis, you can create new subsets of questions and index and sort the data according to value and importance.
Management is key.
The same rules and regulations that apply to specific documents and information in the real world also apply in a data lake. To ensure compliance across industries and locations, you must know how to project the appropriate controls onto your data lake. This includes protecting personally identifiable information (PII), enforcing the privacy regulations specific to the data’s country of origin and making regulatory reports to the appropriate party on any pressing information gleaned from your data.
In addition to monitoring and complying with regulations, you will need to set up security and access controls for your data lake. Data lakes provide shareable information and flexible, collaborative analysis, but you must still put in place, enforce and audit privacy controls, access controls and protections for PII and company-sensitive information.
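To make "put in place, enforce and audit" a little more concrete, here is a minimal Python sketch, not a production design. It assumes records are plain dictionaries, and the field names, the role name and the `read_record` helper are all hypothetical, invented purely for illustration. Every read is logged, and PII fields are masked unless the reader's role is allowed to see them.

```python
from dataclasses import dataclass, field

# Hypothetical field and role names, chosen only for illustration.
PII_FIELDS = {"name", "email", "ssn"}
ROLES_WITH_PII_ACCESS = {"privacy_officer"}


@dataclass
class AuditLog:
    """Keeps one entry per read so access can be reviewed later."""
    entries: list = field(default_factory=list)

    def record(self, user, role, record_id, pii_visible):
        self.entries.append(
            {"user": user, "role": role, "record_id": record_id, "pii_visible": pii_visible}
        )


def read_record(record, user, role, audit):
    """Return a copy of the record, masking PII unless the role may see it."""
    pii_visible = role in ROLES_WITH_PII_ACCESS
    audit.record(user, role, record["id"], pii_visible)  # audit every access
    if pii_visible:
        return dict(record)
    return {key: ("***" if key in PII_FIELDS else value) for key, value in record.items()}
```

An analyst calling `read_record` would get the non-sensitive fields as-is with the PII fields replaced by `***`, while the audit log still records who asked for which record.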
Out with the old, in with the new.
Just like physical records, most data does not have indefinite value. Over time, all data gets stale. Plus, the storage space in a data lake, while fairly efficient, is neither unlimited nor free. Retention rules are vital to optimizing the value of your data lake, as well as the value of your results. Setting up retention schedules allows you to discard older information that could skew the results of your analysis. While this concept and practice are well known to RIM leaders, not all parts of the organization have experience with, or an understanding of, the importance of managing the information lifecycle. At the same time, some data may have significant longitudinal value even past a typical retention schedule. Business context and value will become the shared language for deciding on the retention model for different data sets and types.
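A retention schedule can be expressed quite simply in code. The Python sketch below is illustrative only: the data types, the retention periods and the `apply_retention` helper are assumptions made up for this example, not recommendations. Each data type gets a retention period in days, and a period of `None` marks data that is kept indefinitely for its longitudinal value.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days; None means "keep indefinitely".
RETENTION_DAYS = {
    "web_logs": 365,
    "invoices": 7 * 365,
    "customer_surveys": None,  # retained for longitudinal analysis
}


def is_expired(data_type, created, today):
    """True if a data set of this type has outlived its retention period."""
    days = RETENTION_DAYS[data_type]
    if days is None:
        return False
    return today - created > timedelta(days=days)


def apply_retention(datasets, today):
    """Split data sets into those to keep and those due for disposal."""
    keep, discard = [], []
    for dataset in datasets:
        if is_expired(dataset["type"], dataset["created"], today):
            discard.append(dataset)
        else:
            keep.append(dataset)
    return keep, discard
```

Keeping the schedule in one table like `RETENTION_DAYS` gives RIM, IT and data science teams a single place to review and agree on how long each kind of data should live.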
Data mining allows organizations to collect and gain insights from many different sources, insights that could not be gained from any one source. But in order to get big value from your Big Data, you have to do it the right way.
While data lakes offer a more flexible, robust solution than data warehouses, you still must organize your data to achieve the value you seek. Careful planning is essential to your success.
Traditionally, RIM leaders have been responsible for managing the lifecycles of all physical documents within the organization. In today’s Information Age, records managers must continue to adapt their strategies. Physical documents are certainly not dead, and they could be a missing element of your Big Data strategy.
This is a great time for RIM leaders to engage IT and data science teams and work to unlock and leverage the data that exists in key physical documents. By embracing digital transformation, RIM leaders can consolidate information into one central repository, improve document access and retrievals, easily share files as needed, convert physical documents into valuable sources of information and remain competitive.