AI Series – Part 3: Data Collection and Preparation
3) Data Collection and Preparation
From Data Analysis we come to Data Collection and Preparation. First you need to collect the data and bring it into a central repository, with the goal of developing a unified database. Data unification is our fundamental challenge here.
Before we dive in, let’s begin with a few quotes from Vin Vashishta’s new book, ‘From Data to Profit’: “There is no finish line” and “Continuous transformation is a framework to manage the evolutionary nature of technology.” This mindset directly supports the concept of creating antifragile systems that evolve and become stronger over long periods of time. By doing the work for the long game you can uncover trends with customers, industries and assets, see how things are evolving, and get a view of where they are heading. So how are you going to set up your company’s transformation framework to handle all the new data requirements? How is it going to become stronger over time as your company and the industry evolve? (Note: CoE and Governance.)
Some years ago I sat in a room for a week with a team of analysts pulling, transforming and aggregating data from 7 disparate systems. On another project we tackled large SAP data sets with strict medical governance challenges that lasted the length of the project. This is heavy lifting, and it is never an easy task regardless of the quantity or quality of your data. Take a deep breath and prepare your teams for some serious work.
Data Collection:
I really enjoy this part of the process. It’s heads down, straight into all the systems to begin process value mappings. I like to think of it as detective work. I typically work forwards and backwards to identify the systems so I know what I’m dealing with and the process lifecycles they map to. Here is a key takeaway: this is the time to make things better by eliminating and streamlining processes and balancing out the org as a unit (MVC). Ex. if you see a heavy object (too many actions), you may want to drop some of those actions lower to disperse the weight and drive quicker value per request. If a process or model has too many connections or touch points, reduce them. So many examples here.
This is the time to ask a lot of questions, but I like to have 5-7 that I repeat daily with my clients. Here are a few I use to make sure we are looking at things in the best manner:
– Is there a better way of handling this information or executing this action?
– Why are we doing it this way?
– Why is this important to you? (*my favorite)
– Can we automate this process?
Go to the drawing board and really challenge the current design.
Data Preparation:
Clean the data
– Ex. no duplicates.
– Delete the bottom low-value records, processes, triggers, etc. Clean house! Aim to remove 10-30% of them (i.e. those unused in 2-3 years), then streamline the associated processes and tables. Salesforce provides a lot of support for these actions.
– Tools: Trifacta, OpenRefine, DataRobot
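The cleanup steps above (removing duplicates and setting aside records unused for a few years) can be sketched in plain Python. This is only an illustration under stated assumptions: the `email` and `last_used` fields, the `clean_records` helper and the 3-year threshold are hypothetical, and a real cleanup would more likely run in one of the tools listed above or in the source platform itself.

```python
from datetime import date


def clean_records(records, today, max_idle_years=3):
    """Drop duplicate records and split out stale ones for review.

    Assumption: each record is a dict with hypothetical 'email' and
    'last_used' fields; the first occurrence of an email wins.
    """
    seen = set()
    kept, stale = [], []
    for rec in records:
        # Normalize the key so 'ANA@example.com ' matches 'ana@example.com'.
        key = rec["email"].strip().lower()
        if key in seen:
            continue  # duplicate: skip it
        seen.add(key)
        idle_days = (today - rec["last_used"]).days
        if idle_days > max_idle_years * 365:
            stale.append(rec)  # candidate for archiving or deletion
        else:
            kept.append(rec)
    return kept, stale


records = [
    {"email": "ana@example.com", "last_used": date(2024, 1, 10)},
    {"email": "ANA@example.com ", "last_used": date(2023, 5, 2)},
    {"email": "bo@example.com", "last_used": date(2019, 3, 1)},
]
kept, stale = clean_records(records, today=date(2024, 6, 1))
```

Returning the stale records instead of deleting them outright keeps a review step in the loop, which tends to matter when governance rules apply to the data.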
Apply Standardized Data Protocols (*CoE)
– Your new Data Mgmt CoE should already have worked out how to execute and automate as much of this task as possible, covering governance, protocols, standards, tools, team and execution owners, and processes, to name a few.
Transform the data.
– Ex. date and time transformations.
– Metadata please…
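As a rough sketch of the date and time transformation mentioned above, the snippet below converts timestamps from a few source formats into one ISO 8601 form and keeps metadata about where each value came from. The format list, the `normalize_timestamp` helper and the system name are hypothetical, and the code assumes the source values are already in UTC, which would need to be confirmed per system.

```python
from datetime import datetime, timezone

# Hypothetical formats; in practice, list the ones your source systems emit.
KNOWN_FORMATS = ["%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M", "%d.%m.%Y"]


def normalize_timestamp(raw, source_system):
    """Return the value in ISO 8601 plus metadata describing its origin."""
    metadata = {"source_system": source_system, "raw": raw, "source_format": None}
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # not this format, try the next one
        metadata["source_format"] = fmt
        # Assumption: the source value is already UTC.
        value = parsed.replace(tzinfo=timezone.utc).isoformat()
        return {"value": value, "metadata": metadata}
    # Unparseable values are returned as None so they can be reviewed.
    return {"value": None, "metadata": metadata}


result = normalize_timestamp("03/15/2024 14:30", "crm")
```

Keeping the raw value and the matched format next to the normalized one makes it possible to audit or redo the transformation later.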
Aggregate:
Bring it all together to identify how users like Field Techs, Sales, Service and Marketing can get the data they need to service their customers. The aim is getting the right product or part to the right customer or asset at the right time and in the right way (skills, tools, dynamic checklists, etc.) for a one-and-done success rate. One brain, many fingers working in unison.
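One minimal way to picture the aggregation step is to merge rows from several systems into a single view per customer. The sketch below is illustrative only: the system names, the `customer_id` key and the `build_customer_view` helper are hypothetical, and it assumes every system already shares the same customer identifier, which is often the hardest part of unification.

```python
from collections import defaultdict


def build_customer_view(sources):
    """Merge rows from several systems into one dict per customer.

    Assumption: every row carries a shared, hypothetical 'customer_id' key.
    Field names are prefixed with the system name to avoid collisions.
    """
    view = defaultdict(dict)
    for system_name, rows in sources.items():
        for row in rows:
            customer_id = row["customer_id"]
            for field, value in row.items():
                if field != "customer_id":
                    view[customer_id][f"{system_name}.{field}"] = value
    return dict(view)


sources = {
    "crm": [{"customer_id": "C1", "name": "Acme"}],
    "service": [
        {"customer_id": "C1", "open_cases": 2},
        {"customer_id": "C2", "open_cases": 0},
    ],
}
view = build_customer_view(sources)
```

With a view like this, a Field Tech or Service user can look up one customer and see what each system knows about them in a single place.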