Garbage In, Garbage Out: How data hygiene is the foundation for good metrics
A recap of our Program Manager Collective conversation
Data hygiene seems honestly kind of boring, as mundane as brushing your teeth. But as a former Bain consultant, I can tell you that I realized how important this topic was when:
a CEO or senior executive suddenly wanted to zoom in on a metric and I had to explain every step of my analysis [mistakes could then become high stakes!]
someone asked me to send them my Excel or Google Sheets analysis and I realized it was incomprehensible to anyone else because I hadn’t “cleaned it up”
I found out I had to redo an analysis for an Annual Report, and then found discrepancies with what I had reported six months earlier because I had been a little sloppy with how I structured my columns on the fly
Last week, we hosted an insightful webinar on Data Analysis for Program Managers. We covered two topics, but in this post I’ll dive deep into "Data Hygiene and Data Collection Best Practices," featuring expertise from Marin Camille Hood (a nonprofit CRM consultant + former Sierra Club program leader). See her post about the experience here!
When I was managing programs, I found I had a lot of control over my program data... but had NEVER been trained on how to approach it systematically, especially when it came to tips and tricks for setting up my data so I didn't cause a ton of manual work for myself later.
This is pretty common. Data intake is often in the hands of a potentially very junior program manager. It all starts with how a survey or registration form is designed. Depending on that design, future analysis can become really, really hard and painful, or slightly less painful.
My favorite tips focused on data hygiene + data collection
Constraining your data: Marin highlighted the frequent errors organizations make when designing intake forms and surveys, emphasizing the importance of setting up proper validation rules (e.g. “an email or ZIP code must be in this format”), plus this tip on unique IDs:
Avoid student or company emails as unique IDs. Those are highly changeable. A unique ID could be a personal email or a combination of email + first name + last name + ZIP code (what happens if you have 3 Steve Smiths?)… OR a system-generated unique ID.
Note - a lot of folks using Airtable and spreadsheets realize at this point how challenging it is to truly store data against the same person year over year. The participant record keeps duplicating, from one table to another. This is one huge reason a CRM like Campground could be advantageous vs. Airtable.
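If you ever check a form export with a script instead of in the form tool itself, the validation and unique ID ideas above might look roughly like this. It's a minimal Python sketch, not Marin's method: the field names (email, zip) and both patterns are assumptions I made up for illustration, and the patterns are deliberately loose (US ZIP codes only).

```python
import re
import uuid

# Deliberately simple patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")  # US ZIP or ZIP+4


def validate_row(row):
    """Return a list of problems found in one intake-form row."""
    problems = []
    if not EMAIL_RE.fullmatch(row.get("email", "").strip()):
        problems.append("email is not in name@domain format")
    if not ZIP_RE.fullmatch(row.get("zip", "").strip()):
        problems.append("zip is not 5 digits (or ZIP+4)")
    return problems


def new_participant_id():
    """System-generated ID that stays the same even if the email changes."""
    return uuid.uuid4().hex


# Hypothetical rows, as they might come out of a form export.
good = {"email": "steve@example.org", "zip": "94110"}
bad = {"email": "steve at example", "zip": "9411"}

print(validate_row(good))  # no problems, so an empty list
print(validate_row(bad))   # one problem per bad field
```

The point of the generated ID is that you assign it once per person and keep it on their record, so a new email address or a second Steve Smith doesn't create a new participant.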
Document your standards: When deciding how certain fields will be captured, like demographic data, document those decisions and any logic / background information about how those decisions were made.
Pro Tips for Data Management: Marin also stressed the importance of saving raw data files BEFORE manipulating them. (I learned this the hard way many a time.)
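One way to make "save the raw file first" a habit is to script it. Here's a small sketch, assuming you download exports as files and keep untouched copies in a folder named raw (the folder name, the dated file name, and the archive_raw helper are all my own assumptions, not something from the webinar):

```python
import shutil
import stat
from datetime import date
from pathlib import Path


def archive_raw(export_path, raw_dir="raw"):
    """Copy an export into a raw/ folder under a dated name and mark the copy read-only."""
    src = Path(export_path)
    dest_dir = Path(raw_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{date.today():%Y-%m-%d}_{src.name}"
    if dest.exists():
        # Never silently replace an earlier raw copy.
        raise FileExistsError(f"{dest} already exists; refusing to overwrite raw data")
    shutil.copy2(src, dest)
    dest.chmod(stat.S_IREAD)  # read-only, so accidental edits are harder
    return dest


# Example call (hypothetical file name):
# archive_raw("survey_export.csv")
```

You then do all your cleanup on the original download or a working copy, and the file in raw stays exactly as it arrived.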
Excel / Google Sheets Tips: Conditional Formatting, De-Duplication, and learning to use VLOOKUP in spreadsheets can be a game-changer. Naming conventions for files and consistent use of tools like pick lists can make data analysis easier.
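For anyone curious what de-duplication and VLOOKUP are doing under the hood, here is a rough Python equivalent. It's only a sketch: the sample rows, the column names, and the dedupe and lookup helpers are hypothetical, and the lookup mimics an exact-match VLOOKUP (first match wins, with a default when nothing matches).

```python
# Hypothetical sample data; column names are assumptions for illustration.
registrations = [
    {"participant_id": "p1", "name": "Steve Smith"},
    {"participant_id": "p2", "name": "Ana Lopez"},
    {"participant_id": "p1", "name": "Steve Smith"},  # duplicate sign-up
]
survey = [
    {"participant_id": "p2", "satisfaction": 5},
    {"participant_id": "p3", "satisfaction": 4},  # no matching registration
]


def dedupe(rows, key):
    """Keep the first row seen for each key value."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], row)
    return list(seen.values())


def lookup(rows, key, table, table_key, column, default=None):
    """Copy `column` from `table` onto each row whose key matches exactly."""
    index = {}
    for r in table:
        index.setdefault(r[table_key], r[column])  # first match wins
    return [{**row, column: index.get(row[key], default)} for row in rows]


people = dedupe(registrations, "participant_id")
joined = lookup(survey, "participant_id", people, "participant_id", "name",
                default="NOT FOUND")
for row in joined:
    print(row)
```

The "NOT FOUND" default plays the role of the #N/A you'd see in a spreadsheet: it flags survey responses that don't match any registered participant, which is exactly the kind of thing worth catching early.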
When surveys start coming in, EYEBALL THE RESPONSES! Then, you can adjust quickly if you’re missing any important questions, or if the formatting is off.
Final tip: You own your own data as a program manager. Don’t be afraid to acquire this skill. It’s not the job of a Data Team or the higher-ups to determine how you handle your data. Ultimately, mistakes, insights, and opportunities will be attributed to your program, and that depends on you getting control of your own data. This means teaching yourself via YouTube tutorials, or whatever it takes. I still google VLOOKUP... JUST to be sure I’m doing it right ;)
Feel free to reach out to Marin Camille via LinkedIn for further insights.
Stay tuned for more events and keep practicing good data hygiene!
-Sruti
