Wednesday, April 10, 2013

How do you create a data-driven organization?

Something that I've been thinking a lot about recently is how you create a data-driven organization. A lot of companies pay lip service to the idea, but when it comes down to making decisions, they end up being made by whoever is most senior (the HiPPO: highest paid person's opinion) or, worse, loudest, based on gut instinct, experience or opinion. What does it take to create a company that makes evidence-based decisions and involves a broad swathe of employees vested in data capture, metric design and analysis?

I have recently taken on this challenge. A few weeks ago, I switched coasts and companies to head up the data team at Warby Parker in New York. This is a company that has been very successful to date and has grown very rapidly. So rapidly, in fact, that it has had little time, bandwidth or experience to put in place a centralized, standardized and scalable data infrastructure with professional business intelligence tools. Analysts are working at the limits of Excel and have great difficulty tying together different data sources across the company. What it does have, however, is a strong desire to change: it is willing to provide the resources and support to establish the new data capture, analysis and reporting systems, and to promote the culture, that will take the company to the next level.

In this post, I wanted to set out what I've been thinking and how I've started to go about it. It is very much a work in progress. I cannot guarantee that the following is the right approach or that it will go exactly as planned. Thus, I would love to hear how others have fared and what comments and suggestions you might have.


Listen to people. Chat with the different heads of departments, analysts and other data stakeholders and listen to what they do, how they do it, the data that they work with and what they would like to do. Ask them how the data team can help. Identify tasks that are highly manual, repetitive and could easily be automated. Identify different data sources, pain points and aspirations. Asking about what they would like to do but cannot is as important as asking what they currently do.

Identify easy wins and build trust. While the rule is always to under-promise and over-deliver, it is good to identify easy wins that provide some immediate benefit and build goodwill and trust. We were able to identify and implement a simple JavaScript/HTML tool that will save one of our teams at least 100 hours/year. While it was not strictly a data project, the cost to us was just 3 hours, and that team now loves our data team and will likely be more accepting of interruptions to their workflow as we implement changes.


Identify workers with skills but not tools. One of our staff knows SQL well but has not had access to databases in his current position. Thus, he is relegated to working with Excel spreadsheets. Try to get those people the tools they already know how to use well and will use. There has to be some level of control here---you don't want too much tool / language proliferation and fragmentation---but if these will become core team skills or tools anyway, get them to those people early.

Identify staff hungry to learn. Identify any staff who are hungry for new tools and eager to learn new skills. These may be staff who are already taking statistics or data science classes outside work. Mentor them and work to get them tools they will make good use of. Send them to training. These people, as well as being more productive and happier, will become your prime advocates. They will be willing to mentor others and share their experience and skills.

Train and mentor. At a broader level, if all your analysts are using Excel, train them to also use SQL, R, python or some other skill to take them to the next level, some skill that will allow them to produce more detailed, insightful, automated analyses. Start with a small, motivated group and let them set a great example for others. Statistics is not only a set of tools for analysis but it also provides a framework for critical thinking, in this case about evidence. At Warby Parker, we are planning to send a reasonably large cohort of staff to statistics training this year. With great free online courses available now, this represents a relatively low cost to the company, other than employee time, but we expect that having a significant fraction of the company thinking more critically, numerically and objectively will have a profound effect on culture and decision-making.

Carefully choose the right tools. Clearly, if you are introducing a new tool to a team or organization, make sure that it is the right one. It should perform the tasks you need, with an easy-to-use interface but also power-user functionality; it should be well documented and supported; and, in an ideal world, it should be open source.


It goes without saying that you need a robust, scalable data infrastructure.

Centralize data where possible. This is very company-scale dependent, but try to create a data infrastructure that brings together all the different data sources where possible, to allow a holistic view of your customers and business. For instance, make it easy to tie ad strategy to clickstream to sales to social etc. A particular solution may not scale indefinitely as the business and data grow. For instance, you may need to switch from a single MySQL instance to a Hadoop-based solution in time, but scaling issues are always good problems to have.

Create an open data warehouse. Create a data warehouse with broad access and easy-to-use tables. For instance, there may be some key concepts that are frequently used for analysis but that require highly complex joins across multiple tables. For those, denormalize the tables to allow easy querying (along with the other benefits denormalization brings). There will be some data that are sensitive---credit card transactions, any medical or HIPAA-covered data, etc.---but favor openness wherever possible.
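
To make the denormalization idea concrete, here is a tiny sketch using Python's built-in SQLite as a stand-in for a warehouse. All table and column names here are invented for illustration; the point is that analysts query one flat view instead of reproducing join logic themselves.

```python
import sqlite3

# In-memory database standing in for a warehouse; schema is made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, total_usd REAL);
    INSERT INTO customers VALUES (1, 'New York'), (2, 'Boston');
    INSERT INTO orders VALUES (10, 1, 95.0), (11, 1, 120.0), (12, 2, 95.0);

    -- The denormalized reporting view: one flat table to query,
    -- with the join logic defined once, centrally.
    CREATE VIEW order_summary AS
    SELECT o.order_id, o.total_usd, c.city
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
""")

# An analyst's query is now a one-liner against the view.
rows = con.execute(
    "SELECT city, SUM(total_usd) FROM order_summary GROUP BY city ORDER BY city"
).fetchall()
```

In a real warehouse you might materialize the flat table rather than use a view, trading storage for query speed.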

Automate where possible. If I think that I will need to do a task two or more times, I will attempt to automate it. Whenever you think something is a one-off, it almost certainly is not. By automating processes, you will free up future analyst time to focus on, you know, analysis.
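
As a minimal sketch of what this looks like in practice, here is the kind of script that replaces a hand-built weekly spreadsheet. The input schema (date, amount pairs) and filename are invented for the example.

```python
import csv
from collections import defaultdict

def write_daily_sales_report(orders, path):
    """Aggregate raw order records into daily totals and write a CSV.

    `orders` is a list of (date_string, amount) pairs; this schema is
    invented for the sketch.
    """
    totals = defaultdict(float)
    for day, amount in orders:
        totals[day] += amount
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "total_usd"])
        for day in sorted(totals):
            writer.writerow([day, f"{totals[day]:.2f}"])

# Once this runs unattended (cron, a job scheduler, etc.), nobody has
# to rebuild the spreadsheet by hand each week.
write_daily_sales_report(
    [("2013-04-08", 95.0), ("2013-04-08", 120.0), ("2013-04-09", 95.0)],
    "daily_sales.csv",
)
```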

Focus on the team's ROI. Like everyone else, a data team has limited time and resources. Focus on return on investment. Implementing two "good enough" solutions for problems in a week may pay higher dividends than one "almost perfect" solution for one problem. Beware of diminishing returns.

Suck down data now. Some data are potentially valuable or useful but ephemeral. For instance, Instagram photos are only available for a week or so before they disappear forever. Suck them down now, as you never know when or why you might need them later.
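
The shape of such an archiver is simple: poll the source, stamp what you capture, append it somewhere durable. A hedged sketch; `fetch_items` here is a hypothetical callable standing in for whatever API client you would actually use, and no real service is called.

```python
import json
import time

def archive_feed(fetch_items, path):
    """Append whatever the feed currently returns to a local JSON-lines
    archive, stamped with the capture time.

    `fetch_items` is any callable returning a list of dicts, e.g. a thin
    wrapper around a social API client (hypothetical; nothing real is
    called here).
    """
    captured_at = int(time.time())
    with open(path, "a") as f:
        for item in fetch_items():
            record = {"captured_at": captured_at, "item": item}
            f.write(json.dumps(record) + "\n")

# A stand-in feed; in practice this would hit the ephemeral source on
# a schedule, before the data disappears.
fake_feed = lambda: [{"id": "abc", "likes": 42}]
archive_feed(fake_feed, "feed_archive.jsonl")
```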


The goal of the above strategies is to capture the data and make it accessible. Now comes the fun part: analysis and reporting.

Design metrics carefully. They should be unbiased, deterministic and should reflect true measurable variables. They should be readily interpretable. They should reflect the business. Design or identify metrics that make the company tick. Think carefully about units. If you end up trying to compare apples and oranges, is there some common currency, such as dollars, that they can be converted to? For instance, if you improve operations and can ship product to customers one day faster, what is that worth? Can you assign a dollar value per customer, per day or per order to ship time?
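
To make the "common currency" point concrete, here is one possible back-of-the-envelope conversion. The assumption that faster shipping lifts the repeat-purchase rate, and every number below, are invented for illustration; they are not real Warby Parker figures.

```python
def annual_value_of_faster_shipping(orders_per_year, margin_per_order,
                                    repeat_rate_lift):
    """Rough dollar value of shipping one day faster.

    Assumes (purely for illustration) that the faster ship time lifts
    the repeat-purchase rate, so the value is the extra repeat orders
    times the margin on each.
    """
    extra_orders = orders_per_year * repeat_rate_lift
    return extra_orders * margin_per_order

# 100,000 orders/year, $40 margin per order, and a hypothesized
# 0.5-point lift in repeat purchasing from the faster ship time:
value = annual_value_of_faster_shipping(100_000, 40.0, 0.005)
```

Crude as it is, a conversion like this lets you weigh an operations improvement against, say, a marketing spend in the same units.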

Remove redundancy. Dashboards should be information rich. As when building a statistical model, if you have two very highly correlated metrics, you can consider one redundant; you may do better to remove it and increase the information density of the remaining metrics.
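
A quick way to find such candidates is to compute pairwise correlations across your metric series and flag any pair above a threshold. A sketch with invented daily series; the 0.95 cutoff is an arbitrary choice, not a rule.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def redundant_pairs(metrics, threshold=0.95):
    """Flag metric pairs so correlated that one adds little information."""
    names = list(metrics)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(metrics[a], metrics[b])) >= threshold:
                flagged.append((a, b))
    return flagged

# Invented daily series: "visits" and "sessions" move in lockstep,
# so a dashboard probably only needs one of them.
metrics = {
    "visits":   [100, 120, 130, 150, 170],
    "sessions": [101, 121, 131, 151, 171],
    "returns":  [5, 3, 9, 2, 7],
}
flagged = redundant_pairs(metrics)
```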

Tailor to your audience. In some cases, it may make sense to have multiple reports with different levels of detail for different audiences. For instance, a manager may get a highly detailed report about her team and responsibilities, a higher-level report may go out to her team, and the C-level execs may get the 50,000 ft view, which, if you choose the right metrics, will still be useful and relevant.

Pander to the C-level. In terms of driving a data-driven culture, impressing C-level execs with dashboards and reports that deliver huge value (and are not just eye candy) will almost certainly produce a trickle-down effect. They will come to expect such reports and will provide the resources to make them commonplace. Create dashboards so relevant that C-level execs watch them like a hawk.

Identify metrics that map to the organization's core values. One of Warby Parker's core values is to deliver insanely great customer service. Thus, there are metrics that relate to that, net promoter score being one. Those metrics should be highly visible across the organization: in top level dashboards, on screens, on reports that get emailed out. 
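
Net promoter score itself is simple to compute, which is part of its appeal as a highly visible metric. The standard definition: on a 0-10 "would you recommend us?" survey, the percentage of promoters (9-10) minus the percentage of detractors (0-6). The survey responses below are made up.

```python
def net_promoter_score(ratings):
    """Standard NPS: percent promoters (9-10) minus percent detractors
    (0-6), over 0-10 'would you recommend us?' survey responses."""
    n = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / n

# Ten made-up responses: six promoters, two passives, two detractors.
score = net_promoter_score([10, 9, 9, 10, 9, 9, 8, 7, 6, 3])
```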

Conversely, take out distracting metrics. One of Marissa Mayer's first actions when she took over Yahoo! was to take the share price off their internal home page. It is her job to worry about that; the rest of the org had been focusing on actions that tried (unsuccessfully) to drive up the share price and had almost forgotten about the users and the value that Yahoo! delivered, or should be delivering, to them.

Where possible, tie those key indicators to the other metrics that fundamentally drive them. For instance, if the major component of customer dissatisfaction is late shipping, then promote the shipping metric in a top-level dashboard.

Let the data speak. In some cases, a machine-learning or unsupervised-learning approach may bring some surprising insights. For instance, many companies segment their customers using a set of subjective a priori criteria. Running unsupervised clustering may support those segment choices, but it may also provide some interesting insight into new types of groups you might never have expected. Be open to findings that challenge your intuition or understanding of your business, market and customers. Be objective: if an A/B test shows a higher average order value but the results are not statistically significant, accept that they are not significant. Never fish for significant results.
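
As a toy illustration of clustering surfacing an unexpected segment, here is a minimal k-means sketch in pure Python. The customer data (orders per year, average order value) and the fixed starting centroids are invented; real segmentation work would use a proper library and far more features.

```python
def kmeans(points, centroids, iters=20):
    """Tiny k-means sketch: assign each point to its nearest centroid,
    recompute centroids as cluster means, repeat. A toy illustration of
    unsupervised segmentation, not production clustering code."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [
            tuple(sum(vals) / len(vals) for vals in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Invented (orders_per_year, avg_order_usd) pairs: a frequent/low-spend
# group, and an infrequent/high-spend group the a priori segmentation
# might never have anticipated.
customers = [(12, 60), (11, 55), (13, 65), (2, 220), (1, 240), (3, 210)]
centroids, clusters = kmeans(customers, centroids=[(0, 0), (20, 300)])
```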

Let the interns speak. A data-driven organization should give the data a voice from wherever that derives. Thus, a new intern who has been analyzing data with a fresh perspective should be given as much voice and respect as a senior manager. Data are the senior partners here. Give people a voice, forum and opportunity to provide data-backed evidence.

Share the data and findings widely. A data-driven organization should share data and reports widely. This is not to say that they should be blasted to everyone as spam, but those that are interested should have access. (Remember that interesting insights and alternative points of view could come from anywhere in the company.) Owners and higher managers should be open to questions, to alternative evidence, and to implementing change based on that evidence.

Those are my initial thoughts. I will report back later in the year, perhaps at DataGotham in New York, about what worked and what did not.  Again, if you have any suggestions or feedback, I would love to hear from you.