Monday, February 8, 2016

The role of model interpretability in data science


This is a cross-post of a piece I published on Medium (Feb 1, 2016):

In data science, models can involve abstract features in high dimensional spaces, or they can be more concrete, in lower dimensions, and more readily understood by humans; that is, they are interpretable. What’s the role of interpretable models in data science, especially when working with less technical partners from the business? When, and why, should we favor model interpretability?

The key here is figuring out the audience. Who is going to use the model and to what purpose? Let’s take a simple but specific example. Last week, I was working on a typical cold-start predictive modeling problem for e-commerce: how do you predict initial sales for new products if you’ve never sold them before?

One common approach is to make the most of your existing data. For instance, you can estimate the effect of temporal features, such as launch month or day of week, using your historical data. One can also find similar products that you have sold, making the naïve assumption that the market will respond similarly to this new product, and create predictors based on their historical sales. In this case, I had access to a set of product attributes: frame measurements of Warby Parker glasses. The dataset also happened to contain a set of principal components for those measurements. (If you don’t understand what this means, don’t worry, that is the whole point of this story.) I created a model that contained both readily-understood temporal features and the more abstract principal component #3, “pc03,” which turned out to be a good predictor. However, I decided to rip out pc03 and replace it with the set of raw product attributes. I pruned the model with stepwise regression and was left with a model that had three frame measurement variables instead of just one (pc03). Thus, the final model was more complex for no additional predictive power. However, it was the right approach from a business perspective. Why?

I wasn’t creating the model for my benefit. I was creating it for the benefit of the business, in particular, a team whose job it is to estimate demand and place purchase orders for initial inventory and ongoing replenishment. This team is skilled and knows the products, market, and our customers extremely well, but they are not statisticians. This is a really tough task and they need help. Further, they, not I, are ultimately on the hook for bad estimates. If they under-predict demand and we stock out, we lose those immediate sales and the lifetime value of any potential customers that go elsewhere. If they over-predict, we invest unnecessary CAPEX in purchasing the units, we incur increased warehouse costs, and we may be left with a pile of duds. Thus, the idea of my model is to serve as an additional voice to help them make their decisions. However, they need to trust and understand it, and therein lies the rub. They don’t know what a principal component is. It is a very abstract concept. I can’t point to a pair of glasses and show them what it represents because it doesn’t exist like that. However, by restricting the model to actual physical features, features they know very well, they could indeed understand and trust the model. This final model had a very similar prediction error profile (i.e., the model was basically just as good) and it yielded some surprising insights for them. The three measurements the model retained were not the set that they had assumed a priori were most important.
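To make the contrast concrete, here is a minimal sketch of what a principal component score is: a weighted sum of all the centred measurements at once. The measurement names, means, and loadings are hypothetical placeholders chosen for illustration; they are not the values from the model described above.

```python
# All names and numbers here are hypothetical, chosen only for illustration.
# A frame described by raw measurements (in millimetres).
frame = {"lens_width": 50.0, "bridge_width": 18.0, "temple_length": 145.0}

# Stand-ins for what a principal component analysis would estimate from data:
# the mean of each measurement and the loading (weight) of each on one component.
means = {"lens_width": 49.0, "bridge_width": 19.0, "temple_length": 142.0}
loadings = {"lens_width": 0.6, "bridge_width": -0.5, "temple_length": 0.3}


def component_score(measurements, means, loadings):
    """Return the component score: a weighted sum of centred measurements."""
    return sum(
        loadings[name] * (measurements[name] - means[name]) for name in loadings
    )


score = component_score(frame, means, loadings)
print(f"component score: {score:.2f}")
```

The score blends every measurement into one number, so there is no single physical feature to point at. A model built on the raw measurements keeps each term tied to something the team can see on the frame.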

This detailed example was meant to highlight a few reasons why a poorer, or more complex, but interpretable model might be favored:
  • Interpretable models can be understood by less technical people (non-data scientists) in the business and, importantly, those people are often the decision makers. They need to understand and use the model, otherwise the work has no ultimate impact. My job as a data scientist is to maximize impact.
  • Interpretable models can yield simple direct insights, the sort that they can readily communicate to colleagues.
  • Interpretable models can help build up trust with those partners and, through repeat engagements, can lay foundations for more sophisticated approaches for the future.
  • In many cases, the interpretable model is similarly performant to a more complex model anyway. That is, you may not be losing much predictive power.
My team often makes these tradeoffs, i.e., choosing an interpretable model when working in close partnership with a business team where we care more about understanding a system or the customer than pure predictive power. We may use classification trees as we can sit with the business owner and interpret the model together and, importantly, discuss the actionable insights which tie naturally to the decision points, the splits, in those trees. Or, in other cases, we may opt for Naïve Bayes over support vector classifiers, again because the terms of the model are readily understood and communicated.

In all these cases, it represents a calculated decision of how we maximize our impact as a data science team and an understanding of the actual tradeoffs, if there is a loss of performance.
Of course, those tradeoffs do not always land in favor of interpretable models. There are many cases where the best model, as determined by RMSE, F-score, and so on, does and should win out irrespective of whether it is interpretable or not. In these cases, performance is more important than understanding. For instance,
  1. Show me the money! The goal is simply to maximize revenue. A hedge fund CEO is probably not going to be worried about how the algorithmic trading models work under the hood if they bring home the bacon.
  2. There is an existing culture of trust and trail of evidence. Amazon has done recommenders well for years. They’ve proved themselves, earned the respect to use the very best models, whatever they are.
  3. The goal is purely model performance for performance’s sake. Kaggle leaderboards are littered with overly-optimized and overly-complex learners (although one could argue it is related to 1 through the prize money).
  4. Non-financial stakes are high. I would want a machine-learned heart disease detector to do the best possible job as any type II prediction error could be devastating to the subject. Here, I believe, the confusion matrix performance outweighs a doctor’s need to understand fully how it works.
When choosing a model, think carefully about what you are trying to achieve. Is it simply to minimize a metric, to understand a system, or to make inroads to a more data-driven culture within the organization? A model’s total performance is the product of its predictive performance and the probability that the model will be used. One needs to optimize for both.
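The product rule in the closing paragraph can be written out as a small sketch. The function name and the scores below are hypothetical, and "predictive performance" is assumed to be a metric where higher is better.

```python
def total_performance(predictive_performance, probability_of_use):
    """Product of predictive performance and the probability the model is used."""
    if not 0.0 <= probability_of_use <= 1.0:
        raise ValueError("probability_of_use must be between 0 and 1")
    return predictive_performance * probability_of_use


# Hypothetical scores: a slightly better black box that is rarely trusted,
# versus a slightly worse interpretable model that is usually adopted.
black_box = total_performance(predictive_performance=0.90, probability_of_use=0.5)
interpretable = total_performance(predictive_performance=0.85, probability_of_use=0.9)

print(f"black box: {black_box:.3f}, interpretable: {interpretable:.3f}")
```

Under these made-up numbers the interpretable model comes out ahead despite its lower predictive score, which is the tradeoff the post describes. With different numbers the black box could just as easily win.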

Saturday, January 23, 2016

When should I hire a data scientist?

This is a cross-post of a piece I wrote on Medium:

Being part of the New York data scene, and especially being part of multiple VC networks, I often get asked to meet and advise early-stage startups and give my perspective on setting up the right data infrastructure and data team. Frequently, I get asked “when should I hire a data scientist?” as the word on the street is that you need a data scientist on staff. Sooner is better than later, right? They are often surprised when I say, “No, not yet. You are not ready.”

The truth is that very early startups often only have a basic data infrastructure in place to support the current business and are not ready to spend precious resources for more advanced analytics and data products such as recommenders. Focus on the foundational data first. Keep the website backend up and running, keep the signups flowing into a database table, instrument the site and track how users use your products. There’s lots to do.

You need some central transactional and analytics store that will at least scale for the next year or two. Be safe and opt for boring technology such as relational databases unless you have good reason otherwise. They are tried and tested. Boring is good. Centralize the data. Build in data quality processes. Create more robust ETLs to marshal the data. This is work for a data engineer, who is going to support the whole business, not just analytics, and so is a good deal. Moreover, data engineers are easier and cheaper to hire than data scientists.

“OK, great. We’ll hire a data engineer. And then we hire a data scientist?”

No, not yet. I would recommend hiring a data analyst first. Why? An early stage startup is probably still feeling out its business model. The founders are still trying to work out strategically where they should go. They are probably still seeking funding. These activities require getting answers from traditional analytics to help the founders and advisors make the right decisions and to provide the necessary information to investors. Excel will probably suffice for this work, and you can even connect it to a relational database as a data source. A good analyst will take you far. If they know SQL and they can query raw data stores directly, or can do some modeling in R, even better. Importantly for a cash-strapped startup, a business analyst is probably only half the price of a good data scientist.

So at this point, you have a reasonable data infrastructure, hopefully some fairly solid data quality processes, and you’ve met the founders’ basic data needs. Now, we hire a data scientist? Well, maybe. It very much depends on the type of business and whether a data scientist will be a central part of the business model. If you could only hire one more person for the team/rocket ship, who would provide the greatest return? That is, if a central offering of the business is a data product, a data science driven process, such as recommenders, or something similar that provides a competitive advantage, then now might indeed be a good time. Maybe not. Maybe you just need another analyst. You need to have a good idea of why you need that data scientist. Don’t get me wrong. I’m very pro data science. I’m a data scientist. However, I do believe that for early stage startups at least, it can be too early for a data scientist. We are not cheap. We need data, and we are not necessarily the best people to be building out the early ETLs to get the data. Others, including software engineers, can probably do a better job, more quickly for less.

One option of course is to outsource. If you have a clear, crisp question you can essentially hand over a dataset to a consulting data scientist and let them at it. Who is going to prepare that dataset, or build an API to get the data, or provide raw access to the database? That’s right: the data engineer that you hired ahead of the data scientist.

By all means hire a data scientist but let them come into an environment where there is data ready to be mined and others to focus on vanilla business intelligence reporting and analysis and free up the data scientist to focus as much as possible on what they are good at: the fun stuff, where that unique blend of business, data, math, stats, and visualization skills can really shine.

Friday, June 12, 2015

Data is not the new middle manager

In April, the Wall Street Journal published an article that claimed, in its title, that "data is the new middle manager" and, further, in the opening paragraph set out this bold claim:

Firms are keeping head counts low, and even eliminating management positions, by replacing them with something you wouldn’t immediately think of as a drop-in substitute for leaders and decision-makers: data.

As we say in England: codswallop! (Yes, it is in the dictionary. Think "Baloney" or "BS.") Data are replacing leaders and leadership? Really?

As you can imagine, it caused a bit of a stir within the data science field at the time. I've heard a few people mention it since, one of whom called it a "useful meme," but I simply can't believe that basic premise. I strongly believe that

  • Humans make decisions.
  • Algorithms make decisions.
  • Data do not and cannot make decisions. 

The article has bothered me for a couple of months, simmering away in the back of my mind. Part of the reason is that I agreed with much of the article in terms of the value of data, data tools, broad data access, and operational decision-making pushed out to the fringes. However, all told, the arguments presented didn't provide evidence to back up the article's major, and erroneous, claim.

It is true that
  • Data has indeed become more readily captured and more broadly accessible within orgs. That's a good thing.
  • Data tools for reporting, analysis, and data discovery are better, cheaper, and easier to use than ever before. That's a good thing.
  • Operational and tactical (but not strategic) decision-making can be, or is being, pushed down to the front lines of orgs. Transparency via data helps achieve that. That's a good thing.
However, all these points don't lend weight to the claim that data is the new middle manager. I can’t ask data, numbers, a question: hey 6, should we switch shipping carriers? 42, you have all the answers, how much should I increase my ad spend budget? As Scott Berkun puts it, "Data is non conscious: it is merely a list of stupid, dead numbers".

Data is of course a key ingredient here but its role is to augment the decision maker: humans or machines. The latter is especially interesting because I would expect its role to increase over time as we gather more data and feed it to ever better machine learning techniques on ever more powerful platforms. As I argue in my book, if you have a sufficiently stable or predictable environment and a sufficiently good algorithm that you can in fact make decisions based on data alone, without human intervention, then this is called automation, a good example of which is just-in-time replenishment in supply chains. You should be doing that where possible. That can eliminate bodies and allow quicker, more consistent, and less emotion-based responses. However, this is not what is being claimed. The claim is that management positions are being eliminated because data are now acting as a middle manager, making decisions.

The author claims that the cost of data tools, once so expensive that companies could only provide them to managers, has decreased significantly such that they can now be more democratized, accessible to the front lines. That empowers those people to make informed operational and tactical decisions. Such tools and data access can also facilitate coordination among teams. One can keep abreast of what else is happening in the company and make decisions accordingly. However, I don't think either of these eliminates the need for true leadership: people whose job is to think strategically and to make strategic decisions, people whose job it is to inspire, align, and rally the troops. If managers are just a conduit for information and serve a coordination role, that is neither leadership nor decision making.

Better data processing tools can indeed eliminate bodies, specifically data pullers and crunchers, if instead you engender a self-service culture and everyone has the tools, skills, and access that they need. However, these bodies are not leaders or decision makers.

Organizations should be leveraging data as a strategic asset as much as possible but, ultimately, you need people to release its value. 

Monday, April 6, 2015

Creating a Data-Driven Organization: Two Years On

This is the third post in a series documenting the process of creating a more data-driven organization at Warby Parker. The first post covered my initial thoughts as I joined the company as Director of Data Science in 2013. The second post documented progress made after one year. This next installment details progress made in my second year.

What a year it has been! I believe that we have made huge strides.

THE POWER OF THE DATA DICTIONARY
After trialing Looker, our business intelligence tool, last spring and receiving great feedback from our analysts, we made the decision to drop Tableau (at least for now, although we may get back to it for executive dashboards later this year) and instead focus on Looker.

We rolled out the tool in a fairly conservative manner over a six-month period. This was not because Looker integration was difficult but because we had a large number of data sources, we wanted to maintain the highest level of trust in our data, and we needed to work with the business owners to agree upon and lock down the data dictionary. That is, to set out the business logic that defined our business terms, such as precisely what constitutes a customer, how we define our sales channel logic, and so on.

This data dictionary, and the consequent alignment of teams across the company, may be the most significant activity to date that has contributed to an enhanced data-driven culture. Thus, I want to go through this in some detail.

Creating and Validating The Data Dictionary
Our plan was to focus on one data source at a time and partner with the department(s) who "owned" the business logic, i.e. how the terms were defined, and who had datasets that we would validate against. They possessed Excel spreadsheets that contained raw data exported from our other systems but, importantly, in which they had also layered on derived metrics: additional logic that specified, say, how to handle giftcards, exchanges, and giveaways when calculating what constituted a “sale.” The idea was that they would provide us with the business logic, we would implement that in Looker, and then we would generate a dataset from Looker and compare that with the spreadsheet data, row by row, to validate. What happened was a very interesting and revealing process and is the reason that this process was so impactful to the organization.

There were a number of interesting lessons. First, unbeknownst to those teams, the business logic that they provided to us didn’t always precisely match what was actually in their spreadsheets. The reason is that these spreadsheets had grown organically over years, had multiple contributors, and contained all sorts of edge cases. Thus, it was hard to keep track of the complete current logic. Therefore, the act of asking those teams to list out the logic very concretely, as an actual series of IF THEN statements, turned out to be a really valuable exercise in itself.
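The two steps described so far, writing the logic out as explicit rules and then comparing the generated dataset with the spreadsheet row by row, can be sketched in a few lines. This is an illustration in plain Python rather than the Looker and SQL setup we actually used, and the field names, rules, and rows are all hypothetical.

```python
def is_sale(order):
    """Hypothetical business rule written as an explicit series of IF/THEN checks."""
    if order["kind"] == "giftcard":
        return False
    if order["kind"] == "giveaway":
        return False
    if order["kind"] == "exchange":
        return False
    return True


def find_mismatches(generated_rows, spreadsheet_rows, key="order_id"):
    """Compare two datasets row by row and return the sorted keys that disagree."""
    generated = {row[key]: row for row in generated_rows}
    spreadsheet = {row[key]: row for row in spreadsheet_rows}
    all_keys = set(generated) | set(spreadsheet)
    # A key is a mismatch if it is missing from either side or the rows differ.
    return sorted(k for k in all_keys if generated.get(k) != spreadsheet.get(k))


# Hypothetical raw orders, as exported from a source system.
orders = [
    {"order_id": 1, "kind": "purchase"},
    {"order_id": 2, "kind": "giftcard"},
    {"order_id": 3, "kind": "exchange"},
]

# Dataset generated by applying the stated logic.
generated_rows = [{"order_id": o["order_id"], "is_sale": is_sale(o)} for o in orders]

# What the team's spreadsheet contains; here it quietly counts the exchange as a sale.
spreadsheet_rows = [
    {"order_id": 1, "is_sale": True},
    {"order_id": 2, "is_sale": False},
    {"order_id": 3, "is_sale": True},
]

mismatches = find_mismatches(generated_rows, spreadsheet_rows)
print("order_ids to investigate:", mismatches)
```

Each mismatched key is a prompt for a conversation: either the stated rule is incomplete or the spreadsheet has drifted from it.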

Second, there was an occasional mismatch among teams for the same metric. While the spreadsheets among different teams had originally been the same for those common metrics, they had unknowingly got out of sync. This set off very useful conversations about what the logic should be, where it should be the same, and also where and why those terms should differ. The output was a commonly agreed upon set of business logic and greater clarity and visibility about any terms that differed. For instance, our finance and product strategy teams have different and valid perspectives on the term "bookings units," i.e. how many items we have sold. Now we are in a position to have two unambiguous, clearly-documented terms, "bookings units" and "product bookings units," and can speak a more precise language across the company. Conversely, there were also several cases where there was a difference in definitions and the teams agreed that they should in fact be the same and they came to an agreement about what they should be.

Third, because we were using SQL during the validation process, we could easily drill down to understand the root causes of any rows that did not match. We found unusual edge cases that no-one had ever considered before, such as how some split orders are processed. When we explained these edge cases to the business owners, their reaction was often "That can’t possibly happen!" but with the evidence staring them in the face, we were able to apply those learnings to our internal processes and fix and improve our order handling and other scripts. Thus, everyone won from that activity.

Finally, some of the business logic we encountered in those Excel files was a workaround and based on the limitation of the enterprise resource planning software that generated the raw data. It was suboptimally-defined business logic. Thus, we were able to change the conversation and instead ask the business owners to specify their preferred business logic: in an ideal world, what would you like this logic to be? We were then able to implement that logic thus freeing up the teams to have simpler, cleaner, and more rational business logic that everyone could understand.

As you can imagine, this was a slow, painful process as we went through each of our many data sources working with those stakeholders to bring the data into Looker, validate it (which was the most time consuming step), and have those teams sign off on it. Those initial teams, however, saw the huge benefit to this process. They understood their own metrics better and had a centralized system that they could trust, that was automated, and that was locked down. Based on the benefits and great feedback that they were hearing, our CEOs made that a company priority: to get all the data into Looker, fully validated, and for analysts to use that as the primary source for all reporting and analysis. They helped us create a schedule for all the additional data sources to be included and got the necessary stakeholder buy-in to do the work to define, validate, and sign off on the implemented logic.

I can’t stress enough the impact of this process on us being more data-driven. Even if we were to drop Looker today (which we don’t intend to), we would still have that data dictionary and that new alignment among all the stakeholders. It literally changed the conversation around data in the company.

To put the icing on the cake, we documented that logic in what we called the Warby Parker Data Book, an internal website with a book-like interface (using gitbook.io) that lists out all our data sources, all our privacy and other data policies, and that data dictionary. Everyone at Warby Parker can easily use the book to understand those terms. (This Data Book is the subject of a post on Warby Parker’s tech blog.)

Data Democracy
We now have a suite of datasets in Looker. They can be sliced and diced with each other, the data are trusted, and are the central source of truth for the organization. Many reports are now auto-generated and directly emailed to stakeholders. For other reports, Looker is used to aggregate the data which are then exported for additional analysis or manual annotation to explain insights in the data. With Looker taking on the mechanics of crunching the numbers, this has freed up time for the analysts to spend on data discovery and analysis. Consequently, we are seeing more, deeper, and richer analyses. In addition, we are able to democratize data more than ever. For instance, Warby Parker sends out customer surveys to gather feedback about our brand, products, and experience, including within our retail stores. We now use Looker to aggregate responses that originated from each store and email them directly to the individual store leaders so that they can see and respond to what our customers are saying about their particular store. As you can imagine, those store leaders love these data and this new level of visibility.

ANALYST GUILD AND OFFICE HOURS
Switching gears and focussing on the analytics org itself, we decided that the analysts guild meetings, mentioned in the previous posts, were not as effective as they could be and we decided to shelve them for a while. They had reached a critical size in which a form of the bystander effect manifested itself. That is, the larger the group got, the less individuals wanted to help out, such as volunteering to present or starting or contributing to conversations; the size of the group became intimidating, especially for junior analysts. The breadth of interest and skill level of the large group meant that it was also hard to continue to find topics that were relevant and interesting to all. We decided that smaller, more focussed discussions centered around a more precise topic and involving the most relevant stakeholder analysts and business owners would be a better approach. We haven’t found the right balance and process yet but it is something that we are working on.

To provide additional support, I offer weekly analytics office hours, one in each of our two office buildings in New York. That is a chance for analysts to ask for help with statistics and experimental design and, in general, for me to act as a sounding board for their analysis, interpretations, and ideas. This is also helpful for me to understand what people are working on, what their pain points are, and how the data team can help.

Next on Deck
So what is coming up in terms of the analytics org? Lots of training for one. We've just had Sebastian Gutierrez of https://www.dashingd3js.com/ do an in-house data visualization training session attended by a dozen of our analysts.

I am also planning to do some statistics training, not for the analysts but for the middle management at Warby Parker. You will recall from my last post that statistics training with the analysts did not work out well. Thus, my plan here is that educating the managers, and making them more demanding about the quality of the analyses that they receive and the use of statistical inference (in short, making them more data literate), will constitute more of a pull model on analysts. With me pushing from the bottom and managers pulling from the top, analysts will have no choice other than to level up.

Finally, I am working on an analyst competency matrix, a document that sets out the required skills for different levels of analysts. Thus, it specifies the level of data munging, data analysis, data visualization skills and so on that are required to jump from an analyst to a senior analyst. By providing a very clear career path, and the support to develop those skills needed to get promoted, we hope that this will make for happier, more content, and productive analysts.

More generally, I want to promote more forward thinking analyses in the next year: many more predictive models and hopefully even some stochastic simulation models for supply chain.

A BOOK
As an aside, one exciting thing that happened over this last year, at least for me, is that I decided to write a book. Based on the discussion and feedback to the previous two posts in this series, I approached O’Reilly Media with a proposal for a book (imaginatively) entitled "Creating a Data-Driven Organization" which was soon accepted. Thus, since August I’ve been more intensely researching what it means to be data-driven, interviewing others about their experiences, and writing up a long form synthesis. I’ve learned a huge amount, it has been a lot of fun, and I’m in the final stages—just revisions and corrections to do. In fact, although not quite complete, it is now available for purchase as part of their early release program.

As with these posts, I would love to continue the discussion and get your feedback and learn about your experiences. I shall be presenting on this topic at http://www.next.ml/ and at http://datadayseattle.com/.

AN INCREASING THIRST FOR DATA
Bringing the conversation back from the analytics org to the company level, I’m definitely seeing a thirst for data now. Analysts are wanting more and more data. This is a great problem to have. For instance, in my first year, analysts were doing crude, high-level geo-analyses. It had some value but they wanted more detailed insight into the business. Thus, we provided them with a dataset containing ZIP codes, CBSA (metropolitan areas), and DMA (TV viewing areas) and folded those into our customer and sales data. This set off a flurry of deeper, more nuanced reporting, which was fantastic. Last week, however, that same team approached us again and asked how they can get neighborhood level detail. With Warby Parker opening more retail stores, they wanted a finer view of the local impact of those stores.

In addition, a couple of days ago, I attended a Warby Parker management retreat, a quarterly review and planning session. One of the themes that popped up in a number of conversations was more data, more visibility, and even the term "data-driven" was mentioned many times. Good things are happening and I really sense a cultural change.

As before, check back in a year’s time to monitor our progress.

Saturday, March 21, 2015

The "Creating a Data-Driven Organization" book is now available in Early Release

My book Creating a Data-Driven Organization is now available for purchase as part of O'Reilly's early release program. That means you can get access to chapters as they are released before the print date in July 2015.

Another advantage is that you have the chance to input and shape the book. I would love and appreciate your feedback and comments as I have another month or so to incorporate major changes. If you have anything that you would like to add or say, feel free to add comments via the add errata link.

Many thanks

Carl


Monday, March 9, 2015

Advice to graduate students interviewing for industry positions

A couple of weeks ago I saw a post in a LinkedIn group which went something like this: "I've just received a Ph.D. in physics and I know python and R. I've been applying for data scientist roles. However, I'm not getting much traction. Do you think that I need to learn a BI tool such as Tableau?" To summarize, this is a physics Ph.D., python and R! That is pretty much a trifecta for an ideal data scientist background. In the current climate, he should be inundated by offers.

I didn't interview this person but I assume that he could be doing a better job at selling himself based on his background, experience, and skill set. This is something that I have seen many times when interviewing graduate students over the years. Many students grossly undersell themselves, which is a huge shame. Thus, I want to take the opportunity to give a few pieces of advice from a hiring manager's perspective.

When I've interviewed graduate students wrapping up their Ph.D.s, too many times the conversation goes like this:
  • Me: so tell me about yourself and what you've been doing at University of X.
  • Candidate: I was working in Professor Boson's lab which studies Doppler Shifts. While we have a pretty good idea of Y, we don't understand Z....[science, science, science]...and our group was specifically looking at [science, science, science]...
  • Me: OK but what was your role?
  • Candidate: I was analyzing red shifts using geodesic equations...[science, science, science]...
(If you haven't worked it out, I know nothing about this particular area. I'm just trying to make a point.)

Don't get me wrong. I have a science background and was a professor. I love science, learning about anything new, and could chat about it all day. However, from a hiring manager's perspective, so far in this conversation, I haven't heard anything that is relevant or useful to me. My organization doesn't study red shifts. What I'm interested in are transferable skills that can be applied to different domains or the problem sets in my company. Thus, what I want to hear about are problem solving skills, coding skills, data munging skills, how you overcame a huge challenge and so on.

So, I often have to push the conversation quickly to these areas. After some probing, I might then find out that they had to process huge amounts of data from optical arrays, or they had to deal with a significant missing data problem and impute values, or they had to develop some sophisticated computer vision algorithms. They do in fact have a more interesting and marketable skill set that they, unfortunately, aren't leading with. In short, I find that graduate students often don't think about what skills they possess that are valuable to the organization to which they are applying. Draw attention to those clearly in your resume and in how you talk about yourself during a phone screen. In essence, explain why we would be a good match.

The fact that you are completing a Ph.D. shows focus, persistence, and dedication. Research is often open-ended and you have to get a sense of where the ripe questions and approaches are and know when to give up and tackle it another way. That is a highly valuable skill. Dealing with messy real-world data, or with large volumes of it, is a problem that we face in industry. We want smart, creative thinkers who can switch domains, own a problem, think laterally to deal with unknown but inevitable issues that will crop up, and who, ultimately, still produce results. We want good communicators. We want to know what combination of those you are. Where do you shine and what do you bring to the table? And, we need to make an initial assessment of all of this in 30 to 45 minutes over a phone line. It's tough, so you have to put your best foot forward. Want an insider tip? You can get more of my time if, in advance of the call, you have a GitHub account or a competition profile on Kaggle listed on your resume that I can check out.

No, you don't need to learn Tableau to get my attention. You likely already have a great set of skills. Just sell me on what they are. What can you do?

Good luck!

 

Sunday, October 12, 2014

Creating a Data-Driven Organization: the book

I am pleased to announce that I am currently under contract with O'Reilly to write a book on "Creating a Data-Driven Organization." It is still at the early stages but should be out next year.

This is a great opportunity to go deep into what it means to be data-driven and to think about how organizations best achieve that goal. I'll cover the infrastructure, skills, and culture needed to create organizations that take their data, treat it as a core asset, and use it to drive and inform critical business decisions and ultimately make an impact.

So far, it has been a lot of fun. I've read and learned a ton and can't wait to share.

No word yet on which animal will appear on the cover...

Saturday, September 27, 2014

Creating a data-driven organization: the presentation

Following on from my earlier posts, How do you create a data-driven organization and How to create a data-driven organization: one year on, I recently gave a presentation entitled "Creating a data-driven organization" at the Predictive Analytics & Business Insights 2014 conference in Philadelphia. You can obtain the slides here.




Sunday, June 29, 2014

How data science shapes our world view

Data science is increasingly impacting how we view and experience our world. 

Last weekend, I was playing around with WordLens, a translation / augmented reality app recently acquired by Google. You point the smartphone's camera view at clear text such as road signs, book covers, and the like and the app not only translates the text but swaps the text out completely, matching the background (almost) seamlessly. You see that road sign in English? I see it in Spanish. 




When it works well, it can be impossible to tell that the camera view, and target object, was modified in any way. Yes, this app is really buggy. It is not at all ready for prime time but nevertheless you can see the future right there in your hand. Let Google throw a team of engineers at it for the next 12 months, integrate it into Glass, and you have a killer app.

One of the reasons that I am so excited and interested in this technology is that this is a significant step forward in augmented reality and, it could be argued, blurs the line significantly with virtual reality -- it is hard to tell the difference between the original world and the modified world. Moreover, it reinforces just how data science can and will influence our world view, literally.
Think of the ways that data science shapes and filters what you see, sense, and perceive in the world:

  • Data science powers your newsfeed, showing what it thinks you want to see and who you want to interact with and filters out what it thinks is irrelevant. This obviously hugely influences your social interactions.  
  • Skype will soon be introducing real time translation in calls. If that works well, it could dramatically shape how we collaborate with others around the world, enabling collaborations that would ordinarily be prohibitive because of a language barrier.
  • Adverts in the real world are becoming dynamic. In London, smart trash cans were introduced (and thereafter soon banned) that would read your MAC address and target display ads at you directly.
It kind of sounds like a rogue state, an Orwellian future: a set of algorithms that effectively controls what news you see, which friends you see and interact with, translating and modifying the sights and sounds around you. A large proportion then of what you see, hear, and experience could be shaped by data science. Thus, as practicing data scientists, this represents a great deal of responsibility.

I am old enough to remember sitting down and consuming my news sequentially, cover to cover, on pieces of paper. (Kids, those things are called "newspapers.") I had to make a decision what to read and what to skip. Importantly, I got a brief sense of the news articles that I was not reading. This is completely different today. While I read a physical newspaper for long form articles at the weekend, during the week I consume my news from a variety of sources, all shaped, curated, and filtered by algorithms: on Twitter, Prismatic, Reddit and the like. I have a higher hit rate of stories that interest me but I have no idea what I am missing, what's been buried.

I started writing this post before Facebook's newsfeed manipulation study was published, a perfect example of precisely the sort of scenario I am talking about. If you've been hiding under a rock and didn't see it, the core Facebook data science team published a paper in Proceedings of the National Academy of Sciences (one of the most prestigious science journals) about an experiment to test social contagion of emotions. To quote directly:
"The experiment manipulated the extent to which [Facebook users] (N = 689,003) were exposed to emotional expressions in their News Feed. This tested whether exposure to emotions led people to change their own posting behaviors, in particular whether exposure to emotional content led people to post content that was consistent with the exposure—thereby testing whether exposure to verbal affective expressions leads to similar verbal expressions, a form of emotional contagion."
and they found that
"When positive expressions were reduced, people produced fewer positive posts and more negative posts; when negative expressions were reduced, the opposite pattern occurred."
As I write, there is heated debate about this study. There are those who argue that there was no informed consent from the users. Even the editor of the paper (Prof. S. T. Fiske) expressed concern:
"I was concerned until I queried the authors and they said their local institutional review board had approved it—and apparently on the grounds that Facebook apparently manipulates people's News Feeds all the time... I understand why people have concerns. I think their beef is with Facebook, really, not the research." (Source: article in the Atlantic)
And therein lies the flip side. A/B tests, personalization, recommenders, coupons etc. manipulate users all the time. Is this really any different? What makes overt emotional manipulation worse than manipulating their likelihood to open their wallets and purchase a product or share your content?


I don't want to take sides here but simply want to reinforce the point that as data scientists our work influences people, real people. Keep them in mind and seriously consider ethics and informed consent. Would the user expect their data to be used in this manner? Would you be OK if it were your data? If the answer is no to either of these, then don't do it. If you look out for the customer, put them first, then the business will surely follow.

Wednesday, May 7, 2014

How to create a data-driven organization: one year on

A switch from "I think" to "The data show"
A year ago, I wrote a well-received post here entitled "How do you create a data-driven organization?" I had just joined Warby Parker and set out my various thoughts on the subject at the time, covering topics such as understanding the business and customer, skills and training, infrastructure, dashboards and metrics. One year on, I decided to write an update. So, how did we do?

We've achieved a lot in the last year, made some great strides in some areas, less so in others.

Initiative metrics
One of the greatest achievements and impacts, because it cuts across the whole organization and affects all managers, is to do with our initiative teams and how they are evaluated. Evaluation is now tied very strongly to metrics, evidence backing underlying assumptions, and return on investment.


What’s an initiative team?
Much of the work and many of the improvements that individual teams (such as Customer Experience, Retail and Consumer Insights) want to make require software development work from our technology team. For instance, Retail might want a custom point of sale application, or Supply Chain might want better integration and tracking with vendors and optical labs. The problem is that the number of developers, here organized into agile teams, is limited. Thus, different departments essentially have to compete for software development time. If they win, they get use of a team --- a 3-month joint "initiative" between an agile team and business owner --- and can implement their vision. With such limited, vital resources, it is imperative that the diverse initiative proposals are evaluated carefully and are comparable (that we can compare apples to apples), and that we track costs, progress and success objectively.

These proposals are expected to set out the metrics that the initiative is trying to drive (revenue, cost, customer satisfaction etc.) and upon which the initiative will be evaluated --- for example, reducing website bounce rate (the proximate metric) should lead to increased revenue (the ultimate metric). They are also expected to set out the initiative’s assumptions. If you claim that this shiny new feature will drive $1 million in increased revenue, you need to back up your claim. As these proposals will be reviewed, discussed and voted upon by all managers in the company, and they are in competition, it creates increased pressure to have a bullet proof argument with sound assumptions and evidence, and to focus on work that will really make a difference to the company. It also has an additional benefit: it keeps all the managers up to speed with what teams are thinking about and what they would like to accomplish, even if their initiative does not get "funded" this time around.

It took a few rounds of this process to get where we are now and there are still improvements to be made. For instance, in this last round we still saw #hours saved as a proposed initiative impact when the hourly rate varies considerably among employees. That is, they should be standardized into actual dollars or at least a tiered system of hourly rates so that we can compare against hours saved in a more expensive team. This visibility of work, metrics, assumptions and the process by which resources are allocated has really pushed us towards data-driven decisions about priorities and resource allocation.
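To make the standardization point concrete, here is a minimal sketch of converting "hours saved" into dollars using a tiered system of hourly rates. The tier names, the rates, and the two proposals are made-up assumptions for illustration only; they are not real Warby Parker figures.

```python
# Hypothetical tiered hourly rates in USD (illustrative assumptions only).
HOURLY_RATE_BY_TIER = {"tier_1": 20.0, "tier_2": 40.0, "tier_3": 75.0}


def hours_saved_to_dollars(hours_saved, tier):
    """Convert hours saved into dollars using the hourly rate for that tier."""
    return hours_saved * HOURLY_RATE_BY_TIER[tier]


# Two made-up proposals: B saves fewer hours, but on a more expensive team.
proposals = [
    {"name": "A", "hours_saved": 1000, "tier": "tier_1"},
    {"name": "B", "hours_saved": 400, "tier": "tier_3"},
]

for p in proposals:
    p["dollars_saved"] = hours_saved_to_dollars(p["hours_saved"], p["tier"])

# Rank by dollars rather than raw hours so proposals are comparable.
ranked = sorted(proposals, key=lambda p: p["dollars_saved"], reverse=True)
for p in ranked:
    print(p["name"], p["hours_saved"], "hours ->", p["dollars_saved"], "dollars")
```

Ranked by raw hours, proposal A looks better; ranked by dollars, proposal B comes out ahead, which is exactly why the unit matters when proposals compete.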

ROI
While the initiative process covers the overall strategy and just touches on tactics at a very high level, what happens within a funded initiative is all about low-level tactics. Teams have different options to achieve their goals and drive their metrics: which features should they work on specifically, and when? This, too, is a very data-driven process all about return on investment (ROI). Again, there is a good process in place in which our business analysts estimate costs, returns, assumptions and impacts on metrics (this is the “return” component). While the developmental time is mostly a fixed cost (the agile teams are stable), costs can vary because they may choose to pay for a 3rd party vendor or service rather than build the same functionality (this is the “investment” component). These ROI discussions are really negotiations between the lead on the agile team and the business owner (such as head of Supply Chain): what makes most sense for us to work on this sprint? This ROI process covers my team too, the Data Science team, which is outside the initiative process but involves similar negotiations with department heads who request work; this allows us to say no to requests when the ROI is too low. By asking department heads to spell out precisely the business impact and ROI for their requests, it also gets them to think more carefully about their strategy and tactics.
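As a rough illustration of the arithmetic behind these negotiations, the sketch below compares a few options using the simple textbook definition of ROI (net gain divided by investment). The option names and dollar figures are hypothetical, and a real estimate would also carry the assumptions behind each number.

```python
def roi(expected_return, investment):
    """Simple ROI: net gain divided by the investment."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return (expected_return - investment) / investment


# Hypothetical options for one sprint; every dollar figure is made up.
options = {
    "build in-house": {"expected_return": 120_000, "investment": 60_000},
    "buy from vendor": {"expected_return": 120_000, "investment": 40_000},
    "small tweak": {"expected_return": 15_000, "investment": 12_000},
}

roi_by_option = {
    name: roi(o["expected_return"], o["investment"]) for name, o in options.items()
}
best = max(roi_by_option, key=roi_by_option.get)

for name, value in roi_by_option.items():
    print(f"{name}: ROI = {value:.2f}")
print("highest ROI:", best)
```

The same comparison gives a defensible basis for turning down a low-ROI request, such as the "small tweak" in this made-up example.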

Our ROI process is very new but is clearly a step in the right direction. Estimating return on investment and justifying the assumptions is not at all easy, but it is the right thing to do. In essence, we are switching from "I think" to "The data show"...

"Guild meetings are held to improve. They are there to ‘sharpen your saw’. Every minute you use for ‘sawing’ decreases the amount of time for sharpening" from a post by Rini van Solingen 

Analyst guild
Warby Parker employs a decentralized analyst model. That is, analysts are embedded in individual teams such as Digital Marketing, Customer Experience, and Consumer Insights. Those analysts report to their respective team leads and share those teams' goals. The advantage, of course, is that analysts are very close to what their team is thinking about, what they are trying to measure, what questions they are asking. The downside, however, is that metrics, processes and tools can get out of sync with analysts on other teams. This can --- and in our case did --- result in redundancy of effort, divergent metric definitions and a proliferation of tools and approaches.


To compensate for these inefficiencies, we instituted a "guild," a group that cuts across the organization (rather like a matrix style organization). The guild is an email list and, more importantly, an hour long meeting every two weeks, a place for all the analysts to come together to discuss analytics, share their experiences and detail new data sources that might be useful to other teams. In recent weeks, the guild has switched to a more show-and-tell format in which they showcase their work, ask for honest feedback and stimulate discussion. This is working really well. Now, we all have a better sense of who to ask about metrics and issues, what our KPIs mean, where collaborations may lie and what new data sources and data vendors we are testing or are in discussion with. When the analysts are aligned you stand a far greater chance of aligning the organization, too.

SQL Warehouse
Supporting the analysts, my team has built a MySQL data warehouse that pulls all the data from our enterprise resource planning software (hereafter ERP; we use Netsuite) with 30-minute latency and exposes those data in a simpler, cleaner SQL interface. Combined with SQL training, this has had a significant impact on the analysts’ ability to conduct analysis and compile reports on large datasets.

Prior to that, all analysts were exporting data from the ERP as CSV files and doing analysis in Excel; that came with its problems. The ERP software has limits, so exports can time out. Excel has its limits, and analysts could sometimes run out of rows or, more frequently, memory. Finally, the ERP software did not allow (easy) custom joins in the data; you exported what the view showed. This meant that analysts had to export multiple sets of data in separate CSV files and then run huge VLOOKUPs in the Excel file. Those lookups might run for 6 hours or more and would frequently crash. (There is a reason that the financial analysts have the machines with the highest amount of RAM in the whole company.)

To combat this insanity, we built a data warehouse. We flattened some of those tables to make them easier to use. We then ran a number of SQL trainings. We combined going through material in w3schools with interactive sessions and tutorials using simplified Warby Parker data. After analysts got their feet wet, we supplemented these with more one-on-one tutorials and help sessions, and also hosted group sessions in the analyst guild. The analyst guild was a place where individuals could show off their queries and share how quickly and easily their queries ran compared to the old ERP/Excel approach. We now have a reasonable number of analysts running queries regularly and getting answers to their questions far more easily and quickly.
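For readers who have only ever done this with VLOOKUP, here is a minimal sketch of the same kind of lookup expressed as a SQL join. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for the MySQL warehouse; the table names, columns, and rows are hypothetical and much simpler than any real schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the warehouse (an assumption
# for illustration; the real warehouse described here is MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, unit_price REAL);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER);
    INSERT INTO products VALUES (1, 'frame_a', 95.0), (2, 'frame_b', 145.0);
    INSERT INTO orders VALUES (100, 1, 2), (101, 2, 1), (102, 1, 1);
    """
)

# One join replaces exporting two CSV files and running a VLOOKUP per row.
rows = conn.execute(
    """
    SELECT p.name,
           SUM(o.quantity) AS units,
           SUM(o.quantity * p.unit_price) AS revenue
    FROM orders AS o
    JOIN products AS p ON p.product_id = o.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()

for name, units, revenue in rows:
    print(name, units, revenue)

conn.close()
```

The database engine does the matching on product_id, so the work that used to take hours in a spreadsheet becomes a single declarative query.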

In addition, this centralization of not just the raw data but also the derived measures, such as sales channel or net promoter score, means a central source of truth with a single definition. This has helped move us away from decentralized definitions in Excel formulae sitting on people’s laptops to standard definitions baked into a SQL database field. In short, people now (mostly) speak the same language when they reference these metrics and measures. While there is still work to be done, we are in a better place than a year ago.

"People want to move from a culture of reporting to a culture of analytics" - Steffin Harris


BI tooling
While writing SQL is one approach to getting answers, it does not suit all levels of skills and experience. Once you have a set of core queries, you likely want to run these frequently, automatically and share results (think canned reports and dashboards). This is where business intelligence tools come into play. While we did a good job at automating a number of core queries and reports using Pentaho Data Integration, we did not make sufficient progress (and I am solely to blame for this) in rolling out a more self-service set of business intelligence tools, a place where analysts can spend more time exploring and visualizing data without writing SQL. While we trialed Tableau and, more recently, Looker, my team did not push analysts hard enough to use these tools, to switch away from Excel charting and dashboards, and to report feedback. Thus, while we are rolling these out to production this quarter, we could have done this up to 6 months earlier. Getting analysts to switch would have created more high-quality dashboards that could be easily seen and shared around the company. It would have gotten more people to see data on monitors or in their inbox. It would also have freed up more time for analysts to conduct analysis rather than reporting, an important distinction.

Statistics

Another area where I made less progress than I expected was the level of statistical expertise across the company. Having a Ph.D. from a probability and statistics department, I am clearly biased, but I think having statistical training is hugely valuable not just for the analysts performing the analysis and designing the experiments, but for their managers too. Statistical training imparts a degree of rigor in thinking in terms of hypotheses, experimental design, thinking about populations and samples, as well as the analysis per se. In many cases, analysts would ask for my advice about how to analyze some dataset. I would ask "precisely what are you trying to answer?", but they wouldn’t be able to express it clearly and unambiguously. When I pushed them to set out a null and alternative hypothesis, this crystallized the questions in their mind and made the associated metrics and analytical approach far more obvious.

I announced that I would run some statistical training and 40 people (which represented a large proportion of the company at the time) immediately signed up. There was a lot of interest and excitement. I vetted a number of online courses and chose Udacity's The science of decisions course. This has a great interactive interface (the student is asked to answer a number of questions inline in the video itself during each lesson) and good curriculum for an introductory course. It also has course notes, another feature I liked. I decided to send about 20 employees through the first trial.

It was a complete disaster.

The number of people who completed the course: zero. The number of people who completed half the course: zero. The problem was completely unrelated to Udacity; it was our fault. Students (=staff) weren't fully committed to spending several hours per week of their own time learning what should be a valuable, transferable skill. To truly embrace statistical thinking you have to practice, to do the exercises, and to attempt to translate concepts you are learning to personal examples such as specific datasets that you use in your job as an analyst. There was insufficient buy-in to this degree of effort. There was also insufficient reinforcement and monitoring of progress from managers; that is, expecting participation and following up with their direct reports. I am also to blame for not having in-house check-in sessions, a chance to go through material and cover problematic concepts.


I haven't yet solved this. What I have noticed is that a concrete need drives a response to "level up" and meet expectations. Over the last few months, our A/B tests have picked up. The business owners, the agile teams building features (including their business analysts and the project managers), as well as the particular manager who runs our A/B tests and analysis, are all simultaneously expecting and being expected to provide rigor and objectivity. That is, to run a sample size and power analysis in advance of the test, to define clear metrics and null and alternative hypotheses, to be able to explain why they are using a Chi-squared test rather than another test. Having to defend themselves from probing questions from other senior managers, ones who do have some experience in this area, and just asking the right questions is forcing people to learn, and to learn quickly. This is not ideal, and there is a long way to go, but I do feel that this represents a significant shift.
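To give a flavor of what that rigor looks like in practice, here is a minimal sketch of a sample size calculation and a chi-squared test for an A/B test on conversion rates. It is an illustration under stated assumptions, not a description of our actual tooling: the sample size uses a common normal-approximation formula for comparing two proportions, the test is a plain Pearson chi-squared on a 2x2 table without continuity correction, and all function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate visitors needed per arm to detect p_control vs p_variant
    with a two-sided test at significance alpha and the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    n = (z_alpha + z_power) ** 2 * variance / (p_control - p_variant) ** 2
    return math.ceil(n)


def chi_squared_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-squared test on a 2x2 table of conversions by arm.

    Null hypothesis: both arms share the same conversion rate.
    Assumes every expected cell count is greater than zero.
    Returns (test statistic, p-value) with 1 degree of freedom.
    """
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    total = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the chi-squared tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical test: 10% baseline conversion, hoping to detect a lift to 12%.
print("visitors per arm:", sample_size_per_group(0.10, 0.12))

# Hypothetical results: 100/1000 conversions in A versus 130/1000 in B.
stat, p_value = chi_squared_2x2(100, 1000, 130, 1000)
print("chi-squared:", round(stat, 3), "p-value:", round(p_value, 4))
```

Running the sample size calculation before the test, and writing down the null hypothesis before looking at the results, is the habit that matters; the particular formula is secondary.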



In conclusion, the business is in a far better place than a year ago. People are starting to ask the right questions and have the right expectations from peers. More decisions are based on data-backed assumptions and results than before. More people are starting to talk in more precise language that involves phrases such as "test statistic" and "p-value." More people are harnessing the power of databases and leveraging their strength: crunching through large amounts of data in seconds. We are not there yet. My dream for the coming year is:
  • more canned reports
  • more sharing of results and insights
  • more dashboards on monitors
  • more time spent on analysis and deep dives rather than reporting
  • more accountability and retrospectives such as for prediction errors, misplaced assumptions
  • more A/B testing
  • more clickstream analysis
  • more holistic view of the business

How will we do? I don't know. Check back next year!