Saturday, April 19, 2014
I meet a lot of aspiring data scientists, people starting out who are often switching from academia or finance. They are all bright-eyed and bushy-tailed, drawn in by tales of advanced algorithms from Netflix, the latest competition at Kaggle or the shiny new visualization from Facebook. However, when it comes to e-Commerce, they are kind of stumped. They don't really grasp the scope of how data science can help a business that sells physical “stuff”. They get the idea of recommendation engines baked into almost every chunk of Amazon's website, of course, but beyond that they find it hard to imagine how else data scientists might spend their days at such companies.
The purpose of this post, then, is to give a brief, almost superficial, overview of some of the different aspects of a typical e-Commerce business where data scientists can add value.
Before I start, however, I want to mention a couple of caveats:
• All of the areas below are serviced by a swathe of specialized vendors. They can do a great job (potentially far better than an in-house data science team, because their business is so focused and their tools so developed), but it usually comes at a price. At small-company scale, an individual data scientist or team may be able to provide something that is good enough to meet the company's needs, or that demonstrates the need for a specialized service. At larger scale, it may make sense to build such systems in-house using the data science team.
• In the list below, there is a broad overlap between the responsibilities of a typical analyst and a data scientist. Some aspects, such as “implement a recommendation engine”, are clearly in the data science camp. Other areas, such as those relating to customer insights, are usually handled by analysts. Even there, however, data scientists may be able to help the business and the analysts with more sophisticated statistical approaches (say, feature reduction or unsupervised clustering of customers rather than a priori slicing and dicing by age, gender, zip code, etc.), in other words advanced analytics, or with more programmatic approaches (e.g. using an API to pull down supplementary data).
Few individual companies will use data scientists for all of these aspects. The point here is to highlight different aspects where data scientists can and do get involved and provide some value and insight.
Recommendation and Personalization
Let's get the obvious one out of the way. Consumers increasingly rely on recommendations these days, whether for news, restaurants, bands or items to purchase. Many, if not most, e-Commerce sites have some sort of recommendation engine under the hood, and it is typically the data scientist's role to help conceive its type, features and weights and, in many cases, to implement it. These engines are used for cross-sell (“you are ordering this iPad, so you probably want one of these cases to protect it”), up-sell (“you have been looking at this camera; here is the next level up, which is even more awesome”) and personalization. The data scientist's job is to learn the attributes of, and relationships among, products and, where possible, to learn customers' tastes and anticipate their needs. They can then help tailor each customer's experience, for example by reordering the products in search results or gallery pages specifically for that customer.
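To make the cross-sell idea concrete, here is a minimal sketch of one of the simplest approaches: counting which products appear together in the same order and recommending the most frequent companions. This is an illustrative assumption, not how Amazon or any particular site builds its engine; the function names and the product IDs in the sample baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(orders):
    """Count how often each pair of products appears in the same order."""
    co = defaultdict(Counter)
    for order in orders:
        # set() drops repeated items; sorted() gives a stable order for pairing.
        for a, b in combinations(sorted(set(order)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, product, k=3):
    """Return up to k products most often bought together with `product`."""
    neighbours = co.get(product, Counter())
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:k]]


# Hypothetical order history: each inner list is one customer's basket.
orders = [
    ["ipad", "ipad_case", "stylus"],
    ["ipad", "ipad_case"],
    ["ipad", "screen_protector"],
    ["camera", "sd_card"],
]
co = build_cooccurrence(orders)
print(recommend(co, "ipad"))  # ['ipad_case', 'screen_protector', 'stylus']
```

Raw co-occurrence counts favor popular items; real systems usually normalize them (for example with cosine similarity or lift) and blend in product attributes and browsing data.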
Loyalty Programs
Many of us have supermarket loyalty cards. So do some e-Commerce sites (think Amazon Prime). They are a source of extremely valuable data, so much so that it may even be worth taking a loss on those customers. Coupons and discounts can drive new purchase behavior and provide insights into whole segments of customers who are not in the loyalty program itself. Those programs need to be conceived and managed, and their data put to maximum use.
Product Mix
All e-Commerce sites have to tackle the questions: what should we sell, at what price, and when? Data scientists can help define and optimize the product mix. In some cases, such as my current employer Warby Parker, the company designs and manufactures its own products. That is, it owns the whole process, from product conception to final sale to a customer. While there is typically a product team that owns the design process, data scientists can and do help with forecasting. Is there a hole in our product mix? What should we make, and when should we sell it? How many units should we order in the initial batch from the factory? When should we retire products? Analysts will typically tackle the retrospective analysis (how much did we sell, what are the duds), whereas data scientists can help with the more advanced predictive and prescriptive analytics.
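For the “how many units in the initial batch” question, one textbook starting point is the newsvendor model: order up to the quantile of forecast demand given by the ratio of the cost of running out to the total cost of guessing wrong. The sketch below assumes normally distributed demand and a single selling season; it is not Warby Parker's method, and the prices and demand figures are hypothetical.

```python
from statistics import NormalDist


def newsvendor_quantity(mean_demand, sd_demand, price, unit_cost, salvage=0.0):
    """Single-season order quantity that maximizes expected profit, assuming
    normally distributed demand (the classic newsvendor model)."""
    underage = price - unit_cost   # margin lost on each unit of unmet demand
    overage = unit_cost - salvage  # money lost on each unit left unsold
    if underage <= 0 or overage <= 0:
        raise ValueError("expected price > unit_cost > salvage")
    # Critical ratio: the demand quantile we should stock up to.
    critical_ratio = underage / (underage + overage)
    q = NormalDist(mean_demand, sd_demand).inv_cdf(critical_ratio)
    return max(0, round(q))


# Hypothetical new product: forecast 1000 +/- 200 units, sells at 95, costs 20.
print(newsvendor_quantity(mean_demand=1000, sd_demand=200, price=95, unit_cost=20))  # 1161
```

Because margins are high relative to unit cost here, the model recommends ordering above the mean forecast; with thin margins the same formula would recommend ordering below it.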
Supply Chain
If an e-Commerce site is to sell “stuff”, it needs the right amount of the right stuff in the right place at the right time. The supply chain is a particularly complex and important part of the business. It is complex because it often involves multiple vendors and factories, significant time lags for international shipping, significant shipping costs (especially if one gets it wrong and has to expedite pallets of goods to warehouses) and significant capex. Also, there can be very narrow windows of demand for a product, and if you miss that window you might be stuck with a big pile of useless inventory (think of “Happy New Year 2014” products on January 2). Finally, demand can be highly unpredictable and may correlate strongly with exogenous factors such as weather that departs from the seasonal average. Ideally, an e-Commerce company will work with specialized vendors to handle its supply chain or hire an expert in-house operations research team. However, at many e-Commerce sites, especially small ones, there is plenty of scope for data scientists to perform detailed analysis and develop predictive models that can help minimize risk, inform strategy and optimize customer satisfaction.
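Long shipping lead times are one reason these models matter: you have to reorder well before stock runs out. A common textbook rule is the reorder point, expected demand over the lead time plus a safety stock. The sketch below assumes independent, normally distributed daily demand and a fixed lead time, both simplifications; the function name and figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def reorder_point(daily_mean, daily_sd, lead_time_days, service_level=0.95):
    """Stock level at which to place a new order so that demand during the
    replenishment lead time is covered with probability `service_level`.

    Assumes i.i.d. normal daily demand and a fixed lead time."""
    z = NormalDist().inv_cdf(service_level)          # standard normal quantile
    expected = daily_mean * lead_time_days           # mean demand while waiting
    safety_stock = z * daily_sd * sqrt(lead_time_days)
    return expected + safety_stock, safety_stock


# Hypothetical SKU: 50 +/- 20 units/day, 7 weeks from factory to warehouse.
rop, safety = reorder_point(daily_mean=50, daily_sd=20, lead_time_days=49)
print(round(rop, 1), round(safety, 1))  # roughly 2680.3 and 230.3
```

Raising the service level raises safety stock (and capex tied up in inventory); lowering it raises the risk of expensive expedited shipments, which is exactly the trade-off described above.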
Customer Service
A company that puts its customers first is going to have a great customer service team that handles issues, deals with returns and complaints, and generally tries to keep customers happy. These teams generate a trove of data from phone calls, instant messages and email interactions with customers, as well as from back-end systems. They also tend to be fairly metric-driven: how long, on average, does it take to answer the phone or to resolve a case, and how large is the case backlog? Data scientists can help with predictive models and visualization. They can also bring their natural language processing skills to bear; for instance, they could use keyword extraction and topic modeling to understand the types of complaints and issues being filed.
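As one illustration of keyword extraction, the sketch below scores each word in a ticket by TF-IDF (frequent in this ticket, rare across all tickets) and returns the top terms. TF-IDF is just one simple choice that the post does not specify; the tiny stopword list and sample tickets are hypothetical, and topic modeling would normally use a dedicated library.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "the", "i", "my", "was", "for", "to", "of", "it", "is", "in"}


def tokenize(text):
    """Lowercase, split into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def top_keywords(docs, k=3):
    """For each document, return the k terms with the highest TF-IDF score."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; alphabetical order breaks ties.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([t for t, _ in ranked[:k]])
    return results


tickets = [
    "My order arrived late and the box was damaged",
    "The frames arrived scratched and damaged",
    "I was charged twice for my order",
]
print(top_keywords(tickets, k=2))  # [['box', 'late'], ['frames', 'scratched'], ['charged', 'twice']]
```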
Fraud Detection
Fraud is, unfortunately, very common, and the strategies employed by thieves are varied and in some cases sophisticated. It ranges from the use of stolen credit cards to items that are never returned, or returns that arrive shrink-wrapped but do not contain the original product. Again, there is potential for data scientists to develop models as well as monitoring and alerting systems.
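A very simple alerting building block is to flag accounts whose behavior is far from typical. The sketch below uses a robust (modified) z-score based on the median and median absolute deviation, which, unlike the mean and standard deviation, are not dragged around by the very outliers we want to catch. The metric (returns per account per month), the data and the 3.5 threshold are assumptions for illustration; real fraud systems combine many signals and models.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # when the data are roughly normal.
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical returns per account in one month.
returns_per_account = [1, 0, 2, 1, 0, 1, 3, 1, 0, 14]
print(flag_anomalies(returns_per_account))  # [9]
```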
HR and Recruiting
Hiring is tough, especially in technology, where competition for good engineers (and data scientists, of course) is fierce. Hiring is time-consuming and expensive because of the cost of recruiters and their fees, and the time spent interviewing. In addition, a bad hire can be counter-productive for the team or company and expensive to manage. Increasingly, companies are interested in honing their recruiting process: what makes a good fit for our company, where can we streamline the interview process, what are good discriminating interview questions, and so on. Models can be used to understand attrition and retention, identify who should be rejected at the resume stage, and analyze and optimize the interview pipeline.
Customer Insights
An e-Commerce site has stuff to sell, but who are the people buying it? What are they interested in? Where do they live? How can we serve them better? What makes them tick? These questions are typically answered by analysts in a customer insights group or, as the company scales, by specialized teams that may each focus on a single area of the product space. As above, data scientists can help here with more advanced analytics (classifiers, predictive modeling, unsupervised clustering, segmentation and so on). This team is often responsible for customer surveys, so there is ample opportunity to help them with natural language processing, including keyword extraction and topic modeling. (I wrote about this earlier in my post about matching misspelled brand names.)
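To show what unsupervised segmentation means in contrast to a priori slicing by age or zip code, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. Seeding with farthest-first traversal is a simplifying assumption chosen for determinism; libraries typically use k-means++ with several restarts. The two features per customer are hypothetical and assumed already standardized, since k-means is sensitive to feature scale.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster equal-length numeric tuples into k groups with Lloyd's algorithm.

    Centroids are seeded by farthest-first traversal. Returns (centroids, labels)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical standardized (order frequency, average order value) per customer.
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The resulting segments still need an analyst's interpretation (for example, “frequent big spenders”) before they are useful to the business.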
Marketing
OK, so the site has products to sell and knows something about its customers. An obvious next question is how to get more customers, or how to encourage existing customers to purchase more. Here we enter the realm of marketing. Once again, there is lots of scope for data scientists to contribute, ranging from AdWords buying optimization and channel mix optimization (by that I mean print vs. web vs. TV) to ad retargeting optimization and SEO. Most e-Commerce sites send out a lot of emails, especially flash-sale sites. There is a lot of scope for understanding, optimizing and A/B testing subject lines, content, send times and so on. (At One Kings Lane, a home decor flash-sale site, we sent customers up to 17 emails per week.) There is a careful balance between reminding customers of your presence and what you offer, and turning them off by being a nuisance. At many e-Commerce sites, cart abandonment rates reach dizzying heights, and understanding and addressing that can pay rich rewards. Data scientists often form a core part of personalization programs.
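For A/B testing email subject lines, a standard first analysis is a two-proportion z-test on open (or click) rates. The sketch below uses the pooled normal approximation, which assumes reasonably large samples and a single, pre-planned comparison; the open counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two rates (e.g. email open
    rates for subject lines A and B). Returns (z, p_value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Under the null hypothesis both variants share one rate: pool the data.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; test is undefined")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical send: 10,000 recipients per subject line.
z, p = two_proportion_ztest(2000, 10000, 2150, 10000)
print(f"z = {z:.2f}, p = {p:.4f}")  # roughly z = 2.62, p = 0.0089
```

With many sends per week it is tempting to check results repeatedly; fixing the sample size in advance (or using a sequential method) keeps the false-positive rate honest.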
Web Analytics
Another obvious area where data scientists can contribute is web analytics. How do people come to the site (this relates to SEO, search terms and referrer URL analysis)? What paths do they take? When and where do they bounce? At which stages of the checkout funnel do we lose the most customers? How can we make the experience more frictionless, enjoyable and relevant? Which products are customers typing into our search box that we do not currently supply, but should? These are all areas that should be covered by a specialized analyst team as the company scales, but where data scientists can help with data munging, visualization, advanced clickstream analysis and A/B testing, as well as by contributing data products to the site (personalization and recommender APIs).
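The checkout-funnel question can be answered, at its simplest, by computing step-to-step conversion and finding the weakest step. The stage names and session counts below are hypothetical; in practice they would come from clickstream data, ideally segmented by device, channel and so on.

```python
def funnel_dropoff(stage_counts):
    """Return step-to-step conversion rates for an ordered funnel and the step
    with the lowest conversion (where the largest share of users is lost).

    `stage_counts` is a list of (stage_name, count) pairs with at least two stages."""
    steps = []
    for (prev_stage, prev_n), (stage, n) in zip(stage_counts, stage_counts[1:]):
        rate = n / prev_n if prev_n else 0.0
        steps.append((f"{prev_stage} -> {stage}", rate))
    worst = min(steps, key=lambda s: s[1])
    return steps, worst


# Hypothetical weekly sessions reaching each checkout stage.
checkout = [("cart", 10000), ("shipping", 6000), ("payment", 4800), ("confirmation", 4300)]
steps, worst = funnel_dropoff(checkout)
for name, rate in steps:
    print(f"{name}: {rate:.1%}")
print("weakest step:", worst[0])  # weakest step: cart -> shipping
```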
And there you have it: a whirlwind tour of an e-Commerce site from a data scientist's perspective. It is this breadth that makes being a data scientist fun, rewarding and challenging. You get to work with a broad spectrum of partners across the organization, dip into different domains, and make a difference in a variety of ways.