Sunday, June 29, 2014

How data science shapes our world view

Data science is increasingly impacting how we view and experience our world. 

Last weekend, I was playing around with WordLens, a translation/augmented-reality app recently acquired by Google. You point your smartphone's camera at clear text such as road signs, book covers, and the like, and the app not only translates the text but swaps it out completely, matching the background (almost) seamlessly. You see that road sign in English? I see it in Spanish. 

When it works well, it can be impossible to tell that the camera view, and the target object, were modified in any way. Yes, the app is really buggy, and it is not at all ready for prime time, but nevertheless you can see the future right there in your hand. Let Google throw a team of engineers at it for the next 12 months, integrate it into Glass, and you have a killer app. 
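Under the hood, the core loop is conceptually simple: find the text in each camera frame, translate it, erase the original, and paint the translation back in. Here is a minimal sketch in Python of how such a pipeline might be structured; the helper functions (ocr_text_regions, translate, inpaint_background, draw_text) are hypothetical stand-ins, not WordLens's actual implementation:

```python
# A rough sketch of an augmented-reality translation pipeline.
# The helpers below are hypothetical stand-ins for real OCR,
# machine-translation, and image-inpainting components.

def ocr_text_regions(frame):
    """Detect text in a camera frame; return (bounding_box, text) pairs."""
    raise NotImplementedError  # e.g. an OCR engine such as Tesseract

def translate(text, target_lang):
    """Translate a string into the target language."""
    raise NotImplementedError  # e.g. an on-device translation model

def inpaint_background(frame, box):
    """Erase the original text and fill the region to match its surroundings."""
    raise NotImplementedError

def draw_text(frame, box, text):
    """Render the translated text over the cleaned-up region."""
    raise NotImplementedError

def augment_frame(frame, target_lang="es"):
    """Swap every detected piece of text for its translation."""
    for box, text in ocr_text_regions(frame):
        translated = translate(text, target_lang)
        frame = inpaint_background(frame, box)
        frame = draw_text(frame, box, translated)
    return frame
```

The hard data science lives inside those stubs: robust text detection in shaky, badly lit camera frames, fast on-device translation, and inpainting that matches the background well enough to fool the eye, all at video frame rates.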

One of the reasons I am so excited about this technology is that it is a significant step forward in augmented reality and, it could be argued, blurs the line with virtual reality: it is hard to tell the difference between the original world and the modified one. Moreover, it reinforces just how much data science can and will influence our world view, literally.
Think of the ways that data science shapes and filters what you see, sense, and perceive in the world:

  • Data science powers your newsfeed, showing you what it thinks you want to see and whom you want to interact with, and filtering out what it deems irrelevant (a toy sketch of this kind of filtering follows this list). This hugely influences your social interactions.  
  • Skype will soon introduce real-time translation in calls. If that works well, it could dramatically change how we collaborate with others around the world, enabling collaborations that a language barrier would ordinarily make impractical.
  • Adverts in the real world are becoming dynamic. In London, smart trash cans were introduced (and soon thereafter banned) that would read your phone's MAC address and target display ads at you directly. 
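To make the first of those points concrete, here is a deliberately toy sketch of newsfeed-style filtering in Python. The scoring function and story fields are made up for illustration; no real platform's ranking is anywhere near this simple:

```python
# Toy newsfeed filtering: score candidate stories by a crude relevance
# measure and show only the top few. Everything below the cut is dropped
# silently -- the user never knows it existed.

def relevance_score(story, user_interests):
    """Toy relevance: overlap between a story's topics and the user's interests."""
    return len(set(story["topics"]) & set(user_interests))

def build_feed(stories, user_interests, max_items=3):
    """Rank stories by predicted relevance and keep only the top max_items."""
    ranked = sorted(stories,
                    key=lambda s: relevance_score(s, user_interests),
                    reverse=True)
    return ranked[:max_items]

stories = [
    {"title": "New ML framework released", "topics": ["ml", "software"]},
    {"title": "Local election results", "topics": ["politics"]},
    {"title": "AR translation demo", "topics": ["ml", "ar"]},
    {"title": "Sports roundup", "topics": ["sports"]},
]

for story in build_feed(stories, user_interests=["ml", "ar"]):
    print(story["title"])
```

The interesting part is not the scoring but the silent drop at the end: whatever falls below the cut simply never reaches you.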
It all starts to sound like a rogue state, an Orwellian future: a set of algorithms that effectively controls what news you see and which friends you see and interact with, and that translates and modifies the sights and sounds around you. A large proportion of what you see, hear, and experience could be shaped by data science. As practicing data scientists, then, we carry a great deal of responsibility.

I am old enough to remember sitting down and consuming my news sequentially, cover to cover, on pieces of paper. (Kids, those things are called "newspapers.") I had to decide what to read and what to skip. Importantly, I got a brief sense of the news articles I was not reading. This is completely different today. While I read a physical newspaper for long-form articles at the weekend, during the week I consume my news from a variety of sources, all shaped, curated, and filtered by algorithms: on Twitter, Prismatic, Reddit, and the like. I have a higher hit rate of stories that interest me, but I have no idea what I am missing, what's been buried.

I started writing this post before Facebook's newsfeed manipulation study was published, a perfect example of precisely the sort of scenario I am talking about. If you've been hiding under a rock and didn't see it, the core Facebook data science team published a paper in Proceedings of the National Academy of Sciences (one of the most prestigious science journals) about an experiment to test social contagion of emotions. To quote directly:
"The experiment manipulated the extent to which [Facebook users] (N = 689,003) were exposed to emotional expressions in their News Feed. This tested whether exposure to emotions led people to change their own posting behaviors, in particular whether exposure to emotional content led people to post content that was consistent with the exposure—thereby testing whether exposure to verbal affective expressions leads to similar verbal expressions, a form of emotional contagion."
and they found that
"When positive expressions were reduced, people produced fewer positive posts and more negative posts; when negative expressions were reduced, the opposite pattern occurred."
As I write, there is heated debate about this study. There are those who argue that there was no informed consent from the users. Even the editor of the paper (Prof. Susan T. Fiske) expressed concern:
"I was concerned until I queried the authors and they said their local institutional review board had approved it—and apparently on the grounds that Facebook apparently manipulates people's News Feeds all the time... I understand why people have concerns. I think their beef is with Facebook, really, not the research." (Source: article in the Atlantic)
And therein lies the flip side. A/B tests, personalization, recommenders, coupons, and the like manipulate users all the time. Is this really any different? What makes overt emotional manipulation worse than manipulating users' likelihood to open their wallets and purchase a product, or to share your content?


I don't want to take sides here but simply want to reinforce the point that, as data scientists, our work influences people, real people. Keep them in mind and seriously consider ethics and informed consent. Would the user expect their data to be used in this manner? Would you be OK if it were your data? If the answer to either is no, then don't do it. If you look out for the customer and put them first, the business will surely follow.