Here are a few in my collection:

- Mining of Massive Datasets by Rajaraman, Leskovec & Ullman
- Bayesian Reasoning and Machine Learning by David Barber [website]
- Information Theory, Inference, and Learning Algorithms by David J.C. MacKay
- ~~Foundations of Statistical Natural Language Processing by Manning & Schütze~~ (only sample chapters)

- Data Jujitsu by D.J. Patil
- Building Data Science Teams by D.J. Patil

Update [12/27/12]: adding some suggestions from a Hacker News discussion and the comments below (thanks, guys):

- Introduction to Information Retrieval, by Manning, Raghavan and Schütze
- A First Encounter with Machine Learning by Welling
- Gaussian Processes for Machine Learning by C.E. Rasmussen
- The Elements of Statistical Learning, by Hastie, Tibshirani, Friedman -- granddaddy of them all
- Introduction to Machine Learning by Smola, Vishwanathan
- Think Bayes by Downey

To add to your list:

Gaussian Processes for Machine Learning by C.E. Rasmussen: http://www.gaussianprocess.org/gpml/chapters/RW.pdf

The Elements of Statistical Learning, by Hastie, Tibshirani, Friedman: http://www-stat.stanford.edu/~tibs/ElemStatLearn/

This is something I am definitely keeping a watch on :) --- Introduction to Machine Learning by Alex Smola, S.V.N. Vishwanathan: http://alex.smola.org/drafts/thebook.pdf

Great. Thanks for adding.

Would also like to add Networks, Crowds, and Markets to this list

http://www.cs.cornell.edu/home/kleinber/networks-book/

I like the number of online resources available for learning, but boy, how long would it take to fully grasp all those concepts and be able to apply them to real-world applications? That's yet another question.

I mean, from the list of books you listed, how many have you actually read from first to last page, and remember everything?

There's no rule that says you can't read a book more than once or refer back to specific sections when you need to.

Data science is an especially broad umbrella. No one can possibly expect any data scientist to be an NLP expert, a recommender expert, a Hadoop expert, a statistical modeling expert, etc. In practice you will likely dabble in several or many areas but develop a deeper expertise in only one or two areas. It is, however, nice to have resources such as the above. You can dip in as you need or, as I like to do, read about the same material from different perspectives and at different levels until it starts to really sink in.

Agree with Carl Anderson.

In an emerging discipline, breadth matters before depth. Reading a lot of books helps me understand what others are talking about, though I can't understand everything in depth.

Also, for a practicing data scientist, I think, more than the mathematical elegance of methods, ground-level realities like gathering data, massaging it into a form that's conducive to further processing, visualization, and propagating a model implementation all the way back to a real-world system are much more prominent. That is not to imply that the lack of a sound theoretical background is justified, but often a data scientist must operate under other constraints.

To add to the list:

http://jsresearch.net/groups/teachdatascience/

Introduction to Data Science, by Jeffrey Stanton

Scroll to the middle of the post and you will find the pdf download...

This doesn't appear to be freely available ... unfortunately:

Foundations of Statistical Natural Language Processing by Manning & Schütze

Oh, you are right. Only sample chapters. Thanks and sorry about that.

It's really awesome.

Thanks for the resource!

