Sunday, November 25, 2012

Free Datascience books

I've been impressed in recent months by the number and quality of free data science and machine learning books available online. I don't mean free as in someone paid for a PDF version of an O'Reilly book and then posted it online for others to use or steal; I mean genuine published books with a free online version sanctioned by the publisher. That is, "the publisher has graciously agreed to allow a full, free version of my book to be available on this site."
Here are a few in my collection:
While we are on the subject, it would be remiss of me not to recommend D.J. Patil's free minibooks/essays. While they are not thick, comprehensive tomes like those above, they are definitely worth the time to read.
Finally, this is a work in progress (just 3 chapters to date) but is one to watch: Network Science by A.-L. Barabasi.

Update [12/27/12]: adding some suggestions from a Hacker News discussion and the comments below (thanks, guys):




14 comments:

  1. To add to your list:
    Gaussian Processes for Machine Learning by C.E. Rasmussen: http://www.gaussianprocess.org/gpml/chapters/RW.pdf

    The Elements of Statistical Learning, by Hastie, Tibshirani, Friedman: http://www-stat.stanford.edu/~tibs/ElemStatLearn/

    This is something I am definitely keeping a watch on :) --- Introduction to Machine Learning by Alex Smola, SVN Vishwanathan: http://alex.smola.org/drafts/thebook.pdf

    Replies
    1. I would also like to add Networks, Crowds, and Markets to this list:

      http://www.cs.cornell.edu/home/kleinber/networks-book/

  2. I like the amount of online resources available for learning new things... but boy, how long would it take to fully grasp all those concepts and be able to apply them to real-world applications? That's yet another question.

    I mean, of the books you listed, how many have you actually read from the first page to the last, and do you remember everything?

    Replies
    1. There's no rule that says you can't read a book more than once or refer back to specific sections when you need to.

    2. Data science is an especially broad umbrella. No one can possibly expect any data scientist to be an NLP expert, a recommender expert, a Hadoop expert, a statistical modeling expert, etc. In practice you will likely dabble in several or many areas but develop deeper expertise in only one or two. It is, however, nice to have resources such as those above. You can dip in as you need or, as I like to do, read about the same material from different perspectives and at different levels until it starts to really sink in.

    3. Agree with Carl Anderson.

      In an emerging discipline, breadth matters more than depth. Reading a lot of books helps me understand what others are talking about, though I can't understand everything in depth.

    4. Also, for a practicing data scientist, I think ground-level realities like gathering data, massaging it into a form that's conducive to further processing, visualization, and propagating a model implementation all the way back to a real-world system are much more prominent than the mathematical elegance of methods. That is not to imply that the lack of a sound theoretical background is justified, but often a data scientist must operate under other constraints.

  3. To add to the list:

    http://jsresearch.net/groups/teachdatascience/
    Introduction to Data Science, by Jeffrey Stanton

    Scroll to the middle of the post and you will find the pdf download...

  4. This doesn't appear to be freely available ... unfortunately:

    Foundations of Statistical Natural Language Processing by Manning & Schütze

    Replies
    1. Oh, you are right. Only sample chapters. Thanks and sorry about that.
