Remembering John Horton Conway as COVID-19 Shapes Our Game of Life

John Conway passed away today, reportedly from complications related to COVID-19.

A Numberphile video from 2014 reveals that as Conway grew older, he became more comfortable with being best known for his Game of Life, despite feeling he had made many other, far more noteworthy contributions.

Even if his other, greater contributions were dwarfed by the fame of something he toyed with during coffee time - his game has inspired joy, wonder and interest in countless mathematics and computer science students.

Second semester of my freshman year as an undergraduate wouldn't have been the same without it (or without the professor I had who brought Conway's game into the classroom in a way that was approachable by anyone).

Of course, that's before mentioning all the follow-on work inspired by or related to the Game of Life - particularly Paul Rendell's work in the theory of computation.

This is one of the first of many profound losses society is likely to recognize as a result of COVID-19.

While Conway was in his 80s, with his best years (and a lasting legacy) behind him, the loss of "high-value" stitches in the fabric of society is impossible to quantify, particularly as COVID-19 is taking many people far younger than he was. Beyond the loss of a great mind - the economic impact of losing someone like Conway is incomprehensible. One doesn't need to be a world-famous mathematician for this to be true. It's just as challenging to reason about the cost of losing a veteran heart or brain surgeon - or even a teacher, small business owner or community leader with 20 or more years of experience helping others grow their lives, careers and relationships.

These aren't nodes in the graph of human capability that we can just replace with a stimulus check. They are the beautiful, potentiating "Gosper's breeders" of society - their helpful "configurations" conditioned by a system which steeped their unique characteristics and capabilities over many years of study, practice and plain old living.

Of course, a disease doesn't care about this. It changes the rules of the game and takes the breeders and glider guns just as easily as it takes the colonies of unremarkable configurations destined to disappear and be forgotten in 20 moves.

This is not to trivialize the loss of any life - all life is precious. It's also not meant to politicize the crisis and use Conway's death to say something as pedestrian as "see, it was too early to re-open the economy". That kind of statement (made without rigorous, published exploration of its merits, or lack thereof) would be anathema to Conway and his memory.

My only point here is that when considering the economic and societal consequences of various responses to the disease, higher-order, systemic considerations (the kind that Conway's famous game helps develop an intuition for) shouldn't be ignored.

You can play the game for yourself here. Simply click the squares to toggle your initial configuration of live nodes - and then press start. You can adjust the speed of iteration and the size of the graph with the sliders to the right of the start button.
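If you'd rather poke at the rules directly, here's a minimal sketch of one generation of the game in Python (my own throwaway version, not the code behind the widget above): a live cell survives with two or three live neighbours, and a dead cell comes alive with exactly three.

from collections import Counter

def step(live_cells):
    """Advance a set of (x, y) live cells by one generation."""
    # Count live neighbours for every cell adjacent to a live cell.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Standard rules: survive on 2 or 3 neighbours, be born on exactly 3.
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }

# A glider - one of the simplest configurations that travels forever.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
print(step(glider))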

If you enjoyed this post, consider following me on twitter.

Wintersmith and wintersmith-ejs Upgrade Build Woes - ENOENT file not found on all templates, Cannot call method 'chain' of undefined.

The issues, and a quick fix...

If you have upgraded wintersmith and wintersmith-ejs and are now getting errors like the following - either immediately, or after fixing your calls to require to work with the new version:

ENOENT - file not found (for all your templates)

or maybe:

  error feed.xml: /home/Blog/templates/feed.jade:4
      2| rss(version='2.0',
      3|     xmlns:content='http://purl.org/rss/1.0/modules/content/',
    > 4|     xmlns:wfw='http://wellformedweb.org/CommentAPI/',
      5|     xmlns:dc='http://purl.org/dc/elements/1.1/',
      6|     xmlns:atom='http://www.w3.org/2005/Atom')
      7|   channel
Cannot call method 'chain' of undefined

then you may have run into the same problem I did. As far as I can tell, the correct fix is to update your templates to use the new and improved wintersmith API.

If you find yourself in this position and you don't want to go to the trouble of updating your templates, just roll back to a previous version of wintersmith. You need to roll back to a previous version of wintersmith-ejs as well though (or whatever template plugin you are using).

$ npm uninstall -g wintersmith
$ npm install -g wintersmith@1.2.4
$ npm uninstall wintersmith-ejs
$ npm install wintersmith-ejs@0.1.3

Obviously you may need to investigate which version(s) of wintersmith and wintersmith-ejs you need.

This should be simple if you are a sane person with an up-to-date package.json file handy, because you always remember to `--save-dev` your dependencies. I, of course, don't know anyone who would forget to do something like that. ಠ_ಠ
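For reference, the relevant corner of a package.json would look something like this (the versions here are just the ones from the rollback above - substitute whatever you settle on):

{
  "devDependencies": {
    "wintersmith": "1.2.4",
    "wintersmith-ejs": "0.1.3"
  }
}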

Is 2.0.X worth it?

For me, the answer was "probably, but not at the moment".

I'll be making the switch to 2.0.X and the latest template plugins whenever my next design / template overhaul happens. For now though, my blog and its templates are just how I want them, and I don't need to take advantage of any of the API improvements.

If, however, you are planning to overhaul your site soon anyway - it might be worth just making the leap now. Casually browsing through some of the example code - it looks like the abstractions available for managing / building / maintaining views and templates are much improved.

For example, I found this in the new feed.jade example template:

var articles = env.helpers.getArticles(contents);

as an example of how to populate an object with articles and metadata. This seems like a huge improvement over something like:

var articles = _.chain(contents.articles._.directories).map(function(item){ return item.index }).sortBy(function(item) { return -item.date })

which appeared as example code with 1.2.4 and needed to be modified by hand depending on how you wanted your data structured and ordered.

Not sure where the above complexity / fine-grained control has been pushed to - but I suspect it's in there somewhere. The new version looks much easier to work with, anyway. Looking through the version history, it seems there are improvements that may be worthwhile if you are using wintersmith programmatically as well.

If you enjoyed this post, consider following me on twitter.

Remapping Your Keyboard in Ubuntu

NOTE - 26/12/2013

This tutorial is deprecated. Ubuntu now uses XKB for keyboard mapping. See this askubuntu question for more details. Incidentally - I'm now using xcape, which allows mapping caps lock to both control (in combination with another key) and escape (when pressed alone). Highly recommend this little tool, especially if you are a vim user.

Estimated completion time: 5-10 minutes

For now, there isn't a super-simple (modern) GUI tool to let you remap arbitrary keys in Ubuntu, at least not that I was able to find. I had to cobble together some instructions from around the Internet and sift through several man pages to understand and implement all the remappings I wanted.

Hopefully this guide saves you the trouble, though it might be helpful to skim through the output of man xev and man xmodmap before you get started.

Don't worry if very little of what you see in the man pages makes sense; what you need to know for simple keyboard remapping is explained below.

Remapping the Caps Lock Key to the Control Key in Ubuntu

I added this here because Caps Lock --> Ctrl is a common remapping. If this is all you're after, it doesn't take much, and you certainly don't need to poke around with any archaic tools.

Bring up the lens (super key) and type "keyboard layout". Select "options" > "Ctrl key position" and tick "Caps Lock as Ctrl". Done!

It's worth poking around here to see if any of the other mappings you may be interested in are available as tick boxes. Why isn't there functionality here to remap arbitrary keys? Your guess is as good as mine.
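If you'd rather skip the GUI entirely, the same Caps Lock --> Ctrl remapping is exposed as a stock XKB option and (as far as I can tell) can be set from a terminal with:

$ setxkbmap -option ctrl:nocaps

Note that this only lasts for the current session unless you add it to a startup script.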

Remapping the Keyboard in Linux Using xmodmap and xev

tl;dr for fellow neckbeards that don't need hand holding

Let's try to find some data associated with the keys we are interested in. To do this, we'll use xev - a debugging tool that allows us to dump the data associated with an X-event (for example - pressing a key) to the console. Fire up a console and run:

$ xev | grep "KeyRelease" -A5

Here, we are grepping for KeyRelease events only (and the 5 lines of associated output) so that we don't get duplicate info from KeyPress events or mouse events.

After you press enter to run the above command, press a few other keys and read through the output.

Don't worry - you don't need to understand what all of this stuff means (but it is fun and instructive to google a bit and try to figure it out).

We're interested particularly in the keycode and keysym data associated with each KeyRelease event. Let's look at a concrete, step-by-step example of using this data to remap some keys.

Remapping PageUp to F11 and PageDown to F12 (and vice-versa)

I wanted to remap F11 <--> PgUp and F12 <--> PgDown, so after running xev I pressed (in order) "PgUp, F11, PgDn, F12" - but you can use the following instructions in the general sense to remap whatever keys you like.

You can see the kind of output these key presses produce below - the keycode and keysym data in each event are what we're after.
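For reference, a KeyRelease event for PgUp looks roughly like the following (values like serial, window and time will differ on your machine - the keycode and keysym are the bits that matter):

KeyRelease event, serial 36, synthetic NO, window 0x3a00001,
    root 0x291, subw 0x0, time 21740124, (350,410), root:(352,462),
    state 0x0, keycode 112 (keysym 0xff55, Prior), same_screen YES,
    XLookupString gives 0 bytes:
    XFilterEvent returns: False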

I now have all of the information I need to remap these keys however I like, using another tool called xmodmap.

We will create a mapping of keycodes to keysyms for each key we are interested in modifying, using the following general format:

keycode <Hex keycode> = <Ascii keysym> <Ascii keysym> <Ascii keysym> <Ascii keysym>

To do this, create a file called "~/.xmodmap" and open it in your favorite editor.

For me, the contents of this file ended up looking like:

! F11 <-> PgUp  F12 <-> PgDn
keycode 0x70 = F11 NoSymbol NoSymbol NoSymbol
keycode 0x5f = Prior NoSymbol NoSymbol NoSymbol
keycode 0x75 = F12 NoSymbol NoSymbol NoSymbol
keycode 0x60 = Next NoSymbol NoSymbol NoSymbol

Again, note that each keycode can be mapped to up to four space-separated keysyms. The first is the one I'm concerned with, but positions 2-4 can be mapped to a shifted keypress, a Mode_switch (AltGr) keypress, and a shifted Mode_switch keypress respectively if you like.

Note that I've given the keycode arguments in hexadecimal format. For example, the keycode 112 in decimal (from my keypress of PgUp) corresponds to 0x70 in hex notation. I'm then assigning to this keycode the keysym "F11". You can use any web based Dec -> Hex converter to do your conversions, or just run printf '0x%x\n' <decimal number> from the command line.
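For example, converting the PgUp keycode from above:

$ printf '0x%x\n' 112
0x70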

Once you have all of your keys mapped in your .xmodmap file, open up a terminal and run:

$ xmodmap ~/.xmodmap

Try your new mappings out! If things didn't work as expected, go through the instructions carefully and make sure you've followed them. If you're still stuck - leave a comment and I'll try to get back to you!

Persisting Your Changes Across Sessions

The last step is to make sure that your changes aren't lost - which from the point of view of your box means loading these settings each time you log in. We'll add the command we just ran to your .xinitrc file (which runs on every login) to accomplish this.

Open ~/.xinitrc in your favorite text editor (if the file doesn't exist yet, create it and start it with a #!/bin/sh line), and at the end of the file add:

# Custom keyboard mapping
xmodmap ~/.xmodmap

That's it! Whenever you log in, your keyboard settings will be loaded.

Summary of steps

1) Run xev | grep "KeyRelease" -A5, then press the keys you are interested in remapping

2) Get the keycode and keysym for the keys. Convert the keycode to hex. For each mapping, create a rule in a file at ~/.xmodmap with the following format:

keycode <Hex keycode> = <Ascii keysym> <Ascii keysym> <Ascii keysym> <Ascii keysym>

Where Ascii keysyms 1-4 map to: ['keypress', 'shifted keypress', 'Mode_switch keypress', 'shifted Mode_switch keypress']

3) Add the command: xmodmap ~/.xmodmap to your .xinitrc file

4) Profit

If you enjoyed this post, consider following me on twitter.

Machine Learning Self-study Resources

Changelog

July 10th, 2016 - I've received a few emails over the last year with suggested additions to this page. Of particular note - thanks to:

Also, over the last year or so, I worked on an internal review panel for Manning Publications' "Introducing Data Science". I've added my public review below.

Finally, proceed with this in mind: this page is somewhat outdated and in serious need of a much more thorough update (and possibly, pruning). Feel free to reach out with suggestions on what should be added or removed!

April 9th, 2014 - Added "Building Machine Learning Systems with Python", "A Programmer's Guide to Data Mining", "Neural Networks and Deep Learning" and Hugo Larochelle's neural networks class.

March 27th, 2014 - Misc updates and edits. Added my thoughts / reviews for "Introduction to the Math of Neural Networks", "Machine Learning in Action", Project Polymath's "Introduction to Higher Math" and Professor Ng's Coursera course. Added a few new courses and resources along with the Udacity and Coursera data science tracks.

July 15th, 2013 - Someone on reddit found this and linked to it in a discussion in /r/machinelearning. I added a few resources that some helpful redditors shared in that thread. Thanks everyone!


This list was initially a small set of resources put together during a session I led on machine learning self-study at Barcamp Chiang Mai 6.

Since then, it's become a place for me to both track my progress and aggregate resources for future study / exploration. Please leave a comment if you see something I've missed or want to share your experience with one of the courses / books / resources listed below.

Note that this is a list of resources targeting beginners without a strong math background who are interested in Machine Learning self-study at the undergraduate level (as opposed to exploring the bleeding edge of ML theory as a PhD student). If you are looking for more concise reference materials - you might check out this list of recommended academic ML / math texts.

The list does assume, however, that you have at least college algebra and pre-calc behind you. If not - you'll definitely need to go sort that out first. Udacity and Coursera both have a number of courses available. Khan Academy and PatrickJMT can be helpful if you get stuck and need to see some concrete examples.

Mathematics

Better Explained

Before you do anything else - consider having a look at the topics covered at Better Explained.

Concise symbolic representations and terse, formal English are an efficient way for mathematicians to communicate with each other. These tools in isolation, however, offer a poor vehicle for teaching mathematical concepts to beginners whose primary math education failed them.

Freshman and sophomore year of undergrad, I often found myself stumbling upon the intuitive elegance of some concept after spending an entire day banging my head against pages and pages of Greek letters and complex notation. I remember asking myself: "Seriously? Is that all there is to it? Why didn't they just SAY THAT?"

If that's where you are right now, you might enjoy the "intuition over infallible formal representation" approach taken by Better Explained. Apart from being an absolute lifesaver when you're just getting started - these intuitions and insights will help you see why the formal representations are necessary (and how you can apply them) once you arrive at a concept that's complex enough to merit a more structured, precise method for organizing your thoughts.

Basics

If you need to bootstrap yourself with a full undergraduate calculus series to reach a minimum viable level of mathematical maturity - check out "Calculus Revisited", a home study course created by Professor Herbert Gross at MIT in the late 60s and early 70s. Don't let the age of this course and the non-HD black and white videos fool you. I've taken bits and pieces from several videos here as needed to fill in holes in my understanding and I can't recommend this series enough.

Better Explained also has a guide to calculus that's worth checking out.

Mathematical Reasoning / Logic / Proofs / Terminology / Notation, etc.
Linear Algebra
Probability Theory / Statistics
Math for AI / ML / Neural Networks
  • Heaton Research - One of our discussion participants, @mccarthy, pinged me about Jeff Heaton yesterday - and I think much of his work is worth mentioning here given Heaton's efforts to teach math for AI / ML with college algebra as the only (strict) prerequisite.
    • Introduction to the Math of Neural Networks
    • Review (03/27/14) - I worked through the first five chapters. If given the chance, I would gladly buy it again to support Heaton's efforts (I'll have access to future versions), but I can only give it a lukewarm recommendation in its current state. There are a number of very confusing errors - not just typos (which are to be expected in a rough draft) but cases where he says one thing in the book and, upon watching his YouTube series (to clear up the confusion), it became obvious that he'd done something backwards or mixed up the order. I recommend using the book together with the YouTube series for the best result.

        Also, if you have any formal background, you may find his notation confusing. He bucks notational conventions pretty hard - possibly in an effort to "simplify" things. I found, however, that the ambiguity this creates just makes things more confusing. The notation he uses for computing node deltas was especially difficult to follow. Note that his target audience is people whose background includes only algebra, so this isn't really a complaint - just an observation. You'll need to "unlearn" some of the notation you know if you want to follow his.
    • Heaton Research Youtube Channel
    • Artificial Intelligence for Humans

Other ML-Specific Math

  • Mathematical Monk - YouTube channel with 200+ old-school Khan Academy style mini-lessons on probability theory and machine learning.

Machine Learning Courses / Books

Enough Theory to Know Which Way is Up...
Applied Machine Learning
  • Statistical Learning (Stanford Online) - According to the site: "This is an introductory-level course in supervised learning, with a focus on regression and classification methods... This is not a math-heavy class, so we try and describe the methods without heavy reliance on formulas and complex mathematics."
  • Machine Learning in Action (Python based book)
    • Review (03/27/14) - I keep coming back to this book whenever I find that I want or need a deeper understanding of an algorithm's implementation. So far, I've worked through the chapters on K-nearest neighbors, naive Bayesian classification, linear regression and support vector machines. Very satisfied with how all of the content I've covered so far has been presented. Explanations are clear and concise as the author covers formatting / loading data from a file and building each piece of the process from scratch using NumPy and Matplotlib.
  • Building Machine Learning Systems with Python - a python based book that covers a wide range of algorithms and applications.
  • Probabilistic Programming and Bayesian Methods for Hackers (Python based book - free on github)
  • A Programmer's Guide to Data Mining - A free, online python based book that covers basic recommendation systems and classifiers
  • Analyzing Big Data with Twitter - A special UC Berkeley iSchool course. Thanks to reader Tim Osterbuhr for the recommendation.
Deeper Down the Rabbit Hole

Tools, Languages, Libraries, APIs

Competitions
Languages, Libraries and APIs
  • Scikit-learn
    • Python library containing ML algorithms, visualizations, model selection and more.
    • Built on top of numpy, scipy, and matplotlib
    • Source
    • Docs
    • Algorithm Cheat Sheet
  • GNU Octave
    • High-level language primarily intended for numerical computation. Extensive visualization / GUI features.
    • Used in Ng's Machine learning class
    • Download
    • Docs
  • WEKA
    • Set of Java libraries, visualization tools and ML algorithms
    • Source
    • Docs
  • JavaScript Libraries - Note - These are standalone libraries of algorithms only, not intended to compete with the full-featured suites of tools above. JavaScript historically has had pretty awful native math support and isn't typically a go-to language for computationally intensive applications. The author of these libraries has worked around most of these weaknesses where possible though, and these libraries are pretty awesome. In any case, they are by far the most promising options at the moment for things like browser plugins, HTML 5 mobile apps using local storage as a persistence layer, etc.
NLP Specific
  • NLTK
    • Python library for natural language processing
    • Source
    • Docs
  • GATE
    • Large collection of applications, Java libraries and architectures
    • Overview
  • Natural
  • Sentimental
Computer Vision Specific
  • OpenCV
    • Computer vision libraries available in C++, C, Python and Java
    • Builds available for Linux, Mac, Windows, iOS and Android

Other Resources

Twitter Accounts

Most of these people tweet blog posts, tutorials, links to papers, etc. relevant to machine learning:

@AndrewYNg, @NandoDF, @YhatHQ, @karpathy, @PhilemonBrakel, @vnfrombucharest, @zaxtax, @revodavid, @seanjtaylor, @jakehofman, @drewconway, @medriscoll, @bigdata, @mrogati, @ogrisel, @johnmyleswhite, @dpatil, @zybler, @ChrisDiehl, @peteskomoroch, @DataJunkie, @dwf, @siah, @hmason

Last but not least, don't forget to follow @bigdataborat - lest we begin to take ourselves too seriously. :)

Data Science Resources

I debated whether or not to add these resources, as they go beyond what is (strictly) needed for a pure machine learning self-study curriculum.

To teach yourself anything beyond the theory and mathematics and do some applied machine learning, however, you will need to experiment using real-world data sets captured in the wild. To do this you'll need to be able to collect, store and refine data, design experiments and interpret the results.

These resources attempt to teach the foundational knowledge needed to develop this kind of skill set.

Data Science Specialization - Coursera (Johns Hopkins University)

I will be starting this track on April 7th. I chose it over the Udacity track because it looks a bit more rigorous and well thought out. I also did the first course in the Udacity track and found it to be a bit too focused on the sponsoring company's stack / toolkit. For now I'd like to focus more on the general concepts. From there, specializing in any particular stack or framework will be easier.

The data science specialization consists of 9 courses and a capstone project. The courses are listed below along with a description taken from the course website. I will be adding reviews and comments as I finish the courses:

  • The Data Scientist's Toolbox
    • "Upon completion of this course you will be able to identify and classify data science problems. You will also have created your Github account, created your first repository, and pushed your first markdown file to your account."
  • R Programming
    • "In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language."
  • Getting and Cleaning Data
    • "Upon completion of this course you will be able to obtain data from a variety of sources. You will know the principles of tidy data and data sharing. Finally, you will understand and be able to apply the basic tools for data cleaning and manipulation."
  • Exploratory Data Analysis
    • "After successfully completing this course you will be able to make visual representations of data using the base, lattice, and ggplot2 plotting systems in R, apply basic principles of data graphics to create rich analytic graphics from different types of datasets, construct exploratory summaries of data in support of a specific question, and create visualizations of multidimensional data using exploratory multivariate statistical techniques."
  • Reproducible Research
    • "In this course you will learn to write a document using R markdown, integrate live R code into a literate statistical program, compile R markdown documents using knitr and related tools, and organize a data analysis so that it is reproducible and accessible to others."
  • Statistical Inference
    • "In this class students will learn the fundamentals of statistical inference. Students will receive a broad overview of the goals, assumptions and modes of performing statistical inference. Students will be able to perform inferential tasks in highly targeted settings and will be able to use the skills developed as a roadmap for more complex inferential challenges."
  • Regression Models
    • "In this course students will learn how to fit regression models, how to interpret coefficients, how to investigate residuals and variability. Students will further learn special cases of regression models including use of dummy variables and multi-variable adjustment. Extensions to generalized linear models, especially considering Poisson and logistic regression will be reviewed."
  • Practical Machine Learning
    • "Upon completion of this course you will understand the components of a machine learning algorithm. You will also know how to apply multiple basic machine learning tools. You will also learn to apply these tools to build and evaluate predictors on real data."
  • Developing Data Products
    • "Students will learn how communicate using statistics and statistical products. Emphasis will be paid to communicating uncertainty in statistical results. Students will learn how to create simple Shiny web applications and R packages for their data products."
  • Capstone Project
    • "The capstone project class will allow students to create a usable/public data product that can be used to show your skills to potential employers. Projects will be drawn from real-world problems and will be conducted with industry, government, and academic partners. The capstone project will be four weeks long, offered in conjunction with the series. The capstone class will be offered thrice yearly."
Data Science Track - Udacity

The first course Udacity offered in this track was the "Introduction to Hadoop and MapReduce" course. To me, this felt too focused on a particular set of tools and not general enough for a beginner like myself. Other courses in the track look better. I may come back to this after the Coursera specialization. Notably, the final three courses listed below are offered via Georgia Tech.

The courses available in the data science track are listed below, along with descriptions taken from the course website.

  • Intro to Computer Science
    • "Learn key concepts in computer science including how to write your own computer programs. This course teaches Python in the context of building a search engine."
  • Intro to Statistics
    • "Statistics is about extracting meaning from data. In this class, we will introduce techniques for visualizing relationships in data and systematic techniques for understanding the relationships using mathematics."
  • Intro to Data Science
    • "What does a data scientist do? In this course, we will survey the main topics in data science so you can understand the skills that are needed to become a data scientist!"
  • Data Wrangling with MongoDB
    • "Data Scientists spend most of their time cleaning data. In this course, you will learn to convert and manipulate messy data to extract what you need."
  • Exploratory Data Analysis
    • "Data is everywhere and so much of it is unexplored. Learn how to investigate and summarize data sets using R and eventually create your own analysis."
  • Intro to Hadoop and MapReduce
    • "In this short course, learn the fundamentals of MapReduce and Apache Hadoop to start making sense of Big Data in the real world!"
  • Machine Learning 1 - Supervised Learning
    • "In this course, you'll learn how to apply Supervised Learning techniques important for solving a range of data science problems. And for surviving the robot uprising."
  • Machine Learning 2 - Unsupervised Learning
    • "Ever wonder how Netflix can predict what movies you'll like? Or how Amazon knows what you want to buy before you do? The answer can be found in Unsupervised Learning!"
  • Machine Learning 3 - Reinforcement Learning
    • "Can we program machines to learn like humans? This course will teach you the algorithms for designing self-learning agents like us!"
Data Analysis Course Via Springboard

Course link

Thanks to reader Tim Osterbuhr for the suggestion. The course looks to be free, with around 310 hours of instruction. The full curriculum is available here. I suspect many readers will be able to skip the "learn to program" and "git" bits, but if you are absolutely new to programming (perhaps with a strong mathematics / statistics background) this course could be especially helpful.

A blurb from the about page:

"The Data Analysis learning path provides a short but intensive introduction to the field of data analysis. The path is divided into three parts. In part 1, we learn general programming practices (software design, version control) and tools (python, sql, unix, and Git). In part 2, we learn R and focus more narrowly on data analysis, studying statistical techniques, machine learning, and presentation of findings. Part 3 includes a choice of elective topics: visualization, social network analysis, and big data (Hadoop and MapReduce)."

Introducing Data Science

Review - 07/10/16 - Over the course of the last year, I had an opportunity to work with the authors of this book as part of its internal review panel. In general, I think it's a fantastic resource with a few caveats:

  • This is an industrial, non-academic resource. Expect a few places to be conceptually accurate but technically imprecise
  • You will need substantial experience setting up and working with a reasonably complex development environment
  • Since this article is primarily about machine learning self-study - I'll add that the chapter covering Machine Learning, especially, was inappropriate (in my opinion) for a book with "introducing" in the title. Both Professor Abu-Mostafa and Professor Ng (courses linked above) do a much better job of introducing this topic, with examples that are approachable by anyone with a basic undergraduate mathematics background. Without the conceptual and technical bootstrapping I received from these courses, it's likely that I would not have been able to follow (at a deep, meaningful level) the more complex examples in this book.
  • I discovered several code samples that would not run, or were otherwise erroneous. I believe many of these are fixed - but have not had an opportunity to revisit them with the material that went to press

All in all, my only real gripe with the book is its title. I think "Case Studies in Applied Data Science" or similar would be more appropriate.

In that context, if you'd enjoy being guided through some real-world problems by hackers / startup founders informed by experience in industry - again, this is a fantastic resource. Just don't expect it to be the hand-holding introduction the title implies. You'll need to work through a few bumps in the road, and will need a foundational background in machine learning and/or statistics to get through the more technical parts. If that background is something you lack, you're much better served by taking advantage of one of the academic resources linked above.

If you enjoyed this post, consider following me on twitter.

Permissions Issues in Ubuntu During Virtualenvwrapper Installation

While setting up a new virtualenv workflow in Ubuntu, I came across a strange error after following the instructions in the virtualenvwrapper installation documentation.

After reloading ~/.bashrc (to run /usr/local/bin/virtualenvwrapper.sh) I got this back:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)

  <--SNIP--> 

  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1316, in _get
    stream = open(path, 'rb')
IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/ipython-0.13.2.egg-info/top_level.txt'
virtualenvwrapper.sh: There was a problem running the initialization hooks. 

If Python could not import the module virtualenvwrapper.hook_loader,
check that virtualenv has been installed for
VIRTUALENVWRAPPER_PYTHON=/usr/bin/python and that PATH is
set properly.

A quick Google search revealed that I'm not the only one with this problem (I'm sanukcm on that forum).

It seems like a permissions issue...

$ ls -l /usr/local/lib/python2.7/dist-packages/virtualenv_clone-0.2.4.egg-info/top_level.txt
-rw-r----- 1 root staff 16 May 12 16:53 /usr/local/lib/python2.7/dist-packages/virtualenv_clone-0.2.4.egg-info/top_level.txt

After some digging around, it seems like staff is a default admin-ish group on Mac OS X, perhaps equivalent to Ubuntu's default sudo group?

Whether you are using the old method to install pip via easy_install, or the newer, preferred way to install pip, you still get a version that assumes that you own a Mac.

The only solutions I could find were installing an old version of pip via aptitude, recursively chowning every directory touched by pip (now, and every time I use it in the future), or else creating a new group called staff with the same permissions as Ubuntu's default sudo group and adding myself to it. The latter is probably more secure than installing via aptitude (as you're getting the latest patches for pip with a newer version), so I went with that.
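For reference, the group-membership part of that fix amounts to something like the following (a sketch - check whether a staff group already exists on your system before creating one, and log out and back in so the new membership takes effect):

$ getent group staff || sudo groupadd staff
$ sudo usermod -aG staff "$USER"

Granting that group any additional sudo-style permissions is a separate (and more security-sensitive) exercise.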

If you enjoyed this post, consider following me on twitter.