Spent most of the past week working on my first “serious” R scripting project.
I’ve been using R for a couple of years, generally as a free Minitab replacement (using the R Commander GUI) and as an adjunct to Python projects. Most of my analytics projects to date have been coded in Python, since they are generally heavy on data (e.g. they require good integration with our data warehouse and dynamic SQL) and need either custom statistics or heavy text parsing. The latest project is somewhat the reverse – it involves a relatively small dataset (very painful to assemble, since it reconciles data from two different transaction systems, but small) that I’m going to run a bunch of prepackaged statistical studies and graphics against. Sounds like a perfect fit for R.
Initial impressions:
1) More like Perl than Python. When it comes to writing R code, there is definitely more than one way to do it. I miss duck typing and relatively generic operators like map that can accept pretty much anything you throw at them.
2) But lots of packages and sample code contributed by the community. The biggest advantage of using R – good coverage of the core stats workflow and common variations (like weighting values). Stack Overflow, Nabble, and the R blogs were a tremendous help.
3) Amazing graphics library with lots of community-contributed samples.
4) Easy IO if you are using CSVs – other potential integration points were less intuitive. Still working on this one.
5) Invest the time learning how to create functions early. Ideally you should try to create generic operations where you can iterate across lists of variables and scenario parameters. Time to create a working, tested version of a new model I wanted: one day. Time to code the first 20 permutations of attributes and aggregation logic that I wanted to explore: 5 minutes. Plus the resulting code can actually be read, maintained, and extended.
That last point is key, since most of the R training material seems to be written for analysts rather than hackers. Probably would form the basis of a nice ebook for the right person to write.
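To make point 5 a bit more concrete, here is a minimal sketch of the "write one generic function, then iterate over scenarios" pattern. It is not the model from my project: the built-in mtcars dataset stands in for the real data, and summarise_by and the scenarios table are hypothetical names made up for illustration.

```r
# Hypothetical helper: (optionally weighted) mean of one column
# within each level of a grouping column.
summarise_by <- function(data, value, group, weight = NULL) {
  pieces <- split(data, data[[group]])
  vapply(pieces, function(d) {
    # Fall back to equal weights when no weight column is given
    w <- if (is.null(weight)) rep(1, nrow(d)) else d[[weight]]
    weighted.mean(d[[value]], w)
  }, numeric(1))
}

# Every combination of variable and grouping to explore
# (mtcars columns are stand-ins for the real attributes)
scenarios <- expand.grid(
  value = c("mpg", "hp", "qsec"),
  group = c("cyl", "gear"),
  stringsAsFactors = FALSE
)

# Run the same function once per scenario row, weighting by car weight
results <- Map(
  function(value, group) summarise_by(mtcars, value, group, weight = "wt"),
  scenarios$value, scenarios$group
)
names(results) <- paste(scenarios$value, "by", scenarios$group)

print(results[["mpg by cyl"]])
```

Adding another permutation is then just one more entry in the expand.grid call, which is roughly why the second batch of scenarios takes minutes rather than a day.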