Why Every Analyst Should Learn To Code

In recent years, I’ve seen a new breed of analyst emerge. Indeed, more than a few of my superiors have encouraged me to join their ranks. This new breed of analyst knows statistics and finance but prefers to leave the coding to others. They have “people skills”. They know how to sit in meetings and provide “thought leadership” and “design policy”. However, these analysts cannot pull the actual data to support their point of view!

This approach to the technical side of our profession is doing them a serious disservice. Here’s why:

Analytics has many similarities to medicine. A new analyst needs to spend time to understand statistics and basic technique. This is easily the equivalent of a master’s degree. Some degree of social skills are required to be effective. You must regularly present information and collaborate with others. Furthermore, you will need to be able to build credibility (your bedside manner). Finally, you need to apply these skills within a specific problem space. This is our equivalent to the residency requirement.

The net result of this is that you become a trusted advisor, able to deliver useful guidance at the highest levels of the business. However, like medicine, there are some basic expectations.

1) First, you’re working with real world data from  actual processes.  Collected using imperfect tools and inconsistent operating procedures. Generally repeatedly copied / aggregated into some form of storage system. Which is subject to failure / corruption. To quote one of my favorite movies: “I’m smelling a lot of if coming off this plan”. How can you ethically render judgment about a dataset without being able to personally verify it? My joke on a recent project was – “I’ve written 500 queries for this project, 490 of which are titled QA”. But that’s the nature of the beast – an analyst need to be able to understand what is happening at a detailed level. Well over 30% of my time as a Six Sigma Black Belt was spent validating measurement systems and checking data quality. It’s our equivalent of checking a patient’s vital signs.

2) Second, a corollary to the first point. Generally your best insights occur from looking closely at the details of a business and noting where they differ from expectations. Unless you can write your own code (at least SQL, preferably Python, SAS, or R), you’re never going to get a good look at this. At best, you can try to copy the ideas of other analysts in your space (say…Tom said they use quasi-linear regression models to predict product demand at competitor X, lets try it). Which isn’t going to get you very far. In medicine, a good doctor tries to build a total picture, blending the patient’s data with the patient’s history, knowledge of current science, and other events in the community (eg. we’ve got a flu going around) and noting the differences from expected results.

3a) From a practical perspective, you regain control of your schedule – knowing how to code doesn’t preclude you from leveraging others (indeed, I get a lot of help from my team – couldn’t do it without them) but rather allows you to push forward in key areas and step in to address gaps when you need to. You’re no longer beholden to the IT priority queue. Imagine a doctor having to call a nurse to check a patient’s pulse…

3b) You also gain a great deal of political freedom within the organization – particularly on a controversial project where the stewards of the data have their own agenda. It’s really easy to stop the non-coding data analyst – just delay / defer the request until they go away, citing other emergencies. It’s harder to do that when the analyst can pull their own data.

4) And should I mention productivity and the ability to scale up a project? Between scripting repetitive processes and using scheduled jobs, you can build systems that effectively run themselves – sucking data out of your data warehouse, manipulating it, and dropping it into neat little packages you can distribute to the business. Need to create workbooks for 100 business managers? No problem, I’ve got this little concept called a for-loop…. No more repetitive excel workbook generation. One other nice advantage of using scripts is that you can often build some basic QA into the process via test scripts. Which comes in handy when you’re tweaking a thirty step process at 2AM for the fifth time after adding five new steps earlier that evening.

5) Finally, you lose the opportunity to claim a larger role in the implementation of your ideas and accelerate their speed-to-market. Once you master the basics of data slinging, go learn something about how to build GUI applications (MS Access is a great place to start) or web sites. Prototypes speak louder than PowerPoints and are often easier for the typical line manager to understand. A good analytics hacker with some basic GUI skills can take a significant project from cocktail napkin to apps-on-desk in under a couple of weeks. Plus they will build your credibility within the organization.

I should also note that once you’re able to set up your own LAMP / Python / Ruby site, the only thing which separates you from owning your own business is a good idea, a couple of months of hard work, and the cost of a nice steak dinner for two. And you’ll have all the tools you need to start your own word solver site – or whatever else floats your boat….

Excel, Access, Minitab, Crystal Ball, and drag-and-drop database tools are useful. But take the time to learn how to cut real code in a scripting language and learn the syntax of SQL (or the query language for your database of choice). It pays big dividends.

Would you trust a doctor that is unable to check a patient’s vitals? The profession has figured out how to manage a lousy bedside manner (paging Dr. House), but an inability to interact with the patient (in our case, the data) at a basic level? Of course not…

Follow me on twitter to hear about future articles: @MarginHound

8 comments

  1. Great article – I just wrote a similar article on my disdain that we have so many ‘business analysts’ who are really only SMEs within a set field, have no intention to expand that skill-set to anything other than their domain and generally have no awareness (or interest) of the wider things happening around them.

    If nothing else it gives you an appreciation for whats involved and will allow you to do your own job more effectively by seeing the ‘bigger picture’.

    Kudos for the article!

  2. Better to have every programmer learn to be an analyst. Practically any project could survive if the analyst was fired and the programmer had to pick up the slack, but the reverse is not true.

    1. I was hoping to convert a few folks from SQL Server to Python 🙂

      Speaking as someone who lives in both worlds, a good statistician and/or business analyst can add a lot of value to certain projects. While most developers could probably learn the underlying skills, there is a certain fluency an experienced analyst brings to the table that can help the team find the right answer faster (and understand when they’ve found it). There are also certain projects (modeling) where you really do need an experienced analyst to “hit the high notes”. That being said, I’ve dealt with a large number of inept “business analysts” over the course of my career which appears to be folks who washed out of the developer pool…

  3. I agree with everything in this post, with the exception of Access. My god – that application needs a bullet in it. It teaches very poor SQL standards.

    Question for you – I’m in the middle of learning R and Python. Why both? I don’t know. Python seems to provide more generalized technical ability than R. R is very niche. However, it seems like R is easier to pick up for analysis. What are your thoughts? If you were telling someone to learn programming language for analysis, what would you suggest?

    1. Honestly, I would (and did) learn Python & R. Python gives you a real programming language that can be extended into other areas and supports high speed processing using modules like numpy and cython. R allows you to knock out small analytics projects faster, via a good stats and graphics library. R allows you to get faster at your job, freeing up time to invest in Python.

      As a database, Access has serious limitations. However, we actually get a lot done with it since it has a) idiot proof integration with pretty much everything under the sun, b) a forms feature that can knock out decent looking GUI’s in a hurry, and c) has a fairly easy learning curve for non-developers (when you need to mobilize the militia). Query builder is stifling and impossible to QA – we wound up handrolling some dynamic SQL generators and executing scripted commands using DAO. Not my first choice but so dang easy to use for quick projects and new analysts that we hang onto it. Part of the issue with VBA is that there is a major script kiddie issue going on in that space – you actually can write decent VBA code, but most of it’s users don’t know how.

Leave a Reply