Fans of obscure but powerful languages like Lisp, Haskell, Scala, and OCaml can sometimes get overzealous about how great their language is, especially compared to those (holding nose) “common” languages like Java, C++, or C. Not only is this really annoying, but it sets them up for one of the easiest comebacks in all of trash talking: “SCOREBOARD!” (For those not familiar, when fans of a losing sports team are still talking trash, all the fans of the winning team have to say is “SCOREBOARD” to indicate that, despite everything you guys have to say, you’re losing the game.) The nerd equivalent of this is “If X is so great (for any powerful value of X), how come there aren’t any programs written in it?” Indeed, with people writing trillions of dollars worth of software in C, C++, C#, and Java, it is a valid rebuttal.
To those of you who have ever been hit with this rebuttal, take heart. Although you probably understand that it’s not a completely valid point, allow me to offer a theory. The starting point for this theory is a gem of a quote from Paul Graham, tucked away in a footnote of one of his articles: “Never attribute to malice what can be explained by math.” So please permit me to use some crude math.
Note: to ease the repetition that I got sick of before I was even halfway done writing this, I’ll use the following terms:
- Commercial Languages – These are languages used widely in industry, the ones people learn so they can have a wide choice of jobs. In this list I include C, C++, C#/VB.Net, Java, Perl, and PHP (I’m sure there are more). Before you get upset, just because a language isn’t on this list, that doesn’t mean it isn’t used in commerce, just that there are relatively few companies hiring people to use it.
I’ll use these terms in capital letters to indicate that it’s a classification, not an adjective. I prefer these name classifications as opposed to Michael Vanier’s Languages for the Masses and Languages for Smart People. I think we group the languages the same way, but his names are unnecessarily critical of the many brilliant people using commercial languages (yes, there definitely are some). Some people avoid Powerful Languages because some of their users are annoying. A little extra civility goes a long way.
Now that the vocabulary lesson is over, the amount of software written in a language is a function of several variables:
- The number of people writing software in the language
- The skill level of people writing software in the language
- The amount of successful software already written in the language
- The entrepreneurship level of people writing software in the language
- The length of time the language has been a) in existence and b) popular
Let me expand on each of these points:
1) The number of people writing software in the language: This is the most important factor. The more code that gets written, the more chances there are that the code will meet a user/customer need and gain in popularity. There are two effects here. The first is the simple fact that the more software gets written in X, the larger its share of all software, without regard for quality. The second is that, if you believe that the best software will win, then you need Black Swan programmers – those who are either so gifted at programming, so connected in business, or so in tune with the market that they hit a hole in one while everyone else is still playing mini-golf. Since Black Swans are by nature unpredictable, the larger the pool your language draws from, the more Black Swans you’re likely to have. From either a quantity or quality standpoint, sheer numbers help.
Also, the larger the number of people who can write in the language, the more businesses will choose that language to make hiring and firing easier. Since this is self-reinforcing (people learn languages that will get them jobs, companies use languages that people know), there is little value in being out of the top few. It’s kind of like being the fourth most popular operating system or the 3rd most popular portable MP3 player. And the more businesses that use a language, the more software that will get written in that language; this software is usually a function of the business and not the language/tools used.
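The Black Swan argument in point 1 comes down to simple arithmetic, which can be sketched in a few lines of Python. Everything here is an assumption made up for illustration: the function names, the one-in-100,000 odds of a given programmer being a Black Swan, and the two community sizes. The only point is that, at fixed odds, the expected number of Black Swans scales linearly with the size of the pool.

```python
def expected_black_swans(num_programmers: int, p: float) -> float:
    """Expected number of Black Swans if each programmer independently
    has probability p of being one (p is a made-up number)."""
    return num_programmers * p


def prob_at_least_one(num_programmers: int, p: float) -> float:
    """Probability that the community contains at least one Black Swan."""
    return 1.0 - (1.0 - p) ** num_programmers


# Hypothetical inputs, not measurements of any real community.
P_BLACK_SWAN = 1e-5          # assumed: one programmer in 100,000
SMALL_COMMUNITY = 5_000      # assumed size of a small language community
LARGE_COMMUNITY = 5_000_000  # assumed size of a large language community

if __name__ == "__main__":
    for name, size in [("small", SMALL_COMMUNITY), ("large", LARGE_COMMUNITY)]:
        print(name,
              expected_black_swans(size, P_BLACK_SWAN),
              prob_at_least_one(size, P_BLACK_SWAN))
```

With these made-up inputs, the small community will most likely have no Black Swan at all, while the large one is all but guaranteed to have several, even though the per-programmer odds are identical.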
2) The skill level of people writing software in the language: I’ll make the claim that the average programmers of different languages are not equal. I won’t say better or worse because I believe that’s much more a function of meaningful practice than anything else. Also, before you get mad, all the rules of statistics and large populations apply. Just because you know someone who you think is dumb and uses Haskell, doesn’t mean that the average is any different. (The upside of this is that I can say anything I want about averages and your anecdotes can’t prove me wrong! j/k). So what generalizations can we make about Powerful Language users?
First, they’d have to be intellectually curious. There are very, very few companies hiring people to use these languages. For example, I get tons of email about Lisp, and the minifeed at the top of my Gmail constantly has ads from Jane Street Capital. Here’s one: “Do you think in HOFs? – www.janestcapital.com/tech.html – We do too. Lisp programmers welcome.” Jane Street is the ITA Software of OCaml – a successful business using a small, powerful language and reaping great benefits from it. Now, Google is open for business and anyone can bid on keywords, and either Jane Street is spending tons of cash making sure they’re the only company buying Lisp keywords, or there’s just no market for it, meaning no jobs available. Someone who gets into programming to make money (or who loves programming but needs to support a family – I put myself in this category) will skip these languages even if they find them interesting, because they need to focus on something that will pay the bills. If you do choose a Powerful Language, you probably:
- know more patterns or idioms that aren’t in Commercial Languages (closures, continuations, macros, monads, first-class functions, pattern matching, etc) and have difficulty expressing intentions without them
- are working on a problem that is too difficult to solve without those language features (this drove Lisp usage in AI historically)
- have resource constraints that require one person to do the work of several (like starting a startup)
That doesn’t mean you’re not curious if you use Java or C, just that those languages don’t convey the same information because of their mainstream and commercial use. Curious users of Commercial Languages do other things like write unit testing frameworks, ORMs, refactoring tools, and other projects that improve the power and flexibility of the language they have chosen. Any open-minded, curious Commercial Language developer who encounters one of the three problems above will pragmatically choose to take the dive into a Powerful Language to solve that problem. Many more will just try to shoehorn in some suboptimal solution using their Commercial Language. I’m a .Net developer by day but I know that as a one man startup, I need every advantage I can get to reduce the amount of time my work takes. I couldn’t imagine maintaining a large C# codebase and handling customers and doing marketing and bookkeeping. I’m taking Paul Graham’s word for it that Lisp can give me a smaller, more correct, more adaptable codebase that will leave me time to handle the other aspects of the business.
Since Powerful Language users are either learning for fun or solving a difficult problem, let’s give them the benefit of the doubt and say that the average Powerful Language developer is more productive, more creative, or whatever positive attribute you prefer. This improvement factor will come into play later, when I start pulling numbers out of thin air to make my point. Commercial Language fans, don’t get offended, these are just averages. If you’ve found my obscure website and read this far, I’m sure you’re far above average.
3) The amount of successful software already written in the language: This is a generalization of the principle that if you’re writing a desktop app, it’s best to write it in the language the OS was written in. Successful software products create an ecosystem around themselves, the most prominent example being Windows, which was written in C and C++. For decades, Windows programming consisted of writing a C or C++ application that made extensive calls to the Windows API. Linux is the same way. The widespread distribution of the JVM, which is effectively an operating system for code, did the same thing for Java. Any virtual machine can have this same effect if it becomes popular enough.
Hugely successful applications that include scripting capabilities also create demand for code written in that language. For example, emacs and AutoCAD are the canonical Successful Lisp Programs, and tons of elisp and AutoLISP code have built up over decades because these applications provided a reason to write code in these languages. Microsoft Office can run VBA (Visual Basic for Applications) in all of its applications, and there are billions of copies of Office in the world. Although it’s frowned upon by “real” programmers, there is probably more VBA code written than code in most general purpose languages (including Commercial ones).
This is correlated with the number of people writing software in the language, but not exactly, because successful software is a function of market success, not just total lines of code written.
4) The entrepreneurship level of people writing software in the language: This, like the curiosity improvement factor mentioned previously, is important because entrepreneurs and small companies are much more productive per-person in writing software. When you can’t compete on brand awareness or customer lock-in, you need to focus on writing the best software possible so your quality can shine. Even if entrepreneurial software doesn’t become the most successful, it does drive innovation, features, interfaces, and other aspects of high volume commercial software. If you look at total number of software products (not by sales), probably 90% are written by one person and 99% are written by companies with less than 10 developers. A language with more entrepreneurs could produce a disproportionate amount of software. I see this happening with web based software startups – they’re not writing in C, C++ or even Java. They’re using the best languages they can get, whether it’s for the vast libraries of PHP, the community support of Ruby or the precision of Python. I would wager that a lot more code is getting written in web languages than OS languages now that the web is becoming the default medium of delivering software. Also, for Powerful Languages with small communities, the nature of the community makes a big difference. I’d bet that more software (released software anyway) is written by Paul Graham’s Lisp community than the academic community surrounding Haskell. I could be (very) wrong though; that’s just a guess based on a perception.
5) The length of time the language has been a) in existence and b) popular: This is just a function of the gradual growth (and decay) that almost any tool (not just in computing) faces in a changing world. For instance, COBOL has been around forever, used to be huge, but isn’t popular now and has been losing market share for quite some time. Ruby has been around since the mid 90s but was very, very small (outside of Japan) until Ruby on Rails caused explosive growth for it. So there has probably been more Ruby code written in the last 3 years than in all the years before. Obviously not much C# was written before 2001 and not much Java before 1996, but these languages were able to ramp up very quickly because of the corporate backing and marketing behind them. Open Source languages grew more slowly over time, so the oldest ones tend to be the most widely used (Perl, for example). For Powerful Languages with a small user base (single digit thousands), there is simply no way to project future trends. They could stay in the thousands, could grow gradually, or could hit an explosive trend like Ruby on Rails. Fortunately, this is about the past and present, not the future.
6) Other: This is the place for factors that are sort of a business consideration, sort of a technical consideration. These are things like Lisp’s lack of a standard implementation, compatibility, versioning, and dependency problems for standalone programs, virtual machine distribution problems, etc. Basically, it’s here for a fudge factor in case my made up numbers don’t prove exactly what I want them to.
One important thing to remember about these 6 attributes is that they’re not independent variables. If you try to make arbitrary combinations of them, you’ll see why – if a language has no successful apps (#3) and low-skilled programmers (#2), it never will get a lot of users (#1) and it won’t be used by entrepreneurs (#4). You get the gist.
(By the power of Yegge, I’ve written over 2,000 words and I’m just getting finished setting up so I can start making my point – get ready to start swapping out your short term memory cache!)
So, let’s take a couple of languages and see how they stack up, starting with C/C++.
- Number of people writing – monstrous. Just huge. It was the language taught in universities for a very long time, it was needed to write desktop programs for Windows, Unix and Mac(?), and its conceptual model matches the common conceptual model of doing things one step at a time, in order, explicitly. This made it both easy and profitable for new programmers to pick up (even if the details got very hairy, very fast).
- Skill level – Average, in the sense of following a normal distribution. C/C++ are so old, so widespread, and have such a wealth of tools and utilities that their user population basically matches the population of programmers as a whole. Some real duds, some uber-geniuses, a bunch of people in the middle.
- Successful software – Unix. Linux. Windows. Office. Pretty much every database engine. Simulations. Device drivers. Embedded programs. I’m pretty sure if I looked hard enough, I’d find out that I am written in C.
- Entrepreneurship – Like #2, it was average for a very long time. However, I think that C has lost some of the entrepreneurship market since Java came out and especially since the Internet became the primary medium for distributing software. While C/C++ still have a very important place in the world of software, I think it is mostly large corporations and open source projects that are still willing to put up with their limitations. Small teams, except for those with the highest performance/space requirements, need languages that give you more. In 2008, the range of entrepreneurial applications that C/C++ are a good match for is very small.
- Length of time: C has been around since the 70s, C++ since the early 80s, and until the mid 90s, they were the undisputed most popular languages. Even after having their position chipped away for a decade now, they are still more widely used than any (most?) Powerful Language.
- Other: C/C++ shaped the world around them for decades, but runtime platforms (Java/.Net) have stolen their thunder on the desktop and they were never really suited for the Internet.