Jan 24 2010
Of Braces and Semicolons
Finally, a post that lives up to the second part of this blog’s tagline by being about programming! (Well, there was C in a (Nutshell)², but my point is it’s been a while.)
Now, I’m a would-be language designer. Programming language theory fascinates me, and to that end I’ve read a number of books on different languages to try and get a feeling for all the various approaches and paradigms. I’ve also read many interesting blog posts found via proggit (and some via Lambda the Ultimate) which argue about language issues.
From all this, I’ve reached some conclusions which I’m going to try and lay out in a series of blog posts.
In this first post, I’ll cover:
The Problem of Delimiting Blocks
All modern languages are block-structured. (Thank you Dijkstra!) That is, code segments are semantically and syntactically organized into a tree. Let’s start with an example of some code in a Blub-average of today’s most popular languages:
class Foo(Object)
{
    void bar()
    {
        if (baz)
            quxx();
    }
}
The first problem we have is which brace style is the “Right One”. This is a perennial holy war. The style in this example wastes lines just for braces. On the other hand, it makes it very easy to perceive the block structure of the code. On the gripping hand, in practice, programmers don’t really care about the braces (except when the compiler gives a syntax error about them being wrong), and instead pay attention to the indentation.
Secondly, we have the sub-problem of whether one-statement blocks should have braces around them. On one side we have clarity and maintainability: what if the block expands beyond the one statement and someone forgets to add the braces? In many cases, this leads to subtle bugs the compiler does not warn you about. On the other side we have wasted space and unnecessary scrolling.
At any rate, we are led to the obvious question: why not use the indentation programmers insert anyway to delimit blocks and do away with the braces altogether? And indeed, some languages have tried this. The poster child is Python (my current favorite existing language).
However, this solution has problems of its own:
- Copy-pasting code into environments where indentation is not preserved (e.g. some online forums) causes it to become unintelligible (well, in practice, one can apply some heuristics and try and re-indent it manually, but there’s no way to ensure you got it exactly right, and if the code is long enough, it’s a real PITA; whereas code using braces can be automatically re-indented by several tools)
- As one sees every so often in the Python community, there are people who harbor unreasonable hate towards syntax-significant indentation. From what I understand, most of these people have bad memories from some early version of FORTRAN and their fear is completely irrational. But if you want your language to be a very popular one, you’re probably going to have to accommodate these people.
- We’re still left with the second code formatting holy-war of spaces vs. tabs as the indentation character. (BTW, have you seen *NIX’s default tabstop of 8? Personally, it makes my eyes bleed.)
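To make the copy-paste point concrete, here is a minimal Python sketch. The names `reindent` and `flatten` are made up for illustration and do not come from any real tool. It assumes braces never appear inside strings or comments, and it does not indent brace-less one-statement bodies, which real formatters handle with a proper parser. The idea is only that braces carry enough information to rebuild the indentation, whereas two different Python snippets can collapse into the same flattened text.

```python
def reindent(code, indent="    "):
    """Rebuild indentation of brace-delimited code from the braces alone.

    Naive on purpose: braces inside strings or comments would confuse it.
    """
    depth = 0
    out = []
    for raw in code.splitlines():
        line = raw.strip()
        if not line:
            out.append("")
            continue
        leading_close = line.startswith("}")
        if leading_close:
            # A line that starts with '}' belongs to the enclosing level.
            depth = max(depth - 1, 0)
        out.append(indent * depth + line)
        # Apply the remaining braces on this line to the following lines.
        closes = line.count("}") - (1 if leading_close else 0)
        depth = max(depth + line.count("{") - closes, 0)
    return "\n".join(out)


def flatten(code):
    """Simulate a forum that strips leading whitespace from every line."""
    return "\n".join(line.strip() for line in code.splitlines())


# Brace-delimited code survives flattening: the structure can be recovered.
flat_braces = "class Foo(Object)\n{\nvoid bar()\n{\nquxx();\n}\n}"
print(reindent(flat_braces))

# Indentation-delimited code does not: these two programs differ in meaning
# (is z() inside the if or not?) but are identical once flattened.
python_a = "if x:\n    y()\nz()"
python_b = "if x:\n    y()\n    z()"
print(flatten(python_a) == flatten(python_b))
```

Once the two Python snippets have been flattened, no tool can tell which one was meant. The best anyone can do is guess.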
Of course, along with braces are their friends the semicolons, and they annoy the heck out of me; they’re just plain unnatural. I should not have to always remember to end a line with anything other than a linebreak, because 90% of the time, statements are only one line long; the other 10% are the ones I should have to type extra characters for, and only then after considering refactoring the line so it’s more readable anyway. This is basic Huffman coding, people! Optimize for the overwhelmingly common case, not the exceptional one. (If this were a research paper, I would back up this 90-10 distribution with some empirical statistics, but since this is just a blog post, I’m not gonna bother.)
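As a rough back-of-the-envelope illustration of that argument, here is a small Python sketch. The function names and the sample are invented for this example, the 90-10 split is still the unmeasured guess from above, and the "continuation marker" is a hypothetical one-character marker at each line break inside a statement. It simply counts the extra characters each scheme would cost.

```python
def terminator_overhead(statements):
    """Extra characters if every statement must end with a ';'."""
    return len(statements)


def continuation_overhead(statements):
    """Extra characters if only line breaks inside a statement are marked."""
    return sum(stmt.count("\n") for stmt in statements)


# Nine one-line statements and one statement that spans two lines,
# mirroring the guessed 90-10 split.
sample = ["a = 1"] * 9 + ["total = (first +\n         second)"]

print(terminator_overhead(sample), continuation_overhead(sample))  # prints: 10 1
```

Under these assumptions the mandatory terminator costs ten times as many extra characters as marking only the rare multi-line statement.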
So, in summation, semicolons and braces are bad design choices. Eliminating the semicolons is relatively simple. As for the braces, our choices for what to replace them with are the off-side rule (i.e. syntactically-significant indentation) or begin-end keywords à la ALGOL or Pascal. Though I personally prefer the former, I opt for the latter due to the aforementioned issues with the former. Now, does this actually solve our problems, or is this a meaningless textual substitution? We’ll get to that in just a sec.
There is a downside to vanilla begin-end though, and that is that it looks retro/outdated (i.e. like Pascal) or like a scripting language (using the pejorative connotation; i.e. like sh or BASIC). However, if we do it intelligently, like Ruby does, we can reduce the resemblance and make the substitution meaningful. All we need to do is realize that the begin keyword is technically superfluous and that we can force end to always go on a line of its own. Here’s an example of code in this style:
class Foo(Object)
    void bar()
        if baz
            quxx()
        end
    end
end
Advantages:
- No dangling else problem as long as you have an elif construct.
- Only one indentation style. (Well, you could not indent the ends, but since they require their own line, there’s no particular reason not to.)
- No space is wasted compared to the more compact of the 2 main brace styles.
- The required ends solve the “what if this one-line code block grows to multiple lines?” problem.
Now, in Iguana, I plan to alter this just slightly by allowing 1-line blocks to use colons and skip the end keyword; this lets us re-special-case one-line bodies without any ambiguity or strain on the parser while continuing to avoid the multiline growth problem. Here’s what this looks like:
class Foo(Object)
    void bar()
        if baz: quxx()
    end
end
Of course, in this particularly simple example, you could also write it like this:
class Foo(Object): void bar(): if baz: quxx()
But I would hope your code reviewers would thoroughly chastise you if you did, so please don’t.
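To back up the "no strain on the parser" claim a little, here is a minimal Python sketch of how the colon rule could be parsed. This is not Iguana's actual grammar, which this post does not specify. The keyword set `BLOCK_KEYWORDS` and the tuple-based tree are assumptions made for the example, and splitting on the first colon is a simplification that would break on colons inside strings or expressions. A header without a colon opens a block that runs until its `end`; a header with a colon takes exactly one statement from the rest of the line.

```python
# Hypothetical set of words that can start a block header (an assumption).
BLOCK_KEYWORDS = {"class", "void", "if", "elif", "else", "while"}


def parse_statement(text, lines):
    """Parse the statement starting at `text`; `lines` yields the lines after it.

    Returns a plain string for a simple statement, or (header, body) for a block.
    """
    text = text.strip()
    if not text:
        raise SyntaxError("expected a statement after ':'")
    if text == "end":
        raise SyntaxError("'end' without a matching block")
    if text.split(None, 1)[0] not in BLOCK_KEYWORDS:
        return text  # simple statement, e.g. a call
    head, colon, rest = text.partition(":")
    if colon:
        # One-line block: exactly one statement, no 'end' required.
        return (head.strip(), [parse_statement(rest, lines)])
    body = []
    for line in lines:
        if line.strip() == "end":
            return (text, body)
        body.append(parse_statement(line, lines))
    raise SyntaxError(f"missing 'end' for block {text!r}")


def parse(source):
    """Parse a whole program into a list of statements and nested blocks."""
    lines = (line for line in source.splitlines() if line.strip())
    return [parse_statement(line, lines) for line in lines]


multi_line = """
class Foo(Object)
    void bar()
        if baz: quxx()
    end
end
"""
one_liner = "class Foo(Object): void bar(): if baz: quxx()"

print(parse(multi_line))
print(parse(multi_line) == parse(one_liner))
```

In this sketch both spellings produce the same tree, and growing a one-line body into several lines means dropping the colon and adding an `end`. If the `end` is forgotten, the result is a syntax error instead of a silently misparsed block.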
I agree it is not the cleanest or most obvious approach, but it’s a compromise between theoretically-best and familiar-enough-to-be-popular.