Loudly Recursive: Deep REBOL: Bindology

2009-07-05

Deep REBOL: Bindology

I’m often asked why I think REBOL is a “deep” language. The question is no surprise: REBOL’s ordinary syntax, the so-called DO dialect, is unlikely to impress a veteran Ruby, Haskell, or C# programmer. For example:

foreach x [1 2 3] [
    print x
]

While I think the simplicity of the DO dialect is a strength, it causes many developers, probably most, to dismiss REBOL out of hand. But those who stick around eventually discover that there’s more to REBOL than meets the eye.

REBOL’s depth comes from the runtime behavior of three special datatypes, word!, block! and object!, optionally combined with the power of the PARSE dialect.

REBOL words are the symbols of the language, such as foreach, x and print in the example above. The closest equivalent in a traditional programming language is an identifier (or a keyword), except that unlike an identifier, REBOL words are directly and immediately accessible at runtime. There is no special API, such as reflection or runtime type information, required. I urge you not to get too cozy with this idea, however. Though there are some superficial resemblances, REBOL words are neither identifiers nor keywords.

The most common use of a REBOL word is to serve as a variable. For instance, in the example above the word foreach serves as a variable pointing to a built-in function that iterates over a series. However, there is no requirement that words serve as variables. They can be used directly as symbols in their own right.

Blocks are a bit simpler to understand than words, and nearly as ubiquitous. A block is a sequence of zero or more REBOL values (words, integers, strings, urls and so on) enclosed between square brackets. Blocks are not like the curly braces of a language like C. They are actual values and can be (and usually are) passed as arguments to functions. For instance, the foreach function in the example above takes three arguments: first, a word to serve as the iteration variable; second, a series of some kind (in this case, the block [1 2 3]); and lastly, another block containing the code to be executed for each iteration. To make this more explicit, take a look at the following example.

action: [ print x ]
foreach x [1 2 3] action

(A word with a colon after it is called a set-word!. The colon is not an operator. It is a part of the word itself and cannot be separated from it. A set-word is used to assign a value to a word. In this case, we are assigning the block [ print x ] to the word action.)

This is functionally equivalent to the first example. It also conveniently points out another interesting fact about REBOL blocks: By default, their evaluation is deferred. The action block in the example above contains two (literal, unevaluated) REBOL words, print and x. It is only when we pass this block as an argument to foreach that the block is evaluated and the print word resolves to a function. Note that the act of passing a block as an argument does not necessarily mean it will be evaluated. It depends on the function to which it is passed.
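
A minimal sketch of what deferred evaluation means in practice, assuming a REBOL 2 console. The block contents here are made up for illustration; probe prints a value as-is, and do is the standard function that evaluates a block.

action: [ print "hello" ]
probe length? action
probe type? first action
do action

The first probe prints 2, because the block holds just two values, a word and a string. The second prints word!, since print is still only a word at this point. Only the final do looks print up and calls it, printing hello.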

In order to serve as variables, REBOL words must be bound to a context, which is a value of type object!. By default, REBOL provides a global context. Any word not bound to a specific context automatically expands the global context. In fact, the global context is the only context that can be expanded. All other contexts must explicitly declare the words that are bound to them, and once a context is declared, it cannot be expanded further. This is best shown with an example.

x: 99
foo: context [
    x: 17
    print x
]
print x

Look carefully at the example above. We have two different x’s. The first one is declared and assigned in the global context, and the second is declared and assigned in the foo context. The print x inside the context prints 17, while the second print x, the one on the last line, prints 99, not 17. These two words are said to have the same spelling, but they are not the same word. This is an important distinction.
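
As a small sketch (again assuming REBOL 2), path notation reaches the x inside foo from the outside, which shows both values existing side by side:

x: 99
foo: context [
    x: 17
]
print foo/x
print x

The first line printed is 17 and the second is 99.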

At first glance, this looks like the ordinary scope rules one finds in a traditional programming language. It’s actually more interesting than that. Take a look at another example.

x: 99
xs: []
foo: context [
    x: 17
    append xs 'x
]
probe xs
; == [x]
print xs
; == 17

This example will take some explaining. First, the lines beginning with semicolons are comments. I’ve used them here to show what the result of executing the preceding line of code is. So, first we assign the value 99 to the word x in the global context. We then declare an empty block and assign it to xs. Then we declare a context and, inside it, we declare another x bound to that context. We then append the word x to our block, xs. The apostrophe is used to say that we want to refer to the word itself, not its value. This means that after append xs 'x is executed, the variable xs contains [x], not [17], as shown by executing the probe function, which prints out an unevaluated value. The print function, however, does evaluate its argument. In this case, it prints 17. (When passed a block, print spits out all the values with a single space between them. If there’s only one value in the block, it’s just printed as-is.)

Why didn’t print xs spit out 99? How did print know which x to print? REBOL words carry a reference to their context with them. It’s not where a word is evaluated that makes the difference, but where it’s declared. Because of this, it’s entirely possible to have a block containing [x x x] in which each x was declared in a different context and each one has a completely different value! In fact, I think it would be useful to show just such an example.

xs: []
use [x] [
    x: 12
    append xs 'x
]
use [x] [
    x: 9
    append xs 'x
]
use [x] [
    x: "REBOL"
    append xs 'x
]
probe xs
; == [x x x]
print xs
; == 12 9 REBOL

The use function creates an anonymous context. The first block passed to it contains declarations of the words that are bound to the anonymous context, and the second block contains any code we wish to execute in that context. The important thing to note here is that the xs variable ends up containing [x x x]. Although each of these words has the same spelling, “x”, they are not the same word. Each is declared in a different context and points to a different value. This is demonstrated by printing the values.
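
As a rough sketch of how a word’s binding can be changed after the fact, the snippet below uses the standard bind function, which does not appear in the examples above. It assumes REBOL 2 behavior, and the global x is added here only for illustration.

x: "global"
xs: []
use [x] [
    x: 12
    append xs 'x
]
print xs
bind xs 'x
print xs

The first print shows 12, the value held in the anonymous context. The bind call then re-points the words in xs at the global context, using the global 'x as the sample word, so the second print shows global. The spelling never changed; only the binding did.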

The relationship between words and contexts is known as Bindology in REBOL jargon. Fortunately, it isn’t necessary to have a deep understanding of Bindology to write basic REBOL code, but (in my humble opinion) it’s indispensable for doing anything advanced.

Once I’d figured all of this out, I thought it was extremely interesting, and a unique way to design a language. But my first thought after that was to ask myself why anyone would design a language this way. What does it gain you? While you can do some very useful things with it, I think the answer lies in the ability to create REBOL dialects.

In REBOL, a dialect is a sequence of REBOL values inside of a block, interpreted using the parse function. This is more easily demonstrated than explained, so here’s an example.

rules: [
    any [
        set count integer!
        set value [ string! | word! ] (value: either string? value [value] [get value])
        (repeat n count [print value])
    ]
]
last-name: "Sassenrath"
parse [3 "Carl" 4 last-name] rules

The second argument of parse is a block specified in the PARSE dialect, the syntax of which is (far) outside the scope of this blog post. Suffice it to say that these rules provide the grammar for a very simple dialect. This dialect must consist of a sequence of zero or more integer-string or integer-word pairs. For each pair encountered, the string is printed the number of times specified. If a word is encountered in place of a string, the word is evaluated and its value is printed the same number of times.
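
To see how parse reports whether a block conforms to the dialect, here is a short sketch that reuses the same rules. It assumes REBOL 2, where parse returns true only if the rules consume the entire input; the two input blocks are made up for illustration.

rules: [
    any [
        set count integer!
        set value [ string! | word! ] (value: either string? value [value] [get value])
        (repeat n count [print value])
    ]
]
probe parse [2 "hi"] rules
probe parse [2 3] rules

The first call prints hi twice and then true. The second prints only false: 3 is neither a string nor a word, so the pair never matches and none of the printing code runs.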

This isn’t a very useful or complex dialect, but truly, the sky’s the limit. The only proviso is that the dialect must consist of a sequence of valid REBOL values, optionally including words. Note that dialects could not exist if blocks were immediately evaluated. By deferring their evaluation, we give parse the opportunity to interpret the block in whatever way we wish, completely bypassing REBOL’s ordinary DO dialect. We also have the opportunity, using Bindology, to evaluate any REBOL words that are included in the dialect, if we wish to do so and if our grammar permits them.
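
The point about Bindology can be sketched by handing the same rules a word that is bound to a local context. This assumes REBOL 2; the name greeting and both strings are invented for the example.

rules: [
    any [
        set count integer!
        set value [ string! | word! ] (value: either string? value [value] [get value])
        (repeat n count [print value])
    ]
]
greeting: "global hello"
use [greeting] [
    greeting: "local hello"
    parse [2 greeting] rules
]
parse [1 greeting] rules

The first parse prints local hello twice, because the word greeting inside that block carries its binding to the anonymous context with it. The second prints global hello once. The rules themselves never need to know which context a word came from.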

In sum, Bindology allows us to defer the execution of blocks for various purposes, including the specification of dialects, while allowing us to include REBOL words in the block without fear of side effects. This allows for some powerful and unusual techniques, and it’s what gives REBOL its depth.