A REBOL Dialect For Constructed Languages

I have had a fascination for languages ever since I read Tolkien and discovered that he had created the Elvish languages from scratch. It was this hobby of “conlang” design that ultimately led me to become a programmer.

My first programs (written in line-oriented GW-BASIC) were intended to generate random words according to specific rules. I would then pick the most interesting ones for inclusion in whatever language it was I was designing at the time. Each program was language-specific. That is, I would choose a set of phonemes, create a set of rules for their combination, develop an orthography, and then write a program to generate valid words.

When I discovered REBOL, I realized that its dialecting capabilities would be perfect for this task, but instead of having to create a program per language, or struggle with other less-than-ideal techniques such as XML configuration files, I could simply create a REBOL dialect to describe the task.

To illustrate my Conlang Dialect, we’ll go through the process of creating a brutally simple language. Obviously we won’t bother with grammar and syntax. We are dealing just with the sounds. We will call this simple language Na. Na has only four consonants, s k t n, and three vowels, a i u. The consonants are as one would expect, and the vowels are pronounced more or less as in Spanish.

A syllable in Na must always end with a vowel, and must begin with at least one and at most two consonants. Thus a is not a valid syllable in Na, but ta and tki are. Words can consist of any number of syllables, but we will stick with those between two and five.

The Conlang Dialect consists of three verbs: rand, rept and join. Verbs are followed by one or more arguments. Our first verb, rand, can be followed either by a string or by a REBOL block in a specific format. When followed by a string, rand instructs the Conlang Dialect to randomly choose one of the characters from the string.

rand "aiu"

When the expression above is evaluated in the Conlang Dialect, it will result either in "a", "i", or "u". However, sometimes we want to say that one choice can occur more frequently than other choices. In the Na language, "a" is more common than either "i" or "u", which are about equally common. We can express this as follows.

rand [
 3 "a"
 1 rand "iu"
]

This means that 3 out of 4 times, the evaluation of the whole statement will result in "a"; otherwise it will result in the evaluation of the expression rand "iu". Thus, Conlang Dialect expressions can be nested within each other, and often are.

Now that we have a way to randomly choose strings, we need a way to stitch them together. This is performed by the join verb, which takes a block containing the expressions we want to join.

join [
 rand "sktn"
 rand [
  3 "a"
  1 rand "iu"
 ]
]

This instructs the Conlang Dialect to take the result of the two expressions inside of the block and combine them. So, for instance, we could get results like "ki" and "su" from the whole expression above.

The last verb in our repertoire is rept. This verb takes three arguments, and is best illustrated with an example.

rept 1 3 join [
 rand "sktn"
 rand [
  3 "a"
  1 rand "iu"
 ]
]

The first two arguments of rept tell the Conlang Dialect to repeat the evaluation of the expression given in the third argument from one to three times, in this case. In other words, pick a random number between 1 and 3 and execute the join expression that number of times, stitching the result together. The result of this expression could be words such as "ka", "kasiki", "sita", and so on.

These three verbs are all the Conlang Dialect strictly needs, but using them alone has a drawback: a shared expression has to be written out in full every time it appears. It would be nice if we could assign expressions to names and reuse them, and so we can:

na: [
 consonant: rand "sktn"
 onset: rand [
  3 consonant
  1 join [consonant consonant]
 ]
 vowel: rand [
  3 "a"
  1 rand "iu"
 ]
 syllable: join [onset vowel]
 main: rept 1 5 syllable
]

Here we have a full specification written in the Conlang Dialect. Expressions are assigned to names using a standard REBOL set-word. It should be fairly obvious from the above example how they are used. Assigning expressions in this way is not required, with one exception: The main expression is required, as it serves as the entry point into the specification. Using named expressions makes the Conlang Dialect much more usable, so I highly encourage their use.
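For readers who don't have REBOL to hand, here is a rough model of how such a specification could be evaluated, written in Python rather than REBOL. It is only a sketch of the behaviour described above, not the actual implementation of the Conlang Dialect. The tuple encoding, the "ref" marker for named expressions, and the names NA, evaluate and generate are all assumptions made for the illustration. The onset rule follows the prose description of Na: usually one consonant, occasionally two.

```python
import random

# Hypothetical encoding of the Na specification:
#   ("rand", "sktn")                  pick one character
#   ("rand", [(weight, expr), ...])   pick one weighted option
#   ("join", [expr, ...])             concatenate the results
#   ("rept", low, high, expr)         repeat low..high times
#   ("ref", "name")                   use a named expression
# A plain string evaluates to itself.
NA = {
    "consonant": ("rand", "sktn"),
    "onset": ("rand", [
        (3, ("ref", "consonant")),
        (1, ("join", [("ref", "consonant"), ("ref", "consonant")])),
    ]),
    "vowel": ("rand", [(3, "a"), (1, ("rand", "iu"))]),
    "syllable": ("join", [("ref", "onset"), ("ref", "vowel")]),
    "main": ("rept", 1, 5, ("ref", "syllable")),
}

def evaluate(expr, spec):
    """Evaluate one expression against a specification of named expressions."""
    if isinstance(expr, str):
        return expr
    verb = expr[0]
    if verb == "ref":
        return evaluate(spec[expr[1]], spec)
    if verb == "rand":
        choices = expr[1]
        if isinstance(choices, str):
            return random.choice(choices)
        weights = [weight for weight, _ in choices]
        options = [option for _, option in choices]
        return evaluate(random.choices(options, weights=weights)[0], spec)
    if verb == "join":
        return "".join(evaluate(part, spec) for part in expr[1])
    if verb == "rept":
        _, low, high, body = expr
        return "".join(evaluate(body, spec) for _ in range(random.randint(low, high)))
    raise ValueError(f"unknown verb: {verb!r}")

def generate(spec):
    """The main expression is the entry point, as in the dialect."""
    return evaluate(spec["main"], spec)

print(generate(NA))  # a random Na word; the output differs from run to run
```

Because named expressions are looked up only when they are evaluated, the order in which they appear in the specification does not matter in this sketch.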

Each verb has a much shorter synonym, and I tend to use these exclusively.

Verb   Synonym
rand   ?
join   &
rept   *


Here is a full program using the Conlang Dialect to produce words in our imaginary Na language. This uses the abbreviated synonyms given above.

REBOL [
 needs: [
  http://r3.revolucent.net/net.revolucent.conlang.v1.r 1.0.2
 ]
 license: 'mit
]

na: [
 consonant: ? "sktn"
 onset: ? [
  3 consonant
  & [consonant consonant]
 ]
 vowel: ? [
  3 "a"
  ? "iu"
 ]
 syllable: & [onset vowel]
 main: * 1 5 syllable
]

random/seed now
generator: parse-conlang na
words: []
while [greater? 10 length? words] [
 unless find words word: generator/eval [
  append words word
 ]
]
print words

And here is a list of ten words generated by executing the program. Of course, subsequent executions are almost certain to produce a different set of words.



REBOL Module Names

One of the best new features of the upcoming REBOL 3 is the module system. I’m not going to go into the rationale behind modules, but I would like to make a proposal about module names.

Every module can optionally have a name. A name is required if the module author wishes to export words from the module. If the module is contained in a file, it’s important to note that the module name and the file name need not be the same. REBOL uses the module name to ensure that a module is never loaded twice. This allows us to import the module as often as we like without incurring any additional overhead.

The name of a module must be a valid REBOL word. Many REBOL programmers are used to REBOL’s "programming in the small" (PITS) philosophy, so I think their tendency will be to use simple module names. However, modules are not a PITS feature; they are a "programming in the large" (PITL) feature. Using simple module names will decrease the opportunity for reuse by making collisions more likely.

So my proposal is to take a page from Java and use package-style module names. Fortunately, the "." character is perfectly valid inside a REBOL word and has no particular meaning as it does in other languages. Here are some examples:


As you can see, these are names for math modules. My fear is that many REBOL programmers will release modules naively named "math" and this will make it much harder to use them together. By hijacking Java's simple convention of reverse domain names, the problem is neatly solved.
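The collision problem can be sketched with a toy registry, written here in Python rather than REBOL. This is not how REBOL 3 implements modules; it only assumes the behaviour described above, namely that a module is identified by its name and is never loaded twice. The registry, the import_module helper, the author names and the domain-style module names are all made up for the illustration.

```python
# Hypothetical registry keyed by module name.
loaded = {}

def import_module(name, loader):
    """Run loader only the first time a given module name is imported."""
    if name not in loaded:
        loaded[name] = loader()
    return loaded[name]

# Two unrelated authors both choose the short name "math".
# The second module is silently shadowed by the first.
first = import_module("math", lambda: {"author": "alice"})
second = import_module("math", lambda: {"author": "bob"})
print(second["author"])  # alice

# With package-style names (invented domains) both modules coexist.
a = import_module("org.example.alice.math", lambda: {"author": "alice"})
b = import_module("net.example.bob.math", lambda: {"author": "bob"})
print(a["author"], b["author"])  # alice bob
```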


Deep REBOL: Bindology

I’m often asked why I think REBOL is a “deep” language. This question is no surprise. REBOL’s ordinary syntax, the so-called DO dialect, is unlikely to be very impressive to a veteran Ruby, Haskell, or C# programmer. E.g.,

foreach x [1 2 3] [
    print x
]

While I think the simplicity of the DO dialect is a strength, it causes many — probably most — developers to dismiss REBOL out of hand. But those that stick around eventually discover that there’s more to REBOL than meets the eye.

REBOL’s depth comes from the runtime behavior of three special datatypes, word!, block! and object!, optionally combined with the power of the PARSE dialect.

REBOL words are the symbols of the language, such as foreach, x and print in the example above. The closest equivalent in a traditional programming language is an identifier (or a keyword), except that unlike an identifier, REBOL words are directly and immediately accessible at runtime. There is no special API, such as reflection or runtime type information, required. I urge you not to get too cozy with this idea, however. Though there are some superficial resemblances, REBOL words are neither identifiers nor keywords.

The most common use of a REBOL word is to serve as a variable. For instance, in the example above the word foreach serves as a variable pointing to a built-in function that iterates over a series. However, there is no requirement that words serve as variables. They can be used directly as symbols in their own right.

Blocks are a bit simpler to understand than words, and nearly as ubiquitous. A block is a sequence of zero or more REBOL values (words, integers, strings, urls and so on) enclosed between square brackets. Blocks are not like the curly braces of a language like C. They are actual values and can be (and usually are) passed as arguments to functions. For instance, the foreach function in the example above takes three arguments. First, a word to serve as the iteration variable. Second, a series of some kind. In this case, the block [1 2 3]. And lastly, another block containing the code to be executed for each iteration. To make this more explicit, take a look at the following example.

action: [ print x ]
foreach x [1 2 3] action

(A word with a colon after it is called a set-word!. The colon is not an operator. It is a part of the word itself and cannot be separated from it. A set-word is used to assign a value to a word. In this case, we are assigning the block [ print x ] to the word action.)

This is functionally equivalent to the first example. It also conveniently points out another interesting fact about REBOL blocks: By default, their evaluation is deferred. The action block in the example above contains two (literal, unevaluated) REBOL words, print and x. It is only when we pass this block as an argument to foreach that the block is evaluated and the print word resolves to a function. Note that the act of passing a block as an argument does not necessarily mean it will be evaluated. It depends on the function to which it is passed.

In order to serve as variables, REBOL words must be bound to a context, which is a value of type object!. By default, REBOL provides a global context. Any word not bound to a specific context automatically expands the global context. In fact, the global context is the only context that can be expanded. All other contexts must explicitly declare the words that are bound to them, and once a context is declared, it cannot be expanded further. This is best shown with an example.

x: 99
foo: context [
    x: 17
    print x
]
print x

Look carefully at the example above. We have two different x’s. The first one is declared and assigned in the global context, and the second is declared and assigned in the foo context. The second print x statement prints 99, not 17. These two words are said to have the same spelling, but they are not the same word. This is an important distinction.

At first glance, this looks like the ordinary scope rules one finds in a traditional programming language. It’s actually more interesting than that. Take a look at another example.

x: 99
xs: []
foo: context [
    x: 17
    append xs 'x
]
probe xs
; == [x]
print xs
; == 17

This example will take some explaining. First, the lines beginning with semicolons are comments. I’ve used them here to show what the result of executing the preceding line of code is. So, first we assign the value 99 to the word x in the global context. We then declare an empty block and assign it to xs. Then we declare a context and, inside it, we declare another x bound to that context. We then append the word x to our block, xs. The apostrophe is used to say that we want to refer to the word itself, not its value. This means that after append xs 'x is executed, the variable xs contains [x], not [17], as shown by executing the probe function, which prints out an unevaluated value. The print function, however, does evaluate its argument. In this case, it prints 17. (When passed a block, print spits out all the values with a single space between them. If there’s only one value in the block, it’s just printed as-is.)

Why didn’t print xs spit out 99? How did print know which x to print? REBOL words carry a reference to their context with them. It’s not where a word is evaluated that makes the difference, but where it’s declared. Because of this, it’s entirely possible to have a block containing [x x x] in which each x was declared in a different context and each one has a completely different value! In fact, I think it would be useful to show just such an example.

xs: []
use [x] [
    x: 12
    append xs 'x
]
use [x] [
    x: 9
    append xs 'x
]
use [x] [
    x: "REBOL"
    append xs 'x
]
probe xs
; == [x x x]
print xs
; == 12 9 REBOL

The use function creates an anonymous context. The first block passed to it contains declarations of the words that are bound to the anonymous context, and the second block contains any code we wish to execute in that context. The important thing to note here is that the xs variable ends up containing [x x x]. Although each of these words has the same spelling, “x”, they are not the same word. Each is declared in a different context and points to a different value. This is demonstrated by printing the values.
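If it helps to see the idea outside of REBOL, here is a small Python model of words that carry their context with them. It is an analogy only, not a description of how REBOL is implemented: the Word class is hypothetical, and a plain dictionary stands in for each anonymous context.

```python
class Word:
    """A word is a spelling plus a reference to the context it was bound in."""

    def __init__(self, spelling, context):
        self.spelling = spelling
        self.context = context

    def get(self):
        # Look the spelling up in the word's own context, wherever we are now.
        return self.context[self.spelling]

    def __repr__(self):
        return self.spelling

xs = []
for value in (12, 9, "REBOL"):
    context = {"x": value}         # plays the role of one `use [x] [...]`
    xs.append(Word("x", context))  # like `append xs 'x`

print(xs)                       # [x, x, x]
print(*(w.get() for w in xs))   # 12 9 REBOL
```

All three entries have the same spelling, yet each resolves through a different context, which is the distinction the example above is making.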

The relationship between words and contexts is known as Bindology in REBOL jargon. Fortunately, it isn’t necessary to have a deep understanding of Bindology to write basic REBOL code, but (in my humble opinion) it’s indispensable for doing anything advanced.

Once I’d figured all of this out, I thought it was extremely interesting, and a unique way to design a language. But my first thought after that was to ask myself why anyone would design a language this way. What does it gain you? While you can do some very useful things with it, I think the answer lies in the ability to create REBOL dialects.

In REBOL, a dialect is a sequence of REBOL values inside of a block, interpreted using the parse function. This is more easily demonstrated than explained, so here’s an example.

rules: [
    any [
        set count integer!
        set value [ string! | word! ] (value: either string? value [value] [get value])
        (repeat n count [print value])
    ]
]
last-name: "Sassenrath"
parse [3 "Carl" 4 last-name] rules

The second argument of parse is a block specified in the PARSE dialect, the syntax of which is (far) outside the scope of this blog post. Suffice it to say that these rules provide the grammar for a very simple dialect. This dialect must consist of a sequence of zero or more integer-string or integer-word pairs. For each pair encountered, the string is printed out the number of times specified. If a word is encountered, it is evaluated, and the result is also printed to standard output.
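To show what those rules amount to, here is the same tiny dialect interpreted by hand in Python. This is a sketch of the behaviour described above, not a translation of PARSE itself. The Word class and the run_dialect function are hypothetical, a dictionary stands in for the context in which words are looked up, and the lines are collected in a list before being printed.

```python
class Word:
    """Stand-in for a REBOL word: a bare symbol that is looked up later."""

    def __init__(self, name):
        self.name = name

def run_dialect(block, variables):
    """Interpret integer/string and integer/word pairs; return the lines to print."""
    if len(block) % 2 != 0:
        raise ValueError("expected integer/value pairs")
    lines = []
    for i in range(0, len(block), 2):
        count, value = block[i], block[i + 1]
        if not isinstance(count, int):
            raise ValueError("expected an integer count")
        if isinstance(value, Word):
            value = variables[value.name]  # like `get value`
        elif not isinstance(value, str):
            raise ValueError("expected a string or a word")
        lines.extend([value] * count)
    return lines

block = [3, "Carl", 4, Word("last-name")]
for line in run_dialect(block, {"last-name": "Sassenrath"}):
    print(line)
```

Note that the block is plain data until run_dialect walks over it, which mirrors the deferred evaluation of REBOL blocks.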

This isn’t a very useful or complex dialect, but truly, the sky’s the limit. The only proviso is that the dialect must consist of a sequence of valid REBOL values, optionally including words. Note that dialects could not exist if blocks were immediately evaluated. By deferring their evaluation, we give parse the opportunity to interpret the block in whatever way we wish, completely bypassing REBOL’s ordinary DO dialect. We also have the opportunity, using Bindology, to evaluate any REBOL words that are included in the dialect, if we wish to do so and if our grammar permits them.

In sum, Bindology allows us to defer the execution of blocks for various purposes, including the specification of dialects, while allowing us to include REBOL words in the block without fear of side effects. This allows for some very powerful and unique methodologies, and it’s what gives REBOL its depth.


JavaScript & REBOL

After working with both languages a great deal, I’ve come to the realization that JavaScript is the language in common use that’s most akin to REBOL. There’s still a very wide gulf, but there are enough similarities to make it worth a mention. (Not much more than a mention, though.)

At first glance, JavaScript and REBOL don’t look much alike, even when the code is doing the same thing.

// JavaScript
function foo(s) {
 return s + 'foo';
}

; REBOL
foo: func [s [string!]] [
 rejoin [s "foo"]
]

We can start to make them look a little more similar if we put the JavaScript inside an object literal and the REBOL inside a block. We’ll also omit the type specifier from the REBOL function, since JavaScript has no equivalent.

// JavaScript
{
 foo: function(s) {
  return s + 'foo';
 }
}

; REBOL
[
 foo: func [s] [
  rejoin [s "foo"]
 ]
]

(I should point out that a block and an object literal are not equivalent, but their capabilities overlap a little bit. REBOL’s blocks are much more powerful.)

And that’s about as far as we can go with that. Anyway, it’s not lexical similarity that concerns me here, but functional similarity. For instance, both languages are scripting languages in the sense that they are not (usually) compiled. No big deal there. That’s true of a long list of languages.

However, both languages are prototype languages, and that’s certainly not true of a lot of languages.

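// JavaScript
function Foo() {
 this.watusi = 3;
}

function Bar() {
 this.watusi = 4;
}

// Use an instance of Foo, not the constructor function itself, as the prototype
Bar.prototype = new Foo();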

In REBOL, it’s similar, although in my opinion slightly more elegant, even if its syntax looks a bit strange to the curly bracers.

foo: make object! [watusi: 3]
bar: make foo [watusi: 4]

The serialized form of both languages is the language itself. In JavaScript, the JSON format is often used as a way to communicate between client and server. A very similar thing can be done with REBOL, although it has no specific name. (There really isn’t any need to have a special name or notation in REBOL. It’s just REBOL. In fact, being used for messaging is one of the fundamental things REBOL was designed to do.)

{ // JavaScript object literal
 orderno: 37,
 items: [22, 873, 14] // A list of order item numbers
}

What you see above is exactly what the server would transmit. We’d use it in JavaScript by assigning it to a variable.

var order = getOrder(); // getOrder() returns the object literal specified above

Here’s one REBOL equivalent (of a great many). Here’s what we’d transmit:

orderno: 37
items: [22 873 14]

And here’s how we might use it:

order: get-order
print order/orderno

The reason I didn’t place the REBOL “object literal” inside a block is that it isn’t necessary. When the string is read from the server, the LOAD function will automatically place it inside a block. GET-ORDER would be implemented something like this:

get-order: does [
 make object! load http://localhost:999/foo
]

In this particular case I tried to make the REBOL look as close to the JavaScript as I could to emphasize their similarities. I would probably implement the above very differently in REBOL under ordinary circumstances, although there’s nothing wrong with how it was done here. Because REBOL has DSLs and the evaluation of blocks is deferred (unlike JavaScript’s object literals), we have quite a few more options.

Another similarity is that both languages allow functions to be passed as arguments.

// JavaScript
function callF(f) {
 f();
}

; REBOL
call-f: func [f [any-function!]] [
 f
]

call-f does [print "called!"]

Both languages are amazingly nimble, although in a flexibility contest REBOL would win easily. It can do everything JavaScript can do and then some. Everything except manipulate the DOM, that is.


A Dirt-Simple DSL In REBOL

REBOL allows the creation of domain-specific languages (DSLs) using the parse function. It takes as its first argument a block! containing the DSL and as its second argument another block! containing the DSL’s specification, e.g.,

parse [x 2 x "hey!"] [ 
	some [
		'x
		set value [ integer! | string! ]
		(print either integer? value [ value * 2 ] [ value ])
	]
]

In this simple (and completely useless) DSL, the literal x is followed either by an integer or a string. This sequence of x followed by a value can be repeated indefinitely. (That’s what some tells us in the DSL’s specification.) If the value is an integer, it’s multiplied by two and then printed. If it’s a string, it’s simply printed as is. This is accomplished by the code inside of the parentheses. In the parse dialect, anything in parentheses is interpreted as REBOL code written in the do dialect, i.e., it’s what we think of as ordinary REBOL code. This code is executed only if the previous parse rule succeeds in matching.

Most REBOL DSLs are declarative, because those are the easiest sort to write, but if you’re motivated you can create any sort of DSL you wish. It should be stressed that REBOL DSLs are dialects of REBOL, not completely new languages in their own right. In fact, they are dialects of REBOL’s data exchange dialect, because the only valid lexical items are those that are valid in that dialect. For instance, @ by itself is not a valid REBOL literal, and thus cannot be used directly in any REBOL DSL. Thus, the following is not valid:

parse [2 @ 3] grammar

However, the following is valid because @ appears as part of an email address, which is a valid REBOL literal.

parse [2 test@test.com 3] grammar

DSLs can be used for whatever purpose you wish, but most recently I created a DSL to do list comprehensions, since they aren’t natively supported in REBOL. E.g.,

list [[a * b] for a in [1 2 3] for b in [4 5 6] where [even? a * b]]

This returns [4 6 8 10 12 12 18]. Here’s the source. It depends on the range function which I’ve also added.

range: func [pair [pair! block!] /local min max result][
    min: first pair 
    max: second pair 
    result: copy [] 
    for n min max either min < max [1] [-1] [
        append result n
    ]
    result
]

list: func [
    {Performs a list comprehension.}
    comprehension [block!] 
    /type 
        datatype [datatype!] 
    ; The /local list is an assumption, inferred from the words assigned in the body.
    /local action args elems filter index list result rules skip var vars
][
    vars: make object! [] 
    rules: [
        set action [block!] 
        some [
            'for 
            set var word! 
            'in 
            set list [word! | series! | pair!] 
            (if pair? list [list: range list]) 
            (vars: make vars reduce [to-set-word var either paren? list [do list] [list]])
        ]
        opt [
            'where 
            set filter [block!]
        ]
    ]
    unless parse comprehension rules [
        make error! "The list comprehension was not valid."
    ]
    action: func copy at first vars 2 action 
    filter: func copy at first vars 2 either found? filter [filter] [[true]] 
    elems: 1 
    foreach field at first vars 2 [
        unless found? result [
            result: make either type [datatype] [type? vars/(field)] none
        ]
        elems: elems * (length? vars/(field))
    ]
    for n 0 (elems - 1) 1 [
        skip: elems 
        args: copy [] 
        foreach field at first vars 2 [
            list: vars/(field) 
            skip: skip / length? list 
            index: (mod to-integer (n / skip) length? list) + 1 
            append args list/(index)
        ]
        if do compose [filter (args)] [
            append/only result do compose [action (args)]
        ]
    ]
    result
]
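The least obvious part of the source is the arithmetic that turns the flat counter n into one index per list. Here is that step isolated in a Python sketch, next to a comprehension helper built on itertools.product for comparison. The function names comprehension and combination are mine, not part of the REBOL code, and the sketch assumes that every list is non-empty.

```python
from itertools import product

def comprehension(action, lists, where=lambda *args: True):
    """Apply action to every combination drawn from lists that passes where."""
    return [action(*combo) for combo in product(*lists) if where(*combo)]

def combination(n, lists):
    """Decode a flat counter n into one element per list, first list varying slowest."""
    skip = 1
    for items in lists:
        skip *= len(items)
    combo = []
    for items in lists:
        skip //= len(items)
        combo.append(items[(n // skip) % len(items)])
    return tuple(combo)

lists = [[1, 2, 3], [4, 5, 6]]
print(comprehension(lambda a, b: a * b, lists, lambda a, b: (a * b) % 2 == 0))
# [4, 6, 8, 10, 12, 12, 18]
```

Counting n from 0 up to the product of the list lengths and decoding each value with combination visits the combinations in the same order as nested loops would, which is why the results come out in the order shown.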



When I first looked at REBOL, I didn’t think much of it:

foreach e [1 2 3] [
   print e
]

Pretty obvious what this does, and it’s the sort of thing you’ve seen a million times, eh? Not to mention the square brackets …

But then I saw stuff like this:

write/binary %Google.html read/binary http://www.google.com

That started to pique my interest! I can't think of another language in which writing the contents of a URL to a file is so concise. It might as well be wget or curl.

I noticed a few other strange things about this syntax as well. First, the URL was not quoted. It was just written out as is. And the file name was prefixed by a percent sign. I had to know more about this, and what I discovered was much deeper than I ever expected.

Like all programming languages, REBOL has tokens whose type can be determined by their lexical form. To give a common example, when an interpreter or compiler encounters the token 3, it interprets it as an integer (of some kind). These kinds of tokens are usually known as literals. Most languages have string and numeric literals, and that’s about it. REBOL on the other hand has a lot of literals: strings, numbers, URLs, blocks, parens, paths, words, files, and so on.

Here are some examples:

Example    REBOL Type
[1 2 3]    block!
(1 2 3)    paren!

There are many more. As you can see, a few of these have equivalents in other languages, but most don’t. In fact, some of these look suspiciously like the kinds of things in which a compiler or interpreter would be interested. That’s true, and it’s the basis of what's known as the data exchange dialect.

Let’s digress a bit. REBOL is composed of a hierarchy of dialects. There are several built-in dialects, and REBOL gives developers the ability to create their own. The basis of all the dialects is the data exchange dialect, which isn’t a programming language per se. It's just a free-form sequence of literals that can be interpreted however one wishes. For example:

[http://www.google.com http://www.yahoo.com] fetch no-one@nowhere.com

If we were to replace literals above with their types, we'd get

block! word! email!

(Yes, email addresses are another literal data type.) The data exchange dialect can be used much like XML. You can pass around chunks of it and the REBOL interpreter will happily tell you what the types and values of the literals are. It makes for a much better XML than XML, in my opinion. Here's an example:

data: load "[http://www.google.com http://www.yahoo.com] fetch no-one@nowhere.com"
foreach literal data [
 print type?/word literal
]

The code above prints out the following:

block!
word!
email!


When the load function is passed a string, it interprets the contents of the string as a sequence of literals, and returns a block! containing those literals. Thus, the data variable has the following value:

[[http://www.google.com http://www.yahoo.com] fetch no-one@nowhere.com]
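To get a feel for what it means to determine a type from lexical form alone, here is a deliberately tiny imitation of load in Python. It is nowhere near REBOL's real scanner: it knows only blocks, integers, URLs, email addresses, files and words, it ignores strings entirely, and the regular expressions are my own rough approximations. The names classify and load are used purely for the illustration.

```python
import re

def classify(token):
    """Guess a REBOL-style type name from the lexical form of a single token."""
    if re.fullmatch(r"[+-]?\d+", token):
        return "integer!"
    if re.fullmatch(r"[a-z][a-z0-9+.-]*://\S+", token, re.IGNORECASE):
        return "url!"
    if re.fullmatch(r"[^\s@]+@[^\s@]+", token):
        return "email!"
    if token.startswith("%"):
        return "file!"
    return "word!"

def load(text):
    """Turn text into a list of (type, value) pairs; brackets become nested lists."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    stack = [[]]
    for token in tokens:
        if token == "[":
            stack.append([])
        elif token == "]":
            if len(stack) == 1:
                raise ValueError("unexpected ]")
            block = stack.pop()
            stack[-1].append(("block!", block))
        else:
            stack[-1].append((classify(token), token))
    if len(stack) != 1:
        raise ValueError("missing ]")
    return stack[0]

data = load("[http://www.google.com http://www.yahoo.com] fetch no-one@nowhere.com")
for kind, _ in data:
    print(kind)
# block!
# word!
# email!
```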

So, if the block above is an example of the data exchange dialect, what dialect is actual programming code written in? The DO dialect. Ordinarily, when the REBOL interpreter starts processing a file, it assumes that the file is written in the DO dialect. Like all dialects, the DO dialect is a stream of literals whose types are determined by the rules of the data exchange dialect. However, what is done with them afterwards is determined by the rules of the DO dialect.

For example, when the DO dialect encounters the literal* print, it says, "Ah! This is a literal of type word!". It then looks up the value of that word and finds that it points to a function taking a single argument. It evaluates that argument and calls the function, which prints the value to standard output. E.g.,

print "REBOL"

This does exactly what one would expect.

All of this barely scratches the surface of REBOL. There's the syntax of the DO dialect, those strange beasts called words, Bindology and contexts, and so on. The purpose of this post was to get you interested. Hopefully you'll read more at rebol.com.

*Note that strictly speaking, print is a literal in the data exchange dialect, but a token in the DO dialect. However, in the interests of reducing confusion, I stuck with the former term.


Why CHECK Constraints Matter

Well-written applications validate their data. No one disputes that. The debate has always been: Where should the validation logic go?

David Heinemeier Hansson, the inventor of Ruby on Rails, strongly believes that validation and other business logic belong in an ORM layer. I don’t think Hansson would fault anyone for putting check constraints in the database. He’d just think you were wasting your time, and violating one of the first principles of Rails development: Don't Repeat Yourself. In order to take advantage of Rails’ built-in validation machinery, you’ll have to repeat your constraints in the ORM anyway. So why bother putting them in the database at all?

In the case of most Rails applications, I can’t argue with him. The software behind this blog follows exactly that approach and doesn’t suffer. However, this blog is typical of a class of applications in which the database and the application are very tightly bound. They form a single, inseparable unit and the database is not shared.

This arrangement is not typical in the corporate world. There, databases are shared resources accessed by multiple applications, often with a mixture of data access technologies. The mixture arises because databases tend to outlast the technologies that are used to access them. SQL Servers in a Microsoft shop (for instance) may get upgraded, but ODBC, RDO, ADO, and ADO.NET have come (and in some cases gone), not to mention the vast array of DALs and ORMs built atop them.

The database is where the buck stops, and in a shared environment it’s the one constant you can count on. ORMs and DALs will come and go, but if you throw the database out, only then have you truly hit the reset button. This is why at a minimum your validation logic belongs in the database in the form of CHECK constraints.
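As a minimal illustration, here is a CHECK constraint at work in SQLite, driven from Python's standard sqlite3 module. The orders table, its columns and the insert_order helper are invented for the example; the point is only that the rule lives in the database, so it applies to every client, including one that bypasses the ORM entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        orderno  INTEGER PRIMARY KEY,
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    )
    """
)

def insert_order(orderno, quantity):
    """Stand-in for any client that talks to the database without the ORM."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO orders VALUES (?, ?)", (orderno, quantity))
        return True
    except sqlite3.IntegrityError:
        return False

print(insert_order(1, 5))  # True
print(insert_order(2, 0))  # False: rejected by the CHECK constraint
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```

An ORM can still repeat the rule in order to give the user a friendlier message, but the bad row never reaches the table either way.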

You will almost certainly have to repeat yourself in your DAL or ORM. I see no responsible way around this. But to bet the farm on a chunk of middle-tier code that may not be used everywhere at all times is not a good idea.

I’m not naive enough to believe that CHECK constraints are the holy grail of data integrity. Along with great testing, talented development, and strong domain knowledge, they form one of the pillars of robust application development. (And they’ll help you build better ORMs, too!)

For the record, I’m not an opponent of ORMs. I think they can serve a very useful purpose. Many ORMs have powerful facilities to interact with the user interface and present the opportunity to fix a problem before it gets written to the database. However, no matter how sophisticated the business rules are in the ORM, the database should be the final arbiter of its own data integrity.

Note: This article was originally hosted on my own server in blogging software I'd written myself in RoR.