Loudly Recursive: dsl


2010-08-29

A REBOL Dialect For Constructed Languages

I have had a fascination for languages ever since I read Tolkien and discovered that he had created the Elvish languages from scratch. It was this hobby of “conlang” design that ultimately led me to become a programmer.

My first programs (written in line-oriented GW-BASIC) were intended to generate random words according to specific rules. I would then pick the most interesting ones for inclusion in whatever language it was I was designing at the time. Each program was language-specific. That is, I would choose a set of phonemes, create a set of rules for their combination, develop an orthography, and then write a program to generate valid words.

When I discovered REBOL, I realized that its dialecting capabilities would be perfect for this task. Instead of having to create a program per language, or struggle with other less-than-ideal techniques such as XML configuration files, I could simply create a REBOL dialect to describe the task.

To illustrate my Conlang Dialect, we’ll go through the process of creating a brutally simple language. Obviously we won’t bother with grammar and syntax. We are dealing just with the sounds. We will call this simple language Na. Na has only four consonants, s k t n, and three vowels, a i u. The consonants are as one would expect, and the vowels are pronounced more or less as in Spanish.

A syllable in Na must always end with a vowel, and must begin with at least one and at most two consonants. Thus a is not a valid syllable in Na, but ta and tki are. Words can consist of any number of syllables, but we will stick with those between two and five.

The Conlang Dialect consists of three verbs: rand, rept and join. Verbs are followed by one or more arguments. Our first verb, rand, can be followed either by a string or by a REBOL block in a specific format. When followed by a string, rand instructs the Conlang Dialect to randomly choose one of the characters from the string.

rand "aiu"

When the expression above is evaluated in the Conlang Dialect, it will result either in "a", "i", or "u". However, sometimes we want to say that one choice can occur more frequently than other choices. In the Na language, "a" is more common than either "i" or "u", which are about equally common. We can express this as follows.

rand [
 3 "a"
 1 rand "iu"
]

This means that 3 out of 4 times, the evaluation of the whole statement will result in "a"; otherwise it will result in the evaluation of the expression rand "iu". Thus, Conlang Dialect expressions can be nested within each other, and often are.
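To make the weights concrete: the nested expression gives "a" a probability of 3/4, and "i" and "u" a probability of 1/8 each, which is the same as a flat choice with weights 6, 1 and 1. Here is a minimal sketch in plain REBOL (not in the Conlang Dialect) of how such a weighted choice might be made. The name weighted-pick is a hypothetical helper for illustration only, and the dialect itself may well do this differently.

```rebol
; Hypothetical helper, not part of the Conlang Dialect.
; Takes a block of weight/value pairs and returns one value,
; chosen with probability proportional to its weight.
weighted-pick: func [pairs [block!] /local total roll] [
    total: 0
    foreach [weight value] pairs [total: total + weight]
    roll: random total    ; an integer from 1 to total
    foreach [weight value] pairs [
        if roll <= weight [return value]
        roll: roll - weight
    ]
]

random/seed now
; Flattened equivalent of: rand [3 "a" 1 rand "iu"]
print weighted-pick [6 "a" 1 "i" 1 "u"]
```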

Now that we have a way to randomly choose strings, we need a way to stitch them together. This is performed by the join verb, which takes a block containing the expressions we want to join.

join [
 rand "sktn"
 rand [
  3 "a"
  1 rand "iu"
 ]
]

This instructs the Conlang Dialect to take the result of the two expressions inside of the block and combine them. So, for instance, we could get results like "ki" and "su" from the whole expression above.

The last verb in our repertoire is rept. This verb takes three arguments, and is best illustrated with an example.

rept 1 3 join [
 rand "sktn"
 rand [
  3 "a"
  1 rand "iu"
 ]
]

The first two arguments of rept tell the Conlang Dialect to repeat the evaluation of the expression given in the third argument from one to three times, in this case. In other words, pick a random number between 1 and 3 and execute the join expression that number of times, stitching the result together. The result of this expression could be words such as "ka", "kasiki", "sita", and so on.
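For readers who think more easily in ordinary REBOL, the behaviour just described can be approximated with a small function. This is only a sketch under my own assumptions: repeat-join is a made-up name standing in for rept, and for brevity the vowel is chosen uniformly rather than with the 3-to-1 weighting.

```rebol
; Hypothetical stand-in for the dialect's rept verb, written as an
; ordinary REBOL function. Evaluates body between lo and hi times
; (inclusive) and stitches the results into one string.
repeat-join: func [lo [integer!] hi [integer!] body [block!] /local result] [
    result: copy ""
    loop (lo - 1 + random (hi - lo + 1)) [
        append result do body
    ]
    result
]

random/seed now
; One to three consonant-vowel syllables, e.g. "ka" or "kasiki"
print repeat-join 1 3 [
    rejoin [pick "sktn" random 4  pick "aiu" random 3]
]
```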

These three verbs are all we need for the Conlang Dialect, but a specification written with them alone quickly becomes repetitive. It would be nice if we could assign expressions to names and reuse them, and so we can:

na: [
 consonant: rand "sktn"
 onset: rand [
  3 consonant
  1 join [
   consonant
   consonant
  ]
 ]
 vowel: rand [
  3 "a"
  1 rand "iu"
 ]
 syllable: join [onset vowel]
 main: rept 1 5 syllable
]

Here we have a full specification written in the Conlang Dialect. Expressions are assigned to names using a standard REBOL set-word. It should be fairly obvious from the above example how they are used. Assigning expressions in this way is not required, with one exception: The main expression is required, as it serves as the entry point into the specification. Using named expressions makes the Conlang Dialect much more usable, so I highly encourage their use.

Each verb has a much shorter synonym, and I tend to use these exclusively.

rand	?
rept	*
join	&

Here is a full program using the Conlang Dialect to produce words in our imaginary Na language. This uses the abbreviated synonyms given above.

REBOL [
 needs: [
  2.100.99.2.5
  http://r3.revolucent.net/net.revolucent.conlang.v1.r 1.0.2
 ]
 license: 'mit
]

na: [
 consonant: ? "sktn"
 onset: ? [
  3 consonant
  1 & [
   consonant
   consonant
  ]
 ]
 vowel: ? [
  3 "a"
  1 ? "iu"
 ]
 syllable: & [onset vowel]
 main: * 1 5 syllable
]

random/seed now
generator: parse-conlang na
words: []
while [greater? 10 length? words] [
 unless find words word: generator/eval [
  append words word
 ]
]
print words

And here is a list of ten words generated by executing the program. Of course, subsequent executions are almost certain to produce a different set of words.

nkasanatana
ntusnana
sata
sanuna
kasakukana
skannatu
suknakaka
naka
natasasa
kanika

2009-07-05

Deep REBOL: Bindology

I’m often asked why I think REBOL is a “deep” language. This question is no surprise. REBOL’s ordinary syntax (the so-called DO dialect) is unlikely to be very impressive to a veteran Ruby, Haskell, or C# programmer. E.g.,

foreach x [1 2 3] [
    print x
]

While I think the simplicity of the DO dialect is a strength, it causes many developers, probably most, to dismiss REBOL out of hand. But those who stick around eventually discover that there’s more to REBOL than meets the eye.

REBOL’s depth comes from the runtime behavior of three special datatypes, word!, block! and object!, optionally combined with the power of the PARSE dialect.

REBOL words are the symbols of the language, such as foreach, x and print in the example above. The closest equivalent in a traditional programming language is an identifier (or a keyword), except that unlike an identifier, REBOL words are directly and immediately accessible at runtime. There is no special API, such as reflection or runtime type information, required. I urge you not to get too cozy with this idea, however. Though there are some superficial resemblances, REBOL words are neither identifiers nor keywords.

The most common use of a REBOL word is to serve as a variable. For instance, in the example above the word foreach serves as a variable pointing to a built-in function that iterates over a series. However, there is no requirement that words serve as variables. They can be used directly as symbols in their own right.
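A tiny illustration of words used purely as symbols may help. The word names below are arbitrary choices of mine; none of them needs to have a value, because the block is never evaluated.

```rebol
; None of these words has a value here; they are used purely as symbols.
directions: [north south east west]
probe first directions           ; prints: north
probe word? first directions     ; prints: true
probe find directions 'east      ; prints: [east west]
```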

Blocks are a bit simpler to understand than words, and nearly as ubiquitous. A block is a sequence of zero or more REBOL values (words, integers, strings, urls and so on) enclosed between square brackets. Blocks are not like the curly braces of a language like C. They are actual values and can be (and usually are) passed as arguments to functions. For instance, the foreach function in the example above takes three arguments: first, a word to serve as the iteration variable; second, a series of some kind (in this case, the block [1 2 3]); and lastly, another block containing the code to be executed for each iteration. To make this more explicit, take a look at the following example.

action: [ print x ]
foreach x [1 2 3] action

(A word with a colon after it is called a set-word!. The colon is not an operator. It is a part of the word itself and cannot be separated from it. A set-word is used to assign a value to a word. In this case, we are assigning the block [ print x ] to the word action.)

This is functionally equivalent to the first example. It also conveniently points out another interesting fact about REBOL blocks: By default, their evaluation is deferred. The action block in the example above contains two (literal, unevaluated) REBOL words, print and x. It is only when we pass this block as an argument to foreach that the block is evaluated and the print word resolves to a function. Note that the act of passing a block as an argument does not necessarily mean it will be evaluated. It depends on the function to which it is passed.
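The difference between treating a block as data and evaluating it can be seen in a few lines. This is just a sketch; the word code is an arbitrary name.

```rebol
code: [print "evaluated now"]
probe length? code    ; prints: 2  (the block is treated as data)
probe first code      ; prints: print  (a word, not a function call)
do code               ; prints: evaluated now
```

Here length? and first simply inspect the block, while do is a function that chooses to evaluate it.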

In order to serve as variables, REBOL words must be bound to a context, which is a value of type object!. By default, REBOL provides a global context. Any word not bound to a specific context automatically expands the global context. In fact, the global context is the only context that can be expanded. All other contexts must explicitly declare the words that are bound to them, and once a context is declared, it cannot be expanded further. This is best shown with an example.

x: 99
foo: context [
    x: 17
    print x
]
print x

Look carefully at the example above. We have two different x’s. The first one is declared and assigned in the global context, and the second is declared and assigned in the foo context. The second print x statement prints 99, not 17. These two words are said to have the same spelling, but they are not the same word. This is an important distinction.
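A stripped-down variant makes the two words easier to compare side by side. It relies on ordinary path notation (foo/x) to reach the word inside the context, which the example above does not use.

```rebol
x: 99
foo: context [x: 17]
print x        ; prints: 99
print foo/x    ; prints: 17
```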

At first glance, this looks like the ordinary scope rules one finds in a traditional programming language. It’s actually more interesting than that. Take a look at another example.

x: 99
xs: []
foo: context [
    x: 17
    append xs 'x
]
probe xs
; == [x]
print xs
; == 17

This example will take some explaining. First, the lines beginning with semicolons are comments. I’ve used them here to show what the result of executing the preceding line of code is. So, first we assign the value 99 to the word x in the global context. We then declare an empty block and assign it to xs. Then we declare a context and, inside it, we declare another x bound to that context. We then append the word x to our block, xs. The apostrophe is used to say that we want to refer to the word itself, not its value. This means that after append xs 'x is executed, the variable xs contains [x], not [17], as shown by executing the probe function, which prints out an unevaluated value. The print function, however, does evaluate its argument. In this case, it prints 17. (When passed a block, print spits out all the values with a single space between them. If there’s only one value in the block, it’s just printed as-is.)

Why didn’t print xs spit out 99? How did print know which x to print? REBOL words carry a reference to their context with them. It’s not where a word is evaluated that makes the difference, but where it’s declared. Because of this, it’s entirely possible to have a block containing [x x x] in which each x was declared in a different context and each one has a completely different value! In fact, I think it would be useful to show just such an example.

xs: []
use [x] [
    x: 12
    append xs 'x
]
use [x] [
    x: 9
    append xs 'x
]
use [x] [
    x: "REBOL"
    append xs 'x
]
probe xs
; == [x x x]
print xs
; == 12 9 REBOL

The use function creates an anonymous context. The first block passed to it contains declarations of the words that are bound to the anonymous context, and the second block contains any code we wish to execute in that context. The important thing to note here is that the xs variable ends up containing [x x x]. Although each of these words has the same spelling, “x”, they are not the same word. Each is declared in a different context and points to a different value. This is demonstrated by printing the values.
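The same point can be checked one word at a time with get, which returns the value a word refers to in its own context. This is a shortened sketch of the example above, using only two contexts.

```rebol
xs: []
use [x] [x: 12 append xs 'x]
use [x] [x: 9 append xs 'x]
probe get first xs     ; prints: 12
probe get second xs    ; prints: 9
probe reduce xs        ; prints: [12 9]
```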

The relationship between words and contexts is known as Bindology in REBOL jargon. Fortunately, it isn’t necessary to have a deep understanding of Bindology to write basic REBOL code, but (in my humble opinion) it’s indispensable for doing anything advanced.

Once I’d figured all of this out, I thought it was extremely interesting, and a unique way to design a language. But my first thought after that was to ask myself why anyone would design a language this way. What does it gain you? While you can do some very useful things with it, I think the answer lies in the ability to create REBOL dialects.

In REBOL, a dialect is a sequence of REBOL values inside of a block, interpreted using the parse function. This is more easily demonstrated than explained, so here’s an example.

rules: [
    any [
        set count integer!
        set value [ string! | word! ] (value: either string? value [value] [get value])
        (repeat n count [print value])
    ]
]
last-name: "Sassenrath"
parse [3 "Carl" 4 last-name] rules

The second argument of parse is a block specified in the PARSE dialect, the syntax of which is (far) outside the scope of this blog post. Suffice it to say that these rules provide the grammar for a very simple dialect. This dialect must consist of a sequence of zero or more integer-string or integer-word pairs. For each pair encountered, the string is printed out the number of times specified. If a word is encountered, it is evaluated, and the result is also printed to standard output.
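Because parse returns true or false, the same rules also tell us whether a block is a valid sentence in the dialect. The sketch below repeats the rules so that it can be pasted on its own, and feeds them one valid and one invalid input.

```rebol
rules: [
    any [
        set count integer!
        set value [string! | word!]
        (value: either string? value [value] [get value])
        (repeat n count [print value])
    ]
]
probe parse [2 "Carl"] rules    ; prints Carl twice, then: true
probe parse [2 3] rules         ; prints: false
```

In the second call the integer 3 is neither a string nor a word, so the pair never matches, nothing is printed, and parse reports failure because it could not reach the end of the input.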

This isn’t a very useful or complex dialect, but truly, the sky’s the limit. The only proviso is that the dialect must consist of a sequence of valid REBOL values, optionally including words. Note that dialects could not exist if blocks were immediately evaluated. By deferring their evaluation, we give parse the opportunity to interpret the block in whatever way we wish, completely bypassing REBOL’s ordinary DO dialect. We also have the opportunity, using Bindology, to evaluate any REBOL words that are included in the dialect, if we wish to do so and if our grammar permits them.

In sum, Bindology allows us to defer the execution of blocks for various purposes, including the specification of dialects, while allowing us to include REBOL words in the block without fear of side effects. This allows for some very powerful and unique methodologies, and it’s what gives REBOL its depth.

2009-04-26

A Dirt-Simple DSL In REBOL

REBOL allows the creation of domain-specific languages (DSLs) using the parse function. parse takes as its first argument a block! containing the DSL and as its second argument another block! containing the DSL’s specification, e.g.,

parse [x 2 x "hey!"] [ 
	some [
		'x
		set value [ integer! | string! ]
		(print either integer? value [ value * 2 ] [ value ])
	]
]

In this simple (and completely useless) DSL, the literal x is followed either by an integer or a string. This sequence of x followed by a value can be repeated indefinitely. (That’s what some tells us in the DSL’s specification.) If the value is an integer, it’s multiplied by two and then printed. If it’s a string, it’s simply printed as is. This is accomplished by the code inside of the parentheses. In the parse dialect, anything in parentheses is interpreted as REBOL code written in the do dialect, i.e., it’s what we think of as ordinary REBOL code. This code is executed only if the previous parse rule succeeds in matching.
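The code in parentheses does not have to print anything. As a small variation of my own, the sketch below collects the values into a block instead, using a hypothetical results word.

```rebol
results: copy []
parse [x 2 x "hey!"] [
    some [
        'x
        set value [integer! | string!]
        (append results either integer? value [value * 2] [value])
    ]
]
probe results    ; prints: [4 "hey!"]
```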

Most REBOL DSLs are declarative, because those are the easiest sort to write, but if you’re motivated you can create any sort of DSL you wish. It should be stressed that REBOL DSLs are dialects of REBOL, not completely new languages in their own right. In fact, they are dialects of REBOL’s data exchange dialect, because the only valid lexical items are those that are valid in that dialect. For instance, @ by itself is not a valid REBOL literal, and thus cannot be used directly in any REBOL DSL. Thus, the following is not valid:

parse [2 @ 3] grammar

However, the following is valid because @ appears as part of an email address, which is a valid REBOL literal.

parse [2 test@test.com 3] grammar
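A quick check shows that the address is loaded as a single value of its own datatype, which is why a grammar can match it.

```rebol
probe email? second [2 test@test.com 3]    ; prints: true
probe type? second [2 test@test.com 3]     ; prints: email!
```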

DSLs can be used for whatever purpose you wish, but most recently I created a DSL to do list comprehensions, since they aren’t natively supported in REBOL. E.g.,

list [[a * b] for a in [1 2 3] for b in [4 5 6] where [even? a * b]]

This returns [4 6 8 10 12 12 18]. Here’s the source. It depends on the range function which I’ve also added.

range: func [pair [pair! block!] /local min max result][
    min: first pair 
    max: second pair 
    result: copy [] 
    for n min max either min < max [1] [-1] [
        append result n
    ] 
    result
]
	
list: func [
    {Performs a list comprehension.}
    comprehension [block!]
    /type
        datatype [datatype!]
    /local
        args
        action
        elems
        filter
        index
        list
        result
        rules
        skip
        var    ; set by the parse rules below, so keep it local too
        vars
][
    vars: make object! [] 
    rules: [
        set action [block!] 
        some [
            'for 
            set var word! 
            'in 
            set list [word! | series! | pair!] 
            (if pair? list [list: range list]) 
            (vars: make vars reduce [to-set-word var either paren? list [do list] [list]])
        ] 
        opt [
            'where 
            set filter [block!]
        ]
    ] 
    unless parse comprehension rules [
        make error! "The list comprehension was not valid."
    ] 
    action: func copy at first vars 2 action 
    filter: func copy at first vars 2 either found? filter [filter] [[true]] 
    elems: 1 
    foreach field at first vars 2 [
        unless found? result [
            result: make either type [datatype] [type? vars/(field)] none
        ] 
        elems: elems * (length? vars/(field))
    ] 
    for n 0 (elems - 1) 1 [
        skip: elems 
        args: copy [] 
        foreach field at first vars 2 [
            list: vars/(field) 
            skip: skip / length? list 
            index: (mod to-integer (n / skip) length? list) + 1 
            append args list/(index)
        ] 
        if do compose [filter (args)] [
            append/only result do compose [action (args)]
        ]
    ] 
    result
]