Writing mod_perl6 in Perl 6

One of the goals of the mod_parrot project is to provide the infrastructure for running the Perl 6 version of mod_perl, a.k.a. mod_perl6. I've already demonstrated that mod_perl6 works, so that goal is slowly being achieved. Many thanks to Patrick Michaud, Jerry Gay, and everyone else who has worked on the Parrot implementation of Perl 6.

Another lesser known goal of mod_parrot is to allow the high level language (HLL) layers to be written in the HLL itself. That is to say, write mod_perl6 in Perl 6. Up to this point, mod_parrot has five HLL layers (PIR, NQP, Perl6, PHP/Plumhead, Perl1/Punie), all written in Parrot's native PIR. However, yesterday, with some help from Patrick, I was able to port mod_perl6 from PIR to pure Perl 6!

As an example, here is a very bare-bones mod_perl6 (DISCLAIMER: string interpolation in namespaces doesn't actually work yet):

module ModParrot::HLL::perl6;

our %loaded_modules;

# load a Perl 6 handler module
sub load($module)
{
    unless (%loaded_modules{$module}) {
        use $module;
        %loaded_modules{$module} = 1;
    }
}

# call a Perl 6 response handler
sub handler($name)
{
    my $r = Apache::RequestRec.new();
    load($name);
    my $status = ::($name)::handler($r);
    $status;
}

# call a Perl 6 authentication handler
sub authen_handler($name)
{
    my $r = Apache::RequestRec.new();
    load($name);
    my $status = ::($name)::handler($r);
    $status;
}

When calling a Perl 6 handler, mod_parrot loads this module and calls the individual handler routines according to the Apache configuration. It also provides the interface to Apache, including the Apache::RequestRec class needed by mod_perl6. Everything else it leaves to the Perl 6 compiler.

You might think this code doesn't actually do much, and that's the point. It's really just a simple thunking layer between mod_parrot and your handlers, enforcing the rules of the mod_perl6 implementation. For example, in mod_perl, an Apache::RequestRec object is passed to all response handlers. This layer is responsible for making sure that happens.
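The load() bookkeeping in the module above is plain memoization: a handler module is compiled at most once per interpreter, and later requests go straight to the call. As a rough illustration only, here is the same idea in C. The fixed-size table, `load_module()`, and the `compile_count` counter are hypothetical names, not part of mod_parrot or mod_perl6.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for %loaded_modules: names of modules that have
 * already been loaded into this interpreter. */
#define MAX_MODULES 64
#define MAX_NAME    128

static char loaded_modules[MAX_MODULES][MAX_NAME];
static int  loaded_count = 0;

/* Counts how often the (placeholder) load/compile step really ran. */
static int compile_count = 0;

static bool is_loaded(const char *module)
{
    for (int i = 0; i < loaded_count; i++) {
        if (strcmp(loaded_modules[i], module) == 0)
            return true;
    }
    return false;
}

/* Load a module only the first time it is requested; later calls are no-ops.
 * Returns false if the name is too long or the table is full. */
bool load_module(const char *module)
{
    if (is_loaded(module))
        return true;
    if (loaded_count >= MAX_MODULES || strlen(module) >= MAX_NAME)
        return false;
    compile_count++;  /* placeholder for the real load/compile step */
    strcpy(loaded_modules[loaded_count++], module);
    return true;
}
```

Because the table lives inside one process, "once" means once per httpd process, which matches the per-process interpreters described in the later posts.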

As the Perl 6 compiler matures and mod_parrot adds more functionality, this version of mod_perl6 will inevitably change. But what you see above will remain at its core -- loading Perl 6 modules, juggling arguments, and passing control to handler subroutines. And the fact that it's pure Perl 6 will enable scores of Perl programmers to hack on it without having to know anything about Parrot or C programming. Take that, XS.

PHP on mod_parrot

After only 30 minutes of hacking together the code, mod_parrot now supports an implementation of PHP! Plumhead is a simple PHP interpreter utilizing the Parrot Compiler Toolkit (PCT), and comes bundled with Parrot. Because it uses PCT, it plugs right into mod_parrot with minimal effort. The I/O subsystem is still a complete kludge, as I'm using Parrot's string I/O layer to capture output and feed it back to Apache, but that will be worked out eventually.

Here's the code I was able to run:

Hello
<?php
    echo "World!\n";
?>
I am here.
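The output-capture kludge mentioned above boils down to buffering everything the script prints and handing the result to Apache afterwards. The C sketch below shows only that buffering step. `OutBuf` and its functions are hypothetical, and the real code uses Parrot's string I/O layer rather than a hand-rolled buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical capture buffer: each write from the interpreter is appended
 * here, and the accumulated text is handed to Apache afterwards. */
typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} OutBuf;

void outbuf_init(OutBuf *b)
{
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}

/* Append n bytes, doubling the capacity as needed. Returns 0 on success. */
int outbuf_write(OutBuf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (b->len + n + 1 > cap)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';   /* keep the contents usable as a C string */
    return 0;
}

void outbuf_free(OutBuf *b)
{
    free(b->data);
    outbuf_init(b);
}
```

Once the script finishes, the accumulated bytes could be passed to Apache in one call such as `ap_rwrite(b.data, (int)b.len, r)`. A less kludgy design would forward each write to Apache as it happens.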

I HATE legacy systems!

Unless you're working at a very young company or only work on new projects, you will undoubtedly run into legacy systems at some point. These are systems that you didn't design, that nobody else understands, that are likely poorly documented, and that are now your responsibility to maintain. I just migrated such a system from one data center to another, and tested it. It worked for my test scenarios. In the real world? BOOM! Utter disaster. I found the problem, but that doesn't make me feel any better -- I take a lot of pride in the quality of my work. Unfortunately, even with documentation, you can't possibly understand every little nuance of systems you didn't design. Unless your testing methodologies are akin to those of NASA (and I've been through test plans like that), something is very likely to go wrong.

So, two lessons to take away from this:

  • Always run your final tests in a production-like environment.
  • EXPECT PAIN!

How mod_perl6 works

NOTE: This article assumes familiarity with mod_perl, Parrot, Perl, and Apache. If any of the code or concepts presented here are unfamiliar to you, please see the references at the end of the article.

Writing the world's first mod_perl6 handler and actually seeing it work was quite a moment for me. It validated years of hard work by me and the rest of the Parrot, Perl6, and Pugs developers. Here is an example of a mod_perl6 response handler:

sub polly_handler($r)
{
    $r.puts("<h1>Polly, a mod_perl6 handler</h1>\n");
    $r.puts("SQUAWK!  Polly says "~$r.args());
    0; # Apache OK
}

"Polly" takes the contents of the HTTP query string and returns a simple HTML page repeating the string back to the client. Pretty simple. But what's going on behind the scenes is much more complex and actually pretty exciting! So let's take a little trip, and follow the lifecycle of a mod_perl6 response handler through Apache, mod_parrot, mod_perl6, and back again.

Mapping the request to mod_parrot

We'll put our handler code in /myhandlers/polly.p6 and call it with the following simple URI: /polly. We map that location to the mod_parrot handler parrot-code with the following Apache location block:

<Location /polly>
    SetHandler parrot-code
    ParrotHandler /myhandlers/polly::polly_handler
    ParrotLanguage perl6
</Location>

ParrotHandler tells mod_parrot where to find the code for this handler, and that the name of the handler subroutine is polly_handler. Right now it requires a path and a unique subroutine name, but will accept module names once the Perl6 implementation supports namespaces. Note that the .p6 extension will be appended automatically.
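Splitting a ParrotHandler value is mechanical. The following C sketch assumes the subroutine name follows the last `::` and that the extension is appended to the path part. `parse_parrot_handler()` is a hypothetical helper for illustration, not mod_parrot's actual parsing code (for Perl6 that work belongs to the HLL layer's _load()).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for a ParrotHandler value such as
 * "/myhandlers/polly::polly_handler". The text before the LAST "::" is the
 * file path (with ext appended); the text after it is the subroutine name.
 * Returns 0 on success, -1 if there is no "::" or a buffer is too small. */
int parse_parrot_handler(const char *spec, const char *ext,
                         char *path, size_t path_size,
                         char *sub, size_t sub_size)
{
    const char *sep = NULL;
    for (const char *p = strstr(spec, "::"); p != NULL; p = strstr(p + 1, "::"))
        sep = p;                       /* remember the last match */
    if (sep == NULL)
        return -1;

    int n = snprintf(path, path_size, "%.*s%s", (int)(sep - spec), spec, ext);
    if (n < 0 || (size_t)n >= path_size)
        return -1;
    n = snprintf(sub, sub_size, "%s", sep + 2);
    if (n < 0 || (size_t)n >= sub_size)
        return -1;
    return 0;
}
```

For the Polly configuration this yields the file `/myhandlers/polly.p6` and the subroutine `polly_handler`. Splitting on the last `::` is one way to leave room for module-style names later.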

The ParrotLanguage directive tells mod_parrot that this is Perl6 code. mod_parrot can support any language that runs on the Parrot virtual machine, so we need to be explicit here.

Let's say our full request looked like this:

/polly?want_a_cracker

Apache will now take that request, map it to our location block and our mod_parrot handler, and place "want_a_cracker" in the args member of Apache's request_rec structure.
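The split Apache performs on the request URI can be pictured with the simplified C function below. It is only a sketch: Apache's real URI parsing also deals with schemes, hosts, and escaping, and `split_request_uri()` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical version of the request-URI split: everything
 * after the first '?' becomes the query string (what ends up in
 * request_rec's args member); the rest is the path matched against
 * <Location> blocks. 'uri' is modified in place. */
void split_request_uri(char *uri, char **path, char **args)
{
    char *q = strchr(uri, '?');
    *path = uri;
    if (q != NULL) {
        *q = '\0';       /* terminate the path at the '?' */
        *args = q + 1;   /* query string starts right after it */
    } else {
        *args = NULL;    /* no query string, as with r->args == NULL */
    }
}
```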

Initializing Parrot

Parrot is initialized not at Apache startup, but on demand for each httpd process. At the time of this writing, this happens at the first invocation of a Parrot handler. This behavior will change in a future release, as there are benefits to starting the interpreter earlier.
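On-demand, per-process initialization is a lazy-initialization pattern. Here is a minimal C sketch, assuming the prefork MPM (one single-threaded process per interpreter). `Interp` and `get_interp()` are hypothetical stand-ins for the real Parrot embedding calls, and a threaded MPM would need something like `pthread_once()` around the initialization.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Parrot interpreter handle. */
typedef struct {
    bool initialized;
    int  init_calls;   /* how many times initialization actually ran */
} Interp;

/* One interpreter per httpd process: after fork(), each prefork child
 * has its own copy of this file-scope variable. */
static Interp process_interp = { false, 0 };

/* Return this process's interpreter, creating it on first use. */
Interp *get_interp(void)
{
    if (!process_interp.initialized) {
        process_interp.init_calls++;
        /* ...real code would create the interpreter and load mod_parrot.pbc... */
        process_interp.initialized = true;
    }
    return &process_interp;
}
```

Starting the interpreter earlier, as planned, would roughly amount to calling this from a per-child startup hook instead of from the first handler invocation.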

Once the interpreter has been initialized, it performs a few tasks. The first task is to load the initialization code stored in mod_parrot.pbc. This is Parrot bytecode that declares various mod_parrot namespaces, maps the Apache API to Parrot NCI functions, and loads other supporting libraries for accessing Apache's internal data structures, such as the request_rec structure and APR tables.

Read that last paragraph again. One more time for good measure. Now think about what you've read. mod_parrot is handling the interaction with Apache for us. Language modules like mod_perl6 don't need to touch the Apache internals directly, which makes the code for those modules much simpler. In fact, I expect that the majority of mod_perl6 will eventually be written in pure Perl6, relying on the Parrot backend for the nitty-gritty details of dealing with Apache.
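Mapping the Apache API to NCI functions is, at heart, a registry from names to function pointers, which the puts() method later consults to find ap_rputs. The C sketch below models that registry. `nci_register()`, `nci_lookup()`, and `mock_rputs()` are hypothetical, and Parrot's actual mechanism stores NCI PMCs in the ['Apache'; 'NCI'] namespace rather than in a C array.

```c
#include <stddef.h>
#include <string.h>

/* Generic function pointer type. Callers cast entries back to their real
 * signature, much as an NCI signature string ("itp" etc.) tells Parrot
 * how to call each function. */
typedef void (*generic_fn)(void);

typedef struct {
    const char *name;
    generic_fn  fn;
} NciEntry;

#define NCI_MAX 32
static NciEntry nci_table[NCI_MAX];
static size_t   nci_count = 0;

/* Hypothetical analogue of: set_root_global [ 'Apache'; 'NCI' ], name, func */
int nci_register(const char *name, generic_fn fn)
{
    if (nci_count >= NCI_MAX)
        return -1;
    nci_table[nci_count].name = name;
    nci_table[nci_count].fn = fn;
    nci_count++;
    return 0;
}

/* Hypothetical analogue of: get_root_global [ 'Apache'; 'NCI' ], name */
generic_fn nci_lookup(const char *name)
{
    for (size_t i = 0; i < nci_count; i++)
        if (strcmp(nci_table[i].name, name) == 0)
            return nci_table[i].fn;
    return NULL;
}

/* Mock with ap_rputs's shape, int (const char *str, request_rec *r);
 * it just returns the string length instead of writing to a client. */
static int mock_rputs(const char *str, void *r)
{
    (void)r;
    return (int)strlen(str);
}
```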

Loading and parsing the Perl6 handler

Now that mod_parrot's interpreter has been initialized, it can load the code for our handler. But what do Parrot or mod_parrot know about the naming conventions, file layout, or module loading semantics of any particular language? Nothing! So mod_parrot handles this in a very graceful manner: it delegates the work to someone else!

That someone else is called the HLL layer (High Level Language layer). At the very minimum the HLL layer is responsible for loading, compiling, and running the language handlers. It can also serve as a proxy between the HLL and mod_parrot's Apache API, creating language-specific objects to pass to handlers, or calling handlers with different arguments than mod_parrot would by default. Anything you need to change about mod_parrot to suit your language's implementation belongs here. The HLL layer can be written in PIR (Parrot Intermediate Representation) or the high level language itself (e.g. Perl6). For now, the Perl6 HLL layer is written in PIR.

Each HLL layer defines various subroutines that will be called by mod_parrot (subject to change):

_load(handler_name) should load and compile the code for handler_name.

_config() is reserved for use during the Apache configuration phase and is not currently implemented.

_*handler(handler_name) should run the code for handler_name, calling _load() if necessary. There is one subroutine for each handler in the Apache lifecycle.
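One way to picture the HLL-layer contract is as a table of entry points that mod_parrot looks up by language. The C sketch below is only an analogy: real HLL layers are PIR (or HLL) subroutines, and `HllLayer`, `find_layer()`, and the toy perl6 layer are hypothetical. It does show the rule that a `_*handler()` entry point calls `_load()` only when needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C view of an HLL layer's entry points. */
typedef int (*hll_load_fn)(const char *handler_name);
typedef int (*hll_handler_fn)(const char *handler_name);

typedef struct {
    const char     *language;       /* ParrotLanguage value, e.g. "perl6" */
    hll_load_fn     load;           /* _load(handler_name) */
    hll_handler_fn  handler;        /* _handler(handler_name) */
    hll_handler_fn  authen_handler; /* _authen_handler(handler_name) */
} HllLayer;

static int  toy_loads = 0;
static char toy_loaded[128] = "";   /* toy: remembers only one handler */

static int toy_load(const char *name)
{
    toy_loads++;   /* real code would append ".p6", read and compile the file */
    strncpy(toy_loaded, name, sizeof toy_loaded - 1);
    return 0;
}

/* _*handler(): load on first use, then run. Returns an Apache status. */
static int toy_handler(const char *name)
{
    if (strcmp(toy_loaded, name) != 0 && toy_load(name) != 0)
        return 500;   /* HTTP_INTERNAL_SERVER_ERROR */
    return 0;         /* OK */
}

static const HllLayer layers[] = {
    { "perl6", toy_load, toy_handler, toy_handler },
};

/* Find the layer named by the ParrotLanguage directive. */
const HllLayer *find_layer(const char *language)
{
    for (size_t i = 0; i < sizeof layers / sizeof layers[0]; i++)
        if (strcmp(layers[i].language, language) == 0)
            return &layers[i];
    return NULL;
}
```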

A practical example of HLL layer functionality is the ".p6" extension on our handler's name. mod_perl6's _load() subroutine takes care of appending it for us so we don't need to add it to every ParrotHandler directive.

Executing the mod_perl6 handler

When our handler for Polly is called, mod_parrot calls the Perl6 _handler() subroutine, passing it the name of the handler, /myhandlers/polly::polly_handler, as an argument. If the code for this handler hasn't been loaded yet, _handler() loads it by calling _load(), and then executes it.

The handler can interact with mod_parrot and Apache in various ways during execution. The most common way is through objects that are passed as arguments or requested directly from mod_parrot. For example, unless the HLL layer says otherwise, response handlers are passed a Parrot ['Apache';'RequestRec'] object. This object lets us access various aspects of the request in Apache's request_rec structure, such as the query string that our handler will use.

Perl6 will place the ['Apache';'RequestRec'] object in $r, as specified in our handler's declaration:

sub polly_handler($r)

Perl6 objects are implemented as Parrot objects, therefore a method call on a Perl6 object will invoke the underlying Parrot method, whether the object was created in Perl6, PIR, or even another Parrot-based language. mod_perl6 exploits this behavior for its own benefit. In our handler we call two methods: args() to retrieve the query string, and puts() to output strings to the client:

$r.puts("SQUAWK!  Polly says "~$r.args());

Internally, these methods are actually PIR subroutines that call C functions to interact with Apache. In the case of puts(), Parrot calls Apache's ap_rputs() function through NCI (Parrot's Native Call Interface):

    dlfunc func, nul, "ap_rputs", "itp"
    set_root_global [ 'Apache'; 'NCI' ], "ap_rputs", func

...

.sub puts :method
    .param string data
    .local pmc r
    .local pmc ap_rputs

    getattribute r, self, 'r'
    ap_rputs = get_root_global [ 'Apache'; 'NCI' ], 'ap_rputs'
    ap_rputs( data, r )
.end

args() is a bit more complex, as Apache doesn't provide an API function for getting or setting this value. So mod_parrot provides its own C function to manipulate the request_rec structure directly:

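/* Get, and optionally set, the query string (r->args) of a request.
 * Apache has no API call for this, so the field is accessed directly. */
char *mpnci_request_rec_args(Parrot_Interp interp, request_rec *r, char *args, int update_r)
{
    if (update_r == 1) {
        /* copy into the request pool so the string lives as long as r */
        r->args = apr_pstrdup(r->pool, args);
    }
    return r->args;
}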

It also provides a corresponding PIR method for calling that function:

.sub args :method
    .param string data :optional
    .param int update_r :opt_flag
    .local pmc r
    .local pmc request_rec_args
    .local string args

    getattribute r, self, 'r'

    if update_r goto call_it
    data = ""
call_it:
    request_rec_args = get_root_global [ '_modparrot'; 'NCI' ], 'request_rec_args'
    args = request_rec_args( r , data, update_r )

    .return(args)
.end
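To see the get/set behavior of mpnci_request_rec_args() outside Apache, here is a standalone C sketch. `fake_request_rec` is a hypothetical stand-in for request_rec, and malloc() replaces apr_pstrdup() on the request pool, so the old string is simply not freed here (a pool would release it when the request ends).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Apache's request_rec: only the field we need. */
typedef struct {
    char *args;
} fake_request_rec;

/* Same get/set logic as mpnci_request_rec_args(), with malloc() in place
 * of apr_pstrdup(r->pool, ...). Returns the (possibly updated) query string,
 * or NULL if the copy could not be allocated. */
char *fake_request_rec_args(fake_request_rec *r, const char *args, int update_r)
{
    if (update_r == 1) {
        size_t n = strlen(args) + 1;
        char *copy = malloc(n);
        if (copy == NULL)
            return NULL;
        memcpy(copy, args, n);
        r->args = copy;   /* previous value not freed; a pool would reclaim it */
    }
    return r->args;
}
```

Calling it with update_r set to 0 mirrors the PIR method's no-argument form (a plain read), while passing a value mirrors `$r.args("...")` setting the query string.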

Passing control back to Apache

The last thing a handler does is return a status code to Apache to indicate whether the request was handled successfully or not, or whether the handler declined to process it. A status of 0 (OK) tells Apache that the request was handled successfully. However, since return has not yet been implemented for Perl6, we put the status alone on its own line, because Perl by default uses the last value evaluated as a subroutine's return value:

0; # Apache OK

Upon successful execution of a handler subroutine, mod_parrot will pass the return value and control of the request back to Apache. From there the data is sent to the client, and all is well with the world.
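For reference, the non-HTTP status values a handler can return are defined in Apache's httpd.h as `OK` (0), `DECLINED` (-1), and `DONE` (-2); other values are generally treated as HTTP status codes such as 404 or 500. The C sketch below restates those constants. `describe_status()` is a hypothetical helper that only summarizes Apache's behavior; it is not an Apache API.

```c
/* Handler status values as defined in Apache's httpd.h. */
#define OK        0   /* request handled successfully */
#define DECLINED -1   /* handler chose not to process this request */
#define DONE     -2   /* response already served completely */

/* Hypothetical helper summarizing what Apache does with a return value. */
const char *describe_status(int status)
{
    switch (status) {
    case OK:       return "handled";
    case DECLINED: return "declined";
    case DONE:     return "done";
    default:       return "http status";  /* e.g. 404, 500 */
    }
}
```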

Persistence

Like Perl interpreters in mod_perl, Parrot interpreters in mod_parrot, along with their data, are persistent. So suppose our handler is called again. Assuming the request hits the same Apache process (each process has its own interpreter), mod_parrot will not bother initializing an interpreter or loading the code; it will just run the handler. Additionally, any global variables will retain their values, so it's easy to maintain persistent data structures like caches or counters.
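Why a persistent counter can appear to jump around is easier to see with a small model. In this hypothetical C sketch, each `ChildInterp` stands for one httpd child's interpreter and its globals; the logic mirrors the counter handler in the "World's first mod_perl6 handlers" post (start at 1, print, then increment). The names are illustrative only.

```c
/* Hypothetical model of per-process persistence: each httpd child owns one
 * interpreter, and a global in it (like `our $x`) survives only between
 * requests served by that same child. */
typedef struct {
    int pid;         /* which httpd child owns this interpreter */
    int page_views;  /* a "global" living in this interpreter */
} ChildInterp;

/* Serve one request on the given child; returns the count shown to the user. */
int serve_counter_request(ChildInterp *interp)
{
    if (interp->page_views == 0)
        interp->page_views = 1;    /* unless ($x) { $x = 1 } */
    return interp->page_views++;   /* print $x, then $x++ */
}
```

Two requests to child 101 show 1 and 2, a request to child 102 shows 1 again, and the next request to child 101 shows 3: the jumping count a browser without keepalives may see.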

Concluding remarks

I hope this has been an informative overview of how mod_perl6 works, as well as a peek into the internals of mod_parrot. Remember that most of the concepts and processes described here don't just apply to mod_perl6, but to any other language running on the Parrot VM. For example, assuming someone writes a compiler for Python, mod_python could be implemented in the same fashion as mod_perl6. And a language like PHP would be trivial to implement using mod_parrot, as it doesn't require the level of access to Apache internals that mod_perl6 does. Anyone feel like writing a compiler? ;-)

References

World's first mod_perl6 handlers

From my post to parrot-porters:

It gives me great pleasure to introduce you to the world's first mod_perl6 handlers! They are run using Parrot's Perl6 compiler on top of mod_parrot, and are compiled on the fly the first time a handler is called. Each handler is passed an [Apache;RequestRec] object instantiated by mod_parrot, and the handlers can call methods on that object from Perl6 land.

First is Polly. Polly repeats everything from the query string of a URL. It uses the puts() method for output and args() to retrieve the query string from Apache.

sub polly_handler($r)
{
    $r.puts("<h1>Polly, a mod_perl6 handler</h1>\n");
    $r.puts("SQUAWK!  Polly says "~$r.args());
    0; # Apache OK
}

Second is the counter, which increments a counter each time it is called. It demonstrates the persistence of the interpreter and the proper scoping of the counter variable, which is declared with our. Since each Apache process has its own interpreter, the count might seem to jump between calls, especially if your browser isn't using keepalives.

sub counter_handler($r)
{
    our $x;
    unless ($x) {
        $x = 1;
    }
    $r.puts("<h1>Hello, I'm a mod_perl6 response handler!</h1>\n");
    $r.puts("Page views for this interpreter: $x\n");
    $x++;
    0; # Apache OK
}

mod_parrot Registry Scripts

mod_parrot now supports registry scripts for the Perl6 and NQP languages. Behold:

sub handler($r)
{
    our $x;
    unless ($x) {
        $x = 1;
    }
    print "<h1>Hello, I'm a mod_perl6 registry script!</h1>\n";
    print "Page views for this interpreter: $x\n";
    $x++;
}

And the corresponding Apache configuration for the script directory:

Alias /perl6-bin/ /home/jeff/mod_parrot/perl6-bin/
<Directory /home/jeff/mod_parrot/perl6-bin>
    SetHandler parrot-code
    ParrotLanguage perl6
    ParrotHandler ModPerl6::Registry
    Order allow,deny
    Allow from all
</Directory>

This script returns a page with the number of times it's been called by that instance of the Parrot interpreter. It demonstrates the persistence of the interpreter by storing the count in the $x variable between requests. Pretty simple, but it shows how mod_parrot can load, compile, and run code from a file in any supported language.
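Before the registry can compile anything, Apache's Alias directive has to turn the URL into a file path. The C sketch below models only that prefix substitution, using the paths from the configuration above. `map_alias()` and the script name `counter.p6` are hypothetical, and Apache also normalizes the URI (for example, resolving `..` segments) before matching, which is omitted here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the Alias lookup that turns a registry URL into a
 * script path. Returns 0 on success, -1 if the URI is outside the alias
 * or the result does not fit in 'out'. */
int map_alias(const char *uri, const char *alias, const char *dir,
              char *out, size_t out_size)
{
    size_t alen = strlen(alias);
    if (strncmp(uri, alias, alen) != 0)
        return -1;                       /* URI not under this alias */
    int n = snprintf(out, out_size, "%s%s", dir, uri + alen);
    if (n < 0 || (size_t)n >= out_size)
        return -1;                       /* truncated */
    return 0;
}
```

The registry handler then only has to load, compile, and run the file at the resulting path, calling its handler() subroutine.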

OSCON wrapup

Just got back from Portland, and while the trip was short, it was fun and completely worth it. The Petfinder development team was there, and as we're all remote it's good to see each other face to face every so often. I also reconnected with some folks I haven't seen in a couple years, and, with some help from particle, resurrected mod_parrot from its feathery ashes. More on that soon.

As for the conference itself, there was a mix of good and not so good, as should be expected. Some of the highlights for me were Simon Peyton-Jones' keynote on software transactional memory, Tim Bunce's session on DBI::Gofer, which may prove to be useful at the day job, and Jonathan Worthington's .NET to Parrot bytecode talk. He may be crazier than I am. Chris Shiflett's Security 2.0 talk was good as well -- no surprise there. (You're welcome, Chris. ;-)

Of course the hallway track was the highlight for me, as always. I spent a lot of time hacking on mod_parrot and catching up with everyone, but I still missed a few people I knew were there. Oh well.

And now I'm off to get some rest. That redeye flight back to Philly was not pleasant at all.

GPL madness

Linus went on a rant yesterday about interpretations of the GPL. As he said, it's a legal license, and in my words, not some moral lesson to be learned at the end of an after-school special (RMS might think differently). Linus takes offense at the FSF defining what "freedom" is, and while I do think he's a bit jaded by the whole thing, I tend to agree with him. Personally, while I respect the GPL for what it is, and adhere to it where necessary, I never have, and never will, release any of my original code under the GPL. Quite simply, I don't like forcing users of my software to do something just because they've bundled or integrated my software into theirs.

I think it's great that we have so many licensing models to choose from. GPL is a perfectly valid option, and I respect it when I see it. I've just chosen not to use it for my software. Respect that.

Road Trip

And suddenly, I'm going to Portland for OSCON. The day job is paying for a bunch of us to go out this year, and I'm excited to be able to reconnect with people I haven't seen since YAPC 2005 in Toronto. Due to a prior commitment, I'll only be there Tuesday afternoon through Thursday evening, so track me down quick if I owe you a beer. :)

Linux Device Mapper

I just found a great article on the Linux device mapper, which explains how it works behind the scenes for things like LVM and encrypted file systems. It's two years old, but still valid, and a good read for anyone interested in the wonders of Linux block device management. That's what gets you up in the morning, right? ;-)
