
mod_parrot HLL Module Developer's Guide

Thinking about helping out with mod_perl6? Crazy enough to embed another Parrot language in Apache? Then you'll want to read the new mod_parrot HLL Module Developer's Guide at http://trac.parrot.org/parrot/wiki/ModParrotHLLDocs. It's a first draft, but it's pretty complete. Comments welcome!

Got me some ESX

I often use VMware to test new operating systems or to test software like Parrot and mod_parrot on multiple operating systems.

Ever since the announcement of VMware ESXi, I've been pining for an ESX server of my own. I already run VMware desktop, and I've run VMware Server, but ESX is nice because it runs on the bare metal, no underlying host OS required, giving you better performance, flexibility and scalability. The problem is that driver support for ESX is very limited, making it quite finicky about what hardware you have.

So while I was deciding whether to build my own server (~$900 for what I wanted) or buy a used Dell server (~$1200 for a PowerEdge 1900), I came across this deal at Micro Center: a Dell Inspiron 530 mini-tower desktop, E2220 Dual Core 2.4 GHz CPU (1 MB L2 cache), 320 GB hard drive and 3 GB RAM for just over $400. Not the greatest specs for a server, and I wished it were quad core, but this is for a home lab, and it would be half the cost of building it myself. I knew ESX worked with the SATA controller on the 530's motherboard (ICH9), but it didn't like the onboard NIC. So I picked up the 530 last weekend along with a new hard drive so I could leave the bundled drive intact in case I needed to return the machine. When I got home I ordered an Intel Gigabit Pro NIC, which finally came today. I threw it in and everything worked!

I'm currently uploading ISOs so I can get the lab going, and I'll spend the next few days migrating services and files from my old server "groovy". She was a good server, but it's time I gave her barely adequate Celeron bones some rest.

UPDATE: The new virtual groovy builds parrot in 2:40, blowing away the 7:08 it took the old decrepit groovy. The respective numbers for building Rakudo: 46 seconds versus 2:27. This is really going to help my development workflow, and this is just a measly desktop. Sweet.

The release that wasn't

I was planning to release mod_parrot 0.5 this week. All of the milestones were completed. All the tests passed. And then the pdd22io branches were merged back into Parrot trunk, and ModPerl6::Registry and mod_pipp broke very loudly. DOH!

Here's the scoop.

Perl 6 registry scripts and PHP scripts running under mod_pipp relied on Parrot's (former) string IO layer to capture output from scripts and feed it to Apache. It was suboptimal and I knew this layer was going away, but it was the easiest solution to the problem at the time I wrote the modules. Now, with the PDD22 IO changes in place, Parrot has a very nice FileHandle PMC that I can subclass to do my bidding. Unfortunately, the subclassing doesn't work yet because Parrot's IO functions still poke at the internals of the PMC and Parrot's opcodes ignore the PMC's methods, rendering any subclass useless. DOUBLE DOH!

What I've done in the meantime is sick and twisted, yet very "Perl 6-ish". When a registry script is requested, I rebind Rakudo's say and print builtins to my own versions which send the output to Apache. I then run the script, resetting the bindings afterward. It all works quite nicely. Of course this only works for mod_perl6, but that's my primary concern right now.
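For readers who haven't seen this kind of rebinding before, here is a minimal sketch of the same idea in C, using a swappable function pointer in place of Rakudo's builtins. This is not mod_perl6's code: the names say_fn, capture_say and run_with_capture are made up for illustration, and the fixed-size buffer stands in for whatever actually hands output to Apache.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a rebindable output builtin such as say. */
typedef void (*say_fn)(const char *text);

/* The normal implementation: write to stdout. */
void default_say(const char *text) { printf("%s\n", text); }

/* The binding that scripts call through; starts as the normal version. */
say_fn say = default_say;

/* Stand-in for the output that would be sent to Apache. */
static char captured[1024];

/* Replacement implementation: append the text to the capture buffer. */
void capture_say(const char *text)
{
    size_t used = strlen(captured);
    /* snprintf truncates silently if the buffer is full. */
    snprintf(captured + used, sizeof(captured) - used, "%s\n", text);
}

/* Rebind say, run the script, then restore the original binding. */
const char *run_with_capture(void (*script)(void))
{
    say_fn saved = say;
    captured[0] = '\0';
    say = capture_say;
    script();
    say = saved;
    return captured;
}

/* A sample "registry script" that only knows about say. */
void hello_script(void)
{
    say("Content-type: text/plain");
    say("hello");
}
```

Calling run_with_capture(hello_script) returns the two lines the script printed, and say points at default_say again afterward, which is the "resetting the bindings" step.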

The point of all of this is that while ModPerl6::Registry is usable again, I'm delaying the release of 0.5 until I have a more palatable solution in place. Hopefully it's a subclassed FileHandle, but Allison may also provide a stringified version of the FileHandle PMC as a stop-gap solution.

November in December

With some relatively minor tweaks, I have November running as a mod_perl6 registry script. This is the first third-party application to run under mod_perl6, and it's a fairly complex one at that. The performance gains are minimal since November precompiles its modules, but it still runs faster than CGI because the bytecode stays in memory, class definitions and data persist, etc. I could also reduce the memory footprint by preloading modules with Perl6Module, but I'm not feeling that ambitious today. :)

And Jeff said, "Let there be merging"

In order to support per-directory options for mod_perl6, I had to implement configuration merging, which I've been avoiding like the plague. But it's done, and it looks like this (some code omitted for brevity):

sub server_merge(%base, %new)
{
    my %merged;

    # merge handlers -- never inherit
    for @server_phases.map({$_ ~ '_handler'}) -> $h {
        %merged{$h} = %new{$h};
    }

    return %merged;
}

sub dir_merge(%base, %new)
{
    my %merged;

    # merge options -- inherit only if not set
    %merged = {};
    for %valid_options.keys -> $k {
        %merged{$k} = %new.exists($k) ??
            %new{$k} !! %base{$k};
    }

    # merge handlers -- never inherit
    for @dir_phases.map({$_ ~ '_handler'}) -> $h {
        %merged{$h} = %new{$h};
    }

    return %merged;
}

I still think it's cool that this is written in Perl 6. mod_parrot handles all the hard stuff behind the scenes so mod_perl6 code can stay lean and mean!

mod_parrot Documentation

I've finally begun the process of writing detailed documentation for mod_parrot and publishing it on the mod_parrot wiki. Of particular interest are the shiny new roadmap and an architecture document for internal developers (about 80% complete). Once I finish the architecture doc, I'll get to work on the HLL module developer's guide and a user's guide.

This means that the mod_parrot page on the Perl Foundation wiki is deprecated. I'll most likely replace it with a link to the new page.

Recent progress

I've made lots of progress recently on both mod_parrot and mod_perl6. Here's what's been happening.

mod_parrot

The most significant addition to mod_parrot is the ability to specify which Apache request phases an HLL module is interested in servicing. This saves a lot of overhead during each phase of a request. For example, mod_perl6 is interested in every phase, since it's designed to hook into every part of the request cycle. PHP, on the other hand, only serves content, and is thus only interested in the response phase. The whole thing was extremely difficult to implement and debug, but it's done and I'm happy not to have to look at it again!
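One common way to express "which phases is this module interested in" is a bitmask, sketched below in C. This is only an illustration of the idea, not mod_parrot's data structures: the PHASE_* flags, struct hll_module and run_phase are hypothetical, and real Apache has more phases than the four shown.

```c
#include <stddef.h>

/* Hypothetical phase flags, one bit per request phase. */
enum phase {
    PHASE_ACCESS   = 1 << 0,
    PHASE_AUTHEN   = 1 << 1,
    PHASE_RESPONSE = 1 << 2,
    PHASE_LOG      = 1 << 3,
    PHASE_ALL      = (1 << 4) - 1
};

struct hll_module {
    const char *name;
    unsigned phases;   /* bitwise OR of the phases the module services */
    int calls;         /* how many times the module was invoked */
};

/* Invoke only the modules that registered interest in this phase.
 * Returns the number of modules invoked. */
int run_phase(struct hll_module *mods, size_t n, unsigned phase)
{
    int invoked = 0;
    for (size_t i = 0; i < n; i++) {
        if (mods[i].phases & phase) {
            mods[i].calls++;   /* stand-in for calling the metahandler */
            invoked++;
        }
    }
    return invoked;
}
```

With a perl6 module registered for PHASE_ALL and a PHP module registered for PHASE_RESPONSE only, run_phase for PHASE_AUTHEN touches one module and run_phase for PHASE_RESPONSE touches both, so the PHP layer costs nothing outside the response phase.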

One gaping hole in mod_parrot was the inability to read the HTTP request body, which is how POST data is sent to the server. That is no longer a problem, as I was able to implement Apache;RequestRec.read(). Many thanks to mod_perl2, for both helping me understand and providing code for me to adapt.

Other goodies include methods for manipulating headers, better error handling in places, and general code cleanup.

mod_perl6

I love writing mod_perl6. It's written in, well, Perl 6! I'll admit it does get frustrating at times -- I still run into a lot of unimplemented features or bugs in Rakudo, but I can usually get a fix fairly quickly or work around it with a little PIR scaffolding. Eventually the only PIR we'll have is the module bootstrapping code, which is minimal. Anyway, onto the details...

I've been putting a lot of effort into ModPerl6::Registry, as I'd like to get November running as a registry app. To that end, I added support for more CGI environment variables, including HTTP_COOKIE, CONTENT_LENGTH, and even a MODPERL6 variable so scripts can detect if they're running under mod_perl6. I would have loved to get POST data tied to STDIN, but the IO support just isn't there yet in Parrot (shouldn't be long though). The last remaining headache is parsing response headers from STDOUT, and I believe this will require some output filter magic that I'm not ready to tackle yet. So as it stands now, November does run as a registry script, but it needs a few patches here and there, and its CGI implementation needs some mod_perl6 awareness. So it's close, but no cigar.

Probably the most useful feature I implemented was module preloading with Perl6Module. Due to the current parsing overhead of Rakudo, using Perl6Module can shave a large chunk of time off the initial request to a handler, and save memory as well.

At my request (more like begging), Jonathan and Patrick added support to Rakudo for passing flattened arrays to subroutines, and having their elements mapped to arguments. This is the behavior we're used to in Perl 5, and I needed it for implementing different handler forms. So now we have regular handlers, where we specify the module name, and mod_perl6 calls the handler subroutine. But now we also have literal handlers, where you can specify a subroutine name other than handler. This is useful when you want a single module to handle multiple phases. And we also have method handlers (Perrin asked me about this at YAPC), where you provide a class name, and the handler is called as a method. This allows your handler to take advantage of inheritance and other OO goodies.

With all of these new features, the code is growing, and it's taking longer and longer to start and restart Apache for testing. I tried precompiling mod_perl6.pm to PIR, and guess what -- it shaved about 90% off the startup time! Needless to say, I added support for that into the bootstrapping code, but you'll have to precompile it manually until I break the project out from mod_parrot and create a proper Makefile for it.

Next steps

I'm extremely pleased with the progress I've made over the past few months, and I owe many thanks to everyone on #parrot and #perl6 for implementing functionality and fixing bugs in Parrot and Rakudo. I'm very close to a release of mod_parrot (0.5). Once that is complete I'll break off mod_perl6 into a separate project. I'd also like to have a more formal roadmap for each, which was difficult in the past due to the immaturity of Parrot and Rakudo at the time. But we don't really have that problem anymore. Fun times ahead!

Re-architecting mod_parrot

WARNING: This post is long and technical. Approach with caution.

The design of mod_parrot has always closely followed mod_perl2. After all, mod_perl does a lot of what mod_parrot does, at least when it comes to integrating with Apache. And up to this point that design has held up, whether we were using just one language, or multiple languages (e.g. mod_perl6 and PHP on the same server).

That all changed two weeks ago when I was thinking about how to support multiple high-level languages (HLLs) with handlers in the same Apache phase. A good example of this would be a directory with two different acceptable authentication schemes, written in two different languages. Both would have to register an authentication handler with mod_parrot. Seems simple enough, but there are two major problems with this.

The first problem is that mod_parrot does not support "stacked" handlers, which are multiple handlers for a single phase in a particular section. Adding support for stacked handlers would be a fairly simple task, but then that brings us to our other problem: handling the semantics of each phase. If a module handler fails or declines, does mod_parrot immediately pass that status back to Apache, or does it move onto the next stacked handler in the phase? The answer to this question depends on which phase you are in and what status was returned from the handler. I was about to write code to support all of this when I realized something -- I was rewriting Apache!
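To see why the semantics depend on the phase, here is a small C sketch of the two dispatch styles that Apache's hook macros provide, as I understand them: "run first" stops at the first handler that does not decline, while "run all" keeps going unless a handler returns an error. OK and DECLINED use Apache's values (0 and -1); everything else, including the handler names, is hypothetical and is not mod_parrot code.

```c
#include <stddef.h>

/* Same values Apache uses for these two statuses. */
#define OK        0
#define DECLINED -1

typedef int (*handler_fn)(void);

/* "Run first": stop at the first handler that does not decline. */
int run_first(const handler_fn *stack, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int rv = stack[i]();
        if (rv != DECLINED)
            return rv;
    }
    return DECLINED;
}

/* "Run all": keep going unless a handler returns an error status. */
int run_all(const handler_fn *stack, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int rv = stack[i]();
        if (rv != OK && rv != DECLINED)
            return rv;
    }
    return OK;
}

/* Sample handlers for the two-language authentication example. */
int perl6_auth(void) { return DECLINED; }  /* not my user, try the next one */
int php_auth(void)   { return OK; }        /* authenticated */
int deny_auth(void)  { return 401; }       /* HTTP 401 Unauthorized */
```

Given the stack perl6_auth, php_auth, deny_auth, run_first returns OK as soon as php_auth accepts the user, while run_all continues and returns 401 from deny_auth. Reimplementing both behaviors per phase inside mod_parrot is exactly the "rewriting Apache" trap.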

Apache does exactly what we are trying to do, except with individual Apache modules. With this in mind, the solution was obvious -- every HLL supported by mod_parrot must be represented by a first class Apache module. This is a monumental task, but fortunately I had already added support for adding Apache modules when I implemented custom Apache directives. The remaining work was adding HLL hooks for each phase. This is easy in C. But we can't use C -- the point of mod_parrot is to support new HLLs without requiring C. Unfortunately, Apache wants module hooks to be C functions.

I solved this problem by writing a common set of hooks in C that do nothing but figure out which HLL module is in effect*, and call the corresponding HLL metahandler for that hook (metahandlers implement the semantics of an HLL and call the real handler code). And as an added bonus, refactoring the hook code reduced the size of mod_parrot.c by about 8K. Sweet!
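A rough picture of such a common hook, in C, might look like the sketch below. It assumes a table of HLL modules and an index naming the one currently in effect; how mod_parrot actually maintains that index is not shown in this post, so select_hll is a placeholder, and struct request is a stand-in for Apache's request_rec.

```c
#include <stddef.h>

#define OK        0
#define DECLINED -1

struct request { const char *uri; };   /* stand-in for Apache's request_rec */

typedef int (*metahandler_fn)(struct request *r);

struct hll_module {
    const char *name;
    metahandler_fn response;   /* NULL if the HLL skips this hook */
};

/* Table of registered HLL modules and the index of the current one. */
static struct hll_module *hll_modules;
static size_t hll_count;
static size_t current_hll;

void register_hlls(struct hll_module *mods, size_t n)
{
    hll_modules = mods;
    hll_count = n;
    current_hll = 0;
}

/* Placeholder for however the "current module" index gets updated. */
void select_hll(size_t i) { current_hll = i; }

/* The one C hook shared by every HLL: look up the module in effect
 * and forward to its metahandler, declining if there is none. */
int common_response_hook(struct request *r)
{
    if (current_hll >= hll_count)
        return DECLINED;
    metahandler_fn mh = hll_modules[current_hll].response;
    return mh ? mh(r) : DECLINED;
}

/* Sample metahandler that counts its invocations. */
int perl6_calls;
int perl6_meta(struct request *r)
{
    (void)r;
    perl6_calls++;
    return OK;
}
```

The point of the design is that the C side never changes when a new language is added: a new HLL only contributes another table entry with its own metahandler.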

Now we just needed each HLL to manage its own configuration data. Previously, HLLs would register their handlers with mod_parrot, which worked, but wasn't optimal since mod_parrot had to manage all the configuration data. Now that the HLL layer is implemented in a regular Apache module, we can remove that responsibility from mod_parrot. The implementation was fairly straightforward. Using a model similar to the one found in mod_perl2, each HLL can have callbacks to create and merge both server and directory configurations. And of course, callbacks for custom directives can update configurations, and the metahandlers can read from them so they know what handlers to call, etc.

Ok, that was a mouthful. So what have I accomplished with all of this? Well, mod_parrot is now a much thinner layer between Apache and the various HLL modules. The HLL layer is responsible for much more now, but that results in greater flexibility down the road.

That's all I have the energy to type for now. My next post will demonstrate how to use all this new functionality to create an HLL module of your own!

*Apache makes no effort to tell you what module a hook is currently running in because there is usually a one-to-many mapping of modules to hooks. This is not the case in mod_parrot (a many-to-many mapping), so we maintain an index into a list of HLL modules that always points to the current module.

HLL configuration directives live!

mod_parrot now has support for HLL configuration directives! They're still a bit tedious to configure since everything is in PIR and I'm missing a few Apache constants, but it works! I committed a proof-of-concept directive for mod_perl6 called, well, what else, Perl6ResponseHandler. So this works now:

<Location /foo>
    SetHandler perl6-code
    Perl6ResponseHandler Foo::Bar
</Location>

The only caveat is that you have to use ParrotLoadImmediate to load the perl6 HLL layer, since ParrotLoad doesn't start the interpreter until after the configuration phase. mod_perl has a similar setup with PerlLoadModule. Eventually the loading of HLL layers will be automatic, or at least in their own config file so you don't have to worry about it.

Now for the interesting bits. As I noted in my previous post, I got a proof-of-concept working where I was able to create and load a module at runtime, complete with custom directives; I just needed to refactor it for mod_parrot. It actually wasn't that bad, except for one minor thing. Unlike modules written in an HLL which implement their own directives and modify their own configs, the directives implementing the HLL itself need to modify mod_parrot's configuration. In other words, one module needs to modify another's configuration! This was painful to implement, especially for the directory configuration, but I did it in about 2 days without doing anything shady, and it works.

Here's what Perl6ResponseHandler looks like in the perl6 HLL layer:

.sub __onload :anon :load

    # [skip perl6 initialization code]
 
    # register apache directives
    .local pmc add_module, cmds

    load_bytecode 'Apache/Module.pbc'

    cmds = new 'Array'
    cmds = 1
    $P0 = new 'Hash'

    $P1 = new 'String'
    $P1 = 'Perl6ResponseHandler'
    $P0['name'] = $P1
    $P1 = new 'Integer'
    $P1 = 1 # TAKE1
    $P0['args_how'] = $P1
    $P1 = get_hll_global 'cmd_perl6responsehandler'
    $P0['func'] = $P1
    $P1 = new 'Integer'
    $P1 = 8 # OR_AUTHCFG
    $P0['req_override'] = $P1
    $P1 = new 'String'
    $P1 = "usage: Perl6ResponseHandler handler-name"
    $P0['errmsg'] = $P1
    cmds[0] = $P0

    add_module = get_hll_global [ 'Apache'; 'Module' ], 'add'
    $P1 = add_module("modparrot_perl6_module", cmds)
.end

.sub cmd_perl6responsehandler
    .param pmc args
    .local string handler
    handler = args[0]
    $P0 = get_hll_global ['Apache'; 'Module'], 'modparrot_dircfg_handler'
    $P1 = $P0('perl6', handler)
.end

Of course it will be a bit prettier in the future when I work out the Apache constants and get this working in Perl 6 instead of PIR, but you get the picture. With this major milestone out of the way I can focus on exposing more of the Apache internals and really start kickin' the tires!

HLL Configuration Directives

For the first time in a long while I have copious amounts of free time over the weekend. So I decided to look at what I had long considered the most difficult part of mod_parrot: HLL configuration directives. Turns out they're not so hard after all.

What is an HLL configuration directive? Well, consider what you have to do now to declare a Perl 6 response handler (in httpd.conf):

<Location /foo>
  SetHandler parrot-code
  ParrotLanguage perl6
  ParrotHandler My::Handler
</Location>

These directives violate one of the goals of mod_parrot: stay invisible. But with HLL configuration directives, we could have something like this:

<Location /foo>
  SetHandler perl6-code
  Perl6ResponseHandler My::Handler
</Location>

This makes configuration much simpler, more familiar to current mod_perl developers, and completely hides mod_parrot from view.

One Apache module implements this functionality, and that's mod_perl2, where you can add custom Apache directives from your modules. I did a lot of research on how this is accomplished, and it's actually quite simple:

  • Write a thunking layer for the various configuration signatures to pass configuration info from Apache to the HLL
  • Create a new Apache module dynamically at runtime
  • Create a command_rec structure containing your directives and pointers to the thunking layer
  • Associate the command_rec with the new module
  • Register the module with Apache

Of course the actual implementation is the difficult part, but today I was able to write a simple proof-of-concept. It creates a new module at runtime and adds a directive that writes to the error log when it's encountered in the configuration. Hard part done!
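The thunking idea can be sketched in plain C without any Apache headers. The struct below is a simplified, hypothetical stand-in for Apache's command_rec (the real one also carries fields such as req_override and args_how), and dispatch is a toy replacement for Apache's config parser. The only convention borrowed from Apache is that a directive function returns NULL on success or an error string on failure.

```c
#include <stdio.h>
#include <string.h>

/* Signature of an HLL-side callback (a PIR sub in the real thing). */
typedef const char *(*hll_callback)(const char *arg);

/* Wrapper so the callback can travel through a void pointer. */
struct hll_cmd { hll_callback cb; };

/* Simplified, hypothetical stand-in for Apache's command_rec. */
struct directive {
    const char *name;
    const char *(*func)(void *cmd_data, const char *arg);  /* C thunk */
    void *cmd_data;                                         /* HLL side */
    const char *errmsg;
};

/* Thunk: fixed C signature facing Apache, HLL callback behind it. */
const char *take1_thunk(void *cmd_data, const char *arg)
{
    struct hll_cmd *cmd = cmd_data;
    return cmd->cb(arg);
}

/* Toy config dispatch: find the directive by name and run it.
 * A missing argument yields the directive's usage message. */
const char *dispatch(const struct directive *table, size_t n,
                     const char *name, const char *arg)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0)
            return arg ? table[i].func(table[i].cmd_data, arg)
                       : table[i].errmsg;
    }
    return "unknown directive";
}

/* Example callback: log the argument, like the proof-of-concept. */
const char *log_handler(const char *arg)
{
    fprintf(stderr, "directive saw: %s\n", arg);
    return NULL;
}
```

Registering a table entry named Perl6ResponseHandler with take1_thunk and a struct hll_cmd wrapping log_handler, then dispatching "Perl6ResponseHandler Foo::Bar", writes one line to stderr and returns NULL, which mirrors the directive that logs when it is encountered.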

What's next? I have to refactor everything I've done to make it extensible and callable from Parrot. Then I can write the PIR interface for HLLs to add their directives. And if I'm careful about the design, modules like mod_perl6 will be able to reuse the interface to add directives for their own handlers, like mod_perl2 does today. My goal is to have this in a usable state by OSCON. My fingers are crossed!
