Embedding Perl in HTML with Mason Chapter 12: Custom Mason Subclasses- P1


Chapter 12: Custom Mason Subclasses

Something that we have tried very hard to do, beginning with the 1.10 release of Mason, is to make it easier to customize Mason's behavior. Jon Swartz was already on this track even back with the release of 0.80, which saw the first appearance of the HTML::Mason::Resolver classes, but 1.10 tries to bring this to new levels.

Starting with 1.10 it has become possible to subclass almost every core class that comes with Mason. Some obvious candidates for subclassing include the Lexer, Compiler, and Resolver. This chapter will demonstrate how you might go about implementing subclasses of various Mason objects.

Class::Container as a Superclass

A number of modules in Mason are subclasses of Class::Container. This is a class that was created to encapsulate some common behaviors for Mason objects. Originally, it was called HTML::Mason::Container, but Ken Williams decided to package this class separately and release it to CPAN, as it solves some fundamental problems of a large object-oriented system. Any Mason object that takes parameters to its constructor must inherit from this module.

Of course, since all of the classes that you might consider subclassing inherit from Class::Container already, you shouldn't need to inherit from it directly. However, you may need to use some of its methods. We will briefly cover a few of them here, but see the Class::Container documentation for more details.

The modules in the Mason core distribution that are Class::Container subclasses are HTML::Mason::ApacheHandler,
HTML::Mason::CGIHandler, HTML::Mason::Interp, HTML::Mason::Compiler, HTML::Mason::Lexer, HTML::Mason::Resolver, and HTML::Mason::Request.

The most important methods that Class::Container provides are valid_params() and contained_objects(), both of which are class methods. The first, valid_params(), is called in order to register the valid parameters for a class's new() constructor. The second method, contained_objects(), is used to register the objects, if any, that a given class contains.

The contained_objects() method is not something you will have to use for all of your subclasses, since most of the time you won't be altering the structure of Mason's framework; you'll just be plugging your own classes into it. This method is called with a hash whose keys are parameter names that the class's constructor accepts and whose values are the default names of the contained classes. For example, HTML::Mason::Compiler contains the following code:

    __PACKAGE__->contained_objects( lexer => 'HTML::Mason::Lexer' );

This says that the HTML::Mason::Compiler->new() method will accept a lexer parameter and that, if no such parameter is given, then an object of the HTML::Mason::Lexer class will be constructed.

Class::Container also implements a bit of magic here, so that if HTML::Mason::Compiler->new() is called with a lexer_class
parameter, it will load the class, instantiate a new object of that class, and use that for the lexer. In fact, it's even smart enough to notice if parameters given to HTML::Mason::Compiler->new() are really intended for this subclass, and it will make sure that they get passed along.

The valid_params() method is a bit more complex. It also takes a list of key/value pairs as arguments. The keys are the names of parameters accepted by the new() method, while the values are hash references defining a validation specification for the parameter. This specification is largely the same as that used by the Params::Validate module, with a few additions (but no subtractions).

One addition is that each parameter, excluding those that represent contained objects, may also define a value for parse. This tells Mason how to parse this parameter if it is defined as part of an Apache configuration file. If no parse parameter is provided, a sensible default will be guessed from the value of the Params::Validate type argument.

The upshot of this is that your subclasses can define their own constructor parameters and Mason will then check for these parameters in an Apache configuration file. As an example, HTML::Mason::Compiler contains the following:

    __PACKAGE__->valid_params(
        allow_globals => {
            parse   => 'list',
            type    => ARRAYREF,
            default => [ ],
            descr   => "An array of names of Perl variables that are" .
                       " allowed globally within components"
        },
        default_escape_flags => {
            parse   => 'string',
            type    => SCALAR,
            default => '',
            descr   => "Escape flags that will apply by default to" .
                       " all Mason tag output"
        },
        lexer => {
            isa   => 'HTML::Mason::Lexer',
            descr => "A Lexer object that will scan component" .
                     " text during compilation"
        },
        preprocess => {
            parse    => 'code',
            type     => CODEREF,
            optional => 1,
            descr    => "A subroutine through which all component text" .
                        " will be sent during compilation"
        },
        postprocess_perl => {
            parse    => 'code',
            type     => CODEREF,
            optional => 1,
            descr    => "A subroutine through which all Perl code" .
                        " will be sent during compilation"
        },
        postprocess_text => {
            parse    => 'code',
            type     => CODEREF,
            optional => 1,
            descr    => "A subroutine through which all plain text will" .
                        " be sent during compilation"
        },
    );

    __PACKAGE__->contained_objects( lexer => 'HTML::Mason::Lexer' );

The type, default, and optional parameters are part of the validation specification used by Params::Validate. The various
constants used, ARRAYREF, SCALAR, and so on, are all exported by Params::Validate.

The parameters passed to valid_params() correspond to the MasonAllowGlobals, MasonDefaultEscapeFlags, MasonLexerClass, MasonPreprocess, MasonPostprocessPerl, and MasonPostprocessText httpd.conf configuration variables. Yes, Class is added automatically to the lexer param because lexer was also given to the contained_objects() method.

The descr parameter is used when we generate the HTML::Mason::Params documentation and is probably not something you'd need to use.

For more details, see both the Class::Container and Params::Validate documentation.

Syntax: Your Very Own Lexer

A request heard every so often on the Mason users list is for some way to create an XML-based markup language that can be used with Mason and that can be compiled to a Mason component object.

Despite the panic the thought of such a thing inspires in us, in the interests of good documentation, we will show the beginnings of such a lexer.

This lexer object will make use of several modules from CPAN, including XML::SAX::ParserFactory and XML::SAX::Base. The former is what it sounds like, a factory for SAX parsers (SAX2 parsers, actually). The latter is what any SAX2 handler should use as a base class. It implements a default no-op method for all the possible SAX2 methods, allowing you to
simply implement those that you need. Our lexer will be a SAX2 handler, so we will inherit from XML::SAX::Base.

A quick side note on SAX (Simple API for XML): SAX is an event-based API for parsing XML. As the parser finds XML constructs, such as tags or character data, it calls appropriate methods in a SAX handler, such as start_element() or characters(). The parser is an event producer and the handler, like our Lexer, is an event consumer. In our case, the Lexer will also be generating events for the Compiler, though these will not be SAX events. For more information on Perl's implementation of SAX2, see the perl-xml project on Sourceforge at http://perl-xml.sourceforge.net/.

For the purposes of our example, let's assume that any element that is not in the mason XML namespace will be output verbatim, as will any text. For tags, we'll just implement <mason:args>, <mason:init>, <mason:perl>, and <mason:output> in this example. The <mason:perl> tag will contain XML-escaped Perl code, while the <mason:args> tag will contain zero or more <mason:arg> tags. Each <mason:arg> tag will have the attributes name and default, with name being required. We will also implement a <mason:component> tag in order to provide a single top-level containing tag for the component, which is an XML requirement.

This is only a subset of the Mason syntax set, but it's enough to show you how to customize a fairly important part of the system.

Using these tags, we might have some XML like this:
    This is plain text.
    This is text in an HTML tag
    my $x;
    if ($y > 10) {
        $x = 10;
    } else {
        $x = 100;
    }
    $x is $x
    $y is $y
    $y *= $_ foreach @z;
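Written out with its markup, a component using these tags might look like the following sketch. The namespace URI, the choice of <b> as the HTML tag, the argument declarations, and the assignment of each code fragment to a particular block are all assumptions made for illustration. The document is held in a Perl heredoc only so that the tags it uses can be listed with core Perl.

```perl
use strict;
use warnings;

# A hypothetical component written with the mason:* tags described above.
# The single-quoted heredoc keeps $x, $y and @z from being interpolated.
my $component_source = <<'END_XML';
<?xml version="1.0"?>
<mason:component xmlns:mason="http://example.com/mason-demo">
 This is plain text.
 <b>This is text in an HTML tag</b>
 <mason:perl>
  my $x;
  if ($y &gt; 10) {
      $x = 10;
  } else {
      $x = 100;
  }
 </mason:perl>
 $x is <mason:output>$x</mason:output>
 $y is <mason:output>$y</mason:output>
 <mason:args>
  <mason:arg name="$y" default="5"/>
  <mason:arg name="@z" default="(2, 3)"/>
 </mason:args>
 <mason:init>
  $y *= $_ foreach @z;
 </mason:init>
</mason:component>
END_XML

# Collect the local names of every opening tag in the mason namespace.
my @mason_tags = $component_source =~ /<mason:(\w+)/g;

print join( ', ', @mason_tags ), "\n";
```

Note that the comparison operator inside the Perl code has to be written as &gt; because the code is XML character data.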
OK, that looks just beautiful! Let's start with the preliminaries.

    package HTML::Mason::Lexer::XML;

    $VERSION = '0.01';

    use strict;

    use HTML::Mason::Exceptions( abbr => [ qw( param_error syntax_error error ) ] );

    use HTML::Mason::Lexer;
    use Params::Validate qw(:all);
    use XML::SAX::Base;
    use XML::SAX::ParserFactory;

    use base qw(HTML::Mason::Lexer XML::SAX::Base);  # Lexer comes first

As mentioned before, XML::SAX::Base provides default no-op methods for all of the possible SAX2 events, of which there are many. Since we're not interested in most of them, it's nice to have them safely ignored. We inherit from HTML::Mason::Lexer because it provides a few methods that the compiler class needs, such as object_id().
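The order of the parent classes matters because Perl, by default, searches them left to right when it resolves a method, so a method defined in both parents is taken from the one listed first. The sketch below illustrates this with invented stand-in classes (Demo::Lexer, Demo::SAXBase, and Demo::Lexer::XML are hypothetical and are not part of Mason or XML::SAX); it needs only core Perl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::Mason::Lexer.
package Demo::Lexer;
sub new       { my $class = shift; return bless { from => 'lexer' }, $class }
sub object_id { return 'demo-lexer-id' }

# Hypothetical stand-in for XML::SAX::Base.
package Demo::SAXBase;
sub new        { my $class = shift; return bless { from => 'sax' }, $class }
sub characters { return }    # default no-op event handler

# The subclass lists the lexer first, as in the chapter's "use base" line.
# Setting @ISA directly is used here because the parents live in this file.
package Demo::Lexer::XML;
our @ISA = ( 'Demo::Lexer', 'Demo::SAXBase' );

package main;

my $lexer = Demo::Lexer::XML->new;

# Both parents define new(); the leftmost parent wins.
my $constructor_source = $lexer->{from};

# Methods that exist in only one parent are inherited from either side.
my $has_characters = Demo::Lexer::XML->can('characters') ? 1 : 0;
my $has_object_id  = Demo::Lexer::XML->can('object_id')  ? 1 : 0;

print "constructor came from: $constructor_source\n";
```

Had the two parents been listed in the opposite order, new() would have been taken from the SAX stand-in instead, which is the situation the "Lexer comes first" comment guards against.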
Because we're staunch generalists, we won't insist that the XML namespace of our tags needs to be 'mason'. We'll let the user override this with a parameter if desired:

    __PACKAGE__->valid_params(
        xml_namespace => {
            parse   => 'string',
            type    => SCALAR,
            default => 'mason',
            descr   => "Prefix of XML tags indicating Mason sections"
        },
    );

We don't need to make a separate new() method in our module, since we can just inherit the one provided by our base Lexer class. The main action will happen in the lex() method:

    sub lex {
        my ($self, %p) = @_;

        local $self->{name}     = $p{name};
        local $self->{compiler} = $p{compiler};

We need a convenient place to keep these, so we stick them into $self for the duration of lexing. Perl's local() function makes sure these entries expire at the end of the lex() method:

        $self->{state} = [ ];
We'll need to keep a stack of what tags we've seen so we can check that tags aren't improperly nested and in order to handle characters() events correctly:

        my $parser = XML::SAX::ParserFactory->parser( Handler => $self );

We could have created the parser object in our new() method, but to store it we would have had to save it in the lexer object's structure, which would have created a circular reference. Doing it this way guarantees that the reference to the parser will go out of scope when we're finished using it.

        $parser->parse_string( $p{comp_source} );
    }

The last bit tells the parser to parse the component text we were given. That will cause the parser to in turn call methods for each SAX event that occurs while parsing the string.

Now we'll take a look at our event-handling methods. The first is start_element(), which will be called whenever an XML tag is first encountered:

    sub start_element {
        my $self = shift;
        my $elt  = shift;

        if ( ! defined $elt->{Prefix} ||
             $elt->{Prefix} ne $self->{xml_namespace} ) {
            $self->_verbatim_start_element($elt);
            return;
        }

If we got something that isn't in our designated namespace we'll just pass it through to the compiler as text to be output:

        if ( $elt->{LocalName} eq 'component' ) {
            $self->{compiler}->start_component;
        }

When the component starts, we notify the compiler so it can do any initialization that it needs to do:

        foreach my $block ( qw( init perl args ) ) {
            if ( $elt->{LocalName} eq $block ) {
                $self->_start_block($block);
                last;
            }
        }

        if ( $elt->{LocalName} eq 'output' ) {
            $self->_start_output;
        }

        if ( $elt->{LocalName} eq 'arg' ) {
            $self->_handle_argument($elt);
        }
    }

The rest of this method is basically a switch statement. Depending on what type of element we receive, we call the appropriate internal method to handle that element.

Let's look at some of the individual methods that are called:

    sub _verbatim_start_element {
        my $self = shift;
        my $elt  = shift;

        # The attribute-handling lines below are a reconstruction and should
        # be read as an assumption: they rebuild the tag from the Name and
        # Attributes entries of the SAX2 element structure.
        my $xml = '<' . $elt->{Name};

        my @att;
        foreach my $att ( values %{ $elt->{Attributes} } ) {
            push @att, qq|$att->{Name}="$att->{Value}"|;
        }

        if (@att) {
            $xml .= ' ' . join( ' ', @att );
        }

        $xml .= '>';

        $self->{compiler}->text( text => $xml );
    }

Basically, this method goes through some contortions to regenerate the original XML element and then passes it on to the compiler as plain text.

It should be noted that this implementation will end up converting tags like <foo/> into tag pairs like <foo></foo>. This is certainly valid XML but it may be a bit confusing to users. Unfortunately, there is no easy way to retrieve the exact text of the source document to determine how a tag was originally written, and with XML you're not supposed to care anyway.

Back to our subclass. The next method to implement is our _start_block() method. This will handle the beginning of a number of blocks in a simple generic fashion:

    sub _start_block {
        my $self  = shift;
        my $block = shift;

        if ( $self->{state}[-1] &&
             $self->{state}[-1] ne 'def' &&
             $self->{state}[-1] ne 'method' ) {
            syntax_error "Cannot nest a $block tag inside a $self->{state}[-1] tag";
        }

What we are doing here is making it impossible to do something like nest a <mason:init> tag inside a <mason:perl> block. In fact, the only tags that can contain other tags are method and subcomponent definition tags, which are unimplemented in this example.

We notify the compiler that a new block has started and then push the block name onto our internal stack so we have access to it later:

        $self->{compiler}->start_block( block_type => $block );

        push @{ $self->{state} }, $block;
    }

Again, we check for basic logical errors:

    sub _start_output {
        my $self = shift;

        if ( $self->{state}[-1] &&
             $self->{state}[-1] ne 'def' &&
             $self->{state}[-1] ne 'method' ) {
            syntax_error "Cannot nest an output tag inside a $self->{state}[-1] tag";
        }

Again, we push this onto the stack so we know that this was the last tag we saw:

        push @{ $self->{state} }, 'output';
    }

The variable name and default are expressed as attributes of the element. The weird '{}name' syntax is intentional. Read the Perl SAX2 spec mentioned earlier for more details on what this means.

    sub _handle_argument {
        my $self = shift;
        my $elt  = shift;

        my $var     = $elt->{Attributes}{'{}name'}{Value};
        my $default = $elt->{Attributes}{'{}default'}{Value};

We want to check that the variable name is a valid Perl variable name:

        unless ( $var =~ /^[\$\@%][^\W\d]\w*/ ) {
            syntax_error "Invalid variable name: $var";
        }

Then we tell the compiler that we just found a variable declaration.
        $self->{compiler}->variable_declaration(
            block_type => 'args',
            type       => substr( $var, 0, 1 ),
            name       => substr( $var, 1 ),
            default    => $default,
        );
    }

That wraps up all the methods that start_element() calls. Now let's move on to handling a characters() SAX event. This happens whenever the SAX parser encounters data outside of an XML tag.

    sub characters {
        my $self  = shift;
        my $chars = shift;

        if ( ! $self->{state}[-1] ||
             $self->{state}[-1] eq 'def' ||
             $self->{state}[-1] eq 'method' ) {
            $self->{compiler}->text( text => $chars->{Data} );
            return;
        }
If we're in the main body of a component, subcomponent, or method, we simply pass the character data on as text:

        if ( $self->{state}[-1] eq 'init' ||
             $self->{state}[-1] eq 'perl' ) {
            $self->{compiler}->raw_block( block_type => $self->{state}[-1],
                                          block      => $chars->{Data} );
            return;
        }

Character data in a <mason:init> or <mason:perl> section is passed to the compiler as the contents of that block. The compiler knows what type of tag is currently being processed and handles it appropriately.

        if ( $self->{state}[-1] eq 'output' ) {
            $self->{compiler}->substitution( substitution => $chars->{Data} );
        }
    }

If we are in a substitution tag, we call a different compiler method instead. Otherwise, we'll simply end up discarding the contents.

Since we may be dealing with text where whitespace is significant (as opposed to HTML), we'll want to pass on whitespace as if it were character data:
    # Pass the whole event structure along, since characters() reads
    # the text from its {Data} entry.
    sub ignorable_whitespace { $_[0]->characters( $_[1] ) }

This method may be called if the XML parser finds a chunk of "ignorable whitespace." Frankly, we can never ignore whitespace, because it is just so cool, and without it our code would be unreadable. But apparently XML parsers can ignore it.

The last thing we need to handle is an end_element() event:

    sub end_element {
        my $self = shift;
        my $elt  = shift;

        if ( ! defined $elt->{Prefix} ||
             $elt->{Prefix} ne $self->{xml_namespace} ) {
            $self->_verbatim_end_element($elt);
            return;
        }

Again, XML elements not in our designated namespace are passed on verbatim to the compiler:

        if ( $elt->{LocalName} eq 'component' ) {
            $self->{compiler}->end_component;
            return;
        }
If we have reached the end tag of the component, we inform the compiler that the component is complete and we return:

        return if $elt->{LocalName} eq 'arg';

We don't need to do anything to end argument declarations. The work needed to handle this element happened when we called _handle_argument() from our start_element() method.

        if ( $self->{state}[-1] ne $elt->{LocalName} ) {
            syntax_error "Something very weird happened. " .
                         "We encountered an ending tag for a $elt->{LocalName} tag " .
                         "before ending our current tag ($self->{state}[-1]).";
        }

Actually, this should just never happen: XML does not allow tag overlap and, if the parser finds overlapping tags, it should die rather than passing them to us. But we believe in being paranoid. If there is an error in the logic of this lexer code, this might help us in catching it.

        if ( $elt->{LocalName} eq 'output' ) {
            pop @{ $self->{state} };
            return;
        }