Advanced Perl Programming

Chapter 20: Perl Internals

20.5 Meaty Extensions

Having armed ourselves to the teeth with information, and having hand-built a few extensions, we are now ready to exploit SWIG and XS to their hilts. In this section, we'll first look at the type of code produced by XS. As it happens, SWIG produces almost identical code, so the explanation should suffice for both tools. Then we will write typemaps and snippets of code to help XS deal with C structures, to wrap C structures with Perl objects, and, finally, to interface with C++ objects. Most of this discussion is relevant to SWIG also, which is why we need study only one SWIG example. That said, take note that the specific XS typemap examples described in the following pages are solved simply and elegantly using SWIG, without the need for user-defined typemaps.

20.5.1 Anatomy of an XS Extension

To understand XS typemaps, and the effect of keywords such as CODE and PPCODE, it pays to take a good look at the glue code generated by xsubpp. Consider the following XS declaration of a module, Test, containing a function that takes two arguments and returns an integer:

MODULE = Test  PACKAGE = Test
int
func_2_args(a, b)
   int    a
   char*  b

xsubpp translates it to the following (explanatory comments have been added):

XS(XS_Test_func_2_args) /*  Mangled function name, with package name */
                        /* added to make it unique                                 */
{
    dXSARGS;            /* declare "items", and init it with                     */
    if (items != 2)     /* the number of items on the stack                    */
        croak("Usage: Test::func_2_args(a, b)");

    {   /* Start a fresh block to allow variable declarations          */
        /* Built-in typemaps translate the stack to C variables       */
        int      a = (int)SvIV(ST(0));
        char*    b = (char *)SvPV(ST(1),na);
        /* RETVAL's type matches the function return                    */ 
        int      RETVAL;

        RETVAL = func_2_args(a, b);
        ST(0)  = sv_newmortal();

        /* Outgoing typemap to translate C var. to stack                */
        sv_setiv(ST(0), (IV)RETVAL);
    }
    XSRETURN(1); /* Let Perl know one return param has been put back */
}

This is practically identical to the code we studied in the section "The Called Side: Hand-Coding an XSUB." Notice how the arguments on the stack are translated into the two arguments a and b. The XS function then calls the real C function, func_2_args, gets its return value, and packages the result back to the argument stack.
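The shape of this glue code can be mimicked in plain C without any Perl headers. The following is only a toy model: ToyScalar, glue_func_2_args, and the body of func_2_args are hypothetical stand-ins invented for illustration, not part of Perl's API. It shows the same four steps: check the argument count, convert the stack entries to C variables, call the real function, and write the result back to slot 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a Perl scalar: an integer slot and a string slot. */
typedef struct {
    long        iv;
    const char *pv;
} ToyScalar;

/* The "real" C function being wrapped (hypothetical body). */
static int func_2_args(int a, char *b)
{
    return a + (int)strlen(b);
}

/* Toy glue function. Returns the number of results left on the
 * stack, or -1 where real XS code would croak with a usage message. */
int glue_func_2_args(ToyScalar *stack, int items)
{
    assert(stack != NULL);
    if (items != 2)
        return -1;
    {
        /* Incoming "typemaps": stack entries to C variables */
        int   a = (int)stack[0].iv;        /* plays the role of SvIV(ST(0)) */
        char *b = (char *)stack[1].pv;     /* plays the role of SvPV(ST(1),na) */
        int   RETVAL;

        RETVAL = func_2_args(a, b);

        /* Outgoing "typemap": C variable back to stack slot 0 */
        stack[0].iv = RETVAL;
        stack[0].pv = NULL;
    }
    return 1;                              /* plays the role of XSRETURN(1) */
}
```

A caller would fill two ToyScalar slots, invoke glue_func_2_args(stack, 2), and read the result from stack[0].iv, much as Perl reads ST(0) after the XSUB returns.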

Let us now add some of the more common XS keywords to see how they are accommodated by xsubpp. The XS snippet

int
func_with_keywords(a, b)
    int    a
    char*  b
  PREINIT:
    double c;
  INIT:
    c = a * 20.3;
  CODE:
    if (c > 50) {
        RETVAL = test(a,b,c);
    }
  OUTPUT:
    RETVAL

gets translated to this:

XS(XS_Test_func_with_keywords)
{
    dXSARGS;
    if (items != 2)
        croak("Usage: Test::func_with_keywords(a, b)");
    {
        int     a = (int)   SvIV(ST(0));
        char*   b = (char *)SvPV(ST(1),na);
        double  c;                   /* PREINIT section                    */
        int     RETVAL;
        c = a * 20.3;                /* INIT section                           */
        if (c > 50) {                /* CODE section                        */
            RETVAL = test(a,b,c);    /* Call any function                  */
        }
        ST(0) = sv_newmortal();      /* generated due to OUTPUT      */
        sv_setiv(ST(0), (IV)RETVAL);
    }
    XSRETURN(1);
}

As you can see, the code supplied in PREINIT goes right after the typemaps to ensure that all declarations are complete before the main code starts. The location is important for traditional and ANSI C (C89) compilers, which require declarations at the start of a block, but would not be an issue for C99 or C++ compilers, which allow variable declarations anywhere in a block. The INIT section is inserted before the automatically generated call to the function or, in this case, before the CODE section starts. The CODE directive allows us the flexibility of inserting any piece of code; without it, xsubpp would have simply inserted a call to func_with_keywords(a,b), as we saw in the prior example.

The CODE keyword behaves like a typical C call: you can modify input parameters, and you can return at most one parameter. To deal with a variable number of input arguments or output results, you need the PPCODE keyword. To illustrate the implementation of PPCODE, consider a C function, permute, that takes a string, computes all its permutations and returns a dynamically allocated array of strings (a null-terminated char**). Let's say that we want to access it in Perl as follows:

@list = permute($str); 

We use PPCODE here because the function expects to return a variable number of scalars. The following snippet of code shows the XS file:

void
permute(str)
   char *     str
  PPCODE:
   int i = 0;
  
   /* Call permute. It returns a null-terminated array of strings */
   char ** ret = permute (str);
   char ** p;

   /* Copy these parameters to mortal scalars, and push them onto 
    * the stack. Walk the array with p so that ret still points to
    * the start of the array when it is freed. */
   for (p = ret; *p ; p++, ++i) {
       XPUSHs (sv_2mortal(newSVpv(*p, 0)));
   }
   free(ret);
   XSRETURN(i);

This gets translated to the following:

XS(XS_Test_permute)
{
    dXSARGS;
    if (items != 1)
        croak("Usage: Test::permute(str)");

    /* PPCODE adjusts stack pointer (CODE does not do this) */
    SP -= items;

    {
       char *  str = (char *)SvPV(ST(0),na);
       int     i   = 0;

       /* Call permute. It returns a null-terminated array of strings */
       char ** ret = permute (str);
       char ** p;

       /* Copy these parameters to mortal scalars, and push them onto 
        * the stack. Walk the array with p so that ret still points to
        * the start of the array when it is freed. */
       for (p = ret; *p ; p++, ++i) {
          XPUSHs (sv_2mortal(newSVpv(*p, 0)));
       }
       free(ret);
       XSRETURN(i);
       PUTBACK;          /* These two statements are redundant */
       return;           /* because XSRETURN does both         */
    }
}

The PPCODE directive differs from CODE in one small but significant way: it adjusts the stack pointer SP to point to the bottom of the Perl stack frame for this function call (that is, to ST(0)), to enable us to use the XPUSHs macro to extend and push any number of arguments (recall our discussion in the section "Ensuring that the stack is big enough"). We'll shortly see why we cannot do this using typemaps.
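The C side of this example is not shown in the text. The following is a minimal sketch of what a permute function with this contract might look like, assuming the caller owns both the returned array and every string in it. The helper permute_rec and the cleanup function free_permutations are hypothetical names, repeated characters are treated as distinct (so "aa" yields two entries), and handling of allocation failure is omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Permute buf[k..n-1] in place; append a copy of each complete
 * permutation to out and bump *count. */
static void permute_rec(char *buf, size_t k, size_t n,
                        char **out, size_t *count)
{
    size_t i;
    if (k + 1 >= n) {                 /* nothing left to rearrange */
        char *copy = malloc(n + 1);
        memcpy(copy, buf, n + 1);     /* includes the trailing '\0' */
        out[(*count)++] = copy;
        return;
    }
    for (i = k; i < n; i++) {
        char tmp = buf[k]; buf[k] = buf[i]; buf[i] = tmp;   /* choose */
        permute_rec(buf, k + 1, n, out, count);
        tmp = buf[k]; buf[k] = buf[i]; buf[i] = tmp;        /* undo   */
    }
}

/* Returns a null-terminated, dynamically allocated array of strings. */
char **permute(const char *str)
{
    size_t n, total = 1, count = 0, i;
    char  *buf;
    char **out;

    assert(str != NULL);
    n = strlen(str);
    for (i = 2; i <= n; i++)
        total *= i;                   /* n! permutations */

    out = malloc((total + 1) * sizeof *out);
    buf = malloc(n + 1);
    memcpy(buf, str, n + 1);
    permute_rec(buf, 0, n, out, &count);
    out[count] = NULL;                /* the terminator the XS loop relies on */
    free(buf);
    return out;
}

/* Hypothetical helper: release the strings and then the array. */
void free_permutations(char **list)
{
    char **p;
    if (list == NULL)
        return;
    for (p = list; *p; p++)
        free(*p);
    free(list);
}
```

Because newSVpv copies each string into the new scalar, a library written this way would want the XS code to release the individual strings as well as the array, for instance by calling something like free_permutations in place of the bare free.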

20.5.2 XS Typemaps: An Introduction

A typemap is a snippet of code that translates a scalar value on the argument stack to a corresponding C scalar entity (int, double, pointer), or vice versa. A typemap applies only to one direction. It is important to stress here that both the input and the output for a typemap are scalars in their respective domains. You cannot have a typemap take a scalar value and return a C structure, for example; you can, however, have it return a pointer to the structure. This is the reason why the permute example in the preceding section cannot use a typemap to return its list of strings. We could, however, write a typemap that converts a char** to a reference to an array and leave it to the script writer to dereference it. In SWIG, which doesn't support a PPCODE equivalent, this is the only option.

Another constraint of typemaps is that they convert one argument at a time, with blinkers on: you cannot make a decision based on multiple input arguments (for example, "if argument 1 is `foo', then increase argument 2 by 10"), as we mentioned in Chapter 18, Extending Perl: A First Course. XS offers the CODE and PPCODE directives to help you out in this situation, while SWIG doesn't. But recall from the section "Degrees of Freedom" that the two SWIG restrictions mentioned are easily and efficiently taken care of in script space.

While xsubpp is capable of supplying translations for ordinary C arguments, we have to write custom typemaps for all user-defined types. Assume that we have a C library with the following two functions:

Car*  new_car();
void  drive(Car *);

In Perl, we want to access it as

$car = Car::new_car;
Car::drive($car);

Let us first write the XS file for this problem:

/* Car.XS */
#include <EXTERN.h>
#include <perl.h>
#include <XSUB.h>

#include <Car.h>  /* Don't care what Car* looks like */

MODULE = Car  PACKAGE = Car
Car *
new_car ()

void
drive (car)
   Car *   car

As you can see, we need two typemaps: an output typemap for converting a Car* to $car and an input typemap for the reverse direction. We start off by editing a typemap file called typemap,[11] which contains three sections: TYPEMAP, INPUT, and OUTPUT, as follows:

TYPEMAP
Car *      CAR_OBJ

INPUT 
CAR_OBJ
           $var = (Car *)SvIV($arg);
OUTPUT
CAR_OBJ
           sv_setiv($arg, (IV) $var);

[11] We choose this particular name because the h2xs-generated makefile recognizes it and feeds it to xsubpp. It also allows for multiple typemap files to be picked up from different directories.

The TYPEMAP section creates an easy-to-use alias (CAR_OBJ, in this case) for your potentially complex C type (Car *). The INPUT and OUTPUT sections in the typemap file can now refer to this alias and contain code to transform an object of the corresponding type to a Perl value, or vice versa. When a typemap is used for a particular problem, the marker $arg is replaced by the appropriate scalar on the argument stack, and $var is replaced by the corresponding C variable name. In this example, the output typemap stuffs a Car* into the integer slot of the scalar (recall the discussion in the section "SVs and object pointers").
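The round trip performed by this pair of typemaps, a pointer stored as an integer and later cast back, can be sketched in plain C. In this sketch intptr_t stands in for the scalar's integer slot, and the Car layout and both function names are hypothetical. The point it illustrates is that the integer type must be at least as wide as a pointer; a narrower type would silently drop the upper bits of the address on platforms where pointers are wider than that type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Car { int speed; } Car;   /* hypothetical layout */

/* Output direction: stash the pointer in an integer wide enough
 * to hold it, as the OUTPUT typemap does with the scalar. */
intptr_t car_to_integer(Car *car)
{
    assert(car != NULL);
    return (intptr_t)car;
}

/* Input direction: recover the pointer from the integer,
 * as the INPUT typemap does. */
Car *integer_to_car(intptr_t value)
{
    return (Car *)value;
}
```

Nothing in this scheme records what the integer originally pointed to, which is why the script must hand back only values it received from the same extension.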

The advantage of the TYPEMAP section's aliases is that multiple types can be mapped to the same alias. That is, a Car* and a Plane* can both be aliased to VEHICLE, and because the INPUT and OUTPUT sections use only the alias, both types end up sharing the same translation code. The Perl distribution comes with a typemap file that supplies all the basic typemaps (see lib/ExtUtils/typemap), and you can freely use one of the aliases defined in that file. For example, you can use the alias T_PTR (instead of CAR_OBJ) and thereby use the corresponding INPUT and OUTPUT sections for that alias. In other words, our typemap file need simply say:

TYPEMAP
Car *      T_PTR

It so happens that T_PTR's INPUT and OUTPUT sections look identical to those shown above for CAR_OBJ.

20.5.3 Object Interface Using XS Typemaps

Let us say we want to give the script writer the ability to write something like the following, without changing the C library in any way:

$car = Car::new_car(); # As before 
$car->drive();

In other words, the OUTPUT section of our typemap needs to convert a Car* (returned by new_car) to a blessed scalar reference, as discussed in the section "SVs and object pointers." The INPUT section contains the inverse transformation:

TYPEMAP
Car *     CAR_OBJ

OUTPUT
CAR_OBJ
       sv_setref_iv($arg, "Car", (IV) $var);

INPUT
CAR_OBJ
       $var = (Car *)SvIV((SV*)SvRV($arg));

sv_setref_iv stores an integer in a freshly allocated SV, converts its first argument into a reference that points to the new scalar, and blesses that scalar into the named package (refer to Table 20.1). In this example, we cast the pointer to an IV, an integer type wide enough to hold a pointer, and make the function think we are supplying an integer.

20.5.4 Making XS Typemaps More Generic

The typemap in the preceding example is restricted to objects of type Car only. We can use the TYPEMAP section's aliasing capability to generalize this typemap and accommodate any object pointer. Consider the following typemap, in which the alias, the cast, and the class name have changed:

TYPEMAP
Car *     ANY_OBJECT

OUTPUT
ANY_OBJECT
     sv_setref_pv($arg, CLASS, (void*) $var);

INPUT
ANY_OBJECT
     $var = ($type) SvIV((SV*)SvRV($arg));

All we have done is generalize the alias, the cast, and the class name. $type is the type of the current C object (the left-hand side of the alias in the TYPEMAP section), so in this case it is Car*. Because we want to make the class name generic, we adopt the strategy used in Chapter 7, Object-Oriented Programming: ask the script user to use the arrow notation:

$c = Car->new_car();

This invocation supplies the name of the module as the first parameter, which we capture in the CLASS argument in the XS file:

Car *
new_car (CLASS)
    char *CLASS

The only thing remaining is that we would like the user to say Car->new instead of Car->new_car. Just because C doesn't have polymorphism doesn't mean the script user has to suffer. The CODE keyword achieves this simply:

Car *
new (CLASS)
    char *CLASS
   CODE:
     RETVAL = new_car();
   OUTPUT:
     RETVAL

The drive method doesn't need any changes.

Having generalized this alias, we can apply the ANY_OBJECT alias to other objects too, as long as they also follow the convention of declaring and initializing a CLASS variable in any method that returns a pointer to the type declared in the TYPEMAP section. In the preceding example, the initialization happened automatically because Perl supplies the name of the class as the first argument.

20.5.5 C++ Objects and XS Typemaps

Suppose you have a C++ class called Car that supports a constructor and a method called drive. You can declare the corresponding interfaces in the XS file as follows:

Car *
Car::new ()

void 
Car::drive()

xsubpp translates the new declaration to an equivalent constructor call, after translating all parameters (if any):

XS(XS_Car_new)
{
    dXSARGS;
    if (items != 1)
        croak("Usage: Car::new(CLASS)");
    {
        char *  CLASS = (char *)SvPV(ST(0),na);
        Car *   RETVAL;
        RETVAL = new Car();
        ST(0) = sv_newmortal();
        sv_setref_pv(ST(0), CLASS, (void*) RETVAL);
    }
    XSRETURN(1);
}

Unlike the previous example, xsubpp automatically supplies the CLASS variable. You still need the typemaps, however, to convert Car* to an equivalent Perl object reference. The drive interface declaration is translated as follows:

XS(XS_Car_drive)
{
    dXSARGS;
    if (items != 1)
        croak("Usage: Car::drive(THIS)");
    {
        Car *    THIS;
        THIS = (Car *) SvIV((SV*)SvRV(ST(0)));;
        THIS->drive();
    }
    XSRETURN_EMPTY;
}

xsubpp automatically generates the THIS variable to refer to the object. Both CLASS and THIS can be used in a CODE section.

Dean Roehrich's XS Cookbooks [5] provide several excellent examples of XS typemaps, so be sure to look them up before you start rolling your own.

20.5.6 Memory Management Using XS

We have conveniently ignored the issue of memory management so far. In the preceding sections, the new function allocates an object that is subsequently stuffed into a scalar value by the typemapping code. When the scalar goes out of scope or is assigned something else, Perl ignores this pointer if the scalar has not been blessed, which is not surprising, considering that it has been led to believe that the scalar contains just an integer value. This is most definitely a memory leak. But if the scalar is blessed, Perl calls its DESTROY routine when the scalar is cleared. If this routine is written in XS, as shown below, it gives us the opportunity to delete allocated memory:

void
DESTROY(car)
    Car *car
  CODE:
    delete_car(car); /* deallocate that object */

The C++ interface is simpler:

void
Car::DESTROY()

In this case, xsubpp automatically calls "delete THIS", where THIS represents the object, as we saw earlier.
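The XS DESTROY routine shown earlier in this section assumes that the C library exports a delete_car function alongside new_car and drive. The following is a minimal sketch of such a library, with a live-object counter added purely to make leaks visible. The Car layout, the odometer field, and cars_alive are hypothetical; only the three function names come from the text.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Car { int odometer; } Car;   /* hypothetical layout */

static int live_cars = 0;   /* cars allocated and not yet deleted */

/* Allocate a zero-initialized car; returns NULL on failure. */
Car *new_car(void)
{
    Car *car = calloc(1, sizeof *car);
    if (car != NULL)
        live_cars++;
    return car;
}

void drive(Car *car)
{
    assert(car != NULL);
    car->odometer += 1;
}

/* The routine that DESTROY is expected to call. */
void delete_car(Car *car)
{
    if (car != NULL) {
        free(car);
        live_cars--;
    }
}

/* Hypothetical diagnostic: a nonzero value after all Perl-side
 * objects are gone would indicate that DESTROY never ran. */
int cars_alive(void)
{
    return live_cars;
}
```

With an unblessed scalar nothing ever calls delete_car, so the counter would never return to zero; with a blessed scalar and the DESTROY routine in place, each object is released when its last reference goes away.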

20.5.6.1 Recommended memory allocation and deallocation routines

The Perl library provides a set of functions and macros to replace the conventional dynamic memory management routines (listed on the left-hand side of the table):

Instead of:    Use:
malloc         New
free           Safefree
realloc        Renew
calloc         Newz
memcpy         Copy
memmove        Move
memzero        Zero

The Perl replacements use the version of malloc provided by Perl (by default), and optionally collect statistics on memory usage. It is recommended that you use these routines instead of the conventional memory management routines.
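The difference between the Copy and Move rows matters only when the source and destination overlap. The sketch below uses hypothetical TOY_COPY, TOY_MOVE, and TOY_ZERO macros, written here in plain C to mimic the (source, destination, number of items, type) argument convention of Perl's macros. They are stand-ins for illustration and do not use Perl's allocator or headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; counts are in elements, not bytes. */
#define TOY_COPY(src, dest, n, type) memcpy((dest), (src), (n) * sizeof(type))
#define TOY_MOVE(src, dest, n, type) memmove((dest), (src), (n) * sizeof(type))
#define TOY_ZERO(dest, n, type)      memset((dest), 0, (n) * sizeof(type))

/* Shift the first n-1 elements one slot to the right. Source and
 * destination overlap, so the memmove-style macro is required. */
void shift_right(int *a, size_t n)
{
    assert(a != NULL && n > 0);
    TOY_MOVE(a, a + 1, n - 1, int);
}

/* Duplicate n elements into a separate buffer. The ranges do not
 * overlap, so the memcpy-style macro is sufficient. */
void duplicate(const int *src, int *dest, size_t n)
{
    assert(src != NULL && dest != NULL);
    TOY_COPY(src, dest, n, int);
}
```

Passing the element type rather than a byte count keeps the size arithmetic in one place, which is the main convenience of this calling convention.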

20.5.7 SWIG Typemaps

SWIG produces practically the same code as xsubpp. Consequently, you can expect its typemaps to be very similar (if not identical) to those of XS. Consider the permute function discussed earlier. We want a char** converted to a list, but since typemaps require both their input and their output to be scalars, the following typemap translates it to a list reference:

%typemap(perl5,out) char ** {   // All functions returning char ** 
                                // get this typemap
    // $source is of type char **
    // $target is of type RV (referring to an AV)
    AV *ret_av = newAV();
    int i      = 0;
    int n      = 0;
    char **p;
    /* First count the strings, so that the new AV can be
     * given the right size */
    for (p = $source; *p; p++)
        n++;
    if (n > 0)
        av_extend(ret_av, n - 1);   /* Argument is the highest index */

    /* For each element in the array of strings, create a new
     * scalar, and stuff it into the above array. The array owns
     * these scalars, so they must not be made mortal. */
    for (i = 0, p = $source; *p; p++, i++) {
        av_store(ret_av, i, newSVpv(*p, 0));
    }
    /* Finally, create a reference to the array; the "target"
       of this typemap. newRV_noinc leaves the array's reference
       count at 1, so the array is freed along with the reference. */
    $target = sv_2mortal(newRV_noinc((SV*)ret_av));
}

SWIG typemaps are specific to language, hence the perl5 argument. out refers to function return parameters, and this typemap applies to all functions with a char** return value. $source and $target are variables of the appropriate types: for an in typemap, $source is a Perl type, and $target is the data type expected by the corresponding function parameter. Note that unlike XS's $arg and $var, SWIG's $source and $target switch meanings depending on the direction of the typemap.

If you don't want this typemap applied to all functions returning a char**, you can name exactly which parameter or function you want it applied to, like this:

%typemap(perl5,out) char ** permute {
    ...
}

Please refer to the SWIG documentation for a number of other typemap-related features.
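The sizing step in the char** typemap comes down to counting the entries of a null-terminated array without running past the terminator. A minimal plain C sketch of that step follows; the function name count_strings is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the strings in a null-terminated array. The terminating
 * NULL entry itself is not counted. */
size_t count_strings(char **list)
{
    size_t n = 0;
    assert(list != NULL);
    while (list[n] != NULL)
        n++;
    return n;
}
```

An array holding n strings occupies indices 0 through n-1, so a routine that takes the highest index rather than a length needs n-1, and an empty array needs no extension at all.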

