The Perl Journal

#22

Winter 2001

vol 5

num 6

Creating XML-RPC Web Services

Jon Orwant

Jon Orwant (2001) Creating XML-RPC Web Services. The Perl Journal, vol 5(6), issue #22, Winter 2001.

Packages Used

Frontier::Client
Frontier::Daemon
LWP::UserAgent

A Web service needn't use the Web. It needn't even be a service. In fact, no one agrees on exactly what a Web service is, but there is a strong sense that, by golly, they're important.

To me, a Web service is a program that encourages other programs to send it requests, and that also could be (and often is) implemented via a set of Web pages. Put another way, a Web service is a networked program for which a CGI script could be used as a GUI.

Clay Shirky, an O'Reilly analyst, polled a few experts to see how they defined Web services:

My proposed 3 definitions were:

1. Web Services are an attempt to do for computing what the Web itself did for publishing: to create a simple, loosely coupled method for two arbitrary applications to communicate with one another.

2. Web Services are an attempt to define XML interfaces for applications and business processes that can be exposed over the Internet.

3. Web Services are applications that have SOAP interfaces accessible via HTTP.

Unsurprisingly, there was universal assent to the first definition, and near-universal grumbling about the last one, often on the grounds that while that was what was making it into the press, it was far too narrow.

In this article, I'll describe a Web service I created at O'Reilly, tell you how to install the software you need to create your own Web services, and demonstrate two sample clients and a server.

ISBNs

At O'Reilly, I'm involved in writing programs that analyze the technical book market. Part of this involves gathering information about books, and while we have a few sources available to us that regular consumers don't, we still rely on Amazon for some of our data.

Every book sold in stores (or on Amazon) is identified by a single number: the 10-digit ISBN. (The last "digit" can actually be an X, since the checksum is computed modulo 11 and X stands for a check value of 10.) You can find the ISBN for any book by searching for it on Amazon.
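The check digit makes it possible to reject most mistyped ISBNs before any lookup is sent. Here is a minimal sketch of the test, assuming the standard ISBN-10 rule: weight the ten characters from 10 down to 1, count a trailing X as 10, and accept the number only if the sum is divisible by 11. The subroutine name is_valid_isbn10 is an illustration and is not part of isbnfind.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if $isbn is a well-formed ISBN-10 with a correct check digit,
# 0 otherwise. Weights run from 10 down to 1; a trailing X counts as 10.
sub is_valid_isbn10 {
    my ($isbn) = @_;
    $isbn =~ tr/- //d;                          # ignore hyphens and spaces
    return 0 unless $isbn =~ /^\d{9}[\dXx]$/;   # nine digits, then digit or X

    my $weight = 10;
    my $sum    = 0;
    for my $char (split //, uc $isbn) {
        my $value = $char eq 'X' ? 10 : $char;
        $sum += $value * $weight;
        $weight--;
    }
    return ($sum % 11 == 0) ? 1 : 0;
}

# The two ISBNs that appear in this article, plus one with a wrong last digit.
for my $isbn (qw(0596000278 0140258795 0596000279)) {
    printf "%s %s\n", $isbn, is_valid_isbn10($isbn) ? "ok" : "bad checksum";
}
```

Both ISBNs quoted in this article pass the test; changing the final digit of the first one makes it fail.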

I wrote an LWP program that automated this process so that I never had to visit Amazon myself. I called it isbnfind:

  % isbnfind programming perl
  0596000278

This is great, but if I want to make it available to the rest of the world, I have two choices. I could distribute my isbnfind program and hope that people have all the necessary modules and libraries installed. Or, I could make my program available as a Web service so that people can connect to it with a client. This also allows me to keep my source code private, and for this reason some people hail Web services as a salvation against the scourge of Open Source. I am not one of those people.

XML-RPC or SOAP?

In the example shown above, isbnfind pretends to be a human typing programming perl into the little search box at the top left of Amazon's main page. Amazon's Web pages obviously constitute a Web application. But how about a Web service? My isbnfind program tries pretty hard to treat it as one, but it's brittle -- it will fail as soon as Amazon makes a substantial change to the formatting of their Web pages, and they inevitably will. Per my definition, Amazon's site isn't a Web service, because Amazon doesn't encourage other programs to riffle through its databases for ISBN numbers. My program has to masquerade as a human to get its request answered.

There are two popular protocols through which a Web site can encourage programmatic use: XML-RPC and SOAP. We can make ISBN numbers available as a Web service by turning isbnfind into a program that speaks XML-RPC.

We could use SOAP instead, which many programmers prefer to XML-RPC. XML-RPC is simple and lightweight; SOAP is more featureful (some would say bloated). SOAP gets more mention in the press, in part because of Microsoft's SOAP development efforts.

You can use both XML-RPC and SOAP with Perl; in this article, I'll show you how to use Ken MacLeod's Frontier::Client and Frontier::Daemon modules to implement an XML-RPC server and client. I'm a big fan of keeping simple tasks simple; converting a book title into an ISBN doesn't require the extra overhead of SOAP, so I'll stick with XML-RPC. (If this were an article about SOAP, I'd be recommending Paul Kulchenko's SOAP::Lite module.) Paul also distributes an XMLRPC::Lite module, and Randy Ray created an RPC::XML module. Both are available on the CPAN and may be a better match for your needs than the Frontier modules.

You don't have to know anything about XML to use XML-RPC, and the only thing you have to know about RPC (which stands for "Remote Procedure Calls") is that it's a way for you to invoke subroutines on someone else's computer.

XML-RPC is a simple protocol. Usually, the client encodes its request ("invoke this subroutine with these arguments") as a wee XML document and sends it via HTTP to a server. The server composes its own wee XML document in response ("here's what the subroutine returned") and sends it back.

This is similar to what happens when you read a Web page: your browser sends a request via HTTP to the Web server, and the Web server sends a response via HTTP back to the browser. The only difference is that a Web application's response is an HTML Web page, while an XML-RPC Web service's response is the result of a subroutine (or "method", in XML-RPC parlance).
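To make the exchange concrete, here is a rough sketch that builds the request document by hand and posts it over HTTP, without any XML-RPC module. It is not how Frontier::Client is implemented. It uses HTTP::Tiny (a core module in later Perls, swapped in here for illustration), passes every argument as a string, and assumes a server listening at http://localhost:8088/RPC2. The names xml_escape and build_call are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Escape the characters that are special inside XML text.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build an XML-RPC request that passes every argument as a string.
sub build_call {
    my ($method, @args) = @_;
    my $params = "";
    for my $arg (@args) {
        my $escaped = xml_escape($arg);
        $params .= "<param><value><string>$escaped</string></value></param>\n";
    }
    my $name = xml_escape($method);
    return qq{<?xml version="1.0"?>\n}
         . "<methodCall>\n<methodName>$name</methodName>\n"
         . "<params>\n$params</params>\n</methodCall>\n";
}

# Contact a server only when keywords are given on the command line.
if (@ARGV) {
    my $url      = "http://localhost:8088/RPC2";    # assumed test server
    my $response = HTTP::Tiny->new->post($url, {
        headers => { "Content-Type" => "text/xml" },
        content => build_call("isbn", @ARGV),
    });
    die "HTTP request failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    print $response->{content};    # the raw <methodResponse> document
}
```

A real client module also decodes the response and handles typed values and faults, which is why the rest of this article uses Frontier::Client instead.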

To create an XML-RPC Web service, you create a server that exposes methods (i.e., Perl subroutines) to the outside world. Making your subroutines available is simple: just include them in a program and create a Frontier::Daemon object to expose them to the Internet. I'll turn now to the details of getting your computer ready for XML-RPC.

Installation

To create an XML-RPC server, you need three things besides Perl: the XML::Parser and Frontier::RPC modules, and the expat library (on which XML::Parser depends). Follow these three steps.

1. Install the expat XML parsing library from:

https://sourceforge.net.

Both Windows binaries and source code for Unix/Linux compilation are available.

2. Install the XML::Parser module.

On UNIX, you can use the CPAN.pm module bundled with Perl to install modules available on the Comprehensive Perl Archive Network:

   % perl -MCPAN -e 'install "XML::Parser"'

If you're using ActivePerl on Windows, XML::Parser will already be installed.

3. Install the Frontier::RPC modules. On UNIX/Linux, you can use the CPAN.pm module again:

   % perl -MCPAN -e 'install "Frontier::Daemon"'

With ActivePerl, type ppm from your command prompt and then install Frontier-RPC.
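Once the three steps are done, a quick check that Perl can load everything saves confusion later. The following is a small sketch, not part of the original installation instructions; module_status is a hypothetical helper name, and the module list is simply the set of packages this article uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Try to load each named module; return a hash of module name => 1 or 0.
sub module_status {
    my @modules = @_;
    my %status;
    for my $module (@modules) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
        $status{$module} = eval { require $file; 1 } ? 1 : 0;
    }
    return %status;
}

my %status = module_status(
    qw(XML::Parser Frontier::Client Frontier::Daemon LWP::UserAgent)
);
for my $module (sort keys %status) {
    printf "%-18s %s\n", $module, $status{$module} ? "installed" : "MISSING";
}
```

If XML::Parser shows up as missing even though it was installed, the usual suspect is the expat library it depends on.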

Creating a Client

Servers are more important than clients -- after all, if there's no one to answer your requests, you needn't bother asking -- but since clients are simpler, I'll show those first. I'll demonstrate two: one stripped down to its bare essentials so that you can see the critical elements, and the "real" client I used when writing this article.

Client 1: The Platonic Ideal

Here's a very simple XML-RPC client. We use the Frontier::Client module, create a new Client object primed to connect to port 8088 of labs.oreilly.com, and finally invoke the isbn method with the command-line arguments.

  #!/usr/bin/perl
  # Jon Orwant, 9/9/01
  # isbnclient

  use Frontier::Client;

  $client = Frontier::Client->new(url => "http://labs.oreilly.com:8088/RPC2");

  print $client->call("isbn", @ARGV), "\n";

Presuming the XML-RPC server at labs.oreilly.com is running on port 8088, we can treat it just like an isbnfind that is available to everyone instead of just me:

  % isbnclient programming perl
  0596000278

Note: the RPC2 at the end of the URL is necessary for XML-RPC services created with the Frontier::Daemon module I used, even though it has nothing to do with isbnfind and isn't mentioned by name in the server you'll soon see. Omitting the RPC2 is a common mistake among XML-RPC novices.

Client 2: The Developer's Client

The Platonic ideal client looks pretty, but it's not what I actually used when developing the code for this article. Here's the real code:

  #!/usr/bin/perl

  use Frontier::Client;

  $client = Frontier::Client->new( url   => "http://localhost:8088/RPC2",
                                   debug => 1);
							 
  print $client->call("isbn", @ARGV), "\n";

There are two differences between this and the previous client. First, I tested everything on my laptop in case DNS or firewall problems prevented my laptop from accessing labs.oreilly.com. Using localhost instead of labs.oreilly.com kept both server and client local.

Second, I used debug => 1 to turn on debugging, letting me see the exact XML that's being sent from client to server (the request) and from the server back to the client (the response):

  % isbnclient dava sobel longitude
  ---- request ----
  <?xml version="1.0"?>
  <methodCall>
  <methodName>isbn</methodName>
  <params>
  <param><value><string>dava</string></value></param>
  <param><value><string>sobel</string></value></param>
  <param><value><string>longitude</string></value></param>
  </params>
  </methodCall>
  ---- response ----
  <?xml version="1.0"?>
  <methodResponse>
  <params>
  <param><value><i4>0140258795</i4></value></param>
  </params>
  </methodResponse>
  0140258795

This example highlights a feature of isbnclient, which is really an Amazon feature: you don't have to match the title exactly, and can specify as many keywords as you like. isbnserver, which we'll see in the next section, simply grabs the first ISBN it sees from the ranked list of books that Amazon suggests for those keywords.
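One detail in the response shown above deserves a second look: the ISBN comes back tagged as <i4>, an integer, presumably because it consists only of digits. A Perl client that prints the value will not notice, but a client that takes the tag at face value and converts to a number would lose the leading zero. The sketch below illustrates the effect with a deliberately simplified rule; guess_tag is a made-up helper and is an assumption about the kind of rule involved, not the module's actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rule: digits-only values are tagged i4, everything else string.
sub guess_tag {
    my ($value) = @_;
    return $value =~ /^[+-]?\d+$/ ? "i4" : "string";
}

for my $isbn ("0140258795", "080442957X") {
    my $tag = guess_tag($isbn);
    # Numeric conversion is what a strict client might do with an <i4> value.
    my $read_back = $tag eq "i4" ? $isbn + 0 : $isbn;
    print "<$tag>$isbn</$tag> read back as $read_back\n";
}
```

An ISBN ending in X is unaffected, since it can only travel as a string.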

Creating the Server

Our XML-RPC server is significantly larger than either client, but only because it needs to include the transformed isbnfind. Here, we use the Frontier::Daemon module instead of the Frontier::Client module. The critical lines for creating an XML-RPC service are the use Frontier::Daemon statement and the call to Frontier::Daemon->new.

  #!/usr/bin/perl
  # Jon Orwant, 9/9/01
  # isbnserver

  use Frontier::Daemon;
  use LWP::UserAgent;

  # Create the LWP UserAgent object,
  # used to send requests to Amazon
  $ua = new LWP::UserAgent;
  $ua->agent("TPJ/0.1 " . $ua->agent);

  # Create the XML-RPC service
  Frontier::Daemon->new(LocalPort => 8088,
                        methods => { "isbn" => \&isbn });

  # Given keywords, search for them on Amazon,
  # and return the ISBN of the first book found.
  sub isbn {
    my ($keywords) = "@_";
    $keywords =~ s/ /%20/g;  # Replace each space with "%20"
    $keywords =~ s/'/%27/g;  # Replace each apostrophe with "%27"

    # Prepare the request for sending to Amazon
    my $req = new HTTP::Request POST =>
     'https://www.amazon.com/exec/obidos/search-handle-form/103-2425912-6530239';
    $req->content_type('application/x-www-form-urlencoded');
    $req->content("field-keywords=$keywords");

    # Send the request to Amazon
    my $res = $ua->request($req);

    # Examine the response and return the first ISBN found
    if ($res->is_success) {      # If we got a page back from Amazon...
        $content = $res->content;
        ($isbn) = ($content =~ m!<a href=/exec/obidos/ASIN/([\dX]+)!gism);
        if ($isbn) { return $isbn }
        else       { return "No ISBN found." }
    } else {
        return "Amazon changed their page.";
    }
  }

After using the Frontier::Daemon and LWP::UserAgent modules, isbnserver creates an LWP agent that will be used for every request to Amazon. (Each request identifies itself as being a "TPJ" browser, version 0.1.) The server is then launched by creating a new Frontier::Daemon object, providing (on port 8088) exactly one method: isbn, which is defined in the subroutine at the end of the program.
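The methods hash is an ordinary Perl dispatch table: method names as keys, code references as values. The sketch below shows the idea in isolation, with no networking. The dispatch subroutine and the stand-in isbn are hypothetical and only illustrate how a name in an incoming request can be matched to a subroutine; they are not the daemon's own code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the real isbn subroutine; it only echoes its keywords.
sub isbn { return "looked up: @_" }

# Same shape as the hash handed to Frontier::Daemon->new(methods => ...).
my %methods = ( "isbn" => \&isbn );

# Hypothetical dispatcher: find the code reference by name and call it.
sub dispatch {
    my ($name, @args) = @_;
    my $code = $methods{$name}
        or return "no such method: $name";
    return $code->(@args);
}

print dispatch("isbn", "programming", "perl"), "\n";
print dispatch("title", "0596000278"), "\n";
```

Adding a second method to the service is then a matter of adding one more key to the hash.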

As mentioned earlier, this code is brittle: the 103-2425912-6530239 in the Amazon URL is ample evidence that they don't intend this URL to stick around forever. But as a demonstration of how quickly you can throw an XML-RPC interface around a conventional program, it serves its purpose.
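Another fragile spot is the escaping: the isbn subroutine handles only spaces and apostrophes, so a keyword containing & or = would garble the form body. Here is a sketch of a more general percent-encoder in plain Perl. The name form_escape is made up for this example, and it assumes plain ASCII keywords; the uri_escape function in the CPAN module URI::Escape does the same job.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode every character except letters, digits, and - _ . ~
# Assumes byte-oriented (ASCII) input.
sub form_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

my $keywords = form_escape("dava sobel's longitude & more");
print "field-keywords=$keywords\n";
```

Run as is, this prints field-keywords=dava%20sobel%27s%20longitude%20%26%20more, which can be passed to $req->content unchanged.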

What Now?

XML-RPC is an encoding scheme. This is level two of what Clay Shirky calls the "consensus stack" that people can use to explain the Web services universe. Below the encoding layer, at the lowest level, is transport: roughly, how you get the bits from server to client (or peer to peer). The transport we're using here is HTTP, but it could almost as easily be Jabber instant messaging or regular email.

There are two layers above encoding: description and discovery. Description is a formalized way of describing how programs can talk to your Web service; WSDL (Web Services Description Language) is the best known, although no description layer is commonly used yet.

Above description is discovery -- making it possible for people and programs to learn about your Web service, typically by visiting a repository that describes in a structured fashion what the Web service offers. UDDI (Universal Description, Discovery, and Integration) is a business-oriented repository of Web services that allows companies to describe their services to customers and partners. There is also a nascent Web service repository at:

https://www.salcentral.com/salnet/webserviceswsdl.asp

although it has only a fraction of all the Web services out there. A better directory service for Web services is needed. After all, you can't just write a magazine article about your Web service and assume everyone will learn about it.

Acknowledgments

Some of the information in this article was gleaned from O'Reilly's Programming Web Services with XML-RPC. Tim O'Reilly reminded me recently how nice it would be to provide ISBNs as a Web service, Tim Allwine is O'Reilly's scraper extraordinaire, and a conversation with Michael Bernstein back in March put the notion of regularizing book publication data into my head. And we don't think Amazon minds our programs masquerading as humans; if they do, the next steps are to add caching to our program (so that it saves the results of Amazon searches, eliminating the need to search for the same keywords twice) and to have it use other online booksellers.
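The caching step mentioned above needs little more than a hash keyed on the search keywords. A minimal sketch follows, with a stand-in subroutine in place of the real Amazon search; the names slow_lookup and cached_lookup are hypothetical, and the cache here lives only as long as the server process.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %cache;          # normalized keywords => result of the first lookup
my $lookups = 0;    # counts the calls that reached the slow path

# Stand-in for the Amazon search; a real version would call isbn().
sub slow_lookup {
    my ($keywords) = @_;
    $lookups++;
    return "result for $keywords";
}

# Return the saved answer when the same keywords have been seen before.
sub cached_lookup {
    my $keywords = lc(join " ", @_);
    $cache{$keywords} = slow_lookup($keywords)
        unless exists $cache{$keywords};
    return $cache{$keywords};
}

cached_lookup("programming", "perl");
cached_lookup("Programming", "Perl");    # same keywords, served from the cache
print "slow lookups performed: $lookups\n";
```

Lowercasing the keywords before using them as the key is a design choice for this sketch; it treats searches that differ only in capitalization as the same request.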

Jon Orwant is an editor at O'Reilly & Associates, a co-author of O'Reilly's Programming Perl and Mastering Algorithms with Perl, and is the emcee of the Internet Quiz Show.