ParaCite: An Overview

Michael Jewell

Introduction

As an increasing number of authors are publishing articles online, there is a very real need for a simple way to build links between an online document and the documents which it cites. ParaCite was created in parallel with the EPrints.org software as a possible solution to this problem, and has since grown into a usable yet powerful system for both reference parsing and location.

The ParaCite Model

ParaCite consists of two core modules: a parser and a resolver. Typically a reference is supplied via the web interface and the parsed reference is passed to the resolver, but further interfaces have been developed for both modules. Figure 1 shows the current structure of the ParaCite system.

Figure 1: The ParaCite components. The Reference Parser and Resolver are linked via an OpenURL 'proxy', which converts the metadata obtained from the reference into an OpenURL-compliant format.

Reference Parser
The ParaCite parser is based around a high-level template match: a collection of reference templates is compared against the reference, and the best-fitting template is used to split the reference into metadata. The fitness of a template is decided by weighting the different fields it contains, and the template with the highest score is chosen. At the time of writing, 235 templates were present in the collection. By storing the templates in a high-level form (for example, _AUTHORS_ (_YEAR_) _TITLE_), the individual fields can be substituted with complex regular expressions, thus keeping the meaning and functionality separate.
Reference Resolver
The ParaCite resolver provides search functionality for article location. When a search is carried out, the user is presented with a framed interface, with a list of suggested resources in the left-hand frame. When a resource is selected from this frame, the main view redirects to the search results for that resource. The resources are categorised using a system of 5 resource bands or 'strata', with the higher strata having a higher probability of matching the article exactly. The fifth stratum is user-customizable, and can contain generic search engines or resources from the three strata above it.
Web Query
This provides a simple interface to the system, with a text box for entering references. The parser processes the input, creates an OpenURL, and forwards this to the resolver.
Web Service
The web service interface exposes three methods for clients: doParaciteSearch, which parses a provided reference and returns a search result data structure; doOpenURLConstruct, which converts a reference into an OpenURL; and doReferenceParse, which takes a reference and returns a data structure containing the parsed metadata. The last of these is primarily for users who want to format references or store the metadata in some other format (such as BibTeX).
URL Interface
To further ease OpenURL creation, the ParaCite parser can output an OpenURL instead of redirecting to the resolver. This is achieved by appending &return_openurl to the standard URL.
OpenURL Query
As the ParaCite resolver is OpenURL compliant, the parser can be bypassed in favour of a direct OpenURL query. This is ideal for cases where a user already has the required metadata, or where a system is OpenURL-enabled. The ParaCite resolver is currently at http://paracite.eprints.org/cgi-bin/openurl.cgi.

Implementation

ParaCite is written entirely in Perl, and Perl's regular expression support makes up the core of the parsing functionality. The web service interface uses the SOAP::Lite module, together with a WSDL description of the services. A MySQL backend is used to store resources, interfaces, publications, and subjects, together with information about previous searches.
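The template matching described for the reference parser can be illustrated with a minimal sketch. It is written in Python rather than ParaCite's own Perl, and everything specific in it is an assumption made for illustration: the field patterns, the weights, the second template, and the sample references are invented, and the real collection is far larger. Only the high-level form _AUTHORS_ (_YEAR_) _TITLE_ is taken from the description above.

```python
import re

# Hypothetical field patterns and weights (not ParaCite's actual values).
FIELD_PATTERNS = {
    "AUTHORS": r"[A-Z][^()]+?",
    "YEAR": r"(?:19|20)\d{2}",
    "TITLE": r".+?",
    "PUBLICATION": r".+?",
    "VOLUME": r"\d+",
}
FIELD_WEIGHTS = {
    "AUTHORS": 2.0, "YEAR": 3.0, "TITLE": 1.0,
    "PUBLICATION": 2.0, "VOLUME": 1.5,
}

# High-level templates; the second one is invented for this example.
TEMPLATES = [
    "_AUTHORS_ (_YEAR_) _TITLE_",
    "_AUTHORS_ (_YEAR_) _TITLE_. _PUBLICATION_, _VOLUME_",
]


def compile_template(template):
    """Substitute each _FIELD_ placeholder with a named regex group."""
    # Splitting on a pattern with one capture group alternates
    # literal text (even indexes) and field names (odd indexes).
    parts = re.split(r"_([A-Z]+)_", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)
        else:
            pattern += "(?P<%s>%s)" % (part, FIELD_PATTERNS[part])
    # Allow an optional trailing full stop after the last field.
    return re.compile("^" + pattern + r"\.?$")


def parse_reference(reference):
    """Return (score, template, fields) for the best template, or None."""
    best = None
    for template in TEMPLATES:
        match = compile_template(template).match(reference.strip())
        if match is None:
            continue
        fields = match.groupdict()
        # A template's fitness is the summed weight of its fields.
        score = sum(FIELD_WEIGHTS[name] for name in fields)
        if best is None or score > best[0]:
            best = (score, template, fields)
    return best


if __name__ == "__main__":
    # Invented reference, used only to exercise the sketch.
    print(parse_reference("Smith, J. (2001) An example title. Example Journal, 12"))
```

Both templates match the sample reference, but the longer one accounts for more weighted fields and is therefore preferred. Because the templates stay in their high-level form, a field pattern can be refined without touching any template.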

The OpenURL resolver is compliant with the OpenURL 1.0f specification, although an optional subject field has been added to support the fourth stratum. This takes a Library of Congress category code which is used to reduce the matches to resources within the selected subject.
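As a rough illustration of a direct OpenURL query, the sketch below builds a query URL for the resolver address given earlier. The metadata keys (genre, aulast, atitle, date) follow common OpenURL key names, but which keys the ParaCite resolver accepts is an assumption here. The name of the optional subject field is not stated in this overview, so `subject` is a hypothetical key, and the sample metadata is invented.

```python
from urllib.parse import urlencode

# Resolver address as given in this overview.
RESOLVER = "http://paracite.eprints.org/cgi-bin/openurl.cgi"


def build_openurl(metadata, subject=None, base=RESOLVER):
    """Build an OpenURL-style query string from reference metadata.

    `subject` is a hypothetical parameter name standing in for the
    optional Library of Congress category code field.
    """
    # Drop empty fields so that only known metadata is sent.
    params = {key: value for key, value in metadata.items() if value}
    if subject:
        params["subject"] = subject
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Invented metadata, used only to exercise the sketch.
    example = {
        "genre": "article",
        "aulast": "Smith",
        "atitle": "An example title",
        "date": "2001",
        "volume": "",
    }
    print(build_openurl(example, subject="Z"))
```

The function only constructs the URL and performs no network request, so it can be checked without the resolver being reachable.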

Conclusion

ParaCite is already integrated with the EPrints.org software, and links are currently being investigated between CiteBase, RefLink, and ISI Web of Science. Furthermore, I believe the resolver could provide a variety of other services. Because the resources can be found by subject or publication, it could act as a useful way to find the most popular resources for a given subject, or be used to find free online versions of existing journals. Finally, the web service interface could be used in plug-ins for existing software, such as web browsers and document editors, to create OpenURLs for references automatically.


Mike Jewell 2002-12-03