Network Working Group                                       Juha Hakala
Internet-Draft                              Helsinki University Library
Category: Informational                                  28 August 2001
draft-hakala-sici-00.txt
Expires: 28 February 2002





            Using Serial Item and Contribution Identifiers as
                         Uniform Resource Names

Status of this Memo

This document is an Internet-Draft and is in full conformance with all 
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task 
Force (IETF), its areas, and its working groups. Note that other groups 
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months 
and may be updated, replaced, or obsoleted by other documents at any 
time. It is inappropriate to use Internet-Drafts as reference material 
or to cite them other than as "work in progress."


     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/1id-abstracts.html

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.

This Internet-Draft will expire on 28 February 2002.

Abstract

This document discusses how Serial Item and Contribution Identifiers 
(SICIs; persistent and unique identifiers for serial issues and 
contributions such as articles) can be supported within the URN 
framework and the syntax for URNs defined in RFC 2141 [Moats]. Much of 
the discussion below is based on the ideas expressed in RFC 2288 
[Lynch]. Chapter 5 contains a URN namespace registration request 
modelled according to the template in RFC 2611 [Daigle et al.].


1. Introduction 

As part of the validation process for the development of URNs, the IETF 
working group agreed that it is important to demonstrate that the 
current URN syntax proposal can accommodate existing identifiers from 
well-established namespaces.  One such infrastructure for assigning and 
managing names comes from the bibliographic community.  Bibliographic 
identifiers function as names for objects that exist both in print and, 
increasingly, in electronic formats.  RFC 2288 [Lynch] investigated the 
feasibility of using three identifiers (ISBN, ISSN and SICI) as URNs. 

SICI is an American national standard defined by NISO/ANSI Z39.56-1996 
[NISO]. The need to develop a new version of the standard is at present 
being investigated by NISO. 

RFC 2288 does not analyse how SICI-based URNs can actually be resolved, 
nor was that the aim of its authors. This text specifies one solution to 
this question. There may be other, complementary resolution services. 

Generally, the difficulty of designing a URN resolution service is 
dependent on two factors:

* Is the identifier dumb, or does it provide a hint on where to find a 
resolution service?

* How many potential resolution services are there?

ISBN (International Standard Book Number) is a good example of an 
intelligent identifier. Analysis of the ISBN will reveal not only the 
region where the ISBN has been assigned, but also the publisher who is 
responsible for the book. Resolution of ISBN-based URNs can be 
decentralised to national bibliography databases, maintained by the 
national libraries. If the ISBN were a dumb identifier, this would be 
impossible.

International Standard Serial Number (ISSN) is a dumb identifier. It 
does not have a publisher identifier; serials published by a certain 
company get seemingly random ISSNs. Although ISSNs are allocated to 
regional agencies in blocks, which gives the system some "intelligence", 
a resolution service should not rely on these blocks, but use the global 
ISSN database. It contains a bibliographic description of every 
periodical that has received an ISSN. Thus, it is easy to resolve ISSN-
based URNs even though the identifier itself does not help in localising 
the resolution service.  

SICI is based on ISSN (see below for a description of its syntax). Like 
ISSN, it is therefore a dumb identifier. But there is not, and will 
never be, a global SICI database, which would contain bibliographic 
information about every serial issue and/or article published in the 
world. Most articles will not be catalogued at all, and the existing 
bibliographic information about articles is dispersed into a large 
number of databases maintained by publishers, libraries and other 
information intermediaries. Although it might be technically possible to 
merge records from these databases into a union catalogue, in practice 
such an enterprise is not politically possible.

As a "dumb" identifier with a large and ever-growing number of potential 
resolution services, SICI poses interesting challenges to the design of 
the URN resolution process.

Generally, a combination of dumb identifier and multiple resolution 
services is a problem, since there is no simple way of finding out which 
resolution service is the correct one. A gateway service is needed for 
providing this valuable information. Below we propose that for SICI-
based URNs, the global ISSN database will be capable of acting as a link 
between the user and the resolution service. 

The registration request for acquiring a Namespace Identifier (NID) 
"SICI" for Serial Item and Contribution Identifiers has been written by 
the National Library of Finland on behalf of the National Information 
Standards Organization (NISO). The request is included in chapter 5 of 
this text. 

The document at hand is part of a global co-operation of the national 
libraries to foster identification of electronic documents in general 
and utilisation of URNs in particular. This work is co-ordinated by a 
working group established by the Conference of Directors of National 
Libraries (CDNL). 

We have used the URN Namespace Identifier "SICI" for the Serial Item and 
Contribution Identifiers in examples below. 


2. Identification vs. Resolution

As a rule, SICIs identify finite, manageably sized objects, but these 
objects may still be large enough that resolution to a hierarchical 
system, such as the set of all articles published in a serial issue, is 
appropriate.

The materials identified by a SICI may exist only in printed or other 
physical form, not electronically. The best that a resolver service will 
be able to offer in this case is bibliographic data from the database 
providing resolution services, including information about where the 
physical resource is stored in the owner institution's holdings. 


3. Serial Item and Contribution Identifier

3.1 Overview

The Serial Item and Contribution Identifier (SICI) standard defines a 
variable length code that provides unique identification of serial items 
(e.g., issues) and the contributions (e.g., articles) contained in a 
serial title. SICI is specified in NISO/ANSI Z39.56-1996 [NISO]. Like 
other NISO standards, the SICI document is available for free on the Web. 

SICI is based on ISSN (International Standard Serial Number), but 
augments it extensively. A SICI is a combination of three segments, all 
of which are required:

* Item segment: the data elements needed to describe the serial item, 
such as a serial issue (ISSN, Chronology, Enumeration).

* Contribution segment: the data elements needed to identify 
contributions within an item (Location, Title Code).

* Control segment: the data elements needed to record those 
administrative elements that determine the validity, version, and format 
of the SICI code representation. 

RFC 2288 provides the following example:

   0015-6914(19960101)157:1<62:KTSW>2.0.TX;2-F

   The first nine characters are the ISSN identifying the serial title.
   The second component, in parentheses, is the chronology information
   giving the date the particular serial issue was published.  In this
   example that date was January 1, 1996.  The third component, 157:1,
   is enumeration information (volume, number) for the particular issue
   of the serial.  These three components comprise the "item segment" of
   a SICI code.  By augmenting the ISSN with the chronology and/or
   enumeration information, specific issues of the serial can be
   identified.  The next segment, <62:KTSW>, identifies a particular
   contribution within the issue.  In this example we provide the
   starting page number and a title code constructed from the initial
   characters of the title.  Identifiers assigned to a contribution can
   be used in the contribution segment if page numbers are
   inappropriate.  The rest of the identifier is the control segment,
   which includes a check character.  Interested readers are encouraged
   to consult the standard for an explanation of the fields in that
   segment.
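
The segment structure described above can be illustrated with a short 
Python sketch. It is not taken from the standard: the regular expression 
is a simplification that assumes the chronology is always present in 
parentheses and the contribution segment is always delimited by angle 
brackets, and the names Sici, SICI_PATTERN and parse_sici are 
hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Sici(NamedTuple):
    """The pieces of a SICI, as described in the text above."""
    issn: str          # item segment: serial title
    chronology: str    # item segment: publication date
    enumeration: str   # item segment: volume/number
    contribution: str  # contribution segment (empty for a whole issue)
    control: str       # control segment, including the check character


# Simplified, illustrative pattern (an assumption, not the normative grammar):
# ISSN, (chronology), enumeration, <contribution>, control segment.
SICI_PATTERN = re.compile(
    r"^(?P<issn>\d{4}-\d{3}[\dX])"
    r"\((?P<chronology>[^)]*)\)"
    r"(?P<enumeration>[^<]*)"
    r"<(?P<contribution>[^>]*)>"
    r"(?P<control>.+)$"
)


def parse_sici(text: str) -> Optional[Sici]:
    """Split a SICI into its parts, or return None if it does not match."""
    match = SICI_PATTERN.match(text)
    if match is None:
        return None
    return Sici(**match.groupdict())


example = parse_sici("0015-6914(19960101)157:1<62:KTSW>2.0.TX;2-F")
```

For the RFC 2288 example, this sketch yields the ISSN 0015-6914, the 
chronology 19960101, the enumeration 157:1, the contribution segment 
62:KTSW and the control segment 2.0.TX;2-F.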

SICI can be seen as a logical extension of the ISSN to the items and 
individual contributions that make up a serial's hierarchical structure. 
The current version of the SICI does have some limitations; it does not 
allow identification of subsections of an article such as paragraphs or 
diagrams. If deemed necessary, the functionality needed for article 
subsection identification could be added to the standard. 

The current version of SICI guarantees uniqueness in most situations; 
however, the standard does not always differentiate between multiple 
variant formats in which an electronic article may be published. For 
instance, variants of a digitised article published in PDF and HTML 
formats will receive the same SICI, provided that the ISSN is the same.  

According to the rules of the ISSN centre, ISSNs can be assigned 
retrospectively to old periodicals. If the original printed document has 
an ISSN, the same identifier is also valid for the digitised version. 
ISSN guidelines formulate this principle in the following way:

   A reproduction is a copy of an item and intended to function as a
   substitute for that item. The reproduction may be in a different
   medium from the original but it is not a different edition in itself.
   The ISSN assigned to the original is valid for the reproduction, a
   new ISSN is not assigned to the reproduction.

ISSNs are assigned by regional agencies, which receive ISSN blocks from 
the ISSN International Centre. SICI usage is not dependent on such 
formal agencies; the aim is that once the ISSN is known, SICI codes can 
be created, manually or by computer program, by publishers, libraries, 
document delivery services or even by individual users. 

Given the complexity of SICI codes, the recommended practice is to 
automate the SICI creation process. If an article is structured enough, 
all elements of SICI can be extracted from the document. A tool capable 
of this has been built by the E.U. project DIEPER; this tool, of course, 
only works properly if the document is structured in the way the DIEPER 
project recommends. Another, less challenging option is a SICI 
generator, which builds syntactically correct SICIs including the check 
character if the basic ingredients are typed in manually. 


3.2 Encoding Considerations and Lexical Equivalence

RFC 2288 contains the following simple and yet sufficient analysis of 
SICI encoding: 

   The character set for SICIs is intended to be email-transport-
   transparent, so it does not present major problems.  However, all
   printable excluded and reserved characters from the URN syntax are
   valid in the SICI character set and must be %-encoded.

   Example of a SICI for an issue of a journal:

          URN:SICI:1046-8188(199501)13:1%3C%3E1.0.TX;2-F

   For an article contained within that issue:

          URN:SICI:1046-8188(199501)13:1%3C69:FTTHBI%3E2.0.TX;2-4

   Equivalence rules for SICIs are not appropriate for definition as
   part of the namespace and incorporation in areas such as cache
   management algorithms.  It is best left to resolver systems which try
   to determine if two SICIs refer to the same content.  Consequently,
   we do not propose any specific rules for equivalence testing through
   lexical manipulation.
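
The %-encoding rule quoted above can be sketched in Python as follows. 
This is an illustration only: the function name sici_to_urn and the 
constant URN_SAFE are hypothetical, and the set of characters left 
unencoded is our reading of the character classes that RFC 2141 permits 
without escaping.

```python
# Characters that RFC 2141 allows unescaped in the namespace-specific
# string: letters, digits and the "other" punctuation class.
URN_SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "()+,-.:=@;$_!*'"
)


def sici_to_urn(sici: str, nid: str = "SICI") -> str:
    """Build a URN from a SICI, %-encoding every character not in URN_SAFE."""
    encoded = "".join(
        ch if ch in URN_SAFE
        else "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        for ch in sici
    )
    return f"URN:{nid}:{encoded}"


urn = sici_to_urn("1046-8188(199501)13:1<69:FTTHBI>2.0.TX;2-4")
```

Applied to the article-level SICI above, the sketch encodes only the 
angle brackets and reproduces the URN given in the RFC 2288 example.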


3.3 Resolution of SICI-based URNs

Since ISSN is a dumb code, SICI does not contain any explicit hint on 
where to find the URN resolution service or services. However, an 
efficient and global resolution service can be accomplished by using the 
ISSN register as a way station. In spring 2001, the ISSN register 
contained about one million bibliographic records describing serials, 
including thousands of electronic journals. Several other databases 
contain hundreds of thousands of serial records, but the ISSN register 
has the best coverage.

The first step in resolving a SICI-based URN is a query to the ISSN 
register. The SICI resolution service in the ISSN register will parse 
the SICI code in order to extract the ISSN from it. 

The ISSN will then be used as a search key for retrieving the 
bibliographic record of the serial from the ISSN register. 
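
As a minimal sketch of this first step, the following Python fragment 
extracts the ISSN search key from a SICI-based URN. It assumes that the 
URN uses the NID "SICI" and that the ISSN occupies the first nine 
characters of the decoded SICI; the function name issn_from_urn is 
hypothetical and does not describe the ISSN register's actual 
implementation.

```python
import re
from urllib.parse import unquote

URN_PREFIX = "urn:sici:"  # the prefix is compared case-insensitively
ISSN_FORMAT = re.compile(r"\d{4}-\d{3}[\dX]")


def issn_from_urn(urn: str) -> str:
    """Return the ISSN embedded at the start of a SICI-based URN."""
    if not urn.lower().startswith(URN_PREFIX):
        raise ValueError("not a SICI-based URN")
    # Undo the %-encoding applied when the SICI was turned into a URN.
    sici = unquote(urn[len(URN_PREFIX):])
    issn = sici[:9]
    if ISSN_FORMAT.fullmatch(issn) is None:
        raise ValueError("SICI does not start with a well-formed ISSN")
    return issn


search_key = issn_from_urn(
    "URN:SICI:1046-8188(199501)13:1%3C69:FTTHBI%3E2.0.TX;2-4"
)
```

Here search_key is the string 1046-8188, which would then be used to 
look up the serial's bibliographic record.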

Currently the ISSN register already contains thousands of records 
describing electronic journals. These records contain the URL of the 
serial's home page. 

This URL is appropriate for resolving the URN based on the ISSN of the 
periodical. The mechanism for resolving such URNs via the ISSN register 
has been specified in RFC 3044 [Rozenfeld]. The ISSN International 
Centre has already built a demonstration URN resolution service for 
ISSN-based URNs into their present information system. 

In order to resolve SICI-based URNs, a new data element has to be added 
to the records in the ISSN register. This data element would contain the 
network address (URL) of the database that holds the required article 
and/or bibliographic information about it. It must also be possible to 
specify within this data element the volumes and, if necessary, the 
issues that the database covers. The data element should be repeatable, 
since the same article may be available from multiple sources. For 
instance, the publisher, Library of Congress (http://www.loc.gov/), 
JSTOR (http://www.jstor.org/) and a number of host services such as 
EBSCO (http://www.ebsco.com/home/) may all have a copy of the same 
resource.

The SICI resolution service built into the ISSN register will check 
whether database address information is available in the bibliographic 
record of the serial. It then makes sure that the volume and/or issue 
needed is available via the service. If this is the case, the 
application will make the query, receive the result (the article or 
bibliographic information about it) and pass it on to the user. 
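
The lookup described above can be sketched in Python under some 
simplifying assumptions. The record layout, the class names ServiceLink 
and SerialRecord, the function find_services, the sample register 
contents and the example.org addresses are all hypothetical; in 
particular, coverage is modelled as a simple range of volume numbers, 
whereas the real data element may also need to express issues.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceLink:
    """One occurrence of the repeatable data element: a database address
    plus the volumes that database covers (hypothetical layout)."""
    url: str
    first_volume: int
    last_volume: int

    def covers(self, volume: int) -> bool:
        return self.first_volume <= volume <= self.last_volume


@dataclass
class SerialRecord:
    """A much-reduced stand-in for a bibliographic record of a serial."""
    issn: str
    title: str
    services: List[ServiceLink] = field(default_factory=list)


def find_services(register: Dict[str, SerialRecord],
                  issn: str, volume: int) -> List[str]:
    """Return the addresses of all services that hold the given volume."""
    record = register.get(issn)
    if record is None:
        return []  # unknown serial: nothing to resolve
    return [link.url for link in record.services if link.covers(volume)]


# Invented sample data, for illustration only.
REGISTER = {
    "0015-6914": SerialRecord(
        issn="0015-6914",
        title="Example serial",
        services=[
            ServiceLink("https://archive.example.org/search", 1, 150),
            ServiceLink("https://publisher.example.org/search", 140, 160),
        ],
    )
}

candidates = find_services(REGISTER, "0015-6914", 157)
```

With this sample data, only the second service covers volume 157, while 
a request for volume 145 would return both addresses. An empty result 
corresponds to the case where only the bibliographic description of the 
serial can be offered.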

The functionality described above was implemented in co-operation 
between the ISSN International Centre and the E.U. project DIEPER 
(http://gdz.sub.uni-goettingen.de/dieper/). The SICI resolution service 
is an extension of the service built for resolving ISSN-based URNs. By 
March 2001, a demonstrator service giving access to several of the 
databases maintained by the project partners had been released for 
internal use within the project. The ISSN IC and the project partners 
wish to maintain the service after the formal end of the project as 
well. 

Discussions about adding the new data element into bibliographic records 
in the ISSN register are under way. 

Please note that the discussion herein applies to SICIs assigned to 
serial contributions. Since serial items (issues) have seldom been 
described or digitised as such, a search by serial item SICI will in 
practice be expanded into retrieval of all contributions (articles) 
within the serial item (issue) in question. 

If a resolution service for the resource at hand does not exist, or the 
user is not authorised to utilise it, the user may still obtain the 
bibliographic description of the serial from the ISSN register. 


3.4 Additional considerations

Electronic journals have rapidly become very popular in scientific 
publishing. The main reasons for this are the emergence of viable 
business models (e.g. licensing) and the birth of a reliable and 
efficient delivery mechanism (the Web). 

New content is being added via two different channels. A significant 
number of scientific journals are published in electronic form, usually 
alongside a printed version. On the other hand, old printed volumes are 
digitised and made available in electronic form. Digitisation is done by 
development projects such as DIEPER, established services such as JSTOR, 
or publishers; for instance, Elsevier is digitising all printed journals 
the company has published.

Reliable linking of articles to references and bibliographic data about 
the articles is an important issue. URLs are as of this writing the most 
common means used for linking, but their reliability is low; average 
lifetime for a URL is estimated to be two years.  

A more reliable linking mechanism than URLs is urgently needed. Many 
scientific publishers are already using Digital Object Identifiers (DOI) 
for their materials. The DOI resolution service is based on the Handle 
System, which is "a comprehensive system for assigning, managing, and 
resolving persistent identifiers, known as 'handles,' for digital 
objects and other resources on the Internet" (see 
http://www.handle.net/introduction.html). Handles can be used as Uniform 
Resource Names (URNs).

URN is both an identifier and a non-commercial, technically advanced 
resolution service. Thanks to the co-operation of the ISSN International 
Centre, the URN resolution service for articles outlined in this 
document is global and can accommodate an unlimited number of article 
services located anywhere in the world. 

For instance, in order to establish URN-based links to articles 
digitised in the JSTOR service, a number of steps are necessary. First, 
each article must be identified by a SICI, and these SICIs must be 
indexed in the JSTOR database. Second, the bibliographic records of 
JSTOR journals in the ISSN register must all be enriched with a link to 
the JSTOR search interface and with volume/issue information. For 
example, the bibliographic record describing the journal "Ecology" must 
contain the information that volumes 1-77 (1920-1996) are available via 
JSTOR. This information may be quite volatile, and maintenance of the 
ISSN register must therefore be frequent and efficient.

Apart from modification of the data, some programming work is needed. 
Due to the work done in the DIEPER project, the ISSN register already 
has the functionality needed for resolving SICI-based URNs. Adding the 
required functionality into the JSTOR database may or may not be 
difficult depending on the system architecture; in DIEPER some partners 
were able to implement the required functionality quite easily.

Since Web browsers do not yet support URN resolution, the final step in 
enabling resolution of SICI-based URNs is installation of the browser 
plug-in developed by the ISSN International Centre. 

For various reasons, one article may be available in several locations. 
Every article copy may have a different set of users who are allowed 
access to it. For instance, a copy acquired by a national library via 
legal deposit may only be available within the library premises. 

Making the links context-sensitive, that is, providing only those links 
that "work" for a given user, is a challenge. The OpenURL framework 
[Van de Sompel] provides a means for context-sensitive linking. As of 
this writing OpenURL is rapidly gaining popularity, and there are 
already a few integrated library systems which support it. The ISSN 
register may in the future support OpenURL usage; this would be very 
valuable when the same resource (article) is available from several 
sources which have different user populations. 

In their present form, the URN resolution services provided via the ISSN 
register are best suited to services that are publicly available and 
reasonably stable. Numerous digitisation projects such as DIEPER are 
currently making printed articles available on the Web in digital form. 

An additional benefit of coding the needed location and volume 
information into the ISSN register would be that this database then 
could also serve as a global registry of serial digitisation efforts. 
Such a register is badly needed to avoid duplicate work. 

Since the number of SICI resolution services will eventually be high, 
the capacity of the server on which the ISSN register runs and its 
network connection may become a bottleneck, especially if the articles 
are delivered to the users via the ISSN server. Setting up mirror sites 
would in this case be the most efficient means of load control and 
balancing. Technically, setting up mirror sites is not difficult: the 
ISSN register contains approximately a million bibliographic records and 
is therefore not a very large database. 


4. Security Considerations

This document proposes means of encoding and using Serial Item and 
Contribution Identifiers within the URN framework. This document does 
not discuss resolution except at a generic level; thus questions of 
secure or authenticated resolution mechanisms in the ISSN register or in 
actual resolution services are out of scope.  This text does not address 
means of validating the integrity or authenticating the source or 
provenance of URNs that contain SICIs.  Issues regarding intellectual 
property rights associated with objects identified by the various 
bibliographic identifiers are also beyond the scope of this document, as 
are questions about rights to the databases that might be used to 
construct resolvers.


5. Namespace registration

URN Namespace ID Registration for the Serial Item and Contribution 
Identifier (SICI)

Namespace ID:

SICI

SICI is a well-established acronym for Serial Item and Contribution 
Identifiers; giving this NID for any other system would cause a lot of 
confusion. 

This namespace ID has already been used in SICI-based URNs in the E.U. 
project DIEPER.

Registration Information:

Version: 1
Date: 2001-08-28


Declared registrant of the namespace:

Name: Patricia Harris
E-mail: pharris@niso.org
Affiliation: National Information Standards Organization
Address: 4733 Bethesda Avenue, Suite 300, Bethesda, MD 20814

Declaration of syntactic structure:

Each SICI contains three segments:

* Item segment: the data elements needed to describe the serial item, 
such as a serial issue (ISSN, Chronology, Enumeration).

* Contribution segment: the data elements needed to identify 
contributions within an item (Location, Title Code).

* Control segment: the data elements needed to record those 
administrative elements that determine the validity, version, and format 
of the SICI code representation. 

Example:

   0015-6914(19960101)157:1<62:KTSW>2.0.TX;2-F

SICI codes can be generated and parsed by computer programs. 


Relevant ancillary documentation:

SICI is an American national standard defined by NISO/ANSI Z39.56-1996 
[NISO]. A new version of the standard is currently under development.


Identifier uniqueness considerations:

SICI codes will almost always be unique. Since SICI is based on ISSN, 
articles from different journals will definitely never get the same 
SICI. Since enumeration and chronology information must also be given, 
articles and other contributions published in different volumes and 
issues will also never get the same SICI.  

SICIs may fail to be unique in the following cases: 

* Two or more contributions are published on the same page(s) and have 
sufficiently similar titles (the first letter of each word is the same).

* A single issue of an electronic journal (which lacks page numbers) 
contains two or more contributions with sufficiently similar titles. 

* There are several technical variants of an electronic serial 
contribution (multiple formats, multiple resolutions). The current 
version of SICI does not distinguish between these variants. In this 
case the intellectual content will usually be the same, but the layout 
will differ from one version to another. 

The new version of the SICI standard will be enhanced in order to 
diminish the risk of non-unique SICIs. 
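
The title-based collision can be illustrated with a small Python sketch. 
It rests on a simplified reading of the title code as the initial 
letters of the leading words of the title; the limit of six words is an 
assumption, the function name title_code is hypothetical, and both 
article titles are invented for the illustration.

```python
def title_code(title: str, max_words: int = 6) -> str:
    """Simplified title code: first letter of each of the leading words.

    The six-word limit is an assumption; consult the standard for the
    exact construction rules.
    """
    return "".join(word[0].upper() for word in title.split()[:max_words])


# Two invented titles starting on the same page of the same issue.
first = f"12:{title_code('Networked information resources')}"
second = f"12:{title_code('New indexing rules')}"
same_contribution_segment = first == second
```

Both titles reduce to the same title code, so the two contribution 
segments, and with them the SICIs, would be identical.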


Identifier persistence considerations:

Once assigned, a SICI will never change. The same SICI will not be used 
again for other serial items and contributions. 

Process of identifier assignment:

There will not be a national, regional or international agency governing 
the SICI assignment process. Publishers, libraries or other information 
intermediaries will create SICIs when needed. The most important 
prerequisite is that the journal must have an ISSN. 

Although SICI assignment is decentralised, the national ISSN agencies 
and the ISSN International Centre may support publishers and other 
interested parties in SICI implementation. 

SICIs can - and should - be built via automated means. If the source 
document, such as an article, is sufficiently structured, the SICI can 
be generated without human involvement. Another option is a 
semi-automated process, in which a human user types in the relevant data 
elements, and the application takes care of building the code. 

Process for identifier resolution:

Resolution will take place in two steps as defined in chapter 3.3. First 
the ISSN register is used for finding the location of the resolution 
service(s) for the serial and volume at hand. Using the linking 
information stored in the serial's bibliographic record, the correct 
resolution service is contacted, and the requested resource is delivered 
to the user.
 

Rules for Lexical Equivalence:

We do not propose any specific rules for equivalence testing through 
lexical manipulation.


Conformance with URN Syntax:

According to RFC 2288:

The character set for SICIs is intended to be email-transport-
transparent, so it does not present major problems.  However, all
printable excluded and reserved characters from the URN syntax are
valid in the SICI character set and must be %-encoded.

Example of a SICI for an issue of a journal:

     URN:SICI:1046-8188(199501)13:1%3C%3E1.0.TX;2-F

For an article contained within that issue:

     URN:SICI:1046-8188(199501)13:1%3C69:FTTHBI%3E2.0.TX;2-4


Validation mechanism:

The validity of a SICI string can be checked by means of its modulus 37 
check character.


Scope:

Global.


6. References

[Daigle et al.] Daigle, L., van Gulik, D., Iannella, R. and Faltstrom, 
P., URN Namespace Definition Mechanisms, RFC 2611, June 1999.

[Lynch] Lynch, C., Preston, C. and Daniel, R., Using Existing 
Bibliographic Identifiers as Uniform Resource Names, RFC 2288, February 
1998.

[Moats] Moats, R., URN Syntax, RFC 2141, May 1997.

[NISO] NISO/ANSI Z39.56-1996 Serial Item and Contribution Identifier. 
Electronic resource, available at 
http://www.techstreet.com/cgi-bin/pdf/free/152629/z39-56.pdf

[Rozenfeld] Rozenfeld, S., Using The ISSN (International Serial Standard 
Number) as URN (Uniform Resource Names) within an ISSN-URN Namespace, 
RFC 3044, January 2001.

[Van de Sompel] Van de Sompel, H. and Beit-Arie, O., Open Linking in the 
Scholarly Information Environment Using the OpenURL Framework, 
D-Lib Magazine, March 2001. Electronic resource, available at 
http://www.dlib.org/dlib/march01/vandesompel/03vandesompel.html
 

7. Author's Address

   Juha Hakala
   Helsinki University Library - The National Library of Finland
   P.O. Box 26
   FIN-00014 Helsinki University
   FINLAND

   E-mail: juha.hakala@helsinki.fi


8.  Full Copyright Statement

   Copyright (C) The Internet Society (2001).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.