Internet DRAFT - draft-hernacki-nntpsrch
INTERNET-DRAFT B. Hernacki
Expires: April 4, 1997 B. Polk
<draft-hernacki-nntpsrch-00.txt> Netscape Communications, Inc.
October 4, 1996
NNTP Full-text Search Enhancements
1. Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working docu-
ments of the Internet Engineering Task Force (IETF), its areas, and its
working groups. Note that other groups may also distribute working
documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe),
ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).
2. Abstract
This document describes a set of enhancements to the Network News
Transfer Protocol [NNTP-977] that allow full-text searching of news
articles across multiple newsgroups.
This new search mechanism also allows search criteria to be saved into
search profiles. Articles arriving on the server are checked against
the profiles, and the articles that match are collected together for the
client.
The availability of the extensions described here will be advertised by
the server using the extension negotiation mechanism described in the
new NNTP protocol specification currently being developed [NNTP-NEW].
Hernacki & Polk [Page 1]
INTERNET-DRAFT October 4, 1996
3. Introduction
The new SEARCH NNTP command is sent from the client to specify and ini-
tiate a full-text search. The server constructs a "virtual newsgroup"
consisting of articles that matched the search criteria. The virtual
newsgroup acts in most ways like a normal newsgroup, allowing access
through the standard NNTP commands.
The new PROFILE command makes a virtual newsgroup permanent, and saves
the search criteria that generated the newsgroup. The server will show
newly arrived articles that match the search criteria as new articles in
the virtual newsgroup. This can be implemented on the server by reexe-
cuting the search periodically or by using a profile mechanism that
checks each article as it arrives.
Because the virtual newsgroup usually consists of articles from many
other newsgroups, clients might want to display it differently than a
non-virtual newsgroup. For example, clients may want to display the
source newsgroup of each article. To make this easier, and to resolve
some of the longstanding problems with XOVER, the OVER command is intro-
duced.
To control the headers returned by the OVER command, and to allow the
client and server to communicate information that does not fit through
other channels, the SET and GET commands have been added. SET allows
the client to send an attribute/value pair to the server. GET allows
the client to retrieve an attribute/value pair by attribute name.
In addition, the XPAT command is extended so that it can be used to
full-text search articles within a single newsgroup. Both the headers
and the body of the articles are searched.
3.1. New and Enhanced NNTP Commands
There are five new NNTP commands, three new options to the existing LIST
command, and enhancements to one existing command.
* GET
* SET
* OVER
* SEARCH
* PROFILE
* LIST SRCHHEADERS
* LIST SEARCHES
* LIST XACTIVE
* XPAT
The GET and SET commands communicate per-session information between the
client and server.
The OVER command returns specific headers requested by the client. This
command functions much like the widely implemented XOVER command.
The SEARCH command runs a one-time search.
The PROFILE command converts search results into saved profiles and
manipulates them.
The LIST SRCHHEADERS command returns the headers that the server allows
in full-text searches.
The LIST XACTIVE command functions in most ways like the LIST ACTIVE
command. It is different because it can be made to return information
about a single newsgroup, and it supports new newsgroup flags for the
virtual newsgroups. It can also return multiple newsgroup flags per
newsgroup.
The LIST SEARCHES command allows the client to determine which news-
groups are full-text indexed. Only these newsgroups are full-text
searchable.
The XPAT command has a simple extension to allow the header "TEXT".
This specifies a full-text (headers and body) search of the articles in
a single newsgroup.
4. Use of NNTP Extension Mechanism
The NNTP extension mechanism allows a server to describe its
capabilities. The following extensions advertise the capabilities
defined in this document.
4.1. SETGET Extension
The SETGET extension means that the server supports the SET and GET com-
mands.
4.2. OVER Extension
The OVER extension means that the server supports the OVER command. In
addition, any server that supports the OVER extension must also support
the SETGET extension, and must explicitly include SETGET in the list of
extensions it supports.
4.3. SEARCH Extension
The SEARCH extension means that the server supports the following com-
mands: SEARCH, LIST SEARCHES, LIST SRCHHEADERS, LIST XACTIVE. In addi-
tion, any server that supports the SEARCH extension must also support
the OVER and SETGET extensions, and must explicitly include OVER and
SETGET in the list of extensions it supports.
4.4. PROFILE Extension
The PROFILE extension means that the server supports the PROFILE com-
mand. In addition, any server that supports the PROFILE extension must
also support the SEARCH, OVER, and SETGET extensions, and must expli-
citly include these extensions in the list of extensions it supports.
4.5. XPATTEXT Extension
The XPATTEXT extension means that the server supports the TEXT header in
the XPAT command, as described by this document.
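The dependency rules in sections 4.1 through 4.5 can be checked mechanically. The following is a minimal Python sketch, not part of the protocol: the function name missing_extensions and the REQUIRES table are illustrative, and the input is assumed to be the list of extension names a server advertised.

```python
# Extension dependencies as stated in sections 4.1 through 4.5.
REQUIRES = {
    "SETGET": set(),
    "OVER": {"SETGET"},
    "SEARCH": {"OVER", "SETGET"},
    "PROFILE": {"SEARCH", "OVER", "SETGET"},
    "XPATTEXT": set(),
}


def missing_extensions(advertised):
    """Map each advertised extension to the required extensions it lacks.

    Extensions whose requirements are all advertised are omitted, so an
    empty result means the advertised list is self-consistent.
    """
    names = {name.upper() for name in advertised}
    problems = {}
    for name in sorted(names):
        missing = REQUIRES.get(name, set()) - names
        if missing:
            problems[name] = sorted(missing)
    return problems
```

A client could run such a check once after extension negotiation and treat a non-empty result as a sign of a misconfigured server.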
5. Command Descriptions
5.1. GET command
GET [ATTRIBUTE [ATTRIBUTE]...]
GET allows the client to retrieve session-specific state information
from the server.
The only characters allowed in attributes or values are uppercase and
lowercase letters, numbers, and the characters "-_:". Case is not signi-
ficant in the attribute names. This information must not be preserved
by the client across server sessions.
If no ATTRIBUTE is specified, all of the attributes are returned by the
server.
5.2. Responses
The server will either return the values (209), indicate a syntax error
(501), or indicate that the attribute was not recognized (409).
209 values follow
501 command syntax error
409 unknown attribute
5.3. Example
C: GET
S: 209 values follow
S: OVERFIELDS Subject:Newsgroups:From:References:Lines:Bytes:
S: .
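A client might turn a GET reply like the one above into a dictionary. The sketch below is illustrative only: parse_get_response is a hypothetical helper, the reply is assumed to arrive as a list of lines with the line terminators already removed, and each data line is assumed to hold an attribute name, one space, and the value, as in the example.

```python
def parse_get_response(lines):
    """Parse a GET reply into a dict keyed by upper-cased attribute name.

    Attribute names are case-insensitive, so they are normalized to
    upper case; values are kept exactly as received.
    """
    status = lines[0].partition(" ")[0]
    if status != "209":
        raise ValueError("GET failed: " + lines[0])
    values = {}
    for line in lines[1:]:
        if line == ".":  # a lone period terminates the list
            break
        name, _, value = line.partition(" ")
        values[name.upper()] = value
    return values
```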
5.4. OVER command
OVER [range]
The optional range argument may be any of the following:
an article number
an article number followed by a dash to indicate
all following
an article number followed by a dash followed by
another article number
If no argument is specified, information for the current article is
returned. A successful response starts with a 224 status line, followed
by a line listing the headers, followed by the overview information for
all matched articles. Once the output is complete, a period is sent on
a line by itself. If a newsgroup has not been selected, a 412 error
response is returned. If no articles are in the specified range, a 420
error response is returned. If the client only has permission to
transfer articles, a 502 response is returned.
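The three range forms accepted by OVER can be parsed as follows. This is a sketch under the assumption that article numbers are plain decimal digits; parse_range is an illustrative name, and an open-ended range is represented here by None as the upper bound.

```python
import re

# number, optionally followed by "-" and optionally a second number
_RANGE = re.compile(r"(\d+)(?:(-)(\d+)?)?", re.ASCII)


def parse_range(arg):
    """Return (first, last); last is None for an open-ended range."""
    match = _RANGE.fullmatch(arg)
    if match is None:
        raise ValueError("bad range: " + repr(arg))
    first = int(match.group(1))
    if match.group(2) is None:   # "42": a single article
        return first, first
    if match.group(3) is None:   # "100-": all following articles
        return first, None
    return first, int(match.group(3))  # "100-200": a closed range
```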
By default, the headers returned are as specified in the OVERVIEW.FMT
file, and will therefore be the same as the server would return for an
XOVER command.
The SET command may be used to specify what headers are returned and in
what order. The SET attribute OVERFIELDS is used to specify the names of
the headers to return, with the headers concatenated together, including
the terminating ":".
This use of SET for the OVERFIELDS attribute must be supported. The
server must honor this request and return only the headers specified in
subsequent OVER commands in that session.
The number of lines in the article is available in the Lines: field.
The number of bytes in the article is available in the Bytes: field.
5.5. Responses
224 data follows
412 not in group
420 no articles in range
501 command syntax error
502 no permission
5.6. Example
C: SET OVERFIELDS Subject:From:Lines:
S: 209 OK
C: OVER
S: 224 data follows
S: Subject:From:Lines:
S: Re: Long running subjects\tfrequent-poster@somewhere.com\t593
S: .
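The exchange above can be handled on the client side roughly as shown below. This is a sketch, not a definitive implementation: overfields_value and parse_over_response are hypothetical helpers, and the fields of each overview line are assumed to be separated by TAB characters, as they are in XOVER output.

```python
def overfields_value(headers):
    """Build the OVERFIELDS value: each header name with its trailing ':'."""
    return "".join(name + ":" for name in headers)


def parse_over_response(lines):
    """Parse an OVER reply into one dict per article, keyed by header name."""
    if not lines[0].startswith("224"):
        raise ValueError("OVER failed: " + lines[0])
    # The first data line lists the returned headers, e.g. "Subject:From:Lines:".
    names = [name for name in lines[1].split(":") if name]
    rows = []
    for line in lines[2:]:
        if line == ".":  # a lone period terminates the output
            break
        rows.append(dict(zip(names, line.split("\t"))))
    return rows
```

The client would send "SET OVERFIELDS " followed by overfields_value(["Subject", "From", "Lines"]) before issuing OVER.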
5.7. SEARCH command
SEARCH <query>
The specified query is executed, and the name of the resulting virtual
newsgroup is returned.
Search result virtual newsgroups are not permanent. The server must
keep them for at least ten minutes after the last client access to the
newsgroup, but after that time the server is free to remove them. This
ten minute period must be observed even if the client terminates its
session with the server. "Access to the newsgroup" is defined to mean
any command executed while the virtual newsgroup was the current news-
group.
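One way a server might track the ten minute retention rule is sketched below. The class VirtualGroupTable and its methods are illustrative assumptions, not part of this specification; a real server would also have to remove the stored search results themselves.

```python
import time

MIN_LIFETIME_SECONDS = 10 * 60  # minimum retention after the last access


class VirtualGroupTable:
    """Tracks the last access time of each search result virtual newsgroup."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_access = {}

    def touch(self, group):
        """Record an access: any command run while `group` is current."""
        self._last_access[group] = self._clock()

    def expire(self):
        """Forget and return groups idle for longer than the minimum lifetime."""
        now = self._clock()
        stale = [
            group
            for group, last in self._last_access.items()
            if now - last > MIN_LIFETIME_SECONDS
        ]
        for group in stale:
            del self._last_access[group]
        return stale
```

Because the table is keyed by group rather than by session, a group survives the end of the session that created it, as the text requires.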
The query is the full-text search criteria expressed in the syntax
described below.
5.7.1. Search Syntax
The search query syntax is derived from the search syntax defined for
the IMAP4 protocol. It is somewhat different because of the way inter-
national character sets need to be encoded. See RFC 1730 [IMAP4] for
the IMAP4 search syntax.
One exception defined by this document to the 7bit character set restriction
for commands in [NNTP-977] is that the 8bit ISO-8859-1 character set is
allowed in unencoded form in search strings. This is allowed because it
simplifies handling this widely used character set, without requiring
support of arbitrary binary data.
Here is a semi-formal definition of the search query syntax.
query           = HEADER Newsgroups <group_pat> <search_term> [<search_term>...]
group_pat       = "<group_specifier>[,<group_specifier>...]"
group_specifier = Either a single * for all searchable groups,
                  a full newsgroup name, or a part of the news
                  hierarchy, suffixed with .*.
search_term     = TEXT <search_string> |
                  BODY <search_string> |
                  HEADER <header_line> <search_string> |
                  SENTBEFORE <date> |
                  SENTON <date> |
                  SENTAFTER <date> |
                  NOT <search_term> |
                  OR <search_term> <search_term> |
                  ( <search_term> )
search_string   = "<simple_string>" |
                  "<MIME-2String>"
date            = Date in DD-MMM-YYYY form.
simple_string   = US-ASCII or ISO-8859-1 text.
MIME-2String    = A MIME-2 encoded string.
The double quotes are always required around the group pattern and the
search strings.
BODY requests a search through the body of the article, excluding the
headers.
TEXT requests a search through all indexed parts of the article, includ-
ing the body and all indexed headers.
If multiple search_terms are listed without being prefixed by the OR
operator, they are ANDed together.
SENTBEFORE, SENTON, and SENTAFTER may only be used if the Date: header
is indexed, as specified by the LIST SRCHHEADERS command.
The searches should be case insensitive.
5.7.2. Query Examples
SEARCH HEADER Newsgroups "comp.*, alt.*" BODY "nntp" SENTAFTER 25-DEC-1995
SEARCH HEADER Newsgroups "comp.*" HEADER From "Salz" NOT HEADER From "Bob"
SEARCH HEADER Newsgroups "*" BODY "Election" ( OR TEXT "Bob" TEXT "Bill" )
SEARCH HEADER Newsgroups "comp.lang.c++" TEXT "=?ISO-8859-1?Q?QPtext?="
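Queries like the ones above can be assembled from small helper functions. The following Python sketch is illustrative: all function names are hypothetical, and because the syntax defines no way to escape a double quote inside a search string, the sketch simply rejects such input.

```python
from datetime import date

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def quote(value):
    """Wrap a group pattern or search string in the required double quotes."""
    if '"' in value:
        raise ValueError("embedded double quotes are not handled by this sketch")
    return '"' + value + '"'


def text(value):
    return "TEXT " + quote(value)


def body(value):
    return "BODY " + quote(value)


def header(name, value):
    return "HEADER " + name + " " + quote(value)


def not_(term):
    return "NOT " + term


def or_(left, right):
    return "( OR " + left + " " + right + " )"


def sent_after(day):
    """Format a datetime.date in the DD-MMM-YYYY form used by the syntax."""
    return "SENTAFTER %02d-%s-%04d" % (day.day, MONTHS[day.month - 1], day.year)


def build_search(groups, *terms):
    """Build a SEARCH command; terms listed side by side are ANDed."""
    if not terms:
        raise ValueError("at least one search term is required")
    return ("SEARCH HEADER Newsgroups " + quote(",".join(groups))
            + " " + " ".join(terms))
```

For example, build_search(["comp.*"], header("From", "Salz"), not_(header("From", "Bob"))) reproduces the second query example above.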
5.8. Responses
A successful search returns the name of a newsgroup in which the server
has placed the results. This newsgroup can then be treated like any
other non-postable newsgroup. If no articles matched the search cri-
teria, an error (460) is returned.
260 groupname
460 no matches found
462 error performing search
501 command syntax error
5.9. Example
C: SEARCH header newsgroups "*" TEXT "internet"
S: 260 virtual.group.temp5423
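A client needs to distinguish a successful search from the "no matches" case before selecting the virtual newsgroup. A minimal sketch, assuming the reply is a single status line and using the hypothetical helper name parse_search_reply:

```python
def parse_search_reply(line):
    """Return the virtual newsgroup name, or None if nothing matched."""
    code, _, rest = line.partition(" ")
    if code == "260":
        return rest.split()[0]  # the group name is the first word
    if code == "460":
        return None             # no matches found
    raise RuntimeError("search failed: " + line)
```

On success the client would follow up with GROUP and the returned name, then read the results with the standard NNTP commands.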
5.10. PROFILE command
PROFILE NEW [profilenamehint] | RET | DEL
The PROFILE subcommands specify what operation to perform:
NEW creates a new profile from the current search result.
RET returns the search criteria of a profile.
DEL deletes a profile.
5.10.1. NEW Subcommand
NEW converts a SEARCH result group into a profile.
The profilenamehint is used by the server as part of the name of the
newsgroup. The client must not make any assumptions that any part of
the name hint will be used. The name hint must be 32 characters or
fewer and consist of valid newsgroup name characters, except that the
"." character is not allowed in the profilenamehint.
5.10.2. RET Subcommand
RET retrieves the QUERY field stored on the server for the current pro-
file newsgroup.
5.10.3. DEL Subcommand
DEL deletes the current profile newsgroup. The server is not required
to remove the group immediately, but it must clear the current group
context, so that no commands that require a group context can be
executed.
5.11. NEW Subcommand Responses
If the profile newsgroup is created, the 260 response is returned,
including the name of the new newsgroup. If there's no current news-
group, the error response 412 is returned. If the current newsgroup
isn't a search result virtual newsgroup, the 461 error response is
returned.
5.12. RET Subcommand Responses
If the PROFILE RET is successful, the 261 response is returned, includ-
ing the criteria. If there's no current newsgroup, the error response
412 is returned. If the current newsgroup isn't a profile virtual news-
group, the 461 error response is returned.
5.13. DEL Subcommand Responses
If the PROFILE DEL is successful, the 260 response is returned, includ-
ing the name of the deleted virtual newsgroup. If there's no current
newsgroup, the error response 412 is returned. If the current newsgroup
isn't a profile virtual newsgroup, the 461 error response is returned.
5.14. Responses
260 groupname
261 returned search criteria
412 not in group
461 current group is not a correct virtual newsgroup
462 profile error
501 command syntax error
5.15. Example 1 - Create New Profile
C: SEARCH header newsgroups "comp.*" TEXT "fortran"
S: 260 virtual.search.temp3254
C: GROUP virtual.search.temp3254
S: 211 103 402 504 virtual.search.temp32
C: PROFILE NEW myprofile
S: 260 virtual.profile.myprofile
5.16. Example 2 - Return Profile
C: GROUP virtual.profile.myprofile
S: 211 103 402 504 virtual.profile.myprofile
C: PROFILE RET
S: 261 TEXT searchstring
5.17. Example 3 - Delete Profile
C: GROUP virtual.profile.myprofile
S: 211 103 402 504 virtual.profile.myprofile
C: PROFILE DEL
S: 260 virtual.profile.myprofile deleted
5.18. SET command
SET ATTRIBUTE <value> [ATTRIBUTE <value> ...]
SET allows the client to set session-specific state information. This
might include things like what language it wants to use, what version of
the protocol it wants, what type of authentication it will be using, or
optional article compression. The only characters allowed in attributes
or values are uppercase and lowercase letters, numbers, and the
characters "-_:". Case is not significant in the attribute names. This
information must not be preserved by the server across client sessions.
If multiple attributes are specified and the server does not recognize
one or more of them, it must return an error and not set any of them.
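The all-or-nothing rule for SET can be implemented by validating every pair before storing any of them. The sketch below is one possible server-side shape, not the authors' implementation: handle_set, the session dictionary, and the set of known attribute names are all assumptions made for illustration.

```python
import re

# Letters, digits, and the characters "-_:" are the only ones allowed.
TOKEN = re.compile(r"[A-Za-z0-9_:-]+")


def handle_set(session, known, args):
    """Apply 'SET attr value [attr value ...]' atomically; return the reply.

    `session` is a dict of per-session state, `known` is the set of
    upper-cased attribute names the server recognizes, and `args` is the
    list of words that followed SET on the command line.
    """
    if not args or len(args) % 2 != 0:
        return "501 command syntax error"
    if not all(TOKEN.fullmatch(word) for word in args):
        return "501 command syntax error"
    pairs = [(args[i].upper(), args[i + 1]) for i in range(0, len(args), 2)]
    # Check every attribute first so that a failure changes nothing.
    if any(name not in known for name, _ in pairs):
        return "409 unknown attribute"
    session.update(pairs)
    return "209 OK"
```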
5.19. Responses
The server will either return that it set the value (209), return a syn-
tax error (501), or indicate that one or more of the attributes was not
recognized (409).
209 OK
501 command syntax error
409 unknown attribute
5.20. Example
C: SET LANG USEnglish
S: 209 OK
5.21. LIST SRCHHEADERS
LIST SRCHHEADERS
Returns a list of which headers can be specified in full-text search
queries on the server.
5.22. Responses
Returns a list of headers, one per line. A "." on its own line ter-
minates the list.
5.23. Example
C: LIST SRCHHEADERS
S: 215 Data follows.
S: From:
S: Date:
S: Subject:
S: .
5.24. LIST SEARCHES
LIST SEARCHES
Returns a list of strings that define which newsgroups are being indexed
by the news server and are thus available for searching. In addition,
the character sets allowed for each group are returned.
5.25. Responses
When newsgroups are indexed, the server returns 215, followed by each
portion of the tree that is indexed. If all groups are indexed, a line
with "*" is returned. If only some parts of the newsgroup hierarchy are
indexed, they are identified in the form <indexed-hierarchy>.*. Clients
should not assume that these will always be top level hierarchies. A
"." on its own line terminates the list.
The character sets allowed in full-text searches for each entry are also
returned. The character sets are identified by the name as defined in
[MIME-1].
5.26. Example
C: LIST SEARCHES
S: 215 Data follows.
S: alt.* US-ASCII
S: comp.lang.* US-ASCII ISO-8859-1 ISO-8859-2
S: mcom.* ISO-8859-1
S: .
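A client can use this reply to decide whether a given newsgroup is searchable and which character sets it may use. The following sketch is illustrative: the function names are hypothetical, fields are assumed to be separated by whitespace as in the example, and preferring the longest matching entry when several apply is an assumption that the text does not address.

```python
def parse_list_searches(lines):
    """Parse a LIST SEARCHES reply into {pattern: [charset, ...]}."""
    if not lines[0].startswith("215"):
        raise ValueError("LIST SEARCHES failed: " + lines[0])
    indexed = {}
    for line in lines[1:]:
        if line == ".":  # a lone period terminates the list
            break
        pattern, *charsets = line.split()
        indexed[pattern] = charsets
    return indexed


def charsets_for(group, indexed):
    """Return the charsets allowed for `group`, or None if it is not indexed."""
    best = None
    for pattern in indexed:
        matches = (
            pattern == "*"
            or pattern == group
            # "comp.lang.*" matches any group starting with "comp.lang."
            or (pattern.endswith(".*") and group.startswith(pattern[:-1]))
        )
        if matches and (best is None or len(pattern) > len(best)):
            best = pattern
    return None if best is None else indexed[best]
```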
5.27. LIST XACTIVE
LIST XACTIVE [newsgroup]
The LIST XACTIVE command functions in most respects like the LIST ACTIVE
command. It differs in the following ways:
First, multiple flags may be returned. The flags are concatenated
together.
Second, LIST XACTIVE allows two new flags to be returned, "s" or "p",
indicating a search results virtual newsgroup or profile virtual news-
group, respectively. In both these cases the "n" or "y" flag is also
set, indicating whether the virtual group can be posted to. For
example, the flag field for a search result virtual group that cannot
be posted to is "ns".
Third, other flags may be added in the future. Clients must ignore
flags they do not recognize.
5.28. Responses
The responses are exactly the same as the LIST ACTIVE command, except
for the new flags.
5.29. Example
C: LIST XACTIVE virtual.guest.temp3453
S: 215 Newsgroups in form "group high low flags".
S: virtual.guest.temp3453 0000000000 0000000001 ns
S: .
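Since flags are concatenated and unknown flags must be ignored, a client can treat the flag field as a set of single characters. A minimal sketch, assuming one flag per character and whitespace-separated fields in the order shown in the 215 status line; parse_xactive_line and KNOWN_FLAGS are illustrative names.

```python
# Only the flags mentioned in this document are listed here.
KNOWN_FLAGS = {
    "y": "posting allowed",
    "n": "posting not allowed",
    "s": "search result virtual newsgroup",
    "p": "profile virtual newsgroup",
}


def parse_xactive_line(line):
    """Parse one 'group high low flags' line from a LIST XACTIVE reply."""
    group, high, low, flags = line.split()
    return {
        "group": group,
        "high": int(high),
        "low": int(low),
        # Flags the client does not recognize are dropped, as required.
        "flags": {flag for flag in flags if flag in KNOWN_FLAGS},
    }
```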
5.30. XPAT command enhancement
XPAT header range|<message-id> pat [pat...]
The XPAT command is enhanced in a simple way: The new value TEXT will
be supported as a header when invoking the command. The TEXT header
requests a full-text search of the body and all headers of the specified
articles.
When TEXT is specified for the header, only a single "pat" is allowed,
and it must be a full word to search for, rather than a wildmat pattern
as allowed otherwise.
5.31. Responses
If TEXT isn't specified as the header, the response is the same as it
always has been for XPAT, with each result line containing the article
number and the value of the header that matched the pattern.
If the TEXT header is specified, the constant string "TEXT" is returned
in place of the value of the header that matched the pattern.
5.32. Example
C: XPAT TEXT 1000-2000 searchtext
S: 221 Header follows
S: 1021 TEXT
S: 1024 TEXT
S: .
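Because a TEXT search returns the constant string "TEXT" in place of a header value, the useful part of each result line is the article number. A sketch of extracting the matching article numbers, assuming the reply is given as a list of lines and using the hypothetical helper name parse_xpat_text:

```python
def parse_xpat_text(lines):
    """Return the article numbers reported by an 'XPAT TEXT ...' reply."""
    if not lines[0].startswith("221"):
        raise ValueError("XPAT failed: " + lines[0])
    numbers = []
    for line in lines[1:]:
        if line.strip() == ".":  # a lone period terminates the list
            break
        number, _, value = line.partition(" ")
        if value == "TEXT":
            numbers.append(int(number))
    return numbers
```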
6. Security Considerations
The search and profile commands must be implemented in a way that does
not allow access to articles in newsgroups that a client is otherwise
restricted from reading due to access control rules.
Clients will in some cases want to control access to virtual newsgroups
or profiles. No means to support this kind of protection is defined in
this document, as it requires access control infrastructure that is not
currently defined for NNTP.
The OVER command should be treated the same as the XOVER command for
access control and security purposes.
The other commands do not introduce any new security issues.
7. Bibliography
[NNTP-977]
Kantor, B., and P. Lapsley, Network News Transfer Protocol, RFC 977,
February 1986.
[NNTP-NEW]
Barber, S., Network News Transfer Protocol, Internet-Draft,
September 1996.
[IMAP4]
Crispin, M., Internet Message Access Protocol - Version 4, RFC 1730,
December 1994.
[MIME-1]
Borenstein, N., and N. Freed, MIME (Multipurpose Internet Mail
Extensions) Part One: Mechanisms for Specifying and Describing the
Format of Internet Message Bodies, RFC 1521, Bellcore, Innosoft,
September 1993.
[MIME-2]
Moore, K., MIME (Multipurpose Internet Mail Extensions) Part Two:
Message Header Extensions for Non-ASCII Text, RFC 1522, University
of Tennessee, September 1993.
8. Authors' Addresses
Brian Hernacki
Netscape Communications, Inc.
685 W. Middlefield Road
Mountain View, CA 94043
USA
Phone: +1 415-937-6738
Email: bhern@netscape.com
Ben Polk
Netscape Communications, Inc.
685 W. Middlefield Road
Mountain View, CA 94043
USA
Phone: +1 415-937-3686
Email: bpolk@netscape.com
This Internet Draft expires April 4, 1997.