Message filtering and e-mail

Niclas Eberhagen April 1997

Department of Mathematics, Statistics and Computer Science Växjö University

S-351 95 VÄXJÖ Sweden

Niclas.Eberhagen@masda.hv.se

Abstract

In this paper the concept of filtering is discussed and the background to its development is described. The possibilities for filtering in electronic communication systems, and how filtering can be done within e-mail applications, are also presented. A popular e-mail client, Eudora Pro™ 3.0, is used to show which filtering mechanisms and functions are available and how they are applied. Inadequacies and limits of this kind of filtering are discussed, together with what can be done in order to achieve efficient filtering.

Keywords: Message filtering and electronic mail.

Working paper within the Information System Science's project, IMG-WP-42.

Introduction

What is the concept of filtering, i.e. information filtering specifically? Information filtering as a concept is tied to the process of selecting information that is relevant to the needs of an individual or a group of individuals. These information needs are tied, either consciously or unconsciously, to the achievement of goals, either formally or informally stated, and concern either the matter at hand or long-term planning. The existence of such a process as information filtering indicates that there is too much information at hand and that much of it is of little relevance to the needs. In pursuing the goals or completing the function, the individual scanning for the needed information experiences an overload of information that puts a high strain on his cognitive capabilities, since going through it all yields only a small return.

Information overload is here defined “as the economic loss associated with the examination of a number of non- or less-relevant messages” (Loose 1989, p. 179).

What do the filtering processes look like? In a study by Malone, Grant, Lai, Rao, and Rosenblitt (1987) of how different types of information are distributed within organizations, individuals were interviewed about their needs and wants, and about how they made their selections, in order to establish their information filtering processes. The study focused on different information filtering environments, such as electronic mailboxes, e-mail systems, electronic billboards, etc., from which the individuals retrieved information.

From the results of this study Malone et al. (1987) identified three general filtering processes which individuals use in order to select information:

Cognitive filtering

The cognitive filtering process is mainly based on a characterization of the content of the distributed information. The individual’s stated information needs are represented in a filter profile consisting of keywords. Distributed information is thus matched against this list of keywords in order to establish the relevance according to the needs of the individual.

Keywords are often of the type: date, sender, title words, message type, and specific topic describing words from the content of the text. These keywords may be combined into complex patterns of selection criteria. The techniques used here are mainly to be found in the field of information retrieval (Belkin and Croft 1992).

However, further research needs to be done here, since these techniques do not usually involve any actual text understanding but merely a matching of strings that takes no account of the semantics of the words (Ram 1992).
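The matching of a message against a keyword profile can be sketched as follows. This is only an illustration under stated assumptions: the dictionary representation of a message, the profile layout, and the function name matches_profile are hypothetical and are not taken from Malone et al. (1987).

```python
def matches_profile(message, profile):
    """Return True if, for every field named in the profile, at least one
    of the profile's keywords occurs in that field of the message.
    The comparison is a case-insensitive substring match (an assumption)."""
    for field, keywords in profile.items():
        text = message.get(field, "").lower()
        if not any(keyword.lower() in text for keyword in keywords):
            return False
    return True


# Hypothetical message and filter profile, for illustration only.
message = {
    "sender": "anna@example.org",
    "title": "Budget meeting on Friday",
    "body": "Please bring the revised budget figures.",
}
profile = {"title": ["budget", "forecast"], "sender": ["example.org"]}

relevant = matches_profile(message, profile)
```

Note that the sketch compares character strings only: a message about "finances" would not match the keyword "budget", which is the limitation pointed out above.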

Social filtering

Social filtering is based upon the relations that exist between individuals in an organization or group. The relation between the distributor and receiver often dictates how the receiver will classify the information. If the distributor of the information is the manager of another department the information may be classified as less interesting and non-relevant. However, should the information come from the closest boss it might be selected and processed immediately because the individual sees his boss every day and knows that he requires an instant reply.

Goldberg et al. (1992) have designed a collaborative filtering system, Tapestry, which filters messages based on the number of responses received. The system assumes that messages which receive multiple responses from eager users are likely to be useful or interesting to casual users of the system. It is in effect a variant of social filtering, in which one can base the decision whether to read a specific message on who has read what.
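The idea of selecting messages by the number of responses they have received can be sketched roughly as below. This is not Tapestry's actual mechanism; the message representation, the field name in_reply_to, and the default threshold of two replies are assumptions made for the example.

```python
from collections import Counter


def rank_by_responses(messages, min_replies=2):
    """Return the ids of messages that received at least min_replies
    replies, ordered from most to least replied-to."""
    reply_counts = Counter(
        m["in_reply_to"] for m in messages if m.get("in_reply_to")
    )
    return [
        msg_id
        for msg_id, count in reply_counts.most_common()
        if count >= min_replies
    ]


# Hypothetical message list: m2 and m3 reply to m1, m5 replies to m4.
messages = [
    {"id": "m1", "in_reply_to": None},
    {"id": "m2", "in_reply_to": "m1"},
    {"id": "m3", "in_reply_to": "m1"},
    {"id": "m4", "in_reply_to": None},
    {"id": "m5", "in_reply_to": "m4"},
]
popular = rank_by_responses(messages)
```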

Economic filtering

Receivers of information often try to estimate the effort it would take to read the messages. If a message is too long the receiver may ignore it because it would take too much of his time to process the information. Here the length or amount of information becomes a criterion upon which the information is classified.

Another aspect of economic filtering concerns the distributor of information. Usage of information channels is expensive; however, the broader the information is, i.e. the more receivers it addresses, the cheaper it becomes to formulate and distribute. The best message to a receiver is one which is specially formulated to his needs, but this increases the distributor’s cost of formulating the information. Therefore much of the information distributed through common channels will be of general character and have little relevance to most of the receivers’ needs (Malone et al. 1987).

What is the background that has led to the formation of the concept of information overload, tied to the process of information filtering? To answer this we have to look at the work of others.

Huber (1984) points out that in the post-industrial society organizations are believed to face a different world compared to that of today. He characterizes the post-industrial society as experiencing greater levels of knowledge, complexity, and turbulence, each of which will be increasing at a considerable rate.

The amount of available information will grow and its absolute growth will increase. The increase of knowledge will lead to large increases in technological, economical, and social specialization and diversity. The high diversity and specialization will lead to large increases in societal interdependencies and thus escalate the level of complexity and its absolute growth.

The increase of turbulence follows from the rapidity of events and increasing knowledge, causing many technologies to be more effective and shorten the duration of the events, thus permitting more events per time unit (Huber 1984).

The effects of advanced communication technologies, such as e-mail systems, voice mail systems, radio-phones, etc., and of computing technologies, mainly used for storing, retrieving, and processing information to derive new information, will be that they become more available to individuals, increase the efficiency of communication and make it more timely, open up new sources of information which originally were external to the organization, and keep the information more up-to-date.

Decision making in post-industrial organizations will be more frequent, faster, and more complex, thereby increasing the decision-task load. This will put high demands on the acquisition and distribution of the available information. Organizations will need, more than before, to scan the environment for information about the existence of problems and opportunities or for information to be used in the future, and to probe the environment for information not routinely gathered, but also to guard themselves against information overload, as the amount of information available to the individual decision maker will increase (Huber 1984).

Denning (1982) relates the problem to the implications of automatic document preparation systems and electronic mail, and to the quantity of information being received by end users, stating the following:

“The visibility of personal computers, individual workstations, and local area networks has focused most of the attention on generating information - the process of producing documents and disseminating them. It is now time to focus more attention on receiving information - the process of controlling and filtering that which reaches the persons who must use it” (Denning 1982, pp. 163-165).

The growth of use of electronic mail systems

Why is it interesting to study electronic mail systems in this context? To answer that question we have to look at why people communicate within organizations, why they use e-mail systems, what these systems can accomplish, and what effects they have.

Organizations use communication activities to make sense of their environments, to coordinate and control internal activities, and to make decisions. Mintzberg (1973) found that communication activities account for a significant portion of the time of white-collar workers and that top executives spent 75% of their time in communication activities.

Even though these findings are some twenty years old, the nature of the communication activities of white-collar workers and top management has most certainly not changed towards a lower degree of communication.

The focus on e-mail systems is motivated by their increased use in the workplace. As Rudy (1996) points out, during the last twenty years the use of these kinds of systems has increased markedly, and the growth has been driven mainly by two factors: the increase in the number of desktop computers and the increase in the number of data transmission links able in principle to connect them together. These two factors, as Rudy (1996) further points out, have allowed the development of communication protocols, and of software conforming to these protocols, that let individuals exchange messages and access huge amounts of data far more easily than in the past. As a result, people’s information processing shifts to take place within such systems.

E-mail systems thus increase the communication efficiency and productivity of the organizations and individuals who use them, as they allow individuals to span both time and distance. They also reduce the need for, and the difficulty of, moving communication from one medium to another (Culnan and Markus 1989). These are just some of the reasons why people and organizations use systems such as e-mail.

Rudy (1996) also points out other major features of e-mail that are quite obvious but also explain its popularity: discussion can occur asynchronously, i.e. sender and receiver do not have to be present simultaneously; data can be shared more easily, as individuals can be sent data more rapidly than via ordinary postal mail, and messages or data in electronic form can be copied easily to many individuals; individuals have time to think before replying to a message; and it is non-intrusive compared with face-to-face conversations or the telephone, since the receiver of messages can choose when to read them. She then points out some of the shortcomings, such as: a possible drop in face-to-face contact and in the building of relationships and consensus; possible information overload; and the extra time required to type in and read messages compared to face-to-face meetings and telephone conversations.

One of the benefits of e-mail systems is that they tend to increase the communication activities of individuals, thus allowing a greater throughput of information and message exchange. The information artifacts we create and capture in a certain medium cannot, without great difficulty, be transformed into a representation in another medium. Thus, what is created in one medium tends to stay there. Information artifacts created in an electronic communication environment tend to accumulate and persist in that environment. The most fundamental problem of information organization here is to cope with the overload of information and bring order to the high load. This can be done by sorting information into categories which relate to different aspects of the processes that people engage in; further, through complex searching mechanisms it is possible to narrow in on information which is relevant to the matter at hand.

The information overload within e-mail applications is most often caused by subscriptions to different mailing lists or by the mass spreading of information. Loose (1989) has developed a formal model of the decision process used in deciding whether or not to examine a message in an organizational setting. The model predicts the usefulness of a message based on the available message features and may be used to rank incoming messages or electronic mail by expected importance or economic worth, and thus to come to terms with problems of information overload.

Electronic mail messages and filtering

In the context of electronic mail and filtering a message is defined “as information made available to an organization or individual who may, in turn, choose to examine, ignore, memorize, or discard the information. Message may be in any form or transmitted through any medium; the only restriction we will place upon a message is that it have feature value which are available to facilitate a decision by the recipient to examine or not examine the message’s full text” (Loose 1989, p. 180).

Electronic mail (e-mail) is defined as “the creation, editing, receiving, storage, and printing of text--facilitated by the computer” (Rice and Bair, 1984, p. 191). It is in the function of receiving text that the concept of information filtering is applied. Text messages distributed and received within an e-mail system have inherent features that facilitate filtering.

E-mail messages are formatted according to RFC 822 (request for comments) (Crocker 1982).

A mail message consists of lines of ASCII text. European characters not found within the standard ASCII definition are coded with a special character combination before transmission and decoded when received. The first few lines are called the header and have a defined format, i.e. they are quite structured. The lines after the header are called the body and are not specified in any way, i.e. they contain unstructured text. In fact, in some cases a message may not even have a body. The header and body are separated by an empty line, i.e. a line with no characters on it. Each header field starts with a field name followed by a colon and the field body. The contents of the field body may be rigidly defined or, in some cases, in free unstructured form.

The following field names are mandatory, i.e. there must be a header field with each of these names: Date; From, or Sender and From; and To, Cc (carbon copy), or Bcc (blind carbon copy). It is possible to have several of each. All the recipients from the To and Cc fields are listed in the message, and everyone who receives it knows who else got a copy. Recipients from Bcc fields are not listed in the message, and consequently no one else knows that they got a copy. The following field names are optional: Return-Path; Received; Reply-To; Message-ID; In-Reply-To; References; Keywords; Subject; Comments; and Encrypted. All of these field names require a rigidly defined structure of the contents of the field body except Keywords, Subject, and Comments, which may be composed of free unstructured text.

Each field may stretch over several lines if required by lengthy addresses or subject descriptions. In effect, each header field may be stretched over any number of lines as long as the structure of the header is not compromised.
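The message format described above can be illustrated with a short Python sketch that uses the email package of the Python standard library, which handles the later revisions of the RFC 822 format. The addresses, the date, and the message text are made up for the example.

```python
from email import message_from_string

# A made-up message: header fields, an empty line, then the body.
# The Subject field is folded: its continuation line starts with a space.
raw = (
    "Date: Tue, 1 Apr 1997 09:30:00 +0200\n"
    "From: sender@example.org\n"
    "To: first@example.org, second@example.org\n"
    "Cc: third@example.org\n"
    "Subject: A subject line that is long enough\n"
    " to be folded onto a second line\n"
    "\n"
    "This is the unstructured body text.\n"
)

msg = message_from_string(raw)

sender = msg["From"]
# A field name may occur several times, so get_all returns a list.
recipients = msg.get_all("To", []) + msg.get_all("Cc", [])
# Unfold the subject by collapsing the line break and leading space.
subject = " ".join(msg["Subject"].split())
body = msg.get_payload()
```

The header fields are retrieved by name, while the body is returned as one block of free text, which mirrors the structured/unstructured distinction made above.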

Messages may be transmitted through SMTP (Simple Mail Transfer Protocol) as described by RFC 821 (Postel 1982) and may be received through POP3 (Post Office Protocol version 3, for remote post office boxes) as described by RFC 1939 (Myers and Rose 1996).

Both of these protocols define how long a line of text may be, since the line is the smallest unit being transmitted or received, but from a user’s point of view that is not an obvious concern. All he has to do is to comply with the message format rules as described above.

The structure of the header fields makes them easy to filter on. Highly structured fields make it easy to implement filtering based on exact string matching of the contents of the header. The body, being in free text form, is much harder to deal with: exact string matching is possible, but since different words may be used to describe the same phenomenon, no exact match can be expected between the message and the stated filtering criteria. The lexical features of words say nothing about the actual content of the message, and most filtering techniques do not involve any semantic analysis of the content. In spite of this, Belkin and Croft (1992) have shown that filtering based upon information retrieval techniques, such as those described by Salton (1989) and Eberhagen (1993), involving nothing more than lexical matching, has proven quite sufficient.

Most modern e-mail applications support some functionality for constructing filters for messages. Most of them base their filtering upon the well-structured header of the message. How much filtering is implemented varies. Most seem to allow for filtering according to date, sender, and subject, either alone or in combination. Some e-mail applications allow the coupling of a filtering criterion to a specific action, such as moving or copying the mail, i.e. routing it to a specific folder, notifying the user of a specific incoming message, or even triggering an automatic reply. Some applications even use the size of the message as a filtering criterion. The variation here is great, but at a minimum most seem to have implemented some form of routing of messages according to the contents of one field, or several fields in combination, in the header.

Some of the work done within this area of filtering electronic messages can be found in Malone et al. (1987). They found that most of the information which is distributed within organizations is semi-structured, i.e. keywords such as date, sender, message type, and so forth, are to be found in specific recurring places within the messages. This semi-structure is important for how the receiver filters the information. It is upon the occurrence of these keywords that most of the cognitive filtering is based. Malone et al. (1987) define semi-structured information as messages of identifiable types, where each type has a clearly defined set of templates, but where some of the templates can include unstructured text or other sorts of information.

In this approach both the distributed information and the individual’s filter profile are structured according to these templates. Not all of the templates need to be filled in. The templates of the structured information are matched against the templates of the individual’s filter profile. If the templates, consisting of structured information, of both the message and filter profile match, the information is classified as relevant.
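A minimal sketch of such template matching is given below. The field names, the use of None for a template that is not filled in, and the function name template_matches are assumptions made for the illustration and do not come from Malone et al. (1987).

```python
def template_matches(message_template, profile_template):
    """Return True if every filled-in field of the profile equals the
    corresponding field of the message. A profile field set to None is
    treated as not filled in and therefore matches anything."""
    for field, wanted in profile_template.items():
        if wanted is None:
            continue
        if message_template.get(field) != wanted:
            return False
    return True


# Hypothetical semi-structured message: three structured fields and
# one free-text field that takes no part in the match.
message_template = {
    "type": "meeting announcement",
    "sender": "anna@example.org",
    "date": "1997-04-14",
    "text": "Agenda will follow.",
}
profile_template = {
    "type": "meeting announcement",
    "sender": None,
    "date": None,
}

relevant = template_matches(message_template, profile_template)
```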

Malone et al. (1987) advocate the usage of semi-structured templates in communication and point out that:

- semi-structured templates enable computers to automatically process a wider range of information and eliminate the need for natural language analysis in order to classify the messages, while still allowing advanced processing of the information.

- semi-structured templates allow individuals to communicate unstructured information as well as structured information in the same message, thus making the communication more flexible and user friendly.

- most of the processing of information within organizations is done in a structured form, and usage of semi-structured templates helps to reflect this processing.

Malone et al. (1987) have also developed a system based on cognitive filtering of semi-structured messages, the Information Lens system. It has at its basis an e-mail system for communication and supports its users in formulating semi-structured messages based on semi-structured templates, as described previously (Malone et al. 1987). The system supports the formulation of information filtering rules or criteria, which are formulated as if-then clauses and describe what should be done with the messages once they are selected, e.g. that they be put in a folder or discarded. The match between the filtering rules and the messages is made possible by the user’s specification of what the semi-structured fields should contain in order for the messages to be of value. Several rules can be formulated so as to deal with the different types of messages that are addressed to the individual. The system here comes to function as an intelligent secretary or assistant.

However, the Information Lens system does not only let the user filter messages addressed to the individual; it can also include other messages that move through the main flow and are not specifically directed toward a recipient. Malone et al. (1987) point out that most of the information circulating within an organization is of general character and only a small part is specifically addressed to one individual or a group of individuals. Most of this general information, going down the mainstream, is not of relevance to the individual. In order to make it possible to select relevant information that has no specific addressee, the Information Lens system lets users formulate general messages without a specific recipient. These are then placed in a general public mailbox, and each individual is able to formulate filtering rules aimed at this public mailbox. When these filtering rules are executed the mail server is forced to redistribute the messages within the general public mailbox to all users so that their filtering rules may be matched against them.

Eudora Pro™ 3.0, an e-mail application exemplified

In this section I will present an e-mail application, Eudora Pro™ 3.0 by Qualcomm Incorporated, in order to show an example of what can be done. Eudora Pro™ 3.0 is a modern e-mail application which has received high recommendations in evaluations such as CNET’s (CNET 1996 and 1995). It uses SMTP to transmit messages and POP3 to receive them.

It allows for several user-defined mailboxes besides the standard in- and outbox. It also allows for a hierarchical organization of its user-defined mailboxes, which are shown in a graphical tree representation. Several windows can be opened simultaneously to show the content of different mailboxes, and several windows can likewise be opened to show the content of full messages. The moving of messages between different mailboxes is greatly simplified through drag-and-drop. The sorting of the messages within the mailboxes is fairly complex and based on different criteria according to the fields of the header. The strength of Eudora Pro™ 3.0 seems to lie in its filtering function, which also bases its criteria on the items in the header, alone or in combination, but also on the words in the message body. Complex rules can be constructed to route different messages, and these can be coupled with many of the basic functions of the e-mail application, e.g. the moving of messages, posting automatic replies, and alerting the user to mail from a specific sender; these will be discussed later. Eudora Pro™ 3.0 also allows for the inclusion of several attachments with a message. The e-mail application also has the capability of showing embedded hotlinks such as URL links. It has a built-in spell checking capability that may save you from embarrassing spelling mistakes. The application allows for several address books, the use of nicknames, and the creation of mailing lists.

Drawbacks are the lack of secure transmission of messages and the lack of a three-pane view, which would simplify the overview greatly and enhance browsing. The possible clutter of many separate open windows severely limits this overview.

When it comes to the filtering capabilities, or rather the routing of messages, which is what actually takes place, Eudora Pro™ 3.0 offers the possibility of creating complex filtering rules, as we shall see in the following section.

Filtering within Eudora Pro™ 3.0

Many of the e-mail management functions in Eudora Pro™ 3.0 can be done automatically using filters. For example, you can automatically reply to a request for information, transfer all the messages from your friends into a Personal mailbox, and label all the messages from your colleagues as “Noteworthy”.

How does one go about it, and what can be done? In order to create a filter you have to open the Filters window by selecting Filters from the Tools menu. The Filters window is displayed, and any filters you have created are listed on the left, as shown in figure 1.

To create or modify a filter in Eudora Pro™ 3.0, first click on the New button or select an existing filter. Secondly, select the options for how you want the filter to be used: as an automatic filter to be invoked on any incoming and/or outgoing mail, or as a manual filter that can be invoked when you select the Filter Messages action from the Special menu, in which case any messages that are still in the inbox are filtered. Any combination of these options works fine.

Thirdly, define the criteria for the filter: use the header item popups and the text fields to specify which header items should include a particular string of text. You may use the conjunction popup to define two related terms for the criteria so that your filter is as specific as possible. Eudora allows for two different filtering criteria to be used in conjunction with each other. Lastly, define the action to be taken on messages that fit the criteria and save the filters.

Figure 1. The filtering dialog of Eudora Pro™ 3.0

In the Filters window, as shown in figure 1, the Headers item specifies which section of the message the filter will search. You can select an option from the popup menu or enter one yourself if you prefer to use one which does not appear on the menu. The “«Any Recipient»” option searches all possible recipient items (To, Cc, Bcc), the “«Any Header»” option searches all message headers (including hidden headers that are shown with the “Blah Blah Blah” option), and the “«Body»” option allows for searches within the message body.

The next item in the filters window specifies the type of match the filter shall perform when comparing the character string to the specified header. Here several options are found:

- contains or does not contain - if the specified header item contains or does not contain a specified text string then the message is filtered.

- is or is not - if the specified header item is or is not a complete match of the text string the message is filtered.

- starts with or ends with - if the specified header item starts with or ends with the text string, the message is filtered. Any whitespace at the beginning is ignored in the comparison.

- appears or does not appear - if the header item appears or does not appear in the message then the message is filtered. Here the text field is totally ignored, which is useful for filtering messages based only on the types of fields they contain.

- intersects nickname - if the text string is included in a nickname, whether it is a full address or a nickname within the nickname, then the message is filtered.
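The match types listed above can be sketched in Python as follows. This is an illustration only and not Eudora’s implementation: the function name header_matches, the dictionary representation of the header, and the case-insensitive comparison are assumptions. The “intersects nickname” type is left out since it requires an address book.

```python
def header_matches(headers, name, match_type, text=""):
    """Evaluate one match type against one header item.
    headers maps field names to field bodies. Leading and trailing
    whitespace is ignored and case is folded (both are assumptions)."""
    present = name in headers
    value = headers.get(name, "").strip().lower()
    text = text.strip().lower()
    if match_type == "appears":
        return present              # the text field is ignored
    if match_type == "does not appear":
        return not present
    if match_type == "contains":
        return text in value
    if match_type == "does not contain":
        return text not in value
    if match_type == "is":
        return value == text
    if match_type == "is not":
        return value != text
    if match_type == "starts with":
        return value.startswith(text)
    if match_type == "ends with":
        return value.endswith(text)
    raise ValueError(f"unknown match type: {match_type}")


# Hypothetical header of an incoming message.
headers = {"From": "Anna <anna@example.org>", "Subject": "  Budget meeting"}

from_anna = header_matches(headers, "From", "contains", "anna@")
has_cc = header_matches(headers, "Cc", "appears")
```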

The next item in the Filters window contains the character string that the filter searches for. This could be any type of text, but the more complex the search string is, the less likely it is that the filter generates an exact match. So keep the matching text as specific and brief as possible.

The two filtering criteria can be used in conjunction with each other, and a special popup in the Filters window allows us to specify how the second filtering criterion will be used. The following rules for conjunction are applicable:

- ignore - ignore the second term but if the message matches the first term then the message is filtered.

- and - if the message matches both the first and the second terms the message is filtered.

- or - if either filtering criterion is matched the message is filtered.

- unless - if the message matches the first term but not the second it is filtered; if it matches both the first and second terms no filtering occurs. This lets you exclude certain variations of the first filtering criterion.
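The four conjunction rules can be summarized as a small truth function. The sketch below is an interpretation of the rules as described above, not code from Eudora; in particular, “unless” is read as “the first term matches and the second does not”.

```python
def combine(first, second, conjunction):
    """Combine the results of the two criteria (booleans) according to
    the conjunction popup. Returns True if the message is filtered."""
    if conjunction == "ignore":
        return first
    if conjunction == "and":
        return first and second
    if conjunction == "or":
        return first or second
    if conjunction == "unless":
        # Assumed reading: first term matches, second term does not.
        return first and not second
    raise ValueError(f"unknown conjunction: {conjunction}")


# Example: mail from a mailing list (first term matches), unless the
# subject marks it as urgent (second term also matches).
filtered = combine(True, True, "unless")
```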

The next section of the filtering dialog and the fourth step in the creation of a filter is the selection of what actions should be taken once a message has been selected. There is the possibility of creating up to five different responses, all independent of each other:

- none - specifies that no action is wanted.

- Make Status - assigns the selected status to messages. Select a status from the popup menu.

- Make Priority - assigns the selected priority to messages. Select a priority from the popup menu.

- Make Label - assigns the selected label to messages. Select a label from the popup menu.

- Make Subject - assigns the subject to messages. Enter a subject in the text field. The new subject is given in the message summary in the mailbox window, but the subject is not changed in the message itself.

- Play Sound - plays the selected sound when messages are received.

- Open - opens the mailbox and/or message when a message is received. If you set a previous action to filter messages into a mailbox, then that mailbox is opened.

- Print - prints one copy of each message.

- Notify User - notifies you “As normal” and/or “In report” when messages are received. The “As normal” option notifies you based on the options you have selected in the “Getting Attention Options”. The “In report” option notifies you by displaying a filter report that details what filter actions have been taken.

- Notify Application - notifies the selected application when messages are received, and provides information from the message. Here you must specify the application to use and the part of the message to be included. Use the Browse button to select an application, or enter the command line yourself. See the Eudora Pro™ 3.0 manual for more details concerning the format of the command line.

- Forward To - forwards messages to the e-mail address. Enter an e-mail address in the text field. Forwarded messages are placed in the queue in the out mailbox, and sent the next time you send queued messages.

- Redirect To - redirects messages to the e-mail address. Enter an e-mail address in the text field. Redirected messages are placed in the queue in the out mailbox, and sent the next time you send queued messages.

- Reply With - replies to messages with the selected stationery file. Select a stationery file from the popup menu. Replies are placed in the queue in the out mailbox, and sent the next time you send queued messages.

- Server Options - assigns the selected server actions to messages. The Fetch option specifies that the entire message be transferred from the server the next time mail is checked, and the Delete option specifies that the message be deleted from the server the next time mail is checked. If you select both, the message will be transferred and deleted.

- Copy To - copies messages to the selected mailbox. Select a mailbox from the popup menu.

- Transfer To - transfers messages to the selected mailbox. Select a mailbox from the popup menu.

- Skip Rest - stops filtering for the message and the message is not matched to the rest of the filters in the list.

In Eudora Pro™ 3.0 when the filters are invoked (automatically or manually), each message is matched against each filter in order from top to bottom. If the message meets a filter’s criteria, the actions are done as specified until there are no more actions, and then the message is matched against the next filter. If at any point a “Skip rest” action is done, nothing else is done with that message, and the next message is filtered.
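The order of processing described above can be sketched as a pair of nested loops. The representation of a filter as a dictionary with a criteria function and a list of action names is a hypothetical simplification; the actions are only logged, not carried out.

```python
def run_filters(messages, filters):
    """Match each message against each filter from top to bottom and
    return a log of (message id, action) pairs. A "Skip Rest" action
    stops all further processing of the current message."""
    log = []
    for message in messages:
        for flt in filters:
            if not flt["criteria"](message):
                continue
            skip_rest = False
            for action in flt["actions"]:
                if action == "Skip Rest":
                    skip_rest = True
                    break
                log.append((message["id"], action))
            if skip_rest:
                break
    return log


# Two hypothetical filters: route mailing list traffic and stop,
# then label everything that is still being filtered.
filters = [
    {
        "criteria": lambda m: "list" in m["subject"].lower(),
        "actions": ["Transfer To: Lists", "Skip Rest"],
    },
    {
        "criteria": lambda m: True,
        "actions": ["Make Label: Noteworthy"],
    },
]
messages = [
    {"id": 1, "subject": "[List] weekly digest"},
    {"id": 2, "subject": "Lunch?"},
]

log = run_filters(messages, filters)
```

Because of the “Skip Rest” action, the first message is only transferred, while the second message falls through to the second filter and is labeled.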

As has been shown, fairly complex filtering rules coupled with different actions can be formulated. These may seem enough for any casual user, and even a power user may be pleased. However, we must remember that the filtering is still based upon lexical character matching of strings and as such has inherited a lot of limitations. Some of those limitations will be discussed in the following section.

Limits of e-mail applications

There are two major limits that stand out when looking at any e-mail application. The first has to do with the technical aspects and the second with the human/cognitive aspects of filtering.

As we have seen, filtering is nothing more than creating complex string matching rules. Although they can be intricately combined with different logical clauses governing how and where the matching should take place, it is still only lexical pattern matching. As such, to be effective, there must be an exact match between the filter and the messages in order for them to be selected. This severely hampers lengthy filtering criteria, since they become all too narrow to be effective. The recall of such filtering criteria would be too low to be useful. On the other hand, broad filtering criteria, consisting of only one word, most surely invite the problem of overload. No fuzzy matching is allowed, where a threshold could be set so that whatever reaches a given amount of overlap is considered relevant. There would seem to be almost no problem in formulating a filter and achieving a fair level of efficiency if the filtering were based only upon the rigidly structured fields of the header.

However, when it comes to header fields and the body of the message, which can contain unstructured free text, the efficiency drops. To be able to get anything at all one is forced to specify quite broad filter criteria. The problem seems to lie in the technical incapability of handling free text search and matching as described by Salton (1989). Is this a real problem or is it just technicalities? Thankfully it is just technicalities. Most e-mail applications have not implemented the functionality of free text matching according to some threshold. Why, one may ask? Maybe to keep them simple and usable for the great number of casual users. It is most likely the power users of these systems that would require these capabilities. The casual users do not seem to need advanced capabilities, and it is therefore seen as wasteful to implement them.
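To make the idea of matching "according to some threshold" concrete, the following is a minimal sketch in Python. It uses plain term overlap as the measure, which is an assumption chosen for brevity and not a technique taken from Salton (1989) or from any e-mail client; the function names and the default threshold of 0.5 are likewise hypothetical.

```python
import re


def terms(text: str) -> set:
    """Split text into a set of lower-case word terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(criteria: str, message: str) -> float:
    """Fraction of the criteria terms that also occur in the message."""
    wanted = terms(criteria)
    if not wanted:
        return 0.0
    return len(wanted & terms(message)) / len(wanted)


def is_relevant(criteria: str, message: str, threshold: float = 0.5) -> bool:
    """Select the message when the overlap reaches the threshold."""
    return overlap(criteria, message) >= threshold


criteria = "course report deadline assignment"
message = "Please send the report before the deadline"
```

Here an exact string match of the whole criteria against the message fails, whereas the overlap measure finds two of the four criteria terms and, with the assumed threshold, selects the message. Lowering or raising the threshold trades overload against recall.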

The second limitation, the one of human or cognitive character, is probably the more severe. This limit stems from the fact that we as humans use different words to describe the same phenomenon. This is not only a matter of the sender and the receiver being situated in different contexts when the message is created; it is experienced even within the same context area, i.e. where both sender and receiver are preoccupied with the same type of work, schooling and so forth and basically speak the same language.

This limitation even hampers the solution to the first limitation, the one based upon technical capabilities. The grammar we use is too context free, and the variation in forms of expression and in words is practically unlimited. Words that lexically look the same, but that take on different meanings depending on the content as a whole, make effective string matching quite tough.

Anyone who has used one of the many search engines available on the internet to search for something specific will most often be surprised at how much is selected that is of little interest. Some documents even turn out to be in a foreign language, which indicates that a string match is nothing more than a pattern match. We even use different words to express the same thing; jargon differs and even modes of communicating differ. Some write lengthy documents, richly describing from many angles and perspectives that which is to be conveyed, while others, of the more silent character, promptly cut the story short in a few words. This also severely hampers the filtering. Lengthy documents tend to trigger every filter, whereas short ones trigger none. It all lies in human nature and our cognitive capabilities.

Efficient message filtering

While the technical limitations can be overcome with more money and time, the cognitive aspects of the same problem are not so easily overcome. Once solved, however, they would greatly enhance the usefulness of existing technical capabilities and make those of today adequate.

The best conditions for efficient filtering are those where the sender and receiver use the same language for communication, that is, where there is a one-to-one correspondence between the sender's use of language to describe a phenomenon and the receiver's language. How then can this be achieved? The only way, it seems, to ensure that such conditions for efficient filtering are met is through administrative routines and measures.

As an example of administrative measures I here mention what two of my colleagues did in order to ensure that filtering conditions as efficient as possible were achieved. They held a university course with students not living in the area and thus utilized e-mail communication to distribute reports and other course assignments which needed to be commented upon and redistributed. Both teachers were heavily loaded with other incoming mail not belonging to this course and consequently their inboxes got cluttered. Filtering here became a must in order to efficiently manage the course and guarantee that the students' reports were correctly received in time, and not discovered at a later stage by an archaeological expedition digging into the sediments to find treasures of nostalgia. However, having experienced string matching in filtering, they soon discovered all the previously discussed drawbacks. This forced them to create a policy for communication. They told all their students that they must include, in the subject heading, a specific word relating to the course. This word served as a marker for the course, and the filters made to select messages pertaining to that course were based upon the occurrence of this specific word. Without it present, the messages that should belong to the course were not selected and were thus buried deep in the sediments of the inbox.

This example is a most trivial one but it greatly enhanced the filtering functionality. What this example shows is that they had here developed, although on a small scale, some form of protocol for efficient communication. The protocol just consisted of a keyword that was to be included in the subject heading of all messages concerning the course, thus ensuring that the students' messages were correctly selected and dumped in a special mailbox. This is a small example of what can be done in order to achieve as great efficiency as possible. The example shows the strength in formulating protocols, dictating the form the messages should have, that all those involved in the communication act must adhere to in order to achieve efficient filtering. Here it shows that much can be done when it comes to supportive methods. Both methods and support for methods for developing protocols and upholding them seem to be critical in order to force the sender and receiver into a context where the usage of vocabulary is a common basis for understanding. This could also be coupled with the use of a thesaurus where groups of words and phrases are substituted for ones which are defined in the protocol.
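A keyword protocol of this kind, coupled with a small thesaurus, could look roughly as follows in Python. The marker `IMGCOURSE` and the thesaurus entries are invented for the illustration; the actual word used by the teachers is not given here, and the substitution scheme is only one assumed way of realizing the thesaurus idea.

```python
import re

COURSE_MARKER = "IMGCOURSE"  # hypothetical keyword agreed upon in the protocol

# Hypothetical thesaurus: variants senders might write -> the protocol's keyword.
THESAURUS = {
    "img course": COURSE_MARKER,
    "img-course": COURSE_MARKER,
    "filtering course": COURSE_MARKER,
}


def normalize_subject(subject: str) -> str:
    """Replace known variants in the subject by the keyword defined in the protocol."""
    result = subject
    for variant, canonical in THESAURUS.items():
        result = re.sub(re.escape(variant), canonical, result, flags=re.IGNORECASE)
    return result


def belongs_to_course(subject: str) -> bool:
    """The filter criterion: the (normalized) subject contains the course keyword."""
    return COURSE_MARKER.lower() in normalize_subject(subject).lower()
```

A subject such as "Report for the IMG course" is then selected even though the sender did not follow the protocol to the letter, while a subject without any known variant is still missed. The thesaurus therefore softens the protocol but does not replace the discipline it requires.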

Another administrative measure which, besides the formulation of protocols, may solve the limitations described above is the construction, usage and managing of mailing lists. Mailing lists on mail servers can be devoted to different topics, such as for example a course as in the case above, and receivers may subscribe to them. Senders post their messages on the mailing list belonging to the specific topic or content of the message. This is a low cost mass distribution service and also ensures that filters based upon messages coming from a specific mailing list function near perfection. All course administration could be run by means of a mailing list. Everyone attending the course thus subscribes to the specific mailing list and can then dump everything coming from that mailing list into a specially dedicated mailbox.
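A filter of this kind can be sketched with Python's standard `email` package. The list address and the choice of the To, Cc and Sender fields are assumptions for the example; which header actually carries the list address depends on the list software in use.

```python
from email import message_from_string
from email.utils import getaddresses

LIST_ADDRESS = "course-list@example.org"  # hypothetical mailing list address


def is_from_list(raw_message: str, list_address: str = LIST_ADDRESS) -> bool:
    """True if the list address appears in the To, Cc or Sender header."""
    msg = message_from_string(raw_message)
    fields = (
        msg.get_all("To", [])
        + msg.get_all("Cc", [])
        + msg.get_all("Sender", [])
    )
    # getaddresses parses the header values into (real name, address) pairs.
    addresses = {addr.lower() for _, addr in getaddresses(fields)}
    return list_address.lower() in addresses
```

Because the criterion rests on a structured header field and not on free text, the match is exact by construction, which is why such filters can work so reliably.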

Another possibility is to create and maintain mailing lists which are assembled from the address book of the e-mail application and kept in it. They are used when sending a message to a number of recipients. All of the addresses of the recipients are placed in the address field automatically before the actual transmission, although this is transparent to the sender. This list can be distributed to all addressees within it so that all come to share the same mailing list, ensuring that messages that are given the mailing list as an address by any sender on the list reach everyone on the list. The addresses which are substituted for the name of the mailing list can then be used as a filtering criterion to select everything that is distributed with the help of this list.
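The substitution of addresses for the list name, and its use as a filtering criterion, can be illustrated as follows. The address book contents, the list name `course97`, and both function names are hypothetical and serve only to show the mechanism.

```python
# Hypothetical address book entry: list name -> member addresses.
ADDRESS_BOOK = {
    "course97": ["anna@example.org", "bo@example.org", "teacher@example.org"],
}


def expand_recipients(recipients, address_book=ADDRESS_BOOK):
    """Substitute member addresses for list names, keeping order and dropping duplicates."""
    expanded = []
    for recipient in recipients:
        for address in address_book.get(recipient, [recipient]):
            if address not in expanded:
                expanded.append(address)
    return expanded


def sent_via_list(to_addresses, list_name, address_book=ADDRESS_BOOK):
    """Filter criterion: every member of the list appears among the recipients."""
    members = {a.lower() for a in address_book[list_name]}
    return members <= {a.lower() for a in to_addresses}
```

The criterion only works as long as every participant holds the same copy of the list; if one address book falls out of date, messages sent with it no longer contain all members and are missed by the filter.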

A recent study involving mailing lists is that of May (1997), who performed a study of automatic classification of e-mail messages within the HUMANIST electronic discussion group. A system was developed to categorize the messages into four different classes: questions; responses; announcements; and administrative. The automatic classification of a message was based on string matches between the message text and predefined string sets for each of the message types. The system's ability to accurately classify messages automatically was then compared to manually assigned codes and found to be successful. A major reason why some messages were not given a classification was the exact string matching which the system required.
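The principle of classifying by predefined string sets can be sketched as below. This is not the system from the study: the string sets are invented for the illustration, and picking the class with the most matching strings is an assumed scoring rule.

```python
from typing import Optional

# Hypothetical string sets, one per message type.
STRING_SETS = {
    "question": ["does anyone know", "can anyone", "i am looking for"],
    "response": ["in reply to", "in response to", "you asked"],
    "announcement": ["call for papers", "we are pleased to announce", "conference"],
    "administrative": ["unsubscribe", "list owner", "digest"],
}


def classify(text: str) -> Optional[str]:
    """Return the type whose string set matches most often, or None if nothing matches."""
    lowered = text.lower()
    scores = {
        label: sum(s in lowered for s in strings)
        for label, strings in STRING_SETS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A message such as "Might somebody have a pointer to this?" is clearly a question to a human reader, yet it receives no classification because none of the predefined strings occurs in it exactly. This is the same weakness of exact string matching that was noted for the filters.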

Mailing lists have a drawback in that the messages sent on them cannot be directed to specific receivers. Whatever is distributed there is received by everyone who subscribes to the list. Filtering based upon the messages coming from a specific mailing list is no guarantee that what comes is relevant. So in effect, filtering based on a mailing list is only efficient as long as strict discipline is maintained by everyone involved in distributing messages on it.

The same critique could be raised towards protocols dictating how and in what form certain markers, words, or phrases should be included in order for the filtering to be effective. As soon as someone misuses it, the protocol is corrupted. However, I do feel that the most viable approach to establishing efficient filtering through administrative measures is the use of some sort of protocol forcing the receiver and sender into the same context, where a shared language is made possible. Here methods for developing and maintaining such protocols must be designed, as well as support for the methods.


References

Belkin, N.J., and Croft, W.B. (1992). Information Filtering and Information Retrieval: Two Sides of the Same Coin?, Communications of the ACM, Vol. 35, No. 12, December.

CNET (1996 and 1995). URL:http://www.cnet.com/.

Crocker, D.H. (1982). Standards for the format of ARPA internet text messages, RFC No. 822. University of Delaware, August.

Culnan, M.J., and Markus, M.L. (1989). Information Technologies, Handbook of organizational communication, SAGE.

Denning, P. (1982). President’s Letter on “Junk Mail”, Communications of the ACM, March.

Eberhagen, N. (1993). Information filtering. In Proceedings of Sundsvall 42: ADB i verksamhetens tjänst. Sundsvall, Sweden: Sundsvall 42 and The Swedish Information Processing Society.

Foltz, P.W., and Dumais, S.T. (1992). Personalized Information Delivery: An Analysis of Information Filtering Methods, Communications of the ACM, Vol. 35, No. 12, December.

Goldberg, D., Nichols, D., Oki, B.M., and Terry, D. (1992). Using Collaborative Filtering to Weave an Information Tapestry, Communications of the ACM, Vol. 35, No. 12, December.

Huber, G.P. (1984). The nature and design of post-industrial organizations. Management Science, Vol. 30, No. 8, August.

Loose, R.M. (1989). Minimizing information overload: the ranking of electronic messages, Journal of Information Science No. 15.

Malone, T.W., Grant, K.R., Lai, K-Y, Rao, R., and Rosenblitt, D. (1987). Semistructured Messages Are Surprisingly Useful for Computer-Supported Coordination, ACM Transactions on Office Information Systems, Vol. 5, No. 2, April.

Malone, T.W., Grant, K.R., Turbak, F.A., Brobst, S.A., and Cohen, M.D. (1987). Intelligent information-sharing systems. Communications of the ACM, Vol. 30, No 5, May.

May, A.D. (1997). Automatic Classification of E-mail Messages by Message Type, Journal of the American Society for Information Science, Vol. 48, No. 1.

Mintzberg, H. (1973). The nature of managerial work. New York: Harper & Row.

Myers, J., and Rose, M. (1996). Post Office Protocol - Version 3, Request for comments 1939.

Postel, J.B. (1982). Simple Mail Transfer Protocol, Request for comments 821.

Ram, A. (1992). Natural Language Understanding for Information-Filtering Systems, Communications of the ACM, Vol. 35, No. 12, December.

Rice, R., and Bair, J.H. (1984). New organizational media and productivity. In R. E. Rice (Ed.), The new media (pp. 185-215). Beverly Hills, CA: Sage.

Rudy, I. (1996). A critical review of research on electronic mail, European Journal of Information Systems, No. 4.

Salton, G. (1989). Automatic Text Processing; The Transformation, Analysis, and Retrieval of Information by Computer. Addison Wesley, USA.