by Stevi Deter

QUICK! What is 18 USC 1030? The answer is out there on the web. How do you find it?

You use a bookmark to get to Yahoo. You type the subject you're looking for into the field, you hit the search key . . . and come up with no matches. You try the AltaVista search option. You come up with 37,614 matches. The first twenty-five listed seem to bear no relation whatsoever to the information you're seeking.

The WWW is a huge place, chock full of data. The problem is finding information.

Search Engines: Directories and Crawlers

One of the major formats for research on the web is the directory. The weakness of this format is that the directories usually rely on reporting: if you want your pages listed in the directory, you have to tell the directory where they are and what they contain. The directories index the listings, but aren't yet equipped to search actively for material to index.

This leads to two problems. The first is getting people to list their sites. Not all content providers know that they should be listing their sites with the directories. How often have you wondered why you can't find a database that someone mentioned she saw on the web? A related issue is deciding which directories you should list with. There is a baffling array of them, each with a registration form that takes time and care to fill out accurately.

Of course, there are now services that promise to list you with all the major directories. For a company doing business on the web, this can be a service worth paying for. But what about the individual, who has made it a personal project to make Arizona waterskiing reports available to the web browsing public?

A second problem with directories is keeping up with changing URLs. You find a listing for the Ultimate Guide to Jesuit Science Fiction Writers. You click, only to discover the link doesn't work. Is the page gone? Moved with no forwarding address? Until there is a P.O. box for web pages, the plague of the broken link will continue to be visited upon us.

Another major format for finding pages is the webcrawler concept: millions of pages are indexed, and you can search for words or phrases appearing in those pages. Since there are millions of pages, you can get millions of hits, even on what you think is a very limited search string. Sometimes the same page appears multiple times in your hit list, for no apparent reason. And sometimes the page you're looking for is buried in the seventeenth page of a thousand possible hits. Who has time to search through hundreds of pages with often arcane titles, trying to find a tiny tidbit? It's faster to drive to the library.
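For the curious, the crawler-style index described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the page texts and example.com URLs are made up, the function names build_index and search are hypothetical, and no real search engine is claimed to work exactly this way. It maps each word to the set of pages containing it, then returns the pages that contain every word in the query.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"

def build_index(pages):
    """Map each lowercased word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for raw in text.lower().split():
            word = raw.strip(PUNCTUATION)
            if word:
                index[word].add(url)
    return index

def search(index, query):
    """Return the URLs containing every word in the query, sorted."""
    words = [w.strip(PUNCTUATION) for w in query.lower().split()]
    words = [w for w in words if w]
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Made-up pages, for illustration only.
pages = {
    "http://example.com/a": "Jesuit science fiction writers",
    "http://example.com/b": "Science reports, Arizona waterskiing.",
}
index = build_index(pages)
print(search(index, "science fiction"))
```

Because each word maps to a set of URLs, a page can appear only once in the results. Nothing here ranks the hits, though, which is exactly why the page you want can end up buried.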

Links to Links to Links

One helpful resource is often the links page. Someone interested in crochet has gone to the effort to make an extensive list of sites that offer information about the subject. Making a links page is far easier than actually providing informative content, however, so we have a new development: lists of links that list pages that are lists of links that list pages . . . . Some searches turn up a series of pages that recursively refer to one another. And of course the link to the one page of actual information is broken on all of them.

The Ultimate Guide to Absolutely Everything

One of the most basic problems in finding information, however, is that many content providers have no idea how to organize it. You find the nice and accurate guide to Shakespeare . . . and then you spend half an hour following multiple links and making heavy use of the back button, trying to find the one answer for which you're looking. Just as in any other publication, web pages and web sites need to focus on organization: how to present the information in an intuitive, accessible way. The concept seems to get buried under the desire to have newer, better, fancier graphics and CGI scripts.

Even the most basic helpful tools are overlooked: how many pages have you found that don't take advantage of putting a title in the header? Or put something uninformative as the title? This relates directly to the use of crawler-type search engines. Search results usually show the page title. If the page title is non-existent or uninformative, the hit might be useless.
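A content provider could check for this before publishing. The sketch below, using Python's standard html.parser module, pulls the title out of a page and flags it when it is missing or says nothing. The UNINFORMATIVE list is a hypothetical starting point, and page_title and title_is_useful are illustrative names rather than part of any existing tool.

```python
from html.parser import HTMLParser

# Hypothetical list of titles that tell a searcher nothing; extend as needed.
UNINFORMATIVE = {"", "untitled", "home page", "homepage", "index", "welcome"}

class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

def page_title(html):
    """Return the page title with whitespace collapsed, or '' if none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())

def title_is_useful(html):
    """False when the title is missing or on the uninformative list."""
    return page_title(html).lower() not in UNINFORMATIVE

print(title_is_useful("<html><head><title>Untitled</title></head></html>"))
```

A title that passes this check can still be a poor one. The test only catches the pages that would show up in a hit list with nothing to identify them.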

The Platonic Ideal

Certainly the search engines available on the web are constantly becoming more sophisticated, but they still fall short of the Platonic ideal. What we need is a reverse dictionary for web research. Enter "[random question here]?" Hit the search button, and you'll be transported to the web page with the exact answer.

In the meantime, it might be helpful to apply a more structured method to categorizing material on the web. Application of the Dewey Decimal or some similar classification system would allow a common basis for classifying web pages. Content providers could stick a meta tag in the header with the relevant number.
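To make the idea concrete, here is a rough sketch of how a search tool might read such a classification number out of a page header. It assumes the number is carried in a meta element whose name is "dewey". That name is invented for this illustration and is not an established convention, and the sample page is made up. The number 822.33 is the Dewey class for Shakespeare.

```python
from html.parser import HTMLParser

# Assumption: "dewey" is a made-up meta name used only for this sketch.
META_NAME = "dewey"

class ClassificationParser(HTMLParser):
    """Records the content of the first <meta name="dewey"> element."""

    def __init__(self):
        super().__init__()
        self.number = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.number is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == META_NAME:
            self.number = attributes.get("content")

def classification(html):
    """Return the page's classification number as a string, or None."""
    parser = ClassificationParser()
    parser.feed(html)
    parser.close()
    return parser.number

# Made-up page for illustration.
page = """<html><head>
<title>A Guide to Shakespeare</title>
<meta name="dewey" content="822.33">
</head><body>...</body></html>"""

print(classification(page))
```

A directory or crawler could then group pages by number instead of guessing at their subject from the words they happen to contain.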

Of course, that would require someone to establish and maintain a series of web pages breaking down the system. And the rest of us would have to be able to find it.

