QUICK! What is 18 USC 1030? The answer is out there on the web. How do you
find it?
You use a bookmark to get to Yahoo. You
type the subject you're looking for into the field, you hit the search key
. . . and come up with no matches. You try the Altavista search
option. You come up with 37,614 matches. The first twenty-five listed seem
to bear no relation whatsoever to the information you're seeking.
The WWW is a huge place, chock full of data. The problem is finding
information.
Search Engines: Directories and Crawlers
One of the major formats for research on the web is the directory. The
weakness of this format is that directories usually rely on self-reporting: if you want your pages
listed in the directory, you have to tell the directory where they are and
what they contain. The directories index the listings, but aren't yet
equipped to search actively for material to index.
This leads to two problems. The first is getting people to list their
sites. Not all content providers know that they should be listing their
sites with the directories. How often have you wondered why you can't find
a database that someone mentioned she saw on the web? A related issue is
deciding which directories you should list with. There is a baffling array of them, each with
registration forms that take time and care to fill out accurately.
Of course, there are now services that promise to list you with all the
major directories. For a company doing business on the web, this can be a
service worth paying for. But what about the individual who has made it a
personal project to make Arizona waterskiing reports available to the
web-browsing public?
A second problem with directories is keeping up with changing URLs. You
find a listing for the Ultimate Guide to Jesuit Science Fiction Writers.
You click, only to discover the link doesn't work. Is the page gone? Moved
with no forwarding address? Until there is a P.O. box for web pages, the plague of the broken
link will continue to be visited upon us.
Another major format for finding pages is the webcrawler: software that follows links from page
to page and indexes what it finds, millions of pages in all. You can search for words or phrases appearing in those pages.
Since there are millions of pages, you can get millions of hits, even on
what you think is a very limited search string. Sometimes the same page
appears multiple times in your hit list, for no apparent reason. And
sometimes the page you're looking for is buried in the seventeenth page of
a thousand possible hits. Who has time to search through hundreds of pages
with often arcane titles, trying to find a tiny tidbit? It's faster to drive
to the library.
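To make the crawler idea concrete, here is a minimal sketch in Python of the kind of word index such an engine might keep. It is an illustration only, not how any particular search engine works: the page texts and example.com URLs are made up, and the function names are hypothetical. Because each word maps to a set of URLs, a page can appear only once in the hit list.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler might have fetched.
PAGES = {
    "http://example.com/a": "Arizona waterskiing reports for this week",
    "http://example.com/b": "Waterskiing gear reviews and reports",
    "http://example.com/c": "Jesuit science fiction writers",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "waterskiing reports"))
print(search(index, "arizona waterskiing"))
```

Requiring every query word to match is one simple way to narrow a hit list. It does nothing to rank the hits, which is the harder part of the problem described above.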
Links to Links to Links
One helpful resource is often the links page. Someone interested in crochet
has gone to the effort to make an extensive list of sites that offer
information about the subject. Making a links page is far easier than
actually providing informative content, however, so we have a new
development: lists of links that list pages that are lists of links that
list pages . . . . Some searches turn up a series of pages that recursively
refer to one another. And of course the link to the one page of actual
information is broken on all of them.
The Ultimate Guide to Absolutely Everything
One of the most basic problems in finding information, however, is that
many content providers have no idea of how to organize it. You find the
nice and accurate guide to Shakespeare . . . and then you spend half an hour
following multiple links and making heavy
use of the back button, trying to find the one answer for which you're
looking. Just as in any other publication, web pages and web sites need to
focus on organization: how to present the information in an intuitive,
accessible way. The concept seems to get buried under the desire to have
newer, better, fancier graphics and cgi scripts.
Even the most basic helpful tools are overlooked: how many pages have you
found that don't bother to put a title in the header, or that use something
uninformative as the title? This relates directly to the use of crawler-type search
engines. Search results usually show the page title. If the page title is
non-existent or uninformative, the hit might be useless.
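As a rough illustration of why the title matters, the sketch below uses Python's standard html.parser module to pull the title out of a page the way a search engine might when building its hit list. The fallback to the bare URL is an assumption about what an engine would display, and the names TitleParser and hit_label are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def hit_label(url, html):
    """Return the page title, or fall back to the URL when there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so a multi-line title becomes one line.
    title = " ".join("".join(parser.parts).split())
    return title if title else url


titled = "<html><head><title>Guide to Shakespeare</title></head></html>"
untitled = "<html><head></head><body>Sonnets</body></html>"
print(hit_label("http://example.com/bard", titled))
print(hit_label("http://example.com/page2", untitled))
```

A page without a title gives the reader nothing but an address to go on, which is exactly the useless hit described above.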
The Platonic Ideal
Certainly the search engines available on the web are constantly becoming
more sophisticated, but they still fall short of the Platonic ideal. What we need
is a reverse dictionary for web research. Enter "[random question here]?" Hit the search button, and
you'll be transported to the web page with the exact answer.
In the meantime, it might be helpful to apply a more structured method to
categorizing material on the web. Application of the Dewey decimal or some
similar classification system would allow a common basis for classifying
web pages. Content providers could stick a meta tag in the header with the
relevant number.
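The idea of tagging a page's header with a classification number might look something like the sketch below. No such convention is established here: the meta name "dewey" is invented for illustration, the number in the sample page is just a sample value, and the helper names are hypothetical. The code reads the tag with Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical convention: <meta name="dewey" content="NUMBER"> in the header.
SAMPLE_PAGE = """<html><head>
<title>Guide to Shakespeare</title>
<meta name="dewey" content="822.33">
</head><body>To be continued.</body></html>"""


class ClassificationParser(HTMLParser):
    """Remember the content of the first <meta name="dewey"> tag seen."""

    def __init__(self):
        super().__init__()
        self.number = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.number is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "dewey":
            self.number = attr.get("content")


def classification(html):
    """Return the classification number as a string, or None if untagged."""
    parser = ClassificationParser()
    parser.feed(html)
    parser.close()
    return parser.number


print(classification(SAMPLE_PAGE))
print(classification("<html><head><title>Untagged</title></head></html>"))
```

A search engine that read such a tag could group pages by subject rather than by whatever words happen to appear in them, provided content providers agreed on the tag name and bothered to use it.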
Of course, that would require someone to establish and maintain a series of
web pages breaking down the system. And the rest of us would have to be
able to find it.