Thursday, June 11, 2009

"It's too hard", Part I: Access

This morning I was thinking a lot about my current profession. I have a lot of ambivalence about librarianship in the 21st century, and I’m sure I’m being kind when I say that. I started thinking about exactly what’s wrong with the profession today. I realized that I can summarize the whole complex mess with three words: “It’s too hard.”

Librarianship has two principal components: organization/access, and editorial process. The former is traditionally associated with cataloging, technical services and systems. The latter is associated with reference services and collection development. However, the separation is not strict: both components need to be integrated to have a successful library environment.

Let’s look at the first one. Organizing information into somewhat meaningful categories is the backbone of what we do, even if the taxonomies used are highly imperfect. Over the years, standards of organization have been developed that are very complex—in the United States we use AACR2 and LCRI for description, LCSH for subject access, LCC or DDC for classification, and MARC format to make the records accessible in an inventory system.

This system worked great for books, but the information world is no longer about just books, or even about physical items that you can check out. A lot of material is now digital and virtual, and this presents a challenge in an organization system reliant on describing physical items. But we still have many, many physical items in our libraries, so we shouldn’t throw out the baby with the bathwater. Unfortunately, “it’s too hard”.

We already have massive amounts of data in the form of MARC records—bibliographic data and authority data. For the non-librarian—bibliographic data is what you see in your catalog when you search for an item. It describes the book/DVD/CD, whatever. Within that bibliographic data is something called “authority controlled” data—author names and subject headings in particular (yes, some titles too, but I’m not going there right now). There is a separate unseen authority record for every author, every subject in your catalog. Those records not only contain the chosen “authoritative” form of the name or subject, but all other seen variations—and in the case of subjects, related terms, broader terms, and narrower terms. The idea is that one heading is chosen as the one listed, and if you search for something different, a message will pop up telling you to “see” the authoritative heading, or to suggest other possible headings. Sounds good, yes?
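For the programmers in the audience, here is a rough sketch of what an authority record holds and how a browse search could turn it into a "see" message. This is only an illustration: the `AuthorityRecord` class, the `browse` function, and the sample headings are my own simplified inventions, not real MARC authority data or any vendor's code.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """Simplified, hypothetical stand-in for an authority record."""
    heading: str                                   # the chosen "authoritative" form
    variants: list = field(default_factory=list)   # other forms people have used
    broader: list = field(default_factory=list)    # broader terms (subjects only)
    narrower: list = field(default_factory=list)   # narrower terms (subjects only)
    related: list = field(default_factory=list)    # related terms (subjects only)


def browse(term, authority_file):
    """Return the messages a browse search might display for `term`."""
    wanted = term.casefold()
    messages = []
    for rec in authority_file:
        if wanted == rec.heading.casefold():
            # The user typed the authorized form: show it, plus suggestions.
            messages.append(f"Found: {rec.heading}")
            messages += [f"See also: {r}" for r in rec.related]
        elif wanted in (v.casefold() for v in rec.variants):
            # The user typed a variant: point them to the authorized form.
            messages.append(f"{term}: see {rec.heading}")
    return messages


# Sample data made up for illustration, not copied from a real authority file.
authority_file = [
    AuthorityRecord(
        heading="Twain, Mark, 1835-1910",
        variants=["Clemens, Samuel Langhorne, 1835-1910"],
    ),
    AuthorityRecord(
        heading="Motion pictures",
        variants=["Movies", "Films"],
        broader=["Performing arts"],
        narrower=["Silent films"],
        related=["Television"],
    ),
]

print(browse("Movies", authority_file))           # ['Movies: see Motion pictures']
print(browse("motion pictures", authority_file))  # ['Found: Motion pictures', 'See also: Television']
```

The point is that the record already contains everything needed to redirect a user who types the "wrong" term.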

Well, it should be, but no. Library catalog systems have not been designed to make good use of authority data. Keyword searches only search the bibliographic record. So, if you haven’t used the terms in the bibliographic record, you’re “SOL” if you stop with a keyword search. The only way many systems allow you to access the authority records is to do a “browse” search. If you do a subject browse, you will then get the benefit of those extra messages if you type in the wrong thing.

There are many problems with traditional catalog searching, but I won’t go into detail about that here. Recently, integrated library system vendors have been trying the “faceted” search approach, which does allow keyword searches to access the authority data. This is a big improvement, and I do hope libraries implement faceted search interfaces when they become available.
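To make the difference concrete, here is a toy comparison of a keyword search that only looks at the bibliographic record and one that also consults the authority data. The record layout and both function names are hypothetical simplifications of my own, and this is not how any particular vendor's faceted interface is built.

```python
# Hypothetical, heavily simplified bibliographic records.
bib_records = [
    {"title": "The Adventures of Tom Sawyer", "author": "Twain, Mark, 1835-1910"},
    {"title": "A History of Motion Pictures", "author": "Doe, Jane"},
]

# Authority data: authorized heading -> variant forms of that heading.
authority = {
    "Twain, Mark, 1835-1910": ["Clemens, Samuel Langhorne, 1835-1910"],
}


def keyword_search(query, records):
    """Plain keyword search: looks only at text in the bibliographic record."""
    q = query.casefold()
    return [r["title"] for r in records
            if any(q in value.casefold() for value in r.values())]


def authority_aware_search(query, records, authority):
    """Keyword search that also matches variant forms from the authority data."""
    q = query.casefold()
    hits = []
    for r in records:
        searchable = list(r.values())
        # Add the variant forms of any authority-controlled heading in the record.
        for value in r.values():
            searchable += authority.get(value, [])
        if any(q in text.casefold() for text in searchable):
            hits.append(r["title"])
    return hits


print(keyword_search("Clemens", bib_records))                     # []
print(authority_aware_search("Clemens", bib_records, authority))  # ['The Adventures of Tom Sawyer']
```

A user who only knows the name "Clemens" gets nothing from the first search and finds the book with the second, even though the underlying data is identical.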

However, we still have another problem—library users don’t search the library catalog. Yes, I know, it’s a generalization, but I’ve seen enough surveys of library database usage to know that the catalog is the last place people look, especially if their past experience has been trying to navigate a crappy Boolean/BRS search engine. In university libraries, I’ve seen another problematic trend with respect to the catalog—students don’t want to use books, or at least they want the bulk of their information from online resources. And who can blame them? It’s easier to get things online, and in certain fields, the information in books is out of date by the time the book is printed and distributed.

Still, there is a lot of good information in books that is missed, so libraries want users to use the catalog, and they want them to check out or at least look at the book sources. Ideally, a federated search would let you search the catalog AND the electronic resources at the same time and give you one result set. But wait, you say, such products already exist! Indeed they do. And they are highly inadequate, because of yet another problem: the searching protocol problem.

Try this: go to your local library, and look up something in the catalog. Look up anything you want, though a more complicated search makes it more fun. Did you get a list of items? Good. Save that, or print it out. Now go home and go to a neighboring library’s site that offers access to your catalog. It could be through another county library system (e.g., if you are in New Jersey, search catalog.mainlib.org in Morris County, then search the same catalog via Passaic County or Bergen County’s site). Do the same search you did in your library and look at the list of items. Compare it to the first list. Are they the same? I’m betting they’re not, especially if your search was more complex.

This is because of the Z39.50 protocol. When you use a search protocol that allows different library servers to “talk” to each other, the remote system does not necessarily execute the search the same way your own library’s system does. And that’s just in a regular library catalog. Try putting an EBSCO or ProQuest database into the mix; their searching is entirely different.

The bottom line is that one federated search will often give you disparate and confusing results. A standard, consistent search that would give you relevant results across different platforms has not been effectively developed.
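Here is a toy model of why that happens. It does not implement Z39.50 or any real product; `server_a` and `server_b` are hypothetical systems I made up, holding identical records but each interpreting the same incoming query according to its own rules.

```python
# Both hypothetical servers hold exactly the same (invented) records.
records = [
    {"title": "Gardening in New Jersey", "subject": "Gardening"},
    {"title": "Backyard Vegetables", "subject": "Vegetable gardening"},
    {"title": "New Jersey Trails", "subject": "Hiking"},
]


def server_a(query):
    """Assumed behavior: title-only search, every word must appear."""
    words = query.casefold().split()
    return [r["title"] for r in records
            if all(w in r["title"].casefold() for w in words)]


def server_b(query):
    """Assumed behavior: any word may match, in any field."""
    words = query.casefold().split()
    return [r["title"] for r in records
            if any(w in text.casefold() for w in words for text in r.values())]


query = "gardening new jersey"
print(server_a(query))  # ['Gardening in New Jersey']
print(server_b(query))  # all three titles
```

Same data, same query, two different result lists. A federated search that merges results from systems like these has no way of presenting one consistent answer.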

The point I’m trying to make is that we have very good data, and for all of our technological developments, we don’t seem to be able to design library systems that make good use of this data in a simple, user-friendly way. Why not? Apparently, “it’s too hard”.

Instead of treating this as a programming/technology issue, the library profession has attempted to rewrite the rules. There are huge flowcharts about “relationships” between different types of “entities” that look like a huge, confusing Peyton Place of data, only not nearly as interesting. Ever practical, the Tech Services staff and librarians are responding by saying, “Nice flowchart. So what do we do with it?” Over the last four years of workshops, conferences, and lectures, I’ve not yet heard a coherent answer to that question.

Two big mistakes are being made: letting academics drive the rule-making, and allowing the technology to compromise basic organizational principles. Programmers don’t understand the principles behind the library data structures, and rather than try to learn them, they’d rather we dropped all that “fancy” stuff and made it simpler. The academics go along with this, because I think they somehow believe that the idea of traditional librarianship is obsolete, and that they have to “get with the times”. All I can say to that is that librarianship is still librarianship regardless of the technology. The fact that a lot of things are “born digital” now shouldn’t be making us scramble to change our standards. Metadata designed for digital objects is not, by itself, adequate for physical items, and shouldn’t be treated as if it were; the converse is equally true. But we don’t demand that the technology accommodate us because we’re told “it’s too hard”.

Part II tomorrow...
