“Explicit” Searching

Every once in a while I like to get down in the weeds of library geekdom; this is one of those days. I was recently pondering how index-searching functionality in many commercial information databases is flawed, and how this affects the user experience. Vendors often implement Boolean logic in advanced searching in widely different ways, which can make it quite difficult for users to use their products effectively. Wouldn’t it be great if all vendors used the same basic search construction functionality?

To illustrate my point, I’m going to use a recent example of strange search behavior that a colleague reported in one of our commercial products. I agree with her that the way the system behaves isn’t helpful.

The search she was constructing was:

[Screenshot: the search as entered in the advanced search form]

Using the old-fashioned Boolean logic and syntax I learned years ago, the clear intent for the search is:

Subject = Reconstruction (U.S. history, 1865-1877) AND (subject=sources OR subject=diaries OR subject=personal narratives)

To spell it out, the system was expected to find all items that had any of the subject keywords sources, diaries, or personal narratives, and then retrieve the items in that set that also had the subject phrase Reconstruction (U.S. history, 1865-1877). The result should have been a fairly specific and manageable result set.

Now, I believe my colleague’s search was intelligently and reasonably constructed, and in many databases it would retrieve the desired set. In this case, however, it doesn’t retrieve anything close to what she intended. Instead, the search appears to behave like this:

(Subject = Reconstruction (U.S. history, 1865-1877) AND subject=sources) OR subject=diaries OR subject=personal narratives

Spelling it out, the system retrieves all the items that have subject = Reconstruction… AND subject = sources, a fairly small result set, but then adds all items that have either subject = diaries OR subject = personal narratives. That second set is very large, and clearly not what she intended.
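To make the difference concrete, here is a minimal Python sketch that models each subject search as a set of record IDs, so AND becomes set intersection and OR becomes set union. The catalog, the record IDs and the `subject()` helper are hypothetical, and matching whole subject terms is a simplification of real keyword searching. Whether a system reads an unparenthesized string strictly left to right or gives AND higher precedence than OR (as many query languages do), A AND B OR C OR D comes out as (A AND B) OR C OR D, which is the grouping shown above.

```python
# Hypothetical toy catalog: record ID -> subject terms. Invented for illustration;
# real records would carry full headings with subdivisions.
RECON = "Reconstruction (U.S. history, 1865-1877)"
CATALOG = {
    1: {RECON, "sources"},
    2: {RECON, "diaries"},
    3: {RECON, "personal narratives"},
    4: {RECON},
    5: {"diaries"},
    6: {"personal narratives"},
    7: {"World War, 1914-1918", "diaries"},
}


def subject(term):
    """Record IDs whose subject terms include `term` (a simplified subject= search)."""
    return {rid for rid, terms in CATALOG.items() if term in terms}


# What the searcher meant: R AND (S OR D OR P)
intended = subject(RECON) & (
    subject("sources") | subject("diaries") | subject("personal narratives")
)

# What the system appears to run: (R AND S) OR D OR P
as_run = (subject(RECON) & subject("sources")) | subject("diaries") | subject(
    "personal narratives"
)

print("intended:", sorted(intended))                   # intended: [1, 2, 3]
print("as run:", sorted(as_run))                       # as run: [1, 2, 3, 5, 6, 7]
print("off-topic extras:", sorted(as_run - intended))  # off-topic extras: [5, 6, 7]
```

In this toy catalog the extra records number only three, none of them about Reconstruction; in a real database, where the diaries and personal narratives sets are very large, they swamp the results.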

So what could my colleague and I have done differently?

In the advanced search form, I would most likely have done this:

[Screenshot: the revised search in the advanced search form]

First, capitalize the Boolean operator “OR” to make clear that it is an operator and not a search term. This alone still doesn’t work, because the system doesn’t respect what I believe should be implied parentheses around the second line, but it does help. If we explicitly add parentheses on the second line, it does work:

[Screenshot: the search with explicit parentheses added on the second line]

The resulting search syntax looks much like this:

Subject = Reconstruction (U.S. history, 1865-1877) AND subject=(sources OR subject=diaries OR subject=personal narratives)

This construction also works in the “simple search” box with the appropriate index identifiers.

How do I think advanced searches like the one above should work across systems, so that they could be taught efficiently and effectively?

  • Require that Boolean operators be capitalized. This removes ambiguity.
  • Terms with Boolean operators between them on the same line should be treated as if they have parentheses around them.
  • If a searcher enters parentheses next to a Boolean operator, they should be respected.
  • Vendors should always make the correct index abbreviations for search fields easy to discover for searchers who want to construct explicit search strings on their own.
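The first three proposals are mechanical enough to sketch. The Python below is a hypothetical illustration, not any vendor’s implementation: `normalize_row()` and `build_query()` are invented names, each advanced-search row is assumed to be an (operator, field, text) triple, and the output is a generic query string with quoted terms rather than any product’s real syntax. I read “next to a Boolean operator” as follows: an opening parenthesis at the start of a row or right after an operator is a grouping parenthesis, and so is its matching closing parenthesis. Other parentheses, such as those in the heading Reconstruction (U.S. history, 1865-1877), stay part of the search term.

```python
import re

# Proposal 1: only fully capitalized words count as Boolean operators.
OPERATORS = {"AND", "OR", "NOT"}

# A token is a single parenthesis or a run of non-space, non-parenthesis characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def normalize_row(field, text):
    """Rewrite one advanced-search row as an explicit, parenthesized clause."""
    parts = []        # operators, grouping parens, and field="term" clauses
    term = None       # (start, end) offsets in `text` of the term being collected
    stack = []        # one entry per open "(": True = grouping, False = part of a term
    can_group = True  # a "(" here (row start, after an operator or a grouping "(") groups

    def flush():
        nonlocal term
        if term is not None:
            parts.append(f'{field}="{text[term[0]:term[1]]}"')
            term = None

    for m in TOKEN.finditer(text):
        tok = m.group()
        if tok in OPERATORS:
            flush()
            parts.append(tok)
            can_group = True
        elif tok == "(" and can_group:
            # Proposal 3: a parenthesis typed next to an operator is respected.
            stack.append(True)
            parts.append("(")
        elif tok == ")" and stack and stack[-1]:
            stack.pop()
            flush()
            parts.append(")")
            can_group = False
        else:
            # Ordinary words, lowercase "or"/"and", and parentheses that belong
            # to a heading are all kept as part of the current search term.
            if tok == "(":
                stack.append(False)
            elif tok == ")" and stack:
                stack.pop()
            term = (m.start(), m.end()) if term is None else (term[0], m.end())
            can_group = False
    flush()

    # Join the pieces without a space after "(" or before ")".
    out = ""
    for p in parts:
        if out and not out.endswith("(") and p != ")":
            out += " "
        out += p
    # Proposal 2: the whole row is treated as if it had parentheses around it.
    return f"({out})"


def build_query(rows):
    """Combine (operator, field, text) rows; the first row's operator is ignored."""
    clauses = [normalize_row(field, text) for _, field, text in rows]
    query = clauses[0]
    for (op, _, _), clause in zip(rows[1:], clauses[1:]):
        query += f" {op} {clause}"
    return query


# My colleague's search, entered as two rows of an advanced search form.
rows = [
    (None, "subject", "Reconstruction (U.S. history, 1865-1877)"),
    ("AND", "subject", "sources OR diaries OR personal narratives"),
]
print(build_query(rows))
# (subject="Reconstruction (U.S. history, 1865-1877)") AND (subject="sources" OR subject="diaries" OR subject="personal narratives")
```

Because lowercase “or” is not treated as an operator under this sketch, `normalize_row("subject", "sources or diaries")` produces a single phrase search for “sources or diaries”, so the searcher gets a predictable result rather than a silent guess. The fourth proposal is about documentation and discoverability, so it has no code counterpart here.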

So, what do you think? Would this kind of consistency across commercial products help or hinder the user experience?