In 1985, a well-known publication reported on a retrieval experiment in which lawyers searched for legal information. They found only about 20% of the potentially relevant documents. The authors concluded that full-text search has serious limitations.

In a subsequent article, the publication was criticized by Gerard Salton, a professor at Cornell University. An essential aspect of the debate was set-oriented retrieval as opposed to ranked retrieval. Today, set-oriented search term reports are still widely used in court cases, while ranked output has been adopted by internet search services such as Google and Bing. Internet search engines use Artificial Intelligence (AI), in particular machine learning, because it yields significantly better search quality. However, modern approaches are less transparent and more difficult to verify.

Understanding how such machines learn, and what they have actually learned from the training samples, is difficult. The dependency between search quality and training material is still the subject of ongoing research. Simple search term reports, on the other hand, are easy to understand and verify. In summary, there is a trade-off between search quality and verifiability.
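
The difference between set-oriented and ranked retrieval can be illustrated with a minimal Python sketch. The toy documents, the query, and the scoring by raw term counts are assumptions made for illustration only; real search engines use far more elaborate ranking functions.

```python
from collections import Counter

# Toy collection; documents and query are invented for illustration.
docs = {
    "d1": "contract breach damages claim",
    "d2": "breach of contract and damages",
    "d3": "insurance claim for water damages",
    "d4": "employment contract termination",
}


def boolean_and(query_terms, docs):
    """Set-oriented retrieval: the unordered set of documents containing every term."""
    return {
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.split() for term in query_terms)
    }


def ranked(query_terms, docs):
    """Ranked retrieval: documents ordered by the number of query term occurrences."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.split())
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id to keep the order stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


query = ["contract", "damages"]
hits = boolean_and(query, docs)
ranking = ranked(query, docs)
print(sorted(hits))
print(ranking)
```

The set-oriented result is easy to verify, since a document is a hit exactly when it contains all search terms. The ranked result also surfaces partial matches, but its order depends on a scoring function that has to be understood before the output can be checked.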

Information Retrieval

The objective of Information Retrieval (IR) is to search large data collections for information relevant to a user’s information need. The term “information retrieval” was coined by Calvin Mooers in 1950. As with “research”, the “re” in “retrieval” does not mean finding something again. Rather, it relates to the information retrieval paradox: “If I knew what I was searching for, I wouldn’t be searching for it.”

Information retrieval focuses on three dimensions: systems and applications, theory and models, and evaluation. Various retrieval models exist, such as the Vector Space Model (VSM), probabilistic models, and language models. For evaluation, recall and precision are often used. SMART was an early retrieval system that dealt with all three aspects. RankBrain is a more recent retrieval system based on TensorFlow.
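
Recall and precision can be computed directly from the set of retrieved documents and the set of relevant documents. The following Python sketch uses invented document identifiers; the function name `precision_recall` is a hypothetical helper, not part of any particular retrieval system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 10 relevant documents exist, the search returns 4 documents,
# of which 2 are relevant.
relevant = {f"doc{i}" for i in range(1, 11)}
retrieved = ["doc1", "doc2", "doc50", "doc51"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this example half of the returned documents are relevant, yet only a fifth of all relevant documents were found. A recall of this size corresponds to the roughly 20% reported in the 1985 experiment.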

WebGND

The Integrated Authority File (German: Gemeinsame Normdatei or GND) is an international authority file used and maintained by the German National Library (German: Deutsche Nationalbibliothek or DNB), all German-language library associations, the Zeitschriftendatenbank (ZDB) and many other institutions. WebGND is an online application that supports navigation and search within this large database, which consists of more than 11 million records covering personal names, corporate names, meeting names, geographic names, topical terms and uniform work titles.

Eurospider Information Technology AG
Schaffhauserstrasse 18
8006 Zürich