How Transparent and Verifiable is Artificial Intelligence (AI)?

In 1985, a well-known publication reported on a retrieval experiment in which lawyers searched for legal information. They found only about 20% of the potentially relevant documents. The authors concluded that full-text search has serious limitations.

In a subsequent article, the publication was criticized by Gerard Salton, a professor at Cornell University. An essential aspect of the debate was set-oriented retrieval as opposed to ranked retrieval. Set-oriented retrieval returns an unordered set of all documents that match the search terms, whereas ranked retrieval orders documents by their estimated relevance. Today, set-oriented search term reports are still widely used in court cases, while ranked output has been adopted by internet search services such as Google and Bing. Internet search engines use Artificial Intelligence (AI), in particular machine learning, because it yields significantly better search quality. However, these modern approaches are less transparent and more difficult to verify.

Understanding how such machines learn, and what they ultimately learned from the training samples, is complex. The dependency between search quality and training material is still the subject of ongoing research. Simple search term reports, by contrast, are easy to understand and verify. In summary, there is a trade-off between search quality and verifiability.
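
The difference between set-oriented and ranked retrieval can be illustrated with a minimal Python sketch. The toy documents, the query, and the function names are hypothetical, and the ranking uses plain term-frequency cosine similarity as a stand-in; it is not the method of any search service or court procedure mentioned above.

```python
import math
from collections import Counter

# Hypothetical toy collection; documents and query are illustrative only.
DOCS = {
    "d1": "contract breach damages claim",
    "d2": "contract contract negotiation",
    "d3": "weather report for monday",
    "d4": "breach of contract and damages awarded",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def boolean_and(query, docs):
    """Set-oriented retrieval: the unordered set of documents
    that contain every query term."""
    terms = set(tokenize(query))
    return {doc_id for doc_id, text in docs.items()
            if terms <= set(tokenize(text))}


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)  # a Counter yields 0 for missing terms
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def ranked(query, docs):
    """Ranked retrieval: documents with a non-zero score,
    ordered by descending similarity to the query."""
    q = Counter(tokenize(query))
    scores = {doc_id: cosine(q, Counter(tokenize(text)))
              for doc_id, text in docs.items()}
    hits = [(doc_id, score) for doc_id, score in scores.items() if score > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    print(sorted(boolean_and("contract breach", DOCS)))
    for doc_id, score in ranked("contract breach", DOCS):
        print(doc_id, round(score, 3))
```

In this sketch the Boolean result is easy to verify by hand: a document is in the set exactly when it contains both terms. The ranked list also surfaces d2, which matches only one term, but explaining why it is placed where it is requires understanding the scoring function. With a learned ranking model instead of a fixed formula, that explanation becomes harder still.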


Information Retrieval

The objective of Information Retrieval (IR) is to search large data collections for information relevant to a user’s information requirements. The term “information retrieval” was coined by Calvin Mooers in 1950. Like “research”, the word “retrieval” does not refer to finding something again. Rather, it relates to the information retrieval paradox: “If I knew what I was searching for, I wouldn’t be searching for it.”

Information retrieval focuses on three dimensions: systems and applications, theory and models, and evaluation. Various retrieval models exist, such as the Vector Space Model (VSM), probabilistic models, and language models. For evaluation, recall and precision are often used: recall is the share of all relevant documents that were retrieved, and precision is the share of retrieved documents that are relevant. SMART was an early retrieval system that dealt with all three aspects. RankBrain is a more recent retrieval system based on TensorFlow.
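
As a minimal sketch of how recall and precision are computed, the following Python function compares a set of retrieved documents with a set of relevant documents. The function name and the document identifiers are hypothetical, and the numbers are invented to show how a search can have high precision and still reach only 20% recall.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical data: 20 relevant documents exist in the collection.
    relevant = {f"r{i}" for i in range(1, 21)}
    # The search returns 5 documents, 4 of which are relevant.
    retrieved = {"r1", "r2", "r3", "r4", "n1"}
    precision, recall = precision_recall(retrieved, relevant)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Note that recall can only be computed when the full set of relevant documents is known, which in practice usually requires manual relevance judgments over the collection or a sample of it.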

WebGND

The Integrated Authority File (German: Gemeinsame Normdatei or GND) is an international authority file used and maintained by the German National Library (German: Deutsche Nationalbibliothek or DNB), all German-language library associations, the Zeitschriftendatenbank (ZDB) and many other institutions. WebGND is an online application that supports navigation and search within this large database, which consists of more than 11 million records covering personal names, corporate names, meeting names, geographic names, topical terms and uniform work titles.

