Text similarity search, including similarity search for short texts, has applications in many diverse areas. It can be used for query correction, query substitution, paraphrasing, analogy detection, data cleaning, question answering, and so on. Typically, a word or a sentence is taken as input, and the goal is to return similar words or sentences from a large volume of candidates. Searching for similar texts can be carried out within a single domain, such as a homogeneous document collection, or across domains. The latter case involves searching over heterogeneous document collections, such as collections of documents published in different time periods, or documents originating from (or related to) diverse geographic areas, cultures, or scientific domains. For example, a user may wish to find corresponding entities across time by searching within temporal document archives, or to find analogous objects in diverse geographical spaces. Searching across different domains is particularly difficult because vocabulary, context, relationships, and other aspects differ between domains.
In this tutorial, we will begin with a general overview of methods for text similarity search within homogeneous collections, together with their applications and importance. In particular, we will introduce effective and efficient approaches for searching for similar strings, words, and sentences in different kinds of text-based applications. We will then explain several techniques for finding semantically corresponding terms by analysing temporal document archives and other types of heterogeneous document collections. We also plan to survey recent achievements in analogy detection and analogical search.
The goal of this tutorial is to give participants an overview of the theories and techniques relevant to this field and to demonstrate the possibilities of novel search scenarios and functionalities.