The World Wide Web has grown to be a primary source of information for millions of people. Due to the size of the Web, search engines have become the major access point for this information. However, "commercial" search engines use hidden algorithms that put the integrity of their results in doubt, collect user data that raises privacy concerns, and target the general public thus fail to serve the needs of specific search users. Open source search, like open source operating systems, offers alternatives. The goal of the Open Source Information Retrieval Workshop (OSIR) is to bring together practitioners developing open source search technologies in the context of a premier IR research conference to share their recent advances, and to coordinate their strategy and research plans. The intent is to foster community-based development, to promote distribution of transparent Web search tools, and to strengthen the interaction with the research community in IR. A workshop about Open Source Web Information Retrieval was held last year in Compigne, France as part of WI 2005. The focus of this worksop is broadened to the whole open source information retrieval community. We want to thank all the authors of the submitted papers, the members of the program committee:, and the several reviewers whose contributions have resulted in these high quality proceedings. ABSTRACT There has been a resurgence of interest in index maintenance (or incremental indexing) in the academic community in the last three years. Most of this work focuses on how to build indexes as quickly as possible, given the need to run queries during the build process. This work is based on a different set of assumptions than previous work. First, we focus on latency instead of through-put. We focus on reducing index latency (the amount of time between when a new document is available to be indexed and when it is available to be queried) and query latency (the amount of time that an incoming query must wait because of index processing). Additionally, we assume that users are unwilling to tune parameters to make the system more efficient. We show how this set of assumptions has driven the development of the Indri index maintenance strategy, and describe the details of our implementation.
translated by 谷歌翻译