Archive4J is an archive engine for large document collections, written in Java: a set of algorithmic tools and implementations for building a direct index of a document collection. For each document, the index can return basic data such as the document's length in words, the list of distinct terms it contains, and the number of occurrences of each term (its count). The design goals are a very high compression rate and very fast random access; to achieve both, Archive4J combines techniques typical of search engines with succinct data structures.
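To make the idea of a direct index concrete, here is a minimal, uncompressed sketch in Java. It is not Archive4J's API or implementation: the class name DirectIndexSketch, its methods, and the whitespace-and-punctuation tokenizer are all hypothetical, and plain int arrays stand in for the compressed, succinct structures a real archive would use. It only illustrates the three per-document lookups described above: length in words, distinct terms, and per-term count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, uncompressed direct index; for illustration only. */
final class DirectIndexSketch {
    // Global dictionary: term <-> integer id (assumed design, not Archive4J's).
    private final Map<String, Integer> termIds = new HashMap<>();
    private final List<String> terms = new ArrayList<>();

    // Per-document data: sorted term ids, parallel counts, and length in words.
    private final List<int[]> docTerms = new ArrayList<>();
    private final List<int[]> docCounts = new ArrayList<>();
    private final List<Integer> docLengths = new ArrayList<>();

    /** Indexes one document and returns its document number. */
    int add(String text) {
        TreeMap<Integer, Integer> counts = new TreeMap<>(); // keeps ids sorted
        int length = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) continue; // split may yield a leading empty token
            Integer id = termIds.get(token);
            if (id == null) {
                id = terms.size();
                termIds.put(token, id);
                terms.add(token);
            }
            counts.merge(id, 1, Integer::sum);
            length++;
        }
        int[] ids = new int[counts.size()];
        int[] cs = new int[counts.size()];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            ids[i] = e.getKey();
            cs[i] = e.getValue();
            i++;
        }
        docTerms.add(ids);
        docCounts.add(cs);
        docLengths.add(length);
        return docLengths.size() - 1;
    }

    /** Length of the document in words. */
    int length(int doc) {
        return docLengths.get(doc);
    }

    /** Distinct terms of the document, in order of term id. */
    List<String> distinctTerms(int doc) {
        List<String> out = new ArrayList<>();
        for (int id : docTerms.get(doc)) out.add(terms.get(id));
        return out;
    }

    /** Number of occurrences of a term in the document (0 if absent). */
    int count(int doc, String term) {
        Integer id = termIds.get(term.toLowerCase(Locale.ROOT));
        if (id == null) return 0;
        int pos = Arrays.binarySearch(docTerms.get(doc), id);
        return pos >= 0 ? docCounts.get(doc)[pos] : 0;
    }
}
```

Each lookup touches only the arrays of the requested document, which is what makes random access cheap. Keeping term ids sorted is a deliberate choice in this sketch: it allows binary search here, and sorted integer lists are also the kind of data that gap-based compression schemes handle well.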
Tags: Software Development, Libraries, Java Libraries