We examine text retrieval strategies that use the sparsified concept decomposition matrix. The centroid vector of a tightly structured text collection provides a general description of the documents in that collection. The centroid vectors, taken together as columns, form a concept matrix, and the original term-document matrix can be projected into the concept space spanned by these concept vectors. We propose a procedure for text retrieval based on the sparsified concept decomposition (SCD) matrix. Our experimental results show that, compared with the popular retrieval technique of latent semantic indexing (LSI) with the singular value decomposition (SVD), text retrieval based on SCD may improve retrieval accuracy and reduce storage cost.
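To make the pipeline concrete, here is a minimal pure-Python sketch of concept-decomposition retrieval under several stated assumptions. It is not the report's exact procedure. The cluster labels are taken as given (the clustering step, e.g. spherical k-means, is not shown). Each concept vector is the normalized centroid of a cluster. A document or query vector v is projected onto the concept matrix C by least squares, solving (C^T C) z = C^T v, which is the standard concept-decomposition fit A ≈ CZ. Sparsification is modeled, hypothetically, as zeroing concept-vector entries whose magnitude falls below a threshold `tau`. Documents are then ranked by cosine similarity in concept coordinates. All function names (`concept_matrix`, `sparsify`, `project`, `rank`) and the parameter `tau` are illustrative and do not come from the report.

```python
import math

# Vectors are plain lists of term weights; a "matrix" is a list of column vectors.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)

def concept_matrix(docs, labels, k):
    """Concept vectors = normalized centroids of the k clusters (labels assumed given, clusters non-empty)."""
    m = len(docs[0])
    cols = []
    for c in range(k):
        members = [d for d, lab in zip(docs, labels) if lab == c]
        centroid = [sum(d[i] for d in members) / len(members) for i in range(m)]
        cols.append(normalize(centroid))
    return cols

def sparsify(cols, tau):
    """Hypothetical sparsification: zero entries with magnitude below tau."""
    return [[x if abs(x) >= tau else 0.0 for x in c] for c in cols]

def solve(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting (G is k x k, nonsingular)."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for col in range(k):
        p = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for j in range(col, k + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][j] * x[j] for j in range(r + 1, k))) / M[r][r]
    return x

def project(C, v):
    """Least-squares coordinates z minimizing ||v - C z||, via the normal equations."""
    G = [[dot(ci, cj) for cj in C] for ci in C]
    return solve(G, [dot(ci, v) for ci in C])

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def rank(docs, labels, k, query, tau=0.0):
    """Return document indices ordered by cosine similarity to the query in concept space."""
    C = sparsify(concept_matrix(docs, labels, k), tau)
    Z = [project(C, d) for d in docs]
    zq = project(C, query)
    scores = [cosine(zq, z) for z in Z]
    return sorted(range(len(docs)), key=lambda j: -scores[j])

# Toy example: 4 terms, 4 documents in 2 clusters; the query uses only term 0.
docs = [[1, 1, 0, 0], [1, 0.5, 0, 0], [0, 0, 1, 1], [0, 0, 0.5, 1]]
labels = [0, 0, 1, 1]
print(rank(docs, labels, 2, [1, 0, 0, 0]))
```

Both documents in the first cluster rank ahead of the second cluster. The storage saving claimed for SCD would come from keeping the sparsified C and the small k-dimensional coordinates instead of dense SVD factors; this sketch does not attempt to measure that.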
Technical Report 412-04, Department of Computer Science, University of Kentucky, Lexington, KY, 2004.
The research work of the authors was supported in part by the U.S. National Science Foundation under grants CCR-0092532 and ACR-0202934, and in part by the U.S. Department of Energy Office of Science under grant DE-FG02-02ER45961.