[Objective] To develop a system for parsing and indexing WARC (Web ARChive) files, so that the archived resources of science and technology websites can be exploited more fully. [Application Background] The WARC file format is widely used in the collection and archiving of web resources. As web information grows more diverse, existing WARC indexing tools increasingly struggle to meet users' varied query needs. [Methods] WARC files are parsed with a modular scheme. After comparing commonly used indexing tools, the Solr platform was chosen to build a full-text indexing system. [Results] Content-based retrieval of and access to WARC files was implemented. Subject classification, resource type, and archive time were added to the WARC index, so that the content of WARC files can be retrieved and presented along multiple dimensions. [Conclusions] The system gives users access to a rich body of archived data and information from science and technology websites and improves their retrieval efficiency.
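As an illustration of the parsing and indexing steps summarized above, the following is a minimal sketch only, not the system described here. It assumes uncompressed WARC input, ignores folded (multi-line) header fields, and treats each record block as plain text. The index field names (`id`, `url`, `archive_time`, `subject`, `resource_type`, `content`), the helper functions, and the MIME-to-type mapping are all hypothetical; the subject label is assumed to come from an external classifier.

```python
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class WarcRecord:
    headers: Dict[str, str]  # WARC named fields, keys lower-cased
    payload: bytes           # record block (Content-Length bytes)


def parse_warc_records(data: bytes) -> Iterator[WarcRecord]:
    """Yield records from uncompressed WARC bytes (simplified parser)."""
    pos = 0
    while pos < len(data):
        # Skip the CRLF CRLF separator that follows each record block.
        while data.startswith(b"\r\n", pos):
            pos += 2
        if pos >= len(data):
            break
        head_end = data.find(b"\r\n\r\n", pos)
        if head_end == -1:
            raise ValueError("truncated WARC header")
        lines = data[pos:head_end].decode("utf-8").split("\r\n")
        if not lines[0].startswith("WARC/"):
            raise ValueError("missing WARC version line")
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers["content-length"])
        body_start = head_end + 4
        yield WarcRecord(headers, data[body_start:body_start + length])
        pos = body_start + length


def classify_resource_type(mime: str) -> str:
    """Map a MIME type to a coarse resource type (hypothetical mapping)."""
    mime = mime.split(";")[0].strip().lower()
    if mime in ("text/html", "application/xhtml+xml"):
        return "webpage"
    if mime.startswith("image/"):
        return "image"
    if mime == "application/pdf":
        return "document"
    return "other"


def to_index_doc(record: WarcRecord, subject: str) -> Dict[str, str]:
    """Build one index document; field names are assumptions."""
    h = record.headers
    return {
        "id": h.get("warc-record-id", ""),
        "url": h.get("warc-target-uri", ""),
        "archive_time": h.get("warc-date", ""),
        "subject": subject,  # assumed to be assigned by an external classifier
        "resource_type": classify_resource_type(h.get("content-type", "")),
        "content": record.payload.decode("utf-8", errors="replace"),
    }
```

A list of such dictionaries could then be sent as JSON to a Solr core's update handler, with the three added fields declared in the schema so that queries can filter or facet on them.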