Lucene full text search

1.2 The difference between database LIKE queries and full-text retrieval
1.2.1 Structured data and unstructured data

The data stored in a database is structured data, i.e. row data that can be expressed logically as a two-dimensional table. Data that cannot conveniently be represented by a database's two-dimensional table is called unstructured data. It includes office documents of all formats, text, pictures, XML and HTML (subsets of the Standard Generalized Markup Language), reports of all kinds, and image and audio/video information.
Structured data: data with a fixed format or limited length, such as data in databases, or metadata.
Unstructured data: data with variable length or no fixed format, such as emails and Word documents.
Semi-structured data: data that lies between fully structured data (such as the data in relational and object-oriented databases) and completely unstructured data (such as voice and image files). HTML and XML documents are semi-structured: structure and content are mixed together without a clear distinction.
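The heading contrasts a database LIKE query with full-text retrieval. Below is a minimal plain-Java sketch of that difference, assuming a toy whitespace tokenizer and an in-memory map; the class and method names are hypothetical illustrations, not Lucene APIs. A LIKE '%keyword%' query has to scan the text of every row and matches substrings, while full-text retrieval tokenizes the text once into an inverted index (each term mapped to the ids of the rows containing it) and answers a query with a single lookup on a whole term.

```java
import java.util.*;

// Hypothetical toy contrast between a LIKE-style scan and an inverted-index lookup.
// Not Lucene's implementation: a real index also records term frequencies, positions, etc.
class LikeVsInvertedIndex {

    // LIKE '%keyword%' analogue: examine every row, case-sensitive substring match
    static List<Integer> likeScan(List<String> rows, String keyword) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < rows.size(); id++) {
            if (rows.get(id).contains(keyword)) {
                hits.add(id);
            }
        }
        return hits;
    }

    // Built once up front: lowercased whitespace-separated term -> ids of rows containing it
    static Map<String, SortedSet<Integer>> buildIndex(List<String> rows) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int id = 0; id < rows.size(); id++) {
            for (String term : rows.get(id).toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Query time: one map lookup on the whole term instead of scanning every row
    static SortedSet<Integer> termLookup(Map<String, SortedSet<Integer>> index, String term) {
        return index.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "Spring in action",
                "Learning MyBatis",
                "SpringMVC tutorial");
        System.out.println(likeScan(rows, "Spring"));               // [0, 2]
        System.out.println(termLookup(buildIndex(rows), "spring")); // [0]
    }
}
```

The two answers differ: the substring scan also matches "SpringMVC", whereas the term lookup only finds rows containing the whole token "spring". The scan's cost grows with the total amount of text on every query; the index pays that cost once when it is built.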

1. Getting started
Import the following jar packages:
lucene-core-7.4.0.jar
lucene-analyzers-common-7.4.0.jar
IK-Analyzer-1.0-SNAPSHOT.jar: Chinese analyzer
commons-io-2.6.jar: utility classes for reading files
lucene-queryparser-7.4.0.jar: required for QueryParser
2. Create index

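        // Imports; the IKAnalyzer package is assumed from common IK Analyzer builds, check it against your jar
        import java.io.File;
        import org.apache.commons.io.FileUtils;
        import org.apache.lucene.document.*;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.IndexWriterConfig;
        import org.apache.lucene.store.Directory;
        import org.apache.lucene.store.FSDirectory;
        import org.wltea.analyzer.lucene.IKAnalyzer;

        // The statements below run inside a method that declares "throws IOException"
        // Choose where the index is stored: FSDirectory keeps it on disk, RAMDirectory keeps it in memory
        Directory directory = FSDirectory.open(new File("").toPath());
        // Index configuration using the IKAnalyzer Chinese analyzer
        IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
        // Create the index writer
        IndexWriter writer = new IndexWriter(directory, config);

        // Read every file in the source directory
        File f = new File("");
        for (File file : f.listFiles()) {
            // File path
            String path = file.getPath();
            // File name
            String name = file.getName();
            // File size
            long size = FileUtils.sizeOf(file);
            // File contents, read as UTF-8
            String content = FileUtils.readFileToString(file, "utf-8");
            // Wrap each value in a field
            Field fieldName = new TextField("name", name, Field.Store.YES);
            Field fieldPath = new TextField("path", path, Field.Store.YES);
            // LongPoint indexes the size for range queries but does not store it,
            // so a StoredField with the same name keeps the value retrievable
            Field fieldSizeValue = new LongPoint("size", size);
            Field fieldSizeStore = new StoredField("size", size);
            Field fieldContent = new TextField("content", content, Field.Store.YES);
            // Add the fields to a document
            Document document = new Document();
            document.add(fieldName);
            document.add(fieldPath);
            document.add(fieldSizeValue);
            document.add(fieldSizeStore);
            document.add(fieldContent);
            // Write the document to the index; without this call nothing is indexed
            writer.addDocument(document);
        }
        // Commit pending changes and close the writer
        writer.close();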
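The indexing code adds the file size twice: as a LongPoint, which makes the value searchable by range but does not store it, and as a StoredField with the same name, which stores the value so it can be read back from a retrieved document. The sketch below models that split in plain Java under simplifying assumptions: a TreeMap stands in for the point index (Lucene actually uses a BKD tree for points) and a list of maps stands in for stored fields. The class and its methods are hypothetical illustrations, not Lucene APIs.

```java
import java.util.*;

// Hypothetical toy model, not Lucene's data structures: why "size" is added both as a
// LongPoint (searchable, not stored) and as a StoredField (stored, not searchable).
class IndexedVsStoredSize {

    // Searchable side, standing in for the point index: size -> ids of docs with that size
    private final TreeMap<Long, List<Integer>> sizeIndex = new TreeMap<>();
    // Retrievable side, standing in for stored fields: one map of stored values per doc
    private final List<Map<String, Object>> storedFields = new ArrayList<>();

    int addDocument(String name, long size, boolean storeSize) {
        int id = storedFields.size();
        // LongPoint analogue: the size is always indexed for range search
        sizeIndex.computeIfAbsent(size, s -> new ArrayList<>()).add(id);
        // StoredField analogue: the size can be read back only if it was stored
        Map<String, Object> stored = new HashMap<>();
        stored.put("name", name);
        if (storeSize) {
            stored.put("size", size);
        }
        storedFields.add(stored);
        return id;
    }

    // Range search with both bounds inclusive, as with LongPoint.newRangeQuery
    List<Integer> rangeQuery(long lower, long upper) {
        List<Integer> ids = new ArrayList<>();
        for (List<Integer> docs : sizeIndex.subMap(lower, true, upper, true).values()) {
            ids.addAll(docs);
        }
        Collections.sort(ids);
        return ids;
    }

    // Returns the stored value, or null when the field was not stored
    Object stored(int id, String field) {
        return storedFields.get(id).get(field);
    }

    public static void main(String[] args) {
        IndexedVsStoredSize index = new IndexedVsStoredSize();
        index.addDocument("a.txt", 10L, true);
        index.addDocument("b.txt", 100L, false); // indexed only, like a LongPoint on its own
        index.addDocument("c.txt", 101L, true);
        for (int id : index.rangeQuery(0L, 100L)) {
            // prints "a.txt size=10" then "b.txt size=null"
            System.out.println(index.stored(id, "name") + " size=" + index.stored(id, "size"));
        }
    }
}
```

The document indexed without a stored size is still found by the range query, but its size cannot be read back, which is why the original code pairs the two fields.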


3. Query index

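        // Imports; the IKAnalyzer package is assumed from common IK Analyzer builds, check it against your jar
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.LongPoint;
        import org.apache.lucene.index.DirectoryReader;
        import org.apache.lucene.index.IndexReader;
        import org.apache.lucene.index.Term;
        import org.apache.lucene.queryparser.classic.QueryParser;
        import org.apache.lucene.search.*;
        import org.wltea.analyzer.lucene.IKAnalyzer;

        // Runs inside a method declaring "throws IOException, ParseException";
        // directory is the same Directory that was used to create the index
        // Open the index
        IndexReader reader = DirectoryReader.open(directory);
        // Create the searcher
        IndexSearcher indexSearcher = new IndexSearcher(reader);
        /**
         * Three ways to build a query:
         * 1. TermQuery: match a single term (keyword) in a field
         * 2. LongPoint.newRangeQuery: numeric range query on a LongPoint field (bounds inclusive)
         * 3. QueryParser: analyze a query string and build the query from the resulting terms
         */
        //Query query = new TermQuery(new Term("name", "spring"));
        //Query query = LongPoint.newRangeQuery("size", 0L, 100L);
        QueryParser queryParser = new QueryParser("name", new IKAnalyzer());
        Query query = queryParser.parse("spring,");
        // First argument: the query; second: the maximum number of hits to return
        TopDocs topDocs = indexSearcher.search(query, 10);
        // topDocs.totalHits is the total number of matching documents
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            // Load the document by its id
            Document document = reader.document(scoreDoc.doc);
            // Print the value of a stored field
            System.out.println(document.get("name"));
        }
        // Close the reader
        reader.close();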
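indexSearcher.search(query, 10) returns at most the 10 best-scoring hits in scoreDocs, while totalHits counts every match. The plain-Java sketch below shows that top-n idea with a bounded min-heap over precomputed scores. The scores are made-up inputs and the class is hypothetical; real scoring (BM25 by default in Lucene 7) and Lucene's collector internals are not modeled. Ties are broken here by lower document id.

```java
import java.util.*;

// Hypothetical sketch of top-n retrieval: keep the n best hits, count all matches.
// Scores are given as input; Lucene's real scoring and collectors are not modeled.
class TopNHits {

    // Simplified stand-in for TopDocs
    static final class Result {
        final int totalHits;           // every matching document
        final List<Integer> topDocIds; // best first, at most n entries

        Result(int totalHits, List<Integer> topDocIds) {
            this.totalHits = totalHits;
            this.topDocIds = topDocIds;
        }
    }

    // matches: doc id -> score of each matching document
    static Result search(Map<Integer, Float> matches, int n) {
        // Orders hits worst first: lower score, or on a tie the higher doc id
        Comparator<Map.Entry<Integer, Float>> worstFirst = (a, b) -> {
            int byScore = Float.compare(a.getValue(), b.getValue());
            return byScore != 0 ? byScore : Integer.compare(b.getKey(), a.getKey());
        };
        // Min-heap holding at most n hits; its head is the weakest hit kept so far
        PriorityQueue<Map.Entry<Integer, Float>> heap = new PriorityQueue<>(worstFirst);
        for (Map.Entry<Integer, Float> hit : matches.entrySet()) {
            heap.offer(hit);
            if (heap.size() > n) {
                heap.poll(); // evict the weakest hit
            }
        }
        // Drain weakest to best, then reverse so the best hit comes first
        List<Integer> top = new ArrayList<>();
        while (!heap.isEmpty()) {
            top.add(heap.poll().getKey());
        }
        Collections.reverse(top);
        return new Result(matches.size(), top);
    }

    public static void main(String[] args) {
        Map<Integer, Float> matches = new HashMap<>();
        matches.put(0, 1.2f);
        matches.put(1, 3.5f);
        matches.put(2, 0.4f);
        matches.put(3, 3.5f);
        Result result = search(matches, 2);
        System.out.println(result.totalHits + " " + result.topDocIds); // 4 [1, 3]
    }
}
```

A bounded heap keeps memory proportional to n rather than to the number of matches, which is why asking for 10 hits stays cheap even when many documents match.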

Using the default analyzer and the IKAnalyzer Chinese analyzer

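        import org.apache.lucene.analysis.Analyzer;
        import org.apache.lucene.analysis.TokenStream;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
        // Default analyzer; for Chinese text, use new IKAnalyzer() instead
        Analyzer analyzer = new StandardAnalyzer();
        TokenStream tokenStream = analyzer.tokenStream("name", "spring mybatis springmvc");
        // The attribute works like a pointer: it always holds the text of the current token
        CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
        // reset() must be called before consuming the stream
        tokenStream.reset();
        while (tokenStream.incrementToken()) {
            System.out.println(charTermAttribute);
        }
        // Finish consuming the stream and release it before closing the analyzer
        tokenStream.end();
        tokenStream.close();
        analyzer.close();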
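The loop above prints one token per line. To show roughly what an analyzer does, and why a Chinese analyzer such as IKAnalyzer is needed, here is a toy tokenizer in plain Java (hypothetical, not a Lucene class). It lowercases runs of letters and digits and emits every Chinese character as its own token, which is how StandardAnalyzer treats Chinese text. It does not remove stop words, does not implement the full Unicode word-break rules StandardAnalyzer follows, and has nothing like IKAnalyzer's dictionary-based segmentation.

```java
import java.util.*;

// Hypothetical toy tokenizer, not a Lucene class: lowercases runs of letters/digits and
// emits each Chinese (Han) character as its own token. No stop words, no full UAX #29 rules.
class ToyAnalyzer {

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp))); // one token per Chinese character
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(cp); // extend the current word
            } else {
                flush(word, tokens); // whitespace or punctuation ends the current word
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending word (lowercased) and clears the buffer
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString().toLowerCase(Locale.ROOT));
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(analyze("spring mybatis springmvc")); // [spring, mybatis, springmvc]
        // "Lucene" followed by the four Chinese characters for "full-text retrieval"
        System.out.println(analyze("Lucene\u5168\u6587\u68c0\u7d22"));
    }
}
```

Because "springmvc" stays a single token, a TermQuery for "spring" would not match it. Because Chinese text is split into single characters, multi-character words are lost, which is the gap a dictionary-based analyzer like IKAnalyzer fills.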


Posted on Mon, 13 Jan 2020 07:07:50 -0500 by James25