A novel reader implemented in Python

Contents

Brief introduction

Implementation process

Epilogue

Brief introduction

This article develops a novel reader in Python. Given a novel's book number, the program fetches the chapter contents and saves them on the computer; the saved chapters can then be opened and read from the reader;

Preview effect: after a novel number is filled in, the captured novel content is displayed in two ways ("List display" and "Chart display");

Development environment: Windows 7 + Python 3.7 + PyCharm 2018.2.4 (development tool);

Directory structure:

Tip: when practicing, please do not grab too much data at one time, so as not to put excessive load on the server.

Implementation process

1, Reader UI design

1. Install the required third-party modules PyQt5 and pyqt5-tools. In PyCharm (File > Settings) you can install them directly with the "+" button on the right. If that fails, run "pip install XXX" on the command line (note that the PyCharm 2018 version is used here);

2. Configure the external tools QtDesigner (the designer) and pyUIC (converts a .ui file into .py code; set Arguments to "$FileName$ -o $FileNameWithoutExtension$.py");

3. Run the QtDesigner tool (Figure 1) and use its toolbox to design the interface shown in Figure 2 (the required controls can be seen in the right-hand area), then save the result as the file fiction_reader.ui;

4. Run pyUIC on fiction_reader.ui to convert the UI into Python code; this generates the file fiction_reader.py;
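If the PyCharm external tool is not configured, the same conversion can probably be run from a small script. The sketch below assumes PyQt5 is installed for the current interpreter and that the designer file is named fiction_reader.ui (the name that appears in the generated file header); the helper names build_pyuic_command and convert_ui are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def build_pyuic_command(ui_file):
    """Return the argument list that converts a .ui file into a .py file.

    Mirrors the external-tool arguments "<file>.ui -o <file>.py".
    """
    ui_path = Path(ui_file)
    py_path = ui_path.with_suffix(".py")  # same name, .py extension
    return [sys.executable, "-m", "PyQt5.uic.pyuic", str(ui_path), "-o", str(py_path)]


def convert_ui(ui_file):
    """Run pyuic on ui_file; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pyuic_command(ui_file), check=True)


# Only print the command here; call convert_ui("fiction_reader.ui") to actually run it.
print(" ".join(build_pyuic_command("fiction_reader.ui")))
```

Building the command separately from running it makes it easy to check what will be executed before any file is overwritten.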

2, Code design

1. Add the required modules (used in the following code) and the main method (which shows the reader window when the program runs);

# Add code
from PyQt5.QtWidgets import QMessageBox, QFileDialog
import os
import sys
import requests
import re
# Main method (add code)
if __name__ == '__main__':
    app = QtWidgets.QApplication(sys.argv)
    MainWindow = QtWidgets.QMainWindow()  # Create form object
    ui = Ui_MainWindow()  # Create a PyQt designed form object
    ui.setupUi(MainWindow)  # Call the method of PyQt form to initialize the form object
    MainWindow.show()  # Display Form
    sys.exit(app.exec_())  # Exit process on program shutdown

2. In the function setupUi, add code (Figure 1) so that the table on the first tab has two columns (list display); add code (Figure 2) to change the display mode of the list on the second tab (chart display): setViewMode sets the icon display mode, and 405 is the maximum width of the list;

self.tableWidget.setColumnCount(2)  # Change to two columns
self.tableWidget.setRowCount(0)
# Add code (the first tab is divided into two columns)
item = QtWidgets.QTableWidgetItem()
self.tableWidget.setHorizontalHeaderItem(0, item)
item = QtWidgets.QTableWidgetItem()
self.tableWidget.setHorizontalHeaderItem(1, item)
self.tableWidget.setColumnWidth(0, 130)  # Set first column width
self.tableWidget.horizontalHeader().setStretchLastSection(True)  # Set autofill container
self.tableWidget.setVerticalScrollBarPolicy(QtCore.Qt.ScrollBarAlwaysOn)  # Vertical scroll bar
# Add code
self.listWidget.setViewMode(QtWidgets.QListView.IconMode) # Icon format display
self.listWidget.setIconSize(QtCore.QSize(50, 50))  # Icon size
self.listWidget.setMaximumWidth(405)  # Maximum width
self.listWidget.setSpacing(15)  # Spacing size
self.listWidget.setVerticalScrollBarPolicy(QtCore.Qt.ScrollBarAlwaysOn)  # Vertical scroll bar

3. Modify retranslateUi;

Note: self.lineEdit.setText sets the default value of the novel's book number. self.lineEdit_2.setText sets the save path to the file folder under the current path. self.pushButton.clicked.connect binds the click event of the select button (clicking it pops up the folder selection window), and self.pushButton_2.clicked.connect starts fetching the data when the confirm button is clicked;

def retranslateUi(self, MainWindow):
    _translate = QtCore.QCoreApplication.translate
    MainWindow.setWindowTitle(_translate("MainWindow", "reader"))
    self.groupBox.setTitle(_translate("MainWindow", "Grab settings"))
    self.label.setText(_translate("MainWindow", "Please fill in the novel number:"))
    # Add code (set default book number)
    book_number = '5_5871'
    self.lineEdit.setText(_translate("MainWindow", book_number))  # Set default book number

    self.label_2.setText(_translate("MainWindow", "Please select a save path:"))
    # Add code (set the default path to the file folder under the current program path)
    self.lineEdit_2.setText(_translate("MainWindow", os.getcwd() + '\\file'))

    self.label_3.setText(_translate("MainWindow", "(For example, 5_5871)"))
    self.pushButton.setText(_translate("MainWindow", "Choice"))
    self.pushButton_2.setText(_translate("MainWindow", "Determine"))
    self.tabWidget.setTabText(self.tabWidget.indexOf(self.tab), _translate("MainWindow", "List display"))
    self.tabWidget.setTabText(self.tabWidget.indexOf(self.tab_2), _translate("MainWindow", "Chart display"))
    # Add code (set list title)
    item = self.tableWidget.horizontalHeaderItem(0)  # Get the first column of the table
    item.setText(_translate("MainWindow", "Book number"))  # Set the title of the first column of the table
    item = self.tableWidget.horizontalHeaderItem(1)  # Get the second column of the table
    item.setText(_translate("MainWindow", "Name"))  # Set the title of the second column of the table

    self.pushButton.clicked.connect(self.msg)  # Binding events for selection buttons
    self.pushButton_2.clicked.connect(self.getDatas)  # Click OK to get the data

4. Implement the selection of the save path by defining the function msg;

Note: os.getcwd() makes the selection window open at the current working directory by default. self.lineEdit_2.setText displays the selected path;

def msg(self):
    try:
        # dir_path is the absolute path of the selected folder; the second parameter is the dialog box title, and the third is the default path shown when the dialog box opens
        self.dir_path = QFileDialog.getExistingDirectory(None, "Selection path", os.getcwd())
        self.lineEdit_2.setText(self.dir_path)  # Display the selected save path
    except Exception as e:
        print(e)
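The default save path above is assembled with a hard-coded '\\' separator. As a hedged alternative, os.path.join chooses the separator for the current platform. The sketch below is not part of the original program; the helper names default_save_dir and book_dir are hypothetical.

```python
import os


def default_save_dir(base_dir=None, folder="file"):
    """Return <base_dir>/<folder>, using the current working directory by default."""
    if base_dir is None:
        base_dir = os.getcwd()
    return os.path.join(base_dir, folder)


def book_dir(save_dir, book_number):
    """Return the folder that holds the saved chapters of one book."""
    return os.path.join(save_dir, book_number)


save_dir = default_save_dir()
print(book_dir(save_dir, "5_5871"))  # book number taken from the example above
```

The returned strings could be passed to self.lineEdit_2.setText or os.listdir in place of the concatenated paths.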

5. Analyze how the data is grabbed: first get the chapter URLs from the novel's front page, then loop over these URLs to fetch the content of each chapter and save it locally;

Notes:

The function urlTotext fetches the page data for the URL passed in. Be sure to set response.encoding to the encoding used by the site being grabbed, otherwise the text will be garbled;

The function getData fetches the chapter content at each chapter URL and saves it locally:

1) check the source code of the novel's front page: the chapter URLs are all under <div id="list">;

2) the front-page source also shows that the first eight links are the latest-chapters section. To filter them out, the loop for item in links[8:20] starts from index 8;

3) serial_number = item[0:-5] extracts the number from the chapter URL; it is used to sort and display the chapters in order;

4) check the source code of a chapter page (Figure 3): the content is all under <div id="content">;

The getDatas function grabs all the data, saves it locally, and then displays it in the reader;

# Grab all data
def getDatas(self):
    try:
        self.book_number = self.lineEdit.text()  # Record the book number set by the user
        self.baseurl = 'https://www.booktxt.net/' + self.book_number + '/'  # Set the book's base address
        try:
            self.getData(self.baseurl, self.lineEdit_2.text())  # Download and save the chapters
        except Exception as e:
            print(e)  # Download failed; still show any chapters already saved
        self.getFiles()  # Get all files
        self.bindList()  # Bind list
        self.bindTable()  # Bind table
        self.listWidget.itemClicked.connect(self.itemClick)  # Bind list click method
        self.tableWidget.itemClicked.connect(self.tableClick)  # Bind table click method
    except Exception:
        QMessageBox.warning(None, "warning", "No data, please reset the book number", QMessageBox.Ok)
        return

# Grab data
def getData(self, url, path):
    html = self.urlTotext(url)
    dl = re.findall(r'id="list".*?</dl>', html, re.S)[0]  # Block that holds the chapter links
    links = re.findall(r'<a href="(.*?)">', dl)
    path = path + "\\" + self.book_number + "\\"  # Set article storage path
    if not os.path.isdir(path):  # Determine whether the path exists
        os.makedirs(path)  # Create the path, including missing parent folders
    for item in links[8:20]:  # Skip the first 8 (latest) links, then take the next 12 chapters
        serial_number = item[0:-5]  # Drop the last 5 characters of the link to get the number
        print(serial_number)
        articleUrl = self.baseurl + item  # Get the specific article address traversed
        articleHtml = self.urlTotext(articleUrl)
        # Extract chapter content
        article_content = re.findall(r'id="content">(.*?)</div>', articleHtml, re.S)[0]
        # Filter out the space characters, line breaks, etc
        article_content = article_content.replace('<br /><br />', '')
        article_content = article_content.replace('</br>', '')
        article_content = article_content.replace('&nbsp;', '')

        title = re.findall(r'<h1>(.*?)</h1>', articleHtml, re.S)[0]  # Get article title
        fileName = path + serial_number + title + '.txt'  # Set article saving path (including article name)
        with open(fileName, "w") as newFile:  # Open or create the file; closed automatically
            newFile.write("<<" + title + ">>\n\n")  # Write title to file and wrap
            newFile.write(article_content)  # Write content to file
    QMessageBox.information(None, "Tips", self.book_number + "'s novel is saved", QMessageBox.Ok)

# Extract data from web pages
def urlTotext(self, url):
    response = requests.get(url)
    # Coding mode
    response.encoding = 'gbk'
    html = response.text
    return html
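The decoding and parsing steps above can be tried offline before touching the real site. The sketch below is a minimal example under stated assumptions: SAMPLE_INDEX and SAMPLE_CHAPTER are made-up pages that imitate the structure described in the notes (the real site layout may differ), and the helper names decode_page, chapter_links and chapter_text are hypothetical. It reuses the same regular expressions as getData.

```python
import re

# Made-up front page: 8 "latest" links followed by 2 regular chapter links.
latest = "".join('<dd><a href="{0}.html">Latest</a></dd>'.format(n) for n in range(9001, 9009))
SAMPLE_INDEX = (
    '<div id="list"><dl>' + latest
    + '<dd><a href="1001.html">Chapter 1</a></dd>'
    + '<dd><a href="1002.html">Chapter 2</a></dd>'
    + '</dl></div>'
)
# Made-up chapter page; the title is three Chinese characters written as escapes.
SAMPLE_CHAPTER = (
    '<h1>\u7b2c\u4e00\u7ae0</h1>'
    '<div id="content">&nbsp;&nbsp;First line.<br /><br />&nbsp;&nbsp;Second line.</div>'
)


def decode_page(raw_bytes, encoding="gbk"):
    """Decode downloaded bytes; a wrong encoding yields garbled text."""
    return raw_bytes.decode(encoding)


def chapter_links(index_html, skip=8):
    """Return the chapter links under id="list", skipping the latest-chapters block."""
    block = re.findall(r'id="list".*?</dl>', index_html, re.S)[0]
    return re.findall(r'<a href="(.*?)">', block)[skip:]


def chapter_text(chapter_html):
    """Return (title, cleaned content) for one chapter page."""
    title = re.findall(r'<h1>(.*?)</h1>', chapter_html, re.S)[0]
    content = re.findall(r'id="content">(.*?)</div>', chapter_html, re.S)[0]
    for junk in ('<br /><br />', '</br>', '&nbsp;'):
        content = content.replace(junk, '')
    return title, content


raw = SAMPLE_CHAPTER.encode("gbk")         # what the server would send
print(chapter_links(SAMPLE_INDEX))         # links that remain after the skip
print(chapter_text(decode_page(raw))[1])   # cleaned chapter body
print(raw.decode("latin-1")[:12])          # same bytes, wrong codec: garbled title
```

Working on sample strings keeps the experiment off the server, in line with the tip about not putting pressure on it.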

6. Get all the local files by defining the function getFiles;

Note: sorting the file names with sorted lets the chapters be listed and read in chapter order;

def getFiles(self):
    self.list = os.listdir(self.lineEdit_2.text() + '\\' + self.lineEdit.text())  # List all directories and files under the folder
    self.list = sorted(self.list) # sort
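One caveat: sorted compares file names as strings, so the order matches the chapter order only while the serial numbers have the same number of digits. The sketch below shows one possible numeric sort key; it assumes file names start with the serial number (as built in getData), and the helper name chapter_sort_key is hypothetical.

```python
import re


def chapter_sort_key(file_name):
    """Sort by the leading serial number; names without one go last."""
    match = re.match(r"\d+", file_name)
    number = int(match.group()) if match else float("inf")
    return (number, file_name)


# Made-up file names in the "<serial number><title>.txt" shape.
files = ["1000Chapter 3.txt", "998Chapter 1.txt", "999Chapter 2.txt"]
print(sorted(files))                        # plain string order
print(sorted(files, key=chapter_sort_key))  # numeric order
```

In getFiles this would amount to passing key=chapter_sort_key to sorted.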

7. Display the files in the table on the first tab; click a chapter's txt file to read it;

Note: the first column displays the book number self.lineEdit.text(), and the second column displays the chapter file name self.list[i]; the check if 'txt' in item.text() fixes the problem of the reader exiting when the book number is clicked;

# Show files in Table (list display)
def bindTable(self):
    for i in range(0, len(self.list)):  # Traverse file list
        self.tableWidget.insertRow(i)  # Add new line
        # Set the value of the first column as book number
        self.tableWidget.setItem(i, 0, QtWidgets.QTableWidgetItem(self.lineEdit.text()))
        # Set the value of the second column to file name
        self.tableWidget.setItem(i, 1, QtWidgets.QTableWidgetItem(self.list[i]))

# Table click method to open the selected item
def tableClick(self, item):
    if 'txt' in item.text(): # Click file name to pop up
        os.startfile(self.lineEdit_2.text() + '\\' + self.lineEdit.text() + '\\' + item.text())
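Note that os.startfile exists only on Windows, which matches the development environment used here. For other systems, a sketch like the following might work; it assumes the open command on macOS and xdg-open on a Linux desktop, and the helper names is_chapter_file, opener_command and open_chapter are hypothetical. It also checks the file extension rather than testing whether 'txt' appears anywhere in the text.

```python
import os
import subprocess
import sys


def is_chapter_file(text):
    """True only for names that end in .txt (stricter than 'txt' in text)."""
    return text.lower().endswith(".txt")


def opener_command(path, platform=None):
    """Return the command that opens path with the default app, or None on Windows."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        return None  # Windows: os.startfile is used instead of a command
    if platform == "darwin":
        return ["open", path]
    return ["xdg-open", path]  # assumption: a Linux desktop with xdg-utils installed


def open_chapter(path):
    """Open a saved chapter with the system's default application."""
    command = opener_command(path)
    if command is None:
        os.startfile(path)
    else:
        subprocess.run(command, check=False)


print(opener_command("chapter.txt", platform="linux"))
```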

8. Display the files in the list on the second tab; click a chapter's txt file to read it;

Note: str(self.list[i])[7:13] is used so that the serial number in front of the chapter title is not displayed;

# Display files in the List (Chart Display)
def bindList(self):
    for i in range(0, len(self.list)):  # Traverse file list
        self.item = QtWidgets.QListWidgetItem(self.listWidget)  # Create list item
        self.item.setIcon(QtGui.QIcon('images/fiction.png'))  # Set list item icon
        self.item.setText(str(self.list[i])[7:13] + '...')  # Truncation string (no sequence number)
        self.item.setToolTip(self.list[i])  # Set prompt text
        self.item.setFlags(QtCore.Qt.ItemIsSelectable | QtCore.Qt.ItemIsEnabled)  # Set check or not

# List click method to open the selected item
def itemClick(self, item):
    os.startfile(self.lineEdit_2.text() + '\\' + self.lineEdit.text() + '\\' + item.toolTip())
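The slice [7:13] works when every serial number is exactly seven digits long. If the length can vary, the leading digits could be stripped with a regular expression instead. The sketch below is one way to do that; it assumes the chapter title itself does not start with a digit, and the helper name display_title is hypothetical.

```python
import re


def display_title(file_name, max_chars=6):
    """Return a short label: no leading serial number, no .txt, at most max_chars."""
    stem = file_name[:-4] if file_name.endswith(".txt") else file_name
    title = re.sub(r"^\d+", "", stem)  # remove the leading serial number
    if len(title) > max_chars:
        return title[:max_chars] + "..."
    return title


print(display_title("1234567Chapter One.txt"))
print(display_title("99Ch 1.txt"))
```

The result could be passed to self.item.setText, while the tooltip keeps the full file name for itemClick.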

9. The final code is as follows:

# -*- coding: utf-8 -*-

# Form implementation generated from reading ui file 'fiction_reader.ui'
#
# Created by: PyQt5 UI code generator 5.13.0
#
# WARNING! All changes made in this file will be lost!


from PyQt5 import QtCore, QtGui, QtWidgets
# Add code
from PyQt5.QtWidgets import QMessageBox, QFileDialog
import os
import sys
import requests
import re

class Ui_MainWindow(object):
    def setupUi(self, MainWindow):
        MainWindow.setObjectName("MainWindow")
        MainWindow.resize(500, 480)
        self.centralwidget = QtWidgets.QWidget(MainWindow)
        self.centralwidget.setObjectName("centralwidget")
        self.groupBox = QtWidgets.QGroupBox(self.centralwidget)
        self.groupBox.setGeometry(QtCore.QRect(39, 20, 421, 131))
        self.groupBox.setObjectName("groupBox")
        self.label = QtWidgets.QLabel(self.groupBox)
        self.label.setGeometry(QtCore.QRect(20, 36, 101, 16))
        self.label.setObjectName("label")
        self.label_2 = QtWidgets.QLabel(self.groupBox)
        self.label_2.setGeometry(QtCore.QRect(20, 86, 101, 16))
        self.label_2.setObjectName("label_2")
        self.label_3 = QtWidgets.QLabel(self.groupBox)
        self.label_3.setGeometry(QtCore.QRect(282, 36, 101, 20))
        self.label_3.setObjectName("label_3")
        self.lineEdit = QtWidgets.QLineEdit(self.groupBox)
        self.lineEdit.setGeometry(QtCore.QRect(120, 31, 161, 28))
        self.lineEdit.setObjectName("lineEdit")
        self.lineEdit_2 = QtWidgets.QLineEdit(self.groupBox)
        self.lineEdit_2.setGeometry(QtCore.QRect(120, 81, 161, 28))
        self.lineEdit_2.setObjectName("lineEdit_2")
        self.pushButton = QtWidgets.QPushButton(self.groupBox)
        self.pushButton.setGeometry(QtCore.QRect(288, 83, 51, 23))
        self.pushButton.setObjectName("pushButton")
        self.pushButton_2 = QtWidgets.QPushButton(self.groupBox)
        self.pushButton_2.setGeometry(QtCore.QRect(350, 83, 51, 23))
        self.pushButton_2.setObjectName("pushButton_2")
        self.tabWidget = QtWidgets.QTabWidget(self.centralwidget)
        self.tabWidget.setGeometry(QtCore.QRect(39, 175, 421, 231))
        self.tabWidget.setObjectName("tabWidget")
        self.tab = QtWidgets.QWidget()
        self.tab.setObjectName("tab")
        self.tableWidget = QtWidgets.QTableWidget(self.tab)
        self.tableWidget.setGeometry(QtCore.QRect(5, 5, 405, 197))
        self.tableWidget.setObjectName("tableWidget")
        self.tableWidget.setColumnCount(2)  # Change to two columns
        self.tableWidget.setRowCount(0)
        # Add code (the first tab is divided into two columns)
        item = QtWidgets.QTableWidgetItem()
        self.tableWidget.setHorizontalHeaderItem(0, item)
        item = QtWidgets.QTableWidgetItem()
        self.tableWidget.setHorizontalHeaderItem(1, item)
        self.tableWidget.setColumnWidth(0, 130)  # Set first column width
        self.tableWidget.horizontalHeader().setStretchLastSection(True)  # Set autofill container
        self.tableWidget.setVerticalScrollBarPolicy(QtCore.Qt.ScrollBarAlwaysOn)  # Vertical scroll bar

        self.tabWidget.addTab(self.tab, "")
        self.tab_2 = QtWidgets.QWidget()
        self.tab_2.setObjectName("tab_2")
        self.listWidget = QtWidgets.QListWidget(self.tab_2)
        self.listWidget.setGeometry(QtCore.QRect(5, 5, 405, 197))
        self.listWidget.setObjectName("listWidget")
        # Add code
        self.listWidget.setViewMode(QtWidgets.QListView.IconMode) # Icon format display
        self.listWidget.setIconSize(QtCore.QSize(50, 50))  # Icon size
        self.listWidget.setMaximumWidth(405)  # Maximum width
        self.listWidget.setSpacing(15)  # Spacing size
        self.listWidget.setVerticalScrollBarPolicy(QtCore.Qt.ScrollBarAlwaysOn)  # Vertical scroll bar

        self.tabWidget.addTab(self.tab_2, "")
        MainWindow.setCentralWidget(self.centralwidget)
        self.menubar = QtWidgets.QMenuBar(MainWindow)
        self.menubar.setGeometry(QtCore.QRect(0, 0, 500, 23))
        self.menubar.setObjectName("menubar")
        MainWindow.setMenuBar(self.menubar)
        self.statusbar = QtWidgets.QStatusBar(MainWindow)
        self.statusbar.setObjectName("statusbar")
        MainWindow.setStatusBar(self.statusbar)

        self.retranslateUi(MainWindow)
        self.tabWidget.setCurrentIndex(0)
        QtCore.QMetaObject.connectSlotsByName(MainWindow)

    def retranslateUi(self, MainWindow):
        _translate = QtCore.QCoreApplication.translate
        MainWindow.setWindowTitle(_translate("MainWindow", "reader"))
        self.groupBox.setTitle(_translate("MainWindow", "Grab settings"))
        self.label.setText(_translate("MainWindow", "Please fill in the novel number:"))
        # Add code (set default book number)
        book_number = '5_5871'
        self.lineEdit.setText(_translate("MainWindow", book_number))  # Set default book number

        self.label_2.setText(_translate("MainWindow", "Please select a save path:"))
        # Add code (set the default path to the file folder under the current program path)
        self.lineEdit_2.setText(_translate("MainWindow", os.getcwd() + '\\file'))

        self.label_3.setText(_translate("MainWindow", "(For example, 5_5871)"))
        self.pushButton.setText(_translate("MainWindow", "Choice"))
        self.pushButton_2.setText(_translate("MainWindow", "Determine"))
        self.tabWidget.setTabText(self.tabWidget.indexOf(self.tab), _translate("MainWindow", "List display"))
        self.tabWidget.setTabText(self.tabWidget.indexOf(self.tab_2), _translate("MainWindow", "Chart display"))
        # Add code (set list title)
        item = self.tableWidget.horizontalHeaderItem(0)  # Get the first column of the table
        item.setText(_translate("MainWindow", "Book number"))  # Set the title of the first column of the table
        item = self.tableWidget.horizontalHeaderItem(1)  # Get the second column of the table
        item.setText(_translate("MainWindow", "Name"))  # Set the title of the second column of the table

        self.pushButton.clicked.connect(self.msg)  # Binding events for selection buttons
        self.pushButton_2.clicked.connect(self.getDatas)  # Click OK to get the data

    # Add code (select save path)
    def msg(self):
        try:
            # dir_path is the absolute path of the selected folder; the second parameter is the dialog box title, and the third is the default path shown when the dialog box opens
            self.dir_path = QFileDialog.getExistingDirectory(None, "Selection path", os.getcwd())
            self.lineEdit_2.setText(self.dir_path)  # Display the selected save path
        except Exception as e:
            print(e)

    # Grab all data
    def getDatas(self):
        try:
            self.book_number = self.lineEdit.text()  # Record the book number set by the user
            self.baseurl = 'https://www.booktxt.net/' + self.book_number + '/'  # Set the book's base address
            try:
                self.getData(self.baseurl, self.lineEdit_2.text())  # Download and save the chapters
            except Exception as e:
                print(e)  # Download failed; still show any chapters already saved
            self.getFiles()  # Get all files
            self.bindList()  # Bind list
            self.bindTable()  # Bind table
            self.listWidget.itemClicked.connect(self.itemClick)  # Bind list click method
            self.tableWidget.itemClicked.connect(self.tableClick)  # Bind table click method
        except Exception:
            QMessageBox.warning(None, "warning", "No data, please reset the book number", QMessageBox.Ok)
            return

    # Grab data
    def getData(self, url, path):
        html = self.urlTotext(url)
        dl = re.findall(r'id="list".*?</dl>', html, re.S)[0]  # Block that holds the chapter links
        links = re.findall(r'<a href="(.*?)">', dl)
        path = path + "\\" + self.book_number + "\\"  # Set article storage path
        if not os.path.isdir(path):  # Determine whether the path exists
            os.makedirs(path)  # Create the path, including missing parent folders
        for item in links[8:20]:  # Skip the first 8 (latest) links, then take the next 12 chapters
            serial_number = item[0:-5]  # Drop the last 5 characters of the link to get the number
            print(serial_number)
            articleUrl = self.baseurl + item  # Get the specific article address traversed
            articleHtml = self.urlTotext(articleUrl)
            # Extract chapter content
            article_content = re.findall(r'id="content">(.*?)</div>', articleHtml, re.S)[0]
            # Filter out the space characters, line breaks, etc
            article_content = article_content.replace('<br /><br />', '')
            article_content = article_content.replace('</br>', '')
            article_content = article_content.replace('&nbsp;', '')

            title = re.findall(r'<h1>(.*?)</h1>', articleHtml, re.S)[0]  # Get article title
            fileName = path + serial_number + title + '.txt'  # Set article saving path (including article name)
            with open(fileName, "w") as newFile:  # Open or create the file; closed automatically
                newFile.write("<<" + title + ">>\n\n")  # Write title to file and wrap
                newFile.write(article_content)  # Write content to file
        QMessageBox.information(None, "Tips", self.book_number + "'s novel is saved", QMessageBox.Ok)

    # Extract data from web pages
    def urlTotext(self, url):
        response = requests.get(url)
        # Coding mode
        response.encoding = 'gbk'
        html = response.text
        return html

    # Get all files
    def getFiles(self):
        self.list = os.listdir(self.lineEdit_2.text() + '\\' + self.lineEdit.text())  # List all directories and files under the folder
        self.list = sorted(self.list) # sort
        print(self.list)

    # Show files in Table (list display)
    def bindTable(self):
        for i in range(0, len(self.list)):  # Traverse file list
            self.tableWidget.insertRow(i)  # Add new line
            # Set the value of the first column as book number
            self.tableWidget.setItem(i, 0, QtWidgets.QTableWidgetItem(self.lineEdit.text()))
            # Set the value of the second column to file name
            self.tableWidget.setItem(i, 1, QtWidgets.QTableWidgetItem(self.list[i]))

    # Table click method to open the selected item
    def tableClick(self, item):
        if 'txt' in item.text(): # Click file name to pop up
            os.startfile(self.lineEdit_2.text() + '\\' + self.lineEdit.text() + '\\' + item.text())

    # Display files in the List (Chart Display)
    def bindList(self):
        for i in range(0, len(self.list)):  # Traverse file list
            self.item = QtWidgets.QListWidgetItem(self.listWidget)  # Create list item
            self.item.setIcon(QtGui.QIcon('images/fiction.png'))  # Set list item icon
            self.item.setText(str(self.list[i])[7:13] + '...')  # Truncation string (no sequence number)
            self.item.setToolTip(self.list[i])  # Set prompt text
            self.item.setFlags(QtCore.Qt.ItemIsSelectable | QtCore.Qt.ItemIsEnabled)  # Set check or not

    # List click method to open the selected item
    def itemClick(self, item):
        os.startfile(self.lineEdit_2.text() + '\\' + self.lineEdit.text() + '\\' + item.toolTip())

# Main method (add code)
if __name__ == '__main__':
    app = QtWidgets.QApplication(sys.argv)
    MainWindow = QtWidgets.QMainWindow()  # Create form object
    ui = Ui_MainWindow()  # Create a PyQt designed form object
    ui.setupUi(MainWindow)  # Call the method of PyQt form to initialize the form object
    MainWindow.show()  # Display Form
    sys.exit(app.exec_())  # Exit process on program shutdown

Epilogue

This article used Python to develop a novel reader. The core is to design the grab-settings interface with QtDesigner, then use requests to fetch the website data, save it locally, and display the chapter data in the reader.


Tags: Qt encoding Python pip

Posted on Wed, 12 Feb 2020 04:14:54 -0500 by Rob2005