Python has a rich ecosystem of third-party modules; once you are comfortable with them, you can accomplish a wide variety of tasks. Before collecting data you need a target site; a Python script then probes it and records quality metrics.

Detecting web quality relies on Python's pycurl module. It can obtain the HTTP status code, DNS resolution time, connection time, total transfer time, downloaded body size, HTTP header size, average download speed, and other parameters. From these parameters we can assess the service quality of the site and then optimize it.

The collected data is written to an Excel table using Python's xlsxwriter module. The approach: save each probe's results to a text file, then create an Excel workbook, write the accumulated data into a worksheet, and draw a chart. On each subsequent run, the text file is appended to, while the Excel workbook is recreated from scratch (completely overwritten). There are, of course, many other modules for creating Excel tables; they are not covered in detail here.
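The append-then-rebuild flow described above can be sketched with the standard library alone (the file name mirrors the article's chart.txt; the sample metric values are hypothetical, and Python 3 is shown here):

```python
import os
import tempfile

# Hypothetical result of one probe: status, dns_ms, connect_ms, total_ms,
# size_kb, header_bytes, speed_kbs
metrics = [200, 12.5, 30.1, 95.7, 10, 512, 105]

path = os.path.join(tempfile.mkdtemp(), "chart.txt")

with open(path, "a") as f:          # append: history accumulates across runs
    f.write(",".join(str(m) for m in metrics) + "\n")

with open(path) as f:               # rebuild: the report reads the whole history
    history = [line.strip().split(",") for line in f]

print("probes recorded:", len(history))  # → probes recorded: 1
```

Each run of the real script adds one line to the text file, so the Excel report always reflects the full probe history even though the workbook itself is rewritten every time.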
Preparation for the Python script:
- Download the pycurl module; installation is a simple double-click.
- Install xlsxwriter with the pip command, which requires the environment variables to be configured correctly.
1. Because pycurl is downloaded and installed directly, the steps are not repeated here; it is straightforward.
2. Install the xlsxwriter module with pip (an Internet connection is required).
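If pip is on your PATH, `pip install xlsxwriter` does the job. A quick way to verify from Python that both modules used by the script are importable (module names taken from the script below):

```python
import importlib.util

# Check whether the modules used by the probe script are installed
for name in ("pycurl", "xlsxwriter"):
    found = importlib.util.find_spec(name) is not None
    print(name, "is installed" if found else "is NOT installed")
```

If either module reports as missing, revisit the installation steps above before running the data acquisition script.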
3. The data acquisition script is as follows:
```python
# -*- coding: utf-8 -*-
# Note: this script uses Python 2 syntax (print statements).
import os, sys
import pycurl
import xlsxwriter

URL = "www.baidu.com"                          # Target URL to probe; change as needed

c = pycurl.Curl()                              # Create a curl object
c.setopt(pycurl.URL, URL)                      # Set the request URL
c.setopt(pycurl.CONNECTTIMEOUT, 10)            # Connection timeout (seconds)
c.setopt(pycurl.TIMEOUT, 10)                   # Request timeout (seconds)
c.setopt(pycurl.NOPROGRESS, 1)                 # Suppress the download progress meter
c.setopt(pycurl.FORBID_REUSE, 1)               # Force-close the connection after the exchange; no reuse
c.setopt(pycurl.MAXREDIRS, 1)                  # Allow at most 1 HTTP redirect
c.setopt(pycurl.DNS_CACHE_TIMEOUT, 30)         # Cache DNS results for 30 seconds

# Open a file in 'wb' mode to store the returned HTTP headers and page content
indexfile = open(os.path.dirname(os.path.realpath(__file__)) + "/content.txt", "wb")
c.setopt(pycurl.WRITEHEADER, indexfile)        # Direct the returned HTTP headers to indexfile
c.setopt(pycurl.WRITEDATA, indexfile)          # Direct the returned HTML content to indexfile
c.perform()

NAMELOOKUP_TIME = c.getinfo(c.NAMELOOKUP_TIME) # DNS resolution time
CONNECT_TIME = c.getinfo(c.CONNECT_TIME)       # Connection establishment time
TOTAL_TIME = c.getinfo(c.TOTAL_TIME)           # Total transfer time
HTTP_CODE = c.getinfo(c.HTTP_CODE)             # HTTP status code
SIZE_DOWNLOAD = c.getinfo(c.SIZE_DOWNLOAD)     # Downloaded body size
HEADER_SIZE = c.getinfo(c.HEADER_SIZE)         # HTTP header size
SPEED_DOWNLOAD = c.getinfo(c.SPEED_DOWNLOAD)   # Average download speed

print u"HTTP status code: %s" % (HTTP_CODE)
print u"DNS resolution time: %.2f ms" % (NAMELOOKUP_TIME * 1000)
print u"Connection establishment time: %.2f ms" % (CONNECT_TIME * 1000)
print u"Total transfer time: %.2f ms" % (TOTAL_TIME * 1000)
print u"Downloaded body size: %d bytes" % (SIZE_DOWNLOAD)
print u"HTTP header size: %d bytes" % (HEADER_SIZE)
print u"Average download speed: %d bytes/s" % (SPEED_DOWNLOAD)

indexfile.close()                              # Close the content file
c.close()                                      # Close the curl object

f = open('chart.txt', 'a')                     # Open chart.txt in append mode
f.write(str(HTTP_CODE) + ',' + str(NAMELOOKUP_TIME * 1000) + ',' +
        str(CONNECT_TIME * 1000) + ',' + str(TOTAL_TIME * 1000) + ',' +
        str(SIZE_DOWNLOAD / 1024) + ',' + str(HEADER_SIZE) + ',' +
        str(SPEED_DOWNLOAD / 1024) + '\n')     # Append this probe's results
f.close()                                      # Close chart.txt

workbook = xlsxwriter.Workbook('chart.xlsx')   # Create the chart.xlsx workbook
worksheet = workbook.add_worksheet()           # Create a worksheet (defaults to Sheet1)
chart = workbook.add_chart({'type': 'column'}) # Create a column chart object
title = [URL, u'HTTP status code', u'DNS resolution time', u'Connection time',
         u'Total transfer time', u'Download size', u'HTTP header size',
         u'Average download speed']            # Header row for the data table

format = workbook.add_format()                 # Format for data cells
format.set_border(1)                           # 1-pixel cell border
format_title = workbook.add_format()           # Format for the header row
format_title.set_border(1)                     # 1-pixel cell border
format_title.set_bg_color('#00FF00')           # Green background for header cells
format_title.set_align('center')               # Center the header text
format_title.set_bold()                        # Bold the header text

worksheet.write_row(0, 0, title, format_title) # Write the header row into the first line

f = open('chart.txt', 'r')                     # Re-open chart.txt read-only
line = 1                                       # Worksheet row counter
for i in f:                                    # One text line per probe
    head = [line]                              # First column: probe number
    lineList = i.split(',')                    # Split the CSV line into a list
    # Strip the trailing newline, drop the decimals, and convert each field to int
    lineList = map(lambda i2: int(float(i2.replace("\n", ''))), lineList)
    lineList = head + lineList                 # Prepend the probe number
    worksheet.write_row(line, 0, lineList, format)  # Write one row of data
    line += 1

average = [u'average', '=AVERAGE(B2:B' + str(line - 1) + ')',
           '=AVERAGE(C2:C' + str(line - 1) + ')',
           '=AVERAGE(D2:D' + str(line - 1) + ')',
           '=AVERAGE(E2:E' + str(line - 1) + ')',
           '=AVERAGE(F2:F' + str(line - 1) + ')',
           '=AVERAGE(G2:G' + str(line - 1) + ')',
           '=AVERAGE(H2:H' + str(line - 1) + ')']   # Per-column averages
worksheet.write_row(line, 0, average, format)  # Write the averages below the last data row
f.close()                                      # Close the file

def chart_series(cur_row):                     # Add one data series per worksheet row
    chart.add_series({
        'categories': '=Sheet1!$B$1:$H$1',     # Metric names become the X-axis labels
        'values': '=Sheet1!$B$' + cur_row + ':$H$' + cur_row,  # Columns B through H of this row
        'line': {'color': 'black'},            # Black series line
        'name': '=Sheet1!$A' + cur_row,        # Row label as the legend entry
    })

for row in range(2, line + 1):                 # From the second row to the last (averages) row
    chart_series(str(row))

chart.set_size({'width': 876, 'height': 287})  # Chart width and height
worksheet.insert_chart(line + 2, 0, chart)     # Insert the chart two rows below the last row
workbook.close()                               # Close the Excel workbook
```
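The per-line conversion used when reading chart.txt back (strip the newline, drop the decimal part, convert to int) can be illustrated in isolation with a hypothetical sample line (Python 3 shown here):

```python
# Hypothetical sample line from chart.txt: status, dns_ms, connect_ms,
# total_ms, size_kb, header_bytes, speed_kbs
sample = "200,12.34,56.78,123.45,10.0,512,300.5\n"

# Strip the newline, truncate the decimals, and convert each field to int
values = [int(float(v)) for v in sample.replace("\n", "").split(",")]
print(values)  # → [200, 12, 56, 123, 10, 512, 300]
```

Note that int(float(...)) truncates toward zero rather than rounding, so 12.34 ms becomes 12; that is acceptable here since the chart only needs approximate magnitudes.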
4. After running the script, three files are generated in the directory where the script resides: two txt text files and one Excel file. Executing the script displays information such as the following:
5. The files generated in the current directory are as follows:
The two txt files merely pave the way for the Excel output, so they can be safely ignored; the data of interest is in the Excel file. The Excel data looks as follows (the results shown are after executing the script six times, i.e., six probes):
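The averages row in the sheet is computed by Excel's AVERAGE formula; the equivalent column-wise computation, on hypothetical probe rows, looks like this (Python 3 shown here):

```python
# Hypothetical metric rows, one per probe, columns as in the worksheet (B..H):
# status, dns_ms, connect_ms, total_ms, size_kb, header_bytes, speed_kbs
rows = [
    [200, 12, 30, 95, 10, 512, 100],
    [200, 14, 32, 105, 10, 512, 110],
]

# Column-wise mean, matching =AVERAGE(B2:B3) etc. in the generated sheet
averages = [sum(col) / float(len(col)) for col in zip(*rows)]
print(averages)  # → [200.0, 13.0, 31.0, 100.0, 10.0, 512.0, 105.0]
```

Keeping the formula in the spreadsheet (as the script does) means the averages update automatically if the data rows are edited in Excel, which a precomputed value would not.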