08 How to turn data crawled from the Internet into a table

1, Background

I crawled some data from the Internet, as shown in the figure:

How can I turn it into an Excel sheet? The final result is:

2, Operation method

1. Because the end result is a table, I use the csv module here. I don't know whether there is a corresponding module for xlsx, and I haven't used one. The csv module is part of Python's standard library, so you don't need to install anything.

import csv

2. The crawled file turns out to be a list of dictionaries. We look at the fields in it and define a header row for them:

headers = ['positionName', 'workYear', 'education', 'jobNature', 'financeStage',
               'city','salary','positionAdvantage','companyFullName']
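If you don't want to type the header by hand, the field names can also be collected from the records themselves. This is only a small sketch: the two records below are made up, and I assume every record is a plain dict.

```python
# Made-up records standing in for the crawled list of dictionaries
rows = [
    {"positionName": "python engineer", "city": "Shanghai"},
    {"positionName": "data analyst", "salary": "10k-20k"},
]

# dict.fromkeys keeps the first occurrence of each key and preserves order,
# so this yields every field name once, in first-seen order
headers = list(dict.fromkeys(key for row in rows for key in row))
print(headers)
```

Writing the list by hand, as above, has the advantage that you control the column order and can leave out fields you don't need.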

3. Read the file:

with open(r"toBeCSV\data1.txt","rb") as f:
    #What is read from the disk in "rb" mode is a byte stream, that is, bytes
    #Use decode to convert bytes to str; use encode for the reverse
    rows1 = eval(f.read().decode("gbk"))
    #print(rows1)

4. I still can't keep decode and encode straight when I use them. Using decode here works fine. When I tried encode and encoding instead, I got these errors:
AttributeError: 'bytes' object has no attribute 'encode'
AttributeError: 'bytes' object has no attribute 'encoding'

  • The errors say that bytes has no such attribute. In fact, an encoding can also be passed to open(), for example: with open(r"toBeCSV\data1.txt", "r", encoding='gbk') as f: When utf-8 is used there instead, an error is reported:
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 25: invalid start byte

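After checking, it turns out that because I used 'rb' to read the file in binary form, I have to decode the contents afterwards.

Then I changed 'rb' to 'r' and found that there was no error, and the content of the generated file was not garbled. In text mode open() does the decoding itself, using the platform's default encoding when none is given, so this only works when that default matches the file. It seems that I wasted my time.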
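To keep the two directions apart, here is a small sketch. The sample text and the temporary file are my own assumptions, not the crawled data: str.encode goes from text to bytes, bytes.decode goes from bytes to text, and open() can do the decoding for you in text mode.

```python
import os
import tempfile

text = "上海"                     # a str (made-up sample text)
raw = text.encode("gbk")          # str -> bytes
round_trip = raw.decode("gbk")    # bytes -> str

# bytes has decode() but no encode(), which explains the AttributeError
has_encode = hasattr(raw, "encode")

# Write the bytes to a temporary file, then read them back both ways
path = os.path.join(tempfile.mkdtemp(), "sample_gbk.txt")
with open(path, "wb") as f:
    f.write(raw)

with open(path, "rb") as f:                    # binary mode: decode by hand
    from_binary = f.read().decode("gbk")

with open(path, "r", encoding="gbk") as f:     # text mode: open() decodes
    from_text = f.read()

# Decoding gbk bytes as utf-8 is what raises UnicodeDecodeError
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

Both ways of reading give the same str. Passing encoding explicitly is the safer habit, because the result then no longer depends on the machine's default encoding.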

5. As for why to use with open(...) as, please see this blog post (I just searched on Baidu):

In short, it is about safety. An error can easily occur while the file is being read, terminating the program before the file is closed and leaking the file handle (the number of files that the OS allows to be open at once is limited). The try ... finally approach also works, but it is not as concise as with open(...) as.

https://blog.csdn.net/xrinosvip/article/details/82019844
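A minimal sketch of the two approaches side by side. The temporary file and the simulated error are assumptions made only for this demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Manual version: finally guarantees close() even if write() raises
f = open(path, "w", encoding="utf-8")
try:
    f.write("hello")
finally:
    f.close()
closed_manually = f.closed

# with version: the file is closed when the block exits,
# even though an exception is raised inside it
try:
    with open(path, "r", encoding="utf-8") as g:
        content = g.read()
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass
closed_by_with = g.closed
```

In both cases the file ends up closed, but the with form needs no explicit close() call.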

6. The eval() function evaluates a string expression and returns its value. Here it turns the text read from the file into a list of dictionaries.
Without it, an error is reported here: AttributeError: 'str' object has no attribute 'keys'.
Compare the output:
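The difference can be reproduced with a short sketch. The sample string is made up, and I swap in ast.literal_eval from the standard library, which is not what the code in this post uses: it only parses Python literals and does not execute code, so it is the safer choice for text that came from the Internet.

```python
import ast
import csv
import io

# Made-up file content: the text form of a list of dictionaries
text = "[{'city': 'Shanghai', 'salary': '12k-24k'}]"

rows = ast.literal_eval(text)      # parses literals only, runs no code
same_as_eval = rows == eval(text)  # same result for plain literals like this

# Passing the raw string instead makes DictWriter iterate over its
# characters and call .keys() on each one, which fails
writer = csv.DictWriter(io.StringIO(), ["city", "salary"])
try:
    writer.writerows(text)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

So the error does not come from the file at all: writerows() simply received a str where it expected a list of dictionaries.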

7. Write header to new file:

#The written csv file had an empty line after every row. The solution is to pass newline='' to open()
with open(r"toBeCSV\OutData1BeCSV3.csv",'w',newline='') as f:
    f_csv = csv.DictWriter(f, headers)
    f_csv.writeheader()  #Write header

f_csv is just a variable that holds the DictWriter. It is not the same object as the file handle f; it only writes through it.
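A small sketch of what ends up on disk. The shortened header list and the temporary file are assumptions for illustration. The csv writer ends each row with \r\n by default, and newline='' tells open() not to translate that again, which is where the extra empty lines on Windows come from.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "header_only.csv")
fieldnames = ["city", "salary"]   # shortened, hypothetical header list

with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames)
    writer.writeheader()
    same_object = writer is f     # the writer wraps f, it is not f itself

# Read the file back as bytes to see the exact line ending
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```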

8. Write the data rows to the file (still inside the with block, while the file is open):

f_csv.writerows(rows1)
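To check what writerows() produces without touching the disk, here is a sketch that writes into an in-memory buffer and reads the result back with csv.DictReader. The two records and the shortened header list are made up.

```python
import csv
import io

headers = ["positionName", "city", "salary"]   # shortened header list
rows = [
    {"positionName": "python engineer", "city": "Shanghai", "salary": "12k-24k"},
    {"positionName": "data analyst", "city": "Beijing", "salary": "10k-20k"},
]

buffer = io.StringIO(newline="")   # in-memory stand-in for the csv file
writer = csv.DictWriter(buffer, headers)
writer.writeheader()
writer.writerows(rows)             # one csv line per dictionary

buffer.seek(0)
read_back = list(csv.DictReader(buffer))
```

Note that with the default settings DictWriter raises ValueError if a dictionary contains a key that is not in the header list; passing extrasaction='ignore' makes it skip such keys instead.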

3, Complete code

At first I planned to use pandas. Later I found that I only needed to import the csv module, which ships with Python and does not need to be installed separately.

import csv

def usePandasFromTxtToBeCSV():
    headers = ['positionName', 'workYear', 'education', 'jobNature', 'financeStage',
               'city','salary','positionAdvantage','companyFullName']
    #Example of one record in the file:
    #{"positionName": "python engineer", "workYear": "3-5 years", "education": "undergraduate", "jobNature": "full time", "financeStage": "listed company", "city": "Shanghai", "salary": "12k-24k", "positionAdvantage": "large promotion space and good welfare", "companyFullName": "Shanghai paipaidai Financial Information Service Co., Ltd."}

    #What is read from the disk in "rb" mode is bytes; decode converts it to str,
    #and eval turns that text into a list of dictionaries
    with open(r"toBeCSV\data1.txt","rb") as f:
        rows1 = eval(f.read().decode("gbk"))
        #print(rows1)

    #Convert the list of dictionaries to a table
    #newline='' prevents the empty line after every row in the written csv file
    with open(r"toBeCSV\OutData1BeCSV3.csv",'w',newline='') as f:
        f_csv = csv.DictWriter(f, headers)
        f_csv.writeheader()  #Write header
        f_csv.writerows(rows1)  #Write the data rows

usePandasFromTxtToBeCSV()
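The function above only works with my own file paths. Below is a sketch of a reusable variant that can be tried without the crawled file. The name txt_to_csv, its parameters, the use of ast.literal_eval instead of eval, and the explicit encoding on the output file are all my own assumptions, not part of the code above.

```python
import ast
import csv
import os
import tempfile

def txt_to_csv(src_path, dst_path, headers, encoding="gbk"):
    """Hypothetical helper: convert a text file holding a list of dicts to csv."""
    with open(src_path, "r", encoding=encoding) as f:
        rows = ast.literal_eval(f.read())   # literals only, runs no code
    with open(dst_path, "w", newline="", encoding=encoding) as f:
        writer = csv.DictWriter(f, headers)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

# Try it on one made-up record in a temporary directory
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "data1.txt")
dst = os.path.join(workdir, "out.csv")
with open(src, "w", encoding="gbk") as f:
    f.write('[{"city": "上海", "salary": "12k-24k"}]')

count = txt_to_csv(src, dst, ["city", "salary"])
with open(dst, "r", newline="", encoding="gbk") as f:
    lines = f.read().splitlines()
```

Here count is the number of records written, and lines holds the header line followed by one line per record.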

Tags: Python

Posted on Tue, 30 Nov 2021 20:43:46 -0500 by desoto0311