Batch verification of 1000 apk s in Python UU (with hands-on materials)

Next to the previous case: Download 1000 apk s in batch in Python , we only talked about how to download,
In fact, the form provided by the marketing department includes not only the download address of apk, but also the version number of apk and MD5 information,
How to ensure that the 1000 apk s you downloaded are correct in the download process, with the correct version number and MD5 information?
Attachment: table provided by marketing department containing apk version number and md5 information.

This case mainly introduces how to quickly implement batch verification of downloaded app s.


Preparation stage
  1. This article only deals with verifying the version number and md5 information of apk, not downloading.
  2. You need to make sure that aapt has been successfully added to the environment variable
  3. The "AAPT dump bagging XXX. apk | findstr version" command can resolve the versionName information of an apk.
  4. The Windows operating system can use the command "certutil -hashfile XXX.apk MD5" to calculate the MD5 hash value of an apk,
    MD5 is a common calculation method used to verify the integrity of the file download process, to ensure that the downloaded file is not tampered or damaged in the process of network transmission.
  5. The Linux operating system can use the command "md5sum XXX.apk" to calculate the MD5 hash value of an apk.
  6. In fact, Python's hashlib module can also calculate the MD5 hash value, which is not affected by the operating system.
  7. In the previous case, the downloaded APK is placed in the "downloaded_apk" file. The os.listdir() function can list all the APK files in the folder.
  8. openpyxl is still recommended for Excel reading and writing. It needs to be compared with the version number and MD5 value of the original excel automatically,
    So we added 2 columns for verification and comparison. If the values are the same, we backfill Ok. If the values are different, we backfill the difference values and mark the red background.

Python batch script form

Remember the essence of batch scripts: execute statements in batch order

# coding=utf-8

import os
import re
import openpyxl
from openpyxl.styles import PatternFill

# Step 1: generate the dictionary of the entire excel table. key is the apk Chinese name, and value is a list [number of lines, version number, MD5]
print("Being generated apk Information index dictionary...")
apkinfo_dict = {}
apkname_col = 2
apkversion_col = 4
apkmd5_col = 6
apkversion_col_newadd = 5  # New version validation column
apkmd5_col_newadd = 7  # New MD5 validation column
error_fill = PatternFill(fill_type='solid', fgColor="FF3300")  # Mark red background
excel = openpyxl.load_workbook('Top_1000_apks.xlsx')  # Read the contents of excel
table = excel.active
rows = table.max_row
for r in range(2, rows + 1):  # It has nothing to do with the title line of the first line of excel. Start with the text content of the second line
    apk_name = table.cell(row=r, column=apkname_col).value  # Get apk name
    apk_version = table.cell(row=r, column=apkversion_col).value  # Get apk name
    apk_md5 = table.cell(row=r, column=apkmd5_col).value  # Get apk name
    apkinfo_dict[apk_name] = [r, apk_version, apk_md5]
print(apkinfo_dict)

# Step 2: read all the files in the downloaded? APK folder, and then compare and backfill them
curpath = os.getcwd()
apk_dir = os.path.join(curpath, "downloaded_apk")
apk_list = os.listdir(apk_dir)
for apk in apk_list:
    print("Underway%s Version and MD5 Value comparison and backfill operation..." % apk)
    apk_path = os.path.join(apk_dir, apk)
    file_name = apk.replace(".apk", "")  # Get apk file name, remove the suffix
    s1 = os.popen("aapt.exe dump badging %s | findstr version" % apk_path).read()
    version_name = re.findall(r"versionName=\'(.*)\'", s1)[0]
    print(version_name)
    s2 = os.popen("certutil -hashfile %s MD5" % apk_path).read()
    md5_value = s2.splitlines()[1]
    md5_value = md5_value.replace(" ", "")
    print(md5_value)

    r = apkinfo_dict[file_name][0]  # Get the line number of the apk

    # If the version number matches up, backfill Ok. If it does not match up, backfill the new version number
    if version_name == apkinfo_dict[file_name][1]:
        table.cell(row=r, column=apkversion_col_newadd).value = "OK"
    else:
        table.cell(row=r, column=apkversion_col_newadd).value = version_name
        table.cell(row=r, column=apkversion_col_newadd).fill = error_fill  # Mark red background

    # If MD5 matches, backfill Ok, if not, backfill new MD5
    if md5_value == apkinfo_dict[file_name][2]:
        table.cell(row=r, column=apkmd5_col_newadd).value = "OK"
    else:
        table.cell(row=r, column=apkmd5_col_newadd).value = md5_value
        table.cell(row=r, column=apkmd5_col_newadd).fill = error_fill  # Mark red background

print("The comparison and backfilling are completed and saved to New_Top_1000_apks.xlsx,Please refer to...")
excel.save("New_Top_1000_apks.xlsx")
os.system("pause")

Python procedure oriented function form

Process function oriented programming thinking should be as follows:
How many functions do you need to do this,
It is better to encapsulate all functions as much as possible, only exposing some parameter interfaces.
After command-line tools are skillfully used, you can consider using Python modules as much as possible to realize the functions of command-line tools,
For example, certutil or md5sum should not be used as much as possible, and the hashlib module should be used instead,
Reduce the dependency on a command-line tool to make it more portable (less restrictive on the operating system).

# coding=utf-8

import os
import re
import openpyxl
from openpyxl.styles import PatternFill
import hashlib

# Define some "global variables" that may be called by this module (current. py file)
apkname_col = 2
apkversion_col = 4
apkmd5_col = 6
apkversion_col_newadd = 5  # New version validation column
apkmd5_col_newadd = 7  # New MD5 validation column
error_fill = PatternFill(fill_type='solid', fgColor="FF3300")  # Mark red background


def parse_apk_excel(excel_file):
    '''Used for generation apk Information index dictionary'''
    print("Being generated apk Information index dictionary...")
    apk_info_dict = {}
    excel = openpyxl.load_workbook(excel_file)  # Read the contents of excel
    table = excel.active
    rows = table.max_row
    for r in range(2, rows + 1):  # It has nothing to do with the title line of the first line of excel. Start with the text content of the second line
        apk_name = table.cell(row=r, column=apkname_col).value  # Get apk name
        apk_version = table.cell(row=r, column=apkversion_col).value  # Get apk name
        apk_md5 = table.cell(row=r, column=apkmd5_col).value  # Get apk name
        apk_info_dict[apk_name] = [r, apk_version, apk_md5]
    return apk_info_dict, excel, table


def get_apk_version(apk_path):
    s = os.popen("aapt.exe dump badging %s | findstr version" % apk_path).read()
    version_name = re.findall(r"versionName=\'(.*)\'", s)[0]
    print(version_name)
    return version_name


def get_apk_md5(apk_path):
    with open(apk_path, "rb") as hf:
        apk_md5 = hashlib.md5(hf.read()).hexdigest()
        print(apk_md5)
        return apk_md5


def compare_rewrite(apk_info_dict, excel, table):
    curpath = os.getcwd()
    apk_dir = os.path.join(curpath, "downloaded_apk")
    apk_list = os.listdir(apk_dir)
    for apk in apk_list:
        print("Underway%s Version and MD5 Value comparison and backfill operation..." % apk)
        file_name = apk.replace(".apk", "")  # Get apk file name, remove the suffix
        apk_path = os.path.join(apk_dir, apk)
        version_name = get_apk_version(apk_path)
        md5_value = get_apk_md5(apk_path)
        r = apk_info_dict[file_name][0]  # Get the line number of the apk

        # If the version number matches up, backfill Ok. If it does not match up, backfill the new version number
        if version_name == apk_info_dict[file_name][1]:
            table.cell(row=r, column=apkversion_col_newadd).value = "OK"
        else:
            table.cell(row=r, column=apkversion_col_newadd).value = version_name
            table.cell(row=r, column=apkversion_col_newadd).fill = error_fill  # Mark red background

        # If MD5 matches, backfill Ok, if not, backfill new MD5
        if md5_value == apk_info_dict[file_name][2]:
            table.cell(row=r, column=apkmd5_col_newadd).value = "OK"
        else:
            table.cell(row=r, column=apkmd5_col_newadd).value = md5_value
            table.cell(row=r, column=apkmd5_col_newadd).fill = error_fill  # Mark red background

    print("The comparison and backfilling are completed and saved to New_Top_1000_apks.xlsx,Please refer to...")
    excel.save("New_Top_1000_apks.xlsx")


apk_info_dict, excel, table = parse_apk_excel("Top_1000_apks.xlsx")  # Get index dictionary
compare_rewrite(apk_info_dict, excel, table)  # Start comparison and backfilling
os.system("pause")

Python object oriented class form

The programming thinking of object-oriented class should be as follows:
If you are given a blank world, what kinds of things do you need in this world,
What are the common attributes and methods of these kinds of things,
What is the relationship between these kinds of things (objects) and other kinds of things (objects).
Try to encapsulate these classes and only expose the external attributes (variables) and methods (functions).

# coding=utf-8

import os
import re
import openpyxl
from openpyxl.styles import PatternFill
import hashlib

# Define some "global variables" that may be called by this module (current. py file)
apkname_col = 2
apkversion_col = 4
apkmd5_col = 6
apkversion_col_newadd = 5  # New version validation column
apkmd5_col_newadd = 7  # New MD5 validation column
error_fill = PatternFill(fill_type='solid', fgColor="FF3300")  # Mark red background


class ExcelParser():
    def __init__(self, excel_file):
        self._excel_file = excel_file  # There's no need to expose it to the outside world_

    def parse_apk_excel(self):  # This is a method (function) that needs to be exposed. It cannot be added_
        '''Used for generation apk Information index dictionary'''
        print("Being generated apk Information index dictionary...")
        apk_info_dict = {}
        excel = openpyxl.load_workbook(self._excel_file)  # Read the contents of excel
        table = excel.active
        rows = table.max_row
        for r in range(2, rows + 1):  # It has nothing to do with the title line of the first line of excel. Start with the text content of the second line
            apk_name = table.cell(row=r, column=apkname_col).value  # Get apk name
            apk_version = table.cell(row=r, column=apkversion_col).value  # Get apk name
            apk_md5 = table.cell(row=r, column=apkmd5_col).value  # Get apk name
            apk_info_dict[apk_name] = [r, apk_version, apk_md5]
        return apk_info_dict, excel, table


def get_apk_version(apk_path):
    s = os.popen("aapt.exe dump badging %s | findstr version" % apk_path).read()
    version_name = re.findall(r"versionName=\'(.*)\'", s)[0]
    print(version_name)
    return version_name


def get_apk_md5(apk_path):
    with open(apk_path, "rb") as hf:
        apk_md5 = hashlib.md5(hf.read()).hexdigest()
        print(apk_md5)
        return apk_md5


def compare_rewrite(apk_info_dict, excel, table):
    curpath = os.getcwd()
    apk_dir = os.path.join(curpath, "downloaded_apk")
    apk_list = os.listdir(apk_dir)
    for apk in apk_list:
        print("Underway%s Version and MD5 Value comparison and backfill operation..." % apk)
        file_name = apk.replace(".apk", "")  # Get apk file name, remove the suffix
        apk_path = os.path.join(apk_dir, apk)
        version_name = get_apk_version(apk_path)
        md5_value = get_apk_md5(apk_path)
        r = apk_info_dict[file_name][0]  # Get the line number of the apk

        # If the version number matches up, backfill Ok. If it does not match up, backfill the new version number
        if version_name == apk_info_dict[file_name][1]:
            table.cell(row=r, column=apkversion_col_newadd).value = "OK"
        else:
            table.cell(row=r, column=apkversion_col_newadd).value = version_name
            table.cell(row=r, column=apkversion_col_newadd).fill = error_fill  # Mark red background

        # If MD5 matches, backfill Ok, if not, backfill new MD5
        if md5_value == apk_info_dict[file_name][2]:
            table.cell(row=r, column=apkmd5_col_newadd).value = "OK"
        else:
            table.cell(row=r, column=apkmd5_col_newadd).value = md5_value
            table.cell(row=r, column=apkmd5_col_newadd).fill = error_fill  # Mark red background

    print("The comparison and backfilling are completed and saved to New_Top_1000_apks.xlsx,Please refer to...")
    excel.save("New_Top_1000_apks.xlsx")


if __name__ == '__main__':
    e_obj = ExcelParser("Top_1000_apks.xlsx")
    apk_info_dict, excel, table = e_obj.parse_apk_excel()  # Get index dictionary
    compare_rewrite(apk_info_dict, excel, table)  # Start comparison and backfilling
    os.system("pause")

Download the training materials of this case

Jump to download materials on the official website of selfie tutorial
Wusanren products, please feel free to download and use!

Operation mode and effect

Make sure the Android device is connected to the computer through the USB cable, and the adb device is effectively connected,
The three implementation forms of the above code can be run directly, for example, save it as verify ﹣ apks.py and put it in the same folder as the downloaded ﹣ APK folder and top ﹣ 1000 ﹣ apks.xlsx,
It is recommended to run Python verify ﹣ apks.py, or double-click it.
The operation effect is as follows:

Finally, a new top 1000 apks.xlsx will be generated. Its verification and backfill effect are as follows,
The red color indicates that the apk actually downloaded is different from the version information on Excel and Md5 provided by the marketing department.


For more and better original articles, please visit the official website: www.zipython.com
Selfie course (Python course of automatic test, compiled by Wu Sanren)
Original link: https://www.zipython.com/#/detail?id=f13a1efe25424b679e663a63fb64a10c
You can also follow the wechat subscription number of "wusanren" and accept the article push at any time.

Tags: Python Excel Programming Windows

Posted on Tue, 07 Apr 2020 06:58:11 -0400 by sameerni