ERA5 data processing, a simple record (ArcPy)

  I'm a little excited to be writing this for the first time. I'm a novice and am just recording things simply; there may be better ways to solve the same problem. If you see improvements to the code or the method, please do point them out! hhh

  Actually, the code is still running, but there have been no problems so far, so I'm writing this up first. This covers the processing after the ERA5 data has been downloaded and mosaicked; I'll write up the download and mosaicking steps when I have time.

  Take extracting the 2015 hourly lai_lv (leaf area index, low vegetation) for China as an example. ERA5 data is organized by variable abbreviation; see ERA5-Land: data documentation - Copernicus Knowledge Base - ECMWF Confluence Wiki.

The ideas are also in the code comments, so you can go straight to the code.

First, the idea. The tif area after mosaicking is larger than China. The ERA5 hourly data is about 0.1°, i.e. roughly 10 km (roughly, don't flame me). The goal is to get 1 km data over China.

This involves projection, resampling, clipping and so on (listed in no particular order).

First of all, because this is land data and the ocean has no values, there is a big difference between the 10 km grid and the 1 km grid along the coastline, so resampling right away is not feasible.

1. First handle the nulls: values < 0 are treated as outliers here and set to null. Then you will find that some grid cells along the coastline become null, so China's area comes out too small. What can we do? We have to fill them back in. hhhh

2. How? Use neighborhood analysis. Meteorological fields are generally smooth, so we fill the gaps with nearby values. I use a circle with a radius of 2 cells; the overall effect is OK. After this you will find China's area comes out too large, because any cell whose circle contains at least one valid value gets a neighborhood result. That's fine; it gets clipped and resampled later.

3. Remember: after the neighborhood analysis you have two rasters, the smoothed one and the one with nulls. What we want are only the missing coastline cells of the null raster, with their values taken from the neighborhood-analysis raster. Never use the neighborhood raster directly afterwards, because its inland values are no longer the original data.     This step is the long statement, fill_raster = ...

4. Projection: just call the tool and pick a projection yourself.

5. Resample to one kilometer; pick the resampling method yourself.

6. Extract by the China boundary mask, so the grid cells line up with the boundary.

It is recommended to set the nulls and do the neighborhood fill on the original data first. Resampling first causes problems along the coastal boundary, because there are no values at sea and resampling would fill values there too... Pay attention to the processing order.
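The fill logic in steps 1-3 (replace null cells with a neighborhood mean, keep original values everywhere else) can be sketched outside of arcpy. This is a NumPy stand-in for illustration only, not the arcpy call itself; it approximates FocalStatistics' circular neighborhood with a simple 3x3 window:

```python
import numpy as np

def fill_nulls_with_neighborhood_mean(grid):
    """Replace NaN cells with the mean of their valid neighbors
    (a 3x3 window here; arcpy's FocalStatistics uses a circle).
    Cells that already have data keep their original values,
    which is the point of step 3."""
    filled = grid.copy()
    rows, cols = grid.shape
    for i in range(rows):
        for j in range(cols):
            if np.isnan(grid[i, j]):
                # Stats are computed on the ORIGINAL grid, so fills
                # do not chain outward cell by cell
                window = grid[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
                valid = window[~np.isnan(window)]
                if valid.size:  # only fill if some neighbor has data
                    filled[i, j] = valid.mean()
    return filled

# A coastline-like strip: the NaN next to land gets filled,
# but the NaN far "out at sea" stays NaN
coast = np.array([[1.0, 2.0, np.nan, np.nan]])
print(fill_nulls_with_neighborhood_mean(coast))
```

This also shows why the filled raster grows a little past the coastline (any null cell with at least one valid neighbor gets a value), which is why clipping comes afterwards.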

Here is the code

# -*- coding: utf-8 -*-
#Import package
import arcpy
from arcpy import env
from arcpy.sa import *
import os,datetime
#Get program start time
start = datetime.datetime.now()
#Set processing environment
variable = "2015_lai_lv"     # Name of the folder to process; it holds the pile of data for this variable
final_Dir = "D:/First folder/ERA5/" + variable        # Final output folder holding the usable data. Avoid Chinese characters in the path; the folder name itself can be anything (this one was made up for the article).
if not os.path.exists(final_Dir):
    os.makedirs(final_Dir)           # Create the output folder if it does not exist
arcpy.env.workspace = final_Dir   # Setting the workspace is very important!!
arcpy.env.parallelProcessingFactor = "21"    # Use multiple cores. Most machines are multi-core, but does it help? Officially, only some GIS tools support parallel processing
elevRaster = 'D:/Random folder/Boundary.tif'    # A pre-clipped TIFF of the China boundary. The path here was made up for the article; English paths are safer for the code.
# Check out the Spatial Analyst license (SetNull, FocalStatistics and ExtractByMask need it)
arcpy.CheckOutExtension("Spatial")
# Allow existing outputs to be overwritten
arcpy.env.overwriteOutput = True
#Get projection
Projection = arcpy.Describe(elevRaster).spatialReference   
#Open file
# variables = os.listdir(r"D:\Data_collect\ERA5\ERA5_tiff")
# for variable in variables:   # Originally this looped over all the variables, but the data volume was too large and too slow, so I gave up. Delete these lines when copying the code.
inDir = "F:/Data_collect/ERA5/ERA5_tiff/" + variable     # Folder holding the data to process
# prj_Dir = "D:/Data_collect/ERA5/try/ERA5_fill_prj/" + variable
# resam_Dir = "D:/Data_collect/ERA5/try/ERA5_fill_prj_resam/" + variable
# if not os.path.exists(prj_Dir):
#     os.makedirs(prj_Dir)
# if not os.path.exists(resam_Dir):
#     os.makedirs(resam_Dir)                                     # These commented lines are leftovers: some arcpy spatial-analysis tools need a save location, so intermediate-data folders were created at first
inList = [name for name in os.listdir(inDir) if name.endswith(".tif")]    # Collect all the .tif files
# Traverse all the TIFFs
for file in inList:
    # Build the full paths for this file
    mem_Dir = "in_memory/"     # The in-memory workspace path
    input_raster = os.path.join(inDir, file)         # join gives the full input path
    final_raster = os.path.join(final_Dir, file)
    # Check whether the output file already exists; if it does, skip it
    if os.path.exists(final_raster):
        continue
    set_null_ra = SetNull(input_raster, input_raster, "Value < 0")  # Set values less than 0 to null
    fill_raster = Con(IsNull(set_null_ra),
                      FocalStatistics(set_null_ra, NbrCircle(2, "CELL"), "MEAN", "DATA"),
                      set_null_ra)  # Fill null cells with the neighborhood mean; cells with data keep their original values
    file = file[:-4]    # File name without the extension (.tif), because datasets in the in_memory workspace must not have a file extension
    prj_file = "prj_" + file                       # Intermediate data name
    resam_file = "resam_" + file                   # Intermediate data name
    prj_raster = os.path.join(mem_Dir, prj_file)
    resam_raster = os.path.join(mem_Dir, resam_file)       # Same join as above, but into the in_memory workspace (you could probably also just write the "in_memory/..." string directly)
    arcpy.ProjectRaster_management(fill_raster, prj_raster, Projection, "NEAREST")  # Reproject
    arcpy.Resample_management(prj_raster, resam_raster, "1000", "BILINEAR")  # Resample to 1 km
    ExtractByMask(resam_raster, elevRaster).save(final_raster)  # Apply the mask; pay attention to the length of the output file name
    arcpy.Delete_management("in_memory")  # Delete the in-memory intermediates so memory does not run out
# Get the end time of the program
end = datetime.datetime.now()
SpendTime = end - start
print("%s is ok!" % variable)
print("spend time: %s" % SpendTime)

I don't know how to format this post; the editor is hard to use, and the code formatting doesn't survive pasting.   I often see "nanny-level" (ultra-detailed) tutorials; how do I even reach primary-school level? The code may be a little messy; I've only just learned, so forgive me.

A few points to explain:

1. Because I had 21 variables, writing one loop over all of them was too slow, and there was not enough hard-disk space either. So I ended up running one variable per Python file, just changing variable = ... in each copy, which was much faster.

2. Why the statement that creates the output folder comes before the processing environment is set:   when several Python processes run arcpy at the same time with the same workspace, the temporary files a tool creates (which are normally deleted automatically when the tool finishes) may fail to be deleted, or fail to be named, because another Python file is still using them. I don't know if that is the exact cause, but after giving each Python file its own workspace I stopped getting errors.                           The code is just like that.

3. About multi-core: the line is arcpy.env.parallelProcessingFactor = "21". The official docs say some tools can be accelerated this way; 21 is the number of cores to use, but it had no effect on my program. Check the docs or other people's blogs if you want; anyway, I tried to speed things up. Official instructions: Parallel Processing Factor (Environment setting)—ArcGIS Pro | Documentation.

4. Although I set arcpy.env.overwriteOutput = True, many output files had already been produced before the code hit an error, and rerunning everything would be painful. As we all know, ERA5 has 8760 hours in a year; if thousands of files are already done, there is no need to redo them. So I first add a check for whether the file already exists: if os.path.exists(final_raster):
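That resume check can be sketched as a small helper. The function name needs_processing is hypothetical (it is not in the author's script, where the same check appears inline in the loop); it just shows the skip-if-output-exists idea:

```python
import os

def needs_processing(out_dir, filenames):
    """Return only the files whose final output does not yet exist in
    out_dir, so an interrupted run can resume without redoing the
    hours that already finished."""
    return [f for f in filenames
            if not os.path.exists(os.path.join(out_dir, f))]
```

With thousands of hourly files, this turns a crashed multi-day run into a short top-up run.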

As for the syntax of each arcpy tool, just copy the tool name and search for it. I won't say more.

What made me happiest was saving the intermediate data into memory and then deleting it; the speed went up several times immediately. At first, for tools like arcpy.ProjectRaster_management and arcpy.Resample_management that require input and output paths, I foolishly created several intermediate-data folders, writing each result to disk and then reading the tif back for the next step, which was slow.     Later I wrote the intermediates to memory and deleted them once the following step had used them, which was several times faster. hhhh, the fun of tweaking code.     Be sure to delete them; memory is limited. That's what the delete line is for!!!!!
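The pattern the author describes can be illustrated in plain Python (a toy stand-in, since arcpy's in_memory workspace needs an ArcGIS install): each file's intermediates go into an in-memory store and are cleared once the final product is produced, so memory stays bounded across thousands of files. The function and the string suffixes below are made up for illustration:

```python
def process_one(name, store):
    """Toy stand-in for the project -> resample -> mask chain: each step
    writes its intermediate into an in-memory store instead of to disk,
    and the store is cleared at the end, analogous to
    arcpy.Delete_management("in_memory")."""
    store["prj_" + name] = name + "_projected"               # stand-in for ProjectRaster
    store["resam_" + name] = store["prj_" + name] + "_1km"   # stand-in for Resample
    final = store["resam_" + name] + "_masked"               # stand-in for ExtractByMask
    store.clear()                                            # free the intermediates
    return final

store = {}
outputs = [process_one(n, store) for n in ("h0001", "h0002")]
# store is empty after each file, so memory does not grow with the file count
```

The design point is simply that the clear happens inside the per-file step, not once at the end of the whole run.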

The code is a little messy; I'm a novice, forgive me. Many of the comments were written while writing this article; others are from earlier attempts, just delete them.

Finally, a map is drawn. The study area is the Chinese mainland; of course, the final figure was finished properly. Pay attention to the nine-dash line in the South China Sea and the country's territorial integrity.

  It's a little wordy; don't mind, it's my first write-up......

Tags: Python Algorithm Machine Learning AI

Posted on Wed, 17 Nov 2021 05:27:30 -0500 by