This article uses the WindAlpha single-factor research framework on the Wankuang platform to examine the impact of the COVID-19 epidemic on the stock market from a factor perspective. IC analysis shows a significant negative correlation between epidemic severity and stock returns; the correlation strengthens with the lag and peaks on the 10th day. When stocks are grouped by the COVID-19 epidemic factor, the first group's return is significantly higher than that of the other groups, and the long-short portfolio outperforms the market.
1. Factor construction
First, obtain the time series of epidemic data for each province. For ways to obtain it, see the article "n methods of obtaining the historical data of COVID-19 epidemic". The data source used here is the nCov2019 package for R:
    library(nCov2019)
    x <- load_nCov2019()
    data <- summary(x)[1:3]
    write.csv(data, "Data/nCovProvince.csv", row.names = FALSE)
    ##        province      time cum_confirm
    ##        Shanghai 2020/2/10         302
    ##          Yunnan 2020/2/10         149
    ##  Inner Mongolia 2020/2/10          58
    ##         Beijing 2020/2/10         342
    ##          Taiwan 2020/2/10          18
Then, on the Wankuang platform, extract the registered address and office address of every non-ST, non-PT A-share and find the province each address belongs to. Using that province, look up the number of confirmed cases in the epidemic data obtained above and take it as the stock's factor value for that day.
    from WindPy import *  # Wind API
    from datetime import datetime
    from scipy import stats, optimize
    from WindCharts import *
    import numpy as np
    import pandas as pd
    import WindAlpha as wa

    w.start(show_welcome=False)

    # Read the epidemic data. Before reading, upload the csv file to the "data file / nCov" folder.
    nCov_province = pd.read_csv("data/nCov/nCovProvince.csv", encoding="gbk", index_col=["time", "province"])
    start_date = nCov_province.index[-1][0]
    end_date = nCov_province.index[1][0]


    def get_province(address):
        # Return the province whose name appears in the address (None if no province matches)
        province_list = ['Shanghai', 'Yunnan', 'Inner Mongolia', 'Beijing', 'Taiwan', 'Jilin', 'Sichuan', 'Tianjin', 'Ningxia', 'Anhui', 'Shandong',
                         'Shanxi', 'Guangdong', 'Guangxi', 'Xinjiang', 'Jiangsu', 'Jiangxi', 'Hebei', 'Henan', 'Zhejiang', 'Hainan', 'Hubei',
                         'Hunan', 'Macao', 'Gansu', 'Fujian', 'Tibet', 'Guizhou', 'Liaoning', 'Chongqing', 'Shaanxi', 'Qinghai', 'Hong Kong', 'Heilongjiang']
        for province in province_list:
            if province in address:
                return province


    def factor_prepare(nCov_province):
        # Get the list of trading days and convert it to string format
        trade_dates = w.tdays(start_date, end_date, period="d").Data[0]
        trade_dates = [dt.strftime("%Y-%m-%d") for dt in trade_dates]

        # Get the codes of all non-ST, non-PT A-shares
        stock_set = w.wset("sectorconstituent", "date=" + start_date + ";sectorId=a001010f00000000;field=wind_code", usedf=True)
        stock_list = list(stock_set[1]['wind_code'])

        # Get each company's registered address and office address, then extract the province
        raw_data = w.wss(stock_list, "address,office", "rptDate= %s" % (start_date), usedf=True)[1]
        raw_data["ADDRESS"] = raw_data["ADDRESS"].map(get_province)
        raw_data["OFFICE"] = raw_data["OFFICE"].map(get_province)

        # Arrange the data in the (date, stock) format required for factors
        index = pd.MultiIndex.from_product([trade_dates, stock_list])
        factor = pd.DataFrame(columns=raw_data.columns, index=index)
        for trade_date in trade_dates:
            factor.loc[trade_date, "ADDRESS"] = np.asarray(raw_data["ADDRESS"])
            factor.loc[trade_date, "OFFICE"] = np.asarray(raw_data["OFFICE"])

        # Replace each province with its cumulative number of confirmed cases on that day; this is the final factor value
        for trade_date in trade_dates:
            for security_code in stock_list:
                for columns in factor.columns:
                    security_province = factor.loc[(trade_date, security_code), columns]
                    try:
                        # The lookup can fail because the province had no cases recorded on that day
                        factor.loc[(trade_date, security_code), columns] = nCov_province.loc[(trade_date, security_province), "cum_confirm"]
                    except KeyError:
                        factor.loc[(trade_date, security_code), columns] = 0

        # The date level of the index must be converted to datetime before the factor is uploaded
        date_index = [datetime.strptime(dt, "%Y-%m-%d") for dt in index.levels[0]]
        index = index.set_levels(date_index, level=0)
        factor.set_index(index, inplace=True)
        inds_ret = wa.load_local_factors(factor)
        return inds_ret


    raw_ind_ret = factor_prepare(nCov_province)
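The triple loop above looks up one value at a time. As a minimal sketch of just the lookup rule (confirmed cases for the stock's province on that day, and 0 when the epidemic data has no such row), the snippet below uses a plain dictionary. The stock codes and province assignments are hypothetical; only the two case counts are taken from the sample output earlier.

```python
# Toy epidemic data keyed by (date, province); counts copied from the sample output.
cum_confirm = {
    ("2020-02-10", "Shanghai"): 302,
    ("2020-02-10", "Beijing"): 342,
}

# Hypothetical stock codes mapped to the province of their office address.
office_province = {
    "AAA.SH": "Shanghai",
    "BBB.SZ": "Beijing",
    "CCC.SZ": "Tibet",  # no row for this province in the toy data
    "DDD.SH": None,     # address could not be matched to a province
}


def build_factor(trade_dates, province_by_stock, cases):
    """Return {(date, stock): cumulative confirmed cases}, defaulting to 0."""
    factor = {}
    for date in trade_dates:
        for stock, province in province_by_stock.items():
            # dict.get plays the role of the try/except KeyError in the original loop
            factor[(date, stock)] = cases.get((date, province), 0)
    return factor


factor = build_factor(["2020-02-10"], office_province, cum_confirm)
```

A missing province and an unmatched address both fall back to 0 here, which mirrors the `except KeyError` branch.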
The structure of the uploaded factor data is as follows. The last two columns are added automatically by the Wankuang platform when the factor is uploaded: MKT_CAP_ASHARE is the A-share market capitalization (excluding restricted shares), and NEXT_RET is the stock's next-period return.
2. Factor preprocessing
Next, preprocess the factors:
- Missing values: rows with missing values are dropped
- Extreme values: extreme-value removal would discard Hubei Province's values, so no extreme-value treatment is applied
- Standardization: market-cap-weighted standardization is used, which also neutralizes market cap at the same time
- Industry neutralization: to remove the potential influence of industry on factor performance, a linear regression is fitted with the original factor value as the dependent variable and industry dummy variables as the independent variables. The regression residual is the part of the original factor that industry cannot explain, so it is taken as the new, neutralized factor.
    # The steps above are performed by a single call to WindAlpha's process_raw_data function
    processed_inds_ret = wa.process_raw_data(raw_ind_ret, missing_method="dele", extreme_method=False,
                                             scale_method="cap", neutralize_method='sw_1', isinclude_cap=False)
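The article does not show what `process_raw_data` computes internally, so the following is only a sketch of the two ideas in the list, under stated assumptions. For standardization it assumes one common convention: subtract the cap-weighted mean and divide by the plain standard deviation. For neutralization it uses the fact that regressing on a complete set of industry dummies with no intercept gives each industry's mean as the fitted value, so the residual is the value minus its industry mean. All inputs are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def cap_weighted_standardize(values, caps):
    """Subtract the cap-weighted mean, then divide by the plain standard deviation."""
    weighted_mean = sum(v * c for v, c in zip(values, caps)) / sum(caps)
    spread = pstdev(values)
    return [(v - weighted_mean) / spread for v in values]


def industry_neutralize(values, industries):
    """Residuals of regressing values on industry dummies (no intercept).

    The fitted value for each stock is its industry mean, so the residual
    is simply the value minus that mean.
    """
    groups = defaultdict(list)
    for value, industry in zip(values, industries):
        groups[industry].append(value)
    means = {industry: sum(vs) / len(vs) for industry, vs in groups.items()}
    return [value - means[industry] for value, industry in zip(values, industries)]


# Hypothetical cross-section of four stocks on one day.
raw = [10.0, 30.0, 100.0, 300.0]
caps = [1.0, 1.0, 2.0, 2.0]
industries = ["bank", "bank", "tech", "tech"]

scaled = cap_weighted_standardize(raw, caps)
neutral = industry_neutralize(raw, industries)
```

After neutralization the residuals sum to zero within each industry, which is what "cannot be explained by the industry" means in practice.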
Comparison between the original factor and the processed factor:
    import seaborn as se
    import matplotlib.pyplot as plt

    fig = plt.figure(figsize=(7, 5))
    plt.subplot(211)
    processed_inds_ret.loc['2020-02-04']["OFFICE"].hist()  # top panel: processed factor
    plt.subplot(212)
    raw_ind_ret.loc['2020-02-04']['OFFICE'].hist()  # bottom panel: raw factor
    plt.suptitle(u"Processed Data VS Raw Data(Date:2020-02-04)")
2. Factor analysis
2.1 IC analysis
The information coefficient (IC) is the correlation between each period's factor values and the next period's stock returns. Its magnitude reflects the strength of the linear relationship between factor exposure and subsequent stock performance: the larger the absolute IC, the stronger the factor's predictive power and the better its stock-selection effect. This article computes IC with the Rank IC method, i.e. the cross-sectional correlation between the ranks of the factor exposures and the ranks of the next-period stock returns.
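To make the Rank IC definition concrete, here is a minimal standard-library sketch: rank both cross-sections (ties share their average rank) and take the Pearson correlation of the ranks. It is not WindAlpha's implementation. The factor values reuse the case counts from the sample output earlier, while the returns are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(factor_values, next_returns):
    """Rank IC: correlation between factor ranks and next-period return ranks."""
    return pearson(average_ranks(factor_values), average_ranks(next_returns))


# One cross-section: confirmed cases per stock's province and hypothetical next-period returns.
factor_values = [302, 342, 58, 149, 18]
next_returns = [-0.02, -0.03, 0.01, 0.00, 0.02]
ic = rank_ic(factor_values, next_returns)
```

In this toy cross-section the return ranking is the exact reverse of the factor ranking, so the Rank IC is -1. Real cross-sections give values much closer to zero, as the IC series in this article shows.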
    # Call WindAlpha's IC analysis function directly
    ic_ana = wa.ic_analysis(processed_inds_ret)
    ind = "OFFICE"
    fig_ic = WLine("IC Sequence:{}".format(ind), "{}-{}".format(start_date, end_date), ic_ana.ic_series.loc[ind])
    fig_ic.plot()

2.1.1 IC signal decay
The IC series above measures the correlation between the current period's factor values and the next period's returns, i.e. the factor and the return are one period apart. IC decay describes the correlation between factor values and returns that are LAG periods apart. Given N periods of factor and return data, first compute the IC between the period-i factor and the period-(i+1) return for every available i and average the results; then do the same for the period-i factor and the period-(i+2) return; and so on up to LAG (with i = 1, ..., N-LAG). The resulting series of average ICs, one per lag, shows how the IC decays.
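The averaging procedure just described can be sketched in a few lines of standard-library Python. This is an illustration of the idea rather than WindAlpha's code; in particular, the article does not say how WindAlpha's LAG0, LAG1, ... labels map onto the lags used here. The data layout (one list of per-stock values per period) and all numbers are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ic_decay(factors, returns, max_lag):
    """Average rank IC between period-i factors and period-(i+lag) returns.

    Assumes max_lag is smaller than the number of periods.
    """
    decay = {}
    for lag in range(1, max_lag + 1):
        ics = [
            pearson(ranks(factors[i]), ranks(returns[i + lag]))
            for i in range(len(factors) - lag)
        ]
        decay[lag] = sum(ics) / len(ics)
    return decay


# Hypothetical data: 4 periods, 3 stocks per period.
factors = [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
returns = [
    [0.01, 0.02, 0.03],  # period 0 (not used for lag >= 1)
    [0.03, 0.01, 0.02],  # period 1
    [0.03, 0.02, 0.01],  # period 2
    [0.03, 0.02, 0.01],  # period 3
]
decay = ic_decay(factors, returns, max_lag=2)
```

Each lag uses one fewer period pair than the previous one, which is why the index i only runs to N-LAG.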
    ic_ana.ic_decay

             ADDRESS     OFFICE
    LAG0   -0.016098  -0.014985
    LAG1   -0.037245  -0.035465
    LAG2   -0.055568  -0.054450
    LAG3   -0.081646  -0.080446
    LAG4   -0.100494  -0.099304
    LAG5   -0.114949  -0.112260
    LAG6   -0.131797  -0.132035
    LAG7   -0.142606  -0.145498
    LAG8   -0.143235  -0.149350
    LAG9   -0.141615  -0.150159
    LAG10  -0.146176  -0.154830
    LAG11  -0.144909  -0.152806
Visualizing the IC decay results reveals a clear pattern: epidemic severity is significantly negatively correlated with stock returns, and the correlation strengthens with the lag, peaking in magnitude on the 10th day (LAG10).
    # According to the IC decay results, the epidemic data of the office location has the larger impact on the stock market
    ind = "OFFICE"
    fig_decay = WBar('{} IC Decay'.format("COVID-19"), '', ic_ana.ic_decay[ind].to_frame())
    fig_decay.plot()
2.2 Return analysis
Stocks are sorted by factor value into five groups, and a long-short portfolio is built that goes long the first group and short the last group (G01-G05). Each group's market-cap-weighted average return in each period is then used directly to compute the group's cumulative return and other indicators.
    # Benchmark: Wind All A Index
    direction_dict = {ind: 'descending'}
    ret_ana = wa.return_analysis(processed_inds_ret, '881001.WI', start_date, end_date, ind_direction=direction_dict)
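What `return_analysis` does internally is not shown in the article. As a rough sketch of the grouping idea only, the snippet below splits one cross-section into five equal-sized groups by factor value and computes each group's cap-weighted return plus the G01-G05 spread. It assumes that G01 holds the stocks with the lowest factor values, which is consistent with the negative IC but is not confirmed by the source. All numbers are hypothetical.

```python
def group_returns(factor_values, returns, caps, n_groups=5):
    """Cap-weighted return of each factor-sorted group, lowest factor values first.

    Assumes there are at least n_groups stocks, so that no group is empty.
    """
    order = sorted(range(len(factor_values)), key=lambda i: factor_values[i])
    n = len(order)
    result = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        total_cap = sum(caps[i] for i in members)
        result.append(sum(returns[i] * caps[i] for i in members) / total_cap)
    return result


# Hypothetical one-period cross-section of ten stocks.
factor_values = [5, 1, 9, 3, 7, 0, 8, 2, 6, 4]
returns = [0.00, 0.02, -0.03, 0.01, -0.01, 0.03, -0.02, 0.01, -0.01, 0.00]
caps = [1, 1, 1, 1, 1, 3, 1, 1, 1, 1]

groups = group_returns(factor_values, returns, caps)
long_short = groups[0] - groups[-1]  # long G01, short G05
```

Repeating this for every period and compounding the per-period group returns would give cumulative return curves of the kind plotted for G01 to G05 and G01-G05.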
Visualizing the results for each group shows that the first group's return is significantly higher than that of the other groups, and that the long-short portfolio outperforms the market.
    sig_ret_line = WLine("Long and short portfolio yield:{}".format("COVID-19"),
                         "{}-{}".format(start_date, end_date),
                         round(ret_ana.group_cum_return[['G01', 'G02', 'G03', 'G04', 'G05', 'G01-G05', 'BENCH_RET']].loc[ind], 4),
                         auto_yaxis=True)
    sig_ret_line.plot()
That is all for this article. You are welcome to follow my Zhihu | Jianshu | CSDN | WeChat official account PurePlay, where I share research and study notes from time to time.