Pandas value access (differences between .loc, .iloc and .ix) + query + set_index and reset_index

The three small pandas topics I've sorted out today all come from the industrial production forecasting competition, where all three were used, so let's go through them together:

1. pandas value access (.loc, .iloc, .ix)

First, create a pd.DataFrame for demonstration:

import pandas as pd
data = pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':[7,8,9]},index=["a","b","c"])
data

## Result:

    A	B	C
a	1	4	7
b	2	5	8
c	3	6	9
  1. .loc[]
    .loc[] selects data by label, that is, by the values in the row index and the column names. Inside the brackets the row comes first, then a comma, then the column: data.loc[row_label, column_label]. For example, to get the number 5:

    data.loc["b","B"]
    

    This works because 5 sits at row label b and column label B. Similarly, 4 is data.loc["a", "B"].
    The above selects a single value. To select a region, for example the values 5, 8, 6, 9, do this:

    data.loc['b':'c','B':'C']
    

    The selected region has 5 in its upper-left corner and 9 in its lower-right corner, so the rectangle spans the row labels from 5's row to 9's row ('b':'c') and the column labels from 5's column to 9's column ('B':'C'). The row range and the column range are separated by a comma, and the start and end labels within each range are separated by a colon. Remember: .loc selects by label, and a label slice is closed at both ends, so the end label is included.
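A minimal sketch of the .loc rules above, rebuilding the same demo frame. The list-of-labels selection at the end (`corners`) is an extra illustration that goes beyond the original text:

```python
import pandas as pd

# Same demo frame as above: row labels a-c, column labels A-C
data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                    index=["a", "b", "c"])

# Single value: data.loc[row_label, column_label]
five = data.loc["b", "B"]
four = data.loc["a", "B"]

# Label slices include BOTH endpoints: rows b..c, columns B..C
# -> values [[5, 8], [6, 9]]
region = data.loc['b':'c', 'B':'C']

# Lists of labels pick rows/columns that are not adjacent
corners = data.loc[["a", "c"], ["A", "C"]]
```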

    But what if we only know the position of the data, for example the first row and the second column? That is exactly what .iloc does.

  2. .iloc[]
    .iloc[] uses the same bracket layout as .loc: row first, then column, separated by a comma. The difference is that .iloc indexes by integer position instead of by label. For example, the number 5 above is data.iloc[1,1], because 5 is in the second row and the second column, and positions start from 0. Similarly, 4 is data.iloc[0,1]. To select a region, for example 5, 8, 6, 9, with iloc:

    data.iloc[1:3,1:3]
    

    5 is in the second row and second column (positions 1, 1), and 9 is in the third row and third column (positions 2, 2). Note that position slices are half-open: the start is included and the end is excluded, so the slice is 1:3. This is different from .loc, whose label slices include both ends. In short, .loc works with row and column labels, while .iloc works with row and column positions.
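A short sketch that contrasts the two slicing rules on the same demo frame. The half-open `iloc[1:3, 1:3]` and the inclusive `loc['b':'c', 'B':'C']` select the same rectangle. The negative-position line is an extra illustration of ordinary Python-style indexing:

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                    index=["a", "b", "c"])

# Positions start at 0: row position 1 is "b", column position 1 is "B"
five = data.iloc[1, 1]
four = data.iloc[0, 1]

# Position slices exclude the stop: 1:3 means positions 1 and 2
by_position = data.iloc[1:3, 1:3]
# The same rectangle by label, where the stop label IS included
by_label = data.loc['b':'c', 'B':'C']
assert by_position.equals(by_label)

# Negative positions count from the end, as with Python lists
last_row = data.iloc[-1]   # the row labelled "c"
```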

  3. .ix[]
    .ix accepts either of the two styles above: it can select by row and column labels or by row and column positions. For example, both of the following get 5:

    data.ix[1,1]
    data.ix["b","B"]
    

    Both of the above work. A region is selected the same way:

    data.ix[1:3,1:3]
    data.ix['b':'c','B':'C']
    
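    Both of the above return 5, 6, 8, 9. Note that .ix was deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so this code only runs on older versions. Use .loc or .iloc instead.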
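On pandas versions without .ix, you can still mix labels and positions with .loc and .iloc. The trick is to convert one kind of index into the other explicitly. This is a minimal sketch that assumes pandas 1.0 or later. It uses `Index.get_loc` and `Index.get_indexer` for label-to-position conversion and plain indexing of `data.columns` for position-to-label conversion:

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                    index=["a", "b", "c"])

# .ix is gone on pandas >= 1.0, so this is False there
ix_available = hasattr(data, "ix")

# Row by position, column by label: translate the label into a position
col_pos = data.columns.get_loc("B")           # 1
five_a = data.iloc[1, col_pos]

# Row by label, column by position: translate the position into a label
five_b = data.loc["b", data.columns[1]]

# Mixed region: rows 1:3 by position, columns B and C by label
cols = data.columns.get_indexer(["B", "C"])   # positions [1, 2]
region = data.iloc[1:3, cols]
```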
    

For more details, see the article "Detailed explanation of the usage of loc and iloc functions in Pandas (source code + instance)".

2. pandas set_index and reset_index

  • A DataFrame can be given a single index or a composite (multi-level) index with the set_index method.

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

  • keys: a column label, or a list of column labels and/or arrays, to use as the new index
  • drop: default True; delete the columns that are used as the new index
  • append: default False; append the columns to the existing index instead of replacing it
  • inplace: default False; modify the DataFrame in place (do not create a new object)
  • verify_integrity: default False; check the new index for duplicates. Otherwise the check is deferred until necessary. Leaving it False improves the performance of this method.
  • reset_index restores the index: it moves the current index out and replaces it with the default integer index

DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

  • level: int, str, tuple or list, default None. Only the given levels are removed from the index; by default all levels are removed. Use it to choose which index levels to restore.
  • drop: default False. If False, the index is restored as a regular column; if True, it is discarded.
  • inplace: default False; modify the DataFrame in place (do not create a new object)
  • col_level: int or str, default 0. If the columns have multiple levels, this determines which level the labels are inserted into. By default they go into the first level.
  • col_fill: object, default ''. If the columns have multiple levels, this determines how the other levels are named. If None, the index name is repeated.
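The operations below only use the default keyword values. This is a minimal sketch of the other parameters listed above: a composite index from a list of keys, append=True, verify_integrity=True, and reset_index(level=...). The small frame `dup` with a repeated key is a hypothetical example chosen to trigger the duplicate check:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'B': ['B0', 'B1', 'B2', 'B3'],
                   'C': ['C0', 'C1', 'C2', 'C3']})

# Composite index: a list of keys gives a MultiIndex with levels A and B
multi = df.set_index(['A', 'B'])

# append=True keeps the existing integer index as the outer level
appended = df.set_index('A', append=True)

# verify_integrity=True raises ValueError if the new index has duplicates
dup = pd.DataFrame({'k': ['x', 'x'], 'v': [1, 2]})   # hypothetical data
try:
    dup.set_index('k', verify_integrity=True)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# level='B' moves only level B back into a column; A stays as the index
partly_reset = multi.reset_index(level='B')
```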

Here are the operations:

import pandas as pd
df = pd.DataFrame({ 'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
df

df is an ordinary DataFrame:

Set column A as index column:

drop_t = df.set_index('A',drop=True, append=False, inplace=False, verify_integrity=False)
drop_t

# This allows you to access the elements by index and column names
drop_t.loc["A0":"A2"]


With drop=False, the original column A is kept as a regular column as well:

# drop=False, the original column will not be deleted
no_drop_t = df.set_index('A',drop=False, append=False, inplace=False, verify_integrity=False)
no_drop_t

In the result, A is both the index and a regular column. This option isn't used much in practice.

Restore index: reset_index()

reset_drop_t = drop_t.reset_index(drop=False) #Index columns will be restored to normal columns
reset_drop_t

reset_no_drop_t = no_drop_t.reset_index(drop=True) # Discard the index; A survives because it was kept as a column (drop=False above)
reset_no_drop_t

Both results are back to the original form:
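One way to confirm the round trip is DataFrame.equals. The sketch below rebuilds the same df and checks both resets. It also shows why the second reset needs drop=True: restoring no_drop_t's index as a column would clash with the column A that is still there, and pandas raises a ValueError in that case:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'B': ['B0', 'B1', 'B2', 'B3'],
                   'C': ['C0', 'C1', 'C2', 'C3'],
                   'D': ['D0', 'D1', 'D2', 'D3']})

drop_t = df.set_index('A')                 # A moved into the index
no_drop_t = df.set_index('A', drop=False)  # A is the index AND still a column

# drop=False (default): the index goes back in as column A
assert drop_t.reset_index().equals(df)

# drop=True: the index is discarded; column A is still present
assert no_drop_t.reset_index(drop=True).equals(df)

# Inserting the index as column A again clashes with the existing A
clash_detected = False
try:
    no_drop_t.reset_index()
except ValueError as err:
    clash_detected = True
    print(err)  # the message reports that column A already exists
```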

3. pandas DataFrame.query

query() filters rows using a string expression evaluated over the DataFrame's columns. Note that the condition must be passed as one string, not as a boolean expression. A related function is eval(), which evaluates a string expression against the DataFrame and supports arithmetic, comparisons, bitwise operations, object indexing and calculations between columns. Both are aimed at high performance on large DataFrames, and I may write them up in more detail later.
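A minimal eval() sketch for the operations just mentioned: arithmetic between columns, a comparison, and an assignment that returns a new frame. The frame `nums` and its columns `x` and `y` are hypothetical and only for illustration:

```python
import pandas as pd

nums = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})  # hypothetical data

# Arithmetic between columns, written as one string expression
total = nums.eval('x + y')

# A comparison gives a boolean Series
big = nums.eval('y > 15')

# Assignment inside the expression returns a new frame with column z
# (nums itself is unchanged because inplace defaults to False)
with_z = nums.eval('z = x * y')
```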

import pandas as pd
d={
    'name':['xiao','dan','qi'],
    'sex':['male','female','male'],
    'age':[23,24,24]
}
df=pd.DataFrame(d)

The results are as follows:

# df.query('age'>23)  # Error: 'age'>23 is not a string (Python raises TypeError comparing str with int)
df.query('age>23')

This is equivalent to selecting the rows that satisfy a condition with a boolean mask, e.g. df[df['age'] > 23].
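A few more query() forms, sketched on the same small frame. The sketch checks the equivalence with a boolean mask, combines two conditions with `and`, and refers to a Python variable with the `@` prefix. The variable `min_age` is a hypothetical name used only here:

```python
import pandas as pd

df = pd.DataFrame({'name': ['xiao', 'dan', 'qi'],
                   'sex': ['male', 'female', 'male'],
                   'age': [23, 24, 24]})

# The whole condition is ONE string; column names are used directly inside it
older = df.query('age > 23')
assert older.equals(df[df['age'] > 23])   # same rows as the boolean mask

# Combine conditions with and / or; quote string literals inside the expression
older_men = df.query('age > 23 and sex == "male"')

# Refer to a local Python variable with the @ prefix
min_age = 24   # hypothetical threshold
at_least = df.query('age >= @min_age')
```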

That's it for today's three pandas topics. They are basic, commonly used operations, and all of them came up in an industrial chemical production forecasting competition.

Everything here comes from practical use. I hope it helps!


Posted on Sat, 14 Mar 2020 09:21:26 -0400 by y.t.