Differences between Hive internal and external tables

Difference between internal table and external table

  • Tables that are not modified by external are managed table s, and tables that are modified by external are external table s;
  • The data of the internal table is managed by Hive itself, and the external table is managed by HDFS;
  • The data storage location of the internal table is hive.metastore.warehouse.dir (default: / user/hive/warehouse). The data storage location of the external table is specified by itself (if there is no LOCATION),Hive will create a folder with the table name of the external table under the / user/hive/warehouse / folder on HDFS, and store the data belonging to this table here);
  • Deleting internal tables will directly delete metadata and store data. Deleting external tables will only delete metadata. Files on HDFS will not be deleted. Hive creates internal tables by default.
  • Changes to internal tables will be synchronized directly to metadata, while changes to the table structure and partition of external tables will be modified and repaired.
MSCK REPAIR TABLE table_name

Create internal table

create table test.t1(
   id    int
  ,name  string
  ,hobby array<string>
  ,add   map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ":";


Time taken: 0.112 seconds
hive> 
    > 
    > 
    > create table test.t1(
    >    id    int
    >   ,name  string
    >   ,hobby array<string>
    >   ,add   map<string,string>
    > )
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '_'
    > map keys terminated by ":";
OK
Time taken: 0.056 seconds
hive> 

View table description

Time taken: 0.056 seconds
hive> desc test.t1;
OK
col_name        data_type       comment
id                      int                                         
name                    string                                      
hobby                   array<string>                               
add                     map<string,string>                          
Time taken: 0.048 seconds, Fetched: 4 row(s)
hive> 

Loading data

Note: generally, insert (not insert overwrite) statements are rarely used, because MapReduce will be called even if a statement is inserted. Here we choose the Load Data method.

Raw data
1,xiaoming,book-TV-code,beijing:chaoyang-shagnhai:pudong
2,lilei,book-code,nanjing:jiangning-taiwan:taibei
3,lihua,music-book,heilongjiang:haerbin

//Loading data
load data local inpath '/opt/software/gouyang/5.txt' overwrite into table test.t1;
                        
Time taken: 0.048 seconds, Fetched: 4 row(s)
hive> load data local inpath '/opt/software/gouyang/5.txt' overwrite into table test.t1;
Loading data to table test.t1
Table test.t1 stats: [numFiles=1, numRows=0, totalSize=147, rawDataSize=0]
OK
Time taken: 0.253 seconds
hive> select * from test.t1;
OK
t1.id   t1.name t1.hobby        t1.add
1       xiaoming        ["book-TV-code"]        {"beijing":"chaoyang-shagnhai:pudong"}
2       lilei   ["book-code"]   {"nanjing":"jiangning-taiwan:taibei"}
3       lihua   ["music-book"]  {"heilongjiang":"haerbin"}
Time taken: 0.058 seconds, Fetched: 3 row(s)
hive> 

Tags: Big Data hive

Posted on Thu, 19 Mar 2020 11:47:10 -0400 by Jyotsna