Hive DDL Usage
Database
Create
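A minimal sketch of creating a database (the database name, comment, location, and property below are illustrative assumptions):

```sql
-- Create a database if it does not already exist;
-- the name, comment, and HDFS location are illustrative.
CREATE DATABASE IF NOT EXISTS mydb
COMMENT 'example database'
LOCATION '/user/hive/warehouse/mydb.db'
WITH DBPROPERTIES ('creator' = 'etl');
```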
Drop
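A minimal sketch of dropping a database (mydb is the hypothetical database from the previous sketch):

```sql
-- Drop the database; CASCADE also removes any tables it still contains
-- (RESTRICT, the default, fails if the database is not empty).
DROP DATABASE IF EXISTS mydb CASCADE;
```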
Alter
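A minimal sketch of altering a database (the property key and owner name are illustrative assumptions):

```sql
-- Attach or update key/value properties on the database.
ALTER DATABASE mydb SET DBPROPERTIES ('edited-by' = 'admin');

-- Change the database owner.
ALTER DATABASE mydb SET OWNER USER admin;
```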
Use
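A minimal sketch of switching the current database:

```sql
-- Make mydb the current database for subsequent statements.
USE mydb;

-- Switch back to the default database.
USE default;
```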
Table
Create
Steps Hive performs when creating a table:
Create the table directory on HDFS.
Create the table's descriptive data (metadata) in the metastore database (MySQL).
Managed (internal) and external tables behave differently on DROP:
On DROP, the metadata is removed in both cases.
On DROP, a managed table's directory is deleted, but an external table's directory is not (see the sketch after this list).
When to use external tables: use an external table whenever the data should remain after the table is dropped (recommended). For this reason, the bottom-most (source) tables of a data warehouse are typically external tables.
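A minimal sketch of this difference, assuming hypothetical tables t_managed and t_external and an illustrative HDFS path:

```sql
-- Managed table: Hive owns the directory under /user/hive/warehouse;
-- DROP removes both the metadata and the table directory.
CREATE TABLE t_managed (id INT, name STRING);
DROP TABLE t_managed;

-- External table: Hive only records the schema and location;
-- DROP removes the metadata but leaves /data/raw/t_external in place.
CREATE EXTERNAL TABLE t_external (id INT, name STRING)
LOCATION '/data/raw/t_external';
DROP TABLE t_external;
```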
Managed and External Tables
Managed tables
A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /user/hive/warehouse/databasename.db/tablename/. The default location can be overridden by the location property during table creation. If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration.
Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.
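A minimal sketch of overriding the default warehouse location at creation time (the table name and path are illustrative assumptions):

```sql
-- Still a managed table, but stored outside the default
-- /user/hive/warehouse/... path; dropping it deletes this directory too.
CREATE TABLE page_views_managed (viewtime INT, userid BIGINT)
LOCATION '/data/managed/page_views';

-- DESCRIBE FORMATTED shows the table type (MANAGED_TABLE) and its location.
DESCRIBE FORMATTED page_views_managed;
```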
External tables
An external table describes the metadata / schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an MSCK REPAIR TABLE table_name statement can be used to refresh metadata information.
Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.
Example:
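One possible form of this example; the table, columns, delimiter, and HDFS location are illustrative assumptions:

```sql
CREATE EXTERNAL TABLE page_view (
  viewtime     INT,
  userid       BIGINT,
  page_url     STRING,
  referrer_url STRING,
  ip           STRING COMMENT 'IP address of the user'
)
COMMENT 'Staging page-view table'
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/staging/page_view';

-- If partition directories are added outside of Hive,
-- refresh the partition metadata:
MSCK REPAIR TABLE page_view;
```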
The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. This comes in handy if you already have data generated. When dropping an EXTERNAL table, data in the table is NOT deleted from the file system.
Storage Formats
STORED AS TEXTFILE: Stored as plain text files. TEXTFILE is the default file format, unless the configuration parameter hive.default.fileformat has a different setting. Use the DELIMITED clause to read delimited files.
STORED AS SEQUENCEFILE: Stored as a compressed SequenceFile.
INPUTFORMAT and OUTPUTFORMAT: Used in the file_format clause to specify the names of the corresponding InputFormat and OutputFormat classes as string literals, for example 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'.
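A minimal sketch of spelling out the format classes explicitly (the table and column names are illustrative; the classes shown are the standard text input/output formats, so this is equivalent to STORED AS TEXTFILE):

```sql
CREATE TABLE raw_lines (line STRING)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```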
Row Formats & SerDe
You can create tables with a custom SerDe or using a native SerDe. A native SerDe is used if ROW FORMAT is not specified or ROW FORMAT DELIMITED is specified. To change a table's SerDe or SERDEPROPERTIES after creation, use the ALTER TABLE ... SET SERDE or ALTER TABLE ... SET SERDEPROPERTIES statement.
RegEx: Stored as plain text files, translated by a regular expression.
CSV/TSV: Stored as plain text files in CSV/TSV format.
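A minimal sketch of both cases (table names and columns are illustrative assumptions; OpenCSVSerde ships with Hive, but check the SERDEPROPERTIES names against your version):

```sql
-- Native SerDe via ROW FORMAT DELIMITED (plain delimited text).
CREATE TABLE users_delimited (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Custom SerDe for quoted CSV files (OpenCSVSerde reads all columns as STRING).
CREATE TABLE users_csv (id STRING, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
STORED AS TEXTFILE;

-- SerDe properties can also be changed after creation:
ALTER TABLE users_csv SET SERDEPROPERTIES ('separatorChar' = ';');
```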
Create Table As Select (CTAS)
CTAS has these restrictions:
The target table cannot be an external table.
The target table cannot be a list bucketing table.
Example:
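A sketch of a CTAS statement matching the description below; the source table key_value_store and the exact expressions are assumptions chosen to be consistent with the resulting schema:

```sql
CREATE TABLE new_key_value_store
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
  STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;
```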
The above CTAS statement creates the target table new_key_value_store with the schema (new_key DOUBLE, key_value_pair STRING) derived from the results of the SELECT statement. If the SELECT statement does not specify column aliases, the column names will be automatically assigned to _col0, _col1, and _col2 etc. In addition, the new target table is created using a specific SerDe and a storage format independent of the source tables in the SELECT statement.
Create Table Like
The LIKE form of CREATE TABLE allows you to copy an existing table definition exactly (without copying its data). In contrast to CTAS, the statement below creates a new empty_key_value_store table whose definition exactly matches the existing key_value_store in all particulars other than table name. The new table contains no rows.
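A minimal sketch (empty_key_value_store and key_value_store follow the names used above):

```sql
-- Copies the schema, SerDe, and storage format of key_value_store,
-- but creates an empty table containing no data.
CREATE TABLE empty_key_value_store
LIKE key_value_store;
```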
Temporary Tables
A table that has been created as a temporary table will only be visible to the current session. Data will be stored in the user's scratch directory, and deleted at the end of the session.
If a temporary table is created with a database/table name of a permanent table which already exists in the database, then within that session any references to that table will resolve to the temporary table, rather than to the permanent table. The user will not be able to access the original table within that session without either dropping the temporary table, or renaming it to a non-conflicting name.
Temporary tables have the following limitations:
Partition columns are not supported.
No support for creation of indexes.
Example:
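A minimal sketch (the table name and columns are illustrative assumptions):

```sql
-- Visible only in the current session; data lives in the user's
-- scratch directory and disappears when the session ends.
CREATE TEMPORARY TABLE tmp_user_events (
  userid BIGINT,
  event  STRING
)
STORED AS ORC;
```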
Drop
DROP TABLE removes metadata and data for this table. The data is actually moved to the .Trash/Current directory if Trash is configured (and PURGE is not specified). The metadata is completely lost.
When dropping an EXTERNAL table, data in the table will NOT be deleted from the file system.
Otherwise, the table information is removed from the metastore and the raw data is removed as if by 'hadoop dfs -rm'. In many cases, this results in the table data being moved into the user's .Trash folder in their home directory; users who mistakenly DROP TABLEs may thus be able to recover their lost data by recreating a table with the same schema, recreating any necessary partitions, and then moving the data back into place manually using Hadoop. This solution is subject to change over time or across installations as it relies on the underlying implementation; users are strongly encouraged not to drop tables capriciously.
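A minimal sketch (the table names reuse earlier illustrative examples; PURGE is available in recent Hive releases):

```sql
-- Moves the data to .Trash/Current, if Trash is configured.
DROP TABLE IF EXISTS page_views_managed;

-- Skips the Trash folder entirely; the data cannot be recovered.
DROP TABLE IF EXISTS users_csv PURGE;
```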