Hudi Spark SQL Call Procedures Study Notes (Part 1): Querying Table File Statistics
2024-04-12 08:30:17

Preface

These are my study notes on Hudi Spark SQL Call Procedures. Call Procedures are referred to as Stored Procedures on the official website; the feature was contributed in Hudi 0.11.0 by ForwardXu from Tencent. Besides the few procedures mentioned in the official docs, it supports many other procedure commands. This article covers several commands I find most useful, mainly for querying statistics about the files under a table's path.

Versions

Hudi master 0.13.0-SNAPSHOT
Spark 3.1.2 (in fact, every Spark version supported by Hudi supports Call Procedures)
Kyuubi 1.5.2 (Kyuubi is used because it displays column names in the results; Spark's built-in spark-sql does not return column names)

Parameter Forms

Pass arguments by name: order does not matter, and optional arguments can be omitted

CALL system.procedure_name(arg_name_2 => arg_2, arg_name_1 => arg_1, ... arg_name_n => arg_n)

Pass arguments by position: order matters, and trailing optional arguments can be omitted

CALL system.procedure_name(arg_1, arg_2, ... arg_n)

Supported Procedures

All procedures supported by a given version can be found in the HoodieProcedures class. The currently supported procedures are:

show_fs_path_detail
show_bootstrap_partitions
repair_deduplicate
create_metadata_table
stats_file_sizes
validate_metadata_table_files
show_commit_partitions
show_commit_extra_metadata
show_table_properties
run_clustering
run_bootstrap
show_commit_files
run_clean
show_rollback_detail
rollback_to_savepoint
show_fsview_all
show_compaction
copy_to_temp_view
show_invalid_parquet
delete_savepoint
show_bootstrap_mapping
show_archived_commits
show_fsview_latest
show_metadata_table_files
export_instants
show_commits_metadata
rollback_to_instant
delete_metadata_table
delete_marker
show_metadata_table_stats
sync_validate
copy_to_table
show_savepoints
init_metadata_table
repair_overwrite_hoodie_props
show_metadata_table_partitions
show_logfile_records
downgrade_table
show_clustering
repair_migrate_partition_meta
show_rollbacks
show_logfile_metadata
upgrade_table
repair_add_partition_meta
hive_sync
commits_compare
hdfs_parquet_import
show_commit_write_stats
show_commits
show_archived_commits_metadata
run_compaction
create_savepoint
repair_corrupted_clean_files
stats_wa

The concrete definition:

  private def initProcedureBuilders: Map[String, Supplier[ProcedureBuilder]] = {
    Map(
      (RunCompactionProcedure.NAME, RunCompactionProcedure.builder),
      (ShowCompactionProcedure.NAME, ShowCompactionProcedure.builder),
      (CreateSavepointProcedure.NAME, CreateSavepointProcedure.builder),
      (DeleteSavepointProcedure.NAME, DeleteSavepointProcedure.builder),
      (RollbackToSavepointProcedure.NAME, RollbackToSavepointProcedure.builder),
      (RollbackToInstantTimeProcedure.NAME, RollbackToInstantTimeProcedure.builder),
      (RunClusteringProcedure.NAME, RunClusteringProcedure.builder),
      (ShowClusteringProcedure.NAME, ShowClusteringProcedure.builder),
      (ShowCommitsProcedure.NAME, ShowCommitsProcedure.builder),
      (ShowCommitsMetadataProcedure.NAME, ShowCommitsMetadataProcedure.builder),
      (ShowArchivedCommitsProcedure.NAME, ShowArchivedCommitsProcedure.builder),
      (ShowArchivedCommitsMetadataProcedure.NAME, ShowArchivedCommitsMetadataProcedure.builder),
      (ShowCommitFilesProcedure.NAME, ShowCommitFilesProcedure.builder),
      (ShowCommitPartitionsProcedure.NAME, ShowCommitPartitionsProcedure.builder),
      (ShowCommitWriteStatsProcedure.NAME, ShowCommitWriteStatsProcedure.builder),
      (CommitsCompareProcedure.NAME, CommitsCompareProcedure.builder),
      (ShowSavepointsProcedure.NAME, ShowSavepointsProcedure.builder),
      (DeleteMarkerProcedure.NAME, DeleteMarkerProcedure.builder),
      (ShowRollbacksProcedure.NAME, ShowRollbacksProcedure.builder),
      (ShowRollbackDetailProcedure.NAME, ShowRollbackDetailProcedure.builder),
      (ExportInstantsProcedure.NAME, ExportInstantsProcedure.builder),
      (ShowAllFileSystemViewProcedure.NAME, ShowAllFileSystemViewProcedure.builder),
      (ShowLatestFileSystemViewProcedure.NAME, ShowLatestFileSystemViewProcedure.builder),
      (ShowHoodieLogFileMetadataProcedure.NAME, ShowHoodieLogFileMetadataProcedure.builder),
      (ShowHoodieLogFileRecordsProcedure.NAME, ShowHoodieLogFileRecordsProcedure.builder),
      (StatsWriteAmplificationProcedure.NAME, StatsWriteAmplificationProcedure.builder),
      (StatsFileSizeProcedure.NAME, StatsFileSizeProcedure.builder),
      (HdfsParquetImportProcedure.NAME, HdfsParquetImportProcedure.builder),
      (RunBootstrapProcedure.NAME, RunBootstrapProcedure.builder),
      (ShowBootstrapMappingProcedure.NAME, ShowBootstrapMappingProcedure.builder),
      (ShowBootstrapPartitionsProcedure.NAME, ShowBootstrapPartitionsProcedure.builder),
      (UpgradeTableProcedure.NAME, UpgradeTableProcedure.builder),
      (DowngradeTableProcedure.NAME, DowngradeTableProcedure.builder),
      (ShowMetadataTableFilesProcedure.NAME, ShowMetadataTableFilesProcedure.builder),
      (ShowMetadataTablePartitionsProcedure.NAME, ShowMetadataTablePartitionsProcedure.builder),
      (CreateMetadataTableProcedure.NAME, CreateMetadataTableProcedure.builder),
      (DeleteMetadataTableProcedure.NAME, DeleteMetadataTableProcedure.builder),
      (InitMetadataTableProcedure.NAME, InitMetadataTableProcedure.builder),
      (ShowMetadataTableStatsProcedure.NAME, ShowMetadataTableStatsProcedure.builder),
      (ValidateMetadataTableFilesProcedure.NAME, ValidateMetadataTableFilesProcedure.builder),
      (ShowFsPathDetailProcedure.NAME, ShowFsPathDetailProcedure.builder),
      (CopyToTableProcedure.NAME, CopyToTableProcedure.builder),
      (RepairAddpartitionmetaProcedure.NAME, RepairAddpartitionmetaProcedure.builder),
      (RepairCorruptedCleanFilesProcedure.NAME, RepairCorruptedCleanFilesProcedure.builder),
      (RepairDeduplicateProcedure.NAME, RepairDeduplicateProcedure.builder),
      (RepairMigratePartitionMetaProcedure.NAME, RepairMigratePartitionMetaProcedure.builder),
      (RepairOverwriteHoodiePropsProcedure.NAME, RepairOverwriteHoodiePropsProcedure.builder),
      (RunCleanProcedure.NAME, RunCleanProcedure.builder),
      (ValidateHoodieSyncProcedure.NAME, ValidateHoodieSyncProcedure.builder),
      (ShowInvalidParquetProcedure.NAME, ShowInvalidParquetProcedure.builder),
      (HiveSyncProcedure.NAME, HiveSyncProcedure.builder),
      (CopyToTempView.NAME, CopyToTempView.builder),
      (ShowCommitExtraMetadataProcedure.NAME, ShowCommitExtraMetadataProcedure.builder),
      (ShowTablePropertiesProcedure.NAME, ShowTablePropertiesProcedure.builder)
    )
  }

The list above can be printed with:

initProcedureBuilders.keySet.foreach(println)

Creating the Table and Test Data

create table test_hudi_call_cow (
  id int,
  name string,
  price double,
  ts long,
  dt string
) using hudi
partitioned by (dt)
options (
  primaryKey = 'id',
  preCombineField = 'ts',
  type = 'cow'
);
insert into test_hudi_call_cow values (1,'hudi',10,100,'2021-05-05');
insert into test_hudi_call_cow values (2,'hudi',10,100,'2021-05-05');
insert into test_hudi_call_cow values (3,'hudi',10,100,'2021-05-05');
insert into test_hudi_call_cow values (4,'hudi',10,100,'2021-05-05');

show_table_properties

Shows the table properties, returning the table configuration stored in hoodie.properties as key/value pairs.

Parameters

  • table: table name
  • path: table path
  • limit: optional, default 10
    At least one of table and path is required. table takes precedence over path: if both are specified, table is used and path is ignored.
    Output columns:
    key,value

Examples

call show_table_properties(table => 'test_hudi_call_cow');
call show_table_properties(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow');

By default only the first 10 entries are shown:

+--------------------------------------------------+----------------+
|                       key                        |     value      |
+--------------------------------------------------+----------------+
| hoodie.table.precombine.field                    | ts             |
| hoodie.datasource.write.drop.partition.columns   | false          |
| hoodie.table.partition.fields                    | dt             |
| hoodie.table.type                                | COPY_ON_WRITE  |
| hoodie.archivelog.folder                         | archived       |
| hoodie.timeline.layout.version                   | 1              |
| hoodie.table.version                             | 5              |
| hoodie.table.recordkey.fields                    | id             |
| hoodie.table.metadata.partitions                 | files          |
| hoodie.datasource.write.partitionpath.urlencode  | false          |
+--------------------------------------------------+----------------+

To see all the configurations, set limit to a larger value:

call show_table_properties(table => 'test_hudi_call_cow', limit => 100);
+--------------------------------------------------+----------------------------------------------------+
|                       key                        |                       value                        |
+--------------------------------------------------+----------------------------------------------------+
| hoodie.table.precombine.field                    | ts                                                 |
| hoodie.datasource.write.drop.partition.columns   | false                                              |
| hoodie.table.partition.fields                    | dt                                                 |
| hoodie.table.type                                | COPY_ON_WRITE                                      |
| hoodie.archivelog.folder                         | archived                                           |
| hoodie.timeline.layout.version                   | 1                                                  |
| hoodie.table.version                             | 5                                                  |
| hoodie.table.recordkey.fields                    | id                                                 |
| hoodie.table.metadata.partitions                 | files                                              |
| hoodie.datasource.write.partitionpath.urlencode  | false                                              |
| hoodie.database.name                             | hudi                                               |
| hoodie.table.name                                | test_hudi_call_cow                                 |
| hoodie.table.keygenerator.class                  | org.apache.hudi.keygen.SimpleKeyGenerator          |
| hoodie.datasource.write.hive_style_partitioning  | true                                               |
| hoodie.table.create.schema                       | {"type":"record","name":"test_hudi_call_cow_record","namespace":"hoodie.test_hudi_call_cow","fields":[{"name":"_hoodie_commit_time","type":["string","null"]},{"name":"_hoodie_commit_seqno","type":["string","null"]},{"name":"_hoodie_record_key","type":["string","null"]},{"name":"_hoodie_partition_path","type":["string","null"]},{"name":"_hoodie_file_name","type":["string","null"]},{"name":"id","type":["int","null"]},{"name":"name","type":["string","null"]},{"name":"price","type":["double","null"]},{"name":"ts","type":["long","null"]},{"name":"dt","type":["string","null"]}]} |
| hoodie.table.checksum                            | 2721425243                                         |
| hoodie.allow.operation.metadata.field            | false                                              |
+--------------------------------------------------+----------------------------------------------------+

show_commits

Parameters

  • table: table name, required
  • limit: optional, default 10
    Output columns:
    commit_time,action,total_bytes_written,total_files_added,total_files_updated,total_partitions_written,total_records_written,total_update_records_written,total_errors

Examples

call show_commits(table => 'test_hudi_call_cow');
call show_commits(table => 'test_hudi_call_cow', limit => 1);

+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+
|    commit_time     | action  | total_bytes_written  | total_files_added  | total_files_updated  | total_partitions_written  | total_records_written  | total_update_records_written  | total_errors  |
+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+
| 20221123205701931  | commit  | 435308               | 0                  | 1                    | 1                         | 4                      | 0                             | 0             |
| 20221123205650038  | commit  | 435279               | 0                  | 1                    | 1                         | 3                      | 0                             | 0             |
| 20221123205636715  | commit  | 435246               | 0                  | 1                    | 1                         | 2                      | 0                             | 0             |
| 20221123205546254  | commit  | 435148               | 1                  | 0                    | 1                         | 1                      | 0                             | 0             |
+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+

show_commits_metadata

Similar in function to show_commits, but with different output columns. Both are implemented via ShowCommitsProcedure; the difference is that show_commits_metadata sets includeExtraMetadata to true, while show_commits sets it to false.

Parameters

  • table: table name, required
  • limit: optional, default 10
    Output columns:
    commit_time,action,partition,file_id,previous_commit,num_writes,num_inserts,num_deletes,num_update_writes,total_errors,total_log_blocks,total_corrupt_log_blocks,total_rollback_blocks,total_log_records,total_updated_records_compacted,total_bytes_written

Examples

call show_commits_metadata(table => 'test_hudi_call_cow');
call show_commits_metadata(table => 'test_hudi_call_cow', limit => 1);
+--------------------+---------+----------------+-----------------------------------------+--------------------+-------------+--------------+--------------+--------------------+---------------+-------------------+---------------------------+------------------------+--------------------+----------------------------------+----------------------+
|    commit_time     | action  |   partition    |                 file_id                 |  previous_commit   | num_writes  | num_inserts  | num_deletes  | num_update_writes  | total_errors  | total_log_blocks  | total_corrupt_log_blocks  | total_rollback_blocks  | total_log_records  | total_updated_records_compacted  | total_bytes_written  |
+--------------------+---------+----------------+-----------------------------------------+--------------------+-------------+--------------+--------------+--------------------+---------------+-------------------+---------------------------+------------------------+--------------------+----------------------------------+----------------------+
| 20221123205701931  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205650038  | 4           | 1            | 0            | 0                  | 0             | 0                 | 0                         | 0                      | 0                  | 0                                | 435308               |
| 20221123205650038  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205636715  | 3           | 1            | 0            | 0                  | 0             | 0                 | 0                         | 0                      | 0                  | 0                                | 435279               |
| 20221123205636715  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205546254  | 2           | 1            | 0            | 0                  | 0             | 0                 | 0                         | 0                      | 0                  | 0                                | 435246               |
| 20221123205546254  | commit  | dt=2021-05-05  | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | null               | 1           | 1            | 0            | 0                  | 0             | 0                 | 0                         | 0                      | 0                  | 0                                | 435148               |
+--------------------+---------+----------------+-----------------------------------------+--------------------+-------------+--------------+--------------+--------------------+---------------+-------------------+---------------------------+------------------------+--------------------+----------------------------------+----------------------+

show_commit_files

Returns the file information (e.g. fileId) for a given instantTime.

Parameters

  • table: table name, required
  • instant_time: required
  • limit: optional, default 10
    Output columns:
    action,partition_path,file_id,previous_commit,total_records_updated,total_records_written,total_bytes_written,total_errors,file_size

Example

call show_commit_files(table => 'test_hudi_call_cow', instant_time => '20221123205701931');

Because the test data is small and there is only one partition, only one file is involved:

+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+
| action  | partition_path  |                 file_id                 |  previous_commit   | total_records_updated  | total_records_written  | total_bytes_written  | total_errors  | file_size  |
+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+
| commit  | dt=2021-05-05   | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123205650038  | 0                      | 4                      | 435308               | 0             | 435308     |
+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+

Run the following SQL so that a single commit involves two files:

merge into test_hudi_call_cow as t0
using (
  select 5 as id, 'hudi' as name, 112 as price, 98 as ts, '2022-11-23' as dt, 'INSERT' as opt_type
  union
  select 2 as id, 'hudi_2' as name, 10 as price, 100 as ts, '2021-05-05' as dt, 'UPDATE' as opt_type
  union
  select 4 as id, 'hudi' as name, 10 as price, 100 as ts, '2021-05-05' as dt, 'DELETE' as opt_type
) as s0
on t0.id = s0.id
when matched and opt_type != 'DELETE' then update set *
when matched and opt_type = 'DELETE' then delete
when not matched and opt_type != 'DELETE' then insert *;

First use show_commits to see that the latest commit_time is 20221123232449644:

call show_commits(table => 'test_hudi_call_cow', limit => 1);
+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+
|    commit_time     | action  | total_bytes_written  | total_files_added  | total_files_updated  | total_partitions_written  | total_records_written  | total_update_records_written  | total_errors  |
+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+
| 20221123232449644  | commit  | 870474               | 1                  | 1                    | 2                         | 4                      | 1                             | 0             |
+--------------------+---------+----------------------+--------------------+----------------------+---------------------------+------------------------+-------------------------------+---------------+

Then use show_commit_files to inspect the files for 20221123232449644:

call show_commit_files(table => 'test_hudi_call_cow', instant_time => '20221123232449644');
+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+
| action  | partition_path  |                 file_id                 |  previous_commit   | total_records_updated  | total_records_written  | total_bytes_written  | total_errors  | file_size  |
+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+
| commit  | dt=2022-11-23   | 8f2aecfd-198f-405b-ab5d-46e0cc997d97-0  | null               | 0                      | 1                      | 435176               | 0             | 435176     |
| commit  | dt=2021-05-05   | 35b07424-6e63-4b65-9182-7c37cbe756b1-0  | 20221123231230786  | 1                      | 3                      | 435298               | 0             | 435298     |
+---------+-----------------+-----------------------------------------+--------------------+------------------------+------------------------+----------------------+---------------+------------+
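As a quick sanity check, the per-file total_bytes_written values returned here add up to the commit-level total_bytes_written that show_commits reported for the same instant:

```python
# Sanity check: per-file bytes from show_commit_files sum to the
# commit-level total that show_commits reported for instant 20221123232449644.
file_bytes = [435176, 435298]   # total_bytes_written per file (output above)
commit_total = 870474           # total_bytes_written from show_commits

print(sum(file_bytes) == commit_total)  # -> True
```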

show_commit_partitions

Returns, for a given instantTime, the file and record information of each partition involved.

Parameters

  • table: table name, required
  • instant_time: required
  • limit: optional, default 10
    Output columns:
    action,partition_path,total_files_added,total_files_updated,total_records_inserted,total_records_updated,total_bytes_written,total_errors

Example

call show_commit_partitions(table => 'test_hudi_call_cow', instant_time => '20221123232449644');
+---------+-----------------+--------------------+----------------------+-------------------------+------------------------+----------------------+---------------+
| action  | partition_path  | total_files_added  | total_files_updated  | total_records_inserted  | total_records_updated  | total_bytes_written  | total_errors  |
+---------+-----------------+--------------------+----------------------+-------------------------+------------------------+----------------------+---------------+
| commit  | dt=2022-11-23   | 1                  | 0                    | 1                       | 0                      | 435176               | 0             |
| commit  | dt=2021-05-05   | 0                  | 1                    | 0                       | 1                      | 435298               | 0             |
+---------+-----------------+--------------------+----------------------+-------------------------+------------------------+----------------------+---------------+

show_commit_write_stats

Returns the write stats for a given instantTime.

Parameters

  • table: table name, required
  • instant_time: required
  • limit: optional, default 10
    Output columns:
    action,total_bytes_written,total_records_written,avg_record_size

Example

call show_commit_write_stats(table => 'test_hudi_call_cow', instant_time => '20221123232449644');
+---------+----------------------+------------------------+------------------+
| action  | total_bytes_written  | total_records_written  | avg_record_size  |
+---------+----------------------+------------------------+------------------+
| commit  | 870474               | 4                      | 217619           |
+---------+----------------------+------------------------+------------------+
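avg_record_size is derived from the other two columns: total bytes written divided by total records written. A minimal sketch of the relationship (rounding half up is my assumption, inferred from the fact that 870474 / 4 = 217618.5 is shown as 217619 above, and not necessarily Hudi's exact implementation):

```python
# Sketch: how avg_record_size relates to the other two columns of
# show_commit_write_stats. The half-up rounding rule is an assumption.
def avg_record_size(total_bytes_written: int, total_records_written: int) -> int:
    # 870474 / 4 = 217618.5, displayed as 217619 -> round half up.
    return int(total_bytes_written / total_records_written + 0.5)

print(avg_record_size(870474, 4))  # -> 217619, values from the output above
```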

show_commit_extra_metadata

Returns the extraMetadata stored in .commit, .deltacommit, and .replacecommit files.

Parameters

  • table: table name, required
  • instant_time: optional
  • limit: optional, default 100
  • metadata_key: optional, e.g. schema
    Output columns:
    instant_time,action,metadata_key,metadata_value

By default it returns the extraMetadata of the latest commit file; if instant_time is specified, it returns the extraMetadata of the commit file for that instant.

First look at the contents of the .commit file. It contains an extraMetadata object with a schema key and its corresponding value:

hadoop fs -cat hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/.hoodie/20221123232449644.commit
{
  "partitionToWriteStats" : {
    "dt=2022-11-23" : [ {"fileId" : "8f2aecfd-198f-405b-ab5d-46e0cc997d97-0", "path" : "dt=2022-11-23/8f2aecfd-198f-405b-ab5d-46e0cc997d97-0_1-238-2983_20221123232449644.parquet", "cdcStats" : null, "prevCommit" : "null", "numWrites" : 1, "numDeletes" : 0, "numUpdateWrites" : 0, "numInserts" : 1, "totalWriteBytes" : 435176, "totalWriteErrors" : 0, "tempPath" : null, "partitionPath" : "dt=2022-11-23", "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0, "fileSizeInBytes" : 435176, "minEventTime" : null, "maxEventTime" : null, "runtimeStats" : {"totalScanTime" : 0, "totalUpsertTime" : 0, "totalCreateTime" : 1013}} ],
    "dt=2021-05-05" : [ {"fileId" : "35b07424-6e63-4b65-9182-7c37cbe756b1-0", "path" : "dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-238-2982_20221123232449644.parquet", "cdcStats" : null, "prevCommit" : "20221123231230786", "numWrites" : 3, "numDeletes" : 1, "numUpdateWrites" : 1, "numInserts" : 0, "totalWriteBytes" : 435298, "totalWriteErrors" : 0, "tempPath" : null, "partitionPath" : "dt=2021-05-05", "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0, "fileSizeInBytes" : 435298, "minEventTime" : null, "maxEventTime" : null, "runtimeStats" : {"totalScanTime" : 0, "totalUpsertTime" : 4162, "totalCreateTime" : 0}} ]
  },
  "compacted" : false,
  "extraMetadata" : {
    "schema" : "{\"type\":\"record\",\"name\":\"test_hudi_call_cow_record\",\"namespace\":\"hoodie.test_hudi_call_cow\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"price\",\"type\":\"double\"},{\"name\":\"ts\",\"type\":\"long\"},{\"name\":\"dt\",\"type\":\"string\"}]}"
  },
  "operationType" : "UPSERT"
}
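Conceptually, show_commit_extra_metadata reads this JSON and emits one row per key under extraMetadata. A minimal sketch with a trimmed-down commit payload (the sample below is abbreviated from the file above; the real procedure reads the files under the .hoodie directory):

```python
import json

# Trimmed-down .commit payload, abbreviated from the file shown above.
commit_json = '''
{
  "compacted": false,
  "extraMetadata": {
    "schema": "{\\"type\\":\\"record\\",\\"name\\":\\"test_hudi_call_cow_record\\"}"
  },
  "operationType": "UPSERT"
}
'''

def extra_metadata(payload: str) -> dict:
    # Return the extraMetadata map; the procedure emits one row per entry.
    return json.loads(payload).get("extraMetadata", {})

for key in extra_metadata(commit_json):
    print(key)  # -> schema
```

Note that the schema value is itself a JSON string (quotes escaped), so reading the schema requires a second json.loads on the value.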

Examples

call show_commit_extra_metadata(table => 'test_hudi_call_cow');
+--------------------+---------+---------------+----------------------------------------------------+
|    instant_time    | action  | metadata_key  |                   metadata_value                   |
+--------------------+---------+---------------+----------------------------------------------------+
| 20221123232449644  | commit  | schema        | {"type":"record","name":"test_hudi_call_cow_record","namespace":"hoodie.test_hudi_call_cow","fields":[{"name":"id","type":"int"},{"name":"name","type":"string"},{"name":"price","type":"double"},{"name":"ts","type":"long"},{"name":"dt","type":"string"}]} |
+--------------------+---------+---------------+----------------------------------------------------+
call show_commit_extra_metadata(table => 'test_hudi_call_cow', instant_time => '20221123205701931', metadata_key => 'schema', limit => 10);
+--------------------+---------+---------------+----------------------------------------------------+
|    instant_time    | action  | metadata_key  |                   metadata_value                   |
+--------------------+---------+---------------+----------------------------------------------------+
| 20221123205701931  | commit  | schema        | {"type":"record","name":"test_hudi_call_cow_record","namespace":"hoodie.test_hudi_call_cow","fields":[{"name":"id","type":"int"},{"name":"name","type":"string"},{"name":"price","type":"double"},{"name":"ts","type":"long"},{"name":"dt","type":"string"}]} |
+--------------------+---------+---------------+----------------------------------------------------+

As far as I know, the only extraMetadata key is schema, and there is only one entry, so only one record is returned and specifying metadata_key makes no difference. I am not sure whether other extraMetadata keys exist.

show_fs_path_detail

Shows statistics for the files and subpaths under a given path, sorted by file size by default.

Parameters

  • path: table/file path, required
  • is_sub: optional, whether to list subdirectories (first-level subdirectories only), default false
  • sort: optional, whether to sort by file size, default true
    Output columns:
    path_num,file_num,storage_size,storage_size(unit),storage_path,space_consumed,quota,space_quota

Examples

call show_fs_path_detail(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow');
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| path_num  | file_num  | storage_size  | storage_size(unit)  |                    storage_path                    | space_consumed  |  quota   | space_quota  |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| 22        | 53        | 3200109       | 3.05MB              | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow | -1              | 9600327  | -1           |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+

Listing first-level subdirectories:

call show_fs_path_detail(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow', is_sub => true);
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| path_num  | file_num  | storage_size  | storage_size(unit)  |                    storage_path                    | space_consumed  |  quota   | space_quota  |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| 1         | 7         | 2611728       | 2.49MB              | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05 | -1              | 7835184  | -1           |
| 1         | 2         | 435272        | 425.07KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2022-11-23 | -1              | 1305816  | -1           |
| 19        | 44        | 153109        | 149.52KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/.hoodie | -1              | 459327   | -1           |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+

Listing a second-level subdirectory (the files under a partition):

call show_fs_path_detail(path => 'hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05', is_sub => true);
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| path_num  | file_num  | storage_size  | storage_size(unit)  |                    storage_path                    | space_consumed  |  quota   | space_quota  |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
| 0         | 1         | 435353        | 425.15KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-192-1549_20221123231230786.parquet | -1              | 1306059  | -1           |
| 0         | 1         | 435308        | 425.11KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-147-118_20221123205701931.parquet | -1              | 1305924  | -1           |
| 0         | 1         | 435298        | 425.10KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-238-2982_20221123232449644.parquet | -1              | 1305894  | -1           |
| 0         | 1         | 435279        | 425.08KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-105-83_20221123205650038.parquet | -1              | 1305837  | -1           |
| 0         | 1         | 435246        | 425.04KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-66-52_20221123205636715.parquet | -1              | 1305738  | -1           |
| 0         | 1         | 435148        | 424.95KB            | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/35b07424-6e63-4b65-9182-7c37cbe756b1-0_0-27-21_20221123205546254.parquet | -1              | 1305444  | -1           |
| 0         | 1         | 96            | 96B                 | hdfs://cluster1/warehouse/tablespace/managed/hive/hudi.db/test_hudi_call_cow/dt=2021-05-05/.hoodie_partition_metadata | -1              | 288      | -1           |
+-----------+-----------+---------------+---------------------+----------------------------------------------------+-----------------+----------+--------------+
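The storage_size(unit) column is just storage_size rendered in a human-readable binary unit. A sketch of the conversion (the exact formatting rule is an assumption inferred from the outputs above, which it reproduces):

```python
# Sketch of the storage_size -> storage_size(unit) conversion shown by
# show_fs_path_detail: pick the largest binary unit that fits and keep
# two decimals. The exact formatting rule is an assumption.
def human_size(num_bytes: int) -> str:
    for unit, scale in (("GB", 1024 ** 3), ("MB", 1024 ** 2), ("KB", 1024)):
        if num_bytes >= scale:
            return f"{num_bytes / scale:.2f}{unit}"
    return f"{num_bytes}B"

print(human_size(3200109))  # -> 3.05MB   (whole table path)
print(human_size(435272))   # -> 425.07KB (dt=2022-11-23 partition)
print(human_size(96))       # -> 96B      (.hoodie_partition_metadata)
```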

Related Reading

  • Apache Hudi Getting Started Summary
  • Hudi Spark SQL Summary
  • Spark 3.1.2 + Kyuubi 1.5.2 + kyuubi-spark-authz: Building from Source, Packaging, and Deploying with HA
  • Hudi Spark SQL Source Code Notes: Create Table
  • Hudi Spark SQL Source Code Notes: CTAS
  • Hudi Spark Source Code Notes: df.write.format("hudi").save
  • Hudi Spark Source Code Notes: spark.read.format("hudi").load
  • Hudi Spark Source Code Notes: spark.read.format("hudi").load (2)
  • Hudi Spark SQL Source Code Notes: select (query)
  • Hudi Source Code | Insert Analysis Summary (2) (WorkloadProfile)
  • Open Source Experience | How to Go from Beginner to Apache Hudi Contributor
