PostgreSQL Database TableAM: HeapAM synchronized scan machinery
创始人
2024-05-07 10:19:34

When multiple backends run a sequential scan on the same table, we try to keep them synchronized to reduce the overall I/O needed. The goal is to read each page into the shared buffer cache only once, and to let all backends taking part in the shared scan process the page before it falls out of the cache.

Since the "leader" in a pack of backends doing a seqscan has to wait for I/O while the "followers" don't, there is a strong self-synchronizing effect once we can get the backends examining approximately the same part of the table at the same time. Hence all that is really needed is to make a backend that begins a new seqscan start it close to where the other backends are reading. We can scan the table circularly, from block X up to the end and then from block 0 to X-1, to ensure we visit all rows while still participating in the common scan.
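The circular visiting order described above can be sketched in a few lines of C. This is a minimal standalone illustration, not PostgreSQL's heapgettup code; the function name `circular_scan` and the output array are hypothetical, and `BlockNumber` is assumed to be a 32-bit unsigned integer as in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's BlockNumber (assumed 32-bit unsigned). */
typedef uint32_t BlockNumber;

/*
 * Hypothetical helper: visit every block of an nblocks-long table exactly
 * once, starting at startblock (block X), running to the last block,
 * wrapping to block 0, and stopping just before X again.
 * Visited block numbers are written to out[], which must hold nblocks
 * entries; the number of blocks visited is returned.
 */
static size_t
circular_scan(BlockNumber startblock, BlockNumber nblocks, BlockNumber *out)
{
    size_t      n = 0;
    BlockNumber page;

    if (nblocks == 0)
        return 0;               /* empty table: nothing to visit */
    assert(startblock < nblocks);

    page = startblock;
    do
    {
        out[n++] = page;        /* "process" this page */
        if (++page >= nblocks)  /* past the end: wrap around to block 0 */
            page = 0;
    } while (page != startblock);   /* back at X: every block seen once */

    return n;
}
```

For a five-block table and a start position of 3, the order is 3, 4, 0, 1, 2: every block is read exactly once, and the scan can begin wherever the pack currently is.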

To accomplish that, we keep track of the scan position of each table and start new scans close to where the previous scan(s) are. We don't try to do any extra synchronization to keep the scans together afterwards; some scans might progress much more slowly than others, for example if the results need to be transferred to the client over a slow network, and we don't want such queries to slow down the others. Realistically, only a few large sequential scans on different tables can be in progress at any one time. Therefore we just keep the scan positions in a small LRU list, which we scan every time we need to look up or update a scan position. The whole mechanism is only applied to tables exceeding a threshold size (but that is not the concern of this module).
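The threshold decision lives outside this module. As far as I know, heapam.c's initscan() only enables synchronized scans when the table is not a temp table using local buffers, its block count exceeds a quarter of shared_buffers (`NBuffers / 4`), the caller allows it, and the `synchronize_seqscans` setting is on. The sketch below models that check under those assumptions; `use_sync_scan` and its parameters are hypothetical and it omits the caller-supplied scan flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* assumed 32-bit, as in PostgreSQL */

/*
 * Hypothetical helper modeled on (not copied from) the check in
 * heapam.c's initscan(): should a new seqscan join the sync-scan pack?
 *   nblocks              - size of the table in blocks
 *   nbuffers             - number of shared buffers (NBuffers)
 *   uses_local_buffers   - true for backend-private temp tables
 *   synchronize_seqscans - value of the GUC of that name
 */
static bool
use_sync_scan(BlockNumber nblocks, int nbuffers, bool uses_local_buffers,
              bool synchronize_seqscans)
{
    assert(nbuffers >= 0);

    if (uses_local_buffers)
        return false;           /* private buffers: no other backend to share with */
    if (nblocks <= (BlockNumber) (nbuffers / 4))
        return false;           /* small table: probably stays cached anyway */
    return synchronize_seqscans;
}
```

With 16384 shared buffers the cut-off is 4096 blocks, so a 5000-block table qualifies while a 4096-block table does not.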

SYNC SCAN LRU list

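The ss_scan_locations_t struct is the sync-scan LRU list, and items is a flexible array member holding SYNC_SCAN_NELEM entries. ss_scan_locations_t.head and .tail point to the head and tail elements of the LRU. The ss_lru_item_t entries are organized as a doubly linked list, and their actual payload is an ss_scan_location_t struct.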

typedef struct ss_scan_location_t
{
    RelFileNode relfilenode;    /* identity of a relation */
    BlockNumber location;       /* last-reported location in the relation */
} ss_scan_location_t;

typedef struct ss_lru_item_t
{
    struct ss_lru_item_t *prev;
    struct ss_lru_item_t *next;
    ss_scan_location_t location;
} ss_lru_item_t;

typedef struct ss_scan_locations_t
{
    ss_lru_item_t *head;
    ss_lru_item_t *tail;
    ss_lru_item_t items[FLEXIBLE_ARRAY_MEMBER]; /* SYNC_SCAN_NELEM items */
} ss_scan_locations_t;
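Because items is a flexible array member, the struct's size has to be computed from the offset of the array plus N elements. The sketch below reproduces the three types with simplified stand-ins (`Oid` and `BlockNumber` assumed to be 32-bit unsigned, `RelFileNode` reduced to its three OIDs) and uses an offsetof-based `SizeOfScanLocations` formula like the one in syncscan.c. The `alloc_scan_locations` helper is hypothetical: `malloc` stands in for `ShmemInitStruct`, and PostgreSQL's `FLEXIBLE_ARRAY_MEMBER` macro is written here as a plain C99 `[]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for PostgreSQL types (assumptions). */
typedef uint32_t Oid;
typedef uint32_t BlockNumber;
typedef struct { Oid spcNode, dbNode, relNode; } RelFileNode;

/* Innermost type first: C needs a complete type before embedding it. */
typedef struct ss_scan_location_t
{
    RelFileNode relfilenode;    /* identity of a relation */
    BlockNumber location;       /* last-reported location in the relation */
} ss_scan_location_t;

typedef struct ss_lru_item_t
{
    struct ss_lru_item_t *prev;
    struct ss_lru_item_t *next;
    ss_scan_location_t location;
} ss_lru_item_t;

typedef struct ss_scan_locations_t
{
    ss_lru_item_t *head;
    ss_lru_item_t *tail;
    ss_lru_item_t items[];      /* C99 flexible array member */
} ss_scan_locations_t;

/* Header bytes up to the array, plus N array elements. */
#define SizeOfScanLocations(N) \
    (offsetof(ss_scan_locations_t, items) + (N) * sizeof(ss_lru_item_t))

/* Hypothetical allocator: malloc plays the role of ShmemInitStruct. */
static ss_scan_locations_t *
alloc_scan_locations(size_t n)
{
    return malloc(SizeOfScanLocations(n));
}
```

Using offsetof rather than sizeof(ss_scan_locations_t) avoids counting any trailing padding the compiler may add after the header.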

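scan_locations points to the sync-scan LRU list in shared memory. When the postmaster creates the area (!IsUnderPostmaster), the head pointer is set to the first element of scan_locations->items and the tail pointer to the last one; every array member is then initialized with invalid values, and the ss_lru_item_t entries are linked into a doubly linked list in array order. A process that merely attaches to the existing area only asserts that the struct was found.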

static ss_scan_locations_t *scan_locations; /* Pointer to struct in shared memory */

void
SyncScanShmemInit(void)
{
    bool        found;

    scan_locations = (ss_scan_locations_t *)
        ShmemInitStruct("Sync Scan Locations List",
                        SizeOfScanLocations(SYNC_SCAN_NELEM),
                        &found);

    if (!IsUnderPostmaster)
    {
        /* Initialize shared memory area */
        scan_locations->head = &scan_locations->items[0];
        scan_locations->tail = &scan_locations->items[SYNC_SCAN_NELEM - 1];

        for (int i = 0; i < SYNC_SCAN_NELEM; i++)
        {
            ss_lru_item_t *item = &scan_locations->items[i];

            /*
             * Initialize all slots with invalid values. As scans are started,
             * these invalid entries will fall off the LRU list and get
             * replaced with real entries.
             */
            item->location.relfilenode.spcNode = InvalidOid;
            item->location.relfilenode.dbNode = InvalidOid;
            item->location.relfilenode.relNode = InvalidOid;
            item->location.location = InvalidBlockNumber;

            item->prev = (i > 0) ?
                (&scan_locations->items[i - 1]) : NULL;
            item->next = (i < SYNC_SCAN_NELEM - 1) ?
                (&scan_locations->items[i + 1]) : NULL;
        }
    }
    else
        Assert(found);
}
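The initialization above threads a plain array into a doubly linked list. The following standalone sketch shows the same idea with a small fixed-size array instead of a shared-memory flexible array; the names `lru_item`, `lru_list`, `lru_init`, `NELEM` and `INVALID_BLOCK` are hypothetical, and a single `uint32_t relid` with 0 as the invalid value stands in for `RelFileNode`/`InvalidOid`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NELEM 4                 /* hypothetical; PostgreSQL uses SYNC_SCAN_NELEM */
#define INVALID_BLOCK UINT32_MAX    /* stand-in for InvalidBlockNumber */

typedef struct lru_item
{
    struct lru_item *prev;
    struct lru_item *next;
    uint32_t    relid;          /* simplified key in place of RelFileNode */
    uint32_t    location;
} lru_item;

typedef struct
{
    lru_item   *head;
    lru_item   *tail;
    lru_item    items[NELEM];   /* fixed size here, flexible array in PostgreSQL */
} lru_list;

/* Link items[0..NELEM-1] in array order and mark every slot invalid. */
static void
lru_init(lru_list *l)
{
    l->head = &l->items[0];
    l->tail = &l->items[NELEM - 1];
    for (int i = 0; i < NELEM; i++)
    {
        lru_item   *item = &l->items[i];

        item->relid = 0;        /* 0 plays the role of InvalidOid */
        item->location = INVALID_BLOCK;
        item->prev = (i > 0) ? &l->items[i - 1] : NULL;
        item->next = (i < NELEM - 1) ? &l->items[i + 1] : NULL;
    }
}
```

Since no real relation has an invalid key, these placeholder slots never match a lookup; they simply drift to the tail and are recycled first.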

ss_search searches the scan_locations structure for an entry with the given relfilenode. If "set" is true, the location is updated to the given location. If no entry for the given relfilenode is found, one is created at the head of the list with the given location, even if "set" is false (it takes over the tail entry, i.e. the least recently used one). In either case, the location after any update is returned, and the entry is moved to the front of the list. The caller is responsible for holding a suitable lock on the shared data structure.

static BlockNumber
ss_search(RelFileNode relfilenode, BlockNumber location, bool set)
{
    ss_lru_item_t *item = scan_locations->head;

    for (;;)
    {
        bool        match = RelFileNodeEquals(item->location.relfilenode, relfilenode);

        if (match || item->next == NULL)
        {
            /*
             * If we reached the end of list and no match was found, take over
             * the last entry
             */
            if (!match)
            {
                item->location.relfilenode = relfilenode;
                item->location.location = location;
            }
            else if (set)
                item->location.location = location;

            /* Move the entry to the front of the LRU list */
            if (item != scan_locations->head)
            {
                /* unlink */
                if (item == scan_locations->tail)
                    scan_locations->tail = item->prev;
                item->prev->next = item->next;
                if (item->next)
                    item->next->prev = item->prev;

                /* link */
                item->prev = NULL;
                item->next = scan_locations->head;
                scan_locations->head->prev = item;
                scan_locations->head = item;
            }

            return item->location.location;
        }

        item = item->next;
    }

    /* not reached */
}
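To see the move-to-front and tail-recycling behavior in isolation, here is a self-contained sketch of the same algorithm on a three-slot list. It is an illustration under stated assumptions, not PostgreSQL code: `lru_search`, `lru_init`, `lru_list` and `NELEM = 3` are hypothetical, a `uint32_t relid` (nonzero for real tables) replaces `RelFileNode`, and there is no locking because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NELEM 3                 /* hypothetical; PostgreSQL uses SYNC_SCAN_NELEM */

typedef struct lru_item
{
    struct lru_item *prev, *next;
    uint32_t    relid;          /* simplified key; 0 means "invalid" */
    uint32_t    location;
} lru_item;

typedef struct
{
    lru_item   *head, *tail;
    lru_item    items[NELEM];
} lru_list;

static void
lru_init(lru_list *l)
{
    for (int i = 0; i < NELEM; i++)
    {
        l->items[i].relid = 0;
        l->items[i].location = UINT32_MAX;
        l->items[i].prev = (i > 0) ? &l->items[i - 1] : NULL;
        l->items[i].next = (i < NELEM - 1) ? &l->items[i + 1] : NULL;
    }
    l->head = &l->items[0];
    l->tail = &l->items[NELEM - 1];
}

/* Same steps as ss_search: linear scan, recycle the tail on a miss,
 * then move the found or recycled entry to the head of the list. */
static uint32_t
lru_search(lru_list *l, uint32_t relid, uint32_t location, bool set)
{
    lru_item   *item = l->head;

    for (;;)
    {
        bool        match = (item->relid == relid);

        if (match || item->next == NULL)
        {
            if (!match)
            {                   /* miss: take over the least recently used slot */
                item->relid = relid;
                item->location = location;
            }
            else if (set)
                item->location = location;

            if (item != l->head)
            {                   /* unlink from the current position ... */
                if (item == l->tail)
                    l->tail = item->prev;
                item->prev->next = item->next;
                if (item->next)
                    item->next->prev = item->prev;
                /* ... and relink at the head */
                item->prev = NULL;
                item->next = l->head;
                l->head->prev = item;
                l->head = item;
            }
            return item->location;
        }
        item = item->next;
    }
}
```

After reporting positions for tables 10, 20 and 30, a lookup of table 10 returns its saved position and moves it to the head, so table 20 becomes the tail; looking up a fourth table then evicts table 20, exactly the "invalid or stale entries fall off the end" behavior described above.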

ss_get_location

ss_get_location gets the optimal starting location for a scan. It returns the last-reported location of a sequential scan on the relation, or 0 if no valid location is found. We expect the caller has just done RelationGetNumberOfBlocks(), so that number is passed in rather than computed again. The result is guaranteed to be less than relnblocks (assuming relnblocks > 0). Note that SyncScanLock is taken in exclusive mode even for this lookup, because ss_search always reorders the LRU list and may insert a new entry.

BlockNumber
ss_get_location(Relation rel, BlockNumber relnblocks)
{
    LWLockAcquire(SyncScanLock, LW_EXCLUSIVE);
    BlockNumber startloc = ss_search(rel->rd_node, 0, false);
    LWLockRelease(SyncScanLock);

    /*
     * If the location is not a valid block number for this scan, start at 0.
     * This can happen if for instance a VACUUM truncated the table since the
     * location was saved.
     */
    if (startloc >= relnblocks)
        startloc = 0;

    return startloc;
}
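The final step of ss_get_location treats the remembered position only as a hint. A minimal sketch of that clamp, pulled out into a hypothetical helper `clamp_start_location` (with `BlockNumber` assumed to be 32-bit unsigned), looks like this:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* assumed 32-bit, as in PostgreSQL */

/*
 * Hypothetical helper: keep a saved start position only if it still lies
 * inside the table; otherwise (e.g. VACUUM truncated the table after the
 * position was reported) fall back to block 0.
 */
static BlockNumber
clamp_start_location(BlockNumber saved, BlockNumber relnblocks)
{
    return (saved >= relnblocks) ? 0 : saved;
}
```

A stale position is harmless: starting at block 0 still visits every block once, it just forgoes joining the pack for this scan.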

ss_report_location

ss_report_location updates the current scan location. It writes an entry of the form (relfilenode, blocknumber) into the shared sync-scan state, overwriting any existing entry for the same relfilenode.

void
ss_report_location(Relation rel, BlockNumber location)
{
    /*
     * To reduce lock contention, only report scan progress every N pages. For
     * the same reason, don't block if the lock isn't immediately available.
     * Missing a few updates isn't critical, it just means that a new scan
     * that wants to join the pack will start a little bit behind the head of
     * the scan.  Hopefully the pages are still in OS cache and the scan
     * catches up quickly.
     */
    if ((location % SYNC_SCAN_REPORT_INTERVAL) == 0)
    {
        if (LWLockConditionalAcquire(SyncScanLock, LW_EXCLUSIVE))
        {
            (void) ss_search(rel->rd_node, location, true);
            LWLockRelease(SyncScanLock);
        }
    }
}
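The two contention-avoidance tricks above (report only every N pages, and never wait for the lock) can be sketched with C11 atomics. This is an illustration under assumptions, not PostgreSQL code: a C11 `atomic_flag` used as a try-lock stands in for `LWLockConditionalAcquire`/`LWLockRelease`, a single variable stands in for the shared LRU entry, and `report_location`, `shared_location` and `reports_written` are hypothetical names. The interval follows what, to my knowledge, syncscan.c uses: `128 * 1024 / BLCKSZ`, i.e. 16 blocks with the default 8 kB block size.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */
#define SYNC_SCAN_REPORT_INTERVAL (128 * 1024 / BLCKSZ)    /* 16 blocks */

typedef uint32_t BlockNumber;

/* Hypothetical stand-ins for SyncScanLock and the shared LRU entry. */
static atomic_flag sync_scan_lock = ATOMIC_FLAG_INIT;
static BlockNumber shared_location;
static int  reports_written;

static void
report_location(BlockNumber location)
{
    /* Only every SYNC_SCAN_REPORT_INTERVAL-th block is reported. */
    if (location % SYNC_SCAN_REPORT_INTERVAL != 0)
        return;

    /* Try-lock: test_and_set returns the previous state, so false means
     * we now hold the lock. If someone else holds it, skip this report. */
    if (!atomic_flag_test_and_set(&sync_scan_lock))
    {
        shared_location = location;
        reports_written++;
        atomic_flag_clear(&sync_scan_lock);
    }
}
```

Scanning blocks 0 through 99 produces only seven reports (blocks 0, 16, 32, ..., 96); a report that arrives while the lock is busy is simply dropped, and the next multiple of 16 will catch up.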
