PostgreSQL TableAM: HeapAM synchronized scan machinery
Founder
2024-05-07 10:19:34

When multiple backends run a sequential scan on the same table, we try to keep them synchronized to reduce the overall I/O needed. The goal is to read each page into the shared buffer cache only once, and to let every backend that takes part in the shared scan process the page before it falls out of the cache.

Since the “leader” in a pack of backends doing a seqscan has to wait for I/O while the “followers” don’t, there is a strong self-synchronizing effect once the backends are examining approximately the same part of the table at the same time. Hence all that is really needed is to get a new backend that is beginning a seqscan to begin it close to where the other backends are reading. We can scan the table circularly, from block X up to the end and then from block 0 to X-1, to ensure we visit all rows while still participating in the common scan.
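The circular order described above can be sketched in a few lines of standalone C. This is only an illustration, not PostgreSQL code: circular_scan_order is a hypothetical helper, and BlockNumber is redefined locally as a 32-bit unsigned integer so the example compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* local stand-in for PostgreSQL's BlockNumber */

/*
 * Hypothetical helper: record the order in which a circular scan visits the
 * blocks of a relation with 'nblocks' blocks when it starts at block 'start'.
 * The order is written to 'out' (capacity: nblocks entries) and the number
 * of visited blocks is returned.
 */
static BlockNumber
circular_scan_order(BlockNumber nblocks, BlockNumber start, BlockNumber *out)
{
    BlockNumber visited = 0;
    BlockNumber blk = start;

    if (nblocks == 0)
        return 0;               /* empty relation: nothing to visit */
    assert(start < nblocks);

    do
    {
        out[visited++] = blk;
        blk++;
        if (blk >= nblocks)
            blk = 0;            /* wrap from the last block to block 0 */
    } while (blk != start);     /* stop once we are back at the start */

    return visited;
}
```

With 5 blocks and a start at block 3, the order is 3, 4, 0, 1, 2: every block is visited exactly once, no matter where the scan joined.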

To accomplish that, we keep track of the scan position of each table, and start new scans close to where the previous scan(s) are. We don’t try to do any extra synchronization to keep the scans together afterwards; some scans might progress much more slowly than others, for example if the results need to be transferred to the client over a slow network, and we don’t want such queries to slow down others. Realistically, only a few large sequential scans on different tables can be in progress at any time. Therefore we just keep the scan positions in a small LRU list, which we walk every time we need to look up or update a scan position. The whole mechanism is only applied to tables exceeding a threshold size (but that is not the concern of this module).

SYNC SCAN LRU list

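The ss_scan_locations_t struct is the sync scan LRU list. Its items member is a flexible array that holds SYNC_SCAN_NELEM entries, and its head and tail members point to the first and last entries of the LRU. The ss_lru_item_t entries form a doubly linked list; the actual payload of each entry is an ss_scan_location_t struct.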

/* Listed innermost first, so that each type is defined before it is used. */
typedef struct ss_scan_location_t
{
    RelFileNode relfilenode;    /* identity of a relation */
    BlockNumber location;       /* last-reported location in the relation */
} ss_scan_location_t;

typedef struct ss_lru_item_t
{
    struct ss_lru_item_t *prev;
    struct ss_lru_item_t *next;
    ss_scan_location_t location;
} ss_lru_item_t;

typedef struct ss_scan_locations_t
{
    ss_lru_item_t *head;
    ss_lru_item_t *tail;
    ss_lru_item_t items[FLEXIBLE_ARRAY_MEMBER]; /* SYNC_SCAN_NELEM items */
} ss_scan_locations_t;
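Because items is a flexible array member, the size of the whole structure has to be computed by hand before it can be allocated. The sketch below shows the usual offsetof-based calculation. It is a standalone approximation: the Oid, BlockNumber and RelFileNode definitions are simplified local stand-ins, the value 20 for SYNC_SCAN_NELEM and the shape of the SizeOfScanLocations macro are assumptions about the PostgreSQL source, and alloc_scan_locations is a hypothetical helper that uses calloc where PostgreSQL uses shared memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified local stand-ins for the PostgreSQL types (assumptions). */
typedef uint32_t Oid;
typedef uint32_t BlockNumber;
typedef struct RelFileNode
{
    Oid         spcNode;
    Oid         dbNode;
    Oid         relNode;
} RelFileNode;

typedef struct ss_scan_location_t
{
    RelFileNode relfilenode;
    BlockNumber location;
} ss_scan_location_t;

typedef struct ss_lru_item_t
{
    struct ss_lru_item_t *prev;
    struct ss_lru_item_t *next;
    ss_scan_location_t location;
} ss_lru_item_t;

typedef struct ss_scan_locations_t
{
    ss_lru_item_t *head;
    ss_lru_item_t *tail;
    ss_lru_item_t items[];      /* C99 flexible array member */
} ss_scan_locations_t;

#define SYNC_SCAN_NELEM 20      /* assumed list size */

/* Bytes needed for the fixed header plus N list entries. */
#define SizeOfScanLocations(N) \
    (offsetof(ss_scan_locations_t, items) + (N) * sizeof(ss_lru_item_t))

/* Hypothetical helper: allocate a zeroed list with room for n entries. */
static ss_scan_locations_t *
alloc_scan_locations(size_t n)
{
    ss_scan_locations_t *list;

    assert(n > 0);
    list = calloc(1, SizeOfScanLocations(n));
    if (list != NULL)
    {
        list->head = NULL;      /* links are set up by a separate init step */
        list->tail = NULL;
    }
    return list;
}
```

Using offsetof rather than sizeof for the header keeps the calculation independent of any padding the compiler might add after the fixed members.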

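scan_locations points to the sync scan LRU list in shared memory. SyncScanShmemInit sets the head pointer to the first element of the scan_locations->items array and the tail pointer to the last element, initializes every array slot with invalid values, and chains the ss_lru_item_t entries into a doubly linked list.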

static ss_scan_locations_t *scan_locations; /* Pointer to struct in shared memory */

void
SyncScanShmemInit(void)
{
    bool        found;

    scan_locations = (ss_scan_locations_t *)
        ShmemInitStruct("Sync Scan Locations List",
                        SizeOfScanLocations(SYNC_SCAN_NELEM),
                        &found);

    if (!IsUnderPostmaster)
    {
        /* Initialize shared memory area */
        scan_locations->head = &scan_locations->items[0];
        scan_locations->tail = &scan_locations->items[SYNC_SCAN_NELEM - 1];

        for (int i = 0; i < SYNC_SCAN_NELEM; i++)
        {
            ss_lru_item_t *item = &scan_locations->items[i];

            /*
             * Initialize all slots with invalid values. As scans are
             * started, these invalid entries will fall off the LRU list
             * and get replaced with real entries.
             */
            item->location.relfilenode.spcNode = InvalidOid;
            item->location.relfilenode.dbNode = InvalidOid;
            item->location.relfilenode.relNode = InvalidOid;
            item->location.location = InvalidBlockNumber;

            item->prev = (i > 0) ? (&scan_locations->items[i - 1]) : NULL;
            item->next = (i < SYNC_SCAN_NELEM - 1) ? (&scan_locations->items[i + 1]) : NULL;
        }
    }
    else
        Assert(found);
}
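The linking pattern used by the initialization can be tried outside PostgreSQL with a reduced model. In the sketch below everything is a stand-in chosen for illustration: NELEM replaces SYNC_SCAN_NELEM, an int key replaces the relfilenode (with -1 meaning invalid), and the list lives in ordinary process memory. list_check is a hypothetical helper that walks the list and verifies that the forward and backward links agree.

```c
#include <assert.h>
#include <stddef.h>

#define NELEM 4                 /* hypothetical size; stands in for SYNC_SCAN_NELEM */

typedef struct item_t
{
    struct item_t *prev;
    struct item_t *next;
    int         key;            /* stands in for the relfilenode; -1 = invalid */
} item_t;

typedef struct list_t
{
    item_t     *head;
    item_t     *tail;
    item_t      items[NELEM];
} list_t;

/* Chain the array slots into a doubly linked list, in array order. */
static void
list_init(list_t *list)
{
    assert(list != NULL);

    list->head = &list->items[0];
    list->tail = &list->items[NELEM - 1];

    for (int i = 0; i < NELEM; i++)
    {
        item_t     *item = &list->items[i];

        item->key = -1;         /* invalid until a scan claims the slot */
        item->prev = (i > 0) ? &list->items[i - 1] : NULL;
        item->next = (i < NELEM - 1) ? &list->items[i + 1] : NULL;
    }
}

/*
 * Walk from head to tail and return the number of items, or -1 if a
 * back link is wrong or the walk does not end at the tail.
 */
static int
list_check(const list_t *list)
{
    int         n = 0;
    const item_t *prev = NULL;

    for (const item_t *it = list->head; it != NULL; it = it->next)
    {
        if (it->prev != prev)
            return -1;
        prev = it;
        n++;
    }
    return (prev == list->tail) ? n : -1;
}
```

After list_init, list_check returns NELEM, the head has no predecessor and the tail has no successor, which is the state the lookup code relies on.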

ss_search searches the scan_locations structure for an entry with the given relfilenode. If “set” is true, the location is updated to the given location. If no entry for the given relfilenode is found, one is created with the given location, even if “set” is false: the function takes over the least recently used entry at the tail of the list and moves it to the head. In any case, the location after the possible update is returned. The caller is responsible for having acquired a suitable lock on the shared data structure.

static BlockNumber
ss_search(RelFileNode relfilenode, BlockNumber location, bool set)
{
    ss_lru_item_t *item = scan_locations->head;

    for (;;)
    {
        bool        match = RelFileNodeEquals(item->location.relfilenode, relfilenode);

        if (match || item->next == NULL)
        {
            /*
             * If we reached the end of list and no match was found, take
             * over the last entry
             */
            if (!match)
            {
                item->location.relfilenode = relfilenode;
                item->location.location = location;
            }
            else if (set)
                item->location.location = location;

            /* Move the entry to the front of the LRU list */
            if (item != scan_locations->head)
            {
                /* unlink */
                if (item == scan_locations->tail)
                    scan_locations->tail = item->prev;
                item->prev->next = item->next;
                if (item->next)
                    item->next->prev = item->prev;

                /* link */
                item->prev = NULL;
                item->next = scan_locations->head;
                scan_locations->head->prev = item;
                scan_locations->head = item;
            }

            return item->location.location;
        }

        item = item->next;
    }

    /* not reached */
}
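To see the move-to-front and eviction behaviour in isolation, the same logic can be run on a three-slot list. This is a reduced model under stated assumptions, not the PostgreSQL function: lru_init and lru_search are hypothetical names, an int key stands in for the relfilenode, unsigned stands in for BlockNumber, and the list is kept in file-level static variables without any locking.

```c
#include <assert.h>
#include <stddef.h>

#define NELEM 3                 /* hypothetical, deliberately tiny */
#define INVALID_KEY (-1)        /* stands in for an invalid relfilenode */

typedef struct item_t
{
    struct item_t *prev;
    struct item_t *next;
    int         key;            /* stands in for the relfilenode */
    unsigned    location;       /* stands in for the BlockNumber */
} item_t;

static item_t items[NELEM];
static item_t *head;
static item_t *tail;

static void
lru_init(void)
{
    head = &items[0];
    tail = &items[NELEM - 1];
    for (int i = 0; i < NELEM; i++)
    {
        items[i].key = INVALID_KEY;
        items[i].location = 0;
        items[i].prev = (i > 0) ? &items[i - 1] : NULL;
        items[i].next = (i < NELEM - 1) ? &items[i + 1] : NULL;
    }
}

/* Same control flow as ss_search, on the simplified types above. */
static unsigned
lru_search(int key, unsigned location, int set)
{
    item_t     *item = head;

    assert(key != INVALID_KEY);
    for (;;)
    {
        int         match = (item->key == key);

        if (match || item->next == NULL)
        {
            if (!match)
            {
                /* miss: recycle the least recently used slot */
                item->key = key;
                item->location = location;
            }
            else if (set)
                item->location = location;

            if (item != head)
            {
                /* unlink, then relink at the front */
                if (item == tail)
                    tail = item->prev;
                item->prev->next = item->next;
                if (item->next)
                    item->next->prev = item->prev;
                item->prev = NULL;
                item->next = head;
                head->prev = item;
                head = item;
            }
            return item->location;
        }
        item = item->next;
    }
}
```

A lookup for an unknown key creates the entry even when set is 0, a later lookup with set equal to 0 leaves the stored location untouched, and once more than NELEM distinct keys have been seen the least recently used one is forgotten and restarts from whatever location the next lookup supplies.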

ss_get_location

ss_get_location gets the optimal starting location for a scan. It returns the last-reported location of a sequential scan on the relation, or 0 if no valid location is found. We expect the caller has just done RelationGetNumberOfBlocks(), and so that number is passed in rather than computed again. The result is guaranteed to be less than relnblocks (assuming that’s > 0).

BlockNumber
ss_get_location(Relation rel, BlockNumber relnblocks)
{
    BlockNumber startloc;

    LWLockAcquire(SyncScanLock, LW_EXCLUSIVE);
    startloc = ss_search(rel->rd_node, 0, false);
    LWLockRelease(SyncScanLock);

    /*
     * If the location is not a valid block number for this scan, start at
     * 0. This can happen if for instance a VACUUM truncated the table
     * since the location was saved.
     */
    if (startloc >= relnblocks)
        startloc = 0;

    return startloc;
}
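The final range check is small but easy to get wrong at the boundaries, so here it is as a standalone function. clamp_start_location is a hypothetical name used only for this sketch, and BlockNumber is again a local 32-bit stand-in.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* local stand-in for PostgreSQL's BlockNumber */

/*
 * Hypothetical helper mirroring the last step of ss_get_location: a
 * reported location at or beyond the current end of the relation is
 * stale, so the scan falls back to block 0.
 */
static BlockNumber
clamp_start_location(BlockNumber reported, BlockNumber relnblocks)
{
    BlockNumber result = (reported >= relnblocks) ? 0 : reported;

    /* The guarantee only holds for a non-empty relation. */
    assert(relnblocks == 0 || result < relnblocks);
    return result;
}
```

For a relation of 1000 blocks, a reported location of 500 is kept, while 1000 or 1500 (for example after a truncation) falls back to 0. For an empty relation the result is 0, which is why the guarantee is stated only for relnblocks > 0.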

ss_report_location

ss_report_location updates the current scan location. It writes an entry of the form (relfilenode, blocknumber) into the shared Sync Scan state, overwriting any existing entry for the same relfilenode.

void
ss_report_location(Relation rel, BlockNumber location)
{
    /*
     * To reduce lock contention, only report scan progress every N pages.
     * For the same reason, don't block if the lock isn't immediately
     * available. Missing a few updates isn't critical, it just means that
     * a new scan that wants to join the pack will start a little bit
     * behind the head of the scan. Hopefully the pages are still in OS
     * cache and the scan catches up quickly.
     */
    if ((location % SYNC_SCAN_REPORT_INTERVAL) == 0)
    {
        if (LWLockConditionalAcquire(SyncScanLock, LW_EXCLUSIVE))
        {
            (void) ss_search(rel->rd_node, location, true);
            LWLockRelease(SyncScanLock);
        }
    }
}
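The two throttling rules, report only on interval boundaries and never wait for the lock, can be modelled without any PostgreSQL infrastructure. In the sketch below report_location and count_reports are hypothetical helpers, the lock is reduced to a boolean supplied by the caller, and the constants are assumptions: an 8 kB block size and a reporting interval of 128 kB worth of blocks, which gives 16 blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* local stand-in for PostgreSQL's BlockNumber */

#define BLCKSZ 8192             /* assumed default block size */
#define SYNC_SCAN_REPORT_INTERVAL (128 * 1024 / BLCKSZ)    /* 16 blocks here */

/*
 * Hypothetical model of ss_report_location. 'lock_available' plays the
 * role of a conditional lock acquisition that either succeeds at once
 * or fails. Returns 1 if the shared location was updated, 0 if the
 * report was skipped.
 */
static int
report_location(BlockNumber location, int lock_available,
                BlockNumber *shared_location)
{
    assert(shared_location != NULL);

    if ((location % SYNC_SCAN_REPORT_INTERVAL) != 0)
        return 0;               /* not on a reporting boundary */
    if (!lock_available)
        return 0;               /* lock busy: skip rather than wait */

    *shared_location = location;
    return 1;
}

/* Count the reports made by one full pass over nblocks with a free lock. */
static unsigned
count_reports(BlockNumber nblocks)
{
    BlockNumber shared = 0;
    unsigned    reports = 0;

    for (BlockNumber blk = 0; blk < nblocks; blk++)
        reports += (unsigned) report_location(blk, 1, &shared);
    return reports;
}
```

Under these assumptions a pass over 1000 blocks touches the shared state 63 times instead of 1000, and a skipped report simply leaves the previous location in place until the next boundary.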
