PostgreSQL TableAM: the HeapAM synchronized scan machinery
创始人
2024-05-07 10:19:34

When multiple backends run a sequential scan on the same table, we try to keep them synchronized to reduce the overall I/O needed. The goal is to read each page into the shared buffer cache only once, and to let all backends that take part in the shared scan process the page before it falls out of the cache.

Since the "leader" in a pack of backends doing a seqscan has to wait for I/O while the "followers" don't, there is a strong self-synchronizing effect once the backends are examining approximately the same part of the table at the same time. Hence all that is really needed is to get a new backend that begins a seqscan to begin it close to where the other backends are reading. We can scan the table circularly, from block X up to the end and then from block 0 to X-1, to ensure we visit all rows while still participating in the common scan.
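The circular visiting order described above can be sketched as follows. This is a minimal illustration, not code from PostgreSQL: `next_block_circular` and `circular_scan_order` are hypothetical helper names, and `BlockNumber` is a stand-in typedef for the PostgreSQL type of the same name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's BlockNumber */

/*
 * Return the block to visit after `current` in a circular scan over
 * `nblocks` blocks, wrapping from the last block back to block 0.
 */
static BlockNumber
next_block_circular(BlockNumber current, BlockNumber nblocks)
{
    assert(nblocks > 0 && current < nblocks);
    return (current + 1 == nblocks) ? 0 : current + 1;
}

/*
 * Fill `order` (which must have room for nblocks entries) with the visit
 * order of a scan that starts at block `start`. Returns the number of
 * blocks visited, which is always nblocks.
 */
static BlockNumber
circular_scan_order(BlockNumber start, BlockNumber nblocks, BlockNumber *order)
{
    BlockNumber visited = 0;
    BlockNumber blk = start;

    assert(nblocks > 0 && start < nblocks);
    do
    {
        order[visited++] = blk;
        blk = next_block_circular(blk, nblocks);
    } while (blk != start);     /* stop once we are back at the start block */

    return visited;
}
```

Starting at block 3 of a 5-block table, the sketch visits blocks 3, 4, 0, 1, 2: every block exactly once, regardless of where the scan joined.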

To accomplish that, we keep track of the scan position of each table, and start new scans close to where the previous scan(s) are. We don't try to do any extra synchronization to keep the scans together afterwards; some scans might progress much more slowly than others, for example if the results need to be transferred to the client over a slow network, and we don't want such queries to slow down others. There can realistically only be a few large sequential scans on different tables in progress at any time. Therefore we just keep the scan positions in a small LRU list, which we scan every time we need to look up or update a scan position. The whole mechanism is only applied to tables exceeding a threshold size (but that is not the concern of this module).

SYNC SCAN LRU list

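The ss_scan_locations_t struct is the sync scan LRU list; items is a flexible array member that holds SYNC_SCAN_NELEM entries. ss_scan_locations_t.head and .tail point to the first and last elements of the LRU list. The ss_lru_item_t entries are organized as a doubly linked list, and the actual payload of each entry is an ss_scan_location_t struct.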

typedef struct ss_scan_location_t
{
    RelFileNode relfilenode;    /* identity of a relation */
    BlockNumber location;       /* last-reported location in the relation */
} ss_scan_location_t;

typedef struct ss_lru_item_t
{
    struct ss_lru_item_t *prev;
    struct ss_lru_item_t *next;
    ss_scan_location_t location;
} ss_lru_item_t;

typedef struct ss_scan_locations_t
{
    ss_lru_item_t *head;
    ss_lru_item_t *tail;
    ss_lru_item_t items[FLEXIBLE_ARRAY_MEMBER]; /* SYNC_SCAN_NELEM items */
} ss_scan_locations_t;

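scan_locations points to the sync scan LRU list in shared memory. During initialization the head pointer is set to the first element of the scan_locations->items array and the tail pointer to the last. Every array element is initialized with invalid values, and the ss_lru_item_t entries are linked to their neighbours to form the doubly linked list.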

static ss_scan_locations_t *scan_locations; /* Pointer to struct in shared memory */

void
SyncScanShmemInit(void)
{
    bool        found;

    scan_locations = (ss_scan_locations_t *)
        ShmemInitStruct("Sync Scan Locations List",
                        SizeOfScanLocations(SYNC_SCAN_NELEM),
                        &found);

    if (!IsUnderPostmaster)
    {
        /* Initialize shared memory area */
        scan_locations->head = &scan_locations->items[0];
        scan_locations->tail = &scan_locations->items[SYNC_SCAN_NELEM - 1];

        for (int i = 0; i < SYNC_SCAN_NELEM; i++)
        {
            ss_lru_item_t *item = &scan_locations->items[i];

            /*
             * Initialize all slots with invalid values. As scans are
             * started, these invalid entries will fall off the LRU list
             * and get replaced with real entries.
             */
            item->location.relfilenode.spcNode = InvalidOid;
            item->location.relfilenode.dbNode = InvalidOid;
            item->location.relfilenode.relNode = InvalidOid;
            item->location.location = InvalidBlockNumber;

            item->prev = (i > 0) ?
                (&scan_locations->items[i - 1]) : NULL;
            item->next = (i < SYNC_SCAN_NELEM - 1) ?
                (&scan_locations->items[i + 1]) : NULL;
        }
    }
    else
        Assert(found);
}
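The excerpt does not show how SizeOfScanLocations is defined. A common way to size a struct that ends in a flexible array member is offsetof of the array plus the element count times the element size, and the sketch below assumes that approach. All `fam_` names are hypothetical, malloc stands in for the shared memory allocation, and the payload is reduced to an int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct fam_item
{
    struct fam_item *prev;
    struct fam_item *next;
    int         payload;        /* stands in for ss_scan_location_t */
} fam_item;

typedef struct fam_list
{
    fam_item   *head;
    fam_item   *tail;
    fam_item    items[];        /* flexible array member */
} fam_list;

/* Bytes needed for a list header plus n trailing items (assumed formula). */
#define FAM_SIZE_OF_LIST(n) \
    (offsetof(fam_list, items) + (n) * sizeof(fam_item))

/* Allocate a list with n items and chain them in array order. */
static fam_list *
fam_list_create(int n)
{
    fam_list   *list;

    assert(n > 0);
    list = malloc(FAM_SIZE_OF_LIST((size_t) n));
    if (list == NULL)
        return NULL;

    list->head = &list->items[0];
    list->tail = &list->items[n - 1];
    for (int i = 0; i < n; i++)
    {
        fam_item   *item = &list->items[i];

        item->payload = -1;     /* "invalid" marker */
        item->prev = (i > 0) ? &list->items[i - 1] : NULL;
        item->next = (i < n - 1) ? &list->items[i + 1] : NULL;
    }
    return list;
}

/* Count the items reachable by walking forward from head. */
static int
fam_list_length(const fam_list *list)
{
    int         len = 0;

    for (const fam_item *item = list->head; item != NULL; item = item->next)
        len++;
    return len;
}
```

Because all items live in one contiguous allocation and are only relinked, never freed, the list keeps a fixed size for its whole lifetime, which suits a fixed-size shared memory segment.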

ss_search: search the scan_locations structure for an entry with the given relfilenode. If "set" is true, the location is updated to the given location. If no entry for the given relfilenode is found, it will be created at the head of the list with the given location, even if "set" is false. To make room, the least recently used entry at the tail of the list is overwritten. In any case, the location after possible update is returned. The caller is responsible for having acquired a suitable lock on the shared data structure.

static BlockNumber
ss_search(RelFileNode relfilenode, BlockNumber location, bool set)
{
    ss_lru_item_t *item = scan_locations->head;

    for (;;)
    {
        bool        match = RelFileNodeEquals(item->location.relfilenode,
                                              relfilenode);

        if (match || item->next == NULL)
        {
            /*
             * If we reached the end of list and no match was found, take
             * over the last entry
             */
            if (!match)
            {
                item->location.relfilenode = relfilenode;
                item->location.location = location;
            }
            else if (set)
                item->location.location = location;

            /* Move the entry to the front of the LRU list */
            if (item != scan_locations->head)
            {
                /* unlink */
                if (item == scan_locations->tail)
                    scan_locations->tail = item->prev;
                item->prev->next = item->next;
                if (item->next)
                    item->next->prev = item->prev;

                /* link */
                item->prev = NULL;
                item->next = scan_locations->head;
                scan_locations->head->prev = item;
                scan_locations->head = item;
            }

            return item->location.location;
        }

        item = item->next;
    }

    /* not reached */
}
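To see the lookup, takeover, and move-to-front behaviour in isolation, here is a self-contained sketch that mirrors the logic of ss_search with simplified stand-ins. The `demo_` names are hypothetical, an int key replaces RelFileNode, the list lives in ordinary memory rather than shared memory, and the list has only 3 slots so that eviction is easy to observe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NELEM 3            /* hypothetical size, small on purpose */
#define DEMO_INVALID_KEY (-1)

typedef struct demo_item
{
    struct demo_item *prev;
    struct demo_item *next;
    int         key;            /* stands in for RelFileNode */
    unsigned    loc;            /* stands in for the BlockNumber location */
} demo_item;

typedef struct demo_list
{
    demo_item  *head;
    demo_item  *tail;
    demo_item   items[DEMO_NELEM];
} demo_list;

/* Chain all slots in array order and mark them invalid. */
static void
demo_init(demo_list *list)
{
    list->head = &list->items[0];
    list->tail = &list->items[DEMO_NELEM - 1];
    for (int i = 0; i < DEMO_NELEM; i++)
    {
        demo_item  *item = &list->items[i];

        item->key = DEMO_INVALID_KEY;
        item->loc = 0;
        item->prev = (i > 0) ? &list->items[i - 1] : NULL;
        item->next = (i < DEMO_NELEM - 1) ? &list->items[i + 1] : NULL;
    }
}

/* Same control flow as ss_search, on the simplified types above. */
static unsigned
demo_search(demo_list *list, int key, unsigned loc, bool set)
{
    demo_item  *item = list->head;

    for (;;)
    {
        bool        match = (item->key == key);

        if (match || item->next == NULL)
        {
            if (!match)
            {
                /* no match: take over the least recently used slot */
                item->key = key;
                item->loc = loc;
            }
            else if (set)
                item->loc = loc;

            if (item != list->head)
            {
                /* unlink from the current position */
                if (item == list->tail)
                    list->tail = item->prev;
                item->prev->next = item->next;
                if (item->next)
                    item->next->prev = item->prev;

                /* relink at the front */
                item->prev = NULL;
                item->next = list->head;
                list->head->prev = item;
                list->head = item;
            }
            return item->loc;
        }
        item = item->next;
    }
}
```

With three slots, inserting keys 10, 20, 30 and then 40 (touching 10 again in between) pushes key 20 out of the list, so a later lookup of 20 with set = false returns whatever location the caller passed in rather than the previously stored 200. Every lookup walks the list linearly, which is acceptable only because the list is kept short.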

ss_get_location

ss_get_location: get the optimal starting location for a scan. Returns the last-reported location of a sequential scan on the relation, or 0 if no valid location is found. We expect the caller has just done RelationGetNumberOfBlocks(), and so that number is passed in rather than computing it again. The result is guaranteed to be less than relnblocks (assuming that's > 0).

BlockNumber
ss_get_location(Relation rel, BlockNumber relnblocks)
{
    BlockNumber startloc;

    LWLockAcquire(SyncScanLock, LW_EXCLUSIVE);
    startloc = ss_search(rel->rd_node, 0, false);
    LWLockRelease(SyncScanLock);

    /*
     * If the location is not a valid block number for this scan, start at
     * 0. This can happen if for instance a VACUUM truncated the table
     * since the location was saved.
     */
    if (startloc >= relnblocks)
        startloc = 0;

    return startloc;
}

ss_report_location

ss_report_location: update the current scan location. Writes an entry of the form (relfilenode, blocknumber) into the shared sync scan state, overwriting any existing entry for the same relfilenode.

void
ss_report_location(Relation rel, BlockNumber location)
{
    /*
     * To reduce lock contention, only report scan progress every N pages.
     * For the same reason, don't block if the lock isn't immediately
     * available. Missing a few updates isn't critical, it just means that
     * a new scan that wants to join the pack will start a little bit
     * behind the head of the scan. Hopefully the pages are still in OS
     * cache and the scan catches up quickly.
     */
    if ((location % SYNC_SCAN_REPORT_INTERVAL) == 0)
    {
        if (LWLockConditionalAcquire(SyncScanLock, LW_EXCLUSIVE))
        {
            (void) ss_search(rel->rd_node, location, true);
            LWLockRelease(SyncScanLock);
        }
    }
}
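The effect of the reporting interval can be estimated with a small sketch. The value of SYNC_SCAN_REPORT_INTERVAL is not shown in the excerpt; the code below assumes it corresponds to 128 kB worth of pages, which gives 16 with 8 kB blocks. Treat that number as an assumption to check against syncscan.c. The `DEMO_` constants and both helper functions are hypothetical, and the lock acquisition is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's BlockNumber */

#define DEMO_BLCKSZ 8192                                 /* assumed block size */
#define DEMO_REPORT_INTERVAL (128 * 1024 / DEMO_BLCKSZ)  /* assumed: 16 pages */

/* True when a scan positioned at `location` would try to report it. */
static bool
should_report(BlockNumber location)
{
    return (location % DEMO_REPORT_INTERVAL) == 0;
}

/*
 * Count the report attempts made by one full circular scan of `nblocks`
 * blocks that starts at block `start`.
 */
static unsigned
count_reports(BlockNumber start, BlockNumber nblocks)
{
    unsigned    reports = 0;
    BlockNumber blk = start;

    assert(nblocks > 0 && start < nblocks);
    do
    {
        if (should_report(blk))
            reports++;
        blk = (blk + 1 == nblocks) ? 0 : blk + 1;    /* wrap at the end */
    } while (blk != start);

    return reports;
}
```

Under these assumptions a full scan of a 100-block table attempts 7 reports (at blocks 0, 16, 32, 48, 64, 80 and 96) wherever it starts, instead of 100. In the real function some of those attempts are additionally skipped when the conditional lock acquisition fails.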
