Conversation
In pursuit of removing a Valgrind-detected leak, I inserted "pfree(pq_mq_handle);" into mq_putmessage's recursion-trouble-recovery code path, failing to notice that shm_mq_detach would have pfree'd that block just before (i.e., this particular code path did not leak). So now that was a double pfree. We didn't notice because the recursion scenario isn't exercised in our regression tests, but Alexander Lakhin found it via code fuzzing. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/b8b40954-e155-41b3-9af8-ad4f261a1b64@gmail.com
pgxmlNodeSetToText() passed nodeTab[i]->doc to xmlNodeDump() without checking the node type, which could cause a crash as a XML_NAMESPACE_DECL maps to a xmlNs struct. The passed-in code would then be dereferenced in xmlNodeDump(). This commit switches the code to render XML_NAMESPACE_DECL nodes with xmlXPathCastNodeToString(), like xpath_table(). Some tests are added, written by me. Author: Andrey Chernyy <andrey.cherny@tantorlabs.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20260611031436.5afde3cb@andrnote Backpatch-through: 14
The subscription option max_retention_duration accepts an integer value representing a timeout in milliseconds, where zero means unlimited retention (no timeout). Negative values have no useful meaning, but were silently accepted and stored in the subscription catalog. A negative value causes should_stop_conflict_info_retention() to always return true, because TimestampDifferenceExceeds() treats a negative threshold as already exceeded. This stops dead tuple retention immediately rather than honoring the configured timeout. Fix by rejecting negative values for max_retention_duration during CREATE SUBSCRIPTION and ALTER SUBSCRIPTION. Author: Chao Li <lic@highgo.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/9232401A-DEEE-49E1-9D11-D14A776DB82B@gmail.com
During promotion, there is a window where RecoveryInProgress() returns true but the WAL segments of the old timeline have already been removed. A logical decoding could pick up the old timeline in this window when reading a page, failing with the following error: ERROR: requested WAL segment ... has already been removed This issue does not lead to any data correctness issue, as retrying to decode the data works in follow-up decoding attempts. It impacts availability, though. Other WAL page read callbacks have a similar issue, this commit takes care of what should be the noisiest code path: logical decoding with START_REPLICATION in a WAL sender. A TAP test, based on an injection point waiting in the startup process after the segments have been removed/recycled, is added. This part is backpatched down to v17. This issue has been causing sporadic failures in the buildfarm, and was reproducible manually. This issue happens since logical decoding on standbys exists, down to v16. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/7daef094-abf3-4672-bc23-3df4763b16a3@gmail.com Backpatch-through: 16
When printing the upper boundary of a seg interval, seg_out() decided
whether to emit the certainty indicator ('<', '>' or '~') by testing the
upper indicator (u_ext) for '<' and '>', but mistakenly tested the lower
indicator (l_ext) for '~'. This is a copy-and-paste slip from the
symmetric code that prints the lower boundary a few lines above.
The consequences for valid input were:
* A '~' on the upper boundary was dropped on output, e.g.
'1.5 .. ~2.5'::seg printed as '1.5 .. 2.5'.
* When the lower boundary carried '~' but the upper boundary had no
indicator, the wrong test matched and sprintf(p, "%c", seg->u_ext)
wrote a NUL byte (u_ext == '\0'), which truncated the result string
and silently lost the entire upper boundary, e.g.
'~6.5 .. 8.5'::seg printed as '~6.5 .. '.
Certainty indicators are documented to be preserved on output (they are
ignored by the operators, but kept as comments), so this broke the
input/output round-trip for the affected values.
The bug has existed since seg was added. It went unnoticed because the
existing regression tests only exercised certainty indicators on
single-point segs, which are printed by a different branch of seg_out().
Add tests that place indicators on both boundaries of an interval.
Author: Ewan Young <kdbase.hack@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAON2xHPYeRRCEVAv8XfE18KsEsEHCiYcJ5fOsoxFuMEfpxF1=g@mail.gmail.com
Backpatch-through: 14
When parsing expressions like (old).colname and (old).* in a RETURNING list, the parser would lose track of the intended varreturningtype, and therefore return incorrect results. The root cause was code using GetNSItemByRangeTablePosn() to find a namespace item from its rtindex and levelsup, without taking into account returningtype, which would return the wrong namespace item. Fix by adding a new function GetNSItemByVar() that does take returningtype into account. Backpatch to v18, where support for RETURNING OLD/NEW was added. Bug: #19516 Reported-by: Marko Grujic <markoog@gmail.com> Author: Marko Grujic <markoog@gmail.com> Suggested-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAOvwyF2cO_5mAt=w=y-dFnaG5UkZ+3H8nSDoKF_iuWZHsU2ARg@mail.gmail.com Backpatch-through: 18
transformJsonParseArg() was not careful enough on generation of transformed expressions when starting from expressions that are not coercible to text but are in the string type category: it failed to verify that coerce_to_target_type() succeeds, and returned a NULL pointer. This leads to a later NULL dereference and crash at executor time. This escaped noticed because it cannot happen for built-in types, all of which have casts to text. Only user-created types are potentially problematic. Fix by raising an error when a cast to text doesn't exist. This mistake came in with commit 6ee3020. Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Reported-by: Chi Zhang <798604270@qq.com> Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com> Backpatch-through: 16 Discussion: https://postgr.es/m/19491-7aafc221ec63f288@postgresql.org
In a few places, we were constructing translatable strings consisting of elements list by adding one element at a time and separately a comma. This is not great from a translation point of view, so rewrite to append the comma together with the corresponding element in one go. Author: Peter Smith <smithpb2250@gmail.com> Author: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CAHut+Pvp7jYcaiZ3pXedXgLcWZWDBLXFUK05JtZpGv3Mj=UOjw@mail.gmail.com
Since this is trying to add a SharedInvalRelSyncMsg rather than a SharedInvalRelcacheMsg, it should use rs rather than rc. This makes no difference as things stand, because the two structure definitions are identical (except for the capitalization of "relid"), but it's still a good idea to fix it. Co-authored-by: Stolpovskikh Danil <d.stolpovskikh@ftdata.ru> Co-authored-by: Robert Haas <rhaas@postgresql.org> Discussion: http://postgr.es/m/bd6a5735b72b4afe99af49c3c62901d6@localhost.localdomain
MD5 authentication warnings are queued during authentication, before startup options and role/database settings have been applied. The code checked md5_password_warnings at queue time, so settings such as ALTER ROLE ... SET md5_password_warnings = off did not suppress the warning, even though the established session showed the GUC as off. Keep the connection-warning infrastructure generic by allowing each queued warning to carry an optional filter callback. Evaluate that callback when warnings are emitted, after startup options and role/database settings have been processed. Use this for MD5 authentication warnings, while leaving password expiration warnings unchanged. Add test coverage for an MD5-authenticated role with md5_password_warnings disabled. Author: Chao Li <lic@highgo.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/AE46E42D-5966-4D76-9E64-95EAB01B9FB5@gmail.com
When amcheck validates that a B-Tree metapage's allequalimage flag matches _bt_allequalimage(), it could fail to report corruption unless one of the index key columns used interval_ops. As a result, pg_amcheck could silently miss this corruption on other opclasses, incorrectly reporting the index as valid. The mistake was that bt_index_check_callback() kept ereport(ERROR) inside the loop that scans key attributes for INTERVAL_BTREE_FAM_OID, even though that loop is only needed to decide whether to add the interval-specific hint. This commit moves ereport() out of the loop so allequalimage mismatches are always reported, while still emitting the hint for affected interval indexes. Back-patch to v18, where d70b176 introduced this regression while moving the check into bt_index_check_callback(). Author: Chao Li <lic@highgo.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/011ACC9C-CB87-4160-ACE7-4ED57AB86E15@gmail.com Backpatch-through: 18
xpath() attempted to call xmlCopyNode() and xmlNodeDump() on a
XML_NAMESPACE_DECL, finishing with a confusing error:
=# SELECT xpath('//namespace::foo', '<root xmlns:foo="http://127.0.0.1"/>');
ERROR: 53200: could not copy node
CONTEXT: SQL function "xpath" statement 1
xpath() is changed so as it goes through xmlXPathCastNodeToString()
instead, that is able to handle namespace nodes. xml2 uses the same
solution. This issue has been discovered while digging into
9d33a5a.
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aioT7ui_ZJ9RMlfM@paquier.xyz
Backpatch-through: 14
The FK fast-path batching added in b7b27eb wrote the incoming row into the batch array before checking whether the array was full: fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot); fpentry->batch_count++; if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE) ri_FastPathBatchFlush(fpentry, fk_rel, riinfo); batch_count is reset to zero only at the end of ri_FastPathBatchFlush(), so it remains at RI_FASTPATH_BATCH_SIZE throughout a full-batch flush. A flush runs user-defined cast functions and equality operators; if that user code performs DML on the same FK table, ri_FastPathBatchAdd() re-enters with batch_count == RI_FASTPATH_BATCH_SIZE and writes one past the end of the array, corrupting the adjacent batch_count field. This is reachable by an unprivileged table owner via an implicit cast with a PL/pgSQL function and causes a SIGSEGV in assert-enabled builds. Fix by bounds-checking the write into the batch array so a re-entrant add can never write past the end, and by adding a "flushing" flag to RI_FastPathEntry that routes re-entrant ri_FastPathBatchAdd() calls on a busy entry to the per-row path (ri_FastPathCheck) instead of touching the mid-flush batch array. The flag is set around the probe in ri_FastPathBatchFlush() and cleared in a PG_FINALLY, which also resets batch_count, so the entry is left empty and reusable if a flush error (including a reported FK violation) is caught by a savepoint. Add regression tests for both the re-entrant flush and reuse of an entry after a flush error caught by a savepoint. Reported-by: Nikolay Samokhvalov <nik@postgres.ai> Reviewed-by: Nikolay Samokhvalov <nik@postgres.ai> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
Commit a70bce4 added instructions on how to recover if PostgreSQL refuses to issue new transaction IDs because of imminent wraparound, but when describing how to find replication slots that should be dropped, it referred to pg_stat_replication where it should have referenced pg_replication_slots. In passing, decorate references to views with <structname> tags. Backpatch to all supported versions. Reported-By: Sanjaya Waruna <sanjaya.waruna@gmail.com> Author: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/176767268098.1084085.10345048667224193115@wrigleys.postgresql.org Backpatch-through: 14
The FK fast-path batching added in b7b27eb buffers rows in a transaction-lived cache (ri_fastpath_cache) keyed by constraint OID. Running user-defined cast and equality functions during a batch flush, together with the cache's lifetime and iteration, exposed two defects reachable by an unprivileged table owner. First, on subtransaction abort ri_FastPathSubXactCallback discarded the entire cache. An entry's batch holds rows buffered by the enclosing transaction, not just the aborting subxact -- the cache is keyed by constraint, so a single entry can mix rows from multiple subxact levels. An internal subxact abort during after-trigger firing (e.g. a PL/pgSQL BEGIN ... EXCEPTION block) therefore dropped buffered rows of the outer transaction without running their FK checks, letting orphan rows commit behind a constraint that still reported itself valid. The discard also left relations opened by the batch unclosed, producing "resource was not closed" warnings. Second, ri_FastPathEndBatch flushes by iterating the cache with hash_seq_search. If flush-time user code inserts into a different fast-path FK table, a new entry is added to the cache mid-scan; it may land in a bucket the scan has already passed and never be reached, and ri_FastPathTeardown then destroys the cache without flushing it, silently dropping that check. Cleanly unwinding the cache on subxact abort would require tracking the originating subxact of each buffered row, since rows from different levels share an entry (the cache is keyed by constraint) and deferred constraints cannot be flushed early at a subxact boundary. Rather than add that bookkeeping, confine batching to the top transaction level: in RI_FKey_check, when GetCurrentTransactionNestLevel() > 1, use the per-row fast path (ri_FastPathCheck) instead of buffering. Rows checked inside a subtransaction are then verified immediately and roll back cleanly with their subtransaction, and the cache only ever holds top-level rows. With the cache confined to the top level, a subtransaction abort has nothing of its own to discard, so ri_FastPathSubXactCallback is removed along with its registration. For the second defect, add a cache-wide flag (ri_fastpath_flushing) set while ri_FastPathEndBatch iterates the cache. A re-entrant FK check arriving while the flag is set takes the per-row path rather than adding an entry to the cache being scanned, so no entry can be missed and torn down unflushed. The flag is cleared in a PG_FINALLY so a flush that throws (a reported violation or an error from user code) does not leave it stuck. As defensive insurance it is also cleared in ri_FastPathXactCallback() at transaction end. The per-row fast path still bypasses SPI and stays well ahead of the pre-19 SPI-based check. A fuller fix that preserves batching across subtransactions -- whether by tracking the originating subxact of each buffered row or by per-subxact cache stacks merged into the parent on commit -- is left for a future release. The subtransaction-abort case is covered by a new regression test. The mid-scan cross-table case depends on hash bucket placement and so is not reliably reproducible in a portable test, but the flag prevents it by construction. Reported-by: Nikolay Samokhvalov <nik@postgres.ai> Reviewed-by: Nikolay Samokhvalov <nik@postgres.ai> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
read_local_xlog_page_guts has the same race as logical_read_xlog_page: RecoveryInProgress() can return true during promotion, impacting the availability of the operations doing WAL page reads with this callback. This problem is similar to eb4e722 that has addressed the issue for logical replication, impacting more areas of the code where this WAL page callback can be used (same narrow window during promotion, same availability issue): - pg_walinspect. - Slot advance (SQL function). - Slot creation. Repack workers (v19~) and 2PC files (since forever) can also use this callback, but they are irrelevant as far as I know. A test is added with the SQL lookup functions. This part relies on injection points, and is backpatched down to v18, like the test added for eb4e722. This issue could probably be fixed as well in v14 and v15 for pg_walinspect. However, I also feel that there is a conservative argument about consistency here due to the support of logical decoding on standbys, so let's limit ourselves to v16 for now. pg_walinspect is used less in the field compared to the two other operations, making addressing this problem less attractive in these two older branches. Reported-by: Xuneng Zhou <xunengzhou@gmail.com> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/7daef094-abf3-4672-bc23-3df4763b16a3%40gmail.com Backpatch-through: 16
This one has been forgotten in 8bf257a. Per report from buildfarm member massasauga. Backpatch-through: 14
Similar to commit 3692a62, for a slightly different code pattern in psql. No backpatch to avoid disrupting translation in stable branches. Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/airjxKXx7aTG8kfE@alvherre.pgsql
Commit 0e1f1ed taught seg_out() to print the certainty indicator on an interval's upper boundary, but it was back-patched only as far as v14. When upgrading from an older release, the old server prints the one test_seg row exercising that case ('4.6 .. ~7.0') without the indicator, so the pre- and post-upgrade dumps do not match. Make AdjustUpgrade.pm delete just that row; seg's comparison function does distinguish the certainty indicators, so the otherwise identical row '4.6 .. 7.0' is unaffected. Back-patch to all supported branches. Per buildfarm members crake and fairywren. Discussion: https://postgr.es/m/5ccbdbde-6467-4a10-bf4d-0be73a05ce8d@dunslane.net
bt_normalize_tuple() uses VARSIZE() to get the size of varlena, even though it's not yet known, that it has a 4-byte header. Fix this by replacing a accessor with a universal VARSIZE_ANY(). Backpatch to all supported versions. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/7ckc7oka4bvafkf5bwlqs6ygrhlsbhz25ppozfch7zbuxcx3rf%40e4pr4oqenalc Author: Andrey Borodin <x4mmm@yandex-team.ru> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Backpatch-through: 14
Since the remote column names of a foreign table could be longer than NAMEDATALEN, remattrmap_cmp(), which compares such column names, should have used strcmp(), not strncmp() with n=NAMEDATALEN. Author: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com> Discussion: https://postgr.es/m/81D981EB-ECC1-495D-8EAC-5CFB67B2CF77%40gmail.com
Commit 2f70fdb removed the deprecated containment operator ~(aclitem[],aclitem) from the catalogs, but missed removing its entry from the documentation. (Arguably the blame should fall on c62dd80, which added this entry in contravention of the longstanding policy that we don't document deprecated aliases in the first place.) Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAOzEurQSyR5psWukyhUz1LtxyO55C2Vfp0Fmt8w2jGKxhszQmQ@mail.gmail.com Backpatch-through: 14
This commit reduces the number of expected output files for the "xml" test from three to two (well, mostly one, see below for details). xml_2.out existed to handle some differences in output due to libxml2 2.9.3, due to some error context missing (085423e). This file is removed, by tweaking the XML inputs to trigger the same error patterns for the problematic 2.9.3 and other libxml2 versions. This part is authored by Tom Lane. xml_1.out (no libxml2 support) is reduced in size by adding an \if query that exits the test early. This still checks NO_XML_SUPPORT() through xmlin(). The rest of the test is skipped if XML input cannot be handled by the backend. This part has been written by me. Author: Tom Lane <tgl@sss.pgh.pa.us> Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/aiu6CXO67q-s70n5@paquier.xyz Backpatch-through: 14
If a query has more than 7498 params, the ParameterDescription message exceeds the 30000 byte limit on messages that are not specifically marked as possibly being longer than that (VALID_LONG_MESSAGE_TYPE). To fix, add ParameterDescription to the list. Author: Ning Sun <classicning@gmail.com> Discussion: https://www.postgresql.org/message-id/dbfb4b65-0aa8-470a-8b87-b6496160b28a@gmail.com Backpatch-through: 14
Late-model clang complains that these functions should be labeled with "format(printf, 2, 3)", and it's right. But let's go a bit further and also make use of varargs, to remove duplication and allow these functions to be used with non-integer input values. Since no good deed goes unpunished, I had to also adjust a couple of call sites. They weren't wrong as-is, since the size_t-sized arguments were coerced to int on the way into diag3(). But without that, we have to adjust the format strings. The point of this is to suppress compiler warnings, so back-patch into branches containing pg_bsd_indent, even though there's no functional change. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Discussion: https://postgr.es/m/1645041.1781283554@sss.pgh.pa.us Backpatch-through: 16
The syntax "tablename *" has been obsolete for years, but we want to retain it and its documentation for backward compatibility reasons. However, the documentation wording was confusing and could be understood to mean that "tablename *" is the same as "ONLY tablename". Reported-by: Jochen Bandhauer <jochen.bandhauer@gmx.net> Author: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/178125831604.1285960.8250607197280951685@wrigleys.postgresql.org
Presently, the "Prev" link on the page for background workers sends you to the middle of the previous chapter instead of the actual previous page. This appears to be caused by a libxml2 bug, but regardless, a minimal fix is to change the link generation code to use [position()=last()] instead of [last()] in the predicate on the union of reverse axes. Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Discussion: https://postgr.es/m/aim4AZorFKaC7Wrf%40nathan Backpatch-through: 14
Our handling of quoting within replication commands was pretty
sloppy, typically looking like
appendStringInfo(&cmd, " SLOT \"%s\"", options->slotname);
This is fine as long as options->slotname doesn't contain a double
quote mark, but what if it does? In principle this'd allow injection
of harmful options into replication commands, in the probably-unlikely
case that a slot name comes from untrustworthy input. We ought to
clean that up.
Moreover, even the places that were trying to be more careful
generally got it wrong, because they used quoting subroutines
intended for SQL commands rather than something that will work
with the replication-command scanner repl_scanner.l. For example,
several places naively use PQescapeLiteral() to quote option values
for replication commands. If the string contains a backslash,
PQescapeLiteral() will produce E'...' literal syntax, which
repl_scanner.l doesn't recognize. Another near miss was to use
quote_identifier() to quote identifiers. That function won't quote
valid lowercase identifiers unless they match SQL keywords ... but in
this context, replication keywords are what matter. Neither of these
errors seem to risk string injection, but they definitely can cause
syntax errors in replication commands that ought to be valid.
We can clean all this up by using simple quoting logic that just
doubles single or double quotes respectively.
Or at least, we could if repl_scanner.l handled doubled double quotes
in identifiers, but for some reason it doesn't! So the first step in
this fix has to be to fix that. (The fact that we'll later reject
slot names containing double quotes is very far short of justifying
this omission.)
Having done that, this patch runs around and applies correct
quoting in all places that generate replication commands containing
strings coming from outside the immediate context. Probably some
of these places are safe because of restrictions elsewhere, but it
seems best to just quote all the time.
This was originally reported as a security bug, which it could be
if replication slot names or parameters were to originate from
untrustworthy sources. But the security team concluded that that
was a very improbable situation, so we're just going to fix this
as a regular bug.
Reported-by: Team Dhiutsa
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/1648659.1781287310@sss.pgh.pa.us
Backpatch-through: 14
Attempting to restore a schema, a table or an index with --only-statistics skipped all the statistics of the objects wanted. Like for pg_dump, statistics should be included, so this created an assymetry between dump and restore. A second set of problems existed for --table and --index, where the presence of --statistics skipped the restore of the stats of the object(s) targetted. This issue has been reported originally as related to an inconsistency with the way extended stats restore is handled in Postgres v19, but the issue is related to the restore of relation and attribute statistics in v18. Some TAP tests are added to cover all these cases. Reported-by: Chao Li <li.evan.chao@gmail.com> Author: Chao Li <li.evan.chao@gmail.com> Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://postgr.es/m/66E80CAB-527C-42B1-BB65-3F82CF4AD998@gmail.com Backpatch-through: 18
The schema_only_with_statistics test scenario was referenced in 002_pg_dump.pl, but was associated to no command sequence since 0ed92cf. Issue discovered while investigating a different bug. Perhaps this cleanup is not worth backpatching, but there is also an argument in favor of reducing noise when touching this area of the code in stable branches. Reviewed-by: Ewan Young <kdbase.hack@gmail.com> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Discussion: https://postgr.es/m/ai-y0S7Z25NlrG_n@paquier.xyz Backpatch-through: 18
The first "if (difffile)" block initializes the startpos variable, and the second "if (difffile)" block reads it. The second if-condition can only be true when the first one was true, so the startpos variable is always initialized when it's used. However, the compiler might not be able to deduce that, and warn about startpos being used uninitialized. To silence the warning, rearrange the if-checks. Also, bail out if the diff file cannot be opened, instead of ignoring it silently. Author: Mikhail Litsarev <m.litsarev@postgrespro.ru> Reviewed-by; Ewan Young <kdbase.hack@gmail.com> Discussion: https://www.postgresql.org/message-id/ee06f058c626cd37babd8c81579ffb1e@postgrespro.ru
Several calls of pgstat_count_io_op_time() have been used as data to count negative values returned by pg_pread() or pg_pwrite(), leading to an incorrect count reported, casting them back to uint64. Most of the problematic calls updated here are adjusted so as we do not report buggy negative numbers anymore. In xlogrecovery.c, the spot updated still counts short reads. In xlog.c, after a WAL segment initialization, I/O numbers are aggregated only after checking that the operation has succeeded. issues introduced by a051e71. Reported-by: Peter Eisentraut <peter@eisentraut.org> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Discussion: https://postgr.es/m/0db864e6-4477-4eba-b2be-d3523cc86564@eisentraut.org Backpatch-through: 18
The current form of the catalog query picks up partitioned tables with expression indexes that lack statistics. However, since such indexes never have statistics, there's no point in analyzing them. To fix, adjust the relevant part of the query to skip partitioned tables with expression indexes. While at it, remove the nearby stainherit check; entries for index expressions always have stainherit = false. Author: Baji Shaik <baji.pgdev@gmail.com> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://postgr.es/m/CA%2Bfm-RPE1tEc6CUUPDyRbYTz9tF5Kw47nnk-Zq%3DyYvanbsxyCQ%40mail.gmail.com Backpatch-through: 18
Add check_stack_depth() to Jsonb_to_SV, SV_to_JsonbValue, PLyObject_FromJsonbContainer, and PLyObject_ToJsonbValue. Without this, deeply nested JSONB values can crash the backend with SIGSEGV instead of raising a proper error. Also add CHECK_FOR_INTERRUPTS() to the while loop in SV_to_JsonbValue that dereferences chains of Perl references, so that a circular reference (e.g. $x = \$x) can be cancelled by the user instead of spinning indefinitely. (We looked at detecting such circular references, but it seems more trouble than it's worth.) Author: Aleksander Alekseev <aleksander@tigerdata.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAJ7c6TPbjkzUk4qJ5dHvDNEz0hBuFue3A-XWz_=897z+BC+z8A@mail.gmail.com Backpatch-through: 14
If the call count test fails, you'll reasonably want to know what the network trace looked like, but that information is currently swallowed. Print it out instead. Backpatch-through: 18
When debugging an OAuth trace, it's helpful to know what version of Curl is in use. The SSL library that Curl is using (which may not be the one in use by libpq) is also relevant, and it's just as easy to get, so print that too. This is being added post-feature-freeze, with RMT approval, in order to fix some tests in the face of an upstream Curl regression. A subsequent commit will make use of it in oauth_validator. Backpatch to 18 as well. Tested-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAOYmi%2B%3DkP86t%2BZFFXNQ9G6K4ht7utdmB%3DCzhP%3DZ2wvuBymOTtQ%40mail.gmail.com Backpatch-through: 18
The call-count test in 001_server.pl runs into a recent upstream
regression in Curl:
curl/curl#21547
The symptom is high CPU usage on some platforms during OAuth HTTP
requests. But it looks like the fix is on track for a June 2026 release,
as part of Curl 8.21.0, so just skip the test if we happen to be using
the broken version.
Reported-by: Andrew Dunstan <andrew@dunslane.net>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAOYmi%2B%3DyrwMSsHuNJ1V14isA4iSix5Xb3P3VEp1X0BS61MdV4A%40mail.gmail.com
Backpatch-through: 18
Previously, when retrieving the old Subscription object, constructing the conninfo could encounter an error during ForeignServerConnectionString(). ACL errors were handled properly, but other errors could interfere with a user fixing the problem with ALTER SUBSCRIPTION. Reported-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/D908370F-2695-4231-851D-17179A6A6F2A@gmail.com
Dumps will fail if standard_conforming_strings = off. Reported-by: Tom Lane Backpatch-through: 1131492.1781717349@sss.pgh.pa.us
Commit 16a0039 reduced the lock level for ALTER DOMAIN ... VALIDATE CONSTRAINT from ShareLock to ShareUpdateExclusiveLock. However, that change was unsafe. If DML on tables using the domain had already started and initialized domain constraint checks before a NOT VALID constraint was added, it could still insert or update rows that violated the new constraint. This commit reverts commit 16a0039 so that ALTER DOMAIN ... VALIDATE CONSTRAINT once again acquires ShareLock on relations using the domain. Also add an isolation test covering this case. Author: Chao Li <lic@highgo.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/463C0E1A-4A40-4BCA-839C-9236B80D65EE@gmail.com
Reported-by: Fujii Masao Discussion: https://postgr.es/m/CAHGQGwGjfu5n5H-jsMp6fAsf-zMJzYWDcJGti6pmr7SHf9BcpA@mail.gmail.com
The EXCEPT clause of a FOR ALL TABLES publication tracks each excluded table by its identity rather than by name. As a result, renaming a table or moving it to another schema with ALTER TABLE ... SET SCHEMA leaves the exclusion in place, and the table stays excluded from the publication. This behavior was not previously documented and could surprise users who might reasonably expect a schema-qualified exclusion to apply only while the table remains in that schema. Add a note to CREATE PUBLICATION to make the behavior explicit. Author: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAHut+PvQ5BqnawCQd6r1tqqd+iAJC-CuRY8wscuXSrpHGUzofA@mail.gmail.com
pgstat_drop_entry_internal() generates an ERROR if facing a pgstats entry already marked as dropped. With a workload doing a lot of concurrent CALL and DROP/CREATE PROCEDURE, it could be possible for AtEOXact_PgStat_DroppedStats(), that wants to do transactional drops, to find entries that are already dropped, after a commit record has been written. In this case, ERRORs are upgraded to PANIC, taking down the server. This issue is fixed by making pgstat_drop_entry() optionally more tolerant to concurrent drops, adding to the routine a missing_ok option to make some of its callers more tolerant (spoiler: some of the callers want a strict behavior, like replication slots and backend stats). pgstat_drop_entry_internal() cannot be called anymore for an entry marked as dropped, hence its error is replaced by an assertion. Functions are handled as a special case in core; this problem could also apply to custom stats kinds depending on what an extension does. track_functions is costly when enabled (disabled by default), which is perhaps the main reason why this has not be found yet. A similar version of this patch has been proposed by Sami Imseih on a different thread for a feature in development. This version has tweaked here by me for the sake of fixing this issue. Reported-by: zhanglihui <zlh21343@163.com> Author: Sami Imseih <samimseih@gmail.com> Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com> Discussion: https://postgr.es/m/19520-73873648d44793cf@postgresql.org Backpatch-through: 15
drop_local_obsolete_slots() continued to dereference local_slot after calling ReplicationSlotDropAcquired(). Once the slot is dropped, its entry in the slot array can be reused by another backend, so later reads of local_slot->data could observe a different slot's name or database OID, leading to an incorrect unlock and log message. Save the slot name and database OID before performing the drop, and use the saved values for the subsequent UnlockSharedObject() call and the log message. While at it, emit the "dropped replication slot" message only when a slot was actually dropped, rather than unconditionally. Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Backpatch-through: 17, where it was introduced Discussion: https://postgr.es/m/TY4PR01MB177184FF9EE916F577E1F554194082@TY4PR01MB17718.jpnprd01.prod.outlook.com
If the allocation of Snapshot->subxip fails, a follow-up call of GetSnapshotData() would see a partially-initialized snapshot, causing a NULL dereference on reentry when using "subxip" because only "xip" would be allocated. In the event of an out-of-memory error when allocating "subxip", "xip" is now reset before throwing an ERROR, so as Snapshots can be allocated and handled gracefully on retry. This problem is unlikely going to show up in practice, so no backpatch. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/e77acaac-a1b3-40b3-99ee-5769b4e453e4@gmail.com
ALTER TABLE ... MERGE PARTITIONS / SPLIT PARTITION builds a new partition via createPartitionTable(), but never gives it a TOAST table. When the source rows carried out-of-line varlena values, the move into the new partition entered heap_toast_insert_or_update() with reltoastrelid = InvalidOid: the externalization step is skipped, the value falls back to inline storage and heap_insert() fails with "row is too big" error. Also, TOAST table is needed if the new partition receives out-of-line varlena values after the DDL operation is complete. Call NewRelationCreateToastTable() right after the new partition is created in createPartitionTable(), mirroring what DefineRelation() does for regular CREATE TABLE. NewRelationCreateToastTable() decides on its own whether a TOAST table is actually required, so partitions with no toast-eligible columns are unaffected. Reported-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/ai_c4-v8iLA2kXFV%40pryzbyj2023 Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com>
This reverts the non-text (custom/directory/tar) output format support for pg_dumpall added by 763aaa0 and its feature-specific follow-ups, in line with Noah Misch's post-commit review which recommends reverting and finishing the work through the commitfest. Scope is deliberately minimal: only the feature itself is removed. Independent improvements that merely touched the same files, or that were committed alongside the feature but do not depend on its design, are preserved. Reverted (the feature): 763aaa0 Add non-text output formats to pg_dumpall d6d9b96 Clean up nodes that are no longer of use in 007_pgdumpall.pl 01c729e Fix casting away const-ness in pg_restore.c c7572cd Improve writing map.dat preamble 3c19983 pg_restore: add --no-globals option to skip globals abff449 Fix options listing of pg_restore --no-globals bb53b8d Fix small memory leak in get_dbname_oid_list_from_mfile() a793677 pg_restore: Remove dead code in restore_all_databases() a198c26 pg_dumpall: simplify coding of dropDBs() ec80215 pg_restore: Remove unnecessary strlen() calls in options parsing Preserved (independent of the feature): b2898ba the check_mut_excl_opts() helper in src/fe_utils/option_utils.c and its use in pg_dump 7c8280e pg_dump's conflicting-option refactor (and tests 002/005) be0d0b4 pg_dumpall's rejection of --clean together with --data-only (re-expressed directly, since pg_dumpall.c is otherwise returned to its pre-feature state) 74b4438 the dangling-grantor-OID GRANT fix (back-patched through 16) 273d26b, d4cb9c3 independent pg_restore.sgml clarifications Because the feature restructured pg_dumpall.c and pg_restore.c (pg_restore's main() was split into restore_one_database() plus a dispatcher) and interleaved its option checks with the conflicting-option refactor in the same regions, the cosmetic check_mut_excl_opts() reflow of those two files' option blocks is inseparable from the feature and comes out with it; the behavior is unchanged. The reusable helper and pg_dump's use of it are unaffected. Discussion: https://postgr.es/m/20260607000218.96.noahmisch@microsoft.com
A ctypes binding of libpq (bindings, constants, OIDs, library discovery, notification and result handling) plus a Session class providing synchronous, asynchronous, pipeline, COPY-free NOTIFY/notice and non-blocking query execution. This lets the Python test suite run SQL in-process without forking psql.
PostgresServer manages a cluster's lifecycle (initdb, start/stop/restart, promote), configuration, in-process SQL, log inspection, backup/streaming/ archiving/restore, WAL helpers, replication-slot helpers and connect_ok/ connect_fails connection assertions. PgBin runs client programs; the fixtures (pg_bin, create_pg, pg, conn, bindir, libdir) build the common test objects and tear them down automatically. Author: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
pgtap is a pytest plugin that emits TAP for the meson/prove harness when TESTLOGDIR is set, and maps a whole-module skip to success. The repository pyproject.toml carries the pytest configuration. meson gains a pytest feature option and a kind=='pytest' test branch, so each directory can list pytest suites beside its tap suites. Includes the suite's own self-tests. The root pyproject.toml also configures the suite's code quality gates: black for formatting, and pylint and mypy for linting and type checking (the dev-tooling dependency group lives in src/test/pytest/pyproject.toml; none of it is needed to run the tests). The whole suite is kept black-clean, pylint 10.00/10 and mypy-clean. Author: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Helpers used by the heavier test suites: a pg_regress runner, an OpenSSL-backed SSL server configurator, an slapd launcher, a stand-alone Kerberos KDC, and a launcher for the mock OAuth provider.
Document the Python port of the Perl TAP suite: layout, how to run the tests under meson and directly with pytest, the shared fixtures, and the PostgresServer/Session/PgBin framework classes.
Replace the four src/bin/psql Perl TAP tests with their pytest equivalents and switch the meson test registration from 'tap' to 'pytest'. The interactive tab-completion and pager tests drive psql through a pty via pexpect (the interactive_psql fixture in pyt/conftest.py) and skip when pexpect, readline or a working "wc -l" is unavailable.
Replace the four src/bin/pg_ctl Perl TAP tests with their pytest equivalents and switch the meson test registration from 'tap' to 'pytest'.
Replace the six src/interfaces/libpq Perl TAP tests with their pytest equivalents and switch the meson test registration from 'tap' to 'pytest', preserving the existing env and deps.
Convert the streaming-replication recovery test as a representative multi-node physical-replication example. Add a 'pytest' block to src/test/recovery/meson.build for pyt/test_001_stream_rep.py and drop the corresponding t/001_stream_rep.pl from the 'tap' list; the remaining recovery Perl tests are unchanged.
Convert the logical-replication change-propagation test as a representative multi-node logical-replication example. Add a 'pytest' block to src/test/subscription/meson.build for pyt/test_001_rep_changes.py and drop the corresponding t/001_rep_changes.pl from the 'tap' list; the remaining subscription Perl tests are unchanged.
Enable the pytest suite (-Dpytest=enabled) on all jobs. This needs pytest installed where the images do not already provide it: via MacPorts on macOS (plus pexpect for the interactive psql tests, which need a pty and so skip on Windows), via pip on the Windows VS image, and via pacman on MinGW. The AddressSanitizer job needs one accommodation: the suite loads libpq in-process via ctypes, and dlopening an ASan-instrumented libpq into an uninstrumented python aborts because the ASan runtime must come first in the link order. Preload the ASan runtime for the test step to satisfy that; it is a no-op for the already-instrumented server binaries.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Introduces the Python/pytest test framework (in-process libpq via ctypes, a
PostgresServer fixture layer, the pgtap plugin wired into the meson build) plus
a small representative sample of TAP→pytest conversions. Based on
master.Core framework
SessionPostgresServerframework + pytest fixturesRepresentative conversions (16 scripts)
Each converted script's Perl
t/*.plorigin is removed in the same commit.src/bin/psqlsrc/bin/pg_ctlsrc/interfaces/libpqsrc/test/recovery001_stream_rep(physical rep)src/test/subscription001_rep_changes(logical rep)Full directories flip the meson registration from
taptopytestentirely.The two partial conversions add a
pytestblock and drop only the one convertedPerl entry; the remaining Perl tests in those directories (e.g.
recovery/002_archiving,subscription/002_types) are untouched. Coverage spansinteractive client, server lifecycle, libpq/Session networking, and physical &
logical replication.
CI
ci: run the pytest suite in CI— enables-Dpytest=enabledon all jobs andinstalls the needed deps (MacPorts/pip/pacman; ASan
LD_PRELOADaccommodation).This branch deliberately removes the Perl scripts it converts rather than
carrying a "temporarily disable the Perl TAP tests" commit. The
pg_mkdir_pconcurrency fix is not included (already in
master).Validation
meson setup -Dpytest=enabledconfigures clean; meson registers exactly theconverted pytest tests with no dangling Perl references; the libpq suite runs
green (4 OK, 2 env-gated skips).