Skip to content

Python (pytest) test framework + representative TAP→pytest conversions#8

Closed
adunstan wants to merge 71 commits into
masterfrom
pytap/v4
Closed

Python (pytest) test framework + representative TAP→pytest conversions#8
adunstan wants to merge 71 commits into
masterfrom
pytap/v4

Conversation

@adunstan

Copy link
Copy Markdown
Owner

Introduces the Python/pytest test framework (in-process libpq via ctypes, a
PostgresServer fixture layer, the pgtap plugin wired into the meson build) plus
a small representative sample of TAP→pytest conversions. Based on master.

Core framework

  • libpq ctypes layer + in-process Session
  • PostgresServer framework + pytest fixtures
  • pgtap plugin + meson wiring
  • SSL/LDAP/Kerberos/OAuth + pg_regress helpers
  • README for the pytest/session framework

Representative conversions (16 scripts)

Each converted script's Perl t/*.pl origin is removed in the same commit.

Suite Mode pytest Perl removed
src/bin/psql full dir 4 4
src/bin/pg_ctl full dir 4 4
src/interfaces/libpq full dir 6 6
src/test/recovery 001_stream_rep (physical rep) partial 1 1
src/test/subscription 001_rep_changes (logical rep) partial 1 1

Full directories flip the meson registration from tap to pytest entirely.
The two partial conversions add a pytest block and drop only the one converted
Perl entry; the remaining Perl tests in those directories (e.g.
recovery/002_archiving, subscription/002_types) are untouched. Coverage spans
interactive client, server lifecycle, libpq/Session networking, and physical &
logical replication.

CI

  • ci: run the pytest suite in CI — enables -Dpytest=enabled on all jobs and
    installs the needed deps (MacPorts/pip/pacman; ASan LD_PRELOAD accommodation).

This branch deliberately removes the Perl scripts it converts rather than
carrying a "temporarily disable the Perl TAP tests" commit. The pg_mkdir_p
concurrency fix is not included (already in master).

Validation

meson setup -Dpytest=enabled configures clean; meson registers exactly the
converted pytest tests with no dangling Perl references; the libpq suite runs
green (4 OK, 2 env-gated skips).

tglsfdc and others added 30 commits June 10, 2026 17:01
In pursuit of removing a Valgrind-detected leak, I inserted
"pfree(pq_mq_handle);" into mq_putmessage's recursion-trouble-recovery
code path, failing to notice that shm_mq_detach would have pfree'd
that block just before (i.e., this particular code path did not leak).
So now that was a double pfree.  We didn't notice because the
recursion scenario isn't exercised in our regression tests, but
Alexander Lakhin found it via code fuzzing.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/b8b40954-e155-41b3-9af8-ad4f261a1b64@gmail.com
pgxmlNodeSetToText() passed nodeTab[i]->doc to xmlNodeDump() without
checking the node type, which could cause a crash as a
XML_NAMESPACE_DECL maps to a xmlNs struct.  The passed-in code would
then be dereferenced in xmlNodeDump().

This commit switches the code to render XML_NAMESPACE_DECL nodes with
xmlXPathCastNodeToString(), like xpath_table().  Some tests are added,
written by me.

Author: Andrey Chernyy <andrey.cherny@tantorlabs.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20260611031436.5afde3cb@andrnote
Backpatch-through: 14
The subscription option max_retention_duration accepts an integer value
representing a timeout in milliseconds, where zero means unlimited
retention (no timeout).  Negative values have no useful meaning, but were
silently accepted and stored in the subscription catalog.

A negative value causes should_stop_conflict_info_retention() to always
return true, because TimestampDifferenceExceeds() treats a negative
threshold as already exceeded.  This stops dead tuple retention
immediately rather than honoring the configured timeout.

Fix by rejecting negative values for max_retention_duration during CREATE
SUBSCRIPTION and ALTER SUBSCRIPTION.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/9232401A-DEEE-49E1-9D11-D14A776DB82B@gmail.com
During promotion, there is a window where RecoveryInProgress() returns
true but the WAL segments of the old timeline have already been removed.
A logical decoding could pick up the old timeline in this window when
reading a page, failing with the following error:
ERROR: requested WAL segment ... has already been removed

This issue does not lead to any data correctness issue, as retrying to
decode the data works in follow-up decoding attempts.  It impacts
availability, though.  Other WAL page read callbacks have a similar
issue, this commit takes care of what should be the noisiest code path:
logical decoding with START_REPLICATION in a WAL sender.

A TAP test, based on an injection point waiting in the startup process
after the segments have been removed/recycled, is added.  This part is
backpatched down to v17.

This issue has been causing sporadic failures in the buildfarm, and
was reproducible manually.  This issue happens since logical decoding on
standbys exists, down to v16.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/7daef094-abf3-4672-bc23-3df4763b16a3@gmail.com
Backpatch-through: 16
When printing the upper boundary of a seg interval, seg_out() decided
whether to emit the certainty indicator ('<', '>' or '~') by testing the
upper indicator (u_ext) for '<' and '>', but mistakenly tested the lower
indicator (l_ext) for '~'.  This is a copy-and-paste slip from the
symmetric code that prints the lower boundary a few lines above.

The consequences for valid input were:

  * A '~' on the upper boundary was dropped on output, e.g.
    '1.5 .. ~2.5'::seg printed as '1.5 .. 2.5'.

  * When the lower boundary carried '~' but the upper boundary had no
    indicator, the wrong test matched and sprintf(p, "%c", seg->u_ext)
    wrote a NUL byte (u_ext == '\0'), which truncated the result string
    and silently lost the entire upper boundary, e.g.
    '~6.5 .. 8.5'::seg printed as '~6.5 .. '.

Certainty indicators are documented to be preserved on output (they are
ignored by the operators, but kept as comments), so this broke the
input/output round-trip for the affected values.

The bug has existed since seg was added.  It went unnoticed because the
existing regression tests only exercised certainty indicators on
single-point segs, which are printed by a different branch of seg_out().
Add tests that place indicators on both boundaries of an interval.

Author: Ewan Young <kdbase.hack@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAON2xHPYeRRCEVAv8XfE18KsEsEHCiYcJ5fOsoxFuMEfpxF1=g@mail.gmail.com
Backpatch-through: 14
When parsing expressions like (old).colname and (old).* in a RETURNING
list, the parser would lose track of the intended varreturningtype,
and therefore return incorrect results.

The root cause was code using GetNSItemByRangeTablePosn() to find a
namespace item from its rtindex and levelsup, without taking into
account returningtype, which would return the wrong namespace item.
Fix by adding a new function GetNSItemByVar() that does take
returningtype into account.

Backpatch to v18, where support for RETURNING OLD/NEW was added.

Bug: #19516
Reported-by: Marko Grujic <markoog@gmail.com>
Author: Marko Grujic <markoog@gmail.com>
Suggested-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAOvwyF2cO_5mAt=w=y-dFnaG5UkZ+3H8nSDoKF_iuWZHsU2ARg@mail.gmail.com
Backpatch-through: 18
transformJsonParseArg() was not careful enough on generation of
transformed expressions when starting from expressions that are not
coercible to text but are in the string type category: it failed to
verify that coerce_to_target_type() succeeds, and returned a NULL
pointer.  This leads to a later NULL dereference and crash at executor
time.

This escaped noticed because it cannot happen for built-in types, all of
which have casts to text.  Only user-created types are potentially
problematic.

Fix by raising an error when a cast to text doesn't exist.

This mistake came in with commit 6ee3020.

Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reported-by: Chi Zhang <798604270@qq.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Backpatch-through: 16
Discussion: https://postgr.es/m/19491-7aafc221ec63f288@postgresql.org
In a few places, we were constructing translatable strings consisting of
elements list by adding one element at a time and separately a comma.
This is not great from a translation point of view, so rewrite to append
the comma together with the corresponding element in one go.

Author: Peter Smith <smithpb2250@gmail.com>
Author: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAHut+Pvp7jYcaiZ3pXedXgLcWZWDBLXFUK05JtZpGv3Mj=UOjw@mail.gmail.com
Since this is trying to add a SharedInvalRelSyncMsg rather than
a SharedInvalRelcacheMsg, it should use rs rather than rc.

This makes no difference as things stand, because the two structure
definitions are identical (except for the capitalization of "relid"),
but it's still a good idea to fix it.

Co-authored-by: Stolpovskikh Danil <d.stolpovskikh@ftdata.ru>
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Discussion: http://postgr.es/m/bd6a5735b72b4afe99af49c3c62901d6@localhost.localdomain
MD5 authentication warnings are queued during authentication, before
startup options and role/database settings have been applied. The code
checked md5_password_warnings at queue time, so settings such as
ALTER ROLE ... SET md5_password_warnings = off did not suppress the
warning, even though the established session showed the GUC as off.

Keep the connection-warning infrastructure generic by allowing each
queued warning to carry an optional filter callback. Evaluate that
callback when warnings are emitted, after startup options and
role/database settings have been processed.

Use this for MD5 authentication warnings, while leaving password
expiration warnings unchanged. Add test coverage for an MD5-authenticated
role with md5_password_warnings disabled.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/AE46E42D-5966-4D76-9E64-95EAB01B9FB5@gmail.com
When amcheck validates that a B-Tree metapage's allequalimage flag
matches _bt_allequalimage(), it could fail to report corruption
unless one of the index key columns used interval_ops. As a result,
pg_amcheck could silently miss this corruption on other opclasses,
incorrectly reporting the index as valid.

The mistake was that bt_index_check_callback() kept ereport(ERROR)
inside the loop that scans key attributes for INTERVAL_BTREE_FAM_OID,
even though that loop is only needed to decide whether to add
the interval-specific hint. This commit moves ereport() out of the loop
so allequalimage mismatches are always reported, while still emitting
the hint for affected interval indexes.

Back-patch to v18, where d70b176 introduced this regression
while moving the check into bt_index_check_callback().

Author: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/011ACC9C-CB87-4160-ACE7-4ED57AB86E15@gmail.com
Backpatch-through: 18
xpath() attempted to call xmlCopyNode() and xmlNodeDump() on a
XML_NAMESPACE_DECL, finishing with a confusing error:
=# SELECT xpath('//namespace::foo', '<root xmlns:foo="http://127.0.0.1"/>');
ERROR:  53200: could not copy node
CONTEXT:  SQL function "xpath" statement 1

xpath() is changed so as it goes through xmlXPathCastNodeToString()
instead, that is able to handle namespace nodes.  xml2 uses the same
solution.  This issue has been discovered while digging into
9d33a5a.

Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aioT7ui_ZJ9RMlfM@paquier.xyz
Backpatch-through: 14
The FK fast-path batching added in b7b27eb wrote the incoming row
into the batch array before checking whether the array was full:

    fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot);
    fpentry->batch_count++;
    if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
        ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);

batch_count is reset to zero only at the end of ri_FastPathBatchFlush(),
so it remains at RI_FASTPATH_BATCH_SIZE throughout a full-batch flush.
A flush runs user-defined cast functions and equality operators; if that
user code performs DML on the same FK table, ri_FastPathBatchAdd()
re-enters with batch_count == RI_FASTPATH_BATCH_SIZE and writes one past
the end of the array, corrupting the adjacent batch_count field.  This
is reachable by an unprivileged table owner via an implicit cast with a
PL/pgSQL function and causes a SIGSEGV in assert-enabled builds.

Fix by bounds-checking the write into the batch array so a re-entrant
add can never write past the end, and by adding a "flushing" flag to
RI_FastPathEntry that routes re-entrant ri_FastPathBatchAdd() calls on
a busy entry to the per-row path (ri_FastPathCheck) instead of touching
the mid-flush batch array.  The flag is set around the probe in
ri_FastPathBatchFlush() and cleared in a PG_FINALLY, which also resets
batch_count, so the entry is left empty and reusable if a flush error
(including a reported FK violation) is caught by a savepoint.

Add regression tests for both the re-entrant flush and reuse of an entry
after a flush error caught by a savepoint.

Reported-by: Nikolay Samokhvalov <nik@postgres.ai>
Reviewed-by: Nikolay Samokhvalov <nik@postgres.ai>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
Commit a70bce4 added instructions on how to recover if PostgreSQL
refuses to issue new transaction IDs because of imminent wraparound,
but when describing how to find replication slots that should be dropped,
it referred to pg_stat_replication where it should have referenced
pg_replication_slots.

In passing, decorate references to views with <structname> tags.

Backpatch to all supported versions.

Reported-By: Sanjaya Waruna <sanjaya.waruna@gmail.com>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/176767268098.1084085.10345048667224193115@wrigleys.postgresql.org
Backpatch-through: 14
The FK fast-path batching added in b7b27eb buffers rows in a
transaction-lived cache (ri_fastpath_cache) keyed by constraint OID.
Running user-defined cast and equality functions during a batch flush,
together with the cache's lifetime and iteration, exposed two defects
reachable by an unprivileged table owner.

First, on subtransaction abort ri_FastPathSubXactCallback discarded the
entire cache.  An entry's batch holds rows buffered by the enclosing
transaction, not just the aborting subxact -- the cache is keyed by
constraint, so a single entry can mix rows from multiple subxact levels.
An internal subxact abort during after-trigger firing (e.g. a PL/pgSQL
BEGIN ... EXCEPTION block) therefore dropped buffered rows of the outer
transaction without running their FK checks, letting orphan rows commit
behind a constraint that still reported itself valid.  The discard also
left relations opened by the batch unclosed, producing "resource was not
closed" warnings.

Second, ri_FastPathEndBatch flushes by iterating the cache with
hash_seq_search.  If flush-time user code inserts into a different
fast-path FK table, a new entry is added to the cache mid-scan; it may
land in a bucket the scan has already passed and never be reached, and
ri_FastPathTeardown then destroys the cache without flushing it,
silently dropping that check.

Cleanly unwinding the cache on subxact abort would require tracking the
originating subxact of each buffered row, since rows from different
levels share an entry (the cache is keyed by constraint) and deferred
constraints cannot be flushed early at a subxact boundary.  Rather than
add that bookkeeping, confine batching to the top transaction level: in
RI_FKey_check, when GetCurrentTransactionNestLevel() > 1, use the
per-row fast path (ri_FastPathCheck) instead of buffering.  Rows checked
inside a subtransaction are then verified immediately and roll back
cleanly with their subtransaction, and the cache only ever holds
top-level rows.  With the cache confined to the top level, a
subtransaction abort has nothing of its own to discard, so
ri_FastPathSubXactCallback is removed along with its registration.

For the second defect, add a cache-wide flag (ri_fastpath_flushing) set
while ri_FastPathEndBatch iterates the cache.  A re-entrant FK check
arriving while the flag is set takes the per-row path rather than adding
an entry to the cache being scanned, so no entry can be missed and torn
down unflushed.  The flag is cleared in a PG_FINALLY so a flush that
throws (a reported violation or an error from user code) does not leave
it stuck.  As defensive insurance it is also cleared in
ri_FastPathXactCallback() at transaction end.

The per-row fast path still bypasses SPI and stays well ahead of the
pre-19 SPI-based check.  A fuller fix that preserves batching across
subtransactions -- whether by tracking the originating subxact of each
buffered row or by per-subxact cache stacks merged into the parent on
commit -- is left for a future release.

The subtransaction-abort case is covered by a new regression test.  The
mid-scan cross-table case depends on hash bucket placement and so is not
reliably reproducible in a portable test, but the flag prevents it by
construction.

Reported-by: Nikolay Samokhvalov <nik@postgres.ai>
Reviewed-by: Nikolay Samokhvalov <nik@postgres.ai>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
read_local_xlog_page_guts has the same race as logical_read_xlog_page:
RecoveryInProgress() can return true during promotion, impacting the
availability of the operations doing WAL page reads with this callback.

This problem is similar to eb4e722 that has addressed the issue for
logical replication, impacting more areas of the code where this WAL
page callback can be used (same narrow window during promotion, same
availability issue):
- pg_walinspect.
- Slot advance (SQL function).
- Slot creation.

Repack workers (v19~) and 2PC files (since forever) can also use this
callback, but they are irrelevant as far as I know.  A test is added
with the SQL lookup functions.  This part relies on injection points,
and is backpatched down to v18, like the test added for eb4e722.

This issue could probably be fixed as well in v14 and v15 for
pg_walinspect.  However, I also feel that there is a conservative
argument about consistency here due to the support of logical decoding
on standbys, so let's limit ourselves to v16 for now.  pg_walinspect is
used less in the field compared to the two other operations, making
addressing this problem less attractive in these two older branches.

Reported-by: Xuneng Zhou <xunengzhou@gmail.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/7daef094-abf3-4672-bc23-3df4763b16a3%40gmail.com
Backpatch-through: 16
This one has been forgotten in 8bf257a.  Per report from buildfarm
member massasauga.

Backpatch-through: 14
Similar to commit 3692a62, for a slightly different code pattern in
psql.

No backpatch to avoid disrupting translation in stable branches.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/airjxKXx7aTG8kfE@alvherre.pgsql
Commit 0e1f1ed taught seg_out() to print the certainty indicator
on an interval's upper boundary, but it was back-patched only as far
as v14.  When upgrading from an older release, the old server prints
the one test_seg row exercising that case ('4.6 .. ~7.0') without the
indicator, so the pre- and post-upgrade dumps do not match.  Make
AdjustUpgrade.pm delete just that row; seg's comparison function does
distinguish the certainty indicators, so the otherwise identical row
'4.6 .. 7.0' is unaffected.

Back-patch to all supported branches.

Per buildfarm members crake and fairywren.

Discussion: https://postgr.es/m/5ccbdbde-6467-4a10-bf4d-0be73a05ce8d@dunslane.net
bt_normalize_tuple() uses VARSIZE() to get the size of varlena, even though
it's not yet known, that it has a 4-byte header.  Fix this by replacing a
accessor with a universal VARSIZE_ANY().

Backpatch to all supported versions.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/7ckc7oka4bvafkf5bwlqs6ygrhlsbhz25ppozfch7zbuxcx3rf%40e4pr4oqenalc
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 14
Since the remote column names of a foreign table could be longer than
NAMEDATALEN, remattrmap_cmp(), which compares such column names, should
have used strcmp(), not strncmp() with n=NAMEDATALEN.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/81D981EB-ECC1-495D-8EAC-5CFB67B2CF77%40gmail.com
Commit 2f70fdb removed the deprecated containment operator
~(aclitem[],aclitem) from the catalogs, but missed removing its entry
from the documentation.  (Arguably the blame should fall on c62dd80,
which added this entry in contravention of the longstanding policy
that we don't document deprecated aliases in the first place.)

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAOzEurQSyR5psWukyhUz1LtxyO55C2Vfp0Fmt8w2jGKxhszQmQ@mail.gmail.com
Backpatch-through: 14
This commit reduces the number of expected output files for the "xml"
test from three to two (well, mostly one, see below for details).

xml_2.out existed to handle some differences in output due to libxml2
2.9.3, due to some error context missing (085423e).  This file is
removed, by tweaking the XML inputs to trigger the same error patterns
for the problematic 2.9.3 and other libxml2 versions.  This part is
authored by Tom Lane.

xml_1.out (no libxml2 support) is reduced in size by adding an \if query
that exits the test early.  This still checks NO_XML_SUPPORT() through
xmlin().  The rest of the test is skipped if XML input cannot be
handled by the backend.  This part has been written by me.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/aiu6CXO67q-s70n5@paquier.xyz
Backpatch-through: 14
If a query has more than 7498 params, the ParameterDescription message
exceeds the 30000 byte limit on messages that are not specifically
marked as possibly being longer than that (VALID_LONG_MESSAGE_TYPE).
To fix, add ParameterDescription to the list.

Author: Ning Sun <classicning@gmail.com>
Discussion: https://www.postgresql.org/message-id/dbfb4b65-0aa8-470a-8b87-b6496160b28a@gmail.com
Backpatch-through: 14
Late-model clang complains that these functions should be labeled
with "format(printf, 2, 3)", and it's right.  But let's go a bit
further and also make use of varargs, to remove duplication and
allow these functions to be used with non-integer input values.

Since no good deed goes unpunished, I had to also adjust a couple
of call sites.  They weren't wrong as-is, since the size_t-sized
arguments were coerced to int on the way into diag3().  But
without that, we have to adjust the format strings.

The point of this is to suppress compiler warnings, so back-patch
into branches containing pg_bsd_indent, even though there's no
functional change.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/1645041.1781283554@sss.pgh.pa.us
Backpatch-through: 16
The syntax "tablename *" has been obsolete for years, but we want to
retain it and its documentation for backward compatibility reasons.
However, the documentation wording was confusing and could be
understood to mean that "tablename *" is the same as "ONLY tablename".

Reported-by: Jochen Bandhauer <jochen.bandhauer@gmx.net>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/178125831604.1285960.8250607197280951685@wrigleys.postgresql.org
Presently, the "Prev" link on the page for background workers sends
you to the middle of the previous chapter instead of the actual
previous page.  This appears to be caused by a libxml2 bug, but
regardless, a minimal fix is to change the link generation code to
use [position()=last()] instead of [last()] in the predicate on the
union of reverse axes.

Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/aim4AZorFKaC7Wrf%40nathan
Backpatch-through: 14
Our handling of quoting within replication commands was pretty
sloppy, typically looking like
        appendStringInfo(&cmd, " SLOT \"%s\"", options->slotname);
This is fine as long as options->slotname doesn't contain a double
quote mark, but what if it does?  In principle this'd allow injection
of harmful options into replication commands, in the probably-unlikely
case that a slot name comes from untrustworthy input.  We ought to
clean that up.

Moreover, even the places that were trying to be more careful
generally got it wrong, because they used quoting subroutines
intended for SQL commands rather than something that will work
with the replication-command scanner repl_scanner.l.  For example,
several places naively use PQescapeLiteral() to quote option values
for replication commands.  If the string contains a backslash,
PQescapeLiteral() will produce E'...' literal syntax, which
repl_scanner.l doesn't recognize.  Another near miss was to use
quote_identifier() to quote identifiers.  That function won't quote
valid lowercase identifiers unless they match SQL keywords ... but in
this context, replication keywords are what matter.  Neither of these
errors seem to risk string injection, but they definitely can cause
syntax errors in replication commands that ought to be valid.

We can clean all this up by using simple quoting logic that just
doubles single or double quotes respectively.

Or at least, we could if repl_scanner.l handled doubled double quotes
in identifiers, but for some reason it doesn't!  So the first step in
this fix has to be to fix that.  (The fact that we'll later reject
slot names containing double quotes is very far short of justifying
this omission.)

Having done that, this patch runs around and applies correct
quoting in all places that generate replication commands containing
strings coming from outside the immediate context.  Probably some
of these places are safe because of restrictions elsewhere, but it
seems best to just quote all the time.

This was originally reported as a security bug, which it could be
if replication slot names or parameters were to originate from
untrustworthy sources.  But the security team concluded that that
was a very improbable situation, so we're just going to fix this
as a regular bug.

Reported-by: Team Dhiutsa
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/1648659.1781287310@sss.pgh.pa.us
Backpatch-through: 14
Attempting to restore a schema, a table or an index with
--only-statistics skipped all the statistics of the objects wanted.
Like for pg_dump, statistics should be included, so this created an
assymetry between dump and restore.

A second set of problems existed for --table and --index, where the
presence of --statistics skipped the restore of the stats of the
object(s) targetted.

This issue has been reported originally as related to an inconsistency
with the way extended stats restore is handled in Postgres v19, but the
issue is related to the restore of relation and attribute statistics in
v18.  Some TAP tests are added to cover all these cases.

Reported-by: Chao Li <li.evan.chao@gmail.com>
Author: Chao Li <li.evan.chao@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/66E80CAB-527C-42B1-BB65-3F82CF4AD998@gmail.com
Backpatch-through: 18
The schema_only_with_statistics test scenario was referenced in
002_pg_dump.pl, but was associated to no command sequence since
0ed92cf.

Issue discovered while investigating a different bug.  Perhaps this
cleanup is not worth backpatching, but there is also an argument in
favor of reducing noise when touching this area of the code in stable
branches.

Reviewed-by: Ewan Young <kdbase.hack@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/ai-y0S7Z25NlrG_n@paquier.xyz
Backpatch-through: 18
hlinnaka and others added 29 commits June 17, 2026 09:00
The first "if (difffile)" block initializes the startpos variable, and
the second "if (difffile)" block reads it. The second if-condition can
only be true when the first one was true, so the startpos variable is
always initialized when it's used. However, the compiler might not be
able to deduce that, and warn about startpos being used uninitialized.

To silence the warning, rearrange the if-checks. Also, bail out if the
diff file cannot be opened, instead of ignoring it silently.

Author: Mikhail Litsarev <m.litsarev@postgrespro.ru>
Reviewed-by; Ewan Young <kdbase.hack@gmail.com>
Discussion: https://www.postgresql.org/message-id/ee06f058c626cd37babd8c81579ffb1e@postgrespro.ru
Several calls of pgstat_count_io_op_time() have been used as data to
count negative values returned by pg_pread() or pg_pwrite(), leading to
an incorrect count reported, casting them back to uint64.

Most of the problematic calls updated here are adjusted so as we do not
report buggy negative numbers anymore.  In xlogrecovery.c, the spot
updated still counts short reads.  In xlog.c, after a WAL segment
initialization, I/O numbers are aggregated only after checking that the
operation has succeeded.

issues introduced by a051e71.

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/0db864e6-4477-4eba-b2be-d3523cc86564@eisentraut.org
Backpatch-through: 18
The current form of the catalog query picks up partitioned tables
with expression indexes that lack statistics.  However, since such
indexes never have statistics, there's no point in analyzing them.
To fix, adjust the relevant part of the query to skip partitioned
tables with expression indexes.  While at it, remove the nearby
stainherit check; entries for index expressions always have
stainherit = false.

Author: Baji Shaik <baji.pgdev@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CA%2Bfm-RPE1tEc6CUUPDyRbYTz9tF5Kw47nnk-Zq%3DyYvanbsxyCQ%40mail.gmail.com
Backpatch-through: 18
Add check_stack_depth() to Jsonb_to_SV, SV_to_JsonbValue,
PLyObject_FromJsonbContainer, and PLyObject_ToJsonbValue.  Without
this, deeply nested JSONB values can crash the backend with SIGSEGV
instead of raising a proper error.

Also add CHECK_FOR_INTERRUPTS() to the while loop in SV_to_JsonbValue
that dereferences chains of Perl references, so that a circular
reference (e.g. $x = \$x) can be cancelled by the user instead of
spinning indefinitely.  (We looked at detecting such circular
references, but it seems more trouble than it's worth.)

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAJ7c6TPbjkzUk4qJ5dHvDNEz0hBuFue3A-XWz_=897z+BC+z8A@mail.gmail.com
Backpatch-through: 14
If the call count test fails, you'll reasonably want to know what the
network trace looked like, but that information is currently swallowed.
Print it out instead.

Backpatch-through: 18
When debugging an OAuth trace, it's helpful to know what version of Curl
is in use. The SSL library that Curl is using (which may not be the one
in use by libpq) is also relevant, and it's just as easy to get, so
print that too.

This is being added post-feature-freeze, with RMT approval, in order to
fix some tests in the face of an upstream Curl regression. A subsequent
commit will make use of it in oauth_validator. Backpatch to 18 as well.

Tested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAOYmi%2B%3DkP86t%2BZFFXNQ9G6K4ht7utdmB%3DCzhP%3DZ2wvuBymOTtQ%40mail.gmail.com
Backpatch-through: 18
The call-count test in 001_server.pl runs into a recent upstream
regression in Curl:

    curl/curl#21547

The symptom is high CPU usage on some platforms during OAuth HTTP
requests. But it looks like the fix is on track for a June 2026 release,
as part of Curl 8.21.0, so just skip the test if we happen to be using
the broken version.

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAOYmi%2B%3DyrwMSsHuNJ1V14isA4iSix5Xb3P3VEp1X0BS61MdV4A%40mail.gmail.com
Backpatch-through: 18
Previously, when retrieving the old Subscription object, constructing
the conninfo could encounter an error during
ForeignServerConnectionString(). ACL errors were handled properly, but
other errors could interfere with a user fixing the problem with ALTER
SUBSCRIPTION.

Reported-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/D908370F-2695-4231-851D-17179A6A6F2A@gmail.com
Dumps will fail if standard_conforming_strings = off.

Reported-by: Tom Lane

Backpatch-through: 1131492.1781717349@sss.pgh.pa.us
Commit 16a0039 reduced the lock level for ALTER DOMAIN ...
VALIDATE CONSTRAINT from ShareLock to ShareUpdateExclusiveLock.
However, that change was unsafe. If DML on tables using the domain had
already started and initialized domain constraint checks before a NOT
VALID constraint was added, it could still insert or update rows that
violated the new constraint.

This commit reverts commit 16a0039 so that ALTER DOMAIN ...
VALIDATE CONSTRAINT once again acquires ShareLock on relations using
the domain. Also add an isolation test covering this case.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/463C0E1A-4A40-4BCA-839C-9236B80D65EE@gmail.com
The EXCEPT clause of a FOR ALL TABLES publication tracks each excluded
table by its identity rather than by name. As a result, renaming a table
or moving it to another schema with ALTER TABLE ... SET SCHEMA leaves the
exclusion in place, and the table stays excluded from the publication.

This behavior was not previously documented and could surprise users who
might reasonably expect a schema-qualified exclusion to apply only while
the table remains in that schema. Add a note to CREATE PUBLICATION to make
the behavior explicit.

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHut+PvQ5BqnawCQd6r1tqqd+iAJC-CuRY8wscuXSrpHGUzofA@mail.gmail.com
pgstat_drop_entry_internal() generates an ERROR if facing a pgstats
entry already marked as dropped.  With a workload doing a lot of
concurrent CALL and DROP/CREATE PROCEDURE, it could be possible for
AtEOXact_PgStat_DroppedStats(), that wants to do transactional drops, to
find entries that are already dropped, after a commit record has been
written.  In this case, ERRORs are upgraded to PANIC, taking down the
server.

This issue is fixed by making pgstat_drop_entry() optionally more
tolerant to concurrent drops, adding to the routine a missing_ok option
to make some of its callers more tolerant (spoiler: some of the callers
want a strict behavior, like replication slots and backend stats).
pgstat_drop_entry_internal() cannot be called anymore for an entry
marked as dropped, hence its error is replaced by an assertion.
Functions are handled as a special case in core; this problem could also
apply to custom stats kinds depending on what an extension does.
track_functions is costly when enabled (disabled by default), which is
perhaps the main reason why this has not be found yet.

A similar version of this patch has been proposed by Sami Imseih on a
different thread for a feature in development.  This version has tweaked
here by me for the sake of fixing this issue.

Reported-by: zhanglihui <zlh21343@163.com>
Author: Sami Imseih <samimseih@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/19520-73873648d44793cf@postgresql.org
Backpatch-through: 15
drop_local_obsolete_slots() continued to dereference local_slot after
calling ReplicationSlotDropAcquired().  Once the slot is dropped, its
entry in the slot array can be reused by another backend, so later reads
of local_slot->data could observe a different slot's name or database
OID, leading to an incorrect unlock and log message.

Save the slot name and database OID before performing the drop, and use
the saved values for the subsequent UnlockSharedObject() call and the log
message.  While at it, emit the "dropped replication slot" message only
when a slot was actually dropped, rather than unconditionally.

Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/TY4PR01MB177184FF9EE916F577E1F554194082@TY4PR01MB17718.jpnprd01.prod.outlook.com
If the allocation of Snapshot->subxip fails, a follow-up call of
GetSnapshotData() would see a partially-initialized snapshot, causing a
NULL dereference on reentry when using "subxip" because only "xip" would
be allocated.  In the event of an out-of-memory error when allocating
"subxip", "xip" is now reset before throwing an ERROR, so as Snapshots
can be allocated and handled gracefully on retry.

This problem is unlikely going to show up in practice, so no backpatch.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/e77acaac-a1b3-40b3-99ee-5769b4e453e4@gmail.com
ALTER TABLE ... MERGE PARTITIONS / SPLIT PARTITION builds a new
partition via createPartitionTable(), but never gives it a TOAST table.
When the source rows carried out-of-line varlena values, the move
into the new partition entered heap_toast_insert_or_update() with
reltoastrelid = InvalidOid: the externalization step is skipped, the
value falls back to inline storage and heap_insert() fails with
"row is too big" error.  Also, TOAST table is needed if the new partition
receives out-of-line varlena values after the DDL operation is complete.

Call NewRelationCreateToastTable() right after the new partition is
created in createPartitionTable(), mirroring what DefineRelation()
does for regular CREATE TABLE.  NewRelationCreateToastTable() decides
on its own whether a TOAST table is actually required, so partitions
with no toast-eligible columns are unaffected.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/ai_c4-v8iLA2kXFV%40pryzbyj2023
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
This reverts the non-text (custom/directory/tar) output format support
for pg_dumpall added by 763aaa0 and its feature-specific follow-ups,
in line with Noah Misch's post-commit review which recommends reverting
and finishing the work through the commitfest.

Scope is deliberately minimal: only the feature itself is removed.
Independent improvements that merely touched the same files, or that were
committed alongside the feature but do not depend on its design, are
preserved.

Reverted (the feature):

  763aaa0  Add non-text output formats to pg_dumpall
  d6d9b96  Clean up nodes that are no longer of use in 007_pgdumpall.pl
  01c729e  Fix casting away const-ness in pg_restore.c
  c7572cd  Improve writing map.dat preamble
  3c19983  pg_restore: add --no-globals option to skip globals
  abff449  Fix options listing of pg_restore --no-globals
  bb53b8d  Fix small memory leak in get_dbname_oid_list_from_mfile()
  a793677  pg_restore: Remove dead code in restore_all_databases()
  a198c26  pg_dumpall: simplify coding of dropDBs()
  ec80215  pg_restore: Remove unnecessary strlen() calls in options parsing

Preserved (independent of the feature):

  b2898ba  the check_mut_excl_opts() helper in src/fe_utils/option_utils.c
               and its use in pg_dump
  7c8280e  pg_dump's conflicting-option refactor (and tests 002/005)
  be0d0b4  pg_dumpall's rejection of --clean together with --data-only
               (re-expressed directly, since pg_dumpall.c is otherwise
               returned to its pre-feature state)
  74b4438  the dangling-grantor-OID GRANT fix (back-patched through 16)
  273d26b, d4cb9c3  independent pg_restore.sgml clarifications

Because the feature restructured pg_dumpall.c and pg_restore.c (pg_restore's
main() was split into restore_one_database() plus a dispatcher) and
interleaved its option checks with the conflicting-option refactor in the
same regions, the cosmetic check_mut_excl_opts() reflow of those two files'
option blocks is inseparable from the feature and comes out with it; the
behavior is unchanged.  The reusable helper and pg_dump's use of it are
unaffected.

Discussion: https://postgr.es/m/20260607000218.96.noahmisch@microsoft.com
A ctypes binding of libpq (bindings, constants, OIDs, library discovery,
notification and result handling) plus a Session class providing synchronous,
asynchronous, pipeline, COPY-free NOTIFY/notice and non-blocking query
execution.  This lets the Python test suite run SQL in-process without forking
psql.
PostgresServer manages a cluster's lifecycle (initdb, start/stop/restart,
promote), configuration, in-process SQL, log inspection, backup/streaming/
archiving/restore, WAL helpers, replication-slot helpers and connect_ok/
connect_fails connection assertions.  PgBin runs client programs; the fixtures
(pg_bin, create_pg, pg, conn, bindir, libdir) build the common test objects and
tear them down automatically.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
pgtap is a pytest plugin that emits TAP for the meson/prove harness when
TESTLOGDIR is set, and maps a whole-module skip to success.  The repository
pyproject.toml carries the pytest configuration.  meson gains a pytest feature
option and a kind=='pytest' test branch, so each directory can list pytest
suites beside its tap suites.  Includes the suite's own self-tests.

The root pyproject.toml also configures the suite's code quality gates:
black for formatting, and pylint and mypy for linting and type checking
(the dev-tooling dependency group lives in src/test/pytest/pyproject.toml;
none of it is needed to run the tests).  The whole suite is kept
black-clean, pylint 10.00/10 and mypy-clean.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Helpers used by the heavier test suites: a pg_regress runner, an OpenSSL-backed
SSL server configurator, an slapd launcher, a stand-alone Kerberos KDC, and a
launcher for the mock OAuth provider.
Document the Python port of the Perl TAP suite: layout, how to run the
tests under meson and directly with pytest, the shared fixtures, and the
PostgresServer/Session/PgBin framework classes.
Replace the four src/bin/psql Perl TAP tests with their pytest
equivalents and switch the meson test registration from 'tap' to
'pytest'.  The interactive tab-completion and pager tests drive psql
through a pty via pexpect (the interactive_psql fixture in
pyt/conftest.py) and skip when pexpect, readline or a working "wc -l"
is unavailable.
Replace the four src/bin/pg_ctl Perl TAP tests with their pytest
equivalents and switch the meson test registration from 'tap' to
'pytest'.
Replace the six src/interfaces/libpq Perl TAP tests with their pytest
equivalents and switch the meson test registration from 'tap' to
'pytest', preserving the existing env and deps.
Convert the streaming-replication recovery test as a representative
multi-node physical-replication example.  Add a 'pytest' block to
src/test/recovery/meson.build for pyt/test_001_stream_rep.py and drop
the corresponding t/001_stream_rep.pl from the 'tap' list; the
remaining recovery Perl tests are unchanged.
Convert the logical-replication change-propagation test as a
representative multi-node logical-replication example.  Add a 'pytest'
block to src/test/subscription/meson.build for
pyt/test_001_rep_changes.py and drop the corresponding
t/001_rep_changes.pl from the 'tap' list; the remaining subscription
Perl tests are unchanged.
Enable the pytest suite (-Dpytest=enabled) on all jobs.

This needs pytest installed where the images do not already provide it:
via MacPorts on macOS (plus pexpect for the interactive psql tests, which
need a pty and so skip on Windows), via pip on the Windows VS image, and
via pacman on MinGW.

The AddressSanitizer job needs one accommodation: the suite loads libpq
in-process via ctypes, and dlopening an ASan-instrumented libpq into an
uninstrumented python aborts because the ASan runtime must come first in
the link order.  Preload the ASan runtime for the test step to satisfy
that; it is a no-op for the already-instrumented server binaries.
@adunstan adunstan closed this Jun 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.