53 changes: 53 additions & 0 deletions .github/workflows/mysql-parser-tests.yml
@@ -0,0 +1,53 @@
name: MySQL Parser Tests

on:
  push:
    branches:
      - trunk
    paths:
      - '.github/workflows/mysql-parser-tests.yml'
      - 'packages/mysql-parser/**'
  pull_request:
    paths:
      - '.github/workflows/mysql-parser-tests.yml'
      - 'packages/mysql-parser/**'
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

# Disable permissions for all available scopes by default.
# Any needed permissions should be configured at the job level.
permissions: {}

jobs:
  test:
    # The runtime supports PHP 7.2+; test the oldest and the latest.
    name: PHP ${{ matrix.php }}
    runs-on: ubuntu-latest
    timeout-minutes: 10
    permissions:
      contents: read # Required to clone the repo.
    strategy:
      fail-fast: false
      matrix:
        php: [ '7.2', '8.5' ]

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: ${{ matrix.php }}
          coverage: none

      - name: Install dependencies
        working-directory: packages/mysql-parser
        run: composer install --no-interaction --no-progress

      - name: Run tests
        working-directory: packages/mysql-parser
        run: composer run test
1 change: 1 addition & 0 deletions packages/mysql-parser/.gitignore
@@ -0,0 +1 @@
/build/
78 changes: 78 additions & 0 deletions packages/mysql-parser/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# MySQL Parser

A fast and complete **MySQL parser** in pure PHP with zero dependencies, generated
directly from the **official MySQL grammar**.

The runtime requires **PHP 7.2+** with no extensions.

## How it works

The parser is compiled ahead of time from the official MySQL grammar sources.
The build outputs a compact parse table and a list of MySQL tokens for the lexer:

```
mysql-server sources
sql/sql_yacc.yy ──▶ Bison 3.8.2 ──▶ automaton.xml ─┬─▶ generate-parse-table.php ──▶ mysql-parse-table.php
                                                   |
sql/lex.h ─────────────────────────────────────────┴─▶ generate-tokens.php ───────▶ class-wp-mysql-tokens.php
```

The runtime ships only the compiled artifacts and a thin parser implementation.
In this package, the runtime lives under `src` and the grammar tooling in `tools`.

A MySQL query is processed using the following pipeline:
```
MySQL query ──────────▶ WP_MySQL_Lexer ──────────▶ WP_Parser ──────────▶ AST
  string                            WP_MySQL_Token[]              WP_Parser_Node(s)
                                                                  WP_MySQL_Token(s)
```

## Usage

```php
require_once __DIR__ . '/vendor/autoload.php';

$parser = WP_MySQL_Parser_Factory::create_parser();
$tokens = ( new WP_MySQL_Lexer( 'SELECT 1 + 2' ) )->remaining_tokens();
$ast = $parser->parse( $tokens );
```

## Development
This package includes the full grammar compilation pipeline. The generated artifacts
are committed to the repository and need to be regenerated only when the grammar or
the compilation pipeline itself changes. The grammar comes with a test query corpus
extracted from the MySQL server test suite.

### Building the grammar
To regenerate the compiled MySQL artifacts from MySQL sources, use:

```bash
composer run build-grammar
```

This requires `bash`, `curl`, `docker`, and `php` and executes the following steps:
1. Fetch the grammar from https://github.com/mysql/mysql-server/.
2. Run Bison in a Docker container to generate `build/automaton.xml`.
3. Extract and compact the grammar and tokens, and save them in `src`.
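
The `bin/build-grammar` script notes that the MySQL version can be overridden with
`MYSQL_TAG` (default: `mysql-8.4.10`). As a rough sketch of how the fetch step could
resolve the tag and locate the two grammar sources, assuming the files are downloaded
from GitHub's raw-content host. The helper names `resolve_mysql_tag` and
`grammar_source_url` are hypothetical, and the actual `tools/fetch-mysql-grammar.sh`
may work differently:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: use MYSQL_TAG when set, else the documented default.
resolve_mysql_tag() {
  printf '%s\n' "${MYSQL_TAG:-mysql-8.4.10}"
}

# Hypothetical helper: raw-content URL of a repository file at a given tag.
# Assumption: sources are fetched from raw.githubusercontent.com.
grammar_source_url() {
  local tag="$1" path="$2"
  printf 'https://raw.githubusercontent.com/mysql/mysql-server/%s/%s\n' "$tag" "$path"
}

tag="$(resolve_mysql_tag)"
for path in sql/sql_yacc.yy sql/lex.h; do
  # Only print the URLs; a real script would pass them to curl.
  echo "would fetch: $(grammar_source_url "$tag" "$path")"
done
```

Under this assumption, running the build as `MYSQL_TAG=<tag> composer run build-grammar`
would pin the grammar to a different mysql-server tag.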

### Building the query corpus
This package includes a testing corpus of about 70,000 MySQL queries extracted
from the MySQL release that corresponds to the generated MySQL grammar version.
To regenerate it, use:

```bash
composer run build-corpus
```

This requires `bash`, `git`, and `php`. It performs the following steps:
1. Shallow-clone the `mysql-test` directory from MySQL server into `build/`.
2. Extract SQL queries from the MySQL test files.
3. Store the queries in `data/mysql-server-query-corpus/mysql-latest.csv`.
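
For illustration, a shallow clone limited to one directory can be expressed with git's
sparse checkout. The sketch below only prints the commands; the function name
`sparse_clone_commands` and the destination `build/mysql-server` are hypothetical, and
the actual `tools/fetch-mysql-tests.sh` may take a different approach:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the git commands for a shallow, sparse checkout
# of a single directory from mysql-server at the given tag.
sparse_clone_commands() {
  local tag="$1" dir="$2" dest="$3"
  local repo="https://github.com/mysql/mysql-server.git"
  # --depth 1 skips history, --filter=blob:none defers file contents,
  # --sparse starts with only the top-level files checked out.
  printf 'git clone --depth 1 --branch %s --filter=blob:none --sparse %s %s\n' "$tag" "$repo" "$dest"
  # Widen the sparse checkout to the one directory that is needed.
  printf 'git -C %s sparse-checkout set %s\n' "$dest" "$dir"
}

# build/mysql-server is an assumed destination, not taken from the package.
sparse_clone_commands "${MYSQL_TAG:-mysql-8.4.10}" mysql-test build/mysql-server
```

Piping the output to `bash` would execute the commands. Printing them first keeps the
sketch free of network access.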

## Tests and benchmarks
To run lexer and parser tests and benchmarks, use:

```bash
composer run test # PHPUnit suite, including the query corpus tests
composer run benchmark # Query throughput with and without JIT on the whole corpus
```
22 changes: 22 additions & 0 deletions packages/mysql-parser/bin/build-corpus
@@ -0,0 +1,22 @@
#!/usr/bin/env bash
#
# Build the MySQL server query corpus from the MySQL server test suite.
#
# Runs the full pipeline end to end:
# 1. Fetch the mysql-test directory from the pinned mysql-server tag.
# 2. Extract the SQL queries into data/mysql-server-query-corpus/.
#
# Requirements: bash, git, php. Override the MySQL version with MYSQL_TAG
# (default: mysql-8.4.10).
#
set -euo pipefail

package_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
tools_dir="$package_dir/tools"

bash "$tools_dir/fetch-mysql-tests.sh"

echo "Generating the query corpus ..."
php "$tools_dir/generate-query-corpus.php"

echo "Done."
32 changes: 32 additions & 0 deletions packages/mysql-parser/bin/build-grammar
@@ -0,0 +1,32 @@
#!/usr/bin/env bash
#
# Build the MySQL parse table and token map from the official MySQL grammar.
#
# Runs the full pipeline end to end:
# 1. Fetch sql_yacc.yy + lex.h from the pinned mysql-server tag.
# 2. Run Bison 3.8.2 to produce the LALR automaton (automaton.xml).
# 3. Generate src/mysql-parse-table.php from the automaton.
# 4. Generate the lexer's token data block from lex.h and the automaton.
#
# Requirements: bash, curl, docker, php. Override the MySQL version with
# MYSQL_TAG (default: mysql-8.4.10).
#
set -euo pipefail

package_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
tools_dir="$package_dir/tools"
build_dir="$package_dir/build"
src_dir="$package_dir/src"

mkdir -p "$src_dir"

bash "$tools_dir/fetch-mysql-grammar.sh"
bash "$tools_dir/run-bison.sh"

echo "Generating parse table ..."
php "$tools_dir/generate-parse-table.php" "$build_dir/automaton.xml" "$src_dir/mysql-parse-table.php"

echo "Generating grammar tokens ..."
php "$tools_dir/generate-tokens.php" "$build_dir/automaton.xml" "$build_dir/lex.h" "$src_dir/class-wp-mysql-lexer.php"

echo "Done."
30 changes: 30 additions & 0 deletions packages/mysql-parser/composer.json
@@ -0,0 +1,30 @@
{
    "name": "wordpress/mysql-parser",
    "type": "library",
    "description": "A fast and complete MySQL parser with zero dependencies.",
    "license": "GPL-2.0-or-later",
    "require": {
        "php": ">=7.2"
    },
    "autoload": {
        "classmap": [
            "src/"
        ]
    },
    "require-dev": {
        "phpunit/phpunit": "^8.5"
    },
    "scripts": {
        "test": "phpunit",
        "build-grammar": [
            "./bin/build-grammar"
        ],
        "build-corpus": [
            "./bin/build-corpus"
        ],
        "benchmark": [
            "@php tests/benchmark.php",
            "@php -d opcache.enable_cli=1 -d opcache.jit_buffer_size=64M -d opcache.jit=tracing tests/benchmark.php"
        ]
    }
}