
feat: auto-detect input format from file extension with -I override #164

Merged: vmvarela merged 2 commits into master from issue-158/auto-detect-input-format on Jun 13, 2026

Conversation

@vmvarela

Owner

Closes #158

What

When a file argument has a recognizable extension (.csv, .tsv, .json, .ndjson, .xml), the input format is auto-detected, so no -I flag is needed. The -I flag still works as an explicit override that applies to all files (e.g. when a TSV file has a .txt extension).

Changes

  • src/args.zig: Track whether -I/--input-format was explicitly set (input_format_explicit). When set, it takes precedence over extension-based detection for all file arguments. When not set, each file's extension is inspected via InputFormat.fromExtension(), falling back to CSV for unrecognized extensions.
  • build.zig: 8 new integration tests (157a–157h) covering auto-detection for JSON/NDJSON/XML, the -I override (short, long, and = syntax), and ambiguous extensions (.txt). Fixture test 14 now uses file auto-detection instead of piping through stdin with -I.
  • docs/sql-pipe.1.scd: Document auto-detection behavior, -I/-O options, and new examples.
  • README.md: Add JSON auto-detection example, -I override example, update options table and limitations section.
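
For reviewers who want the precedence rule in one place, here is a minimal sketch of the resolution described for src/args.zig. It is not the code from this PR: the enum members, the signature of `fromExtension` (taking a path and returning an optional), the case-sensitive matching, and the helper `resolveFormat` are all assumptions made for illustration. Only the names `InputFormat.fromExtension` and `input_format_explicit` come from the description above.

```zig
const std = @import("std");

// Hypothetical mirror of the PR's InputFormat; the real enum may differ.
const InputFormat = enum {
    csv,
    tsv,
    json,
    ndjson,
    xml,

    // Assumed shape: returns null for unrecognized extensions so the
    // caller decides the fallback. Matching here is case-sensitive.
    fn fromExtension(path: []const u8) ?InputFormat {
        // std.fs.path.extension keeps the leading dot, e.g. ".json".
        const ext = std.fs.path.extension(path);
        if (std.mem.eql(u8, ext, ".csv")) return .csv;
        if (std.mem.eql(u8, ext, ".tsv")) return .tsv;
        if (std.mem.eql(u8, ext, ".json")) return .json;
        if (std.mem.eql(u8, ext, ".ndjson")) return .ndjson;
        if (std.mem.eql(u8, ext, ".xml")) return .xml;
        return null;
    }
};

// Hypothetical helper: an explicit -I wins for every file; otherwise the
// extension decides, with CSV as the fallback for unrecognized extensions.
fn resolveFormat(
    path: []const u8,
    input_format: InputFormat,
    input_format_explicit: bool,
) InputFormat {
    if (input_format_explicit) return input_format;
    return InputFormat.fromExtension(path) orelse .csv;
}

test "extension detection with -I override" {
    // Auto-detected from the extension when -I is not given.
    try std.testing.expectEqual(InputFormat.json, resolveFormat("data.json", .csv, false));
    // Ambiguous extensions fall back to CSV.
    try std.testing.expectEqual(InputFormat.csv, resolveFormat("data.txt", .csv, false));
    // An explicit -I tsv overrides whatever the extension suggests.
    try std.testing.expectEqual(InputFormat.tsv, resolveFormat("data.txt", .tsv, true));
}
```

Returning an optional from the detection step keeps the "fall back to CSV" decision in one visible place, which is one way to make the override order easy to audit.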

Acceptance Criteria

  • File extensions .csv, .tsv, .json, .ndjson, .xml auto-set input format
  • -I flag still works as explicit override
  • Stdin input still defaults to CSV (no filename to inspect)
  • Ambiguous extensions (.txt, .dat) default to CSV
  • All existing tests pass
  • New tests cover auto-detection and override behavior

- Add input_format_explicit flag to track when -I is explicitly set
- When -I is set, it overrides file extension auto-detection for all files
- When -I is not set, auto-detect from .csv/.tsv/.json/.ndjson/.xml extensions
- Ambiguous extensions (.txt, .dat) default to CSV
- Stdin always uses the -I value, which defaults to CSV (no filename to inspect)
- Add 8 integration tests (157a-157h) covering auto-detection and override
- Update fixture test 14 to use file auto-detection instead of stdin + -I
- Document auto-detection and -I override in README, man page, and --help

Closes #158
@github-actions bot added the type:feature (New functionality) label on Jun 13, 2026
@vmvarela added the priority:medium (Should be done soon), size:xs (Trivial, less than 1 hour), and status:review (In code review or waiting for feedback) labels on Jun 13, 2026
…hecks

Compute effective_input_format from per-file auto-detection when a file
argument is present, and use it for:
- --columns, --validate, --sample mode dispatch (Issue A)
- --json-path validation (Issue B)
- --xml-root/--xml-row name validation (Issue C)

Previously these paths used the global input_format (default CSV or
explicit -I value), causing auto-detected .tsv/.json/.xml files to be
parsed as CSV in special modes and valid --json-path invocations to be
rejected.
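
The fix in this commit can be illustrated with a small sketch. It is an assumption-heavy reconstruction rather than the merged code: the `Args` struct, its `file` and `json_path` fields, the `detect` and `validate` helpers, the error name, and the rule that --json-path is accepted only for JSON input are all hypothetical. Only `effective_input_format`, `input_format`, and `input_format_explicit` are named in this PR.

```zig
const std = @import("std");

const InputFormat = enum { csv, tsv, json, ndjson, xml };

// Hypothetical stand-in for the parsed command line.
const Args = struct {
    input_format: InputFormat = .csv, // global default, or the -I value
    input_format_explicit: bool = false, // true when -I was given
    file: ?[]const u8 = null, // null means input comes from stdin
    json_path: ?[]const u8 = null, // value of --json-path, if any
};

// Assumed detection: map ".json" to .json and so on, null if unrecognized.
fn detect(path: []const u8) ?InputFormat {
    const ext = std.fs.path.extension(path);
    if (ext.len < 2) return null;
    return std.meta.stringToEnum(InputFormat, ext[1..]);
}

// The format that special modes and option checks should consult.
fn effectiveInputFormat(args: Args) InputFormat {
    if (args.input_format_explicit) return args.input_format;
    // Stdin has no filename to inspect, so the global value applies.
    const path = args.file orelse return args.input_format;
    return detect(path) orelse .csv;
}

// Assumed rule for illustration: --json-path requires JSON input.
fn validate(args: Args) error{JsonPathRequiresJson}!void {
    const fmt = effectiveInputFormat(args);
    if (args.json_path != null and fmt != .json) return error.JsonPathRequiresJson;
}

test "--json-path is checked against the effective format" {
    // Before the fix, a check against the global input_format (CSV by
    // default) would reject this invocation even though the file is JSON.
    try validate(.{ .file = "data.json", .json_path = "items" });
    try std.testing.expectError(
        error.JsonPathRequiresJson,
        validate(.{ .file = "data.csv", .json_path = "items" }),
    );
}
```

Computing the effective format once and passing it to every consumer is one way to keep mode dispatch and option validation from drifting apart again.
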
@vmvarela vmvarela merged commit a1c4fa7 into master Jun 13, 2026
4 checks passed
@vmvarela vmvarela deleted the issue-158/auto-detect-input-format branch June 13, 2026 08:20


Development

Successfully merging this pull request may close these issues.

Auto-detect input format from file extension
