fix(nodes): tolerate doubled-brace JSON output from models like DeepSeek #1085

Merged
VinciGit00 merged 1 commit into ScrapeGraphAI:pre/beta from mjmirza:fix/tolerant-json-doubled-braces, Jun 11, 2026

@mjmirza mjmirza commented Jun 10, 2026

Summary

GenerateAnswerNode fails with OutputParserException: Invalid json output when the LLM returns its JSON wrapped in doubled braces, e.g. {{"content": "..."}}. This happens reliably with DeepSeek (deepseek/deepseek-chat) and intermittently with other less strict models.

Root cause

The schema-less format_instructions show the expected shape as:

{{"content": "your analysis here"}}

Those doubled braces are LangChain template escaping for a single literal { }. GPT-4o-class models follow the instruction and emit single braces, but some models (notably DeepSeek) copy the doubled braces verbatim into their answer, which is not valid JSON, so JsonOutputParser raises.
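The escaping described above can be illustrated with plain `str.format`, whose brace-doubling rule is the one LangChain's default f-string template format follows. This is a minimal stdlib sketch, not code from the PR: the template text and the `hint` variable are hypothetical, and `json.loads` stands in for JsonOutputParser.

```python
import json

# Hypothetical template text; doubled braces are str.format-style escapes.
template = 'Reply as JSON: {{"content": "{hint}"}}'

# Formatting collapses each doubled brace into a single literal brace.
rendered = template.format(hint="your analysis here")
print(rendered)

# A model that echoes the doubled braces verbatim produces invalid JSON.
echoed = '{{"content": "hello"}}'
try:
    json.loads(echoed)
    echoed_is_valid = True
except json.JSONDecodeError:
    echoed_is_valid = False

print("echoed output is valid JSON:", echoed_is_valid)
```

The second half shows why the verbatim copy fails: after the first `{`, a JSON parser expects a quoted key, not another `{`.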

Fix

Add TolerantJsonOutputParser, a small JsonOutputParser subclass that, only on the parse-failure path, retries once with a single layer of wrapping braces removed. Behaviour is unchanged for any model already returning valid JSON (the happy path goes straight through the parent parser). It is used in the schema-less branch of GenerateAnswerNode.

  • scrapegraphai/utils/output_parser.py — new TolerantJsonOutputParser + _strip_doubled_braces helper
  • scrapegraphai/nodes/generate_answer_node.py — use it in the non-schema branch
  • tests/utils/output_parser_test.py — 7 unit tests (clean JSON unchanged, doubled-brace recovery, whitespace, irrecoverable output still raises)
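The retry-on-failure idea can be sketched with the standard library alone. This is an illustrative stand-in, not the PR's code: the actual change subclasses LangChain's JsonOutputParser, whereas the sketch below uses `json.loads`, and the names `strip_doubled_braces` and `tolerant_json_loads` are hypothetical.

```python
import json
from typing import Any


def strip_doubled_braces(text: str) -> str:
    """Remove one layer of wrapping braces if the text looks like {{ ... }}."""
    stripped = text.strip()
    if stripped.startswith("{{") and stripped.endswith("}}"):
        return stripped[1:-1]
    return text


def tolerant_json_loads(text: str) -> Any:
    """Parse JSON; only on failure, retry once with one brace layer removed."""
    try:
        # Happy path: valid JSON goes straight through, untouched.
        return json.loads(text)
    except json.JSONDecodeError as first_error:
        candidate = strip_doubled_braces(text)
        if candidate == text:
            # Nothing to strip, so the original failure stands.
            raise
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            # Still irrecoverable: surface the original error.
            raise first_error from None


print(tolerant_json_loads('{{"content": "recovered"}}'))
```

Keeping the stripping on the failure path means nested objects that legitimately end in `}}` are never modified, because they parse on the first attempt.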

Reproduction (before this change)

from scrapegraphai.graphs import SmartScraperGraph
cfg = {"llm": {"api_key": "<deepseek-key>", "model": "deepseek/deepseek-chat"}, "headless": True}
SmartScraperGraph(prompt="Extract the heading", source="https://example.com", config=cfg).run()
# -> OutputParserException: Invalid json output: {{"content": "..."}}

After: returns the parsed dict as expected.

Testing

python -m pytest tests/utils/output_parser_test.py
# 7 passed

Suggested labels: bug, size:S

The default GenerateAnswerNode format_instructions show the expected shape as
{{"content": ...}} (LangChain's escaped braces). Strongly instruction-following
models emit single braces, but some models (notably DeepSeek) copy the doubled
braces verbatim, yielding {{"content": ...}} which JsonOutputParser rejects with
'Invalid json output'.

Add TolerantJsonOutputParser, a JsonOutputParser subclass that retries once with a
single layer of wrapping braces removed, only on the parse-failure path. Behaviour
is unchanged for any model already returning valid JSON. Use it in the schema-less
branch of GenerateAnswerNode. Adds unit tests.
@dosubot dosubot Bot added the size:M (this PR changes 30-99 lines, ignoring generated files) and bug (something isn't working) labels Jun 10, 2026
@dosubot dosubot Bot added the lgtm (this PR has been approved by a maintainer) label Jun 11, 2026
@VinciGit00 VinciGit00 merged commit aaa5d2c into ScrapeGraphAI:pre/beta Jun 11, 2026
1 of 2 checks passed
github-actions Bot pushed a commit that referenced this pull request Jun 11, 2026
## [2.2.0-beta.4](v2.2.0-beta.3...v2.2.0-beta.4) (2026-06-11)

### Bug Fixes

* **nodes:** tolerate doubled-brace JSON output from models like DeepSeek ([#1085](#1085)) ([aaa5d2c](aaa5d2c))
@github-actions

🎉 This PR is included in version 2.2.0-beta.4 🎉

The release is available on:

Your semantic-release bot 📦🚀

Labels

bug (something isn't working), lgtm (this PR has been approved by a maintainer), released on @dev, size:M (this PR changes 30-99 lines, ignoring generated files)
