Config: add strict type validation for pyfallow TOML values #37

Closed
opened 2026-05-12 08:04:44 +02:00 by codex · 0 comments
Collaborator

Source

Triaged from DeepSeek v4 Pro audit via #35.

Problem

The config loader validates some semantic constraints, but _merge_dataclass() can assign TOML values of the wrong type directly into dataclass fields. As a result, malformed configuration can either fail later, during analysis, with a less useful error, or silently change behavior.

This is distinct from #8 / B5, which is about MCP Pydantic contract models and schema drift. This issue is about stdlib core config parsing in src/pyfallow/config.py.
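
To illustrate the failure mode described above, here is a minimal sketch. The issue does not show the body of _merge_dataclass(), so naive_merge below is a hypothetical stand-in for an unchecked merge, and the Config dataclass and its fields are assumptions chosen to mirror the examples in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class Config:  # hypothetical stand-in for the real config dataclass
    roots: list[str] = field(default_factory=lambda: ["."])
    include_tests: bool = False


def naive_merge(target, values):
    # Assumed shape of an unchecked merge: copy every known key as-is,
    # without comparing the value against the field's annotation.
    for key, value in values.items():
        if hasattr(target, key):
            setattr(target, key, value)
    return target


cfg = naive_merge(Config(), {"roots": "src", "include_tests": "no"})

# A string is iterable, so "src" is walked as three one-letter roots.
roots_seen = [root for root in cfg.roots]
# Any non-empty string is truthy, so "no" ends up enabling the option.
tests_enabled = bool(cfg.include_tests)
```

Neither assignment raises, which is how a wrongly typed value could go unnoticed until analysis, or change behavior without any error at all.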

Target behavior

Malformed .pyfallow.toml, .fallow.toml, or [tool.pyfallow] values should fail early with ConfigError naming the field and expected shape.

Examples that should be rejected before analysis starts:

  • roots = "src" where a list is expected
  • entry = [123] where list[str] is expected
  • include_tests = "yes" where bool is expected
  • numeric thresholds passed as strings
  • nested sections with unknown keys, if the current config contract does not allow them
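
One possible stdlib-only shape for these checks is sketched below. It is not the project's implementation: the Config and Limits dataclasses, validate_section, check_value, and the message format are all hypothetical, and the ConfigError class here is only a stand-in for the real one. The sketch walks the dataclass annotations and compares each TOML value against them.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import Any, get_args, get_origin, get_type_hints


class ConfigError(ValueError):
    """Stand-in for the project's ConfigError."""


@dataclass
class Limits:  # hypothetical nested section
    max_depth: int = 10


@dataclass
class Config:  # hypothetical top-level config
    roots: list[str] = field(default_factory=lambda: ["."])
    entry: list[str] = field(default_factory=list)
    include_tests: bool = False
    limits: Limits = field(default_factory=Limits)


def _type_name(tp: Any) -> str:
    # Generic aliases such as list[str] print well via str(); classes via __name__.
    return str(tp) if get_origin(tp) is not None else tp.__name__


def _mismatch(path: str, expected: Any, value: Any) -> ConfigError:
    return ConfigError(
        f"{path}: expected {_type_name(expected)}, got {type(value).__name__}"
    )


def check_value(path: str, expected: Any, value: Any) -> None:
    if get_origin(expected) is list:
        if not isinstance(value, list):
            raise _mismatch(path, expected, value)
        (item_type,) = get_args(expected)
        for index, item in enumerate(value):
            check_value(f"{path}[{index}]", item_type, item)
    elif dataclasses.is_dataclass(expected):
        if not isinstance(value, dict):
            raise _mismatch(path, expected, value)
        validate_section(expected, value, path)
    else:
        # bool is a subclass of int, so a bool only matches a bool field
        # and an int field never accepts True/False.
        bool_mixup = isinstance(value, bool) != (expected is bool)
        if bool_mixup or not isinstance(value, expected):
            raise _mismatch(path, expected, value)


def validate_section(cls: type, data: dict, prefix: str = "") -> None:
    hints = get_type_hints(cls)
    for key in sorted(data):  # sorted so the first reported error is stable
        path = f"{prefix}.{key}" if prefix else key
        if key not in hints:
            raise ConfigError(f"{path}: unknown key")
        check_value(path, hints[key], data[key])


def message_for(data: dict):
    """Return the ConfigError text for parsed TOML data, or None if it is valid."""
    try:
        validate_section(Config, data)
    except ConfigError as exc:
        return str(exc)
    return None
```

Under these assumptions, message_for({"roots": "src"}) reports the field path, the expected shape, and the actual type. The explicit bool handling matters because isinstance(True, int) is true in Python, so a plain isinstance check would let a boolean through as a numeric threshold.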

Acceptance criteria

  • Add tests that load invalid TOML and assert ConfigError with the offending field name.
  • Preserve all currently valid config fixtures and README examples.
  • Validation covers top-level fields and nested dataclass sections that are loaded through _merge_dataclass().
  • Error messages are deterministic and useful to an agent: expected type/shape, actual type, config path if available.
  • No runtime dependencies are added.

Out of scope

  • Moving config parsing to Pydantic or another dependency.
  • Redesigning the config schema.
  • Changing default analyzer behavior.

Reference
pdurlej/fallow-py#37