scout compare — Within-Corpus Subset Comparison

Partitions any single-project corpus into lexically defined subsets and emits a JSON summary plus a Markdown report covering up to four metrics. No merged corpus is required.

Spec: specs/003-merge-compare-charts/contracts/cli-compare.md.

Synopsis

scout compare <PROJECT> \
  --include "PATTERN" \
  [--include-regex] \
  [--exclude "PATTERN"] [--exclude-regex] \
  [--against rest|filter:COL=VAL|include:PATTERN] \
  [--filter COL=VAL] \
  [--group-by COL] \
  [--metrics sentiment,communities,topics,timeline] \
  [--case-sensitive] \
  [--max-excerpts N] \
  [--charts] \
  [--output-dir PATH] \
  [--quiet]

Options

| Token | Type | Required | Default | Notes |
|---|---|---|---|---|
| PROJECT | str | yes | | Existing project name. projects/<PROJECT>/cleaned/cleaned_data.parquet must exist. |
| --include PATTERN | str | yes | | Comma-separated keyword list (OR semantics) by default; treated as regex iff --include-regex. |
| --include-regex | flag | no | false | Treat --include as a regex. |
| --exclude PATTERN | str | no | | Same syntax as --include. Applied after inclusion (FR-011). |
| --exclude-regex | flag | no | false | Treat --exclude as a regex. |
| --against | str | no | rest | One of: rest, filter:COL=VAL, include:PATTERN. |
| --filter COL=VAL | str | no | | Pre-filter applied before inclusion logic. |
| --group-by COL | str | no | | Add an inclusion-group breakdown by this column. |
| --metrics | comma-sep | no | sentiment | Subset of {sentiment, communities, topics, timeline}. |
| --case-sensitive | flag | no | false | Default is case-insensitive. |
| --max-excerpts N | int | no | 5 | Per-group excerpt cap. 0 suppresses excerpts. |
| --charts | flag | no | false | Also write compare_charts.html (requires [viz]). |
| --output-dir PATH | str | no | projects/<PROJECT>/compare/<slug>/ | Slug = lowercased --include with non-alnum runs collapsed to _, truncated to 64 chars. |
| --quiet | flag | no | false | Suppress progress output. |
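The inclusion and exclusion rules in the table can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function names `compile_pattern` and `partition` are hypothetical, keywords are assumed to match as plain substrings (the real tool may use word boundaries), and records removed by `--exclude` are assumed to fall into the non-included group.

```python
import re
from typing import Iterable, List, Optional, Tuple


def compile_pattern(pattern: str, is_regex: bool, case_sensitive: bool = False) -> "re.Pattern[str]":
    """Turn an --include/--exclude value into a compiled regex (hypothetical helper)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    if is_regex:
        return re.compile(pattern, flags)
    # Keyword mode: comma-separated list with OR semantics.
    keywords = [k.strip() for k in pattern.split(",") if k.strip()]
    if not keywords:
        raise ValueError("pattern contains no keywords")
    # ASSUMPTION: substring match; the real matching rule is not documented here.
    return re.compile("|".join(re.escape(k) for k in keywords), flags)


def partition(
    texts: Iterable[str],
    include: str,
    include_regex: bool = False,
    exclude: Optional[str] = None,
    exclude_regex: bool = False,
    case_sensitive: bool = False,
) -> Tuple[List[str], List[str]]:
    """Split texts into (included, not included); exclusion runs after inclusion."""
    inc = compile_pattern(include, include_regex, case_sensitive)
    exc = compile_pattern(exclude, exclude_regex, case_sensitive) if exclude else None
    included: List[str] = []
    rest: List[str] = []
    for text in texts:
        hit = bool(inc.search(text))
        if hit and exc is not None and exc.search(text):
            hit = False  # ASSUMPTION: excluded records join the non-included group
        (included if hit else rest).append(text)
    return included, rest
```

For instance, `partition(docs, "chatbot,llm", exclude="demo")` keeps every text mentioning either keyword unless it also mentions "demo".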

Exit codes

| Code | Meaning |
|---|---|
| 0 | Success. |
| 1 | Project not found, or cleaned_data.parquet missing. |
| 2 | --against form not recognized. |
| 3 | Required column missing for a requested metric. |
| 4 | Inclusion pattern matches zero records (FR-018). |
| 5 | Comparison group resolves to zero records (FR-018). |
| 6 | Output directory exists and contains files. |
| 7 | Internal error during compute or write. |
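Exit code 2 covers an `--against` value that is none of the three documented forms. A minimal sketch of how such a value could be parsed follows; the `Against` type and `parse_against` function are hypothetical names, and the real CLI may validate more strictly (for example, checking that the column exists).

```python
from typing import NamedTuple, Optional


class Against(NamedTuple):
    kind: str                      # "rest", "filter", or "include"
    column: Optional[str] = None   # set only for filter:COL=VAL
    value: Optional[str] = None    # VAL for filter, PATTERN for include


def parse_against(spec: str) -> Against:
    """Parse rest | filter:COL=VAL | include:PATTERN; raise ValueError otherwise."""
    if spec == "rest":
        return Against("rest")
    if spec.startswith("filter:"):
        column, sep, value = spec[len("filter:"):].partition("=")
        if sep and column:
            return Against("filter", column, value)
    elif spec.startswith("include:"):
        pattern = spec[len("include:"):]
        if pattern:
            return Against("include", value=pattern)
    # A caller could map this error to exit code 2.
    raise ValueError(f"--against form not recognized: {spec!r}")
```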

Precedence when several conditions hold (verified by contract tests): exit 4 (zero-include) > exit 3 (metric column missing) > exit 5 (zero-against, explicit filter:/include: form). If the default --against=rest yields zero rows, that is treated as include-saturation rather than a zero-against failure; a missing metric column in that case surfaces as exit 3, not 5.
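The precedence rule can be read as an ordered series of checks. The sketch below encodes only the three codes named in that rule; `failure_exit_code` and its parameters are hypothetical, and it returns `None` when none of the three conditions applies, including the case where `--against rest` is empty and no column is missing, which this page does not specify further.

```python
from typing import Optional


def failure_exit_code(
    include_n: int,
    missing_metric_column: bool,
    against: str,
    against_n: int,
) -> Optional[int]:
    """Return 4, 3, or 5 following the documented precedence, else None."""
    if include_n == 0:
        return 4  # zero-include wins over everything else
    if missing_metric_column:
        return 3  # reported even when the comparison group is empty
    explicit = against.startswith(("filter:", "include:"))
    if explicit and against_n == 0:
        return 5  # zero-against applies to the explicit forms only
    return None   # not covered by the precedence rule (assumption: no error here)
```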

Output

Writes to <output-dir>/:

  • compare_summary.json — see compare-summary.schema.json.
  • compare_report.md — Markdown report with group sizes, per-metric tables, and representative excerpts.
  • compare_charts.html (only when --charts) — Plotly figures.

All files are written atomically: each is written to a .tmp file first and then renamed into place.
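The write-to-.tmp-then-rename pattern can be sketched as follows. This is a generic illustration, not the tool's code: `atomic_write_text` is a hypothetical helper, and the exact temporary file name the tool uses is an assumption.

```python
import os
from pathlib import Path


def atomic_write_text(path: Path, content: str) -> None:
    """Write content to <path>.tmp, then rename over <path> in one step."""
    tmp = path.with_name(path.name + ".tmp")  # ASSUMPTION: suffix appended to the full name
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(content)
        fh.flush()
        os.fsync(fh.fileno())  # push bytes to disk before the rename
    # os.replace overwrites the destination and is atomic when both
    # paths are on the same filesystem.
    os.replace(tmp, path)
```

Readers of the output directory then see either the old file or the complete new one, never a partially written file.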

Markdown report structure

# Comparison report — <PROJECT>

**Inclusion**: `<pattern>` (case <sensitive|insensitive>)
**Exclusion**: `<pattern or "—">`
**Comparison group**: `<against form>`
**Pre-filter**: `<COL=VAL or "—">`
**Group-by breakdown**: `<column or "—">`
**Metrics**: <comma-separated>

## Group sizes
| Group | n |

## Sentiment (if metric requested)
| Group | positive | neutral | negative | neg-rate |

## Top communities (if metric requested)
(per-group ranked list, top 10)

## Top topics (if metric requested)
(per-group ranked list, top 10)

## Timeline (if metric requested)
(per-group monthly buckets)

## Representative excerpts
### Group: Included (n=…)
1. *<community>* — <created_at> — sentiment: <label>
   > <text (truncated to 500 chars)>
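To make the excerpt layout concrete, here is a small sketch that renders the entries shown in the template, applying the 500-character truncation and the `--max-excerpts` cap. The record keys (`community`, `created_at`, `sentiment`, `text`) and both function names are assumptions for illustration, and taking the first N records stands in for whatever selection of "representative" excerpts the tool actually performs.

```python
from typing import Mapping, Sequence


def format_excerpt(index: int, record: Mapping[str, str], max_chars: int = 500) -> str:
    """Render one numbered excerpt in the report's two-line layout."""
    text = record["text"][:max_chars]  # hard cut; no ellipsis is added in this sketch
    header = (
        f"{index}. *{record['community']}* — {record['created_at']}"
        f" — sentiment: {record['sentiment']}"
    )
    return f"{header}\n   > {text}"


def render_excerpts(records: Sequence[Mapping[str, str]], max_excerpts: int = 5) -> str:
    """Render up to max_excerpts entries; 0 suppresses excerpts entirely."""
    if max_excerpts <= 0:
        return ""
    chosen = records[:max_excerpts]  # ASSUMPTION: first N, not a real ranking
    return "\n".join(format_excerpt(i, r) for i, r in enumerate(chosen, start=1))
```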

Examples

# Within-CS-baseline AI/bot mentions vs not
scout compare master-corpus-2026q2 \
  --filter "is_cs_baseline=true" \
  --include "chatbot,bot,chatgpt,gpt,ai,llm,claude,gemini,alexa,copilot" \
  --against rest \
  --metrics sentiment,communities

# Consent-violation phrases by autonomy type
scout compare master-corpus-2026q2 \
  --include "didn'?t authorize|without (?:my )?consent|never (?:asked|consented)" \
  --include-regex \
  --against rest \
  --group-by type \
  --metrics sentiment

# Two inclusion sets compared head-to-head
scout compare master-corpus-2026q2 \
  --include "agentic ai,autonomous agent" \
  --against 'include:traditional ecommerce|cs chatbot' \
  --metrics sentiment,timeline
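None of the examples above pass --output-dir, so each writes to the default projects/<PROJECT>/compare/<slug>/. The slug rule from the options table (lowercase, collapse non-alphanumeric runs to `_`, truncate to 64 characters) can be sketched as below. `slugify_include` is a hypothetical name, "alnum" is assumed to mean ASCII letters and digits, and leading or trailing underscores are left in place because the rule does not mention stripping them.

```python
import re


def slugify_include(pattern: str, max_len: int = 64) -> str:
    """Derive the default output-directory slug from an --include value."""
    lowered = pattern.lower()
    # ASSUMPTION: only ASCII a-z and 0-9 count as alphanumeric.
    collapsed = re.sub(r"[^a-z0-9]+", "_", lowered)
    return collapsed[:max_len]


print(slugify_include("agentic ai,autonomous agent"))
# agentic_ai_autonomous_agent
```

Under these assumptions the third example would write to projects/master-corpus-2026q2/compare/agentic_ai_autonomous_agent/.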

See also