scout compare — Within-Corpus Subset Comparison
Partitions any single-project corpus into lexically defined subsets and emits a JSON summary plus a Markdown report covering up to four metrics. No merged corpus is required.
Spec: specs/003-merge-compare-charts/contracts/cli-compare.md.
Synopsis
scout compare <PROJECT> \
--include "PATTERN" \
[--include-regex] \
[--exclude "PATTERN"] [--exclude-regex] \
[--against rest|filter:COL=VAL|include:PATTERN] \
[--filter COL=VAL] \
[--group-by COL] \
[--metrics sentiment,communities,topics,timeline] \
[--case-sensitive] \
[--max-excerpts N] \
[--charts] \
[--output-dir PATH] \
[--quiet]
Options
| Token | Type | Required | Default | Notes |
|---|---|---|---|---|
| `PROJECT` | str | yes | — | Existing project name. `projects/<PROJECT>/cleaned/cleaned_data.parquet` must exist. |
| `--include PATTERN` | str | yes | — | Comma-separated keyword list (OR semantics) by default; treated as regex iff `--include-regex`. |
| `--include-regex` | flag | no | false | Treat `--include` as a regex. |
| `--exclude PATTERN` | str | no | — | Same syntax as `--include`. Applied after inclusion (FR-011). |
| `--exclude-regex` | flag | no | false | Treat `--exclude` as a regex. |
| `--against` | str | no | `rest` | One of: `rest`, `filter:COL=VAL`, `include:PATTERN`. |
| `--filter COL=VAL` | str | no | — | Pre-filter applied before inclusion logic. |
| `--group-by COL` | str | no | — | Add an inclusion-group breakdown by this column. |
| `--metrics` | comma-sep | no | `sentiment` | Subset of {`sentiment`, `communities`, `topics`, `timeline`}. |
| `--case-sensitive` | flag | no | false | Default is case-insensitive. |
| `--max-excerpts N` | int | no | 5 | Per-group excerpt cap. 0 suppresses excerpts. |
| `--charts` | flag | no | false | Also write `compare_charts.html` (requires `[viz]`). |
| `--output-dir PATH` | str | no | `projects/<PROJECT>/compare/<slug>/` | Slug = lowercased `--include` with non-alnum runs collapsed to `_`, truncated to 64 chars. |
| `--quiet` | flag | no | false | Suppress progress output. |
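The keyword and slug rules in the Notes column can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function names `build_matcher` and `default_slug` are hypothetical, keyword mode is assumed to do substring matching (the page does not say whether word boundaries apply), "alnum" is assumed to mean ASCII letters and digits, and leading or trailing underscores are assumed to be kept.

```python
import re


def build_matcher(pattern: str, is_regex: bool = False,
                  case_sensitive: bool = False) -> re.Pattern:
    """Compile an --include/--exclude value into a single regex (hypothetical helper)."""
    # Matching is case-insensitive unless --case-sensitive is given.
    flags = 0 if case_sensitive else re.IGNORECASE
    if is_regex:
        # --include-regex / --exclude-regex: use the pattern verbatim.
        return re.compile(pattern, flags)
    # Keyword mode: split on commas, escape each keyword, OR them together.
    keywords = [kw.strip() for kw in pattern.split(",") if kw.strip()]
    if not keywords:
        raise ValueError("pattern contains no keywords")
    # ASSUMPTION: plain substring match, no word-boundary anchoring.
    return re.compile("|".join(re.escape(kw) for kw in keywords), flags)


def default_slug(include: str, max_len: int = 64) -> str:
    """Lowercase, collapse non-alphanumeric runs to '_', truncate (hypothetical helper)."""
    # ASSUMPTION: ASCII alphanumerics only; edge underscores are not stripped.
    return re.sub(r"[^a-z0-9]+", "_", include.lower())[:max_len]
```

Under these assumptions, `default_slug("agentic ai,autonomous agent")` yields `agentic_ai_autonomous_agent`, which would become the last path component of the default output directory.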
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success. |
| 1 | Project not found, or cleaned_data.parquet missing. |
| 2 | --against form not recognized. |
| 3 | Required column missing for a requested metric. |
| 4 | Inclusion pattern matches zero records (FR-018). |
| 5 | Comparison group resolves to zero records (FR-018). |
| 6 | Output directory exists and contains files. |
| 7 | Internal error during compute or write. |
Precedence (verified by contract tests): exit 4 (zero-include) > exit 3 (metric column missing) > exit 5 (zero-against, explicit filter:/include: form). Default --against=rest yielding zero rows is treated as include-saturation; a missing metric column there surfaces as exit 3, not 5.
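The ranking of exits 4, 3, and 5 can be sketched as a small decision function. This is an illustration of the stated precedence only, not the tool's code: the enum member names and `ranked_failure` are hypothetical, and the sketch deliberately does not decide what happens when the default `--against rest` resolves to zero rows without a missing metric column, since this page does not specify that case.

```python
from enum import IntEnum
from typing import Optional


class CompareExit(IntEnum):
    """Exit codes from the table above; member names are hypothetical."""
    SUCCESS = 0
    PROJECT_MISSING = 1
    BAD_AGAINST = 2
    METRIC_COLUMN_MISSING = 3
    ZERO_INCLUDE = 4
    ZERO_AGAINST = 5
    OUTPUT_DIR_NOT_EMPTY = 6
    INTERNAL_ERROR = 7


def ranked_failure(include_n: int, metric_column_missing: bool,
                   explicit_against_n: Optional[int]) -> Optional[CompareExit]:
    """Return the highest-precedence of exits 4, 3, 5, or None if none applies.

    explicit_against_n is the comparison-group size for the filter:/include:
    forms, and None for the default --against rest, which this ranking
    does not cover.
    """
    if include_n == 0:
        return CompareExit.ZERO_INCLUDE            # exit 4 wins over everything
    if metric_column_missing:
        return CompareExit.METRIC_COLUMN_MISSING   # exit 3 beats exit 5
    if explicit_against_n == 0:
        return CompareExit.ZERO_AGAINST            # exit 5, explicit forms only
    return None
```

For example, a run whose inclusion pattern matches nothing and that also lacks a metric column would report exit 4 under this ordering, not exit 3.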
Output
Writes to `<output-dir>/`:
- `compare_summary.json`: see `compare-summary.schema.json`.
- `compare_report.md`: Markdown report with group sizes, per-metric tables, and representative excerpts.
- `compare_charts.html` (only when `--charts`): Plotly figures.
All files are written atomically: each is written to a `.tmp` file first and then renamed into place.
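The write-to-`.tmp`-then-rename pattern can be sketched in Python as follows. This is a generic illustration of the technique, not the tool's own writer: `atomic_write_text` is a hypothetical name, the temporary file is assumed to sit next to the target with `.tmp` appended, and the `fsync` call is an extra durability step that the page does not mention.

```python
import os
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write text to path so readers never observe a partial file (hypothetical helper)."""
    # ASSUMPTION: temp file lives beside the target, named "<name>.tmp".
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())  # push bytes to disk before the rename
    # os.replace overwrites the destination and is atomic when both
    # paths are on the same filesystem.
    os.replace(tmp, path)
```

Keeping the temporary file in the same directory as the target matters, because a rename across filesystems is not atomic. The parent directory must already exist.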
Markdown report structure
# Comparison report — <PROJECT>
**Inclusion**: `<pattern>` (case <sensitive|insensitive>)
**Exclusion**: `<pattern or "—">`
**Comparison group**: `<against form>`
**Pre-filter**: `<COL=VAL or "—">`
**Group-by breakdown**: `<column or "—">`
**Metrics**: <comma-separated>
## Group sizes
| Group | n |
## Sentiment (if metric requested)
| Group | positive | neutral | negative | neg-rate |
## Top communities (if metric requested)
(per-group ranked list, top 10)
## Top topics (if metric requested)
(per-group ranked list, top 10)
## Timeline (if metric requested)
(per-group monthly buckets)
## Representative excerpts
### Group: Included (n=…)
1. *<community>* — <created_at> — sentiment: <label>
> <text (truncated to 500 chars)>
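As an illustration of the excerpt section, the following Python sketch renders one group's entries in the layout shown above. It is a sketch under stated assumptions: `render_excerpts` and the record keys (`community`, `created_at`, `sentiment`, `text`) are hypothetical, excerpts are simply the first N records (how the tool picks "representative" ones is not described here), and truncation is a plain 500-character slice with no ellipsis.

```python
def render_excerpts(group: str, records: list[dict],
                    max_excerpts: int = 5, max_chars: int = 500) -> str:
    """Render the excerpt block for one group (hypothetical helper)."""
    if max_excerpts == 0:
        # --max-excerpts 0 suppresses excerpts entirely.
        return ""
    lines = [f"### Group: {group} (n={len(records)})"]
    # ASSUMPTION: take the first N records; the real selection rule is unknown.
    for i, rec in enumerate(records[:max_excerpts], start=1):
        lines.append(
            f"{i}. *{rec['community']}* — {rec['created_at']} "
            f"— sentiment: {rec['sentiment']}"
        )
        # ASSUMPTION: hard cut at max_chars, no ellipsis appended.
        lines.append(f"> {rec['text'][:max_chars]}")
    return "\n".join(lines)
```

The group heading reports the full group size `n`, while the numbered entries stop at the `--max-excerpts` cap.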
Examples
# Within-CS-baseline AI/bot mentions vs not
scout compare master-corpus-2026q2 \
--filter "is_cs_baseline=true" \
--include "chatbot,bot,chatgpt,gpt,ai,llm,claude,gemini,alexa,copilot" \
--against rest \
--metrics sentiment,communities
# Consent-violation phrases by autonomy type
scout compare master-corpus-2026q2 \
--include "didn'?t authorize|without (?:my )?consent|never (?:asked|consented)" \
--include-regex \
--against rest \
--group-by type \
--metrics sentiment
# Two inclusion sets compared head-to-head
scout compare master-corpus-2026q2 \
--include "agentic ai,autonomous agent" \
--against 'include:traditional ecommerce|cs chatbot' \
--metrics sentiment,timeline
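When these invocations are driven from a script, it may be easier to build the argument list in Python and skip shell quoting altogether, which is convenient for regex patterns containing `|`, `?`, or parentheses. The sketch below is an assumption-laden wrapper, not part of the tool: `build_compare_argv` and `run_compare` are hypothetical names, and only flags documented on this page are emitted.

```python
import subprocess
from typing import Optional, Sequence


def build_compare_argv(project: str, include: str, *,
                       include_regex: bool = False,
                       filter_expr: Optional[str] = None,
                       against: str = "rest",
                       group_by: Optional[str] = None,
                       metrics: Sequence[str] = ("sentiment",)) -> list[str]:
    """Assemble a `scout compare` argument list (hypothetical helper)."""
    # Reject forms the CLI would refuse with exit 2.
    if not (against == "rest" or against.startswith(("filter:", "include:"))):
        raise ValueError(f"unrecognized --against form: {against!r}")
    argv = ["scout", "compare", project, "--include", include]
    if include_regex:
        argv.append("--include-regex")
    if filter_expr is not None:
        argv += ["--filter", filter_expr]
    argv += ["--against", against]
    if group_by is not None:
        argv += ["--group-by", group_by]
    argv += ["--metrics", ",".join(metrics)]
    return argv


def run_compare(argv: Sequence[str]) -> int:
    """Run the command without a shell and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode
```

Passing a list to `subprocess.run` hands each argument to the process unchanged, so the consent-violation regex from the second example needs no extra escaping. The returned integer can then be checked against the exit-code table.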
See also
- Multi-corpus merge: `scout merge`.
- User-defined charts: `scout visualize --extra-charts`.