
Social Scout

Transform social media text into actionable qualitative insights using AI-powered multi-agent analysis.

Python 3.11+ · License: MIT


What is Social Scout?

Social Scout is a command-line pipeline that automates qualitative research on social media text. It handles the full journey from raw data collection to a structured, citation-backed analysis report — without requiring any coding.

The pipeline has five independent steps, each persisting its output so you can re-run any stage without redoing earlier work:

scout collect   →  scout preprocess  →  scout model  →  scout analyze  →  scout visualize
 Apify scraper     Polars+GLiNER+       BERTopic (7)    CrewAI (5 agents)  30+ Plotly charts
                   Sentiment                                                + PNG export

Or run everything in one command:

scout run my-project --keywords "AI commerce" --all-techniques
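
The idea of each stage persisting its output, so that later stages can be re-run without repeating earlier ones, can be sketched as below. This is only an illustration under stated assumptions, not Social Scout's actual implementation: the `run_stage` helper, the JSON output format, and the file layout are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_stage(out_path: Path, compute, force: bool = False):
    """Return (result, ran). Reuse the saved output unless it is missing or force is set.

    Hypothetical helper: the real pipeline's storage format is not described here.
    """
    if out_path.exists() and not force:
        # Earlier work is already on disk, so skip the computation.
        return json.loads(out_path.read_text(encoding="utf-8")), False
    result = compute()
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(result), encoding="utf-8")
    return result, True


def demo():
    """Run one stage three times: fresh, cached, then forced."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "preprocess" / "clean.json"  # made-up path
        calls = []

        def compute():
            calls.append(1)  # count how often the expensive step really runs
            return {"rows": 3}

        _, ran_first = run_stage(out, compute)
        _, ran_second = run_stage(out, compute)
        _, ran_third = run_stage(out, compute, force=True)
        return ran_first, ran_second, ran_third, len(calls)


print(demo())
```

The second call finds the saved file and skips the computation; only a missing file or an explicit `force=True` triggers the work again.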

Key Features

  • Collect Reddit posts and comments via Apify's managed scraping infrastructure
  • Filter by keywords, subreddits, sort order, and time range
  • Output: newline-delimited JSON (raw_data.ndjson)
  • High-speed text cleaning with Polars (Rust-based DataFrame library)
  • Named entity recognition with GLiNER (brand names, products, people, locations)
  • Sentiment scoring with --sentiment — adds sentiment_label and sentiment_score columns using a Twitter-fine-tuned RoBERTa model
  • Configurable minimum text length and entity confidence thresholds
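
To make the newline-delimited JSON output and the minimum-text-length threshold concrete, here is a small sketch using only the Python standard library. Social Scout itself does this work with Polars; the field names `id` and `text` and the helper names are assumptions chosen for illustration, not the tool's real schema.

```python
import json


def iter_ndjson(lines):
    """Yield one parsed record per non-empty line (NDJSON: one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def filter_min_length(records, min_chars, text_field="text"):
    """Keep records whose text has at least min_chars characters after trimming."""
    return [
        r for r in records
        if len((r.get(text_field) or "").strip()) >= min_chars
    ]


# Made-up sample lines standing in for raw_data.ndjson.
sample = [
    '{"id": "a1", "text": "Agentic commerce could change how people shop."}',
    "",
    '{"id": "a2", "text": "lol"}',
    '{"id": "a3", "text": null}',
]

kept = filter_min_length(iter_ndjson(sample), min_chars=10)
print([r["id"] for r in kept])
```

With a real file, `iter_ndjson(open(path, encoding="utf-8"))` would stream the records line by line instead of loading everything at once.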

Seven BERTopic techniques available:

Technique        Use case
basic            Standard topic clustering
dynamic          Topic evolution over time
hierarchical     Macro/micro topic trees
class-based      Topics by metadata group
sentiment-topic  Sentiment distribution per topic
network          Topic co-occurrence patterns
zero-shot        Hypothesis validation
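
As an illustration of what "sentiment distribution per topic" means for the sentiment-topic technique, the sketch below computes the share of each sentiment label within each topic. It does not use BERTopic and is not the tool's implementation; the `topic` key is an assumed name, while `sentiment_label` is the column added by the preprocess step.

```python
from collections import Counter, defaultdict


def sentiment_by_topic(rows):
    """Return {topic: {label: share}} where the shares within each topic sum to 1."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["topic"]][row["sentiment_label"]] += 1
    return {
        topic: {label: n / sum(c.values()) for label, n in c.items()}
        for topic, c in counts.items()
    }


# Made-up rows: a topic id per document plus its sentiment label.
rows = [
    {"topic": 0, "sentiment_label": "positive"},
    {"topic": 0, "sentiment_label": "positive"},
    {"topic": 0, "sentiment_label": "negative"},
    {"topic": 0, "sentiment_label": "negative"},
    {"topic": 1, "sentiment_label": "neutral"},
]

dist = sentiment_by_topic(rows)
print(dist)
```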

Five CrewAI agent personas analyze and debate the findings:

Agent                  Role
Data Analyst           Quantitative patterns from topic model
Consumer Psychologist  Emotional and cognitive drivers
Strategy Advisor       Stakeholder implications
Critical Reviewer      Validates claims, flags speculation
Chief Theorist         Synthesizes a unified framework

Hallucination control: every finding must cite a source record. Uncited claims are blocked from the report.
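
A citation gate of this kind might look roughly like the following sketch. The `Finding` class and `split_by_citation` function are hypothetical, and the extra check that every cited ID exists in the collected data is an assumption that goes beyond what is stated above.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    claim: str
    cited_ids: list[str] = field(default_factory=list)


def split_by_citation(findings, known_ids):
    """Separate findings into (accepted, blocked).

    A finding is accepted only if it cites at least one record and every
    cited ID is a known source record (the second condition is assumed).
    """
    accepted, blocked = [], []
    for f in findings:
        if f.cited_ids and all(cid in known_ids for cid in f.cited_ids):
            accepted.append(f)
        else:
            blocked.append(f)
    return accepted, blocked


# Made-up source record IDs and findings.
known_ids = {"r1", "r2"}
findings = [
    Finding("Users distrust automated checkout.", ["r1"]),
    Finding("Adoption will double next year."),              # no citation
    Finding("Price tracking is the top use case.", ["r9"]),  # unknown record
]

accepted, blocked = split_by_citation(findings, known_ids)
print([f.claim for f in accepted])
print([f.claim for f in blocked])
```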

Use --report-language korean to produce all findings in Korean. Use --llm ensemble (default) to mix Claude and Gemini for diverse perspectives.

  • scout visualize <project> generates a standalone HTML dashboard with 30+ Plotly charts
  • Sections: Pipeline Overview · Collection · Preprocessing · Topic Modeling · Analysis · Sentiment & Perception · Cross-Stage
  • --sentiment flag (preprocess step) unlocks the Sentiment & Perception section: donut, heatmaps, controversy bar, time series, community heatmap, perception scatter
  • --export-png exports all charts as 300 DPI PNG files via kaleido
  • --interpret generates LLM-written section interpretations shown in the dashboard
  • --export-png --interpret writes visualization_report.md — section heading + blockquote + PNG links, ready for academic papers
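
The layout described for visualization_report.md (section heading, blockquote, PNG links) could be produced along these lines. This is a guess at the shape rather than the tool's real template: the heading level, the Markdown image syntax for the links, and the file paths are assumptions.

```python
def render_section(title, interpretation, png_paths):
    """Render one report section: heading, blockquoted interpretation, PNG links."""
    lines = [f"## {title}", ""]
    # Prefix every line of the interpretation so multi-line text stays quoted.
    lines += [f"> {line}" for line in interpretation.splitlines()]
    lines.append("")
    lines += [f"![{path}]({path})" for path in png_paths]
    return "\n".join(lines) + "\n"


section = render_section(
    "Topic Modeling",
    "Topic 3 dominates the discussion.",  # made-up interpretation text
    ["charts/topics.png"],                # made-up export path
)
print(section)
```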

5-Minute Quick Start

# Install (user)
uv tool install git+https://github.com/ecoinfoai/social-scout.git

# Configure credentials
mkdir -p ~/.config/social-scout
cp .env.example ~/.config/social-scout/.env
# → edit ~/.config/social-scout/.env with your API keys

# Create a project and run the pipeline
scout project create agentic-commerce
scout run agentic-commerce \
  --keywords "agentic commerce,AI shopping" \
  --communities technology,futurology \
  --all-techniques

# Read the report
cat projects/agentic-commerce/reports/report.md

See the Quick Start guide for a step-by-step walkthrough.


Documentation

  • Installation — system requirements, virtual environment, API keys
  • Quick Start — get running in 5 minutes
  • Tutorial — full walkthrough with a real research example
  • Configuration — environment variables, .env file, advanced settings

Project