Configuration Setup Guide

This guide explains the three configuration files used by FormA, what each one does, and exactly what to write in every field. It is written for first-time users who have never configured the tool before.

Overview: Three Files, Three Purposes

FormA uses three configuration files. Each lives in a different place and serves a different scope:

File	Scope	Location	Purpose
`config.json`	Machine-wide	`~/.config/formative-analysis/config.json`	API keys, SMTP credentials — secrets that never go into version control
`forma.yaml`	Per-semester	Project root (e.g., `anp2026_formative_analysis/forma.yaml`)	Course info, class sections, LLM settings, output paths — shared across all weeks
`week.yaml`	Per-week	Each week's directory (e.g., `week_01/week.yaml`)	Image directories, file patterns, exam questions — changes every week

How they relate:

~/.config/formative-analysis/config.json    (secrets — API keys, SMTP)
    |
    v
anp2026_formative_analysis/
    forma.yaml                               (semester settings)
    week_01/
        week.yaml                            (week 1 settings)
        scans_1A_w1/                         (scan images for class A)
        final_A.yaml                         (joined results)
        eval_A/                              (evaluation output)
    week_02/
        week.yaml                            (week 2 settings)
        ...

Priority order (highest to lowest): CLI flags > `week.yaml` > `forma.yaml` > `config.json` > built-in defaults. This means you can always override any setting by passing a CLI flag, without editing any file.
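
The priority order can be read as a layered lookup in which the first layer that defines a setting wins. A minimal Python sketch of that idea, assuming each source has already been loaded into a flat dictionary; `resolve_setting` and the sample values are hypothetical and are not FormA's actual implementation:

```python
from collections import ChainMap


def resolve_setting(key, cli_flags, week_cfg, forma_cfg, machine_cfg, defaults):
    """Return the value of `key` from the highest-priority layer that defines it."""
    # ChainMap searches its mappings left to right, so order encodes priority.
    layers = ChainMap(cli_flags, week_cfg, forma_cfg, machine_cfg, defaults)
    return layers[key]


# Hypothetical values, for illustration only.
defaults = {"n_calls": 3, "provider": "gemini"}
forma_cfg = {"provider": "anthropic"}
cli_flags = {"n_calls": 1}

print(resolve_setting("provider", cli_flags, {}, forma_cfg, {}, defaults))  # anthropic
print(resolve_setting("n_calls", cli_flags, {}, forma_cfg, {}, defaults))   # 1
```

The real files are nested by section, so an actual merge would have to apply the same rule per section. The flat dictionaries here only keep the sketch short.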

File 1: config.json (Secrets)

What it is

A JSON file that stores API keys and email server credentials. It lives in your home directory, not inside your project — this keeps secrets out of version control.

Where to put it

~/.config/formative-analysis/config.json

Create the directory if it does not exist:

mkdir -p ~/.config/formative-analysis

Then create `config.json` with a text editor. Set restrictive permissions so only you can read it:

chmod 600 ~/.config/formative-analysis/config.json
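
To confirm that the permissions took effect, the file mode can be inspected. A small Python sketch, assuming a POSIX system; the helper name `is_owner_only` is hypothetical and not part of FormA:

```python
import os
import stat


def is_owner_only(path):
    """Return True if neither group nor others have any permission on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

For example, `is_owner_only(os.path.expanduser("~/.config/formative-analysis/config.json"))` should return `True` after the `chmod 600` command above.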

What to write in it

The file has up to three sections. Include only the sections you need.

{
  "llm": {
    "provider": "gemini",
    "api_key": "AIzaSy..."
  },
  "smtp": {
    "server": "smtp.gmail.com",
    "port": 587,
    "sender_email": "professor@university.edu",
    "sender_name": "Prof. Kim",
    "use_tls": true,
    "send_interval_sec": 1.0
  },
  "naver_ocr": {
    "secret_key": "your-naver-secret-key",
    "api_url": "https://your-endpoint.apigw.ntruss.com/..."
  }
}

Field-by-field explanation

`llm` section — LLM API access

You need this section if you use `forma eval` (AI-powered evaluation) or `forma ocr scan` (LLM Vision OCR).

Field	What to write	Example
`provider`	Which LLM service to use. Write `"gemini"` for Google Gemini or `"anthropic"` for Anthropic Claude.	`"gemini"`
`api_key`	Your API key from the provider's console. For Gemini, get it from Google AI Studio. For Anthropic, get it from Anthropic Console. You can also set this as an environment variable (`GEMINI_API_KEY` or `ANTHROPIC_API_KEY`) instead of putting it here.	`"AIzaSy..."`

Tip: If you prefer not to store the API key in a file, set the environment variable instead and omit the `api_key` field entirely. FormA checks the environment variable automatically.
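
The fallback from file to environment variable might look roughly like the following Python sketch. The function name `resolve_api_key`, the choice to check the file before the environment, and the default provider are all assumptions made for illustration; this guide does not state which source FormA consults first.

```python
import os

# Environment variable names as given in the table above.
ENV_VARS = {"gemini": "GEMINI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def resolve_api_key(llm_section, environ=None):
    """Return the key from the `llm` section if present, else from the environment."""
    environ = os.environ if environ is None else environ
    provider = llm_section.get("provider", "gemini")  # default is an assumption
    key = llm_section.get("api_key") or environ.get(ENV_VARS[provider])
    if not key:
        raise KeyError(f"no API key for {provider}: set {ENV_VARS[provider]}")
    return key
```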

`smtp` section — Email delivery

You need this section only if you plan to email PDF reports to students using `forma deliver send`. Skip it entirely if you distribute reports by hand or through an LMS.

Field	What to write	Example
`server`	Your university's SMTP server hostname. Ask your IT department if unsure.	`"smtp.gmail.com"`
`port`	SMTP port number. `587` is standard for STARTTLS; `465` is used for implicit TLS (SSL).	`587`
`sender_email`	The "from" address students will see.	`"professor@university.edu"`
`sender_name`	Display name shown in the email.	`"Prof. Kim"`
`use_tls`	Whether to encrypt the connection. Almost always `true`.	`true`
`send_interval_sec`	Seconds to wait between emails (rate limiting). `1.0` is safe for most servers.	`1.0`

Important: The SMTP password is never stored in this file. When you run `forma deliver send`, provide it via the `FORMA_SMTP_PASSWORD` environment variable or `--password-from-stdin`.
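
The two password sources can be pictured with a short Python sketch. The function `read_smtp_password` is hypothetical, and how FormA behaves when both sources are supplied is not documented here, so the order below is an assumption:

```python
import os
import sys


def read_smtp_password(from_stdin=False, environ=None, stdin=None):
    """Return the SMTP password from stdin or from FORMA_SMTP_PASSWORD."""
    environ = os.environ if environ is None else environ
    stdin = sys.stdin if stdin is None else stdin
    if from_stdin:
        # Read one line and strip only the trailing line ending.
        return stdin.readline().rstrip("\r\n")
    password = environ.get("FORMA_SMTP_PASSWORD")
    if not password:
        raise RuntimeError("set FORMA_SMTP_PASSWORD or pass the password on stdin")
    return password
```

Either way, the password exists only in the process environment or on a pipe and never lands in `config.json`.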

`naver_ocr` section — Naver CLOVA OCR (optional)

You need this only if you use Naver CLOVA OCR for answer sheet scanning. Most users should skip this and use the default LLM Vision OCR instead (`forma ocr scan --provider gemini`).

Field	What to write	Example
`secret_key`	Naver CLOVA OCR secret key from Naver Cloud Console.	`"abc123..."`
`api_url`	Naver CLOVA OCR API endpoint URL. Must start with `https://`.	`"https://..."`

Minimal config.json (most users)

If you only use Gemini for evaluation and OCR, and you set the API key via environment variable, your `config.json` can be as simple as:

{
  "llm": {
    "provider": "gemini"
  }
}

It can even be an empty object `{}` if you rely entirely on environment variables and do not send emails.

File 2: forma.yaml (Semester Settings)

What it is

A YAML file that stores settings for your entire semester: course name, class sections, LLM preferences, and file path conventions. You create it once at the start of the semester and rarely change it afterward, apart from updating `current_week` each week.

Where to put it

Place it at the root of your project directory. FormA searches upward from your current working directory to find it, so it covers all subdirectories (week folders) automatically.

anp2026_formative_analysis/
    forma.yaml              <-- here
    week_01/
    week_02/
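
An upward search of this kind can be sketched in a few lines of Python. The helper `find_upward` is hypothetical and only illustrates the idea; FormA's own lookup may differ in detail:

```python
from pathlib import Path


def find_upward(filename, start=None):
    """Return the nearest `filename` in `start` or any parent directory, else None."""
    start = Path(start or Path.cwd()).resolve()
    # Check the starting directory first, then each ancestor up to the root.
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

Called as `find_upward("forma.yaml")` from inside `week_01/`, this would return the copy in the project root, which is why one `forma.yaml` can serve every week folder.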

How to create it

Either run the interactive wizard:

forma init

Or create it manually with a text editor. Below is a complete, annotated example.

Complete example with explanations

# forma.yaml -- semester-level settings
# Only fill in the fields you need. Everything has a sensible default.

project:
  course_name: "Human Anatomy and Physiology"  # appears in report headers
  year: 2026                                    # academic year (>= 2020)
  semester: 1                                   # 1 = spring, 2 = fall
  grade: 1                                      # student year (1 = freshman)

classes:
  identifiers: [A, B, C, D]       # your class section labels
  join_pattern: "final_{class}.yaml"   # {class} is replaced with A, B, C, D
  eval_pattern: "eval_{class}"         # directory pattern for evaluation output

paths:
  join_dir: ""                     # base directory for joined data files
  output_dir: ""                   # where PDF reports are saved
  longitudinal_store: ""           # path to longitudinal tracking file
  font_path: null                  # Korean font path (null = auto-detect)

ocr:
  ocr_model: null                  # LLM model for OCR (null = provider default)
  spreadsheet_url: ""              # Google Sheets URL for student responses
  num_questions: 2                 # number of answer areas per exam sheet

evaluation:
  provider: "gemini"               # "gemini" or "anthropic"
  model: null                      # scoring model (null = provider default)
  n_calls: 3                       # LLM calls per question (1-5, higher = more reliable)

reports:
  dpi: 150                         # chart resolution (72-600)

prediction:
  model_path: null                 # risk prediction model (.pkl file)

current_week: 2                    # update this each week
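
The comments in the example mention three numeric ranges (`year`, `n_calls`, `dpi`). A hedged Python sketch of how those ranges could be checked before running a command, assuming the file has already been parsed into a nested dictionary by a YAML library of your choice; `check_forma_ranges` is a hypothetical helper, not a FormA function:

```python
def check_forma_ranges(cfg):
    """Return a list of messages for values outside the documented ranges."""
    problems = []
    year = cfg.get("project", {}).get("year")
    if year is not None and year < 2020:
        problems.append(f"project.year must be >= 2020, got {year}")
    n_calls = cfg.get("evaluation", {}).get("n_calls")
    if n_calls is not None and not 1 <= n_calls <= 5:
        problems.append(f"evaluation.n_calls must be 1-5, got {n_calls}")
    dpi = cfg.get("reports", {}).get("dpi")
    if dpi is not None and not 72 <= dpi <= 600:
        problems.append(f"reports.dpi must be 72-600, got {dpi}")
    return problems
```

An empty list means every value that is present falls inside its range; missing fields are skipped because they fall back to defaults.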

Field-by-field explanation

`project` section — Course metadata

This information appears in PDF report headers and file naming. Fill it in once at the start of the semester.

Field	What to write	When to change
`course_name`	Full course name as you want it to appear on reports.	Never (once set)
`year`	Academic year. Must be 2020 or later.	Each academic year
`semester`	`1` for spring/first semester, `2` for fall/second semester.	Each semester
`grade`	Student year level. `1` = freshman, `2` = sophomore, etc.	Each year

`classes` section — Section configuration

Defines your class sections and the file naming patterns FormA uses to find data files for each section.

Field	What to write	Example
`identifiers`	List of class section labels. These must match the labels you use in `--class` flags and in `{class}` patterns.	`[A, B, C, D]`
`join_pattern`	Filename pattern for joined data files. Must contain `{class}`. FormA replaces `{class}` with each identifier.	`"final_{class}.yaml"` produces `final_A.yaml`, `final_B.yaml`, etc.
`eval_pattern`	Directory pattern for evaluation results. Must contain `{class}`.	`"eval_{class}"` produces `eval_A/`, `eval_B/`, etc.

When to set these: Fill in `identifiers` at the start of the semester. Fill in `join_pattern` and `eval_pattern` if you use batch commands (`forma report batch`). If you always specify paths explicitly on the command line, you can leave the patterns empty.

`paths` section — Directory paths

These paths are used by report and batch commands so you do not have to type them every time.

Field	What to write	Example
`join_dir`	Directory containing joined data files (the `final_*.yaml` files). Leave empty if you always specify it on the command line.	`"results/week_01"`
`output_dir`	Directory where PDF reports are saved.	`"reports/"`
`longitudinal_store`	Path to the YAML file that accumulates results across weeks. Create this file after your first evaluation; it grows over the semester.	`"store/longitudinal.yaml"`
`font_path`	Path to a `.ttf` Korean font file. Set to `null` and FormA will try to find one automatically. Only set this if auto-detection fails.	`null`

Tip: These paths are relative to the directory where you run the command, not relative to `forma.yaml`. If you always `cd` into your project root before running commands, relative paths work fine.

`ocr` section — OCR scanning settings

Controls how FormA processes scanned answer sheets.

Field	What to write	Example
`ocr_model`	LLM model ID for OCR text extraction. Set to `null` to use the provider's default model (`gemini-2.5-flash`). Set a specific model if you want to pin the version.	`null` or `"gemini-2.5-flash"`
`spreadsheet_url`	Google Sheets URL if students submit answers online. FormA reads responses from this sheet during `forma ocr join`. Leave empty if you only use paper scans.	`"https://docs.google.com/spreadsheets/d/abc..."`
`num_questions`	Number of answer areas per exam sheet. This tells the OCR pipeline how many text regions to extract from each scanned image.	`2`

`evaluation` section — LLM scoring settings

Controls the AI evaluation pipeline that scores student answers.

Field	What to write	Example
`provider`	LLM provider for scoring. `"gemini"` (Google, free tier available) or `"anthropic"` (Claude, paid).	`"gemini"`
`model`	Specific model name. Set to `null` to use the provider's default (recommended).	`null` or `"claude-sonnet-4-6"`
`n_calls`	How many times to call the LLM per question. Higher values improve reliability through median aggregation. `3` is the recommended default. Use `1` for quick testing.	`3`
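
Median aggregation is what makes repeated calls worthwhile: a single outlying score does not move the result. A minimal Python sketch, assuming each call yields one numeric score; `aggregate_scores` is a hypothetical name and the exact aggregation inside FormA is not specified in this guide:

```python
from statistics import median


def aggregate_scores(scores):
    """Combine repeated LLM scores for one question into a single value."""
    if not scores:
        raise ValueError("need at least one score")
    return median(scores)


print(aggregate_scores([2, 3, 9]))  # 3
```

With `n_calls: 3`, the outlier `9` above is ignored, whereas a mean would have been pulled upward by it.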

Note: `ocr.ocr_model` and `evaluation.model` are separate settings. The OCR model extracts text from images (needs a vision-capable model). The evaluation model scores written answers (needs a reasoning-capable model). You can use different models for each.

`reports` section — PDF output settings

Field	What to write	Example
`dpi`	Chart image resolution in dots per inch. Higher values produce sharper charts but larger files. `150` is a good balance.	`150`

`prediction` section — Risk prediction

Field	What to write	Example
`model_path`	Path to a trained risk prediction model (`.pkl` file). Set to `null` until you have trained a model with `forma train risk`.	`null` or `"models/risk.pkl"`

`current_week` — Top-level field

Field	What to write	Example
`current_week`	The current week number. Update this each week. Used by commands that need to know which week it is.	`2`

File 3: week.yaml (Per-Week Settings)

What it is

A YAML file that stores settings specific to one week's formative assessment: which exam questions were used, where the scanned images are, what the output files should be named. You create one `week.yaml` per week directory.

Where to put it

Inside each week's directory:

anp2026_formative_analysis/
    forma.yaml
    week_01/
        week.yaml           <-- week 1 settings
        scans_1A_w1/         <-- scanned images for class A
        scans_1B_w1/         <-- scanned images for class B
    week_02/
        week.yaml           <-- week 2 settings
        ...

FormA discovers `week.yaml` by searching upward from your current directory. If you `cd` into `week_01/` and run a command, FormA finds `week_01/week.yaml` automatically.

Complete example with explanations

# week.yaml -- settings for one week's formative assessment

week: 1                    # week number (required, must be >= 1)

select:
  source: "../exams/Ch01_FormativeTest.yaml"   # path to question bank
  questions: [1, 3]                             # which questions to use
  num_papers: 220                               # exam copies to print
  form_url: "https://docs.google.com/forms/d/e/.../viewform?usp=pp_url&entry.123={student_id}"

ocr:
  num_questions: 2                              # answer areas per sheet
  image_dir_pattern: "scans_1{class}_w1"        # image directory pattern
  ocr_output_pattern: "ocr_results_{class}.yaml"
  join_output_pattern: "final_{class}.yaml"
  join_forms_csv: "week1_ids.csv"               # CSV with student IDs
  student_id_column: "student_id"               # column name in the CSV

eval:
  config: "../exams/Ch01_FormativeTest.yaml"    # exam config (answer key)
  questions_used: [1, 3]                        # must match select.questions
  responses_pattern: "final_{class}.yaml"       # input: joined results
  output_pattern: "eval_{class}"                # output: evaluation directory
  skip_feedback: false                          # generate written feedback?
  skip_graph: true                              # skip knowledge graph comparison?
  generate_reports: true                        # auto-generate student PDFs?

Field-by-field explanation

`week` — Top-level required field

Field	What to write	Example
`week`	The week number for this assessment. Must be >= 1.	`1`

`select` section — Exam generation

These fields are used by `forma select` to generate printable exam PDFs.

Field	What to write	Example
`source`	Path to the FormativeTest YAML file containing your question bank. Relative to this `week.yaml` file's directory.	`"../exams/Ch01_FormativeTest.yaml"`
`questions`	List of question serial numbers to include in this week's exam. These are the `sn` fields from your question bank.	`[1, 3]`
`num_papers`	How many copies of the exam to generate (one per student plus extras).	`220`
`form_url`	Google Forms URL template. The `{student_id}` placeholder is replaced with each student's ID to create pre-filled links. Leave empty if you do not use Google Forms.	`"https://docs.google.com/forms/d/e/.../viewform?usp=pp_url&entry.123={student_id}"`
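
The `{student_id}` substitution in `form_url` can be illustrated with a short Python sketch. The helper `prefilled_link` and the sample URL are hypothetical, and URL-encoding the ID is an assumption about sensible behavior rather than a documented FormA detail:

```python
from urllib.parse import quote


def prefilled_link(form_url, student_id):
    """Substitute a URL-encoded student ID for the {student_id} placeholder."""
    # str.replace is used instead of str.format so that any other braces
    # in the URL are left untouched.
    return form_url.replace("{student_id}", quote(str(student_id), safe=""))


print(prefilled_link("https://example.com/form?entry.123={student_id}", "2026001"))
# https://example.com/form?entry.123=2026001
```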

`ocr` section — Scan processing

These fields tell `forma ocr scan` and `forma ocr join` where to find images and where to write results.

Field	What to write	Example
`num_questions`	Number of answer areas per scanned sheet. Must match the number of questions on the exam.	`2`
`image_dir_pattern`	Directory containing scanned images. `{class}` is replaced with the class label (A, B, ...).	`"scans_1{class}_w1"`
`ocr_output_pattern`	Where to save OCR results. `{class}` is replaced.	`"ocr_results_{class}.yaml"`
`join_output_pattern`	Where to save joined (OCR + Forms) results. `{class}` is replaced.	`"final_{class}.yaml"`
`join_forms_csv`	CSV file containing student IDs from Google Forms. Used during `forma ocr join` to match paper scans with online submissions. Leave empty if not applicable.	`"week1_ids.csv"`
`student_id_column`	Column name in the CSV that contains student IDs. This must match the exact header text in your CSV file.	`"student_id"`
`crop_coords`	Coordinates for cropping answer areas from scanned images. Do not fill this in manually. FormA populates it automatically the first time you run `forma ocr scan` and interactively select the crop region.	(auto-populated)
`review_threshold`	Confidence threshold for flagging low-quality OCR results (0.0-1.0). Results below this threshold are marked for manual review. Default is `0.75`.	`0.75`
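
The effect of `review_threshold` can be pictured as a simple filter. In this Python sketch the record layout (a `confidence` key per result) and the function `needs_review` are hypothetical; the real OCR output format is not described in this guide:

```python
def needs_review(results, threshold=0.75):
    """Return the results whose confidence falls strictly below `threshold`."""
    return [r for r in results if r["confidence"] < threshold]


sample = [
    {"student": "s1", "confidence": 0.92},
    {"student": "s2", "confidence": 0.61},
]
print(needs_review(sample))  # [{'student': 's2', 'confidence': 0.61}]
```

Raising the threshold sends more sheets to manual review; lowering it trusts the OCR output more.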

About `{class}` patterns: Every field that contains `{class}` is expanded once per class section. When you run `forma ocr scan --class A`, FormA reads `image_dir_pattern: "scans_1{class}_w1"` and opens `scans_1A_w1/`. When you run it with `--class B`, it opens `scans_1B_w1/`. This lets you use one `week.yaml` for all sections.
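
The expansion itself is plain string substitution. A minimal Python sketch, with `expand_pattern` as a hypothetical helper name:

```python
def expand_pattern(pattern, class_id):
    """Replace the {class} placeholder in `pattern` with one class label."""
    return pattern.replace("{class}", class_id)


for class_id in ["A", "B"]:
    print(expand_pattern("scans_1{class}_w1", class_id))
# scans_1A_w1
# scans_1B_w1
```

The sketch uses `str.replace` rather than `str.format` because `class` is a reserved word in Python and cannot be passed as a keyword argument.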

`eval` section — Evaluation pipeline

These fields tell `forma eval` how to score student responses.

Field	What to write	Example
`config`	Path to the exam configuration YAML (contains correct answers, concept tags, rubric). Usually the same as `select.source`.	`"../exams/Ch01_FormativeTest.yaml"`
`questions_used`	Which questions were actually used in this week's exam. Must match `select.questions`.	`[1, 3]`
`responses_pattern`	Input file pattern. `{class}` is replaced. This should point to the output of `forma ocr join`.	`"final_{class}.yaml"`
`output_pattern`	Output directory pattern. `{class}` is replaced. Evaluation results are written here.	`"eval_{class}"`
`skip_feedback`	Set to `true` to skip generating written feedback text. Useful for quick test runs.	`false`
`skip_graph`	Set to `true` to skip knowledge graph comparison (the triplet-based analysis). Saves time if you do not need it.	`true`
`generate_reports`	Set to `true` to automatically generate individual student PDF reports after evaluation.	`true`
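
Because `questions_used` must match `select.questions`, a quick consistency check can catch copy-paste slips before an evaluation run. A Python sketch, assuming `week.yaml` has been parsed into a nested dictionary and that the order of the question numbers does not matter; `question_mismatch` is a hypothetical helper:

```python
def question_mismatch(week_cfg):
    """Return (only_in_select, only_in_eval) question numbers, each sorted."""
    selected = set(week_cfg.get("select", {}).get("questions", []))
    used = set(week_cfg.get("eval", {}).get("questions_used", []))
    return sorted(selected - used), sorted(used - selected)


cfg = {"select": {"questions": [1, 3]}, "eval": {"questions_used": [1, 2]}}
print(question_mismatch(cfg))  # ([3], [2])
```

Two empty lists mean the sections agree.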

Putting It All Together: A Typical Semester Setup

Here is what a real project directory looks like after two weeks of assessments:

anp2026_formative_analysis/
    forma.yaml                          # semester settings (created once)
    exams/
        Ch01_FormativeTest.yaml         # question bank - week 1
        Ch03_FormativeTest.yaml         # question bank - week 2
    week_01/
        week.yaml                       # week 1 settings
        scans_1A_w1/                    # scanned images - class A
        scans_1B_w1/                    # scanned images - class B
        ocr_results_A.yaml              # OCR output
        final_A.yaml                    # joined results
        eval_A/                         # evaluation output
            res_lvl1/concept_results.yaml
            res_lvl2/llm_results.yaml
            ...
    week_02/
        week.yaml                       # week 2 settings
        ...

Step-by-step first-time setup

Create `config.json` with your API key:

mkdir -p ~/.config/formative-analysis
cat > ~/.config/formative-analysis/config.json << 'EOF'
{
  "llm": {
    "provider": "gemini",
    "api_key": "AIzaSy..."
  }
}
EOF
chmod 600 ~/.config/formative-analysis/config.json

Create `forma.yaml` in your project root:

```
cd anp2026_formative_analysis/
forma init
```

Answer the interactive prompts, then edit the generated file to fill in patterns.

Create `week.yaml` for your first week:

```
mkdir week_01 && cd week_01
```

Create `week.yaml` with a text editor, filling in the exam source, questions, and file patterns.

Run the pipeline:

forma ocr scan --class A          # scan answer sheets
forma ocr join --class A          # merge with Google Forms data
forma eval --class A              # AI-powered evaluation

Next week: Create `week_02/week.yaml` with updated paths and question numbers. Increment `current_week` in `forma.yaml`.

Quick Reference: Which File Controls What

Setting	Where to configure	Why
API keys	`config.json`	Secrets stay out of version control
SMTP server	`config.json`	Secrets stay out of version control
Course name, year	`forma.yaml`	Fixed for the semester
Class sections (A, B, C, D)	`forma.yaml`	Fixed for the semester
LLM provider and model	`forma.yaml`	Usually the same all semester
OCR model	`forma.yaml`	Usually the same all semester
Which questions this week	`week.yaml`	Changes every week
Scan image directories	`week.yaml`	Changes every week
Output file patterns	`week.yaml`	Changes every week
Google Forms CSV	`week.yaml`	Changes every week

Troubleshooting

"No config file found"

FormA cannot find `config.json`. Check that the file exists at `~/.config/formative-analysis/config.json` and is valid JSON. Run:

python3 -m json.tool ~/.config/formative-analysis/config.json

If this prints an error, your JSON syntax is broken (usually a missing comma or quote).

"week.yaml not found"

FormA searches upward from your current directory. Make sure you are inside (or below) the directory that contains `week.yaml`. Alternatively, specify the path explicitly:

forma ocr scan --class A --week-config path/to/week.yaml

"Unknown key in config.json: 'xxx'"

You have a section name in `config.json` that FormA does not recognize. Valid top-level keys are: `llm`, `smtp`, `naver_ocr`. Check for typos.
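
To find the offending key without running FormA, the file can be checked against the three valid names. A small Python sketch; `unknown_config_keys` is a hypothetical helper, not a FormA command:

```python
import json

VALID_TOP_LEVEL_KEYS = {"llm", "smtp", "naver_ocr"}


def unknown_config_keys(text):
    """Parse config.json text and return any unrecognized top-level keys."""
    data = json.loads(text)  # raises json.JSONDecodeError on broken syntax
    if not isinstance(data, dict):
        raise TypeError("config.json must contain a JSON object at the top level")
    return sorted(set(data) - VALID_TOP_LEVEL_KEYS)


print(unknown_config_keys('{"llm": {}, "stmp": {}}'))  # ['stmp']
```

Here the misspelled `stmp` is reported, which is the kind of typo that typically triggers this error.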

OCR results look wrong

This is almost always a scan quality issue, not a configuration issue. See Tips and Gotchas in the New Teachers Guide for scan quality requirements.
