diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
Lines changed: 1 addition & 0 deletions
diff --git a/tools/README.md b/tools/README.md
Lines changed: 11 additions & 0 deletions
@@ -5,6 +5,7 @@ This is a benchmark for evaluating coding agents on real-world Business Central
 - **Dataset**: Benchmark entries following SWE-Bench schema with BC-specific adjustments
 - **Python Package** (`src/bcbench/`): CLI tools, agent implementations, and validation utilities
 - **PowerShell Scripts** (`scripts/`): Environment setup and dataset verification using AL-GO/BCContainerHelper
+- **Tools** (`tools/`): Standalone scripts for downloading and analyzing GitHub Actions artifacts
 - **Agent Evaluations**: Focuses on GitHub Copilot CLI and Claude Code
 - **Experiments**: MCP Servers, custom instructions, custom agents, skills, etc. and their performance on the benchmark
 - **Notebooks** (`notebooks/`): Analysis and visualization of benchmark results
 
@@ -0,0 +1,11 @@
+# Tools
+
+Standalone scripts for downloading and analyzing GitHub Actions artifacts.
+
+## `altest/`
+
+Scripts for analyzing AL test results from BC-Bench GitHub Actions runs:
+
+- **`Get-WorkflowSummary.ps1`** — Fetches workflow run summaries from GitHub Actions, downloads run artifacts, and extracts JSONL result files (even from nested zips).
+- **`bcbench_analyze_artifacts.py`** — Extracts, collects, and summarizes test results from downloaded artifact zips or pre-extracted folders. Outputs failure rankings, error variations, and extracted test code.
+- **`group_errors_from_summary.py`** — Groups error messages from `errors_summary.csv` into high-level categories for easier triage.
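
The README mentions extracting JSONL result files "even from nested zips". As a rough illustration of that idea only, and not the actual logic of `Get-WorkflowSummary.ps1` or `bcbench_analyze_artifacts.py`, the sketch below walks a zip in memory and recurses into any zip it contains. The helper names (`iter_jsonl_records`, `_make_zip`), the depth limit, and the record fields (`test`, `passed`) are all hypothetical.

```python
import io
import json
import zipfile


def iter_jsonl_records(zip_bytes, max_depth=5):
    """Yield (member_path, record) for each JSONL line found in a zip,
    descending into zips nested inside it. Hypothetical helper."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            lower = name.lower()
            if lower.endswith(".zip") and max_depth > 0:
                # Recurse into the nested archive without writing it to disk.
                inner = archive.read(name)
                for inner_name, record in iter_jsonl_records(inner, max_depth - 1):
                    yield f"{name}/{inner_name}", record
            elif lower.endswith(".jsonl"):
                # utf-8-sig tolerates an optional byte order mark.
                text = archive.read(name).decode("utf-8-sig")
                for line in text.splitlines():
                    if line.strip():  # skip blank lines
                        yield name, json.loads(line)


def _make_zip(files):
    """Build an in-memory zip from a {name: bytes} mapping (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Demo data: the field names here are assumptions, not the BC-Bench schema.
inner_zip = _make_zip(
    {"results.jsonl": b'{"test": "A", "passed": true}\n{"test": "B", "passed": false}\n'}
)
outer_zip = _make_zip({"artifact.zip": inner_zip, "notes.txt": b"ignored"})
records = list(iter_jsonl_records(outer_zip))
```

Reading nested archives through `io.BytesIO` avoids temporary files, at the cost of holding each archive in memory; for very large artifacts, extracting to disk may be the better choice.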