Add counterfactual dataset #2119

Triggered via pull request May 1, 2026 10:29
Status Failure
Total duration 1m 49s
Artifacts 5

CI.yml

on: pull_request
select-category: 3s
lint-and-test: 16s
get-entries / get-entries: 20s
Matrix: mock-evaluation
summarize-results / Results: 37s

Annotations

5 errors and 4 warnings
lint-and-test
Process completed with exit code 1.
ruff (UP042): src/bcbench/analysis/family.py#L16
src/bcbench/analysis/family.py:16:7: UP042 Class FamilyType inherits from both `str` and `enum.Enum` help: Inherit from `enum.StrEnum`
ruff (ANN003): evaluator/counterfactual_scores.py#L17
evaluator/counterfactual_scores.py:17:43: ANN003 Missing type annotation for `**kwargs`
ruff (ANN003): evaluator/counterfactual_scores.py#L12
evaluator/counterfactual_scores.py:12:43: ANN003 Missing type annotation for `**kwargs`
ruff (ANN003): evaluator/counterfactual_scores.py#L7
evaluator/counterfactual_scores.py:7:43: ANN003 Missing type annotation for `**kwargs`
bcbench.results.base
Result for microsoftInternal__NAV-224668__cf-1 missing metrics: execution_time, llm_duration, turn_count, prompt_tokens, completion_tokens, tool_usage
bcbench.results.base
Creating result for microsoft__BCApps-4699__cf-1 with no agent metrics - performance data will be unavailable
bcbench.results.base
Result for microsoftInternal__NAV-203923__cf-1 missing metrics: execution_time, llm_duration, turn_count, prompt_tokens, completion_tokens, tool_usage
bcbench.results.base
Creating result for microsoftInternal__NAV-175765__cf-1 with no agent metrics - performance data will be unavailable

Artifacts

Produced during runtime
Name | Size | Digest
evaluation-summary (Expired) | 527 Bytes | sha256:7b1b89f71dc895a9dbbfe0ef0948fd66e7f509f2401b4db77805ed2ff0b2b912
microsoftInternal__NAV-175765__cf-1 (Expired) | 470 Bytes | sha256:09d8b808a0d816c8fd7bf3b9fdf687646e7b32c938ef9e6be7abec0ab2b0ea31
microsoftInternal__NAV-203923__cf-1 (Expired) | 527 Bytes | sha256:15999c12201789a89505f984fbd76ef679b2e6a6c8f732c006a807a14adebd60
microsoftInternal__NAV-224668__cf-1 (Expired) | 526 Bytes | sha256:c56425eac079902519a83ed427bb0aa120a7700b8af3c3713f9511c62d0ac154
microsoft__BCApps-4699__cf-1 (Expired) | 400 Bytes | sha256:dbd02ff2e1e5b006d4e9fee9be7fe0dbcbb7389ea374c7b2fd1fbfaab2d2b1a1