# /compare-models - Compare ML Models
Compare multiple ML models to select the best performer.
## Steps
- Ask the user for the models to compare and the evaluation dataset
- Load all models and verify they accept the same input format
- Run inference with each model on the identical test dataset
- Calculate the same metrics for all models for fair comparison
- Create a side-by-side comparison table with all metrics
- Perform statistical significance testing between model pairs (McNemar, paired t-test)
- Compare inference performance: latency, throughput, memory footprint
- Calculate the cost-performance trade-off: accuracy vs compute cost
- Identify which model performs best on specific data subsets
- Evaluate robustness: test with noisy or adversarial inputs
- Create a recommendation based on the use-case priorities (accuracy vs. speed vs. cost)
- Generate a comparison report with tables, rankings, and the recommended model
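The significance-testing step above can be sketched with an exact McNemar test, which compares two classifiers on the *same* paired test set rather than relying on point estimates. This is a minimal stdlib-only sketch; the function name `mcnemar_exact` and the toy labels are illustrative, and in practice you would pass real model predictions (e.g. from `statsmodels.stats.contingency_tables.mcnemar`-style workflows).

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact McNemar test on paired predictions from two models.

    Returns (b, c, p_value) where b = cases only model A got right
    and c = cases only model B got right.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    k = min(b, c)
    # two-sided exact binomial test with p = 0.5 on the discordant pairs
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, p)

# toy example: two models on the same 10-sample test set
y   = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
m_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # 9/10 correct
m_b = [1, 0, 0, 1, 0, 0, 0, 1, 1, 1]  # 7/10 correct
b, c, p = mcnemar_exact(y, m_a, m_b)
print(b, c, round(p, 3))  # → 2 0 0.5
```

Note that with so few discordant pairs the test cannot reach significance (p = 0.5 here), which is exactly why the Rules below warn against reading too much into point estimates on small test sets.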
## Rules
- Use the exact same test data and preprocessing for all models
- Apply statistical significance tests; do not rely on point estimates alone
- Consider practical significance, not just statistical significance
- Include model size and inference cost in the comparison
- Test edge cases that differentiate the models
- Report the evaluation methodology for reproducibility
- Consider deployment constraints (model size, latency requirements) in recommendations
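To make the inference-cost and latency rules concrete, here is a minimal stdlib-only benchmarking sketch. The `benchmark` helper and the lambda stand-in model are hypothetical; in practice `predict` would be a real model's inference call, and repeats would be far higher.

```python
import statistics
import time

def benchmark(predict, inputs, warmup=3, repeats=30):
    """Measure per-call latency (median, p95) and derived throughput.

    `predict` is any callable taking one input; `inputs` is the test batch.
    """
    for x in inputs[:warmup]:          # warm caches / lazy init before timing
        predict(x)
    times = []
    for i in range(repeats):
        x = inputs[i % len(inputs)]
        t0 = time.perf_counter()
        predict(x)
        times.append(time.perf_counter() - t0)
    times.sort()
    median = statistics.median(times)
    p95 = times[int(0.95 * (len(times) - 1))]
    return {"median_s": median, "p95_s": p95, "throughput_per_s": 1.0 / median}

# stand-in "model": a trivial function; swap in model.predict in practice
stats = benchmark(lambda x: x * 2, list(range(8)))
print(sorted(stats))  # → ['median_s', 'p95_s', 'throughput_per_s']
```

Reporting median and p95 rather than the mean keeps the comparison robust to timing outliers, and running the same harness over every model satisfies the identical-methodology rule above.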