# /compare-models - Compare ML Models
Compare multiple ML models to select the best performer.
## Steps
- Ask the user for the models to compare and the evaluation dataset
- Load all models and verify they accept the same input format
- Run inference with each model on the identical test dataset
- Calculate the same metrics for all models for fair comparison
- Create a side-by-side comparison table with all metrics
- Perform statistical significance testing between model pairs (McNemar, paired t-test)
- Compare inference performance: latency, throughput, memory footprint
- Calculate the cost-performance trade-off: accuracy vs compute cost
- Identify which model performs best on specific data subsets
- Evaluate robustness: test with noisy or adversarial inputs
- Create a recommendation based on the use-case priorities (accuracy vs. speed vs. cost)
- Generate a comparison report with tables, rankings, and the recommended model
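The significance-testing step above can be sketched with an exact McNemar test, which compares two classifiers on the *same* paired test set rather than relying on point estimates. This is a minimal stdlib-only sketch; the function name `mcnemar_exact` and the toy labels are illustrative, and in practice you would pass real model predictions (e.g. from `statsmodels.stats.contingency_tables.mcnemar`-style workflows).

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact McNemar test on paired predictions from two models.

    Returns (b, c, p_value) where b = cases only model A got right
    and c = cases only model B got right.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    k = min(b, c)
    # two-sided exact binomial test with p = 0.5 on the discordant pairs
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, p)

# toy example: two models on the same 10-sample test set
y   = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
m_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # 9/10 correct
m_b = [1, 0, 0, 1, 0, 0, 0, 1, 1, 1]  # 7/10 correct
b, c, p = mcnemar_exact(y, m_a, m_b)
print(b, c, round(p, 3))  # → 2 0 0.5
```

Note that with so few discordant pairs the test cannot reach significance (p = 0.5 here), which is exactly why the Rules below warn against reading too much into point estimates on small test sets.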
## Rules
- Use the exact same test data and preprocessing for all models
- Apply statistical significance tests; do not rely on point estimates alone
- Consider practical significance, not just statistical significance
- Include model size and inference cost in the comparison
- Test edge cases that differentiate the models
- Report the evaluation methodology for reproducibility
- Consider deployment constraints (model size, latency requirements) in recommendations
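To make the inference-cost and latency rules concrete, here is a minimal stdlib-only benchmarking sketch. The `benchmark` helper and the lambda stand-in model are hypothetical; in practice `predict` would be a real model's inference call, and repeats would be far higher.

```python
import statistics
import time

def benchmark(predict, inputs, warmup=3, repeats=30):
    """Measure per-call latency (median, p95) and derived throughput.

    `predict` is any callable taking one input; `inputs` is the test batch.
    """
    for x in inputs[:warmup]:          # warm caches / lazy init before timing
        predict(x)
    times = []
    for i in range(repeats):
        x = inputs[i % len(inputs)]
        t0 = time.perf_counter()
        predict(x)
        times.append(time.perf_counter() - t0)
    times.sort()
    median = statistics.median(times)
    p95 = times[int(0.95 * (len(times) - 1))]
    return {"median_s": median, "p95_s": p95, "throughput_per_s": 1.0 / median}

# stand-in "model": a trivial function; swap in model.predict in practice
stats = benchmark(lambda x: x * 2, list(range(8)))
print(sorted(stats))  # → ['median_s', 'p95_s', 'throughput_per_s']
```

Reporting median and p95 rather than the mean keeps the comparison robust to timing outliers, and running the same harness over every model satisfies the identical-methodology rule above.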