/evaluate-model - Evaluate ML Model
Evaluate machine learning model performance with comprehensive metrics.
Steps
- Ask the user for the model type: classification, regression, NLP, or generative
- Load the model and test dataset from the specified paths
- Run inference on the entire test dataset and collect predictions
- For classification models, calculate: accuracy, precision, recall, F1-score, AUC-ROC
- For regression models, calculate: MAE, MSE, RMSE, R-squared, MAPE
- For NLP models, calculate: BLEU, ROUGE, perplexity, exact match
- Generate a confusion matrix for classification tasks
- Identify the worst-performing classes or data segments
- Calculate calibration metrics such as expected calibration error (ECE)
- Run performance profiling: inference time per sample, memory usage, throughput
- Check for bias: evaluate performance across demographic subgroups if applicable
- Generate a comprehensive evaluation report with all metrics and visualizations
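The classification step above (accuracy, precision, recall, F1, plus the confusion matrix and macro averaging mentioned in the rules) can be sketched roughly as follows. All names here are illustrative; the command itself collects real predictions from the test set.

```python
def classification_report(y_true, y_pred, labels):
    """Confusion matrix plus per-class precision/recall/F1 and macro F1.

    `labels` is the full list of classes; `y_true`/`y_pred` are
    hypothetical inputs standing in for real test-set predictions.
    """
    # cm[t][p] counts samples whose true label is t and prediction is p
    cm = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1

    per_class = {}
    for c in labels:
        tp = cm[c][c]
        fp = sum(cm[t][c] for t in labels if t != c)
        fn = sum(cm[c][p] for p in labels if p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(cm[c][c] for c in labels) / len(y_true)
    # macro average: unweighted mean over classes (per the rules below,
    # report micro averages alongside this for multi-class tasks)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    return {"confusion": cm, "per_class": per_class,
            "accuracy": accuracy, "macro_f1": macro_f1}
```

The nested-dict confusion matrix doubles as the input for the worst-performing-class check: classes with the lowest per-class F1 are the segments to inspect first.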
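For the regression branch, the five listed metrics can be computed from the raw errors in one pass. A minimal sketch; the function name and the assumption that no true value is exactly zero (needed for MAPE) are ours, not the command's.

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, R-squared, and MAPE for a regression model.

    Assumes no element of y_true is zero (MAPE divides by it).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 = 1 - SS_res / SS_tot; undefined when y_true is constant
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot else float("nan")
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n * 100
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "mape": mape}
```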
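The calibration step can be sketched with the standard equal-width binning formulation of expected calibration error: bin predictions by confidence, then average |accuracy − mean confidence| weighted by bin size. The bin count of 10 is a common default we are assuming, not something the command specifies.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    `confidences` are the model's top-class probabilities in [0, 1];
    `correct` are booleans marking whether each prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # a confidence of exactly 1.0 falls into the last bin
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece
```

A perfectly calibrated model scores 0; a model that is 90% confident but right only half the time in that bin contributes 0.4 from that bin alone.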
Rules
- Use stratified sampling if the test set is imbalanced
- Report confidence intervals for all metrics when sample size allows
- Include both micro and macro averages for multi-class metrics
- Test on held-out data never seen during training
- Report inference latency percentiles (p50, p95, p99), not just averages
- Check for data leakage between train and test sets
- Include baseline comparisons (random, majority class, previous model version)
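The latency rule above can be implemented with nearest-rank percentiles over the collected per-sample timings. A sketch assuming timings are already gathered in milliseconds; timing the model itself is left to the caller.

```python
import math

def latency_percentiles(samples_ms, percentiles=(50, 95, 99)):
    """Nearest-rank percentiles of per-sample inference latency.

    Averages hide tail behavior; p95/p99 expose it.
    """
    ordered = sorted(samples_ms)
    n = len(ordered)
    out = {}
    for p in percentiles:
        # nearest-rank definition: ceil(p/100 * n), 1-indexed
        rank = max(1, math.ceil(p * n / 100))
        out[f"p{p}"] = ordered[rank - 1]
    return out
```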
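For the confidence-interval rule, one common approach (our choice here, not mandated by the command) is a percentile bootstrap over per-sample scores, e.g. 0/1 correctness for accuracy. The resample count, alpha, and fixed seed below are illustrative defaults.

```python
import random

def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean of per-sample scores.

    For accuracy, pass a list of 1.0 (correct) / 0.0 (incorrect).
    The seed is fixed only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return lo, hi
```

The same routine covers the baseline-comparison rule: compute intervals for the model, the majority-class baseline, and the previous version, and treat overlapping intervals as a sign the improvement may not be significant.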