hoops_ai.ml.context_layer.CategoricalRule
- class hoops_ai.ml.context_layer.CategoricalRule(temperature=10.0, score_floor=0.0, min_confidence=0.0, min_margin=0.1, top1_dominance_threshold=1.0, top1_dominance_weight=0.85, min_top1_margin_for_dominance=0.08, normalize=None)
Bases: AggregationRule
Predict categorical context by exponential score-weighting with a consensus guard.
Combines three key ideas:
1. Exponential sharpening: top-scoring hits dominate the vote, making it safe to use large candidate pools (top-50+) without noise dilution from low-scoring neighbors.
2. Top-1 exact-match dominance: when the best hit is a near-duplicate of the query (score ≥ top1_dominance_threshold) and clearly beats the runner-up (margin ≥ min_top1_margin_for_dominance), the rule short-circuits the softmax and gives that hit a fixed weight (top1_dominance_weight), splitting the remainder across the tail. This handles the "query is already in the index" (self-hit) case deterministically instead of letting softmax smear the result.
3. Margin-based abstention: the rule refuses to predict when the winner's lead over the runner-up is too slim, avoiding unreliable coin-flip predictions in production pipelines.
    weight_i = exp(temperature × score_i)                 # softmax path
    weights = [w_top1, (1 - w_top1) * softmax(rest)]      # top-1 dominance path
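The two weighting paths can be sketched in plain Python as follows. This is an illustrative reconstruction from the formulas above, not the library's code: the function name `vote_weights` is hypothetical, the softmax weights are assumed to be normalized to sum to 1, the scores are assumed to be sorted in descending order, and a single-hit input is assumed to fall back to the softmax path.

```python
import math


def vote_weights(scores, temperature=10.0, top1_dominance_threshold=1.0,
                 top1_dominance_weight=0.85, min_top1_margin_for_dominance=0.08):
    """Hypothetical helper: per-hit weights for scores sorted best-first."""

    def softmax(xs):
        # Subtracting the peak keeps exp() from overflowing; the normalized
        # result is the same as exp(temperature * x) / sum(...).
        if not xs:
            return []
        peak = max(xs)
        exps = [math.exp(temperature * (x - peak)) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    if not scores:
        return []

    # Per the documentation, a threshold >= 1.0 disables dominance.
    dominance_enabled = top1_dominance_threshold < 1.0
    if (dominance_enabled
            and len(scores) > 1
            and scores[0] >= top1_dominance_threshold
            and scores[0] - scores[1] >= min_top1_margin_for_dominance):
        # Top-1 dominance path: fixed weight for the best hit,
        # remainder split across the tail by softmax.
        tail = softmax(scores[1:])
        return [top1_dominance_weight] + [(1.0 - top1_dominance_weight) * t for t in tail]

    # Softmax path.
    return softmax(scores)


softmax_path = vote_weights([0.90, 0.80, 0.50])
dominance_path = vote_weights([0.995, 0.80, 0.70], top1_dominance_threshold=0.99)
```

With dominance enabled, the best hit receives exactly `top1_dominance_weight` regardless of how the tail scores are distributed, which is what makes the self-hit case deterministic.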
A prediction is made only when:
- The winner's weighted share ≥ min_confidence
- The gap between winner and runner-up ≥ min_margin
Special cases:
- temperature=0 → uniform weighting (all hits equal)
- min_margin=0 → always predicts (no consensus guard)
- score_floor > 0 → discards distant hits before voting
- top1_dominance_threshold >= 1.0 (default) → dominance disabled, pure softmax
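The prediction conditions and special cases above can be illustrated with a minimal sketch of the softmax path only. The function name `categorical_vote` and the `(value, confidence)` return tuple are assumptions made for illustration; the real rule returns a ContextPrediction, whose fields are not described here.

```python
import math
from collections import defaultdict


def categorical_vote(values, scores, temperature=10.0, score_floor=0.0,
                     min_confidence=0.0, min_margin=0.1):
    """Hypothetical sketch: return (winner, share), or None to abstain."""
    # score_floor: hits below the floor are discarded before voting.
    kept = [(v, s) for v, s in zip(values, scores) if s >= score_floor]
    if not kept:
        return None

    # Exponential weighting (shifted by the peak for numerical stability).
    peak = max(s for _, s in kept)
    raw = [math.exp(temperature * (s - peak)) for _, s in kept]
    total = sum(raw)

    # Sum each category's normalized weight to get its vote share.
    shares = defaultdict(float)
    for (value, _), weight in zip(kept, raw):
        shares[value] += weight / total

    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    winner, confidence = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0

    # Consensus guard: abstain on low share or a tight race.
    if confidence < min_confidence or confidence - runner_up < min_margin:
        return None
    return winner, confidence


clear_win = categorical_vote(["steel", "steel", "aluminum"], [0.90, 0.80, 0.50])
tight_race = categorical_vote(["steel", "aluminum"], [0.80, 0.80])
no_guard = categorical_vote(["steel", "aluminum"], [0.80, 0.80], min_margin=0.0)
```

In this sketch, `tight_race` abstains because both categories hold an equal share, while `no_guard` still predicts because `min_margin=0` removes the consensus check.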
- Parameters:
temperature (float) – Controls exponential sharpness. Higher values focus more weight on top hits. Default 10.0.
score_floor (float) – Hits below this score are discarded entirely. Default 0.0.
min_confidence (float) – Minimum weighted vote share for a valid prediction. Predictions below this return None (abstain). Default 0.0.
min_margin (float) – Minimum confidence gap between winner and runner-up. If the race is tighter than this, returns None. Default 0.1.
top1_dominance_threshold (float) – Score at or above which the best hit is treated as a near-duplicate of the query and assigned top1_dominance_weight directly. Use 0.99 to enable. Default 1.0 (disabled).
top1_dominance_weight (float) – Weight assigned to the top-1 hit when dominance triggers. The remaining hits share 1 - top1_dominance_weight via softmax. Default 0.85.
min_top1_margin_for_dominance (float) – Minimum score gap between top-1 and top-2 for dominance to trigger. Prevents triggering when the second hit is also a near-duplicate. Default 0.08.
normalize (Callable[[Any], Any] | Mapping[str, Callable[[Any], Any]] | None) – Optional callable mapping raw values to a canonical form for grouping (e.g. "steel 1018" and "S235JR" → "steel"). The rule groups by normalize(value) and reports the most frequent raw form per group as the prediction's value. Default None (no normalization).
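The grouping behavior described for normalize can be sketched as below. Both `normalize_material` and `group_by_canonical` are hypothetical names written for this illustration; the sketch only mirrors the documented idea of grouping by the canonical form and reporting the most frequent raw form per group, and it does not cover the Mapping form of the parameter.

```python
from collections import Counter, defaultdict


def normalize_material(value):
    """Hypothetical normalizer: collapse steel grades into 'steel'."""
    text = str(value).strip().lower()
    if text.startswith("steel") or text in {"s235jr"}:
        return "steel"
    return text


def group_by_canonical(values, normalize=None):
    """Map each canonical group to its most frequent raw form."""
    groups = defaultdict(Counter)
    for raw in values:
        canonical = normalize(raw) if normalize is not None else raw
        groups[canonical][raw] += 1
    # most_common(1) returns the raw form with the highest count in the group.
    return {canonical: counts.most_common(1)[0][0]
            for canonical, counts in groups.items()}


neighbors = ["steel 1018", "S235JR", "steel 1018", "Aluminum 6061"]
representatives = group_by_canonical(neighbors, normalize=normalize_material)
```

Here the two steel spellings vote as one group, and the group is reported under the raw form that appears most often among the neighbors.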
- predict(values, scores, key)
Predict a context value from neighbor evidence.
- Parameters:
- Returns:
ContextPrediction or None if insufficient evidence.
- Return type:
ContextPrediction | None