hoops_ai.ml.context_layer.CategoricalRule

class hoops_ai.ml.context_layer.CategoricalRule(temperature=10.0, score_floor=0.0, min_confidence=0.0, min_margin=0.1, top1_dominance_threshold=1.0, top1_dominance_weight=0.85, min_top1_margin_for_dominance=0.08, normalize=None)

Bases: AggregationRule

Predict categorical context by exponential score-weighting with a consensus guard.

Combines three key ideas:

  1. Exponential sharpening: top-scoring hits dominate the vote, making it safe to use large candidate pools (top-50+) without noise dilution from low-scoring neighbors.

  2. Top-1 exact-match dominance: when the best hit is a near-duplicate of the query (score ≥ top1_dominance_threshold) and clearly beats the runner-up (margin ≥ min_top1_margin_for_dominance), the rule short-circuits the softmax and gives that hit a fixed weight (top1_dominance_weight), splitting the remainder across the tail. This handles the “the query is already in the index” / self-hit case deterministically instead of letting softmax smear the result.

  3. Margin-based abstention: refuses to predict when the winner’s lead over the runner-up is too slim, avoiding unreliable coin-flip predictions in production pipelines.

weight_i = exp(temperature × score_i)              # softmax path
weights = [w_top1, (1-w_top1) * softmax(rest)]     # top-1 dominance path
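The two weighting paths can be sketched in plain Python. This is an illustrative reimplementation, not the library's code: the helper names `softmax` and `sketch_weights` are hypothetical, scores are assumed to arrive sorted best-first, and the max-subtraction for numerical stability, the normalization to a sum of 1, and the single-hit shortcut are assumptions of this sketch.

```python
import math


def softmax(scores, temperature):
    """Normalized exp(temperature * score) weights (hypothetical helper)."""
    if not scores:
        return []
    peak = max(scores)  # subtracting the max leaves the ratios unchanged
    exps = [math.exp(temperature * (s - peak)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sketch_weights(scores, temperature=10.0, top1_dominance_threshold=1.0,
                   top1_dominance_weight=0.85,
                   min_top1_margin_for_dominance=0.08):
    """Return one weight per hit; scores are assumed sorted best-first."""
    if not scores:
        return []
    # Per the docs, a threshold >= 1.0 disables dominance (pure softmax).
    dominance_enabled = top1_dominance_threshold < 1.0
    if dominance_enabled and scores[0] >= top1_dominance_threshold:
        if len(scores) == 1:
            return [1.0]  # assumption: a lone hit takes the whole vote
        if scores[0] - scores[1] >= min_top1_margin_for_dominance:
            rest = softmax(scores[1:], temperature)
            tail = 1.0 - top1_dominance_weight
            return [top1_dominance_weight] + [tail * w for w in rest]
    return softmax(scores, temperature)
```

With `temperature=0` every exponent is zero, so all hits receive the same weight, which matches the uniform-weighting special case.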

A prediction is made only when:

  • The winner’s weighted share ≥ min_confidence
  • The gap between winner and runner-up ≥ min_margin
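The two acceptance conditions can be illustrated with a short sketch. The function `sketch_decide` is hypothetical and returns a plain tuple rather than a ContextPrediction; it assumes the gap is measured between the winner's and runner-up's normalized vote shares, as the min_margin description suggests.

```python
from collections import defaultdict


def sketch_decide(values, weights, min_confidence=0.0, min_margin=0.1):
    """Return (winner, share) or None to abstain (illustrative only)."""
    # Sum the weight behind each distinct value.
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    if grand_total <= 0.0:
        return None  # no evidence at all

    # Rank candidate values by their share of the total vote.
    ranked = sorted(
        ((total / grand_total, value) for value, total in totals.items()),
        key=lambda pair: pair[0],
        reverse=True,
    )
    winner_share, winner = ranked[0]
    runner_up_share = ranked[1][0] if len(ranked) > 1 else 0.0

    if winner_share < min_confidence:
        return None  # winner's share is too small
    if winner_share - runner_up_share < min_margin:
        return None  # race is too close to call
    return winner, winner_share
```

For example, two values with shares 0.52 and 0.48 are 0.04 apart, so the default `min_margin=0.1` leads to abstention, while `min_margin=0` would return the leader.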

Special cases:

  • temperature=0 → uniform weighting (all hits equal)
  • min_margin=0 → always predicts (no consensus guard)
  • score_floor > 0 → discards distant hits before voting
  • top1_dominance_threshold >= 1.0 (default) → dominance disabled, pure softmax

Parameters:
  • temperature (float) – Controls exponential sharpness. Higher values focus more weight on top hits. Default 10.0.

  • score_floor (float) – Hits below this score are discarded entirely. Default 0.0.

  • min_confidence (float) – Minimum weighted vote share for a valid prediction. Predictions below this return None (abstain). Default 0.0.

  • min_margin (float) – Minimum confidence gap between winner and runner-up. If the race is tighter than this, returns None. Default 0.1.

  • top1_dominance_threshold (float) – Score at/above which the best hit is treated as a near-duplicate of the query and assigned top1_dominance_weight directly. Use 0.99 to enable. Default 1.0 (disabled).

  • top1_dominance_weight (float) – Weight assigned to the top-1 hit when dominance triggers. The remaining hits share 1 - this via softmax. Default 0.85.

  • min_top1_margin_for_dominance (float) – Minimum score gap between top-1 and top-2 for dominance to trigger. Prevents triggering when the second hit is also near-duplicate. Default 0.08.

  • normalize (Callable[[Any], Any] | Mapping[str, Callable[[Any], Any]] | None) – Optional callable mapping raw values to a canonical form for grouping (e.g. "steel 1018" and "S235JR" → "steel"). The rule groups by normalize(value) and reports the most frequent raw form per group as the prediction’s value. Default None (no normalization).
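The grouping behavior described for normalize can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: `sketch_group` and the lowercase normalizer are hypothetical, and only the single-callable form of normalize is shown.

```python
from collections import Counter, defaultdict


def sketch_group(values, weights, normalize=None):
    """Group votes by canonical form; report the most frequent raw form."""
    canon = normalize if normalize is not None else (lambda v: v)
    totals = defaultdict(float)       # canonical form -> summed weight
    raw_forms = defaultdict(Counter)  # canonical form -> raw spelling counts
    for value, weight in zip(values, weights):
        group = canon(value)
        totals[group] += weight
        raw_forms[group][value] += 1
    # For each group, pair its most common raw spelling with its total weight.
    return {
        group: (raw_forms[group].most_common(1)[0][0], total)
        for group, total in totals.items()
    }


# Hypothetical normalizer: treat case and whitespace variants as one value.
groups = sketch_group(
    ["Steel", "steel", "steel", "Aluminum"],
    [0.4, 0.2, 0.2, 0.2],
    normalize=lambda v: v.strip().lower(),
)
```

Here the three steel spellings pool their weight under one canonical key, and the reported raw form is the spelling that occurs most often in that group.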

predict(values, scores, key)

Predict a context value from neighbor evidence.

Parameters:
  • values (list[Any]) – The context values collected from hits for this key, ordered by relevance (best match first).

  • scores (list[float]) – Similarity scores corresponding 1-to-1 with values (higher = more similar).

  • key (str) – The context key being inferred.

Returns:

ContextPrediction or None if insufficient evidence.

Return type:

ContextPrediction | None

predict_with_context(values, scores, key, query_context=None, *, hits=None)

Predict with optional query context and per-hit metadata.

Default implementation ignores query_context and hits and delegates to predict(). Override in subclasses that benefit from one or both.

Return type:

ContextPrediction | None