hoops_ai.ml.context_layer.CategoricalRule

class hoops_ai.ml.context_layer.CategoricalRule(temperature=10.0, score_floor=0.0, min_confidence=0.0, min_margin=0.1, top1_dominance_threshold=1.0, top1_dominance_weight=0.85, min_top1_margin_for_dominance=0.08, normalize=None)

Bases: AggregationRule

Predict categorical context by exponential score-weighting with consensus guard.

Combines three key ideas:

1. Exponential sharpening: top-scoring hits dominate the vote, making it safe to use large candidate pools (top-50+) without noise dilution from low-scoring neighbors.
2. Top-1 exact-match dominance: when the best hit is a near-duplicate of the query (score ≥ top1_dominance_threshold) and clearly beats the runner-up (margin ≥ min_top1_margin_for_dominance), the rule short-circuits the softmax. It gives that hit a fixed weight (top1_dominance_weight) and splits the remainder across the tail. This handles the self-hit case, where the query is already in the index, deterministically instead of letting the softmax smear the result.
3. Margin-based abstention: refuses to predict when the winner’s lead over the runner-up is too slim, avoiding unreliable coin-flip predictions in production pipelines.

weight_i = exp(temperature × score_i)               # softmax path
weights = [w_top1, (1 - w_top1) * softmax(rest)]    # top-1 dominance path

Here w_top1 is top1_dominance_weight, and rest is the scores of every hit after the best one.
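The following is a minimal, self-contained sketch of the two weighting paths. It assumes that scores arrive sorted best-first, as predict() documents, and that weights are normalized to sum to 1. The function name categorical_weights is hypothetical and is not part of hoops_ai. It re-implements the documented formulas for illustration and does not reproduce the library's code. Following this class's special cases, the sketch treats dominance as disabled whenever top1_dominance_threshold is 1.0 or higher. It also assumes dominance needs at least two hits, so that a margin exists.

```python
import math


def categorical_weights(
    scores: list[float],
    temperature: float = 10.0,
    top1_dominance_threshold: float = 1.0,
    top1_dominance_weight: float = 0.85,
    min_top1_margin_for_dominance: float = 0.08,
) -> list[float]:
    """Hypothetical sketch: normalized per-hit weights for best-first scores."""
    if not scores:
        return []

    def softmax(xs: list[float]) -> list[float]:
        # exp(temperature * score), shifted by the max score for numerical
        # stability; the shift cancels out when normalizing.
        top = max(xs)
        exps = [math.exp(temperature * (x - top)) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    # Top-1 dominance path (assumed to need a runner-up to measure the margin).
    dominance_enabled = top1_dominance_threshold < 1.0
    if (
        dominance_enabled
        and len(scores) >= 2
        and scores[0] >= top1_dominance_threshold
        and scores[0] - scores[1] >= min_top1_margin_for_dominance
    ):
        tail = softmax(scores[1:])
        return [top1_dominance_weight] + [(1.0 - top1_dominance_weight) * w for w in tail]

    # Softmax path: every hit weighted by exp(temperature * score).
    return softmax(scores)


scores = [0.995, 0.90, 0.88]
print(categorical_weights(scores))                                 # pure softmax
print(categorical_weights(scores, top1_dominance_threshold=0.99))  # first weight is 0.85
print(categorical_weights(scores, temperature=0.0))                # uniform weights
```

Subtracting the maximum score before exponentiating does not change the normalized result. It only keeps large temperature × score products from overflowing.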

A prediction is made only when:

- The winner’s weighted share ≥ min_confidence
- The gap between winner and runner-up ≥ min_margin
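The consensus guard can be sketched as a weighted vote followed by the two checks above. vote_with_abstention is a hypothetical helper and is not a hoops_ai function. It assumes per-hit weights have already been computed by either weighting path, and that context values are hashable.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Optional


def vote_with_abstention(
    values: list[Hashable],
    weights: list[float],
    min_confidence: float = 0.0,
    min_margin: float = 0.1,
) -> Optional[tuple[Hashable, float]]:
    """Hypothetical sketch: return (winner, share), or None to abstain."""
    totals: dict[Hashable, float] = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    if grand_total <= 0:
        return None

    # Rank values by weighted vote share; equal shares keep first-seen order.
    ranked = sorted(
        ((value, total / grand_total) for value, total in totals.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    winner, share = ranked[0]
    runner_up_share = ranked[1][1] if len(ranked) > 1 else 0.0

    if share < min_confidence:                # not enough support overall
        return None
    if share - runner_up_share < min_margin:  # race too close to call
        return None
    return winner, share


print(vote_with_abstention(["steel", "steel", "aluminum"], [0.5, 0.25, 0.25]))  # ('steel', 0.75)
print(vote_with_abstention(["steel", "aluminum"], [0.5, 0.5]))                  # None (tie)
print(vote_with_abstention(["steel", "aluminum"], [0.5, 0.5], min_margin=0.0))  # ('steel', 0.5)
```

The last call shows why min_margin=0 removes the consensus guard. A dead heat still produces a prediction, and only min_confidence can block it.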

Special cases:

- temperature=0 → uniform weighting (all hits equal)
- min_margin=0 → no consensus guard; always predicts unless min_confidence rejects the winner
- score_floor > 0 → discards distant hits before voting
- top1_dominance_threshold >= 1.0 (default) → dominance disabled, pure softmax

Parameters:

temperature (float) – Controls exponential sharpness. Higher values focus more weight on top hits. Default 10.0.
score_floor (float) – Hits below this score are discarded entirely. Default 0.0.
min_confidence (float) – Minimum weighted vote share for a valid prediction. Predictions below this return None (abstain). Default 0.0.
min_margin (float) – Minimum confidence gap between winner and runner-up. If the race is tighter than this, returns None. Default 0.1.
top1_dominance_threshold (float) – Score at/above which the best hit is treated as a near-duplicate of the query and assigned top1_dominance_weight directly. Use 0.99 to enable. Default 1.0 (disabled).
top1_dominance_weight (float) – Weight assigned to the top-1 hit when dominance triggers. The remaining hits share the leftover 1 - top1_dominance_weight via softmax. Default 0.85.
min_top1_margin_for_dominance (float) – Minimum score gap between the top-1 and top-2 hits for dominance to trigger. Prevents triggering when the second hit is also a near-duplicate. Default 0.08.
normalize (Callable[[Any], Any] | Mapping[str, Callable[[Any], Any]] | None) – Optional callable mapping raw values to a canonical form for grouping (e.g. "steel 1018" and "S235JR" → "steel"). The rule groups by normalize(value) and reports the most frequent raw form per group as the prediction’s value. Default None (no normalization).
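This sketch shows how grouping by normalize might combine with the weighted vote. It assumes each group's vote is the sum of its members' weights. It also assumes that ties between raw forms go to the first, best-ranked one encountered. group_by_normalized, FAMILY, and material_family are hypothetical names. The Mapping form of normalize is not modeled here.

```python
from collections import Counter, defaultdict
from typing import Any, Callable


def group_by_normalized(
    values: list[Any],
    weights: list[float],
    normalize: Callable[[Any], Any],
) -> dict[Any, tuple[Any, float]]:
    """Hypothetical sketch: canonical group -> (most frequent raw form, summed weight)."""
    group_weight: dict[Any, float] = defaultdict(float)
    raw_forms: dict[Any, Counter] = defaultdict(Counter)
    for value, weight in zip(values, weights):
        canonical = normalize(value)
        group_weight[canonical] += weight
        raw_forms[canonical][value] += 1
    # Counter.most_common orders equal counts by first appearance, so with
    # best-first input the higher-ranked raw form wins a tie.
    return {
        canonical: (raw_forms[canonical].most_common(1)[0][0], group_weight[canonical])
        for canonical in group_weight
    }


# Hypothetical lookup table standing in for a real material normalizer.
FAMILY = {"steel 1018": "steel", "S235JR": "steel", "AL 6061": "aluminum"}


def material_family(value: Any) -> Any:
    return FAMILY.get(value, value)


groups = group_by_normalized(
    ["S235JR", "steel 1018", "S235JR", "AL 6061"],
    [0.5, 0.25, 0.125, 0.125],
    material_family,
)
print(groups)  # {'steel': ('S235JR', 0.875), 'aluminum': ('AL 6061', 0.125)}
```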

predict(values, scores, key)

Predict a context value from neighbor evidence.

Parameters:

values (list[Any]) – The context values collected from hits for this key, ordered by relevance (best match first).
scores (list[float]) – Similarity scores corresponding 1-to-1 with values (higher = more similar).
key (str) – The context key being inferred.

Returns:

ContextPrediction or None if insufficient evidence.

Return type:

ContextPrediction | None

predict_with_context(values, scores, key, query_context=None, *, hits=None)

Predict with optional query context and per-hit metadata.

Default implementation ignores query_context and hits and delegates to predict(). Override in subclasses that benefit from one or both.
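The delegation contract can be illustrated with stand-in classes. BaseRuleSketch, QueryAwareRule, and Prediction are hypothetical and only mirror the documented signatures. In real code, the subclass would derive from an actual rule class such as CategoricalRule and return a ContextPrediction. The override shown here trusts a value the query already carries. This is an illustrative choice, not documented hoops_ai behavior.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Prediction:
    """Hypothetical stand-in for ContextPrediction."""
    value: Any
    confidence: float


class BaseRuleSketch:
    """Hypothetical stand-in that mirrors the documented default behavior."""

    def predict(self, values: list[Any], scores: list[float], key: str) -> Optional[Prediction]:
        # Placeholder logic: return the best-ranked neighbor's value.
        if not values:
            return None
        return Prediction(values[0], scores[0])

    def predict_with_context(self, values: list[Any], scores: list[float], key: str,
                             query_context: Optional[dict[str, Any]] = None,
                             *, hits: Optional[list[Any]] = None) -> Optional[Prediction]:
        # Default: ignore query_context and hits, delegate to predict().
        return self.predict(values, scores, key)


class QueryAwareRule(BaseRuleSketch):
    """Hypothetical override that reuses a value the query already carries."""

    def predict_with_context(self, values: list[Any], scores: list[float], key: str,
                             query_context: Optional[dict[str, Any]] = None,
                             *, hits: Optional[list[Any]] = None) -> Optional[Prediction]:
        if query_context is not None and key in query_context:
            return Prediction(query_context[key], 1.0)
        # Fall back to the default path; hits is keyword-only, so pass it by name.
        return super().predict_with_context(values, scores, key, query_context, hits=hits)


rule = QueryAwareRule()
print(rule.predict_with_context(["steel"], [0.9], "material", {"material": "brass"}))
# Prediction(value='brass', confidence=1.0)
print(rule.predict_with_context(["steel"], [0.9], "material"))
# Prediction(value='steel', confidence=0.9)
```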

Parameters:

values (list[Any])
scores (list[float])
key (str)
query_context (dict[str, Any] | None)
hits (list[VectorHit] | None)

Return type:

ContextPrediction | None
