hoops_ai.ml.context_layer
Quick Overview
Classes
AggregationRule() Base class for context-aggregation rules.
CategoricalRule([temperature, score_floor, ...]) Predict categorical context by exponential score-weighting with consensus guard.
ContextPrediction(value, confidence[, ...]) A single predicted context value with confidence.
ContextPredictor(context_provider[, ...]) Predicts engineering context (metadata) for a query part from its nearest neighbors.
InMemoryContextProvider([initial, numeric_keys])
NearestNeighborRule([threshold]) Use the top-similarity hit's value when shape similarity is high enough.
NumericWeightedRule([log_scale, ...]) Predict continuous numeric context (e.g. cost, weight, time) from neighbors.
RelevanceWeighter(factors) Adjusts hit scores based on metadata agreement with a query context.
JsonContextProvider(path[, initial, create, ...]) JSON-backed ContextProvider for lightweight persistent metadata.
Context Layer Module
Predicts engineering context (metadata) for query parts based on similarity search results.
ContextPredictor consumes a list of VectorHit objects from CADSearch
and predicts metadata values (e.g. material, category) by analyzing what
metadata is present in the nearest neighbors and how strong each match is.
Usage:
    from hoops_ai.ml.context_layer import (
        ContextPredictor, InMemoryContextProvider,
        CategoricalRule, NumericWeightedRule,
    )

    store = InMemoryContextProvider(
        {"part-a": {"Material": "Steel", "Cost": 12.5}},
        numeric_keys=["Cost"],
    )

    # Zero-config: defaults are CategoricalRule + NumericWeightedRule.
    # Dispatch is automatic via store.list_numeric_keys().
    predictor = ContextPredictor(context_provider=store)
    # hits: list of VectorHit objects returned by CADSearch.
    predictions = predictor.infer(hits, keys=["Material", "Cost"])
    # predictions["Material"] / predictions["Cost"] are ContextPrediction
    # objects with .value/.confidence.

    # Customise either default or override per-key
    # (keyword arguments follow the constructor signatures documented below):
    predictor = ContextPredictor(
        context_provider=store,
        default_categorical_rule=CategoricalRule(min_margin=0.2),
        default_numerical_rule=NumericWeightedRule(log_scale=True),
        per_key_rules={"Material": CategoricalRule(min_confidence=0.8)},
    )
- class hoops_ai.ml.context_layer.AggregationRule
Bases: ABC
Base class for context-aggregation rules.
A rule turns a list of (value, score) pairs harvested from a single context key into one ContextPrediction. Subclasses must implement predict(); predict_with_context() is optional and defaults to ignoring the extra arguments and delegating to predict().
Override predict_with_context() when the rule wants to consume the query’s own attributes (query_context) or per-hit metadata (hits). NumericWeightedRule does this to run an internal RelevanceWeighter and/or an MLP without forcing the predictor to know about either.
Override bind() to receive a one-shot reference to the ContextProvider at predictor construction. The default is a no-op, so most rules can ignore it.
- abstract predict(values, scores, key)
Predict a context value from neighbor evidence.
- Parameters:
- Returns:
ContextPrediction or None if insufficient evidence.
- Return type:
ContextPrediction | None
- class hoops_ai.ml.context_layer.CategoricalRule(temperature=10.0, score_floor=0.0, min_confidence=0.0, min_margin=0.1, top1_dominance_threshold=1.0, top1_dominance_weight=0.85, min_top1_margin_for_dominance=0.08, normalize=None)
Bases: AggregationRule
Predict categorical context by exponential score-weighting with consensus guard.
Combines three key ideas:
1. Exponential sharpening: top-scoring hits dominate the vote, making it safe to use large candidate pools (top-50+) without noise dilution from low-scoring neighbors.
2. Top-1 exact-match dominance: when the best hit is a near-duplicate of the query (score ≥ top1_dominance_threshold) and clearly beats the runner-up (margin ≥ min_top1_margin_for_dominance), the rule short-circuits the softmax and gives that hit a fixed weight (top1_dominance_weight), splitting the remainder across the tail. This handles the “the query is already in the index” / self-hit case deterministically instead of letting softmax smear the result.
3. Margin-based abstention: refuses to predict when the winner’s lead over the runner-up is too slim, avoiding unreliable coin-flip predictions in production pipelines.
    weight_i = exp(temperature × score_i)               # softmax path
    weights = [w_top1, (1-w_top1) * softmax(rest)]      # top-1 dominance path
A prediction is made only when:
- The winner’s weighted share ≥ min_confidence
- The gap between winner and runner-up ≥ min_margin
Special cases:
- temperature=0 → uniform weighting (all hits equal)
- min_margin=0 → always predicts (no consensus guard)
- score_floor > 0 → discards distant hits before voting
- top1_dominance_threshold >= 1.0 (default) → dominance disabled, pure softmax
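The voting scheme described above can be sketched in plain Python. This is only an illustration of the documented formulas, not the library's implementation: the function names `softmax_weights` and `categorical_vote` are hypothetical, the return shape `(value, share)` is a simplification of ContextPrediction, and treating a threshold of 1.0 or more as "disabled" is an assumption taken from the special-cases list.

```python
import math
from collections import defaultdict


def softmax_weights(scores, temperature):
    """Return exp(temperature * s) normalised to sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    raw = [math.exp(temperature * (s - peak)) for s in scores]
    total = sum(raw)
    return [w / total for w in raw]


def categorical_vote(values, scores, temperature=10.0, score_floor=0.0,
                     min_confidence=0.0, min_margin=0.1,
                     top1_dominance_threshold=1.0, top1_dominance_weight=0.85,
                     min_top1_margin_for_dominance=0.08):
    """Illustrative vote: returns (value, share) or None to abstain."""
    # Drop hits below the floor, then rank best-first.
    pairs = sorted(((s, v) for v, s in zip(values, scores) if s >= score_floor),
                   key=lambda p: p[0], reverse=True)
    if not pairs:
        return None
    ranked = [s for s, _ in pairs]

    dominant = (
        top1_dominance_threshold < 1.0  # assumption: >= 1.0 means disabled
        and len(ranked) > 1
        and ranked[0] >= top1_dominance_threshold
        and ranked[0] - ranked[1] >= min_top1_margin_for_dominance
    )
    if dominant:
        # Fixed weight for the top hit, softmax over the tail.
        tail = softmax_weights(ranked[1:], temperature)
        weights = [top1_dominance_weight] + [(1 - top1_dominance_weight) * w for w in tail]
    else:
        weights = softmax_weights(ranked, temperature)

    # Sum the weights per distinct value to get vote shares.
    shares = defaultdict(float)
    for (_, value), weight in zip(pairs, weights):
        shares[value] += weight
    ordered = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    winner, top_share = ordered[0]
    runner_up = ordered[1][1] if len(ordered) > 1 else 0.0
    if top_share < min_confidence or top_share - runner_up < min_margin:
        return None  # consensus guard: abstain
    return winner, top_share
```

With the default temperature, two strong "Steel" hits outvote a weaker "Aluminum" hit almost completely, while two equally scored hits with different values trigger the margin guard and the sketch abstains.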
- Parameters:
temperature (float) – Controls exponential sharpness. Higher values focus more weight on top hits. Default 10.0.
score_floor (float) – Hits below this score are discarded entirely. Default 0.0.
min_confidence (float) – Minimum weighted vote share for a valid prediction. Predictions below this return None (abstain). Default 0.0.
min_margin (float) – Minimum confidence gap between winner and runner-up. If the race is tighter than this, returns None. Default 0.1.
top1_dominance_threshold (float) – Score at/above which the best hit is treated as a near-duplicate of the query and assigned top1_dominance_weight directly. Use 0.99 to enable. Default 1.0 (disabled).
top1_dominance_weight (float) – Weight assigned to the top-1 hit when dominance triggers. The remaining hits share 1 - top1_dominance_weight via softmax. Default 0.85.
min_top1_margin_for_dominance (float) – Minimum score gap between top-1 and top-2 for dominance to trigger. Prevents triggering when the second hit is also near-duplicate. Default 0.08.
normalize (Callable[[Any], Any] | Mapping[str, Callable[[Any], Any]] | None) – Optional callable mapping raw values to a canonical form for grouping (e.g. "steel 1018" and "S235JR" → "steel"). The rule groups by normalize(value) and reports the most frequent raw form per group as the prediction’s value. Default None (no normalization).
- predict(values, scores, key)
Predict a context value from neighbor evidence.
- Parameters:
- Returns:
ContextPrediction or None if insufficient evidence.
- Return type:
ContextPrediction | None
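The grouping behaviour of the normalize parameter of CategoricalRule (group by canonical form, report the most frequent raw form) can be illustrated as follows. This is a sketch under assumptions: `group_by_canonical` and `to_material_family` are hypothetical helpers written for this example, and the toy normaliser is not a real material taxonomy.

```python
from collections import Counter, defaultdict


def group_by_canonical(values, normalize):
    """Map each canonical form to the most frequent raw form in its group."""
    groups = defaultdict(Counter)
    for raw in values:
        groups[normalize(raw)][raw] += 1
    return {canon: counts.most_common(1)[0][0] for canon, counts in groups.items()}


def to_material_family(raw):
    """Hypothetical normaliser, for illustration only."""
    text = raw.lower()
    return "steel" if "steel" in text or text.startswith("s235") else text


representatives = group_by_canonical(
    ["Steel 1018", "Steel 1018", "S235JR", "Aluminum 6061"],
    to_material_family,
)
```

Here the three steel-like values fall into one group and "Steel 1018", the most frequent raw spelling, is the value that would be reported for it.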
- class hoops_ai.ml.context_layer.ContextPrediction(value, confidence, evidence_count=0, alternatives=<factory>, status=None, reasons=<factory>, coverage=None, evidence=None, injected_context=None)
Bases: object
A single predicted context value with confidence.
- Parameters:
- value
The inferred value for the context key.
- Type:
Any
- status
Optional decision label: one of ready_to_propose, needs_review, or insufficient_evidence. None when the caller has not requested status evaluation.
- Type:
str | None
- reasons
Optional human-readable reasons explaining the status (empty when status is ready_to_propose or None).
- coverage
Optional coverage diagnostics about the hit pool that produced this prediction. Keys include observed_hits, total_hits, observed_score_weight, missing_score_weight. Successful predictions from ContextPredictor.infer include coverage; insufficient sentinels include coverage when status/evidence output was requested.
- evidence
Optional per-hit contribution records (rank, score, value, source). Populated only when return_evidence=True is passed to ContextPredictor.infer.
- injected_context
Optional snapshot of the query_context the ContextPredictor actually fed to the rule for this key. Populated for numeric keys when ContextPredictor.infer forwards earlier ready_to_propose categorical predictions (or the caller’s query_context) so downstream rules can re-rank neighbors. None when no context was injected.
- value: Any
- class hoops_ai.ml.context_layer.ContextPredictor(context_provider, default_categorical_rule=None, default_numerical_rule=None, per_key_rules=None)
Bases: object
Predicts engineering context (metadata) for a query part from its nearest neighbors.
Given a list of VectorHit objects (from CADSearch.search_by_shape) and a set of metadata keys to infer, this class fetches per-hit metadata from a ContextProvider, extracts evidence per key, and dispatches each key to an AggregationRule.
Dispatch precedence:
1. per_key_rules[key] if present.
2. default_numerical_rule if key in context_provider.list_numeric_keys().
3. default_categorical_rule otherwise.
Defaults are constructed lazily: CategoricalRule() and NumericWeightedRule() if you do not pass them.
- Parameters:
context_provider (ContextProvider) – A ContextProvider subclass providing the metadata for indexed parts. Must declare its numeric keys via ContextProvider.list_numeric_keys() if the default_numerical_rule path should fire.
default_categorical_rule (AggregationRule | None) – Aggregation rule for non-numeric keys. Defaults to a fresh CategoricalRule.
default_numerical_rule (AggregationRule | None) – Aggregation rule for numeric keys (as declared by context_provider.list_numeric_keys()). Defaults to a fresh NumericWeightedRule with auto-fitted RelevanceWeighter.
per_key_rules (Mapping[str, AggregationRule] | None) – Optional mapping {key: AggregationRule} that overrides both defaults on a per-key basis.
Example
    from hoops_ai.ml.context_layer import (
        ContextPredictor, InMemoryContextProvider, CategoricalRule, NumericWeightedRule,
    )

    store = InMemoryContextProvider(
        {"part-a": {"Material": "Steel", "Cost": 12.5}},
        numeric_keys=["Cost"],
    )
    predictor = ContextPredictor(context_provider=store)
    # hits: VectorHit objects from CADSearch.search_by_shape.
    predictions = predictor.infer(hits, keys=["Material", "Cost"])
    # "Material" uses CategoricalRule (default), "Cost" uses NumericWeightedRule (default).
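The dispatch precedence documented for ContextPredictor amounts to a three-step lookup. The sketch below shows that ordering with plain strings standing in for rule objects; `select_rule` is a hypothetical helper, not a method of the library.

```python
def select_rule(key, per_key_rules, numeric_keys,
                default_numerical_rule, default_categorical_rule):
    """Return the rule for `key`, following the documented precedence."""
    if key in per_key_rules:          # 1. explicit per-key override
        return per_key_rules[key]
    if key in numeric_keys:           # 2. provider-declared numeric key
        return default_numerical_rule
    return default_categorical_rule   # 3. everything else


# Strings stand in for AggregationRule instances in this illustration.
overrides = {"InternalFeatures": "nearest-neighbor"}
numeric = {"Cost", "InternalFeatures"}
chosen = {
    key: select_rule(key, overrides, numeric, "numeric-default", "categorical-default")
    for key in ["Material", "Cost", "InternalFeatures"]
}
```

Note that "InternalFeatures" is declared numeric yet still resolves to its override, because the per-key mapping is consulted first.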
- property context_provider: ContextProvider
- property default_categorical_rule: AggregationRule
- property default_numerical_rule: AggregationRule
- infer(hits, keys, query_context=None, *, return_evidence=False, status_policy=<object object>)
Infer context values from a list of search hits.
Metadata is fetched from the configured ContextProvider in a single batched call; VectorHit.metadata is not consulted.
Every key in keys is mapped to a ContextPrediction; the result is never None. When the underlying rule cannot produce a value (margin guard fired, no usable evidence, empty hit list), the returned ContextPrediction carries value=None and status=STATUS_INSUFFICIENT so callers can branch on a single, well-typed object.
status_policy controls how status is gated:
- Not passed → DEFAULT_STATUS_POLICY is applied. Lenient defaults give every non-abstaining prediction STATUS_READY; abstentions are STATUS_INSUFFICIENT.
- Mapping → the caller’s policy is used (tighten any of min_observed_hits, min_observed_score_weight, min_top_share, min_margin).
- Explicitly None → opt out of status evaluation entirely; predictions are still returned (never None) but their status is None and reasons is empty.
When keys mixes categorical and numeric entries, categoricals are inferred first and any prediction reaching STATUS_READY (ready_to_propose) is merged into the query_context passed to numeric rules. Numeric ContextPrediction objects carry the merged context on their injected_context field so callers can see what conditioned the numeric estimate. The output dict preserves the caller’s key order.
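The categorical-first ordering described for infer can be pictured as two passes over the requested keys. The following is a rough sketch, not the library's code: `infer_in_two_passes` and the `predict` callback are hypothetical, predictions are plain dicts instead of ContextPrediction objects, and letting a caller-supplied query_context entry take priority over a forwarded prediction is an assumption.

```python
STATUS_READY = "ready_to_propose"


def infer_in_two_passes(keys, numeric_keys, predict, query_context=None):
    """predict(key, context) -> dict with 'value' and 'status' (stand-in callback)."""
    context = dict(query_context or {})
    results = {}
    # Pass 1: categorical keys; ready predictions are merged into the context.
    for key in keys:
        if key not in numeric_keys:
            results[key] = predict(key, None)
            if results[key]["status"] == STATUS_READY:
                context.setdefault(key, results[key]["value"])  # assumption: caller wins
    # Pass 2: numeric keys see the merged context and record a snapshot of it.
    for key in keys:
        if key in numeric_keys:
            prediction = dict(predict(key, context))
            prediction["injected_context"] = dict(context) or None
            results[key] = prediction
    return {key: results[key] for key in keys}  # restore the caller's key order


def fake_predict(key, context):
    """Stand-in for rule evaluation, for illustration only."""
    if key == "Cost":
        return {"value": 10.0, "status": STATUS_READY, "seen": dict(context)}
    if key == "Material":
        return {"value": "Steel", "status": STATUS_READY}
    return {"value": None, "status": "insufficient_evidence"}


out = infer_in_two_passes(["Cost", "Material", "Process"], {"Cost"}, fake_predict)
```

Even though "Cost" is requested first, it is evaluated after "Material", so its injected context contains the material prediction; the abstaining "Process" key contributes nothing to the context.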
- property per_key_rules: Mapping[str, AggregationRule]
- class hoops_ai.ml.context_layer.ContextProvider
Bases: ABC
- describe_for(part_ids)
Summarize key availability across part_ids.
Issues a single get_contexts() call and aggregates {observed, missing, coverage} per top-level key. coverage = observed / len(part_ids). Ids that the store does not resolve count as missing for every key (consistent with get_contexts omitting unknown ids).
Partners with a cheaper schema endpoint can override this method and skip the get_contexts round-trip entirely.
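The aggregation that describe_for performs can be sketched as below. `describe_coverage` is a hypothetical stand-alone function; it takes the mapping a single batched lookup would return, and it assumes that a key counts as observed whenever it is present in a part's attribute dict.

```python
def describe_coverage(part_ids, contexts):
    """contexts: {part_id: attribute_dict} for the ids that resolved."""
    total = len(part_ids)
    keys = {key for pid in part_ids for key in contexts.get(pid, {})}
    summary = {}
    for key in sorted(keys):
        # Unresolved ids are absent from `contexts`, so they count as missing.
        observed = sum(1 for pid in part_ids if key in contexts.get(pid, {}))
        summary[key] = {
            "observed": observed,
            "missing": total - observed,
            "coverage": observed / total if total else 0.0,
        }
    return summary


summary = describe_coverage(
    ["part-a", "part-b", "part-c"],
    {
        "part-a": {"Material": "Steel", "Cost": 12.5},
        "part-b": {"Material": "Aluminum"},
        # "part-c" did not resolve and is therefore omitted.
    },
)
```

In this example "Material" is observed on two of three parts and "Cost" on one, and the unresolved "part-c" lowers the coverage of both keys.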
- abstract get_contexts(part_ids)
Return {part_id: attribute_dict} for ids that resolve.
Missing ids must be omitted from the result (not returned with None). Each value is a free-form dict whose top-level keys match the keys passed to ContextPredictor.infer(..., keys=[...]).
Implementations must treat this as the primary API and issue a single batched call against the backend, not a loop.
- list_numeric_keys()
Return the metadata keys that should be treated as numeric.
ContextPredictor consults this list to route un-mapped keys to its numerical rule. Keys not returned here are treated as categorical. The default returns () (everything is categorical) so existing partner subclasses keep working unchanged; subclasses with a real schema should override.
Per-key rules supplied to ContextPredictor always win over this list, so callers can still override on a per-call basis without changing the provider.
- class hoops_ai.ml.context_layer.InMemoryContextProvider(initial=None, *, numeric_keys=None)
Bases: ContextProvider
- describe_for(part_ids)
- get_contexts(part_ids)
- list_numeric_keys()
These three methods are documented under ContextProvider; the descriptions are identical.
- class hoops_ai.ml.context_layer.JsonContextProvider(path, initial=None, *, create=True, store_path=None, id_filter=None, identifier_fields=('part_id',))
Bases: ContextProvider
JSON-backed ContextProvider for lightweight persistent metadata.
path may point to either:
- A store-shaped JSON file whose root is {part_id: payload}.
- A directory containing many single-part JSON files.
Directory mode is useful for source metadata corpora where each part has one JSON file. The source files are read, wrapped into a store-shaped mapping, and persisted to store_path. Source files are not modified.
- Parameters:
- describe_for(part_ids)
- get_contexts(part_ids)
- list_numeric_keys()
These three methods are documented under ContextProvider; the descriptions are identical.
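The directory mode of JsonContextProvider, where many single-part files are wrapped into one store-shaped mapping, might look roughly like this. The sketch uses only the standard library; `load_directory_as_store` is a hypothetical helper, and both the id lookup via identifier_fields and the fallback to the file stem are assumptions made for the example, not documented behaviour.

```python
import json
import tempfile
from pathlib import Path


def load_directory_as_store(directory, identifier_fields=("part_id",)):
    """Wrap single-part JSON files into a {part_id: payload} mapping."""
    store = {}
    for path in sorted(Path(directory).glob("*.json")):
        payload = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: take the id from the first identifier field present,
        # otherwise fall back to the file name without its extension.
        part_id = next(
            (payload[field] for field in identifier_fields if field in payload),
            path.stem,
        )
        store[str(part_id)] = payload
    return store


with tempfile.TemporaryDirectory() as tmp:
    # Two single-part source files; they are only read, never modified.
    Path(tmp, "bracket.json").write_text(
        json.dumps({"part_id": "part-a", "Material": "Steel"}), encoding="utf-8")
    Path(tmp, "flange.json").write_text(
        json.dumps({"Material": "Aluminum"}), encoding="utf-8")
    store = load_directory_as_store(tmp)
```

The resulting mapping has the same `{part_id: payload}` shape as a store-shaped JSON file, so it could be serialised with `json.dumps` to a separate location.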
- class hoops_ai.ml.context_layer.NearestNeighborRule(threshold=0.95)
Bases: AggregationRule
Use the top-similarity hit’s value when shape similarity is high enough.
Useful for keys whose value is essentially intrinsic to the geometry (number of internal features, hole count, surface area, bounding-box dimensions, …) and therefore better borrowed from the closest shape match than averaged out of a neighborhood. When the top-ranked hit’s score is at or above threshold, this rule returns that hit’s value with confidence equal to the top score. Otherwise it abstains so the predictor returns a STATUS_INSUFFICIENT sentinel.
Pairs naturally with ContextPredictor’s cross-key injection: install this rule for an intrinsic geometric key (InternalFeatures) via per_key_rules, and the resulting STATUS_READY prediction is forwarded into the query_context of every downstream numeric key (Cost, Weight, …) so their RelevanceWeighter can re-rank neighbors with a matching feature count.
- Parameters:
threshold (float) – Minimum top-hit similarity required to trust the borrowed value, in [0.0, 1.0]. Default 0.95.
Example
    from hoops_ai.ml.context_layer import (
        ContextPredictor, NearestNeighborRule, NumericWeightedRule,
    )

    # provider: any ContextProvider instance.
    predictor = ContextPredictor(
        provider,
        per_key_rules={
            "InternalFeatures": NearestNeighborRule(threshold=0.95),
        },
    )
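The decision NearestNeighborRule makes is small enough to show in a few lines. This is a sketch of the documented behaviour (borrow the top hit's value at or above the threshold, otherwise abstain); `nearest_neighbor_predict` is a hypothetical function that returns a `(value, confidence)` tuple instead of a ContextPrediction.

```python
def nearest_neighbor_predict(values, scores, threshold=0.95):
    """Return (value, confidence) from the best hit, or None to abstain."""
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < threshold:
        return None  # top hit is not similar enough to trust
    return values[best], scores[best]  # confidence equals the top score


borrowed = nearest_neighbor_predict([4, 7], [0.97, 0.80])
abstained = nearest_neighbor_predict([4, 7], [0.90, 0.80])
```

Unlike an averaging rule, the second-best hit has no influence here: either the closest shape is trusted outright or nothing is predicted.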
- predict(values, scores, key)
Predict a context value from neighbor evidence.
- Parameters:
- Returns:
ContextPrediction or None if insufficient evidence.
- Return type:
ContextPrediction | None
- class hoops_ai.ml.context_layer.NumericWeightedRule(log_scale=True, min_evidence=3, interval_sigmas=1.0, relevance_weighter=None, auto_relevance_weight=True, score_temperature=None, nearest_neighbor_threshold=None)
Bases: AggregationRule
Predict continuous numeric context (e.g. cost, weight, time) from neighbors.
Three operating modes selected at construction:
- Plain weighted mean: relevance_weighter=None and auto_relevance_weight=False. Pure shape-weighted aggregation.
- Auto-fitted weighter (default): the rule lazily fits a per-key RelevanceWeighter from the current call’s hit metadata.
- Explicit weighter: pass relevance_weighter=RelevanceWeighter(...) to skip auto-fit and use hand-tuned factors.
Confidence comes from the coefficient of variation (CV) of neighbor values: confidence = 1 / (1 + CV). Log-space is appropriate for multiplicative quantities (cost, weight, time); set log_scale=False for additive quantities.
- Parameters:
log_scale (bool) – If True (default), operates in log-space. Set False for additive quantities (counts, dimensions).
min_evidence (int) – Minimum hits required before predicting. Default 3.
interval_sigmas (float) – Prediction-interval width in σ. Default 1.0 (≈68%).
relevance_weighter (RelevanceWeighter | None) – Explicit weighter; disables auto-fit for keys this rule sees.
auto_relevance_weight (bool) – When True (default) and relevance_weighter is None, the rule lazily fits a per-key weighter the first time predict_with_context() runs for that key, using the call’s hit metadata.
score_temperature (float | None) – When set, neighbour weights are computed by a softmax exp(T·s) / Σ exp(T·s) over the adjusted scores instead of the default linear s / Σ s. Sharper weighting: a hit that matches on shape and on every injected metadata key takes almost all of the mass, so the estimate converges to that neighbour’s value. Useful when the cost surface is highly non-linear and you want a near-duplicate, full-metadata-match neighbour to dominate. None (default) preserves the linear behaviour. Typical values: 4.0 (moderate sharpening) to 12.0 (near-argmax). Must be a positive finite number.
nearest_neighbor_threshold (float | None) – When set, the rule first checks whether any neighbour reaches a normalised global agreement score adjusted_i / max_possible_boost ≥ threshold. The numerator is the weighter-adjusted score (shape × per-key boosts); the denominator is what a perfect-match neighbour would score, so the ratio lives in [0, 1]. When the gate fires, the rule short-circuits to that neighbour’s value with confidence equal to the normalised score. This is the same idea as NearestNeighborRule, but ranked by a global signal that also reflects Material / Process / InternalFeatures agreement instead of raw shape similarity alone. When no neighbour clears the gate, the rule falls through to the softmax / linear path as usual. None (default) disables the gate. Must be in (0, 1] when set. With no weighter and no query_context the normalisation degenerates to 1.0 and this is exactly NearestNeighborRule(threshold) on the raw shape score.
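To make the weighting and confidence formulas of NumericWeightedRule concrete, here is a minimal sketch of the plain weighted-mean mode. It is not the library's implementation: `neighbor_weights` and `weighted_estimate` are hypothetical names, the prediction interval and the relevance weighter are left out, and computing CV as the weighted standard deviation over the weighted mean of the raw values is an assumption. Scores and, in log-space, values are assumed to be positive.

```python
import math


def neighbor_weights(scores, score_temperature=None):
    """Linear s / sum(s) by default; softmax exp(T*s) / sum(exp(T*s)) when T is set."""
    if score_temperature is None:
        total = sum(scores)
        return [s / total for s in scores]
    peak = max(scores)  # subtract the max for numerical stability
    raw = [math.exp(score_temperature * (s - peak)) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_estimate(values, scores, log_scale=True, min_evidence=3,
                      score_temperature=None):
    """Return (estimate, confidence), or None when evidence is too thin."""
    if len(values) < min_evidence:
        return None
    weights = neighbor_weights(scores, score_temperature)
    if log_scale:
        # A weighted mean in log-space is a weighted geometric mean.
        estimate = math.exp(sum(w * math.log(v) for w, v in zip(weights, values)))
    else:
        estimate = sum(w * v for w, v in zip(weights, values))
    # Assumption: CV = weighted std / weighted mean of the raw values.
    mean = sum(w * v for w, v in zip(weights, values))
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    cv = math.sqrt(variance) / abs(mean) if mean else float("inf")
    return estimate, 1.0 / (1.0 + cv)


linear = neighbor_weights([0.9, 0.5, 0.5])
sharpened = neighbor_weights([0.9, 0.5, 0.5], score_temperature=12.0)
```

With linear weights the best hit holds a little under half of the mass, while a temperature of 12.0 gives it nearly all of it, which is the "near-argmax" behaviour the parameter description mentions.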
- predict(values, scores, key)
Predict a context value from neighbor evidence.
- Parameters:
- Returns:
ContextPrediction or None if insufficient evidence.
- Return type:
ContextPrediction | None
- predict_with_context(values, scores, key, query_context=None, *, hits=None)
Adjust scores via the (auto- or hand-fit) weighter, then aggregate.
Pipeline:
1. Resolve the weighter for key: explicit relevance_weighter wins; otherwise lazy-fit per-key if auto_relevance_weight.
2. Auto-infer a baseline query_context from the top-K hits’ metadata using the weighter’s known factor keys, then overlay any caller-supplied query_context on top (caller wins). This lets a partial injection (e.g. {Material, Process} forwarded from earlier categorical predictions) still benefit from neighbor-derived numeric factors like InternalFeatures.
3. Multiply scores by the weighter’s per-hit boosts.
4. Fall through to predict() on the adjusted scores.
- property relevance_weighter: RelevanceWeighter | None
The explicit weighter passed at construction (None if auto-fitting).
- class hoops_ai.ml.context_layer.RelevanceWeighter(factors)
Bases: object
Adjusts hit scores based on metadata agreement with a query context.
When predicting a target key (e.g. “Cost”), other known attributes of the query (Material, Process, InternalFeatures) can inform which neighbors are most trustworthy. A neighbor that matches on Material AND Process should be weighted higher than one differing on both.
For categorical attributes: exact match → full boost, mismatch → no boost. For numeric attributes: proximity-based boost using a Gaussian kernel.
The adjustment is multiplicative:
    adjusted_score = base_score × product(boost_per_attribute)
- Parameters:
factors (dict[str, dict[str, Any]]) – Dict mapping metadata keys to their boost configuration. Each entry is a dict with:
- "weight": float. How much this attribute matters (0.0 to 1.0). 1.0 = full match doubles the score; 0.5 = half effect.
- "type": "categorical" | "numeric". How to compute similarity.
- "scale": float (numeric only). Gaussian kernel bandwidth.
Example
    from hoops_ai.ml.context_layer import RelevanceWeighter

    weighter = RelevanceWeighter(factors={
        "Material": {"weight": 1.0, "type": "categorical"},
        "Process": {"weight": 0.8, "type": "categorical"},
        "InternalFeatures": {"weight": 0.5, "type": "numeric", "scale": 5.0},
    })
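The multiplicative adjustment can be worked through by hand with a small stand-alone sketch. The functions `boost_for` and `adjust_score` are hypothetical and operate on a single hit. The boost form `1 + weight * similarity` follows from the statement that a weight of 1.0 doubles the score on a full match, but the exact Gaussian expression `exp(-d^2 / (2 * scale^2))` and the neutral boost for a hit lacking the attribute are assumptions.

```python
import math


def boost_for(factor, query_value, hit_value):
    """Boost between 1 and 1 + weight for one attribute."""
    if hit_value is None:
        return 1.0  # assumption: a hit without the attribute gets no boost
    if factor["type"] == "categorical":
        similarity = 1.0 if hit_value == query_value else 0.0
    else:
        # Assumption: Gaussian kernel with bandwidth `scale`.
        distance = float(hit_value) - float(query_value)
        similarity = math.exp(-(distance ** 2) / (2.0 * factor["scale"] ** 2))
    return 1.0 + factor["weight"] * similarity


def adjust_score(base_score, hit_metadata, query_context, factors, target_key):
    """adjusted = base_score * product of per-attribute boosts."""
    adjusted = base_score
    for key, factor in factors.items():
        if key == target_key or key not in query_context:
            continue
        adjusted *= boost_for(factor, query_context[key], hit_metadata.get(key))
    return adjusted


factors = {
    "Material": {"weight": 1.0, "type": "categorical"},
    "InternalFeatures": {"weight": 0.5, "type": "numeric", "scale": 5.0},
}
query = {"Material": "Steel", "InternalFeatures": 10}
full_match = adjust_score(0.8, {"Material": "Steel", "InternalFeatures": 10}, query, factors, "Cost")
wrong_material = adjust_score(0.8, {"Material": "Aluminum", "InternalFeatures": 10}, query, factors, "Cost")
```

For the fully matching hit the base score 0.8 is multiplied by 2.0 (material) and 1.5 (feature count), giving 2.4; the hit with the wrong material keeps only the 1.5 factor and ends at 1.2.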
- adjust_scores(hits, base_scores, target_key, query_context)
Compute adjusted scores based on metadata agreement.
- property factors: dict[str, dict[str, Any]]
The learned or manually-specified factor configuration.
- classmethod fit(records, target_key, feature_keys=None, min_samples_per_group=2)
Learn importance factors from data.
- Parameters:
records (list[dict[str, Any]]) – List of metadata dicts from indexed parts.
target_key (str) – The key to predict (e.g. “Cost”). Must be numeric.
feature_keys (list[str] | None) – Keys to evaluate as predictors. If None, uses all top-level record keys excluding target_key.
min_samples_per_group (int) – Minimum samples per categorical group.
- Returns:
A configured RelevanceWeighter with learned factors.
- Return type:
RelevanceWeighter
- max_boost_for(target_key, query_context)
Return the multiplicative boost a perfect-match neighbour would receive.
For every factor key that is both known to the weighter and present in query_context (except target_key itself), a perfect match contributes (1 + weight). The product gives the ceiling against which an actual adjust_scores result can be normalised into [0, 1], which is useful for nearest-neighbour-style gates that need a scale-invariant agreement signal.
Returns 1.0 when no relevant factor is present (the adjusted score then equals the raw shape score and no normalisation is needed).
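The ceiling computed by max_boost_for and its use for normalisation can be sketched as follows. `max_boost` and `normalised_agreement` are hypothetical stand-alone functions that take the factor dict explicitly; they mirror the description above and do not call the library.

```python
def max_boost(factors, target_key, query_context):
    """Product of (1 + weight) over factors that are present in query_context."""
    ceiling = 1.0
    for key, factor in factors.items():
        if key != target_key and key in query_context:
            ceiling *= 1.0 + factor["weight"]
    return ceiling


def normalised_agreement(adjusted_score, factors, target_key, query_context):
    """Scale an adjusted score by the ceiling a perfect match could reach."""
    return adjusted_score / max_boost(factors, target_key, query_context)


factors = {
    "Material": {"weight": 1.0, "type": "categorical"},
    "Process": {"weight": 0.8, "type": "categorical"},
}
ceiling_one_key = max_boost(factors, "Cost", {"Material": "Steel"})
ceiling_two_keys = max_boost(factors, "Cost", {"Material": "Steel", "Process": "Milling"})
ceiling_no_keys = max_boost(factors, "Cost", {})
agreement = normalised_agreement(1.8, factors, "Cost", {"Material": "Steel"})
```

A hit with shape score 0.9 that matches on material is adjusted to 1.8, and dividing by the ceiling of 2.0 brings it back to 0.9, so a gate such as nearest_neighbor_threshold can compare it against a threshold in (0, 1] regardless of how many factors are in play.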