Creating a Domain YAML File
This guide shows you how to write the domain specification that VerifIA uses to generate and validate derived inputs. For a deeper conceptual overview of “domain” in VerifIA, see Concepts → Domain.
1. Why a Domain File?
VerifIA is domain‑aware: it requires a priori domain knowledge related to your application.
Encoding that knowledge in YAML lets VerifIA:
- Derive novel inputs beyond your dataset
- Enforce feasibility (via constraints)
- Assert behavioral rules on model outputs
2. File Structure
Your domain file has three top‑level sections:
variables: # define inputs & outputs
constraints: # enforce feasibility on input variables only
rules: # assert expected behavior
2.1. Section: variables
Define raw features and model targets on their real‑world scale—no preprocessing required.
Key | Description |
---|---|
description | Human-readable label for the feature or target |
type | INT / FLOAT / CAT |
range | [min, max] for numeric variables |
values | ["v1","v2",…] for categorical variables (order matters if ordinal) |
formula (optional) | Python expression to compute one variable from others |
variation_limits | [min_ratio, max_ratio] relative change allowed when deriving, e.g. [0.025, 0.5] = 2.5%–50% variation |
insignificant_variation | Float in 0.0–1.0 defining input noise to reveal brittleness (inputs) or output tolerance for regression errors (outputs) |
Note
- Use variation_limits to control drift from the seed.
- Use insignificant_variation to inject white noise on inputs or allow tolerable error on outputs.
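To make the ratio semantics concrete, the sketch below computes the interval in which an increased or decreased value could land. It assumes the limits are relative changes applied to a positive seed value and that the result is clipped to the declared range; the function name `variation_interval` and the clipping step are illustrative assumptions, not part of VerifIA's API.

```python
def variation_interval(seed, limits, value_range, direction="inc"):
    """Return the (low, high) interval for a derived value.

    Assumptions (not confirmed by VerifIA): limits = [min_ratio, max_ratio]
    are relative changes applied to a positive seed, and the result is
    clipped to the variable's declared range.
    """
    min_ratio, max_ratio = limits
    range_min, range_max = value_range
    if direction == "inc":
        low, high = seed * (1 + min_ratio), seed * (1 + max_ratio)
    elif direction == "dec":
        low, high = seed * (1 - max_ratio), seed * (1 - min_ratio)
    else:
        raise ValueError("direction must be 'inc' or 'dec'")
    # Keep the interval inside the declared [min, max] range.
    return max(low, range_min), min(high, range_max)


# A seed of 40 with limits [0.01, 0.10] and range [18, 100].
inc_interval = variation_interval(40, [0.01, 0.10], [18, 100], "inc")
dec_interval = variation_interval(40, [0.01, 0.10], [18, 100], "dec")
```

Under these assumptions, an increase from a seed of 40 lands between roughly 40.4 and 44, and a decrease between 36 and roughly 39.6.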
Example variables block
variables:
  age:
    description: Customer age in years
    type: INT
    range: [18, 100]
    variation_limits: [0.01, 0.10]   # 1%–10% change
    insignificant_variation: 0.01    # 1% noise
  income:
    description: Annual income in USD
    type: FLOAT
    range: [0, 1e6]
    formula: 12 * monthly_income     # derived from another variable
  churn_probability:
    description: Model’s output probability
    type: FLOAT
    range: [0.0, 1.0]
    insignificant_variation: 0.02    # 2% tolerated error
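As an illustration of the `insignificant_variation: 0.01` setting on `age`, the following sketch perturbs a value by at most 1% in either direction. It assumes the noise is uniform and relative to the current value; the helper `add_insignificant_noise` is hypothetical, and VerifIA may sample noise differently.

```python
import random


def add_insignificant_noise(value, insignificant_variation, rng):
    """Perturb value by at most +/- insignificant_variation (relative).

    Assumption: noise is drawn uniformly and scaled by the value itself.
    """
    ratio = rng.uniform(-insignificant_variation, insignificant_variation)
    return value * (1 + ratio)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy_ages = [add_insignificant_noise(40, 0.01, rng) for _ in range(5)]
```

Every element of `noisy_ages` stays within 39.6 and 40.4, so a robust model should predict essentially the same output for each of them.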
2.2. Section: constraints
Define feasibility constraints over input variables only—never reference the model’s output or target. These formulas capture interdependencies that make certain input combinations invalid. VerifIA discards any derived point violating these constraints.
Key | Description |
---|---|
description | Human-readable explanation of the constraint |
formula | Python expression involving only input variables |
Example constraints block
constraints:
  max_income_age_ratio:
    description: Income must not exceed 100,000 × age
    formula: "income <= 100000 * age"
Note
Constraints must not include the target/output variable—use rules for output behavior assertions.
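The filtering step described above can be sketched as follows. This guide does not specify how VerifIA evaluates formulas internally, so the helper `is_feasible` and its use of `eval` are assumptions made purely for illustration.

```python
def is_feasible(point, constraints):
    """Return True if the point satisfies every constraint formula.

    Assumption: each formula is a Python expression over input variable
    names. eval() is used only for illustration and is not safe for
    untrusted formulas.
    """
    for spec in constraints.values():
        # Builtins are disabled; variable values come from the point.
        if not eval(spec["formula"], {"__builtins__": {}}, dict(point)):
            return False
    return True


constraints = {
    "max_income_age_ratio": {
        "description": "Income must not exceed 100,000 × age",
        "formula": "income <= 100000 * age",
    }
}
candidates = [
    {"age": 30, "income": 250_000.0},    # 250,000 <= 3,000,000: kept
    {"age": 18, "income": 2_000_000.0},  # 2,000,000 > 1,800,000: discarded
]
feasible = [p for p in candidates if is_feasible(p, constraints)]
```

Only the first candidate survives, which mirrors the idea that derived points violating a constraint are discarded before the model is queried.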
2.3. Section: rules
Rules assert relative expectations on model outputs when inputs change. They consist of:
- Premises: how inputs vary
- Conclusion: expected output response
2.3.1 Example rules block
rules:
  high_income_reduces_churn:
    description:
      If income increases (age constant), churn probability should decrease
    premises:
      income: inc
      age: cst
    conclusion:
      churn_probability: dec
2.3.2 Premises: Specifying Input Variations
Directive | Meaning | Details & Examples |
---|---|---|
inc | Increase | Value is strictly greater than its original seed value. |
dec | Decrease | Value is strictly less than its original seed value. |
cst | Constant | Value remains unchanged. |
var | Vary | May change freely within variation limits. |
eq("v") | Equal to specific value | Must be set exactly to "v". |
noeq("v") | Not equal to specific value | Must not be "v". |
in("v1","v2") | One of a set of allowed values | Must be one of "v1" or "v2". |
noin("v1","v2") | None of a set of values | Must be neither "v1" nor "v2". |
Note
- Combine multiple premises to constrain multi-dimensional variations.
- Omitted variables default to cst.
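The premise directives can be read as predicates over a seed value and a derived value. The sketch below is one possible interpretation, with directives represented as the strings used in the YAML file; the function names and the sample variable `plan` are hypothetical, and variation limits for `var` are assumed to be enforced elsewhere.

```python
import ast
import re


def satisfies_premise(directive, seed_value, new_value):
    """Check one premise directive against a seed/derived value pair."""
    if directive == "inc":
        return new_value > seed_value
    if directive == "dec":
        return new_value < seed_value
    if directive == "cst":
        return new_value == seed_value
    if directive == "var":
        return True  # assumed: variation limits are checked elsewhere
    match = re.fullmatch(r"(eq|noeq|in|noin)\((.*)\)", directive)
    if match is None:
        raise ValueError(f"unknown directive: {directive!r}")
    op, raw_args = match.groups()
    # Parse the quoted arguments, e.g. '"v1","v2"' -> ("v1", "v2").
    args = ast.literal_eval(f"({raw_args},)")
    if op in ("eq", "in"):
        return new_value in args
    return new_value not in args


def satisfies_premises(premises, seed, derived):
    """Check all variables; omitted ones default to cst."""
    return all(
        satisfies_premise(premises.get(name, "cst"), seed[name], derived[name])
        for name in seed
    )


premises = {"income": "inc", "age": "cst"}
seed = {"income": 50_000.0, "age": 40, "plan": "basic"}
ok = satisfies_premises(
    premises, seed, {"income": 55_000.0, "age": 40, "plan": "basic"}
)
bad = satisfies_premises(
    premises, seed, {"income": 55_000.0, "age": 41, "plan": "basic"}
)
```

Here `ok` is true because income rose while age and the omitted `plan` stayed constant, whereas `bad` is false because age changed despite its `cst` premise.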
2.3.3 Conclusion: Expected Output Response
Directive | Meaning | When to Use |
---|---|---|
inc | Output should increase relative to seed prediction. | E.g., “More feature_X → Higher risk_score.” |
dec | Output should decrease relative to seed prediction. | E.g., “Higher price → Lower demand.” |
cst | Output should remain unchanged. | E.g., “Changing log level does not affect accuracy.” |
noinc | Output should not increase; it may decrease or stay flat. | Use when increases are disallowed but decreases are acceptable. |
nodec | Output should not decrease; it may increase or stay flat. | Use when decreases are disallowed but increases are acceptable. |
Note
Rule-based verification consists of comparing model predictions on derived inputs against these expectations. Violations count toward consistency metrics.
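A minimal sketch of such a comparison follows. It assumes that a change is insignificant when its magnitude is at most `tolerance * |seed prediction|`, with `tolerance` playing the role of the output's insignificant_variation; the function `conclusion_holds` and this tolerance rule are illustrative assumptions rather than VerifIA's actual implementation.

```python
def conclusion_holds(directive, seed_pred, new_pred, tolerance=0.0):
    """Check an output directive, treating small changes as 'unchanged'.

    Assumption: a change is insignificant when its magnitude is at most
    tolerance * |seed_pred|.
    """
    delta = new_pred - seed_pred
    eps = tolerance * abs(seed_pred)
    increased = delta > eps
    decreased = delta < -eps
    if directive == "inc":
        return increased
    if directive == "dec":
        return decreased
    if directive == "cst":
        return not increased and not decreased
    if directive == "noinc":
        return not increased
    if directive == "nodec":
        return not decreased
    raise ValueError(f"unknown directive: {directive!r}")


# Seed prediction 0.40 with a 2% tolerance, as in the churn example.
drop_is_dec = conclusion_holds("dec", 0.40, 0.30, tolerance=0.02)
small_is_cst = conclusion_holds("cst", 0.40, 0.405, tolerance=0.02)
drop_is_nodec = conclusion_holds("nodec", 0.40, 0.30, tolerance=0.02)
```

With these numbers, a drop from 0.40 to 0.30 satisfies `dec` and violates `nodec`, while a move from 0.40 to 0.405 falls inside the tolerance and counts as unchanged.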
3. Full YAML Template
Use this template as domain.yaml, replacing the placeholders with your actual domain definitions:
variables:
  feature_1:
    description: Human-readable description
    type: FLOAT
    range: [0.0, 100.0]
    variation_limits: [0.01, 0.20]
    insignificant_variation: 0.02
  feature_2:
    description: Categorical feature
    type: CAT
    values: ["low","medium","high"]
  derived_feature:
    description: Computed feature
    type: FLOAT
    formula: "2 * feature_1 + 5"
  target:
    description: Model’s output
    type: FLOAT
    range: [0.0, 1.0]
    insignificant_variation: 0.05
constraints:
  valid_income_age:
    description: Income cannot exceed 100 × age
    formula: "income <= 100 * age"
  non_negative_balance:
    description: Balance must be ≥ 0
    formula: "balance >= 0"
rules:
  demand_vs_price:
    description: When price increases, demand should decrease
    premises:
      price: inc
      season: cst
    conclusion:
      demand: dec
  risk_vs_age:
    description: Risk score should not decrease when age increases
    premises:
      age: inc
    conclusion:
      risk_score: nodec
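Because the template's constraints and rules use placeholder names, a quick check before running can catch names that were never replaced with declared variables. The sketch below assumes the domain file has already been loaded into a Python dict with a YAML parser of your choice; `formula_names` and `undeclared_names` are hypothetical helpers, not VerifIA functions.

```python
import ast


def formula_names(formula):
    """Collect the identifiers referenced by a Python expression."""
    tree = ast.parse(formula, mode="eval")
    # Note: called functions such as abs(...) also appear as names.
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def undeclared_names(domain):
    """Map each constraint to the names it uses that are not declared."""
    declared = set(domain.get("variables", {}))
    problems = {}
    for name, spec in domain.get("constraints", {}).items():
        missing = formula_names(spec["formula"]) - declared
        if missing:
            problems[name] = sorted(missing)
    return problems


# Hand-written dict standing in for a loaded domain.yaml.
domain = {
    "variables": {"age": {"type": "INT"}, "income": {"type": "FLOAT"}},
    "constraints": {
        "valid_income_age": {"formula": "income <= 100 * age"},
        "non_negative_balance": {"formula": "balance >= 0"},
    },
}
problems = undeclared_names(domain)
```

In this example `problems` reports that `non_negative_balance` refers to `balance`, which is missing from the variables section.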