Rule Consistency Verifier
verifia.verification.RuleConsistencyVerifier
Manages the verification of model rules based on a domain definition and search strategy.
Evaluates a dataset against specified rules, collects statistics, and generates detailed reports.
__init__(domain_cfg_dict=None, domain_cfg_fpath=None)
Initialize the verifier from a domain configuration. Provide either a domain configuration dictionary or a path to a domain configuration file to load the domain.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
domain_cfg_dict | Optional[Dict] | A dictionary of a domain configuration. | None |
domain_cfg_fpath | Optional[PathLike] | Path to a domain configuration YAML file. | None |
calculate_dataset_statistics()
Compute detailed statistics for the original dataset based on domain constraints and model predictions.
This method performs the following steps:
- Validates that the model and dataset have been set. If not, a ValueError is raised instructing the user to call the appropriate setup methods (verify() for the model and on() for the dataset).
- Filters out rows containing any feature value that falls outside its allowed domain. This is done using an internal helper function that checks each row against the domain constraints.
- Records the total number of original rows (n_orig) and the number of rows removed because they are out-of-domain (n_ood).
- Computes the model's predictive performance on the entire dataset and stores the performance metric name and score.
- Further refines the filtered dataset based on the model's predictions:
  - For regression models:
    - Retrieves the error tolerance (err_thresh) from the domain of the target variable.
    - Removes rows where the absolute prediction error exceeds the tolerance.
    - Records the count of in-domain rows removed due to high error (n_herr).
  - For classification models:
    - Removes rows where the model's predictions do not match the true target values.
    - Records the count of in-domain rows removed due to misclassification (n_miscls).
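The regression branch of the steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: `RegressionStats`, `in_domain`, and `regression_stats` are hypothetical names, the domain is reduced to numeric `(low, high)` bounds per feature, and rows are plain dictionaries rather than a DataFrame.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


@dataclass
class RegressionStats:
    """Hypothetical stand-in for the regression fields of OriginalStatistics."""
    n_orig: int
    n_ood: int
    n_herr: int
    err_thresh: float


def in_domain(row: Row, bounds: Bounds) -> bool:
    """Return True when every bounded feature lies inside its [low, high] range."""
    return all(low <= row[name] <= high for name, (low, high) in bounds.items())


def regression_stats(rows: List[Row], targets: List[float], preds: List[float],
                     bounds: Bounds, err_thresh: float) -> RegressionStats:
    n_orig = len(rows)
    # Step 1: keep only the indices of rows that satisfy the domain bounds.
    kept = [i for i, row in enumerate(rows) if in_domain(row, bounds)]
    n_ood = n_orig - len(kept)
    # Step 2: among in-domain rows, count those whose absolute error is too high.
    n_herr = sum(1 for i in kept if abs(preds[i] - targets[i]) > err_thresh)
    return RegressionStats(n_orig, n_ood, n_herr, err_thresh)


rows = [{"age": 30.0}, {"age": 150.0}, {"age": 45.0}, {"age": 60.0}]
targets = [10.0, 20.0, 30.0, 40.0]
preds = [10.5, 20.0, 35.0, 40.2]
stats = regression_stats(rows, targets, preds, {"age": (0.0, 120.0)}, err_thresh=1.0)
print(stats)
```

In this toy data the second row is out-of-domain (age 150) and the third row is in-domain but mispredicted by 5.0, so one row is counted in each of n_ood and n_herr.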
Returns:
Name | Type | Description |
---|---|---|
OriginalStatistics | OriginalStatistics | An object containing: n_orig, the total number of rows in the original dataset; n_ood, the number of rows removed because they are out-of-domain; n_herr, for regression, the number of rows removed due to prediction error exceeding the tolerance; n_miscls, for classification, the number of rows removed due to misclassification; metric_name, the name of the performance metric used; metric_score, the score of the performance metric; err_thresh, for regression, the error tolerance threshold applied. |
Raises:
Type | Description |
---|---|
ValueError | If the model or dataset has not been set. |
clean_results()
Remove all previously generated verification results.
This will delete the directory where rule‐violation reports and checkpoints have been stored. You must have already run a verification (i.e., called verify()) before cleaning, otherwise no results will be available.
Raises:
Type | Description |
---|---|
ValueError | If no results are available (i.e., verify() has not been called). |
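The behaviour described for clean_results() amounts to a guarded directory removal. A minimal sketch, assuming the results location is tracked as an optional path; `clean_results_dir` and `results_dir` are hypothetical names, and the real method may track its state differently.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def clean_results_dir(results_dir: Optional[Path]) -> None:
    """Delete a results directory, or raise if no results were ever produced."""
    if results_dir is None:
        # Mirrors the documented ValueError when there is nothing to clean.
        raise ValueError("No results available; run a verification first.")
    shutil.rmtree(results_dir, ignore_errors=True)


# Demonstration on a throwaway temporary directory.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "report.txt").write_text("placeholder report")
clean_results_dir(demo_dir)
print(demo_dir.exists())  # False
```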
on(dataframe=None, data_fpath=None, dataset=None)
Set the dataset to be verified.
Provide either a Dataset, a DataFrame, or a file path to the data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | Optional[DataFrame] | A pandas DataFrame. | None |
data_fpath | Optional[PathLike] | File path to the data. | None |
dataset | Optional[Dataset] | A pre-constructed Dataset object. | None |
Returns:
Name | Type | Description |
---|---|---|
RuleConsistencyVerifier | RuleConsistencyVerifier | Self, to allow method chaining. |
Raises:
Type | Description |
---|---|
ValueError | If no Dataset, DataFrame, or file path is provided, or if the model has not been set (call verify() first). |
run(pop_size, max_iters, orig_seed_ratio=None, orig_seed_size=None, persistance=True)
Execute the verification run.
The method performs the following steps:
- Validates input parameters.
- Samples the original dataset.
- Filters out rows violating domain constraints.
- Loads original seed predictions.
- Iterates over rules and original inputs to search for rule violations.
- Records any inconsistent candidates.
- Persists the results if requested.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pop_size | int | Population size for the search algorithm. | required |
max_iters | int | Maximum iterations for the search. | required |
orig_seed_ratio | Optional[float] | Ratio of original seed samples to use. | None |
orig_seed_size | Optional[int] | Number of original seed samples to use. | None |
persistance | bool | Whether to persist the run results. Defaults to True. | True |
Returns:
Name | Type | Description |
---|---|---|
RulesViolationResult | RulesViolationResult | The final verification result. |
Raises:
Type | Description |
---|---|
TypeError | If pop_size or max_iters is not an integer, or if a seed parameter has an incorrect type. |
ValueError | If pop_size or max_iters is outside its valid range, if neither seed parameter is provided, or if the model, dataset, or searcher has not been set. |
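The seed-parameter checks listed above can be illustrated with a small helper. This is a sketch under assumptions, not the library's code: `resolve_seed_count` is a hypothetical name, and the choices to give orig_seed_size precedence when both are passed, to cap it at the dataset size, and to truncate the ratio product are all assumptions.

```python
from typing import Optional


def resolve_seed_count(n_rows: int,
                       orig_seed_ratio: Optional[float] = None,
                       orig_seed_size: Optional[int] = None) -> int:
    """Hypothetical helper: turn the two seed parameters into a row count."""
    if orig_seed_ratio is None and orig_seed_size is None:
        raise ValueError("Provide orig_seed_ratio or orig_seed_size.")
    if orig_seed_size is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(orig_seed_size, bool) or not isinstance(orig_seed_size, int):
            raise TypeError("orig_seed_size must be an int.")
        if orig_seed_size <= 0:
            raise ValueError("orig_seed_size must be positive.")
        # Assumption: never request more seeds than there are rows.
        return min(orig_seed_size, n_rows)
    if isinstance(orig_seed_ratio, bool) or not isinstance(orig_seed_ratio, (int, float)):
        raise TypeError("orig_seed_ratio must be a float.")
    if not 0 < orig_seed_ratio <= 1:
        raise ValueError("orig_seed_ratio must be in (0, 1].")
    # Assumption: truncate, but keep at least one seed.
    return max(1, int(n_rows * orig_seed_ratio))


print(resolve_seed_count(200, orig_seed_ratio=0.25))  # 50
print(resolve_seed_count(200, orig_seed_size=500))    # 200
```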
using(search_algo, search_params=None, search_params_fpath=None)
Specify the search algorithm and parameters for verification.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
search_algo | str | The identifier of the search algorithm. | required |
search_params | Optional[dict] | A dictionary of search parameters. | None |
search_params_fpath | Optional[PathLike] | Path to a configuration file for search parameters. | None |
Returns:
Name | Type | Description |
---|---|---|
RuleConsistencyVerifier | RuleConsistencyVerifier | Self, to allow method chaining. |
Warns:
Type | Description |
---|---|
UserWarning | If no search parameters are provided. |
verify(model=None, model_card_fpath_or_dict=None)
Set up the model for verification.
Provide either a model instance or a model card (as a file path or a dictionary) to build the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Optional[BaseModel] | An instance of a model. | None |
model_card_fpath_or_dict | Optional[Union[PathLike, Dict]] | Path to a model card YAML file, or a model card dictionary. | None |
Returns:
Name | Type | Description |
---|---|---|
RuleConsistencyVerifier | RuleConsistencyVerifier | Self, to allow method chaining. |
Raises:
Type | Description |
---|---|
ValueError | If neither model nor model_card_fpath_or_dict is provided. |
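Because verify(), on(), and using() each return the verifier, the setup calls can be chained before run(). The sketch below shows one ordering consistent with the Raises sections above (model first, then dataset, then searcher). `run_rule_verification` is a hypothetical helper, `RecordingVerifier` is a stand-in that only records call order so the sketch runs without the library installed, and "SEARCH_ALGO_ID" is a placeholder rather than a real algorithm identifier.

```python
from typing import Any, List, Optional


def run_rule_verification(verifier: Any, model: Any, dataframe: Any,
                          search_algo: str, pop_size: int, max_iters: int,
                          orig_seed_size: int,
                          search_params: Optional[dict] = None) -> Any:
    """Chain the documented setup calls, then launch the run."""
    return (
        verifier
        .verify(model=model)          # on() expects the model to be set first
        .on(dataframe=dataframe)      # dataset to verify
        .using(search_algo, search_params=search_params)
        .run(pop_size=pop_size, max_iters=max_iters,
             orig_seed_size=orig_seed_size)
    )


class RecordingVerifier:
    """Stand-in mirroring the documented signatures; not the real verifier."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def verify(self, model=None, model_card_fpath_or_dict=None):
        self.calls.append("verify")
        return self

    def on(self, dataframe=None, data_fpath=None, dataset=None):
        self.calls.append("on")
        return self

    def using(self, search_algo, search_params=None, search_params_fpath=None):
        self.calls.append("using")
        return self

    def run(self, pop_size, max_iters, orig_seed_ratio=None,
            orig_seed_size=None, persistance=True):
        self.calls.append("run")
        return {"pop_size": pop_size, "max_iters": max_iters}


fake = RecordingVerifier()
result = run_rule_verification(fake, model=object(), dataframe=[],
                               search_algo="SEARCH_ALGO_ID", pop_size=10,
                               max_iters=5, orig_seed_size=3)
print(fake.calls)  # ['verify', 'on', 'using', 'run']
```

With the real class, the `verifier` argument would be an instance built from a domain configuration, for example via `domain_cfg_fpath`.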