Rule Consistency Verifier

verifia.verification.RuleConsistencyVerifier

Manages the verification of model rules based on a domain definition and search strategy.

Evaluates a dataset against specified rules, collects statistics, and generates detailed reports.

__init__(domain_cfg_dict=None, domain_cfg_fpath=None)

Initialize the verifier from a domain configuration. Provide either a domain configuration dictionary or a path to a domain configuration file to load the domain.

Parameters:

Name Type Description Default
domain_cfg_dict Optional[Dict]

A domain configuration dictionary.

None
domain_cfg_fpath Optional[PathLike]

Path to a domain configuration YAML file.

None

calculate_dataset_statistics()

Compute detailed statistics for the original dataset based on domain constraints and model predictions.

This method performs the following steps:
  1. Validates that the model and dataset have been set. If not, a ValueError is raised instructing the user to call the appropriate setup methods (verify() for the model and on() for the dataset).
  2. Filters out rows containing any feature value that falls outside its allowed domain. This is done using an internal helper function that checks each row against the domain constraints.
  3. Records the total number of original rows (n_orig) and the number of rows removed because they are out-of-domain (n_ood).
  4. Computes the model's predictive performance on the entire dataset and stores the performance metric name and score.
  5. Further refines the filtered dataset based on the model's predictions:
    • For regression models:
      • Retrieves the error tolerance (err_thresh) from the domain of the target variable.
      • Removes rows where the absolute prediction error exceeds the tolerance.
      • Records the count of in-domain rows removed due to high error (n_herr).
    • For classification models:
      • Removes rows where the model's predictions do not match the true target values.
      • Records the count of in-domain rows removed due to misclassification (n_miscls).
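
The filtering steps above can be sketched for the regression case in plain Python. This is an illustration under stated assumptions, not the library's implementation: the `rows` records, the `domain` bounds mapping, and the `in_domain` and `summarize_regression` helpers are hypothetical names, and the predictions are passed in directly rather than computed by a model.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def in_domain(row: Row, domain: Dict[str, Tuple[float, float]]) -> bool:
    """Return True if every constrained feature lies within its [low, high] bounds."""
    return all(low <= row[name] <= high for name, (low, high) in domain.items())


def summarize_regression(
    rows: List[Row],
    preds: List[float],
    target: str,
    domain: Dict[str, Tuple[float, float]],
    err_thresh: float,
) -> Dict[str, int]:
    """Count out-of-domain rows, then in-domain rows whose absolute error exceeds err_thresh."""
    n_orig = len(rows)
    # Steps 2-3: drop rows with any feature outside its allowed domain.
    kept = [(r, p) for r, p in zip(rows, preds) if in_domain(r, domain)]
    n_ood = n_orig - len(kept)
    # Step 5 (regression): drop in-domain rows whose absolute error is too high.
    accurate = [(r, p) for r, p in kept if abs(p - r[target]) <= err_thresh]
    n_herr = len(kept) - len(accurate)
    return {"n_orig": n_orig, "n_ood": n_ood, "n_herr": n_herr, "n_kept": len(accurate)}


rows = [
    {"x": 0.5, "y": 1.0},  # in domain, small error
    {"x": 2.0, "y": 1.0},  # x outside [0, 1]: out-of-domain
    {"x": 0.2, "y": 3.0},  # in domain, large error
]
preds = [1.1, 1.0, 1.0]
stats = summarize_regression(rows, preds, target="y", domain={"x": (0.0, 1.0)}, err_thresh=0.5)
```

For classification, the second filter would instead keep only rows where the prediction equals the true label and count the rest as n_miscls.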

Returns:

Name Type Description
OriginalStatistics OriginalStatistics

An object containing:
  • n_orig: Total number of rows in the original dataset.
  • n_ood: Number of rows removed because they are out-of-domain.
  • n_herr: For regression, number of rows removed due to prediction error exceeding the tolerance.
  • n_miscls: For classification, number of rows removed due to misclassification.
  • metric_name: The name of the performance metric used.
  • metric_score: The score of the performance metric.
  • err_thresh: For regression, the error tolerance threshold applied.

Raises:

Type Description
ValueError

If the model or dataset has not been set.

clean_results()

Remove all previously generated verification results.

This deletes the directory where rule-violation reports and checkpoints are stored. A verification must already have been run (i.e., verify() has been called) before cleaning; otherwise no results are available.

Raises:

Type Description
ValueError

If no results are available (i.e., verify() has not been called).
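
The guard-then-delete behaviour described here can be sketched as follows. This is a minimal sketch, not the library's code: `clean_results_dir` and the directory layout are hypothetical, and the real method locates its results directory internally.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def clean_results_dir(results_dir: Optional[Path]) -> None:
    """Delete results_dir and everything in it; raise if no results exist."""
    if results_dir is None or not results_dir.is_dir():
        raise ValueError("No results available; call verify() first.")
    shutil.rmtree(results_dir)


# Demonstrate on a throwaway directory (hypothetical layout).
tmp_root = Path(tempfile.mkdtemp())
results = tmp_root / "results"
results.mkdir()
(results / "report.txt").write_text("placeholder")
clean_results_dir(results)
removed = not results.exists()
shutil.rmtree(tmp_root)  # tidy up the temporary parent directory
```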

on(dataframe=None, data_fpath=None, dataset=None)

Set the dataset to be verified.

Provide either a Dataset, a DataFrame, or a file path to the data.

Parameters:

Name Type Description Default
dataframe Optional[DataFrame]

A pandas DataFrame.

None
data_fpath Optional[PathLike]

File path to the data.

None
dataset Optional[Dataset]

A pre-constructed Dataset object.

None

Returns:

Name Type Description
RuleConsistencyVerifier RuleConsistencyVerifier

Self, to allow method chaining.

Raises:

Type Description
ValueError

If no Dataset, DataFrame, or file path is provided, or if the model has not been set (call verify() first).

run(pop_size, max_iters, orig_seed_ratio=None, orig_seed_size=None, persistance=True)

Execute the verification run.

The method performs the following steps:
  • Validates input parameters.
  • Samples the original dataset.
  • Filters out rows violating domain constraints.
  • Loads original seed predictions.
  • Iterates over rules and original inputs to search for rule violations.
  • Records any inconsistent candidates.
  • Persists the results if requested.

Parameters:

Name Type Description Default
pop_size int

Population size for the search algorithm.

required
max_iters int

Maximum iterations for the search.

required
orig_seed_ratio Optional[float]

Ratio of original seed samples to use.

None
orig_seed_size Optional[int]

Number of original seed samples to use.

None
persistance bool

Whether to persist the run results. Defaults to True.

True

Returns:

Name Type Description
RulesViolationResult RulesViolationResult

The final verification result.

Raises:

Type Description
TypeError

If pop_size or max_iters are not integers, or if seed parameters have incorrect types.

ValueError

If pop_size or max_iters is outside its valid range, if neither seed parameter is provided, or if the model, dataset, or searcher has not been set.
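
One way to picture how the two seed parameters could resolve to a number of seed rows, together with the documented TypeError and ValueError cases, is the sketch below. It rests on assumptions: `resolve_seed_size` is a hypothetical helper, the accepted ranges are guesses, and giving `orig_seed_size` precedence when both are passed is a choice made here, not documented behaviour.

```python
from typing import Optional


def resolve_seed_size(
    n_rows: int,
    orig_seed_ratio: Optional[float] = None,
    orig_seed_size: Optional[int] = None,
) -> int:
    """Turn either a ratio or an absolute size into a number of seed rows."""
    if orig_seed_ratio is None and orig_seed_size is None:
        raise ValueError("Provide orig_seed_ratio or orig_seed_size.")
    if orig_seed_size is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(orig_seed_size, bool) or not isinstance(orig_seed_size, int):
            raise TypeError("orig_seed_size must be an int.")
        if not 0 < orig_seed_size <= n_rows:
            raise ValueError("orig_seed_size must be in [1, n_rows].")
        return orig_seed_size
    if not isinstance(orig_seed_ratio, float):
        raise TypeError("orig_seed_ratio must be a float.")
    if not 0.0 < orig_seed_ratio <= 1.0:
        raise ValueError("orig_seed_ratio must be in (0, 1].")
    # Always keep at least one seed row.
    return max(1, int(n_rows * orig_seed_ratio))


from_ratio = resolve_seed_size(200, orig_seed_ratio=0.25)
from_size = resolve_seed_size(200, orig_seed_size=30)
```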

using(search_algo, search_params=None, search_params_fpath=None)

Specify the search algorithm and parameters for verification.

Parameters:

Name Type Description Default
search_algo str

The identifier of the search algorithm.

required
search_params Optional[dict]

A dictionary of search parameters.

None
search_params_fpath Optional[PathLike]

Path to a configuration file for search parameters.

None

Returns:

Name Type Description
RuleConsistencyVerifier RuleConsistencyVerifier

Self, to allow method chaining.

Warns:

Type Description
UserWarning

If no search parameters are provided.

verify(model=None, model_card_fpath_or_dict=None)

Set up the model for verification.

Provide either a model instance or a model card (as a file path or a dictionary) from which to build the model.

Parameters:

Name Type Description Default
model Optional[BaseModel]

An instance of a model.

None
model_card_fpath_or_dict Optional[Union[PathLike, Dict]]

Path to a model card YAML file, or a model card dictionary.

None

Returns:

Name Type Description
RuleConsistencyVerifier RuleConsistencyVerifier

Self, to allow method chaining.

Raises:

Type Description
ValueError

If neither model nor model_card_fpath_or_dict is provided.
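
Because verify(), on(), and using() each return the verifier, a full run can be written as one chain. The sketch below wraps that chain in a hypothetical helper, `run_verification`, which takes an already-constructed verifier. The method names and keyword arguments come from the signatures on this page; the helper itself, the `search_algo` value, and the numeric settings are placeholders.

```python
from typing import Any


def run_verification(verifier: Any, model: Any, dataframe: Any, search_algo: str) -> Any:
    """Chain the setup calls in the documented order, then start the search.

    `verifier` is expected to be a RuleConsistencyVerifier built beforehand,
    for example from a domain configuration file path.
    """
    return (
        verifier
        .verify(model=model)      # set the model first; on() requires it
        .on(dataframe=dataframe)  # then attach the dataset
        .using(search_algo)       # choose the search algorithm
        .run(
            pop_size=50,          # placeholder value
            max_iters=100,        # placeholder value
            orig_seed_ratio=0.1,  # or pass orig_seed_size instead
            persistance=True,     # spelled as in the run() signature
        )
    )
```

Calling using() with only the algorithm identifier, as here, is the case in which the documented UserWarning about missing search parameters is emitted; pass search_params or search_params_fpath to avoid it.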