Metric API

Methods For Metric


Program Metric Functions.

sail_on_client.evaluate.metrics.m_acc(gt_novel, p_class, gt_class, round_size, asymptotic_start_round)[source]

Compute top1 and top3 accuracy.

Parameters
  • gt_novel (ndarray) – NX1 binary vector corresponding to the ground truth novel{1}/seen{0} labels

  • p_class (ndarray) – Nx(K+1) matrix with each row corresponding to K+1 class probabilities for each sample

  • gt_class (ndarray) – Nx1 vector with ground-truth class for each sample

  • round_size (int) – Number of samples in a single round of the test

  • asymptotic_start_round (int) – Round id where metric computation starts

Return type

Dict

Returns

Dictionary with results
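
The sketch below shows how inputs with the shapes described above might be assembled and passed to m_acc. It is a minimal illustration, not taken from the library: the 88-class count, round size, and start round are assumptions, and the exact keys of the returned dictionary depend on the implementation.

    import numpy as np
    from sail_on_client.evaluate.metrics import m_acc

    n_samples, n_known = 100, 88                     # assumed number of known classes
    gt_novel = np.zeros(n_samples)                   # Nx1: 0 (not novel) or 1 (novel)
    gt_novel[60:] = 1                                # novelty introduced at sample 60
    p_class = np.random.rand(n_samples, n_known + 1)
    p_class /= p_class.sum(axis=1, keepdims=True)    # each row sums to 1 over K+1 classes
    gt_class = np.random.randint(0, n_known + 1, size=n_samples)

    results = m_acc(gt_novel, p_class, gt_class,
                    round_size=20, asymptotic_start_round=1)
    print(results)                                   # dictionary with accuracy results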

sail_on_client.evaluate.metrics.m_accuracy_on_novel(p_class, gt_class, gt_novel)[source]

Additional Metric: Novelty robustness.

The method computes top-K accuracy for only the novel samples

Parameters
  • p_class (ndarray) – Nx(K+1) matrix with each row corresponding to K+1 class probabilities for each sample

  • gt_class (ndarray) – Nx1 vector with ground-truth class for each sample

  • gt_novel (ndarray) – Nx1 binary vector corresponding to the ground truth novel{1}/seen{0} labels

  • k – K value to compute accuracy at

Return type

Dict

Returns

Accuracy at rank-k

sail_on_client.evaluate.metrics.m_ndp(p_novel, gt_novel, mode='full_test')[source]

Program Metric: Novelty detection performance.

The method computes per-sample novelty detection performance

Parameters
  • p_novel (ndarray) – NX1 vector with each element corresponding to probability of it being novel

  • gt_novel (ndarray) – NX1 vector with each element 0 (not novel) or 1 (novel)

  • mode (str) – if ‘full_test’ computes on all test samples, if ‘post_novelty’ computes from first GT novel sample

Returns

Accuracy, Precision, Recall, F1_score and Confusion matrix

Return type

Dictionary of various metrics
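
A minimal sketch of calling m_ndp on synthetic scores, assuming the shapes and modes described above; the detector here is deliberately made confident so the resulting metrics are easy to interpret.

    import numpy as np
    from sail_on_client.evaluate.metrics import m_ndp

    gt_novel = np.concatenate([np.zeros(50), np.ones(50)])        # novelty after sample 50
    p_novel = np.concatenate([np.random.uniform(0.0, 0.4, 50),    # low scores pre novelty
                              np.random.uniform(0.6, 1.0, 50)])   # high scores post novelty

    full_test = m_ndp(p_novel, gt_novel, mode='full_test')        # all test samples
    post_only = m_ndp(p_novel, gt_novel, mode='post_novelty')     # from first GT novel sample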

sail_on_client.evaluate.metrics.m_ndp_failed_reaction(p_novel, gt_novel, p_class, gt_class, mode='full_test')[source]

Additional Metric: Novelty detection when reaction fails.

The method computes novelty detection performance only on samples with incorrect k-class predictions

Parameters
  • p_novel (ndarray) – NX1 vector with each element corresponding to probability of novelty

  • gt_novel (ndarray) – NX1 vector with each element 0 (not novel) or 1 (novel)

  • p_class (ndarray) – Nx(K+1) matrix with each row corresponding to K+1 class probabilities for each sample

  • gt_class (ndarray) – Nx1 vector with ground-truth class for each sample

  • mode (str) – if ‘full_test’ computes on all test samples, if ‘post_novelty’ computes from the first GT novel sample

  • k – ‘k’ used in top-K accuracy

Returns

Accuracy, Precision, Recall, F1_score and Confusion matrix

Return type

Dictionary of various metrics
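
The idea behind this metric can be sketched as follows: keep only the samples whose top-k class prediction is wrong, then evaluate detection performance on that subset. The helper below is an illustrative reimplementation of the masking step, not the library's code.

    import numpy as np

    def failed_reaction_mask(p_class, gt_class, k=1):
        """Boolean mask selecting samples whose top-k class prediction is wrong."""
        topk = np.argsort(p_class, axis=1)[:, -k:]                 # top-k predicted classes
        correct = (topk == np.asarray(gt_class).reshape(-1, 1)).any(axis=1)
        return ~correct                                            # True where reaction failed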

sail_on_client.evaluate.metrics.m_ndp_post(p_novel, gt_novel)[source]

Additional Metric: Novelty detection performance after novelty is introduced.

Parameters
  • p_novel (ndarray) – NX1 vector with each element corresponding to probability of novelty

  • gt_novel (ndarray) – NX1 vector with each element 0 (not novel) or 1 (novel)

Returns

Accuracy, Precision, Recall, F1_score and Confusion matrix

Return type

Dictionary of the metrics listed above

sail_on_client.evaluate.metrics.m_ndp_pre(p_novel, gt_novel)[source]

Additional Metric: Novelty detection performance before novelty is introduced.

Parameters
  • p_novel (ndarray) – NX1 vector with each element corresponding to probability of novelty

  • gt_novel (ndarray) – NX1 vector with each element 0 (not novel) or 1 (novel)

Returns

Accuracy, Precision, Recall, F1_score and Confusion matrix

Return type

Dictionary of the metrics listed above

sail_on_client.evaluate.metrics.m_num(p_novel, gt_novel)[source]

Program Metric: Number of samples needed for detecting novelty.

The method computes the number of GT novel samples needed to predict the first true positive.

Parameters
  • p_novel (ndarray) – NX1 vector with each element corresponding to probability of novelty

  • gt_novel (ndarray) – NX1 vector with each element 0 (not novel) or 1 (novel)

Return type

Dict

Returns

single scalar for number of GT novel samples
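
A minimal usage sketch, assuming the shapes above. The scores are constructed so the detector misses the first few novel samples; the threshold applied to p_novel is internal to the implementation.

    import numpy as np
    from sail_on_client.evaluate.metrics import m_num

    gt_novel = np.concatenate([np.zeros(40), np.ones(60)])        # novelty starts at index 40
    p_novel = np.concatenate([np.random.uniform(0.0, 0.3, 45),    # misses first 5 novel samples
                              np.random.uniform(0.7, 1.0, 55)])

    print(m_num(p_novel, gt_novel))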

sail_on_client.evaluate.metrics.m_num_stats(p_novel, gt_novel)[source]

Program Metric: Number of samples needed for detecting novelty.

The method computes the number of GT novel samples needed to predict the first true positive.

Parameters
  • p_novel (ndarray) – NX1 vector with each element corresponding to probability of novelty

  • gt_novel (ndarray) – NX1 vector with each element 0 (not novel) or 1 (novel)

Return type

Dict

Returns

Dictionary containing indices for novelty introduction and change in world prediction

Base Class For Metric


Abstract Class for metrics for sail-on.

class sail_on_client.evaluate.program_metrics.ProgramMetrics[source]

Abstract program metric class.

__init__(protocol)[source]

Initialize.

Parameters

protocol (str) – Name of the protocol.

Returns

None

Return type

None

abstract m_acc(gt_novel, p_class, gt_class, round_size, asymptotic_start_round)[source]

m_acc abstract function.

Parameters
  • gt_novel (ndarray) – ground truth detections

  • p_class (ndarray) – class predictions

  • gt_class (ndarray) – ground truth classes

  • round_size (int) – size of the round

  • asymptotic_start_round (int) – asymptotic samples considered for computing metrics

Return type

Dict

Returns

Dictionary containing top1, top3 accuracy over the test, pre and post novelty.

abstract m_acc_round_wise(p_class, gt_class, round_id)[source]

m_acc_round_wise abstract function.

Parameters
  • p_class (ndarray) – class predictions

  • gt_class (ndarray) – ground truth classes

  • round_id (int) – round identifier

Return type

Dict

Returns

Dictionary containing top1, top3 accuracy for a round

abstract m_accuracy_on_novel(p_novel, gt_class, gt_novel)[source]

m_accuracy_on_novel abstract function.

Parameters
  • p_novel (ndarray) – detection predictions

  • gt_class (ndarray) – ground truth classes

  • gt_novel (ndarray) – ground truth detections

Return type

Dict

Returns

Accuracy on novel samples

abstract m_is_cdt_and_is_early(gt_idx, ta2_idx, test_len)[source]

m_is_cdt_and_is_early abstract function.

Parameters
  • gt_idx (int) – Index when novelty is introduced

  • ta2_idx (int) – Index when change is detected

  • test_len (int) – Length of test

Return type

Dict

Returns

Dictionary containing booleans showing if change was detected and if it was detected early
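
The decision captured by this metric can be sketched as below. The key names and the exact boundary conditions are assumptions made for illustration; only the signature comes from the listing above.

    def is_cdt_and_is_early(gt_idx: int, ta2_idx: int, test_len: int) -> dict:
        """Was change detected (at or after introduction, within the test), and was it early?"""
        return {
            "is_cdt": gt_idx <= ta2_idx < test_len,   # detected after novelty was introduced
            "is_early": ta2_idx < gt_idx,             # detected before novelty was introduced
        }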

abstract m_ndp(p_novel, gt_novel)[source]

m_ndp abstract function.

Parameters
  • p_novel (ndarray) – detection predictions

  • gt_novel (ndarray) – ground truth detections

Return type

Dict

Returns

Dictionary containing novelty detection performance over the test.

abstract m_ndp_failed_reaction(p_novel, gt_novel, p_class, gt_class)[source]

m_ndp_failed_reaction abstract function.

Parameters
  • p_novel (ndarray) – detection predictions

  • gt_novel (ndarray) – ground truth detections

  • p_class (ndarray) – class predictions

  • gt_class (ndarray) – ground truth classes

Return type

Dict

Returns

Dictionary containing TP, FP, TN, FN, top1, top3 accuracy over the test.

abstract m_ndp_post(p_novel, gt_novel)[source]

m_ndp_post abstract function.

Parameters
  • p_novel (ndarray) – detection predictions

  • gt_novel (ndarray) – ground truth detections

Return type

Dict

Returns

Dictionary containing detection performance post novelty.

abstract m_ndp_pre(p_novel, gt_novel)[source]

m_ndp_pre abstract function.

Parameters
  • p_novel (ndarray) – detection predictions

  • gt_novel (ndarray) – ground truth detections

Return type

Dict

Returns

Dictionary containing detection performance pre novelty.

abstract m_num(p_novel, gt_novel)[source]

m_num abstract function.

Parameters
  • p_novel (ndarray) – detection predictions

  • gt_novel (ndarray) – ground truth detections

Return type

Dict

Returns

Difference between the novelty introduction and predicting change in world.

abstract m_num_stats(p_novel, gt_novel)[source]

m_num_stats abstract function.

Parameters
  • p_novel (ndarray) – detection predictions

  • gt_novel (ndarray) – ground truth detections

Return type

Dict

Returns

Dictionary containing indices for novelty introduction and change in world prediction.
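
The domain-specific classes documented below follow the same pattern: subclass ProgramMetrics and implement every abstract m_* method, typically by delegating to the module-level functions above. A partial sketch of that pattern (MyDomainMetrics and its detection_col argument are hypothetical):

    from sail_on_client.evaluate.metrics import m_ndp, m_num
    from sail_on_client.evaluate.program_metrics import ProgramMetrics

    class MyDomainMetrics(ProgramMetrics):
        """Hypothetical concrete metrics class for a new domain."""

        def __init__(self, protocol: str, detection_col: int) -> None:
            super().__init__(protocol)
            self.detection_col = detection_col            # hypothetical column id

        def m_ndp(self, p_novel, gt_novel):
            return m_ndp(p_novel, gt_novel)               # delegate to module-level function

        def m_num(self, p_novel, gt_novel):
            return m_num(p_novel, gt_novel)

        # The remaining abstract methods (m_acc, m_acc_round_wise, m_accuracy_on_novel,
        # m_is_cdt_and_is_early, m_ndp_failed_reaction, m_ndp_post, m_ndp_pre,
        # m_num_stats) must also be implemented before the class can be instantiated.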

Activity Recognition Metric


Activity Recognition Class for metrics for sail-on.

class sail_on_client.evaluate.activity_recognition.ActivityRecognitionMetrics[source]

Activity Recognition program metric class.

__init__(protocol, video_id, novel, detection, classification, spatial, temporal)[source]

Initialize.

Parameters
  • protocol (str) – Name of the protocol.

  • video_id (int) – Column id for video

  • novel (int) – Column id for predicting if change was detected

  • detection (int) – Column id for predicting sample wise novelty

  • classification (int) – Column id for predicting sample wise classes

  • spatial (int) – Column id for predicting spatial attribute

  • temporal (int) – Column id for predicting temporal attribute

Returns

None

Return type

None
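
A minimal construction sketch. The column ids refer to positions in the agent's result file; the values and the protocol name used here are illustrative assumptions.

    from sail_on_client.evaluate.activity_recognition import ActivityRecognitionMetrics

    ar_metrics = ActivityRecognitionMetrics(
        protocol="OND",      # assumed protocol name
        video_id=0,          # assumed column layout
        novel=1,
        detection=2,
        classification=3,
        spatial=4,
        temporal=5,
    )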

m_acc(gt_novel, p_class, gt_class, round_size, asymptotic_start_round)[source]

m_acc function.

Parameters
  • gt_novel (DataFrame) – ground truth detections for N videos (Dimension: N X 1)

  • p_class (DataFrame) – class predictions with video id for N videos (Dimension: N X 90 [vid,novel_class,88 known class])

  • gt_class (DataFrame) – ground truth classes for N videos (Dimension: N X 1)

  • round_size (int) – size of the round

  • asymptotic_start_round (int) – asymptotic samples considered for computing metrics

Return type

Dict

Returns

Dictionary containing top1, top3 accuracy over the test, pre and post novelty.

m_acc_round_wise(p_class, gt_class, round_id)[source]

m_acc_round_wise function.

Parameters
  • p_class (DataFrame) – detection predictions

  • gt_class (DataFrame) – ground truth classes

  • round_id (int) – round identifier

Return type

Dict

Returns

Dictionary containing top1, top3 accuracy for a round

m_accuracy_on_novel(p_class, gt_class, gt_novel)[source]

m_accuracy_on_novel function.

Parameters
  • p_class (DataFrame) – class predictions with video id for N videos (Dimension: N X 90 [vid,novel_class,88 known class])

  • gt_class (DataFrame) – ground truth classes for N videos (Dimension: N X 1)

  • gt_novel (DataFrame) – ground truth detections for N videos (Dimension: N X 1)

Return type

Dict

Returns

Accuracy on novel samples

m_is_cdt_and_is_early(gt_idx, ta2_idx, test_len)[source]

m_is_cdt_and_is_early function.

Parameters
  • gt_idx (int) – Index when novelty is introduced

  • ta2_idx (int) – Index when change is detected

  • test_len (int) – Length of test

Return type

Dict

Returns

Dictionary containing booleans showing if change was detected and if it was detected early

m_ndp(p_novel, gt_novel)[source]

m_ndp function.

Parameters
  • p_novel (ndarray) – detection predictions for N videos (Dimension: N X 1)

  • gt_novel (ndarray) – ground truth detections for N videos (Dimension: N X 1)

Return type

Dict

Returns

Dictionary containing novelty detection performance over the test.

m_ndp_failed_reaction(p_novel, gt_novel, p_class, gt_class)[source]

m_ndp_failed_reaction function.

Parameters
  • p_novel (DataFrame) – detection predictions for N videos (Dimension: N X 1)

  • gt_novel (DataFrame) – ground truth detections for N videos (Dimension: N X 1)

  • p_class (DataFrame) – class predictions with video id for N videos (Dimension: N X 90 [vid,novel_class,88 known class])

  • gt_class (DataFrame) – ground truth classes for N videos (Dimension: N X 1)

Return type

Dict

Returns

Dictionary containing TP, FP, TN, FN, top1, top3 accuracy over the test.

m_ndp_post(p_novel, gt_novel)[source]

m_ndp_post function.

Parameters
  • p_novel (ndarray) – detection predictions for N videos (Dimension: N X 1)

  • gt_novel (ndarray) – ground truth detections for N videos (Dimension: N X 1)

Return type

Dict

Returns

Dictionary containing detection performance post novelty.

m_ndp_pre(p_novel, gt_novel)[source]

m_ndp_pre function.

Parameters
  • p_novel (ndarray) – detection predictions for N videos (Dimension: N X 1)

  • gt_novel (ndarray) – ground truth detections for N videos (Dimension: N X 1)

Return type

Dict

Returns

Dictionary containing detection performance pre novelty.

m_nrp(ta2_acc, baseline_acc)[source]

m_nrp function.

Parameters
  • ta2_acc (Dict) – Accuracy scores for the agent

  • baseline_acc (Dict) – Accuracy scores for baseline

Return type

Dict

Returns

Reaction performance for the agent

m_num(p_novel, gt_novel)[source]

m_num function.

Parameters
  • p_novel (DataFrame) – detection predictions for N videos (Dimension: N X 1)

  • gt_novel (DataFrame) – ground truth detections for N videos (Dimension: N X 1)

Return type

Dict

Returns

Difference between the novelty introduction and predicting change in world.

m_num_stats(p_novel, gt_novel)[source]

m_num_stats function.

Parameters
  • p_novel (ndarray) – detection predictions for N videos (Dimension: N X 1)

  • gt_novel (ndarray) – ground truth detections for N videos (Dimension: N X 1)

Return type

Dict

Returns

Dictionary containing indices for novelty introduction and change in world prediction.

Document Transcription Metric


Document Transcription Class for metrics for sail-on.

class sail_on_client.evaluate.document_transcription.DocumentTranscriptionMetrics[source]

Document transcription program metric class.

__init__(protocol, image_id, text, novel, representation, detection, classification, pen_pressure, letter_size, word_spacing, slant_angle, attribute)[source]

Initialize.

Parameters
  • protocol (str) – Name of the protocol.

  • image_id (int) – Column id for image

  • text (int) – Transcription associated with the image

  • novel (int) – Column id for predicting if change was detected

  • representation (int) – Column id with representation novelty label

  • detection (int) – Column id with sample wise novelty

  • classification (int) – Column id with writer id

  • pen_pressure (int) – Column id with pen pressure values

  • letter_size (int) – Column id with letter size values

  • word_spacing (int) – Column id with word spacing values

  • slant_angle (int) – Column id with slant angle values

  • attribute (int) – Column id with attribute level novelty label

Returns

None

Return type

None
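
A minimal construction sketch with illustrative column ids; the actual positions depend on the layout of the agent's result file, and the protocol name is an assumption.

    from sail_on_client.evaluate.document_transcription import DocumentTranscriptionMetrics

    dt_metrics = DocumentTranscriptionMetrics(
        protocol="OND",      # assumed protocol name
        image_id=0,          # assumed column layout
        text=1,
        novel=2,
        representation=3,
        detection=4,
        classification=5,
        pen_pressure=6,
        letter_size=7,
        word_spacing=8,
        slant_angle=9,
        attribute=10,
    )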

m_acc(gt_novel, p_class, gt_class, round_size, asymptotic_start_round)[source]

m_acc helper function used for computing novelty reaction performance.

Parameters
  • gt_novel (DataFrame) – ground truth detections (Dimension: [img X detection])

  • p_class (DataFrame) – class predictions (Dimension: [img X prob that sample is novel, prob of known classes])

  • gt_class (DataFrame) – ground truth classes (Dimension: [img X class idx])

  • round_size (int) – size of the round

  • asymptotic_start_round (int) – asymptotic samples considered for computing metrics

Return type

Dict

Returns

Dictionary containing top1, top3 accuracy over the test, pre and post novelty.

m_acc_round_wise(p_class, gt_class, round_id)[source]

m_acc_round_wise function.

Parameters
  • p_class (DataFrame) – class predictions

  • gt_class (DataFrame) – ground truth classes

  • round_id (int) – round identifier

Return type

Dict

Returns

Dictionary containing top1, top3 accuracy for a round

m_accuracy_on_novel(p_class, gt_class, gt_novel)[source]

Additional Metric: Novelty robustness.

Not Implemented since no gt_class info for novel samples. The method computes top-K accuracy for only the novel samples

Parameters
  • p_class (DataFrame) – detection predictions (Dimension: [img X prob that sample is novel, prob of known classes])

  • gt_class (DataFrame) – ground truth classes (Dimension: [img X class idx])

  • gt_novel (DataFrame) – ground truth detections (Dimension: [img X detection])

Return type

Dict

Returns

Accuracy on novel samples

m_is_cdt_and_is_early(gt_idx, ta2_idx, test_len)[source]

Is change detection and is change detection early (m_is_cdt_and_is_early) function.

Parameters
  • gt_idx (int) – Index when novelty is introduced

  • ta2_idx (int) – Index when change is detected

  • test_len (int) – Length of test

Return type

Dict

Returns

Dictionary containing booleans showing if change was detected and if it was detected early

m_ndp(p_novel, gt_novel)[source]

m_ndp function.

Novelty detection performance. The method computes per-sample novelty detection performance over the entire test.

Parameters
  • p_novel (ndarray) – detection predictions (Dimension: [img X novel])

  • gt_novel (ndarray) – ground truth detections (Dimension: [img X detection])

Return type

Dict

Returns

Dictionary containing novelty detection performance over the test.

m_ndp_failed_reaction(p_novel, gt_novel, p_class, gt_class)[source]

m_ndp_failed_reaction function.

Not Implemented since no gt_class info for novel samples. The method computes novelty detection performance only on samples with incorrect k-class predictions

Parameters
  • p_novel (DataFrame) – detection predictions (Dimension: [img X novel])

  • gt_novel (DataFrame) – ground truth detections (Dimension: [img X detection])

  • p_class (DataFrame) – detection predictions (Dimension: [img X prob that sample is novel, prob of known classes])

  • gt_class (DataFrame) – ground truth classes (Dimension: [img X class idx])

Return type

Dict

Returns

Dictionary containing TP, FP, TN, FN, top1, top3 accuracy over the test.

m_ndp_post(p_novel, gt_novel)[source]

m_ndp_post function.

See m_ndp() with post_novelty. This computes from the first GT novel sample.

Parameters
  • p_novel (ndarray) – detection predictions (Dimension: [img X novel])

  • gt_novel (ndarray) – ground truth detections (Dimension: [img X detection])

Return type

Dict

Returns

Dictionary containing detection performance post novelty.

m_ndp_pre(p_novel, gt_novel)[source]

m_ndp_pre function.

See m_ndp() with pre_novelty. This computes up to the first GT novel sample. It is not particularly useful and is added only for completeness; the result should always be 0 since no TP is possible.

Parameters
  • p_novel (ndarray) – detection predictions (Dimension: [img X novel])

  • gt_novel (ndarray) – ground truth detections (Dimension: [img X detection])

Return type

Dict

Returns

Dictionary containing detection performance pre novelty.

m_nrp(ta2_acc, baseline_acc)[source]

m_nrp function.

Parameters
  • ta2_acc (Dict) – Accuracy scores for the agent

  • baseline_acc (Dict) – Accuracy scores for baseline

Return type

Dict

Returns

Reaction performance for the agent

m_num(p_novel, gt_novel)[source]

m_num function.

Program Metric: number of samples needed for detecting novelty. The method computes the number of GT novel samples needed to predict the first true positive.

Parameters
  • p_novel (DataFrame) – detection predictions (Dimension: [img X novel])

  • gt_novel (DataFrame) – ground truth detections (Dimension: [img X detection])

Return type

Dict

Returns

Difference between the novelty introduction and predicting change in world.

m_num_stats(p_novel, gt_novel)[source]

m_num_stats function.

Number of samples needed for detecting novelty. The method computes the number of GT novel samples needed to predict the first true positive.

Parameters
  • p_novel (ndarray) – detection predictions (Dimension: [img X novel])

  • gt_novel (ndarray) – ground truth detections (Dimension: [img X detection])

Return type

Dict

Returns

Dictionary containing indices for novelty introduction and change in world prediction.

Image Classification Metric


Image Classification Class for metrics for sail-on.

class sail_on_client.evaluate.image_classification.ImageClassificationMetrics[source]

Image Classification program metric class.

__init__(protocol, image_id, detection, classification)[source]

Initialize.

Parameters
  • protocol (str) – Name of the protocol.

  • image_id (int) – Column id for image

  • detection (int) – Column id for predicting sample wise world detection

  • classification (int) – Column id for predicting sample wise classes

Returns

None

Return type

None
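
A minimal sketch that constructs the metrics object and calls m_acc with pandas DataFrames shaped as in the parameter descriptions below. The column ids, protocol name, 88-class count, and the single-column DataFrame layout are assumptions for illustration.

    import numpy as np
    import pandas as pd
    from sail_on_client.evaluate.image_classification import ImageClassificationMetrics

    ic_metrics = ImageClassificationMetrics(
        protocol="OND", image_id=0, detection=1, classification=2)   # assumed column ids

    n, k = 100, 88                                                    # assumed 88 known classes
    gt_novel = pd.DataFrame(np.r_[np.zeros(60), np.ones(40)])         # [img X detection]
    probs = np.random.rand(n, k + 1)
    p_class = pd.DataFrame(probs / probs.sum(axis=1, keepdims=True))  # [img X K+1 probabilities]
    gt_class = pd.DataFrame(np.random.randint(0, k + 1, size=n))      # [img X class idx]

    acc = ic_metrics.m_acc(gt_novel, p_class, gt_class,
                           round_size=20, asymptotic_start_round=1)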

m_acc(gt_novel, p_class, gt_class, round_size, asymptotic_start_round)[source]

m_acc function.

Parameters
  • gt_novel (DataFrame) – ground truth detections (Dimension: [img X detection])

  • p_class (DataFrame) – class predictions (Dimension: [img X prob that sample is novel, prob of 88 known classes])

  • gt_class (DataFrame) – ground truth classes (Dimension: [img X detection, classification])

  • round_size (int) – size of the round

  • asymptotic_start_round (int) – asymptotic samples considered for computing metrics

Return type

Dict

Returns

Dictionary containing top1, top3 accuracy over the test, pre and post novelty.

m_acc_round_wise(p_class, gt_class, round_id)[source]

m_acc_round_wise function.

Parameters
  • p_class (DataFrame) – class predictions

  • gt_class (DataFrame) – ground truth classes

  • round_id (int) – round identifier

Return type

Dict

Returns

Dictionary containing top1, top3 accuracy for a round

m_accuracy_on_novel(p_class, gt_class, gt_novel)[source]

Additional Metric: Novelty robustness.

The method computes top-K accuracy for only the novel samples

Parameters
  • p_class (DataFrame) – detection predictions (Dimension: [img X prob that sample is novel, prob of known classes]) Nx(K+1) matrix with each row corresponding to K+1 class probabilities for each sample

  • gt_class (DataFrame) – ground truth classes (Dimension: [img X classification]) Nx1 vector with ground-truth class for each sample

  • gt_novel (DataFrame) – ground truth detections (Dimension: N X [img, classification]) Nx1 binary vector corresponding to the ground truth novel{1}/seen{0} labels

Return type

Dict

Returns

Accuracy on novel samples

m_is_cdt_and_is_early(gt_idx, ta2_idx, test_len)[source]

Is change detection and is change detection early (m_is_cdt_and_is_early) function.

Parameters
  • gt_idx (int) – Index when novelty is introduced

  • ta2_idx (int) – Index when change is detected

  • test_len (int) – Length of test

Return type

Dict

Returns

Dictionary containing booleans showing if change was detected and if it was detected early

m_ndp(p_novel, gt_novel, mode='full_test')[source]

Program Metric: Novelty detection performance.

The method computes per-sample novelty detection performance.

Parameters
  • p_novel (ndarray) – detection predictions (Dimension: [img X novel]) Nx1 vector with each element corresponding to probability of it being novel

  • gt_novel (ndarray) – ground truth detections (Dimension: [img X detection]) Nx1 vector with each element 0 (not novel) or 1 (novel)

  • mode (str) – if ‘full_test’, computes on all test samples; if ‘post_novelty’, computes from the first GT novel sample; if ‘pre_novelty’, computes only on samples before the first GT novel sample

Return type

Dict

Returns

Dictionary containing novelty detection performance over the test.

m_ndp_failed_reaction(p_novel, gt_novel, p_class, gt_class, mode='full_test')[source]

Additional Metric: Novelty detection when reaction fails.

Not Implemented since no gt_class info for novel samples

The method computes novelty detection performance only on samples with incorrect k-class predictions

Parameters
  • p_novel (DataFrame) – detection predictions (Dimension: [img X novel]) Nx1 vector with each element corresponding to probability of novelty

  • gt_novel (DataFrame) – ground truth detections (Dimension: [img X detection]) Nx1 vector with each element 0 (not novel) or 1 (novel)

  • p_class (DataFrame) – detection predictions (Dimension: [img X prob that sample is novel, prob of 88 known classes]) Nx(K+1) matrix with each row corresponding to K+1 class probabilities for each sample

  • gt_class (DataFrame) – ground truth classes (Dimension: [img X classification]) Nx1 vector with ground-truth class for each sample

  • mode (str) – if ‘full_test’, computes on all test samples; if ‘post_novelty’, computes from the first GT novel sample; if ‘pre_novelty’, computes only on samples before novelty is introduced

Return type

Dict

Returns

Dictionary containing TP, FP, TN, FN, top1, top3 accuracy over the test.

m_ndp_post(p_novel, gt_novel)[source]

Novelty Detection Performance Post Red Light. m_ndp_post function.

See m_ndp() with post_novelty. This computes from the first GT novel sample.

Parameters
  • p_novel (ndarray) – detection predictions (Dimension: [img X novel])

  • gt_novel (ndarray) – ground truth detections (Dimension: [img X detection])

Return type

Dict

Returns

Dictionary containing detection performance post novelty.

m_ndp_pre(p_novel, gt_novel)[source]

Novelty Detection Performance Pre Red Light. m_ndp_pre function.

See m_ndp() with pre_novelty. This computes up to the first GT novel sample. It is not particularly useful and is added only for completeness; the result should always be 0 since no TP is possible.

Parameters
  • p_novel (ndarray) – detection predictions (Dimension: [img X novel])

  • gt_novel (ndarray) – ground truth detections (Dimension: [img X detection])

Return type

Dict

Returns

Dictionary containing detection performance pre novelty.

m_nrp(ta2_acc, baseline_acc)[source]

m_nrp function.

Parameters
  • ta2_acc (Dict) – Accuracy scores for the agent

  • baseline_acc (Dict) – Accuracy scores for baseline

Return type

Dict

Returns

Reaction performance for the agent

m_num(p_novel, gt_novel)[source]

m_num function.

Program Metric: number of samples needed for detecting novelty. The method computes the number of GT novel samples needed to predict the first true positive.

Parameters
  • p_novel (DataFrame) – detection predictions (Dimension: [img X novel]) Nx1 vector with each element corresponding to probability of novelty

  • gt_novel (DataFrame) – ground truth detections (Dimension: [img X detection]) Nx1 vector with each element 0 (not novel) or 1 (novel)

Return type

Dict

Returns

Difference between the novelty introduction and predicting change in world.

m_num_stats(p_novel, gt_novel)[source]

Program Metric.

Number of samples needed for detecting novelty. The method computes the number of GT novel samples needed to predict the first true positive.

Parameters
  • p_novel (ndarray) – detection predictions (Dimension: [img X novel]) Nx1 vector with each element corresponding to probability of novelty

  • gt_novel (ndarray) – ground truth detections (Dimension: [img X detection]) Nx1 vector with each element 0 (not novel) or 1 (novel)

Return type

Dict

Returns

Dictionary containing indices for novelty introduction and change in world prediction.

sail_on_client.evaluate.image_classification.convert_df(old_filepath, new_filepath)[source]

Convert from the old df to the new df.

Parameters
  • old_filepath (str) – the filepath to the old *_single_df.csv file

  • new_filepath (str) – the filepath to the new *_single_df.csv file

Return type

None

Returns

None

Utility Methods


Helper functions for metrics.

sail_on_client.evaluate.utils.check_class_validity(p_class, gt_class)[source]

Check the validity of the inputs for image classification.

Parameters
  • p_class (ndarray) – Nx(K+1) matrix with each row corresponding to K+1 class probabilities for each sample

  • gt_class (ndarray) – Nx1 vector with ground-truth class for each sample

Return type

None

sail_on_client.evaluate.utils.check_novel_validity(p_novel, gt_novel)[source]

Check the validity of the inputs for per-sample novelty detection.

Parameters
  • p_novel (ndarray) – NX1 vector with each element corresponding to probability of novelty

  • gt_novel (ndarray) – NX1 vector with each element 0 (not novel) or 1 (novel)

Return type

None

Returns

None

sail_on_client.evaluate.utils.get_first_detect_novelty(p_novel, thresh)[source]

Find the first index where novelty is detected.

Parameters
  • p_novel (ndarray) – NX1 vector with each element corresponding to probability of novelty

  • thresh (float) – Score threshold for detecting when a sample is novel

Return type

int

Returns

Index where an agent reports that a sample is novel
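
A small usage sketch; whether the returned index is 0-based or offset is determined by the implementation, so the comment below is only indicative.

    import numpy as np
    from sail_on_client.evaluate.utils import get_first_detect_novelty

    p_novel = np.array([0.10, 0.20, 0.80, 0.95, 0.40])
    first_idx = get_first_detect_novelty(p_novel, thresh=0.5)
    print(first_idx)   # index of the first score above the threshold (indexing convention
                       # depends on the implementation)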

sail_on_client.evaluate.utils.get_rolling_stats(p_class, gt_class, k=1, window_size=50)[source]

Compute rolling statistics which are used for robustness measures.

Parameters
  • p_class (ndarray) – Nx(K+1) matrix with each row corresponding to K+1 class probabilities for each sample

  • gt_class (ndarray) – Nx1 vector with ground-truth class for each sample

  • k (int) – ‘k’ used for selecting top k values

  • window_size (int) – Window size for running stats

Return type

List

Returns

List with mean and standard deviation
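
The underlying idea can be sketched as a rolling mean and standard deviation of a per-sample top-k correctness indicator. The helper below is an illustrative reimplementation under assumptions (non-overlapping windows, list-of-lists return), not the library's code.

    import numpy as np

    def rolling_topk_stats(p_class, gt_class, k=1, window_size=50):
        """Rolling mean/std of top-k correctness over consecutive windows (illustrative)."""
        topk = np.argsort(p_class, axis=1)[:, -k:]
        correct = (topk == np.asarray(gt_class).reshape(-1, 1)).any(axis=1).astype(float)
        means, stds = [], []
        for start in range(0, len(correct) - window_size + 1, window_size):
            window = correct[start:start + window_size]
            means.append(window.mean())
            stds.append(window.std())
        return [means, stds]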

sail_on_client.evaluate.utils.top1_accuracy(p_class, gt_class, txt='')[source]

Compute top-1 accuracy. (see topk_accuracy() for details).

Parameters
  • p_class (ndarray) – Nx(K+1) matrix with each row corresponding to K+1 class probabilities for each sample

  • gt_class (ndarray) – Nx1 vector with ground-truth class for each sample

  • txt (str) – Text associated with accuracy

Return type

float

Returns

top-1 accuracy

sail_on_client.evaluate.utils.top3_accuracy(p_class, gt_class, txt='')[source]

Compute top-3 accuracy. (see topk_accuracy() for details).

Parameters
  • p_class (ndarray) – Nx(K+1) matrix with each row corresponding to K+1 class probabilities for each sample

  • gt_class (ndarray) – Nx1 vector with ground-truth class for each sample

  • txt (str) – Text associated with accuracy

Return type

float

Returns

top-3 accuracy

sail_on_client.evaluate.utils.topk_accuracy(p_class, gt_class, k, txt='')[source]

Compute top-K accuracy.

Parameters
  • p_class (ndarray) – Nx(K+1) matrix with each row corresponding to K+1 class probabilities for each sample

  • gt_class (ndarray) – Nx1 vector with ground-truth class for each sample

  • k (int) – ‘k’ used in top-K accuracy

  • txt (str) – Text associated with accuracy

Return type

float

Returns

top-K accuracy
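
A small usage sketch. The docstrings above suggest top1_accuracy and top3_accuracy simply fix k in topk_accuracy, so the paired calls below should agree; the 88-class setup is an assumption.

    import numpy as np
    from sail_on_client.evaluate.utils import top1_accuracy, top3_accuracy, topk_accuracy

    probs = np.random.rand(10, 89)                     # 10 samples, K+1 = 89 columns
    probs /= probs.sum(axis=1, keepdims=True)
    labels = np.random.randint(0, 89, size=10)

    print(topk_accuracy(probs, labels, k=1), top1_accuracy(probs, labels))
    print(topk_accuracy(probs, labels, k=3), top3_accuracy(probs, labels))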