Harness

Harnesses are used for testing and evaluating TA2 agents. Evaluation can be conducted either against a server set up by TA1 or locally, using files that contain the ground truth and metadata associated with the tests. The harness abstraction exists primarily to communicate with the evaluation server, or to replicate the same functionality with the files provided by TA1s. Harnesses work in conjunction with the protocol classes to fulfill the input and output requirements of an agent. Every harness is a subclass of TestAndEvaluationHarness; sail-on-client supports two harnesses.

Local Harness

LocalHarness is primarily used for replicating the capabilities of ParHarness without using the server. This allows testing an agent locally without setting up a server instance (locally or via a URL). Since LocalHarness works from files, it requires 3 parameters:

  1. data_dir: Root directory where the data for tests is stored

  2. gt_dir: Root directory where ground truth is stored

  3. gt_config: A JSON file with the column mapping for the ground truth
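Since LocalHarness fails late if any of these paths is wrong, a quick pre-flight check of the three parameters can be sketched before constructing the harness. The validate_local_harness_config helper below and the sample column names are hypothetical illustrations, not part of sail-on-client:

```python
import json
import tempfile
from pathlib import Path

def validate_local_harness_config(data_dir, gt_dir, gt_config):
    """Check the three LocalHarness parameters before constructing it.

    Hypothetical helper: verifies both directories exist and that
    gt_config parses as JSON, then returns the column mapping.
    """
    data_dir, gt_dir, gt_config = Path(data_dir), Path(gt_dir), Path(gt_config)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"data_dir not found: {data_dir}")
    if not gt_dir.is_dir():
        raise FileNotFoundError(f"gt_dir not found: {gt_dir}")
    # gt_config must be a JSON file mapping ground-truth column names
    return json.loads(gt_config.read_text())

# Build a throwaway layout and validate it (column names are made up)
root = Path(tempfile.mkdtemp())
(root / "data").mkdir()
(root / "gt").mkdir()
(root / "gt_config.json").write_text(json.dumps({"detection": 1, "classification": 2}))
mapping = validate_local_harness_config(root / "data", root / "gt", root / "gt_config.json")
```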

PAR Harness

ParHarness is primarily responsible for communicating with the evaluation server set up by the TA1 team. The interface relies on a RESTful API (detailed in the next section) to provide the following features:

  1. Support batch inquiry (also called rounds) with a full dataset response.

  2. Support batch responses for evaluation.

  3. Answer requests for multiple dataset types.

  4. Include an option for ‘hints’ accompanying datasets.

  5. Accept additional metadata along with annotations, labels, and localization data (e.g. time intervals for video), together with class and certainty scores.

  6. Provide feedback requested by the algorithm after the results for a batch have been submitted.
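The round-based interaction in features 1 and 2 can be sketched with an in-memory stand-in for the harness. StubHarness and its method names are assumptions for illustration only, not the real ParHarness interface:

```python
class StubHarness:
    """In-memory stand-in for a harness (assumed interface, not ParHarness)."""

    def __init__(self, rounds):
        self._rounds = rounds  # each round is a batch of dataset URIs

    def dataset_request(self, session_id, test_id, round_id):
        if round_id >= len(self._rounds):
            return None  # no more rounds in this test
        return self._rounds[round_id]

    def post_results(self, session_id, test_id, round_id, results):
        return {"acknowledged": True}  # batch response for evaluation

harness = StubHarness(rounds=[["img_0.png", "img_1.png"], ["img_2.png"]])
seen = []
round_id = 0
# Batch inquiry loop: request a round, answer with a batch of results
while (batch := harness.dataset_request("sess", "test", round_id)) is not None:
    predictions = {uri: 0.5 for uri in batch}  # placeholder detector output
    harness.post_results("sess", "test", round_id, predictions)
    seen.extend(batch)
    round_id += 1
```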

REST API

This section provides a detailed description of the RESTful API used for communication.

Each request below is described by its name, HTTP request type, definition, request data, and response data.

Test Request (GET)

Definition: TA2 requests test identifiers as part of a series of individual tests.

Request Data:

  1. Protocol: Empirical protocol

  2. Domain: Problem domain

  3. Detector Seed

  4. JSON file: Test assumptions (if any)

Response Data:

  1. CSV file containing Test ID(s) with the naming convention Protocol.Group.Run.Seed
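A minimal sketch of assembling the request data and parsing the response for this call. The parameter names and the OND protocol value are illustrative assumptions, not the server's actual field names:

```python
def build_test_request_params(protocol, domain, detector_seed, assumptions_file=None):
    """Assemble query parameters for the Test Request GET call.

    Hypothetical parameter names; the real server API may differ.
    """
    params = {
        "protocol": protocol,
        "domain": domain,
        "detector_seed": detector_seed,
    }
    if assumptions_file is not None:  # test assumptions are optional
        params["assumptions_file"] = assumptions_file
    return params

def parse_test_ids(csv_text):
    """Split the CSV response into Protocol.Group.Run.Seed test IDs."""
    return [line.strip() for line in csv_text.splitlines() if line.strip()]

params = build_test_request_params("OND", "image_classification", 42)
test_ids = parse_test_ids("OND.1.1.42\nOND.1.2.42\n")
```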

New Session (POST)

Definition: Create a new session to evaluate the detector using an empirical protocol.

Request Data:

  1. Test ID(s) obtained from the test request

  2. Protocol

  3. Novelty Detector Version

Response Data:

  1. Session ID: A unique identifier that the server associates with the client

Dataset Request (GET)

Definition: Request data for evaluation.

Request Data:

  1. Session ID

  2. Test ID

  3. Round ID (where applicable per protocol)

Response Data:

  1. CSV of dataset URIs. Each URI identifies the media to be used; it may be location specific (e.g. S3) or location independent, assuming a shared repository.
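A sketch of building the request data and parsing the URI list from the response, under the assumption of illustrative parameter names:

```python
def build_dataset_request_params(session_id, test_id, round_id=None):
    """Query parameters for Dataset Request (names are illustrative)."""
    params = {"session_id": session_id, "test_id": test_id}
    if round_id is not None:  # only protocols with rounds send this
        params["round_id"] = round_id
    return params

def parse_dataset_response(csv_text):
    """Turn the CSV of dataset URIs into a list, one URI per line.

    URIs may be location specific (e.g. s3://...) or repo-relative.
    """
    return [line.strip() for line in csv_text.splitlines() if line.strip()]

params = build_dataset_request_params("sess-0", "OND.1.1.42", round_id=0)
uris = parse_dataset_response("s3://bucket/img_0.png\nimages/img_1.png\n")
```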

Get Feedback (GET)

Definition: Get feedback from the server based on one or more example IDs.

Request Data:

  1. Session ID

  2. Test ID

  3. Round Number

  4. Example IDs

  5. Feedback type: Detection, Characterization, or Label

Response Data:

  1. CSV file for detection and characterization feedback, in accordance with the feedback space specified by the protocol.
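Because this call takes the most request fields, a small builder that validates the feedback type before constructing the query can be sketched; build_feedback_params and its field names are hypothetical:

```python
def build_feedback_params(session_id, test_id, round_number, example_ids, feedback_type):
    """Query parameters for Get Feedback (field names are illustrative)."""
    allowed = {"detection", "characterization", "label"}
    if feedback_type not in allowed:
        raise ValueError(f"feedback_type must be one of {sorted(allowed)}")
    return {
        "session_id": session_id,
        "test_id": test_id,
        "round_number": round_number,
        # multiple example IDs are serialized into one comma-joined field
        "example_ids": ",".join(example_ids),
        "feedback_type": feedback_type,
    }

params = build_feedback_params("sess-0", "OND.1.1.42", 0, ["img_3.png"], "detection")
```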

Get Metadata (GET)

Definition: Get metadata for a test.

Request Data:

  1. Test ID

Response Data:

  1. JSON file containing the metadata

Post Results (POST)

Definition: Post client detector predictions for the dataset.

Request Data:

  1. Session ID

  2. Test ID

  3. Round ID (where applicable)

  4. Result files (CSV)

  5. Protocol constant: Characterization/Detection

Response Data:

  1. Result acknowledgement
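Serializing a detector's predictions into the CSV result file can be sketched as below. The two-column layout (example ID, score) is an illustrative assumption; the actual columns are fixed by the protocol and feedback space:

```python
import csv
import io

def results_to_csv(predictions):
    """Serialize {example_id: score} predictions into a CSV body
    for Post Results.

    Assumed column layout for illustration; the real file format
    is dictated by the protocol.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    for example_id, score in predictions.items():
        writer.writerow([example_id, f"{score:.4f}"])
    return buf.getvalue()

body = results_to_csv({"img_0.png": 0.12, "img_1.png": 0.93})
```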

Evaluation (GET)

Definition: Get results for test(s).

Request Data:

  1. Session ID

  2. Test ID

  3. Round ID (where applicable)

Response Data:

  1. Score or None

Terminate Session (DELETE)

Definition: Terminate the session after the evaluation for the protocol is complete.

Request Data:

  1. Session ID

  2. Logs for the session

Response Data:

  1. Acknowledgement of session termination
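Tying the endpoints together, a full session lifecycle (new session, post results, evaluation, terminate) can be sketched against an in-memory stand-in. StubEvaluationClient, its method names, and the fixed score of 1.0 are all assumptions for illustration; ParHarness wraps the real HTTP calls:

```python
class StubEvaluationClient:
    """Minimal in-memory stand-in for the REST endpoints above."""

    def __init__(self):
        self._sessions = {}

    def new_session(self, test_ids, protocol, detector_version):
        session_id = f"session-{len(self._sessions)}"  # unique per client
        self._sessions[session_id] = {"tests": test_ids, "results": {}}
        return session_id

    def post_results(self, session_id, test_id, round_id, csv_body):
        self._sessions[session_id]["results"][(test_id, round_id)] = csv_body
        return {"acknowledged": True}

    def evaluation(self, session_id, test_id, round_id):
        # Score or None: a score is only available once results are posted
        done = (test_id, round_id) in self._sessions[session_id]["results"]
        return 1.0 if done else None  # 1.0 is a placeholder score

    def terminate_session(self, session_id):
        self._sessions.pop(session_id)
        return {"terminated": True}

client = StubEvaluationClient()
sid = client.new_session(["OND.1.1.42"], "OND", "0.1.0")
client.post_results(sid, "OND.1.1.42", 0, "img_0.png,0.12\n")
score = client.evaluation(sid, "OND.1.1.42", 0)
ack = client.terminate_session(sid)
```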