This change set adds support needed for testing, for example, PVS data organized into profiles by label.
Typical structure:
These datasets are hosted in a way that is compatible with the new loader (containing the entries layout), using logical dataset names like "testdataset:testprofile" which preserve the relationship to the base dataset. (The base dataset may contain common base vectors which are shared in some cases, or which are related as upstream data.)
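As a rough illustration of how such a profile-based layout might look on disk (this sketch is hypothetical; the file names and directory shape are assumptions, not the actual repository layout):

```
testdataset/                     # base dataset
  entries.yaml                   # entries layout consumed by the new loader
  base_vectors                   # common base vectors, possibly shared across profiles
  testprofile/                   # partition addressed as "testdataset:testprofile"
    queries                      # per-profile query vectors
    ground_truth                 # per-label brute-force KNN answer key
```

The key idea is that the logical name "testdataset:testprofile" resolves to a profile under its base dataset, so the relationship between them is preserved.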
To facilitate using custom entries.yaml paths, the base_url now supports specifying the full name, but the parent path to it is preserved as before for all relative lookups of facet entries.
In the case of these profiles, and for PVS testing specifically, each partition is a pre-filtered set of predicate-matching base vectors, and the per-label ground truths are computed by brute force over these, just like any other KNN answer key.
To support multiple labeled partitions of test data, the index metadata, dataset name, and other configuration files now support multiple similarly-named datasets.
For dataset-metadata, dataset name keys can now be globs like this:
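The example appears to have been elided here; a glob-keyed entry might look something like the following (the keys, fields, and glob syntax are all illustrative assumptions, not the actual schema):

```yaml
# Hypothetical dataset-metadata fragment: one entry covers every profile
# of the base dataset via a glob in the dataset name key.
"testdataset:*":          # matches testdataset:testprofile, testdataset:label_a, ...
  distance: cosine        # assumed field name, for illustration only
```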
For index-parameters/*.yml files, a configuration can be matched against similar patterns with an 'also_for' key. Note that these glob patterns must be quoted.
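A sketch of how this might look (the surrounding schema is assumed; only the 'also_for' key and the quoting requirement come from the text above):

```yaml
# Hypothetical index-parameters/*.yml fragment.
dataset: testdataset
also_for:
  - "testdataset:*"   # quoted, so YAML does not misinterpret the glob characters
```

Quoting matters because unquoted scalars containing characters like `*` or `:` can be parsed differently by YAML than intended.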
repetitions
The following logic is bypassed by default if these parameters are not provided.
There is also a new 'repetitions' parameter that drives testing: if provided, grid takes the average of multiple runs for a given dataset. An accompanying 'queryRuns' parameter for the search section does the same within each main run, for queries only. For cached indexes, the repetition enum is passed in to ensure that repetitions include new index builds. Also, if there are multiple repetitions, a post-hoc summarizer shows their average.
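Putting the two parameters together, a configuration fragment might look like this (the surrounding structure is an assumption; only the parameter names 'repetitions' and 'queryRuns' come from the description):

```yaml
# Hypothetical grid configuration fragment.
repetitions: 3      # average over 3 full runs, including fresh index builds
search:
  queryRuns: 5      # within each main run, average query results over 5 passes
```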
compression parameters
Support has been added for a matrix of compression parameters as fed to grid. For now, this is done by parameterizing instances of compression parameters and letting grid do what it already does. A follow-up improvement should be done to get rid of the for-loop cascade. I didn't include that in this change set because it is a bigger change, and I wanted to keep this one easier to review even if it is a bit uglier.
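For illustration, "parameterizing instances" might look like listing each compression configuration explicitly and letting grid iterate over them as it already does for other parameters (the field names and compression types here are hypothetical, not the actual schema):

```yaml
# Hypothetical sketch: each list entry is one compression parameter instance
# that grid expands into its existing run loop.
compression:
  - type: pq
    subspaces: 16
  - type: pq
    subspaces: 32
```

The follow-up improvement mentioned above would presumably generate this cross-product internally rather than via cascaded for loops.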