Prompt-image Alignment Metrics
CLIPImageQualityScoreMetric
Bases: Scorer
CLIP Image Quality Assessment metric for measuring the visual content of images.
The metric is based on the CLIP model, a neural network trained on a variety of (image, text) pairs to generate vector representations of images and text that are similar when the image and text are semantically similar.
The metric works by calculating the cosine similarity between user-provided images and pre-defined prompts. The prompts always come in pairs of “positive” and “negative”, such as “Good photo.” and “Bad photo.”. By calculating the similarity between the image embedding and both the “positive” and the “negative” prompt, the metric can determine which prompt the image is more similar to. The metric then returns the probability that the image is more similar to the first prompt than to the second.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`model_name` | `str` | The name or path of the CLIP model to use. | `'clip_iqa'` |
Source code in hemm/metrics/prompt_alignment/clip_iqa_score.py
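The default model name `'clip_iqa'` matches the default of torchmetrics' `CLIPImageQualityAssessment`, which this scorer presumably wraps. Below is a minimal sketch of the positive/negative prompt computation described above, using torchmetrics directly; this illustrates the underlying metric under that assumption rather than the hemm scorer's exact internals:

```python
import torch
from torchmetrics.multimodal import CLIPImageQualityAssessment

# Assumed backend: torchmetrics' CLIP-IQA metric with the default "clip_iqa" model.
# The (positive, negative) prompt pair mirrors the "Good photo." / "Bad photo."
# example from the description above.
metric = CLIPImageQualityAssessment(
    model_name_or_path="clip_iqa",
    prompts=(("Good photo.", "Bad photo."),),
)

# A batch of 2 RGB images with values in [0, 1]; in practice these would be
# the generated images being evaluated.
images = torch.rand(2, 3, 224, 224)

# Returns, per image, the probability that it is closer to the positive prompt.
scores = metric(images)
print(scores)
```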
CLIPScoreMetric
Bases: Scorer
CLIP score metric for text-to-image similarity. CLIP Score is a reference-free metric that can be used to evaluate the correlation between a generated caption for an image and the actual content of the image. It has been found to be highly correlated with human judgement.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`model_name` | `str` | The name or path of the CLIP model to use. | `'openai/clip-vit-base-patch16'` |
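As with the metric above, the default model name matches torchmetrics' `CLIPScore`, which is presumably the backend. A minimal sketch of the underlying computation under that assumption (not the hemm scorer's exact interface):

```python
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

# Assumed backend: torchmetrics' CLIPScore with the documented default model.
metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# A stand-in generated image (uint8, channels-first) and the prompt that
# would have produced it.
image = torch.randint(0, 255, (3, 224, 224), dtype=torch.uint8)
score = metric(image, "a photo of a cat")

# Higher scores indicate closer alignment between the image and the text.
print(score)
```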