Suggestion: Improve Answer Extraction Logic to better evaluate QA tasks. #7

tangg555 · 2024-10-18T08:39:23Z

Hi,

I believe the logic of extract_answer needs some adjustments. A response that contains more than one distinct answer shouldn't be marked as correct. For example, I've observed cases where the model simply copies the options from the instruction as its response, like: answer1/answer2/answer3/answer4. Counting such responses as correct inflates the reported accuracy on the QA tasks beyond the model's actual performance.
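To make the failure mode concrete, here is a minimal sketch. It assumes an extractor that returns the first option it finds; that behavior and the name first_match_extract are assumptions for illustration, not code taken from the repository.

```python
import re

# Hypothetical "first match wins" extractor (an assumption for illustration,
# not the repository's actual implementation).
def first_match_extract(sentence: str) -> str:
    matches = re.findall(r'answer1|answer2|answer3|answer4|answer5', sentence.strip())
    return matches[0] if matches else ""

# A response that merely copies the options from the instruction.
copied_options = "answer1/answer2/answer3/answer4"
print(first_match_extract(copied_options))  # answer1
```

Under that assumption, the copied-options response is scored as correct whenever the label happens to be answer1, even though the model never committed to an answer.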

Here's the revised extract_answer function:

import re

def extract_answer(args, sentence: str) -> str:
    """Return the single option found in `sentence`, or "" if none or several are found."""
    dataset = args.dataset
    sentence_ = sentence.strip()
    if dataset == 'boolq':
        pred_answers = re.findall(r'true|false', sentence_)
    elif dataset == 'piqa':
        pred_answers = re.findall(r'solution1|solution2', sentence_)
    elif dataset in ['social_i_qa', 'ARC-Challenge', 'ARC-Easy', 'openbookqa']:
        pred_answers = re.findall(r'answer1|answer2|answer3|answer4|answer5', sentence_)
    elif dataset == 'hellaswag':
        pred_answers = re.findall(r'ending1|ending2|ending3|ending4', sentence_)
    elif dataset == 'winogrande':
        pred_answers = re.findall(r'option1|option2', sentence_)
    else:
        # Without this branch, pred_answers would be undefined for other datasets.
        raise ValueError(f"Unsupported dataset: {dataset}")
    if not pred_answers:
        return ""
    # Repeating the same option is fine; mentioning different options is not.
    unique_answers = set(pred_answers)

    # if only one answer, then return it
    if len(unique_answers) == 1:
        return unique_answers.pop()
    else:
        return ""

This should make the reported accuracy more reliable, since a response is only counted when it contains exactly one distinct answer.
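As a quick sanity check of the intended behavior, here is a self-contained sketch of the same rule in table-driven form. The names ANSWER_PATTERNS and extract_single_answer are hypothetical, and taking the dataset name directly instead of args is a simplification for the example; the patterns are the ones from the function above.

```python
import re

# Option patterns per dataset, mirroring the branches of extract_answer.
ANSWER_PATTERNS = {
    'boolq': r'true|false',
    'piqa': r'solution1|solution2',
    'social_i_qa': r'answer1|answer2|answer3|answer4|answer5',
    'ARC-Challenge': r'answer1|answer2|answer3|answer4|answer5',
    'ARC-Easy': r'answer1|answer2|answer3|answer4|answer5',
    'openbookqa': r'answer1|answer2|answer3|answer4|answer5',
    'hellaswag': r'ending1|ending2|ending3|ending4',
    'winogrande': r'option1|option2',
}

def extract_single_answer(dataset: str, sentence: str) -> str:
    """Return the only distinct option in `sentence`, or "" otherwise."""
    pattern = ANSWER_PATTERNS.get(dataset)
    if pattern is None:
        return ""  # unknown dataset: nothing to extract in this sketch
    unique_answers = set(re.findall(pattern, sentence.strip()))
    return unique_answers.pop() if len(unique_answers) == 1 else ""

# Copied options are rejected; a single committed answer is kept.
print(repr(extract_single_answer('ARC-Easy', 'answer1/answer2/answer3/answer4')))  # ''
print(repr(extract_single_answer('ARC-Easy', 'the correct answer is answer2')))    # 'answer2'
print(repr(extract_single_answer('boolq', 'true, definitely true')))               # 'true'
```

Repeating the same option still counts as one answer because the matches are collapsed into a set before counting.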

wutaiqiang · 2024-10-18T09:04:36Z

Thanks for your suggestion.

I also raised this answer-extraction issue in #6 (comment).

wutaiqiang closed this as completed Jan 12, 2025
