Suggestion: Improve Answer Extraction Logic to better evaluate QA tasks. #7

Closed
tangg555 opened this issue Oct 18, 2024 · 1 comment

tangg555 commented Oct 18, 2024

Hi,

I believe the logic for extract_answer needs some adjustments. Responses that contain multiple answers shouldn't be marked as correct. For example, I've observed cases where the model simply copies the options from the instruction as its response, like: answer1/answer2/answer3/answer4. Counting such responses as correct inflates the reported accuracy on the QA tasks beyond the model's actual performance.

Here's the revised extract_answer function:

import re

def extract_answer(args, sentence: str) -> str:
    """Return the single option named in `sentence`, or "" if none or several are found."""
    dataset = args.dataset
    sentence_ = sentence.strip()
    if dataset == 'boolq':
        pred_answers = re.findall(r'true|false', sentence_)
    elif dataset == 'piqa':
        pred_answers = re.findall(r'solution1|solution2', sentence_)
    elif dataset in ['social_i_qa', 'ARC-Challenge', 'ARC-Easy', 'openbookqa']:
        pred_answers = re.findall(r'answer1|answer2|answer3|answer4|answer5', sentence_)
    elif dataset == 'hellaswag':
        pred_answers = re.findall(r'ending1|ending2|ending3|ending4', sentence_)
    elif dataset == 'winogrande':
        pred_answers = re.findall(r'option1|option2', sentence_)
    else:
        # Without this branch, pred_answers would be unbound for other datasets
        raise ValueError(f"Unsupported dataset: {dataset}")
    if not pred_answers:
        return ""
    unique_answers = set(pred_answers)

    # Return the answer only if exactly one distinct option was found
    if len(unique_answers) == 1:
        return unique_answers.pop()
    return ""

This should make the reported accuracy more reliable, since a response counts only when it contains exactly one distinct answer.
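To illustrate the difference, here is a minimal sketch comparing a lenient rule with the stricter one on a copied-options response. It assumes the lenient baseline simply returns the first match, which may not be exactly what the repository does; `PATTERNS`, `first_match`, and `strict_match` are hypothetical names used only for this example, and only a few datasets are listed.

```python
import re
from types import SimpleNamespace

# Hypothetical lookup table mirroring some of the dataset branches above.
PATTERNS = {
    "boolq": r"true|false",
    "piqa": r"solution1|solution2",
    "winogrande": r"option1|option2",
    "ARC-Easy": r"answer1|answer2|answer3|answer4|answer5",
}

def first_match(args, sentence: str) -> str:
    """Assumed lenient baseline: return the first option found, if any."""
    found = re.findall(PATTERNS[args.dataset], sentence.strip())
    return found[0] if found else ""

def strict_match(args, sentence: str) -> str:
    """Stricter rule from this issue: accept only one distinct option."""
    found = set(re.findall(PATTERNS[args.dataset], sentence.strip()))
    return found.pop() if len(found) == 1 else ""

# Stand-in for the real args object; only the dataset attribute is needed.
args = SimpleNamespace(dataset="ARC-Easy")
copied = "answer1/answer2/answer3/answer4"

print(repr(first_match(args, copied)))   # 'answer1'
print(repr(strict_match(args, copied)))  # ''
print(repr(strict_match(args, "the correct answer is answer2")))  # 'answer2'
```

Under the first-match assumption, the copied-options response would be scored as correct whenever the gold label happens to be answer1, while the strict rule rejects it. A response that repeats the same option several times still passes, because only distinct options are counted.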

@wutaiqiang
Owner

Thanks for your suggestion.

I also mentioned this issue of answers in: #6 (comment)
