
Investigate Question-Answering models working on tables #614

FrancescoCasalegno opened this issue Aug 18, 2022 · 0 comments

Labels
↩️ question-answering Attribute values extraction using QA models

FrancescoCasalegno commented Aug 18, 2022

Context

  • Traditional transformer-based models for extractive question answering operate on contexts that are units of natural-language text, e.g. a sentence or a paragraph.
  • However, in many cases the values of the parameters of interest for our neuroscientific applications are contained in the tables of articles rather than in the text.
  • For instance, the Wikipedia article on the Michaelis constant (here) contains several values of this parameter of interest to us, but they are all in a table and no value is mentioned in the text. In fact, this is not an isolated case: it is really hard to find Michaelis constant values in the body text of any scientific article!
    [Screenshot: table of Michaelis constant values from the Wikipedia article]
  • There seem to be some question-answering models that can operate on tabular or mixed text/tabular contexts, like TAPAS (see the minimal sketch after this list).
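
A minimal sketch of how such a model can be queried, assuming the 🤗 transformers table-question-answering pipeline and the public google/tapas-base-finetuned-wtq checkpoint; the table contents here are made up purely for illustration:

```python
import pandas as pd
from transformers import pipeline

# TAPAS expects every cell as a string, hence the astype(str).
table = pd.DataFrame(
    {
        "Enzyme": ["Hexokinase", "Chymotrypsin"],
        "Km (mM)": [0.15, 1.5],
    }
).astype(str)

tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")
answer = tqa(table=table, query="What is the Km of Hexokinase?")
print(answer)  # dict with keys like 'answer', 'coordinates', 'cells', 'aggregator'
```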

Actions

  • How should the tables be represented for TAPAS (or another model) to be able to take them as input (HTML? CSV? ...)?
    Is this format compatible with what we can get out of our parsing pipeline for the various formats (arXiv, medRxiv, bioRxiv, PMC, PubMed, ...) when the article contains a table? (See the sketch after this list for one possible HTML-based bridge.)
  • Can TAPAS take mixed inputs, i.e. contexts containing both text and tables?
  • How does TableQuestionAnsweringPipeline differ from QuestionAnsweringPipeline in 🤗 transformers?
  • Are there any other models apart from TAPAS that support question-answering on tabular contexts?
  • Test TAPAS (or another model) on a sample related to neuroscience to see if it could potentially work for our use case.
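
One possible bridge from our parsing pipeline to TAPAS input, sketched under the assumption that a parser can emit tables as HTML (whether ours actually can is exactly the open question above). Note that pandas.read_html needs lxml or html5lib installed:

```python
from io import StringIO

import pandas as pd
from transformers import pipeline

# Toy HTML table standing in for whatever our parsers would emit.
html = """
<table>
  <tr><th>Enzyme</th><th>Km (mM)</th></tr>
  <tr><td>Hexokinase</td><td>0.15</td></tr>
</table>
"""

# read_html returns one DataFrame per <table> element in the input.
table = pd.read_html(StringIO(html))[0].astype(str)

tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")
print(tqa(table=table, query="Which enzyme has a Km of 0.15 mM?"))
```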