Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redo "generic Readers" #19

Open
bovee opened this issue Mar 11, 2022 · 1 comment
Open

Redo "generic Readers" #19

bovee opened this issue Mar 11, 2022 · 1 comment

Comments

@bovee
Copy link
Owner

bovee commented Mar 11, 2022

I'm not crazy about the current impl_reader! design for generating a "reader" for each file type and it would be nice if that could be replaced by a generic solution like Reader<T> (although we'd still need to impl a trait like RecordReader on this to let it be boxed up for e.g. get_reader, I think?) A generic design like this could potentially be used to replace the experimental chunk interface for multi-threaded reading (new below is inspired by the init_state in that).

I wrote out the following before I started running into some lifetime issues; I think the type bounds for S and T probably need to be tweaked? Refactoring to next(&'r mut self) -> Result<Option<Vec<Value<'r>>>, EtError> breaks the caller because they're no longer able to iterate over multiple records?

use core::fmt::Debug;
use core::marker::PhantomData;
use crate::parsers::FromSlice;
use crate::record::StateMetadata;

/// A reader that abstracts over parsing files
#[derive(Debug)]
pub struct Reader<'r, S, T> where 
    S: Debug + 'r,
    T: Debug + FromSlice<'r, State=&'r mut S>,
{
    rb: ReadBuffer<'r>,
    state: S,
    returns: PhantomData<T>,
}

impl<'r, S, T> Reader<'r, S, T> where
    S: Debug + 'r,
    T: Debug + FromSlice<'r, State=&'r mut S> + Into<Vec<Value<'r>>>,
{
    /// Create a new Reader
    pub fn new<B, P>(data: B, params: Option<P>) -> Result<Self, EtError>
    where
        B: TryInto<ReadBuffer<'r>>,
        EtError: From<<B as TryInto<ReadBuffer<'r>>>::Error>,
        S: for<'a> FromSlice<'a, State = P>,
        P: Default,
    {
        let mut rb = data.try_into()?;
        if let Some(state) = rb.next::<S>(params.unwrap_or_default())? {
            Ok(Reader {
                rb,
                state,
                returns: PhantomData,
            })
        } else {
            Err(format!(
                "Could not initialize state {}",
                ::core::any::type_name::<S>()
            )
            .into())
        }
    }

    /// Get the next record from the `Reader`
    #[allow(clippy::should_implement_trait)]
    pub fn next(&mut self) -> Result<Option<T>, EtError> {
        // FIXME: fails on the next line because of lifetime issues with borrowing both `self.rb` and `self.state`
        self.rb.next::<T>(&mut self.state)
    }
}

impl<'r, S, T> RecordReader for Reader<'r, S, T> where
    S: Debug + StateMetadata + 'r,
    T: Debug + FromSlice<'r, State=&'r mut S> + Into<Vec<Value<'r>>>,
{
    fn next_record(&mut self) -> Result<Option<Vec<Value>>, EtError> {
        Ok(self.next()?.map(|v| v.into()))
    }

    fn headers(&self) -> Vec<String> {
        self.state.header().iter().map(|s| s.to_string()).collect()
    }

    fn metadata(&self) -> BTreeMap<String, Value> {
        self.state.metadata()
    }
}
@bovee
Copy link
Owner Author

bovee commented Mar 12, 2022

I got a little further, but having a lot of trouble with the impl RecordReader for Reader<S, T> where T isn't actually returned from next_record. T's lifetime mucks up the whole conversion process because it should be 'a from next_value, but you can't actually set that outside the scope of next_value.

Construct a reader with:
FileType::Bam => Box::new(Reader::<_, parsers::sam::BamRecord>::new(data, None)?),

Code below is changes from above (also tried modifying the signature of RecordReader, but still can't get it to work).

/// A reader that abstracts over parsing files
#[derive(Debug)]
pub struct Reader<'r, S, T> where 
    S: Debug,
    T: Debug,
{
    rb: ReadBuffer<'r>,
    state: S,
    returns: PhantomData<T>,
}

impl<'r, 's, S, T> Reader<'r, S, T> where
    S: Debug,
    T: Debug,
{
    /// Get the next record from the `Reader`
    #[allow(clippy::should_implement_trait)]
    pub fn next<'a>(&'a mut self) -> Result<Option<T>, EtError> where
        T: FromSlice<'a, State=&'a mut S>,
    {
        self.rb.next::<T>(&mut self.state)
    }

    /// Get the next record from the `Reader`
    #[allow(clippy::should_implement_trait)]
    pub fn next_value<'a>(&'a mut self) -> Result<Option<Vec<Value>>, EtError> where
        T: FromSlice<'a, State=&'a mut S> + Into<Vec<Value<'a>>>
    {
        Ok(self.next()?.map(|v| v.into()))
    }
}

impl<'r> RecordReader for Reader<'r, parsers::fasta::FastaState, parsers::fasta::FastaRecord<'r>> where Self: 'r {
    fn next_record<'a>(&'a mut self) -> Result<Option<Vec<Value<'a>>>, EtError> {
        self.next_value() // <- error here: expected FastaRecord<'r>, found FastaRecord<'_>
    }
    ...
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant