Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bodyFilter and fileFilter on EmailParser and allow WithBodyFilter() and WithFileFilter() options #119

Merged
merged 8 commits into from
Feb 12, 2025

Conversation

mnako
Copy link
Owner

@mnako mnako commented Jan 19, 2025

Cf. #112 (comment) and further discussion

@mnako mnako self-assigned this Jan 19, 2025
@mnako mnako mentioned this pull request Jan 19, 2025
@rorycl
Copy link

rorycl commented Jan 25, 2025

Hi @mnako

I'm very much in favour of adding a parser struct, as you have in this patch, as this is the key point for allowing future flexibility of use of this very useful library. This flexibility could come through the new Options you've suggested, which are great, but also to allow greater flexibility in the structure and testing of the library in general. The WithFileFilter example you provide is particularly rich and hints at many uses.

At the moment you have this in the patch:

type EmailParser struct {
	bodyFilter EmailBodyFilter
	fileFilter EmailFileFilter
}

I would be grateful if you consider extending this struct further, perhaps as follows:

type EmailParser struct {
    // what are we going to process? header only / no attachments, // etc...
    processType typeOfProcessing 
    // store the email as it is extracted here
    email       *Email
    // have funcs here that can be overridden by the user, with the
    // default attached at "NewParser".
    addressFunc func(mail.Header, string) (*mail.Address, error)
    dateFunc    func(string) (*time.Time, error)
    fileFunc    func([]string) (*email.File, error) // signature tbc
    // possibly helpful to store main email encoding and cte
    encoding    encoding.Encoding
    cte         email.ContentTransferEncoding
}

Futher funcs might be added, as you suggest, but they might not be just restricted to filtering.

My suggestions are intended to assist someone trying to contribute to the letters project to tell what the EmailParser is trying to do at a glance. For example if parser.processType == headerOnly that is why processing might exit early.

You'll see there are a few further things in my suggestion. I'll go through them in turn.

  1. Make the type of processing explicit, as noted above.

  2. Having a parser.email field will allow funcs or methods to build up the returned email using a single point of reference. For example the emailBodies intermediary type might not be needed. Also this struct gives the opportunity of attaching important funcs, such as parsePart to it, so it becomes a method.

  3. I suggest keeping the function signatures in the struct. That allows testing in future to swap out complicated functions with simple ones for mocking. It also makes it possible for someone to pass in a custom func to attach. The struct constructor would attach the default funcs at initialisation. So, for example, the initialiser would do p.dateFunc = parseDateHeader. This of course would require calls to the date parsing func to be called as a method, i.e. p.dateFunc(...) rather than as a func. I feel this points to changing quite a lot of the funcs to methods on the parser struct.

  4. Other state stuff, like cte and encoding might be stored in the struct for more convenient passing of state between methods.

In summary I'm enthusiastic about the use of a struct for the parser and hope that you will consider some of these items for expanding the use of the struct, including moving some funcs to methods. The benefits would include:

  • more composability and extensibility
  • easier testing for individual units of functionality (such as date and address parsing)
  • shorter method signatures
  • (arguably) less nesting of the code.

Thanks for your consideration.

@mnako mnako force-pushed the feature/conditionally-parse-bodies-and-files branch from dd24a40 to d28a884 Compare February 1, 2025 02:59
@mnako mnako changed the title wip: Add bodyFilter and fileFilter on EmailParser and allow WithBodyFilter() and WithFileFilter() options Add bodyFilter and fileFilter on EmailParser and allow WithBodyFilter() and WithFileFilter() options Feb 2, 2025
@mnako mnako marked this pull request as ready for review February 2, 2025 05:31
@mnako
Copy link
Owner Author

mnako commented Feb 2, 2025

@ValleyCrisps this should be now in a good enough state for a review.

One notable change is that I have added the Content-Disposition header to the file filter function type. This will allow filtering files not only by their filename, but also by whether they are inlined or attached.

The changes to the test file are huge, you might need to review them in a local Git client.

Please also take a look at the readme and let me know whether it is easy to follow and helpful to future user.

@rorycl
Copy link

rorycl commented Feb 2, 2025

I apologise if my comments haven't been clear enough to have a conversation.

I personally consider the direction being taken unnecessarily limiting. For example, simply exposing the full file struct to users will allow then to provide an attached or inline file processing func. There is no benefit (except perhaps to provide an example file processing func) for the file filter.

…rs.ParseEmail()` for backward compatibility.
@mnako
Copy link
Owner Author

mnako commented Feb 8, 2025

@ValleyCrisps, one last change: since we are already using letters on a production email server, I want to keep unnecessary breaking changes to a minimum.

I have renamed (ep *EmailParser) ParseEmail() to Parse() and kept letters.ParseEmail() as a simple wrapper creating a default parser and returning the parsed email to make upgrading less painful.

I have also updated the readme.

I believe this is good enough for the next minor release, please let me know if you are ok with this merge and a release.

Copy link
Collaborator

@ValleyCrisps ValleyCrisps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README looks good to me.
I like how the ParseEmail function is now a wrapper around the default implementation. This will introduce the new functionality as a non-breaking change for current users.

@mnako
Copy link
Owner Author

mnako commented Feb 11, 2025

The README looks good to me. I like how the ParseEmail function is now a wrapper around the default implementation. This will introduce the new functionality as a non-breaking change for current users.

Thanks for your review and comments, @ValleyCrisps. I answered your comments.

Unless you have more comments, if you could approve, I can try and cut out a release candidate tomorrow.

Copy link
Collaborator

@ValleyCrisps ValleyCrisps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments, good to go!

@mnako mnako merged commit 670cb70 into main Feb 12, 2025
6 checks passed
@mnako mnako deleted the feature/conditionally-parse-bodies-and-files branch February 12, 2025 08:42
@mnako
Copy link
Owner Author

mnako commented Feb 12, 2025

@rorycl, thank you for your comments.

With #119 now merged and soon to be released as v0.2.4, I hope this covers the use case of skipping parsing attachments.

As we discussed, we've chosen the WithFileFilter(NoFiles) and WithBodyFilter(NoBodies) approach rather than your headersOnly suggestion. We believe this provides broader and less-assuming semantics.

Since we're going in a different direction with letters, I think your suggestion of a fork is excellent. You're welcome to use any commit as a base for a new fork, and since all of the code is under the MIT license, you can continue development in the direction you envision.

Thank you for your work for the Golang community.

@rorycl
Copy link

rorycl commented Feb 12, 2025

@mnako

Thanks very much for your thoughtful response and your wonderful work on letters.

I hope you find some useful ideas in the code, which I'm actively developing at https://github.com/rorycl/letters. I'll be sure to keep you and your contributors mentioned as set out in the licence, although I'll upgrade that formally to an MIT licence unless you suggest otherwise.

A few things that I quite like about the more modularised approach include:

  • more use of methods, which makes the func signatures more concise
  • some code duplication has been removed
  • a greater use of pointers to avoid passing around large structs
  • the custom fileFunc which allows filtering or bypassing reading whole inline or attached files into a byte slice
  • rationalising the content-* headers into a new ContentInfo type that could be passed around quite easily
  • derationalising the string types, largely related to content-* types, since this added (for me at least) some cognitive dissonance

You might also find testRegen.go useful if you need to regenerate your excellent tests after changing the structure of the returned Email in any way.

Best wishes,
Rory

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants