Add `bodyFilter` and `fileFilter` on `EmailParser` and allow `WithBodyFilter()` and `WithFileFilter()` options #119

mnako · 2025-01-19T10:37:49Z

Cf. #112 (comment) and further discussion

rorycl · 2025-01-25T18:03:32Z

I'm very much in favour of adding a parser struct, as you have in this patch, as this is the key point for allowing future flexibility of use of this very useful library. This flexibility could come through the new Options you've suggested, which are great, but also to allow greater flexibility in the structure and testing of the library in general. The WithFileFilter example you provide is particularly rich and hints at many uses.

At the moment you have this in the patch:

type EmailParser struct {
	bodyFilter EmailBodyFilter
	fileFilter EmailFileFilter
}

I would be grateful if you consider extending this struct further, perhaps as follows:

type EmailParser struct {
    // what are we going to process? header only / no attachments, // etc...
    processType typeOfProcessing 
    // store the email as it is extracted here
    email       *Email
    // have funcs here that can be overridden by the user, with the
    // default attached at "NewParser".
    addressFunc func(mail.Header, string) (*mail.Address, error)
    dateFunc    func(string) (*time.Time, error)
    fileFunc    func([]string) (*email.File, error) // signature tbc
    // possibly helpful to store main email encoding and cte
    encoding    encoding.Encoding
    cte         email.ContentTransferEncoding
}

Futher funcs might be added, as you suggest, but they might not be just restricted to filtering.

My suggestions are intended to assist someone trying to contribute to the letters project to tell what the EmailParser is trying to do at a glance. For example if parser.processType == headerOnly that is why processing might exit early.

You'll see there are a few further things in my suggestion. I'll go through them in turn.

Make the type of processing explicit, as noted above.
Having a parser.email field will allow funcs or methods to build up the returned email using a single point of reference. For example the emailBodies intermediary type might not be needed. Also this struct gives the opportunity of attaching important funcs, such as parsePart to it, so it becomes a method.
I suggest keeping the function signatures in the struct. That allows testing in future to swap out complicated functions with simple ones for mocking. It also makes it possible for someone to pass in a custom func to attach. The struct constructor would attach the default funcs at initialisation. So, for example, the initialiser would do p.dateFunc = parseDateHeader. This of course would require calls to the date parsing func to be called as a method, i.e. p.dateFunc(...) rather than as a func. I feel this points to changing quite a lot of the funcs to methods on the parser struct.
Other state stuff, like cte and encoding might be stored in the struct for more convenient passing of state between methods.

In summary I'm enthusiastic about the use of a struct for the parser and hope that you will consider some of these items for expanding the use of the struct, including moving some funcs to methods. The benefits would include:

more composability and extensibility
easier testing for individual units of functionality (such as date and address parsing)
shorter method signatures
(arguably) less nesting of the code.

Thanks for your consideration.

…ilter() and WithFileFilter() options.

Update tests.

Fix a loop variable captured in closure in tests.

Update tests. Update readme.

mnako · 2025-02-02T05:45:03Z

@ValleyCrisps this should be now in a good enough state for a review.

One notable change is that I have added the Content-Disposition header to the file filter function type. This will allow filtering files not only by their filename, but also by whether they are inlined or attached.

The changes to the test file are huge, you might need to review them in a local Git client.

Please also take a look at the readme and let me know whether it is easy to follow and helpful to future user.

rorycl · 2025-02-02T20:43:30Z

I apologise if my comments haven't been clear enough to have a conversation.

I personally consider the direction being taken unnecessarily limiting. For example, simply exposing the full file struct to users will allow then to provide an attached or inline file processing func. There is no benefit (except perhaps to provide an example file processing func) for the file filter.

…rs.ParseEmail()` for backward compatibility.

mnako · 2025-02-08T11:52:48Z

@ValleyCrisps, one last change: since we are already using letters on a production email server, I want to keep unnecessary breaking changes to a minimum.

I have renamed (ep *EmailParser) ParseEmail() to Parse() and kept letters.ParseEmail() as a simple wrapper creating a default parser and returning the parsed email to make upgrading less painful.

I have also updated the readme.

I believe this is good enough for the next minor release, please let me know if you are ok with this merge and a release.

ValleyCrisps

The README looks good to me.
I like how the ParseEmail function is now a wrapper around the default implementation. This will introduce the new functionality as a non-breaking change for current users.

letters.go

parsers.go

mnako · 2025-02-11T14:11:39Z

The README looks good to me. I like how the ParseEmail function is now a wrapper around the default implementation. This will introduce the new functionality as a non-breaking change for current users.

Thanks for your review and comments, @ValleyCrisps. I answered your comments.

Unless you have more comments, if you could approve, I can try and cut out a release candidate tomorrow.

ValleyCrisps

Thanks for addressing my comments, good to go!

mnako · 2025-02-12T08:46:30Z

@rorycl, thank you for your comments.

With #119 now merged and soon to be released as v0.2.4, I hope this covers the use case of skipping parsing attachments.

As we discussed, we've chosen the WithFileFilter(NoFiles) and WithBodyFilter(NoBodies) approach rather than your headersOnly suggestion. We believe this provides broader and less-assuming semantics.

Since we're going in a different direction with letters, I think your suggestion of a fork is excellent. You're welcome to use any commit as a base for a new fork, and since all of the code is under the MIT license, you can continue development in the direction you envision.

Thank you for your work for the Golang community.

rorycl · 2025-02-12T08:59:56Z

@mnako

Thanks very much for your thoughtful response and your wonderful work on letters.

I hope you find some useful ideas in the code, which I'm actively developing at https://github.com/rorycl/letters. I'll be sure to keep you and your contributors mentioned as set out in the licence, although I'll upgrade that formally to an MIT licence unless you suggest otherwise.

A few things that I quite like about the more modularised approach include:

more use of methods, which makes the func signatures more concise
some code duplication has been removed
a greater use of pointers to avoid passing around large structs
the custom fileFunc which allows filtering or bypassing reading whole inline or attached files into a byte slice
rationalising the content-* headers into a new ContentInfo type that could be passed around quite easily
derationalising the string types, largely related to content-* types, since this added (for me at least) some cognitive dissonance

You might also find testRegen.go useful if you need to regenerate your excellent tests after changing the structure of the returned Email in any way.

Best wishes,
Rory

mnako self-assigned this Jan 19, 2025

mnako mentioned this pull request Jan 19, 2025

API changes #112

Closed

mnako requested review from NikitaKurpas, adomanskagrzyb and ValleyCrisps January 19, 2025 10:49

mnako mentioned this pull request Jan 26, 2025

Enable testpackage golangci-lint linter to enforce testing through public interfaces #121

Merged

mnako added 2 commits February 1, 2025 11:53

wip: Add bodyFilter and fileFilter on EmailParser and allow WithBodyF…

c75ffa0

…ilter() and WithFileFilter() options.

Move EmailBodyFilter and EmailFileFilter and related code to filters.go.

d28a884

Update tests.

mnako force-pushed the feature/conditionally-parse-bodies-and-files branch from dd24a40 to d28a884 Compare February 1, 2025 02:59

mnako added 5 commits February 1, 2025 15:14

Set 1.18 as minimum-go-version in go.mod.

5182ea9

Fix a loop variable captured in closure in tests.

Split testing and linting in GitHub Action

f6f4607

Fix differences in tests for one language

5c911a1

Add /TXTFileOnlyParser subtests

d9fb7c5

Expand EmailFileFilter to accept ContentDispositionHeader as well.

8f6c6c1

Update tests. Update readme.

mnako changed the title ~~wip: Add bodyFilter and fileFilter on EmailParser and allow WithBodyFilter() and WithFileFilter() options~~ Add bodyFilter and fileFilter on EmailParser and allow WithBodyFilter() and WithFileFilter() options Feb 2, 2025

mnako marked this pull request as ready for review February 2, 2025 05:31

Rename (ep *EmailParser) ParseEmail() to Parse(), and keep `lette…

9f567f9

…rs.ParseEmail()` for backward compatibility.

ValleyCrisps reviewed Feb 11, 2025

View reviewed changes

letters.go Show resolved Hide resolved

parsers.go Show resolved Hide resolved

ValleyCrisps approved these changes Feb 11, 2025

View reviewed changes

mnako merged commit 670cb70 into main Feb 12, 2025
6 checks passed

mnako deleted the feature/conditionally-parse-bodies-and-files branch February 12, 2025 08:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `bodyFilter` and `fileFilter` on `EmailParser` and allow `WithBodyFilter()` and `WithFileFilter()` options #119

Add `bodyFilter` and `fileFilter` on `EmailParser` and allow `WithBodyFilter()` and `WithFileFilter()` options #119

mnako commented Jan 19, 2025

rorycl commented Jan 25, 2025

mnako commented Feb 2, 2025

rorycl commented Feb 2, 2025

mnako commented Feb 8, 2025

ValleyCrisps left a comment

mnako commented Feb 11, 2025

ValleyCrisps left a comment

mnako commented Feb 12, 2025

rorycl commented Feb 12, 2025

Add bodyFilter and fileFilter on EmailParser and allow WithBodyFilter() and WithFileFilter() options #119

Add bodyFilter and fileFilter on EmailParser and allow WithBodyFilter() and WithFileFilter() options #119

Conversation

mnako commented Jan 19, 2025

rorycl commented Jan 25, 2025

mnako commented Feb 2, 2025

rorycl commented Feb 2, 2025

mnako commented Feb 8, 2025

ValleyCrisps left a comment

Choose a reason for hiding this comment

mnako commented Feb 11, 2025

ValleyCrisps left a comment

Choose a reason for hiding this comment

mnako commented Feb 12, 2025

rorycl commented Feb 12, 2025

Add `bodyFilter` and `fileFilter` on `EmailParser` and allow `WithBodyFilter()` and `WithFileFilter()` options #119

Add `bodyFilter` and `fileFilter` on `EmailParser` and allow `WithBodyFilter()` and `WithFileFilter()` options #119