
feature[notebooks]: add surrogate notebooks #228

Merged · 1 commit merged into pre/2.8 from groberts-flex/dc_surrogate on Feb 3, 2025

Conversation

@groberts-flex (Contributor) commented Jan 21, 2025

Adding two surrogate notebooks that have minimal dependency on changes in the design plugin. There is one small change needed so that PyTorch can use the design plugin Result object directly as a PyTorch Dataset, by implementing simple accessor and length functions (flexcompute/tidy3d#2189); a sketch of the mechanism is shown below.
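
A map-style PyTorch Dataset only needs `__len__` and `__getitem__`, so any object that implements them can be passed straight to a DataLoader. The class below is a hypothetical stand-in for the Result object, just to illustrate the mechanism (not the actual plugin code):

```python
from torch.utils.data import DataLoader

class ResultLike:
    """Placeholder mimicking the accessors added to the design plugin Result."""

    def __init__(self, inputs, outputs):
        self.inputs, self.outputs = inputs, outputs

    def __len__(self):
        return len(self.inputs)  # number of (input, output) samples

    def __getitem__(self, idx):
        return self.inputs[idx], self.outputs[idx]

# a DataLoader can then consume it like any Dataset, e.g.:
# loader = DataLoader(ResultLike(xs, ys), batch_size=16, shuffle=True)
```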

The directional coupler example is working well across the few tests I've run.

However, I am marking this as a draft for now because I'm a little unsure whether we want to change how the metalens surrogate works. I inherited most of this code; it is currently based on elliptical posts and optimizes a phase profile for an x-polarized input source only. The elliptical library is more interesting as an expansive metasurface platform with independent control over orthogonal input polarizations, but for a lens, a symmetric cylindrical post is typically used since this yields a polarization-independent output. I'm wondering if the current elliptical post usage is also what is leading to a non-circular focal shape. I think this could be handled in the example in a few ways (ordered from least to most changes required from the original notebook). I also wanted to check whether there is a reason for them to be elliptical that I'm missing!

  1. No changes: the elliptical library is used, but the notebook clarifies that the lens is for an x-polarized input specifically.
  2. The beginning of the notebook is kept, but the final Bayesian optimization is done over a single radius parameter (i.e., major_radius and minor_radius are set equal).
  3. The notebook is changed to reduce the complexity of both the surrogate model and the metalens optimization by considering only a circular post shape.

@groberts-flex (Author)

Thought more about the metasurface example and figured it made the most sense to change to a circular post for the metalens, so I changed the notebook to build a surrogate for the general elliptical posts and then search it in the Bayesian optimization with only a single radius and no post rotation. This way, the notebook can be used as a starting point for creating a more general metasurface library in the future.

@groberts-flex marked this pull request as ready for review January 22, 2025 19:27
@yaugenst-flex (Contributor) left a comment

Thanks @groberts-flex for going through these, it must be quite tedious! Great that they run, and they will be super valuable to have in the example library! One general comment about the notebooks: mention that they require at least tidy3d 2.8 to run (because of the plugin changes required).

Overall they look pretty nice, but they need a lot more hand-holding for readers in terms of text explanations and comments.

I left comments for each of the notebooks, let's discuss methodology there.

Contributor

I'm going from top to bottom, but the cell numbers in the notebook are not in order (might be good to re-run it so they're sorted?):

General notes:

  • what exactly is this notebook supposed to do? 😄 I was having a hard time understanding the fitness function: basically it's just MSE? If I understand correctly, we are just trying to create a 25/75 coupler? That's fine, but it needs to be stated clearly somewhere (ideally at the top)

Cell by cell:

  • intro:
    • "with the critical advantage that the neural network"
    • we're doing 100 simulations for a two-parameter sweep, so the speed advantage is a bit questionable. I think it's fine if we don't advertise this part but just reframe it as "you can use tidy3d + the design plugin for ML, here's how" (what do you think @tomflexcompute?)
  • cell 25:
    output_dir = "./misc/dc_surrogate/"
    mc_data_dir = os.path.join(output_dir, "data")
    os.makedirs(mc_data_dir, exist_ok=True)
    this is fine in principle, but I'd really like us to start using pathlib, so this would read:
    output_dir = Path("./misc/dc_surrogate/")
    mc_data_dir = output_dir / "data"
    mc_data_dir.mkdir(parents=True, exist_ok=True)  # parents=True matches os.makedirs
    and then similar changes throughout
  • cell 12:
    • sidewall_angle = np.pi / 6: a 30-degree sidewall angle is pretty unrealistic, could we change it to at most 10 degrees? Not required, but it would bring the whole thing a bit closer to reality
  • cell 14:
    • I think the domain_field parameter should be renamed to something more explicit, e.g. field_monitor or add_field_monitor
    • we actually changed the behavior in pre/2.8 so that mode_spec is no longer a required parameter, so in the monitor definitions you can either remove mode_spec=td.ModeSpec() or be more explicit by using mode_spec=td.ModeSpec(num_modes=1). I think both would be fine, given that these notebooks can't be backward compatible anyway.
    • if domain_field == True: 🤔 (just if domain_field: will do)
  • cell 16:
    • for i, (monitor, direction) in enumerate(...): I think the i is unused, so the enumerate is not necessary?
  • cell 16/17:
    • I think there needs to be a paragraph here explaining the design plugin a bit, since one of the main points of the notebook is highlighting how easy it is to generate this data with it. I guess we're doing a good job because it's really just a few lines of code in the notebook, so we need to make more of an effort to highlight that
  • cells 22 ff.
    • these cells (the rest of the notebook) are basically only code. I think there needs to be much more explanation of what's going on in the data cleaning, model setup, etc.; otherwise it seems almost impossible for someone new to this to follow along.
  • cell 22:
    • would be good to just print the final dataframe to see what the training data actually looks like
  • cell 248:
    • batch_size = 64 is defined but unused, and the value is hard-coded to 16. Let's use batch_size = 16 consistently
    • torch.utils.data.random_split: I guess this was trained on pytorch>2? Because random_split used to expect integers (that summed to the total number of samples), I'm not sure if this might lead to unexpected behavior on earlier pytorch versions. Maybe best to stick with the old API? (A version-safe sketch is shown after this list.)
    • collate_fn=lambda x: x: I'm a bit worried about this collate_fn, it's pretty nonstandard (why doesn't it return inputs / labels?). I guess it probably works as expected, but it needs some more explanation; it's difficult to follow the shapes of the data. It needs more comments and text. I might just not be understanding it correctly, but I fear there is something weird going on here with the data.
  • cell 254:
    • clarify the xi argument and remove the debug comment
  • cell 258:
    • just a comment, but it should be entirely feasible in this setup to hit the coupling (at least the ratio) exactly; I guess the miss is due to the prediction error in the NN?
  • cell 259:
    • related to the comment about the data above, it does seem like something weird is going on: the NN is apparently fitting all the data splits perfectly, yet there is still a significant difference to the simulation data, so something doesn't quite add up. Also, the plot should probably be over wavelength
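
For reference, here is a version-safe variant of the dataset split mentioned above, along with the default collate path (a minimal sketch with placeholder data, not the notebook's code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# placeholder standing in for the Monte Carlo training data
dataset = TensorDataset(torch.randn(100, 2), torch.randn(100, 2))

# integer lengths work on every pytorch version; fractions like [0.8, 0.2]
# require pytorch >= 2.0
n_train = int(0.8 * len(dataset))
n_val = len(dataset) - n_train  # remainder avoids rounding mismatches
train_set, val_set = random_split(dataset, [n_train, n_val])

# the default collate_fn already stacks (input, label) tuples into batched
# tensors, so no custom lambda is needed
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
inputs, labels = next(iter(train_loader))  # shapes: (16, 2) and (16, 2)
```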

Contributor Author

thanks for these comments, really helpful in getting this notebook into good shape! I'll reply to each of them here to answer questions, call out the changes I made, and track that I addressed all of them:

  • general comment / intro: wrote this much more neutrally (i.e., without overselling what the neural network is doing) and tried to set up the design problem being solved in the notebook more clearly. I framed it as we discussed, to showcase that it is easy to generate training data for neural networks using tidy3d and that we can in principle use this for further optimization. There was a small bug in the plotting of the fitness function of the Monte Carlo data, which made it seem like all the random points had low performance; I fixed it. Actually, there are many near-optimal points just from random sampling (which I don't think is necessarily surprising). Using the neural network, the performance does improve marginally, but there are also a lot of other ways to get that much improvement, especially when starting from a bunch of random samples. Hopefully this notebook can still show people how to set up neural networks using the tidy3d design plugin, but I agree with you that it is wise not to overplay the NN surrogate in this notebook and instead let it be used as the simplest case of how to set this kind of thing up.
  • cell 25: thank you, updated to use pathlib and removed os import
  • cell 12: no problem, left a lot of parameters like this unchanged, notebook works fine with smaller sidewall angle (I set it to 9 degrees)
  • cell 14: changed ModeSpec to say num_modes=1, and updated to add_field_monitor instead of domain_field, checked boolean directly
  • cell 16: removed enumeration, correct that i was not being used
  • cell 16/17: talked more about the design plugin here to explain how we are using it to conveniently run the Monte Carlo simulation search (a sketch of how little code this takes is shown after this list). At every other point in the notebook where it comes up again (network hyperparameter optimization, final Bayesian optimization), I try to highlight again how we can simply use the plugin for all these different parts of the design problem
  • cell 22: I added a lot more text explanation and code comments throughout the rest of the notebook to explain what we are doing at each step. I'm happy to add more if there are parts that are still unclear! Good idea on the dataframe printing; the pandas dataframes do print very nicely, like you said!
  • cell 248: whoops on the batch size, good catch (updated to 16 and used the actual variable haha).
  • cell 248: yes, the current PyTorch API allows you to specify fractions that sum to 1 instead of integer lengths. I changed the code to compute the integer lengths and pass those instead for better compatibility. But are we giving users a manifest of the package versions they should be using for optional packages when running the notebooks? I would think we want to make sure users are on the same Python package versions as us.
  • cell 248: good call on the collate_fn. It was an artifact of how things were returned from the Result getter. I fixed that in the other PR, and this all works much more smoothly now; the data comes out of the data loader as tensors, as I would have expected.
  • cell 254: removed argument comment
  • cell 258: this seems to be due to the network being trained on data where the top and bottom waveguide output transmissions sum to slightly less than 1 (off by ~1e-2). From what I can tell, this is the case in the simulation data as well, so the best point the Bayesian optimization has access to is predicted to be slightly less than 0.75 in the bottom waveguide, because that's the pattern the network has learned. In the current version, the final merit function from the Bayesian optimization is ~7e-5, which mostly comes from the bottom-waveguide output power not being as close to the goal (getting ~0.742 against a goal of 0.75, which I'm now printing out for comparison; the top is almost exactly 0.25 according to both the network and the simulation of the final coupler). The reflection is very low in practice (~1e-6), so I think it's fine that we ignore it, and it doesn't account for the full difference. The network actually matches the simulation well here, and the simulation also sees a coupler whose total output power is ~8e-3 short of summing to 1 (some small part of this is reflection, but there is probably some marginal power lost elsewhere, either in the simulation or somehow in the normalization). I think this is why the Bayesian optimizer with the network doesn't find a "perfect" solution, but I do think it's quite close when considering the actual simulation outputs and the data the network was trained on.
  • cell 259: changed the plot to be over wavelength. To me, everything is looking fairly consistent (though this may be due to me going through and cleaning a few things up). Right now, the MSE for the test set on the network is ~4e-5, and the average MSE across all wavelengths in the final plot is the same.
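
To illustrate the cell 16/17 point about the design plugin, here is a rough sketch of how little code the Monte Carlo data generation takes (hypothetical parameter spans, with the simulation function stubbed out; not the notebook's exact code):

```python
import tidy3d.plugins.design as tdd

def fn(wg_spacing_coup: float, coup_length: float) -> float:
    # build the directional coupler Simulation from the two parameters,
    # run it on Tidy3D, and return the figure of merit (stubbed out here)
    return 0.0

spacing = tdd.ParameterFloat(name="wg_spacing_coup", span=(0.05, 0.2))
length = tdd.ParameterFloat(name="coup_length", span=(5.0, 15.0))
method = tdd.MethodMonteCarlo(num_points=100)

design_space = tdd.DesignSpace(parameters=[spacing, length], method=method)
results = design_space.run(fn)  # evaluates the 100 random samples
df = results.to_dataframe()     # training data as a pandas DataFrame
```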

I am pushing a version of the cleaned-up notebook now, but it still needs to be run again to get all the cells in the right order. Every time I make changes and check outputs, the order gets scrambled, so I'm going to wait until things are completely done to do final verification runs that put everything in the right order. I'll switch over to working on the metalens notebook next, apply a lot of similar changes, and address the comments below.

Contributor

> but are we giving users a manifest of the package versions they should be using for optional packages when running the notebooks? I would think we would want to make sure users are using the same python package versions as us.

Ah, you're right, these notebooks are tied to tidy3d versions, and we actually require pytorch 2.1... so yeah, either this or fractions is fine. I was just confused by the syntax for a moment.

Contributor

thanks @groberts-flex, this is awesome! From my side this is good to go; I think the notebook looks pretty great now. I guess the only thing one could additionally try is to see how few simulations we could get away with during training, since it seems we're probably still using too many. But for illustration purposes I think this is already good enough, so I'm mostly just asking out of curiosity.

Contributor

general:

  • same comment as other notebook regarding pathlib

  • this notebook barely has any explanation of what's going on, even more so than the other one; it needs more text

  • some of the cells have super verbose output that clutters the notebook; is it possible to avoid this?

  • it also needs to be made clearer what the design problem and targets are

  • intro:

    • might be because I'm not a native speaker, but "a library of metasurface posts by utilizing a random sampling of post dimensions" sounds strange to me. Maybe "unit cells"?
  • cell 5:

    • plot not showing? I think we need to use the regular 2d plotting functions for notebooks
  • cells 8ff.:

    • same comments as the other notebook regarding methodology / data cleaning / descriptiveness, although the final loss here looks more reasonable to me
  • cell 17:

    • super verbose output
    • lots of warnings about duplicate data points? Is this indicative of a problem, or fine? If fine, can they be suppressed?

Contributor Author

thanks for the comments, I think this notebook is getting into better shape as well. Running some final verifications and doing some more proofreading:

  • added pathlib
  • added more explanation of what's going on in markdown and comments
  • reduced a lot of the clutter (removed printing in the hyperparameter search and changed the Bayesian optimization part; see below)
  • added some more clarity on what the network is optimized for and what the loss function is computing
  • intro: good point, rewrote the intro in general to make it clearer and clean up the wording
  • cell 5: changed to a 2d plot
  • cells 8ff: adopted a lot of the same things as the directional coupler notebook to make things run more smoothly, and described more of what is going on with the data cleaning (also printed the dataframe)
  • cell 17: ended up changing the post selection. I think it's actually overkill to run Bayesian optimization on every post when really we are just sweeping the radius (and it takes forever even using the surrogate solver). So I did this much more simply and picked posts via a lookup-table method, since at some point we already compute the predicted amplitude and phase for a sweep of radii in the allowed range (a sketch of the idea follows this list).
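
A minimal sketch of that lookup-table selection, with placeholder ranges and a stubbed-in surrogate prediction (not the notebook's code):

```python
import numpy as np

r_min, r_max = 0.05, 0.25  # allowed radius range (placeholder values)
radii = np.linspace(r_min, r_max, 512)

# placeholder for the surrogate's predicted phase over the radius sweep;
# in the notebook this would come from the trained network
phases = 2 * np.pi * (radii - r_min) / (r_max - r_min)

def pick_radius(target_phase: float) -> float:
    # wrap the difference so phases differing by 2*pi count as equal
    diff = np.angle(np.exp(1j * (phases - target_phase)))
    return float(radii[np.argmin(np.abs(diff))])
```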

will post the notebook with these changes after I get through the verification run

Contributor

thanks, this is great! Only two minor comments, otherwise good to go from my side.

intro:

> please refer to the Metalens and MidIRMetalens notebooks

can you include a link there?

cell 4:

    t = np.linspace(0, 2 * np.pi, num_points + 1)[:-1]  # avoid duplicate points at 0=2pi

you can do: t = np.linspace(0, 2 * np.pi, num_points, endpoint=False) instead

Contributor Author

thanks, added the links and updated the numpy linspace call (I like that argument, did not know about it!)

@groberts-flex force-pushed the groberts-flex/dc_surrogate branch from 09b7590 to 82d31cf on January 30, 2025 13:57
@groberts-flex (Author)

These are both pretty cleaned up now, and I have run them all the way through to get all the cells in order. I ended up taking the directional coupler back up to 100 MC simulations because the neural network accuracy near the optimal point is sometimes not very good with the lower simulation count. I suspect that the first time I ran it, there was a lucky randomization that caused the final solution to be well predicted by the surrogate, possibly because it was close to one of the training points. It seems like we shouldn't need that much data to train this, but for now I am going to keep the count higher so that the notebook works consistently.

Rebased on pre/2.8 and ran the formatter on both of these notebooks.

@tylerflex (Collaborator)

Thanks @groberts-flex! I had a few comments:

Both Notebooks

  1. Should add these to the proper .rst files so they show up in the docs.
  2. Titles and intros should be more flashy :D. “Directional Coupler Surrogate” -> “Building an AI Surrogate Model for a Directional Coupler”. The first sentence should probably also mention that we build a machine learning model for the directional coupler and use it to design…. I also feel it might be useful to include a simple diagram of what we’re doing (parameters -> set of devices -> scatter plot -> ML model).
  3. Would be useful to mention that users should install pytorch and where they can find more information about that. Throughout the notebooks too (for model building / training), it would be helpful to add links to the pytorch docs so users can learn more about these things if they aren’t familiar.
  4. How much does the hyperparameter optimization add? Not opposed to it, but wondering if we need it or could get away without it. It does add some complexity that could make it harder for the layperson to understand.

Coupler

  • Cells [1] - [5] could probably use a bit more information: “First we define general parameters that won’t change during the sampling” … “Next, we set up a function to generate a directional coupler as a function of our design parameters …” etc. Basically, introduce what’s generally happening here.
  • [19] Curious why there are so many points on the left side of the plot? Is it from the Bayesian optimization sampling? I would have naively assumed it would start to cluster sampling points around the optimal value.
  • Related to the above, maybe you can explain why the Bayesian optimization is needed here? I suppose it’s just a demonstration, but looking at this, one could probably just sample more MC points and pick the best one.

Metalens

  • “The use of cos ensures that the phase can differ from the target phase by a multiple of 2π”. I think there’s an added benefit in that the cos accurately discounts the phase difference’s effect on the constructive interference at the focal point (see the sketch after this list).
  • Slightly curious how to quantify the final result vs. a more standard approach. For example, we have a notebook that’s very similar but with circular cylinders. How does it compare? I’m also curious how the design would look if you used some very simple linear interpolation to build the model rather than an ANN. Might be worth adding a little discussion and a link to the other metalens design notebook (if it’s published).
  • With 3 design parameters per cell, it’s maybe possible to do this with more standard approaches? Could be worth elaborating a bit at the end that this is most useful in cases where we have more degrees of freedom and can’t sample the space fully. Maybe for a follow-up, we could try something more complex as a demo.
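
On the cos point above, a minimal sketch of such a cosine-based phase merit (a hypothetical function, not the notebook's exact code): the cos of the phase error is insensitive to 2π wraps and, as the projection onto the target phasor, reflects each cell's contribution to constructive interference at the focus.

```python
import numpy as np

def phase_merit(phase: np.ndarray, target: np.ndarray) -> float:
    # equals 1.0 when every phase matches its target (mod 2*pi); errors
    # near pi are penalized most, mirroring their impact on focal intensity
    return float(np.mean(np.cos(phase - target)))
```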

@groberts-flex (Author)

Thanks @tylerflex for the comments, I appreciate you taking the time to go through these notebooks! I'll work on making the changes, but first I wanted to give some more context and thoughts in case it changes the direction we want to take this PR. I inherited both of these notebooks to clean up and get into good shape as examples. I've done some simplification and made modifications so they use the design plugin and PyTorch functions more than they did. Some parts of the design process I've removed, but other parts I kept because I wasn't sure whether they were meant to be there for some reason. For example, I kept the Bayesian optimization in the directional coupler, but it does seem like overkill, and I agree with you that additional Monte Carlo sampling would quickly find as good a design. I could also modify this part to run gradient descent on the two coupler parameters by backpropagating through the PyTorch network until we find a good design (that might converge really quickly).

The general problem with both of these (as you've noticed) is that the neural network is not a very critical part of the design process. In the directional coupler, we do see an improvement over pure random sampling, but the benefit is pretty marginal. In the metalens surrogate notebook, it's almost impossible for me to justify the use of the NN surrogate: 100 MC simulations are fed into the training, and then we ultimately just need to interpolate the post radius from a 1D lookup table, since we use cylindrical posts in the end. In fact, in the published MidIRMetalens notebook, this table is built by running just 20 simulations. The resulting focal-plane fields in that notebook look roughly the same as with the NN surrogate (both are based on the same center wavelength, although I did swap in a lower-index substrate in the NN surrogate example rather than use silicon posts on silicon), so it's not adding any benefit and in fact uses more simulations. The network learns a mapping for elliptical posts instead of cylindrical ones, but in the end we don't need the elliptical library for the actual example. It also learns a slightly odd mapping, because it ignores the birefringent nature of an elliptical post and just records the p-pol to p-pol transfer function (this makes certain elliptical posts that cause polarization rotation appear to have lower transmission).

With that extra context, is the metalens notebook still of interest? If so, I'll definitely add more discussion to clarify the technique, compare it to other approaches, and address the other comments.

As far as the hyperparameter optimization goes: in the directional coupler, for example, I see validation losses that vary by two orders of magnitude across different hyperparameter selections, so it does seem to add some value. This was a feature of the notebook when I received it, and the main change I made was to have it leverage the design plugin, as another example of how that plugin can be useful. I think it might be useful for a reader to see, as opposed to hard-coding a set of parameters.

Sorry for a really long reply here! I'm also happy to jump on a call if that's easier!

@tylerflex (Collaborator)

@groberts-flex totally agree with everything here. I guess we need to clarify what the intention of the notebooks is, and that would guide some of these decisions. I'm OK with publishing mostly as-is and then adding more of my comments after we discuss more with yannick, perhaps?

@groberts-flex force-pushed the groberts-flex/dc_surrogate branch 2 times, most recently from ee41014 to de37325 on January 31, 2025 21:15
@tomflexcompute (Contributor) commented Jan 31, 2025

Thanks @groberts-flex for cleaning up the notebooks. I've only finished the DC one, but most of the comments are general, so they may apply to the other notebook as well; I'll take a look at it shortly. There is a long thread already that I didn't read closely, so my comments might be redundant with others'. Feel free to take them accordingly.

  1. The intro is a little verbose in my personal opinion. "We aim for 75% of the light input in one waveguide to be transferred evanescently to a coupled waveguide. The remaining 25% of the input power should be retained in the original waveguide. This coupling optimization boils down to choosing two geometric parameters that control the rate and amount of transfer between the waveguides. First, to facilitate coupling, the two waveguides are brought close together. This spacing is parameterized by a variable wg_spacing_coup in the notebook and controls the transfer rate between the waveguides. Second, the total amount of transferred power depends on the length the two waveguides remain in close contact. We use a variable coup_length to parameterize the total coupling length." can be simplified to something like "We aim to design a directional coupler (DC) with a 75/25 splitting ratio. This coupling optimization boils down to choosing the gap size wg_spacing_coup and coupling length coup_length of the DC."

  2. In the first part of the notebook where the DC simulation is defined, refer to this tutorial by saying something like "The DC simulation setup process follows the previous tutorial".

  3. In the comments above cell [7], link to the design plugin tutorial.

  4. In our convention for the notebook title (h1 header), we only capitalize the first letter of the first word.

  5. Might be good to add some h2 headers to section the notebook for better structure, for example ## Simulation Setup, ## Design Space Exploration, ## Neural Network Surrogate Model, ## Bayesian Optimization. Note that, in our convention, h2 headers capitalize the first letter of each word. A bit confusing, I agree, but it was set up this way early on, so we need to follow it for consistency.

  6. Notebooks need to be added to the rst file. I think both notebooks can be added to docs/features/advanced.rst

  7. In terms of practicality, I doubt anyone would actually design a DC this way, but as a demonstration it works well.

@groberts-flex (Author)

  1. changed the title and opening sentence of the DC notebook and added a diagram to show the flow of the notebook
  2. mentioned PyTorch and the version the notebook has been verified with, and included links to installation information and other documentation at appropriate places in the notebook
  3. added some more explanation to cells [1]-[5]
  4. removed the Bayesian optimization part and replaced it with a simple gradient-based optimizer that uses the NN surrogate (much faster, and it seems to reach a better solution; a sketch of the idea follows this list). I think the reason there were so many points outside the optimal range is that the optimizer samples a certain number of random points at the beginning before it starts sampling more intelligently.
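
A minimal sketch of that gradient-based search: freeze a trained surrogate and optimize only its inputs (the model, parameter values, and names here are illustrative placeholders, not the notebook's code):

```python
import torch
import torch.nn as nn

# placeholder surrogate; assume it has already been trained on the MC data
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # freeze the weights; only the design inputs move

params = torch.tensor([0.3, 12.0], requires_grad=True)  # wg_spacing_coup, coup_length
target = torch.tensor([0.25, 0.75])                     # 25/75 splitting target
opt = torch.optim.Adam([params], lr=1e-2)

for _ in range(200):
    opt.zero_grad()
    pred = model(params)                     # predicted transmissions
    loss = torch.mean((pred - target) ** 2)  # distance from the target split
    loss.backward()                          # gradients flow back to the inputs
    opt.step()
```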

Based on some more discussion with @yaugenst-flex this morning, I removed the metalens surrogate from this PR. The work is saved on another branch (groberts-flex/metalens_surrogate), but for now it's a little unclear whether it is a good example of using a surrogate model. I'll keep thinking on that one to see if there are modifications that would make it a more convincing example. Also happy to re-evaluate including it as-is if there is interest!

@groberts-flex (Author)

@tomflexcompute - thanks for your comments and for going through the notebooks. As of now, the plan is to include only the directional coupler one, so no need to review the metalens notebook at this time! Working now on the changes you suggested!

@groberts-flex force-pushed the groberts-flex/dc_surrogate branch from de37325 to 3820cef on January 31, 2025 21:53
@groberts-flex (Author)

Updated the DC notebook and added it to the docs/features/advanced.rst file!

@groberts-flex merged commit 83d3865 into pre/2.8 on Feb 3, 2025