feature[notebooks]: add surrogate notebooks #228
Conversation
Thought more about the metasurface example and figured it made the most sense to change to a circular post for the metalens, so I changed the notebook to build a surrogate for the general elliptical posts and then search it in the Bayesian optimization with only a single radius and no post rotation. This way, the notebook can be used as a starting point to create a more general metasurface library in the future.
Thanks @groberts-flex for going through these, must be quite tedious! Great that they run, and will be super valuable to have in the example library! One general comment about the notebooks: mention that they require at least tidy3d 2.8 to run (because of the plugin changes required).
Overall they look pretty nice, but they need a lot more hand-holding for readers in terms of text explanations and comments.
I left comments for each of the notebooks, let's discuss methodology there.
I'm going from top to bottom, but the cell numbers in the notebook are not in order (might be good to re-run them so they're sorted?):
General notes:
- what exactly is this notebook supposed to do 😄 i was having a hard time understanding the fitness function - basically it's just MSE? if i understand correctly, we are just trying to create a 25/75 coupler? that's fine, but needs to be stated clearly somewhere (ideally at the top)
Cell by cell:
- intro:
- "with the critical advantage that the neural network"
- we're doing 100 simulations for a two parameter sweep, so the speed advantage is a bit questionable. i think it's fine if we don't advertise this part but just reframe as "you can use tidy3d + design plugin for ML, here's how" (what do you think @tomflexcompute?)
- cell 25:
  - this is fine in principle:

        output_dir = "./misc/dc_surrogate/"
        mc_data_dir = os.path.join(output_dir, "data")
        os.makedirs(mc_data_dir, exist_ok=True)

    but i'd really like us to start using pathlib, so this would read:

        output_dir = Path("./misc/dc_surrogate/")
        mc_data_dir = output_dir / "data"
        mc_data_dir.mkdir(parents=True, exist_ok=True)  # parents=True matches os.makedirs

    and then similar changes throughout
- cell 12:
  - sidewall_angle = np.pi / 6: a ~30 degree sidewall angle is pretty unrealistic, could we change this to at most 10 degrees? not required but would make the whole thing a bit closer to reality
- cell 14:
  - i think the domain_field parameter should be changed to something more explicit, e.g. field_monitor or add_field_monitor
  - we actually changed the behavior in pre/2.8 so that mode_spec is not a required parameter anymore, so in the monitor definitions you can either remove mode_spec=td.ModeSpec() or perhaps be more explicit by using mode_spec=td.ModeSpec(num_modes=1) (a minimal example is sketched after these cell comments). i think both would be fine, given that these notebooks can't be backward compatible anyways.
  - if domain_field == True: 🤔 (just check the boolean directly)
- cell 16:
  - for i, (monitor, direction) in enumerate(...): i think the i is unused, so the enumerate is not necessary?
- cell 16/17:
- i think there needs to be a paragraph here explaining the design plugin a bit, since one of the main points of the notebook is actually highlighting how easy it is to generate this data with it. i guess we're doing a good job because it's really just a few lines of code in the notebook, so we need to make more of an effort to highlight that
- cells 22 ff.
- there is basically only code in these cells (rest of the notebook). i think there needs to be much more explanation of what's going on in the data cleaning, model setup, etc., otherwise it seems almost impossible for someone new to this to follow along.
- cell 22:
- would be good to just print the final dataframe to see what the training data actually looks like
- cell 248:
  - batch_size = 64 is defined but unused, and the batch size is hard-coded to 16. let's use batch_size = 16 consistently
  - torch.utils.data.random_split: i guess this was trained on pytorch>2? because random_split used to expect integers (that summed to the total no. of samples). i'm not sure if this might lead to unexpected behavior for earlier pytorch versions. maybe best to stick with the old API?
  - collate_fn=lambda x: x: i'm a bit worried about this collate_fn, it's pretty nonstandard (why doesn't it return inputs / labels?). i guess it probably works as expected but it needs some more explanation. it's kind of difficult to follow the shapes of the data; needs more comments and text. i might just not be understanding it correctly but i fear there is something weird going on here with the data.
- cell 254:
- clarify the xi argument and remove the debug comment
- cell 258:
- just a comment, but it should be entirely feasible in this setup to hit the coupling (at least the ratio) exactly, but i guess that's due to the prediction error in the NN?
- cell 259:
- related to the above comment about the data, it does seem like there is something weird going on: the NN apparently fits all the data splits perfectly, yet there is still a significant difference from the simulation data, so something doesn't quite add up. also, the plot should probably be over wavelength
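(For context, a minimal example of the more explicit monitor definition suggested for cell 14 — the geometry and frequency values here are placeholders, not the notebook's:)

    import tidy3d as td

    mode_monitor = td.ModeMonitor(
        center=(0, 0, 0),
        size=(0, 2.0, 2.0),  # monitor plane; placeholder dimensions in um
        freqs=[2e14],        # placeholder frequency
        mode_spec=td.ModeSpec(num_modes=1),  # explicit, though optional as of pre/2.8
        name="mode_top",
    )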
thanks for these comments, really helpful in getting this notebook into good shape! I'll reply to each one here to answer any questions, call out changes I made, and track that I addressed all of them:
- general comment / intro: wrote this much more neutrally (i.e. - without overselling what the neural network is doing) and tried to set up the design problem being solved in the notebook more clearly. I framed it more as we discussed to showcase that it is easy to generate training data for neural networks using tidy3d and that we can in principle use this for further optimization. There was a small bug in the plotting of the fitness function of the Monte Carlo data that I fixed that was making it seem like all the random points had low performance. Actually, there are many near-optimal points just from random sampling (think this isn't necessarily surprising). By using the neural network, the performance does improve marginally but there are also a lot of other ways to get that much improvement especially when starting from a bunch of random samples. Hopefully, this notebook can still show people how to set up neural networks using tidy3d design plugin, but I do agree with you that it is wise to not overplay the NN surrogate in this notebook and instead let it be used as the simplest case of how to set this kind of thing up.
- cell 25: thank you, updated to use pathlib and removed os import
- cell 12: no problem, left a lot of parameters like this unchanged, notebook works fine with smaller sidewall angle (I set it to 9 degrees)
- cell 14: changed ModeSpec to use num_modes=1, updated to add_field_monitor instead of domain_field, and checked the boolean directly
- cell 16: removed the enumeration, correct that i was not being used
- cell 16/17: talked more about the design plugin here to explain how we are using it to conveniently run the Monte Carlo simulation search. At every other point in the notebook where it comes up again (network hyperparameter optimization, final Bayesian optimization) I try to highlight again how we can simply use that plugin for all these different parts of the design problem
- cell 22: I added a lot more text explanation and code comments throughout the rest of the notebook to explain what we were doing at each step. I'm happy to add more too if there are parts that are still unclear! good idea on the dataframe printing, the pandas dataframes do print very nicely like you said!
- cell 248: whoops on the batch size, good catch (updated to 16 and used the actual variable haha).
- cell 248: yes, the current PyTorch API allows you to specify fractions that sum to 1 instead of integer lengths. I changed the code to compute the integer lengths and input those instead for more compatibility. But are we giving users a manifest of the package versions they should be using for optional packages when running the notebooks? I would think we would want to make sure users are using the same python package versions as us.
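(For reference, a minimal sketch of the integer-length split, using a dummy stand-in dataset rather than the notebook's actual data:)

    import torch
    from torch.utils.data import TensorDataset, random_split

    # dummy stand-in for the simulation training data
    dataset = TensorDataset(torch.randn(100, 2), torch.randn(100, 1))

    # older PyTorch expects integer lengths that sum to len(dataset),
    # rather than fractions like [0.8, 0.2]
    n_train = int(0.8 * len(dataset))
    train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])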
- cell 248: good call on the collate_fn. It was coming from how things were returned from the Result getter. I fixed that in the other PR and this all works much more smoothly; the data comes out of the data loader as tensors now, as I would have expected.
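(A hedged sketch of why the default collate works once the Result behaves like a dataset: if each item is an (inputs, labels) pair of tensors, the default collate_fn batches them automatically. The wrapper class here is illustrative, not the actual tidy3d API:)

    import torch
    from torch.utils.data import DataLoader, Dataset

    class ResultDataset(Dataset):
        """Illustrative wrapper: each item is an (inputs, labels) tensor pair."""

        def __init__(self, features, targets):
            self.features = torch.as_tensor(features, dtype=torch.float32)
            self.targets = torch.as_tensor(targets, dtype=torch.float32)

        def __len__(self):
            return len(self.features)

        def __getitem__(self, idx):
            return self.features[idx], self.targets[idx]

    # with (inputs, labels) items the default collate batches into tensors,
    # so no custom collate_fn=lambda x: x is needed
    loader = DataLoader(ResultDataset([[0.1, 0.2]] * 4, [[0.5]] * 4), batch_size=2)
    inputs, labels = next(iter(loader))  # shapes: (2, 2) and (2, 1)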
- cell 254: removed argument comment
- cell 258: this seems to be due to the network being trained on data where the top and bottom waveguide output transmissions sum to slightly less than 1 (off by ~1e-2). From what I can tell, this is the case in the simulation data as well, so the best point the Bayesian optimization has access to is predicted to be slightly less than 0.75 in the bottom waveguide, because that's the pattern the network has learned. In the current version, the final merit function from the Bayesian optimization is ~7e-5, which ends up being mostly from not finding a solution with as close of an output power in the bottom waveguide (getting something like ~0.742 with a goal of 0.75, consistent with the merit since (0.75 - 0.742)^2 ≈ 6.4e-5; I'm printing this out now to compare. The top is almost spot on 0.25 according to both the network and the simulation of the final coupler). The reflection is very low in practice (~1e-6), so I think it's ok that we are ignoring it, and it doesn't account for the full difference. The network actually matches simulation well here, and the simulation also sees a coupler with a total output power that is ~8e-3 off from summing to 1 (some small part of this is reflection, but probably some marginal power is lost elsewhere, either in simulation or somehow in the normalization). I think this is the reason the Bayesian optimizer with the network doesn't find a "perfect" solution, but I do think it's quite close when considering the actual simulation outputs and the data the network was trained on.
- cell 259: changed plot to be over wavelength. To me, everything is looking fairly consistent (but this may be due to me going through and cleaning a few things up). Right now, the MSE for the test set on the network is ~4e-5 and the average MSE in the final plot across all the wavelengths is the same.
I am pushing a version of the more cleaned up notebook now, but it still needs to be run again to get all the cells in the right order. Every time I make changes and check outputs, the order gets scrambled, so I'm going to wait until things are completely done to run final verification runs that get everything in the right order. I'll switch over to working on the metalens notebook next and apply a lot of similar changes and address the below comments.
> but are we giving users a manifest of the package versions they should be using for optional packages when running the notebooks? I would think we would want to make sure users are using the same python package versions as us.
Ah you're right, these notebooks are tied to tidy3d versions and we actually require pytorch 2.1 ... so yeah either this or fractions are both fine. I was just confused by the syntax for a moment
thanks @groberts-flex this is awesome! from my side this is good to go, I think the notebook looks pretty great now. I guess the only thing one could additionally try now is to see how few simulations we could get away with during training, since it seems we're probably still using too many. but for illustration purposes I think this is already good enough, so I'm mostly just asking out of curiosity
MetalensSurrogate.ipynb
general:
- same comment as the other notebook regarding pathlib
- this notebook barely has any explanation of what's going on, even more so than the other one; needs more text
- some of the cells have super verbose output that clutters the notebook, is it possible to avoid this?
- it also needs to be stated more clearly what the design problem and targets are

intro:
- might be because i'm not a native speaker but "a library of metasurface posts by utilizing a random sampling of post dimensions" sounds strange to me. maybe "unit cells"?

cell 5:
- plot not showing? i think we need to use the regular 2d plotting functions for notebooks

cells 8ff.:
- same comments as the other notebook regarding methodology / data cleaning / descriptiveness, although the final loss here looks more reasonable to me

cell 17:
- super verbose output
- lots of warnings about duplicate data points? is this indicative of a problem or fine? if fine, can they be suppressed?
thanks for the comments, think this notebook is getting into better shape as well - running some final verifications and doing some more proofreading:
- added pathlib
- added more explanation of what's going on in markdown and comments
- reduced a lot of the clutter (removed printing in hyperparameter search, changed Bayesian optimization part (see below))
- added some more clarity on what the network is optimized for and what the loss function is computing
- intro: good point, rewrote the intro in general to try and make more clear/clean up wording
- cell 5: changed to a 2d plot
- cells 8ff: adopted a lot of the same things as directional coupler notebook to make things run more smoothly and described more what was going on with the data cleaning (also printed the data frame)
- cell 17: ended up changing the post selection. I think it's actually overkill to use Bayesian optimization on every post when really we are just sweeping the radius (and it takes forever to do even using the surrogate solver). So I did this much more simply and just picked based on a lookup table method (since we already compute at some point the predicted amplitude and phase for a sweep of radii in the allowed range).
will post the notebook with these changes after I get through the verification run
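(For illustration, a minimal sketch of this kind of lookup-table selection — the radius range and phase model are made-up placeholders, not the notebook's surrogate predictions:)

    import numpy as np

    radii = np.linspace(0.05, 0.25, 101)   # swept post radii, placeholder range in um
    predicted_phase = 4.0 * np.pi * radii  # stand-in for surrogate phase predictions

    def radius_for_phase(target_phase):
        """Pick the swept radius whose predicted phase best matches the target."""
        # compare phases on the unit circle so that 0 and 2*pi are treated as equal
        err = np.abs(np.angle(np.exp(1j * (predicted_phase - target_phase))))
        return radii[np.argmin(err)]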
thanks this is great! only two minor comments, otherwise good to go from my side
intro:
- "please refer the the Metalens and MidIRMetalens notebooks": can you include a link there?

cell 4:
- t = np.linspace(0, 2 * np.pi, num_points + 1)[:-1]  # avoid duplicate points at 0=2pi
  you can do t = np.linspace(0, 2 * np.pi, num_points, endpoint=False) instead
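(A quick check, not from the notebook, that the two forms produce the same points:)

    import numpy as np

    num_points = 8
    a = np.linspace(0, 2 * np.pi, num_points + 1)[:-1]         # drop duplicate endpoint
    b = np.linspace(0, 2 * np.pi, num_points, endpoint=False)  # same points, no slicing
    assert np.allclose(a, b)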
thanks, added the links and updated the numpy linspace call (like that argument, did not know about it!)
Force-pushed from 09b7590 to 82d31cf
These are both pretty cleaned up now and I have run them all the way through to get all the cells in order. I ended up taking the directional coupler back to 100 MC simulations because the neural network accuracy near the optimal point is sometimes not very good with the lower simulation count. I suspect the first time I ran it, there was a lucky randomization that caused the final solution to be well predicted by the surrogate, possibly because it was close to one of the training points. It seems like we shouldn't need that much data to train this, but for now I am going to keep the count higher so that the notebook works consistently. Rebased on pre/2.8 and ran the formatter on both these notebooks.
Force-pushed from 82d31cf to 1298aa8
Thanks @groberts-flex! I had a few comments, grouped under Both Notebooks, Coupler, and Metalens.
Thanks @tylerflex for the comments, I appreciate you taking the time to go through these notebooks! I'll work on making the changes, but first I wanted to give some more context/thoughts in case it changes the direction we want to go with this PR.

I inherited both these notebooks to try to clean up and get into good states as examples. I've done some simplification and made modifications so they use the design plugin and PyTorch functions more than they were. Some parts of the design process I've removed, but other parts I kept because I wasn't sure if they were meant to be there for some reason. For example, I kept the Bayesian optimization in the directional coupler, but it does seem like overkill, and I agree with you that additional Monte Carlo sampling would quickly find as good of a design. I could also modify this part to run gradient descent on the two coupler parameters by backpropagating through the PyTorch network until we find a good design (that might converge really quickly); a sketch of that idea is below.

The general problem with both of these (as you've noticed) is that the neural network is not a very critical part of the design process. In the directional coupler, we do see an improvement from pure random sampling, but the benefit is pretty marginal. In the metalens surrogate notebook, it's almost impossible for me to justify the use of the NN surrogate: 100 MC simulations are fed into the training, and then we ultimately just need to interpolate post radius based on a 1D lookup table since we use cylindrical posts in the end. In fact, in the published MidIRMetalens notebook, this table is built by just running 20 simulations. The resulting focal plane fields in that notebook look roughly the same as when using the NN surrogate (both are based on the same center wavelength, although I did swap in a lower-index substrate in the NN surrogate example rather than use silicon posts on silicon), so it's not adding any benefit and in fact uses more simulations. The network learns a mapping for elliptical posts instead of cylindrical, but in the end we don't need the elliptical library for the actual example. It also learns a somewhat odd mapping because it ignores the birefringent nature of an elliptical post and just records the p-pol to p-pol transfer function (this makes certain elliptical posts that cause polarization rotation appear as if they have lower transmission).

With that extra context, is the metalens notebook still of interest? If so, I'll definitely add some more discussion in there to clarify the technique and the comparison to other techniques, and address the other comments.

As far as the hyperparameter optimization goes, in the directional coupler, for example, I see validation losses that vary by two orders of magnitude across different hyperparameter selections, so it does seem to add some value. This was a feature of the notebook that was there when I received it, and the main change I made was to have it leverage the design plugin to show another example of how that can be useful. I think it might be useful for someone reading the notebook to see, as opposed to hard-coding a set of parameters.

Sorry for a really long reply here! I'm also happy to jump on a call if that's easier!
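(A minimal sketch of that gradient-descent idea, using a stand-in network, made-up normalized parameter bounds, and the 0.25/0.75 coupling target from the discussion above — illustrative only, not the notebook's code:)

    import torch

    # stand-in for a trained surrogate mapping 2 design params -> (top, bottom) coupling
    model = torch.nn.Sequential(
        torch.nn.Linear(2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
    )
    model.eval()

    params = torch.tensor([0.5, 0.5], requires_grad=True)  # normalized design parameters
    target = torch.tensor([0.25, 0.75])                    # desired top/bottom coupling
    opt = torch.optim.Adam([params], lr=1e-2)

    for _ in range(200):
        opt.zero_grad()
        loss = torch.mean((model(params) - target) ** 2)  # MSE against the coupling target
        loss.backward()                                   # gradients flow through the NN
        opt.step()
        with torch.no_grad():
            params.clamp_(0.0, 1.0)                       # keep within the design bounds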
@groberts-flex totally agree with everything here. guess we need to clarify what the intention of the notebooks is, and that would guide some of these decisions. I'm ok with publishing mostly as is and then adding more of my comments after we discuss more with yannick, perhaps?
Force-pushed from ee41014 to de37325
Thanks @groberts-flex for cleaning up the notebooks. I only finished the DC one, but most comments are general, so they might be applicable to the other notebook as well. I'll take a look at the other one shortly. There is a long thread already and I didn't read it all closely, so my comments might be redundant with others' - feel free to take them accordingly.
Based on some more discussion with @yaugenst-flex this morning, I removed the metalens surrogate from this PR. The work is saved on another branch (groberts-flex/metalens_surrogate), but for now it's a little unclear whether it is a good example of using a surrogate model. I'll keep thinking on that one to see if there are modifications that would make the notebook a more convincing example. Also happy to re-evaluate including it if there is interest in having it in there as is!
@tomflexcompute - thanks for your comments and for going through the notebooks. As of now, the plan is to just put the directional coupler one in there, so no need to also review the metalens one at this time! Working now on the changes you suggested!
Force-pushed from de37325 to 3820cef
Updated the DC notebook and added it to the …
Adding two surrogate notebooks that have minimal dependency on changes in the design plugin. There is one small change needed so that PyTorch can use the design plugin Result object directly as a PyTorch Dataset, by implementing simple accessor and length functions (flexcompute/tidy3d#2189).
The directional coupler example is working well through a few tests.
However, right now, I am marking this as a draft because I'm a little unsure if we want to change how the metalens surrogate works. I inherited most of this code, and it is currently based on elliptical posts and optimizes a phase profile for just an x-polarized input source. The elliptical library is more interesting as a more expansive metasurface platform with independent control over orthogonal input polarizations, but for a lens, a symmetric cylindrical post is typically used since this yields a polarization-independent output. I'm wondering if the current elliptical post usage is also what is leading to a non-circular focal shape. I think this could be handled in the example in a few ways (in order from least to most changes required from the original notebook). I also wanted to check whether there is a reason for them to be elliptical that I'm missing!
For example, the simplest option is to keep the elliptical post parameterization but make the posts circular (major_radius and minor_radius are set equal).