Proposal: Add shock tube species profiles as an experiment type #60

Open
bryanwweber opened this issue Jun 20, 2017 · 9 comments

Comments

@bryanwweber
Member

bryanwweber commented Jun 20, 2017

PyKED/ChemKED version

v0.1.5

Code sample

...
experiment-type: Species profile
apparatus:
    kind: shock tube
    institution: Stanford University
    facility: stainless steel shock tube
    inner-diameter: &diam
        value: 15.2
        units: cm
common-properties:
    pressure: &pres
        - 1.672 atm
        - uncertainty-type: relative
          uncertainty: 0.02
    temperature: &temp
        - 1182 K
        - uncertainty-type: relative
          uncertainty: 0.01
    composition: &comp
      kind: mole-fraction
      species:
        - species-name: H2O2
          amount:
            - 0.002046 #can one specify it in ppm by giving ppm as units?
            - uncertainty-type: relative
              uncertainty: 0.05
        - species-name: H2O
          amount:
            - 0.001113
            - uncertainty-type: relative
              uncertainty: 0.05
        - species-name: O2
          amount:
            - 0.000556
            - uncertainty-type: relative
              uncertainty: 0.05
        - species-name: Ar
          amount:
            - 0.996285 #can balance be specified here instead of a value?
            - uncertainty-type: relative
              uncertainty: 0.05
    time-shift: &timeshift
      - 0 s
      - uncertainty-type: absolute
        uncertainty: 0.01 ms
    assumptions: &assumptions
        thermal-boundary: adiabatic #vs. isothermal
        mechanical-boundary: constant pressure #vs. constant volume
        equation-of-state: ideal gas
datapoints: #is it necessary to include the temperature: *temp stuff for each item even though it's common?
    - csvfile: 'hong_pci2013_oh.csv'
      targets:
        - name: OH
          type: mole fraction # vs. concentration or absorbance
          species: OH
          absolute-uncertainty: 1.0E-6
          relative-uncertainty: .05
    - csvfile: 'hong_pci2013_h2o.csv'
      targets:
        - name: H2O
          type: mole fraction
          species: H2O
          absolute-uncertainty: 1.0E-6
          relative-uncertainty: .05
    - csvfile: 'hong_pci2013_abs227nm.csv'
      targets:
        - name: 'abs_227nm'
          type: absorbance
          wavelength:
            value: 227
            units: nm
          absorbing_species:
            - species: H2O2
              cross-section:
                value: 1.40E+05
                units: 'cm^2 mol^-1'
                relative-uncertainty: .1
            - species: HO2
              cross-section:
                value: 1.27E+06
                units: 'cm^2 mol^-1'
                relative-uncertainty: .1
          path_length: *diam
          absolute-uncertainty: .01

Proposed by Mike Burke's group at Columbia. Mildly edited by Bryan for clarity.
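The Ar entry in the proposal (0.996285) is exactly 1 minus the other three mole fractions, which suggests one answer to the "balance" question in the file: let a converter script compute it. A minimal sketch, assuming a hypothetical `fill_balance` helper (not part of PyKED) and mole fractions passed as strings so that `Decimal` keeps them exact; the H2O2 amount is written in E-06 notation rather than ppm, as Bryan suggests later in the thread.

```python
from decimal import Decimal


def fill_balance(fractions, balance_species):
    """Return a dict with `balance_species` set to 1 minus the sum of the others.

    Hypothetical helper, not a PyKED function. `fractions` maps species names
    to mole fractions given as strings, e.g. "2046E-06" for 2046 ppm.
    """
    known = {name: Decimal(value) for name, value in fractions.items()}
    remainder = Decimal(1) - sum(known.values())
    if remainder < 0:
        raise ValueError("specified mole fractions sum to more than 1")
    known[balance_species] = remainder
    return known


# Values from the proposal, with H2O2 written as 2046E-06 instead of 2046 ppm
mixture = fill_balance(
    {"H2O2": "2046E-06", "H2O": "0.001113", "O2": "0.000556"},
    balance_species="Ar",
)
print(mixture["Ar"])  # -> 0.996285
```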

@kyleniemeyer
Member

To start, we don't need the quotes (' ') around strings, like in the csvfile and units fields.

@bryanwweber
Member Author

bryanwweber commented Jun 21, 2017

This is excellent! A few thoughts:

  1. ppm is not really necessary as a unit; just use E-06
  2. The time-shift field looks fine, but it should be specified in each datapoint
  3. I'm not sure about the assumptions field. How often are each of the options used? What is the utility of providing this?
  4. Yes, it is necessary to explicitly include each required option in a data point. Therefore, each data point needs a temperature, composition, and pressure. See issue #59, "Should common-properties be automatically filled into a datapoint element if unspecified?"
  5. Rather than having a csv file with the data, it would IMO be better to have 3 YAML files, one for each data set. That will reduce the problems YAML has dealing with tables. You can check out the RCM example for a case where a two-column table is used, https://github.com/pr-omethe-us/PyKED/blob/master/pyked/tests/testfile_rcm.yaml With that file in mind, I'd propose the following schema for a single target:
datapoints:
  - temperature: *temp
    composition: *comp
    pressure: *pres
    time-shift: *timeshift
    species-history:
      time:
        units: s
        column: 0
      target:
        name: OH  # This is more or less arbitrary, for user convenience
        type: 
          - mole fraction  # or concentration or absorbance, need a field for units of concentration
          - uncertainty-type: relative
            uncertainty: 0.1
        column: 1
      values:
        - ...
  6. Absolute uncertainty and relative uncertainty are mutually exclusive, I think...
  7. For absorbance, maybe something like this:
datapoints:
  - temperature: *temp
    composition: *comp
    pressure: *pres
    time-shift: *timeshift
    species-history:
      time:
        units: s
        column: 0
      target:
        name: abs_227nm  # This is more or less arbitrary, for user convenience
        type:
          - absorbance
          - wavelength:
              - 227 nm
              - uncertainty-type: relative
                uncertainty: 0.1
          - path-length:
              - 0.1 m
              - uncertainty-type: relative
                uncertainty: 0.1
          - absorbing-species:
              - species-name: H2O2
                cross-section:
                  - 0.1E5 cm2/mol
                  - uncertainty-type: relative
                    uncertainty: 0.1
              - species-name: OH
                cross-section:
                  - 0.2E5 cm2/mol
                  - uncertainty-type: relative
                    uncertainty: 0.1
        column: 1
      values:
        - ...
  8. I don't think the diameter should go in the apparatus field. We're trying to come up with an alternate file format for apparatuses, something that would store the details of each device and give it a unique identifier.
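As a rough illustration of how the `column` keys in the proposed `species-history` schema could be used, here is a sketch that pulls the time and target series out of a two-column table by index. The dict mirrors the YAML proposed above; `read_species_history`, the comment-line convention, and the CSV layout are hypothetical, not PyKED's actual loader.

```python
import csv
import io


def read_species_history(spec, csv_text):
    """Return (times, values) using the zero-based column indices in `spec`.

    Hypothetical helper: spec["time"]["column"] and spec["target"]["column"]
    follow the proposed species-history block.
    """
    time_col = spec["time"]["column"]
    target_col = spec["target"]["column"]
    times, values = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and "#" header/comment lines
        times.append(float(row[time_col]))
        values.append(float(row[target_col]))
    return times, values


spec = {
    "time": {"units": "s", "column": 0},
    "target": {"name": "OH", "type": "mole fraction", "column": 1},
}
# Made-up numbers, for illustration only
table = "# time [s], OH mole fraction\n0.0, 0.0\n1.0e-5, 2.5e-6\n2.0e-5, 4.1e-6\n"
times, oh = read_species_history(spec, table)
```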

OK, lots to think about here! I'm not sure if this is the best way to handle the type of the species profile. Suggestions appreciated!

@kyleniemeyer
Member

I agree that each file should only contain a single dataset or series—it's ok if there are multiple ChemKED files from a single paper, for example.

@kyleniemeyer
Member

Regarding assumptions, I am also wary of putting something specific to a way of modeling (at least I assume that's what that is for) in a file that is meant to encode the measurement itself.

That said, if this sort of information is similar to (e.g.) pre-ignition pressure rise in a shock tube, or compression volume trace or post-compression heat loss in an RCM, then I think it should be included. However, I don't think that is the case.

@bryanwweber
Member Author

Email from Mike Burke to Bryan:

Hi Bryan,

Thanks for your comments. Some followups below to continue the dialogue...

1 Good point, agreed.

2,4 It might be worth pointing out there's one key aspect of this type of multi-species experiment that I suspect would lead to differences in how one would treat it compared to a series of ignition delay times -- each of the species time profiles is measured in a single experiment rather than in multiple experiments. As a result, the time-shift (2), as well as the initial pressure, temperature, and composition (4), should be identical -- such that I would see value in listing them among common-properties only and not in the datapoints section because they would be redundant and because it would avoid someone making a mistake by entering different initial values for different datapoints.

3 Yeah, I'm not sure I even like the 'assumptions' field so much anyway -- I have the same philosophical objections to the assumptions block within our proposed shock tube file. The tricky thing with this Hong experiment that we're emulating in this YAML file is that there is a driver insert in the shock tube, and, based on their pressure traces, the authors suggest that the experiment is nearly constant pressure rather than following the more conventional constant volume assumption used for shock tubes. So in that regard, it is similar to the pressure trace that might in other experiments be used to model boundary layer effects or heat loss, or in the present case lack thereof.

5 Similar to 2,4, all of the species time profiles correspond to a single experiment not just a single paper, and have all the same common properties like initial temperature, pressure, composition, etc. In that case, does it still make sense to have 3 different YAML files to represent it? My students initially suggested csv files such that tables with titles for each column could be viewed very easily and read in similarly easily using pandas dataframes. In many species profile experiments, there could be tables consisting of many dozens of rows (i.e. species) rather than just a few. Another advantage of this format would be that it is a very common way that people right now represent their data in supplemental material for papers. Thoughts?

6 I would see value in having both relative and absolute uncertainties for species time profiles for example, where one could account for "calibration" type uncertainties (which perhaps might be relative) and one could account for signal-to-noise/detection limit type uncertainties (which perhaps might be absolute). At different points along the time profile, the relative uncertainties may take on different values on an absolute basis.

8 Ah, I see. I could see some value in doing that. I suppose you're planning on having an apparatus YAML format which describes an apparatus that could be identified by name or tag only within the experiment YAML file. I should mention that the shock tube diameter is a necessary component to predicting the absorption signal through a Beer's Law analysis.
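To make the Beer's Law point concrete, a sketch of predicting the 227 nm absorbance from the initial mixture. It assumes the base-e form A = -ln(I/I0) = L * sum_i(sigma_i * C_i) with C_i = X_i P / (R T), cross-sections in cm^2/mol, and the 15.2 cm diameter as a single-pass path length; whether ChemKED would define absorbance in base e or base 10 is left open in the thread, and `absorbance` is a hypothetical helper.

```python
R_CM3_ATM = 82.057366  # gas constant in cm^3 atm / (mol K)


def absorbance(mole_fractions, cross_sections, pressure_atm, temperature_k,
               path_length_cm, passes=1):
    """Base-e absorbance A = L_eff * sum_i(sigma_i * X_i * P / (R T)).

    Hypothetical helper. cross_sections maps absorbing species to cm^2/mol;
    passes multiplies the path length (e.g. for multiple reflections).
    """
    total_conc = pressure_atm / (R_CM3_ATM * temperature_k)  # mol/cm^3
    path = path_length_cm * passes
    return path * sum(
        sigma * mole_fractions.get(species, 0.0) * total_conc
        for species, sigma in cross_sections.items()
    )


# Initial conditions and cross-sections from the proposal; HO2 is absent at t = 0
a0 = absorbance(
    {"H2O2": 0.002046, "H2O": 0.001113, "O2": 0.000556, "Ar": 0.996285},
    {"H2O2": 1.40e5, "HO2": 1.27e6},
    pressure_atm=1.672,
    temperature_k=1182.0,
    path_length_cm=15.2,
)
```

The `passes` argument is one way to express the "path length vs. diameter" distinction raised later in the thread.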

@bryanwweber
Member Author

bryanwweber commented Jun 22, 2017

Mike,

Thanks for the quick response! Some further clarifications:

such that I would see value in listing them among common-properties only and not in the datapoints section because they would be redundant and because it would avoid someone making a mistake by entering different initial values for different datapoints.

Values listed in the common-properties section still have to be entered into every data point, although they can be added by reference. For instance, the following is the preferred method:

common-properties:
  temperature: &temp
    - value
  pressure: &pres
    - value
datapoints:
  - temperature: *temp
    pressure: *pres
    ...
  - temperature: *temp
    pressure: *pres
    ...

This is an intentional design decision on our part. We felt there were two likely scenarios:

  1. Properties in the common-properties block get implicitly autofilled for missing properties in each data point
  2. Properties in the common-properties block are explicitly referenced in each data point

In general, our philosophy is "explicit is better than implicit"; therefore, we chose the second scenario. We feel that the danger of someone forgetting to add a required value and having it implicitly filled from the common-properties with the wrong value is greater than the work required to add the property to each datapoint. Note, though, that even in the second scenario, the user is not required to type the value more than once, so we gain the advantages of reducing the risk of typos, while also forcing the user to think about what they're doing.
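A minimal sketch of the "explicit" policy, assuming the datapoints have already been loaded into Python dicts (once YAML aliases such as `*temp` are resolved, each datapoint simply contains the shared value). The `check_datapoints` helper and the `REQUIRED` set are hypothetical illustrations, not PyKED's validator: a missing property is reported rather than filled in from common-properties.

```python
# Hypothetical set of required properties, for illustration only
REQUIRED = ("temperature", "pressure", "composition")


def check_datapoints(datapoints, required=REQUIRED):
    """Return a list of error strings; never autofill missing values."""
    errors = []
    for index, point in enumerate(datapoints):
        missing = [key for key in required if key not in point]
        if missing:
            errors.append(f"datapoint {index}: missing {', '.join(missing)}")
    return errors


# Shared values, as an alias like *temp would provide after loading
temp = ["1182 K", {"uncertainty-type": "relative", "uncertainty": 0.01}]
pres = ["1.672 atm", {"uncertainty-type": "relative", "uncertainty": 0.02}]
comp = {"kind": "mole-fraction", "species": []}

points = [
    {"temperature": temp, "pressure": pres, "composition": comp},
    {"temperature": temp, "composition": comp},  # pressure forgotten
]
print(check_datapoints(points))  # -> ['datapoint 1: missing pressure']
```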

Yeah, I'm not sure I even like the 'assumptions' field so much anyway

I can see one important use case, which is specifying constant pressure vs. constant volume, as you mention. There are also some butanol experiments that use what Ron's group calls CRV (constrained reactor volume, I think) that they model as constant p-h rather than the typical constant u-v. Maybe this could be a part of the apparatus block? constant volume shock tube vs. crv shock tube vs. rcm or something?
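If the assumptions (or an apparatus variant) were kept, a reader could map them to the reactor model they imply: an adiabatic, constant-volume reactor holds internal energy and volume fixed (constant u-v), while an adiabatic, constant-pressure reactor holds enthalpy and pressure fixed (constant p-h), which is how the CRV experiments are described. A sketch using the key names and values from the proposal's `assumptions` block; `reactor_model` and `ALLOWED` are hypothetical, not part of ChemKED.

```python
# Allowed values taken from the proposal's assumptions block (hypothetical schema)
ALLOWED = {
    "thermal-boundary": {"adiabatic", "isothermal"},
    "mechanical-boundary": {"constant pressure", "constant volume"},
}


def reactor_model(assumptions):
    """Return the pair of properties held fixed, e.g. ("u", "v")."""
    for key, allowed in ALLOWED.items():
        value = assumptions.get(key)
        if value not in allowed:
            raise ValueError(f"{key}: expected one of {sorted(allowed)}, got {value!r}")
    constant_p = assumptions["mechanical-boundary"] == "constant pressure"
    if assumptions["thermal-boundary"] == "isothermal":
        # Temperature is fixed instead of an energy-like quantity
        return ("T", "p") if constant_p else ("T", "v")
    return ("p", "h") if constant_p else ("u", "v")


proposal_assumptions = {
    "thermal-boundary": "adiabatic",
    "mechanical-boundary": "constant pressure",
    "equation-of-state": "ideal gas",
}
print(reactor_model(proposal_assumptions))  # -> ('p', 'h')
```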

FYI, we do have a keyword for the pressure rise that's sometimes specified in the constant volume shock tube experiments.

In that case, does it still make sense to have 3 different YAML files to represent it?

We think the answer to this question is yes. Obviously there is some concern for typos, since there are a bunch of shared properties for the single experiment. I think the solution to this is to specify the common properties once and use a script to write multiple files. We're already sort of planning to have converter scripts from internal data formats to the standard format, so this would fit into that pretty neatly.
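A rough sketch of that converter idea: shared properties are typed once in the script and copied explicitly into one document per measured target. Because JSON is, for practical purposes, valid YAML, `json.dumps` is used here to avoid a third-party dependency; a real converter would more likely emit YAML with anchors. The function name and file-naming pattern are hypothetical, and composition is omitted for brevity.

```python
import json


def split_experiment(common, targets, stem):
    """Return {filename: document text}, one document per measured target.

    Hypothetical helper: every datapoint gets an explicit copy of the shared
    properties, so values are typed once but still appear in each file.
    """
    files = {}
    for history in targets:
        datapoint = dict(common)
        datapoint["species-history"] = history
        name = history["target"]["name"].lower()
        files[f"{stem}_{name}.yaml"] = json.dumps({"datapoints": [datapoint]}, indent=2)
    return files


common = {
    "temperature": ["1182 K", {"uncertainty-type": "relative", "uncertainty": 0.01}],
    "pressure": ["1.672 atm", {"uncertainty-type": "relative", "uncertainty": 0.02}],
    "time-shift": ["0 s", {"uncertainty-type": "absolute", "uncertainty": "0.01 ms"}],
}
targets = [
    {"time": {"units": "s", "column": 0}, "target": {"name": "OH", "column": 1}},
    {"time": {"units": "s", "column": 0}, "target": {"name": "H2O", "column": 1}},
]
files = split_experiment(common, targets, "hong_pci2013")
for filename in files:
    print(filename)  # each text could then be written with open(filename, "w")
```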

Another option that we've (I've) considered is putting a binary file encoded as a string into the data file, something like

data: !!binary base64-string

See here also: http://yaml.org/type/binary.html This is something that Kyle and I haven't come to consensus on yet though. This would obviously not be human-readable, but a large table is barely human-readable anyways. If we went this route, we would prefer a standard data container format such as HDF5.
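For a sense of what the `!!binary` option would look like, a sketch that packs a small two-column table of doubles with `struct`, base64-encodes it, and decodes it again. The packing layout (little-endian float64, row-major, two columns) is an assumption made for this example; a container such as HDF5 would instead carry its own layout metadata.

```python
import base64
import struct


def encode_table(rows):
    """Pack (time, value) rows as little-endian float64 and base64-encode them."""
    flat = [x for row in rows for x in row]
    raw = struct.pack(f"<{len(flat)}d", *flat)
    return base64.b64encode(raw).decode("ascii")


def decode_table(text, ncols=2):
    """Inverse of encode_table: base64 string -> list of row tuples."""
    raw = base64.b64decode(text)
    flat = struct.unpack(f"<{len(raw) // 8}d", raw)
    return [tuple(flat[i:i + ncols]) for i in range(0, len(flat), ncols)]


rows = [(0.0, 0.0), (1.0e-5, 2.5e-6)]  # made-up values
encoded = encode_table(rows)
line = f"data: !!binary {encoded}"
```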

At different points along the time profile, the relative uncertainties may take on different values on an absolute basis.

If we can define how the relative uncertainty and absolute uncertainty combine, then that would be fine, I suppose, although I think my preference would be for the person writing the YAML file to just write out the absolute uncertainty for a single point in time.
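One possible combination rule, offered only as an assumption since the thread leaves it open: treat the relative (calibration) and absolute (detection-limit) parts as independent and add them in quadrature, sigma(x) = sqrt((r x)^2 + a^2). The sketch applies it with the OH values from the proposal (relative 0.05, absolute 1.0E-6); taking the larger of the two terms would be another defensible choice.

```python
import math


def combined_uncertainty(value, relative, absolute):
    """Root-sum-square of a relative and an absolute uncertainty (assumed independent)."""
    return math.hypot(relative * value, absolute)


# OH mole fraction at a few hypothetical points along a profile
for x_oh in (1.0e-6, 1.0e-5, 1.0e-4):
    print(f"{x_oh:.1e}: +/- {combined_uncertainty(x_oh, 0.05, 1.0e-6):.2e}")
```

Near the detection limit the absolute term dominates, and at larger signals the relative term does, which matches the behavior Mike describes.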

I should mention that the shock tube diameter is a necessary component to predicting the absorption signal through a Beer's Law analysis.

Yes, I think the diameter should be a required property if the absorbance is specified, but the diameter (really, the path length, right, since there can be multiple reflections?) should be specified in each data point, or in the common-properties section and then referenced in each data point.

@tsikes

tsikes commented Jan 29, 2020

I come at this from the perspective of writing a program that will eventually need to import data from a broad range of experiments. I like to break things down into experimental apparatus and observables, with the computational pair being reactor/observables. With that in mind, I think it would be useful to keep the assumptions in the file because they make explicit the assumptions the experimentalists are making, which would need to be taken into account when modeling the experiments. This is also more versatile between experiment types, as opposed to having to figure out what the assumptions going into a CRV shock tube are. This doesn't necessarily have to be about how to set up the reactor, though. An experimentalist could use this information to replicate and/or interpret the results. It's useful for multiple reasons. If "assumptions" is disliked, perhaps rename it to something like "experimental details"?

Another issue I see is that there are multiple ways of inputting vector data in PyKED. I think that inherently this data is not human readable and the vector data format should be consolidated. I am partial to a standard format such as HDF5. This is already a compromise for people who need to convert their files from whatever they're using for acquisition or from text formats. I dislike the idea of having multiple YAML files because it makes it more difficult to share data and it will be cluttered. This also scales poorly with more measurements. I think that uncertainties could be added as additional columns within the data file and if they are not specified then the common-property would fill in.

Related to the vector data issue: How would this format handle measurements with different scales and/or independent variables? Perhaps include a keyword to link Y to a given X or column?
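A sketch combining both suggestions, under assumed key names (`x-column`, `column`, and `uncertainty-column` are hypothetical, not current ChemKED keywords): each observable names the column it is measured against, and an optional per-row uncertainty column overrides a common value.

```python
def extract_series(table, observable, common_uncertainty):
    """Return (x, y, sigma) lists for one observable from a list of row tuples.

    Hypothetical keys: observable["x-column"] and observable["column"] are
    zero-based indices; observable["uncertainty-column"] is optional.
    """
    x_col = observable["x-column"]
    y_col = observable["column"]
    u_col = observable.get("uncertainty-column")
    xs, ys, sigmas = [], [], []
    for row in table:
        xs.append(row[x_col])
        ys.append(row[y_col])
        # Per-row uncertainty if a column is named, else the common value
        sigmas.append(row[u_col] if u_col is not None else common_uncertainty)
    return xs, ys, sigmas


# Made-up rows. Columns: 0 = time [s], 1 = OH mole fraction,
# 2 = OH absolute uncertainty, 3 = pressure [atm]
table = [
    (0.0, 0.0, 1.0e-6, 1.672),
    (1.0e-5, 2.5e-6, 1.1e-6, 1.675),
]
oh = extract_series(table, {"x-column": 0, "column": 1, "uncertainty-column": 2}, 5.0e-6)
pressure = extract_series(table, {"x-column": 0, "column": 3}, 0.03)
```

A measurement on a different time base would simply point `x-column` at its own time column.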

I see the value in including the initial reaction temperature and pressure, but I also think that it is important that the initial (pre-shock) experimental conditions be added. First, this allows for recalculation of post-shock conditions if the thermochemistry is altered. Second, it's more representative of the uncertainties of the actual measurements, as opposed to the ballpark estimates often put on T5 and P5. Finally, these properties are important for a number of other experiment types, so they could be a parent-class property.
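As an illustration of recomputing post-shock conditions from the initial ones, a sketch of the textbook ideal-shock relations for a calorically perfect gas (constant specific-heat ratio gamma, no dissociation or boundary-layer effects). Real reduction codes use temperature-dependent thermochemistry, which is exactly why storing P1, T1, and the shock speed lets T5 and P5 be recomputed later; `ideal_shock` is a hypothetical helper, not part of PyKED.

```python
def ideal_shock(p1, t1, mach, gamma):
    """Return (p2, t2, p5, t5) for the incident and reflected shocks, constant gamma.

    Hypothetical helper. p1 and t1 are the initial (driven-section) pressure and
    temperature; mach is the incident-shock Mach number. Units of p and t carry over.
    """
    g, m2 = gamma, mach ** 2
    # Incident shock (state 2)
    p2 = p1 * (2 * g * m2 - (g - 1)) / (g + 1)
    t2 = t1 * (2 * g * m2 - (g - 1)) * ((g - 1) * m2 + 2) / ((g + 1) ** 2 * m2)
    # Reflected shock (state 5), written in terms of the incident Mach number
    p5 = p1 * ((2 * g * m2 - (g - 1)) / (g + 1)) * (
        ((3 * g - 1) * m2 - 2 * (g - 1)) / ((g - 1) * m2 + 2)
    )
    t5 = t1 * (2 * (g - 1) * m2 + (3 - g)) * ((3 * g - 1) * m2 - 2 * (g - 1)) / (
        (g + 1) ** 2 * m2
    )
    return p2, t2, p5, t5


# Illustrative inputs only: argon-like gamma, P1 in atm, T1 in K
p2, t2, p5, t5 = ideal_shock(p1=0.1, t1=295.0, mach=2.0, gamma=5 / 3)
```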

Currently, I need a shock tube file format that can differentiate between incident shock and reflected shock experiments, as well as save arbitrary vector observables from such experiments. My program is compatible with incident shock reactors and 0D reactors. For observables, I can take temperature, pressure, density gradient, and various ways of writing out the species quantities (conc., mol frac, etc.). However, looking forward, I see the possibility of expanding that list to other reactor types and observables. I think that we could work together to merge my needs into ChemKED/PyKED and make progress on a standard file format for the community.

@bryanwweber
Member Author

Hi @tsikes! Thanks for your input. Certainly valuable to get notes from people who actually want to use this thing! I just wanted to follow up on one comment, which is that we do allow specifying filenames of CSV files for time-history input, which was added in #104 although it is apparently a little broken, see #115. We do (if I recall correctly) allow users to specify which column contains which data.

Unfortunately, as you can probably see from the commit history, this project is on a bit of a hiatus at the moment. We don't have any resources to push it forward, so if you're interested in taking that on, we'd love to have the help.

@mefuller
Collaborator

Like @tsikes, I am also interested in developing a format for laser-schlieren data, but I think keeping the required fields to a minimum is best, with the additional experimental descriptions being optional.

Maybe I'm missing something, but I think for the reflected shock experiment types, simulations should be performed by specifying the post-shock conditions (T5, P5) that are present at time zero and then the time history of the observable, either a species profile as in laser absorption or H-ARAS.
For an observed density gradient in laser schlieren, we need the Mach number or shock speed (since we can easily calculate the Mach number), so I would propose specifying the initial temperature and pressure (P1, T1) and the observed shock speed and the calculated density gradient from the facility - attached is a sample of what I think a laser-schlieren ChemKED file could look like.
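The Mach number mentioned here follows from the measured shock speed and the initial state, M = u_s / a1 with a1 = sqrt(gamma R T1 / W) for an ideal gas. A sketch assuming a known specific-heat ratio and molar mass for the test gas; `mach_number` is a hypothetical helper and the argon numbers in the usage line are only illustrative.

```python
import math

R_UNIVERSAL = 8.314462618  # J/(mol K)


def mach_number(shock_speed, t1, molar_mass, gamma):
    """Incident-shock Mach number from shock speed [m/s], T1 [K], molar mass W [kg/mol].

    Hypothetical helper; assumes an ideal gas with a constant gamma at T1.
    """
    sound_speed = math.sqrt(gamma * R_UNIVERSAL * t1 / molar_mass)
    return shock_speed / sound_speed


# Illustrative only: pure argon at 295 K with a 640 m/s incident shock
m = mach_number(640.0, 295.0, 0.039948, 5 / 3)
```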

I know this means the thermo data, gas refractivities, etc. as used in the calculation of the density gradient are baked-in and not necessarily determinable from the data format, but I think this is something that would be added as optional fields versus trying to provide the raw signals and process those in PyKED. I do also think that alternatively specifying the incident shock conditions (T2, P2) might be more convenient for someone just looking at the data, but that processing the data by doing the ideal shock calculations first is "better".
LS_iBN_017.yaml.txt


4 participants