JTS - Method for describing units for a field #35
Comments
+1 for some sort of units support. It would need to handle scientific/physical units as well. To be compatible with @rgrp's proposal above, the unit hash for an acceleration could be something like this:
This is based on the unit strings generated by https://github.com/spatchcock/quantify. This would be very handy for automatic inspection, presentation and analysis of data.
Perhaps
@Floppy sounds good. Would we have a good enumeration of values? I guess @spatchcock's quantify work is the best source.
A work in progress of mine (with @Floppy) is https://github.com/spatchcock/calcJSON where we extend JTS and add a "unit" attribute. I am not sure it needs to be any more complex than a single string in the vast majority of cases, although currency units are perhaps exceptions to the rule given that their values vary through time (as opposed to, say, the 'metre') and therefore something like a time-stamp is necessary. Here are my thoughts regardless:
Dimensions
Dimensions are part of any unit definition, so it is entirely sufficient to specify the unit; the dimension is implied. I would argue that it is the role of the user (probably via client libraries) to understand how to handle units and any role for dimensions (e.g. in operations, producing compound units, etc.), and the protocol simply needs to describe the unit. "Currency" is not a formal (physical) dimension but it is analogous to a dimension. Still, the specification of, for example, "GBP" would suffice to completely describe the intended meaning.

Valid units and identifiers
I have established a tentative list of units with unique identifiers here: https://github.com/spatchcock/quantify/blob/master/lib/quantify/config.rb . In many cases the unique identifier ("label") for each unit is simply the unit symbol (e.g. m, kg, K, J, Pa, etc.). In other cases, where exotic characters are used in the symbols (e.g. "°"), or multiple unit variants exist (e.g. US and UK versions), the label is something different. Many follow JScience, so there is already some precedent for many of them, although I added many others myself. I tried to stay close to the unit symbol where possible, although I think enforcing basic characters and using underscores for whitespace would be good practice. Standard unit symbols are case sensitive, so if they are used as unique identifiers, case sensitivity has to be okay.

Compound unit syntax
Compound units need to describe base unit multiplication, division (denominators) and raising to powers. There are a few ways to do this, e.g.:
I think JScience uses the "·", "/" and "^" characters. The Quantify Rubygem supports "·" and whitespace for multiplication, "/" or negative powers for denoting denominators, and either "^" or superscript characters (2, 3 only) for powers. I would suggest not supporting superscript characters (these are only for presentation in Quantify - NOT unique unit labels). I would also suggest using whitespace for multiplication rather than "·", which would be okay as long as unit descriptors use underscores so that there is no ambiguity. The "/" for delimiting denominators must only occur once; otherwise parentheses would be needed, which opens the door to very messy unit descriptions.

Examples
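A minimal sketch of how a client might parse descriptors under the rules suggested above (whitespace for multiplication, at most one "/", "^" for integer powers, labels restricted to basic characters and underscores). The function name `parse_unit` and the token pattern are assumptions made for illustration; they are not part of Quantify, JScience or JTS.

```python
import re

# Hypothetical token pattern: a label of letters, digits and underscores,
# optionally followed by "^" and a signed integer power.
_TOKEN = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)(?:\^(-?\d+))?$")


def parse_unit(descriptor):
    """Parse a compound unit string into a {label: exponent} mapping."""
    parts = descriptor.split("/")
    if len(parts) > 2:
        # The suggestion above: "/" must occur only once.
        raise ValueError("'/' may occur at most once: %r" % descriptor)
    exponents = {}
    # The first part is the numerator (sign +1), the second the denominator (sign -1).
    for sign, part in zip((1, -1), parts):
        for token in part.split():  # whitespace means multiplication
            match = _TOKEN.match(token)
            if match is None:
                raise ValueError("bad unit token: %r" % token)
            label = match.group(1)
            power = int(match.group(2) or 1)
            exponents[label] = exponents.get(label, 0) + sign * power
    # Drop labels that cancel out entirely.
    return {label: exp for label, exp in exponents.items() if exp != 0}
```

With this sketch, `parse_unit("kg m^2/s^2")` and `parse_unit("kg m^2 s^-2")` produce the same mapping, so the "/" form and the negative-power form are treated as equivalent.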
Currency
In my view, these strings are all that is needed to uniquely describe a unit (as long as the identifiers and compound syntax are standardised). Since currency requires more metadata (a time-stamp), perhaps that is reason enough to require a more complex structure like those described above. In the vast majority of cases, though, this would reduce to:
so it could be argued that a simple string should be allowed since (I imagine) that will cover most cases (maybe not?). Either way, "date" (or perhaps a metadata object) should probably be optional. Hope that helps!
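A sketch of how a client could accept both forms discussed here: a plain string, or an object carrying an optional "date". The key name "name" inside the object form and the function name `normalize_unit` are hypothetical; only the "unit" attribute and the optional "date" come from the discussion.

```python
def normalize_unit(field):
    """Return (unit_string, date_or_None) for a field description.

    Accepts either {"unit": "m"} or, as an assumed object form,
    {"unit": {"name": "USD", "date": "2000"}}.
    """
    unit = field.get("unit")
    if unit is None:
        return None, None  # the field declares no unit at all
    if isinstance(unit, str):
        return unit, None  # simple string form, no extra metadata
    # Object form: "name" is a hypothetical key, "date" is optional.
    return unit["name"], unit.get("date")
```

Treating the string as shorthand for the object form keeps the common case short while leaving room for currency metadata.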
@spatchcock this is awesome and I would like to get this officially in, or at least in as an official extension (do we need a way to have JTS "extensions"?). Re the currency, I should explain that the year part is for when people say stuff like "prices in 2000 dollars", meaning they have also deflated the figures so that everything is in the same year's prices. Re the type field: I agree on redundancy, but it sure might make rendering easier for the average client (e.g. how do I work out that GBP is a currency?)
My understanding was that JTS requires a few core attributes and then allows any additional ones. I guess we can define a "unit protocol" in its own right which could simply be adopted by, or used in conjunction with, JTS without any changes. Is that correct?

My take on the "type" field was that clients would need to understand how to use units. But then, I am not sure what use cases you have in mind. Rendering data could be one use case, as could using it in calculations/conversions.

For rendering, a user might want to be able to render a humanised unit name or standard symbol (e.g. "Great British Pounds", "£"). For this case, simply labelling the unit as "currency" would not be enough, as the additional information is not included (everyone could handle pounds but not necessarily other currencies). These humanised names/symbols could be specified as part of the protocol, but my question would be whether they therefore need to be described in the data every time or whether they are simply part of the standard, supported by client libraries, etc.

A "currency" type would be useful for rendering rules such as limiting a quantity value to 2 decimal places. But then I wonder what other "types" we would support. My instinct is to think in terms of physical dimensions. There are 7 "base" physical dimensions, from which all other physical dimensions are derived. Would we want to support these (e.g. length, mass, temperature, time...)? And it is conventional to name some "compound" dimensions (e.g. "energy" = mass x length^2 x time^-2; "acceleration" = length x time^-2). Do we support these? Or do we support unnamed compound "types" ("currency per time")? And again, does this "type" actually need to be included in the data or simply inferred from the protocol? My answer would be that we define the units supported by the protocol and include in the definitions their base dimensions.
This way, anyone wanting to comply with the protocol can do so, but the information does not have to be contained within the data description itself - the unit description suffices and everything else is implied and gets delegated to the protocol.

There are at least 3 types of quantity that I can think of that are not covered in the standard set of physical base dimensions but would be useful regardless: "currency" (e.g. £, $), "information" (e.g. bit, byte) and "item" (e.g. GDP in £ per capita, methane emissions in kg per head of livestock). We could choose to define these if we are going down the route of using dimensions.

The other use case I can think of for data with machine-readable units is using the data in operations (e.g. addition, multiplication,... of quantities) or converting quantities into different units. This, to me, is the really important use case - being able to read in data with possibly a variety of units and perform the same operations on each, with units dynamically identified and accounted for. This would certainly require knowledge of dimensions as well as conversion factors, amongst other things. Again, I don't think the dimensions (the "type"?) need to be communicated in the data; they are simply implied by the units and compliance with the protocol.

On the "prices in 2000 dollars" use case: I am not sure what someone might want to do with this data. If they want to operate on it using other non-currency units (e.g. $10 / 1 hr = $10/hr), then that would be okay. If they want to convert it to a different currency, or to the same currency at a different time, then that would be impossible without more information on the exchange rates. In the latter case, the "date" attribute would certainly convey useful information, as it would if the intention is simply to be able to render the data with the context described accurately.
This seems to me to be quite an exceptional requirement in comparison with all other types of quantities (albeit perhaps quite common). Currency is odd in that the "conversion factors" float. This means that, in a sense, a yr 2000 $ is a different unit to a yr 2012 $ - they require different conversion factors; they mean different things. These types of units are similar to the variants of physical units that exist (e.g. UK/US gallons, UK/US tons) in that they have the same name but a different meaning. However, the other units have a small number of standard, fixed variants, whereas currencies have an infinite number of different "meanings" depending on the time period and resolution you choose.

The British Thermal Unit is perhaps the closest to this in that it has several definitions based on different experimental temperatures. I can imagine defining the BTU with a metadata field which describes the reference temperature in the same way as a $ might have a year or timestamp. However, BTUs do have a set of standard variants, so the analogy is not too close. I am not sure whether a separate "currency protocol" makes more sense...
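To illustrate the idea that dimensions can be implied by the unit rather than stated in the data, here is a small sketch in which the protocol side holds a registry of unit definitions and a client derives the dimensions of any compound unit from it. The registry contents, the `UNIT_DIMENSIONS` name and the `dimensions_of` function are assumptions for illustration only; "currency" is included as one of the non-physical dimensions suggested above.

```python
# Hypothetical registry: unit label -> exponents of its base dimensions.
UNIT_DIMENSIONS = {
    "m": {"length": 1},
    "kg": {"mass": 1},
    "s": {"time": 1},
    "J": {"mass": 1, "length": 2, "time": -2},  # energy
    "GBP": {"currency": 1},
    "USD": {"currency": 1},
}


def dimensions_of(unit_exponents):
    """Derive base-dimension exponents for a compound unit.

    `unit_exponents` maps unit labels to integer powers,
    e.g. {"m": 1, "s": -2} for metres per second squared.
    """
    total = {}
    for label, power in unit_exponents.items():
        for dimension, exponent in UNIT_DIMENSIONS[label].items():
            total[dimension] = total.get(dimension, 0) + exponent * power
    # Dimensions that cancel out are dropped.
    return {d: e for d, e in total.items() if e != 0}


def compatible(a, b):
    """Two units can be added or converted only if their dimensions match."""
    return dimensions_of(a) == dimensions_of(b)
```

Note that `compatible({"GBP": 1}, {"USD": 1})` is true in this sketch even though the conversion factor floats; an actual conversion would still need a date and exchange-rate data, which is the exceptional part of the currency case.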
I have made a start here: https://github.com/spatchcock/dataprotocols/blob/master/source/unit-protocol.markdown Perfectly happy to extend into a more complex data structure if we think that is required. It would be good to try to think of any other cases (beyond the currency one) where more metadata would be required in association with units. (I've used markdown for now - not familiar with .rst).
Any thoughts on the above?
@spatchcock I like this :-) Suggestions:
@rgrp Great. I'll make those changes and raise a pull request.
Say I have real GDP in 2009 £m (i.e. in millions of £ in the year 2009). I have no way to specify this.
Propose two new fields:
Concerns