Ability to differentiate decimal and floating point numbers #709

roll · 2020-10-14T08:42:37Z

Overview

At the moment we have (using Python as an example platform, but it's similar in other languages):

integer -> int (Python)
number -> decimal.Decimal (Python) - stores decimal values exactly (arithmetic is rounded only to the configured context precision)

We don't have a type to represent numbers where it's acceptable to lose some precision in exchange for calculation speed. Decimals are really slow in Python, JavaScript, etc.
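A minimal Python sketch of the trade-off described above: `decimal.Decimal` keeps decimal digits exactly (its arithmetic is still rounded to the context precision, 28 significant digits by default), while `float` is faster but rounds to binary. The timings are machine-dependent and only illustrative; they are not the benchmarks from frictionless-py#461.

```python
import timeit
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
# Decimals built from strings keep the exact decimal digits.
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum)    # 0.30000000000000004
print(decimal_sum)  # 0.3

# Rough, machine-dependent speed comparison of the same expression.
float_time = timeit.timeit("x * y + x", setup="x, y = 1.1, 2.2", number=100_000)
decimal_time = timeit.timeit(
    "x * y + x",
    setup="from decimal import Decimal; x, y = Decimal('1.1'), Decimal('2.2')",
    number=100_000,
)
print(f"float: {float_time:.4f}s  Decimal: {decimal_time:.4f}s")
```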

Here is the root issue, with benchmarks: frictionlessdata/frictionless-py#461


rufuspollock · 2020-10-14T09:24:51Z

This does open the door to quite a bit more, e.g. float vs double (we should look at other systems). I've wondered whether we should use the "format" property on number to qualify things, rather than having lots of number types.


roll · 2020-10-14T10:16:22Z

Extending number is definitely a nice option.

It could have something like a precision property:

infinite (default)
double
single

I didn't mention it in the first place only because I'm not sure whether it's OK to have data types that map to N native types, e.g. frictionless:number -> python:decimal;double;float
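A sketch of how an implementation might honour the precision property proposed above. The property values come from the comment; the `cast_number` helper is hypothetical. Python has no native single-precision float, so the sketch emulates one by round-tripping through a 4-byte IEEE 754 value with `struct`.

```python
import struct
from decimal import Decimal
from typing import Union


def cast_number(text: str, precision: str = "infinite") -> Union[Decimal, float]:
    """Cast a textual number according to a hypothetical `precision` property."""
    if precision == "infinite":
        # Exact decimal representation of the source digits.
        return Decimal(text)
    if precision == "double":
        # CPython's float is a C double (IEEE 754 binary64 on mainstream platforms).
        return float(text)
    if precision == "single":
        # No native float32 in Python: round-trip through a 4-byte IEEE 754 value.
        return struct.unpack("<f", struct.pack("<f", float(text)))[0]
    raise ValueError(f"unknown precision: {precision!r}")


print(repr(cast_number("0.1")))            # Decimal('0.1')
print(cast_number("0.1", "double"))        # 0.1
print(cast_number("0.1", "single"))        # 0.10000000149011612
```

Returning different native types for one logical type is exactly the concern raised in the comment; callers would need to accept `Decimal` or `float` depending on the schema.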

BTW, I think this is a really important thing to consider: our dependency on decimals without an opt-out option makes Frictionless nearly useless for number crunching, and we get more and more people coming from Pandas/NumPy (cc @akariv)

drunkcod · 2021-06-05T13:58:45Z

Out of curiosity which way is this leaning? "precision" or "format"?

Asking since I'm pondering ways to augment my own data packages with both the above "single", "double" and "infinite" numbers and, even more important to me, with sized integers (=> (s)int[8|16|32|64]).

My thoughts thus far have centered on making these custom formats since, from what I understand, unknown formats are ignored for validation purposes, so it would be transparent to other consumers.

Edit: a perhaps even better solution could be to leverage the maxLength constraint and define it as the number of bytes for numbers and integers.

That would limit the number of new symbols needed, avoid language-specific names, and let implementors pick the most suitable mapping to fp32/64/80 and int types.
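To illustrate the maxLength-as-bytes idea, here is a hypothetical sketch: for integers the byte count gives the value range, and for numbers it selects a binary float width. Signedness is not expressed by maxLength alone, so the `signed` flag is an assumption, and Python's `struct` module has no 10-byte (fp80) format, so that width is rejected here.

```python
import struct

# Hypothetical mapping from a byte count (reusing maxLength) to IEEE 754 struct codes.
FLOAT_CODES = {2: "e", 4: "f", 8: "d"}


def integer_range(n_bytes: int, signed: bool = True) -> tuple:
    """Inclusive (low, high) range of an integer stored in n_bytes bytes."""
    bits = 8 * n_bytes
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def check_integer(value: int, n_bytes: int, signed: bool = True) -> bool:
    """True if value fits in the given byte width."""
    low, high = integer_range(n_bytes, signed)
    return low <= value <= high


def fit_number(value: float, n_bytes: int) -> float:
    """Round value to the binary float width implied by n_bytes."""
    code = FLOAT_CODES.get(n_bytes)
    if code is None:
        raise ValueError(f"no native float of {n_bytes} bytes")
    return struct.unpack("<" + code, struct.pack("<" + code, value))[0]


print(integer_range(1))          # (-128, 127)
print(check_integer(300, 1))     # False
print(fit_number(0.1, 2))        # 0.0999755859375
```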

roll · 2021-06-06T09:36:10Z

@drunkcod
Currently, Frictionless Framework uses field.floatNumber (bool, default: False) - https://framework.frictionlessdata.io/docs/references/api-reference/#fieldfloat_number - similar to bareNumber notation - https://specs.frictionlessdata.io/table-schema/#number

There is no support in the specs yet.
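A hypothetical cast function showing the behaviour described: with floatNumber set, a number cell becomes a binary float, otherwise an exact Decimal. It also applies the Table Schema decimalChar and groupChar properties before parsing. This is an illustration only, not the Frictionless Framework's actual implementation.

```python
from decimal import Decimal
from typing import Union


def cast_number_field(field: dict, text: str) -> Union[Decimal, float]:
    """Cast a cell of a number field, honouring a floatNumber flag if present."""
    group_char = field.get("groupChar")
    decimal_char = field.get("decimalChar", ".")
    # Remove digit grouping first so it cannot clash with the decimal separator.
    if group_char:
        text = text.replace(group_char, "")
    if decimal_char != ".":
        text = text.replace(decimal_char, ".")
    # floatNumber is a framework extension, not part of the Table Schema spec.
    if field.get("floatNumber", False):
        return float(text)
    return Decimal(text)


print(repr(cast_number_field({"type": "number"}, "1.5")))                     # Decimal('1.5')
print(cast_number_field({"type": "number", "floatNumber": True}, "1.5"))      # 1.5
print(repr(cast_number_field({"groupChar": ".", "decimalChar": ","}, "1.234,5")))  # Decimal('1234.5')
```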

dafeder · 2022-10-20T19:10:24Z

Any movement on this, even in terms of an unofficial pattern people have been using? I'm building some tooling using Table Schema where it would be very useful to specify a currency field, i.e. a number field with two-decimal-place precision. I can make up my own pattern but would like to use something with some precedent.

roll · 2022-10-21T10:55:33Z

I think we need to prioritize this feature, as it's critical for a lot of use cases.

rjgladish · 2022-10-21T12:23:05Z

Consider existing standards for low-precision real number encoding, such as FP16 or Linear11:
https://en.wikipedia.org/wiki/Half-precision_floating-point_format?wprov=sfla1

Also, Q notation is one option for flexible fixed-point encoding. The "See also" section of the following article has more info:
https://en.wikipedia.org/wiki/Fixed-point_arithmetic?wprov=sfla1

If you are encoding values in [0, 1), such as covariance coefficients, you can fix the exponent and use an 8-bit mantissa.
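A sketch of the two encodings mentioned, using Python's `struct` module, which supports IEEE 754 half precision through the `e` format code. The fixed-point helpers implement unsigned Q0.8 (all 8 bits fractional, sometimes written UQ0.8) for values in [0, 1); the function names are made up for illustration, and Linear11 is not covered.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def q_encode(x: float, frac_bits: int = 8) -> int:
    """Encode x in [0, 1) as an unsigned fixed-point integer with frac_bits fraction bits."""
    scale = 1 << frac_bits
    raw = round(x * scale)
    # Clamp so values just below 1.0 do not overflow the field.
    return max(0, min(raw, scale - 1))


def q_decode(raw: int, frac_bits: int = 8) -> float:
    """Decode an unsigned fixed-point integer back to a float."""
    return raw / (1 << frac_bits)


print(to_fp16(0.1))             # 0.0999755859375
print(q_encode(0.3))            # 77
print(q_decode(q_encode(0.3)))  # 0.30078125
```

With 8 fraction bits the rounding error is at most 1/512 (about 0.002), except near 1.0 where clamping applies.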

dafeder · 2022-10-21T20:27:41Z

We may be trying to solve too many problems at once here, and there may be a reason so many standards avoid dealing with this problem, but that doesn't help people who need to represent something specific like currency with two decimal places. If we just wanted to address the decimal problem, we could add a new constraint, scale or something similar. E.g. a maxLength of 12 with a scale of 2 means a number with 12 total digits, two of which are to the right of the decimal point. This seems consistent with both Python's decimal module and how decimal types are implemented in most SQL systems. To be closer to SQL we could use precision instead of maxLength, but maxLength already exists and the two seem redundant.
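A minimal sketch of the proposed maxLength + scale constraint, treating maxLength as total digits and scale as digits after the decimal point, in the spirit of SQL `DECIMAL(12, 2)`. The function name is hypothetical, and unlike many SQL databases, which round extra fractional digits on insert, this check rejects them.

```python
from decimal import Decimal


def fits_precision_scale(text: str, max_digits: int, scale: int) -> bool:
    """Check text against a hypothetical maxLength (total digits) + scale pair."""
    value = Decimal(text)
    if not value.is_finite():
        # NaN and Infinity cannot satisfy a digit-count constraint.
        return False
    _sign, digits, exponent = value.as_tuple()
    fractional = max(0, -exponent)             # digits right of the decimal point
    integral = max(0, len(digits) + exponent)  # digits left of the decimal point
    return fractional <= scale and integral <= max_digits - scale


print(fits_precision_scale("1234567890.12", 12, 2))  # True
print(fits_precision_scale("19.999", 12, 2))         # False
```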

Another solution I've seen suggested is "multipleOf": 0.01. But I can't say I love that :)
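For comparison, a sketch of the "multipleOf": 0.01 approach. Doing the check in binary floating point gives wrong answers for values that are exact decimal multiples, so this hypothetical helper does it with `Decimal` on the original text instead.

```python
from decimal import Decimal


def is_multiple_of(text: str, step: str) -> bool:
    """multipleOf check carried out in exact decimal arithmetic."""
    return Decimal(text) % Decimal(step) == 0


# The naive float check fails even though 0.3 is a multiple of 0.1.
print(0.3 % 0.1)                      # 0.09999999999999998
print(is_multiple_of("0.3", "0.1"))   # True
print(is_multiple_of("9.99", "0.01")) # True
print(is_multiple_of("9.999", "0.01"))  # False
```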

rufuspollock · 2022-10-25T06:30:35Z

@roll I'd prefer doing the simplest thing possible at present: use format on number, with a small set of recommended values (e.g. the classic float, double, etc.), but allow a given community to add whatever values they agree on. You could even hack decimal places into this to start with via decimalPlaces2, decimalPlaces3, etc.
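A sketch of the decimalPlacesN hack suggested above: the format string is parsed for the number of places, and a value conforms if quantizing it to that many places leaves it unchanged. The format names follow the comment; the helper and the choice to accept unknown formats are assumptions.

```python
import re
from decimal import Decimal

# Hypothetical format values of the form "decimalPlacesN".
_DECIMAL_PLACES = re.compile(r"decimalPlaces(\d+)")


def conforms_to_format(text: str, fmt: str) -> bool:
    """Return True if text has at most N decimal places for format decimalPlacesN."""
    match = _DECIMAL_PLACES.fullmatch(fmt)
    if match is None:
        # Unknown formats are not checked in this sketch.
        return True
    value = Decimal(text)
    if not value.is_finite():
        return False
    quantum = Decimal(1).scaleb(-int(match.group(1)))  # e.g. 2 -> Decimal("0.01")
    # Quantizing to N places leaves the value unchanged only if it had <= N places.
    return value.quantize(quantum) == value


print(conforms_to_format("12.3", "decimalPlaces2"))    # True
print(conforms_to_format("12.345", "decimalPlaces2"))  # False
```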

@roll I'm happy for you to run with a recommendation on this one and trial it in the Frictionless Framework and, if it works well, for us to "standardize" it, as it were.

dafeder · 2023-08-17T14:40:12Z

Hello - any current guidance on the use-case I described above? We have still not committed to a particular strategy for our currency columns.

nichtich · 2024-01-04T09:58:35Z

There is another issue for currency types: #352. In general I'd prefer to stick to an existing standard for data types, such as XML Schema Datatypes or spreadsheet datatypes, instead of inventing our own solution.

roll added the Table Schema label Oct 14, 2020

roll changed the title ~~Add new numeric type: floating point number~~ Ability to represent floating point number Oct 14, 2020

roll changed the title ~~Ability to represent floating point number~~ Ability to represent floating point numbers Oct 14, 2020

roll added this to the v2 milestone Apr 7, 2023

roll added this to Open Knowledge Apr 14, 2023

nichtich mentioned this issue Jan 7, 2024

Reconsider data types #867

Closed

roll self-assigned this Feb 6, 2024

This was referenced Feb 6, 2024

Ability to constraint number precision #879

Closed

Clarify numeric precision and add exactNumber property frictionlessdata/datapackage-v2-draft#33

Closed

roll added the proposal label Feb 6, 2024

roll changed the title ~~Ability to represent floating point numbers~~ Ability to differentiate decimal and floating point numbers Feb 21, 2024

roll added discussion epic and removed proposal labels Feb 21, 2024

roll removed this from the v2 milestone Mar 11, 2024

roll mentioned this issue Apr 12, 2024

Data representation model frictionlessdata/datapackage-v2-draft#49

Closed

roll removed the discussion label Apr 12, 2024

frictionlessdata locked and limited conversation to collaborators Oct 21, 2024

roll converted this issue into discussion #1009 Oct 21, 2024

github-project-automation bot moved this to Done in Open Knowledge Oct 21, 2024


This issue was moved to a discussion.
