Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bigquery: convert nested objects to their native types #1648

Merged
merged 5 commits into from
Oct 14, 2016

Conversation

stephenplusplus
Copy link
Contributor

Fixes #1593

@stephenplusplus stephenplusplus added the api: bigquery Issues related to the BigQuery API. label Sep 29, 2016
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Sep 29, 2016
@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 898ae7d on stephenplusplus:spp--1593 into * on GoogleCloudPlatform:master*.

@c0b
Copy link
Contributor

c0b commented Oct 1, 2016

wow, it still somewhat not working because mode 'REPEATED' could be on every possible type! not only on 'RECORD';

try this query:

$ bq query --use_legacy_sql=false 'SELECT GENERATE_ARRAY(10, 0, -3) AS example_array'

[
  {
    "example_array": [
      "10", 
      "7", 
      "4", 
      "1"
    ]
  }
]
{ err: null,
  rows: [ { example_array: [ { v: '10' }, { v: '7' }, { v: '4' }, { v: '1' } ] } ],
  nextQuery: null,
  apiResponse: 
   { kind: 'bigquery#queryResponse',
     schema: { fields: [ { name: 'example_array', type: 'INTEGER', mode: 'REPEATED' } ] },
     ...
     totalRows: '1',
     rows: [ { f: [ { v: [ { v: '10' }, { v: '7' }, { v: '4' }, { v: '1' } ] } ] } ],
     totalBytesProcessed: '0',
     jobComplete: true,
     cacheHit: true } }

@stephenplusplus
Copy link
Contributor Author

Phew, thanks for catching that. I reorganized a bit, and I think I've covered that case now. Let me know what you think.

@stephenplusplus stephenplusplus force-pushed the spp--1593 branch 2 times, most recently from 74facf5 to 1e468f6 Compare October 3, 2016 17:40
@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 1e468f6 on stephenplusplus:spp--1593 into 009bc99 on GoogleCloudPlatform:master.

Copy link
Contributor

@c0b c0b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more comments

break;
}
case 'FLOAT': {
if (!is.nil(value)) {

This comment was marked as spam.

break;
}
case 'INTEGER': {
if (!is.nil(value)) {

This comment was marked as spam.

function convert(schemaField, value) {
switch (schemaField.type) {
case 'BOOLEAN': {
value = value === 'true';

This comment was marked as spam.

This comment was marked as spam.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 1e468f6 on stephenplusplus:spp--1593 into 009bc99 on GoogleCloudPlatform:master.

@c0b
Copy link
Contributor

c0b commented Oct 3, 2016

from some more testing I just realized bigquery has more data types than JavaScript, like TIME, DATE, DATETIME (which is different than TIMESTAMP), only DATE is convertable to a JavaScript Date but you may not want to convert?

➸ node -p 'new Date("2016-10-01")'
2016-10-01T00:00:00.000Z
node ./bigquery/queries.js sync 'SELECT DATETIME "2016-10-01 03:45:23.34583"'
{ err: null,
  rows: [ { f0_: '2016-10-01T03:45:23.345830' } ],
  nextQuery: null,
  apiResponse: 
   { kind: 'bigquery#queryResponse',
     schema: { fields: [ { name: 'f0_', type: 'DATETIME', mode: 'NULLABLE' } ] },
     jobReference: { ... },
     totalRows: '1',
     rows: [ { f: [ { v: '2016-10-01T03:45:23.345830' } ] } ],
     totalBytesProcessed: '0',
     jobComplete: true,
     cacheHit: true } }
Received 1 row(s)!
[ { f0_: '2016-10-01T03:45:23.345830' } ]

losing precision is another problem, all TIME DATETIME TIMESTAMP in BigQuery can be up to micro seconds but a JavaScript Date is only up to milliseconds,

node ./bigquery/queries.js sync 'SELECT TIMESTAMP "2016-10-01 03:45:23.34583 Asia/Singapore"'
{ err: null,
  rows: [ { f0_: 2016-09-30T19:45:23.345Z } ],
  nextQuery: null,
  apiResponse: 
   { kind: 'bigquery#queryResponse',
     schema: { fields: [ { name: 'f0_', type: 'TIMESTAMP', mode: 'NULLABLE' } ] },
     jobReference: { ... },
     totalRows: '1',
     rows: [ { f: [ { v: '1.47526472334583E9' } ] } ],
     totalBytesProcessed: '0',
     jobComplete: true,
     cacheHit: false } }
Received 1 row(s)!
[ { f0_: 2016-09-30T19:45:23.345Z } ]

and losing precision in converting a INT64 to JavaScript Number, due to no true Integer in JavaScript; a JavaScript Number is only FLOAT64

$ node ./bigquery/queries.js sync 'SELECT ARRAY<INT64>[0x7fff1234deadbeef, -0x8000000000000000] AS example_array'
{ err: null,
  rows: [ { example_array: [ 9223110580161593000, -9223372036854776000 ] } ],
  nextQuery: null,
  apiResponse: 
   { kind: 'bigquery#queryResponse',
     schema: { fields: [ { name: 'example_array', type: 'INTEGER', mode: 'REPEATED' } ] },
     jobReference: { ... },
     totalRows: '1',
     rows: [ { f: [ { v: [ { v: '9223110580161593071' }, { v: '-9223372036854775808' } ] } ] } ],
     totalBytesProcessed: '0',
     jobComplete: true,
     cacheHit: false } }
Received 1 row(s)!
[ { example_array: [ 9223110580161593000, -9223372036854776000 ] } ]

the queries.js is a modified https://github.com/GoogleCloudPlatform/nodejs-docs-samples/blob/master/bigquery/queries.js to print raw apiResponse

I'm not sure if you guys discussed this before, this can be different issue which I don't have a good solution; #1607 #1611 were about Datastore.

break;
}
case 'FLOAT': {
if (!is.nil(value)) {

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

@stephenplusplus
Copy link
Contributor Author

@c0b thanks for bringing that up! I will look into that.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 1e468f6 on stephenplusplus:spp--1593 into 009bc99 on GoogleCloudPlatform:master.

@c0b
Copy link
Contributor

c0b commented Oct 3, 2016

I don't have a good solution for all above losing precision datatype conversions; but BYTES is wonderfully convertable to a nodejs Buffer type that I think you may add a clause into the convert function

node ./bigquery/queries.js sync '
SELECT b"abc©"'
{ err: null,
  rows: [ { f0_: 'YWJjwqk=' } ],
  nextQuery: null,
  apiResponse: 
   { kind: 'bigquery#queryResponse',
     schema: { fields: [ { name: 'f0_', type: 'BYTES', mode: 'NULLABLE' } ] },
     jobReference: { ... },
     totalRows: '1',
     rows: [ { f: [ { v: 'YWJjwqk=' } ] } ],
     totalBytesProcessed: '0',
     jobComplete: true,
     cacheHit: false } }
Received 1 row(s)!
[ { f0_: 'YWJjwqk=' } ]

a Buffer.from call would suffice https://nodejs.org/api/buffer.html#buffer_class_method_buffer_from_string_encoding

 node -p 'Buffer.from("YWJjwqk=", "base64")'
<Buffer 61 62 63 c2 a9>

when you do so; please also make sure stream writing also works, make sure writing a Buffer directly to a BYTES field works

here are my local changes with this pr:

 packages/bigquery/src/table.js | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/packages/bigquery/src/table.js b/packages/bigquery/src/table.js
index 761f862..1b53cf1 100644
--- a/packages/bigquery/src/table.js
+++ b/packages/bigquery/src/table.js
@@ -191,60 +191,58 @@ Table.mergeSchemaWithRows_ = function(schema, rows) {
   return arrify(rows).map(mergeSchema).map(flattenRows);

   function mergeSchema(row) {
     return row.f.map(function(field, index) {
       var schemaField = schema.fields[index];
       var value = field.v;

-      var fieldObject = {};
-
-      if (value === null) {
-        fieldObject[schemaField.name] = null;
-        return fieldObject;
-      }
-
       if (schemaField.mode === 'REPEATED') {
         value = value.map(function(val) {
           return convert(schemaField, val.v);
         });
       } else {
         value = convert(schemaField, value);
       }

+      var fieldObject = {};
       fieldObject[schemaField.name] = value;
       return fieldObject;
     });
   }

   function convert(schemaField, value) {
+    if (is.nil(value)) {
+      return null;
+    }
+
     switch (schemaField.type) {
       case 'BOOLEAN': {
         value = value === 'true';
         break;
       }
       case 'FLOAT': {
-        if (!is.nil(value)) {
-          value = parseFloat(value);
-        }
+        value = parseFloat(value);
         break;
       }
       case 'INTEGER': {
-        if (!is.nil(value)) {
-          value = parseInt(value, 10);
-        }
+        value = parseInt(value, 10);
         break;
       }
       case 'RECORD': {
         value = Table.mergeSchemaWithRows_(schemaField, value).pop();
         break;
       }
       case 'TIMESTAMP': {
         value = new Date(value * 1000);
         break;
       }
+      case 'BYTES': {
+        value = Buffer.from(value, 'base64');
+        break;
+      }
     }

     return value;
   }

   function flattenRows(rows) {
     return rows.reduce(function(acc, row) {

@stephenplusplus
Copy link
Contributor Author

Alright, support for BYTES has been added, as well as pre-API request encoding, so that Buffer & Date objects are converted to their JSON API-safe values (buffer.toString('base64') & date.toJSON()).

Regarding TIME, DATE, DATETIME, and TIMESTAMP, we haven't discussed this before, so a new issue would be great.

Thank you again for continuing to catch these things and make our BigQuery support much better 👍

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 33dabf3 on stephenplusplus:spp--1593 into 10cd2ac on GoogleCloudPlatform:master.

@qw3r
Copy link

qw3r commented Oct 10, 2016

+1

* JSON array of fields, which allows for nested and repeated fields. See
* a [Table resource](http://goo.gl/sl8Dmg) for more detailed information.
* pairs. Valid types are "string", "integer", "float", "boolean", "bytes",
* "recodr", and "timestamp". If the type is omitted, it is assumed to be

This comment was marked as spam.

This comment was marked as spam.

@callmehiphop callmehiphop merged commit cdd38a6 into googleapis:master Oct 14, 2016
@c0b
Copy link
Contributor

c0b commented Oct 20, 2016

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants