# How to Run

  1. Generate the metadata needed to validate incoming histograms: `python specgen.py validation/nightly/23.0a1/histogram_descriptions.json > histogram_specs.json`

  2. Run map/reduce on test data: `python FileDriver.py scripts/dashboard.py json_per_line.txt out.txt`

  3. Write or update the on-disk data using out.txt from above: `python mr2disk.py outdir < out.txt`

Histogram View: there are x fields to narrow a query by.

Have a category table that stores the category tree. Each node has a unique id:

- Level 1, Product: Firefox|Fennec|Thunderbird
- Level 2, Platform: Windows|Linux
- Level 3, etc.

The size of this table can be kept in check by reducing common video cards to a family name, etc. We can also customize what shows up under different levels. For example, we could restrict Thunderbird (tb) to have fewer child nodes.

Store the tree in a table, but keep it read into memory for queries and for inserting new records.
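The in-memory side of the category tree could look roughly like the sketch below. It is only an illustration: the `CategoryTree` class, the `(parent_id, label)` keying, and the use of 0 as an implicit root are assumptions, not something the existing scripts define.

```python
class CategoryTree:
    """Hypothetical in-memory category tree; each node gets a unique integer id."""

    def __init__(self):
        self.next_id = 1
        # (parent_id, label) -> node id; parent id 0 stands for the implicit root
        self.ids = {}

    def get_or_insert(self, path):
        """Return the id for a path such as ("Firefox", "Windows"),
        creating any missing nodes along the way."""
        parent = 0
        for label in path:
            key = (parent, label)
            if key not in self.ids:
                self.ids[key] = self.next_id
                self.next_id += 1
            parent = self.ids[key]
        return parent

    def rows(self):
        """Rows (id, parent_id, label) in the shape they might be stored in the table."""
        return sorted(
            (node_id, parent, label) for (parent, label), node_id in self.ids.items()
        )


tree = CategoryTree()
firefox_windows = tree.get_or_insert(("Firefox", "Windows"))
firefox_linux = tree.get_or_insert(("Firefox", "Linux"))
```

Looking up an existing path returns the same id, so the same call serves both queries and inserts of new records.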

Then have a histogram table with columns: histogram_id | category_id | value

- histogram_id is an id like SHUTDOWN_OK
- category_id is a key from the category table
- value is the sum of histograms in that category; it can be represented with some binary value
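A minimal sketch of that histogram table, assuming each histogram is a list of bucket counts of fixed length and that the binary value is simply the counts packed as little-endian unsigned 32-bit integers. Both assumptions, and the helper names, are illustrative only.

```python
import struct


def add_histogram(table, histogram_id, category_id, buckets):
    """Sum `buckets` into the row keyed by (histogram_id, category_id)."""
    key = (histogram_id, category_id)
    current = table.get(key)
    if current is None:
        table[key] = list(buckets)
    else:
        if len(current) != len(buckets):
            raise ValueError("bucket count mismatch for %s" % histogram_id)
        table[key] = [a + b for a, b in zip(current, buckets)]


def pack_value(buckets):
    """Encode bucket counts as a binary value (assumed layout: uint32, little-endian)."""
    return struct.pack("<%dI" % len(buckets), *buckets)


def unpack_value(blob):
    """Decode a value produced by pack_value."""
    return list(struct.unpack("<%dI" % (len(blob) // 4), blob))


histogram_table = {}
add_histogram(histogram_table, "SHUTDOWN_OK", 2, [5, 1])
add_histogram(histogram_table, "SHUTDOWN_OK", 2, [3, 0])
packed = pack_value(histogram_table[("SHUTDOWN_OK", 2)])
```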

##############

Evolution can be implemented by adding a build_date field to the histogram table.

TODO: How big would the category tree table be? Surely there is a finite size for that.

The histogram table would have |category_table| * |number of histograms| rows, which is pretty compact.

###############

Map + Reduce

The mapper should turn each submission into a key/value pair, which looks like:

`buildid/channel/reason/appName/appVersion/OS/osVersion/arch` `{histograms:{A11Y_CONSUMERS:{histogram_data}, ...} simpleMeasures:{firstPaint:[100,101,1000...]}}`

where the key identifies where in the filter tree the data should live. Note that a single packet could produce more than one such entry if we want to get into detailed breakdowns of gfxCard vs. some FX UI animation histogram.
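The mapper step might be sketched as follows. The shape of the incoming submission (an `info` dict holding the key fields, plus `histograms` and `simpleMeasures`) is an assumption made for illustration, as is the `"unknown"` placeholder for missing fields; this is not a description of what scripts/dashboard.py does.

```python
KEY_FIELDS = (
    "buildid", "channel", "reason", "appName",
    "appVersion", "OS", "osVersion", "arch",
)


def map_submission(submission):
    """Yield (key, value) pairs for one submission.

    Written as a generator so that a single packet could emit more than
    one entry if finer breakdowns are added later.
    """
    info = submission.get("info", {})  # assumed location of the key fields
    key = "/".join(str(info.get(field, "unknown")) for field in KEY_FIELDS)
    value = {
        "histograms": submission.get("histograms", {}),
        # wrap each simple measure in a list so the reducer can append to it
        "simpleMeasures": {
            name: [measure]
            for name, measure in submission.get("simpleMeasures", {}).items()
        },
    }
    yield key, value


sample = {
    "info": {
        "buildid": "20130401", "channel": "nightly", "reason": "idle-daily",
        "appName": "Firefox", "appVersion": "23.0a1", "OS": "Linux",
        "osVersion": "3.8", "arch": "x86-64",
    },
    "histograms": {"A11Y_CONSUMERS": [1, 0, 2]},
    "simpleMeasures": {"firstPaint": 100},
}
pairs = list(map_submission(sample))
```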

The reducer would then take the above data, sum up the histograms, and append to the simple measure lists based on key.
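A reducer along those lines could be sketched like this, assuming every value for a key has the mapper's shape and that each histogram is a list of bucket counts of equal length. The function name and the data layout are illustrative assumptions.

```python
def reduce_values(values):
    """Combine all values that share one key.

    Histograms are summed bucket by bucket; simple measure lists are
    concatenated.
    """
    histograms = {}
    simple_measures = {}
    for value in values:
        for name, buckets in value.get("histograms", {}).items():
            if name in histograms:
                # assumes equal bucket counts; zip would silently truncate otherwise
                histograms[name] = [a + b for a, b in zip(histograms[name], buckets)]
            else:
                histograms[name] = list(buckets)
        for name, samples in value.get("simpleMeasures", {}).items():
            simple_measures.setdefault(name, []).extend(samples)
    return {"histograms": histograms, "simpleMeasures": simple_measures}


reduced = reduce_values([
    {"histograms": {"A11Y_CONSUMERS": [1, 0, 2]}, "simpleMeasures": {"firstPaint": [100]}},
    {"histograms": {"A11Y_CONSUMERS": [0, 3, 1]}, "simpleMeasures": {"firstPaint": [101]}},
])
```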

This should produce a fairly small file per day per channel (~200 records), which will then be quick to pull out and merge into the per-build-per-histogram JSON that can be rsynced to some webserver. This is basically a final iterative REDUCE on top of map-reduce for new data. Hadoop does not feel like the right option for that, but I could be wrong.
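That final merge could be sketched as below. The on-disk layout (`outdir/<buildid>/<histogram>.json`, each file mapping the full key to summed bucket counts) is a hypothetical choice for illustration and is not taken from mr2disk.py; it also assumes histogram names are safe to use as file names.

```python
import json
import os
import tempfile


def merge_into_disk(outdir, key, reduced):
    """Fold one reduced record into per-build-per-histogram JSON files."""
    buildid = key.split("/")[0]  # the key starts with the buildid
    build_dir = os.path.join(outdir, buildid)
    os.makedirs(build_dir, exist_ok=True)
    for name, buckets in reduced["histograms"].items():
        path = os.path.join(build_dir, name + ".json")
        stored = {}
        if os.path.exists(path):
            with open(path) as f:
                stored = json.load(f)
        previous = stored.get(key)
        if previous is None:
            stored[key] = list(buckets)
        else:
            stored[key] = [a + b for a, b in zip(previous, buckets)]
        with open(path, "w") as f:
            json.dump(stored, f)


outdir = tempfile.mkdtemp()
record_key = "20130401/nightly/idle-daily/Firefox/23.0a1/Linux/3.8/x86-64"
# Two days of new data for the same key are merged into the same file.
merge_into_disk(outdir, record_key, {"histograms": {"SHUTDOWN_OK": [5, 1]}})
merge_into_disk(outdir, record_key, {"histograms": {"SHUTDOWN_OK": [3, 0]}})
with open(os.path.join(outdir, "20130401", "SHUTDOWN_OK.json")) as f:
    merged = json.load(f)
```

Because each run only touches the files for the builds present in the new data, the step stays small and can run as a plain script after the map/reduce job.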