Consider scraping data from webpage for current week rather than doing API request #234
Comments
Spent a few hours prototyping this. I am considering parsing event details out of the DOM rather than out of the accessible label.
I also thought of intercepting the network requests that Google Calendar makes, so that I don't need to redundantly fetch from their API but can reuse their responses. Even beyond the question of whether I have the ability to intercept network requests (which is questionable, since ad blockers complained about that feature missing in MV3), there are some challenges:
Going back to DOM parsing then. There are DOM nodes for event name, event location (which I ignore) and event time. It seemed simple until I realized that the event time sometimes includes only the start time, not the end time. And this is not just for 15-minute events: sometimes 30-minute and even 45-minute events don't include the end time, only the start. The end time is consistently present in the accessible event name, but that could differ too much between locales.
I compared the DOM nodes to see if there is any other hint that tells whether this is a 15, 30 or 45-minute event. There is a "height" attribute. But, besides the reliability concerns across screen sizes and UI settings, I found that the height is a bit different for CJK languages than for EN, so I can't rely on it safely. Looking at the extension stats, it is being used in more than several dozen languages, so these are the cases I should support.
The DOM nodes have class names that indicate some of these things. Besides the class names common to all events on the page, there are 7 that indicate specific properties (whether the event occupies only part of the column width due to overlap, whether it occurs today, whether it has a handle, whether it should be grayed out, etc.). However, these class names are minified, so they might not be very stable (though Google Calendar seems to hash them rather than minify them, so they don't change too often, but still). I found a class name that reliably indicates 15-minute events, but I didn't find ones for 30 or 45-minute events, so this probably isn't going to work. Putting class name parsing on ice for now.
Thus, it seems that the only reliable place in the DOM from where I can get the event end time is the accessible label string, which looks like this in EN:
For personal use, my script has been reliably parsing these accessible labels since 2021; the label format for EN did not change in that time. The only issue I had was when an event name included a comma, so I avoided commas in my event names. But the real challenge with parsing accessible labels is going to be internationalization. Here is Chinese, for example:
And to add to the fun, here is Myanmar, with non-Arabic numerals:
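One way to cope with non-Arabic numerals is to map them to ASCII digits before looking for times. The sketch below is only an illustration, not the extension's actual code: `toAsciiDigits` and `DIGIT_ZEROS` are hypothetical names, and the table covers just a handful of digit blocks (it would need to be extended for the other scripts Google Calendar supports).

```javascript
// Code points of the digit zero in a few Unicode decimal digit blocks.
// Each block holds the ten digits 0-9 contiguously.
// Illustrative, not exhaustive.
const DIGIT_ZEROS = [
  0x0660, // Arabic-Indic
  0x06f0, // Extended Arabic-Indic (Persian, Urdu)
  0x0966, // Devanagari
  0x09e6, // Bengali
  0x0e50, // Thai
  0x1040, // Myanmar
  0x17e0, // Khmer
];

// Replace every known non-ASCII decimal digit with its ASCII equivalent.
// Digits from blocks missing in the table (and ASCII digits) are left as is.
function toAsciiDigits(text) {
  return text.replace(/\p{Nd}/gu, (char) => {
    const codePoint = char.codePointAt(0);
    const zero = DIGIT_ZEROS.find(
      (start) => codePoint >= start && codePoint <= start + 9
    );
    return zero === undefined ? char : String(codePoint - zero);
  });
}

// Usage: normalize first, then pull out the runs of digits
const findNumbers = (label) => toAsciiDigits(label).match(/\d+/g) ?? [];
```

With this, a Myanmar time made of the digits U+1041 U+1042, a colon, and U+1043 U+1040 normalizes to `12:30`, and `findNumbers` returns `['12', '30']` for it.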
To minimize the need to deal with parsing the locale-specific string, I will go with this:
More details on the step 3:
Fun fact: 12hr clock sucks (besides the fact that it is harder to parse). It is ambiguous. See https://en.wikipedia.org/wiki/12-hour_clock#Confusion_at_noon_and_midnight
Implementation details:
I was a bit afraid that the accessible label code would be on the back-end, but no, it is all on the front-end. The format can differ between languages.
There is also special logic for tasks, reminders, scheduled slots, proposed events without RSVP, with RSVP, denied events, etc. If I don't find the calendar ID, I think I should just bail out of parsing that event (rather than bail out of parsing the entire page, unless I failed to find the calendar ID for any event) so as to exclude all these non-event entities.
I also found that their DOM elements have a property with virtual DOM data. I was excitedly checking whether it would contain the calendar event data, but no, it just contains the DOM attribute data that I could already access via direct DOM APIs (which would also be more stable than accessing internal data structures).
But, much better, there is a global (gcal):
> gcal.mvd.EL
'{START_DATE_TIME} – {END_DATE_TIME}'
> gcal.kvd.EL
'{FULL_DATE}, {TIME}'
> gcal.LDd.EL
'{DATE}, {START_TIME} to {END_TIME}'
Useful util for exploring the global object:
// Convert a complex object to JSON
stringifyJson = (data, short = true, seen = new Set()) =>
  JSON.stringify(data, (_key, value) => {
    // Primitives are kept as is
    if (value == null || typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean')
      return value;
    // Objects that were already visited are dropped (or marked in the long form)
    else if (seen.has(value)) return short ? undefined : '[RECURSIVE]';
    seen.add(value);
    if (typeof value === 'object') {
      const result = {};
      for (const key in value) {
        if (Object.prototype.hasOwnProperty.call(value, key)) {
          try {
            const objectValue = value[key];
            // Touch a property so that values that cannot be read throw inside
            // this try block; ?. keeps null and undefined values from throwing
            objectValue?.constructor;
            result[key] = objectValue === globalThis ? (short ? undefined : '[GLOBAL_THIS]') : objectValue;
          } catch (err) {
            result[key] = short ? undefined : `[ERROR: ${err.message}]`;
          }
        }
      }
      return result;
    }
    else if (short && typeof value === 'function') return undefined;
    else return value.toString();
  });
// Convert gcal object to a string
stringifyJson(gcal);
// Copy the resulting string into an editor, pretty-print it as JSON, and explore!
And then also this to quickly find interesting keys and values in the resulting object:
resultingObject= {...}
a = new Set()
// Collect long keys and non-trivial string values
d = j => Object.entries(j).forEach(([k, v]) => {
  if (k.length > 10) a.add(k);
  if (typeof v === 'string' && v.length > 4) a.add(v);
  // Arrays are objects too, so this also descends into array elements
  if (typeof v === 'object' && v !== null) d(v);
})
d(resultingObject)
Array.from(a).sort()
And then this to find a path to a string in a deeply nested object:
resultingObject= {...}
l = (j, f, p = []) => Object.entries(j).forEach(([k, v]) => {
  if (v === f) { console.log([...p, k].join('.')); }
  else if (typeof v === 'object' && v !== null) l(v, f, [...p, k]);
})
l(resultingObject, "Charge Apple Watch")
Some findings:
While it has a lot of information I would love to have access to in the extension, as you can see, some of the property access paths are very long, making any reliance on them super fragile.
To fetch JS bundles, Google makes a request like this:
https://calendar.google.com/calendar/_/web/calendar-static/_/js/k=calendar-web.matasync.en.kiV42_DT6Uo.2020.O/am=QAICIAA7Ek8CAAAI/d=0/rs=ABFko3_cKNOcL2ZJoL-KJZ1fYcDgE694mA/m=syw9,syw8,xDNx2e,KHdXW,ZDBS7d,jjykEd,sy19u,sy19v,IiAxCb,s0ef2c,Hkkrld,synd,ws9Tlc,siKnQd,UAyYnd,sy2a,sy2b,ndDKmb,UItRMc,yqBu4c,zWPBS,qadpGd,rIjGQb,gCKuke,WMXaid,wzzigb,NmJjzb,beFWRb,Cc7Sob,RtZYV,B78tCd,ayMid,o83wje,C8yvoe,bUUOIe,sy1lt,KEohkb,EZnnmd,bAxIgb,sy1h2,ttg67c,I8m4he,SzkWee,lQR3Hd,bqZpcb,sy1a7,e8v9gb,MpJwZc,n73qwf,MtLh9c,......
(a very long string like this continues)
After a bit of experimentation, I see that the bundle that includes the time formats is the following:
this.gcal=this.gcal||{};(function(_){var window=this;
try{
_.B("tXMUsb");
var KDd,FDd;_.GDd=function(a,b){return b?FDd.format({GREGORIAN_DATE:a,ALTERNATE_DATE:b}):a};_.KL.prototype.va=function(a,b,c){return c?_.QCd(_.YCd(this,a.month),a.qb):_.QCd(_.XCd(this,a.month),a.qb)};_.ZCd.prototype.va=function(a){return _.aDd(this,a)};_.bDd.prototype.va=function(a){return _.dDd(a)};_.HDd=function(a,b,c){return _.Vx(_.G(a.ha,_.Ox,23),[b,c])};_.IDd=function(a,b){if(!a.ka||!a.ha)return"";b=a.ha(b);return a.ka.ka(b,!0)};_.JDd=new _.Sx("{START_TIME} to {END_TIME}");KDd=new _.Sx("{START_DATE} at {START_TIME} to {END_DATE} at {END_TIME}");
_.LDd=new _.Sx("{DATE}, {START_TIME} to {END_TIME}");FDd=new _.Sx("{GREGORIAN_DATE}, {ALTERNATE_DATE}");_.ML=function(a){_.U.call(this,a.Ha);this.ha=a.service.tb;this.Uq=a.service.Uq;this.vi=a.service.vi};_.N(_.ML,_.U);_.ML.Ga=_.U.Ga;_.ML.Ea=function(){return{service:{Uq:_.LL,vi:_.Sud,tb:_.bx}}};_.NL=function(a,b,c=!1,d=!1){c=c?_.Wud(a.vi,b):_.Uud(a.vi,b);d&&(a=a.Uq,a.ka&&a.ha?(b=a.ha(b),b=a.ka.va(b)):b="",c=_.GDd(c,b));return c};_.MDd=function(a,b,c=!1){return c?_.Vud(a.vi,b):_.Tud(a.vi,b)};
_.NDd=function(a,b){return a.ha.nS()?_.avd(a.vi,b):b.minute===0?_.Zud(a.vi,b):_.$ud(a.vi,b)};_.ODd=function(a,b,c){return KDd.format({START_DATE:_.Tud(a.vi,b.Cb()),START_TIME:_.NDd(a,b.ae()),END_DATE:_.Tud(a.vi,c.Cb()),END_TIME:_.NDd(a,c.ae())})};_.Ct(_.kv,_.ML);
_.C();
}catch(e){_._DumpException(e)}
///// trimmed...
})(this.gcal);
// Google Inc.
Now I can find the bundle for each of the other formatting strings that make up the accessible event label, and then fetch those for each language.
Found the function that formats time:
_.NDd = function(a, b) {
  // if 24hr clock format is enabled, then do 6:30
  return a.ha.nS() ? _.avd(a.vi, b) :
    // else if 0 minutes, do 6am
    b.minute === 0 ? _.Zud(a.vi, b) :
    // else do 6:30am
    _.$ud(a.vi, b)
}
Also, interesting that I found code like this:
Google Calendar languages:
{
"af": "Afrikaans",
"az": "azərbaycan",
"id": "Bahasa Indonesia",
"ca": "Català",
"cy": "Cymraeg",
"da": "Dansk",
"de": "Deutsch",
"en_GB": "English (UK)",
"en": "English (US)",
"es": "Español",
"es_419": "Español (Latinoamérica)",
"eu": "euskara",
"fil": "Filipino",
"fr": "Français",
"fr_CA": "Français (Canada)",
"gl": "galego",
"hr": "Hrvatski",
"zu": "isiZulu",
"it": "Italiano",
"sw": "Kiswahili",
"lv": "Latviešu",
"lt": "Lietuvių",
"hu": "Magyar",
"ms": "Melayu",
"nl": "Nederlands",
"no": "Norsk (bokmål)",
"pl": "Polski",
"pt_BR": "Português (Brasil)",
"pt_PT": "Português (Portugal)",
"ro": "Română",
"sk": "Slovenčina",
"sl": "Slovenščina",
"fi": "Suomi",
"sv": "Svenska",
"vi": "Tiếng Việt",
"tr": "Türkçe",
"is": "íslenska",
"cs": "Čeština",
"el": "Ελληνικά",
"be": "беларуская",
"bg": "Български",
"mn": "монгол",
"ru": "Русский",
"sr": "Српски",
"uk": "Українська",
"kk": "қазақ тілі",
"hy": "Հայերեն",
"iw": "עברית",
"ar": "العربية",
"ur": "اُردُو",
"fa": "فارسی",
"ne": "नेपाली",
"mr": "मराठी",
"hi": "हिन्दी",
"bn": "বাংলা",
"pa": "ਪੰਜਾਬੀ",
"gu": "ગુજરાતી",
"ta": "தமிழ்",
"te": "తెలుగు",
"kn": "ಕನ್ನಡ",
"ml": "മലയാളം",
"si": "සිංහල",
"th": "ภาษาไทย",
"lo": "ລາວ",
"my": "မြန်မာ",
"ka": "ქართული",
"am": "አማርኛ",
"km": "ខ្មែរ",
"zh_HK": "中文 (香港)",
"zh_CN": "中文(简体)",
"zh_TW": "中文(繁體)",
"ja": "日本語",
"ko": "한국어"
}
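The message templates found on the gcal global earlier in this comment, such as '{DATE}, {START_TIME} to {END_TIME}', could in principle be turned into matchers for the accessible label. The following is a minimal sketch under that assumption; `templateToRegExp` and `parseWithTemplate` are hypothetical helpers and the sample label is made up for illustration, neither comes from Google Calendar or from the extension.

```javascript
// Turn a template such as '{DATE}, {START_TIME} to {END_TIME}'
// into a RegExp with one named capture group per placeholder.
function templateToRegExp(template) {
  const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // split() with a capturing group keeps the placeholder names
  // at the odd indexes of the resulting array
  const source = template
    .split(/\{([A-Z_]+)\}/)
    .map((part, index) =>
      index % 2 === 1 ? `(?<${part}>.+?)` : escapeRegExp(part)
    )
    .join('');
  return new RegExp(`^${source}$`, 'u');
}

// Returns the placeholder values, or undefined if the label does not match
function parseWithTemplate(template, label) {
  const match = templateToRegExp(template).exec(label);
  return match ? { ...match.groups } : undefined;
}

// Usage with a made-up label
const parts = parseWithTemplate(
  '{DATE}, {START_TIME} to {END_TIME}',
  'Thursday 20 June, 6:30am to 7am'
);
console.log(parts);
```

The lazy groups split at the first literal separator they meet, so a date or event name that itself contains a comma would be split in the wrong place. That is the same comma problem mentioned earlier in the thread, and one reason a template-driven parser is fragile.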
Looks like, besides the language code, some other language-specific parameters are specified in the URL.
The request for the HTML home page returns a script tag with those parameters included in the URL. Let's extract them:
await fetch("https://calendar.google.com/calendar/u/0/r/week?pli=1", {
  headers: {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:127.0) Gecko/20100101 Firefox/127.0"},
}).then(response => response.text()).then(html => {
  // The base script tag embeds the language-specific part of the bundle URL
  const prefix = '<script id="base-js" src="/calendar/_/web/calendar-static/_/js/k=calendar-web.matasync.';
  const rest = html.slice(html.indexOf(prefix) + prefix.length);
  return rest.slice(0, rest.indexOf('/m=base"')).replace('d=1', 'd=0');
});
Changing settings for each language and extracting that language-specific URL part:
[
"af.fGjYHqi3siQ.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38t2kWSQxCgHMYs21szC-klpIN-Zw",
"az.O9dsS0mL00k.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-GhVrGf7jBQ5RIhiPfOr6qnUv7mg",
"id.aZIa6c4zhPM.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko380p-O1HnMkmfop-u74FXjI_zuBTg",
"ca.O-_1HH8FjL8.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39LoBjXJDwlJ1l8nRa5YWkI5WJAyA",
"cy.q5BBAITHbDs.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38QxBSQisieYXEpfbifjf1FMPZLxQ",
"da.ZjXJzpTk2NQ.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38w7wwVS6vy0FSshnTFRFnaY_-dNw",
"de.2OZiqpfWTHw.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-_aa3qMTXDzk4moqM4tM7XaLgbkw",
"en_GB.KYHrsBTODfk.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_YfrVPQQ7-ugqib4_v0wTk60Oqig",
"en.kiV42_DT6Uo.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39lIyVIeFOxM6CrUuDmwE5CXKPeVw",
"es.WogQQhrZn6E.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_hzhp-cmiD6H-fKZGu-CqL53exkQ",
"es_419.4WrUvkTfE7E.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38XXUigONuWBtjcm3AyGFsIexgWSg",
"eu.FVyXvyQ-vBk.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39lBysf7io9wObOyT8OmBWkucpwFA",
"fil.ZG0z2n6Utsk.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-t2s8VG9NZD6Vosz20v0ou0H1gSw",
"fr.NeL94Vwnkkk.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39_YAnfYSOb96ct-wV94vbSeNaF9Q",
"fr_CA.yDuZVdev-2c.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38VJsMZyrH7O7Boj7BY1cy-NbN1WA",
"gl.m4lqE4mGoLA.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-3-tUHefVRfhBrkw3urg8pNDiobw",
"hr.f_13jldKXQU.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38wFyLVcEIG5Pw2jkVRe48PHmUrAw",
"zu.-Ndbezb-QjE.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38T-CKmgnsz3ZrtPnM0Kc_77ujR-Q",
"it.8xr9VoJDSVQ.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39EMB15h7Uf-WQ0xgELh-fDvwLCsw",
"sw.K_fsoydgg-k.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_-iLzAnxNNypnD8axQnKhqSDC3Bw",
"lv.8beiVA8-HfI.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38Bbob8EJLaffNMqXVlOetbGdt7VQ",
"lt.POBohtPFRKc.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39dNttPvcsiVp1VyO6nwDiUzQC0wg",
"hu.Z45PzMZCOYs.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_RvbhZiA_-yqO1iJfLbYoJj3tqLA",
"ms.iY6PmE8PJw0.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_urRPMNmJVgaAbPkf_h3JGjFW5Eg",
"nl.Wfd3wg2oLAg.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39eKT9DEYKyZROZLmP-mVl0mPDKkQ",
"no._UH1m-jairc.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_2Q01On91v7SPmlSxwNzrXqPk84Q",
"pl.HoTxzj70m44.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko387q6BCo_p4xmzsrpFdRYXNOUzeZw",
"pt_BR.yPVb0sWPdB8.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_2iipauXn_cofC3bPPKJGAs2ADyQ",
"pt_PT.M40ScvnltBc.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-1yyAdr1_yLh2_DAHAzcKvZ1kO1Q",
"ro.o-J6YJwXM_Y.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-7FuKKra7twKFgP7PYjfl13VRkPQ",
"sk.O_XlP1I57Is.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_xzfE5Q5cT9dGECG9K6FuIX9-7nA",
"sl.iVc6mkEVHYE.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_F49K5WjbHkbPQgqywoiK4wIYeyA",
"fi.u4jLcFduYtA.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko383sOE93juaPXTKHbhv4HyFVUeVuQ",
"sv.NZYxm4-5grE.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-cRb50wPQHIRJ4dAzQY6l6WZHxYg",
"vi.jDliExU6ZZc.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-7j_NRqcfBnRm8OdXONu-4_eKZgA",
"tr.YBgq_f4fsAs.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38BGabY2dw0GmHWlefnVzAAfch6iQ",
"is.8qGrdxsnvuA.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-QPANWC2JKRRJhAj757jH-ClfXfA",
"cs.xENm3_vyrB0.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_voHJz8EMeHCYOpulB2SnEKBSAEw",
"el._wUNKxcYJc0.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39SSwcrlGtBeBLsM7raY8Y39bbcbw",
"be.KqarmVCCkNE.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38KE6sQG8SBLfKzMU8b6_K-NtVpoQ",
"bg.TNjdIY1ZhxE.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_zZaCjReXnqgCHwp1NqDOGIK-yag",
"mn.M5V9EhgLRPw.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38UD8OeeMIw0zHIihoR-PBmmx0Csg",
"ru.JM2QQSAXLtM.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39mWI4tqJ2VKFbpk91yLzn30335qQ",
"sr.IAZvMlFG9B0.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38JBKny69eqkRrwLQfYFm4iXSse4A",
"uk.T3edqZOotJc.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38BvNB5B4EE9gP_oY9NFc1yisZX3A",
"kk.dSk0-wa9tpo.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_pvVerfXbVWKNiWrIEAoNIhPwNaw",
"hy.FfLUK2mnymg.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-F98XiXCuizpat1r8O398JTxsJog",
"iw.lPRGc4QfCK0.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39t-lljHsElqM_xSa_R-pnjQmOEDA",
"ar.n7y-oa7O2zo.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko385XGuSt2zpkzb84j7MftY3vM5Dbg",
"ur.hKhfKhqQ738.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko381I65m_-hdJIjKzY0Eq3Q6V_diVQ",
"fa.geNz2OVQpOY.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko399Zg4iV-cYBXO1vpO47s6zSrkdSQ",
"ne.EcGx837Lpp4.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38z64qGvu9I41bOCtvs8ir8A2HZsw",
"mr.K0sef7tGL9I.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-whkOOYFwRV0YuLARrYU9QSh5heA",
"hi.WenmEHoVQr0.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38FQq2hjlcIHfy8Z4PU4YTjQY37mw",
"bn.Wov9pdgoJoE.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-ji-RDFUY-waVtOT4-Q_ICEyjRVQ",
"pa.8c7rkc9tc5s.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38yS8NeN4zyH-Kcb2PUSzMCpKqs6w",
"gu.rL2XGee_j98.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38ePxUJmwyZ3fFqpq8BkjFFEt-1CQ",
"ta.97lPW3Dw-jk.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39LdleOjtQSUJx1CfNVkdhvMvLJzQ",
"te.dcyfQzF_UF4.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38W5zlRXPfKwXSMzm297mW4ZisXuQ",
"kn.zHgUSeh1vOk.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39ih84cJ2WFl1CnSupT_FsQQpoaiw",
"ml.6OxWWkPNjzI.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38XISnQNXKkU0zXRrisP05JS2323A",
"si.wtLZozlWq5I.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3_d0cdzuK1-ncLdp7TTypnsv9-4rg",
"th.hZ4ZlMcBHEw.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-7dTPSxa-cS3Zdp0CbKHLm50LN0g",
"lo.yq1i_m3Dfzc.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39xwaGqdjxuomgsJICyTzyDRPuQZg",
"my.KnBiGO1WoOc.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-0H-R5e6La5iRfKmCyzA6dqhQbWg",
"ka.2TtkghregFM.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38eAq0Sl8Qcv2OJoAFKj7omFQAxaw",
"am.9IRwIRcj-uw.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko3-Qogi8tVhTPCo6z5Y01qJFEyOzhQ",
"km.l-4sjqqxBdY.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko38A8e4XHgrE3DtOevCYw8XetVZRsA",
"zh_HK.Cq4_R4Oqa8c.2020.O/am=QAICAAArEk8CAAAE/d=0/rs=ABFko38iW_FtC5whx39xQTbzElvVG_W1kA",
"zh_CN.37fgdVobyrM.2020.O/am=QAICAAArEk8CAAAE/d=0/rs=ABFko3_vfz9WqfSlyyv061gvr_AFSrcNng",
"zh_TW.K7JItRhZf3s.2020.O/am=QAICAAArEk8CAAAE/d=0/rs=ABFko38r62uPGXG9Dj8CZfuB_mlunf53SA",
"ja.GIKsEBuWpIE.2020.O/am=QAICAAArEk8CAAAE/d=0/rs=ABFko3_vXk3Mt69gwOJzP_WILuq-LdTt0A",
"ko.tKvKcd-IL0I.2020.O/am=QAICAAArEk8CAAAE/d=0/rs=ABFko3-apa7LgjbDQq0_LqcLfMkjga5awA"
]
Found that the page makes this request when I re-open the tab after it has been in the background:
which responds with:
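To compare the language-specific URL parts listed above, they can be split into fields and grouped by their `am=` value. This is a rough sketch: `parseBundleKey` and `groupLanguagesByAm` are hypothetical helpers, and the field names (`hash` in particular) are my guesses at what each segment means, not something confirmed by Google's code.

```javascript
// Split one entry such as
// "en.kiV42_DT6Uo.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39l..." into fields.
// Returns undefined if the entry does not have the expected shape.
function parseBundleKey(entry) {
  const match =
    /^([^.]+)\.([^.]+)\.[^/]*\/am=([^/]+)\/d=(\d+)\/rs=(.+)$/.exec(entry);
  if (!match) return undefined;
  const [, language, hash, am, d, rs] = match;
  return { language, hash, am, d, rs };
}

// Build a Map from each distinct am= value to the languages that use it
function groupLanguagesByAm(entries) {
  const groups = new Map();
  for (const entry of entries) {
    const parsed = parseBundleKey(entry);
    if (!parsed) continue;
    const languages = groups.get(parsed.am) ?? [];
    languages.push(parsed.language);
    groups.set(parsed.am, languages);
  }
  return groups;
}

// Usage with two entries copied from the list above
const groups = groupLanguagesByAm([
  'en.kiV42_DT6Uo.2020.O/am=QAICAAArEk8CAAAI/d=0/rs=ABFko39lIyVIeFOxM6CrUuDmwE5CXKPeVw',
  'ja.GIKsEBuWpIE.2020.O/am=QAICAAArEk8CAAAE/d=0/rs=ABFko3_vXk3Mt69gwOJzP_WILuq-LdTt0A',
]);
console.log(groups);
```

Run over the full list, this should show two groups: the zh_HK, zh_CN, zh_TW, ja and ko entries carry an `am=` value ending in `AAAE`, while the remaining entries end in `AAAI`.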
Rather than reading the JavaScript source code for each language to see what formatting it uses, I looked in the code for the kinds of formatting differences to expect between languages, and simply scraped the aria labels and DOM times for every one of Google Calendar's 72 languages, each in both the am/pm and the 24-hour clock style.
A few edge cases to handle:
Then I had a great observation: I don't actually need to know if a given time is am or pm! That's because I have access to the DOM, so I can infer the following from it:
With this, the task is simplified to merely "finding numbers in strings", rather than "finding times that could be written in am/pm in any one of 72 languages, where more languages could be added at any point without notice, causing your extension to break".
Still a bit tricky because:
With this, I am able to parse all event label strings in all languages, even in am/pm, with almost zero reliance on locale-specific behavior. In particular, I am making two small assumptions:
All of this is supplemented with test cases that verify parsing is correct in every one of the 72 languages in am/pm. The only risk now is that Google will change the formatting on their side. But for that and other things I need #236 anyway, and that would be crucial in making my extension more stable.
The other change I made today is that, on my machine only, if the extension logs any warning or error, it is also printed on the screen. So if a bug occurs while I am just passively using the extension rather than developing it, I will still notice it (i.e. if DOM parsing starts failing, I will be notified quickly, at least in the case of EN, but my parsing is now pretty much language-independent). In any case, as soon as a failure occurs, I fall back to API calls for the duration of the session.
All of the above is now implemented and functioning well. With some refactoring, it weighs 1000 lines of code in total, plus 5.3k lines of text fixtures.
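The "fall back to API calls for the duration of the session" behavior could look roughly like the sketch below. It is an illustration under assumptions, not the extension's code: `createEventSource`, `parseFromDom`, `fetchFromApi` and `onWarning` are all hypothetical names standing in for the real implementations, which are not shown in this thread.

```javascript
// Wrap a DOM parser and an API fetcher into one function that prefers the DOM
// and switches to the API for the rest of the session after the first failure.
function createEventSource(parseFromDom, fetchFromApi, onWarning = console.warn) {
  let domParsingFailed = false;
  return async function getEvents(week) {
    if (!domParsingFailed) {
      try {
        const events = parseFromDom(week);
        if (events !== undefined) return events;
        throw new Error('DOM parser returned no result');
      } catch (error) {
        // Remember the failure so that later calls skip the DOM parser
        domParsingFailed = true;
        onWarning('DOM parsing failed, falling back to API', error);
      }
    }
    return fetchFromApi(week);
  };
}

// Usage with stub implementations
const getEvents = createEventSource(
  (week) => [{ week, name: 'from DOM' }],
  async (week) => [{ week, name: 'from API' }]
);
getEvents('current').then((events) => console.log(events));
```

Keeping the failure flag in a closure means it resets whenever the page reloads, which matches a per-session fallback. The warning callback is the natural place to hook in the on-screen error display described above.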
Given that I have 14 calendars, this requires 14 API requests for each week. This is inefficient.
Instead, I should consider scraping the data from the webpage. It may be bug-prone, but would be good enough for several features: