Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

that setInterval #4

Closed
titoBouzout opened this issue Feb 14, 2019 · 41 comments
Closed

that setInterval #4

titoBouzout opened this issue Feb 14, 2019 · 41 comments
Assignees
Labels
bug Something isn't working Fixed v1.0.4

Comments

@titoBouzout
Copy link

Hello, thanks!,

Im very curious whats going on with the setInterval here https://github.com/anywhichway/nano-memoize/blob/master/src/nano-memoize.js#L60

Is called non-stop, really?

@titoBouzout
Copy link
Author

The more functions you memo the more is it called......

@anywhichway
Copy link
Owner

anywhichway commented Feb 15, 2019 via email

@titoBouzout
Copy link
Author

titoBouzout commented Feb 15, 2019

Im just trying to be constructive ok,

Is causing issues with my brain, and that's were the real code gets executed.

why this needs to be called every 0.001 seconds? like in 1000 times per second, seriously?

do we really need a setInterval(whatever, 1) ? can this be set in a less intrusive way?

From a technical point of view I can imagine:

  • battery ?
  • browsers limiting setInterval because we do dumb stuff like this ? xD
  • I can't really grasp the idea of the 1sec/1000 times , Im sorry

Can we also set a "global cleanup" check? because as you may know, if we set 10 MEMO function, then we gonna do 10k per second! !!!!!!!

Im kinda sort of mad because I believe ( I dont even remember) came here because of a Medium article "ohhhh we the best at memoize look our benchmarks" and you benchmarking the most dumb things and not even paying attention at how the code runs.......... so yeah..... wth is this ?

@anywhichway
Copy link
Owner

anywhichway commented Feb 15, 2019 via email

@titoBouzout
Copy link
Author

Thank you, I would like to apologize for the way I wrote, Im sorry.

@anywhichway
Copy link
Owner

anywhichway commented Feb 16, 2019 via email

@titoBouzout
Copy link
Author

I can think of :

  1. just setting a setTimeout(cleanup, maxAge) when the memoize function is setup and has a maxAge option set.

I understand this has the downside that some objects will be very short lived, as a value set 100ms ago will be deleted if a timeout to cleanup runs, even if the timeout is set as high as 10 seconds, ex: if the value is saved at the 9.900sec mark it will be just deleted once we reach 10sec

To fix this we can save Date.now() also with the value. Then the cleanup function compares current Date.now() with the Date.now() saved within the value.

I believe the total point of this library is to reuse values and avoid computing expensive stuff. The point of maxAge I believe is so we don't leak or hold data forever, I mean is not time critical.

So I think is worthy to use the Date.now() approach because objects will live longer and avoid recomputing stuff. Even if it makes it slightly slower when maxAge used it will still be faster because an expensive function will be reused.

  1. I was also thinking of a global interval, but this approach has many downsides. Stores for memoized functions should be global which will require special cleanup to not leak functions that are no longer used, Cleanup function will have to iterate over all the values of all the memoized functions set. Nevermind this.

Thanks!

@anywhichway anywhichway self-assigned this Feb 16, 2019
@anywhichway anywhichway added bug Something isn't working Fixed v1.0.2 labels Feb 16, 2019
@anywhichway
Copy link
Owner

@titoBouzout Fixed. Thanks for the assistance!

@titoBouzout
Copy link
Author

ok lol. Just for the record of anyone reading, this is not fixed, he just moved the setInterval to the bottom and set the time of the interval in the variable expireInterval which still values 1. So nothing changed besides a bunch of variables renames. a5d42ae#diff-168726dbe96b3ce427e7fedce31bb0bcR44

@anywhichway
Copy link
Owner

anywhichway commented Feb 16, 2019 via email

@anywhichway
Copy link
Owner

@titoBouzout I have verified the fix is correct. If you wish to verify yourself, insert a trace statement below line 94 of src/nano-memoize.js and re-run the test suite. You will see the trace statement 4 times, but the functions are called far more frequently.

if(expireInterval) { console.log("Creating interval for " + fn); f.interval = setInterval(() => { // process key chngs out of cycle for speed for(const p in c) { if(maxAge) { tmout(c[p]); } delete c[p]; } },expireInterval); }

@titoBouzout
Copy link
Author

titoBouzout commented Feb 16, 2019

If Im reading correctly, the interval was just moved but basically runs in the same place. I think you just skipping the setInterval for when the function does not have an expiration value, which is an optimization yes. But still the setInterval with 1 millisecond is there. If all my memoized functions have a maxAge value, then the situation didn't change.

Ill try to be more clear, you should never ever do setInterval(fn, 1). Find for other work around.

Lets do the math,

for(var a =0;a<100;a++){
for(var b =0;b<100;b++){
nanomemoize(function(){}, {maxAge:50000})
}
}

That situation is entirely possible if you have a lot of complicated tasks and you trying to speed up the complete thing, you do this keeping in mind that nanomemoize should just return the cached function or compute the new value. But nanomemoize does more than that..... it sets a lot of setIntervals that consume a massive amount of CPU for nothing productive.

So if you have 100x100=10,000 setIntervals of 1 millisecond, translating this to a second you have 10,000x1,000 = 10,000,000 functions calls per second, and the setTimeout has a loop with at least 2 calls more and sometimes 3, so this is 30millon. This leaves little margin for doing anything else and totally miss the point of whats trying to achieve.

You can end on a situation like this if your code base is big and have lots of loops. Or lets say you have 100 functions to optimize and then you have a daily cron job, so in 100 days you are in this situation Im describing. Specially because the setInterval is not cleared.

I have a suggestion, run my code in the browser and look how the CPU tops at 100%(is doing nothing, just running intervals). Then run the same code again but with 10x10 and see how the CPU does not move from around 30%(again doing nothing, just running intervals).

10x10=100 setIntervals of 1 millisecond, translating this to a second you have 100x1,000 = 100.000 functions calls per second or 300.000 in the worst case.

Im not trying to discourage you, you also been very tolerant, but this point of the interval is off and for some reason you not seeing it. I should have more patient, no excuses, Im trying to help.

I described a solution that will fix your problem and in the morning wrote a proof of concept. Basically:

  • the setInterval is not set unless the memoized function is called
  • when is called we check if has an expiration value, if does, then check if the setInterval was already been added, if wasn't added then just add it
  • when the setInterval runs, any expired item is deleted, if no items are cached then the setInterval is removed
var memo = (function() {
	function serialize(o) {
		return JSON.stringify(o, _serialize)
	}

	function _serialize(k, v) {
		if (typeof v == 'function') {
			return 'Function ' + v.name
		} else {
			return v
		}
	}

	return function(fn, expires) {
		const f = function(cache, serialize, expires, setInterval, fn, ...args) {
			const k =
				args.length == 1 &&
				(!args[0] ||
					typeof args[0] === 'string' ||
					typeof args[0] === 'number' ||
					typeof args[0] === 'boolean')
					? args[0]
					: serialize(args)

			if (cache[k]) {
				cache[k].n = Date.now()
 			} else {
				cache[k] = { v: fn(...args), n: Date.now() }
 			}
			expires && !this.timeout && setInterval()
			return cache[k].v
		}
		f.cache = {}

		f.expires = expires
		if (expires) {
			f.setInterval = function() {
				this.timeout = setInterval(
					function(expires) {
						const n = Date.now()
						var all_expired = true
						for (var k in this) {
							if (n - this[k].n > expires) {
								delete this[k]
							} else {
								all_expired = false
							}
						}
						if (all_expired) {
							f.clearInterval()
						}
					}.bind(this.cache, this.expires),
					this.expires
				)
			}.bind(f)

			f.clearInterval = function() {
				clearInterval(this.timeout)
				this.timeout = false
			}
		}

		return f.bind(f, f.cache, serialize, expires, f.setInterval, fn)
	}
})()

@anywhichway anywhichway reopened this Feb 16, 2019
@anywhichway
Copy link
Owner

@titoBouzout I am going to re-open this issue until we agree it is resolved.

I had never considered the case of memoizing hundreds of functions. Indeed using an interval with more than a few functions would kill the CPU.

Your most recent explanation of a solution is quite clear and after I read your first few lines about the many function memoization your approach had already popped into my head by the time I read it.

This will not be hard to do. Give me a couple of hours.

@anywhichway
Copy link
Owner

anywhichway commented Feb 16, 2019

@titoBouzout I think the latest build v1.0.2 addresses all your concerns. And, it turns out setInterval is not even needed once one focuses on the calling of the memoized function not the creation of the memoized function. It can all be done with setTimeout. Also, the code is now 25% smaller and will use far less RAM for caching at runtime.

Please close this issue if you think it is resolved.

Thank you for bringing this important flaw in the code to my attention.

@titoBouzout
Copy link
Author

OK thank you, sorry for not presenting the issue in the best way, have a nice day!

@popbee
Copy link

popbee commented Feb 25, 2019

Hey! First thanks for doing all this work, appreciated!
Now I think I see an issue with the latest maxAge/setTimeout implementation. 🤔

If I read the code correctly setTimeout() is being called on ALL cache hits and cache misses.

The use case of using a quick memoizer is to call an expensive function many times (1000+). That means even if my cache contains just 5 elements, I will have 1000+ timeouts racing concurrently to delete those 5 keys.
If I continue accessing the cache, it will have the net effect of recreating the entries all the time anyways since the delete() will get called as often as I accessed it (albeit delayed of maxAge ms). This is not very good IMHO.

So to fix this without changing too much stuff, what about setting up the setTimeout on cache misses only? I know, it won't refresh the TTL on cache hits, but still improves overall and keeps the "happy path" fast(er).

Here I dropped the chng param and created a "d" (delete) function that gets called on all cache misses right before calling the expensive function (using a comma operator). Even if the timeout is 1ms and the function takes 10 seconds to run, the timeout callback will not fire before exiting the JS stack. Now setTimeout appears only once in the code so potentially even saving a few bytes(?).

      d = (s, k) => setTimeout(() => delete s[k], maxAge);
    // for single argument functions, just use a JS object key look-up
    function sngl(f, s, serializer, arg) {
      // strings must be stringified because cache[1] should not equal or overwrite cache["1"] for value = 1 and value = "1"
      const key = !arg || typeof arg === 'number' || typeof arg === 'boolean' ? arg : serializer(arg);
      return s[key] || (d(s, key), s[key] = f.call(this, arg));
    }

I didn't test this, so to be taken with a grain of salt.

@anywhichway anywhichway reopened this Feb 26, 2019
@anywhichway
Copy link
Owner

@popbee You are onto something here. Thanks for the feedback. I like your approach! Will try it out tonight.

@anywhichway
Copy link
Owner

@popbee I implemented your approach. Performance tests show the same speed, but they don't really evaluate the situation you describe. Your analysis of the code made perfect sense. I needed two different cache expiration functions because there are two types of possible cache. Also, it is arguable that the cache hits should not extend the life of a cache value because it could be the cache is actually tied to an underlying sensor that needs to be re-sampled.

@popbee
Copy link

popbee commented Feb 27, 2019

Great stuff! Surprising that's the same speed. I would have expected faster on cache hits.

Anyhow, I am happy that you improved your library.

Note: I had only a single setTimeout function because it was parametrized to delete from the appropriate cache. For the "multiple" case, it was called with v instead of s. But again I didn't test this, so I could be plain wrong.

@euroclydon37
Copy link

Unfortunately, I'm going to have to take a different approach because all timers traverse the js bridge in React Native. This isn't usable in its current form.

@euroclydon37
Copy link

Turns out nearly all memoization libraries use timers in some way. We're not using these libraries for caching responses or anything. We literally just want the last result until args change.

@anywhichway
Copy link
Owner

anywhichway commented Mar 11, 2019 via email

@anywhichway
Copy link
Owner

@euroclydon37 Avoiding timers was trivial. A new version, 1.0.5, has been pushed. Just set the start-up option for maxAge to Infinity.

@popbee
Copy link

popbee commented Mar 12, 2019

@euroclydon37 For a single entry memoize for react purposes, memoize-one is the way to go.

@popbee
Copy link

popbee commented Mar 12, 2019

@anywhichway I do not understand why you added the Infinity option as you already had the feature with undefined and/or 0. You just made your library larger(?)

@anywhichway
Copy link
Owner

@popbee ... undefined and/or 0 would have worked, but if was also possible to set a timeout of Infinity which would have produced a whole bunch of non firing timeouts ... talk about a memory leak!

@popbee
Copy link

popbee commented Mar 20, 2019

I suspect Infinity does not work with setTimeout(). We should try it. To me this project's goal was extreme speed and size but I am not sure this is still true.(?)

@anywhichway
Copy link
Owner

anywhichway commented Mar 20, 2019

@popbee You can successfully use setTimeout with Infinity in V8 and it returns a timeout id. What it will do is questionable. We would have to ask the V8 team. When the raw file is compressed using the best compression tools, a combination of jscompress and txtwizard, without putting webpack or browserify in the middle the size is less than 600 bytes and is still more than 300 bytes less than other libraries that even come close is size, speed and functionality. A trivial memoizer that does not perform can be probably be written in less 250 bytes. Also, I found a few other characters to remove, so adding Infinity did not increase the size.

The goal is still extreme speed and extremely small.

If you can find other opportunities to reduce the size or increase the speed I would like to know what they are so I can incorporate them.

@popbee
Copy link

popbee commented Mar 21, 2019

I just ran a few tests with setTimeout in my Google Chrome version 72.0.3626.121 and only undefined has an unknown behavior. We could try in other browsers.

  • 1000 -> fires after ~1 second (as expected)
  • 0 -> fires immediately (as expected)
  • null -> fires immediately
  • undefined -> gets a number but does not appear to fire (not sure what is happening in that case)
  • Infinity -> fires immediately
  • NaN -> fires immediately
  • -1 -> fires immediately
  • "" -> fires immediately (an empty string which I believe gets coerced into the number 0)
  • Number.MAX_SAFE_INTEGER -> fire immediately (this is 9007199254740991)
  • Number.MAX_VALUE -> fire immediately (this is 1.7976931348623157e+308)
  • 2147483647 -> appear to work as expected (supposedly 24+ days)
  • 2147483648 -> fires immediately

@anywhichway
Copy link
Owner

@popbee Fascinating ... so what do you think the legit value for maxAge should be? I am a little reluctant to change it since is would technically require deprecation activity. Note, I just tested setTimeout(() => console.log("fired"),undefined) and it fires immediately. maxAge = 0 would seem to imply it can never be cached whereas Infinity would seem to imply cache forever. In retrospect, I think I should have chose that to start with.

@popbee
Copy link

popbee commented Mar 21, 2019

On terms of size savings, I have lots of ideas (one you disregarded in this very thread with calling settimeout once).

Gzip / Brotli size is important but parsing time is a great deal too (which depends on uncompressed size). Especially true with react native.

But the trends these days is to build libraries that are tree shaking compatible so you do not pay for what you do not use. In your case, it would be quite a profound refactor for the small savings.

@anywhichway
Copy link
Owner

@popbee I made the change to call setTimeout once on arg changes only:

// set chng timeout only when new value computed, hits will not push out the tte, but it is arguable they should not
	return s[key] || (chng(key),s[key] = f.call(this, arg));

and also line 46

True on the parsing side, but not much to parse just as like there would not be be much of a tree to shake. I could imagine a design that would shake out to single or multiple args, but it seems there would be barely a program it would apply to. And I suppose if a program never called clear, it could be dispensed with. However, at this level of micro optimization, I would be more inclined just to fork the code and take out the parts I did not need.

I have not gone back and looked at the code in the context of the parameterizing comment you made though 23 days ago. I will try and get to that this weekend. It might get 8 to 10 bytes.

At this point the exercise for me has become more one of being able to continually revisit something and see the possibilities for improvement. I doubt anyone would see much of a production impact.

I appreciate the ongoing engagement.

@anywhichway
Copy link
Owner

@popbee, titoBouzout You really should get some credit for the contribution you have made to improving this library. Even a minor change to the README via a pull request would push you onto the contributors list (assuming you want to be there). Otherwise, with your permission, I will just manually reference you in the README.

@popbee
Copy link

popbee commented Mar 22, 2019

Yes, I know you did that change. It's the other parts I was refering to. (I see the word setTimeout twice in the source). But it's OKay.


You can mention me as you wish. No need to be on the contrib list, but thanks for asking. :)

I was wondering if you update your size/speed tables when you make changes? Unless this is automated, I would imagine being a bit tedious to do so...

True that on a big project, your library is really a tiny drop. BUT on a tiny project (something that holds into a small <script> block in an .html page or other special injection of javascript then every byte counts). Also for React Native specifically, AFAIK, the is NO JIT and the engine version is quite old. So it's 100% interpreted javascript -- like we are back in the 1990s! - and we often run on low-powered phones (And I am not talking about network speeds) - this is really a different experience than with Google-Chrome-on-high-powered-desktops. The startup time tends to be slow.

Disclaimer: I can be a little bit over-the-top on performance in general so I tend to look at extreme libs. Since most of your readme file is about speed and size, this is an obvious "concern" of this library, hence my humble involvement.

regards,
Bernard

If I have some time, I'll try to put down some of the ideas I have. No promises though.

@anywhichway
Copy link
Owner

@titoBouzout @popbee Woke up inspired this AM and realized there were a bunch of optimizations I could do. They are documented in the change log at the end of the README. In short, the code is down to 550 bytes Brotli compressed and 1,054 bytes just minimized.

The code is effectively the same speed as fast-memoize for single argument functions and micro-memoize for multiple argument functions and way smaller than both, making it the smallest and fastest. It is also slightly more readable than it was in earlier version.

If has been quite fun squeezing ever more characters out and making it faster and more efficient. Thanks for all your help. I added your names to the README.

@popbee
Copy link

popbee commented Mar 25, 2019

Glad you got that sparkle of inspiration! 👍

Something to keep in mind: local variables and anything not visible "outside" will be minified to 1 character anyway so I tend to keep those names "readable" knowing it will get optimized. Same goes for details like parenthesis etc. The minifier will also take care of anything "weird" like true converted to !0 or 1000 to 1e3.

@anywhichway
Copy link
Owner

anywhichway commented Mar 25, 2019 via email

@anywhichway
Copy link
Owner

@titoBouzout @popbee I was/am really grateful to the contributions you made to nano-memoize last year. With the hope you might provide sharper eyes than mine, I point you to another optimization library library I have updated: https://www.github.com/anywhichway/intersector. Sorry for the spam if you don't care to look. Just ignore.

@titoBouzout
Copy link
Author

Sure no problem :)

@popbee
Copy link

popbee commented Jul 15, 2020 via email

@anywhichway
Copy link
Owner

@popbee ... arg ... cat stepped on keyboard just as I was committing after running tests ... guess it's time for server CI. Did not have this problem prior to COVID ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Fixed v1.0.4
Projects
None yet
Development

No branches or pull requests

4 participants