Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memoize to reduce web requests to external sites like GitHub. #1101

Merged
merged 2 commits into from
Oct 1, 2015

Conversation

Thorium
Copy link
Member

@Thorium Thorium commented Sep 30, 2015

I had some problems with GitHub:
System.Net.WebException: The remote server returned an error: (403) Forbidden.
"API rate limit exceeded"
So after this it shouldn't make duplicate network calls when resolving versions inside one Paket-command (like paket outdated or paket update).

@forki
Copy link
Member

forki commented Sep 30, 2015

did you see where it repeatably calls the same url?

@Thorium
Copy link
Member Author

Thorium commented Sep 30, 2015

No, I didn't try to track with Fiddler.

If I blindly add counter to this memoize function, it will be hit 239 times with this paket.dependencies:

    source https://www.nuget.org/api/v2

    nuget FAKE
    nuget FSharp.Formatting
    nuget FSharp.Data
    nuget Microsoft.AspNet.SignalR.Core
    nuget Microsoft.AspNet.WebApi.OwinSelfHost
    nuget Microsoft.AspNet.Identity.Owin
    nuget Microsoft.Owin.Cors
    nuget Microsoft.Owin.StaticFiles
    nuget Microsoft.Owin.Diagnostics
    nuget Microsoft.Owin.Security.Facebook
    nuget Microsoft.Owin.Security.Google
    nuget Owin.Security.AesDataProtectorProvider
    nuget MySql.Data
    nuget Rx-Main
    nuget SQLProvider
    nuget SourceLink.Fake

    github zurb/bower-foundation css/foundation.css
    github zurb/bower-foundation css/normalize.css
    github zurb/bower-foundation js/foundation.min.js
    github SignalR/bower-signalr jquery.signalR.js
    github aFarkas/html5shiv dist/html5shiv.min.js
    github reactjs/react-bower react.js
    github FortAwesome/Font-Awesome css/font-awesome.min.css
    github FortAwesome/Font-Awesome fonts/fontawesome-webfont.eot
    github FortAwesome/Font-Awesome fonts/fontawesome-webfont.svg
    github FortAwesome/Font-Awesome fonts/fontawesome-webfont.ttf
    github FortAwesome/Font-Awesome fonts/fontawesome-webfont.woff
    github FortAwesome/Font-Awesome fonts/fontawesome-webfont.woff2
    github danielm/uploader src/dmuploader.min.js
    github danielm/uploader demos/js/demo-preview.js
    github booncon/slippry dist/slippry.css
    github booncon/slippry dist/slippry.min.js
    github booncon/slippry images/arrows.svg
    github booncon/slippry images/sy-loader.gif
    github lodash/lodash lodash.min.js

    http http://ajax.aspnetcdn.com/ajax/jQuery/jquery-2.1.4.min.js jquery.min.js
    http https://code.jquery.com/ui/1.10.4/jquery-ui.min.js
    http https://code.jquery.com/ui/1.10.4/themes/sunny/jquery-ui.min.css

239 requests are not so much but if those (or like 20% of them) are happening at exact the same time to the same web-site, the firewall may not like that. I assume that this .NET concurrent dictionary will not just remove the dublicate calls, but also reduce the parallel request count by slowing down the process a few milliseconds (as it syncs between the threads) which may also help GitHub to not block the user.

@forki
Copy link
Member

forki commented Sep 30, 2015

For me that sounds like you found another perf issue. Will investigate
tomorrow. Can't believe it's correct to make so many calls.
On Sep 30, 2015 8:13 PM, "Tuomas Hietanen" notifications@github.com wrote:

No, I didn't try to track with Fiddler.

If I blindly add counter to this memoize function, it will be hit 239
times with this paket.dependencies:

source https://www.nuget.org/api/v2

nuget FAKE
nuget FSharp.Formatting
nuget FSharp.Data
nuget Microsoft.AspNet.SignalR.Core
nuget Microsoft.AspNet.WebApi.OwinSelfHost
nuget Microsoft.AspNet.Identity.Owin
nuget Microsoft.Owin.Cors
nuget Microsoft.Owin.StaticFiles
nuget Microsoft.Owin.Diagnostics
nuget Microsoft.Owin.Security.Facebook
nuget Microsoft.Owin.Security.Google
nuget Owin.Security.AesDataProtectorProvider
nuget MySql.Data
nuget Rx-Main
nuget SQLProvider
nuget SourceLink.Fake

github zurb/bower-foundation css/foundation.css
github zurb/bower-foundation css/normalize.css
github zurb/bower-foundation js/foundation.min.js
github SignalR/bower-signalr jquery.signalR.js
github aFarkas/html5shiv dist/html5shiv.min.js
github reactjs/react-bower react.js
github FortAwesome/Font-Awesome css/font-awesome.min.css
github FortAwesome/Font-Awesome fonts/fontawesome-webfont.eot
github FortAwesome/Font-Awesome fonts/fontawesome-webfont.svg
github FortAwesome/Font-Awesome fonts/fontawesome-webfont.ttf
github FortAwesome/Font-Awesome fonts/fontawesome-webfont.woff
github FortAwesome/Font-Awesome fonts/fontawesome-webfont.woff2
github danielm/uploader src/dmuploader.min.js
github danielm/uploader demos/js/demo-preview.js
github booncon/slippry dist/slippry.css
github booncon/slippry dist/slippry.min.js
github booncon/slippry images/arrows.svg
github booncon/slippry images/sy-loader.gif
github lodash/lodash lodash.min.js

http http://ajax.aspnetcdn.com/ajax/jQuery/jquery-2.1.4.min.js jquery.min.js
http https://code.jquery.com/ui/1.10.4/jquery-ui.min.js
http https://code.jquery.com/ui/1.10.4/themes/sunny/jquery-ui.min.css

239 requests are not so much but if those (or like 20% of them) are
happening at exact the same time to the same web-site, the firewall may not
like that. I assume that this .NET concurrent dictionary will not just
remove the dublicate calls, but also reduce the parallel request count by
slowing down the process a few milliseconds (as it syncs between the
threads) so that may also help GitHub to not block the user.


Reply to this email directly or view it on GitHub
#1101 (comment).

let cache = ConcurrentDictionary<(string * obj) option,Lazy<obj>>()
let memoizeConcurrent (caller:string) (f: ('a -> 'b)) =
fun (x :'a) ->
(cache.GetOrAdd(Some (caller, x|>box), lazy ((f x)|>box)).Force() |> unbox) : 'b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the lazy ((f x) |> box)).Force()?
why not (f x) |> box)?

Why is the key (string * obj) option and not just (string * obj)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure it's really caching?

@Thorium
Copy link
Member Author

Thorium commented Oct 1, 2015

Lazy is due to the reason that ConcurrentDictionary is locking the threads during the update-action. So lazy will cause threads to be not locked while f is executed.

Just tuple for the key should do well also. I think that the limitation is just that key has to be IEquatable. I just took the simplest and best memoize from fssnip.net to be the base implementation and improved that.

You can test this code with Tomas Petricek's Fibonacci, just replace his non-threadsafe memoize command with memoizeConcurrent: http://www.fssnip.net/8P

@forki forki merged commit 29fa857 into fsprojects:master Oct 1, 2015
@forki
Copy link
Member

forki commented Oct 1, 2015

I was able to understand the root cause.
your deps file specifies multiple files of the same repo. so we look up the hash for every file. that's always the same hash for files of the same repo -> we can cache that.

I changed your strategy to only cache the hashes (and possible remote dependencies files). Please review ad109f3

@Thorium
Copy link
Member Author

Thorium commented Oct 1, 2015

A bit more verbose code.
But as long as I'm not banned by GitHub, I'm happy.
So this is also fine.

@forki
Copy link
Member

forki commented Oct 1, 2015

yeah. and it doesn't do full memoization. I don't really like that because it's harder to explain later. It's "just" cache lookup.

Anyways it saves you 19 github calls per update.

forki added a commit that referenced this pull request Oct 1, 2015
…es in order to reduce stress on API limit - references #1101
@forki
Copy link
Member

forki commented Oct 1, 2015

I was able to reduce the calls by checking if we already downloaded the corresponding dependencies file in the right version. please review f402345

@forki
Copy link
Member

forki commented Oct 1, 2015

(nice side effect it's also much faster for the second update)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants