blog/website: choose the new hosting #992

Closed
iAdramelk opened this issue Feb 13, 2020 · 60 comments
Labels
A: website Area: website p1-important Active priorities to deal within next sprints status: research Writing concrete steps for the issue type: discussion Requires active participation to reach a conclusion. website: eng-doc DEPRECATED JS engine for /doc

Comments

@iAdramelk
Contributor

iAdramelk commented Feb 13, 2020

UPDATE: See summary of options in #992 (comment)

This is a ticket to discuss and compare possible solutions, based on the criteria listed below.

We plan to convert dvc.org to gatsby and merge it with the blog. Right now, dvc.org is hosted on Heroku, and the blog is hosted on Netlify. We need to choose one hosting that will work for both of those services' needs.

What we want:

API endpoints

dvc.org and blog both use some custom API endpoints to fetch and transform data from github and discourse. We also need to be able to cache the results of these requests because they are not very fast and shouldn't be updated more often than once every 15 minutes.

Right now, this is implemented as a Node.js server on Heroku with an in-memory cache, but we wouldn't necessarily have a server after we migrate to gatsby. Also, our current implementation has a problem: for the first user who tries to access the page before the cache is created, load time can be quite significant (~10s). Ideally, we should update the cache ourselves with something like cron and always send cached results to all of our users.

To solve this, we can use Netlify functions, Cloudflare workers, or something else.
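As a rough illustration of the "cron fills the cache, users only read it" idea above, here is a minimal sketch. The helper name `createCache` and its shape are hypothetical, and the 15-minute TTL is only wired in by the caller; it is not taken from the current dvc.org code.

```javascript
// Hypothetical helper: a tiny cache that is written by a scheduled job
// and only ever read by request handlers.
function createCache(ttlMs, now = Date.now) {
  let entry = null; // { value, storedAt } once the first refresh has run

  return {
    // Called by the scheduled job after it has fetched fresh data.
    set(value) {
      entry = { value, storedAt: now() };
    },
    // Called by request handlers; never triggers a slow upstream fetch.
    get() {
      return entry ? entry.value : undefined;
    },
    // Lets the scheduled job decide whether a refresh is due.
    isStale() {
      return entry === null || now() - entry.storedAt >= ttlMs;
    },
  };
}
```

With this split, a stale value is still served while the scheduled job fetches a new one, so no visitor waits on the upstream API.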

Redirects

We have a large list of redirects that we need to support: https://github.com/iterative/dvc.org/blob/master/redirects-list.json

New hosting should allow it.
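To make the requirement concrete, here is a small sketch of regex-based redirect matching. The rule shape (`from`, `to`, `status`) and the example rules are assumptions for illustration only; the real redirects-list.json may use a different format.

```javascript
// Hypothetical rule shape; not copied from redirects-list.json.
const exampleRules = [
  { from: '^/docs/(.*)$', to: '/doc/$1', status: 301 },
  { from: '^/chat$', to: 'https://example.com/chat', status: 302 },
];

// Returns the first matching redirect, or null when the page should be served as-is.
function resolveRedirect(pathname, ruleList) {
  for (const rule of ruleList) {
    const pattern = new RegExp(rule.from);
    if (pattern.test(pathname)) {
      // Capture groups in `from` can be referenced as $1, $2, ... in `to`.
      return { location: pathname.replace(pattern, rule.to), status: rule.status };
    }
  }
  return null;
}
```

Whatever hosting is chosen would need to express at least this much: pattern matching with captures, external targets, and per-rule status codes.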

Build time

Our current build time for the blog on Netlify is long, 7m now, and will only get longer after we merge it with the main site. We can speed it up by preserving the .cache dir and the yarn module cache between builds. A new hosting/build option should allow us to preserve them.

Demo stands

Right now, both Netlify and Heroku allow us to automatically create preview stands from github PRs. We want to have this functionality in the future too.

--

Our new hosting solution may be not one server, but a combination of a few different ones, e.g. CircleCI + Cloudflare Workers + Netlify/Now.sh, but it should be able to do all of the things listed above.

@shcheklein
Member

@fabiosantoscode please, take a look. Let's discuss it here.

@fabiosantoscode
Contributor

I agree with you, in that the holy grail is to have our API be just a bunch of static data updated by a cron. It's similar to the architecture I left behind at a paper, where events would wake up a lambda function which rendered a react page, and then push the result to S3. This was called the reactive manifesto back in the day.

However in our situation, where DX is key, I don't really want to write and maintain a bunch of scripts to support a per-PR environment and work differently in production. Ideally we don't have any code handling this. The reason is, that both websites are so static that it's not really worth it to even have a load balancer, and a single server with a memcached instance could certainly cut the mustard when you hide it behind cloudflare or another caching CDN.

I've talked with @shcheklein a lot about this yesterday and one thing we touched on a lot of times was running workers at the edge. However I've given it more thought, and I don't think we need this capability if our API is made up of mostly static things. Instead, it can just slowly spread across the CDN and be served from the edge. It's OK even if it takes a few minutes to update. If it's not OK, we can invalidate the cache.

Heroku is pretty flexible. It doesn't have functions at the edge (that I know of), but I don't think we need them. It also has a nice DX.


My proposal:

I think we can use just heroku. We put a CDN in front of it for production, I don't think it matters which (cloudflare or cloudfront).

We stay away from edge functions or pushing to the CDN, and for simplicity allow it to fetch from our poor server. If we find that it is not enough, we can start pushing to the edge. Cloudflare allows for this with workers + KV, and cloudfront allows this by having an S3 bucket as an origin and pushing to it.

We will need a memory cache store. This is because we will need to preserve the content and etags we get from the Github API. The reason we store the etags as well is that if we do a request to github with an if-none-match: {etag} header, and the content is the same, we don't use up our rate limit.
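The etag bookkeeping described above can be sketched as two small pure functions. The function names and the `{ status, etag, body }` response shape are assumptions for illustration; the actual HTTP call is left out.

```javascript
// Build request headers: send the stored ETag so unchanged content comes back as 304.
function conditionalHeaders(cached) {
  return cached && cached.etag ? { 'If-None-Match': cached.etag } : {};
}

// Merge an upstream response into the cache entry.
// `response` is assumed to look like { status, etag, body }.
function updateCache(cached, response) {
  if (response.status === 304 && cached) {
    return cached; // content unchanged: keep serving the stored body
  }
  return { etag: response.etag, body: response.body };
}
```

The first request stores the body together with its etag; later requests send the etag back and reuse the stored body whenever the upstream answers 304.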

There's no reason we can't serve our gatsby files from an express server. Zeit's serve-handler can handle that. And it removes trailing slashes to boot.

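  const serve = require('serve-handler');
  // assumed options: serve the gatsby build output, strip trailing slashes
  const serveOptions = { public: 'public', trailingSlash: false };

  module.exports = (req, res) => {
    // do redirects if needed and return early

    serve(req, res, serveOptions);
  };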

So this way, we can have a local development server (yarn dev runs nodemon server.js) and a production server (yarn start runs node server.js) which are the same server. We can also serve API requests from this server. PR environments will simply run this server without a CDN, and prod would have a CDN.

We will still have the flexibility of setting nice cache headers, which are read both by the browser and by cloudflare (as well as your typical run-of-the-mill cache server or CDN). For example, public/static is filled with content-addressable files. We can set expires headers far into the future (or use cache-control: immutable) because they will never change. Static JS/CSS files too, since webpack places an MD4 hash into their filename.
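A minimal sketch of that header policy, assuming fingerprinted files can be recognized by a hex hash before the extension. The regex and the helper name `cacheControlFor` are illustrative and not Gatsby's exact naming scheme.

```javascript
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

// Assumed convention: a hex hash of 8+ characters right before the extension,
// e.g. app-3f2a9c1b7d.js. Adjust to whatever the build actually emits.
const FINGERPRINTED = /[-.][0-9a-f]{8,}\.(js|css|png|jpe?g|webp|svg|woff2?)$/;

function cacheControlFor(pathname) {
  if (FINGERPRINTED.test(pathname)) {
    // The content can never change under this name, so cache it "forever".
    return `public, max-age=${ONE_YEAR_SECONDS}, immutable`;
  }
  // HTML and other mutable files: always revalidate with the origin.
  return 'public, max-age=0, must-revalidate';
}
```

The same function could back an express middleware or be translated into a static header config, depending on which hosting is picked.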

I guess this is not in-scope, but just in case, for dynamic page content like comments, we have a few options:

  • Place comments in a cache key and serve them from the API, and render them only on the client side
  • Employ an edge strategy known as edge-side includes, which are instructions to copy the string from another URL (or cloudflare KV) into the place they are found. Basically it's like an iframe, but it copies HTML into it in a dumb way, and the browser just reads it as HTML (tutorial: https://blog.cloudflare.com/edge-side-includes-with-cloudflare-workers/).
  • Place comments in an iframe with a known location and just point to it. This means our comments can be dynamically rendered at the server and not expire the cache of the larger document. It also allows for pagination. It's easier to do this if we don't add clientside react to comments, but it's possible to do this using webpack DLLs.

@shcheklein
Member

@fabiosantoscode thanks, good summary and good points.

what about the CD part if we stick to Heroku - we need a fast way to build and deliver Gatsby to it.

What do you think about deploying to both - Heroku (APIs + cache) and Netlify (static stuff)?

@jorgeorpinel
Contributor

jorgeorpinel commented Feb 14, 2020

Just a few comments from me for now:

On redirects, Netlify seems to have this built in 🙂 https://docs.netlify.com/routing/redirects/

this way, we can have a local development server (yarn dev runs nodemon server.js) and a production server

I think this is pretty important for the docs process (for local sanity checks). Demo stands also help but it's faster, easier, and cheaper to run the site locally.

public/static is filled with content-addressable files. We can set expires headers far into the future (or use cache-control: immutable) because they will never change

MD files in public/static/docs/ are the ones that change the most. Very often. (But we don't serve them directly to users atm, the web app "proxies" them via react-markdown rendering).

edge-side includes, which are instructions to copy the string from another URL (or cloudflare KV) into the place they are found

ESI is meant more for personalization I think i.e. if we had user logins. If this is probable in the future road map then I think it's a good option to incorporate now.


In general I just have the impression we may be trying to use too many technologies? Could it all be plain Gatsby (whatever that means) or all microservices or all serverless functions + cdn?

@fabiosantoscode
Contributor

@shcheklein

what about the CD part if we stick to Heroku - we need a fast way to build and deliver Gatsby to it.

What do you think about deploying to both - Heroku (APIs + cache) and Netlify (static stuff)?

I say we take advantage of their built-in offering. They also have a cache folder where we can cache the gatsby folder and the public folder, such that things like generating thumbnails are done incrementally. Remember that there will always be a build step. Locally (on an intel i7) a build on a hot cache takes 19 seconds.

The real bottleneck will be saving and restoring cache, so the more images we have, the slower the build will be. We can try to tackle this in the future when it becomes a problem. It will be complicated, yet unavoidable.

Deploying to both Netlify and Heroku would kind of kill the development experience of sending a PR and getting a nice link to a temporary environment: we would have to find a way to connect netlify environment to the heroku API. Additionally, netlify is running a node process anyways :)


@jorgeorpinel

In general I just have the impression we may be trying to use too many technologies? Could it all be plain Gatsby (whatever that means) or all microservices or all serverless functions + cdn?

You make a good point here. I had another look and found this middleware from gatsby. We can use it to embed gatsby's logic into our express server. This is pretty close to plain gatsby. It could get closer if there was a way to use middleware in production like it's possible in dev. They might not allow middleware in production because they mean for gatsby sites to be statically hosted.

I would like to argue though, that having a single node process with express do everything, and put a CDN on top of it for DDOS protection, speed and edge caching, shouldn't be too many technologies. Even if it's under the hood, this is mostly what we have now (gatsby serve is express, and netlify puts a cache in front of it, it's not a static server). We do need to run a gatsby build before we serve, but when the build is finished that's it.


After having written all of this, I think we have a real option to kick the can down the road by continuing to use netlify. Depends on the price of their cdn though.

We would still write an express server to serve gatsby and our APIs, but it wouldn't have any memory cache. It wouldn't be ideal, but I have no reason to believe it wouldn't be webscale.

@shcheklein
Member

MD files in public/static/docs/ are the ones that change the most. Very often. (But we don't serve them directly to users atm, the web app "proxies" them via react-markdown rendering).

@jorgeorpinel it's not the case with Gatsby - we serve pre-built static HTMLs that include processed MD in them

On redirects, Netlify seems to have this built in 🙂 https://docs.netlify.com/routing/redirects/

@jorgeorpinel I don't think that is flexible enough. I would prefer to keep the redirects logic that we have - including tests, etc.

We would still write an express server to serve gatsby and our APIs, but it wouldn't have any memory cache.

@fabiosantoscode for some reason I had an idea that it's not possible with Netlify - running your own server with in memory cache that serves APIs externally.


Heroku alone sounds like a good option (+ some cache/CDN like Cloudflare). Obviously with some CD (if Heroku can do it - fine, if not - Gatsby as a business has something?). And we have all the flexibility we need, up to having databases if needed.

This solution should be very simple to deploy, runs locally, has previews, edge caching is done by Cloudflare ... any real downsides to this? cc @iAdramelk @jorgeorpinel @fabiosantoscode

@iAdramelk
Contributor Author

Sorry for the long answer guys. I think we are overengineering it a little. In the perfect world I prefer not to have our own Express server at all:

  1. Gatsby generates a bunch of static files that we can just expose to the world with either Netlify or heroku-buildpack-static.
  2. We don't need to care about correct headers for the files in the public/static folder. We need it now because we include them by filename, but it is not a good way to do it anyway. With gatsby I plan to switch to including them with webpack loader for images. This way we can store images in corresponding components, have automatic cache busting with unique filenames and will be able to automatically optimize images at build time. Same with the images linked inside the md files. This way we can just set infinite cache time for such images and forget about them.
  3. Both Netlify and heroku-buildpack-static allow us to set redirects using a JSON config file in the root directory without the need for custom servers.

The only problem that we have with static approach is hosting and updating API functions and caching their results. And, for example, Netlify allows us to solve this as well using Netlify Functions. Here is an example of using Netlify Functions to fetch remote API. It's not that different from our current API implementation and can be deployed and updated as a part of our normal deploy process to Netlify.

I'm not sure that going with Netlify is the best option because I'm not sure that we can optimize our build time on Netlify and I'm not sure from the get go how to cache results of such serverless functions between calls, so I'd like to check other options too, like Heroku, now.sh, etc.

But ideally what I would like to have in the result:

  1. Static hosting that just exposes our public folder without the need to manually set up the server.
  2. Redirects that are set up as a JSON file in the root without the need for a server.
  3. API as a bunch of serverless functions written in the node.js and deployed as a part of our standard deploy process.

For the local development we can either mock these functions or use already deployed ones. I doubt that we will be updating them this often. We can even place them in the other repository and deploy them separately.

@shcheklein
Member

@iAdramelk

Static hosting that just exposes our public folder without the need to manually set up the server.

What do you mean by manually, what are the benefits you see in not using an express or something else before Gatsby?

We don't need to care about correct headers for the files in the public/static folder.

most likely Cloudflare/Netlify do proper headers already?

Redirects that are set up as a JSON file in the root without the need for a server.

I doubt the Netlify redirect's config is flexible enough to handle what we need. Probably, heroku's one is the same, but I haven't checked (but if we go all static Heroku does not make much sense anyway)

API as a bunch of serverless functions written in the node.js and deployed as a part of our standard deploy process.

I like, but it feels like it might complicate the workflow, deployment, local experience .. would love to try before we jump into this.

For the local development we can either mock these functions or use already deployed ones. I doubt that we will be updating them this often. We can even place them in the other repository and deploy them separately.

Sounds like a complicated setup to me. Would love to see something like yarn develop locally that can handle everything.

@shcheklein shcheklein changed the title Choose the new hosting blog/website: choose the new hosting Feb 16, 2020
@shcheklein shcheklein added type: discussion Requires active participation to reach a conclusion. website: eng-doc DEPRECATED JS engine for /doc A: website Area: website p1-important Active priorities to deal within next sprints status: research Writing concrete steps for the issue labels Feb 16, 2020
@jorgeorpinel
Contributor

jorgeorpinel commented Feb 17, 2020

I think we are overengineering it a little

Agree, that was my point too 🙂 In general I also lean toward as static as possible, and built-in redirects so we don't need our custom module for that. Built-in redirects probably have much better load capacity, for example. p.s. from what I read, Netlify _redirects is flexible enough, Ivan.

files in the public/static... With gatsby I plan to switch to including them with webpack loader for images

What about markdown files? I'm still confused about this part but probably when the Gatsby migration is ready and I get to see it I'll be clearer, so no need to ask this Q. Let's just keep in mind that we change MD files very often.

The only problem that we have with static approach is hosting and updating API functions and caching their results
3. API as a bunch of serverless functions written in the node.js and deployed as a part of our standard deploy process.

We could just have the API as a separate node app. I checked pages/api/** and it seems totally stand-alone anyway. (This way also in the future it's possible to pass the API through an authentication/ rate limiting gateway e.g. KongHQ if ever needed.)

Serverless approach also works but maybe it's easier to maintain as a regular app to have the same deploy process, and also to reduce the system complexity? Agree with Ivan here.

@jorgeorpinel
Contributor

p.s. this issue is kind of long, would be great to summarize options. I'd do it but I'm not sure I understand every comment completely.

@iAdramelk
Contributor Author

@shcheklein

What do you mean by manually, what are the benefits you see in not using an express or something else before Gatsby?

My main concern is local development. Using a server in front of the static folder in prod is not a problem at all. But if we use Express locally we will need to run it alongside the gatsby dev server on separate ports and we will need to proxy calls from one port to another. There is also a problem that the port is hard-coded in the resulting html, so we will need to somehow update ports in the code that the gatsby server generates while the gatsby server still runs on the original port. I didn't research this topic in depth and it is possible that there is an existing plugin for that or that this is easy to configure. But if not, we will need to write and maintain a lot of our own code for that instead of just starting the default gatsby dev server with the standard command.

most likely Cloudflare/Netlify do proper headers already?

That's my point, we don't need a server for that. We just need to create unique names and static hosting will do the rest. But with our own server on Heroku we will need to do it ourselves if I understand correctly.

I doubt the Netlify redirect's config is flexible enough to handle what we need. Probably, heroku's one is the same, but I haven't checked (but if we go all static Heroku does not make much sense anyway)

Do you have examples of the redirects that you think we would not be able to implement? I had a fast look at the docs and I think that everything that we have in the redirects-list.json can be done with both Netlify and heroku-static.

I like, but it feels like it might complicate the workflow, deployment, local experience .. would love to try before we jump into this.

Well, if we update them often then yes, but I think that we will probably just push them once and then forget about them for a year or so. This way we can just use global urls for the local development.

Sounds like a complicated setup to me. Would love to see something like yarn develop locally that can handle everything.

It's a little more complicated than I would like, yes. But I think that this is a choice between this and the problems with the local server above. Not sure which is better to implement.

@jorgeorpinel

What about markdown files? I'm still confused about this part but probably when the Gatsby migration is ready and I get to see it I'll be clearer, so no need to ask this Q. Let's just keep in mind that we change MD files very often.

It's not a problem. We can automatically optimize them and update their paths with gatsby, we already are doing it in the blog.

We could just have the API as a separate node app. I checked pages/api/** and it seems totally stand-alone anyway. (This way also in the future it's possible to pass the API through an authentication/ rate limiting gateway e.g. KongHQ if ever needed.)

My main concern here is running it alongside with the gatsby dev server (see my answer to Ivan above).

@fabiosantoscode
Contributor

fabiosantoscode commented Feb 17, 2020

@iAdramelk

I'm not sure that we can optimize our build time on Netlify and I'm not sure from the get go how to cache results of such serverless functions between calls

Looks like we can't optimize the build time. I've given it a try here: https://github.com/iterative/blog/pull/115. We bust through the netlify cache limits, even without caching image processing (which is our biggest bottleneck I think). As per caching the results of the serverless functions, if we set a cache-control and expires header, the CDN/cache will take care of it, as well as the browser.

I'd like to check other options too, like Heroku, now.sh, etc

now.sh is a real contender, I feel. You can use the now dev command for local development, which gives you serverless functions under /api. I think it takes care of integrating the serverless functions with the underlying server (gatsby develop in our case) through some reverse proxy of their own. In production, gatsby can be served as a static directory and nothing else needs changing. It also allows for redirects expressed through JSON.

The serverless functions in now.sh, as expected, are cached on their end if you use a cache-control header (scroll down to "serverless functions").


I think we shouldn't get too hung up about server-side caching in any of these solutions. Basically all of them respect the cache-control header. The header is not only meant for browsers, but for any kind of proxy as well. That's what the private|public segment is for, it's for the proxies to know whether to serve the same thing to other people.
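To illustrate how the private|public segment is read by a proxy, here is a simplified sketch. It is a rule of thumb, not the full HTTP caching algorithm, and both function names are hypothetical.

```javascript
// Parse a Cache-Control header value into a directive map,
// e.g. "max-age=60, public" -> { 'max-age': '60', public: true }.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(',')) {
    const [name, value] = part.trim().toLowerCase().split('=');
    if (name) directives[name] = value === undefined ? true : value;
  }
  return directives;
}

// Simplified check: may a shared cache (CDN, proxy) store this response
// and serve it to other people?
function sharedCacheMayStore(header) {
  const d = parseCacheControl(header);
  if (d['private'] || d['no-store']) return false;
  return Boolean(d['public'] || d['s-maxage'] || d['max-age']);
}
```

A response marked private stays in the visitor's browser only, while a public one with a max-age can be reused by the CDN for everyone until it expires.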

@fabiosantoscode
Contributor

I've looked into heroku, and they limit your build cache to 500mb.

I did a small test with now.sh for the blog (changed 2 lines in the package.json), here it is:

https://blog-fihp2x2rk.now.sh/

Here's a function with a 60 second cache using a cache-control header:

https://blog-fihp2x2rk.now.sh/api/example-function

After the first build, this took in total 3 minutes to deploy, including the build time.

Locally, now dev serves me the blog (gatsby develop) with hot reload, plus my function under /api/example-function. In production, it's a static website with serverless functions, with a CDN tacked onto it. If we do need to run a node process instead of a static website in the future, that's possible too. However, I don't think we will need that since we have serverless functions.

Integration with github is also possible, providing us with per-PR environments.

I've read through your comments @jorgeorpinel and @iAdramelk, and this seems to tick all the boxes for you.

Unless anyone has any issues with this solution, when I run out of things to do I'll be sending a PR.

@fabiosantoscode
Contributor

I was really disappointed to find out that zeit now no longer supports custom servers. They do support adding routes in JSON, which is working fine for /doc/* (including status codes).

@iAdramelk
Contributor Author

@fabiosantoscode it looks like exactly what we need!

@shcheklein
Member

@fabiosantoscode a few more questions - how much will it cost us to build with them if we support previews? would love to explore a more conservative option with Heroku as well - in terms of price, build time (if cache is enabled), and local experience (you mentioned some middleware?)

I'm still concerned about these fancy options like zeit and Netlify to be honest. I really don't like their aggressive pricing models, I don't like waiting minutes to deploy a preview (to some extent Gatsby's problem, not hosting)?

Bottom line - can we do better?

@shcheklein
Member

To be precise - we pay for Heroku up to $50 / month since we do a lot of preview deployments. It's up to 30 hours with pro plan. Will it be enough? Most likely, yes.

The thing I still don't like is waiting minutes to deploy. If blog takes 3 minutes, it'll be > 10 minutes to deploy blog+dvc.org. Is there a way around it?

@fabiosantoscode
Contributor

If blog takes 3 minutes, it'll be > 10 minutes to deploy blog+dvc.org. Is there a way around it?

dvc.org took 30s to deploy for me (15s of which is building JS). Since we will be conflating both of them together, they will share a lot of code, a framework and webpack cache, therefore I would be very surprised if building dvc.org added more than 20 seconds to the total build time.

Our major bottleneck is the generated images in the blog. public/static is 409mb large. It takes around 2 minutes to download them from the cache. Doesn't really matter which platform we're on.

If we can store these images on S3 using DVC, and if with DVC we can somehow generate thumbnails only for images which changed, without downloading everything (can it?), we might be able to host them from S3 directly (using the DVC remote cache URLs). This might be accomplished through a source plugin which stores the checksums of the images to see which ones changed, or with dvc run.
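The "store the checksums to see which ones changed" part could look roughly like this. The manifest shape (file path mapped to checksum) and the function name are assumptions for illustration; how the checksums are produced (DVC, a source plugin, or something else) is left open.

```javascript
// Both manifests map a file path to a content checksum (hypothetical shape).
// Returns the paths that are new or whose checksum differs, sorted for stable output.
function changedImages(previousManifest, currentManifest) {
  return Object.keys(currentManifest)
    .filter((path) => previousManifest[path] !== currentManifest[path])
    .sort();
}
```

Only the returned paths would need thumbnails regenerated; everything else could keep pointing at the files that are already stored remotely.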

If we can do this, then heroku can be very speedy (we just need to cache node_modules, .gatsby and public, which becomes small enough to be cached). However I think heroku is a bit overkill for us, and it doesn't include a CDN to cache things at the edge like netlify and zeit now.

how much will it cost us to build with them if we support previews

I had a look at their pricing page, and overall I think we can go with the $20 plan. It gives us unlimited deploys, and 10 hours of build time every month. This gives us around 120 builds every month (if they took on average 5 minutes). If we go over the 10 build hours, we pay $10 more, instead of being forced into the $200 plan. We're limited to 3 team members, which I suppose is users with admin access, not users deploying. Couldn't find any specifics on this, so I'll go and ask directly.
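The build-count estimate above is simple arithmetic; as a quick sketch (the function name is hypothetical):

```javascript
// How many builds fit into the included build hours for a given average build length.
function buildsPerMonth(includedHours, averageBuildMinutes) {
  return Math.floor((includedHours * 60) / averageBuildMinutes);
}

// 10 hours at 5 minutes per build gives 120 builds;
// at the 3 minutes measured for the blog it would be 200.
```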

@fabiosantoscode
Contributor

fabiosantoscode commented Feb 18, 2020

@iAdramelk keeping the conversation on this ticket

Where do you store example-function btw? I don't see it in changed files

It's not in this PR, since the function is in the blog. I didn't share it either. Here it is:

// api/example-function.js

export default (req, res) => {
  res
    .setHeader('cache-control', 'max-age=60, public')
    .json({
      timeAtWhichThisWasRun: new Date()
        .toISOString()
        .split('T')[1]
        .split('.')[0],
      version2: true
    })
};

(When we have gatsby, our functions will be in api, not pages/api. zeit now piggy-backs on the existing concept of nextjs functions so they use the same folder)

edit: the version2 property was to tell deploys apart and make sure they were deploying the new function correctly.

@fabiosantoscode
Contributor

if we can get local Heroku dev to a single command and utilize build cache (and don't hit some limits soon if it's already 400-500Mb) then I would vote for Heroku.

Heroku dev with a single command, see above. Cache limit is going to be hit inevitably.

any idea what takes most of the cache (400-500) - images or JS modules?

In the blog it's mostly modules, but images are a huge chunk. If we add dvc.org to it the modules won't grow by much, I think. And we don't have too many images.

(venv) fabio@fabio-thinkpad ♥  du -sh node_modules/
565M    node_modules/
(venv) fabio@fabio-thinkpad ♥  du -sh public/
207M    public/
(venv) fabio@fabio-thinkpad ♥  du -sh public/static/
165M    public/static/

Largest modules are typescript, babel, core-js and rxjs.


I looked through gatsby-plugin-sharp a lot. I'm really looking for a way for us to store the images elsewhere during the build (like S3), to get around the cache issue while not always regenerating images. During production we could proxy image requests from the app to S3 (or if we have a CDN we can do the proxying from there).

I really think this is the way to go. gatsby-plugin-sharp clearly has a way to avoid re-compressing images from the filesystem if they haven't changed. If the filesystem is just another source, why not S3?

@fabiosantoscode
Contributor

fabiosantoscode commented Feb 19, 2020

@shcheklein I've made the edits. Except for this one:

Heroku ~$50 only, no CDN or any other hidden costs

The extra costs I added are accounting for our own expansion of the build cache, and the fact that heroku doesn't feature HTTP caching at all, much less a CDN.

Heroku doesn’t provide HTTP caching by default. In order to take advantage of HTTP caching, you’ll need to configure your application to set the appropriate HTTP cache control headers and use a content delivery network (CDN) or other external caching service.

We will probably be using S3 even more, since the maximum deployable in heroku is 500mb and it probably includes static files.

@shcheklein
Member

@fabiosantoscode

The extra costs I added are accounting for our own expansion of the build cache, and the fact that heroku doesn't feature HTTP caching at all, much less a CDN.

CloudFlare handles this for free, right? And we already run everything through it.

And we can utilize in-memory + CDN cache easily with Heroku for API cache. Again, for free and no changes are required.

We will probably be using S3 even more, since the maximum deployable in heroku is 500mb.

so, it's not related to images. How do people deploy JS apps to it anyways, then?

@fabiosantoscode
Contributor

fabiosantoscode commented Feb 19, 2020

CloudFlare handles this for free, right? And we already run everything through it.

I forgot about cloudflare :) Cloudfront / S3 also have a free tier, it just depends on how much of it we can use.

But hey, if we're using cloudflare, we could place the API in cloudflare workers as you mentioned before, and make use of cloudflare KV as an efficient memory store for storing github etags and responses. And deploy the rest of the site using gatsby's own thing, which will give us the fast builds we want.

so, it's not related to images. How do people deploy JS apps to it anyways, then?

It's pretty related to images. Your typical JS app is smaller than 500mb especially when installing only production dependencies.

Here's the source: https://devcenter.heroku.com/articles/slug-compiler#slug-size

@gurobokum
Contributor

I can also recommend taking a look at AWS Amplify, which is based on S3 + CloudFront + other AWS services.

@shcheklein
Member

we could place the API in cloudflare workers as you mentioned before, and make use of cloudflare KV as an efficient memory store for storing github etags and responses. And deploy the rest of the site using gatsby's own thing, which will give us the fast builds we want.

It's not clear whether it's easy to run them locally in this case. I would avoid this fancy stuff because of that, unless there is a simple solution.

@fabiosantoscode
Contributor

@shcheklein there are a few solutions for local dev. One of them is this. This is a tad unmaintained but seems promising.

PR environments is where it breaks down. We'd have to use production APIs there.

@fabiosantoscode
Contributor

@JIoJIaJIu I've used amplify before, it's great. I saw, however, that their automated PR environment thing only works on private github repositories, to avoid unsolicited PRs increasing costs.

We can always roll our own PR environments.

They do seem to have local dev facilities though, and the flexibility is through the roof since we're free to use any AWS service with it without going through the open internet.

@iAdramelk
Contributor Author

Sorry for the long silence, guys. I did some testing myself. Some results:

Gatsby image build optimisation

Current image count:

  • Total image number – 101
  • Total number of generated images – 1011
  • Number of generated webp images – 504

To trigger a rebuild, I edited the title field in the gatsby-config.js file; everything else stayed the same. The clean build and rebuild results were as follows:

Local

| | Total | GQL queries | Images |
| --- | --- | --- | --- |
| No cache | 2m 43s | 26s | 2m 19s |
| Cache | 22s | 11s | 0s |

Local, after updating to the latest gatsby-image

| | Total | GQL queries | Images |
| --- | --- | --- | --- |
| No cache | 3m 55s | 35s | 3m 46s |
| Cache | 24s | 13s | 0s |

Definitely still broken, so immediately reverted.

Local, with webp disabled

| | Total | GQL queries | Images |
| --- | --- | --- | --- |
| No cache | 1m 42s | 27s | 1m 22s |
| Cache | 22s | 11s | 0s |

Gatsby cloud

| | Total | GQL queries | Images |
| --- | --- | --- | --- |
| No cache | 12m 45s | 1m 29s | 10m 15s |
| Cache | 57s | 36s | 0s |

now.sh

| | Total | GQL queries | Images |
| --- | --- | --- | --- |
| No cache | 3m 42s | 38s | 3m 14s |
| Cache | 30s | 21s | 0s |

Netlify

| | Total | GQL queries | Images |
| --- | --- | --- | --- |
| No cache | 8m 8s | 54s | 5m 54s |
| Cache | 7m 4s | 46s | 5m 39s |

Some closing thoughts

The longest part of the build process is by far thumbnail generation on the first build. We are now generating more than 1K images.

Disabling webp reduces build time by approximately 1/3, but makes the end user's experience worse.

Rebuilds with an existing cache (node_modules, .cache and public folders) are quite fast.

If the hosting provider caches them, as now.sh and gatsby cloud do, the rebuild time can be less than 1m.

Netlify is by far the slowest option among the ones that I tried, and it definitely didn't cache the public folder. Even if we enable its caching, overall build times would still be the longest. So I'd say we can safely remove it from the candidates list.
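For reference, the ratios behind these conclusions can be recomputed from the totals in the tables above. This is only a small TypeScript sketch; the helper names `toSeconds` and `reduction` are made up for the illustration.

```typescript
// Parse durations written like "2m 43s" or "26s" into seconds.
function toSeconds(duration: string): number {
  const minutes = /(\d+)m/.exec(duration);
  const seconds = /(\d+)s/.exec(duration);
  return (minutes ? Number(minutes[1]) * 60 : 0) + (seconds ? Number(seconds[1]) : 0);
}

// Fraction of build time saved when going from `before` to `after`.
function reduction(before: string, after: string): number {
  const beforeSeconds = toSeconds(before);
  return (beforeSeconds - toSeconds(after)) / beforeSeconds;
}

// Local clean build, webp enabled vs. disabled: (163 - 102) / 163
const webpSaving = reduction("2m 43s", "1m 42s");

// now.sh, clean build vs. cached rebuild: (222 - 30) / 222
const nowCacheSaving = reduction("3m 42s", "30s");
```

This gives roughly 0.37 for disabling webp locally, in line with the "approximately 1/3" estimate, and roughly 0.86 for a cached rebuild on now.sh.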

@iAdramelk
Contributor Author

P.S. One more thing. After I updated gatsby's dependencies in package.json, the cache was invalidated and a full build ran again. So for commits like that, I think a long rebuild time is inevitable.

@fabiosantoscode
Contributor

Thanks @iAdramelk!

@fabiosantoscode
Contributor

I think this shows how much a good cache can influence build times.

Our best options are clearly now.sh and gatsby cloud. However, gatsby cloud doesn't come with API endpoints and is a bit pricey.

Using cloudflare for local development is not optimal, so I propose we drop it for local development and, in production, use a worker which takes every request to /api/* and uses ETags and If-None-Match requests so our functions do not eat into the GitHub API rate limit (remember: a conditional request that GitHub answers with 304 Not Modified does not count against the rate limit). This should be rather simple to implement, and maybe some other CDN already does this for us out of the box. I'll look into this.

In PR environments, our APIs are the production APIs. Locally, our browsers are going to resend the ETag values to the API, so our limits are not affected. If this doesn't work we can always wrap our functions in a caching wrapper.
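A rough sketch of the ETag bookkeeping described above, in TypeScript. The types and function names (`CachedEntry`, `UpstreamResponse`, `conditionalHeaders`, `revalidate`) are hypothetical, and the actual HTTP call to GitHub is left out, so this only shows which header to send and what to keep afterwards.

```typescript
// Cached copy of an upstream (e.g. GitHub API) response.
interface CachedEntry {
  etag: string;
  body: string;
}

// The parts of the upstream response this sketch cares about.
interface UpstreamResponse {
  status: number; // 200 or 304
  etag?: string;  // ETag response header, if present
  body?: string;  // absent on 304
}

// Headers for the upstream request: send If-None-Match once an ETag is stored.
function conditionalHeaders(cached?: CachedEntry): Record<string, string> {
  const headers: Record<string, string> = {};
  if (cached) {
    headers["If-None-Match"] = cached.etag;
  }
  return headers;
}

// Decide what to keep in the cache after the upstream has answered.
function revalidate(cached: CachedEntry | undefined, res: UpstreamResponse): CachedEntry {
  if (res.status === 304 && cached) {
    return cached; // unchanged upstream: keep serving the stored body
  }
  if (res.status === 200 && res.etag !== undefined && res.body !== undefined) {
    return { etag: res.etag, body: res.body }; // changed: replace the entry
  }
  throw new Error(`unexpected upstream response: ${res.status}`);
}
```

Where the entry lives (worker KV, in-process memory, a CDN) is a separate choice; the same two functions would apply in each case.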

@fabiosantoscode
Contributor

Fastly is capable of doing if-none-match requests to our servers if we respond with etag headers. Therefore, if we are using this, we can do an if-none-match request to GitHub using the same etag, and respond with a 304 (or 200 if anything changes on GitHub's side). Then fastly will remember the old response and serve it.

I'm going to check whether fastly uses a global cache or if a request from China can't use a cached response from Europe.

@iAdramelk
Contributor Author

So my summary would be this:

  1. By far the longest part of our build process is creating thumbnails.
  2. We need to create these thumbnails anyway to have a good user experience.
  3. 3 min to create 1K+ thumbnails looks like a reasonable time to me.

So the best way to optimize build time is not to create them on each build. There are 2 possible solutions to that:

  1. Use a cache (like now.sh above) and hope that we will have enough space to store it.
  2. Don't create images at build time at all and instead use an image resizing server like https://imgproxy.net/ or something else like that.

With the second approach we can do the following:

  1. Replace gatsby-image with our own image component that will generate all the needed src paths for srcset in <picture>.
  2. Run imgproxy in a separate container on Heroku.
  3. When the first user visits the site, image requests will go to imgproxy, which will resize the images and cache them on the CDN. On the second and later requests we will use the CDN versions instead of regenerating them.

This way we can have fast and consistent build time for local development too. But our infrastructure will be more complex.
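As an illustration of step 1 above, here is a minimal TypeScript sketch of how such a component could generate its srcset values. The resizer host and the `/resize/<width>/<format>/<source>` path layout are invented for the example and are not imgproxy's real URL format; the function names are hypothetical too.

```typescript
// Hypothetical resizer host and path layout, for illustration only.
// This is NOT imgproxy's real URL format; check its docs before relying on it.
const RESIZER_BASE = "https://img.example.com";

// URL the resizer would serve for `source` scaled to `width` pixels wide.
function resizedUrl(source: string, width: number, format: string): string {
  return `${RESIZER_BASE}/resize/${width}/${format}/${encodeURIComponent(source)}`;
}

// Value for a srcset attribute: one "<url> <width>w" candidate per width.
function buildSrcSet(source: string, widths: number[], format: string): string {
  return widths
    .map((width) => `${resizedUrl(source, width, format)} ${width}w`)
    .join(", ");
}

// Attributes for a <picture>: a webp <source> plus a fallback <img>.
function pictureSources(source: string, widths: number[], fallbackFormat: string) {
  return {
    webpSrcSet: buildSrcSet(source, widths, "webp"),
    fallbackSrcSet: buildSrcSet(source, widths, fallbackFormat),
    fallbackSrc: resizedUrl(source, Math.max(...widths), fallbackFormat),
  };
}
```

Because every URL is derived from the source path and a width, nothing has to be generated at build time; each size is produced the first time a browser requests it.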

@fabiosantoscode
Contributor

With your second approach there we can also have the API, and we get the flexibility of heroku. I like that!

There are more options for resizing images, including cloudflare and fastly. This can make our infra a tad simpler.

For caching the API, there should be a bunch of ways to do it, like a varnish container or just in-process memory. Our options are pretty limitless here.
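The in-process memory option could be as small as the following TypeScript sketch. The name `cached` and the injectable `now` clock are assumptions made for the example, and the 15-minute TTL in the usage note below simply mirrors the refresh interval from the issue description.

```typescript
// Wrap a function so its result is reused until `ttlMs` has elapsed.
// `now` is injectable so the behaviour can be checked without waiting.
function cached<T>(
  compute: () => T,
  ttlMs: number,
  now: () => number = Date.now
): () => T {
  let entry: { value: T; expiresAt: number } | undefined;
  return () => {
    const t = now();
    if (entry !== undefined && t < entry.expiresAt) {
      return entry.value; // still fresh: skip the expensive call
    }
    const value = compute();
    entry = { value, expiresAt: t + ttlMs };
    return value;
  };
}
```

For example, `cached(loadIssues, 15 * 60 * 1000)` (with `loadIssues` standing in for one of our API functions) would call through at most once per 15 minutes per process. If `compute` returns a Promise, the Promise itself is what gets stored, so a rejected one would also be reused until the TTL expires; a real implementation would probably want to handle that.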

Also: fastly doesn't replicate cached results globally. So no go there.

@shcheklein
Member

Ok, so it's Now vs Heroku custom server.

With Now we need to clarify the following:

  • redirects - it looks like we will need to keep supporting Cloudflare
  • trailing / redirecting - Alex told me that Now solves this. @fabiosantoscode could you confirm?
  • API response caching - not clear for different regions

With Heroku:

  • we potentially hit the cache size limit, but on the other hand it feels like it might be a good idea to invest time into proper images infra and it should not be something huge

Any other thoughts? Am I missing anything else in this summary so far?

@jorgeorpinel
Contributor

jorgeorpinel commented Feb 21, 2020

the best way to optimize build time is not to create on each build...
Don't create images at build time at all and instead use image resizing server like https://imgproxy.net/ or something else like that.

I like this approach of extracting the problem which is very specific and well defined to a specialized service.

Heroku custom server...
we potentially hit the cache size limit, but on the other hand it feels like it might be a good idea to invest time into proper images infra

Yep, since there is still some unfamiliarity with other platforms and possibly no super strong reasons to move off Heroku, I'm inclined to stay there (it also seems to have more predictable pricing).

But I haven't been as involved as the others in this research so I don't think my vote should be weighted equally. This way there are also only 3 (real) votes in this issue, so no possibility of ties 😬

@fabiosantoscode
Contributor

fabiosantoscode commented Feb 24, 2020

@shcheklein

You're right about now.sh redirects and cross-region API response caching.

As for the trailing slash, I can confirm. You can curl this yourself and see the trailing slash disappear.

I think heroku and an image resizer might be our best option here. In terms of flexibility, we're able to deploy pretty much anything we want and cache things properly.

If the heroku slug size limit hits us, we can choose to not deploy the images there and use S3 or simply raw.githubusercontent.com/branch/path/to/image as a target for our image resize service.

It doesn't use the right content-type but otherwise it works and will work for branch preview as well.

@shcheklein
Member

@iAdramelk Heroku it is?

@iAdramelk
Contributor Author

@shcheklein @fabiosantoscode looks like Heroku. I don't really like the idea of running our own server, but it looks like we don't have a choice for the API and redirects.

@shcheklein
Member

@fabiosantoscode let's proceed with Heroku. We can start by moving the blog as an example.

@fabiosantoscode
Contributor

fabiosantoscode commented Feb 26, 2020

I got my fork deployed on heroku: https://dvc-blog-production.herokuapp.com/

Trailing slashes get removed, no redirects yet (but will be simple to do, will get to it later).

Build time

There's no timer on heroku builds but using a stopwatch I got 3m05s, 20s of which is the time Heroku takes to pick up the build (might be larger today because they seem to have an ongoing incident), and there's also 60s of "pruning devDependencies", which is basically ensuring no devDependencies end up in production. I find it weird that it takes so much time to do.

The public folder is cached, as well as the gatsby folder. And for some reason the slug size is small even though there are so many images.

Tests & Types & Lints

We have 2 choices here:

  1. Run in series with circle, meaning 2 separate services
  2. Run in series within heroku, with possible time gains, and less cruft in the PR checks box on each PR page on github.

I'm going for 2, unless anyone has any objections. I won't start right now (it's late here) but tomorrow if nobody's said anything I'll configure heroku to run the tests.

@shcheklein
Member

and there's also 60s of "pruning devDependencies"

can we get rid of this?

Gatsby build time is ~7s which is great.

2 sounds good.

@fabiosantoscode
Contributor

fabiosantoscode commented Feb 27, 2020

can we get rid of this

The reason for this is explained in this issue. TL;DR yarn rebuilds the prod dependencies from source when removing devDependencies.

After seeing that issue about pruning devDependencies in heroku when there are statically built dependencies, I tried switching to npm. However, it only shaved off around 10 seconds.

So I tried upgrading to yarn 2, which removes the install step and node_modules. I was super excited about it until I found out gatsby doesn't support this yet.

All we can do is try to trim some of the devDependencies (kill typescript, anyone? The compilation step and plugins can be replaced with JSDoc comments, plus a typechecker like tern.js or typescript itself)

So I'm going to move on.

@fabiosantoscode
Contributor

fabiosantoscode commented Feb 27, 2020

Tests

Apparently they were already working for PRs and master, so I moved on.

Preview environments

They work, but the first build of each PR takes a very long time. This is because it refuses to use the cache. However, when running tests, the cache is used properly.

This means that opening a new PR means you need to wait for more than 10 minutes to see your preview. The cache issue also happens in the dvc.org repo, however since it doesn't compress a ton of images that's not a problem.

I think the obvious solution here is to compress images on demand, as mentioned above, which has the nice side effect of speeding up the build process further, since there will be no need to cache those images. But I feel like it's not part of this initial PR. So I will clean things up and issue a PR.

@shcheklein
Member

However when running tests, the cache is used properly.

What do you mean by running tests, precisely?
Am I correct that after I create a PR, subsequent changes (commits) will build quickly?

@fabiosantoscode
Contributor

fabiosantoscode commented Feb 28, 2020

The tests are in a separate pipeline from deploys, where the cache is respected.

Yes, more commits on top of the same PR are built quickly (3min).

@fabiosantoscode
Contributor

@shcheklein
Member

Closing this as we moved to Heroku.
