Posts Scheduler #2659

Closed

Conversation

MUmarShahbaz
Contributor

Posts Scheduler

GitHub Workflow

Purpose

The workflow automatically publishes posts that are scheduled for a later date. The ability to schedule posts is a very important tool, and I feel it should be included in this repository.

Error?

I don't know for certain whether this is a bug or intentional, but al-folio doesn't publish posts that are dated in the future.

How I found this out:
I was writing a post, but no matter what I did it wouldn't show up on my website. After starting from scratch and changing one thing at a time to see where it broke, I found that the date was keeping the post from being published. The filename was 2024-08-25-lecture1.md, and the day I was uploading the post was also 25 August 2024. When I tried again half a day later, the post was published.

After comparing times, I found that at the time of the failed upload the date was 25 Aug in my region but 24 Aug in UTC. Hence I concluded that only posts dated for the present or the past are published on the website.
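The date mismatch described here can be reproduced with a small shell sketch. It assumes GNU `date` and a build clock set to UTC, as concluded above. `is_future_post` is a hypothetical helper, not part of al-folio or Jekyll, and the UTC+5 offset is only an example because the region is not stated.

```shell
#!/bin/sh
# Hypothetical helper (not part of al-folio or Jekyll): succeeds when the
# YYYY-MM-DD prefix of a post filename is later than the given date.
is_future_post() (
  post_date=$(printf '%.10s' "$1" | tr -d '-')   # e.g. 20240825
  today=$(printf '%s' "$2" | tr -d '-')
  [ "$post_date" -gt "$today" ]
)

# The same instant falls on different calendar dates in different zones.
# 21:00 UTC on 2024-08-24 is already 02:00 on 2024-08-25 at UTC+5.
instant="2024-08-24 21:00:00 UTC"
utc_today=$(TZ=UTC date -d "$instant" +%F)
# POSIX TZ strings invert the sign: "XXX-5" means five hours east of UTC.
# The offset is an assumed example, not the author's actual zone.
local_today=$(TZ=XXX-5 date -d "$instant" +%F)

echo "UTC date: $utc_today, local date: $local_today"
if is_future_post "2024-08-25-lecture1.md" "$utc_today"; then
  echo "A build running on UTC time would treat this post as future-dated"
fi
```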

Another thing to notice:

The website is only updated when the Deploy workflow runs. This means that the user must be online and deploy the website manually.

Quick Fix

You can solve this issue by having the deploy workflow run every day. The drawback is the commit history of the gh-pages branch: deploying every day will make it annoyingly large.

Better Solution

In the main branch, add a new folder _scheduled and put posts for a later date in there. Then, using the workflow I made, the posts should be automatically deployed on the date in their names. If no files are scheduled for today, no changes are made, which keeps the commit history minimal.
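As a rough illustration of the idea, the core step could look like the sketch below. This is not the workflow from the PR; `publish_due_posts` is a hypothetical function name, and the date comparison relies only on the filename prefix.

```shell
#!/bin/sh
# Sketch only: move every post in directory $1 whose filename date is on or
# before the date $3 (YYYY-MM-DD) into directory $2, then print how many moved.
publish_due_posts() (
  scheduled_dir=$1
  posts_dir=$2
  today=$(printf '%s' "$3" | tr -d '-')
  moved=0
  mkdir -p "$posts_dir"
  for path in "$scheduled_dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-*.md; do
    [ -e "$path" ] || continue          # the glob matched nothing
    name=$(basename "$path")
    post_date=$(printf '%.10s' "$name" | tr -d '-')
    if [ "$post_date" -le "$today" ]; then
      mv "$path" "$posts_dir/$name"
      moved=$((moved + 1))
    fi
  done
  echo "$moved"
)

# Possible invocation from the repository root, using the UTC date:
#   publish_due_posts _scheduled _posts "$(date -u +%F)"
```

When the function prints 0, nothing was moved, so there is nothing to commit and no deploy needs to be triggered.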

@george-gca
Collaborator

> The ability to schedule posts is a very important tool and I feel like it should also be included in this repository.

I agree this would be useful.

> I don't know for certain whether this is a bug or intentional, but al-folio doesn't publish posts that are dated in the future.

Actually, this is a static site generator (Jekyll in this case) thing. Someone even opened an issue in Jekyll a long time ago. To summarize: a static site generator builds a website and then it is done; it can't change a site after it has been built. Since the post is dated in the future, Jekyll just ignores it during the build.

Now, how would your solution work if a post has a timestamp for later in the day? For example, the post name has 2024-08-27, but the post itself contains something like date: 2024-08-27 15:09:00. I believe that when your cron job runs it will move the posts correctly to the _posts/ dir, but during the build these posts will be ignored and will only be included in the next build.

@MUmarShahbaz
Contributor Author

MUmarShahbaz commented Aug 26, 2024

> > The ability to schedule posts is a very important tool and I feel like it should also be included in this repository.

> I agree this would be useful.

> > I don't know for certain whether this is a bug or intentional, but al-folio doesn't publish posts that are dated in the future.

> Actually, this is a static site generator (Jekyll in this case) thing. Someone even opened an issue in Jekyll a long time ago. To summarize: a static site generator builds a website and then it is done; it can't change a site after it has been built. Since the post is dated in the future, Jekyll just ignores it during the build.

> Now, how would your solution work if a post has a timestamp for later in the day? For example, the post name has 2024-08-27, but the post itself contains something like date: 2024-08-27 15:09:00. I believe that when your cron job runs it will move the posts correctly to the _posts/ dir, but during the build these posts will be ignored and will only be included in the next build.

That is certainly an interesting case, but we can't run deploy once every hour; it would clutter up the Actions tab too much.

This can be solved with a bit of sacrifice. Instead of running the scheduler at 00:00, we can run it at 23:59. This way all of the posts will be deployed on their specified day, but only right before midnight, which I assume is not ideal.

Another option is to use delays, but that would cause the workflow to run for potentially up to 7-8 hours, which would cause problems for GitHub itself.

A third option, which I think has a good chance of working, is to separate this workflow into two:

  • A scanner, which runs every day at 00:00 UTC and modifies the publisher. The scanner will also run once every time the publisher finishes.
  • A publisher, which will move the files and push.

@george-gca
Collaborator

Maybe we could also fetch the timestamp from the post itself and use it somehow? For example, by doing:

sed -ne '/---/,/---/{/---/N;p}' 2015-03-15-formatting-and-links.md | grep "date: " | cut -c7-

We can obtain the timestamp 2015-03-15 16:40:16 from inside the post.

@MUmarShahbaz
Contributor Author

> Maybe we could also fetch the timestamp from the post itself and use it somehow? For example, by doing:

> sed -ne '/---/,/---/{/---/N;p}' 2015-03-15-formatting-and-links.md | grep "date: " | cut -c7-

> We can obtain the timestamp 2015-03-15 16:40:16 from inside the post.

Yeah, but that only helps with the comparison. The difficult part is triggering the event as few times as possible.

I've got an idea but I am a bit busy for a few days. I'll send a diagram of a possible solution this Thursday.

@george-gca
Collaborator

I believe the easiest solution would be to trigger the action twice a day (00:00 and 23:59), and check each time which posts based on the timestamp should be moved to _posts/.
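A timestamp check of this kind could be sketched as follows. It assumes GNU `date` and treats timestamps without an explicit zone as UTC; `is_due` is a hypothetical helper name, not something that exists in the repository.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the post timestamp ($1) is not later
# than the current time ($2). Both are strings that GNU `date -d` can parse.
# Exits with status 2 if either timestamp cannot be parsed.
is_due() (
  post_epoch=$(TZ=UTC date -d "$1" +%s) || exit 2
  now_epoch=$(TZ=UTC date -d "$2" +%s) || exit 2
  [ "$post_epoch" -le "$now_epoch" ]
)

# Example: a post stamped 15:09 is not yet due for a run just after
# midnight, but it is due for the run at 23:59.
post="2024-08-27 15:09:00"
for run in "2024-08-27 00:00:00" "2024-08-27 23:59:00"; do
  if is_due "$post" "$run"; then
    echo "run at $run: publish"
  else
    echo "run at $run: keep in _scheduled/"
  fi
done
```

In a real run, the second argument would be the current time, for example `"$(date -u '+%F %T')"`.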

@MUmarShahbaz
Contributor Author

> I believe the easiest solution would be to trigger the action twice a day (00:00 and 23:59), and check each time which posts based on the timestamp should be moved to _posts/.

I've got a more complex but feasible idea for that. Here is a flowchart of it:

[Image: Post-Scheduler-Proposal-Flowchart]

@CheariX
Contributor

CheariX commented Aug 27, 2024

> Maybe we could also fetch the timestamp from the post itself and use it somehow? For example, by doing:

> sed -ne '/---/,/---/{/---/N;p}' 2015-03-15-formatting-and-links.md | grep "date: " | cut -c7-

> We can obtain the timestamp 2015-03-15 16:40:16 from inside the post.

Just my two cents: instead of grep + cut, one could also use awk '/^date:/ {print $2}' to print only the date (if I understood correctly, the time is not necessary; otherwise use print $2, $3). This may be less error-prone when there are simple formatting issues, such as multiple spaces after the colon.
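To make the awk idea concrete, here is a minimal sketch that reads only the front matter and tolerates extra spaces after the colon. `front_matter_date` is a hypothetical helper name, and the sketch assumes the front matter is delimited by lines consisting of `---`.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the `date:` key found in a post's
# front matter (the block between the first two `---` lines of file $1).
front_matter_date() {
  awk '
    /^---[ \t]*$/ { fences++; if (fences == 2) exit; next }
    fences == 1 && /^date:/ {
      sub(/^date:[ \t]*/, "")   # drop the key and any spaces after it
      print                     # what remains is the date and time
      exit
    }
  ' "$1"
}

# Possible usage:
#   front_matter_date _scheduled/2015-03-15-formatting-and-links.md
```

Stopping at the second `---` means a line starting with `date:` in the post body is not picked up by mistake.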

@george-gca
Collaborator

@CheariX the time is necessary.

@KingHowler then the action itself would change its own cron schedule? It is feasible, but I think it is a bit overkill. For example, if someone adds, say, 5 scheduled posts for the same day but with different timestamps, the action would run 5 times, which seems a bit excessive. What do you think?

@MUmarShahbaz
Contributor Author

MUmarShahbaz commented Aug 27, 2024

> @CheariX the time is necessary.

> @KingHowler then the action itself would change its own cron schedule? It is feasible, but I think it is a bit overkill. For example, if someone adds, say, 5 scheduled posts for the same day but with different timestamps, the action would run 5 times, which seems a bit excessive. What do you think?

Well yeah, it does seem excessive, but that's only to ensure that each post is published at the exact time it was scheduled for.

Other than that, we can just run it once every day at 23:59. I'm fine either way, as I don't use time on my website, only date.

Running it once will save us from the pain of modifying the workflow and extracting timestamps.

@MUmarShahbaz
Contributor Author

> @CheariX the time is necessary.

> @KingHowler then the action itself would change its own cron schedule? It is feasible, but I think it is a bit overkill. For example, if someone adds, say, 5 scheduled posts for the same day but with different timestamps, the action would run 5 times, which seems a bit excessive. What do you think?

Another thing I would like to mention is that this depends heavily on the user. Like I said, I don't use time, but some people may think it's crucial for their post to be published at the exact time.

How about we make a config.txt file?

We can build both versions: an action that runs daily and an action that runs at specific times. A third action, the scanner, will check config.txt and run the appropriate action accordingly.

This makes it customizable for the user.

@george-gca
Collaborator

george-gca commented Aug 27, 2024

I am more inclined to do a single one that checks every day (ignoring time) and to add an explanation of how one would take the timestamp into account to our FAQ or CUSTOMIZE. What do you think?

@MUmarShahbaz
Contributor Author

> I am more inclined to do a single one that checks every day (ignoring time) and to add an explanation of how one would take the timestamp into account to our FAQ or CUSTOMIZE. What do you think?

The current code can fulfill those requirements, but as the site doesn't publish posts dated for a later time, it's essential that the scheduler runs at exactly 23:59.

Earlier you proposed that we run it at 00:00 and 23:59. If we run it at 00:00, all files dated for that day (ignoring time) will be moved to _posts/, but deploy won't publish them. When we run it again at 23:59, all the files will already have been moved to _posts/ and there will be nothing to push.

Without a push, the deploy action won't trigger, and we will have to wait for the next scheduled post (which could be the next day, next month, or even next year).

So running it twice a day will cause the scheduler to not work properly.

@george-gca
Collaborator

> The current code can fulfill those requirements, but as the site doesn't publish posts dated for a later time, it's essential that the scheduler runs at exactly 23:59.

We can simply ignore the time, as you currently do, and give instructions for adding the time if someone wishes to.

> Earlier you proposed that we run it at 00:00 and 23:59. If we run it at 00:00, all files dated for that day (ignoring time) will be moved to _posts/, but deploy won't publish them. When we run it again at 23:59, all the files will already have been moved to _posts/ and there will be nothing to push.

My mistake, I meant 12:00 and 23:59. That way we would have maybe 2 pushes per day. But I am fine with doing it once a day and gathering all posts for that day.

@george-gca
Collaborator

I believe I can merge this PR now. Would you mind sending another PR with the information about how one could add support for time in this action @KingHowler, maybe in CUSTOMIZE.md or FAQ.md? I believe it would be useful for some users.

@MUmarShahbaz
Contributor Author

> I believe I can merge this PR now. Would you mind sending another PR with the information about how one could add support for time in this action @KingHowler, maybe in CUSTOMIZE.md or FAQ.md? I believe it would be useful for some users.

Wait, let me set the trigger time to 23:59 first.

@MUmarShahbaz
Contributor Author

@george-gca I have set the trigger time to 23:59 and will send a PR for CUSTOMIZE.md

Here are a few things I'd like you to know before you merge this:

  • This code completely ignores the time given in the post; it makes up for that by running just before midnight.
  • This code runs exactly once every day, at 23:59.
  • This code only works with files that use the following name pattern: yyyy-mm-dd-title.md
  • It moves files from _scheduled/ to _posts/; all other directories are unaffected by the action.
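For reference, a filename check matching the pattern in the list above might look like this sketch. `matches_post_pattern` is a hypothetical name, and the check only looks at the shape of the name, not at whether the date is a valid calendar date.

```shell
#!/bin/sh
# Hypothetical check for the yyyy-mm-dd-title.md naming pattern.
# It validates the shape only: 2024-13-40-x.md would still pass.
matches_post_pattern() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-?*.md) return 0 ;;
    *) return 1 ;;
  esac
}

for name in 2024-08-25-lecture1.md README.md 2024-8-25-lecture1.md; do
  if matches_post_pattern "$name"; then
    echo "$name: would be considered by the scheduler"
  else
    echo "$name: ignored"
  fi
done
```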

@george-gca
Collaborator

I just added a few more things to fix.
