Concurrency and PHP in relation to modern programming languages: Python, Go, NodeJS, Rust, etc.

This post addresses a misconception about PHP. It is inspired by the multi-part series Concurrency in modern programming languages on building and benchmarking a concurrent web server. The series covers in detail how concurrency behaves in Rust, Go, JavaScript/NodeJS, TypeScript/Deno, Kotlin, and Java. It is not clear why Python was excluded.

I will include the source code for each language, then a PHP version of the same simplicity, and then benchmark results.

I will start off with one thing in mind: a developer with some computer science or related study will find that all languages are essentially the same; some may already include pre-made routine solutions to common problems. If there is no pre-made solution, one can be written in the language of choice to get the same outcome and behavior as in another language. Getting the same performance is a different story, but the concept stays the same; only the syntax differs.

For the details of the following code block see Concurrency in modern programming languages: Rust.

use async_std::net::{TcpListener, TcpStream};
use async_std::prelude::*; // async read/write/flush extension traits
use async_std::task;
use std::fs;
use std::time::Duration;

#[async_std::main]
async fn main() {
    let listener = TcpListener::bind("127.0.0.1:8080").await.unwrap(); // bind listener
    let mut count = 0; // count used to introduce delays

    loop {
        count = count + 1;
        // Listen for an incoming connection.
        let (stream, _) = listener.accept().await.unwrap();
        // spawn a new task to handle the connection
        task::spawn(handle_connection(stream, count));
    }
}

async fn handle_connection(mut stream: TcpStream, count: i64) {
    // Read the first 1024 bytes of data from the stream
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).await.unwrap();

    // add 2 second delay to every 10th request
    if (count % 10) == 0 {
        println!("Adding delay. Count: {}", count);
        task::sleep(Duration::from_secs(2)).await;
    }

    let header = "
HTTP/1.0 200 OK
Connection: keep-alive
Content-Length: 174
Content-Type: text/html; charset=utf-8
    ";
    let contents = fs::read_to_string("hello.html").unwrap();

    let response = format!("{}\r\n\r\n{}", header, contents);

    stream.write(response.as_bytes()).await.unwrap(); // write response
    stream.flush().await.unwrap();
}

For the details of the following code block see Concurrency in modern programming languages: Golang.

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"time"
)

func main() {
    var count = 0
    // set router
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        defer r.Body.Close()
        count++
        handleConnection(w, count)
    })
    // set listen port
    err := http.ListenAndServe(":8080", nil)
    if err != nil {
        log.Fatal("ListenAndServe: ", err)
    }
}

func handleConnection(w http.ResponseWriter, count int) {
    // add 2 second delay to every 10th request
    if (count % 10) == 0 {
        println("Adding delay. Count: ", count)
        time.Sleep(2 * time.Second)
    }
    html, _ := ioutil.ReadFile("hello.html") // read html file
    w.Header().Add("Connection", "keep-alive")
    w.WriteHeader(200)           // 200 OK
    fmt.Fprintf(w, string(html)) // send data to client side
}

For the details of the following code block see Concurrency in modern programming languages: JavaScript on NodeJS.

const http = require("http");
const fs = require("fs").promises;

let count = 0;

// set router
const server = http.createServer((req, res) => {
  count++;
  requestListener(req, res, count);
});

const host = "localhost";
const port = 8080;

// set listen port
server.listen(port, host, () => {
  console.log(`Server is running on http://${host}:${port}`);
});

const requestListener = async function (req, res, count) {
  // add 2 second delay to every 10th request
  if (count % 10 === 0) {
    console.log("Adding delay. Count: ", count);
    await sleep(2000);
  }
  const contents = await fs.readFile(__dirname + "/hello.html"); // read html file
  res.setHeader("Connection", "keep-alive");
  res.writeHead(200); // 200 OK
  res.end(contents); // send data to client side
};

function sleep(ms) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}

For the details of the following code block see Concurrency in modern programming languages: TypeScript on Deno.

import { serve, ServerRequest } from "https://deno.land/std/http/server.ts";

let count = 0;

// set listen port
const server = serve({ hostname: "0.0.0.0", port: 8080 });
console.log(`HTTP webserver running at:  http://localhost:8080/`);

// listen to all incoming requests
for await (const request of server) handleRequest(request);

async function handleRequest(request: ServerRequest) {
  count++;
  // add 2 second delay to every 10th request
  if (count % 10 === 0) {
    console.log("Adding delay. Count: ", count);
    await sleep(2000);
  }
  // read html file
  const body = await Deno.readTextFile("./hello.html");
  const res = {
    status: 200,
    body,
    headers: new Headers(),
  };
  res.headers.set("Connection", "keep-alive");
  request.respond(res); // send data to client side
}

// sleep function since Deno doesn't provide one
function sleep(ms: number) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}

Here we have the PHP version. For this to work as posted, an external package, Coroutine, is required.

This script is hosted on GitHub.

include 'vendor/autoload.php';

use function Async\Path\file_get;
use function Async\Stream\{messenger_for, net_accept, net_close, net_local, net_response, net_server, net_write};

const WEB_DIR = __DIR__ . DS;

function main($port)
{
  $count = 0;
  $server = yield net_server($port);
  print('Server is running on: ' . net_local($server) . EOL);

  while (true) {
    $count++;
    // Will pause the current task and wait for a connection; all other tasks will continue to run
    $connected = yield net_accept($server);
    // Once a connection is made, a new task is created and execution continues there; will not block
    yield away(handleClient($connected, $count));
  }
}

function handleClient($socket, int $counter)
{
  yield stateless_task();
  // add 2 second delay to every 10th request
  if ($counter % 10 === 0) {
    print("Adding delay. Count: " . $counter . EOL);
    yield sleep_for(2);
  }

  $html = messenger_for('response');
  $contents = yield file_get(WEB_DIR . 'hello.html');
  if (is_string($contents)) {
    $output = net_response($html, $contents, 200);
  } else {
    $output = net_response($html, "The file you requested does not exist. Sorry!", 404);
  }

  yield net_write($socket, $output);
  yield net_close($socket);
}

coroutine_run(main(8080));

The Execution Flow, Under The Hood

The first thing you might notice is the use of yield.

Normally, the yield statement turns the surrounding function/method into one that returns a generator object to the caller. The caller is then required to step through that object to have the function/method's instructions executed.
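
PHP's generators mirror Python's generator protocol, so this step-through behavior can be sketched in a few lines of Python (PHP's ->current() and ->send() correspond to Python's next() and send()):

```python
def greeter():
    # yield suspends the function and hands a value back to the caller
    received = yield "ready"
    yield f"hello, {received}"

gen = greeter()           # calling it executes nothing; we get a generator object
print(next(gen))          # step to the first yield
print(gen.send("world"))  # resume, passing a value back in
```

Until the first step, none of the function body has run; each step executes only up to the next yield.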

The script is started by calling coroutine_run().

This in turn adds the main() generator object into a Queue, but before adding it, wraps the object in another class, Task, to use instead.

  • This Task class is responsible for keeping track of the generator's state/status; it runs the generator by invoking ->current(), ->send(), and ->throw(), and stores any results.
  • The Task class will then register the passed-in generator with a single universal scheduling routine, a Coroutine class method, that steps through all generator objects and uses a different Stack to manage itself.
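
A stripped-down Python sketch of that Task/Queue arrangement (the names Task, step, and run here are illustrative, not the library's actual API): a Task wraps a generator, and one scheduling routine steps every task and requeues any that are unfinished:

```python
from collections import deque

class Task:
    # wraps a generator and keeps track of its state and last result
    def __init__(self, gen):
        self.gen = gen
        self.result = None

    def step(self):
        self.result = self.gen.send(None)  # resume until the next yield

def run(queue):
    # the single universal scheduling routine: step each task in turn
    while queue:
        task = queue.popleft()
        try:
            task.step()
        except StopIteration:
            continue           # task completed: drop it
        queue.append(task)     # task not completed: reschedule it

def worker(name, steps):
    for i in range(steps):
        print(f"{name} step {i}")
        yield                  # give up control so other tasks can run

run(deque([Task(worker("A", 2)), Task(worker("B", 2))]))
```

Each bare yield is the point where a task gives up control, which is what interleaves the output of "A" and "B".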

The yield statement now just leaves, or returns results, in any context; it is a control-flow mechanism that suspends and resumes where it left off.

More exactly, it marks the exact place where our code gives up control; it signals that it is ready to be added to a waiting list while the CPU/application shifts to other tasks.

The next step in our execution flow is adding the main supervisor task into the Queue.

  • This supervisor task checks streams/sockets, timers, processes, signals, and events, or just waits for them to happen. Each of these checks has its own key–value store array.

There are up to nine different abstract data structures in play, but only one, the Queue, holds the task to be executed next.

By using a functional programming paradigm in which function composition is mixed with mutual recursion, we get true PHP concurrency, as I have here. It is based on Python's original model of using @decorators on generator functions, which eventually led to the reserved words async/await, just syntactic sugar.
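
Python still exposes that lineage: the standard-library decorator types.coroutine marks a plain generator function as awaitable, which is how the generator era was bridged to the async/await keywords. A minimal sketch:

```python
import asyncio
import types

@types.coroutine
def legacy_step():
    # a plain generator; the decorator makes it compatible with await
    yield  # yielding None tells the event loop "reschedule me soon"

async def main():
    await legacy_step()  # async/await is sugar over the generator protocol
    return "done"

print(asyncio.run(main()))
```

Under the hood, asyncio drives async functions with the same send()/throw() stepping a hand-written scheduler uses.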

For a general overview of the power of generators, watch: Curious Course on Coroutines and Concurrency by David Beazley.

Q: What is the underlying reason that calling await outside a function created with async throws a syntax error?

  • The languages that have these constructs internally see them as special private class calls. You would get a similar error calling any class's private/protected methods directly.

The Promise Problem, We Don't Make Any

Normally, when asynchronous programming comes up, an event loop, callbacks, promises/futures, and threading have to be addressed, or come into play, to deal with the blocking nature of certain OS and hardware features.

Many actions, when looked at technically and by their actual behavior, can be described using different terms.

In computer science, the event loop is a programming construct that waits for events (triggers) and then performs specific (programmed) actions. A promise/future is also a programming construct, one better suited to handling callbacks after some routine finishes; on its initial use, it returns an object. Both of these constructs run in a single thread, and they are used together to orchestrate the handling of blocking code.

So when we use yield within any function, that function can now be seen as a Promise: we can't do anything until we take the returned object and step through it. This object is a language feature that already has a looping process and state management built in. With each tick of the stepping-through, a whole bunch of things can be performed. This is where our main supervisor task steps in.

The supervisor task does the checks and will conditionally wait, but there is no task execution here, only the actions that schedule a Task back into the Queue when a check is triggered.

  • Task objects that are not completed will be rescheduled.
  • This Event Loop is a task that is always yielding.
  • This Event Loop is a natural Generator process.
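
These bullets can be condensed into runnable Python: the supervisor's timer check and the rescheduling of unfinished tasks live in one always-looping routine (sleep timers stand in here for the stream, socket, and signal checks; all names are illustrative):

```python
import time
from collections import deque

def run(tasks):
    queue = deque(tasks)      # holds the task to be executed next
    sleeping = []             # (wake_at, generator) pairs parked on a timer
    while queue or sleeping:
        now = time.monotonic()
        # supervisor check: wake any task whose timer has expired
        for pair in sleeping[:]:
            if pair[0] <= now:
                sleeping.remove(pair)
                queue.append(pair[1])
        if not queue:
            time.sleep(0.01)  # nothing runnable: just wait for a trigger
            continue
        gen = queue.popleft()
        try:
            delay = gen.send(None)   # a task yields how long it must wait
        except StopIteration:
            continue                 # completed: never rescheduled
        if delay:
            sleeping.append((now + delay, gen))
        else:
            queue.append(gen)        # not completed: reschedule

def task(name):
    print(f"{name}: request sent")
    yield 0.05                       # stand-in for waiting on a socket
    print(f"{name}: response arrived")

run([task("A"), task("B")])
```

When the queue and the timer list are both empty, the loop falls through and the whole system exits, exactly as described below for the real library.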

Handling The Blocking Conundrum. What, Wait Some More?

Now we come to a fork, where we need to request a hardware or OS feature and still allow other things to be processed. We have routines to address these requests; they are part of the Coroutine class. Each routine requires a callback function to execute after completion. The internal instructions of these routines are PHP built-in native functions or external extension library functions.

To handle these instructions and constraints, we need to bridge them together and mix in a Task with the help of another class, the Kernel. The kernel does the trampolining; it is what initiates the recursion and any system call. Python has a decorator process that can change the behavior of any function. PHP has something similar: any class can define magic methods, one of which, __invoke(), lets an instantiated object be called like a function.
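
This kernel/system-call pattern is the one David Beazley demonstrates in the course linked above; a condensed Python sketch of it (class names follow his talk, not the PHP library): a task yields a SystemCall object, and the scheduler, acting as the kernel, executes it on the task's behalf:

```python
from collections import deque

class SystemCall:
    # base "kernel call": executed by the scheduler on the task's behalf
    def handle(self, task, scheduler):
        raise NotImplementedError

class GetTid(SystemCall):
    # resume the task with its own id as the result of the yield
    def handle(self, task, scheduler):
        task.sendval = task.tid
        scheduler.ready.append(task)

class Task:
    next_tid = 1

    def __init__(self, gen):
        self.gen = gen
        self.tid = Task.next_tid
        Task.next_tid += 1
        self.sendval = None

    def step(self):
        return self.gen.send(self.sendval)

class Scheduler:
    def __init__(self):
        self.ready = deque()

    def new(self, gen):
        self.ready.append(Task(gen))

    def run(self):
        while self.ready:
            task = self.ready.popleft()
            try:
                result = task.step()
            except StopIteration:
                continue
            if isinstance(result, SystemCall):
                result.handle(task, self)  # the kernel trampolines the request
            else:
                self.ready.append(task)

def worker():
    tid = yield GetTid()  # a system call: ask the kernel "who am I?"
    print(f"task {tid} running")

sched = Scheduler()
sched.new(worker())
sched.new(worker())
sched.run()
```

The task never touches the scheduler directly; it only yields a request object, and the kernel decides how and when to resume it.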

Most of the time you will be making kernel calls to get anything performed. The kernel is what gives the supervisor task, our Event Loop, the list of checks to trigger on.

Let's pull out three things that our concurrent web server will do:

  • Get a socket, read from a file, write to the socket.

Each of these has the potential to block, so instead we take a different route, which depends on the platform we're using. Under Windows with native PHP, local file resources can't be put into non-blocking mode. Linux does not have this restriction.

The process we take for any request/action that has a blocking nature is to take the action's associated resource, tie it to a Task, and store the pair, then proceed to the next Task in the Queue. The stored resource pair will be processed by the supervisor task.

  • For maximum performance we can arrange for blocking code to run in a separate thread. To achieve this, functions from the cross-platform library libuv have been incorporated; the PHP ext-uv extension will need to be installed. Once installed, the supervisor task will call uv_run() to execute the libuv event loop.

  • In case libuv is not installed: on Windows, some requests/actions will be executed in a separate child process if they cannot be put into non-blocking mode. Linux does not have this issue; all requests/actions can be set to non-blocking. Thereafter, the supervisor task will perform a stream_select() call.
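
Python's selectors module wraps the same readiness mechanism that PHP's stream_select() exposes; a minimal sketch of parking a read until the OS reports the socket ready (the socketpair and the "resume-task-7" tag are stand-ins for a real connection and its stored Task):

```python
import selectors
import socket

sel = selectors.DefaultSelector()

# a connected socket pair stands in for a real client connection
server_side, client_side = socket.socketpair()
server_side.setblocking(False)  # sockets can always be made non-blocking

# "store the pair": register the resource together with the task to resume
sel.register(server_side, selectors.EVENT_READ, data="resume-task-7")

client_side.sendall(b"ping")    # the remote end eventually writes

# the supervisor's select call: returns only the resources that are ready
for key, _ in sel.select(timeout=1):
    print(key.data, "->", key.fileobj.recv(1024).decode())

server_side.close()
client_side.close()
```

Nothing runs while a resource is not ready; the select call is where the supervisor conditionally waits.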

When the supervisor task has nothing to check and nothing in the Queue, the whole system here will just stop and exit.

Since we are opening a socket connection, this script will not stop running.

Running Multiple Tasks

Now let's take a quick look at how to work with a bunch of tasks, which is the whole point of concurrency.

Take this block of code from Python Asyncio: Basic Fundamentals.

It shows how to schedule a few tasks using asyncio.gather(), and asyncio.sleep() is used to imitate waiting for a web response:

import asyncio

async def task(num: int):
    print(f"Task {num}: request sent")
    await asyncio.sleep(1)
    print(f"Task {num}: response arrived")

async def main():
    await asyncio.gather(*[task(x) for x in range(1, 4)])

if __name__ == '__main__':
    asyncio.run(main())

Will output:

Task 1: request sent
Task 2: request sent
Task 3: request sent
Task 1: response arrived
Task 2: response arrived
Task 3: response arrived

This PHP version will produce the same output.

  • The gather() and sleep_for() functions were created to behave the same as Python's specs have them.

include 'vendor/autoload.php';

function task(int $num) {
  print("Task {$num}: request sent" . EOL);
  yield sleep_for(1);
  print("Task {$num}: response arrived" . EOL);
}

function main() {
  yield gather(array_map('task', range(1, 3)));
}

coroutine_run(main());

  • todo go over all functions in our web-server script

  • todo show benchmark under Windows 10, PHP 8 no libuv, since currently no build version available

  • todo show benchmark under Windows 10, PHP 7.4 with libuv

  • todo show benchmark under WSL - Linux on Windows, PHP 7.3 with libuv

  • todo show benchmark under Raspberry Pi, PHP 7.3 with libuv


If you watch the above video, Curious Course on Coroutines and Concurrency, very closely starting at 1:49:30, and then read Nikita Popov's article/post Cooperative multitasking using coroutines (in PHP!), it seems this is where most of what he introduced into PHP 5.5 about yield and generators originated.

The only thing that seems strange is why, at the end, he states:

"When I first heard about all this I found this concept totally awesome and that’s what motivated me to implement it in PHP. At the same time I find coroutines really scary. There is a thin line between awesome code and a total mess and I think coroutines sit exactly on that line. It’s hard for me to say whether writing async code in the way outlined above is really beneficial."

What? From that time period to now, Python created a whole ecosystem around the concept and fully developed async/await from it.


There is a big issue with PHP and why it might be looked down on. Basically, to me, it is the ad-hoc nature of its origins, and how many users still write code by just copying and following. They seem to have no computer-science-like background, in which case this might be of interest:

Teach Yourself Computer Science goes into why to learn computer science, recommending: