prototype: headless-browser orchestration server, for robotic-process-automation

BenMullan/carpe-datum

carpe-datum

...is a prototype headless-browser orchestration server (and improper Latin for "seize the data").

Watch the "JavaScript in RPA" YouTube Video!



Err... what's going on

I spent 5 months with a process-automation team. They ran an amazing piece of flowcharting software on a farm of VMs, to simulate mouse-clicks and keystrokes - and run processes automatically (eg data-entry into a web-based business system), without paying humans to perform them. This system was an at once beguiling and stupefying conflation of numerous bizarre and anachronistic technologies: Win32-spying, remote DOM-inspection, VB6-style expression functions, .NET Remoting, and even the obscure "Visual J#"!

But it was precarious. And slow. Browsers would become detached during execution. It would completely & inexplicably freeze-up during use. It couldn't effectively handle multiple browser-windows. XPath-mapped elements would become unfindable after UI-updates. Subtle differences in environment would cause certain unattended executions to fail, and it would be near-impossible to catch what went wrong. Data would be perilously plucked from inconsistently-formatted Excel spreadsheets, and run through fragile type-coercion. I wanted to do better.

This prototype proved that a 5-minute Blue Prism process could be executed in under 20 seconds with JavaScript.

JavaScript, you say...?

Imagine: instead of 50 VMs; one server, and 50 headless browsers.

An easy-to-use UI with a library of pre-defined baps (browser-automated processes - eg entering business data and scraping some output), and live previewing & interaction with the browser-pool. An HTTP API for triggering & scheduling bap executions on browsers from the pool. Robustly-implemented processes with watertight JavaScript, using Playwright to manipulate the DOM directly, instead of prodding at the UI from above. Execution traces capturing precise screenshots & DOM-state at each stage. Consistent, schema-validated process input- and output-data. (Oh, and you'd save ~£500,000 on Blue Prism licensing costs too).

ui-screenshot

...and a dynamically-sized browser-pool...

ui-screenshot

What we learned

if only this one had come up in the A-level...

  • For a process to run as robustly as possible, it needs to interface with the system at the lowest available layer; JavaScript enables direct manipulation of the DOM underlying the UI.
  • To use Blue Prism most effectively, you need in-depth programming knowledge (ie Visual Basic, webpage structuring, and HTTP API mechanisms). But if you have this, why remain tied to Blue Prism? You could escape the sluggishness, precarity, and extortionate cost - in exchange for free, unfettered, democratised code.
  • Suffice it to say, there exists a skills-gap between the disciplines of blue-prism-operation and playwright-scripting; a skills-gap likely to take some time to bridge in most working environments.
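
To make the first point concrete, here is a minimal sketch of what a bap script might look like. It is not taken from the carpe-datum bap-library: the function name, the input and output fields, the URL path and the selectors are all hypothetical. The only things assumed about `page` are the standard Playwright Page methods `goto`, `fill`, `click` and `textContent`.

```javascript
// Hypothetical bap: the function name, input/output fields, URL path and
// selectors are illustrative assumptions, not carpe-datum's real code.
// `page` is assumed to behave like a Playwright Page object.
async function lookupOrderStatus(page, input) {
  // Go straight to the page we need, rather than clicking through menus.
  await page.goto(input.baseUrl + "/orders");

  // Set the field's value via the DOM, instead of simulating keystrokes.
  await page.fill("#order-id", input.orderId);
  await page.click("#search");

  // Read the result from the DOM element itself (no screen-scraping).
  const status = await page.textContent("#order-status");

  // Return plain data, which a server could validate against an output schema.
  return { orderId: input.orderId, status: (status ?? "").trim() };
}
```

In real use, the `page` would presumably come from Playwright itself, for example by attaching to an already-running browser with `chromium.connectOverCDP(...)`. Taking the page as a parameter keeps the script independent of how the browser was obtained.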

This project

code-screenshot

Amongst the most important code is...

To use this software...

How it worketh

  • A cd-server runs the carpe-datum-service, which listens for bap-execution requests on an HTTP API.
  • The server maintains a pool of headless Chromium instances, which are comandeer()ed and relinquish()ed as required.
  • The server has a bap-library (a folder of Playwright scripts and process-data schema definitions, for different browser-based processes).
  • A client somewhere makes a *start-new execution POST request; this contains execution-parameters (eg whether to use a headed/headless browser) and process-input-data (eg the string to inject into the Google search box). The client can then make a *wait-for-exit long-polling request, to determine when the bap-execution has finished.
  • On receipt of a *start-new execution request (eg POST /api/baps/google-search-demo/executions/*start-new), the server validates the input-data against the schema defined for the specified bap, and creates a new execution folder, with an .execution-in-progress flag file. A bap-execution-worker process is instantiated (eg bap-execution-worker --cd-base-dir="..." --bap-name="google-search-demo" --execution-id="67f36fb23c9e" --target-browser-cdp-endpoint="http://localhost:9294"), and the server vigilantly captures this child-process's stdout/err and exit-code.
  • After execution, the execution endpoint (eg GET /api/baps/google-search-demo/executions/67f36fb23c9e) returns an object describing the execution-duration, -exit-reason, -error-state, and any process-output-data (eg a value scraped from the webpage).
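
The pool and worker steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual implementation: the `BrowserPool` class, its internals, and `buildWorkerArgs` are hypothetical. Only the `comandeer()` / `relinquish()` names and the four worker flags come from the description above.

```javascript
// Sketch only: BrowserPool's internals and buildWorkerArgs are assumptions.
// The comandeer()/relinquish() names and the worker flags are from the README.
class BrowserPool {
  constructor(cdpEndpoints) {
    this.free = [...cdpEndpoints]; // endpoints of idle browsers
    this.busy = new Set();         // endpoints currently running a bap
  }

  // Reserve an idle browser; returns null when every browser is in use.
  comandeer() {
    const endpoint = this.free.shift();
    if (endpoint === undefined) return null;
    this.busy.add(endpoint);
    return endpoint;
  }

  // Hand a browser back to the pool once its execution has exited.
  relinquish(endpoint) {
    if (this.busy.delete(endpoint)) this.free.push(endpoint);
  }
}

// Build the argument list for one bap-execution-worker child process.
function buildWorkerArgs({ cdBaseDir, bapName, executionId, cdpEndpoint }) {
  return [
    `--cd-base-dir=${cdBaseDir}`,
    `--bap-name=${bapName}`,
    `--execution-id=${executionId}`,
    `--target-browser-cdp-endpoint=${cdpEndpoint}`,
  ];
}
```

A server built this way could pass the array to Node's `child_process.spawn("bap-execution-worker", args)`, collect stdout/stderr from the child, and call `relinquish()` in the child's `exit` handler so the browser is freed even if the bap fails.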

In other words, this prototype provides an interface for a process's input- and output-data, which is completely abstracted from the nitty-gritty of the process's execution. You don't have to see the process - and it doesn't even have to run on your computer; as long as it's robustly implemented in JavaScript, it can heedfully process as much data as you fancy, without you touching it once.
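
From the client's side, that might look something like the sketch below. Treat it as a guess at the shape of the exchange rather than documentation: the request-body fields, the `executionId` response field, and the exact URL of the *wait-for-exit request are assumptions. Only the *start-new and GET execution paths are described above. `fetchImpl` defaults to the global `fetch` available in recent Node versions and browsers.

```javascript
// Hypothetical client. ASSUMPTIONS: the request-body shape, the `executionId`
// response field, and the *wait-for-exit URL layout. Only the *start-new and
// GET .../executions/<id> paths are taken from the README.
async function runBap(baseUrl, bapName, inputData, fetchImpl = fetch) {
  const executionsUrl = `${baseUrl}/api/baps/${encodeURIComponent(bapName)}/executions`;

  // 1. Ask the server to start a new execution of the bap.
  const startResponse = await fetchImpl(`${executionsUrl}/*start-new`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ executionParameters: { headless: true }, inputData }),
  });
  if (!startResponse.ok) {
    throw new Error(`start-new failed with HTTP ${startResponse.status}`);
  }
  const { executionId } = await startResponse.json();

  // 2. Long-poll until the worker process has exited.
  await fetchImpl(`${executionsUrl}/${executionId}/*wait-for-exit`);

  // 3. Fetch the finished execution (duration, exit-reason, output-data...).
  const resultResponse = await fetchImpl(`${executionsUrl}/${executionId}`);
  return resultResponse.json();
}
```

The caller only ever handles input-data and output-data; which browser ran the process, and where, stays entirely on the server.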




Ben Mullan 2024