Replies: 6 comments
-
Wow, this seems like a big new (open) feature! Quick question: I read that you pass the CDP commands to the Selenium WebDriver. Is this necessary? Is passing them directly to the browser much more complex?
-
Hi @6DiegoDiego9 - yeah, I too think it opens up a lot of custom functionality. I think this method is the simplest way from a development perspective, but it of course has the disadvantage of having to manage the Selenium WebDriver(s) and all that that entails. The other two ways that I know of (and I know you are aware of too) for accessing the CDP are using your demonstrated DebuggerAddress technique to attach to an existing browser and then getting a reference to the CDP WebSocket for communication, or using the "pipes method" to communicate directly from VBA. I suppose that if I were building a very narrow/specific automation application, I would probably lean toward the latter method in order to do away with the WebDriver complications (install, version alignment, etc.). But for a very general tool like SeleniumVBA, I personally think what we have here is the easiest in terms of maintainability.
-
After experimenting with the ExecuteCDP method for a short while, two potential limitations have arisen with using the Selenium WebDriver as a means of getting full access to the CDP functionality...
Detecting and Responding to CDP Events. There seems to be no elegant way that I know of to detect and respond to CDP events using WebDriver as the intermediary, which could be important for developing new methods. There only seems to be a WebDriver endpoint for sending commands and receiving the results. This is in contrast to the "piping" methods used to connect directly to the browser via a WebSocket interface, such as VBAChromeDevProtocol. The latter does appear to support CDP event callbacks, but I have not tested to see if/how well it works.
DOM Element Interaction. The WebDriver and CDP indexing schemes for tracking/referencing elements in the DOM tree are different, and I was not able to find an elegant way to map between the two. This is a problem if we wish to mix and match between the two sets of functionality, such as locating an element using one system and then interacting with the same element using the other. Thus the CDP may be less useful or practical for improving on functionality that manipulates the DOM.
With those two limitations noted, there still seem to be a significant number of useful things we can do with ExecuteCDP - a few of which I have demonstrated in test_ExecuteCDP and in the Advanced Customization section of the SeleniumVBA Wiki. Cheers!
-
I experimented a bit too with the DOM Element Interaction, and I was only able to convert a CDP ID to a JavaScript ID and vice versa. I didn't find a way to bridge between them and the WebDriver ID. Here is what I obtained for the same element:
Even GPT-4 temp0 just confirms what you already noted: https://cloud.typingmind.com/share/0cd848a8-597d-4895-86fe-54251c918246
-
Thanks for looking into this and sharing what you found @6DiegoDiego9. Again, wow - that's pretty amazing what GPT-4 can do! That is such a coherent response - I'm tempted to insert it here in the discussion in order to memorialize it...
-
On second thought - does it add any value to the current and future world to memorialize a GPT response? Knowledge is now an open-source commodity! :-)
-
As of version 4.2, SeleniumVBA supports communicating with the Chrome and Edge browsers via the Chrome DevTools Protocol (CDP), which provides a low-level interface for browser interaction. There are hundreds of CDP commands that can be used to customize SeleniumVBA automation. For a complete list of CDP commands, see https://chromedevtools.github.io/devtools-protocol/tot/.
This new functionality is exposed through the ExecuteCDP method of the SeleniumVBA WebDriver class. To use ExecuteCDP, specify the command name (such as Page.captureScreenshot) and, if input arguments are required, a Dictionary object containing the input parameter key names and values. The return payload is always a Dictionary with a root key of "value". The ExecuteCDP method packages the input arguments into a CDP JSON object, sends the command to the Selenium WebDriver through one of two special HTTP endpoints (one for Edge and one for Chrome), and then receives the payload from the Selenium WebDriver. The process is not unlike what is performed internally when executing a normal WebDriver command. Beyond that, it is the user's responsibility to research how to construct the CDP command name and input parameters, and how to parse the return payload.
A few examples
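To make the packaging step described above a bit more concrete, here is a rough, language-neutral sketch (in Python, not SeleniumVBA's actual VBA code) of what the request and reply look like on the wire. It assumes the vendor-prefixed endpoints that, to my understanding, chromedriver ("goog") and msedgedriver ("ms") expose for CDP, and the {"cmd", "params"} body shape. The helper names build_cdp_request and screenshot_bytes, the port, and the session id are hypothetical and only there for illustration. Nothing is actually sent to a driver.

```python
import base64
import json

# Assumption: vendor prefixes of the drivers' CDP endpoints
# (chromedriver uses "goog", msedgedriver uses "ms").
VENDOR_PREFIX = {"chrome": "goog", "edge": "ms"}


def build_cdp_request(base_url, session_id, browser, cmd, params=None):
    """Return (url, json_body) for a CDP command relayed through the WebDriver.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    prefix = VENDOR_PREFIX[browser]
    url = f"{base_url}/session/{session_id}/{prefix}/cdp/execute"
    # The command name and its parameters travel together in one JSON object.
    body = json.dumps({"cmd": cmd, "params": params or {}})
    return url, body


def screenshot_bytes(payload):
    """Decode the base64 image carried in a Page.captureScreenshot reply.

    The reply is wrapped under a root "value" key; the image itself is the
    base64 string in the "data" field.
    """
    return base64.b64decode(payload["value"]["data"])


# Illustrative values only: the port and session id are made up.
url, body = build_cdp_request(
    "http://localhost:9515", "abc123", "chrome",
    "Page.captureScreenshot", {"format": "png"},
)
print(url)
print(body)
```

The same two steps apply to any other command: look up the parameter names in the CDP documentation, pass them as key/value pairs, and then dig the result out from under the "value" key.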
Below is an example of executing one such command to produce an enhanced version of the SaveScreenshot method, with more options.
Another example shows the use of the Page.setDownloadBehavior command, which allows us to redirect the browser's download directory AFTER capabilities have been set and the OpenBrowser command has been executed...
CDP versus supported WebDriver commands
There is obviously overlap between the two buckets of functionality, so it's probably worth comparing them. Here are some of my thoughts on the differences, but I would be interested in hearing from others...
Currently, CDP can only be used with Edge and Chrome browsers - so Firefox gets left out of cross-browser compatibility.
CDP is for testing/automation at a lower level of interaction with the browser. As a result, programming effort is generally greater for CDP. You can use CDP commands to emulate most if not all Selenium WebDriver commands, but the reverse is not true. In SeleniumVBA, we could wrap the CDP into new methods or enhanced versions of existing methods.
Due to its low-level nature, CDP is a relatively fast-moving target as the browsers evolve. On the other hand, the Selenium WebDriver's public-facing interface, especially for commands that conform to the W3C standard, is more stable: a new WebDriver version is issued for each new browser version, and the interface stays mostly unchanged. So I would guess that for an application (such as SeleniumVBA) that extensively uses either or both of these two functionalities, maintaining an interface that relies heavily on CDP will likely require more effort than maintaining the WebDriver command-based interface.
So should we use the CDP to design new or enhanced methods? My opinion is a qualified "yes", if we can find compelling use-cases...
I've included a few other examples besides the ones shown above in the test_ExecuteCDP module.
Any comments or ideas? Have something to share? Looking for compelling or interesting use cases - if you have one please post it here.