WebDriver as I know it

I have often seen people ask the same question in different forms: “How does WebDriver work?”

Here’s how I understand it. Of course, I could never come close to the crystal-clear explanation that Simon Stewart shared in this link, so please do spend some time reading it to understand things in more depth.

Here’s how it works at a high level.

You have a client binding based on your choice of language. The client binding gives you wrapper methods that let you interact with the web browser through Selenium in an API-like fashion.
You have a server component that adheres to the JSONWireProtocol and performs the actual actions on the web browser in response to JSONWireProtocol-compliant calls.

The server component is implemented in different forms for different browsers:

  • IE – IEDriverServer
  • Chrome – ChromeDriver
  • Firefox – a Firefox plugin
  • Safari – a Safari extension
  • iOS – ios-driver, a standalone Java application that internally interacts with either the device or the simulator.
  • Android – Selendroid, again a standalone Java application that internally interacts with either the device or the emulator.

This is how a typical flow happens.

When you do this :

WebDriver driver = new FirefoxDriver();

The constructor of FirefoxDriver internally does the following :

  • Locates the Firefox browser executable.
  • Extracts the custom WebDriver Firefox plugin, which is embedded inside the codebase, into a temp directory, creates an anonymous Firefox profile that includes this plugin, and then feeds that profile to the Firefox browser.
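
The profile-preparation step can be pictured with a small sketch. This is not Selenium's actual code: the class `ProfileSketch`, its method names, the `extensions/webdriver.xpi` layout, and the idea of handing the profile over via the command line are all assumptions made for illustration. The plugin bytes are passed in as a parameter rather than read from the real codebase.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative only: mimics "extract the embedded plugin into a fresh temp profile".
final class ProfileSketch {

    // Creates a throwaway profile directory and writes the plugin into it.
    // The "extensions/webdriver.xpi" layout is an assumption, not Selenium's real layout.
    static Path prepareProfile(byte[] pluginBytes) {
        try {
            Path profileDir = Files.createTempDirectory("anonymous-webdriver-profile");
            Path extensionsDir = Files.createDirectories(profileDir.resolve("extensions"));
            Files.write(extensionsDir.resolve("webdriver.xpi"), pluginBytes);
            return profileDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a command line that would start the browser with that profile.
    // Firefox accepts "-profile <dir>"; whether FirefoxDriver passes the
    // profile this way is an assumption of this sketch.
    static List<String> launchCommand(Path firefoxBinary, Path profileDir) {
        return List.of(firefoxBinary.toString(), "-profile", profileDir.toString());
    }
}
```

Calling `ProfileSketch.prepareProfile(bytes)` and then passing the result of `launchCommand(...)` to a `ProcessBuilder` would roughly correspond to the "feeds that to the Firefox browser" step.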

When you do something like this :


This call is translated by the client bindings into the JSONWireProtocol command for navigating to a new URL, which is then executed against the server component for the browser [ in this case, the Firefox WebDriver plugin embedded in the Firefox browser that was spawned ]. The server component translates the JSONWireProtocol command into the appropriate action on the browser [ in this case, it enters the URL in the browser’s address bar and loads the page ].
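
To make that translation concrete, here is a minimal sketch of the HTTP call that a "load this URL" request turns into under the JSONWireProtocol: a `POST` to `/session/{sessionId}/url` with the target URL in a JSON body. The class `WireCommand` and its members are hypothetical, and the session id and URL in the usage note are placeholders. The real client bindings go through their own command executor and a proper JSON library.

```java
// Illustrative only: a value object describing one JSONWireProtocol HTTP call.
final class WireCommand {
    final String method;   // HTTP verb
    final String path;     // path relative to the server component's base URL
    final String jsonBody; // JSON payload sent with the request

    WireCommand(String method, String path, String jsonBody) {
        this.method = method;
        this.path = path;
        this.jsonBody = jsonBody;
    }

    // What a binding's "load this URL" call boils down to on the wire:
    // POST /session/{sessionId}/url with a body of {"url": "..."}.
    static WireCommand navigateTo(String sessionId, String url) {
        return new WireCommand(
                "POST",
                "/session/" + sessionId + "/url",
                "{\"url\": \"" + escape(url) + "\"}");
    }

    // Minimal JSON string escaping (backslashes and double quotes only).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    @Override
    public String toString() {
        return method + " " + path + " " + jsonBody;
    }
}
```

For example, `WireCommand.navigateTo("abc123", "http://example.com/")` describes a `POST` to `/session/abc123/url`. The server component receives that request and performs the navigation in the browser.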

WebDriver is slowly moving towards an architecture in which the browser vendors are expected to provide the server component against which the client bindings execute.

ChromeDriver is the result of that initiative. Microsoft is also working on a similar server component for IE [ http://www.microsoft.com/en-in/download/details.aspx?id=44069 ]. I believe this would replace the IEDriverServer binary that Jim Evans created for IE.

For Firefox, I believe there is an initiative called Marionette being worked on [ https://developer.mozilla.org/en-US/docs/Mozilla/QA/Marionette ].
Once that is completely rolled out, the approach across all the browsers would become the same.



4 thoughts on “WebDriver as I know it”

  1. Nice post, thanks for sharing. I think you forgot to factor in the case of RemoteWebDriver and Grid. In all the local driver cases, the client and server talk directly (for FF and Safari, they feel like one and the same to the end user, who doesn’t have to do much or link an external server executable). But for RemoteWebDriver, the server JAR acts as a middleman or proxy between the client and server components. RemoteWebDriver in standalone mode would be the simplest case of Grid (a grid of 1 node, so really not a grid at all). And with Grid you have a more complex proxy setup where the proxy server/host farms out the work to the actual servers behind the scenes.

    Posted by autumnator | April 5, 2015, 7:55 am
  2. Also wanted to mention when looking at how WebDriver works, considering the client, server, and JSONWireProtocol aspects, it can be an implementation of a (REST) web service that just happens to perform web UI automation. The client bindings/components are the custom web service clients, the server components are specific implementations of the web service (e.g. Amazon AWS but other companies have replicated Amazon’s AWS API for internal cloud implementations with their software, or like .NET where you have Microsoft version and the open source Mono version), and JSONWireProtocol is basically the custom REST web service API for WebDriver.

    Seeing it that way, you just really need the server components (though generally in tandem with the server JAR for Remote WebDriver is easiest) to be able to use WebDriver by directly making JSONWireProtocol API calls instead of the nice wrapped client bindings. That’s what developers of unofficial 3rd party client bindings of WebDriver do, call the API and provide a nice wrapper to it to make it easier for general users to use it.

    So those who are interested in learning more of the internals of WebDriver should first better understand REST and web services if they don’t already.

    You can even debug/analyze the WebDriver client and server interactions if you run a proxy server or network data capture on your machines or network when running the WebDriver tests, seeing in real time the actual JSONWireProtocol calls/requests made and the responses returned.

    Posted by autumnator | April 5, 2015, 8:06 am
    • It’s also kind of fun to learn about and work with WebDriver internals by building something that interfaces with it. Those interested should try working on an unofficial 3rd party client binding for WebDriver, or try to fix/patch/enhance an official client binding (or a server component).

      Or try to build a tool that is compatible with WebDriver/JSONWireProtocol API, basically a tool that automates or does something else but uses the same WebDriver API or client bindings. For this latter case, examples are most notably Appium, ios-driver, Selendroid. Another example would be https://github.com/daluu/AutoItDriverServer

      Posted by autumnator | April 5, 2015, 8:10 am
