Friday, February 3, 2017

Running a Selenium Grid with docker in swarm mode



Selenium Grid combined with a Docker cluster in swarm mode is a simple yet powerful way to run and scale Selenium infrastructure quickly.

The advantage of using Docker is that we don't have to deal with the hassle of manual installation and configuration. All you need is Docker and the Selenium Docker images.


What is docker

Docker is an open-source program that packages a Linux application and its dependencies as a container. Container-based virtualization isolates applications from each other on a shared operating system (OS). This is just a general picture of the motivation behind Docker. For detailed documentation and installation guides, visit the official Docker site.

Selenium Grid configuration

Let me remind you of the Selenium Grid architecture. The entry point of a Selenium Grid is the Selenium Hub. Your test cases hit the hub, which picks a matching browser from those available in the Grid based on the DesiredCapabilities your test requests.
The other elements are nodes: machines that, once registered with the hub, can execute your test cases.




To create a Docker container with a hub on your local machine, pull and run the image from the Docker repository:
$ docker run -d --name selenium-hub -p 4444:4444 selenium/hub
The command above downloads the Selenium hub image and starts the hub on port 4444 locally. To see the grid console, open http://localhost:4444/grid/console. At this point the console is empty because we haven't created any nodes.
Let's create two Selenium nodes, one with Firefox and another with Chrome. To create and start the Firefox container, execute the command below:
$ docker run -d --name node-firefox --link selenium-hub:hub selenium/node-firefox
And for Chrome:
$ docker run -d --name node-chrome --link selenium-hub:hub selenium/node-chrome
Let's check the status of the containers by running:
$ docker ps
You should see something like this:

And let's also check a selenium grid console:


Provisioning Docker Swarm Mode Cluster  

Everything is fine until you decide to put your Selenium nodes on separate machines.
A simple --link works when the hub and the node run on the same machine, but not when they sit on different machines and don't know about each other's existence. Without --link, the environment variables it would normally create are missing, so we have to pass the hub's address and port to the node ourselves as environment variables when starting it:
$ docker run -d -e HUB_PORT_4444_TCP_ADDR=<hub-ip-address> -e HUB_PORT_4444_TCP_PORT=<4444-usually> selenium/node-chrome
entry_point.sh then uses this information to register the node with the hub. But the hub also needs the address of the node to poll its status. We can provide it with SE_OPTS='-host <outside-ip-of-your-node-vm>'.
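
Putting these pieces together, starting a node on a remote machine might look like the sketch below. The function name start_remote_chrome_node is hypothetical, all three arguments are values you supply yourself, and publishing port 5555 assumes the node listens on its default port.

```shell
# Start a Chrome node on a machine other than the hub's.
# Arguments: hub address, hub port, address of this node as seen by the hub.
start_remote_chrome_node() {
    hub_addr="$1"
    hub_port="$2"
    node_addr="$3"
    # -p 5555:5555 is an assumption: it exposes the node's default port
    # so that the hub can reach the node to poll its status.
    docker run -d \
        -e HUB_PORT_4444_TCP_ADDR="$hub_addr" \
        -e HUB_PORT_4444_TCP_PORT="$hub_port" \
        -e SE_OPTS="-host $node_addr" \
        -p 5555:5555 \
        selenium/node-chrome
}

# Usage (placeholders, not real addresses):
#   start_remote_chrome_node <hub-ip-address> 4444 <outside-ip-of-your-node-vm>
```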

If you are working with multiple containers, docker-compose is your best friend.

Let's create a cluster and provision it with docker-compose.


Step 1. Initialize a swarm

$ docker swarm init

The output contains a token which we will need when we add workers and/or managers to the swarm.
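
If the output of docker swarm init has scrolled away, the join command can be printed again on the manager with docker swarm join-token. A minimal sketch; the helper name print_join_command is only for illustration:

```shell
# Print the complete "docker swarm join ..." command for a given role.
# Argument: worker (default) or manager.
print_join_command() {
    role="${1:-worker}"
    docker swarm join-token "$role"
}

# Usage, on the manager:
#   print_join_command           # join command for workers
#   print_join_command manager   # join command for additional managers
```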


Step 2. Add separate machines to the cluster

To add separate machines to the cluster, run on each of them the command we received in the previous step:

$ docker swarm join \
    --token SWMTKN-1-3unut2yz2f9fgecexcchunh90m3f4bd5t5r534jgq337352p5s-632cf05url6hatcj7s0i5shig \
    192.168.65.2:2377

Step 3. Provisioning with a docker-compose

We need to create a docker-compose.yml file with the following content:
version: '3'
networks:
  private:
    driver: overlay
services:
  hub:
    image: selenium/hub:${SELENIUM_VERSION}
    ports:
      - "$SELENIUM_PORT:$SELENIUM_PORT"
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
    environment:
      - GRID_BROWSER_TIMEOUT=60000
      - GRID_TIMEOUT=60000
      - GRID_MAX_SESSION=50
      - GRID_MAX_INSTANCES=3
      - GRID_CLEAN_UP_CYCLE=60000
      - GRID_UNREGISTER_IF_STILL_DOWN_AFTER=180000
      - GRID_NEW_SESSION_WAIT_TIMEOUT=60000
    networks:
      - private
  firefox:
    image: selenium/node-firefox:${SELENIUM_VERSION}
    volumes:
      - /dev/urandom:/dev/random
    depends_on:
      - hub
    environment:
      - HUB_PORT_4444_TCP_ADDR=hub
      - HUB_PORT_4444_TCP_PORT=${SELENIUM_PORT}
      - NODE_MAX_SESSION=1
    entrypoint: bash -c 'SE_OPTS="-host $$HOSTNAME -port 5555" /opt/bin/entry_point.sh'
    ports:
      - "5555:5555"
    deploy:
      replicas: 1
    networks:
      - private

  chrome:
    image: selenium/node-chrome:${SELENIUM_VERSION}
    volumes:
      - /dev/urandom:/dev/random
    depends_on:
      - hub
    environment:
      - HUB_PORT_4444_TCP_ADDR=hub
      - HUB_PORT_4444_TCP_PORT=${SELENIUM_PORT}
      - NODE_MAX_SESSION=1
    entrypoint: bash -c 'SE_OPTS="-host $$HOSTNAME -port 5556" /opt/bin/entry_point.sh'
    ports:
      - "5556:5556"
    deploy:
      replicas: 1
    networks:
      - private

I've defined SELENIUM_VERSION and SELENIUM_PORT as environment variables, so if we later decide to update the Selenium version or change the hub port, we don't need to change the docker-compose.yml file. Note how the entrypoints of the Selenium nodes are defined. This trick sets the node's host to the hostname of the Docker container. We need that because all the nodes are in one cluster, and setting the host to the outside address of the VM won't help.
In our case all the Chrome nodes listen on port 5556 and all the Firefox nodes on port 5555.
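
Both variables have to be set in the shell that later deploys the stack. A small sketch, assuming the latest image tag and the hub's usual port 4444 as defaults; both are example values, and the function name set_selenium_defaults is hypothetical:

```shell
# Set defaults only when the variables are not already defined.
set_selenium_defaults() {
    : "${SELENIUM_VERSION:=latest}"   # image tag (example value)
    : "${SELENIUM_PORT:=4444}"        # hub port (example value)
    export SELENIUM_VERSION SELENIUM_PORT
}

set_selenium_defaults
echo "Selenium image tag: $SELENIUM_VERSION, hub port: $SELENIUM_PORT"
```

Values that are already exported in the environment take precedence over these defaults.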

Step 4. Deploy a docker stack

To deploy a Docker stack we only need to execute one simple command:
$ docker stack deploy --compose-file=/opt/selenium/docker-compose.yml selenium
If everything went smoothly, you can check docker ps on both machines or open the Selenium Grid console directly in your browser at http://localhost:4444/grid/console. As a result, just like before, there is a hub with two nodes, but this time the configuration is defined in one file, can be run with one command, and the nodes are on separate machines. The docker-compose file can now be added to your repository and reused.
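
On a swarm, the manager can also report where each task landed without logging in to every machine. A sketch using the stack name selenium from the deploy command; the wrapper function show_stack is hypothetical:

```shell
# Show the services of a stack and the machine each task runs on.
# Argument: stack name (defaults to "selenium").
show_stack() {
    stack="${1:-selenium}"
    docker stack services "$stack"   # one line per service with its replica count
    docker stack ps "$stack"         # one line per task, including the node it runs on
}

# Usage, on the manager:
#   show_stack
```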

Step 5. Scaling

As our test base grows, two nodes may be far from enough. Luckily, docker service comes with a great feature that lets us scale the number of similar containers on the fly. If your two-node grid is running and you want to increase the number of Chrome nodes to five, enter the command:
$ docker service scale selenium_chrome=5
Now we have 4 more containers with Chrome nodes, registered to our hub and deployed on separate machines.
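
Scaling is asynchronous: new tasks may stay in a preparing state while images are pulled on the workers. The sketch below polls docker service ls until all desired replicas are running. The function name and the retry defaults are assumptions for illustration, and the --format option requires a Docker version that supports it.

```shell
# Poll until every desired replica of a service is running.
# Arguments: service name, max attempts (default 30), delay in seconds (default 2).
wait_for_replicas() {
    service="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # The Replicas column looks like "3/5" (running/desired).
        replicas="$(docker service ls --filter "name=$service" --format '{{.Replicas}}' | head -n 1)"
        running="${replicas%%/*}"
        desired="${replicas##*/}"
        desired="${desired%% *}"   # drop any trailing annotation after the count
        if [ -n "$replicas" ] && [ "$running" = "$desired" ]; then
            echo "$service: $replicas"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$service: still at ${replicas:-unknown} after $attempts attempts" >&2
    return 1
}

# Usage, on the manager:
#   docker service scale selenium_chrome=5 && wait_for_replicas selenium_chrome
```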


Do you have any questions? Feel free to post a comment. I will try to do my best to answer it as soon as possible.

4 comments :

  1. I can see everything running on the node, the swarm worker tasks( browser node) are not visible on hub. when I do $docker stack ps selenium , I see:

    ID            NAME                                    IMAGE                         NODE            DESIRED STATE  CURRENT STATE             ERROR  PORTS
    xae5hr2xmo1p  selenium_hub.n6m3q7z0loy0ibiq113cze3ye  selenium/hub:latest           amit            Running        Running 21 minutes ago
    6faff7l67vy3  selenium_chrome.1                       selenium/node-chrome:latest   ubuntu-worker1  Running        Preparing 21 minutes ago
    ts1hlx38vlt2  selenium_firefox.1                      selenium/node-firefox:latest  amit            Running        Running 21 minutes ago
    msiz8h5cqi48  selenium_chrome.2                       selenium/node-chrome:latest   ubuntu-worker3  Running        Preparing 7 minutes ago

    Replies
    1. Could you please give a little bit more information? What are you trying to do and what do you want to see?

  2. I disagree with running your hub on the swarm manager, but, nice write up none-the-less.
    Personally, I feel as though the swarm manager should not be running containers other than the management ones it needs to run to run the swarm itself. That is the very purpose (and sole purpose) of the swarm manager. All of your containers and services should be run on swarm workers.
    If you're using a load balancer and virtual machine scale sets to build your workers, your traffic will never see the manager (this is as intended), and, access to the manager should be very tightly restricted. This is controlling your whole cluster of services.
    Typically I use docker node update --availability drain master to remove the master vm from ever being in contention to running service. That is the reason that command exists, even.

  3. Hi,

    I have a question...I am able to set up swarm successfully, but how to view the running browser/container in vnc. Do we need to open any ports?.

    The command 'docker ps -a' is not displaying any port number for the Chrome and Firefox containers.

    Thanks
