Crate with Docker & Weave
Please welcome our guests from the Crate team; they have a very exciting Big Data use case!
— Ilya
Introduction
Crate is a massively scalable, distributed, and highly available database that leverages the power of Lucene for lightning-fast response times, while you access it using familiar SQL. Crate’s distributed SQL planner analyzes and optimizes your queries for maximum efficiency in a distributed environment. You can think of it as SQL for Elasticsearch. Because Crate is designed from the ground up for scalability, it is also perfectly suited for containerization, in our case with Docker.
Weave is a software-defined networking (SDN) technology that integrates with Docker and is particularly easy to use. Weave allows developers to create overlay networks tailored to their applications without needing to touch the underlying infrastructure. Among other features, it makes multicast-based discovery work in any cloud, including clouds whose own networks do not support multicast.
This post will give you a step-by-step guide to using Docker and Weave to build a Crate cluster on Google Compute Engine.
Launch Instances
Launching instances on Google Compute Engine is simple. Just make sure you have the gcloud command-line utility (part of the Google Cloud SDK) installed.
If you prefer to put your Crate instances in a separate network, you can of course do so, but make sure that TCP port 4300 (used by Crate) and TCP and UDP port 6783 (used by Weave) are open to all hosts on that network.
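If you do use a dedicated network, a firewall rule along the following lines should open those ports between the instances. This is a sketch and not part of the original setup. The rule name `allow-crate-weave-internal` and the helper `allow_internal_traffic` are hypothetical. You have to supply your own project, network name, and internal source range (CIDR).

```bash
# Hypothetical helper: opens Crate's transport port (TCP 4300) and Weave's
# port (TCP and UDP 6783) to hosts inside the given network.
# Arguments (all assumptions you must adapt to your setup):
#   $1 project name, $2 network name, $3 internal source range (CIDR)
allow_internal_traffic() {
  local project="$1" network="$2" source_range="$3"
  gcloud compute firewall-rules create allow-crate-weave-internal \
    --project "$project" \
    --network "$network" \
    --allow tcp:4300,tcp:6783,udp:6783 \
    --source-ranges "$source_range"
}
```

Usage, with placeholders: `allow_internal_traffic YOUR_PROJECT_NAME YOUR_NETWORK YOUR_INTERNAL_RANGE`.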
For the purpose of this blog post, we’ll use three n1-standard-4 instances with a 12 GB SSD root disk running CentOS 7, which conveniently has the Docker package available in its standard repository.
$ gcloud compute instances create weave-{1..3} --project YOUR_PROJECT_NAME --zone us-central1-a --machine-type n1-standard-4 --image centos-7 --boot-disk-type pd-ssd --boot-disk-size 12GB --metadata-from-file startup-script=user-data.sh
The startup script user-data.sh is a minimal bash script containing just the commands needed to bootstrap the instances, i.e. installing Docker and Weave and pulling the required images:
#!/bin/bash

# update repos
yum update -y

# install docker
yum install -y docker
service docker start

# install weave
curl --silent --location --output /usr/sbin/weave https://github.com/zettio/weave/releases/download/latest_release/weave
chmod a+x /usr/sbin/weave

# pull docker images
docker pull crate:latest
docker pull zettio/weave:latest
Once the instances have started, you can list them using the gcloud utility:
$ gcloud compute instances list
The output should look similar to this:
NAME     ZONE           MACHINE_TYPE   INTERNAL_IP    EXTERNAL_IP     STATUS
weave-1  us-central1-a  n1-standard-4  10.240.81.152  23.251.159.143  RUNNING
weave-2  us-central1-a  n1-standard-4  10.240.60.209  23.251.158.46   RUNNING
weave-3  us-central1-a  n1-standard-4  10.240.82.60   23.251.145.69   RUNNING
Launch Weave Network
Now we’re ready to launch Weave and create a virtual network that spans the three instances. To do so, connect to the first machine, weave-1, and run weave launch as root:
$ gcloud compute ssh weave-1 --zone us-central1-a
> sudo weave launch
You might have to wait a bit after the instance is ready for the startup script to complete. If you get “command not found”, just wait a minute for the instance to catch up. To be sure, you can tail /var/log/startupscript.log and watch for the “Finished running startup script” line to appear.
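Instead of watching the log by hand, you can poll it until the finish line shows up. This is a minimal sketch that assumes the log path and message quoted above. The helper name `wait_for_startup` and the 10-second default interval are hypothetical choices.

```bash
# Hypothetical helper: polls the startup-script log (path from the text above)
# until the "Finished running startup script" line appears.
#   $1 optional log path, $2 optional poll interval in seconds
wait_for_startup() {
  local log="${1:-/var/log/startupscript.log}"
  local interval="${2:-10}"
  # grep errors (e.g. the log does not exist yet) are silenced and just retried
  until grep -q "Finished running startup script" "$log" 2>/dev/null; do
    echo "Startup script still running, checking again in ${interval}s..."
    sleep "$interval"
  done
  echo "Startup script finished."
}
```

Because the log may only be readable by root, one option is to run it on the remote machine as `sudo bash -c "$(declare -f wait_for_startup); wait_for_startup"`.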
Repeat this on weave-2 and weave-3, but pass weave-1 as an argument to weave launch so that they join the network through the first peer:
$ gcloud compute ssh weave-2 --zone us-central1-a
> sudo weave launch weave-1
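The interactive steps above can also be scripted from the local machine: weave-1 starts the network, and then weave-2 and weave-3 join it via weave-1. This sketch relies on the `--command` flag of `gcloud compute ssh`, which runs a single command instead of opening a session. The helper name `launch_weave_network` is hypothetical, and the zone defaults to the one used in this post.

```bash
# Hypothetical helper: performs the same "weave launch" steps as above,
# one non-interactive SSH call per instance.
launch_weave_network() {
  local zone="${1:-us-central1-a}"
  local host
  # first peer starts the network on its own
  gcloud compute ssh weave-1 --zone "$zone" --command "sudo weave launch"
  # remaining peers connect to weave-1
  for host in weave-2 weave-3; do
    gcloud compute ssh "$host" --zone "$zone" --command "sudo weave launch weave-1"
  done
}
```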
To verify the status of the network, type:
> sudo weave status
You should get output similar to the following:
weave router 0.8.0
Our name is 7a:dd:1b:b3:31:80
Sniffing traffic on &{9 65535 ethwe 0e:45:c3:8a:aa:44 up|broadcast|multicast}
MACs:
0e:45:c3:8a:aa:44 -> 7a:dd:1b:b3:31:80 (2014-12-30 21:12:02.438783157 +0000 UTC)
7a:dd:1b:b3:31:80 -> 7a:dd:1b:b3:31:80 (2014-12-30 21:12:02.521045141 +0000 UTC)
Peers:
Peer 7a:de:1b:b3:31:80 (v2) (UID 14445032660173218941)
   -> 7a:0a:23:70:55:60 [10.240.153.241:6783]
   -> 7a:0a:b9:aa:2b:bb [10.240.40.159:6783]
Peer 7a:0a:23:70:55:60 (v2) (UID 879772198233532096)
   -> 7a:0a:b9:aa:2b:bb [10.240.40.159:53611]
   -> 7a:de:1b:b3:31:80 [10.240.57.155:57562]
Peer 7a:0a:b9:aa:2b:bb (v2) (UID 3350071212069558429)
   -> 7a:0a:23:70:55:60 [10.240.153.241:6783]
   -> 7a:dd:1b:b3:31:80 [10.240.57.155:49729]
Routes:
unicast:
7a:dd:1b:b3:31:80 -> 00:00:00:00:00:00
7a:0a:23:70:55:60 -> 7a:0a:23:70:55:60
7a:0a:b9:aa:2b:bb -> 7a:0a:b9:aa:2b:bb
broadcast:
7a:dd:1b:b3:31:80 -> [7a:0a:23:70:55:60 7a:0a:b9:aa:2b:bb]
7a:0a:23:70:55:60 -> []
7a:0a:b9:aa:2b:bb -> []
Reconnects:
Launch Crate
Now that the Weave layer is ready, it’s a breeze to launch the Crate cluster. Because the Weave network supports multicast, the Crate nodes can discover each other via multicast, and you don’t need to configure unicast discovery. Just start the Crate nodes with weave run <CIDR> followed by the usual docker run options (Weave passes them through to docker run). The only difference is that each node must publish its address on the Weave network interface (ethwe) for node-to-node communication, while binding to all interfaces (0.0.0.0) for external access:
> sudo weave run 10.0.1.x/24 -p 4300:4300 -p 4200:4200 crate:latest crate -Des.cluster.name=crate-weave -Des.network.publish_host=_ethwe:ipv4_ -Des.network.bind_host=0.0.0.0
Run this on all three nodes, replacing x with the node number: 10.0.1.1/24 on weave-1, 10.0.1.2/24 on weave-2, and 10.0.1.3/24 on weave-3.
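Since only the Weave address changes from node to node, the three `weave run` invocations can be generated in a loop from the local machine. This is a sketch, not the authors' procedure. The helper name `start_crate_nodes` is hypothetical, and it assumes the instance names weave-1 to weave-3 and the addresses 10.0.1.1/24 to 10.0.1.3/24 used in this post.

```bash
# Hypothetical helper: starts one Crate container per instance with the same
# weave run options as above, giving weave-N the Weave address 10.0.1.N/24.
start_crate_nodes() {
  local zone="${1:-us-central1-a}"
  local crate_opts="-Des.cluster.name=crate-weave -Des.network.publish_host=_ethwe:ipv4_ -Des.network.bind_host=0.0.0.0"
  local i
  for i in 1 2 3; do
    gcloud compute ssh "weave-$i" --zone "$zone" --command \
      "sudo weave run 10.0.1.$i/24 -p 4300:4300 -p 4200:4200 crate:latest crate $crate_opts"
  done
}
```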
That’s it! The Crate cluster with 3 nodes should be ready.
Verify Cluster
To verify the cluster from one of the hosts, we need to expose the Weave network to the host’s network. Be sure to choose an IP address that isn’t already in use on the Weave network; here we arbitrarily pick 10.0.1.101:
> sudo weave expose 10.0.1.101/24
The SQL statement that lists the names of all nodes in a Crate cluster is:
SELECT name FROM sys.nodes;
(For more information about SQL statements, see Crate’s SQL documentation.)
Using curl, we can POST the statement to the _sql endpoint of any Crate node, and the response shows three nodes with random names:
> curl -XPOST '10.0.1.1:4200/_sql?pretty' -d '{ "stmt": "select name from sys.nodes" }'
{
  "cols" : [ "name" ],
  "duration" : 7,
  "rows" : [ [ "Bloodstorm" ], [ "Dmitri Smerdyakov" ], [ "Perfection" ] ],
  "rowcount" : 3
}
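For scripted checks, you can read the rowcount field of the same response instead of eyeballing the node names, since the query returns one row per node. This sketch assumes it runs on a host where the Weave network is exposed (so 10.0.1.1:4200 is reachable). The helper name `crate_node_count` is hypothetical, and the field is extracted with grep rather than a JSON tool so that nothing extra needs to be installed.

```bash
# Hypothetical helper: sends the sys.nodes query to a node's _sql endpoint and
# prints the "rowcount" field of the JSON response, i.e. the number of nodes.
crate_node_count() {
  local host="${1:-10.0.1.1:4200}"
  curl -s -XPOST "http://$host/_sql" -d '{"stmt": "select name from sys.nodes"}' \
    | grep -oE '"rowcount" *: *[0-9]+' \
    | grep -oE '[0-9]+$'
}
```

Usage: `[ "$(crate_node_count)" -eq 3 ] && echo "all three nodes have joined"`.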
If you want to “turn off” the exposed network, you can use the Weave “hide” command:
> sudo weave hide 10.0.1.101/24
To make the cluster accessible from the outside world, we have already set bind_host=0.0.0.0; now we need to configure the GCE firewall to allow access:
$ gcloud compute firewall-rules create allow-crate --project YOUR_PROJECT_NAME --allow tcp:4200,tcp:4300 --source-ranges 0.0.0.0/0
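Once the rule is in place, a quick way to confirm that each node answers on its public address is to request Crate's HTTP port with curl. This is a sketch under two assumptions: that the root path `/` on port 4200 responds with HTTP status 200, and that you pass in the EXTERNAL_IP values from `gcloud compute instances list`. The helper name `check_public_access` is hypothetical.

```bash
# Hypothetical helper: reports, for each public IP given as an argument,
# whether Crate's HTTP port 4200 answers with status 200.
# Returns non-zero if any node is unreachable.
check_public_access() {
  local ip code failed=0
  for ip in "$@"; do
    # curl prints "000" as the status code if no connection could be made
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$ip:4200/")"
    if [ "$code" = "200" ]; then
      echo "$ip: reachable"
    else
      echo "$ip: not reachable (HTTP status $code)"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage, with the addresses from the listing above: `check_public_access 23.251.159.143 23.251.158.46 23.251.145.69`.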
Final Steps
Now that you’ve got your Crate cluster running, you can connect to the Admin UI by visiting the public IP of any node on port 4200, like so: http://PUBLIC_IP:4200/admin
You can then import some sample data from Twitter using the “Get Started” link in the left sidebar of the Admin UI: http://PUBLIC_IP:4200/admin/#/tutorial
Enjoy!
Conclusion
Using Crate with Weave has some clear advantages over plain Docker. It removes the overhead of configuring unicast discovery and makes it even easier to deploy Crate clusters that span availability zones or even cloud providers. And all of this happens in a secure network that you can adjust to your needs.
If you have any questions about Crate, you can ask in their Google group or on IRC in #crate. Happy deploying!
Appendix
This post shows shell commands executed in different places. The line prefix shows where to run them:
- $ ... – local machine
- > ... – remote machine
- # ... – docker container