Tuesday, February 19, 2019

An easy way to build a RESTful micro service in Go to access a Cassandra Table

TL;DR

This code repo lets you run a single program to create a RESTful micro service that reads a single row from a Cassandra table, using the primary key fields to restrict the SELECT statement, and returns it in JSON format. As of 22nd April 2019 it can also POST into simple tables!

It is hosted by the go-swagger framework and uses gocql as the Cassandra data access layer. The code is all Go and a single main programme provides a single command to accomplish this feat. It should be considered alpha software because I have only tested it using a limited number of use cases, but I hope it should suffice for many uses as is.

Motivation

I had a few reasons to write this code. One was a question from a former client "Can't we generate the micro services to read data from Cassandra?", another was a desire to learn a new programming language, Go, with a real project, but probably the main reason was I wanted to create something other than PowerPoints like I did in the good old days 😏 

My Approach

I wanted to generate all the code, so I started looking at the Swagger tooling. Whilst it looked good for Java, I didn't find that its generator for Go worked very well. After a little looking around I came across go-swagger, which worked first time. This tool lets one generate server or client side code from an input swagger specification, and it provides loads of capabilities that I haven't even looked into.

The data access options were quite limited, but once I found gocql and checked out the code I was happy with my choice.

Both of the core technologies have active communities and mature code.

That left the question of how to create the handler code to run in the framework created by go-swagger that would retrieve the data. To do this I needed to write some code myself. The approach I took was a 4 step process:
  1. Parse the Cassandra DDL that defined the table and any required types (UDTs) to create the swagger API specification for the table
  2. Use go-swagger to generate the RESTful framework for the API defined in the created swagger file
  3. Using the parser output from (1) create the handler function that uses gocql to access the Cassandra table
  4. Patch the generated go-swagger code to call the functions of my generated data access code
All of the above can be (and has been) implemented in a single main.go file.
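To give a flavour of step (1), here is a minimal sketch of how CQL column types might be mapped to swagger types. The names swaggerProp and cqlToSwagger are hypothetical and the mapping itself is an assumption for illustration; the real generator may well do this differently. UDTs and maps are left out and simply fall through to string.

```go
package main

import (
	"fmt"
	"strings"
)

// swaggerProp is a hypothetical, simplified swagger property definition.
type swaggerProp struct {
	Type   string
	Format string
	Items  *swaggerProp // element type, set only for arrays
}

// cqlToSwagger maps a CQL type name to a simplified swagger property.
// Only a handful of types are covered; anything else (including UDTs
// and maps) falls back to a plain string in this sketch.
func cqlToSwagger(cql string) swaggerProp {
	t := strings.ToLower(strings.TrimSpace(cql))
	// frozen<...> does not change the JSON shape, so unwrap it.
	if strings.HasPrefix(t, "frozen<") && strings.HasSuffix(t, ">") {
		return cqlToSwagger(t[len("frozen<") : len(t)-1])
	}
	// set<T> and list<T> both become JSON arrays of T.
	for _, prefix := range []string{"set<", "list<"} {
		if strings.HasPrefix(t, prefix) && strings.HasSuffix(t, ">") {
			inner := cqlToSwagger(t[len(prefix) : len(t)-1])
			return swaggerProp{Type: "array", Items: &inner}
		}
	}
	switch t {
	case "int":
		return swaggerProp{Type: "integer", Format: "int32"}
	case "bigint":
		return swaggerProp{Type: "integer", Format: "int64"}
	case "float":
		return swaggerProp{Type: "number", Format: "float"}
	case "double":
		return swaggerProp{Type: "number", Format: "double"}
	case "boolean":
		return swaggerProp{Type: "boolean"}
	case "timestamp":
		return swaggerProp{Type: "string", Format: "date-time"}
	default:
		return swaggerProp{Type: "string"}
	}
}

func main() {
	for _, c := range []string{"int", "TIMESTAMP", "set<int>", "list<float>"} {
		p := cqlToSwagger(c)
		if p.Items != nil {
			fmt.Printf("%s -> array of %s\n", c, p.Items.Type)
			continue
		}
		fmt.Printf("%s -> %s %s\n", c, p.Type, p.Format)
	}
}
```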

Code Structure & Running Output

The main.go program resides in the root folder, the sub directories contain:
  • handler - the code that generates the data access code
  • parser - the code that parses the Cassandra DDL file 
  • swagger - the code that takes the parser output and creates the swagger output file
The main.go file expects a number of program arguments to be set:
  • -file 
  • -goPackageName
  • -dirToGenerateIn
The file parameter defines the full path of the input Cassandra DDL file to process.

The goPackageName is that of the desired Go package name.

The dirToGenerateIn is optional as it defaults to /tmp, but without Go modules it needs to be set to a directory under $GOPATH/src.

The example command I used was:

go run main.go -file=/Users/stevef/Source_Code/go/src/github.com/stevef1uk/test4/t.cql \
  -goPackageName=github.com/stevef1uk/test4 \
  -dirToGenerateIn=/Users/stevef/Source_Code/go/src/github.com/stevef1uk/test4

There are several other flags that may be set e.g. -debug=true & -endPoint=<end point name> 
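For readers new to Go, the flags above could be declared with the standard flag package roughly as follows. This is a sketch only: the options struct and parseArgs function are hypothetical names, and treating -file and -goPackageName as mandatory is an assumption based on the description above rather than on the repo's code.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options holds the command-line settings described above (hypothetical struct).
type options struct {
	File            string
	GoPackageName   string
	DirToGenerateIn string
	Debug           bool
	EndPoint        string
}

// parseArgs parses the flags from args (the program name already removed).
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("main", flag.ContinueOnError)
	fs.StringVar(&o.File, "file", "", "full path of the Cassandra DDL file")
	fs.StringVar(&o.GoPackageName, "goPackageName", "", "desired Go package name")
	fs.StringVar(&o.DirToGenerateIn, "dirToGenerateIn", "/tmp", "directory to generate code in")
	fs.BoolVar(&o.Debug, "debug", false, "enable debug output")
	fs.StringVar(&o.EndPoint, "endPoint", "", "end point name, overrides the table name")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Assumption: these two flags have no sensible default.
	if o.File == "" || o.GoPackageName == "" {
		return o, errors.New("-file and -goPackageName must be set")
	}
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```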

Then, in the directory where go-swagger will have created the framework run:

export CASSANDRA_SERVICE_HOST=127.0.0.1
go run cmd/simple-server/main.go --port=5000

Assuming all is well you should see:

2019/02/19 22:18:55 Tring to connect to Cassandra database using  127.0.0.1
2019/02/19 22:18:55 Yay! Connection to Cannandra established
2019/02/19 22:18:55 Serving simple at http://127.0.0.1:5000

At this point the microservice will be accessible at the following URL:

http://127.0.0.1:5000/v1/<table name>

Note: <end point name> would override <table name>.

e.g. curl -X GET "http://127.0.0.1:5000/v1/accounts4?id=1&name=steve&time1=2013-01-01T00:00:00.000Z"
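The same call can be made from a Go client. The sketch below builds the URL with net/url so that the primary key values are escaped properly; buildURL is a hypothetical helper, and the host, table and key values are simply those from the curl example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// buildURL assembles the GET URL for a table end point from its primary key values.
func buildURL(base, table string, keys map[string]string) string {
	q := url.Values{}
	for k, v := range keys {
		q.Set(k, v)
	}
	// Encode sorts the parameters by key and escapes the values (':' becomes %3A).
	return fmt.Sprintf("%s/v1/%s?%s", base, url.PathEscape(table), q.Encode())
}

func main() {
	u := buildURL("http://127.0.0.1:5000", "accounts4", map[string]string{
		"id":    "1",
		"name":  "steve",
		"time1": "2013-01-01T00:00:00.000Z",
	})
	fmt.Println(u)

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(u)
	if err != nil {
		fmt.Println("request failed (is the service running?):", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(resp.Status, string(body))
}
```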

The Parser

In the (very distant) past I had used the Unix tools lex & yacc to create and parse a formal language for the pretentiously named General Purpose Test Tool. This was a Visual Basic like language that I designed to let a software house test its 'C' code in the relatively dark ages of computing (early 1980s). I even had a nice book on these tools, which my wife persuaded me to throw away a year or so ago when she pointed out that this book had been in our attic for decades and I was not likely to need it again 😢

I took a look at the Cassandra schema for DDL and decided parsing it using a lex & yacc approach was going to be very challenging, so I decided to try using regular expression matching instead. This approach worked, but the code was very hard to follow making it hard to extend and maintain. Therefore, I rewrote it using the Finite State Machine approach together with regular expressions. I did look at a couple of FSM libraries, but in the end wrote my own for this.
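To illustrate the idea (this is not the repo's parser), here is a minimal sketch of a hand-rolled state machine that uses regular expressions to pick out the table name, columns and primary key from a CREATE TABLE statement, one line at a time. All names are hypothetical, and it assumes one column per line and a simple PRIMARY KEY (a, b, c) clause with no nested partition key.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type state int

const (
	lookingForTable state = iota // waiting for CREATE TABLE
	inTable                      // reading column and key lines
	done                         // saw WITH, stop processing
)

// table is a hypothetical, simplified result of parsing one CREATE TABLE.
type table struct {
	Name        string
	Columns     [][2]string // {name, type} pairs in declaration order
	PrimaryKeys []string
}

var (
	createRe  = regexp.MustCompile(`(?i)^\s*CREATE\s+TABLE\s+([\w.]+)\s*\(`)
	primaryRe = regexp.MustCompile(`(?i)^\s*PRIMARY\s+KEY\s*\(([^)]*)\)`)
	withRe    = regexp.MustCompile(`(?i)^\s*\)?\s*WITH\b`)
	columnRe  = regexp.MustCompile(`^\s*(\w+)\s+(.+?)\s*,?\s*$`)
)

// parse walks the DDL line by line; the current state decides which
// patterns are tried and which state comes next.
func parse(ddl string) table {
	var tbl table
	st := lookingForTable
	for _, line := range strings.Split(ddl, "\n") {
		switch st {
		case lookingForTable:
			if m := createRe.FindStringSubmatch(line); m != nil {
				tbl.Name = m[1]
				st = inTable
			}
		case inTable:
			if withRe.MatchString(line) {
				st = done
			} else if m := primaryRe.FindStringSubmatch(line); m != nil {
				for _, k := range strings.Split(m[1], ",") {
					tbl.PrimaryKeys = append(tbl.PrimaryKeys, strings.TrimSpace(k))
				}
			} else if m := columnRe.FindStringSubmatch(line); m != nil {
				tbl.Columns = append(tbl.Columns, [2]string{m[1], m[2]})
			}
		case done:
			return tbl
		}
	}
	return tbl
}

func main() {
	ddl := `CREATE TABLE demo.accounts (
    id int,
    name text,
    tmymap map<text, text>,
    PRIMARY KEY (id, name)
) WITH CLUSTERING ORDER BY (name ASC)`
	tbl := parse(ddl)
	fmt.Println("table:", tbl.Name)
	fmt.Println("columns:", tbl.Columns)
	fmt.Println("primary keys:", tbl.PrimaryKeys)
}
```

Keeping the transitions in one switch is what makes this easier to extend than a pile of regular expressions: supporting CREATE TYPE, for example, would mean adding a state rather than untangling one large pattern.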

Go-Swagger

Once a swagger file is created the command 'swagger generate server -f swagger.json' will create the server code. This command will also list the packages that need to be installed to build the framework successfully.

The file in the restapi folder called configure_simple.go is the one that step (4) above patches. Step (3) writes the generated data access file, Generatedhandler.go, to a new directory called data.

Handler Options

By default all of the primary key fields are used to select data from the database. If the numberOfPrimaryKeys flag is set to a number, only that many keys are used in the select statement, in the order in which they are defined. All of them still need to be passed on the RESTful API call though.
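The effect of numberOfPrimaryKeys on the generated statement can be sketched as follows. buildSelect is a hypothetical helper, not the generated code, and treating a value of zero (or one larger than the key count) as "use every key" is an assumption. The ? placeholders are bind markers that gocql fills in when the query is executed.

```go
package main

import (
	"fmt"
	"strings"
)

// buildSelect builds a CQL SELECT restricted by the first n primary key
// columns, in declaration order. n <= 0 or n > len(keys) means "use all keys".
func buildSelect(table string, keys []string, n int) string {
	if n <= 0 || n > len(keys) {
		n = len(keys)
	}
	conds := make([]string, 0, n)
	for _, k := range keys[:n] {
		conds = append(conds, k+" = ?")
	}
	stmt := "SELECT * FROM " + table
	if len(conds) > 0 {
		stmt += " WHERE " + strings.Join(conds, " AND ")
	}
	return stmt
}

func main() {
	keys := []string{"id", "mediate", "second_ts"}
	fmt.Println(buildSelect("employee", keys, 0)) // all three keys
	fmt.Println(buildSelect("employee", keys, 2)) // only id and mediate
}
```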

The consistency flag can be set to one of the standard gocql consistency levels; the default is gocql.One.

The allowFiltering flag was supposed to add this clause to the generated select statement, but I seem to have forgotten to implement it 😑

The handler code has its own Setup() and Stop() functions that establish a connection to the Cassandra database defined by the CASSANDRA_SERVICE_HOST environment variable.

The micro service can be scaled horizontally. If deploying the micro service to something like OpenShift or Kubernetes, I found out the hard way that the server in the pod needs to be configured to listen on all network interfaces, e.g. --host=0.0.0.0, in order to run successfully.

Tests

I like testing my software as it gives me the confidence to refactor, so each sub package contains its own tests.

The hard part with testing the handler code was having to define Cassandra schemas and populate the resultant tables with data! I have tested some User Defined Types alongside a table.

The most complex test I ran had the following schema:

CREATE TYPE demo.simple (
    dummy text
);

CREATE TYPE demo.city (
    id int,
    citycode text,
    cityname text,
    test_int int,
    lastUpdatedAt TIMESTAMP,
    myfloat float,
    events set<int>,
    mymap  map<text, text>,
    address_list set<frozen<simple>>
);

CREATE TABLE demo.employee (
    id int,
    address_set set<frozen<city>>,
    my_List list<frozen<simple>>,
    name text,
    mediate TIMESTAMP,
    second_ts TIMESTAMP,
    tevents set<int>,
    tmylist list<float>,
    tmymap  map<text, text>,
    PRIMARY KEY (id, mediate, second_ts )
) WITH CLUSTERING ORDER BY (mediate ASC, second_ts ASC)

Note: using describe table is the best way to populate this file. The FSM is configured to look for the text WITH to end processing. If you have a data type or field named WITH then you will need to change the text the FSM looks for to WITH CLUSTERING.

To insert data into this table I used:

insert into employee ( id, mediate, second_ts, name,  my_list, address_set  ) values (1, '2018-02-17T13:01:05.000Z', '1999-12-01T23:21:59.123Z', 'steve', [{dummy:'fred'}], {{id:1, mymap:{'a':'fred'}, citycode:'Peef',lastupdatedat:'2019-02-18T14:02:06.000Z',address_list:{{dummy:'foobar'}},events:{1,2,3} }} ) ;

Then, running the main & testing gave me:

curl -X GET "http://127.0.0.1:5000/v1/employee?id=1&mediate=2018-02-17T13:01:05.000Z&second_ts=1999-12-01T23:21:59.123Z"
[{"address_set":[{"address_list":[{"dummy":"foobar"}],"citycode":"Peef","events":[1,2,3],"id":1,"lastupdatedat":"2019-02-18 14:02:06 +0000 UTC","mymap":{"a":"fred"}}],"id":1,"mediate":"2018-02-17 13:01:05 +0000 UTC","my_list":[{"dummy":"fred"}],"name":"steve","second_ts":"1999-12-01 23:21:59.123 +0000 UTC","tevents":[],"tmylist":[],"tmymap":{}}]
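Note that the timestamps in this response are not in RFC 3339 form; they look like the output of Go's time.Time String() method. A client that needs real time values could parse them along these lines. This is a sketch: the employee struct (covering only a few fields) and parseTimestamp are hypothetical names, and the layout assumes the service always returns UTC values as shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// goTimeLayout matches timestamps such as "2018-02-17 13:01:05 +0000 UTC".
// time.Parse also accepts fractional seconds in the input after the seconds
// field, so "1999-12-01 23:21:59.123 +0000 UTC" parses with the same layout.
const goTimeLayout = "2006-01-02 15:04:05 -0700 MST"

// employee is a hypothetical client-side view of a few response fields.
type employee struct {
	ID       int    `json:"id"`
	Name     string `json:"name"`
	Mediate  string `json:"mediate"`
	SecondTS string `json:"second_ts"`
}

// parseTimestamp converts a timestamp string from the response into a time.Time.
func parseTimestamp(s string) (time.Time, error) {
	return time.Parse(goTimeLayout, s)
}

func main() {
	// A trimmed-down copy of the response shown above.
	body := `[{"id":1,"mediate":"2018-02-17 13:01:05 +0000 UTC","name":"steve","second_ts":"1999-12-01 23:21:59.123 +0000 UTC"}]`

	var rows []employee
	if err := json.Unmarshal([]byte(body), &rows); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, r := range rows {
		ts, err := parseTimestamp(r.SecondTS)
		if err != nil {
			fmt.Println("bad timestamp:", err)
			continue
		}
		fmt.Println(r.Name, ts.UTC().Format(time.RFC3339Nano))
	}
}
```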