The Boredom of Authoring an API Client
22 November 2014

In my day job as a glorified System Administrator, I have the opportunity to write infrastructure, services, and tooling in Haskell, where traditionally someone in my position might reach for the hammers labelled Perl, Python, or Ruby, et al.
While the advantages are many, and can be left to another blog post, a recurring pain point where Haskell falls down is what I would categorise as mundane and commercial library availability:
- Mundane: offers little intellectual reward to the library author. For myself this is anything that includes vast swathes of (mostly) repetitious serialisation code that cannot be nicely abstracted using something like GHC.Generics.
- Commercial: Company X offers compelling service Y that you wish to utilise, of which there are officially supported client libraries in Java, .NET, Python, and Ruby.
Haskell offers plenty of mechanisms for limiting boilerplate, and these generally work well in the face of uniformity (see: pagerduty). But faced with supporting an inconsistent API of sufficient scope, I hereby postulate that both of the above categories will be satisfied, and many shall wring their hands and despair.
Contents
- Status Quo
- A Comprehensive Haskell AWS Client
- Lipstick on a Pig
- Lenses and Roles
- Smart Constructors
- Type Families
- Documentation for Free
- One Library per Service
- Conclusion
Status Quo
As a concrete example: in early 2013 we decided to use Amazon Web Services exclusively for our entire infrastructure. Coupled with the fact that all of our backend and infrastructure related code is written in Haskell, the lack of comprehensive and consistent AWS libraries proved to be a problem.
Looking at the AWS category on Hackage, the collectively supported services are:
- Cloud Watch
- Elastic Compute Cloud
- Elastic Load Balancing
- Elastic Transcoder
- Identity and Access Management
- Kinesis
- Relational Database Service
- Route53
- Simple Database Service
- Simple Email Service
- Simple Notification Service
- Simple Storage Service
In some of these implementations the supported feature set is incomplete and approximately 30 services from Amazon’s total offering are not available at all.
This results in a subpar experience relative to Python, Ruby, Java, or .NET, for which there are official SDKs.
A Comprehensive Haskell AWS Client
After realising in late 2012 to early 2013 that there were no Haskell libraries supporting the services we wished to use, I went down the route of providing a stopgap solution so we could begin building our infrastructure without compromising our language choice. This yielded a code generation Frankenstein which crawled the AWS documentation HTML, the available SOAP definitions, and XSDs to provide AutoScaling, EC2, IAM, S3, CloudWatch, Route53, and ELB bindings.
While this was immediately useful, the obvious inconsistencies arising from the brittleness of the HTML, along with the public XSDs apparently being a legacy artifact for most services, meant inertia set in and I was unable to continue expanding the library offerings with this approach.
Going back to the drawing board in mid 2013, I started working on implementing a more future proof and sustainable approach to providing a truly comprehensive AWS SDK I could use for all my projects, both personal and professional.
The key enabler for this next approach was the discovery of the Amazon Service models, which are typically vendored with each of the official SDKs and provide a reasonably well typed representation of each of the services, warts and all.
Aside: the format of the service definitions has changed a couple of times, and I’ve been forced to rewrite pieces of the generation code more than once as a result.
The end result is called amazonka, consisting of 43 different libraries covering all currently available non-preview AWS services.
The core libraries are:
- amazonka: contains a monad transformer, send/receive, and pagination logic.
- amazonka-core: contains serialisation/request/response logic, and common data types.
With the supported services being:
- amazonka-autoscaling
- amazonka-cloudformation
- amazonka-cloudfront
- amazonka-cloudsearch-domains
- amazonka-cloudsearch
- amazonka-cloudtrail
- amazonka-cloudwatch-logs
- amazonka-cloudwatch
- amazonka-codedeploy
- amazonka-cognito-identity
- amazonka-cognito-sync
- amazonka-config
- amazonka-datapipeline
- amazonka-directconnect
- amazonka-dynamodb
- amazonka-ec2
- amazonka-elasticache
- amazonka-elasticbeanstalk
- amazonka-elastictranscoder
- amazonka-elb
- amazonka-emr
- amazonka-iam
- amazonka-importexport
- amazonka-kinesis
- amazonka-kms
- amazonka-lambda
- amazonka-opsworks
- amazonka-rds
- amazonka-redshift
- amazonka-route53-domains
- amazonka-route53
- amazonka-s3
- amazonka-sdb
- amazonka-ses
- amazonka-sns
- amazonka-sqs
- amazonka-storagegateway
- amazonka-sts
- amazonka-support
- amazonka-swf
Some preliminary Hackage documentation is available here.
In the following topics I’ll briefly highlight some of the features and potentially contentious design decisions, and the reasoning behind them.
Note: This is a preview release designed to gather feedback, and I’ve not used all of the services (for example Kinesis, or SNS) personally, which will no doubt result in issues regarding the de/serialisation of requests, responses, errors, and possibly tears.
I’m relying on the brave to offer up constructive feedback via GitHub Issues since the scope is too much for me to test in practice, alone.
Lipstick on a Pig
Since the definitions appear to be generated from Java-style services, the corresponding AST and type information follow similar Object Oriented naming conventions and class-level nesting.
This isn’t particularly nice to work with in a language like Haskell, as it results in a lot of extraneous types. Libraries in various other languages apply the proverbial lipstick to the pig, altering the types to make them more consistent with the host language’s semantics.
Despite these points, I feel that providing types which strictly follow the naming and structure of the AWS originals makes it easier to follow along with the Amazon API reference, and the use of lenses mitigates some of the annoyances relating to access and traversal.
The intent is to provide a lower-level interface which corresponds 1:1 with the actual API, and let people supply their own lipstick.
Lenses and Roles
Amazon utilises a number of different de/serialisation mechanisms, ranging from the venerable XML and JSON to more esoteric querystring serialisation of datatypes, and I inevitably ran up against the prototypical newtype explosion when avoiding orphan instances, due to the heavy usage of type classes.
The solution was to divorce the internal structure from the representation observed and manipulated by the user. This approach allows extensive use of newtype wrappers internally to define non-orphan instances for types such as NonEmpty, Natural, HashMap, or Bool, while exposing the underlying type to the user, so the wrapper is never needed outside the core library.
Isos are paired with lenses to hide the (un)wrapping of newtypes from the user.
Roles are used to avoid the need to traverse structures such as NonEmpty
or HashMap
when converting between the internal and external representations.
Here are the List and Map newtype wrappers from amazonka-core:
-- | List is used to define specialised JSON, XML, and Query instances for
-- serialisation and deserialisation.
--
-- The e :: Symbol over which List is parameterised is used as the
-- enclosing element name when serialising XML or Query instances.
newtype List (e :: Symbol) a = List { list :: [a] }
deriving (Eq, Ord, Show, Semigroup, Monoid)
-- Requires the RoleAnnotations GHC extension.
type role List phantom representational
_List :: (Coercible a b, Coercible b a) => Iso' (List e a) [b]
_List = iso (coerce . list) (List . coerce)
-- | Map is used similarly to define specialised de/serialisation instances
-- and to allow coercion of the values of the HashMap, but not the Key.
newtype Map k v = Map
{ fromMap :: HashMap k v
} deriving (Eq, Show, Monoid, Semigroup)
type role Map nominal representational
_Map :: (Coercible a b, Coercible b a) => Iso' (Map k a) (HashMap k b)
_Map = iso (coerce . fromMap) (Map . coerce)
And the usage from Network.AWS.DynamoDB.Scan in amazonka-dynamodb:
data ScanResponse = ScanResponse
{ _srItems :: List "Items" (Map Text AttributeValue)
...
} deriving (Eq, Show)
srItems :: Lens' ScanResponse [HashMap Text AttributeValue]
srItems = lens _srItems (\s a -> s { _srItems = a }) . _List
This hopefully illustrates the usefulness of the approach for converting between the two representations. The srItems lens above can be used to manipulate the field via the friendlier [HashMap Text AttributeValue] representation, while retaining all of the benefits of wrapping newtypes at arbitrary depths internally.
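The pattern can be reproduced in miniature. The following self-contained sketch (using illustrative types, not amazonka's actual definitions) shows how a representational role on the element parameter lets coerce convert a wrapped list straight to a list of a user-facing newtype, with no traversal:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RoleAnnotations #-}

import Data.Coerce (coerce)
import GHC.TypeLits (Symbol)

-- A simplified stand-in for amazonka-core's List wrapper.
newtype List (e :: Symbol) a = List { list :: [a] }

-- The element name is phantom and never affects the runtime
-- representation; the element type is representational, so it
-- may be coerced.
type role List phantom representational

-- A user-facing newtype standing in for an external representation.
newtype ItemId = ItemId Int
  deriving (Eq, Show)

-- Both the List wrapper and the element newtype are unwrapped by a
-- single zero-cost coerce; there is no O(n) map over the elements.
toIds :: List "Items" Int -> [ItemId]
toIds = coerce
```

Without the role annotation, GHC would infer the same roles here, but annotating them documents (and enforces) that coercion through the element type must remain legal.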
The following links provide detailed explanations of Roles and their implementation:
- POPL 2011 Generative Type Abstraction and Type-level Computation [PDF]
- ICFP 2014 Safe Coercions [PDF]
- GHC specific implementation notes.
Smart Constructors
Providing the minimum number of parameters to satisfy construction of a valid request is desirable for succinctness, as opposed to comprehensively specifying every field of the underlying record.
This simply involves defaulting any Maybe a or Monoid field to its respective Nothing or mempty, and supplying a smart constructor which delineates only the required parameters.
For example, the operation CreateAutoScalingGroup contains 15 fields, most of which are optional. It can be constructed with the fewest parameters required to create a valid Auto Scaling Group, then modified using lenses to specify any additional values for the optional fields before sending.
minimal :: CreateAutoScalingGroup
minimal = createAutoScalingGroup "asg-name" 1 5 zones
Is equivalent to:
comprehensive :: CreateAutoScalingGroup
comprehensive = minimal
& casgLaunchConfigurationName .~ Nothing
& casgInstanceId .~ Nothing
& casgDesiredCapacity .~ Nothing
& casgDefaultCooldown .~ Nothing
& casgLoadBalancerNames .~ mempty
& casgHealthCheckType .~ Nothing
& casgHealthCheckGracePeriod .~ Nothing
& casgPlacementGroup .~ Nothing
& casgVPCZoneIdentifier .~ Nothing
& casgTerminationPolicies .~ mempty
& casgTags .~ mempty
Type Families
Type families are used to associate service errors, signing algorithms, and responses with requests.
For example, issuing a DescribeInstances request:
import Network.AWS
import Network.AWS.EC2
main :: IO ()
main = do
env <- getEnv NorthVirginia Discover
rs <- send env describeInstances
print rs
Where :type rs is:
Either (Er (Sv DescribeInstances)) (Rs DescribeInstances)
Or, more concretely:
Either EC2Error DescribeInstancesResponse
This works well in practice, provided the user is familiar with type families, due to the slightly more arcane type signatures and error messages.
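The machinery can be sketched with a toy example. The class and method names below are illustrative, not amazonka's actual signatures, but they show how an associated type family ties each request type to its response type, so the result type of send is determined entirely by its argument:

```haskell
{-# LANGUAGE TypeFamilies #-}

-- A request class with an associated response type, in the spirit of
-- amazonka's Rs family. The real send performs signed HTTP I/O and
-- returns an Either of the service error; this stub fabricates a
-- response locally.
class AWSRequest a where
  type Rs a
  send :: a -> IO (Rs a)

data DescribeInstances = DescribeInstances

newtype DescribeInstancesResponse = DescribeInstancesResponse
  { instanceIds :: [String]
  } deriving (Show)

instance AWSRequest DescribeInstances where
  type Rs DescribeInstances = DescribeInstancesResponse
  send DescribeInstances = pure (DescribeInstancesResponse ["i-abc123"])
```

At a call site, rs <- send DescribeInstances needs no type annotation: the compiler resolves Rs DescribeInstances to DescribeInstancesResponse from the instance.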
Documentation for Free
The service definitions contain reasonably comprehensive documentation which allows us to include the actual AWS reference alongside a majority of the fields and operations.
Take for example this response lens from GenerateDataKey:
-- | Ciphertext that contains the wrapped key. You must store the blob
-- and encryption context so that the ciphertext can be decrypted.
-- You must provide both the ciphertext blob and the encryption context.
gdkrCiphertextBlob :: Lens' GenerateDataKeyResponse (Maybe Base64)
Currently, links and other markup are stripped; in future I hope to convert the documentation directly to Haddock and retain all of the supplied markup, in a fashion similar to the official SDKs.
One Library per Service
To illustrate the sheer size of the codebase, everybody’s favourite productivity measurer cloc shows:
Language files blank comment code
Haskell 1258 34462 78158 145314
Since you generally do not depend on every service simultaneously, forcing users to compile 140,000+ lines of code they are probably not interested in is pointless.
Despite the maintenance overheads, cabal versioning, and potential discovery problems, encapsulating the code along service boundaries results in a much better user experience.
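In practice this means an application declares dependencies only on the core library plus the services it actually touches. A hypothetical cabal file for an executable using only EC2 might contain:

```cabal
-- Only the shared core and the single service in use are compiled.
build-depends:
    base
  , amazonka
  , amazonka-ec2
```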
Conclusion
While generating code may not yield the same user friendliness as hand written code in every case, it seems to scale very well for this particular class of problem.
During the recent AWS re:Invent 2014, over 8 new services were announced, with Key Management Service, Lambda, Config, and CodeDeploy being available effective immediately. I was able to support these services not long after announcement by running amazonka-gen:
:
make clean
make
Which was a nice validation of the approach.
Overall I’m happy with the current status and direction, despite there still being a large amount of work ahead to place Haskell on an equal footing with other languages with regard to building cloud services and infrastructure.
Some items that I’ve identified for the immediate roadmap are:
- Some responses lack required field information, resulting in Maybe a for always-present fields. Overrides need to be manually annotated.
- Comprehensive testing and usage of all services.
- Improved documentation parsing (retaining links as Haddock markup).
- Additional hand written documentation about usage.
- Implement waiters and retries according to the service specifications.
- Examples.
- Performance benchmarks and evaluation.
- Utilise type-information for string patterns and maximum list lengths.
- Remove the dependency on conduit (it should be trivial to only depend on http-client).
You can follow the reddit discussion here.
Example: Here is a less trivial example which creates a KeyPair and a SecurityGroup, authorises port 22 ingress on the group, and launches an Instance:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Applicative
import Control.Lens
import Control.Monad
import Control.Monad.IO.Class
import Control.Monad.Trans.AWS
import Data.Monoid
import Data.Text (Text)
import qualified Data.Text as Text
import qualified Data.Text.IO as Text
import Data.Time.Clock.POSIX
import Network.AWS.EC2
main :: IO ()
main = do
ts <- Text.pack . show <$> getTimestamp
env <- getEnv NorthVirginia Discover
r <- runAWST env $ do
say "Create KeyPair " ts
k <- send (createKeyPair ts)
let key = Text.unpack ts ++ ".pem"
trusty = "ami-5895242f"
say "Writing KeyPair material to " key
liftIO (Text.writeFile key (k ^. ckprKeyMaterial))
say "Create SecurityGroup " ts
g <- view csgrGroupId <$>
send (createSecurityGroup ts "amazonka-examples")
say "Authorizing SSH on SecurityGroup " g
void . send $ authorizeSecurityGroupIngress
& asgiGroupId ?~ g
& asgiIpProtocol ?~ "tcp"
& asgiFromPort ?~ 22
& asgiToPort ?~ 22
& asgiCidrIp ?~ "0.0.0.0/0"
say "Launching Instance with ImageId " trusty
i <- sendCatch $ runInstances trusty 1 1
& riKeyName ?~ ts
& riInstanceType ?~ T2Micro
& riSecurityGroupIds .~ [g]
either (\e -> do
say "Failed to Launch Instance " e
say "Deleting SecurityGroup " g
void . send $ deleteSecurityGroup & dsgGroupId ?~ g
say "Deleting KeyPair " ts
void . send $ deleteKeyPair ts
throwAWSError e)
return
i
print r
getTimestamp :: IO Integer
getTimestamp = truncate <$> getPOSIXTime
say :: Show a => Text -> a -> AWS ()
say msg = liftIO . Text.putStrLn . mappend msg . Text.pack . show
It’s worth mentioning that async and wait from the lifted-async library can be used to run the KeyPair and SecurityGroup related code above concurrently.
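The shape of that pattern, sketched here with the plain async package for a self-contained example (lifted-async provides the analogous async/wait interface lifted into the AWS monad; the action names are stand-ins):

```haskell
import Control.Concurrent.Async (concurrently)

-- Stand-ins for the KeyPair and SecurityGroup setup actions; in the
-- example above these would be AWST actions issuing real requests.
createKey :: IO String
createKey = pure "key-material"

createGroup :: IO String
createGroup = pure "sg-12345"

-- Run both independent setup steps at the same time and collect both
-- results before launching the Instance, which needs both.
setup :: IO (String, String)
setup = concurrently createKey createGroup
```

Only the two setup steps are independent; the RunInstances call must still wait on both results, which concurrently's tuple makes explicit.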