The Boredom of Authoring an API Client
22 November 2014

In my day job as a glorified System Administrator, I have the opportunity to write infrastructure, services, and tooling in Haskell, where traditionally someone in my position might reach for the hammers labelled Perl, Python, or Ruby, et al.
While the advantages are many, and can be left to another blog post, a recurring pain point where Haskell falls down is what I would categorise as mundane and commercial library availability:
- Mundane: offers little intellectual reward to the library author. For myself this is anything that includes vast swathes of (mostly) repetitious serialisation code that cannot be nicely abstracted using something like GHC.Generics.
- Commercial: Company X offers compelling service Y that you wish to utilise, of which there are officially supported client libraries in Java, .NET, Python, and Ruby.
Haskell offers plenty of mechanisms for limiting boilerplate, and these generally work well in the face of uniformity (see: pagerduty). But faced with supporting an inconsistent API of sufficient scope, I hereby postulate that both of the above categories will be satisfied, and many shall wring their hands and despair.
Contents
- Status Quo
- A Comprehensive Haskell AWS Client
- Lipstick on a Pig
- Lenses and Roles
- Smart Constructors
- Type Families
- Documentation for Free
- One Library per Service
- Conclusion
Status Quo
As a concrete example: in early 2013 we decided to use Amazon Web Services exclusively for our entire infrastructure. Coupled with the fact that all of our backend and infrastructure related code is written in Haskell, the lack of comprehensive and consistent AWS libraries proved to be a problem.
Looking at the AWS category on Hackage, the collectively supported services are:
- Cloud Watch
- Elastic Compute Cloud
- Elastic Load Balancing
- Elastic Transcoder
- Identity and Access Management
- Kinesis
- Relational Database Service
- Route53
- Simple Database Service
- Simple Email Service
- Simple Notification Service
- Simple Storage Service
In some of these implementations the supported feature set is incomplete and approximately 30 services from Amazon’s total offering are not available at all.
This results in a subpar experience relative to Python, Ruby, Java, or .NET, for which there are official SDKs.
A Comprehensive Haskell AWS Client
After realising in late 2012 to early 2013 that there were no Haskell libraries supporting the services we wished to use, I went down the route of providing a stopgap solution so we could begin building our infrastructure without compromising our language choice. This yielded a code generation Frankenstein which crawled the AWS documentation HTML, the available SOAP definitions, and XSDs to provide AutoScaling, EC2, IAM, S3, CloudWatch, Route53, and ELB bindings.
While this was immediately useful, the obvious inconsistencies arising from the brittleness of the HTML, along with the public XSDs apparently being a legacy artifact for most services, meant inertia set in and I was unable to continue expanding the library offerings with this approach.
Going back to the drawing board in mid 2013, I started working on implementing a more future proof and sustainable approach to providing a truly comprehensive AWS SDK I could use for all my projects, both personal and professional.
The key enabler for this next approach was the discovery of the Amazon Service models, which are typically vendored with each of the official SDKs and provide a reasonably well typed representation of each of the services, warts and all.
Aside: the format of the service definitions has changed a couple of times, and I’ve been forced to rewrite pieces of the generation code more than once as a result.
The end result is called amazonka, consisting of 43 different libraries covering all currently available non-preview AWS services.
The core libraries are:
- amazonka: contains a monad transformer, send/receive, and pagination logic.
- amazonka-core: contains serialisation/request/response logic, and common data types.
With the supported services being:
- amazonka-autoscaling
- amazonka-cloudformation
- amazonka-cloudfront
- amazonka-cloudsearch-domains
- amazonka-cloudsearch
- amazonka-cloudtrail
- amazonka-cloudwatch-logs
- amazonka-cloudwatch
- amazonka-codedeploy
- amazonka-cognito-identity
- amazonka-cognito-sync
- amazonka-config
- amazonka-datapipeline
- amazonka-directconnect
- amazonka-dynamodb
- amazonka-ec2
- amazonka-elasticache
- amazonka-elasticbeanstalk
- amazonka-elastictranscoder
- amazonka-elb
- amazonka-emr
- amazonka-iam
- amazonka-importexport
- amazonka-kinesis
- amazonka-kms
- amazonka-lambda
- amazonka-opsworks
- amazonka-rds
- amazonka-redshift
- amazonka-route53-domains
- amazonka-route53
- amazonka-s3
- amazonka-sdb
- amazonka-ses
- amazonka-sns
- amazonka-sqs
- amazonka-storagegateway
- amazonka-sts
- amazonka-support
- amazonka-swf
Some preliminary Hackage documentation is available here.
In the following topics I’ll briefly highlight some of the features and potentially contentious design decisions, and the reasoning behind them.
Note: This is a preview release designed to gather feedback, and I’ve not used all of the services (for example Kinesis, or SNS) personally, which will no doubt result in issues regarding the de/serialisation of requests, responses, errors, and possibly tears.
I’m relying on the brave to offer up constructive feedback via GitHub Issues since the scope is too much for me to test in practice, alone.
Lipstick on a Pig
Since the definitions appear to be generated from Java-style services, the corresponding AST and type information follow similar Object Oriented naming conventions and class-level nesting.
This isn’t particularly nice to work with in a language like Haskell, as it results in a lot of extraneous types. Libraries in various other languages apply the proverbial lipstick to the pig, altering the types to make them more consistent with the host language’s semantics.
Despite these points, I feel that providing types which strictly follow the naming and structure of the AWS originals makes it easier to follow along with the Amazon API reference, and the use of lenses mitigates some of the annoyances relating to access and traversal.
The intent is to provide a lower-level interface which corresponds 1:1 with the actual API, and let people supply their own lipstick.
Lenses and Roles
Amazon utilises a number of different de/serialisation mechanisms, ranging from the venerable XML and JSON to more esoteric querystring serialisation of datatypes, and I inevitably ran up against the prototypical newtype explosion when avoiding orphan instances, due to the heavy usage of type classes.
The solution was to divorce the internal structure from the representation observed and manipulated by the user. This approach allows extensive use of newtype wrappers internally to define non-orphan instances for types such as NonEmpty, Natural, HashMap, or Bool, while exposing the underlying type to the user, so the wrapper is never needed outside the core library.
Isos are paired with lenses to hide the (un)wrapping of newtypes from the user.
Roles are used to avoid the need to traverse structures such as NonEmpty
or HashMap
when converting between the internal and external representations.
Here are the List and Map newtype wrappers from amazonka-core:
-- | List is used to define specialised JSON, XML, and Query instances for
-- serialisation and deserialisation.
--
-- The e :: Symbol over which List is parameterised is used as the
-- enclosing element name when serialising XML or Query instances.
newtype List (e :: Symbol) a = List { list :: [a] }
deriving (Eq, Ord, Show, Semigroup, Monoid)
-- Requires the RoleAnnotations GHC extension.
type role List phantom representational
_List :: (Coercible a b, Coercible b a) => Iso' (List e a) [b]
_List = iso (coerce . list) (List . coerce)
-- | Map is used similarly to define specialised de/serialisation instances
-- and to allow coercion of the values of the HashMap, but not the Key.
newtype Map k v = Map
{ fromMap :: HashMap k v
} deriving (Eq, Show, Monoid, Semigroup)
type role Map nominal representational
_Map :: (Coercible a b, Coercible b a) => Iso' (Map k a) (HashMap k b)
_Map = iso (coerce . fromMap) (Map . coerce)
And the usage from Network.AWS.DynamoDB.Scan in amazonka-dynamodb:
data ScanResponse = ScanResponse
{ _srItems :: List "Items" (Map Text AttributeValue)
...
} deriving (Eq, Show)
srItems :: Lens' ScanResponse [HashMap Text AttributeValue]
srItems = lens _srItems (\s a -> s { _srItems = a }) . _List
This hopefully illustrates the usefulness of the approach for converting between the two representations. The srItems lens above can be used to manipulate the field via the friendlier [HashMap Text AttributeValue] representation, while retaining all of the benefits of wrapping newtypes at arbitrary depths internally.
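The pattern can be reproduced in miniature. The following self-contained sketch (using illustrative types, not amazonka's actual definitions) shows how a representational role on the element parameter lets coerce convert a wrapped list straight to a list of a user-facing newtype, with no traversal:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RoleAnnotations #-}

import Data.Coerce (coerce)
import GHC.TypeLits (Symbol)

-- A simplified stand-in for amazonka-core's List wrapper.
newtype List (e :: Symbol) a = List { list :: [a] }

-- The element name is phantom and never affects the runtime
-- representation; the element type is representational, so it
-- may be coerced.
type role List phantom representational

-- A user-facing newtype standing in for an external representation.
newtype ItemId = ItemId Int
  deriving (Eq, Show)

-- Both the List wrapper and the element newtype are unwrapped by a
-- single zero-cost coerce; there is no O(n) map over the elements.
toIds :: List "Items" Int -> [ItemId]
toIds = coerce
```

Without the role annotation, GHC would infer the same roles here, but annotating them documents (and enforces) that coercion through the element type must remain legal.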
The following links provide detailed explanations of Roles and their implementation:
- POPL 2011 Generative Type Abstraction and Type-level Computation [PDF]
- ICFP 2014 Safe Coercions [PDF]
- GHC specific implementation notes.
Smart Constructors
Providing the minimum number of parameters to satisfy construction of a valid request is desirable for succinctness, as opposed to comprehensively specifying every field of the underlying record.
This simply involves defaulting any Maybe a or Monoid field to its respective Nothing or mempty, and supplying a smart constructor which delineates only the required parameters.
For example, the operation CreateAutoScalingGroup contains 15 fields, most of which are optional. It can be constructed with the fewest parameters required to create a valid Auto Scaling Group, then modified using lenses to specify any additional values for the optional fields before sending.
minimal :: CreateAutoScalingGroup
minimal = createAutoScalingGroup "asg-name" 1 5 zones
Is equivalent to:
comprehensive :: CreateAutoScalingGroup
comprehensive = minimal
& casgLaunchConfigurationName .~ Nothing
& casgInstanceId .~ Nothing
& casgDesiredCapacity .~ Nothing
& casgDefaultCooldown .~ Nothing
& casgLoadBalancerNames .~ mempty
& casgHealthCheckType .~ Nothing
& casgHealthCheckGracePeriod .~ Nothing
& casgPlacementGroup .~ Nothing
& casgVPCZoneIdentifier .~ Nothing
& casgTerminationPolicies .~ mempty
& casgTags .~ mempty
Type Families
Type families are used to associate service errors, signing algorithms, and responses with requests.
For example, issuing a DescribeInstances request:
import Network.AWS
import Network.AWS.EC2
main :: IO ()
main = do
env <- getEnv NorthVirginia Discover
rs <- send env describeInstances
print rs
Where :type rs is:
Either (Er (Sv DescribeInstances)) (Rs DescribeInstances)
Or, more concretely:
Either EC2Error DescribeInstancesResponse
This works well in practice, provided the user is familiar with type families, due to the slightly more arcane type signatures and error messages.
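The machinery can be sketched with a toy example. The class and method names below are illustrative, not amazonka's actual signatures, but they show how an associated type family ties each request type to its response type, so the result type of send is determined entirely by its argument:

```haskell
{-# LANGUAGE TypeFamilies #-}

-- A request class with an associated response type, in the spirit of
-- amazonka's Rs family. The real send performs signed HTTP I/O and
-- returns an Either of the service error; this stub fabricates a
-- response locally.
class AWSRequest a where
  type Rs a
  send :: a -> IO (Rs a)

data DescribeInstances = DescribeInstances

newtype DescribeInstancesResponse = DescribeInstancesResponse
  { instanceIds :: [String]
  } deriving (Show)

instance AWSRequest DescribeInstances where
  type Rs DescribeInstances = DescribeInstancesResponse
  send DescribeInstances = pure (DescribeInstancesResponse ["i-abc123"])
```

At a call site, rs <- send DescribeInstances needs no type annotation: the compiler resolves Rs DescribeInstances to DescribeInstancesResponse from the instance.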
Documentation for Free
The service definitions contain reasonably comprehensive documentation which allows us to include the actual AWS reference alongside a majority of the fields and operations.
Take for example this response lens from GenerateDataKey:
-- | Ciphertext that contains the wrapped key. You must store the blob
-- and encryption context so that the ciphertext can be decrypted.
-- You must provide both the ciphertext blob and the encryption context.
gdkrCiphertextBlob :: Lens' GenerateDataKeyResponse (Maybe Base64)
Currently, links and other markup are stripped; in future I hope to convert the documentation directly to Haddock and retain all of the supplied markup, in a fashion similar to the official SDKs.
One Library per Service
To illustrate the sheer size of the codebase, everybody’s favourite productivity measurer cloc shows:
Language files blank comment code
Haskell 1258 34462 78158 145314
Since you generally do not depend on every service simultaneously, forcing users to compile 140,000+ lines of code they are probably not interested in is pointless.
Despite the maintenance overheads, cabal versioning, and potential discovery problems, encapsulating the code along service boundaries results in a much better user experience.
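In practice this means an application declares dependencies only on the core library plus the services it actually touches. A hypothetical cabal file for an executable using only EC2 might contain:

```cabal
-- Only the shared core and the single service in use are compiled.
build-depends:
    base
  , amazonka
  , amazonka-ec2
```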
Conclusion
While generating code may not yield the same user friendliness as hand written code in every case, it seems to scale very well for this particular class of problem.
During the recent AWS re:Invent 2014, over 8 new services were announced, with Key Management Service, Lambda, Config, and CodeDeploy being available effective immediately. I was able to support these services not long after announcement by running amazonka-gen:
:
make clean
make
Which was a nice validation of the approach.
Overall I’m happy with the current status and direction, despite there still being a large amount of work ahead to place Haskell on an equal footing with other languages with regard to building cloud services and infrastructure.
Some items that I’ve identified for the immediate roadmap are:
- Some responses lack required field information, resulting in Maybe a for always-present fields. Overrides need to be manually annotated.
- Comprehensive testing and usage of all services.
- Improved documentation parsing (retaining links as Haddock markup).
- Additional hand written documentation about usage.
- Implement waiters and retries according to the service specifications.
- Examples.
- Performance benchmarks and evaluation.
- Utilise type-information for string patterns and maximum list lengths.
- Remove the dependency on conduit (it should be trivial to only depend on http-client).
You can follow the reddit discussion here.
Example: Here is a less trivial example which creates a KeyPair and a SecurityGroup, authorises port 22 ingress on the group, and launches an Instance:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Applicative
import Control.Lens
import Control.Monad
import Control.Monad.IO.Class
import Control.Monad.Trans.AWS
import Data.Monoid
import Data.Text (Text)
import qualified Data.Text as Text
import qualified Data.Text.IO as Text
import Data.Time.Clock.POSIX
import Network.AWS.EC2
main :: IO ()
main = do
ts <- Text.pack . show <$> getTimestamp
env <- getEnv NorthVirginia Discover
r <- runAWST env $ do
say "Create KeyPair " ts
k <- send (createKeyPair ts)
let key = Text.unpack ts ++ ".pem"
trusty = "ami-5895242f"
say "Writing KeyPair material to " key
liftIO (Text.writeFile key (k ^. ckprKeyMaterial))
say "Create SecurityGroup " ts
g <- view csgrGroupId <$>
send (createSecurityGroup ts "amazonka-examples")
say "Authorizing SSH on SecurityGroup " g
void . send $ authorizeSecurityGroupIngress
& asgiGroupId ?~ g
& asgiIpProtocol ?~ "tcp"
& asgiFromPort ?~ 22
& asgiToPort ?~ 22
& asgiCidrIp ?~ "0.0.0.0/0"
say "Launching Instance with ImageId " trusty
i <- sendCatch $ runInstances trusty 1 1
& riKeyName ?~ ts
& riInstanceType ?~ T2Micro
& riSecurityGroupIds .~ [g]
either (\e -> do
say "Failed to Launch Instance " e
say "Deleting SecurityGroup " g
void . send $ deleteSecurityGroup & dsgGroupId ?~ g
say "Deleting KeyPair " ts
void . send $ deleteKeyPair ts
throwAWSError e)
return
i
print r
getTimestamp :: IO Integer
getTimestamp = truncate <$> getPOSIXTime
say :: Show a => Text -> a -> AWS ()
say msg = liftIO . Text.putStrLn . mappend msg . Text.pack . show
It’s worth mentioning that async and wait from the lifted-async library can be used to run the KeyPair and SecurityGroup related code above concurrently.
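The shape of that pattern, sketched here with the plain async package for a self-contained example (lifted-async provides the analogous async/wait interface lifted into the AWS monad; the action names are stand-ins):

```haskell
import Control.Concurrent.Async (concurrently)

-- Stand-ins for the KeyPair and SecurityGroup setup actions; in the
-- example above these would be AWST actions issuing real requests.
createKey :: IO String
createKey = pure "key-material"

createGroup :: IO String
createGroup = pure "sg-12345"

-- Run both independent setup steps at the same time and collect both
-- results before launching the Instance, which needs both.
setup :: IO (String, String)
setup = concurrently createKey createGroup
```

Only the two setup steps are independent; the RunInstances call must still wait on both results, which concurrently's tuple makes explicit.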