Change our deployment model to use Terraform #5
-
On the issue of developer velocity, CDK dev deployments are easy to clean up even after the local configuration is lost or forgotten, as the whole stack can be deleted in CloudFormation at regular intervals. Terraform would require us to be stricter in our development practice to maintain and destroy environments. I would like to hear from the community whether a Terraform configuration which deploys the CloudFormation template under the hood (via the aws_cloudformation_stack resource: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudformation_stack) would be a good middle ground. We could support this in gdeploy by outputting Terraform-compatible configuration, allowing users to keep the useful features of gdeploy while having a more familiar deployment experience using Terraform.
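As a rough sketch of that middle ground, assuming the published CloudFormation template is reachable by URL (the URL and parameter name below are placeholders, not the real release artifacts):

```hcl
# Terraform wrapping the existing CloudFormation template.
# Template URL and parameter names are placeholders for illustration.
resource "aws_cloudformation_stack" "granted_approvals" {
  name         = "granted-approvals"
  template_url = "https://example-releases.s3.amazonaws.com/granted/Granted.template.json" # placeholder
  capabilities = ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"]

  parameters = {
    AdministratorGroupId = var.admin_group_id # placeholder parameter name
  }
}
```

Users would then get Terraform planning, state and locking over the stack as a whole, while the stack itself keeps the existing CloudFormation deployment semantics.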
-
This might be a tangential question, but maybe I could speak a little to this bit:
Have you considered publishing the Lambda function code to a registry? That could separate the function versions from the infrastructure versions. The infrastructure tool (CDK or Terraform) could then use its own dependency-control mechanisms to pin the function versions (and automate updates via Dependabot or similar). As a registry for the function code, I'm thinking something like npmjs or PyPI (language dependent), or even packaging them as Docker images and using the container feature when deploying the Lambda functions. Or perhaps use a SAM template and publish to the AWS Serverless Application Repository? Yet another option might be to create an AWS Marketplace offering, even if just listing it for free. That supports CloudFormation-based products, along with "update" paths, which are intended to deploy into the user's AWS account.
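For the container option, the consuming side might look something like this in Terraform (a sketch only; names are placeholders, and note that Lambda requires the image to live in an ECR repository in the deploying account, so a published release image would first be mirrored into it):

```hcl
# ECR repository into which the vendor's release image is mirrored.
resource "aws_ecr_repository" "approvals" {
  name = "granted-approvals"
}

# Lambda deployed from a container image pinned by version tag,
# rather than a zip in a region-local S3 bucket.
resource "aws_lambda_function" "approvals" {
  function_name = "granted-approvals"
  role          = aws_iam_role.lambda.arn # assumed to be defined elsewhere
  package_type  = "Image"
  image_uri     = "${aws_ecr_repository.approvals.repository_url}:v0.9.0"
}
```

Pinning by tag is what would let Dependabot-style automation propose version bumps as ordinary code changes.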
-
An additional goal to add here: we would like to support arbitrary deployment regions rather than requiring users to deploy to specific predefined regions. A limitation of our current approach is that our CloudFormation templates refer to Lambda binaries which must exist in the same region as the template itself. Because of this, we publish templates and binaries to a fixed set of regions. I believe that publishing Lambda functions as Docker containers may help us avoid this problem, as @lorengordon has suggested above.
-
## Generally: TF vs. CF

Before I separately provide input on the differences between CDK and TF approaches for Granted Approvals, I thought it would be useful to transcribe some of the thoughts we've gathered over time in our company on CloudFormation versus Terraform as underpinning IaC options for our customers. I am very aware that CDK attempts to resolve some of the problems with CloudFormation, but in my opinion it brings as many drawbacks as advantages, not gaining any ground on a Terraform approach. For me, Pulumi and Terragrunt (for example) are to Terraform as CDK is to CloudFormation: what you lose is not worth what you might gain.

Caveat 1: There have been some improvements in CloudFormation in the last few years to resolve some problems, and they may not be accounted for in this summary, but on the whole I still believe these are sticking-plasters that don't address fundamental design flaws.

### Cleaning Up

CloudFormation: Delete all my stacks.

Winner: CloudFormation

### Deployment Management

CloudFormation: For some resources, such as Scaling Groups, CloudFormation has native integrations for automatically following different zero-downtime deployment methods, with resources calling back to the stack to signal completion, albeit a little difficult to set up.

Terraform: It depends. With modern Terraform you can hook into all kinds of processes internal to your Terraform run, set up complex dependencies, and involve Lambdas to test completion before moving forward, but you have to design them yourself and there's no "listener" available.

Winner: CloudFormation

### Drift Control

CloudFormation: Can detect some drift, as an optional process external to the deployment lifecycle. It cannot do anything about drift. If your resources are too drifty, CloudFormation will error. Raise a support ticket, or if you want it fixed this week, orphan the resource, delete it, and try again.

Terraform: Designed from the outset to detect and correct drift for any attribute of any resource it manages, as part of the deployment lifecycle.

Winner: Terraform

### Error Handling

CloudFormation: Had any resource fail in any way during a stack deployment, or in any nested stack in a stack of stacks? That's a paddlin'. Throw everything away and start again, no matter how much state or data has accrued or how long the rest of the resources took to deploy. The workaround is to break everything into smaller stacks, so long as they don't nest, and roll your own pipeline tooling to connect your stack deployments.

Terraform: One resource had an issue, so it was skipped and its dependencies were skipped, but everything else succeeded. Fix the problem and carry on from where you left off, with no impact to any resources that you took the time to deploy or that already contain state or data. If you need to recreate something that did get created before an error was encountered, you can choose precisely which resources to recreate and which to leave alone; or you can start from scratch if it suits you to do so.

Winner: Terraform

### Block repetition, code re-use and standardisation

CloudFormation: Nest stacks into a stack of stacks, defining each repeatable block as its own independent stack. Call each iteration of the block as its own static definition. Use mostly hard-coded mappings or blocks of parameters (so long as you don't run out of parameters) or custom Lambda-backed resources to get the inputs for your iterations. Or just repeat the code manually.

Terraform: Use a module. Iterate the module as many times as you need, from 0 to infinity, iterating over any collection data type. Use hundreds of ways to collate your input data from wherever you see fit. Use any data transformation you need to translate variables with complex data types, other resources, data sources from any supported provider and other funky magic to put together what you need and iterate appropriately.

Winner: Terraform

### Multi-region / Multi-Account Deployment

CloudFormation: Multi-region? What's that? Hey CodePipeline, do you know?

Terraform: Multi-region, multi-account, multi-cloud, multi-service deployment is just a native capability. Deploy resources in any region, account, cloud or supported service you like by passing the correct provider configuration.

Winner: Terraform

### Looking up Data

CloudFormation: Invoke a Lambda. Lambda glues everything together. Write a CloudFormation stack to deploy a Lambda function that you write yourself, explicitly expecting a question and response only from CloudFormation, that CloudFormation can then execute in order to get a specific piece of data about a specific external resource for one use case. Or if you're up for it, spend some months writing a company-standard lookup-interface function that allows you to look up multiple things from multiple sources without having to write additional functions. Keep a team on standby to look after your special function.

Terraform: Just look up the data. Use a data source for any supported provider. Or use a shell script. Or a command. Or pull it from a file. Or pull it from an API. Use a variable as an input to a data structure that you pass to a data source to generate a more complicated data set that you then iterate on, where each iteration looks up something else. Look up data from remote Terraform state, whether the state is a live deployment or not. Look up any attribute of any resource you have deployed without declaring any explicit outputs for it. It's your data; have it your way.

Winner: Terraform

### Resource lifecycle management

CloudFormation: First you create new things. Then you destroy old things. It puts the lotion on its skin or else it gets the hose again.

CloudFormation: Drift is not my problem; if a resource I own changes, it's not my problem.

CloudFormation: I protect you. I can stop a whole stack being deleted, or I can make sure not to delete a resource when I delete the stack.

Winner: Terraform

### Data Transformation

CloudFormation: Basic string manipulation functions; otherwise prepare and deploy transforms, Lambda functions and other miscellany to try to turn parameters into anything more nuanced than basic data types.

Terraform: ~100 intrinsic data manipulation functions, from simple maths and string munging to calculating CIDR blocks and converting entire data structures between languages.

Winner: Terraform

### Multi-User Simultaneous Change Planning

CloudFormation: A stack is a living, breathing object that owns the resources it created. If you want to plan any changes, form an orderly queue, or deploy multiple stacks for everyone to play with.

Terraform: So long as you don't want to make any changes, everyone pile on. You can practice all you like so long as you're sharing my state file (see tfscaffold). If you want to simultaneously make changes, even though it's not always the best idea, you can if you want to. Just configure some locking and you can put your changes into the queue.

Winner: Terraform

### Access to Stored Resource State

CloudFormation: Ask my API nicely and I will tell you what I think you need to know; there's a nice web interface!

Terraform: I am an open book. You can read my state file if you want. It's all JSON, so you can parse it with jq. Or you can just ask me to list the resources, or show you all the resource configurations. You can manipulate the state if you want, in case you have any out-of-band changes. I can add things for you, or you can add them yourself, or remove them, or change them.

Winner: Terraform

### Planning Changes: Process

CloudFormation: Upload a new template to an S3 bucket (you have an S3 bucket for that, right?), then I will look at it.
I'll create an unsecured bucket for you if you don't have one. No bucket, no dice.

Terraform: Point me at a state file, existing or not, local or not, and just type plan. You don't need to copy anything anywhere or create any artifacts; just save your file and I can plan with it. And if you just want to plan changes for the resource you're working on, tell me to target it, or a whole module for that matter.

Winner: Terraform

### Planning Changes: Confidence

CloudFormation: If you update a stack, I will tell you what might change in that stack. But only that stack. I'm not a mind-reader! If you have any nested stacks that the first stack feeds outputs into, your guess is as good as mine what will happen to them. I guess you could fake the outputs I give you into a hardcoded temporary template and use that to plan the child stack?

Terraform: If I made it, I will check it. I will tell you what changed outside of my influence, and I will tell you what I'm going to change as a result.

Winner: Terraform

### Language & Syntax

Everyone has a preference here. AWS certainly recognise that JSON is not exactly friendly, and so created a YAML alternative. But I defy anyone to say that HCL is not more friendly to read, write and maintain. It is strict and limited in such a way that it can never get as complex as your average JavaScript application; although it pushes the boundaries, it is still a declarative language, but at the same time flexible and easy to use. Ignoring the issues regarding supported variable data types and intrinsic functions (and why Ref and GetAtt are not the same function), compare these equivalent code snippets.

CloudFormation:

```yaml
content:
  !Join
  - "\n"
  -
    - "ATL_PRODUCT_FAMILY=confluence"
    - !Sub ["ATL_PRODUCT_VERSION=${ConfluenceVersion}", {ConfluenceVersion: !Ref ConfluenceVersion}]
```

Terraform:

```hcl
content = <<EOF
ATL_PRODUCT_FAMILY=confluence
ATL_PRODUCT_VERSION=${var.confluence_version}
EOF
```

CloudFormation:

```yaml
ImageId:
  !FindInMap
  - AWSRegionArch2AMI
  - !Ref AWS::Region
  - !FindInMap
    - AWSInstanceType2Arch
    - !Ref NodeInstanceType
    - Arch
```

Terraform:

```hcl
image_id = var.images[var.region][var.node_type]
```

CloudFormation:

```yaml
Name: !Join ['.', [!Ref 'AWS::StackName', 'db', !Ref 'HostedZone']]
```

Terraform:

```hcl
name = "${var.name}.db.${aws_route53_zone.main.domain_name}"
```

Winner: Terraform

### Parameter / Variable Data Types

CloudFormation: String, Number, List, CommaDelimitedList, AWS Resource IDs, SSM Parameter Names.

Terraform: Any JSON object, in HCL format.

Winner: Terraform

### Interoperability

While I don't hold interoperability as a major concern, and it's not anything I would use to make a decision, it's worth bearing in mind:

CloudFormation: To run Terraform, simply write a stack that starts a complicated custom pipeline that runs a container to download your source and Terraform, and then execute your apply.

Terraform:

```hcl
resource "aws_cloudformation_stack" "main" {
  name = local.csi
  template_body = yamlencode({
    Resources = {
      SlackChannelConfiguration = {
        Type = "AWS::Chatbot::SlackChannelConfiguration"
        Properties = {
          ConfigurationName = local.csi
          IamRoleArn        = aws_iam_role.main.arn
          SlackChannelId    = var.slack_channel_id
          SlackWorkspaceId  = var.slack_workspace_id
          LoggingLevel      = var.log_level
          SnsTopicArns      = [aws_sns_topic.main.arn]
        }
      }
    }
  })
}
```

Winner: Terraform

### Resource Naming Scope

CloudFormation: All of my resources share the same scope. If you want to make a Candy IAM Role, and a Candy IAM Policy, and a Candy IAM Role Policy Attachment, I can't tell the difference between them, so make sure to name them CandyRole, CandyPolicy and CandyRolePolicyAttachment.

Terraform: aws_iam_role.candy, aws_iam_policy.candy, aws_iam_role_policy_attachment.candy.

Winner: Terraform

### Conditions

CloudFormation: Use Parameters to set up simple Conditions with a truthiness you can reference later, in strict circumstances only.

Terraform: Native boolean truthiness with no interim conversions. Set up whatever conditions you need, chained as deeply as you need, with as much interim logic and as many external data values as you need. Use conditional output almost wherever you like.

In the simple case:

CloudFormation:

```yaml
Parameters:
  EnableGeoBlocking:
    Type: String
Conditions:
  EnableGeoBlocking: !Equals [ !Ref EnableGeoBlocking, "true" ]
Resources:
  CloudFrontDistribution:
    Type: "AWS::CloudFront::Distribution"
    Properties:
      DistributionConfig:
        Enabled: true
        Restrictions:
          GeoRestriction:
            !If
            - EnableGeoBlocking
            - RestrictionType: whitelist
              Locations:
                - BE
                - LU
                - NL
            - RestrictionType: none
```

Terraform (one of several ways to achieve the result: making the attributes conditional; you could instead make the whole restriction declaration optional):

```hcl
variable "enable_geo_blocking" {
  type = bool
}

resource "aws_cloudfront_distribution" "main" {
  enabled = true
  restrictions {
    geo_restriction {
      restriction_type = var.enable_geo_blocking ? "whitelist" : "none"
      locations        = var.enable_geo_blocking ? ["BE", "LU", "NL"] : null
    }
  }
}
```

Winner: Terraform

### Size Constraints

CloudFormation: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-limits.html. Everything has a limit: number of Parameters, size of a Stack, number of Stacks in an Account, number of Mapping Attributes.

Terraform: How much RAM have you got?

Winner: Terraform
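To make the "Looking up Data" point above concrete, a minimal data-source sketch (the zone name is an illustrative placeholder); no helper Lambdas are involved:

```hcl
# Look up the current region and an existing Route 53 hosted zone directly.
data "aws_region" "current" {}

data "aws_route53_zone" "main" {
  name = "example.com." # placeholder
}

output "zone_info" {
  value = "${data.aws_route53_zone.main.zone_id} in ${data.aws_region.current.name}"
}
```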
-
## Terraform vs. CDK for Granted Approvals

All of the following is subject to IMHO and YMMV.

### Executive Summary

I think that Common Fate's best interests would be served by maintaining both Terraform and CloudFormation infrastructure code, but without CDK, and modifying the purpose of gdeploy to be additive to the configuration and deployment experience rather than a required component of it.

### My Requirements

As an Enterprise systems integrator, my primary concerns for deploying infrastructure for an application such as Granted Approvals:
### Considerations

#### gdeploy

I am personally a big fan of providing a custom binary that solves automation decision-making problems, and provides a quick-start interactive option for users performing a proof of concept, or with little time, skill or resource to properly implement an enterprise-grade approach. In fact, one of my first "DevOps" experiences was creating a deployment pipeline interface for Java applications on VPS using Bash and xdialog, to let developers deploy where they needed to, when they needed to, and incorporate processes like database cleanups and service restarts as logical prescribed menu choices rather than getting their hands dirty. Frankly I think we don't do enough of this type of thing these days, and gdeploy providing this makes me happy.

However, the current use of gdeploy is not as a quick-start for the inexperienced; it is a required component of the deployment process. Documentation does not exist in many places as to how to manually replicate what gdeploy does, as it is expected that gdeploy is the only approach to use. When trying to create a Terraform approach to the infrastructure deployment (necessary to integrate with our business' Enterprise Cloud Accelerator (Landing Zone) solution), this made it very difficult to identify the configuration options I needed to replicate without first having a complete gdeploy-curated deployment that could then be reverse-engineered. Several times I got what I needed by having @chrnorm forward me copies of configuration from a deployment of his own.

This, for me, speaks to pitfalls that I see AWS and Hashicorp both falling into all of the time. They are so focussed on the adoption of their solutions by new users who are just beginning their journey that they sometimes forget about experienced users. You will find many places in AWS documentation where example solutions are described without any encryption, and little to no documentation on how to implement the same solution with all possible encryption capabilities enabled. Hashicorp have made changes to Terraform in the past that reduce flexibility in how you may use it, which have caused problems in our deployment processes: they have tried to make it harder for new users to make mistakes, and in doing so have broken innovative solutions that made use of the flexibility.

My approach to this type of situation would be to first think about the most complex, hardened, best-practice use case and provide solutions that are suitable to that use case, and then provide the wizards, examples and guidance for enabling users with simpler use cases or limited experience. As I would say to AWS: first make sure the API is right and available to the public, and then add the Console UI, instead of the choice they often make to work the other way round, prioritising the console users over the enterprise developers.

Wandering back to the topic at hand: I would use the existence of gdeploy as a way to request configuration items from the user interactively if they wish, which could then output either or both of CloudFormation parameters or Terraform tfvars for use in a deployment. gdeploy is then perfectly capable of deploying stacks or running Terraform for the user, or allowing the user to perform the task themselves. This gives the most flexibility to any type of user. A very advanced user may not use it at all, using the documentation to create configuration, and their own approach to the deployment with whichever code they choose. An intermediate user may use gdeploy to create the configuration, but deploy and/or modify that configuration themselves. The novice user can just keep telling gdeploy what to do until the deployment is complete, probably choosing the CloudFormation option because it's more novice-aws-console-user friendly.

#### Development and Deployment Extensibility

A major benefit of Terraform is how easy it is to modify infrastructure. When a new AWS feature becomes available, for example when EBS Encryption by Default became available, integrating it into Terraform solutions can be as easy as adding a line or two of code. In this particular example, the addition of this file:

```hcl
resource "aws_ebs_encryption_by_default" "main" {
  enabled = true
}
```

Conversely, should we want to make that feature optional per account, making the boolean a defaulted variable would take mere moments. In most cases with Terraform solutions, I do not need to know very much about the existing infrastructure to extend it. I need to know how to define the resource I want, or the attributes I need to change for a resource that exists; I do not need to spend any time learning the manner in which the code is built. This is generally true of CloudFormation too, albeit CloudFormation is usually harder to extend, especially where multiple stacks are tied together, passing data between them, and stacks are reaching their limits.

With CDK, you don't have a declarative language expressing the infrastructure configuration; you have a bespoke application, written for your use-case, the job of which is to output an artefact that is your infrastructure configuration. Your custom application takes a language and concept that is simple and standard and abstracts it into a bespoke way of thinking, requiring significant software development experience and a whole extra language in the developer requirements. The output from that custom application is then an artefact of its own: not a human-manageable declarative script, but something intended only to be read and written by a higher-level construct. Exactly as you could not expect developers to work with minified JavaScript, you can't necessarily expect them to work with templates generated by CDK. You lock out of the process anyone who is not working to your development approach in your custom application.

The reason why, in my opinion, CDK for CloudFormation, CDK for Terraform, Terragrunt, Pulumi and other solutions exist is because of perceived deficiencies and immaturities in the underlying language. Terraform has made great strides to make these solutions redundant through improvements in its core code.
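The per-account toggle for the EBS default-encryption example mentioned above could be sketched as follows (the variable name is illustrative):

```hcl
# Defaulted variable: accounts opt out rather than opt in.
variable "ebs_encryption_enabled" {
  type    = bool
  default = true
}

resource "aws_ebs_encryption_by_default" "main" {
  enabled = var.ebs_encryption_enabled
}
```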
It used to be that, in optionally deploying resources to multiple regions, we had to declare identical resources many times, each of them optionally toggled on for a given region. This would have been cleaner had we adopted a templating structure that could perform iteration for us, but now that we can iterate module calls natively in Terraform, it's a moot issue. There are still occasions where certain things could be made a little simpler by adding some templating, but they are so few and far between as to not be worth sitting a custom templating engine as a dependency in our codebase and processes. The language is mature enough on its own now.

In the case of CloudFormation, there are still sufficient immaturities that AWS have yet to address that the value of CDK is more obvious, but at the same time the arguments above as to CDK making the solution harder to maintain and less standard make it a difficult justification. Additionally, a lot of the trouble that CloudFormation presents is not in the configuration of the resources but in the co-ordination and deployment of multiple stacks and conditional configurations and such, as you might bring together into one CDK App. However, this is just one way of co-ordinating stacks. Given the presence of a custom deployment management tool such as gdeploy, any of that deployment integration logic could be implemented with gdeploy, or left for a system integrator to implement in the CI/CD tooling they happen to be using.

#### Build, Deploy and Run Dependencies

Granted Approvals being actually quite a simple application from an infrastructure perspective, there's not really much complexity to worry about. Deploying the whole thing as a single Terraform module or a single human-derived CloudFormation stack is a simple proposition.
The only challenge comes with building and deploying the Lambda functions, and cloning the website assets and issuing a CloudFront invalidation as part of the deployment. There are many ways to address this requirement, and while CDK's approach is reasonably pleasant for the stand-alone developer, it's not the only efficient way to achieve the goal; so much so that part of the work is already devolved to a frontend-deployer Lambda function. Really it's a choice between building, testing, versioning and deploying the applications as independent applications with their own lifecycles, sourcing the artefacts from a repository, or building the applications from code as part of one end-to-end deployment process. I don't see these options as mutually exclusive. Writing Terraform code that is capable of doing either approach based on a conditional is not particularly difficult. Using gdeploy to effect a build stage is also feasible. The only limitation would be how the CloudFormation option would look without CDK or gdeploy, in which case it may be just a set of user instructions for performing a build, with CommonFate developers using gdeploy for one-shot workstation simplicity and SIs using Terraform picking and choosing the approach that works best for them.

The main priority for me is meeting the requirements as stated at the top. Ideally it is possible to achieve perfectly idempotent deployment, with no external dependencies at all, or with external dependencies imported at build time. By this I mean that whether you build the code yourself, or you bring in pre-built release artefacts from a CommonFate deployment channel, you should be able to do these things prior to your CloudFormation or Terraform execution, so that that execution can be performed with a predictable and guaranteed result in production.

In building the Terraform solution that exists today, I have compromised and used deploy-time dependency resolution, but with a cache, so that the upstream code artefacts are cloned when the deployment occurs, but if the upstream copy is unavailable, a locally cached copy is used instead. This is a compromise, as it only works for same-version deployments; it does not work for version upgrades. This means that I could deploy an upgrade in pre-production and have everything work perfectly, then go to do the same in production and find that, if something happened to the CommonFate release bucket, my production deployment would fail, which would not really be acceptable. I would like to improve this approach, but as it is I am already selecting the upstream artefacts in a particularly dodgy manner, scraping the artefact naming hashes from the CDK-generated CloudFormation file. I did not feel this was a problem I could easily solve without some support from the core development team in abstracting the build and deploy processes a little from the CDK solution, or by building my own application packaging pipeline, which I could and may do, depending on the outcome of this RFD.

#### Continuing to Provide a CloudFormation Option

I personally have no interest in a CloudFormation option for Granted. CloudFormation, as described elsewhere in this RFD, is such a limited solution that we try to avoid including it in our development ecosystem where we can, and so we do not include processes for stack management, static code analysis for CloudFormation, and the other things you do to look after its existence in your software stack. We use CloudFormation only where it is explicitly necessary.

But that is our situation. There are similarly system integrators that are all-in on AWS-native solutions exclusively. There are people for whom the AWS Console is life, and anything they cannot do in the console is too complex for consideration. There are people whose sole purpose in life is to integrate with AWS tooling such as AWS Service Catalog, AWS Marketplace and AWS Control Tower.
My feelings about each of these solutions are irrelevant; people do and will use them. If CommonFate wants to continue to provide the most seamless and inclusive solutions possible for its users, I do not feel that completely dropping support for CloudFormation-managed infrastructure is the way to go (although I probably would choose to do so, against my own interests, just out of principle). I understand completely that the concept of managing two different implementations of the same solution in multiple languages is not immediately the most desirable choice; I think for this situation it is the most practical one, and the correct one.

It is also not an uncommon approach. Consider API developers that provide SDKs in multiple languages. AWS themselves provide Java, NodeJS and Python SDKs because they understand the value of the flexibility, and of not forcing choices that hinder the innovation of developers and integrators. Because the SDKs are an interface to the API, they have the ability to implement their own features without those features being an absolute requirement for porting to the other languages (e.g. different pagination solutions), but they all fundamentally implement the capabilities the APIs offer. Similarly, there is nothing to stop a Terraform solution providing more mature deployment approaches than the CloudFormation option; just because you implement a new capability in the Terraform solution, you don't have to ensure it is implemented in the CloudFormation solution. All that matters is that both solutions provide the same infrastructure that a given application version requires. So if you release 0.9.0, and 0.9.0 requires a new Lambda function for processing events, that function must be implemented in both CF and TF infrastructure, but without much difficulty, as each infra codebase has copypasta to deliver it as only a minor change.

If, however, you release an optional feature for backing up your S3 assets bucket using replication, or an improved way to cache assets, you can choose if and when you want each infrastructure solution to support it. As I have said before, Granted Approvals infrastructure is not very complex. It does not even change often, because it is the Lambda functions that provide most of the application logic. I do not personally believe that the rate of change in infrastructure is sufficient to warrant concerns over the labour required to maintain both Terraform and CloudFormation solutions.

#### CommonFate Terraform Provider

This, for me, is the game changer. The absolute silver bullet of all things. My number one feature request, surpassing all others, is a Terraform provider for managing Access Rules in Granted Approvals. For all of the reasons described above, I do not just want my application and its infrastructure to be defined as code and truly idempotent; I want my application configuration to act the exact same way. I do not want my deployment process to include "and then add these rules" or "and then run these gdeploy commands" or "and then restore this database backup". I want my configuration to be in code that deploys the app configuration immediately after, or even during, the application deployment.

This is how our use of AWS SSO works today. Some of our AWS SSO configuration is done in Granted Approvals, and some of it is done in Terraform. After Terraform has configured all of the appropriate permission sets and other configuration, it then deploys any and all static associations. So, for example, if we have a rule that all developers have read-only access to an account, but they can escalate their privilege to an Admin role for an admin activity, then their ReadOnly access would be configured by Terraform.
Terraform associates the Developer group in SSO (as provided to SSO from AAD SCIM) with the appropriate permission set for the account as a permanent (or at least Terraform-managed) association. It is only when a developer wishes to escalate their privilege that they go to Granted Approvals to make that time-limited change. This means that all of our permission configuration is stored as code in Terraform, except for the Granted Approvals access rules. They are the only aspect of the configuration that has to be replicated dynamically for each deployment. The second I have a Granted Approvals or Common Fate Terraform provider available to me to represent those access rules in Terraform, my entire estate goes back to true idempotency. I no longer need to allow administrators to create and modify access rules using the user interface; I declare the rules in Terraform, and any change to the rules goes through a code review process and an approval process, and is applied via Terraform deployment automation the same as everything else.

This is the one area in which there is no real way for CloudFormation to compete. Other than the mechanism already employed with gdeploy, the only approach I can think of is to deploy configuration to an S3 bucket, which a Lambda function would parse and use to enact changes to the database; but even that approach feels loosely coupled and, well, shonky. It certainly doesn't come close to the capabilities that would be provided by a Terraform provider. Yes, you could use Lambda-backed custom resources to create Access Rules, but you would probably enter a world of pain and limitation to do that, as each rule would need hard-coding in CloudFormation and would take up one of a limited number of resources per stack, and I can't even begin to imagine how you would go about parsing a structure of rules (e.g. in JSON) and implementing resources off the back of it.
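By contrast, with a provider, rules-as-data is a native pattern. A purely hypothetical sketch (this provider does not exist yet; the resource and attribute names are invented for illustration):

```hcl
# Rules declared as a data structure, reviewed like any other code change.
variable "access_rules" {
  type = map(object({
    group    = string
    role     = string
    duration = string
  }))
}

# Hypothetical Common Fate provider resource; one declaration covers all rules.
resource "commonfate_access_rule" "main" {
  for_each = var.access_rules

  name     = each.key
  group    = each.value.group
  role     = each.value.role
  duration = each.value.duration
}
```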
I suppose you could just construct JSON with gdeploy and then send the whole JSON block into a Lambda function. It's possible, it's just not clean. Compared to creating a tidy data structure in HCL in Terraform and then using only one or two resources that iterate over the data structure to create all of your rules (or maybe one resource per group of rules that share a construct), it's just a different ball game.

### Conclusion
-
Adding some of my own thoughts and research from investigating improved approaches to our infra. @lorengordon and @Zordrak - thank you both for sharing your experience and viewpoints here. @Zordrak, I'm very appreciative of how comprehensive your comments are on Terraform vs CDK. To summarise some of the discussion so far:

Based on @lorengordon's suggestion, I published a test Serverless Application Repository (SAM) application. I found that this approach had the following benefits over CDK:
I'll also note here that I would prefer to keep everything on the same release cycle - releasing a new version of the Lambda binaries should also correspond with a new version of the infrastructure, even if there are no underlying infra changes other than the Lambdas. This is purely for simplicity - it avoids us having to maintain a version matrix where version X of the Lambdas requires version Y of the CloudFormation and version Z of the Terraform.

What I am still unsure about is whether we can improve the names of the Lambda function assets by using SAM. SAM uses the MD5 hash of the Lambda zip file as the S3 object key. There is an open issue here on customising asset names over on the SAM CLI repo. We could try to customise this process, but I'm not sure whether it would impact SAM's built-in code signing. Essentially, we may need to do more steps ourselves, but this may be acceptable in order to get readable function names.

### CommonFate Terraform Provider

(this is referencing @Zordrak's reply above)

This is a fairly high priority on our roadmap, and there is no question in my opinion that Terraform is the ideal way to go for infrastructure-as-code configuration of Access Rules. Using CloudFormation and CRDs for defining Access Rules definitely feels like a suboptimal solution, and I agree with all of the points you've raised @Zordrak. Regardless of whether you deploy the infra layer as a SAM stack or as native Terraform infra, we will be encouraging users to use Terraform to manage their Access Rules as code.

### My proposed way forwards

I propose that we make the following changes to the Granted Approvals codebase:
Personally, my viewpoint is that we should compromise on Terraform deploying the SAM application as our 'official' Terraform infra approach, to minimise the amount of infra work we are doing (there are many application-level features we want to build, and we are a small core team at the moment). However, given that @Zordrak has created a top-notch native Terraform deployment, I think it is worthwhile to try to maintain both deployments together using the component tagging approach I've mentioned above. So my proposal is that we bring the Terraform deployment up to v0.9.0 and then try to release both TF- and SAM-flavoured deployments for the foreseeable future.

One particular pain point that SAM will not help with is the Frontend Deployer CRD. Ideally, CloudFormation would have a nice way for us to specify "here is a set of frontend assets to host with CloudFront". I note that Terraform has a lot more flexibility here, as we can copy the assets over as part of a deploy.

A limitation in our current Frontend Deployer is that it deletes objects in S3 prior to updating them. If the Frontend Deployer fails during a deployment, this will cause the dashboard to become unavailable. As part of shifting to SAM, I propose that we adjust the Frontend Deployer to use version tags as a 'namespace' for deployments rather than deleting objects. So a frontend hosting bucket with multiple releases would look like this:
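A hypothetical sketch of that versioned layout (the version tags and file names are illustrative):

```text
s3://frontend-hosting-bucket/
├── v0.8.0/
│   ├── index.html
│   └── assets/...
└── v0.9.0/
    ├── index.html
    └── assets/...
```

The deployer (or the CloudFront origin path) would then point at the prefix for the active release, so a failed upload of `v0.9.0` leaves `v0.8.0` serving traffic intact.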
-
Granted Approvals has been developed using a serverless architecture and runs on AWS. When beginning the project we opted to use AWS CDK in TypeScript to define the required infrastructure. This had several benefits, including a simple development deployment workflow (`mage deploy:dev`).

We have, however, run into some trade-offs with CDK. CDK is designed for deploying cloud applications into one's own AWS environment, and is not built for packaging CloudFormation templates to be deployed by users into their own AWS environments. To achieve this, we have written build steps in Mage and TypeScript which synthesize the CloudFormation template and publish assets to S3.
An immediate downside of this approach is that our synthesized assets use names based on SHA256 hashes of their content. This makes it very difficult to interpret which Lambda function is which in our release builds:
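For illustration, the hashed asset keys look something like this (the hash values here are truncated and made up):

```text
asset.0c3a6b9d2f81e5c7a4b0...zip   # which Lambda is this?
asset.9e5d1c7f4a2b8e0d6c3f...zip
asset.7f31e8c2a9d04b65f2c8...zip
```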
Another downside of this approach is that CDK does not play nicely with CloudFormation conditional parameters. This requires us to frequently drop down into L1 Constructs (example), meaning that we lose some of the higher-level abstraction benefits of using CDK in the first place.
In order to manage a Granted Approvals deployment we built `gdeploy`, a deployment helper tool. When running `gdeploy init`, a `granted-deployment.yml` file is created which contains various deployment parameters, like the release version being run, the region being deployed to, and the Access Provider configuration.

While this works well for a user evaluating Granted Approvals, many users have adopted Terraform to manage their cloud infrastructure, and `gdeploy` is not suitable for them. With `gdeploy` we are introducing additional operational workflows for these users to implement, and a new tool which must be managed. Teams may already have CI/CD workflows implemented for Terraform, and now need to add new workflows to use `gdeploy`. For some teams we have spoken with, this adds a high amount of friction to adopting Granted Approvals.

@Zordrak has developed a Terraform-based deployment of Granted Approvals: https://github.com/bjsscloud/terraform-aws-granted-approvals. @Zordrak's Terraform deployment contains some improvements over our current approach to infra:
I'm raising this RFD to discuss whether we should adopt this as the core infrastructure-as-code layer for the Granted Approvals project.
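For context, consuming a Terraform-based deployment like @Zordrak's would slot into a user's existing Terraform codebase as an ordinary module reference. A minimal sketch, where the module source points at the repository above but every input variable name is a hypothetical placeholder (check the module's own documentation for its real inputs):

```hcl
# Illustrative only: the variable names below are assumptions,
# not the module's actual interface.
module "granted_approvals" {
  source = "github.com/bjsscloud/terraform-aws-granted-approvals"

  release_version   = "v0.9.0"
  deployment_region = "eu-west-2"
  idp_type          = "azure-ad"
}
```

The appeal is that this drops straight into existing `terraform plan` / `terraform apply` CI/CD pipelines, which is precisely the friction point described above.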
These are the goals which I propose we should aim to meet with our cloud infrastructure:
I'd love to hear feedback on the following discussion points:

- Is the current `gdeploy` method preventing you from deploying Granted Approvals?
- Should we keep using `granted-deployment.yml` to manage deployment parameters, or is this made invalid by Terraform?