Terraform Config Root Setups

62 points by carty7 5 months ago

iliaxj 5 months ago

We use a single codebase to deploy to multiple environments. The setup looks like this:

  - root
  -- /envs
  --- / dev.tfvars
  --- / prod.tfvars
  - main.tf

When it gets deployed by the CICD, the right tfvars file is passed in via the -var-file parameter. A standard `env` var is also passed in, and used as a basis for a naming convention. Backend is also set by the pipeline.

The rationale here is that our environments should be almost the same between them, and any variations should be accomplished by parameterization.

Modules are kept either in separate repos, if they need to be shared between many workspaces, or under the `modules` subfolder.

based2 5 months ago

alt

    - root
     -- /envs
     --- / .dev.env
     --- / .test.env
     --- / .prod.env
     --- / dev.tfvars
     --- / test.tfvars
     --- / prod.tfvars
     - 1_create_network.tf
     - 2_create_storage.tf
     - 3_create_service.tf

https://www.youtube.com/watch?v=WgPQ-nm_ers Compliance At Scale: Hardened Terraform Modules at Morgan Stanley

https://www.weekly.tf/

carty7 5 months ago

Thanks for sharing. I’ll add the video to my queue.

FrenchyJiby 5 months ago

Yeah I'm confused why none of the solutions presented deploy the same TF across the environments: Surely if you have dev vs prod, 90% of dev infra is also needed in prod?
Then sure sprinkle a few per-env-toggles `count: 1 if env == dev else 0` (or whatever the latest nicest way to do that is today), but it feels weird to have to set up an artificial TF module around our entire codebase to be able to share the bulk of the code?
- manfre 5 months ago
  
  Instead of env conditionals, I strongly recommended feature/functionality conditionals or variables. E.g. var.create_s3 is better than var env == 'prod'
  
  baby_souffle 5 months ago
  
  +1 for this. Do as much as you can in modules and write each of your modules to have a simple enable or disable input.
  
  bloopernova 5 months ago
  
  I just wanted to clarify: are you referring to this pattern?
  module "cool_name" { count = var.create_cool_name ? 1 : 0 }
  I've always wanted a simple enable/disable syntax for modules in tf.
  
  baby_souffle 5 months ago
  
  > I just wanted to clarify: are you referring to this pattern?
  Does TF support count on modules now?! The pattern I was referring to is akin to having an `enabled` var that you define in _every_ module (there are ways to make this easier to do [0]) and inside of each module you have a
  local { make_the_s3_bucket = var.enabled && var.make_s3_bucket make_the_rds_instance = var.enabled && var.enable_rds }
  Then you use the typical `count: foo == bar ? 1:0` trick where some part of the `foo/bar` conditional ropes in local.make_the_rds_instance
  [0]: https://github.com/cloudposse/terraform-null-label/tree/main
  
  bloopernova 5 months ago
  
  https://developer.hashicorp.com/terraform/language/modules/s...
  count - Creates multiple instances of a module from a single module block.
  for_each - Creates multiple instances of a module from a single module block.
  providers - Passes provider configurations to a child module. If not specified, the child module inherits all of the default (un-aliased) provider configurations from the calling module.
  depends_on - Creates explicit dependencies between the entire module and the listed targets.
  I hope this doesn't come across as snarky, I just wanted to share the doc link :)
  
  bloopernova 5 months ago
  
  Yeah I use the count attribute on modules all the time.
  Thank you for replying though, I appreciate it :)
- forty 5 months ago
  
  In practice for complex env it ends up creating a lot of if(env) and the code ends up being horrible. At least it was our experience and we ended up having common modules instantiated in different environment as needed, which was much nicer

OJFord 5 months ago

This is terrible. All options seem to assume state is stored locally (not, say, in S3). Many options are just the same as others but only one environment, or with some project-specific difference like 'there is a backend and a frontend' or 'a few services' which has nothing to do with how you structure the terraform side of it.

All of them either don't address or use multiple directories to handle multiple environments (and assume a static list). What?! No, use terraform workspaces, or a single module that you instantiate for each environment. Or terragrunt if you really want. TFA is just a blogspam mess from either AI or someone that's spent about 20 minutes with a Youtube video to learn terraform & push out their latest deep dive content guys.

bloopernova 5 months ago

Agreed on the quality of the article, it feels very shallow.
- carty7 5 months ago
  
  Sorry it feels shallow and that you feel the need to troll.
  If you read my first post related to this, I was giving myself a refresher to understand different dynamics that people think about.
  I did not watch one YouTube video or spend 20 minutes on this or create with GPT.
  The original source of inspiration came from me wanting to understand the examples our Eng team put together on how our config file correlates to what customers are actually using to find any gaps.
  https://docs.resourcely.io/concepts/other-features-and-setti...
  This is also a part 1 of the article and I clearly asked what was missing.
  
  isoprophlex 5 months ago
  
  I think it's perfectly valid to voice an opinion of shallowness. That doesn't automatically make it trolling.
  
  carty7 5 months ago
  
  My mistake. Maybe the wrong use of the word.
  And yes, it was a shallow article. It was not a deep dive and nor do I have the technical chops to write a deep dive.
  
  bloopernova 5 months ago
  
  If I was trying to troll you, I'd have mostly agreed with your article then made some outlandish accusation or completely incorrect statement. Then I'd defend that while continually moving the goalposts and getting people as invested, amused, and angry as possible.
  Simply stating an opinion, by itself, is not trolling no matter how much you may disagree with the comment. If you're open to advice, I suggest that you look at a user's comment history to determine if they're the type of person who trolls others.
carty7 5 months ago

The entirety of this research was about structure of directories.
Storing state in S3 or TFC or Spacelift or somewhere else is out of scope. S3 is where 90% of the world stores their state and writing those configuration lines is not in scope. You can find other resources on that.
I struggled to find an exhaustive list of how people manage their directory structures and hence the focus of this piece.
If you’d like to provide constructive feedback and avoid comments regarding scope creep, please share.
- OJFord 5 months ago
  
  Sorry, I assumed this wasn't author-shared; I wouldn't have phrased it like that if I'd realised, or on a 'Show HN'.
  It just struck me as a collection of ways you could split your configuration where none of them is something I would suggest. You could split your frontend/backend or services I guess sure, but then why are you assuming it's in the same repo, and why does that matter anyway, it has nothing to do with Terraform.
  Why have dev & prod as different projects, assuming they're supposed to look the same? Use workspaces or different state or terra grunt. Ditto regions; use provider aliases.
  If you say Well I'm not assuming they are the same, this is about structuring directories on the basis that they are already isolated Terraform configurations... Ok sure, but why does how you do that matter and what does it have to do with Terraform? It's the same as talking about structuring the directories for their application code isn't it?
TYMorningCoffee 5 months ago

Why do you think the article assumes statefile is local? I don't think that assumption was made.
- OJFord 5 months ago
  
  #2 splits the configuration into 'dev' & 'prod' duplicates, describes it as 'with separate state files', and from there they're all different ways of splitting the project for some reason.
  If the state isn't local, then if you want that split in order to apply them separately, there would be no reason to physically separate & duplicate them into different directories.

lijok 5 months ago

This is ever evolving but we currently do;

Composable modules such as `terraform-aws-lambda/modules/standard-function`, `terraform-aws-iam/moduls/role-for-aws-lambda`, etc, which get composed for a specific usecase in a root module (which we call stacks). The stack has directories under it such as `dev/main/primary/` `dev/sandbox-a/primary/` `dev/sandbox-a/test-a/`, etc, where `dev` is environment, `main/sandbox-a` is tenant and the `primary/test-a` is the namespace. The namespaces contain a `tfvars` file and potentially some namespace specific assets, readme's, documentation, etc. The CD system then deploys the root module for each namespace present.

Stacks are then optionally (sometimes deeply) nested under parent directories, which are used for change control purposes, variable inference and consistency testing.

OpenTofu >1.8.0 is required for all of this to keep it nice and tidy.

carty7 5 months ago

Thank you for sharing your example.

dayallnash 5 months ago

Terragrunt goes a long way to providing a near-perfect way to structure Terraform repos.

spicyusername 5 months ago

I didn't understand the benefit of using terragrunt. Modern terraform supports all of the features terragrunt was originally designed to work around way back when.
Outside of being able to use variables in very niche places that you can't in terraform (and can easily work around, and that last I heard is on the road map for open tofu), what does terragrunt do that using regular module imports in terraform don't?
This may be anecdotal but every terragrunt repository I've ever seen was a mess of spaghetti trying too hard to stay DRY.
- joshpadnick 5 months ago
  
  I'm from Gruntwork; we're the maintainers of Terragrunt.
  We get this question a lot because Terragrunt did in fact start as a feature shim for Terraform. But that was a long time ago and Terragrunt has evolved into a first-class "orchestration" tool for Terraform or OpenTofu.
  We wrote a blog post addressing exactly your concern that you might find helpful: https://blog.gruntwork.io/terragrunt-opentofu-better-togethe....
- slillibri 5 months ago
  
  1) I use terragrunt to generate all the 'boilerplate' files, for example the backend configuration to ensure they are all in spec.
  2) I use terragrunt to provide inherited values, such as region, environment, etc. I have a directory tree of `dev/us-west-2/` I can set a variable in my `dev/environment.hcl` that is inherited across everything under that environment. This is useful if you have more than one dev, prod, etc environment.
  3) I use terragrunt to allow shared, versioned root modules. I don't include any terraform in my terragrunt repo. The terragrunt repo is just configuration. In my terragrunt.hcl I can reference something like `source = "github.com/example.com/root.git//ipv6-vpc?ref=v0.1.3"` to pull in the version of the root module I want. Again this is useful if you have multiple dev/stage/prod environments.
  None of this is actually possible with plain terraform.
  
  spicyusername 5 months ago
  
  This is exactly what I'm talking about.
  Using variables isn't supported in backends - e.g. "niche places" (although it sounds like it is in OpenTofu)
  And #2 and #3 are exactly the use-case for just regular old tfvars files and module imports.
  Terragrunt encourages you to abstract things way, way too far, until you have an absolute mess of tangled imports and deeply nested directories.
  Nothing wrong with having a root module, some child modules, and a single tfvars file being fed in on the front end.
- cube2222 5 months ago
  
  Not just on the roadmap!
  With OpenTofu as of the latest release you can already use constant variables in places like module sources and versions, backend configurations, and even in order to for_each on providers!
  Disclaimer: involved in OpenTofu
- cyberpunk 5 months ago
  
  Pain and suffering lies behind any attempts to make tf DRY. Unless you have a really insane amount of envs, I just copy/pasta these days. Fuck it. Life is too short.

carty7 5 months ago

I spend a lot of time speaking with clients and have found myself partially understanding organizational structure so I dove in to collect my thoughts and put myself closer to the customer on what they are navigating.

This gave myself a refresher on how they are organizing their cloud infrastructure within their source control systems. I took a lense from the world of terraform since that’s mostly the world i live in today and the last few years.

I explored 10 different ways to structure your Terraform config roots, each promising scalability but delivering varying degrees of chaos. From single-environment simplicity to multi-cloud madness, customers are stuck navigating spaghetti directories and state file hell.

I probably missed things. Might have gotten things wrong. Take a look and let me know what you think.

What patterns are you using that I missed?

jjayj 5 months ago

We are multi-cloud, multi-region, multi-environment, multi-deployment with hundreds of AWS accounts.
This is split over hundreds of microservice repositories, each of which maintains its own Terraform.
We don't read state from other Terraform deployments, and use published reusable modules when convenient and a tfvars file for every deployment.
At this point I can't imagine doing Terraform any other way.
unop 5 months ago

Nice! We'll link to this for our internal consultancy work.
It'd be nice to show the other dimension of the git branching strategies to apply. Github flow/feature-branches vs per-env branches of main vs git flow. How and when to apply changes in different environments - before vs after PRs, etc.
- carty7 5 months ago
  
  This was out of scope for my research. Have you seen any good resources on this?

bloopernova 5 months ago

The client I'm contracted to is all-in on Terraform Cloud. (TFC)

TFC uses workspaces, which annoyingly aren't the same thing as terraform workspaces. I've divided up our workspaces into dev, qa, staging, and prod, and each group of workspaces has the OIDC setup to allow management of a specific cloud account. So dev workspaces can only access the dev account, etc etc. Each grouping of workspaces also has a specific role that can access them. Each role then has its own API key.

The issues I've run into are mostly management of workspace variables. So now I have a manager repo and matching workspace that controls all the vars for the couple hundred TFC workspaces. I use a TFC group API key for the terraform enterprise provider, one provider per group. This prevents potential mistakes where dev vars could get written to qa, etc etc.

  Manager repo
  - dev TFE provider
  - qa TFE provider
  - staging TFE provider
  - prod TFE provider

Workspace variables are set by a single directory of terraform, so there's good sharing of the data and locals blocks.

I use lists of workspaces categorized by "pipeline deployers" and "application resource deployers", along with lists of dev, qa, staging, and prod workspaces. I then use terraform's "setintersection" function to give me "dev pipeline" workspaces, "prod app" workspaces, etc. I also do the same with groups of variables, as there's some that are specific to pipeline workspaces, and so on. It works well, and it's nice to have an almost 100% terraform control of vars and workspaces.

I split app and pipeline workspaces based on historical decisions, I'm not sure if I'd replicate that on a new project. The workflow there is that an app workspace creates the resources for a given deployment, then saves pertinent details to a couple of parameters. The pipeline workspace then pulls those parameters and uses them to create a pipeline that builds and deploys the code.

Unfortunately I can't share code from this particular setup, but I do intend to write about it "someday".

carty7 5 months ago

Thank you. I’ll dive into this when I have a chance next week.

solatic 5 months ago

This is all besides the point that Terraform's biggest weakness is refactoring large workspaces into multiple smaller workspaces. Transitioning IDs from one workspace to another, at scale, is annoying to say the least. The only remotely feasible generic solution here would be to treat statefiles as tables, write migrations as SQL, and use pre-existing tooling for database migrations and rollbacks... Maybe I'll write something like that someday.

Uvix 5 months ago

The addition of `import` and `removed` blocks make this a lot easier to manage than it was a year or two ago. You can manage the migration in the Terraform code rather than having to run separate state management commands.

bilekas 5 months ago

The method I go to almost always is

> Multi-Environment Setup with Shared Modules

But the con of saying versioning is tricky across modules is damn near impossible to reliably manage.. especially because if I'm introducing a new variable to a shared module A) I need to also add this variable in the inputs of each of the environment.

I haven't found a way to manage multiple versions of the modules across environments if all using the same shared modules. Is it even possible?

moredhel 5 months ago

I’ve always adopted the protobuf idea of growing an api interface.
Define a default which is backwards compatible.
carty7 5 months ago

Thank you for sharing. I’ve heard this module conundrum a few times and creates a proliferation of forked modules.
- bilekas 5 months ago
  
  I mean when it becomes a bit larger and harder to manage so many modules we would split them into standalone repos which could be versiones And deployed. Then in the tfstate file get the env arns for example, but the complexity in the pipeline and environment becomes a bit more difficult to maintain strictly. It's tough

NomDePlum 5 months ago

Good overview of potential options. Which is most appropriate really depends.

I am a big fan of modularisation, it is possible to extend this approach to divide logically your infrastructure and mirror that by separating out the terraform state files too.

Number of TF deployment increases but they each have smaller blast radiuses and you now need to manage making available outputs of builds to those which are dependent on them.

TYMorningCoffee 5 months ago

What's the craziest terraform repo you have seen? Is it okay to mix code (ex: Java) and terraform in a large mono repo?

dijit 5 months ago

I’ve seen things you wouldn't believe.
Python deterministically generating terraform HCL files based on yaml.
Execution wrappers that encapsulate terraform in CI/CD to parse the json output and prevent database deletion, but apply everything else.
Scripts that pull every git repo and execute every terraform file they can find while walking the directory tree.
Terraform is about 80% of the way to a good tool, that last 20% is a ball-ache and solved totally differently every time; the best setups I’ve seen is where terraform just “hands off” to something else after making a minimum infrastructure.
But, otherwise, it can get incredibly messy.
- maccard 5 months ago
  
  I agree with your 80/20 take, but I’ve come to the conclusion that the only reason terraform is a good tool is because it leaves the messy stuff (the other 20%) to other tools and pretends it doesn’t exist.
  
  cruffle_duffle 5 months ago
  
  Yeah but it sure feels like what is in that 20% has to be common enough to have good best practices around it?
  Or perhaps that 20% is simply the remainder of organizational complexity that cannot be standardized in a single tool like Terraform? Every org and every product have unique enough attributes that it is just not possible.
  I dunno. I find terraform a uniquely fascinating product because it does so much yet leaves so much for you to do on your own.
  
  maccard 5 months ago
  
  > Or perhaps that 20% is simply the remainder of organizational complexity that cannot be standardized in a single tool like Terraform
  I think this is what it is in practice. My opinion is that if these organisations had slightly less opinions terraform could probably solve another 15% and just leave the 5% that is _definitely_ organizational complexity.
- TYMorningCoffee 5 months ago
  
  > Python deterministically generating terraform HCL files based on yaml.
  That sounds terrible. I'm sorry you had to deal with that.
  If they could go back in time, would modules have been good enough?
  
  dijit 5 months ago
  
  No, because control flow is very janky and completely tacked-on.
  I considered (in all three cases) making a forked version of terraform for use in CI and for aiding control flow.
  Pulumi/Terragrunt would probably have helped the specific issue you’re mentioning though.

cruffle_duffle 5 months ago

Terraform is such a weird product and it’s hard to describe why, really. It has really awesome things about it like a functional way to pull in support for just about any providers crazy thing. And the crazy part is somehow terraform can easily configure local infrastructure, cloud infrastructure on almost any host, set up repos in GitHub, and then create a new auth provider in Auth0. The language is flexible enough to support all of it.

Yet as a language, it’s quirky as heck. For example how modules are basically wrappers on providers and how different modules can all most “see inside” other modules to iron out dependency ordering but yet also can’t. And speaking of, circular dependencies suck to work around in a modular way without tearing half your structure apart.

Like I said, I am not anywhere close to an expert on terraform and can only describe my limited experience building a fairly simple stack on top of it. The whole thing is just… both amazing and also weird and a bit frustrating. And I have yet to “grow” into multiple environments… lots of my complaints are probably down to my limited experience with it and, honestly, not much out there in terms of best practices for maintaining scalable configuration (or maybe my ADD brain refuses to dive into that, who knows?)

My last adventure into infrastructure as code was with Puppet and Salt. All of that was provisioning on top of bare metal. It was all file operations and the “provider specific modules” were really just wrappers to nicely encapsulate things like nginx or apt. Perhaps it is because of Puppet or Salt’s much more limited scope that didn’t have me feeling the same way.

I mean terraform can be used to configure just about anything that has an API if you wanted. Maintaining a declarative language around that is bound to have its quirks.

michaelmcmillan 5 months ago

Please, please, please use Terraform workspaces: One workspace per environment.

For environment specific things use conditionals:

  nodes = terraform.workspace == "prod" ? 2 : 1