I'm not entirely sure how people use fluxcd, but I guess most people have something like a flux-system flux kustomization as the root, which adds more flux kustomizations to their kubernetes cluster.
Here all of that lives in a monorepo, and since we're all human, people keep finding
new ways to break it, which brings the reconciliation of the flux controllers to a halt.
So we set out to add some pre-flight validations.
Note1: We do not use flux variable substitutions for those root kustomizations, so
if you use them, you have to put additional work into the validation and pipe things through flux envsubst.
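If you do rely on substitutions, a check along the following lines might be a starting point. This is only a sketch and not something we run ourselves: it assumes a flux CLI recent enough to ship the `flux envsubst` subcommand, and the helper name `build_with_substitution` as well as the variable `cluster_name` are made up for illustration.

```shell
# Sketch: render a kustomization, then resolve ${var} references from the
# environment, roughly as flux post-build substitution would.
# build_with_substitution is a hypothetical helper name.
build_with_substitution() {
  local dir="$1" rendered
  # build first, so a kustomize failure is not hidden behind the pipe
  rendered=$(kustomize build "${dir}") || return 1
  # --strict makes flux envsubst fail on variables that are not set
  printf '%s\n' "${rendered}" | flux envsubst --strict
}

# Usage (assumes the manifests reference ${cluster_name}):
#   export cluster_name=staging
#   build_with_substitution clusters/staging > /dev/null
```

The variables have to be exported, since `flux envsubst` reads them from the environment.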
First Iteration: Just Run kustomize Like Flux Would Do It
With a folder structure where we have a clusters folder with subfolders per cluster, we just run a for loop over all of them:
for CLUSTER in ${CLUSTERS}; do
  pushd "clusters/${CLUSTER}"
  # validate if we can create and build a flux-system like kustomization file
  kustomize create --autodetect --recursive
  if ! kustomize build . -o /dev/null 2> error.log; then
    echo "Error building flux-system kustomization for cluster ${CLUSTER}"
    cat error.log
  fi
  popd
done
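The loop above only prints the error. In a CI job you will probably also want a non-zero exit code, so a variant could look roughly like this. It is a sketch under assumptions: `validate_cluster_builds` is a hypothetical helper name, and it expects to be called from the repository root with the cluster names as arguments.

```shell
# Sketch: same checks as the loop above, but remember failures so the
# calling job can fail. validate_cluster_builds is a hypothetical name.
validate_cluster_builds() {
  local failed=0 cluster
  for cluster in "$@"; do
    # subshell: the cd never leaks into the next iteration
    if ! ( cd "clusters/${cluster}" \
           && kustomize create --autodetect --recursive \
           && kustomize build . -o /dev/null ); then
      echo "Error building flux-system kustomization for cluster ${cluster}" >&2
      failed=1
    fi
  done
  return "${failed}"
}

# Usage: validate_cluster_builds ${CLUSTERS} || exit 1
```

Running each cluster in a subshell replaces the pushd/popd pair and keeps going after a failure, so one run reports all broken clusters.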
Second Iteration: Make Sure Our Workload Subfolders Have a kustomization.yaml
Next someone figured out that you can delete some yaml files from a workload subfolder,
including the kustomization.yaml, but not all of them. That left behind a resource
definition which referenced objects that no longer existed, but which was still happily
included in the root kustomization by kustomize create and flux, which of course did not work.
Thus we started to catch that as well in our growing for loop:
for CLUSTER in ${CLUSTERS}; do
  pushd "clusters/${CLUSTER}"
  # validate if we can create and build a flux-system like kustomization file
  kustomize create --autodetect --recursive
  if ! kustomize build . -o /dev/null 2> error.log; then
    echo "Error building flux-system kustomization for cluster ${CLUSTER}"
    cat error.log
  fi
  # validate if we always have a kustomization file in folders with yaml files
  # (read NUL-separated names so folders with spaces do not break the loop)
  find . -type d -print0 | while IFS= read -r -d '' CLFOLDER; do
    test -f "${CLFOLDER}/kustomization.yaml" && continue
    test -f "${CLFOLDER}/kustomization.yml" && continue
    # -ne compares numerically, so padded wc output is handled as well
    if [[ $(find "${CLFOLDER}" -maxdepth 1 \( -name '*.yaml' -o -name '*.yml' \) -type f | wc -l) -ne 0 ]]; then
      echo "Error Cluster ${CLUSTER} folder ${CLFOLDER} lacks a kustomization.yaml"
    fi
  done
  popd
done
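The folder check can also be pulled out into a small function, which makes it easier to try against a scratch directory. Again only a sketch: `find_unmanaged_yaml_dirs` is a hypothetical helper name, and it assumes folder names without newlines and a find that supports `-maxdepth`.

```shell
# Sketch: print every directory below $1 that holds yaml files but no
# kustomization file; the exit status is non-zero if any were found.
# find_unmanaged_yaml_dirs is a hypothetical helper name.
find_unmanaged_yaml_dirs() {
  local root="$1" dir found=0
  while IFS= read -r dir; do
    [ -f "${dir}/kustomization.yaml" ] && continue
    [ -f "${dir}/kustomization.yml" ] && continue
    if [ -n "$(find "${dir}" -maxdepth 1 -type f \( -name '*.yaml' -o -name '*.yml' \) | head -n 1)" ]; then
      echo "${dir}"
      found=1
    fi
  done <<EOF
$(find "${root}" -type d)
EOF
  return "${found}"
}

# Usage: find_unmanaged_yaml_dirs "clusters/${CLUSTER}" || exit 1
```

Feeding the loop from a here-document instead of a pipe keeps it in the current shell, so the `found` flag survives until the return.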
Note2: I shortened those snippets to the core parts. In our case some things are a bit specific to how we run those checks in GitHub Actions workflows. I hope that's enough to convey the idea of what to check for.