cloud-sre

Deploying a New EKS Service: GitOps, Internal ALB, and Cloudflare DNS

A practical deployment and troubleshooting record for a Spring Boot service on EKS: ECR, GitLab CI, ArgoCD, Ingress, internal ALB, Route53, and Cloudflare DNS.

Jun 18, 2026
AWS, EKS, ALB, Route53, Cloudflare, GitOps, troubleshooting

This is both a deployment note and a DNS troubleshooting note.

The goal was to add a new Spring Boot service to an existing EKS dev environment and expose it through a new dev subdomain. The service itself was not meant to be directly public. It sits behind an existing internal ALB, and Kubernetes Ingress routes requests to the right Service based on the Host header.

The final health check looked like this:

https://app-api-dev.example.com/actuator/health/readiness

Response:

{"status":"UP"}

Here is the process in the order it was debugged.

Starting Point

The backend system already had several services:

Service          Access pattern
Main API         Reached through a dev API domain
Workers          Internal to the cluster
New app service  Needed a separate dev API domain

The new service was an independent Spring Boot module on port 8082. The main API calls it inside the cluster:

http://app-service:8082

So the first step was not DNS. The first step was to make the service run inside EKS with a stable Kubernetes Service name.

Step 1: Add the ECR Repository

CI needs somewhere to push the new image, so the dev ECR repository list gained one more repository:

ecr_repos = [
  "dev-api",
  "dev-app-service",
  "dev-worker-core",
  "dev-worker-growth"
]

Then run a plan against the dev ECR stack:

AWS_PROFILE=<profile> terragrunt plan -no-color

The important result was:

Plan: 2 to add, 0 to change, 0 to destroy.

That proved the Terraform change was additive: one ECR repository and its lifecycle policy, with no changes to existing services.
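The same check can be automated before an apply. The following is a minimal Python sketch, assuming the plan output has been captured as text; the function names parse_plan_summary and is_additive are illustrative and not part of Terraform or Terragrunt.

```python
import re

# Matches the counts in the plan summary line, e.g.
# "Plan: 2 to add, 0 to change, 0 to destroy."
SUMMARY_RE = re.compile(r"(\d+) to add, (\d+) to change, (\d+) to destroy")


def parse_plan_summary(output: str) -> tuple[int, int, int]:
    """Return (add, change, destroy) counts found in plan output."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no plan summary line found")
    add, change, destroy = (int(n) for n in match.groups())
    return add, change, destroy


def is_additive(output: str) -> bool:
    """True when the plan only creates resources."""
    add, change, destroy = parse_plan_summary(output)
    return add > 0 and change == 0 and destroy == 0
```

For the summary line above, is_additive returns True. A plan that reports any change or destroy count above zero returns False and deserves a manual review.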

After that, apply it:

AWS_PROFILE=<profile> terragrunt apply -auto-approve -no-color

Step 2: Add the GitOps Deployment Directory

The environment already used ArgoCD and Kustomize, so the new service followed the same layout:

dev/app-service/
  application.yaml
  external-secret.yaml
  k8s-app-service-dev.yaml
  kustomization.yaml

The Deployment had to match the service port:

containers:
  - name: app-service
    image: <ecr>/dev-app-service:dev-REPLACE_ME
    ports:
      - containerPort: 8082
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8082

The Kubernetes Service kept the same DNS name the main API expected:

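apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  # Selects the Pods labelled app=app-service (the label used in the kubectl checks later).
  selector:
    app: app-service
  ports:
    - port: 8082
      targetPort: 8082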

Do not casually expose the Service as port 80 here. The application configuration calls:

http://app-service:8082

If the Service only exposes 80, the in-cluster call will fail even though the Pod is running.
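A mismatch like this can be caught before deployment by comparing the configured URL with the ports the Service exposes. A minimal Python sketch, assuming the URL and port list are pulled from the application configuration and the manifest by hand; service_exposes_url_port is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# Ports an HTTP client uses when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def service_exposes_url_port(url: str, service_ports: list[int]) -> bool:
    """Check that the port used by an in-cluster URL is exposed by the Service."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return port in service_ports
```

With the values from this step, service_exposes_url_port("http://app-service:8082", [8082]) is True, while the same URL against a Service that only exposes 80 is False.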

Step 3: Reuse the Existing Backend Configuration

The first idea was to create a separate Secrets Manager entry for the new service:

dev/app-service/application-secrets.yaml

After checking the Java configuration, the better choice was to reuse the same runtime configuration used by the main API. The ExternalSecret still creates a separate Kubernetes Secret, but its remoteRef points to the existing backend configuration:

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: app-service-secrets
spec:
  target:
    name: app-service-secrets
  data:
    - secretKey: application.yaml
      remoteRef:
        key: dev/api/application-secrets.yaml

The app service reads the same database variables:

spring:
  datasource:
    url: ${DB_URL}
    username: ${DB_USERNAME}
    password: ${DB_PASSWORD}

This avoids duplicating environment-specific configuration and reduces the chance that the API and the new service drift apart.

Step 4: Update GitLab CI

The new module existed in the code repository, but CI only built the main API and worker images. Three updates were needed:

  1. Add an app-service ECR repository variable.
  2. Add Docker build and push steps.
  3. Update the GitOps image tag for dev/app-service/kustomization.yaml.

The CI change looked like this:

variables:
  DEV_ECR_APP_SERVICE_REPO: "dev-app-service"

build_push:
  script:
    # A literal block keeps the backslash line continuation intact for the shell.
    - |
      docker build -f app-service-app/Dockerfile \
        -t "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$DEV_ECR_APP_SERVICE_REPO:$TAG" .
    - docker push "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$DEV_ECR_APP_SERVICE_REPO:$TAG"

The GitOps tag update also had to include the new path:

update_newtag(
  f"{prefix}/app-service/kustomization.yaml",
  os.environ["APP_SERVICE_IMAGE"],
  tag
)

Without this, the ECR repository and Kubernetes manifests can be correct, but no new image will actually be deployed.
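The body of update_newtag is not shown in this note. As an illustration only, a text-level tag update could look like the sketch below. It assumes the kustomization.yaml uses the standard Kustomize images list with name listed before newTag in each entry, and set_new_tag is a hypothetical name, not the helper from the pipeline.

```python
import re

NAME_RE = re.compile(r"\s*-?\s*name:\s*(\S+)\s*$")
TAG_RE = re.compile(r"(\s*-?\s*newTag:\s*)\S+\s*$")


def set_new_tag(kustomization_text: str, image: str, tag: str) -> str:
    """Replace newTag for the images entry whose name equals `image`.

    Assumption: `name:` appears before `newTag:` within the entry.
    """
    lines = kustomization_text.splitlines()
    in_target = False
    replaced = False
    for i, line in enumerate(lines):
        name_match = NAME_RE.match(line)
        if name_match:
            # Track whether the current entry is the image we want to update.
            in_target = name_match.group(1).strip("\"'") == image
            continue
        tag_match = TAG_RE.match(line)
        if in_target and tag_match:
            lines[i] = f"{tag_match.group(1)}{tag}"
            replaced = True
            in_target = False
    if not replaced:
        raise ValueError(f"no newTag entry found for image {image}")
    return "\n".join(lines) + "\n"
```

Raising when no entry matches is deliberate: a silent no-op is exactly the failure described above, where everything looks correct but no new image is deployed.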

Step 5: Create the ArgoCD Application

The app-of-apps file got one more ArgoCD Application:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dev-app-service
spec:
  source:
    repoURL: <gitops-repo>
    targetRevision: HEAD
    path: dev/app-service
  destination:
    server: https://kubernetes.default.svc
    namespace: dev
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

After applying it, check ArgoCD and Kubernetes:

kubectl -n argocd get application dev-app-service
kubectl -n dev get deploy,svc,pod -l app=app-service -o wide

The expected state:

dev-app-service   Synced   Healthy

deployment/app-service   1/1
pod/app-service-...       Running
service/app-service       8082/TCP

Also check ExternalSecret:

kubectl -n dev get externalsecret app-service-secrets

Expected:

SecretSynced   True

At this point the service is working inside the cluster.

Step 6: Expose a New Host Through the ALB

The existing dev API already used an internal ALB. The new service was added to the same Ingress with a new host rule:

rules:
  - host: api-dev.example.com
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: api
              port:
                number: 80

  - host: app-api-dev.example.com
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: app-service
              port:
                number: 8082

After applying it:

kubectl -n dev describe ingress dev-apps

The Rules section should include:

app-api-dev.example.com
  /   app-service:8082 (<pod-ip>:8082)
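
The host rules amount to a lookup from the Host header to a backend Service. A minimal Python sketch of that lookup follows. It is a simplified model for reasoning about the routing, not how the ALB is implemented, and HOST_RULES and route are illustrative names.

```python
from typing import Optional

# Host rules from the Ingress: host -> (Service name, Service port).
HOST_RULES = {
    "api-dev.example.com": ("api", 80),
    "app-api-dev.example.com": ("app-service", 8082),
}


def route(host_header: str) -> Optional[tuple[str, int]]:
    """Return the backend for a Host header, or None when no rule matches."""
    # A Host header may include a port, and hostnames are case-insensitive.
    host = host_header.split(":", 1)[0].strip().lower()
    return HOST_RULES.get(host)
```

Both hosts reach the same ALB, and only the Host header decides which Service receives the request. A host with no rule returns None here, which corresponds to the ALB falling through to its default action.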

Then check the ALB target group:

aws elbv2 describe-target-health \
  --target-group-arn <app-service-target-group-arn>

Expected:

State: healthy

This proves the Ingress rule, ALB listener rule, target group, and Pod readiness are all correct.
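
When the target group has several targets, reading the output by eye gets tedious. A minimal Python sketch, assuming the command was run with JSON output (the AWS CLI default format) and the result was captured as a string; unhealthy_targets is an illustrative name.

```python
import json


def unhealthy_targets(describe_output: str) -> list[str]:
    """List 'id:port (state)' for every target that is not healthy."""
    payload = json.loads(describe_output)
    problems = []
    for item in payload.get("TargetHealthDescriptions", []):
        state = item["TargetHealth"]["State"]
        if state != "healthy":
            target = item["Target"]
            problems.append(f"{target['Id']}:{target.get('Port')} ({state})")
    return problems
```

An empty result means no registered target is unhealthy. It does not prove that any target is registered, so an empty TargetHealthDescriptions list should still be treated as a problem.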

Step 7: The DNS Record Was Added in the Wrong Place

DNS was the easy part to misread.

At first, a record was added in Route53:

app-api-dev.example.com CNAME <internal-alb-dns-name>

But local access still failed:

Could not resolve host

Check the name:

dig +short app-api-dev.example.com

No result.

At that point, do not keep debugging ALB or Kubernetes. Check the authoritative name servers for the root domain:

dig +short NS example.com

The result showed that the domain was delegated to Cloudflare, not Route53.

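That means a Route53 hosted zone with the same name does not control the domain unless the NS delegation points to that zone's name servers. Public resolvers follow the authoritative NS delegation, so records in a zone that is not delegated are never served.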
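
The NS answer can be classified with a simple heuristic. The sketch below is a Python example that assumes the default name server naming of each provider (names ending in ns.cloudflare.com for Cloudflare, ns-N.awsdns-N names for Route53). Custom or vanity name servers will come back as unknown, and dns_provider is an illustrative name.

```python
import re

# Default Route53 name servers look like ns-1234.awsdns-12.org.
ROUTE53_NS = re.compile(r"^ns-\d+\.awsdns-\d+\.(com|net|org|co\.uk)$")


def dns_provider(ns_records: list[str]) -> str:
    """Guess which provider is authoritative from `dig +short NS` output lines."""
    names = [ns.strip().rstrip(".").lower() for ns in ns_records if ns.strip()]
    if not names:
        return "unknown"
    if all(name.endswith(".ns.cloudflare.com") for name in names):
        return "cloudflare"
    if all(ROUTE53_NS.match(name) for name in names):
        return "route53"
    return "unknown"
```

If the result is cloudflare, a record added only in Route53 will not resolve publicly, which matches the symptom above.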

The correct fix was to add the record in Cloudflare:

Type: CNAME
Name: app-api-dev
Target: <internal-alb-dns-name>
Proxy status: DNS only

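It must be DNS only. In proxied (orange-cloud) mode, Cloudflare answers with its own addresses and connects to the origin from its network over the public internet, so it cannot reach an internal ALB that only has private addresses.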

Step 8: An Internal ALB Still Requires Internal Network Access

After DNS was fixed:

dig +short app-api-dev.example.com

The result looked like:

<internal-alb-dns-name>.
10.x.x.x
10.x.x.x

Seeing 10.x.x.x is expected because the ALB is internal.

It also means callers must be on VPN, inside the VPC, or on a network that can route to those private addresses. A public machine cannot reach the internal ALB just because DNS resolves.
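
Whether an answer requires private network access can be checked with the standard library. A minimal Python sketch, assuming the lines of dig +short output are passed in as strings; needs_private_network is an illustrative name.

```python
import ipaddress


def needs_private_network(dig_lines: list[str]) -> bool:
    """True when every resolved address is private, such as 10.0.0.0/8."""
    ips = []
    for line in dig_lines:
        try:
            ips.append(ipaddress.ip_address(line.strip()))
        except ValueError:
            # Not an IP address, for example the CNAME target line.
            continue
    return bool(ips) and all(ip.is_private for ip in ips)
```

A True result means a timeout from a machine outside the VPN or VPC is expected behaviour, not a fault in the ALB or the service.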

Step 9: Separate 200, 401, and Network Failures

Final health check:

curl -i https://app-api-dev.example.com/actuator/health/readiness

Response:

HTTP/2 200

Body:

{"status":"UP"}

Root path:

curl -i https://app-api-dev.example.com/

Response:

HTTP/2 401

That is not a service outage. It means the request reached the application and authentication is required.

Symptom                                 Meaning
Could not resolve host                  DNS is not resolving
Connection timed out                    Network path to the internal ALB is unavailable
502                                     ALB cannot reach the backend target, or the app is failing
401                                     Request reached the app, but authentication is required
/actuator/health/readiness returns UP   Backend readiness is healthy
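
The table can be expressed as a small triage function. This is a Python sketch of the mapping only. The layer names it returns and the function name diagnose are illustrative, and the error strings are matched as substrings of the curl messages quoted in the table.

```python
from typing import Optional


def diagnose(curl_error: Optional[str] = None, status: Optional[int] = None) -> str:
    """Map a curl outcome to the layer to investigate next."""
    if curl_error:
        text = curl_error.lower()
        if "could not resolve host" in text:
            return "dns"
        if "timed out" in text:
            return "network"
        return "unknown"
    if status == 502:
        return "backend"
    if status == 401:
        return "auth"
    if status == 200:
        return "healthy"
    return "unknown"
```

For example, diagnose(status=401) returns "auth", which points at tokens and authorization rather than the ALB or DNS.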

Main Lesson

Do not start with browser symptoms and guess. A better order is:

  1. Check Pod, Service, and ExternalSecret.
  2. Check whether Ingress generated the ALB rule.
  3. Check whether the target group is healthy.
  4. Check whether the certificate covers the new host.
  5. Check the authoritative DNS and the actual record.

When Route53 and Cloudflare both exist, always confirm the authoritative DNS first. A record visible in Route53 is irrelevant if the domain is delegated to Cloudflare.

Final Path

The working path became:

browser
  -> Cloudflare DNS (DNS only)
  -> internal ALB
  -> Kubernetes Ingress host rule
  -> app-service Service:8082
  -> app-service Pod:8082

Health check:

{"status":"UP"}

At that point the backend path is proven. If a business endpoint returns 401, the next investigation should be authentication, tokens, or authorization scope, not ALB or DNS.