Deploying a New EKS Service: GitOps, Internal ALB, and Cloudflare DNS
A practical deployment and troubleshooting record for a Spring Boot service on EKS: ECR, GitLab CI, ArgoCD, Ingress, internal ALB, Route53, and Cloudflare DNS.
This is both a deployment note and a DNS troubleshooting note.
The goal was to add a new Spring Boot service to an existing EKS dev environment and expose it through a new dev subdomain. The service itself was not meant to be directly public. It sits behind an existing internal ALB, and Kubernetes Ingress routes requests to the right Service based on the Host header.
The final health check looked like this:
https://app-api-dev.example.com/actuator/health/readiness
Response:
{"status":"UP"}
Here is the process in the order it was debugged.
Starting Point
The backend system already had several services:
| Service | Access pattern |
|---|---|
| Main API | Reached through a dev API domain |
| Workers | Internal to the cluster |
| New app service | Needed a separate dev API domain |
The new service was an independent Spring Boot module on port 8082. The main API calls it inside the cluster:
http://app-service:8082
So the first step was not DNS. The first step was to make the service run inside EKS with a stable Kubernetes Service name.
Step 1: Add the ECR Repository
CI needs somewhere to push the new image, so the dev ECR repository list gained one more repository:
ecr_repos = [
  "dev-api",
  "dev-app-service",
  "dev-worker-core",
  "dev-worker-growth"
]
Then run a plan against the dev ECR stack:
AWS_PROFILE=<profile> terragrunt plan -no-color
The important result was:
Plan: 2 to add, 0 to change, 0 to destroy.
That proved the Terraform change was additive: one ECR repository and its lifecycle policy, with no changes to existing services.
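If this check needs to run in CI rather than by eye, the summary line can be parsed and anything other than pure additions rejected. The following is a minimal Python sketch, assuming the summary format shown above. `parse_plan_summary` and `is_additive` are hypothetical helper names, not part of Terraform or Terragrunt, and other output shapes (for example a plan with no changes) are not handled beyond raising an error.

```python
import re

# Matches the summary line shown above, e.g.
# "Plan: 2 to add, 0 to change, 0 to destroy."
PLAN_RE = re.compile(r"Plan: (\d+) to add, (\d+) to change, (\d+) to destroy")


def parse_plan_summary(output: str) -> tuple[int, int, int]:
    """Return (add, change, destroy) from plan output, or raise if absent."""
    match = PLAN_RE.search(output)
    if match is None:
        raise ValueError("no plan summary line found")
    add, change, destroy = (int(n) for n in match.groups())
    return add, change, destroy


def is_additive(output: str) -> bool:
    """True when the plan adds resources and changes or destroys nothing."""
    add, change, destroy = parse_plan_summary(output)
    return add > 0 and change == 0 and destroy == 0


print(is_additive("Plan: 2 to add, 0 to change, 0 to destroy."))
```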
After that, apply it:
AWS_PROFILE=<profile> terragrunt apply -auto-approve -no-color
Step 2: Add the GitOps Deployment Directory
The environment already used ArgoCD and Kustomize, so the new service followed the same layout:
dev/app-service/
  application.yaml
  external-secret.yaml
  k8s-app-service-dev.yaml
  kustomization.yaml
The Deployment had to match the service port:
containers:
  - name: app-service
    image: <ecr>/dev-app-service:dev-REPLACE_ME
    ports:
      - containerPort: 8082
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8082
The Kubernetes Service kept the same DNS name the main API expected:
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  ports:
    - port: 8082
      targetPort: 8082
Do not casually expose the Service as port 80 here. The application configuration calls:
http://app-service:8082
If the Service only exposes 80, the in-cluster call will fail even though the Pod is running.
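The mismatch is easy to catch before deploying by comparing the URL the caller is configured with against the ports the Service exposes. This is a small Python sketch of that comparison. `service_accepts` is a hypothetical helper, and the port list is assumed to be copied from the Service manifest by hand.

```python
from urllib.parse import urlsplit


def service_accepts(url: str, service_name: str, service_ports: list[int]) -> bool:
    """Check that an in-cluster URL targets a port the Service exposes."""
    parts = urlsplit(url)
    # With no explicit port, HTTP clients default to 80 (443 for https).
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname == service_name and port in service_ports


# The Service exposes 8082, matching the configured URL.
print(service_accepts("http://app-service:8082", "app-service", [8082]))
# The Service exposes only 80, so the configured URL would fail.
print(service_accepts("http://app-service:8082", "app-service", [80]))
```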
Step 3: Reuse the Existing Backend Configuration
The first idea was to create a separate Secrets Manager entry for the new service:
dev/app-service/application-secrets.yaml
After checking the Java configuration, the better choice was to reuse the same runtime configuration used by the main API. The ExternalSecret still creates a separate Kubernetes Secret, but its remoteRef points to the existing backend configuration:
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: app-service-secrets
spec:
  target:
    name: app-service-secrets
  data:
    - secretKey: application.yaml
      remoteRef:
        key: dev/api/application-secrets.yaml
The app service reads the same database variables:
spring:
  datasource:
    url: ${DB_URL}
    username: ${DB_USERNAME}
    password: ${DB_PASSWORD}
This avoids duplicating environment-specific configuration and reduces the chance that the API and the new service drift apart.
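Reusing a configuration only works if every placeholder the new service reads is actually provided. A quick way to check is to list the `${...}` placeholders and compare them with the variables that will be available. The Python sketch below assumes the variables arrive as a plain dictionary; Spring resolves placeholders from more sources than that, so treat it as a rough pre-flight check. `missing_variables` is a hypothetical helper name.

```python
import re

# ${NAME} or ${NAME:default}; a default makes the variable optional.
PLACEHOLDER_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)(:[^}]*)?\}")


def missing_variables(config_text: str, env: dict[str, str]) -> list[str]:
    """List placeholders without a default that are absent from env."""
    missing = []
    for name, default in PLACEHOLDER_RE.findall(config_text):
        if not default and name not in env and name not in missing:
            missing.append(name)
    return missing


config = "url: ${DB_URL}\nusername: ${DB_USERNAME}\npassword: ${DB_PASSWORD}\n"
print(missing_variables(config, {"DB_URL": "jdbc:placeholder"}))
```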
Step 4: Update GitLab CI
The new module existed in the code repository, but CI only built the main API and worker images. Three updates were needed:
- Add an app-service ECR repository variable.
- Add Docker build and push steps.
- Update the GitOps image tag for dev/app-service/kustomization.yaml.
The CI change looked like this:
variables:
  DEV_ECR_APP_SERVICE_REPO: "dev-app-service"

build_push:
  script:
    # $APP_SERVICE_REPO is assumed to be set from DEV_ECR_APP_SERVICE_REPO elsewhere in the pipeline.
    - |
      docker build -f app-service-app/Dockerfile \
        -t "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$APP_SERVICE_REPO:$TAG" .
    - docker push "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$APP_SERVICE_REPO:$TAG"
The GitOps tag update also had to include the new path:
update_newtag(
    f"{prefix}/app-service/kustomization.yaml",
    os.environ["APP_SERVICE_IMAGE"],
    tag,
)
Without this, the ECR repository and Kubernetes manifests can be correct, but no new image will actually be deployed.
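The body of `update_newtag` is not shown in this note. As an illustration of what such a step does, here is a hypothetical text-based version that rewrites the `newTag` of one image in a Kustomize `images` list. It is a sketch, not the pipeline's actual code: `set_new_tag` is an invented name, it works on a string instead of a file path, and it assumes each entry lists `name` before `newTag`.

```python
import re


def set_new_tag(kustomization: str, image: str, tag: str) -> str:
    """Return kustomization text with newTag replaced for the given image."""
    lines = kustomization.splitlines()
    in_target = False  # True while inside the images entry whose name matches
    replaced = False
    for i, line in enumerate(lines):
        name = re.match(r"\s*-\s*name:\s*(\S+)", line)
        if name:
            in_target = name.group(1).strip("\"'") == image
            continue
        tag_line = re.match(r"(\s*newTag:\s*)\S*", line)
        if in_target and tag_line:
            # Keep the original indentation and key, swap only the value.
            lines[i] = f"{tag_line.group(1)}{tag}"
            replaced = True
    if not replaced:
        raise ValueError(f"no newTag entry found for image {image!r}")
    return "\n".join(lines) + "\n"


sample = (
    "images:\n"
    "  - name: repo/dev-app-service\n"
    "    newTag: dev-REPLACE_ME\n"
)
print(set_new_tag(sample, "repo/dev-app-service", "dev-abc123"))
```

Raising when nothing was replaced matters here: a silent no-op is exactly the failure mode where everything looks correct and no new image is deployed. `kustomize edit set image` is an alternative to editing the file by hand.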
Step 5: Create the ArgoCD Application
The app-of-apps file got one more ArgoCD Application:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dev-app-service
spec:
  source:
    repoURL: <gitops-repo>
    targetRevision: HEAD
    path: dev/app-service
  destination:
    server: https://kubernetes.default.svc
    namespace: dev
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
After applying it, check ArgoCD and Kubernetes:
kubectl -n argocd get application dev-app-service
kubectl -n dev get deploy,svc,pod -l app=app-service -o wide
The expected state:
dev-app-service Synced Healthy
deployment/app-service 1/1
pod/app-service-... Running
service/app-service 8082/TCP
Also check ExternalSecret:
kubectl -n dev get externalsecret app-service-secrets
Expected:
SecretSynced True
At this point the service is working inside the cluster.
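The same two checks can be scripted against the JSON form of those objects. The Python sketch below assumes the input is the parsed output of `kubectl ... -o json` for the Application and the ExternalSecret. The function names are hypothetical and the sample dictionaries are trimmed by hand to the fields that are read.

```python
def application_ok(app: dict) -> bool:
    """True when an Argo CD Application reports Synced and Healthy."""
    status = app.get("status", {})
    return (
        status.get("sync", {}).get("status") == "Synced"
        and status.get("health", {}).get("status") == "Healthy"
    )


def external_secret_ok(secret: dict) -> bool:
    """True when an ExternalSecret has a Ready condition with status True."""
    conditions = secret.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


app = {"status": {"sync": {"status": "Synced"}, "health": {"status": "Healthy"}}}
secret = {
    "status": {
        "conditions": [{"type": "Ready", "status": "True", "reason": "SecretSynced"}]
    }
}
print(application_ok(app), external_secret_ok(secret))
```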
Step 6: Expose a New Host Through the ALB
The existing dev API already used an internal ALB. The new service was added to the same Ingress with a new host rule:
rules:
  - host: api-dev.example.com
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: api
              port:
                number: 80
  - host: app-api-dev.example.com
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: app-service
              port:
                number: 8082
After applying it:
kubectl -n dev describe ingress dev-apps
The Rules section should include:
app-api-dev.example.com
/ app-service:8082 (<pod-ip>:8082)
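Conceptually, the two host rules behave like a lookup from the Host header to a backend. The Python sketch below is a simplified model of that idea, not how the ALB or the ingress controller is implemented. `RULES` mirrors the two rules above and `route` is a hypothetical name.

```python
from typing import Optional

# Host header -> (Service name, Service port), mirroring the Ingress rules above.
RULES = {
    "api-dev.example.com": ("api", 80),
    "app-api-dev.example.com": ("app-service", 8082),
}


def route(host_header: str) -> Optional[tuple[str, int]]:
    """Pick the backend for a request based only on its Host header."""
    # Drop an optional ":port" suffix and compare case-insensitively.
    host = host_header.split(":")[0].lower()
    return RULES.get(host)


print(route("app-api-dev.example.com"))
print(route("unknown.example.com"))
```

Both hosts resolve to the same ALB, so a request with an unlisted Host header reaches the load balancer but matches no rule.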
Then check the ALB target group:
aws elbv2 describe-target-health \
--target-group-arn <app-service-target-group-arn>
Expected:
State: healthy
This proves the Ingress rule, ALB listener rule, target group, and Pod readiness are all correct.
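When there are several targets, reading the state by eye gets tedious. A small Python sketch that lists every target that is not healthy follows, assuming the input is the parsed JSON from `aws elbv2 describe-target-health`. `unhealthy_targets` is a hypothetical helper and the sample addresses are made up.

```python
def unhealthy_targets(response: dict) -> list[str]:
    """Return 'id:port state' strings for targets that are not healthy."""
    bad = []
    for desc in response.get("TargetHealthDescriptions", []):
        state = desc.get("TargetHealth", {}).get("State")
        if state != "healthy":
            target = desc.get("Target", {})
            bad.append(f"{target.get('Id')}:{target.get('Port')} {state}")
    return bad


response = {
    "TargetHealthDescriptions": [
        {"Target": {"Id": "10.0.1.5", "Port": 8082}, "TargetHealth": {"State": "healthy"}},
        {"Target": {"Id": "10.0.2.7", "Port": 8082}, "TargetHealth": {"State": "unhealthy"}},
    ]
}
print(unhealthy_targets(response))
```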
Step 7: The DNS Record Was Added in the Wrong Place
DNS was the easy part to misread.
At first, a record was added in Route53:
app-api-dev.example.com CNAME <internal-alb-dns-name>
But local access still failed:
Could not resolve host
Check the name:
dig +short app-api-dev.example.com
No result.
At that point, do not keep debugging ALB or Kubernetes. Check the authoritative name servers for the root domain:
dig +short NS example.com
The result showed that the domain was delegated to Cloudflare, not Route53.
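That means a hosted zone with the same name in Route53 does not control how the domain resolves publicly. Public resolvers follow the NS delegation published by the parent zone, so records in a zone that is not delegated are never returned to them.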
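The NS answer can usually be classified at a glance, because Cloudflare name servers end in `.ns.cloudflare.com` and Route53 name servers contain `.awsdns-`. A minimal Python sketch of that classification, taking the lines printed by `dig +short NS` as input, is shown here. `dns_provider` is a hypothetical helper and the name server names in the example are made up.

```python
def dns_provider(ns_names: list[str]) -> str:
    """Guess which provider is authoritative from NS host names."""
    names = [n.strip().rstrip(".").lower() for n in ns_names if n.strip()]
    if not names:
        return "unknown"
    if all(n.endswith(".ns.cloudflare.com") for n in names):
        return "cloudflare"
    if all(".awsdns-" in n for n in names):
        return "route53"
    return "other"


print(dns_provider(["ada.ns.cloudflare.com.", "bob.ns.cloudflare.com."]))
print(dns_provider(["ns-123.awsdns-45.com.", "ns-678.awsdns-90.net."]))
```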
The correct fix was to add the record in Cloudflare:
Type: CNAME
Name: app-api-dev
Target: <internal-alb-dns-name>
Proxy status: DNS only
It must be DNS only. In proxied (orange-cloud) mode, Cloudflare connects to the origin from its own public network, which cannot reach the private addresses of an internal ALB.
Step 8: An Internal ALB Still Requires Internal Network Access
After DNS was fixed:
dig +short app-api-dev.example.com
The result looked like:
<internal-alb-dns-name>.
10.x.x.x
10.x.x.x
Seeing 10.x.x.x is expected because the ALB is internal.
It also means callers must be on VPN, inside the VPC, or on a network that can route to those private addresses. A public machine cannot reach the internal ALB just because DNS resolves.
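To tell quickly whether a name resolves to private addresses, the standard `ipaddress` module is enough. The Python sketch below takes the lines printed by `dig +short` and reports, for each address, whether it is private. `private_addresses` is a hypothetical helper and the sample values are made up.

```python
import ipaddress


def private_addresses(dig_lines: list[str]) -> list[tuple[str, bool]]:
    """Return (address, is_private) for every IP address in dig +short output."""
    result = []
    for line in dig_lines:
        try:
            ip = ipaddress.ip_address(line.strip())
        except ValueError:
            continue  # CNAME targets and other non-address lines
        result.append((str(ip), ip.is_private))
    return result


print(private_addresses(["internal-alb.example.", "10.0.1.25", "10.0.2.31"]))
```

If every address is private, a timeout from a machine outside the VPN or VPC is the expected behavior, not a fault.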
Step 9: Separate 200, 401, and Network Failures
Final health check:
curl -i https://app-api-dev.example.com/actuator/health/readiness
Response:
HTTP/2 200
Body:
{"status":"UP"}
Root path:
curl -i https://app-api-dev.example.com/
Response:
HTTP/2 401
That is not a service outage. It means the request reached the application and authentication is required.
| Symptom | Meaning |
|---|---|
| Could not resolve host | DNS is not resolving |
| Connection timed out | Network path to the internal ALB is unavailable |
| 502 | ALB cannot reach the backend target, or the app is failing |
| 401 | Request reached the app, but authentication is required |
| /actuator/health/readiness returns UP | Backend readiness is healthy |
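The table translates directly into a small classifier, which is handy in a smoke-test script. The Python sketch below maps a curl error message or an HTTP status code to the layer that needs attention. `classify` and its labels are hypothetical, and anything not covered by the table is reported as unknown.

```python
from typing import Optional


def classify(error: Optional[str] = None, status: Optional[int] = None) -> str:
    """Map a curl error message or HTTP status to the layer to investigate."""
    if error is not None:
        text = error.lower()
        if "could not resolve host" in text:
            return "dns"
        if "timed out" in text:
            return "network"
        return "unknown"
    if status == 502:
        return "backend"
    if status == 401:
        return "auth"
    if status == 200:
        return "ok"
    return "unknown"


print(classify(error="Could not resolve host: app-api-dev.example.com"))
print(classify(status=401))
```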
Main Lesson
Do not start with browser symptoms and guess. A better order is:
- Check Pod, Service, and ExternalSecret.
- Check whether Ingress generated the ALB rule.
- Check whether the target group is healthy.
- Check whether the certificate covers the new host.
- Check the authoritative DNS and the actual record.
When Route53 and Cloudflare both exist, always confirm the authoritative DNS first. A record in a Route53 public hosted zone has no effect on public resolution if the domain is delegated to Cloudflare.
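The ordering itself is the useful part: run the checks from the inside out and stop at the first layer that fails. A minimal Python sketch of that idea follows. `first_failure` is a hypothetical helper, and the lambdas stand in for real checks such as the kubectl, aws, and dig commands used earlier.

```python
from typing import Callable, Optional

Check = tuple[str, Callable[[], bool]]


def first_failure(checks: list[Check]) -> Optional[str]:
    """Run checks in order and return the name of the first one that fails."""
    for name, check in checks:
        if not check():
            return name
    return None


# Stand-in results for illustration; each lambda would wrap a real check.
checks = [
    ("pod, service, externalsecret", lambda: True),
    ("ingress rule", lambda: True),
    ("target group health", lambda: True),
    ("certificate", lambda: True),
    ("authoritative dns", lambda: False),
]
print(first_failure(checks))
```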
Final Path
The working path became:
browser
-> Cloudflare DNS (DNS only)
-> internal ALB
-> Kubernetes Ingress host rule
-> app-service Service:8082
-> app-service Pod:8082
Health check:
{"status":"UP"}
At that point the backend path is proven. If a business endpoint returns 401, the next investigation should be authentication, tokens, or authorization scope, not ALB or DNS.