Rant - I have a theory about istio: it feels like software designed by people who hate the IT industry and wanted revenge. So they wrote software with so many odd points of traffic interception (e.g. SNI-based traffic re-routing) that it is practically impossible to debug. If you roll that out in an average company, you halt IT operations for something like a year.
On topic: I have two endpoints (IP addresses serving HTTPS on a non-standard port) outside of Kubernetes, and I need some rudimentary balancing of traffic. Since istio is already here, one can leverage it by combining the resource kinds ServiceEntry, DestinationRule and VirtualService to publish a service name within the istio mesh.
Since we do not have host names and DNS for those endpoint IP addresses, we need to rely on istio itself to intercept the DNS traffic and deliver a virtual IP address to access the service. The sample given here leverages the exportTo configuration to make the service name available only in the same namespace. If you need broader access, remove or adjust that. As usual in Kubernetes, you can also resolve the name as an FQDN, e.g. acme-service.mynamespace.svc.cluster.local.
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: acme-service
spec:
  hosts:
    - acme-service
  ports:
    - number: 12345
      name: acmeglue
      protocol: HTTPS
  resolution: STATIC
  location: MESH_EXTERNAL
  # limit the availability to the namespace this resource is applied to
  # if you need cross namespace access remove all the `exportTo`s in here
  exportTo:
    - "."
  # use `endpoints:` in this setup, `addresses:` did not work
  endpoints:
    # region1
    - address: 192.168.0.1
      ports:
        acmeglue: 12345
    # region2
    - address: 10.60.48.50
      ports:
        acmeglue: 12345
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: acme-service
spec:
  host: acme-service
  # limit the availability to the namespace this resource is applied to
  exportTo:
    - "."
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST
    connectionPool:
      tcp:
        tcpKeepalive:
          # We have GCP service attachments involved with a 20m idle timeout
          # https://cloud.google.com/vpc/docs/about-vpc-hosted-services#nat-subnets-other
          time: 600s
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: acme-service
spec:
  hosts:
    - acme-service
  # limit the availability to the namespace this resource is applied to
  exportTo:
    - "."
  http:
    - route:
        - destination:
            host: acme-service
      retries:
        attempts: 2
        perTryTimeout: 2s
        retryOn: connect-failure,5xx
---
# Demo Deployment, the istio configuration is the important part
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foobar
  labels:
    app: foobar
spec:
  replicas: 1
  selector:
    matchLabels:
      app: foobar
  template:
    metadata:
      labels:
        app: foobar
        # enable istio sidecar injection
        sidecar.istio.io/inject: "true"
      annotations:
        # Enable DNS capture and interception, resolved IPs will be in 240.240.0.0/16.
        # If you use network policies you have to allow egress to this range.
        proxy.istio.io/config: |
          proxyMetadata:
            ISTIO_META_DNS_CAPTURE: "true"
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
Now we can exec into the deployed pod, do something like curl -vk https://acme-service:12345, and it will talk to one of the endpoints defined in the ServiceEntry via an IP address out of the 240.240.0.0/16 Class E network.
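To verify the DNS interception one can, for example, check from inside the pod which address the service name resolves to. A quick sketch, assuming getent is available in the nginx image:
# should return an address out of 240.240.0.0/16 for acme-service
kubectl exec -it deploy/foobar -c nginx -- getent hosts acme-service
# and the actual call that ends up at one of the external endpoints
kubectl exec -it deploy/foobar -c nginx -- curl -vk https://acme-service:12345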
Documentation
https://istio.io/latest/docs/reference/config/networking/virtual-service/
https://istio.io/latest/docs/reference/config/networking/service-entry/#ServiceEntry-Resolution
https://istio.io/latest/docs/reference/config/networking/destination-rule/#LoadBalancerSettings-SimpleLB
https://istio.io/latest/docs/ops/configuration/traffic-management/dns-proxy/#sidecar-mode
Brief dump so I don't forget how that worked in August 2025. Requires npm, npx and nodejs.
- Install Chrome
- Add the BrowserMCP extension
- Install gemini-cli: npm install -g @google/gemini-cli
- Retrieve a Gemini API key via AI Studio
- Export the API key for gemini-cli: export GEMINI_API_KEY=2342
- Start the BrowserMCP extension (see its manual); an info box will appear indicating that it's active, with a cancel button.
- Add the MCP server to gemini-cli: gemini mcp add browsermcp npx @browsermcp/mcp@latest
- Start gemini-cli, let it use the MCP server, and task it to open a website. See the example below.
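A minimal example of that last step, assuming the MCP server registration above succeeded; the prompt text is arbitrary and only illustrates handing the browsing task to the MCP tools:
# optionally verify the registration first (newer gemini-cli versions)
gemini mcp list
# start an interactive session and task it to use the browser
gemini
> Use the browsermcp tools to open https://sven.stormbind.net and summarize the page.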
Even I managed to migrate my last setup to sway a few weeks ago. I was the last one you've been waiting for, dear X Strike Force, right?
Multi-display support just works, no more modeline hackery. Oh, and we could also remove that old clipboard manager.
One oddity with sway I could not yet solve is that I had to delete the default wallpaper /usr/share/backgrounds/sway/Sway_Wallpaper_Blue_1920x1080.png to allow it to load the Debian wallpaper via
output * bg /usr/share/desktop-base/active-theme/wallpaper/contents/images/1920x1080.svg fill
Update: Thanks to Birger and Sebastian, who could easily explain that. The sway-backgrounds package ships a config snippet in /etc/sway/config.d, and if that is included, e.g. via include /etc/sway/config.d/*, after setting the background in your ~/.config/sway/config, it does the obvious and overrides your own background configuration again (see the sketch below). I didn't expect that, but it makes sense. So the right fix is to just remove the sway-backgrounds package.
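To illustrate the ordering issue, a minimal sketch of the relevant part of ~/.config/sway/config; the include line is what pulls in the sway-backgrounds snippet again:
# own background, evaluated first
output * bg /usr/share/desktop-base/active-theme/wallpaper/contents/images/1920x1080.svg fill
# pulls in /etc/sway/config.d/*, including the sway-backgrounds snippet,
# which sets the Sway default wallpaper again and thus wins
include /etc/sway/config.d/*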
I also had a bit of a fist fight with sway to make sure I have as much screen space available as possible, so I tried to shrink fonts and remove borders.
default_border none
default_floating_border none
titlebar_padding 1
titlebar_border_thickness 0
font pango: monospace 9
The rest is, I guess, otherwise well documented. I settled on wofi as the menu tool and cliphist for clipboard access, and of course waybar to be able to use nm-applet; swayidle and swaylock are probably also more or less standard for screen locking.
Having for_window [app_id="firefox"] inhibit_idle fullscreen is also sensible for video streaming, to avoid the idle locking.
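For completeness, a typical swayidle setup in the sway config; the timeouts and the swaylock color are arbitrary choices, adjust to taste:
# lock after 5 minutes, power off outputs after 10, lock before sleep
exec swayidle -w \
    timeout 300 'swaylock -f -c 000000' \
    timeout 600 'swaymsg "output * power off"' \
        resume 'swaymsg "output * power on"' \
    before-sleep 'swaylock -f -c 000000'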
I know that Canonical / Ubuntu people are sometimes not well received due to the promotion of Canonical tooling (some might remember upstart and Mir, or more recently snap and netplan). Thus, for some positive vibes, consider that I could hand out the Ubuntu Desktop image on a USB flash drive to a family member, and the family member could just replace Windows 10 without any assistance. It just worked. This was made possible by the will to keep a slightly dated ThinkPad in use, which is not supported by Windows 11.
I have to admit that I never looked at Ubuntu Desktop before, but the user experience is on par with everything else I know. Thanks to all the folks at Canonical who made that possible! Luckily the times when you had to fiddle with modelines for XFree86, and the sleepless nights spent configuring lpd to get printing up and running, are long gone. I believe that now, when Microsoft is doing Microsoft things with rolling Windows updates that force users to replace perfectly working hardware, is the time to encourage more people to move to open operating systems, and Ubuntu Desktop seems to be a very suitable choice.
Things to Improve
Albeit I think the out of the box experience is great, there are a few niche topics where things could improve.
Default Access to Apt / Ubuntu Universe
Well, snaps are promoted as the primary application source, but having some graphical interface like synaptic available by default to just install from Ubuntu Universe would be helpful. In this case we wanted to install keepass2 to access the user's KeePass file kept from the Windows setup. Having to tell someone to "open the terminal and type sudo apt install" is something that requires support.
Snaps and Isolation Overrides
I'm fine with snaps having least privileges, but it would be nice if one could
add overrides easily. Here the family member was playing with an Arduino Uno
and there is one sample in the workbook that utilizes a Java application called
Processing. It's available as a
snap, but that one doesn't have access to the required serial port device
file. I tried hard to make it work -
full details in the snapcraft forum - but failed, and opted to use the
--devmode
to install it without isolation enforcement. As far as I understood
snap that results in no more automatic updates for the application.
If someone from the Ubuntu crowd with more snap debugging experience has additional hints on how to narrow down which change is required, I would love to improve that and create a PR for the Processing developers. Either reply in the forum or reach out via mail, sven at stormbind dot net.
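For reference, the snap commands involved; the snap name processing and the serial-port interface are my understanding of the situation, so treat them as assumptions:
# install without confinement, at the cost of automatic updates
sudo snap install processing --devmode
# inspect which interfaces the snap plugs and what is (not) connected
snap connections processing
# show details about the serial-port interface in general
snap interface serial-port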
Latest oddities I ran into with Google Cloud products before I start to forget about them again.
e2 Compute Instances vs CloudNAT
Years ago I already had a surprising encounter with the Google Cloud e2 instances. Back then we observed CPU steal time from 20-60%, which made the instances unusable for anything remotely latency sensitive. Now someone started to run a workload which creates many outbound connections to the same destination IP:Port. To connect to the internet we utilize the Google Cloud product "CloudNAT" which implements a NAT solution somewhere in the network layer.
Starting the workload led after a few seconds to all sorts of connection issues, and of course logs from CloudNAT that it dropped connections. The simplest reproducer I could find was
while true; do curl http://sven.stormbind.net; done
which already led to connection drops on CloudNAT.
We stared a bit at the output of gcloud compute routers get-nat-mapping-info our-default-natgw, but allocating additional ports still worked fine in general. Further investigation led to two differences between a project which was fine and those that failed:
- c2d or n2d machine types instead of e2 and
- usage of gVNIC.
Moving away from the e2 instances instantly fixed our issue. Only some connection drops could be observed on CloudNAT if we set the min_ports_per_vm value too low and it could not allocate new ports in time. Thus we did some additional optimizations:
- raised min_ports_per_vm to 256
- raised max_ports_per_vm to 32768 (the sensible maximum, because CloudNAT will always double its allocation)
- set nat_tcp_timewait_sec to 30; the default is 120, and since port reclaiming only runs every 30s, ports can be re-used after 30-60s (see the terraform sketch below)
See also upstream documentation regarding timeouts.
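A hedged terraform sketch of how those knobs map onto google_compute_router_nat; the resource, router and region names are placeholders, and max_ports_per_vm requires dynamic port allocation to be enabled:
resource "google_compute_router_nat" "default" {
  name   = "our-default-natgw"                 # placeholder
  router = google_compute_router.default.name  # placeholder
  region = "europe-west1"                      # placeholder

  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"

  # dynamic port allocation is required to honor max_ports_per_vm
  enable_dynamic_port_allocation      = true
  enable_endpoint_independent_mapping = false

  min_ports_per_vm = 256
  max_ports_per_vm = 32768

  # default is 120s, lower it so ports are reclaimed faster
  tcp_time_wait_timeout_sec = 30
}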
To complete the setup alignment we also enabled gVNIC on all GKE pools. Noteworthy detail a colleague figured out: If you use terraform to provision GKE pools make sure to use at least google provider v6.33.0 to avoid a re-creation of your node pool.
GKE LoadBalancer Force allPorts: true on Forwarding Rule
Technically it's possible to configure a forwarding rule to listen on some or all ports. That gets more complicated if you do not configure the forwarding rule via terraform or the gcloud cli, but use a GKE resource of kind: Service with spec.type: LoadBalancer. The logic documented by Google Cloud is that the forwarding rule will get a per-port configuration if there are five or fewer ports, and above that it will open all ports. Sadly that does not work, e.g. in cases where you have an internal load balancer and a serviceAttachment attached to the forwarding rule. In my experience reconfiguring was also unreliable in cases without a serviceAttachment, and required a manual deletion of the service load balancer to have the operator reconcile it and create it correctly.
Given that we wanted to have all ports open, to allow us to dynamically add more ports on a specific load balancer, and there is no annotation for that, I worked around it with this beauty:
ports:
  - name: dummy-0
    port: 2342
    protocol: TCP
    targetPort: 2342
  - name: dummy-1
    port: 2343
    protocol: TCP
    targetPort: 2343
  - name: dummy-2
    port: 2344
    protocol: TCP
    targetPort: 2344
  - name: dummy-3
    port: 2345
    protocol: TCP
    targetPort: 2345
  - name: service-1
    port: 4242
    protocol: TCP
    targetPort: 4242
  - name: service-2
    port: 4343
    protocol: TCP
    targetPort: 4343
If something in that area did not work out, there are basically two things to check (the gcloud commands below help with both):
- Is the port open on the forwarding rule / is the forwarding rule configured with allPorts: true?
- Did the VPC firewall rule created by the service operator in GKE get updated to open all required ports?
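A hedged sketch of how to check both points with the gcloud cli; the forwarding rule name, the region and the name filter for the GKE-managed firewall rules are placeholders you have to adapt:
# is the forwarding rule configured with allPorts, or with an explicit port list?
gcloud compute forwarding-rules describe my-ilb-forwarding-rule \
  --region europe-west1 --format="value(allPorts,ports)"
# firewall rules created by the GKE service operator usually start with k8s
gcloud compute firewall-rules list --filter="name~^k8s" \
  --format="table(name,network,allowed)"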
Rate Limiting with Cloud Armor on Global TCP Proxy Load Balancer
According to Google Cloud support, rate limiting on a TCP proxy is a preview feature. That seems to be the excuse for why it's all very inconsistent right now, but it works.
- The Google Cloud Web Console is 100% broken and unable to deal with it. Don't touch it via the web.
- If you configure an exceed_action in a google_compute_security_policy terraform resource you must use a value with a response code, e.g. exceed_action = "deny(429)". The response code will be ignored. In all other cases I know of you must use a deny without a response code if you want to be able to assign the policy to a L3/L4 load balancer.
- If you use config-connector (kcc) you can already use exceedAction: deny, albeit it's not documented, neither for config-connector itself nor for the API.
- If you use the gcloud cli you can use --exceed-action=deny, which is already documented if you call gcloud beta compute security-policies create --help, but it also works in the non-beta mode. Export / import via the gcloud cli also works with a deny without defining a response code.
Terraform Sample Snippet
rule {
  description = "L3-L4 Rate Limit"
  action      = "rate_based_ban"
  priority    = "2342"
  match {
    versioned_expr = "SRC_IPS_V1"
    config {
      src_ip_ranges = ["*"]
    }
  }
  rate_limit_options {
    enforce_on_key = "IP"
    # exceed_action only supports deny() with a response code
    exceed_action = "deny(429)"
    rate_limit_threshold {
      count        = 320
      interval_sec = 60
    }
    ban_duration_sec = 240
    ban_threshold {
      count        = 320
      interval_sec = 60
    }
    conform_action = "allow"
  }
}
Config-Connector Sample Snippet
- action: rate_based_ban
  description: L3-L4 Rate Limit
  match:
    config:
      srcIpRanges:
        - "*"
    versionedExpr: SRC_IPS_V1
  preview: false
  priority: 2342
  rateLimitOptions:
    banDurationSec: 240
    banThreshold:
      count: 320
      intervalSec: 60
    conformAction: allow
    enforceOnKey: IP
    exceedAction: deny
    rateLimitThreshold:
      count: 320
      intervalSec: 60
Some time ago Terraform 1.9 introduced the capability to reference other variables in an input variable validation condition, not only the one you're validating.
What does not work is having two variables which validate each other, e.g.:
variable "nat_min_ports" {
description = "Minimal amount of ports to allocate for 'min_ports_per_vm'"
default = 32
type = number
validation {
condition = (
var.nat_min_ports >= 32 &&
var.nat_min_ports <= 32768 &&
var.nat_min_ports < var.nat_max_ports
)
error_message = "Must be between 32 and 32768 and less than 'nat_max_ports'"
}
}
variable "nat_max_ports" {
description = "Maximal amount of ports to allocate for 'max_ports_per_vm'"
default = 16384
type = number
validation {
condition = (
var.nat_max_ports >= 64 &&
var.nat_max_ports <= 65536 &&
var.nat_max_ports > var.nat_min_ports
)
error_message = "Must be between 64 and 65536 and above 'nat_min_ports'"
}
}
That led directly to the following rather opaque error message:
Error: Cycle: module.gcp_project_network.var.nat_max_ports (validation), module.gcp_project_network.var.nat_min_ports (validation)
I removed the somewhat duplicate check var.nat_max_ports > var.nat_min_ports on nat_max_ports to break the cycle; the resulting variable is sketched below.
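For illustration, the resulting nat_max_ports variable with the cross reference dropped; the relation between both variables is now enforced only in the nat_min_ports validation shown above:
variable "nat_max_ports" {
  description = "Maximal amount of ports to allocate for 'max_ports_per_vm'"
  default     = 16384
  type        = number
  validation {
    # only range-check this variable to avoid the validation cycle
    condition = (
      var.nat_max_ports >= 64 &&
      var.nat_max_ports <= 65536
    )
    error_message = "Must be between 64 and 65536"
  }
}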
I took some time yesterday to upload the current state of what will at some point become vym 3 to experimental. If you're a user of this tool you can give it a try, but be aware that the file format changed and can't be processed with vym releases before 2.9.500! Thus it's important to create a backup until you're sure that you're ready to move on. On the technical side this is also the switch from Qt5 to Qt6.
If you ever face the need to activate the PROXY protocol in HAProxy (e.g. if you're as unlucky as I am, and you have to use the Google Cloud TCP proxy load balancer), be aware that there are two ways to do that. Both are part of the frontend configuration.
accept-proxy
This one is the big hammer and forces the usage of the PROXY protocol on all connections. Sample:
frontend vogons
    bind *:2342 accept-proxy ssl crt /etc/haproxy/certs/vogons/tls.crt
tcp-request connection expect-proxy
If you have to, e.g. during a migration phase, receive traffic both directly, without the PROXY protocol header, and from a proxy with the header, there is also a more flexible option based on a tcp-request connection action. Sample:
frontend vogons
    bind *:2342 ssl crt /etc/haproxy/certs/vogons/tls.crt
    tcp-request connection expect-proxy layer4 if { src 35.191.0.0/16 130.211.0.0/22 }
The source addresses here are those of the GCP global TCP proxy frontends. Replace them with whatever suits your case. Since this happens just after establishing a TCP connection, there is barely anything else available to match on besides the source address.
HAProxy Documentation
Mainly relevant for the few who still run their own mail server and use Postfix + pflogsumm.
A few weeks back Jim contacted me to say that he's going to pick up work on pflogsumm again, and as a first step wanted to release 1.1.6 to incorporate patches from the Debian package. That one is now released. Since we're already in the Trixie freeze, the package is in experimental, but as usual it should be fine to install manually.
Heads Up - Move to /usr/bin
I took that as an opportunity to move pflogsumm from /usr/sbin to /usr/bin! There was not really a good reason to ever have it in sbin. It's neither a system binary, nor statically linked (like in the very old days), nor something that really only makes sense to be used as root. Some out there likely have custom scripts which do not rely on an adjusted PATH variable; those scripts require an update.
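As an illustration (the schedule, log file and mail address are made up), a cron entry with a hardcoded path would need to change like this:
# before
0 6 * * * /usr/sbin/pflogsumm -d yesterday /var/log/mail.log | mail -s pflogsumm postmaster@example.org
# after the move to /usr/bin
0 6 * * * /usr/bin/pflogsumm -d yesterday /var/log/mail.log | mail -s pflogsumm postmaster@example.org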
... or how I spent my lunch break today.
An increasing number of news outlets (hello heise.de) have started to embed bullshit which requires DRM playback. Since I keep that disabled, I now get an infobar that tells me I need to enable it for this page. Pretty useless, and a pain in the back because it takes up screen space. Here's the quick way to get rid of it:
- Go to about:config and turn on toolkit.legacyUserProfileCustomizations.stylesheets.
- Go to your Firefox profile folder (e.g. ~/.mozilla/firefox/<random-value>.default/) and mkdir chrome && touch chrome/userChrome.css.
- Add the following to your userChrome.css file:
.infobar[value="drmContentDisabled"] { display: none !important; }
Restart Firefox and read news again with full screen space.
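The same steps, condensed into shell commands, assuming a single default profile; the <random-value> part differs per installation:
cd ~/.mozilla/firefox/<random-value>.default/
mkdir -p chrome
cat >> chrome/userChrome.css <<'EOF'
.infobar[value="drmContentDisabled"] { display: none !important; }
EOF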