Rant - I have a theory about istio: it feels like software designed by people who hate the IT industry and wanted revenge. So they wrote software with so many odd points of traffic interception (e.g. SNI based traffic re-routing) that it's completely impossible to debug. If you roll that out in an average company you completely halt IT operations for something like a year.

On topic: I have two endpoints (IP addresses serving HTTPS on a non-standard port) outside of kubernetes, and I need some rudimentary balancing of traffic. Since istio is already here one can leverage that, combining the resource kinds ServiceEntry, DestinationRule and VirtualService to publish a service name within the istio mesh. Since we do not have host names and DNS for those endpoint IP addresses, we need to rely on istio itself to intercept the DNS traffic and deliver a virtual IP address to access the service. The sample given here uses the exportTo configuration to make the service name available only in the same namespace. If you need broader access, remove or adjust that. As usual in kubernetes you can also resolve the name as FQDN, e.g. acme-service.mynamespace.svc.cluster.local.

---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: acme-service
spec:
  hosts:
    - acme-service
  ports:
    - number: 12345
      name: acmeglue
      protocol: HTTPS
  resolution: STATIC
  location: MESH_EXTERNAL
  # limit the availability to the namespace this resource is applied to
  # if you need cross namespace access remove all the `exportTo`s in here
  exportTo:
    - "."
  # use `endpoints:` in this setup, `addresses:` did not work
  endpoints:
    # region1
    - address: 192.168.0.1
      ports:
        acmeglue: 12345
    # region2
    - address: 10.60.48.50
      ports:
        acmeglue: 12345
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: acme-service
spec:
  host: acme-service
  # limit the availability to the namespace this resource is applied to
  exportTo:
    - "."
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST
    connectionPool:
      tcp:
        tcpKeepalive:
          # We have GCP service attachments involved with a 20m idle timeout
          # https://cloud.google.com/vpc/docs/about-vpc-hosted-services#nat-subnets-other
          time: 600s
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: acme-service
spec:
  hosts:
    - acme-service
  # limit the availability to the namespace this resource is applied to
  exportTo:
    - "."
  http:
  - route:
    - destination:
        host: acme-service
    retries:
      attempts: 2
      perTryTimeout: 2s
      retryOn: connect-failure,5xx
---
# Demo Deployment, istio configuration is the important part
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foobar
  labels:
    app: foobar
spec:
  replicas: 1
  selector:
    matchLabels:
      app: foobar
  template:
    metadata:
      labels:
        app: foobar
        # enable istio sidecar
        sidecar.istio.io/inject: "true"
      annotations:
        # Enable DNS capture and interception, IP resolved will be in 240.240/16
        # If you use network policies you've to allow egress to this range.
        proxy.istio.io/config: |
          proxyMetadata:
            ISTIO_META_DNS_CAPTURE: "true"
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

Now we can exec into the deployed pod, do something like curl -vk https://acme-service:12345, and it will talk to one of the endpoints defined in the ServiceEntry via an IP address out of the 240.240/16 Class E network.
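
To double check that the DNS capture is actually in effect, you can verify that the name resolves into the virtual range rather than to one of the real endpoints. A minimal Python sketch, assuming a Python interpreter is available wherever you run it (the nginx demo image does not ship one); the function names are made up for illustration:

```python
import ipaddress
import socket

# Range named above for addresses handed out by the istio DNS proxy.
ISTIO_VIP_RANGE = ipaddress.ip_network("240.240.0.0/16")


def is_istio_vip(address: str) -> bool:
    """Return True if the address lies in the 240.240/16 range."""
    return ipaddress.ip_address(address) in ISTIO_VIP_RANGE


def resolves_to_istio_vip(hostname: str) -> bool:
    """Resolve the hostname and check the result.

    Only meaningful when run inside a pod with the sidecar and
    ISTIO_META_DNS_CAPTURE enabled.
    """
    return is_istio_vip(socket.gethostbyname(hostname))
```

Inside the mesh, resolves_to_istio_vip("acme-service") should then return True. If it raises a resolution error instead, the DNS capture is most likely not active for that pod.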

Documentation
https://istio.io/latest/docs/reference/config/networking/virtual-service/
https://istio.io/latest/docs/reference/config/networking/service-entry/#ServiceEntry-Resolution
https://istio.io/latest/docs/reference/config/networking/destination-rule/#LoadBalancerSettings-SimpleLB
https://istio.io/latest/docs/ops/configuration/traffic-management/dns-proxy/#sidecar-mode

Posted Wed Aug 20 17:56:02 2025

Brief dump so I don't forget how that worked in August 2025. Requires npm, npx and nodejs.

  1. Install Chrome
  2. Add the BrowserMCP extension
  3. Install gemini-cli: npm install -g @google/gemini-cli
  4. Retrieve a Gemini API key via AI Studio
  5. Export the API key for gemini-cli: export GEMINI_API_KEY=2342
  6. Start the BrowserMCP extension (see the manual); an info box with a cancel button will appear to show that it's active.
  7. Add the MCP server to gemini-cli: gemini mcp add browsermcp npx @browsermcp/mcp@latest
  8. Start gemini-cli, let it use the MCP server and task it to open a website.
Posted Wed Aug 13 14:21:57 2025

Even I managed to migrate my last setup to sway a few weeks ago. I was the last one you've been waiting for, dear X Strike Force, right?

Multi display support just works, no more modeline hackery. Oh, and we can also remove those old clipboard managers.

One oddity with sway I could not yet solve is that I had to delete the default wallpaper /usr/share/backgrounds/sway/Sway_Wallpaper_Blue_1920x1080.png to allow it to load the Debian wallpaper via

output * bg /usr/share/desktop-base/active-theme/wallpaper/contents/images/1920x1080.svg fill

Update: Thanks to Birger and Sebastian who could easily explain that. The sway-backgrounds package ships a config snippet in /etc/sway/config.d, and if that's included, e.g. via include /etc/sway/config.d/*, after setting the background in your ~/.config/sway/config, it does the obvious and overrides your own background configuration again. Didn't expect that, but it makes sense. So the right fix is to just remove the sway-backgrounds package.

I also had a bit of a fist fight with sway to make sure I have as much screen space available as possible. So I tried to shrink fonts and remove borders.

default_border none
default_floating_border none
titlebar_padding 1
titlebar_border_thickness 0
font pango: monospace 9

The rest is, I guess, well documented otherwise. I settled on wofi as menu tool, cliphist for clipboard access, and of course waybar to be able to use the nm-applet. swayidle and swaylock are probably also more or less standard for screen locking.

Having

for_window [app_id="firefox"] inhibit_idle fullscreen

is also sensible for video streaming, to avoid the idle locking.

Posted Fri Jul 18 18:11:45 2025

I know that Canonical / Ubuntu people are sometimes not well received due to promotion of Canonical tooling (some might remember upstart and Mir, or more recently snap and netplan). Thus for some positive vibes consider that I could hand out the Ubuntu Desktop image on a USB flash drive to a family member, and the family member could just replace Windows 10 without any assistance. It just worked. This was made possible by the will to keep a slightly dated ThinkPad in use, which is not supported by Windows 11.

I have to admit that I never looked at Ubuntu Desktop before, but the user experience is on par with everything else I know. Thanks to all the folks at Canonical who made that possible! Luckily the times when you had to fiddle with modelines for XFree86, and spend sleepless nights configuring lpd to get printing up and running, are long gone. I believe that now, with Microsoft doing Microsoft things and rolling out Windows updates which force users to replace perfectly fine working hardware, is the time to encourage more people to move to open operating systems, and Ubuntu Desktop seems to be a very suitable choice.

Things to Improve

Albeit I think the out of the box experience is great, there are a few niche topics where things could improve.

Default Access to Apt / Ubuntu Universe

Well, snaps are promoted as the primary application source, but having some graphical interface like synaptic available by default to just install from Ubuntu Universe would be helpful. In this case we wanted to install keepass2 to access the user's keepass file kept from the Windows setup. Having to tell someone "open the terminal and type sudo apt install" is something that requires support.

Snaps and Isolation Overrides

I'm fine with snaps having least privileges, but it would be nice if one could add overrides easily. Here the family member was playing with an Arduino Uno and there is one sample in the workbook that utilizes a Java application called Processing. It's available as a snap, but that one doesn't have access to the required serial port device file. I tried hard to make it work - full details in the snapcraft forum - but failed, and opted to use --devmode to install it without isolation enforcement. As far as I understood snap, that results in no more automatic updates for the application. If someone from the Ubuntu crowd with more snap debug experience has additional hints on how to narrow down which change is required, I would love to improve that and create a PR for the Processing developers. Either reply in the forum or reach out via mail sven at stormbind dot net.

Posted Wed Jul 16 14:50:51 2025

Latest oddities I ran into with Google Cloud products before I start to forget about them again.

e2 Compute Instances vs CloudNAT

Years ago I already had a surprising encounter with the Google Cloud e2 instances. Back then we observed CPU steal time from 20-60%, which made the instances unusable for anything remotely latency sensitive. Now someone started to run a workload which creates many outbound connections to the same destination IP:Port. To connect to the internet we utilize the Google Cloud product "CloudNAT" which implements a NAT solution somewhere in the network layer.

Starting the workload led after a few seconds to all sorts of connection issues, and of course logs from CloudNAT that it dropped connections. The simplest reproducer I could find was while true; do curl http://sven.stormbind.net; done which already led to connection drops on CloudNAT.

We stared a bit at the output of gcloud compute routers get-nat-mapping-info our-default-natgw, but allocating additional ports still worked fine in general. Further investigation led to two differences between a project which was fine and those that failed:

  1. c2d or n2d machine types instead of e2 and
  2. usage of gVNIC.

Moving away from the e2 instances instantly fixed our issue. Only some connection drops could be observed on CloudNAT if we set the min_ports_per_vm value too low and it could not allocate new ports in time. Thus we did some additional optimizations:

  • raised min_ports_per_vm to 256
  • raised max_ports_per_vm to 32768 (the sensible maximum because CloudNAT will always double its allocation)
  • set nat_tcp_timewait_sec to 30, default is 120, reclaim of ports is only running every 30s, thus ports can be re-used after 30-60s

See also upstream documentation regarding timeouts.
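
To get a feeling for those numbers, here is a small Python sketch of the arithmetic. It assumes, as described above, that the allocation simply doubles from min_ports_per_vm until it hits max_ports_per_vm, and that a port becomes reusable at the first reclaim run after the time-wait expired. Both functions are illustrative helpers and not part of any Google Cloud tooling:

```python
def allocation_steps(min_ports: int, max_ports: int) -> list[int]:
    """List the per-VM port allocations when each step doubles the last one."""
    if min_ports <= 0:
        raise ValueError("min_ports must be positive")
    steps = []
    ports = min_ports
    while ports <= max_ports:
        steps.append(ports)
        ports *= 2
    return steps


def reuse_window(timewait_sec: int, reclaim_interval_sec: int = 30) -> tuple[int, int]:
    """Best and worst case delay in seconds until a closed port can be reused.

    Assumption: reclaim runs every reclaim_interval_sec, so a port waits
    between zero and one full interval on top of the time-wait.
    """
    return timewait_sec, timewait_sec + reclaim_interval_sec


print(allocation_steps(256, 32768))
# → [256, 512, 1024, 2048, 4096, 8192, 16384, 32768]
print(reuse_window(30))   # → (30, 60)
print(reuse_window(120))  # → (120, 150)
```

So with the settings above a VM can grow its allocation seven times before it hits the maximum, and lowering nat_tcp_timewait_sec cuts the worst case wait for a reusable port from 150s to 60s under these assumptions.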

To complete the setup alignment we also enabled gVNIC on all GKE pools. Noteworthy detail a colleague figured out: If you use terraform to provision GKE pools make sure to use at least google provider v6.33.0 to avoid a re-creation of your node pool.

GKE LoadBalancer Force allPorts: true on Forwarding Rule

Technically it's possible to configure a forwarding rule to listen on some or all ports. That gets more complicated if you do not configure the forwarding rule via terraform or gcloud cli, but use a GKE resource kind: Service with spec.type: LoadBalancer. The logic documented by Google Cloud is that the forwarding rule will have a per port configuration if there are five or fewer ports, and above that it will open all ports. Sadly that does not work e.g. in cases where you have an internal load balancer and a serviceAttachment attached to the forwarding rule. In my experience reconfiguring was also unreliable in cases without a serviceAttachment and required a manual deletion of the service load balancer to have the operator reconcile it and create it correctly.

Given that we wanted to have all ports open to allow us to dynamically add more ports on a specific load balancer, but there is no annotation for that, I worked around with this beauty:

      ports:
        - name: dummy-0
          port: 2342
          protocol: TCP
          targetPort: 2342
        - name: dummy-1
          port: 2343
          protocol: TCP
          targetPort: 2343
        - name: dummy-2
          port: 2344
          protocol: TCP
          targetPort: 2344
        - name: dummy-3
          port: 2345
          protocol: TCP
          targetPort: 2345
        - name: service-1
          port: 4242
          protocol: TCP
          targetPort: 4242
        - name: service-2
          port: 4343
          protocol: TCP
          targetPort: 4343
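
Why the dummy ports help can be sketched in a few lines of Python. This only models the documented rule as described above (five or fewer ports get a per port configuration, more than five opens all ports); the function name and the returned dict are made up for illustration and do not mirror the real GKE controller code:

```python
PER_PORT_LIMIT = 5  # threshold from the documented behaviour described above


def forwarding_rule_config(service_ports: list[int]) -> dict:
    """Model which forwarding rule setup a Service port list should result in."""
    ports = sorted(set(service_ports))
    if len(ports) <= PER_PORT_LIMIT:
        return {"allPorts": False, "ports": ports}
    return {"allPorts": True, "ports": []}


# Only the two real service ports: per port configuration.
print(forwarding_rule_config([4242, 4343]))
# → {'allPorts': False, 'ports': [4242, 4343]}

# Four dummy ports plus the two real ones: six ports, so all ports get opened.
print(forwarding_rule_config([2342, 2343, 2344, 2345, 4242, 4343]))
# → {'allPorts': True, 'ports': []}
```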

If something in that area does not work out, there are basically two things to check:

  1. Is the port open on the forwarding rule / is the forwarding rule configured with allPorts: true?
  2. Did the VPC firewall rule created by the service operator in GKE get updated to open all required ports?

Rate Limiting with Cloud Armor on Global TCP Proxy Load Balancer

According to Google Cloud support, rate limiting on a TCP proxy is a preview feature. That seems to be the excuse why it's all very inconsistent right now, but it works.

  • The Google Cloud Web Console is 100% broken and unable to deal with it. Don't touch it via the web.
  • If you configure an exceed_action in a google_compute_security_policy terraform resource you must use a value with a response code, e.g. exceed_action = "deny(429)". The response code will be ignored. In all other cases I know of, you must use a deny without a response code if you want to be able to assign the policy to a L3/L4 load balancer.
  • If you use config-connector (kcc) you can already use exceedAction: deny, albeit it's not documented, neither for config-connector itself nor for the API.
  • If you use the gcloud cli you can use --exceed-action=deny, which is already documented if you call gcloud beta compute security-policies create --help, but it also works in the non-beta mode. Export / import via gcloud cli also works with a deny without defining a response code.

Terraform Sample Snippet

  rule {
    description = "L3-L4 Rate Limit"
    action      = "rate_based_ban"
    priority    = "2342"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
    rate_limit_options {
      enforce_on_key = "IP"
      # exceed_action only supports deny() with a response code
      exceed_action = "deny(429)"
      rate_limit_threshold {
        count        = 320
        interval_sec = 60
      }
      ban_duration_sec = 240
      ban_threshold {
        count        = 320
        interval_sec = 60
      }
      conform_action = "allow"
    }
  }

Config-Connector Sample Snippet

  - action: rate_based_ban
    description: L3-L4 Rate Limit
    match:
      config:
        srcIpRanges:
          - "*"
      versionedExpr: SRC_IPS_V1
    preview: false
    priority: 2342
    rateLimitOptions:
      banDurationSec: 240
      banThreshold:
        count: 320
        intervalSec: 60
      conformAction: allow
      enforceOnKey: IP
      exceedAction: deny
      rateLimitThreshold:
        count: 320
        intervalSec: 60
Posted Wed Jul 16 14:00:38 2025

Some time ago Terraform 1.9 introduced the capability to reference other variables in an input variable validation condition, not only the one you're validating.

What does not work is having two variables which validate each other, e.g.

variable "nat_min_ports" {
  description = "Minimal amount of ports to allocate for 'min_ports_per_vm'"
  default     = 32
  type        = number
  validation {
    condition = (
      var.nat_min_ports >= 32 &&
      var.nat_min_ports <= 32768 &&
      var.nat_min_ports < var.nat_max_ports
    )
    error_message = "Must be between 32 and 32768 and less than 'nat_max_ports'"
  }
}

variable "nat_max_ports" {
  description = "Maximal amount of ports to allocate for 'max_ports_per_vm'"
  default     = 16384
  type        = number
  validation {
    condition = (
      var.nat_max_ports >= 64 &&
      var.nat_max_ports <= 65536 &&
      var.nat_max_ports > var.nat_min_ports
    )
    error_message = "Must be between 64 and 65536 and above 'nat_min_ports'"
  }
}

That led directly to the following rather opaque error message: Received an error Error: Cycle: module.gcp_project_network.var.nat_max_ports (validation), module.gcp_project_network.var.nat_min_ports (validation)

I removed the more or less duplicate check var.nat_max_ports > var.nat_min_ports on nat_max_ports to break the cycle.

Posted Fri Jun 20 11:34:09 2025

Took some time yesterday to upload the current state of what will be at some point vym 3 to experimental. If you're a user of this tool you can give it a try, but be aware that the file format changed, and can't be processed with vym releases before 2.9.500! Thus it's important to create a backup until you're sure that you're ready to move on. On the technical side this is also the switch from Qt5 to Qt6.

Posted Mon Jun 16 09:19:38 2025

If you ever face the need to activate the PROXY protocol in HAProxy (e.g. if you're as unlucky as I am, and you have to use the Google Cloud TCP proxy load balancer), be aware that there are two ways to do that. Both are part of the frontend configuration.

accept-proxy

This one is the big hammer and forces the usage of the PROXY protocol on all connections. Sample:

      frontend vogons
          bind *:2342 accept-proxy ssl crt /etc/haproxy/certs/vogons/tls.crt

tcp-request connection expect-proxy

If you have to, e.g. during a phase of migrations, receive traffic both directly, without the PROXY protocol header, and from a proxy with the header, there is also a more flexible option based on a tcp-request connection action. Sample:

      frontend vogons
          bind *:2342 ssl crt /etc/haproxy/certs/vogons/tls.crt
          tcp-request connection expect-proxy layer4 if { src 35.191.0.0/16 130.211.0.0/22 }

Source addresses here are those of GCP global TCP proxy frontends. Replace with whatever suits your case. Since this is happening just after establishing a TCP connection, there is barely anything else available to match on besides the source address.
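
For illustration, here is roughly what that decision and the header itself look like, written as a small Python sketch. It only handles the human readable version 1 header for TCP4/TCP6 (the spec also knows an UNKNOWN form and a binary version 2), the function names are made up, and the address ranges are simply copied from the sample above:

```python
import ipaddress

# Ranges from the sample ACL above; replace with whatever suits your case.
PROXY_SOURCES = [
    ipaddress.ip_network("35.191.0.0/16"),
    ipaddress.ip_network("130.211.0.0/22"),
]


def expects_proxy_header(peer_ip: str) -> bool:
    """Mirror the ACL: only peers in the listed ranges send a PROXY header."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in PROXY_SOURCES)


def parse_proxy_v1(line: bytes) -> tuple[str, int, str, int]:
    """Parse a 'PROXY TCP4|TCP6 <src> <dst> <sport> <dport>' header line.

    Returns (source address, source port, destination address, destination port).
    """
    if not line.startswith(b"PROXY ") or not line.endswith(b"\r\n"):
        raise ValueError("not a PROXY protocol v1 header")
    parts = line[:-2].decode("ascii").split(" ")
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("unsupported PROXY protocol v1 header")
    _, _, src, dst, sport, dport = parts
    return src, int(sport), dst, int(dport)


print(expects_proxy_header("35.191.1.2"))   # → True
print(expects_proxy_header("192.168.0.1"))  # → False
print(parse_proxy_v1(b"PROXY TCP4 203.0.113.7 10.0.0.5 51234 2342\r\n"))
# → ('203.0.113.7', 51234, '10.0.0.5', 2342)
```

The addresses in the last call are documentation placeholders; the point is that the real client address only shows up inside the header, while the TCP peer is always the load balancer.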

HAProxy Documentation

Posted Wed Jun 11 17:54:32 2025

Mainly relevant for the few who still run their own mail server and use Postfix + pflogsumm.

A few weeks back Jim contacted me to say that he's going to pick up work on pflogsumm again, and as a first step wanted to release 1.1.6 to incorporate patches from the Debian package. That one is now released. Since we're already in the Trixie freeze the package is in experimental, but as usual should be fine to install manually.

Heads Up - Move to /usr/bin

I took that as an opportunity to move pflogsumm from /usr/sbin to /usr/bin! There was not really a good reason to ever have it in sbin. It's neither a system binary, nor statically linked (like in the very old days), or something that really only makes sense to be used as root. Some out there likely have custom scripts which do not rely on an adjusted PATH variable, those scripts require an update.

Posted Fri May 23 13:52:35 2025

... or how I spent my lunch break today.

An increasing number of news outlets (hello heise.de) have started to embed bullshit which requires DRM playback. Since I keep that disabled I now get an infobar that tells me that I need to enable it for this page. Pretty useless and a pain in the back because it takes up screen space. Here's a quick way to get rid of it:

  1. Go to about:config and turn on toolkit.legacyUserProfileCustomizations.stylesheets.
  2. Go to your Firefox profile folder (e.g. ~/.mozilla/firefox/<random-value>.default/) and mkdir chrome && touch chrome/userChrome.css.
  3. Add the following to your userChrome.css file:

     .infobar[value="drmContentDisabled"] {
       display: none !important;
     }
    
  4. Restart Firefox and read news again with full screen space.

Posted Wed May 14 12:59:56 2025