RSS Atom Add a new post titled:

PSA for those foolish enough to use Google Cloud and try to use private service connect: If you want to change the serviceAttachment your private service connect forwarding rule points at, you must delete the forwarding rule and create a new one. Updates are not supported. I've done that in the past via terraform, but lately encountered strange errors like this:

Error updating ForwardingRule: googleapi: Error 400: Invalid value for field 'target.target':
'<https://www.googleapis.com/compute/v1/projects/mydumbproject/regions/europe-west1/serviceAttachments/
k8s1-sa-xyz-abc>'. Unexpected resource collection 'serviceAttachments'., invalid

Worked around that with the help of terrraform_data and lifecycle:

resource "terraform_data" "replacement" {
    input = var.gcp_psc_data["target"]
}

resource "google_compute_forwarding_rule" "this" {
    count   = length(var.gcp_psc_data["target"]) > 0 ? 1 : 0
    name    = "${var.gcp_psc_name}-psc"
    region  = var.gcp_region
    project = var.gcp_project

    target                = var.gcp_psc_data["target"]
    load_balancing_scheme = "" # need to override EXTERNAL default when target is a service attachment
    network               = var.gcp_network
    ip_address            = google_compute_address.this.id

    lifecycle {
        replace_triggered_by = [
            terraform_data.replacement
        ]
    }
}

See also terraform data for replace_triggered_by.

Posted Mon May 15 09:21:21 2023

In my day job someone today took the time in the team daily to explain his research why some of our configuration is wrong. He spent quite some time on his own to look at the history in git and how everything was setup initially, and ended up in the current - wrong - way. That triggered me to validate that quickly, another 5min of work. So we agreed to change it. A one line change, nothing spectacular, but lifetime was invested to figure out why it should've a different value.

When the pull request got opened a few minutes later there was nothing of that story in the commit message. Zero, nada, nothing. :( I'm really puzzled why someone invests lifetime to dig into company internal history to try get something right, do a lengthy explanation to the whole team, use the time of others, even mention that there was no explanation of why it's not the default value anymore it should be, and repeat the same mistake by not writing down anything in the commit message.

For the current company I'm inclined to propose a commit message validator. For a potential future company I might join, I guess I ask for real world git logs from repositories I should contribute to. Seems that this is another valuable source of information to qualify the company culture. Next up to the existence of whiteboards in the office. I'm really happy that at least a majority of the people contributing to Debian writes somewhat decent commit messages and changelogs. Let that be a reminder to myself to improve in that area the next time I've to change something.

Posted Fri Apr 28 10:15:59 2023

I know a few people hold on to the exFAT fuse implementation due the support for timezone offsets, so here is a small update for you. Andrew released 1.4.0, which includes the timezone offset support, which was so far only part of the git master branch. It also fixes a, from my point of view very minor, security issue CVE-2022-29973. In addition to that it's the first build with fuse3 support. If you still use this driver, pick it up in experimental (we're in the bookworm freeze right now), and give it a try. I'm personally not using it anymore beyond a very basic "does it mount" test.

Posted Fri Mar 3 16:39:48 2023

tl;dr; OpenSSL 3.0.1 leaks memory in ssl3_setup_write_buffer(), seems to be fixed in 3.0.5 3.0.2. The issue manifests at least in stunnel and keepalived on CentOS 9. In addition I learned the hard way that running a not so recent VirtualBox version on Debian bullseye let to dh parameter generation crashing in libcrypto in bn_sqr8x_internal().

A recent rabbit hole I went down. The actual bug in openssl was nailed down and documented by Quentin Armitage on GitHub in keepalived My bugreport with all back and forth in the RedHat Bugzilla is #2128412.

Act I - Hello stunnel, this is the OOMkiller Calling

We started to use stunnel on Google Cloud compute engine instances running CentOS 9. The loadbalancer in front of those instances used a TCP health check to validate the backend availability. A day or so later the stunnel instances got killed by the OOMkiller. Restarting stunnel and looking into /proc/<pid>/smaps showed a heap segment growing quite quickly.

Act II - Reproducing the Issue

While I'm not the biggest fan of VirtualBox and Vagrant I've to admit it's quite nice to just fire up a VM image, and give other people a chance to recreate that setup as well. Since VirtualBox is no longer released with Debian/stable I just recompiled what was available in unstable at the time of the bullseye release, and used that. That enabled me now to just start a CentOS 9 VM, setup stunnel with a minimal config, grab netcat and a for loop and watch the memory grow. E.g. while true; do nc -z localhost 2600; sleep 1; done To my surprise, in addition to the memory leak, I also observed some crashes but did not yet care too much about those.

Act III - Wrong Suspect, a Workaround and Bugreporting

Of course the first idea was that something must be wrong in stunnel itself. But I could not find any recent bugreports. My assumption is that there are still a few people around using CentOS and stunnel, so someone else should probably have seen it before. Just to be sure I recompiled the latest stunnel package from Fedora. Didn't change anything. Next I recompiled it without almost all the patches Fedora/RedHat carries. Nope, no progress. Next idea: Maybe this is related to the fact that we do not initiate a TLS context after connecting? So we changed the test case from nc to openssl s_client, and the loadbalancer healthcheck from TCP to a TLS based one. Tada, a workaround, no more memory leaking. In addition I gave Fedora a try (they have Vagrant Virtualbox images in the "Cloud" Spin, e.g. here for Fedora 36) and my local Debian installation a try. No leaks experienced on both. Next I reported #2128412.

Act IV - Crash in libcrypto and a VirtualBox Bug

When I moved with the test case from the Google Cloud compute instance to my local VM I encountered some crashes. That morphed into a real problem when I started to run stunnel with gdb and valgrind. All crashes happened in libcrypto bn_sqr8x_internal() when generating new dh parameter (stunnel does that for you if you do not use static dh parameter). I quickly worked around that by generating static dh parameter for stunnel. After some back and forth I suspected VirtualBox as the culprit. Recompiling the current VirtualBox version (6.1.38-dfsg-3) from unstable on bullseye works without any changes. Upgrading actually fixed that issue.

Epilog

I highly appreciate that RedHat, with all the bashing around the future of CentOS, still works on community contributed bugreports. My kudos go to Clemens Lang. :) Now that the root cause is clear, I guess RedHat will push out a fix for the openssl 3.0.1 based release they have in RHEL/CentOS 9. Until that is available at least stunnel and keepalived are known to be affected. If you run stunnel on something public it's not that pretty, because already a low rate of TCP connections will result in a DoS condition.

Posted Sun Oct 16 20:24:40 2022

Update of my notes from 2020.

# Download a binary device tree file and matching kernel a good soul uploaded to github
wget https://github.com/vfdev-5/qemu-rpi2-vexpress/raw/master/kernel-qemu-4.4.1-vexpress
wget https://github.com/vfdev-5/qemu-rpi2-vexpress/raw/master/vexpress-v2p-ca15-tc1.dtb
# Download the official Rasbian image without X
wget https://downloads.raspberrypi.org/raspios_lite_armhf/images/raspios_lite_armhf-2022-04-07/2022-04-04-raspios-bullseye-armhf-lite.img.xz
unxz 2022-04-04-raspios-bullseye-armhf-lite.img.xz
# Convert it from the raw image to a qcow2 image and add some space
qemu-img convert -f raw -O qcow2 2022-04-04-raspios-bullseye-armhf-lite.img rasbian.qcow2
qemu-img resize rasbian.qcow2 4G
# make sure we get a user account setup
echo "me:$(echo 'test123'|openssl passwd -6 -stdin)" > userconf
sudo guestmount -a rasbian.qcow2 -m /dev/sda1 /mnt
sudo mv userconf /mnt
sudo guestunmount /mnt
# start qemu
qemu-system-arm -m 2048M -M vexpress-a15 -cpu cortex-a15 \
 -kernel kernel-qemu-4.4.1-vexpress -no-reboot \
 -smp 2 -serial stdio \
 -dtb vexpress-v2p-ca15-tc1.dtb -sd rasbian.qcow2 \
 -append "root=/dev/mmcblk0p2 rw rootfstype=ext4 console=ttyAMA0,15200 loglevel=8" \
 -nic user,hostfwd=tcp::5555-:22
# login at the serial console as user me with password test123
sudo -i
# enable ssh
systemctl enable ssh
systemctl start ssh
# resize partition and filesystem
parted /dev/mmcblk0 resizepart 2 100%
resize2fs /dev/mmcblk0p2

Now I can login via ssh and start to play:

ssh me@localhost -p 5555
Posted Tue Apr 12 11:27:36 2022

tl;dr I ported a part of the python-suntime library to Lua to use it on OpenWRT and RutOS powered devices. suntime.lua

There are those unremarkable things which let you pause for a moment, and realize what a great gift of our time open source software and open knowledge is. At some point in time someone figured out how to calculate the sunrise and sunset time on the current date for your location. Someone else wrote that up and probably again a different person published it on the internet. The Internet Archive preserved a copy of it so I can still link to it. Someone took this algorithm and published a code sample on StackOverflow, which was later on used by the SatAgro guys to create the python-suntime library. Now I could come along, copy the core function of this library, convert it within a few hours - mostly spent learning a bit of Lua, to a working script fulfilling my needs.

Posted Thu Feb 3 21:11:09 2022

Sometimes it's nice for testing purpose to have the OpenWRT userland available locally. Since there is an x86 build available one can just run it within qemu.

wget https://downloads.openwrt.org/releases/21.02.1/targets/x86/64/openwrt-21.02.1-x86-64-generic-squashfs-combined.img.gz
gunzip openwrt-21.02.1-x86-64-generic-squashfs-combined.img.gz
qemu-img convert -f raw -O qcow2 openwrt-21.02.1-x86-64-generic-squashfs-combined.img openwrt-21.02.1.qcow2
qemu-img resize openwrt-21.02.1.qcow2 200M
qemu-system-x86_64 -M q35 \
  -drive file=openwrt-21.02.1.qcow2,id=d0,if=none,bus=0,unit=0 \
  -device ide-hd,drive=d0,bus=ide.0 -nic user,hostfwd=tcp::5556-:22
# you've to change the network configuration to retrieve an IP via
# dhcp for the lan bridge br-lan
vi /etc/config/network
  - change option proto 'static' to 'dhcp'
  - remove IP address and netmask setting
/etc/init.d/network restart
# now you should've an ip out of 10.0.2.0/24
ssh root@localhost -p 5556
# remember ICMP does not work but otherwise you should have
# IP networking available
opkg update
opkg install curl
Posted Thu Jan 20 21:20:54 2022

Wasted quite some hours until I found a working Modeline in this stack exchange post so the ThinkPad works with a HDMI attached Samsung QHD display.

Internal display of the ThinkPad is a FHD display detected as eDP-1, the external one is DP-3 and according to the packaging known by Samsung as S24A600NWU. The auto deteced EDID modes for QHD - 2560x1440 - did not work at all, the display simply stays dark. After a lot of back and forth with the i915 driver vs nouveau vs nvidia/nvidia-drm with and without modesetting, the following Modeline did the magic:

xrandr --newmode 2560x1440_54.97  221.00  2560 2608 2640 2720  1440 1443 1447 1478  +HSync -VSync
xrandr --addmode DP-3 2560x1440_54.97
xrandr --output DP-3 --mode 2560x1440_54.97 --right-of eDP-1 --primary

Modelines for 50Hz and 60Hz generated with cvt 2560 1440 60 did not work, neither did the one extracted with edid-decode -X from the hex blob found in .local/share/xorg/Xorg.0.log.

From the auto-detected Modelines FHD - 1920x1080 - did work. In case someone struggles with a similar setup, that might be a starting point. Fun part, if I attach my several years old Dell E7470 everything is just fine out of the box. But that one just has an Intel GPU and not the unholy combination I've here:

$ lspci|grep -E "VGA|3D"
00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05)
01:00.0 3D controller: NVIDIA Corporation GP107GLM [Quadro P620] (rev ff)
Posted Fri Oct 15 16:12:50 2021

Some time ago I looked briefly at an Envertech data logger for small scale photovoltaic setups. Turned out that PV inverter are kinda unreliable, and you really have to monitor them to notice downtimes and defects. Since my pal shot for a quick win I've cobbled together another Python script to query the portal at www.envertecportal.com, and report back if the generated power is down to 0. The script is currently run on a vserver via cron and reports back via the system MTA. So yeah, you need to have something like that already at hand.

Script and Configuration

You've to provide your PV systems location with latitude and longitude so the script can calculate (via python3-suntime) the sunrise and sunset times. At the location we deal with we expect to generate some power at least from sunrise + 1h to sunet - 1h. That is tunable via the configuration option toleranceSeconds.

Retrieving the stationId is a bit ugly because it's not provided via any API, instead it's rendered serverside into the website. So I just logged in on the portal and picked it up by looking into the page source.

www.envertecportal.com API

I guess this is some classic in the IoT land, but neither the documentation provided on the portal frontpage as docx, nor the API docs at port 8090 are complete and correct. The few bits I gathered via the Firefox Web Developer Tools are:

  1. Login https://www.envertecportal.com/apiaccount/login - POST, sent userName and pwd containing your login name and password. The response JSON is very explicit if your login was not successful and why.
  2. Store the session cookie called ASP.NET_SessionId for use on all subsequent requests.
  3. Retrieve station info https://www.envertecportal.com/ApiStations/getStationInfo - POST, sent ASP.NET_SessionId and stationId with the ID of the station. Returns a JSON with an object named Data. The field Power contains the currently generated power as a float with two digits (e.g. 0.01).
  4. Logout https://www.envertecportal.com/apiAccount/Logout - POST, sent ASP.NET_SessionId.

Some Surprises

There were a few surprises, maybe they help others dealing with an Envertech setup.

  1. The portal truncates passwords at 16 chars.
  2. The "Forget Password?" function mails you back the password in plain text (that's how I learned about 1.).
  3. The login API endpoint reporting the exact reason why the login failed is somewhat out of fashion. Though this one is probably not a credential stuffing target because there is no money to make, so don't care.
  4. The data logger reports the data to www.envertecportal.com at port 10013.
  5. There is some checksuming done on the reported data, but the system is not replay safe. So you can sent it any valid data string at a later time and get wrong data recorded.
  6. People at forum.fhem.de decoded some values but could not figure out the checksuming so far.
Posted Tue Sep 14 22:11:12 2021

It's a gross hack but works for now. To prevent overly sensitive mic settings autotuned by the browser in web conferences, I currently edit as root /usr/share/pulseaudio/alsa-mixer/paths/analog-input-internal-mic.conf. Change in [Element Capture] the setting volume from merge to 80.

The config block as a whole looks like this:

[Element Capture]
switch = mute
volume = 80
override-map.1 = all
override-map.2 = all-left,all-right

Solution found at https://askubuntu.com/a/761103.

Posted Wed Jun 2 15:38:39 2021