pmeerw's blog

Wed, 06 Nov 2024

Docker DNS configuration woes

Background: DNS resolution on Linux is controlled by /etc/resolv.conf, where up to three nameservers can be configured among other things (search list, timeout, attempts, etc.)

Nameservers are queried in order; the second nameserver is only asked if there is no response from the first. When there is an answer (even a negative one), further nameservers are not consulted. The rotate option changes this by spreading queries across the listed nameservers in round-robin fashion. The resolv.conf(5) man page has more info.

How does Docker configure the container's DNS?

At least for bridged container network configuration (default), Docker mounts some host files into the container:

$ mount
...
/dev/nvme0n1p3 on /etc/resolv.conf type ext4 (rw,relatime,errors=remount-ro)
/dev/nvme0n1p3 on /etc/hostname type ext4 (rw,relatime,errors=remount-ro)
/dev/nvme0n1p3 on /etc/hosts type ext4 (rw,relatime,errors=remount-ro)

Hence, the container uses the resolv.conf that the host's resolver service uses. I.e., on a system with systemd-resolved, the container uses the upstream nameservers that resolved uses, and NOT the local stub resolver 127.0.0.53. I don't know why; I think this makes no sense and complicates configuration.

The problem

All nameservers in /etc/resolv.conf are expected to return the same information. However, this is not the case if local, private and public nameservers are mixed. Private domains (such as example.local) can only be resolved by the private nameserver. In principle, this can be configured in resolved, but it is not easily passed on to the container. If multiple nameservers are configured for the container and the first (local, private) nameserver is unreliable or too slow, the fallback nameserver is queried. The public nameserver does not know the private domain and answers negatively, which leads to sporadic host name lookup failures for private hosts on a local domain.

nameserver 172.x.y.z # private, can resolve example.local
nameserver 1.1.1.1 # public
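
The failure mode can be modelled in a few lines of C++. This is only a sketch of the behaviour described above (ask the servers in order, stop at the first answer, even a negative one), not how the libc resolver is implemented. The names Reply, Server and lookup are made up for illustration.

```cpp
#include <string>
#include <vector>

// How one nameserver reacts to a query (hypothetical model, not a real API).
enum class Reply { Timeout, NxDomain, Address };

struct Server {
    std::string name;  // label only, e.g. "private" or "public"
    Reply reply;       // what this server does with the query
};

// Ask the servers in order. The first server that answers at all wins,
// even if its answer is negative (NxDomain). Returns a pointer to the
// server whose answer was used, or nullptr if every server timed out.
const Server* lookup(const std::vector<Server>& servers) {
    for (const Server& s : servers) {
        if (s.reply != Reply::Timeout) {
            return &s;
        }
    }
    return nullptr;
}
```

With {private: Address, public: NxDomain} the private answer is used. As soon as the private server times out, the public server's NxDomain becomes the final result, which is the sporadic failure described above.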

Oh no, Snap!
Ubuntu can package docker as a Snap, adding some more complication...

The dockerd config file (--config-file=/var/snap/docker/nnnn/config/daemon.json) for Snap luckily lives in /var/snap/docker/current/config/ and is editable, hurray!

Configuration changes

Edit /var/snap/docker/current/config/daemon.json to override DNS configuration for all containers:

{
    "dns": ["172.x.y.z"]
}

Restart the Docker service:
sudo snap restart docker

posted at: 19:05 | path: /configuration | permanent link

Fri, 02 Aug 2024

Windows User Account Control (UAC) and Unknown publisher

A signed Windows executable allows Windows to display the publisher name in the UAC dialog, except sometimes it doesn't work. Windows uses Authenticode to verify the integrity of a PE32 executable and provide authentication via code signing.

One way to learn more about what UAC does w.r.t. crypto is to enable CAPI2 diagnostics, i.e. event logging.

Things to remember: the entire certificate chain up to but not including the root CA's certificate should be in the executable, i.e. all intermediate certificates. When certificates are missing, they might be retrieved via the Authority Information Access (AIA) extension, specified in RFC 5280, using HTTP URLs given in the certificates.

Different applications implement different verification policies: caching of certificates, revocation list checks, etc. It's not clear what checks Windows, the UAC dialog, or other applications perform to verify the authenticity of an executable.

Tooling is difficult: again, it's not clear what the verification policy is. For example, Microsoft's signtool does not complain about missing intermediate certificates.

Looking for some more mystery to research: Try page hashes!

posted at: 00:45 | path: /programming | permanent link

Sun, 21 Apr 2024

A long tale of an (inconvenient) bug: Firefox not working on Ubuntu 24.04 pre-beta

Sometimes one feels adventurous: so I made the hasty decision to upgrade from Ubuntu 22.04 to Ubuntu 24.04 (Noble Numbat) in alpha/pre-beta state -- I swear, this has worked nicely previously.

The update went semi-smoothly; only the network/nameserver configuration was lost, but that was easy to fix by manually installing the network-manager package and editing /etc/resolv.conf. It's good if you can remember how to manually bring up a network interface, add a default route, and set a static DNS server:

ifconfig -a # to discover available network interfaces
ifconfig enp5s0 192.168.1.123 # set static IP
route add default gw 192.168.1.1 # add default route
nano /etc/resolv.conf # set static DNS

All looks good, except: Firefox cannot resolve any host name. Neither can Chromium. Slack is dead, too. Uugh, the machine is pretty useless without the Web. However, network connectivity is good: ping works, nslookup works. It turned out that only snap applications were affected, but that was not so clear in the beginning.

Initially, I thought some browser-related package is at fault. Here are my notes of the debugging process and the train of thought; in any case, I learned quite a bit...

System calls

I used strace to look for failing syscalls. I also tried ltrace, but that did not really work. strace showed that /etc/nsswitch.conf and /etc/hosts are read, that connect() establishes a connection to the DNS resolver/nameserver given in /etc/resolv.conf (127.0.0.53, port 53 in my case), and that the DNS query is sent and a DNS response comes back, but the recvfrom() syscall returns an error code (EINVAL). Inspecting the query traffic with Wireshark, everything looked good on the network layer.

I made a couple of experiments: the browser had no issue connecting to IP addresses directly, also a connection to a hostname in the /etc/hosts file was no problem. So DNS...

Firefox debugging

Next, I tried to corner the issue from above and learned about Firefox's built-in debugging and logging capabilities. Firefox is my main browser; it exposes a wide range of information. Entering about:networking in the browser's address bar, one can observe recent HTTP connections, network socket information, and a glimpse of the DNS cache (see screenshots). Using the 'DNS Lookup' functionality, it is easy to confirm that Firefox cannot resolve any hostname.

[Screenshots: good case, bad case]

Entering about:logging presents Firefox's logging manager. One can restrict logging to certain modules and log levels (supposedly, that is what '5' means). The Firefox Profiler is an online facility for analysing logs and apparently not very helpful for tracking down network issues, hence I selected 'Logging to a file'. Since Firefox usually runs in a Snap container, the log file actually ends up in the /tmp/snap-private-tmp/snap.firefox/tmp/ directory of the file system.

A successful DNS lookup for 'example.com' produces the following lines:

Parent 17655: Main Thread]: D/nsHostResolver Resolving host
Parent 17655: Main Thread]: D/nsHostResolver   No usable record in cache for host
Parent 17655: Main Thread]: D/nsHostResolver NameLookup host:example.com af:0
Parent 17655: Main Thread]: D/nsHostResolver NameLookup: example.com effectiveTRRmode: 1 flags: 0
Parent 17655: Main Thread]: D/nsHostResolver TRR service not enabled - off or disabled
Parent 17655: Main Thread]: D/nsHostResolver NativeLookup host:example.com af:0
Parent 17655: Main Thread]: D/nsHostResolver   DNS thread counters: total=6 any-live=0 idle=6 pending=1
Parent 17655: Main Thread]: D/nsHostResolver   DNS lookup for host
Parent 17655: DNS Resolver #123]: E/nsHostResolver DNS lookup thread - Calling getaddrinfo for host
Parent 17655: Main Thread]: D/nsHostResolver Resolving host
Parent 17655: Main Thread]: D/nsHostResolver   No usable record in cache for host
Parent 17655: Main Thread]: D/nsHostResolver NameLookup host:example.com af:0
Parent 17655: Main Thread]: D/nsHostResolver NameLookup: example.com effectiveTRRmode: 1 flags: 0
Parent 17655: Main Thread]: D/nsHostResolver TRR service not enabled - off or disabled
Parent 17655: DNS Resolver #123]: E/nsHostResolver DNS lookup thread - lookup completed for host
Parent 17655: DNS Resolver #123]: D/nsHostResolver nsHostResolver::CompleteLookup example.com 7650d9e24650 0 resolver=0 stillResolving=0
Parent 17655: DNS Resolver #123]: D/nsHostResolver nsHostResolver record 7650de6a1060 new gencnt
Parent 17655: DNS Resolver #123]: D/nsHostResolver Caching host
Parent 17655: DNS Resolver #123]: D/nsHostResolver CompleteLookup: example.com has 93.184.215.14
Parent 17655: DNS Resolver #123]: D/nsHostResolver CompleteLookup: example.com has 2606:2800:21f:cb07:6820:80da:af6b:8b2c

There are a couple of interesting things: (1) Firefox calls getaddrinfo(), which lives in the C runtime; (2) nsHostResolver is a module in Firefox's code base, which can be browsed online, e.g. using searchfox; (3) it ultimately calls PR_GetAddrInfoByName(), which lives in the libnspr4 package/library and seemed like a good candidate to investigate further. Again, the source can be browsed online and indeed, it calls getaddrinfo().
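
To take the browser out of the picture, getaddrinfo() can also be called directly. The following is a minimal sketch, not Firefox's or NSPR's code; the helper name resolve() and its error reporting are my own choices for illustration. It returns the resolved addresses as text, so an IP literal (no DNS involved) can be compared against a real host name.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <string>
#include <vector>

// Resolve `host` with getaddrinfo() and return the addresses as text.
// On failure the result is empty and `error` holds gai_strerror()'s message.
// (resolve() is a hypothetical helper, not part of any library.)
std::vector<std::string> resolve(const char* host, int flags, std::string& error) {
    std::vector<std::string> out;

    addrinfo hints = {};
    hints.ai_family = AF_UNSPEC;      // IPv4 and IPv6; AF_UNSPEC is 0 on Linux
    hints.ai_socktype = SOCK_STREAM;  // avoid one result per socket type
    hints.ai_flags = flags;

    addrinfo* res = nullptr;
    const int rc = getaddrinfo(host, nullptr, &hints, &res);
    if (rc != 0) {
        error = gai_strerror(rc);
        return out;
    }

    for (const addrinfo* p = res; p != nullptr; p = p->ai_next) {
        char buf[INET6_ADDRSTRLEN] = {};
        const void* addr = nullptr;
        if (p->ai_family == AF_INET) {
            addr = &reinterpret_cast<const sockaddr_in*>(p->ai_addr)->sin_addr;
        } else if (p->ai_family == AF_INET6) {
            addr = &reinterpret_cast<const sockaddr_in6*>(p->ai_addr)->sin6_addr;
        } else {
            continue;  // ignore other address families
        }
        if (inet_ntop(p->ai_family, addr, buf, sizeof buf) != nullptr) {
            out.push_back(buf);
        }
    }
    freeaddrinfo(res);
    return out;
}
```

Calling resolve("127.0.0.1", AI_NUMERICHOST, err) never touches the nameserver, while resolve("example.com", AI_ADDRCONFIG, err) performs a real lookup and depends on the network and the resolver configuration.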

Digging deeper, IDA Pro

The ltrace tool would have been really helpful here to trace the arguments passed and results returned at runtime; however, it didn't work for me. So I tried attaching the IDA Pro 8.3 disassembler/debugger to the Firefox process; a free IDA Pro version is available from Hex-Rays.

In IDA, I looked for the libnspr4.so shared object in 'Modules' list, then searched for the PR_GetAddrInfoByName() symbol in the module's function list to get a disassembly of the function.

Using the basic block graph of the function (and having the almost matching, corresponding source code), it's relatively easy to locate the potential getaddrinfo() call (right after the checks for "localhost" domain). I've set a breakpoint (press F2) on the call and on the next instruction.

Having the debugger interrupt and block the Firefox process is tedious: those functions are called too often for manual inspection of the call arguments and return code. A better way is conditional breakpoints and logging. In IDA, edit the breakpoints and enter a condition:

msg("!!!1 %s\n", get_strlit_contents(rdi, -1, STRTYPE_C)), 0
and
msg("-> %08x\n", eax), 0
respectively. Both output a message, and at the end (", 0") return 0 as the result of the conditional expression (i.e. the breakpoint's action is NOT triggered). The first message outputs the null-terminated string pointed to by the rdi register, that is, the host name to resolve. The second message outputs the value of the eax register, i.e. the result value of the getaddrinfo() function. After instrumenting the Firefox process as described, press F9 to continue execution of the debuggee. Perform a DNS query in Firefox, and the logging -- thanks to the breakpoints -- will be shown in IDA's "Output" window.

I tried to step into (press F7 in IDA) the getaddrinfo() implementation, but quickly gave up on the code. It will somehow call connect() to query the DNS resolver as seen in strace...

Showdown & conclusion

After a long time deducing all the steps, and ignoring the fact that a fresh install might have been the quicker option, here are the findings:

I tried calling PR_GetAddrInfoByName() in a small test program; of course, there was no issue there:

// compile: gcc -g -Wall nspr.c -lnspr4

#include "nspr/nspr.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
  int ret = EXIT_FAILURE;

  PR_Init(0, PR_PRIORITY_NORMAL, 0);

  // IPv4 lookup through NSPR, which ends up in getaddrinfo()
  PRAddrInfo* ai = PR_GetAddrInfoByName("example.com", PR_AF_INET, PR_AI_ADDRCONFIG | PR_AI_NOCANONNAME);
  if (!ai) {
    // report the NSPR error code of the failed lookup
    printf("PR_GetAddrInfoByName() = %d\n", PR_GetError());
    goto out;
  }

  PR_FreeAddrInfo(ai);
  ret = EXIT_SUCCESS;

out:
  PR_Cleanup();

  return ret;
}

My suspicion was that either AppArmor or the Snap containerization was to blame. AppArmor can easily be disabled, and hence ruled out. Snap is not so easy; I know little about it and had to defer further investigation. The plan was to run the above test program under Snap and hope that the error would reproduce...

Aftermath

A couple of days later, I learned that other people experienced the same problem: "Network problems with snap apps" (on askubuntu.com), "DNS for snaps like Firefox and Chromium fails" (on launchpad), "Snaps unable to connect to network under linux-lowlatency", "Noble kernel regression with new apparmor profiles/features"

And a solution also became known: a kernel update from Linux 6.8.0-25-lowlatency to Linux 6.8.0-28-lowlatency -- and the issue is gone! Somewhat disappointing. I tried to track down the actual fix; maybe it's this:

diff -u linux-6.8.0/security/apparmor/af_inet.c linux-6.8.0/security/apparmor/af_inet.c
--- linux-6.8.0/security/apparmor/af_inet.c
+++ linux-6.8.0/security/apparmor/af_inet.c
@@ -103,14 +103,12 @@
 	AA_BUG(!maddr);
 
 	maddr->addrtype = addrtype;
-	if (!addr) {
+	if (!addr || addrlen < offsetofend(struct sockaddr, sa_family)) {
 		maddr->addrp = NULL;
 		maddr->port = 0;
 		maddr->len = 0;
 		return 0;
 	}
-	if (addrlen < offsetofend(struct sockaddr, sa_family))
-		return -EINVAL;
 
 	/*
 	 * its possibly to have sk->sk_family == PF_INET6 and

This would roughly match the initial finding that recvfrom() returns -EINVAL.
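
To make the behaviour change easier to see, here is a userspace model of just the two checks in the diff. It is a sketch only: the function names check_before() and check_after() and the kContinue placeholder are made up, the surrounding kernel code is not reproduced, and offsetofend() is emulated with offsetof() plus sizeof.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>

// Userspace stand-in for the kernel's offsetofend(struct sockaddr, sa_family).
constexpr int kMinAddrLen =
    static_cast<int>(offsetof(sockaddr, sa_family) + sizeof(sa_family_t));

// Hypothetical placeholder for "checks passed, go on and map the address".
constexpr int kContinue = 1;

// Before the patch: a non-NULL address that is too short is an error.
int check_before(const sockaddr* addr, int addrlen) {
    if (!addr) {
        return 0;  // no address given, nothing to map
    }
    if (addrlen < kMinAddrLen) {
        return -EINVAL;  // the two lines removed by the patch
    }
    return kContinue;
}

// After the patch: a too-short address is treated like no address at all.
int check_after(const sockaddr* addr, int addrlen) {
    if (!addr || addrlen < kMinAddrLen) {
        return 0;
    }
    return kContinue;
}
```

The only input for which the two versions differ is a non-NULL address with a length shorter than the sa_family field: the old check fails with -EINVAL, the new one returns 0.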

posted at: 22:39 | path: /rant | permanent link

Wed, 06 Mar 2024

C++ - WTF user literals?!

user literals: "Since the introduction of user-defined literals, the code that uses format macro constants for fixed-width integer types with no space after the preceding string literal became invalid: std::printf("%"PRId64"\n",INT64_MIN); has to be replaced by std::printf("%" PRId64"\n",INT64_MIN);"

So you want me to insert a space now?
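
For the record, a small sketch of the form that still compiles. The helper name format_i64() is made up for illustration; the space before PRId64 is the one that matters, the one after it is cosmetic.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a 64-bit integer using the PRId64 format macro.
// Without the space, "%"PRId64 is lexed as a string literal followed by a
// user-defined-literal suffix instead of a string literal and a macro.
std::string format_i64(std::int64_t v) {
    char buf[32];  // large enough for INT64_MIN, newline and terminator
    std::snprintf(buf, sizeof buf, "%" PRId64 "\n", v);
    return buf;
}
```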

posted at: 13:12 | path: /rant | permanent link

Mon, 26 Feb 2024

constexpr string initialization fails to compile with _DEBUG

C++ code compiles in a release build but fails in a debug build (/D_DEBUG); MSVC, obviously.

Expectation: defining _DEBUG (or switching between release and debug build) doesn't change whether code is accepted; apparently Microsoft has a different view...

// source code, x.cpp
#include <cstdio>
#include <string>

static constexpr std::string s = "asdf";

int main() {
  printf("%s\n", s.c_str());
}

Compile with debug:
cl /std:c++20 /D_DEBUG x.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.39.33520 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

x.cpp
x.cpp(4): error C2131: expression did not evaluate to a constant
x.cpp(4): note: (sub-)object points to memory which was heap allocated during constant evaluation
Compile as release:
cl /std:c++20 x.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.39.33520 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

x.cpp
Microsoft (R) Incremental Linker Version 14.39.33520.0
Copyright (C) Microsoft Corporation. All rights reserved.

/out:x.exe
x.obj

Bonus: when the initializer string "asdf" is longer, e.g. "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasdf", the release build fails as well (which is OK: the string no longer fits into the small-string buffer, so it has to be heap allocated).

There's actually a very good and detailed technical explanation.
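
If all that is needed is a compile-time string constant, one way around the problem is std::string_view. This is a sketch of a workaround under that assumption, not something taken from the explanation mentioned above; the helper name as_string() is made up. It needs C++17 or later.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// A string_view only refers to the literal's static storage, so nothing is
// heap allocated during constant evaluation, with or without _DEBUG.
static constexpr std::string_view s = "asdf";

// Checked at compile time.
static_assert(s.size() == 4, "unexpected length");

// Convert to a std::string at runtime where an owning string is required.
std::string as_string() {
    return std::string(s);
}

// Print without relying on a terminating NUL after the view.
void print_s() {
    std::printf("%.*s\n", static_cast<int>(s.size()), s.data());
}
```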

posted at: 10:00 | path: /programming | permanent link
