Fun with ovs-vswitchd II: Solving the ovs-vswitchd --detach logging loss
NOTE: If you want, you can read the prequel for this blog post first (Fun with ovs-vswitchd I), but it is not needed to understand this issue.
In big and complex cloud environments, having proper logging is essential to the health of the system, and debugging these environments can become quite tangled. Once the splitting of the ovs-ctl reload command was solved, the service started being executed with the --detach flag. A side effect of this was that the service logs were lost and no longer sent to the standard output. To be able to debug and understand what happens in the ovs-vswitchd container, it was important to fix this as soon as possible.
The ovs-vswitchd container communicates with the ovn-controller container and is responsible for all of the packet forwarding on the controller node. It is a small part of the full OVN architecture, which you can check in the official manual. Remember that the ovs-vswitchd container is part of the OpenStack ML2/OVN mechanism, which in RHOSO belongs to the ovn-controller-ovs pod together with the ovsdb-server container, and you need to be in the openstack namespace in order to check it.
The final goal of this issue is to make the service logs available in the pod logs. This will allow system administrators to check them with:
oc logs -c ovs-vswitchd $(oc get pod -l service=ovn-controller-ovs -o name)
The first try
The most direct approach was to work around --detach by backgrounding the command with & instead. That way we are “detaching” it into a separate background process, allowing the script to continue without leaving start-vswitchd.sh stuck on the service execution, while still being able to redirect its output thanks to the > /dev/stdout 2>&1 part. This works because Bourne-style shells allow standard error to be redirected to the same destination as standard output by using 2>&1.
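As a quick, generic illustration of that pattern (the command name here is just a placeholder, not the operator code):

some_long_running_command > /dev/stdout 2>&1 &   # run in the background, with stderr merged into stdout
pid=$!                                           # $! holds the PID of the last background command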
Apart from this, ovs_pid=$! saves the process PID, because we want to prevent the start-vswitchd.sh execution from finishing once the last command in the script is done. Otherwise the ovs-vswitchd container would shut down, and therefore the service too. Since the ovn-operator is active and making sure the ovn-controller-ovs pod is running, we would find the pod crashing and being restarted in a loop. By adding wait $ovs_pid at the end of the script, the ovs-vswitchd container will keep running for as long as the service does. In the code you will also find a trap command added to capture the exit code of the ovs-vswitchd service when the script finishes.
The code
added to ./templates/ovncontroller/bin/start-vswitchd.sh
is the following:
#...
# It's safe to start vswitchd now. The stderr and stdout are redirected since
# the command needs to be in the background for the script to keep on running.
/usr/sbin/ovs-vswitchd --pidfile --mlockall > /dev/stdout 2>&1 &
ovs_pid=$!
# This should never end up being stale since we have ovsVswitchdReadinessProbe
# and ovsVswitchdLivenessProbe
if [ ! -f /var/run/openvswitch/ovs-vswitchd.pid ]; then
    sleep 1
fi
# ...
# Unchanged code here handling the flow restoration
# ...
# Block script from exiting unless ovs process ends, otherwise k8s will
# restart the container again in loop.
trap "exit_rc=$?; echo ovs-vswitchd exited with rc $exit_rc; exit $exit_rc" EXIT
wait $ovs_pid
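In case the wait/trap combination is not obvious, here is a tiny standalone sketch of the same mechanism, using sleep as a stand-in for ovs-vswitchd: wait returns the exit status of the background job, and the EXIT trap reports it when the script ends.

#!/bin/bash
trap 'rc=$?; echo "background job exited with rc $rc"' EXIT
sleep 2 &        # stand-in for the long-running ovs-vswitchd process
bg_pid=$!
wait $bg_pid     # blocks here; its return code is the background job's exit status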
There is one chunk of the code not mentioned yet: the if condition that comes after the ovs_pid variable assignment. It was added to make sure the rest of the script did not run before the ovs-vswitchd service was active and had finished all of its init commands, but the sleep time is an arbitrary value. Even with this change, the pod would fail, showing the following trace in the logs:
2025-04-04T10:27:24Z|00045|connmgr|INFO|br-int<->unix#11: sending NXTTMFC_ALREADY_MAPPED error reply to NXT_TLV_TABLE_MOD message
OFPT_ERROR (OF1.4) (xid=0x2): NXTTMFC_ALREADY_MAPPED
NXT_TLV_TABLE_MOD (OF1.4) (xid=0x2):
ADD mapping table:
class type length match field
------ ---- ------ --------------
0x102 0x80 4 tun_metadata0
-----------------------------
+ cleanup_flows_backup
+ rc_code=1
+ rm -f /var/lib/openvswitch/flows-script
+ rm -rf /var/lib/openvswitch/saved-flows
+ echo 'Exited with rc 1'
Exited with rc 1
One idea was to play with the sleep value by setting it lower (0.1, 0.2…), but that would not work: if you didn't wait long enough, there was a chance that br-int was still not created, and the script would also fail when trying to apply the restored flows. For more context on restoring flows, check the previous entry of the blog. After unsuccessfully looking for a logical statement that could be used to know when to continue running the script, the best decision was to directly ask the OVN folks for their opinion on this problem. And that was the key, because it turned out this is actually a problem in ovn-controller, which is not able to handle the --ovn-restore-wait flag on its own when the --detach flag is not present.
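For the record, the kind of “logical statement” that was being searched for would have looked something like the sketch below, polling for br-int with ovs-vsctl instead of sleeping for a fixed time. This is purely illustrative, it is not the code that was merged, and, given the ovn-controller issue just described, it would not have been enough on its own.

# Hypothetical readiness check: wait until br-int exists instead of sleeping.
until ovs-vsctl br-exists br-int; do
    sleep 0.5
done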
So we arrived at a dead end here; a workaround for this was needed.
The second try
Not using --detach was discarded at this point, but luckily Yatin (one of the ovn-operator maintainers) quickly suggested a second option (thanks!): the ovs-vswitchd service supports the --log-file flag, which makes sure the logs are stored in a file on the system. The problem therefore turned into making sure the logs from that file are echoed to the standard output:
# We need to use --log-file since --detach disables all logging explicitly.
# Once https://issues.redhat.com/browse/FDP-1292 is fixed we will be able to
# use system logs only.
/usr/sbin/ovs-vswitchd --pidfile --mlockall --detach --log-file
# ...
# Unchanged code here handling the flow restoration
# ...
tail -f /var/log/openvswitch/ovs-vswitchd.log
This generates an extra file in the container, which is something to be avoided when possible. But it works!
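With this in place, the command from the goal section shows the ovs-vswitchd output, and you can also follow it live:

oc logs -f -c ovs-vswitchd $(oc get pod -l service=ovn-controller-ovs -o name)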
Testing
To make sure these solutions were working, it was important not only to check that the logs did appear, but also that the flow restoration process was not broken. Therefore the solutions were tested on a CRC environment running the podified OpenStack environment (deployed using install_yamls), which we use to create a more developer-friendly RHOSO (Red Hat OpenStack Services on OpenShift) environment. For this specific scenario, the environment needed to have:
- At least one EDPM (External Data Plane Management) node serving as a compute node.
- A VM deployed by OpenStack running on the EDPM node, being pinged to test for possible packet loss (see the sketch after this list).
- Centralized FIPs (Floating IPs) configured in the OpenStack environment, to ensure that the packets from the ping go through the controller node, and therefore through the ovs-vswitchd container.
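A minimal sketch of that ping check, assuming the VM is reachable through a floating IP such as 192.168.122.20 (the address is just an example):

# Ping the VM through its centralized FIP while the ovn-controller-ovs pod is
# being changed/restarted, then check the reported packet loss statistics.
ping -c 120 192.168.122.20 | tail -n 2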
The VM can be easily deployed using install_yamls, by running make edpm_deploy_instance from the devsetup Makefile. In order to make the OSP environment use centralized FIPs, my way of doing it on a podified env was to modify the OpenStack Control Plane CR (oc edit oscp) and add this configuration to the Neutron section. The change modifies the neutron.conf that is then used by the Neutron service (which will configure OVN):
neutron:
  template:
    customServiceConfig: |
      [DEFAULT]
      debug=true
      [ovn]
      enable_distributed_floating_ip=false
There was no significant ping loss when testing the different approaches, and the amount of packet loss was the same whether the change was applied or not, so it did not cause a regression over previous versions.
Conclusion
There are several takeaways from this issue. One of them is that our first solution was really focused on not using --detach, but the final solution was not to get rid of it, but to use additional flags in order to access the logs. How we frame a problem matters when looking for an optimal solution. In this case, a considerable amount of time was spent on a solution that never saw the light of day. The current solution might not be perfect, but it is quite good and easily understandable, so perhaps the best idea would have been to start directly with that approach. At the same time, I learnt many new things while debugging this issue, and I got to find a bug in core OVN! So it left a bit of a bittersweet feeling in me :-)
Until next time! o/
References
- OVN Architecture: https://man7.org/linux/man-pages/man7/ovn-architecture.7.html
- OVN --ovn-restore-wait issue in FDP: https://issues.redhat.com/browse/FDP-1292
- Full PR Discussion for the curious.