How-to: Host System Performance Tuning Guide
1. Overview
This how-to guide discusses host system performance tuning tips that support high sample rate and high data throughput operation when working with Per Vices SDRs.
For a discussion on host machine data flow using Per Vices SDR, see the reference here.
2. Benchmarking your host system’s performance
-
Update to the latest version of UHD; see the how-to on installing UHD for instructions.
-
Once you have the latest package build, navigate to /lib/uhd/examples/benchmark_rate. To see the options to run this program, use the following command:
$ sudo ./benchmark_rate --help
-
An example of running this program to test data flowing from the host system to the FPGA (10 seconds, at 162.5MSPS to channel A) is the following:
$ sudo ./benchmark_rate --duration 10 --tx_rate 162500000 --tx_otw=sc16 --tx_cpu=sc16 --channels=0 --tx_channels "0" --drop-threshold 1
-
An example of running this program to test data flowing from the FPGA to the host system (10 seconds, at 162.5MSPS from channel A) is the following:
$ sudo ./benchmark_rate --duration 10 --rx_rate 162500000 --rx_otw=sc16 --rx_cpu=sc16 --channels=0 --rx_channels "0" --drop-threshold 1
-
After running this test, you will see a summary like the following, where ### stands for a value that varies from run to run:
Benchmark rate summary:
Num received samples: ###
Num dropped samples: ###
Num overruns detected: ###
Num transmitted samples: ###
Num sequence errors (Tx): ###
Num sequence errors (Rx): ###
Num underruns detected: ###
Num late commands: ###
Num timeouts (Tx): ###
Num timeouts (Rx): ###
- Num received samples is the number of samples received multiplied by the number of channels.
- Num dropped samples is how many samples were dropped during the Tx or Rx rate test.
- Num overruns detected means that the transport has either dropped a packet or received data out of order.
- Num transmitted samples is the number of samples transmitted multiplied by the number of channels.
- Num sequence errors (Tx) means there was packet loss between host and device or packet loss within a burst.
- Num sequence errors (Rx) means an internal receive buffer has filled or a sequence error has been detected.
- Num underruns detected means an internal send buffer has emptied or an underflow occurred inside a packet; it equals the number of status messages reported.
- Num late commands means a stream command was issued with a timestamp in the past.
- Num timeouts (Tx) means there were no samples in the buffer to send.
- Num timeouts (Rx) means no packet was received and the implementation timed out.
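When running many benchmarks, it can help to flag only the counters that indicate a problem. The following is a minimal sketch, assuming the summary is printed with one "Name: value" counter per line; the function name check_benchmark_summary is hypothetical and not part of UHD.

```shell
# Hypothetical helper (not part of UHD): read a benchmark_rate summary on
# stdin and print every error counter whose value is not zero.
# Assumes one "Name: value" counter per line.
# Returns 0 if all error counters are zero, 1 otherwise.
check_benchmark_summary() {
    awk -F': *' '
        /^ *Num (dropped samples|overruns detected|sequence errors|underruns detected|late commands|timeouts)/ {
            gsub(/^ +/, "", $1)          # strip leading indentation from the name
            if ($2 + 0 != 0) {           # report only non-zero counters
                print $1 ": " $2
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, the output of a run could be piped through it: sudo ./benchmark_rate --duration 10 --rx_rate 162500000 --channels=0 | check_benchmark_summary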
3. CPU Performance Governor
CPU performance scaling enables the operating system to scale the CPU frequency (i.e. clock speed) up or down in order to save power or improve performance. The Linux kernel offers CPU performance scaling via the cpufreq subsystem, which defines two layers of abstraction. Scaling governors implement the algorithms that compute the desired CPU frequency, potentially based on the system’s needs. Scaling drivers interact with the CPU directly, enacting the frequencies that the current governor requests.
For high performance on the SDR and host system, it is important to set the CPU governor to performance. This can be done with a Linux utility package called cpufrequtils. You can find more information for Ubuntu here, and for Arch Linux here.
The steps for setting the CPU governor to high performance are:
-
Install cpufrequtils for your specific Linux distribution.
-
You can then set the CPU governor to performance on each core by issuing the command:
$ sudo cpufreq-set -c $core_num -g performance
-
Or, to set the CPU governor to performance for all cores, run the following command:
$ for ((i=0;i<$(nproc);i++)); do sudo cpufreq-set -c $i -r -g performance; done
-
You can verify that the changes have been made to the CPU governor by running the following command:
$ cpufreq-info
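The governor can also be read directly from sysfs, which is convenient in scripts. The following is a minimal sketch, assuming the kernel exposes the standard cpufreq files under /sys/devices/system/cpu; the function name list_non_performance_cpus is hypothetical, and its optional argument exists only so the sysfs root can be overridden for testing.

```shell
# Hypothetical helper: print every CPU whose scaling governor is not
# "performance". Prints nothing when all CPUs are set correctly.
# $1 (optional): sysfs CPU root, defaults to /sys/devices/system/cpu.
list_non_performance_cpus() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue          # skip if cpufreq is not available
        gov=$(cat "$f")
        if [ "$gov" != "performance" ]; then
            cpu=${f#"$cpu_root"/}        # e.g. cpu3/cpufreq/scaling_governor
            echo "${cpu%%/*}: $gov"      # e.g. cpu3: powersave
        fi
    done
}
```

Calling list_non_performance_cpus with no arguments after the steps above should produce no output if every core was switched to performance.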
4. Set Correct Ethernet Maximum Transmission Unit (MTU) on your NIC
To get the maximum throughput from the 10Gbps Ethernet links used with Crimson TNG, it is important to set the MTU to 9000. To set the correct MTU size, do the following:
-
To view your current MTU size, run the following command:
$ ifconfig | grep mtu
This will output the interface_name on the left-hand side, and the current MTU size on the right-hand side.
-
To temporarily change the MTU size, you can run the following command:
$ sudo ifconfig <interface_name> mtu <mtu_size> up
For example, to set the interface named ens4 to 9000, you can do the following:
$ sudo ifconfig ens4 mtu 9000 up
-
Or, to make your MTU size change persist across reboots, you can follow the guides here depending on your distribution:
Changing MTU size permanently in Linux using the /etc/dhcp/dhclient.conf file.
Changing MTU size permanently in CentOS/Red Hat/Fedora using /etc/sysconfig/network-scripts/ifcfg-eth0.
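To confirm in a script that an interface is using jumbo frames, the MTU can be read from sysfs. The following is a minimal sketch, assuming the standard /sys/class/net/<interface>/mtu file; the function name check_mtu is hypothetical, and the third argument exists only so the sysfs root can be overridden for testing.

```shell
# Hypothetical helper: check that an interface has at least the wanted MTU.
# $1: interface name, $2 (optional): wanted MTU, defaults to 9000,
# $3 (optional): sysfs net root, defaults to /sys/class/net.
# Returns 0 if the MTU is large enough, 1 if too small, 2 if unreadable.
check_mtu() {
    iface="$1"
    want="${2:-9000}"
    net_root="${3:-/sys/class/net}"
    mtu_file="$net_root/$iface/mtu"
    if [ ! -r "$mtu_file" ]; then
        echo "$iface: no such interface" >&2
        return 2
    fi
    have=$(cat "$mtu_file")
    if [ "$have" -ge "$want" ]; then
        echo "$iface: MTU $have (ok)"
    else
        echo "$iface: MTU $have (expected at least $want)"
        return 1
    fi
}
```

For the interface from the example above, this would be called as: check_mtu ens4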
5. Adjust Network Buffers
We use the UDP protocol in our network stack to send data between the SDR and the host system NIC. Linux limits the performance of the UDP protocol by restricting the amount of UDP traffic that is allowed to buffer on the receive socket.
-
To find the current values and max values of the UDP/IP receive and send buffer size, type the following commands in a terminal:
$ sysctl net.core.rmem_default
$ sysctl net.core.wmem_default
$ sysctl net.core.rmem_max
$ sysctl net.core.wmem_max
-
If the values are not at least 6.25MB, you should change this in the /etc/sysctl.conf file (note these changes do not take effect until reboot):
net.core.rmem_max=50000000
net.core.rmem_default=50000000
net.core.wmem_max=50000000
net.core.wmem_default=50000000
-
To make these changes immediately, run as root:
$ sudo sysctl -w net.core.wmem_max=50000000
$ sudo sysctl -w net.core.rmem_max=50000000
$ sudo sysctl -w net.core.wmem_default=50000000
$ sudo sysctl -w net.core.rmem_default=50000000
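The four values can be checked in one pass by reading them from /proc/sys/net/core, which is where sysctl gets them. The following is a minimal sketch; the function name check_net_buffers is hypothetical, the default threshold of 50000000 is taken from the settings above, and the second argument exists only so the directory can be overridden for testing.

```shell
# Hypothetical helper: report any net.core buffer setting below a minimum.
# $1 (optional): minimum size in bytes, defaults to 50000000.
# $2 (optional): directory holding the values, defaults to /proc/sys/net/core.
# Returns 0 if all four values meet the minimum, 1 otherwise.
check_net_buffers() {
    min="${1:-50000000}"
    core_dir="${2:-/proc/sys/net/core}"
    rc=0
    for key in rmem_max rmem_default wmem_max wmem_default; do
        val=$(cat "$core_dir/$key")
        if [ "$val" -lt "$min" ]; then
            echo "net.core.$key=$val is below $min"
            rc=1
        fi
    done
    return $rc
}
```

Running check_net_buffers with no arguments prints nothing when all four buffers are already large enough.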
6. Ethernet Ring Buffers
A ring buffer, or circular buffer, is the data structure used to implement each NIC queue for data streaming. These structures are buffers shared between the NIC and its device driver, and store incoming packets until the device driver can process them. They exist for both Rx and Tx, and enlarging them can help with flow control errors (dropped packets, etc.) at higher rates.
To increase Ethernet ring buffer sizes on Linux, you can use ethtool.
-
To see your currently configured network cards, run the following command:
$ ifconfig -a
-
To see the current ring buffer size of an interface, along with the maximum it supports, run the following:
$ ethtool -g <interface>
-
To change your ring buffer size, run the following with your interface from step 1:
$ sudo ethtool -G <interface> tx 4096 rx 4096
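Not every NIC supports 4096 entries, so it is worth comparing the current size with the pre-set maximum before changing it. The following is a minimal sketch that condenses the output of ethtool -g, assuming its usual layout of a "Pre-set maximums:" section followed by a "Current hardware settings:" section; the function name ring_buffer_summary is hypothetical.

```shell
# Hypothetical helper: condense "ethtool -g <interface>" output (on stdin)
# into one line each for RX and TX showing the current and maximum sizes.
ring_buffer_summary() {
    awk '
        /^Pre-set maximums:/          { section = "max" }
        /^Current hardware settings:/ { section = "cur" }
        /^(RX|TX):/ {                 # ignores "RX Mini:" and "RX Jumbo:"
            name = $1
            sub(/:$/, "", name)
            value[section, name] = $2
        }
        END {
            for (i = 1; i <= 2; i++) {
                name = (i == 1) ? "RX" : "TX"
                printf "%s: current %s, maximum %s\n", name, value["cur", name], value["max", name]
            }
        }
    '
}
```

For example: ethtool -g ens4 | ring_buffer_summary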
7. Disable Hyper-threading
Hyper-threading is Intel's implementation of simultaneous multithreading, in which a CPU presents each of its physical cores as multiple virtual cores, known as threads. In applications which require the highest possible CPU performance per core, disabling hyper-threading can provide roughly a 10% increase in core performance. However, this means you will have fewer hardware threads available. Hyper-threading can be disabled within the BIOS; the exact setting varies by motherboard manufacturer.
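After changing the BIOS setting, the result can be confirmed from Linux. The following is a minimal sketch, assuming a kernel recent enough to expose /sys/devices/system/cpu/smt/active (which contains 1 when simultaneous multithreading is active and 0 otherwise); the function name smt_status is hypothetical, and its optional argument exists only so the file can be overridden for testing.

```shell
# Hypothetical helper: report whether simultaneous multithreading is active.
# $1 (optional): path to the status file,
#                defaults to /sys/devices/system/cpu/smt/active.
# Prints "enabled", "disabled", or "unknown" (file missing, returns 2).
smt_status() {
    smt_file="${1:-/sys/devices/system/cpu/smt/active}"
    if [ ! -r "$smt_file" ]; then
        echo "unknown"
        return 2
    fi
    if [ "$(cat "$smt_file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

On a host where hyper-threading was disabled successfully, smt_status should print "disabled".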
8. Disable Kernel Page-table Isolation (KPTI) Protections for Spectre/Meltdown
In some cases, disabling the KPTI protections for the Linux kernel can increase performance by 10-15%. See the Wikipedia page on KPTI for important information if you’re considering this.
It is important to note the consequences that making this modification can have. This modification is only recommended for SDRs and host systems that absolutely require the best performance and are not connected to the internet. Some more useful links regarding security vulnerabilities are:
Meltdown (security vulnerability)
Spectre (security vulnerability)
-
Disabling KPTI protections can be done by adding the parameters below inside the quotes of the GRUB_CMDLINE_LINUX_DEFAULT="" line in your /etc/default/grub file:
pti=off spectre_v2=off l1tf=off nospec_store_bypass_disable no_stf_barrier
-
After modifying this grub file, you’ll have to run the following command to update the GRUB configuration, and then reboot:
$ sudo update-grub
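After rebooting, you can check that the kernel actually booted with the new parameters by inspecting /proc/cmdline. The following is a minimal sketch; the function name missing_mitigation_params is hypothetical, the parameter list is copied from the step above, and the optional argument exists only so the file can be overridden for testing.

```shell
# Hypothetical helper: print each parameter from the list above that is
# absent from the running kernel's command line.
# $1 (optional): command line file, defaults to /proc/cmdline.
missing_mitigation_params() {
    cmdline_file="${1:-/proc/cmdline}"
    # Pad with spaces so every parameter can be matched as a whole word.
    cmdline=" $(cat "$cmdline_file") "
    for p in pti=off spectre_v2=off l1tf=off nospec_store_bypass_disable no_stf_barrier; do
        case "$cmdline" in
            *" $p "*) ;;                 # present, nothing to report
            *) echo "$p" ;;              # absent from the command line
        esac
    done
}
```

No output from missing_mitigation_params means all of the parameters are in effect for the current boot.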