  1. Aug 13, 2020
    • tools/bpftool: Fix compilation warnings in 32-bit mode · 09f44b75
      Andrii Nakryiko authored
      
      Fix a few compilation warnings in bpftool when compiling in 32-bit mode.
      Abstract the u64-to-pointer conversion away into a helper function.
      
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200813204945.1020225-2-andriin@fb.com
    • doc: Add link to bpf helpers man page · a62f68c1
      Joe Stringer authored
      
      The bpf-helpers(7) man pages provide an invaluable description of the
      functions that an eBPF program can call at runtime. Link them here.
      
      Signed-off-by: Joe Stringer <joe@wand.net.nz>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200813180807.2821735-1-joe@wand.net.nz
    • bpf, selftests: Add tests to sock_ops for loading sk · 9efa9e49
      John Fastabend authored
      
      Add tests that directly access the sock_ops sk field, then use it to
      ensure a bad pointer access will fault if something goes wrong.
      We do three tests:
      
      The first test ensures that when we read the sock_ops sk pointer into
      the same register we don't fault, as described earlier. Here r9 is
      chosen as the temp register. The xlated code is,
      
        36: (7b) *(u64 *)(r1 +32) = r9
        37: (61) r9 = *(u32 *)(r1 +28)
        38: (15) if r9 == 0x0 goto pc+3
        39: (79) r9 = *(u64 *)(r1 +32)
        40: (79) r1 = *(u64 *)(r1 +0)
        41: (05) goto pc+1
        42: (79) r9 = *(u64 *)(r1 +32)
      
      The second test ensures the temp register selection does not collide
      with the in-use register r9. Here r8 is chosen because r9 holds the
      sock_ops pointer. The xlated code is as follows,
      
        46: (7b) *(u64 *)(r9 +32) = r8
        47: (61) r8 = *(u32 *)(r9 +28)
        48: (15) if r8 == 0x0 goto pc+3
        49: (79) r8 = *(u64 *)(r9 +32)
        50: (79) r9 = *(u64 *)(r9 +0)
        51: (05) goto pc+1
        52: (79) r8 = *(u64 *)(r9 +32)
      
      And finally, ensure we didn't break the base case where dst_reg does
      not equal the source register,
      
        56: (61) r2 = *(u32 *)(r1 +28)
        57: (15) if r2 == 0x0 goto pc+1
        58: (79) r2 = *(u64 *)(r1 +0)
      
      Notice it takes an extra four instructions when the src reg is the
      same as the dst reg: one to save the register, two to restore it
      depending on the branch taken, and a goto to jump over the second
      restore.
      
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/159718355325.4728.4163036953345999636.stgit@john-Precision-5820-Tower
    • bpf, selftests: Add tests for sock_ops load with r9, r8.r7 registers · 8e0c1517
      John Fastabend authored
      
      Loads in the sock_ops case require extra logic when using high
      registers to ensure the correct temporary value is used. We need to
      ensure the temp register is neither the src_reg nor the dst_reg.
      Let's add an asm test to force this logic to be triggered.
      
      The xlated code is here,
      
        30: (7b) *(u64 *)(r9 +32) = r7
        31: (61) r7 = *(u32 *)(r9 +28)
        32: (15) if r7 == 0x0 goto pc+2
        33: (79) r7 = *(u64 *)(r9 +0)
        34: (63) *(u32 *)(r7 +916) = r8
        35: (79) r7 = *(u64 *)(r9 +32)
      
      Notice r9 and r8 are not used for temp registers and r7 is chosen.
      
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/159718353345.4728.8805043614257933227.stgit@john-Precision-5820-Tower
    • bpf, selftests: Add tests for ctx access in sock_ops with single register · 86ed4be6
      John Fastabend authored
      
      To verify the fix ("bpf: sock_ops ctx access may stomp registers in corner case")
      we want to force the compiler to generate the following code when
      accessing a field with BPF_TCP_SOCK_GET_COMMON,
      
           r1 = *(u32 *)(r1 + 96) // r1 is skops ptr
      
      Rather than depend on clang to do this, we add the test with inline
      asm to the tcpbpf test. This saves us from having to create another
      runner and ensures that if we break this again, test_tcpbpf will crash.
      
      With above code we get the xlated code,
      
        11: (7b) *(u64 *)(r1 +32) = r9
        12: (61) r9 = *(u32 *)(r1 +28)
        13: (15) if r9 == 0x0 goto pc+4
        14: (79) r9 = *(u64 *)(r1 +32)
        15: (79) r1 = *(u64 *)(r1 +0)
        16: (61) r1 = *(u32 *)(r1 +2348)
        17: (05) goto pc+1
        18: (79) r9 = *(u64 *)(r1 +32)
      
      We also add the normal case where src_reg != dst_reg so we can compare
      code generation easily from llvm-objdump and ensure that case continues
      to work correctly. The normal code is xlated to,
      
        20: (b7) r1 = 0
        21: (61) r1 = *(u32 *)(r3 +28)
        22: (15) if r1 == 0x0 goto pc+2
        23: (79) r1 = *(u64 *)(r3 +0)
        24: (61) r1 = *(u32 *)(r1 +2348)
      
      Where the temp variable is not used.
      
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/159718351457.4728.3295119261717842496.stgit@john-Precision-5820-Tower
    • bpf: sock_ops sk access may stomp registers when dst_reg = src_reg · 84f44df6
      John Fastabend authored
      
      Similar to the patch ("bpf: sock_ops ctx access may stomp registers"),
      if src_reg == dst_reg when reading the sk field of a sock_ops struct
      we generate the xlated code,
      
        53: (61) r9 = *(u32 *)(r9 +28)
        54: (15) if r9 == 0x0 goto pc+3
        56: (79) r9 = *(u64 *)(r9 +0)
      
      This stomps on the r9 reg to do the sk_fullsock check, and then when
      reading the skops->sk field we get the sk_fullsock value instead of
      the sk pointer. To fix this, use the pattern noted in the previous
      fix: use the temp field to save/restore the register used to do the
      sk_fullsock check.
      
      After the fix the generated xlated code reads,
      
        52: (7b) *(u64 *)(r9 +32) = r8
        53: (61) r8 = *(u32 *)(r9 +28)
        54: (15) if r8 == 0x0 goto pc+3
        55: (79) r8 = *(u64 *)(r9 +32)
        56: (79) r9 = *(u64 *)(r9 +0)
        57: (05) goto pc+1
        58: (79) r8 = *(u64 *)(r9 +32)
      
      Here the r9 register was in-use, so r8 is chosen as the temporary
      register. On line 52 r8 is saved to the temp variable, and on line 55
      it is restored in case fullsock != 0. Finally, we handle the
      fullsock == 0 case by restoring at line 58.
      
      This adds a new macro, SOCK_OPS_GET_SK. It is almost possible to
      merge this with SOCK_OPS_GET_FIELD, but I found the extra branch
      logic a bit more confusing than just adding a new macro, despite a
      bit of duplicated code.
      
      Fixes: 1314ef56 ("bpf: export bpf_sock for BPF_PROG_TYPE_SOCK_OPS prog type")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/159718349653.4728.6559437186853473612.stgit@john-Precision-5820-Tower
    • bpf: sock_ops ctx access may stomp registers in corner case · fd09af01
      John Fastabend authored
      
      I had a sockmap program that after doing some refactoring started spewing
      this splat at me:
      
      [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
      [...]
      [18610.807359] Call Trace:
      [18610.807370]  ? 0xffffffffc114d0d5
      [18610.807382]  __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
      [18610.807391]  tcp_connect+0x895/0xd50
      [18610.807400]  tcp_v4_connect+0x465/0x4e0
      [18610.807407]  __inet_stream_connect+0xd6/0x3a0
      [18610.807412]  ? __inet_stream_connect+0x5/0x3a0
      [18610.807417]  inet_stream_connect+0x3b/0x60
      [18610.807425]  __sys_connect+0xed/0x120
      
      After some debugging I was able to build this simple reproducer,
      
       __section("sockops/reproducer_bad")
       int bpf_reproducer_bad(struct bpf_sock_ops *skops)
       {
              volatile __maybe_unused __u32 i = skops->snd_ssthresh;
              return 0;
       }
      
      And along the way noticed that the below program ran without a splat,
      
      __section("sockops/reproducer_good")
      int bpf_reproducer_good(struct bpf_sock_ops *skops)
      {
              volatile __maybe_unused __u32 i = skops->snd_ssthresh;
              volatile __maybe_unused __u32 family;
      
              compiler_barrier();
      
              family = skops->family;
              return 0;
      }
      
      So I decided to check out the code we generate for the above two
      programs and noticed each generates the BPF code you would expect,
      
      0000000000000000 <bpf_reproducer_bad>:
      ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
             0:       r1 = *(u32 *)(r1 + 96)
             1:       *(u32 *)(r10 - 4) = r1
      ;       return 0;
             2:       r0 = 0
             3:       exit
      
      0000000000000000 <bpf_reproducer_good>:
      ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
             0:       r2 = *(u32 *)(r1 + 96)
             1:       *(u32 *)(r10 - 4) = r2
      ;       family = skops->family;
             2:       r1 = *(u32 *)(r1 + 20)
             3:       *(u32 *)(r10 - 8) = r1
      ;       return 0;
             4:       r0 = 0
             5:       exit
      
      So we get reasonable assembly, but still something was causing the
      NULL pointer dereference. So, we load the programs and dump the
      xlated version, observing that line 0 above, 'r* = *(u32 *)(r1 +96)',
      is going to be translated by the skops access helpers.
      
      int bpf_reproducer_bad(struct bpf_sock_ops * skops):
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         0: (61) r1 = *(u32 *)(r1 +28)
         1: (15) if r1 == 0x0 goto pc+2
         2: (79) r1 = *(u64 *)(r1 +0)
         3: (61) r1 = *(u32 *)(r1 +2340)
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         4: (63) *(u32 *)(r10 -4) = r1
      ; return 0;
         5: (b7) r0 = 0
         6: (95) exit
      
      int bpf_reproducer_good(struct bpf_sock_ops * skops):
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         0: (61) r2 = *(u32 *)(r1 +28)
         1: (15) if r2 == 0x0 goto pc+2
         2: (79) r2 = *(u64 *)(r1 +0)
         3: (61) r2 = *(u32 *)(r2 +2340)
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         4: (63) *(u32 *)(r10 -4) = r2
      ; family = skops->family;
         5: (79) r1 = *(u64 *)(r1 +0)
         6: (69) r1 = *(u16 *)(r1 +16)
      ; family = skops->family;
         7: (63) *(u32 *)(r10 -8) = r1
      ; return 0;
         8: (b7) r0 = 0
         9: (95) exit
      
      Then we look at lines 0 and 2 above. In the good case we do the zero
      check in r2 and then load 'r1 + 0' at line 2. A quick cross-check
      against the bpf_sock_ops conversion confirms that this is the
      'struct sock *sk' pointer field. But, in the bad case,
      
         0: (61) r1 = *(u32 *)(r1 +28)
         1: (15) if r1 == 0x0 goto pc+2
         2: (79) r1 = *(u64 *)(r1 +0)
      
      Oh no, we read 'r1 +28' into r1, which is skops->fullsock, and then
      in line 2 we read 'r1 +0' as a pointer. Now jumping back to our splat,
      
      [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
      
      The 0x01 makes sense because that is exactly the fullsock value, and
      it's not a valid dereference, so we splat.
      
      To fix this we need to guard the case when a program does a sock_ops
      field access with src_reg == dst_reg. This is already handled in the
      load case, where the ctx_access handler uses a tmp register, being
      careful to store the old value and restore it. To fix the get case,
      test if src_reg == dst_reg and, in that case, do the is_fullsock test
      in the temporary register, remembering to restore the temporary
      register before writing to either dst_reg or src_reg to avoid
      smashing the pointer into the struct holding the tmp variable.
      
      With this fix, the inline code added to test_tcpbpf_kern is now
      generated correctly, from
      
        9: r2 = *(u32 *)(r2 + 96)
      
      to xlated code,
      
        12: (7b) *(u64 *)(r2 +32) = r9
        13: (61) r9 = *(u32 *)(r2 +28)
        14: (15) if r9 == 0x0 goto pc+4
        15: (79) r9 = *(u64 *)(r2 +32)
        16: (79) r2 = *(u64 *)(r2 +0)
        17: (61) r2 = *(u32 *)(r2 +2348)
        18: (05) goto pc+1
        19: (79) r9 = *(u64 *)(r2 +32)
      
      And in the normal case we keep the original code, because really this
      is an edge case. From this,
      
        9: r2 = *(u32 *)(r6 + 96)
      
      to xlated code,
      
        22: (61) r2 = *(u32 *)(r6 +28)
        23: (15) if r2 == 0x0 goto pc+2
        24: (79) r2 = *(u64 *)(r6 +0)
        25: (61) r2 = *(u32 *)(r2 +2348)
      
      So, three additional instructions if dst == src register, but I
      scanned my current code base and did not see this pattern anywhere,
      so it should not be a big deal. Further, it seems no one else has hit
      this, or at least reported it, so it must be a fairly rare pattern.
      
      Fixes: 9b1f3d6e ("bpf: Refactor sock_ops_convert_ctx_access")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/159718347772.4728.2781381670567919577.stgit@john-Precision-5820-Tower
    • libbpf: Prevent overriding errno when logging errors · 23ab656b
      Toke Høiland-Jørgensen authored
      
      Turns out there were a few more instances where libbpf didn't save
      the errno before writing an error message, causing errno to be
      overridden by the printf() return value and the error to disappear
      if logging is enabled.
      
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200813142905.160381-1-toke@redhat.com
    • bpf: Iterate through all PT_NOTE sections when looking for build id · b33164f2
      Jiri Olsa authored
      
      Currently, when we look for the build id within a bpf_get_stackid
      helper call, we check only the first NOTE section and fail if the
      build id is not there.

      However, on some systems (Fedora) there can be multiple NOTE sections
      in binaries, and the build id data is not always in the first one, like:
      
        $ readelf -a /usr/bin/ls
        ...
        [ 2] .note.gnu.propert NOTE             0000000000000338  00000338
             0000000000000020  0000000000000000   A       0     0     8358
        [ 3] .note.gnu.build-i NOTE             0000000000000358  00000358
             0000000000000024  0000000000000000   A       0     0     437c
        [ 4] .note.ABI-tag     NOTE             000000000000037c  0000037c
        ...
      
      So the stack_map_get_build_id function will fail on build id
      retrieval and fall back to BPF_STACK_BUILD_ID_IP.
      
      This patch changes the stack_map_get_build_id code to iterate through
      all the NOTE sections and try to get the build id data from each of
      them.
      
      When tracing on the sched_switch tracepoint, which does a
      bpf_get_stackid helper call, during a kernel build, I can see about a
      60% increase in successful build id retrievals. The rest seem to fail
      with -EFAULT.
      
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200812123102.20032-1-jolsa@kernel.org
    • libbpf: Handle GCC built-in types for Arm NEON · 702eddc7
      Jean-Philippe Brucker authored
      
      When building Arm NEON (SIMD) code from lib/raid6/neon.uc, GCC emits
      DWARF information using a base type "__Poly8_t", which is internal to
      GCC and not recognized by Clang. This causes build failures when
      building with Clang a vmlinux.h generated from an arm64 kernel that was
      built with GCC.
      
      	vmlinux.h:47284:9: error: unknown type name '__Poly8_t'
      	typedef __Poly8_t poly8x16_t[16];
      	        ^~~~~~~~~
      
      The polyX_t types are defined as unsigned integers in the "Arm C
      Language Extension" document (101028_Q220_00_en). Emit typedefs based on
      standard integer types for the GCC internal types, similar to those
      emitted by Clang.
      
      Including linux/kernel.h to use ARRAY_SIZE() incidentally redefined
      max(), causing a build bug due to different types; hence the
      seemingly unrelated change.
      
      Reported-by: Jakov Petrina <jakov.petrina@sartura.hr>
      Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200812143909.3293280-1-jean-philippe@linaro.org
    • tools/bpftool: Make skeleton code C++17-friendly by dropping typeof() · 8faf7fc5
      Andrii Nakryiko authored
      
      It seems the C++17 standard mode doesn't recognize typeof() anymore.
      This can be tested by compiling the test_cpp test with the -std=c++17
      or -std=c++1z options. The use of typeof in skeleton-generated code
      is unnecessary; all types are well-known at code generation time, so
      remove all typeof()'s to make the skeleton code more future-proof
      when interacting with C++ compilers.
      
      Fixes: 985ead41 ("bpftool: Add skeleton codegen command")
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200812025907.1371956-1-andriin@fb.com
    • bpf: Fix XDP FD-based attach/detach logic around XDP_FLAGS_UPDATE_IF_NOEXIST · 068d9d1e
      Andrii Nakryiko authored
      
      Enforce XDP_FLAGS_UPDATE_IF_NOEXIST only if new BPF program to be attached is
      non-NULL (i.e., we are not detaching a BPF program).
      
      Fixes: d4baa936 ("bpf, xdp: Extract common XDP program attachment logic")
      Reported-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20200812022923.1217922-1-andriin@fb.com
  5. Aug 07, 2020
    • bpf: Delete repeated words in comments · b8c1a309
      Randy Dunlap authored
      
      Drop repeated words in kernel/bpf/: {has, the}
      
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200807033141.10437-1-rdunlap@infradead.org
    • selftests/bpf: Fix silent Makefile output · d5ca5905
      Andrii Nakryiko authored
      
      99aacebe ("selftests: do not use .ONESHELL") removed .ONESHELL, which
      changes how a Makefile "silences" multi-command target recipes.
      selftests/bpf's Makefile relied (somewhat unknowingly) on the
      .ONESHELL behavior of silencing all commands within a recipe if the
      first command contains the @ symbol. Removing .ONESHELL exposed this
      hack.

      This patch fixes the issue by explicitly silencing each command with
      $(Q).

      Also, explicitly define a fallback rule for building *.o from *.c,
      instead of relying on the non-silent inherited rule. This was causing
      non-silent output for the bench.o object file.
      
      Fixes: 92f7440e ("selftests/bpf: More succinct Makefile output")
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200807033058.848677-1-andriin@fb.com
    • bpf, doc: Remove references to warning message when using bpf_trace_printk() · 7fb20f9e
      Alan Maguire authored
      
      The BPF helper bpf_trace_printk() no longer uses trace_printk(); it
      now triggers a dedicated trace event. Hence the described warning is
      no longer present, so remove the discussion of it, as it may confuse
      people.
      
      Fixes: ac5a72ea ("bpf: Use dedicated bpf_trace_printk event instead of trace_printk()")
      Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/1596801029-32395-1-git-send-email-alan.maguire@oracle.com
    • drivers/net/wan/lapbether: Added needed_headroom and a skb->len check · c7ca03c2
      Xie He authored
      
      1. Added an skb->len check

      This driver expects upper layers to include a pseudo header of 1 byte
      when passing down an skb for transmission. The driver reads this
      1-byte header. This patch adds an skb->len check before reading the
      header to make sure the header exists.
      
      2. Changed to use needed_headroom instead of hard_header_len to request
      necessary headroom to be allocated
      
      In net/packet/af_packet.c, the function packet_snd first reserves a
      headroom of length (dev->hard_header_len + dev->needed_headroom).
      Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header,
      which calls dev->header_ops->create, to create the link layer header.
      If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of
      length (dev->hard_header_len) and assumes the user will provide the
      appropriate link layer header.
      
      So according to the logic of af_packet.c, dev->hard_header_len should
      be the length of the header that would be created by
      dev->header_ops->create.
      
      However, this driver doesn't provide dev->header_ops, so logically
      dev->hard_header_len should be 0.
      
      So we should use dev->needed_headroom instead of dev->hard_header_len
      to request necessary headroom to be allocated.
      
      This change fixes kernel panic when this driver is used with AF_PACKET
      SOCK_RAW sockets.
      
      Call stack when panic:
      
      [  168.399197] skbuff: skb_under_panic: text:ffffffff819d95fb len:20
      put:14 head:ffff8882704c0a00 data:ffff8882704c09fd tail:0x11 end:0xc0
      dev:veth0
      ...
      [  168.399255] Call Trace:
      [  168.399259]  skb_push.cold+0x14/0x24
      [  168.399262]  eth_header+0x2b/0xc0
      [  168.399267]  lapbeth_data_transmit+0x9a/0xb0 [lapbether]
      [  168.399275]  lapb_data_transmit+0x22/0x2c [lapb]
      [  168.399277]  lapb_transmit_buffer+0x71/0xb0 [lapb]
      [  168.399279]  lapb_kick+0xe3/0x1c0 [lapb]
      [  168.399281]  lapb_data_request+0x76/0xc0 [lapb]
      [  168.399283]  lapbeth_xmit+0x56/0x90 [lapbether]
      [  168.399286]  dev_hard_start_xmit+0x91/0x1f0
      [  168.399289]  ? irq_init_percpu_irqstack+0xc0/0x100
      [  168.399291]  __dev_queue_xmit+0x721/0x8e0
      [  168.399295]  ? packet_parse_headers.isra.0+0xd2/0x110
      [  168.399297]  dev_queue_xmit+0x10/0x20
      [  168.399298]  packet_sendmsg+0xbf0/0x19b0
      ......
      
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Martin Schiller <ms@dev.tdt.de>
      Cc: Brian Norris <briannorris@chromium.org>
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>