renameat2 on redhat8 & ubuntu18 #145

Open
petersilva opened this issue May 30, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@petersilva
Contributor

Running the mirroring tests, we noticed that some renames are not caught: the ones that use renameat2. In an strace we see the renameat2 calls happen, but they do not get intercepted the way renameat or rename calls do. The call bypasses the shim library and goes directly to the lower-level one.

This is observed on both ubuntu 18, and redhat 8.

It's fine on ubuntu22 and redhat 9.

@petersilva added the bug label May 30, 2024
@petersilva
Contributor Author

@reidsunderland I reproduced it on ubuntu 18.

@petersilva
Contributor Author

perhaps relevant: https://www.phoronix.com/news/Glibc-2.28-Unicode-11-Renameat2

glibc added the renameat2 routine after the release of these two OS's.

@reidsunderland
Member

From the shim_post test, mv correctly triggers the shim when the destination file/directory already exists. If the destination file/directory does not exist, the shim is not triggered.

Moving a file when the destination file already exists (works):

[pid 438104] renameat2(AT_FDCWD, "haha", AT_FDCWD, "hihi", RENAME_NOREPLACE) = -1 EEXIST (File exists)
[pid 438104] stat("hihi", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
[pid 438104] lstat("haha", {st_mode=S_IFLNK|0777, st_size=4, ...}) = 0
[pid 438104] newfstatat(AT_FDCWD, "hihi", {st_mode=S_IFREG|0664, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid 438104] stat("haha", {st_mode=S_IFREG|0664, st_size=5, ...}) = 0
[pid 438104] geteuid()                  = 1006
[pid 438104] stat("hihi", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
[pid 438104] geteuid()                  = 1006
[pid 438104] getegid()                  = 1006
[pid 438104] getuid()                   = 1006
[pid 438104] getgid()                   = 1006
[pid 438104] access("hihi", W_OK)       = 0
[pid 438104] write(2, "SR_SHIMDEBUG 1 438104 0.0342226 ", 32SR_SHIMDEBUG 1 438104 0.0342226 ) = 32
[pid 438104] write(2, "rename haha hihi\n", 17rename haha hihi
) = 17
[pid 438104] write(2, "SR_SHIMDEBUG 1 438104 0.0342979 ", 32SR_SHIMDEBUG 1 438104 0.0342979 ) = 32
[pid 438104] write(2, " renameorlink haha hihi\n", 24 renameorlink haha hihi
) = 24
[pid 438104] renameat(AT_FDCWD, "haha", AT_FDCWD, "hihi") = 0

Moving a file when the destination does not exist:

[pid 438106] renameat2(AT_FDCWD, "/home/sarra/test/hoho_my_darling.txt", AT_FDCWD, "/home/sarra/test/hoho2.log", RENAME_NOREPLACE) = 0
[pid 438106] lseek(0, 0, SEEK_CUR)      = -1 ESPIPE (Illegal seek)
[pid 438106] write(2, "SR_SHIMDEBUG 16 438106 0.138418 ", 32SR_SHIMDEBUG 16 438106 0.138418 ) = 32
[pid 438106] write(2, " is_duped!\n", 11 is_duped!
) = 11
[pid 438106] close(0)                   = 0
[pid 438106] fcntl(1, F_GETFL)          = 0x8001 (flags O_WRONLY|O_LARGEFILE)
[pid 438106] write(2, "SR_SHIMDEBUG 5 438106 0.138565 ", 31SR_SHIMDEBUG 5 438106 0.138565 ) = 31
[pid 438106] write(2, " fclose 0x7f2d8a8836e0 fd=1 fdst"..., 52 fclose 0x7f2d8a8836e0 fd=1 fdstat=100001, starting
) = 52
[pid 438106] lstat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[pid 438106] lstat("/proc/self", {st_mode=S_IFLNK|0777, st_size=0, ...}) = 0
[pid 438106] readlink("/proc/self", "438106", 4095) = 6
[pid 438106] lstat("/proc/438106", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[pid 438106] lstat("/proc/438106/fd", {st_mode=S_IFDIR|0500, st_size=0, ...}) = 0
[pid 438106] lstat("/proc/438106/fd/1", {st_mode=S_IFLNK|0300, st_size=64, ...}) = 0
[pid 438106] readlink("/proc/438106/fd/1", "/home/sarra/metpx-sr3c/straceout", 4095) = 32
[pid 438106] lstat("/home", {st_mode=S_IFDIR|0755, st_size=34, ...}) = 0
[pid 438106] lstat("/home/sarra", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
[pid 438106] lstat("/home/sarra/metpx-sr3c", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0

From the RENAME(2) man page:

Note: There is no glibc wrapper for renameat2(); see **NOTES**
NOTES
       Glibc does not provide a wrapper for the renameat2() system call; call it using syscall(2).

   Glibc notes
       On older kernels where renameat() is unavailable, the glibc wrapper function falls back to the use of rename().  When oldpath and newpath are relative pathnames, glibc constructs pathnames based on the symbolic links in /proc/self/fd that
       correspond to the olddirfd and newdirfd arguments.

and SYSCALL(2):

SYSCALL(2)                 Linux Programmer's Manual                 SYSCALL(2)

NAME
       syscall - indirect system call

SYNOPSIS
       #define _GNU_SOURCE         /* See feature_test_macros(7) */
       #include <unistd.h>
       #include <sys/syscall.h>   /* For SYS_xxx definitions */

       long syscall(long number, ...);

DESCRIPTION
       syscall()  is a small library function that invokes the system call whose assembly language interface has the specified number with the specified arguments.  Employing syscall() is useful, for example, when invoking a system call that has
       no wrapper function in the C library.

       syscall() saves CPU registers before making the system call, restores the registers upon return from the system call, and stores any error code returned by the system call in errno(3) if an error occurs.

       Symbolic constants for system call numbers can be found in the header file <sys/syscall.h>.

RETURN VALUE
       The return value is defined by the system call being invoked.  In general, a 0 return value indicates success.  A -1 return value indicates an error, and an error code is stored in errno.

@petersilva and I are guessing that mv is calling the renameat2 syscall via glibc's syscall function. When the destination file already exists, this returns -1 and mv probably falls back to calling the regular renameat, which the shim does intercept.

$ mv --version
mv (GNU coreutils) 8.30
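
A rough sketch of the behaviour we are guessing at, to make the two strace cases above easier to compare. This is not taken from the coreutils source; `move_like_mv` and its `used_fallback` output parameter are hypothetical names, and the set of errno values that trigger the fallback is an assumption.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <stdio.h>        /* rename() */
#include <sys/syscall.h>  /* SYS_renameat2 */
#include <unistd.h>       /* syscall() */

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)  /* not defined by <stdio.h> before glibc 2.28 */
#endif

/*
 * Hypothetical helper: try the raw renameat2 syscall first, and fall back
 * to plain rename() when the destination exists or the call is unsupported.
 * *used_fallback is set to 1 when the rename() path was taken.
 */
int move_like_mv(const char *src, const char *dst, int *used_fallback)
{
    *used_fallback = 0;
#ifdef SYS_renameat2
    /* This call goes through syscall(), so a shim that only wraps
     * rename()/renameat() never sees it. */
    if (syscall(SYS_renameat2, AT_FDCWD, src, AT_FDCWD, dst,
                RENAME_NOREPLACE) == 0)
        return 0;
    if (errno != EEXIST && errno != ENOSYS && errno != EINVAL)
        return -1;
#endif
    /* This path is the one the shim does intercept. */
    *used_fallback = 1;
    return rename(src, dst);
}
```

With a destination that does not exist, the first call succeeds and nothing that the shim wraps is ever called, which matches the second trace.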

I added a basic implementation of long int syscall (long int __sysno, ...) in libsr3shim and it does run when the mv happens. __sysno for renameat2 is 316 on x86_64.

But there's still work to do to get it to call renameorlink when __sysno==316 and the real syscall otherwise.
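
A minimal sketch of that remaining dispatch step, under some assumptions: `shim_syscall_dispatch`, `shim_handle_renameat2` and `shim_rename_calls` are hypothetical names (the handler stands in for renameorlink, whose real signature is not shown here), and the function is deliberately not named `syscall` so it can be tried outside the shim.

```c
#define _GNU_SOURCE
#include <fcntl.h>        /* AT_FDCWD, for callers */
#include <stdarg.h>
#include <sys/syscall.h>  /* SYS_renameat2 */
#include <unistd.h>       /* syscall() */

/* Only here to make the sketch observable: counts rename dispatches. */
int shim_rename_calls = 0;

/* Hypothetical stand-in for the shim's rename handling. */
long shim_handle_renameat2(int olddirfd, const char *oldpath,
                           int newdirfd, const char *newpath,
                           unsigned int flags)
{
    shim_rename_calls++;
    /* A real shim would also record or post the rename here. */
    return syscall(SYS_renameat2, olddirfd, oldpath, newdirfd, newpath, flags);
}

/* Hypothetical dispatcher: the body an interposed syscall() could use. */
long shim_syscall_dispatch(long number, ...)
{
    long a[6];
    va_list ap;
    int i;

    /* Linux syscalls take at most six arguments. Reading six slots even
     * when the caller passed fewer is the usual interposer shortcut (an
     * assumption about the ABI, not strictly portable C); the extra slots
     * are ignored by the kernel. */
    va_start(ap, number);
    for (i = 0; i < 6; i++)
        a[i] = va_arg(ap, long);
    va_end(ap);

    if (number == SYS_renameat2)
        return shim_handle_renameat2((int)a[0], (const char *)a[1],
                                     (int)a[2], (const char *)a[3],
                                     (unsigned int)a[4]);

    /* Everything else goes to the real syscall untouched. */
    return syscall(number, a[0], a[1], a[2], a[3], a[4], a[5]);
}
```

Comparing against `SYS_renameat2` rather than the literal 316 keeps the same code usable on architectures that number the call differently.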

@petersilva
Contributor Author

petersilva commented Jun 1, 2024

clarification... syscall is a means of reaching an entry point that is missing in glibc, by going to the kernel directly.
The problem here is not that the kernel does not support renameat2, but rather that glibc doesn't. It's kind of the opposite of the problem described in the man page.
UPDATE: I think I was confused... ignore the above. The part below is still good:

fwiw... googling a bit, I found this:

https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=c50cf67bd7ff70525f3cb4074f0d9cc1f5c6cf9c

+#ifdef HAVE_RENAMEAT2
+  ret_val = renameat2 (fd1, src, fd2, dst, flags);
+  err = errno;
+#elif defined SYS_renameat2
   ret_val = syscall (SYS_renameat2, fd1, src, fd2, dst, flags);
   err = errno;
 #elif defined RENAME_EXCL


and I think bits/syscall.h (pulled in by <sys/syscall.h>) provides SYS_renameat2

@petersilva
Contributor Author

./x86_64-linux-gnu/asm/unistd_64.h:#define __NR_renameat2 316

@reidsunderland
Member

reidsunderland commented Jun 11, 2024

I have an implementation of renameat2 via syscall() working on branch issue145.

The syscall_init_done, syscall_fn_ptr stuff isn't needed if we can't/don't want to try to deal with any syscalls other than 316/renameat2.

@petersilva
Contributor Author

petersilva commented Jun 12, 2024

I think the ordinary (non-debug) message needs to include the syscall number when it's not renameat2, since users usually don't run with shimdebug.
This is the risky part: we will build the new version and hope that they don't hit a different syscall, but if they do, we want to know which one. We will then circle back and add that implementation if need be.

I expect we won't run into other such syscalls, but without the ability to run a wide variety of jobs (like the user has), we can't know. Printing the syscall number is necessary for that eventuality.
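
Something along these lines is what I have in mind for the message. It is only a sketch: `describe_unhandled_syscall` and the "sr3shim:" prefix are made up for illustration, not the shim's existing logging.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the always-on message for a syscall number
 * the shim does not handle. Returns what snprintf() returns, i.e. the
 * length the full message needs.
 */
int describe_unhandled_syscall(char *buf, size_t len, long number)
{
    return snprintf(buf, len,
                    "sr3shim: passing through unhandled syscall %ld\n",
                    number);
}
```

The caller would write the buffer to stderr regardless of the debug level, so a user report tells us which number to look up.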

The other thing is... we need some kind of test or macro for the problem... like:

#ifdef MISSING_RENAMEAT2

#endif

and then add -DMISSING_RENAMEAT2 to the compilation flags,

so that we don't include it on newer OS versions. We don't want to override syscall for all future versions, just the known broken ones. I don't know if we need some kind of autoconf test. We know from the shim tests failing, right? We can just add that -D to the CFLAGS when we see that symptom.

@petersilva
Contributor Author

petersilva commented Jun 12, 2024

last comment deleted... it was wrong. OK, so renameorlink does the rename, but how it does it when glibc is missing the renameat2 wrapper might be a problem. We might need another #ifdef stanza that calls syscall(316, ...) there in place of renameat2. I worry that if the flags are non-zero, we won't behave the way the caller expects.
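
A sketch of what such a stanza could look like, with the flags passed through unchanged. Assumptions: `shim_real_renameat2` and `SHIM_HAVE_RENAMEAT2_WRAPPER` are hypothetical names, the wrapper is detected by glibc version (renameat2 was added in glibc 2.28), and in the real shim the wrapper would have to be reached through a dlsym'd pointer rather than called by name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD, for callers */
#include <stdio.h>        /* renameat(); renameat2() on glibc >= 2.28 */
#include <sys/syscall.h>  /* SYS_renameat2 */
#include <unistd.h>       /* syscall() */

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)  /* kernel ABI value, for older glibc */
#endif

#if defined(__GLIBC__)
# if __GLIBC_PREREQ(2, 28)
#  define SHIM_HAVE_RENAMEAT2_WRAPPER 1
# endif
#endif

/* Hypothetical helper: perform the rename without dropping the flags. */
int shim_real_renameat2(int olddirfd, const char *oldpath,
                        int newdirfd, const char *newpath,
                        unsigned int flags)
{
    /* With no flags, renameat2 is equivalent to renameat. */
    if (flags == 0)
        return renameat(olddirfd, oldpath, newdirfd, newpath);

#if defined(SHIM_HAVE_RENAMEAT2_WRAPPER)
    return renameat2(olddirfd, oldpath, newdirfd, newpath, flags);
#elif defined(SYS_renameat2)
    return (int)syscall(SYS_renameat2, olddirfd, oldpath,
                        newdirfd, newpath, flags);
#else
    /* Flags cannot be honoured at all; refuse rather than ignore them. */
    errno = ENOSYS;
    return -1;
#endif
}
```

The point of the last branch is that silently falling back to renameat with non-zero flags would, for example, overwrite a destination the caller asked not to replace.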

@reidsunderland
Member

if renameat2_fn_ptr

put the syscall_fn_ptr call in the else.

Testing will hopefully reveal if any other syscalls are used. If it's a small number, we can handle them as well.

@reidsunderland
Member

After more investigation, I discovered:

  1. RHEL 8 includes glibc 2.28 which does provide renameat2.
  2. Ubuntu 18.04 includes glibc 2.27 which does not provide renameat2.

Both have support for renameat2 in the kernel (added to the kernel in 3.15). This explains why 3741afc worked on RedHat 8; our implementation of the syscall function called renameorlink, which successfully called renameat2.

The problem on RedHat 8 is that its mv command was likely never updated to call renameat2 from glibc, so it always calls renameat2 via syscall(316...). So on RedHat 8 we can use glibc's renameat2 in our code, but we still need to intercept syscall, just as on systems with glibc <= 2.27.

We will control whether we intercept syscall by manually defining INTERCEPT_SYSCALL at compile time.
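
Roughly, the guard could look like the sketch below. It is not the code on the branch: `shim_intercepts_syscall` is a hypothetical helper, the `real_syscall` pointer is modeled on the syscall_fn_ptr mentioned earlier, and the six-slot argument read is the usual interposer shortcut rather than strictly portable C.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifdef INTERCEPT_SYSCALL
#include <dlfcn.h>

typedef long (*syscall_fn_t)(long, ...);

/* Compiled in only with -DINTERCEPT_SYSCALL: overrides libc's syscall(). */
long syscall(long number, ...)
{
    static syscall_fn_t real_syscall;  /* resolved on first use */
    long a[6];
    va_list ap;
    int i;

    if (!real_syscall)
        real_syscall = (syscall_fn_t)dlsym(RTLD_NEXT, "syscall");
    if (!real_syscall) {
        errno = ENOSYS;
        return -1;
    }

    va_start(ap, number);
    for (i = 0; i < 6; i++)
        a[i] = va_arg(ap, long);
    va_end(ap);

    /* The shim's renameat2 handling would be dispatched here. */
    return real_syscall(number, a[0], a[1], a[2], a[3], a[4], a[5]);
}
#endif /* INTERCEPT_SYSCALL */

/* Hypothetical helper: lets a test confirm how the library was built. */
int shim_intercepts_syscall(void)
{
#ifdef INTERCEPT_SYSCALL
    return 1;
#else
    return 0;
#endif
}
```

Without the define, the library leaves syscall() alone, so newer OS versions are unaffected.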

@reidsunderland
Member

Testing on our own RedHat 8 VM works fine, but now that we've started testing on the HPC RedHat 8 installs, we've found more syscalls that are being intercepted by the shim but not passed along:

On ppc:

  • 20 getpid or epoll_create1 ?
  • 357 --> renameat2
  • 359 --> getrandom

On x86_64:

  • 316 --> renameat2 ???
  • 318 --> getrandom

Instead of hardcoding the numbers, we need to include <sys/syscall.h> and use its SYS_* constants (e.g. SYS_renameat2), since the numbers differ between architectures.
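
For example, a small lookup like this (a sketch only; `shim_syscall_name` is a hypothetical helper) stays correct on both ppc and x86_64, because the constants come from the headers of whichever architecture we compile on:

```c
#include <stddef.h>
#include <sys/syscall.h>  /* SYS_* constants for the build architecture */

/*
 * Hypothetical helper: name the syscalls seen so far in testing.
 * Returns NULL for any number the shim does not know about.
 */
const char *shim_syscall_name(long number)
{
    switch (number) {
#ifdef SYS_renameat2
    case SYS_renameat2:
        return "renameat2";
#endif
#ifdef SYS_getrandom
    case SYS_getrandom:
        return "getrandom";
#endif
    case SYS_getpid:
        return "getpid";
    default:
        return NULL;
    }
}
```

The `#ifdef` guards are there because older kernel headers may not define the newer constants.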
