fabtests: New fabtest fi_flood to test over subscription of resources #10427

Open
wants to merge 1 commit into
base: main

nikhilnanal
Contributor

  1. MR cache-based registration tests: register and send in batch and sequential modes while flooding the cache beyond the maximum size.
  2. Test receipt of unexpected messages by overwhelming the receiver.

@@ -41,6 +41,7 @@ bin_PROGRAMS = \
functional/fi_rdm_stress \
functional/fi_multi_recv \
functional/fi_bw \
functional/fi_flood \
Contributor

This test is just adding a new mode to the bw test - I would just replace/rename the bw test and add the new testing mode inside. No need to create a whole new test

Contributor

FWIW, AWS CI has a flood_peer test that reuses fi_bw: https://github.com/ofiwg/libfabric/blob/main/fabtests/pytest/efa/test_flood_peer.py#L6

@@ -3270,6 +3270,7 @@ void show_perf(char *name, size_t tsize, int iters, struct timespec *start,
printf("%8.2fs%10.2f%11.2f%11.2f\n",
elapsed / 1000000.0, bytes / (1.0 * elapsed),
usec_per_xfer, 1.0/usec_per_xfer);
printf("-----------------------------------------\n");
Contributor

Remove random prints through this PR (there are a handful)

@@ -0,0 +1,319 @@
/*
* Copyright (c) 2019 Intel Corporation. All rights reserved.
Contributor

Remove year

return ret;

if (opts.machr)
show_perf_mr(opts.transfer_size, opts.window_size, &start, &end, 1,
Contributor

Remove the performance reporting since this is a functional test and has a hardcoded sleep to force unexpected messages. Replace with a PASS/FAIL print

if (ret)
return ret;

ret = ft_tx(ep, remote_fi_addr, 1, &tx_ctx);
Contributor

See the new option recently added that does this FT_OPT_NO_PRE_POSTED_RX

static void mr_close(struct ft_context *ctx_arr, int window_size)
{
	for (int i = 0; i < window_size; i++)
	{
Contributor

drop brackets


return ret;
}
static void mr_close(struct ft_context *ctx_arr, int window_size)
Contributor

Rename to something that describes what's happening a bit more - this makes it sound like it's closing a single MR

}
static void mr_close(struct ft_context *ctx_arr, int window_size)
{
	for (int i = 0; i < window_size; i++)
Contributor

Declare variables outside of for loop
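
Taken together, the three comments on this helper (drop the braces, pick a name that says it closes a whole window of MRs, declare the index outside the loop) suggest a shape like the sketch below. This is only an illustration under stated assumptions: `close_mr_window`, `struct fake_ctx`, `struct fake_mr`, and `fake_close` are hypothetical stand-ins so the snippet compiles without libfabric. The actual test would keep `struct ft_context` and `FT_CLOSE_FID`.

```c
#include <stddef.h>

/* Hypothetical stand-ins so the sketch builds without libfabric. */
struct fake_mr {
	int open;
};

struct fake_ctx {
	struct fake_mr *mr;
};

/* Marks the MR closed and clears the pointer (stand-in for FT_CLOSE_FID). */
static void fake_close(struct fake_mr **mr)
{
	if (*mr) {
		(*mr)->open = 0;
		*mr = NULL;
	}
}

/* Closes every MR in the window; the name is illustrative only. */
static void close_mr_window(struct fake_ctx *ctx_arr, int window_size)
{
	int i;

	for (i = 0; i < window_size; i++)
		fake_close(&ctx_arr[i].mr);
}
```

The index is declared at the top of the function and the single-statement loop body carries no braces, which matches the style requested in the review.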

mr_close(tx_ctx_arr, opts.window_size);
mr_close(rx_ctx_arr, opts.window_size);

printf("sequential memory registration:\n");
Contributor

Make your test prints consistent - capitalize first word, remove new line, and then print pass or fail in your out

	printf("%s\n", ret ? "FAIL" : "PASS");
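
One way to keep every stage's output consistent is to route it through a single small helper. The sketch below is an assumption about how that could look, not code from this PR: `result_str` and `format_result` are hypothetical names, and the helper writes into a buffer only so that the formatting is easy to check in isolation.

```c
#include <stdio.h>

/* Maps a return code to the PASS/FAIL label suggested above. */
static const char *result_str(int ret)
{
	return ret ? "FAIL" : "PASS";
}

/* Formats "<Stage name>: PASS|FAIL" into buf; returns snprintf's result. */
static int format_result(char *buf, size_t len, const char *stage, int ret)
{
	return snprintf(buf, len, "%s: %s", stage, result_str(ret));
}
```

A caller would pass a capitalized stage name such as "Sequential memory registration" and print the buffer followed by a single newline.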

printf("sequential memory registration:\n");
ft_start();
if (opts.dst_addr) {
for (int i = 0; i < opts.window_size; i++) {
Contributor

Declare outside

if (ret)
return ret;

ft_post_tx_buf(ep, remote_fi_addr, opts.transfer_size,
Contributor

Does this return something?

Contributor Author

always returns 0

Contributor

ft_post_tx_buf calls the macro FT_POST which can return an error
https://github.com/ofiwg/libfabric/blob/main/fabtests/common/shared.c#L2172
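
A minimal sketch of the pattern being asked for: check the result of each post inside the loop and propagate the first error. Everything here is an assumption made for illustration. `post_tx_stub` is a hypothetical stand-in for a posting helper that can fail, and `post_window` is a hypothetical wrapper; neither is part of fabtests.

```c
#include <errno.h>

/* Hypothetical stand-in for a post call; fails at one chosen slot. */
static int post_tx_stub(int slot, int fail_at)
{
	return slot == fail_at ? -EAGAIN : 0;
}

/* Posts window_size buffers, stopping at and returning the first error. */
static int post_window(int window_size, int fail_at, int *posted)
{
	int i, ret;

	*posted = 0;
	for (i = 0; i < window_size; i++) {
		ret = post_tx_stub(i, fail_at);
		if (ret)
			return ret;
		(*posted)++;
	}

	return 0;
}
```

With the return value checked, a failed post surfaces immediately instead of showing up later as a hang or timeout while the test waits for completions that were never posted.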

if (!opts.dst_addr)
sleep(sleep_time);

ft_start();
Contributor

Drop performance print and also timers

@zachdworkin
Contributor

Timeout failure
server: fi_flood -e rdm -v -T 1 -p "tcp" -s n1
client: fi_flood -e rdm -v -T 1 -p "tcp" -s n1 n2

fi_eq_sread(): common/shared.c:1169, ret=-4 (Interrupted system call)
server: fi_flood -e msg -v -T 1 -p "tcp" -s n1
client: fi_flood -e msg -v -T 1 -p "tcp" -s n1 n2

@nikhilnanal nikhilnanal force-pushed the mr_cache branch 2 times, most recently from cb5cb65 to 23587c8 Compare October 2, 2024 16:02
@shijin-aws
Contributor

bot:aws:retest

@@ -652,6 +657,7 @@ dummy_man_pages = \
man/man1/fi_getinfo_test.1 \
man/man1/fi_mr_test.1 \
man/man1/fi_bw.1 \
man/man1/fi_flood.1 \
Contributor

In the runfabtests/exclude changes, it looks like you intend to replace bw with flood (which I agree with) but here you're adding a new test instead of renaming/adding to bw

if (ret)
return ret;

ft_post_tx_buf(ep, remote_fi_addr, opts.transfer_size,
Contributor

ft_post_tx_buf calls the macro FT_POST which can return an error
https://github.com/ofiwg/libfabric/blob/main/fabtests/common/shared.c#L2172


#include <shared.h>

int sleep_time = 0;
Contributor

Declare as static

@@ -362,6 +363,10 @@ functional_fi_bw_SOURCES = \
functional/bw.c
functional_fi_bw_LDADD = libfabtests.la

functional_fi_flood_SOURCES = \
Contributor

Also update the windows build files as well - fabtests.vcxproj and fabtests.vcxproj.filters

FT_CLOSE_FID(tx_ctx_arr[i].mr);
}
}
else {
Contributor

} else {

{
int ret, i;

/* Receive side delay is used in order to let the sender
Contributor

Format block comments like so:

/* This is a block comment
 * that spans multiple lines
 */

Comment on lines 288 to 289
opts.options |= FT_OPT_ALLOC_MULT_MR;
opts.options |= FT_OPT_NO_PRE_POSTED_RX;
Contributor

Move these to the top of the function to align with other fabtests

fi_bw -e msg

# fi_bw fails by hanging
# fi_flood fails by runfabtest timeout only on the CI.
Contributor

extra spaces here. I'm also tempted to keep the two separate excludes since they are for different reasons so it's better documented separately

if (ret)
goto err;

ft_post_rx_buf(ep, opts.transfer_size,
Contributor

Check for returned error

	  1. MR cache based registrations
	  tests register and send in batch and sequential modes while
	  flooding the cache beyond the maximum size.
	  2. Test receipt of unexpected messages by overwhelming the receiver

Signed-off-by: nikhil nanal <nikhil.nanal@intel.com>
@nikhilnanal nikhilnanal force-pushed the mr_cache branch 2 times, most recently from b39a5c7 to ac31c17 Compare October 16, 2024 00:03
4 participants