rbd-target-api: concurrent requests are not supported #261

Open
lnsyyj opened this issue Jun 2, 2022 · 6 comments
Comments


lnsyyj commented Jun 2, 2022

Hi everyone,

When two or more clients send requests to rbd-target-api at the same time, the configuration object gateway.conf is modified unpredictably.
Performance is also slow when used at scale. Will the community support concurrent requests?

Member

lxbsz commented Jun 20, 2022

> Hi everyone,
>
> When two or more clients request rbd-target-api at the same time, the configuration file gateway.conf will be modified randomly. When used at scale, performance is slow, will the community support concurrency?

Sorry for the late reply.

What do you mean by "modified randomly"? BTW, have you seen any actual issue? Currently, changing the gateway.conf object first acquires the exclusive lock from RADOS, and only the auth gateway node can change the corresponding disk config.

Author

lnsyyj commented Jun 22, 2022

[image not preserved]

Yes, this exclusive lock cannot serialize concurrent requests.
We ran into a problem:

  1. Concurrent requests to different rbd-target-api instances
    Both rbd-target-api instances reach step 3. At that point rbd-target-api-1 gets the lock, writes gateway.conf to RADOS successfully, and releases the lock. Meanwhile rbd-target-api-2 has also modified its in-memory copy of the configuration. It then gets the lock and writes gateway.conf, overwriting the version written by rbd-target-api-1.

  2. Concurrent requests to the same rbd-target-api instance
    The same problem occurs.

I think the causes of the problem are:

  1. gateway.conf is read and written as a whole RADOS object, so the granularity is too coarse.
  2. rbd-target-api is not a distributed service.
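
The interleaving described in case 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ceph-iscsi code: `store`, `store_lock`, `read_config` and `write_config` are hypothetical in-memory stand-ins for the gateway.conf object and the RADOS exclusive lock, and the two reads are ordered by hand so the result is deterministic.

```python
import threading

# Hypothetical in-memory stand-in for the gateway.conf RADOS object.
store = {"disks": {}}
store_lock = threading.Lock()  # stands in for the RADOS exclusive lock


def read_config():
    """Return a private copy of the whole object, as a gateway would hold in memory."""
    return {"disks": dict(store["disks"])}


def write_config(cfg):
    """Replace the whole object while holding the lock."""
    with store_lock:
        store["disks"] = dict(cfg["disks"])


# Both gateways read before either one takes the lock.
cfg_gw1 = read_config()
cfg_gw2 = read_config()

# Each gateway updates only its own in-memory copy.
cfg_gw1["disks"]["rbd/disk1"] = {"owner": "gw1"}
cfg_gw2["disks"]["rbd/disk101"] = {"owner": "gw2"}

# The writes are serialized by the lock, but the second one is based on a stale read.
write_config(cfg_gw1)
write_config(cfg_gw2)

surviving = sorted(store["disks"])
print(surviving)  # rbd/disk1 is gone: gw2 wrote back a copy that never contained it
```

In this sketch the lock only protects the write itself, so the update from gw1 is silently lost. A later lookup of the lost key would raise `KeyError`, which resembles the log line reported further down in this thread, although the sketch does not prove that this is the actual cause.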

Member

lxbsz commented Jun 22, 2022

Currently the sequence is:

  1. exclusive lock
  2. read gateway.conf object to tmp_config
  3. update the tmp_config in memory
  4. store tmp_config to gateway.conf
  5. exclusive unlock

So by step 3 in your picture, the exclusive lock should already have been acquired.
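
For comparison, here is a minimal Python sketch of that five-step sequence, with the lock held across the whole read-modify-write. It is an illustration only: `store`, `store_lock`, `add_disk` and `worker` are hypothetical names, and a process-local `threading.Lock` stands in for the cluster-wide exclusive lock, so it says nothing about how the real lock behaves across nodes.

```python
import threading

# Hypothetical in-memory stand-in for the gateway.conf object.
store = {"disks": {}}
store_lock = threading.Lock()  # stands in for the exclusive lock


def add_disk(name, owner):
    """Apply one update using the lock / read / update / store / unlock order."""
    with store_lock:                                   # 1. exclusive lock
        tmp_config = {"disks": dict(store["disks"])}   # 2. read into tmp_config
        tmp_config["disks"][name] = {"owner": owner}   # 3. update tmp_config in memory
        store["disks"] = tmp_config["disks"]           # 4. store tmp_config
    # 5. the lock is released when the with-block exits


def worker(owner, first, last):
    """Add disks first..last (inclusive) on behalf of one gateway."""
    for i in range(first, last + 1):
        add_disk("rbd/disk%d" % i, owner)


threads = [
    threading.Thread(target=worker, args=("gw1", 1, 100)),
    threading.Thread(target=worker, args=("gw2", 101, 200)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

disk_count = len(store["disks"])
print(disk_count)
```

Because every read happens after the lock is taken, no update in this sketch can be based on a stale copy, and all 200 entries survive. If the real service loses entries, that would suggest some code path reads or caches the configuration outside the lock.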

Also, for each section, for example:

 o- iscsi-targets ............................................................. [Targets: 1]
    o- iqn.2003-01.com.redhat.iscsi-gw:ceph-gw1 ................... [Auth: CHAP, Gateways: 2]
    | o- disks ................................................................... [Disks: 1]
    | | o- rbd/disk_1 .............................................. [Owner: rh7-gw2, Lun: 0]
    | o- gateways ..................................................... [Up: 2/2, Portals: 2]
    | | o- rh7-gw1 .................................................... [192.168.122.69 (UP)]
    | | o- rh7-gw2 .................................................... [192.168.122.14 (UP)]
      o- host-groups ........................................................... [Groups : 0]
      o- hosts ................................................ [Auth: ACL_ENABLED, Hosts: 1]

We can see that its auth is Gateways: 2, and ceph-iscsi will only allow the auth gateway 2 to update this section in gateway.conf, so there shouldn't be any conflict there. Otherwise the corresponding code is buggy.

Author

lnsyyj commented Jun 22, 2022

Yes, we can test it. Similar configuration errors often occur when rbd-target-api on different nodes is accessed concurrently.

Jun 1 08:00:44 node51 rbd-target-api[2744]: KeyError: u'rbd/disk226'

The following two scripts simulate concurrent use of the rbd-target-api services on different nodes, adding LUNs to the same target. Run them at the same time. (The problem is very easy to reproduce.)

Script 1, accessing rbd-target-api-1:

# For each disk: create the image, add it to the target, then map it to the client.
for i in $(seq 1 100); do
  curl --insecure --user admin:admin -d mode=create -d create_image=true -d pool=rbd -d size=1T -X PUT "http://192.168.122.52:5000/api/disk/rbd/disk$i"
  curl --insecure --user admin:admin -d disk="rbd/disk$i" -X PUT "http://192.168.122.52:5000/api/targetlun/iqn.2003-01.com.redhat.iscsi-gw:ceph-gw1"
  curl --insecure --user admin:admin -d disk="rbd/disk$i" -X PUT "http://192.168.122.52:5000/api/clientlun/iqn.2003-01.com.redhat.iscsi-gw:ceph-gw1/iqn.2022-05.com.xstor.client0005"
done

Script 2, accessing rbd-target-api-2:

# Same steps against the second gateway, using a disjoint range of disk names.
for i in $(seq 101 200); do
  curl --insecure --user admin:admin -d mode=create -d create_image=true -d pool=rbd -d size=1T -X PUT "http://192.168.122.53:5000/api/disk/rbd/disk$i"
  curl --insecure --user admin:admin -d disk="rbd/disk$i" -X PUT "http://192.168.122.53:5000/api/targetlun/iqn.2003-01.com.redhat.iscsi-gw:ceph-gw1"
  curl --insecure --user admin:admin -d disk="rbd/disk$i" -X PUT "http://192.168.122.53:5000/api/clientlun/iqn.2003-01.com.redhat.iscsi-gw:ceph-gw1/iqn.2022-05.com.xstor.client0005"
done
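
The two loops have to be started at the same time to trigger the problem. As a convenience, the same requests could be driven from a single Python script with one thread per gateway. This is a sketch under assumptions: it reuses only the URLs, form fields and credentials from the scripts above, while the helper names `build_requests`, `send_all` and `run_concurrently` are hypothetical, and it has not been run against a real gateway.

```python
import base64
import threading
import urllib.parse
import urllib.request

TARGET = "iqn.2003-01.com.redhat.iscsi-gw:ceph-gw1"
CLIENT = "iqn.2022-05.com.xstor.client0005"
# Same credentials as `curl --user admin:admin`.
AUTH = "Basic " + base64.b64encode(b"admin:admin").decode("ascii")


def build_requests(gateway, first, last):
    """Return the (url, form_fields) pairs the shell loop would send to one gateway."""
    base = "http://%s:5000/api" % gateway
    reqs = []
    for i in range(first, last + 1):
        disk = "rbd/disk%d" % i
        reqs.append((base + "/disk/" + disk,
                     {"mode": "create", "create_image": "true",
                      "pool": "rbd", "size": "1T"}))
        reqs.append((base + "/targetlun/" + TARGET, {"disk": disk}))
        reqs.append((base + "/clientlun/" + TARGET + "/" + CLIENT, {"disk": disk}))
    return reqs


def send_all(reqs):
    """Send each request in order as a form-encoded PUT with basic auth."""
    for url, fields in reqs:
        body = urllib.parse.urlencode(fields).encode("ascii")
        req = urllib.request.Request(url, data=body, method="PUT",
                                     headers={"Authorization": AUTH})
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                print(resp.status, url)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print("failed:", url, exc)


def run_concurrently():
    """Drive both gateways at once, one thread per gateway."""
    jobs = [build_requests("192.168.122.52", 1, 100),
            build_requests("192.168.122.53", 101, 200)]
    threads = [threading.Thread(target=send_all, args=(job,)) for job in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Calling `run_concurrently()` from a host that can reach both gateways would issue the same 600 requests as the two shell loops. Nothing is sent until that function is called.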

Member

lxbsz commented Jun 22, 2022

Cool, so it's a bug IMO.
I have been busy with the CephFS project recently. Since you can reproduce it, please raise a PR to fix it if you'd like.

Author

lnsyyj commented Jun 22, 2022

I think this problem is very difficult to fix. It involves design issues, and the changes would be huge.
