Fwd: file operation is interrupted when using ctdb+nfs

Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
repost

---------------- Original Message ----------------
From: Zhu Shangzhong 10137461
To: <[hidden email]>
Date: 2018-01-02 16:08
Subject: file operation is interrupted when using ctdb+nfs


I built a clustered file system using ctdb+nfs-ganesha, with cephfs as nfs-ganesha's backend.
It works well, except that file write operations are interrupted. These are the steps:
1. Mount the NFS export directory on client node 192.168.1.20 at /home/nfs; the NFS server IP is 192.168.1.10.
2. cp a big file to /home/nfs.
3. While the cp operation is in progress, kill the nfs-ganesha process on the NFS server (node IP: 192.168.1.10).
4. The cp operation is interrupted with the error message "stale file handle".

Any idea?

Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
On Thu, 4 Jan 2018 14:54:28 +0800 (CST), "zhu.shangzhong--- via
samba-technical" <[hidden email]> wrote:

>  I build a clustered file system using the ctdb+nfs-ganesha, and the cephfs is used as nfs-ganesha's backend.
> It works well except the file write operation is interrupted. The following is the steps:
> 1. mount the nfs export directory to 192.168.1.20 node /home/nfs, the nfs-server ip is 192.168.1.10
> 2. cp a big file to /home/nfs
> 3. when the cp operation is in process, kill the nfs-ganesha process on the nfs-server (node ip: 192.168.1.10)
> 4. The cp operation is interrupted and the error message is "stale file handle".
>
> Any idea?

Questions about the CTDB setup...  :-)

* How many nodes are there?

* What CTDB public IP addresses are defined?

* Do the logs show CTDB failing over public IP addresses?

peace & happiness,
martin


Reply: Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
Thanks, Martin.

There are 3 CTDB nodes and 3 nfs-ganesha servers.
Their IP addresses are:           192.168.1.10,  192.168.1.11,  192.168.1.12.
The CTDB public IP addresses are: 192.168.1.30,  192.168.1.31,  192.168.1.32.

The client IP is 192.168.1.20. The NFS export directory is mounted on the client via public IP 192.168.1.30.

I checked the CTDB logs: the public IP 192.168.1.30 was moved to another node (IP: 192.168.1.32) when the nfs-server (IP: 192.168.1.10) process was killed.

Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
On Thu, 4 Jan 2018 18:32:26 +0800 (CST), <[hidden email]>
wrote:


> There are 3 CTDB nodes and 3 nfs-ganesha servers.

> Their IP address is:           192.168.1.10,  192.168.1.11,  192.168.1.12.

> The CTDB public IP address is: 192.168.1.30,  192.168.1.31,  192.168.1.32.

> The client IP is 192.168.1.20. The NFS export directory is mounted
> to the client with public IP 192.168.1.30.

> I checked the CTDB logs, the public IP 192.168.1.30 was moved to
> another node(IP: 192.168.1.32)

> when the nfs-server(IP: 192.168.1.10) process was killed.

OK, that seems good.  :-)

* When do you see the "stale file handle" message?  Immediately when
  the NFS Ganesha server is killed or after the failover?

  If it happens immediately when the server is killed then CTDB is not
  involved and you need to understand what is happening at the NFS
  level.

* Are you able to repeat the test against a single NFS Ganesha server
  on a single node?

  This would involve killing the server, seeing what happens to the cp
  command on the client, checking if the file still exists in the
  server filesystem, and then restarting the server.

  If killing the NFS Ganesha server causes the incomplete copy of the
  file to be deleted without communicating a failure to the client
  then this could explain the "stale file handle" message.

  If this can't be made to work then it probably also isn't possible
  by adding more complexity with CTDB.

By the way, if you are able to reply inline instead of "top-posting"
then it is easier to respond to each part of your reply.  :-)

peace & happiness,
martin


Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
As far as I recall, hitless NFS failover requires that the NFS
filehandles remain invariant across the nodes in the cluster,
i.e. regardless of which node you point to, the same file will always
map to the exact same filehandle.
(A stale filehandle just means "I don't know which file this refers
to". It can be caused either by the NFS server (Ganesha) losing the
inode<->filehandle mapping state when Ganesha is restarted, or by the
underlying filesystem lacking the capability to make this possible
from the server.)

GPFS/SpectrumScale does guarantee this for knfs.ko (and Ganesha) as
long as you are careful and ensure that the fsid for the backend
filesystem is the same across all the nodes.


You would have to check if this is even possible to do with cephfs
since in order to get this guarantee you will need support from the
backing filesystem.
There is likely not anything that CTDB can do here since it is an
interaction between Ganesha and cephfs.


One way to test for this would be to do an NFSv3 LOOKUP for the same
file from several Ganesha nodes in the cluster and verify with
wireshark that the filehandles are identical regardless of which node
you use to access the file.

With a little bit of effort, you can even automate this fully if you
want to add it as a check for automatic testing.
The way to do this would be to use libnfs, since it can expose the
underlying NFS filehandle.
You could write a small test program using libnfs that connects to
multiple different IPs/nodes in the cluster, uses nfs_open() to fetch
a filehandle for the same file on each node, and then compares the
underlying filehandle in the libnfs filehandle.
I don't remember if dereferencing this structure is part of the
public API or not (too lazy to check right now), so you might need to
include libnfs-private.h if not.
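As a rough illustration of the comparison step only, here is a shell sketch. The handle strings below are made-up placeholders, not real filehandles; in practice each one would come from the per-node libnfs program described above:

```shell
# compare_handles: print OK if every handle passed in is identical,
# otherwise print the first mismatch. This is only the comparison step;
# fetching the real filehandles from each node is left out.
compare_handles() {
    first="$1"
    for fh in "$@"; do
        if [ "$fh" != "$first" ]; then
            echo "MISMATCH: $fh != $first"
            return 1
        fi
    done
    echo "OK: all handles identical"
}

# Made-up placeholder handles: the third "node" disagrees, so failover
# to or from it would hand clients a stale filehandle.
result=$(compare_handles 43000be2aaaa 43000be2aaaa 43000be2bbbb) || true
echo "$result"
```

With real handles, one per node, the same comparison applies unchanged.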


regards

ronnie sahlberg





Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
On Fri, 5 Jan 2018 08:28:52 +1000, ronnie sahlberg
<[hidden email]> wrote:


Nice summary.  Thanks, Ronnie!

... and you can check device#/inode# consistency in the cluster
filesystem like this:

# onnode all stat -c '%d:%i' /clusterfs/data/foo

>> NODE: 10.0.0.31 <<
21:52494

>> NODE: 10.0.0.32 <<
21:52494

>> NODE: 10.0.0.33 <<
21:52494
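That check can be scripted; a sketch that reduces such output to a pass/fail, with the sample output copied from the onnode example above (onnode itself isn't needed for the comparison):

```shell
# Sketch: reduce 'onnode all stat -c %d:%i <file>' output to the set of
# unique dev:inode values. Exactly one unique value means every node
# sees the same device and inode numbers for the file.
out='>> NODE: 10.0.0.31 <<
21:52494
>> NODE: 10.0.0.32 <<
21:52494
>> NODE: 10.0.0.33 <<
21:52494'

# Drop the ">> NODE ... <<" banner lines, keep only dev:inode values.
unique=$(printf '%s\n' "$out" | grep -v '^>>' | sort -u | wc -l | tr -d ' ')
if [ "$unique" = "1" ]; then
    echo "consistent"
else
    echo "inconsistent: $unique distinct dev:inode values"
fi
```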

While Samba provides a way of dealing with inconsistent device#s
(https://www.samba.org/samba/docs/man/manpages/vfs_fileid.8.html) I'm
not sure if NFS Ganesha also has something like that.

peace & happiness,
martin


Reply: Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
Thanks, Ronnie and Martin.

> > When do you see the "stale file handle" message? Immediately when
> > the NFS Ganesha server is killed or after the failover?

The "stale file handle" message is output after the failover.

> > * Are you able to repeat the test against a single NFS Ganesha server
> >   on a single node?

No, in that case the client operation just hangs, and it continues after the NFS Ganesha process is started again.

[root@ceph ~]# onnode all stat -c '%d:%i' /tmp/test/nfs-test-1.iso
>> NODE: 192.168.1.10 <<
39:1099511633885

>> NODE: 192.168.1.11 <<
40:1099511633885

>> NODE: 192.168.1.12 <<
40:1099511633885


Re: Reply: Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
On Fri, Jan 5, 2018 at 7:26 PM,  <[hidden email]> wrote:

> Thanks ronnie and martin.
>
>> >When do you see the "stale file handle" message? Immediately when
>> >the NFS Ganesha server is killed or after the failover?
> The "stale file handle" message will be output after the failover.
>
>> > * Are you able to repeat the test against a single NFS Ganesha server
>> > on a single node?
> No, the client operation will be hung, and the operation will continue after the NFS Ganesha process was started.
>
> [root@ceph ~]# onnode all stat -c '%d:%i' /tmp/test/nfs-test-1.iso
>>> NODE: 192.168.1.10 <<
> 39:1099511633885
>
>>> NODE: 192.168.1.11 <<
> 40:1099511633885
>
>>> NODE: 192.168.1.12 <<
> 40:1099511633885


This is likely a problem. The filehandle is usually composed of
#device/#inode (and a few other things) so if the #device
differs between the nodes, then the filehandle will differ as well.
Node 0 has a different #device than the other nodes so I bet that
failover to/from node 0 will always end up with stale filehandle.
(while failover between node 1 and node 2 might work.)
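The device-number part of that observation can be checked mechanically; a small sketch using the dev:inode values quoted above (no cluster access is needed for the comparison itself):

```shell
# Sketch: count nodes whose device number differs from the first node's.
# The dev:inode values are the ones reported earlier in the thread.
handles='39:1099511633885
40:1099511633885
40:1099511633885'

first=$(printf '%s\n' "$handles" | head -n1 | cut -d: -f1)
mismatches=$(printf '%s\n' "$handles" | cut -d: -f1 | grep -cv "^${first}\$")
echo "nodes whose device number differs from the first node's: $mismatches"
```

Any nonzero count signals that NFS filehandles built from device/inode numbers cannot be invariant across those nodes.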



I have created a small tool in libnfs that can be used to print the
filehandle for a file :
https://github.com/sahlberg/libnfs

It is an example utility so use "./configure --enable-examples" to build it.

Then run it like this:
./examples/nfs-fh nfs://127.0.0.1/data/sahlberg/rbtree

It will print the NFS filehandle for the specified nfs object.
This filehandle must be identical on all nodes in order for failover
to work between the nodes.


Martin, the nfs-fh tool might be useful to add to ctdb.
onnode all nfs-fh nfs://127.0.0.1/<path-to-object-in-cluster-fs>
or something like it could be used to just verify that all nodes are
configured properly for NFS.


Maybe even an event script check that all is OK?



Reply: Re: Reply: Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
I removed the node 192.168.1.10 and tested with the remaining two nodes.
The "stale file handle" message is still output.

[root@ceph ctdb]# onnode all stat -c '%d:%i' /tmp/test/nfs-test-1.iso
>> NODE: 192.168.1.11 <<
40:1099511633885
>> NODE: 192.168.1.12 <<
40:1099511633885

[root@ceph ctdb]# /home/libnfs/examples/nfs-fh nfs://192.168.1.11//nfs-test-1.iso
43000be210dd17000000010000feffffffffffffff000000
[root@ceph ctdb]# /home/libnfs/examples/nfs-fh nfs://192.168.1.12//nfs-test-1.iso
43000be210dd17000000010000feffffffffffffff000000

Now the filehandles are identical.
Because NFSv4 isn't supported by CTDB, the mount option nfsvers=3 was used
when mounting the NFS export on the client.
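For reference, a hypothetical client-side /etc/fstab entry forcing NFSv3 against a CTDB public IP might look like this (the export path and extra options are placeholders, not taken from this thread):

```
# /etc/fstab (hypothetical): mount the export via a CTDB public IP, forcing NFSv3
192.168.1.30:/export  /home/nfs  nfs  nfsvers=3,hard  0  0
```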

------------------原始邮件------------------
发件人: <[hidden email]>;
收件人:朱尚忠10137461;
抄送人: <[hidden email]>;
日 期 :2018年01月06日 01:30
主 题 :Re: 答复: Re: 转发: file operation is interruptedwhenusing ctdb+nfs
On Fri, Jan 5, 2018 at 7:26 PM,  <[hidden email]> wrote:

> Thanks ronnie and martin.
>
>> >When do you see the "stale file handle" message? Immediately when
>> >the NFS Ganesha server is killed or after the failover?
> The "stale file handle" message will be output after the failover.
>
>> > * Are you able to repeat the test against a single NFS Ganesha server
>> > on a single node?
> No, the client operation will be hung, and the operation will continue after the NFS Ganesha process was started.
>
> [root@ceph ~]# onnode all stat -c '%d:%i' /tmp/test/nfs-test-1.iso
>>> NODE: 192.168.1.10 <<
> 39:1099511633885
>
>>> NODE: 192.168.1.11 <<
> 40:1099511633885
>
>>> NODE: 192.168.1.12 <<
> 40:1099511633885


This is likely a problem. The filehandle is usually composed of
#device/#inode (and a few other things) so if the #device
differs between the nodes, then the filehandle will differ as well.
Node 0 has a different #device than the other nodes so I bet that
failover to/from node 0 will always end up with stale filehandle.
(while failover between node 1 and node 2 might work.)



I have created a small tool in libnfs that can be used to print the
filehandle for a file :
https://github.com/sahlberg/libnfs

It is an example utility so use "./configure --enable-examples" to build it..

Then run it like this:
../examples/nfs-fh nfs://127.0.0.1/data/sahlberg/rbtree

It will print the NFS filehandle for the specified nfs object.
This filehandle must be identical on all nodes in order for failover
to work between the nodes.


Martin, the nfs-fh tool might be useful to add to ctdb.
onnode all nfs-fh nfs://127.0.0.1/<path-to-object-in-cluster-fs>
or something like it could be used to just verify that all nodes are
configured properly for NFS.


Maybe even a event script check that all is ok ?

>
> ------------------原始邮件------------------
> 发件人: <[hidden email]>;
> 收件人: <[hidden email]>;
> 抄送人:朱尚忠10137461; <[hidden email]>;
> 日 期 :2018年01月05日 10:17
> 主 题 :Re: 转发: file operation is interruptedwhenusing ctdb+nfs
> On Fri, 5 Jan 2018 08:28:52 +1000, ronnie sahlberg
> <[hidden email]> wrote:
>
>> On Fri, Jan 5, 2018 at 8:00 AM, Martin Schwenke via samba-technical
>> <[hidden email]> wrote:
>> > On Thu, 4 Jan 2018 18:32:26 +0800 (CST), <[hidden email]>
>> > wrote:
>
>> >> There are 3 CTDB nodes and 3 nfs-ganesha servers.
>> >
>> >> Their IP address is:           192.168.1.10,  192.168.1.11,  192.1.12..
>> >
>> >> The CTDB public IP address is: 192.168.1.30,  192.168.1.31,  192.168.1.32.
>> >
>> >> The client IP is 192.168.1.20. The NFS export directory is mounted
>> >> to the client with public IP 192.168.1.30.
>> >
>> >> I checked the CTDB logs, the public IP 192.168.1.30 was moved to
>> >> another node(IP: 192.168.1.32)
>> >
>> >> when the nfs-server(IP: 192.168.1.10) process was killed.
>> >
>> > OK, that seems good.  :-)
>> >
>> > * When do you see the "stale file handle" message?  Immediately when
>> >   the NFS Ganesha server is killed or after the failover?
>> >
>> >   If it happens immediately when the server is killed then CTDB is not
>> >   involved and you need to understand what is happening at the NFS
>> >   level.
>> >
>> > * Are you able to repeat the test against a single NFS Ganesha server
>> >   on a single node?
>> >
>> >   This would involve killing the server, seeing what happens to the cp
>> >   command on the client, checking if the file still exists in the
>> >   server filesystem, and then restarting the server.
>> >
>> >   If killing the NFS Ganesha server causes the incomplete copy of the
>> >   file to be deleted without communicating a failure to the client
>> >   then this could explain the "stale file handle" message.
>> >
>> >   If this can't be made to work then it probably also isn't possible
>> >   by adding more complexity with CTDB.
>> >
>> > By the way, if you are able to reply inline instead of "top-posting"
>> > then it is easier to respond to each part of your reply.  :-)
>
>> As far as I recall,
>> hitless NFS failover requires that the NFS filehandles remain
>> invariant across the nodes in the cluster.
>> I.e. regardless which node you point to, the same file will always map
>> to the exact same filehandle.
>> (Stale filehandle just means : "I don't know which file this refers
>> to" and it would either be caused by the NFS server (Ganesha) losing
>> the inode<->filehandle mapping state when Ganesha is restarted
>> or it could mean that the underlying filesystem does not have the
>> capability to make this possible from the server.)
>>
>> GPFS/SpectrumScale does guarantee this for knfs.ko (and Ganesha) as
>> long as you are careful and ensure that the fsid for the backend
>> filesystem is the same across all the nodes.
>>
>>
>> You would have to check if this is even possible to do with cephfs
>> since in order to get this guarantee you will need support from the
>> backing filesystem.
>> There is likely not anything that CTDB can do here since it is an
>> interaction between Ganesha and cephfs.
>>
>>
>> One way to test for this would be to just do a NFSv3/LOOKUP to the
>> same file from several Ganesha nodes in the cluster  and verify with
>> wireshark that
>> the filehandles are identical regardless which node you use to access the file.
>>
>> With a little bit of effort, you can even automate this fully if you
>> want to add this as a check for automatic testing.
>> The way to do this would be to use libnfs, since it can expose the
>> underlying nfs filehandle.
>> You could write a small test program using libnfs that would connect
>> to multiple different ip's/nodes in the cluster, then
>> use nfs_open() to fetch a filehandle for the same file on different
>> nodes and then just compare the underlying filehandle in the
>> libnfs filehandle.
>> I don't remember if dereferencing this structure is part of the public
>> API or not, and I'm too lazy to check right now, so you might
>> need to include libnfs-private.h if not.
>
> Nice summary.  Thanks, Ronnie!
>
> .... and you can check device#/inode# consistency in the cluster
> filesystem like this:
>
> # onnode all stat -c '%d:%i' /clusterfs/data/foo
>
> >> NODE: 10.0.0.31 <<
> 21:52494
>
> >> NODE: 10.0.0.32 <<
> 21:52494
>
> >> NODE: 10.0.0.33 <<
> 21:52494
>
> While Samba provides a way of dealing with inconsistent device#s
> (https://www.samba.org/samba/docs/man/manpages/vfs_fileid.8.html) I'm
> not sure if NFS Ganesha also has something like that.
>
> peace & happiness,
> martin
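The device#/inode# comparison quoted above can also be mechanized; a hedged shell sketch that counts distinct device:inode pairs in onnode-style output (sample values copied from the quoted run):

```shell
#!/bin/sh
# Count distinct device:inode values in "onnode all stat -c '%d:%i' FILE"
# output; a result of 1 means every node reports the same numbers.
count_dev_inode() {
    grep -E '^[0-9]+:[0-9]+$' | sort -u | wc -l
}

# Sample onnode output from the thread (three consistent nodes);
# the ">> NODE ... <<" header lines fail the grep and are ignored.
sample='>> NODE: 10.0.0.31 <<
21:52494
>> NODE: 10.0.0.32 <<
21:52494
>> NODE: 10.0.0.33 <<
21:52494'

printf '%s\n' "$sample" | count_dev_inode   # prints 1
```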

Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Sat, 6 Jan 2018 03:27:52 +1000, ronnie sahlberg
<[hidden email]> wrote:

> I have created a small tool in libnfs that can be used to print the
> filehandle for a file :
> https://github.com/sahlberg/libnfs
>
> It is an example utility so use "./configure --enable-examples" to build it.
>
> Then run it like this:
> ./examples/nfs-fh nfs://127.0.0.1/data/sahlberg/rbtree
>
> It will print the NFS filehandle for the specified nfs object.
> This filehandle must be identical on all nodes in order for failover
> to work between the nodes.

Nice!

> Martin, the nfs-fh tool might be useful to add to ctdb.
> onnode all nfs-fh nfs://127.0.0.1/<path-to-object-in-cluster-fs>
> or something like it could be used to just verify that all nodes are
> configured properly for NFS.

We probably need less rather than more code/dependencies to
maintain.  ;-)

I've added a Troubleshooting section in the "Setting up CTDB for
Clustered NFS" Samba wiki page, including some details about file
handle consistency and nfs-fh:

  https://wiki.samba.org/index.php/Setting_up_CTDB_for_Clustered_NFS#File_handle_consistency

That should cover it, since we don't see a lot of people with these
problems...

peace & happiness,
martin


Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
In reply to this post by Samba - samba-technical mailing list
On Sat, 6 Jan 2018 18:01:24 +0800 (CST), "zhu.shangzhong--- via
samba-technical" <[hidden email]> wrote:

> I tried removing node 192.168.1.10 and testing with two nodes.
> The "stale file handle" message is still output.
>
> [root@ceph ctdb]# onnode all stat -c '%d:%i' /tmp/test/nfs-test-1.iso
> >> NODE: 192.168.1.11 <<  
> 40:1099511633885
> >> NODE: 192.168.1.12 <<  
> 40:1099511633885
>
> [root@ceph ctdb]# /home/libnfs/examples/nfs-fh nfs://192.168.1.11//nfs-test-1.iso
> 43000be210dd17000000010000feffffffffffffff000000
> [root@ceph ctdb]# /home/libnfs/examples/nfs-fh nfs://192.168.1.12//nfs-test-1.iso
> 43000be210dd17000000010000feffffffffffffff000000
>
> Now the file handles are identical.
> Because NFSv4 isn't supported by CTDB, the mount option nfsvers=3 was used
> when mounting the NFS export on the client.

So the problem occurs even with consistent file handles?  Hmmm...

* Does the partial file exist in the cluster filesystem when the "stale
  file handle" message is printed?

* Is the filesystem mounted after the failover?  Can you use "ls" to
  list a directory on the client?

* What happens if you disable a node with "ctdb disable"?  Do you get a
  clean failover?

Thanks...

peace & happiness,
martin


Re: Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
> So the problem occurs even with consistent file handles? Hmmm...
   Yes.
> * Does the partial file exist in the cluster filesystem when the "stale
> file handle" message is printed?
   Yes.
> * Is the filesystem mounted after the failover? Can you use "ls" to
> list a directory on the client?
Yes. The filesystem is mounted successfully after the failover.
> * What happens if you disable a node with "ctdb disable"? Do you get a
> clean failover?
The same "stale file handle" message is output.
Yes, the node's virtual IP is moved to another node. The failover works well.

------------------Original Mail------------------
From: <[hidden email]>;
To: <[hidden email]>;
Cc: 朱尚忠10137461;
Date: 2018-01-08 13:59
Subject: Re: Fwd: file operation is interrupted when using ctdb+nfs
On Sat, 6 Jan 2018 18:01:24 +0800 (CST), "zhu.shangzhong--- via
samba-technical" <[hidden email]> wrote:

> I tried removing node 192.168.1.10 and testing with two nodes.
> The "stale file handle" message is still output.
>
> [root@ceph ctdb]# onnode all stat -c '%d:%i' /tmp/test/nfs-test-1.iso
> >> NODE: 192.168.1.11 <<
> 40:1099511633885
> >> NODE: 192.168.1.12 <<
> 40:1099511633885
>
> [root@ceph ctdb]# /home/libnfs/examples/nfs-fh nfs://192.168.1.11//nfs-test-1.iso
> 43000be210dd17000000010000feffffffffffffff000000
> [root@ceph ctdb]# /home/libnfs/examples/nfs-fh nfs://192.168.1.12//nfs-test-1.iso
> 43000be210dd17000000010000feffffffffffffff000000
>
> Now the file handles are identical.
> Because NFSv4 isn't supported by CTDB, the mount option nfsvers=3 was used
> when mounting the NFS export on the client.

So the problem occurs even with consistent file handles?  Hmmm...

* Does the partial file exist in the cluster filesystem when the "stale
file handle" message is printed?

* Is the filesystem mounted after the failover?  Can you use "ls" to
list a directory on the client?

* What happens if you disable a node with "ctdb disable"?  Do you get a
clean failover?

Thanks...

peace & happiness,
martin

Re: Re: Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
If the filehandles are the same, then it must be some other internal
state in Ganesha that is causing the stale filehandles.
Can you try using the normal Linux kernel NFS server instead of Ganesha?


The Ganesha folks did not want to use CTDB databases to synchronize
state among the nodes and instead use their own cross-node metadata
synchronization primitives.
You may need to ask them about this, since they invented
their own solution, i.e. not CTDB metadata databases :-(
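If trying knfsd, one detail worth pinning down up front is the fsid: the exports(5) fsid= option fixes the filesystem identity that goes into the filehandle, so it must be set to the same value on every node. A hedged /etc/exports fragment, reusing the subnet from this thread; the path and fsid value are placeholders:

```
# /etc/exports sketch -- assumes knfsd on every node exporting the same
# cluster filesystem. fsid=1234 is an arbitrary value that must be
# identical cluster-wide; without a fixed fsid the kernel may derive
# different filehandles on different nodes.
/clusterfs 192.168.1.0/24(rw,sync,no_subtree_check,fsid=1234)
```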




On Mon, Jan 8, 2018 at 4:40 PM, zhu.shangzhong--- via samba-technical
<[hidden email]> wrote:

>> So the problem occurs even with consistent file handles? Hmmm...
>    Yes.
>> * Does the partial file exist in the cluster filesystem when the "stale
>> file handle" message is printed?
>    Yes.
>> * Is the filesystem mounted after the failover? Can you use "ls" to
>> list a directory on the client?
> Yes. The filesystem is mounted successfully after the failover.
>> * What happens if you disable a node with "ctdb disable"? Do you get a
>> clean failover?
> The same "stale file handle" message is output.
> Yes, the node's virtual IP is moved to another node. The failover works well.
>
> ------------------Original Mail------------------
> From: <[hidden email]>;
> To: <[hidden email]>;
> Cc: 朱尚忠10137461;
> Date: 2018-01-08 13:59
> Subject: Re: Fwd: file operation is interrupted when using ctdb+nfs
> On Sat, 6 Jan 2018 18:01:24 +0800 (CST), "zhu.shangzhong--- via
> samba-technical" <[hidden email]> wrote:
>
>> I tried removing node 192.168.1.10 and testing with two nodes.
>> The "stale file handle" message is still output.
>>
>> [root@ceph ctdb]# onnode all stat -c '%d:%i' /tmp/test/nfs-test-1.iso
>> >> NODE: 192.168.1.11 <<
>> 40:1099511633885
>> >> NODE: 192.168.1.12 <<
>> 40:1099511633885
>>
>> [root@ceph ctdb]# /home/libnfs/examples/nfs-fh nfs://192.168.1.11//nfs-test-1.iso
>> 43000be210dd17000000010000feffffffffffffff000000
>> [root@ceph ctdb]# /home/libnfs/examples/nfs-fh nfs://192.168.1.12//nfs-test-1.iso
>> 43000be210dd17000000010000feffffffffffffff000000
>>
>> Now the file handles are identical.
>> Because NFSv4 isn't supported by CTDB, the mount option nfsvers=3 was used
>> when mounting the NFS export on the client.
>
> So the problem occurs even with consistent file handles?  Hmmm...
>
> * Does the partial file exist in the cluster filesystem when the "stale
> file handle" message is printed?
>
> * Is the filesystem mounted after the failover?  Can you use "ls" to
> list a directory on the client?
>
> * What happens if you disable a node with "ctdb disable"?  Do you get a
> clean failover?
>
> Thanks...
>
> peace & happiness,
> martin


Re: Re: Re: Re: Fwd: file operation is interrupted when using ctdb+nfs

Samba - samba-technical mailing list
Thanks, Ronnie. I'll try using the kernel NFS server with CTDB.
------------------Original Mail------------------
From: <[hidden email]>;
To: 朱尚忠10137461;
Cc: <[hidden email]>;
Date: 2018-01-08 16:52
Subject: Re: Re: Re: Fwd: file operation is interrupted when using ctdb+nfs
If the filehandles are the same, then it must be some other internal
state in Ganesha that is causing the stale filehandles.
Can you try using the normal Linux kernel NFS server instead of Ganesha?


The Ganesha folks did not want to use CTDB databases to synchronize
state among the nodes and instead use their own cross-node metadata
synchronization primitives.
You may need to ask them about this, since they invented
their own solution, i.e. not CTDB metadata databases :-(




On Mon, Jan 8, 2018 at 4:40 PM, zhu.shangzhong--- via samba-technical
<[hidden email]> wrote:

>> So the problem occurs even with consistent file handles? Hmmm...
>    Yes.
>> * Does the partial file exist in the cluster filesystem when the "stale
>> file handle" message is printed?
>    Yes.
>> * Is the filesystem mounted after the failover? Can you use "ls" to
>> list a directory on the client?
> Yes. The filesystem is mounted successfully after the failover.
>> * What happens if you disable a node with "ctdb disable"? Do you get a
>> clean failover?
> The same "stale file handle" message is output.
> Yes, the node's virtual IP is moved to another node. The failover works well.
>
> ------------------Original Mail------------------
> From: <[hidden email]>;
> To: <[hidden email]>;
> Cc: 朱尚忠10137461;
> Date: 2018-01-08 13:59
> Subject: Re: Fwd: file operation is interrupted when using ctdb+nfs
> On Sat, 6 Jan 2018 18:01:24 +0800 (CST), "zhu.shangzhong--- via
> samba-technical" <[hidden email]> wrote:
>
>> I tried removing node 192.168.1.10 and testing with two nodes.
>> The "stale file handle" message is still output.
>>
>> [root@ceph ctdb]# onnode all stat -c '%d:%i' /tmp/test/nfs-test-1.iso
>> >> NODE: 192.168.1.11 <<
>> 40:1099511633885
>> >> NODE: 192.168.1.12 <<
>> 40:1099511633885
>>
>> [root@ceph ctdb]# /home/libnfs/examples/nfs-fh nfs://192.168.1.11//nfs-test-1.iso
>> 43000be210dd17000000010000feffffffffffffff000000
>> [root@ceph ctdb]# /home/libnfs/examples/nfs-fh nfs://192.168.1.12//nfs-test-1.iso
>> 43000be210dd17000000010000feffffffffffffff000000
>>
>> Now the file handles are identical.
>> Because NFSv4 isn't supported by CTDB, the mount option nfsvers=3 was used
>> when mounting the NFS export on the client.
>
> So the problem occurs even with consistent file handles?  Hmmm...
>
> * Does the partial file exist in the cluster filesystem when the "stale
> file handle" message is printed?
>
> * Is the filesystem mounted after the failover?  Can you use "ls" to
> list a directory on the client?
>
> * What happens if you disable a node with "ctdb disable"?  Do you get a
> clean failover?
>
> Thanks...
>
> peace & happiness,
> martin