Custom VFS

Custom VFS

ivenhov
Hi

I would like to create a custom VFS module that redirects all calls to my
backend.

A few questions:

1) Is it possible to use Java with a JNI wrapper to communicate with my
backend, or does it have to be pure C/C++?

2) Are notifications (file deletion/creation etc.) available in the VFS? In
other words, can it notify the Samba server and applications that a directory
has changed?

3) Are there any examples/tutorials/walkthroughs or up-to-date
documentation for the VFS?

Regards
Daniel
--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba

Re: Custom VFS

Mueller
Ajaxplorer could be worth a look.



-----------------------------------------------
IT Daniel Müller

Head of IT
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen

Tel.: 07071/206-463, Fax: 07071/206-499
eMail: [hidden email]
Internet: www.tropenklinik.de
-----------------------------------------------


Re: Custom VFS

ivenhov
I'm not sure that's relevant. I think Ajaxplorer uses smbclient with a PHP wrapper around it, unless I'm missing something. Interesting project though.
Any other suggestions?

Re: Custom VFS

Michael Wood-8
In reply to this post by ivenhov
Hi

On 19 July 2012 11:39, Daniel Iwan <[hidden email]> wrote:

>
> Hi
>
> I would like to  create custom VFS that would redirect all calls to my
> backend.
>
> Few questions:
>
> 1) is it possible to use Java with JNI wrapper to communicate with my
> backend or does it have to be pure C/C++ ?

As far as I know, a Samba VFS module must be a shared library, i.e. a .so
file, so I don't think Java would work (although I don't know much
about JNI).

> 2) are the notifications (file deletion/creation etc.) available in VFS, in
> other words notifying samba server and applications that directory has
> changed?
>
> 3) are there any examples/tutorials/walkthroughs or up to date
> documentation for VFS?

Try this:

http://www.samba.org/~sharpe/The-Samba-VFS.pdf

--
Michael Wood <[hidden email]>

Re: Custom VFS

ivenhov
Thanks Michael, great link.
Exactly what I was looking for. It does not answer my JNI question (creating a file system in Java is not that common),
but it's a great starting point.

Daniel


Re: Custom VFS

Andrew Scherpbier
Hi Daniel,

Just a note of encouragement...
I have so far written 2 filesystems in Java that use Samba for 2
different companies, so you're not alone!  :-)

The strategy I've used is to write a simple TCP protocol client (the VFS
module) and server (a straightforward threaded Java server).
Works like a charm.  As long as the client side is abstracted enough so
that its Samba connection state is independent of the server
connection state, there are no issues with restarting either.  (I
started out using a stateful protocol, but ended up changing to a
completely stateless one, where the individual messages contain enough
information to establish context.  This way, if either end of the system
goes down, recovery is the simple act of building a new TCP connection.)
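
A minimal sketch of that stateless idea, in Python purely for illustration (the real client side would be C inside the VFS module, and the server Java). Everything here is an assumption rather than the protocol described above: the length-prefixed JSON framing, the field names, and the `connect` factory are all hypothetical. The point is only that every message carries its own context, so recovery is "reconnect and resend".

```python
import json
import struct


def encode_request(op, path, **fields):
    """Frame one self-contained request: 4-byte big-endian length + JSON body."""
    body = json.dumps({"op": op, "path": path, **fields}, sort_keys=True)
    data = body.encode("utf-8")
    return struct.pack(">I", len(data)) + data


def decode_request(frame):
    """Inverse of encode_request(); what the server side would do per message."""
    (length,) = struct.unpack(">I", frame[:4])
    return json.loads(frame[4:4 + length].decode("utf-8"))


class StatelessClient:
    """Sends each request on the current connection; reconnects once on failure."""

    def __init__(self, connect):
        # `connect` is a hypothetical factory returning an object with sendall(),
        # e.g. lambda: socket.create_connection(("127.0.0.1", 9000)).
        self._connect = connect
        self._conn = None

    def send(self, op, path, **fields):
        frame = encode_request(op, path, **fields)
        for _ in range(2):
            try:
                if self._conn is None:
                    self._conn = self._connect()
                self._conn.sendall(frame)
                # A real client would also wait for an acknowledgement here.
                return frame
            except OSError:
                # No session to rebuild: drop the socket and try a fresh one.
                self._conn = None
        raise ConnectionError("backend unreachable")
```

Because no request depends on an earlier one, the retry loop needs no replay log or handshake. The port number in the comment is made up.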

I also attempted to use the Apache ActiveMQ C++ library for
communication, but found it buggy and leaky.

I originally looked into hosting the JVM in the VFS module, but that was
going to be a problem because each smbd process would have to start its
own JVM.  The JVM startup time (especially for the server JVM) is very high,
and the memory overhead would keep it from scaling.

TCP through the loopback interface is very fast (at least on the Linux
systems I've developed for), so there was no need to implement some
sort of shared memory interface.

The system I'm working on now manages PB-class storage (currently up to
10PB) with hundreds of concurrent clients, and the VFS module does this
without issues or much overhead.  We're regularly seeing write speeds in
the 400-500MB/s range using 10GbE and multiple Windows clients.

Good luck!

P.S.:  Blatant plug for my current project:
http://www.cuttedge.com/psca/index.html

--
Andrew Scherpbier
[hidden email]


Re: Custom VFS

Jeremy Allison
Wow - that's really cool stuff!

I'm glad the VFS works so well for you. I wanted to give you
a heads-up on the changes we're making to the VFS moving
forward with 4.0.x and above. Take a look at the changes
Volker made to split pread() into pread_send_fn()/pread_recv_fn()
and pwrite() into pwrite_send_fn()/pwrite_recv_fn() in order to
make the VFS async (and allow pthreaded implementations to
be hidden under the covers).

Sample implementations are in source3/modules/vfs_default.c
in:

vfswrap_pread_send()/vfswrap_asys_ssize_t_recv()
vfswrap_pwrite_send()/vfswrap_asys_ssize_t_recv()

It makes the VFS a little more complicated, but should
enable you to get more performance out of it.
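
To illustrate the shape of a send/recv split, here is a rough Python analogy. It is not Samba's API: the function names only echo the ones mentioned above, and a thread pool stands in for whatever event machinery Samba really uses. The "send" half starts the read and returns a handle at once; the "recv" half collects the result later.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker pool hidden "under the covers" of the two calls below.
_pool = ThreadPoolExecutor(max_workers=4)


def pread_send(fd, nbytes, offset):
    """Start the read and return immediately with a handle to the request."""
    return _pool.submit(os.pread, fd, nbytes, offset)


def pread_recv(request):
    """Collect the result of a request started by pread_send()."""
    return request.result()


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"hello samba vfs")
        f.flush()
        request = pread_send(f.fileno(), 5, 6)
        # The caller is free to service other work here while the read runs.
        return pread_recv(request)
```

The caller never touches threads directly, which is the property described above: a threaded implementation can sit behind an interface that looks single-threaded to the module.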

We're also thinking longer term about changing the
model of keeping the current working directory at
the root of the exported service. Instead, the
internals of Samba would chdir() to the parent directory
of any path currently being processed. This allows
easier security checks inside smbd and reduces the
opportunity for pathname check race conditions.
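
A small Python sketch of why working from the parent directory helps, under the assumption that `path` is absolute. This is an illustration of the general technique, not smbd code, and `stat_via_parent` is a made-up name. Once the process has changed into the parent, it can check where it really landed and then operate on the bare final component, so intermediate path components are not resolved a second time.

```python
import os
import tempfile


def stat_via_parent(path):
    """chdir() to the parent of `path`, then lstat only the final component."""
    parent, name = os.path.split(path)
    saved = os.getcwd()
    os.chdir(parent)
    try:
        # The security check can inspect the directory we actually ended up in...
        landed = os.getcwd()
        # ...and the operation then uses a relative name with no '/' in it.
        return landed, os.lstat(name)
    finally:
        os.chdir(saved)
```

With a full path, every call walks all components again, and any of them could be swapped between the check and the use. Here only the last component is looked up after the check.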

Feedback very welcome - especially from someone
who has implemented a couple of production Samba
VFS modules already :-).

Thanks!

Jeremy.

Re: Custom VFS

ivenhov
In reply to this post by Andrew Scherpbier
Andrew,

On 24 July 2012 22:35, Andrew Scherpbier <[hidden email]> wrote:
> The strategy I've used is to write a simple TCP protocol client (the VFS module) and server (a straightforward threaded Java server).
> Works like a charm.

I was considering pretty much the same technique, using localhost and sockets to separate the native library from the Java service.

> As long as the client side is abstracted enough so that its samba connection state is independent from the server connection state, there are no issues with restarting either.  (I started out using a stateful protocol, but ended up changing to a completely stateless one, where the individual messages contain enough information to establish context.  This way, if either end of the system goes down, recovery is the simple act of building a new TCP connection.)

That means you need to carry enough information to resume, and also keep some sort of queue on the client (VFS) of messages that have not been delivered yet.
Is that correct?

> I originally looked into hosting the JVM in the VFS module, but that was going to be a problem because each smbd process would have to start its own JVM.  The JVM startup time (especially the server JVM) is very high and the memory overhead would not make it scalable.

Why do you need several smbd processes on a single host? Is it because of high availability, or some latency issues you wanted to remove?
Or did I misinterpret that?

> The system I'm working on now manages PB class storage (currently up to 10PB) with hundreds of concurrent clients and the VFS module does this without issues or much overhead.  We're regularly seeing write speeds in the 400-500MB/s range using 10GbE and multiple windows clients.

Do you use a hot-standby Samba server for failover, clustered Samba, etc.? If so, how do you achieve that, if you don't mind telling?

And a quick question about notification. If a file appears or is modified in your system outside Samba, is there a way of notifying Samba clients about that change?
I mean a notification that goes from the VFS layer to Samba and then to Windows clients, so that they refresh the directory, Explorer view, etc.
I think that mechanism exists in Samba via inotify, but I may be wrong; I'm a Samba newbie.

Daniel

Re: Custom VFS

Andrew Scherpbier

On 07/24/2012 11:37 PM, ivenhov wrote:

> Andrew.
>
> On 24 July 2012 22:35, Andrew Scherpbier <[hidden email]> wrote:
>
>> The strategy I've used is to write a simple TCP protocol client (the VFS
>> module) and server (a straight forward threaded Java server).
>> Works like a charm.
>
> I was considering pretty much the same technique using localhost and
> sockets to separate native library from Java service.
>
>
>> As long as the client side is abstracted enough so that its samba
>> connection state is independent from the server connection state, there are
>> no issues with restarting either.  (I started out using a statefull
>> protocol, but ended up changing to a completely stateless one, where the
>> individual messages contain enough information to establish context.  This
>> way, if either end of the system goes down, recovery is the simple act of
>> building a new TCP connection.)
>
> That means you need to carry enough information to resume and also have
> some sort of queue of messages on the client (VFS) that has not been
> delivered yet.
> Is that correct?

Well, this was actually my main reason for using ActiveMQ as the
comm layer initially.  However, it turned out that since what I needed
to pass on to the Java service actually required acknowledgements, the
whole queuing up of requests wasn't actually used!  (20/20 hindsight!)
So it was pretty simple to put enough information in each request to
make the whole thing stateless.
Mind you, there is state kept in the VFS module to track open files, but
for the current project the Java service has no need to know about file
descriptors; it only cares about whole files.

>
>> I originally looked into hosting the JVM in the VFS module, but that was
>> going to be a problem because each smbd process would have to start its own
>> JVM.  The JVM startup time (especially the server JVM) is very high and the
>> memory overhead would not make it scalable.
>
> Why do you need several smbd on single host? Is it because of high
> availability or some latency issues you wanted to remove?
> Or did I misinterpreted that?

Because each connection to Samba creates a new smbd process.  So 100
clients == 100 smbd processes.   (Here I'm saying a client is a Windows
computer.  Multiple programs running on a single Windows computer
under the same profile will share the connection.)

Nothing wrong with that.  That's just the way Samba works.

>
>> The system I'm working on now manages PB class storage (currently up to
>> 10PB) with hundreds of concurrent clients and the VFS module does this
>> without issues or much overhead.  We're regularly seeing write speeds in
>> the 400-500MB/s range using 10GbE and multiple windows clients.
>
> Do you use hot-standby Samba server for failover, clustered Samba etc? If
> yes, how do you achieve that if you don't mind telling?

Our system uses active-passive failover using heartbeat.  We're not
clustering, although that's on the roadmap.  We're using heartbeat
simply because it is already well supported by the OS and its management
tooling that we're using.
So we use some straightforward FC storage for the database and local
storage, and heartbeat takes care of the switching.
For the current target market (video surveillance) this setup works well
enough, since there are only going to be a limited number of clients that
write continuously, and they can deal with the small hiccup that occurs
when a failover happens.

>
> And quick question about notification. If in your system file appears or
> was modified outside Samba, is there a way of notifying Samba clients about
> that change?
That's an excellent question and one that we've discussed a lot
internally.  The design decision ended up being that we will only
support client access through Samba.
As a simple safeguard, however, when the Java service starts, it kicks
off a filesystem scanner running under the ionice idle class to see if
any files were somehow created without its knowledge.
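
A scanner of that kind could look roughly like the Python sketch below. It is only a guess at the idea, not the actual scanner: `find_untracked` and the `known_paths` set are hypothetical stand-ins for whatever index the service keeps.

```python
import os
import tempfile


def find_untracked(root, known_paths):
    """Walk `root` and return relative paths of files missing from `known_paths`."""
    untracked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if rel not in known_paths:
                untracked.append(rel)
    return sorted(untracked)
```

To keep the scan from competing with client I/O, the script would be started under the idle I/O scheduling class, for example `ionice -c 3 python3 scanner.py` (the script name is made up).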

If our clients end up wanting NFS access to our files, we'll have to
switch from using VFS to using something like FUSE or create our own
kernel module.  Our current solution for those clients is to tell them
to use smbmount.  :-)

> Notification that goes from VFS layer so Samba and then to Windows clients
> to refresh directory, Explorer view etc.
> I think that mechanism exists in Samba via inotify but I may be wrong, I'm
> Samba newbie.

The default_vfs takes care of all that using oplocks, etc.  (Please
correct me if I'm wrong, Samba gurus!)  So if you are hooking all the I/O
calls, you'll need to do that yourself.  Fortunately, for my
application, I hook the calls, but eventually pass control over to the
default_vfs stuff.
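
The "hook the call, then pass control on" pattern can be sketched in Python as below. This is an analogy only: `DefaultOps`, `HookedOps` and the `on_closed` callback are invented names, and a real module would do this in C against Samba's VFS function table. The hooked layer keeps its own handle-to-path state, lets the default layer do the real work, and tells the backend about a file only once it has been closed.

```python
class DefaultOps:
    """Stand-in for the default layer; files live in an in-memory dict."""

    def __init__(self):
        self.files = {}
        self.open_handles = {}
        self._next_handle = 3

    def open(self, path):
        handle = self._next_handle
        self._next_handle += 1
        self.open_handles[handle] = path
        self.files.setdefault(path, b"")
        return handle

    def write(self, handle, data):
        self.files[self.open_handles[handle]] += data
        return len(data)

    def close(self, handle):
        del self.open_handles[handle]
        return 0


class HookedOps:
    """Intercepts open/close and delegates the actual I/O to the next layer."""

    def __init__(self, next_ops, on_closed):
        self._next = next_ops
        self._on_closed = on_closed  # hypothetical callback into the backend
        self._paths = {}             # module-local state: handle -> path

    def open(self, path):
        handle = self._next.open(path)
        self._paths[handle] = path
        return handle

    def write(self, handle, data):
        return self._next.write(handle, data)  # pure pass-through

    def close(self, handle):
        result = self._next.close(handle)
        # The backend only ever hears about whole, closed files.
        self._on_closed(self._paths.pop(handle))
        return result
```

Anything the default layer does as a side effect of these calls still happens, because the hooked layer never replaces the work, it only wraps it.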

However, again for my specific application, it turns out not to matter.
The reason for this is that a PB-class system is unlikely to be
used by interactive users; it is used by applications, and they
generally don't care too much about the notifications.

Andrew


Re: Custom VFS

Andrew Scherpbier
In reply to this post by Jeremy Allison

On 07/24/2012 04:22 PM, Jeremy Allison wrote:

> Wow - that's really cool stuff !
>
> I'm glad the VFS works so well for you. I wanted to give you
> a heads-up on the changes we're making to the VFS moving
> forward with 4.0.x and above - take a look at the changes
> Volker made for the pread() -> pread_send_fn()/pread_recv_fn()
> and pwrite() -> pwrite_send_fn()/pwrite_recv_fn() in order to
> make the VFS async (and allow pthreaded implementations to
> be hidden under the covers).
>
> Sample implementations are in source3/modules/vfs_default.c
> in:
>
> vfswrap_pread_send()/vfswrap_asys_ssize_t_recv()
> vfswrap_pwrite_send()/vfswrap_asys_ssize_t_recv()
>
> It makes the VFS a little more complicated, but should
> enable you to get more performance out of it.

Interesting stuff.  Right now I'm letting default_vfs do all the
low-level I/O, so any improvements in speed you guys make should
immediately be useful!
So does this mean that the VFS module will need to be changed to be
thread-safe?  That would actually be a significant issue.  I'm not too
familiar with pthreads and don't know too much about the low-level
implications WRT errno, etc.   (I'm mostly a Java weenie nowadays,
sorry!  The last time I used threads in C++ was a couple of years ago, using
Boost under Windows.)


>
> We're also thinking longer term about changing the
> model of keeping the current working directory as
> the root of the exported service and changing the
> internals of Samba to chdir() to the parent directory
> of any path currently being processed - this allows
> easier security checks inside smbd and reduces the
> opportunity for pathname check race conditions.
For what I'm doing now, I don't think that matters much, other than for the
realpath calls, I believe.  Since I'm only dealing with files *after*
they have been closed, the only thing I'm worried about is getting the
right path to the files.
> Feedback very welcome - especially from someone
> who has implemented a couple of production Samba
> VFS modules already :-).

My main gripe with the VFS stuff is the lack of documentation.  What I'd
like to see is at least a call flow to make it easier for module writers
to figure out what calls to hook.  For example, does create_file call
open, or do both need to be implemented/hooked?  I unfortunately happen
to have lots of experience with Windows kernel calls because I also
wrote a filter-driver-based FS for Windows in a previous life, so I know
how complicated the create_file call is (Thanks, Microsoft!).  The fact
that you don't need to hook it is awesome, but that's not explained
anywhere I could find.

Or at least detailed docs on the individual hooks, what they are
supposed to do, why they are called, what their side effects are
supposed to be, etc.  (Doxygen docs in the code would be awesome!)

I spend way too much time running "grep -rn something" on the samba
source and following ctags right now  :-(


Don't get me wrong!  I love working on this stuff, but the VFS module is
a small (but important) part of the bigger system and I end up spending
a disproportionate amount of time on the module because of the lack of
documentation.

> Thanks !
>
> Jeremy.

--
Andrew Scherpbier
[hidden email]


Re: Custom VFS

Jeremy Allison
On Wed, Jul 25, 2012 at 09:44:25AM -0700, Andrew Scherpbier wrote:
> Interesting stuff.  Right now I'm letting default_vfs do all the
> low-level I/O, so any improvements in speed you guys make should
> immediately be useful!
> So does this mean that the VFS module will need to be changed to be
> thread-safe?  That actually will be a significant issue.  I'm not
> too familiar with pthreads and don't know too much about the low
> level implications WRT errno, etc.   (I'm mostly a Java weenie
> nowadays, sorry!  Last time I used threads in C++ was a couple years
> ago using Boost under Windows)

No, you won't need to make your VFS module thread-safe
unless you're using threads internally. Most of Samba is
not thread-safe (although we're moving very slowly there)
so you have to be very careful in how you use them.

Check out modules/vfs_aio_pthread.c for an example
if you're interested.

> What I'd like to see is at least a call flow to make it easier for module
> writers to figure out what calls to hook.  For example, does
> create_file call open or do both need to be implemented/hooked?  I
> unfortunately happen to have lots of experience with windows kernel
> calls because I also wrote a filter-driver based FS for windows in a
> previous life, so I know how complicated the create_file call is
> (Thanks, Microsoft!).  The fact that you don't need to hook it is
> awesome, but that's not explained anywhere I could find.

Yeah - that one is hard. It's really a 2-level VFS at that point.
The default implementation of CreateFile calls open() internally
to get the fd.
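
The two-level relationship can be pictured with a small Python analogy. The class and method names here are illustrative only and are not Samba's real structures. The high-level default `create_file` obtains its descriptor by calling the low-level `open` through the stack, so a module that overrides only `open` is still reached.

```python
class DefaultVfs:
    """Stand-in for the default implementation of both levels."""

    def open(self, path, flags):
        # Low-level call: produce a descriptor for the path.
        return ("default-fd", path, flags)

    def create_file(self, path, flags):
        # High-level call: other checks would happen here, then it goes
        # through the low-level open() of whichever module is on top.
        return {"path": path, "fd": self.open(path, flags)}


class BackendVfs(DefaultVfs):
    """A module that hooks only the low-level open()."""

    def open(self, path, flags):
        return ("backend-fd", path, flags)
```

Calling `BackendVfs().create_file("a.txt", 0)` yields a descriptor tagged `backend-fd` even though `create_file` itself was never overridden, which is the sense in which the high-level call does not need to be hooked.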

> Or at least detailed docs on the individual hooks, what they are
> supposed to do, why they are called, what their side effects are
> supposed to be, etc.  (Doxygen docs in the code would be awesome!)
>
> I spend way too much time running "grep -rn something" on the samba
> source and following ctags right now  :-(
>
>
> Don't get me wrong!  I love working on this stuff, but the VFS
> module is a small (but important) part of the bigger system and I
> end up spending a disproportionate amount of time on the module
> because of the lack of documentation.

Thanks for the feedback. We'll see what we can fix.

Cheers,

        Jeremy.