rsync ingest to new storage environment


rsync ingest to new storage environment

Samba - rsync mailing list
All,

I am seeding a new storage environment (GlusterFS on XFS) and would like to gather advice on best practices.  This data is almost entirely media, so it does not compress well.

So far I have made one pass of a 20TB directory tree into the environment as follows:

- nfs mount from old storage to new storage
- rsync -av /old/storage/* /new/storage/directory

Once the directories and files were on the new storage, I did:

- chown -R root:root
- chmod -R 774

I'll need to do a couple more syncs prior to full cutover.

Questions regarding performance:

- Does anyone have any suggestions on how to achieve the best performance (speed)?

     - Is a local NFS mount from old storage to new storage the best option?  If so, are there specific mount options that should be used?
     - Any specific rsync flags (I've tested with and without the 'z' flag and it does not help with this data) or best practices?

Questions regarding rsync behavior:

- When I test individual directory resyncs within the initial ingest tree, a command such as:

     rsync -av --no-perms --no-owner --no-group /old/storage/dir /new/storage/directory/dir

 lists all of the directories under 'dir' in the shell.  But if I rerun the command immediately afterwards, nothing is listed.  Where is this 'metadata' about what is 'already on the destination' stored?  Is it only stored while the shell is open?  I want to set up a cron job going forward and would like to make sure all of the info is available.

Any guidance is greatly appreciated.

Thanks in advance,

HB



--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: rsync ingest to new storage environment

Samba - rsync mailing list
If rsync isn't doing the networking, you are better off with cp -au
instead of rsync.  It should be significantly faster, and you can do a
final pass with rsync to catch any files that got truncated by a ^C.
(cp -u can only skip destination files that are newer than the source;
it cannot tell whether two files actually differ, and a truncated file
will look newer because its timestamp never got back-dated.)
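
A minimal sketch of that cp-then-rsync sequence.  Everything here is an
assumption for illustration: scratch directories created with mktemp
stand in for /old/storage and /new/storage, the file name is made up,
and the ^C is simulated by truncating the destination file.  It assumes
GNU cp (the -u flag is not available in every cp) and an installed
rsync:

```shell
#!/bin/sh
set -eu
# The demo needs rsync; skip where it is not installed.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping demo"; exit 0; }

# Scratch directories standing in for /old/storage and /new/storage.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
src="$work/old"; dst="$work/new"
mkdir -p "$src/dir" "$dst"
printf 'media payload\n' > "$src/dir/clip.bin"
touch -t 201710180000 "$src/dir/clip.bin"   # give the source an old mtime

# Bulk pass with GNU cp: -a preserves attributes, -u skips files whose
# destination copy is not older than the source.
cp -au "$src/." "$dst/"

# Simulate a copy interrupted by ^C: the destination file is truncated
# and its mtime becomes "now", which is newer than the source.
: > "$dst/dir/clip.bin"

# A second cp -au leaves the truncated file alone because it looks newer.
cp -au "$src/." "$dst/"
size_after_cp=$(wc -c < "$dst/dir/clip.bin")

# The final rsync pass compares size and mtime, sees the mismatch and
# copies the file again.
rsync -a "$src/" "$dst/"
size_after_rsync=$(wc -c < "$dst/dir/clip.bin")

echo "after cp -au: $size_after_cp bytes, after rsync: $size_after_rsync bytes"
```

On a real 20TB tree the same two steps apply, only with the real source
and destination paths in place of the scratch directories.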

>      rsync -av --no-perms --no-owner --no-group /old/storage/dir
> /new/storage/directory/dir

Note that rsync treats a trailing / on the source parameter differently.
 If you did rsync ... /old/storage/dir /new/storage/dir then you made
/new/storage/dir/dir and duplicated everything into it.  The correct
syntax is rsync ... /old/storage/dir/ /new/storage/dir OR rsync ...
/old/storage/dir /new/storage
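
The three cases above can be tried safely in a scratch directory.  This
is only a sketch: the directory names under the mktemp directory
(old, nested, flat, parent) and file.txt are made up for the demo and
are not the real storage paths.

```shell
#!/bin/sh
set -eu
# The demo needs rsync; skip where it is not installed.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping demo"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/old/dir"
echo data > "$work/old/dir/file.txt"

# 1. No trailing slash on the source, destination already named "dir":
#    rsync creates "dir" inside it, giving nested/dir/dir/file.txt.
mkdir -p "$work/nested/dir"
rsync -a "$work/old/dir" "$work/nested/dir"

# 2. Trailing slash on the source: the contents of "dir" land directly
#    in the destination, giving flat/dir/file.txt.
mkdir -p "$work/flat/dir"
rsync -a "$work/old/dir/" "$work/flat/dir"

# 3. No trailing slash, destination is the parent directory:
#    rsync creates parent/dir/file.txt.
mkdir -p "$work/parent"
rsync -a "$work/old/dir" "$work/parent"

# Show where file.txt ended up in each case.
find "$work" -name file.txt | sort
```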

--
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
        Kevin Korb Phone:    (407) 252-6853
        Systems Administrator Internet:
        FutureQuest, Inc. [hidden email]  (work)
        Orlando, Florida [hidden email] (personal)
        Web page: http://www.sanitarium.net/
        PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,



Fwd: rsync ingest to new storage environment

Samba - rsync mailing list




> If rsync isn't doing the networking you are better off with cp -au
> instead of rsync.  It should be significantly faster and you can do a
> final pass with rsync to get any files that got truncated by a ^C (cp
> can only skip files that are newer not files that are not different and
> a truncated file will be newer since it never got back-dated).

Thanks, I will run some tests.  Is there any performance gain from letting rsync do the networking?

>>      rsync -av --no-perms --no-owner --no-group /old/storage/dir
>> /new/storage/directory/dir

> Note that rsync treats a trailing / on the source parameter differently.
> If you did rsync ... /old/storage/dir /new/storage/dir then you made
> /new/storage/dir/dir and duplicated everything into it.  The correct
> syntax is rsync ... /old/storage/dir/ /new/storage/dir OR rsync ...
> /old/storage/dir /new/storage

Yes, I just typed it wrong in the example.





Re: Fwd: rsync ingest to new storage environment

Samba - rsync mailing list
Simply put, when you use a network mount, rsync operates as a local copy
with extra features.  --whole-file is forced (and forcing it off makes
things worse), and rsync performs worse than cp.

If you are only transferring new files, then NFS+cp is probably about the
same as rsync to an rsyncd daemon and probably faster than rsync over ssh,
but this all depends on the setup, especially if openssh isn't optimized
for data transfer.

Finally, if you have a lot of changing files, rsync (via rsyncd or ssh)
can be much faster thanks to the delta-transfer algorithm, which is
disabled by --whole-file.
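
To see the whole-file behaviour for yourself, here is a rough sketch
that compares the --stats output of a default local copy with one where
the delta algorithm is forced on via --no-whole-file.  The scratch
paths and file name are invented for the demo, and the parsing assumes
the "Matched data: N bytes" line that --stats prints in the rsync
versions I am aware of; treat both as assumptions.  It only shows how
much data gets matched, not which mode is faster.

```shell
#!/bin/sh
set -eu
# The demo needs rsync; skip where it is not installed.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping demo"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src" "$work/whole" "$work/delta"

# 1 MiB of incompressible data, loosely similar to media files.
head -c 1048576 /dev/urandom > "$work/src/clip.bin"

# Seed both destinations, then change the source slightly.
rsync -a "$work/src/" "$work/whole/"
rsync -a "$work/src/" "$work/delta/"
printf 'a few appended bytes\n' >> "$work/src/clip.bin"

# Pull the byte count out of the "Matched data" stats line, dropping
# any thousands separators.
matched() { sed -n 's/^Matched data: \([0-9.,]*\) bytes.*/\1/p' | tr -d ',.'; }

# Default local copy: whole-file mode, so nothing is matched against
# the old destination copy and the file is rewritten in full.
whole_matched=$(rsync -a --stats "$work/src/" "$work/whole/" | matched)

# Same update with the delta algorithm forced on: the unchanged blocks
# are matched, at the cost of reading and checksumming both copies.
delta_matched=$(rsync -a --stats --no-whole-file "$work/src/" "$work/delta/" | matched)

echo "matched bytes, whole-file: $whole_matched"
echo "matched bytes, delta:      $delta_matched"
```

Over an NFS mount both copies are read through the mount, so matching
blocks does not save any network traffic; the delta algorithm only pays
off when rsync runs on both ends (rsyncd or ssh).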

--
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
        Kevin Korb Phone:    (407) 252-6853
        Systems Administrator Internet:
        FutureQuest, Inc. [hidden email]  (work)
        Orlando, Florida [hidden email] (personal)
        Web page: http://www.sanitarium.net/
        PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,

