file reorgs: moving files & maintaining directory structures

file reorgs: moving files & maintaining directory structures

Samba - linux mailing list
I’m in the middle of a re-org of files that have built up over more than 10 years in a Download directory.

I could use ‘cp -a’ or ‘rsync’ to clone the whole directory as is, but that wouldn’t re-organise the hierarchy or move just the old files to a backup location.
There’s ~100GB in ~100,000 files on a desktop computer, which means I have to partition the data and faff about.

My first step was to identify and remove duplicate copies of files. De-duplicating before backup / reorg seemed a good idea :)
Then I pushed software and code / packages into more appropriate places, deleted old temporary files.

My next action was to identify the oldest half of the files. ‘find -mtime’ wasn’t suitable - it presumes you already know the cutoff age in days.
I tried ‘-newer’ against a timestamped file - that wasn’t what I wanted either…

Discovered ‘stat’ and that it’d print a/c/m-time in Unix epoch seconds - good for sorting.
From that, it was easy to create a file-list of the oldest half, then use tar to copy files.
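
A minimal sketch of that stat-based approach, assuming GNU find, stat and sort. The function names and the three-column layout (mtime, size, path) are illustrative assumptions, chosen so that the path lands in field 3 as in the ‘cut -f3’ commands further down. Paths containing tabs or newlines would break it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and column layout are assumptions for illustration.

# list_by_mtime DIR
# Print "mtime<TAB>size<TAB>path" for every regular file under DIR, oldest first.
# GNU stat: %Y = mtime in epoch seconds, %s = size in bytes, %n = file name.
list_by_mtime() {
    (
        cd "$1" || exit 1
        find . -type f -print0 |
            xargs -0 -r stat --printf '%Y\t%s\t%n\n' |
            sort -t $'\t' -k1,1n
    )
}

# oldest_half LISTFILE
# Print the first (oldest) half of a list produced by list_by_mtime.
oldest_half() {
    local total
    total=$(wc -l < "$1")
    head -n "$(( total / 2 ))" "$1"
}
```

Possible usage: list_by_mtime "$DN" > file-list, then oldest_half file-list > oldest-list.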

‘tar’ & ‘cpio’ create the whole directory hierarchy, whereas ‘mv’ doesn’t.
‘cp -a’ doesn’t take a file list from STDIN. You’re supposed to use ‘xargs’ some way :(
‘rsync’ with ‘--remove-source-files’ & ‘--prune-empty-dirs’ (only discovered those in the research) almost did what I wanted.
I wanted to be able to feed it, like tar & cpio, a list of the files I wanted to copy or move, but have never known how to do that.
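
For what it’s worth, rsync can be fed a list in much the same way as tar & cpio, through its ‘--files-from’ option (‘-’ means STDIN). A minimal sketch, assuming the paths in the list are relative to the source directory; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# move_listed_rsync SRC DST   (hypothetical helper name)
# Reads newline-separated paths, relative to SRC, from STDIN.
# rsync recreates each file's directory path under DST and removes the
# source file once it has been transferred.
# Source directories are left in place, even if they end up empty.
move_listed_rsync() {
    local src=$1 dst=$2
    rsync -a --files-from=- --remove-source-files "$src"/ "$dst"/
}
```

Possible usage: cut -f3 file-list | move_listed_rsync "$DN" "$v"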

> cut -f3 file-list | sort | tar -C "$DN" -T - -cf - | tar -xvpf - -C "$v"

I really would’ve liked to use something like this, to avoid a) the pipeline and unnecessary processes and b) all the bugs in tar.

> cut -f3 file-list | tr '\n' '\0' | xargs -0 sexy-mv-cmd -t "$v"

gnu-mv has the -t option, which _almost_ does everything I wanted.
>  -t, --target-directory=DIRECTORY
>               move all SOURCE arguments into DIRECTORY
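
‘mv -t’ drops every source into the one target directory, so the relative paths are lost. A small wrapper can fill the gap by creating each file’s parent directory first. This is only a sketch of what the wished-for command might look like: move_keeping_tree is a made-up name, and it assumes the list holds paths relative to the source directory, one per line.

```shell
#!/usr/bin/env bash
# move_keeping_tree SRC DST   (hypothetical helper name)
# Reads newline-separated paths, relative to SRC, from STDIN and moves each
# file to the same relative path under DST, creating directories as needed.
# Returns non-zero if any mkdir or mv failed.
move_keeping_tree() {
    local src=$1 dst=$2 rel status=0
    while IFS= read -r rel; do
        rel=${rel#./}                       # drop a leading "./" as printed by find
        mkdir -p -- "$dst/$(dirname -- "$rel")" &&
            mv -- "$src/$rel" "$dst/$rel" || status=1
    done
    return "$status"
}
```

Possible usage: cut -f3 file-list | move_keeping_tree "$DN" "$v"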


Questions:

1. Are there good DeDuplication tools you can recommend based on SHA1 or MD5? I had to invent my own :(

2. What tools have other people used for this sort of work? [Reorganising and partitioning by date or size]

cheers
steve

--
Steve Jenkin, IT Systems and Design
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA

mailto:[hidden email] http://members.tip.net.au/~sjenkin


--
linux mailing list
[hidden email]
https://lists.samba.org/mailman/listinfo/linux

Re: file reorgs: moving files & maintaining directory structures

On 26/08/17 14:54, steve jenkin via linux wrote:
> Questions:
>
> 1. Are there good DeDuplication tools you can recommend based on SHA1 or MD5? I had to invent my own :(
>
> 2. What tools have other people used for this sort of work? [Reorganising and partitioning by date or size]

I've used fdupes a few times with great success.  Some versions (not all) have an option
to delete all but one of the duplicate files and hard link to the one copy.  This can make
a huge difference.
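
For comparison, a hand-rolled checksum pass of the kind the question mentions can be sketched with standard tools. This is not how fdupes works internally, just an illustration assuming GNU coreutils (sha1sum, and uniq with -w and --all-repeated); find_dupes is a made-up name. It hashes every file, so expect it to be slow on large trees.

```shell
#!/usr/bin/env bash
# find_dupes DIR   (hypothetical helper name)
# Print "sha1  path" lines for every regular file under DIR whose SHA-1
# matches at least one other file. Groups are separated by a blank line.
find_dupes() {
    find "$1" -type f -print0 |
        xargs -0 -r sha1sum |               # "<40 hex digits>  <path>"
        sort |                              # bring equal checksums together
        uniq -w40 --all-repeated=separate   # compare only the 40-char checksum
}
```

Possible usage: find_dupes ~/Download > dupes.txt, then review the list by hand before deleting anything.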

Brett

--
  /) _ _ _/_/ / / /  _ _//
 /_)/</= / / (_(_/()/< ///


Re: file reorgs: moving files & maintaining directory structures

>
> Questions:
>
> 1. Are there good DeDuplication tools you can recommend based on SHA1 or MD5? I had to invent my own :(
>
> 2. What tools have other people used for this sort of work? [Reorganising and partitioning by date or size]
>

1. I have also found fdupes to be useful
2. I normally just copy it all and procrastinate cleaning it up!



This set of code could probably do what you seem to want in terms of moving a bunch
of files (oldest approximately half - I took your 100,000 as accurate)


find . -type f -printf '%T@ -*- %p\n'  |        # Get files with mtime in Seconds (and decimals)
 sort -n  |     # Sort from oldest to newest
  sed 's/^[0-9.]* -\*- //' |    # Prune Seconds and delimiter from front (the * must be escaped)
   head -n 50000 > /tmp/oldest_half     # Select oldest half - or prune this file where you want
cpio -pvdm /mnt/some/new/location < /tmp/oldest_half &&  # Copy them - GNU cpio prints its -v listing on stderr, not stdout
 xargs -d '\n' rm < /tmp/oldest_half    # Clean up the originals, only if cpio exited successfully


Break the pipeline wherever you want to give you the control you need

If you want to edit the initial list to find a logical break you may want to add %t
to your find line to get a human readable time

find . -type f -printf '%T@ %t -*- %p\n'  |     # Get files with mtime in Seconds (and decimals)
 sort -n  > /tmp/temp_file      # Sort from oldest to newest
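
Once a break point has been picked from that list, the paths on the old side of it can be pulled back out. A rough sketch, assuming GNU date and the ' -*- ' delimiter used above; paths_older_than is a hypothetical name, and the cutoff date in the usage line is only an example.

```shell
#!/usr/bin/env bash
# paths_older_than LISTFILE DATE   (hypothetical helper name)
# LISTFILE holds lines of the form "<epoch seconds> [human time] -*- <path>".
# Prints the path of every entry whose mtime is earlier than DATE.
paths_older_than() {
    local list=$1 cutoff
    cutoff=$(date -d "$2" +%s) || return 1   # GNU date: parse DATE into epoch seconds
    awk -v cutoff="$cutoff" '
        $1 + 0 < cutoff {
            i = index($0, " -*- ")           # first delimiter; the path follows it
            if (i) print substr($0, i + 5)
        }' "$list"
}
```

Possible usage: paths_older_than /tmp/temp_file '2012-01-01' | cpio -pdm /mnt/some/new/location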


Dave !





Re: file reorgs: moving files & maintaining directory structures

Dave,

thanks for the confirm on fdupes.

And thanks also for the better options for find -printf: much better than my use of xargs & stat :)

cheers
steve

> On 26 Aug 2017, at 18:51, David Deaves <[hidden email]> wrote:
>
> 1. I have also found fdupes to be useful
> 2. I normally just copy it all and procrastinate cleaning it up!

Re: file reorgs: moving files & maintaining directory structures

Brett,

Thanks very much for that. Works nicely.

Tried it on another fileset.
20GB & 1.4M files, took 34mins, now I’ve got 450,00 lines to trawl :)

cheers!
steve

> On 26 Aug 2017, at 15:20, Brett Worth via linux <[hidden email]> wrote:
>
>
> I've used fdupes a few times with great success.  Some versions (not all) have an option
> to delete all but one of the duplicate files and hard link to the one copy.  This can make
> a huge difference.
>
> Brett