
Using git-annex as a podcatcher
Tags: [git-annex]
Published: 16 Nov 2015 14:15

My podcast consumption loop is currently:

I follow about 10 podcasts so this manual process isn’t that bad.

I’ve been looking into git-annex recently for my home directory and found a page on using git-annex as a podcatcher. So I decided to try it.

First, a bash script iterates over a list of podcast feed URLs and passes those to git annex importfeed:

# casts is a bash array of podcast feed URLs; quoting keeps each URL intact.
for c in "${casts[@]}";
do
    echo "${c}";
    git annex importfeed \
        "${c}" \
        --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}'
done

I use the default template but with the itempubdate prefix - some podcasts don’t include the date in their title.
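
The loop assumes a casts array that is not shown here. A minimal sketch of one way to supply it, assuming the feed URLs are kept one per line in a plain-text file (the list_feeds helper and the feeds.txt name are hypothetical, not part of my actual script):

```shell
# Hypothetical helper: print the feed URLs stored one per line in a file,
# skipping blank lines and lines that start with '#'.
list_feeds() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# In bash, the output can fill the casts array used by the loop:
#   mapfile -t casts < <(list_feeds feeds.txt)
```

Keeping the URLs in a separate file means that adding a podcast is a one-line change and the script itself never needs editing.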

On the Sansa, in the root directory:

git clone ~/podcasts/
git annex init sansa

From the root directory it is then possible to

git annex get The_Tim_Ferriss_Show/2014_04_18-Episode_1__Kevin_Rose.mp3

and this will grab that episode from ~/podcasts. Once I have listened to it, I can drop it from the Sansa by running the following in the ~/podcasts/ directory:

git annex drop --from sansa The_Tim_Ferriss_Show/2014_04_18-Episode_1__Kevin_Rose.mp3

However, this was not ideal: git-annex detects when the underlying file system doesn’t support all the features it needs and falls back to direct mode. Instead of using symlinks into the .git/annex/ directory, direct mode uses a text placeholder file containing the information needed to access the .git/annex/ object.

Unfortunately with the Rockbox interface and direct mode, it’s impossible to tell which file is a text placeholder and which is an actual audio file. So I had to undo the annex on the Sansa. I followed the procedure detailed in a comment about removing special remotes:

“The best way to remove a special remote is to first git annex move --from $remote to get all the content out of it, then git annex dead $remote and finally you can git remote rm $remote”

So, from ~/podcasts/:

git annex drop --from sansa .
git annex dead sansa
git remote remove sansa

Followed by some rm -rf on the Sansa.

The method I use now is to (again) rsync from ~/podcasts/ to the Sansa. The difference is that I don’t keep the entire podcast library on my laptop - most files are on my file server in another annex. I pull from there when I want to listen to something, and then rsync to the Sansa using the following command:

rsync -av --prune-empty-dirs --copy-links --delete \
    --delete-excluded --ignore-errors \
    --exclude=/.git/* --exclude=/*.sh \
    . "${sansadir}" 2> /dev/null

  1. --copy-links replaces each symlink with the file it points to at the destination.
  2. --delete makes sure that files dropped in ~/podcasts/ are deleted on the Sansa. Calling git annex drop leaves a broken symlink, which --copy-links cannot turn into a file, so --delete removes the corresponding file on the Sansa.
  3. --delete-excluded and --exclude=/.git/* make sure that the giant .git directory is not copied over.
  4. --exclude=/*.sh makes sure that management scripts are not copied over.
  5. --prune-empty-dirs removes podcast directories that have no files in them on the Sansa.
  6. 2> /dev/null quiets errors about broken symlinks.
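
Because --delete removes anything on the destination that is missing from the source, it may be worth checking the target before syncing. A minimal sketch of how a wrapper script could do that, assuming the destination is passed as an argument (the sync_to_sansa function and the /media/sansa path in the usage comment are hypothetical, not taken from my actual script):

```shell
# Hypothetical guard: refuse to sync unless the target directory exists,
# so that rsync is never pointed at a missing or mistyped path.
sync_to_sansa() {
    sansadir="$1"
    if [ ! -d "${sansadir}" ]; then
        echo "Sansa directory not found: ${sansadir}" >&2
        return 1
    fi
    # Same rsync invocation as above, with the exclude patterns quoted.
    rsync -av --prune-empty-dirs --copy-links --delete \
        --delete-excluded --ignore-errors \
        --exclude='/.git/*' --exclude='/*.sh' \
        . "${sansadir}" 2> /dev/null
}

# Usage (the mount point is an assumption):
#   cd ~/podcasts && sync_to_sansa /media/sansa
```

Adding rsync's --dry-run flag to the command is another way to preview what would be copied or deleted before anything is changed.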

So:

jwm@magnus:~/podcasts$ git annex get The_Tim_Ferriss_Show/2014_04_18-*

jwm@magnus:~/podcasts$ find . -type l -exec test -e {} \; -print
./The_Tim_Ferriss_Show/2014_04_18-Episode_1__Kevin_Rose.mp3
./The_Tim_Ferriss_Show/2014_04_18-Episode_2__Joshua_Waitzkin.mp3

jwm@magnus:~/podcasts$ ./to_sansa.sh
building file list ... done
The_Tim_Ferriss_Show/
The_Tim_Ferriss_Show/2014_04_18-Episode_1__Kevin_Rose.mp3
The_Tim_Ferriss_Show/2014_04_18-Episode_2__Joshua_Waitzkin.mp3

sent 176,777,539 bytes  received 53 bytes  3,721,633.52 bytes/sec
total size is 176,733,086  speedup is 1.00

Remove the files by calling drop, then run the script again:

jwm@magnus:~/podcasts$ git annex drop .

jwm@magnus:~/podcasts$ ./to_sansa.sh
building file list ... done
deleting The_Tim_Ferriss_Show/2014_04_18-Episode_2__Joshua_Waitzkin.mp3
deleting The_Tim_Ferriss_Show/2014_04_18-Episode_1__Kevin_Rose.mp3
deleting The_Tim_Ferriss_Show/

sent 1,115 bytes  received 164 bytes  2,558.00 bytes/sec
total size is 0  speedup is 0.00
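
The find invocation used earlier lists annexed files whose content is present. A small sketch of that check alongside its complement, assuming the repository uses symlinks as described in this post (the list_present and list_dropped names are hypothetical):

```shell
# List annexed files whose content is present (the symlink target exists).
list_present() {
    find "${1:-.}" -path '*/.git' -prune -o \
        -type l -exec test -e {} \; -print
}

# List annexed files whose content has been dropped (dangling symlink).
list_dropped() {
    find "${1:-.}" -path '*/.git' -prune -o \
        -type l ! -exec test -e {} \; -print
}

# Usage:
#   cd ~/podcasts && list_present    # roughly what rsync will copy
#   cd ~/podcasts && list_dropped    # what --delete will clear from the Sansa
```

Running list_present before the sync script gives a rough preview of what will end up on the player.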

The main advantage to using git-annex here is that it automatically downloads new items from the podcast feeds. Automating this saves me time. The ability to only have a small portion of ‘checked-out’ files on my laptop is also highly useful as it doesn’t have a large amount of storage.

The location tracking doesn’t apply to the files on the Sansa though. I’m not aware of a way to have git-annex work in direct mode without using text file placeholders. If there were, I would use that instead of using rsync like this.