Rsync between Windows Folders

Following from the last post, here is an example script that uses cwrsync to sync a network share and another folder. I had to map the network share to a drive before I could use it properly.

@ECHO OFF
rem uuid = 7bf0dca6-5cde-4e41-b1d5-d75f4002abb5

SET CWRSYNCHOME=C:\PROGRAM FILES\CWRSYNC
SET CYGWIN=nontsec
SET CWOLDPATH=%PATH%
SET PATH=%CWRSYNCHOME%\BIN;%PATH%
SET HOME=C:\Program Files\cwRsync\bin
SET USERNAME=troy
SET RSYNC_RSH=ssh.exe

rem Use the --dry-run option to show what would happen
rem sync from server to local

echo Pulling changes from the iBlast pdf papers
rsync -vrt  /cygdrive/w/documents/ "/cygdrive/c/my documents/research"
pause
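
The paths in the rsync line above use the cygwin convention of exposing each drive letter under /cygdrive. As a rough illustration of that mapping, here is a small Python sketch; the helper name to_cygdrive is made up for this example and is not part of cwrsync.

```python
from pathlib import PureWindowsPath


def to_cygdrive(windows_path):
    """Translate a drive-letter path such as W:\\documents into /cygdrive/w/documents."""
    path = PureWindowsPath(windows_path)
    drive = path.drive  # e.g. 'W:' for a mapped drive
    if len(drive) != 2 or drive[1] != ":":
        # UNC paths (\\server\share) have no drive letter, which is why the
        # share has to be mapped to a drive first.
        raise ValueError("expected a drive-letter path, got %r" % windows_path)
    # parts[0] is the drive anchor; the remaining parts are the folder names
    return "/".join(["/cygdrive", drive[0].lower()] + list(path.parts[1:]))


print(to_cygdrive(r"W:\documents"))
print(to_cygdrive(r"C:\my documents\research"))
```

The helper drops any trailing slash, so add one by hand on the source path when you want rsync to copy the contents of a folder rather than the folder itself.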

Sync files from Windows to Linux using SSH

Over the weekend I decided to figure out how to sync files between Windows-based computers and Linux-based computers, specifically Ubuntu. On Windows I investigated a number of technologies and finally settled on cwrsync, because I really like rsync: I have a number of scripts that work really well (and are fast) that I use on my Linux boxes on a regular basis. rsync is also available in cygwin, but a full cygwin install is far too heavy for simple file synchronization. cwrsync is the best of both worlds. It packages the cygwin DLL and rsync binaries in a form that is easy to use on Windows.

You’ll need to download the cwrsync package and install it on windows. Also it is a good idea to install putty so that you can test your connectivity to your Linux box through ssh. That will eliminate some of the frustration. This article doesn’t go into setting up an ssh server. It is very easy and a quick Google will find detailed tutorials on the subject of setting up an ssh server.

The first script is a Windows batch file that pulls changes from the server to the Windows box. The second script pushes the changes from the Windows box to the Linux box. They are virtually identical except for the order of the paths in the rsync command (source first, then destination).

The pull script:

@ECHO OFF
rem uuid = 54b06855-2937-4476-800b-1c6f5af37d18
echo Pulling changes from server to local

SET CWRSYNCHOME=C:\PROGRAM FILES\CWRSYNC
SET CYGWIN=nontsec
rem Save the original PATH before prepending the cwRsync binaries
SET CWOLDPATH=%PATH%
SET PATH=%CWRSYNCHOME%\BIN;%PATH%
SET HOME=C:\Program Files\cwRsync\bin

rem Use the --dry-run option to show what would happen
rem sync from server to local

rsync -rtvze "ssh -p 3687 -i '/cygdrive/c/path/to/keys/rsa.key'" user@ssh.server.com:"'/home/user/files to sync/'" "/cygdrive/c/local files/to/sync"

pause

The push script:

@ECHO OFF
rem uuid = f989134a-d48e-4ee7-9de2-2bf758764294
echo Push changes from local to server

SET CWRSYNCHOME=C:\PROGRAM FILES\CWRSYNC
SET CYGWIN=nontsec
rem Save the original PATH before prepending the cwRsync binaries
SET CWOLDPATH=%PATH%
SET PATH=%CWRSYNCHOME%\BIN;%PATH%
SET HOME=C:\Program Files\cwRsync\bin

rem Use the --dry-run option to show what would happen
rem sync from local to server

rsync -rtvze "ssh -p 3687 -i '/cygdrive/c/path/to/keys/rsa.key'" "/cygdrive/c/local files/to/sync"  user@ssh.server.com:"'/home/user/files to sync/'"
pause

First off, I use two scripts to make sure that I don't accidentally push changes when I meant to pull them! If you are like me, your home server uses a non-standard ssh port; that is what the '-p 3687' part of the rsync command specifies. I also disallow password-based logins and only use key authentication. The private key is specified by the -i '/cygdrive/c/path/to/keys/rsa.key' part of the command. Note the single quotes around the path: they are there to deal with spaces. Of the remaining options, -r recurses into directories, -t preserves modification times, -v is verbose, -z compresses the data in transit and -e sets the remote shell command.
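
To make the "same command, swapped paths" point concrete, here is a minimal Python sketch that assembles the two argument lists. The function name build_rsync_command is invented for this example; the host, port, key and folders are the placeholder values from the batch files above.

```python
def build_rsync_command(source, destination, port, key_path, dry_run=False):
    """Return the rsync argument list for one direction of the sync."""
    # The ssh command carries the non-standard port and the private key.
    # The key path is single-quoted so that spaces in it survive.
    ssh = "ssh -p %d -i '%s'" % (port, key_path)
    command = ["rsync", "-rtvz"]
    if dry_run:
        # --dry-run only reports what would be transferred
        command.append("--dry-run")
    command += ["-e", ssh, source, destination]
    return command


REMOTE = "user@ssh.server.com:'/home/user/files to sync/'"
LOCAL = "/cygdrive/c/local files/to/sync"
KEY = "/cygdrive/c/path/to/keys/rsa.key"

pull = build_rsync_command(REMOTE, LOCAL, 3687, KEY, dry_run=True)
push = build_rsync_command(LOCAL, REMOTE, 3687, KEY, dry_run=True)
print(pull)
print(push)
```

Either list could be handed to subprocess.run once the dry run looks right; keeping pull and push as separate, named commands preserves the safety of having two scripts.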

A nice use for these scripts is to create your own, secure, dropbox clone. I find it works very well.

Python Script to Parse PFSense DHCP Log

I have a captive portal set up on my pfsense box which allows my laptops and various other devices to connect through wifi. I was looking at the DHCP logs provided by pfsense the other day and realized that I needed a way to verify the MAC addresses that were requesting IP addresses. I put together a python script that parses the log and attempts to match the MAC addresses in the log against the ones that I know. Enjoy the code, and note that the MACs have been changed.

Here is a sample of the DHCP log file generated by pfsense:

Apr 16 09:19:22 	dhcpd: DHCPACK on 192.168.0.203 to bc:ae:c5:4c:1a:73 (desktop) via vr0
Apr 16 09:19:22 	dhcpd: DHCPREQUEST for 192.168.0.203 (192.168.0.1) from bc:ae:c5:4c:1a:73 (desktop) via vr0
Apr 16 09:19:22 	dhcpd: DHCPOFFER on 192.168.0.203 to bc:ae:c5:4c:1a:73 (desktop) via vr0
Apr 16 09:19:21 	dhcpd: DHCPDISCOVER from bc:ae:c5:4c:1a:73 (desktop) via vr0
Apr 16 09:18:11 	dhcpd: DHCPACK on 192.168.177.238 to 00:00:1b:4e:00:b7 (Wii) via vr1
Apr 16 09:18:11 	dhcpd: DHCPREQUEST for 192.168.177.238 from 00:00:1b:4e:00:b7 (Wii) via vr1
Apr 16 08:59:20 	dhcpd: DHCPACK on 192.168.177.238 to 00:00:1b:4e:00:b7 (Wii) via vr1
Apr 16 08:59:20 	dhcpd: DHCPREQUEST for 192.168.177.238 from 00:00:1b:4e:00:b7 (Wii) via vr1

Here are the contents of valid_machines.csv:

MAC address,Computer
00:00:1b:4e:00:b7, wii
01:03:30:b4:23:8c, dsi xl
bc:ae:c5:4c:1a:73, desktop
01:0c:04:d1:A3:a5, voip
02:14:4c:23:d2:BC, desktop 2
02:0d:78:40:cA:dE, desktop 3

Here is the python script:

#!/usr/bin/env python
#-*- coding:utf-8 -*-

"""
The purpose of this script is to parse the dhcp log from pfsense looking for mac
addresses that aren't listed in the valid_machines.csv file.

Simply copy and paste the DHCP log text into a text file and process it with
this script.

License:
The MIT License

Copyright (c) 2011 Troy Williams

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
"""

import sys

#Constants
__uuid__ = 'c0a17e00-11bb-4c56-9191-ba4d561feb0a'
__version__ = '0.1'
__author__ = 'Troy Williams'
__email__ = 'troy.williams@bluebill.net'
__copyright__ = 'Copyright (c) 2011, Troy Williams'
__date__ = '2011-04-16'
__maintainer__ = 'Troy Williams'

def load_macs(mac_file):
    """
    The list of known mac addresses is stored as a csv list. Load them
    into a list of dictionaries
    """
    import csv
    data_reader = csv.DictReader(open(mac_file, 'rb'))
    #convert the data_reader into a list of dictionaries so we can properly
    #iterate over it
    return [row for row in data_reader]

def read_log(file_name):
    """
    Takes a file name and reads the contents into a list separated by linefeeds
    """
    text_file = open(file_name, "rb")
    lines = text_file.readlines()
    text_file.close()
    return lines

def find_macs(lines):
    """
    Takes a list of strings and searches them for valid mac address.

    Note: mac regex from here http://txt2re.com
    """

    import re

    re1='.*?'
    re2='((?:[0-9A-F][0-9A-F]:){5}(?:[0-9A-F][0-9A-F]))(?![:0-9A-F])'
    rg = re.compile(re1+re2,re.IGNORECASE|re.DOTALL)

    macs = []
    for line in lines:
        m = rg.search(line)
        if m:
            mac1=m.group(1)
            macs.append(mac1)
    return set(macs)

def main():
    """
    Orchestrates the whole shebang.
    """

    #load the valid_machines.csv for a list of machines into a dictionary
    valid_macs = load_macs('valid_machines.csv')

    #parse the log file
    lines = read_log('dhcp.log.txt')

    #extract the set of unique macs that appear in the log
    macs = find_macs(lines)

    #process the list of macs from the log file against the list of valid macs
    for mac in macs:
        found_address = False
        for vmac in valid_macs:
            if mac.lower() == vmac['MAC address'].lower():
                found_address = True
                break
        #Check to see if the mac address was found
        if not found_address:
            print mac, 'not found'
        else:
            print mac, '=', vmac['Computer']

if __name__ == '__main__':
    sys.exit(main())
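
For comparison, here is a compact Python 3 sketch of the same matching idea that uses a dictionary lookup instead of the nested loop. It is not a drop-in replacement for the script above: the function names are invented for this example, the log lines are taken from the sample, and the de:ad:be:ef:00:01 entry is a made-up address added only to show the "not found" case.

```python
import csv
import io
import re

# Six colon-separated hex pairs; timestamps and IP addresses do not match
MAC_PATTERN = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)


def load_known(csv_text):
    """Map lower-cased MAC address -> computer name."""
    # skipinitialspace drops the blank after each comma in the csv rows
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return {row["MAC address"].lower(): row["Computer"] for row in reader}


def classify(log_lines, known):
    """Map every MAC seen in the log to its name, or None when unknown."""
    seen = {m.group(0).lower()
            for line in log_lines
            for m in MAC_PATTERN.finditer(line)}
    return {mac: known.get(mac) for mac in sorted(seen)}


KNOWN_CSV = """MAC address,Computer
00:00:1b:4e:00:b7, wii
bc:ae:c5:4c:1a:73, desktop
"""

LOG = [
    "Apr 16 09:19:22 \tdhcpd: DHCPACK on 192.168.0.203 to bc:ae:c5:4c:1a:73 (desktop) via vr0",
    "Apr 16 09:18:11 \tdhcpd: DHCPREQUEST for 192.168.177.238 from 00:00:1b:4e:00:b7 (Wii) via vr1",
    # hypothetical line with an address that is not in the csv
    "Apr 16 09:17:02 \tdhcpd: DHCPDISCOVER from DE:AD:BE:EF:00:01 via vr1",
]

for mac, name in classify(LOG, load_known(KNOWN_CSV)).items():
    print(mac, "=", name if name else "not found")
```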

Convert MTS (AVCHD) Files to mkv

Here is a simple shell script that will use ffmpeg to convert mts files to mkv format using the h264 codec to compress them.

#!/bin/sh
#conversion parameters that seem to work best with my camera (panasonic lumix)
#put the name of the file without the extension in the FILES variable.
FILES="00000
00001
00002
00003
00004
00005
00006
00007
00008
00009"
for f in $FILES
do
    echo "Processing $f.MTS"
    ffmpeg -i $f.MTS \
        -vcodec libx264 \
        -threads 0 \
        -vpre normal \
        -b 2000k \
        $f.mkv
done

Replace 00000 – 00009 with the names of the movies that you want to convert to mkv format.
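
If editing the FILES list by hand gets tedious, the file names can be discovered instead. The following Python sketch builds one ffmpeg argument list per .MTS file in a folder; conversion_jobs is a name made up for this example, and the ffmpeg options are copied unchanged from the shell script above, so they may need adjusting for your ffmpeg version.

```python
from pathlib import Path


def conversion_jobs(folder):
    """Return one ffmpeg argument list for every .MTS file in folder."""
    jobs = []
    for source in sorted(Path(folder).glob("*.MTS")):
        # Same base name, .mkv extension
        target = source.with_suffix(".mkv")
        jobs.append([
            "ffmpeg", "-i", str(source),
            "-vcodec", "libx264",
            "-threads", "0",
            "-vpre", "normal",
            "-b", "2000k",
            str(target),
        ])
    return jobs


# Print the commands for the current directory without running them
for job in conversion_jobs("."):
    print(" ".join(job))
```

Passing each list to subprocess.run would perform the conversion; printing first makes it easy to check the file names.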

Convert MTS (AVCHD) Files to xvid

I have a Panasonic Lumix camera that generates MTS (AVCHD) movie files. These files are 720p HD files and are really large. I want to store them in a smaller file format without sacrificing quality. With ffmpeg it is pretty straightforward to convert an MTS (AVCHD) movie file to xvid. The following command accomplishes the goal nicely:

ffmpeg -i 00001.MTS -vcodec libxvid -b 2000k -acodec libmp3lame -ac 2 -ab 128k -s 1280x720 movie1.avi
  • -i – tells ffmpeg which file is going to be converted
  • -vcodec libxvid – tells ffmpeg to use xvid compression for the video stream
  • -b 2000k – sets the video bitrate, in this case 2000 kbit/s
  • -acodec libmp3lame – tells ffmpeg to use mp3 to compress the audio stream
  • -ac 2 – use 2 audio channels
  • -ab 128k – set the audio bitrate
  • -s 1280x720 – set the frame size of the movie

Depending on your camera you will probably have to play with the parameters.
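
When playing with the bitrates it helps to know roughly what they mean for file size. The Python sketch below does the arithmetic; it assumes the k suffix means 1000 bits per second, treats the bitrates as averages and ignores container overhead, so the result is only an estimate.

```python
def estimated_size_mb(video_kbps, audio_kbps, seconds):
    """Approximate output size in megabytes (10**6 bytes)."""
    total_bits = (video_kbps + audio_kbps) * 1000 * seconds
    return total_bits / 8 / 1000000


# One minute at the settings used in the command above
print(estimated_size_mb(2000, 128, 60))
```

With the 2000k video and 128k audio settings this works out to roughly 16 MB per minute of footage.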

Copy Pictures from a Digital Camera and Automatically Rename to Date and Time Taken

Most digital cameras use some sort of naming scheme that leaves a lot to be desired. The names usually consist of something like:

  • picture001.jpg
  • picture002.jpg
  • picture134.jpg

As you can see that naming scheme tells you nothing about the picture. Personally I like to rename the picture based on the date and time it was taken. For example: 2010-04-04T07h35m39.jpg. With a name like that you can clearly see that the picture was taken on April 4, 2010 at 7:35 am. The neat thing about this is that all modern digital cameras write this information to what is called an EXIF tag contained within the picture itself.
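
As a small illustration of the naming scheme, here is a Python sketch that turns a raw EXIF timestamp into such a file name. It assumes the timestamp arrives as text in the usual EXIF layout "YYYY:MM:DD HH:MM:SS"; the function name is made up for this example.

```python
from datetime import datetime


def name_from_exif(exif_text, extension):
    """Build a file name like 2010-04-04T07h35m39.jpg from an EXIF timestamp."""
    taken = datetime.strptime(exif_text, "%Y:%m:%d %H:%M:%S")
    return taken.strftime("%Y-%m-%dT%Hh%Mm%S") + extension


print(name_from_exif("2010:04:04 07:35:39", ".jpg"))
```

A nice side effect of this format is that sorting the files by name also sorts them by the time they were taken.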

I wrote a python script that copies all of the pictures from a digital camera (well from the directory that is mounted in the file system) to a temporary location and renames them based on the date and time the pictures were taken. In addition it can also add some additional information to the IPTC tags of the photograph.

Features:

  • Reads a configuration file that contains:
    • Photographer name
    • Copyright notice
    • Output path – the directory to copy the pictures to. Typically it is a temporary location. I then copy the pictures manually to the final spot to ensure that nothing is accidentally overwritten
  • Can deal with multiple configuration files and allows the user to choose which one to apply
  • Searches the camera for all picture files (jpg, jpeg, png)
  • Pictures are copied to the output path and renamed based on the EXIF date and time and the IPTC tags are updated as well
    • Pictures are also sorted into directories based on year and month the picture was taken
  • If for some reason two pictures have exactly the same EXIF date and time, a number is appended to the file name
  • After the pictures are copied and renamed, the pictures can be deleted from the camera
  • Any non-picture files are displayed at the end. Useful if you have movies stored on the camera

Here is an example of the configuration file – photographer.cfg:

[camera.profile]
photographer=Troy Williams
copyright=Copyright 2010 Troy Williams
outputpath=/home/troy/repositories/code/Python/camera copy/output/Troy Williams

Here is the script – camera_copy.py:

#!/usr/bin/env python
#-*- coding:utf-8 -*-

"""
This script copies pictures from one folder to another. It attempts to rename
the pictures based on the exif date taken tag. The script also reads from a
configuration file that contains, among other things, the name of the
photographer (which is assigned to the photographer IPTC tag) as well as the
folder to copy the images to.

Documentation:
    -Contains urls to sites containing relevant documentation for the code
    in question. Normally this should be inlined close to the code where it
    is used.

References:
    -Contains links to reference materials used. If specific functions are used
    directly, then credit is placed there

Dependencies:
    pyexiv2 - http://tilloy.net/dev/pyexiv2/index.htm
              http://tilloy.net/dev/pyexiv2/tutorial.htm

License:
The MIT License

Copyright (c) 2010 Troy Williams

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
"""
import sys
import os
import shutil
from datetime import datetime

import pyexiv2

#Constants
__uuid__ = 'f706d95a-6c94-4a1e-ab4c-a8ee26b0c563'
__version__ = '0.2'
__author__ = 'Troy Williams'
__email__ = 'troy.williams@bluebill.net'
__copyright__ = 'Copyright (c) 2010, Troy Williams'
__date__ = '2010-04-10'
__license__ = 'MIT'
__maintainer__ = 'Troy Williams'
__status__ = 'Development'

def confirm(prompt=None, resp=False):
    """
    Source: http://code.activestate.com/recipes/541096/

    prompts for yes or no response from the user. Returns True for yes and
    False for no.

    'resp' should be set to the default value assumed by the caller when
    user simply types ENTER.

    >>> confirm(prompt='Create Directory?', resp=True)
    Create Directory? [y]|n:
    True
    >>> confirm(prompt='Create Directory?', resp=False)
    Create Directory? [n]|y:
    False
    >>> confirm(prompt='Create Directory?', resp=False)
    Create Directory? [n]|y: y
    True

    TBW: 2009-11-13 - change the prompt if test
    """

    if not prompt:
        prompt = 'Confirm'

    if resp:
        prompt = '%s [%s]|%s: ' % (prompt, 'y', 'n')
    else:
        prompt = '%s [%s]|%s: ' % (prompt, 'n', 'y')

    while True:
        ans = raw_input(prompt)
        if not ans:
            return resp
        if ans not in ['y', 'Y', 'n', 'N']:
            print 'please enter y or n.'
            continue
        if ans == 'y' or ans == 'Y':
            return True
        if ans == 'n' or ans == 'N':
            return False

def process_command_line():
    """
    Sets up the command line options and arguments
    """
    from optparse import OptionParser

    usage = """
            usage: %prog [options] path1 path2 path3

            The program takes a path (or number of paths) to the directory where
            the pictures are stored. The paths can be relative to the current
            script location. It takes the pictures and copies them to a location
            based on the configuration settings and renames them based on the
            exif date stored within the image. In addition the images will be
            sorted into directories based on the exif date. They are sorted by
            year and month.

            In the same folder as the script, configuration files are detected
            and the user is prompted to select one. A configuration file can
            contain the following:

            [camera.profile]
            photographer=Troy Williams
            copyright=Copyright 2010 Troy Williams
            outputpath=/home/troy/Pictures/Troy Williams

            The configuration file must contain the [camera.profile] header
            (the section name is case sensitive)

            photographer - The name of the person that took the pictures
            copyright - a string that will be added to the IPTC copyright tag
            of the photo
            outputpath - The path to copy the pictures to. They will be sorted
            by year/month based on the exif information stored in the picture.
            If no information is available it will be placed into an unsorted
            directory.
            """
    parser = OptionParser(usage=usage, version='%prog v' + __version__)

    options, args = parser.parse_args(args=None, values=None)

    if not args:
        #parser.error() exits the program, so print the help text first
        parser.print_help()
        parser.error('At least one image path must be specified!')

    return options, args

def find(path, pattern=None):
    """
    Takes a path and recursively finds the files.

    Optionally pattern can be specified where pattern = '*.txt' or something
    that fnmatch would find useful

    NOTE: this is a generator and should be used accordingly
    """

    if not os.path.exists(path):
        raise Exception, '%s does not exist!' % path

    if pattern:
        #search for the files that match the specific pattern
        import fnmatch
        for root, dirnames, filenames in os.walk(path):
            for filename in fnmatch.filter(filenames, pattern):
                yield os.path.join(root, filename)
    else:
        #search for all files
        for root, dirnames, filenames in os.walk(path):
            for filename in filenames:
                yield os.path.join(root, filename)

def make_directory(dir_path):
    """
    Takes the passed directory path and attempts to create it including all
    directories or sub-directories that do not exist on the path.
    """

    try:
        os.makedirs(dir_path)
    except OSError:
        #Check to see if the directory already exists
        if os.path.exists(dir_path):
            #It exists so ignore the exception
            pass
        else:
            #There was some other error
            raise

def path_from_date(path, date):
    """
    Takes a path and a date. It extracts the year and month from the date and
    returns a new path

    path = /home/troy/picture

    date = 2010-04-03 12:22:12 PM

    returns a path like /home/troy/picture/2010/04
    """

    return os.path.join(path, str(date.year), date.strftime("%m"))

def loadConfigParameters(path):
    """
    Takes a path to a configuration file and reads in the values stored there.

    Returns: dictionary
    """

    if not os.path.exists(path):
        raise Exception, '%s does not exist!' % path

    import ConfigParser

    #Set the defaults
    configParams = {}
    configParams['photographer'] = None
    configParams['copyright'] = None
    configParams['outputpath'] = None
    configParams['extensions'] = ['.jpg', '.jpeg', '.JPEG', '.JPG', '.png']

    config = ConfigParser.RawConfigParser()
    config.read(path)

    #loop through all the items in the section and assign the values to the
    #configParams dictionary... We don't assign it as the default dictionary
    #because, the options we are interested in are defined above... This
    #appears to be case sensitive therefore we make the keys lower case
    for name, value in config.items('camera.profile'):
        configParams[name.lower()] = value

    return configParams

def suggest_file_name(path):
    """
    Takes a file path and checks to see if the file exists at that location. If
    it doesn't then it simply returns the path unchanged. If the path exists, it
    will attempt generate a new file name and check to see if it exists.

    If a new name is found, it is returned.
    If the original name is not duplicated, it is returned
    If the looping limit is reached, None is returned
    """

    if os.path.lexists(path):
        filename, extension = os.path.splitext(path)
        for i in xrange(1, 1000):
            #Suggest a new file name of the form "file_name (1).jpg"
            newFile = '%s (%d)%s' % (filename, i, extension)
            if not os.path.lexists(newFile):
                return newFile
        return None
    else:
        return path

def update_image_iptc(path, **iptc):
    """
    This takes an image and updates the iptc information based on the passed
    parameters
    """

    if not os.path.exists(path):
        raise Exception, '%s does not exist!' % path

    image = pyexiv2.ImageMetadata(path)
    image.read()

    if 'exifDateTime' in iptc:
        image['Iptc.Application2.DateCreated'] = [iptc['exifDateTime']]

    if 'photographer' in iptc:
        image['Iptc.Application2.Byline'] = [iptc['photographer']]
        image['Iptc.Application2.Writer'] = [iptc['photographer']]

    if 'copyright' in iptc:
        image['Iptc.Application2.Copyright'] = [iptc['copyright']]

    image.write()

def main():
    """
    The heart of the script. Takes all of the bits and organizes them into a
    proper program
    """
    #grab the command line arguments
    options, args = process_command_line()

    #grab the path to the script.
    scriptPath = sys.path[0]

    #Search the scriptPath for configuration files
    configurationFiles = []
    for filename in find(scriptPath, pattern='*.cfg'):
        configurationFiles.append(filename)

    #make sure that there is at least one configuration file
    if not configurationFiles:
        raise Exception, 'No configurations files found!'

    print 'Please choose the number of the configuration file to use:'

    for i, item in enumerate(configurationFiles):
        print '%i : %s' % (i, os.path.basename(item))

    #prompt the user to pick the index of the configuration file to execute
    index = int(raw_input("Choose the configuration: "))
    selectedConfiguration = configurationFiles[index]

    print "Configuration file: ", selectedConfiguration

    #load the configuration file parameters
    configParams = loadConfigParameters(selectedConfiguration)

    #make the root output directory
    make_directory(configParams['outputpath'])

    #Store a list of files that were successfully copied for later deletion
    matches = []

    #Store a list of files that were not in configParams['extensions'] but in
    #the search path
    mismatches = []

    #potential files to delete
    to_delete = []

    #copy all of the pictures from the specified paths
    for picture_path in args:
        normpath = os.path.join(scriptPath, picture_path)
        print "Searching ", normpath
        for filename in find(normpath):
            filebasename, fileextension = os.path.splitext(filename)
            if fileextension in configParams['extensions']:
                #record the matched file for later statistics
                matches.append(filename)
            else:
                #record the mismatch and continue the loop
                mismatches.append(filename)
                continue

            print 'Attempting to copy: ' + os.path.basename(filename)

            image = pyexiv2.ImageMetadata(filename)
            image.read()

            if 'Exif.Image.DateTime' in image.exif_keys:
                #rename the file based on the exif date and time and copy the
                #picture to a folder based on year/month

                exifDateTime = image['Exif.Image.DateTime'].value
                newpath = path_from_date(configParams['outputpath'],
                                         exifDateTime)
                make_directory(newpath)

                newFile = exifDateTime.strftime("%Y-%m-%dT%Hh%Mm%S") + fileextension
                newpath = os.path.join(newpath, newFile)
            else:
                #no exif date time tag, simply copy to the unsorted directory
                #exifDateTime = datetime.strftime("%Y-%m-%dT%Hh%Mm%S")
                exifDateTime = datetime.today()
                newpath = os.path.join(configParams['outputpath'], "unsorted")
                make_directory(newpath)

                newpath = os.path.join(newpath, os.path.basename(filename))

            #check to see if there are any duplicate file names
            newpath = suggest_file_name(newpath)
            if not newpath:
                print 'Too many duplicates for: ' + filename
                continue

            shutil.copy2(filename, newpath)

            update_image_iptc(newpath, exifDateTime=exifDateTime,
                                       photographer=configParams['photographer'],
                                       copyright=configParams['copyright'])

            #The file has been successfully copied, add it to the list of files
            #to delete
            to_delete.append(filename)

    #check to see if there are any files to delete
    if len(to_delete) > 0:
        #prompt the user if they want to delete the files
        if confirm(prompt='Delete %s files?' % len(to_delete), resp=False):
            for item in to_delete:
                os.remove(item)

    #print out the list of invalid files - if any
    if len(mismatches) > 0:
        print "Files not in valid extension list:"
        for item in mismatches:
            print item

    return 0 # success

if __name__ == '__main__':
    status = main()
    sys.exit(status)

Here is an example of a shell script configured for a particular camera – camera.sh:

#!/bin/bash
./camera_copy.py /media/FC30-3DA9

Convert MP3s to iPod Audio Book format (M4B)

I had the need to convert a group of mp3 files into a format that was suitable for playing on my iPod. Of course the mp3s could be played directly on the iPod without any trouble. This is great for songs, but an audio book is significantly longer. In my case I have a 40 minute commute each way and most audio books are too long to listen to during a commute. The iPod supports m4b files which are audio book files and they remember where they were stopped so you can resume listening to it after putting the iPod to sleep or listening to your music collection. The audio book format also supports changing the play back speed so it will be read to you much faster.

Mp3 based audio books usually come in mp3 chunks (about 10MiB or so). They can be converted into an audio book manually using the following steps:

  1. vbrfix (https://gna.org/projects/vbrfix) – Vbrfix reads the mp3 structure and rebuilds the file including a new Xing VBR header. This is applied to all the mp3s that comprise the audio book.
  2. mp3wrap (http://mp3wrap.sourceforge.net/) – Takes a list of mp3s and wraps them into one big one. The only thing to note is that the mp3s have to have a naming convention that allows them to be sorted properly at the command line. Otherwise mp3s could be placed in the wrong position.
  3. madplay streaming into faac (http://www.underbit.com/products/mad/ & http://www.audiocoding.com/) – madplay is used to convert the output of mp3wrap into a wav file which is streamed into faac which creates the m4b file.
  4. aacgain (http://altosdesign.com/aacgain/) – Takes the m4b file and applies a gain to it in an attempt to make it louder.

These steps can be performed manually, but it is tedious and error prone. I have written a python script that puts all of these together in an automated fashion.
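
The ordering caveat in step 2 is easy to trip over, so it is worth checking before wrapping. The Python sketch below compares a plain string sort with a number-aware sort and reports whether they disagree; the function names and the chapter file names are made up for this example.

```python
import re


def natural_key(name):
    """Split a name into text and number parts so that 2 sorts before 10."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def misordered(names):
    """True when a plain string sort puts the chunks in the wrong order."""
    return sorted(names) != sorted(names, key=natural_key)


unpadded = ["chapter1.mp3", "chapter2.mp3", "chapter10.mp3"]
padded = ["chapter01.mp3", "chapter02.mp3", "chapter10.mp3"]

print(sorted(unpadded))      # the order a plain string sort would give
print(misordered(unpadded))
print(misordered(padded))
```

If the check reports a problem, renaming the chunks with zero-padded numbers is usually the simplest fix.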

  • The script takes a configuration file which:
    • Points to the directory containing the mp3 chunks
    • Points to a jpg or png file that represents the cover
    • Specifies an output name
    • Tag information
      • Artist
      • Year
      • Genre
      • Comment

A sample configuration file (typically named with the .cfg extension):

[mp3]
path=/mnt/media/iPod/unconverted/call_of_the_wild_64kb_mp3
coverart=/mnt/media/iPod/unconverted/call_of_the_wild_64kb_mp3/cover.jpg
outputfile=Jack London-Call of the Wild
artist=Jack London
title=Call of the Wild
year=1903
genre=AudioBook
comment=The Call of the Wild is a novel by American writer Jack London. The plot concerns a previously domesticated dog named Buck, whose primordial instincts return after a series of events leads to his serving as a sled dog in the Yukon during the 19th-century Klondike Gold Rush, in which sled dogs were bought at generous prices. Published in 1903, The Call of the Wild is London's most-read book, and it is generally considered his best, the masterpiece of his so-called "early period". Because the protagonist is a dog, it is sometimes classified as a juvenile novel, suitable for children, but it is dark in tone and contains numerous scenes of cruelty and violence. London followed the book in 1906 with White Fang, a companion novel with many similar plot elements and themes as Call of the Wild, although following a mirror image plot in which a wild wolf becomes civilized by a mining expert from San Francisco named Weedon Scott. The Yeehat, a group of Alaska Natives portrayed in the novel, are a fiction of London's.

Note: Wikipedia is an excellent source of biographical material

Typically, a number of configuration files are created so audio books can be created unattended in a batch.

The script features:

  • logging capabilities – successes and failures are logged. If a failure occurs in a conversion during a batch operation it is easy to track it down
  • Checks to see if all required components are available to the script. If not it prompts for the required components. It even provides an apt-get string for Ubuntu that can be used to install the required components
  • Fixes any VBR inconsistencies
  • wraps the mp3s into one large mp3 – beware that the mp3s need to be properly named i.e. they need to be named so that when they are sorted by the operating system they are in the correct order
  • Tags the resulting m4b file with artist, comment, genre, year and cover art. Tagging the cover art is particularly nice as it shows up in the iPod

mp3tom4b.py:

#!/usr/bin/env python
#-*- coding:utf-8 -*-

"""
This script will take a folder and attempt to convert the mp3s within it to m4b
files (iPod audiobook format).

1) The mp3s are processed using vbrfix
2) The mp3s are joined using mp3wrap
3) It will encode the newly joined mp3 to m4b
4) The wrapped mp3 will be removed

The output file will be placed in a sub folder of the mp3 folder.

Note: all of the mp3s to be joined as part of the conversion must be in the same
folder and they must have a number or identifier that allows them to be sorted
properly i.e. a proper string sort.

Documentation:

References:

Dependencies:
    vbrfix - https://gna.org/projects/vbrfix
    mp3wrap - http://mp3wrap.sourceforge.net/
    madplay - http://www.underbit.com/products/mad/ - This is a decoder used to
    convert the mp3 to wave
    faac - http://www.audiocoding.com/ - convert wav file to m4b format
    aacgain - http://altosdesign.com/aacgain/

TODO:

License:
The MIT License

Copyright (c) 2010 Troy Williams

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
"""

import sys
import os
import subprocess
import ConfigParser
import logging

#Constants
__uuid__ = '62a5aa15-2f1f-40e8-8a01-2a5cc74f6fb6'

__version__ = '0.6'
__author__ = 'Troy Williams'
__email__ = 'troy.williams@bluebill.net'
__copyright__ = 'Copyright (c) 2010, Troy Williams'
__date__ = '2010-04-05'
__license__ = 'MIT'
__maintainer__ = 'Troy Williams'
__status__ = 'Development'

#script Level Variables
mainLogger = None

def initialize_log_options():
    """
    Creates a dictionary with the proper values to pass to the logging object

    Dictionary keys:
    level - the debug level to display in the log file
    name - the name of the logger
    quiet - whether to display log messages to the screen - Default=False
    clean - deletes the log file if it exists - Default=False
    log file - the log file to use
    """

    options = {'level' : 'info',
               'name' : 'Log Tester',
               'quiet' : False,
               'clean' : False,
               'log file' : None}
    return options

def initialize_logging(options):
    """
    Log information based upon users options

    options is a dictionary that contains the various log options - see
    initialize_log_options for details

    StackOverflow.com Attribution:
    http://stackoverflow.com/questions/616645/how-do-i-duplicate-sys-stdout-to-a-log-file-in-python/648322#648322
        User Profile: http://stackoverflow.com/users/48658/atlas1j

    Note: Only the initialize_logging function from that answer is used, and it
          has been modified to take a dictionary instead of the optparse
          options class.

    Levels:
    Logger.debug()
    Logger.info()
    Logger.warning()
    Logger.error()
    Logger.exception() <- same as error except provides a stack trace
    Logger.critical()
    """

    if not options:
        raise Exception, 'No logging options set...'

    logger = logging.getLogger(options['name'])
    formatter = logging.Formatter('%(asctime)s %(levelname)s\t%(message)s')
    level = logging.__dict__.get(options['level'].upper(), logging.DEBUG)
    logger.setLevel(level)

    # Output logging information to screen
    if not options['quiet']:
        hdlr = logging.StreamHandler(sys.stderr)
        hdlr.setFormatter(formatter)
        logger.addHandler(hdlr)

    # Output logging information to file (skipped when no log file is set)
    logfile = options['log file']
    if logfile:
        if options['clean'] and os.path.isfile(logfile):
            os.remove(logfile)
        hdlr2 = logging.FileHandler(logfile)
        hdlr2.setFormatter(formatter)
        logger.addHandler(hdlr2)

    return logger

def which(program):
    """
    Takes a binary file name as an argument and searches the path(s) for it. If
    found, the full path is returned. Else None is returned

    StackOverflow.com Attribution:
    http://stackoverflow.com/questions/377017/test-if-executable-exists-in-python/377028#377028
        User Profile: http://stackoverflow.com/users/20840/jay
    """

    def is_exe(fpath):
        return os.path.exists(fpath) and os.access(fpath, os.X_OK)

    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        for path in os.environ['PATH'].split(os.pathsep):
            exe_file = os.path.join(path, program)
            if is_exe(exe_file):
                return exe_file
    return None

def BuildAptGet(programs):
    """
    Takes the list of programs, each a tuple of two values - program name and
    url, and builds an apt-get string.

    Returns a sudo apt-get string that a user could use to install the required
    components on Ubuntu Linux
    """
    #The first item of each tuple is the program (package) name
    install = [p[0] for p in programs or []]

    #Concatenate so that a single string is returned rather than a tuple
    return 'sudo apt-get install ' + ' '.join(install)

def CheckDependencies():
    """
    Checks the current operating system to see if the dependencies are available
    and installed. An exception is raised if any program is missing
    """

    programs = []
    #mp3wrap - http://mp3wrap.sourceforge.net/
    programs.append(('mp3wrap', 'http://mp3wrap.sourceforge.net/'))

    #faac - http://www.audiocoding.com/ - convert wav file to m4b format
    programs.append(('faac', 'http://www.audiocoding.com/'))

    #madplay - http://www.underbit.com/products/mad/ - This is a decoder used to
    #convert the mp3 to wave
    programs.append(('madplay','http://www.underbit.com/products/mad/'))

    #vbrfix - https://gna.org/projects/vbrfix
    programs.append(('vbrfix','http://gna.org/projects/vbrfix'))

    #aacgain - http://altosdesign.com/aacgain/
    programs.append(('aacgain','http://altosdesign.com/aacgain/'))

    #loop through the programs and see if they exist. If they do not, then
    #add them to the missing list
    missing = []
    for p in programs:
        if not which(p[0]):
            missing.append(p)

    #If there are any missing programs, print a message for each one
    #and raise an exception
    if missing:
        print('Missing files:')
        for p in missing:
            print('%s not found! Please install it - see %s for details' % p)
        #Build the apt-get string suitable for Ubuntu
        aptGet = BuildAptGet(missing)
        print('If using Ubuntu you can execute this line to install missing programs:')
        print(aptGet)

        raise Exception('Missing critical programs...')

def makeDirectory(dir_path):
    """
    Takes the passed directory path and attempts to create it including all
    directories or sub-directories that do not exist on the path.
    """

    try:
        os.makedirs(dir_path)
    except OSError:
        #Check to see if the directory already exists
        if os.path.exists(dir_path):
            #It exists so ignore the exception
            pass
        else:
            #There was some other error
            raise

def process_command_line():
    """
    From the Docs: http://docs.python.org/library/optparse.html
    """
    from optparse import OptionParser

    usage = """
            usage: %prog [options] file

            This script will take a series of mp3 files and combine them to form
            an iPod audio book (.m4b) file. It will run vbrfix on each mp3 to
            correct any issues and then join the mp3s using mp3wrap. After that
            madplay and faac will be used to convert the joined mp3 to m4b and
            tag it with the appropriate information. Finally aacgain will be
            used to adjust the volume of the m4b file.

            file - the name of the configuration file that holds the information
            about the mp3s to be converted to an audiobook. It should look
            something like this:
            #-------------------------------
            [mp3]
            path=/path/to/mp3s
            coverart=/path/to/mp3s/cover.jpg
            outputfile=output-audiobook
            artist=Author
            title=book title
            year=2010
            genre=AudioBook
            comment=Some comments about the book
            #-------------------------------

            where:
            path - the absolute path to the mp3s that comprise the audio book
            outputfile - the name of the final output file
            artist - the author of the book
            title - the title of the book
            year - the year the book was published
            genre - should be set to AudioBook or some appropriate genre
            coverart - the absolute path to the image used as the book cover
            comment - free-form comments about the book
            """
    parser = OptionParser(usage=usage, version='%prog v' + __version__)

    options, args = parser.parse_args(args=None, values=None)

    if len(args) != 1:
        #parser.error prints the usage message and exits, so nothing placed
        #after it would run
        parser.error('Exactly one configuration file is required')

    return options, args

def RunCommand(command, useshell=False):
    """
    Takes the command, as a list, and attempts to run it.

    Note: the command and each of its parameters must be a separate entry in
    the list.

    Returns the exit status of the command.
    """
    if not command:
        raise Exception, 'Valid command required - fill the list please!'

    p = subprocess.Popen(command, shell=useshell)
    retval = p.wait()
    return retval

def loadConfigParameters(path):
    """
    Takes a path to a configuration file and reads in the values stored there.

    Returns: dictionary
    """

    if not os.path.exists(path):
        raise Exception, '%s does not exist!' % path

    #Set the defaults
    configParams = {}
    configParams['path'] = None
    configParams['outputfile'] = None
    configParams['artist'] = None
    configParams['title'] = None
    configParams['album'] = None
    configParams['year'] = None
    configParams['comment'] = None
    configParams['genre'] = None
    configParams['track'] = None
    configParams['coverart'] = None

    config = ConfigParser.RawConfigParser()
    config.read(path)

    #loop through all the items in the section and assign the values to the
    #configParams dictionary. The dictionary is not handed to the parser as its
    #defaults because the options we are interested in are defined above. The
    #keys are made lower case so that lookups are consistent
    for name, value in config.items('mp3'):
        configParams[name.lower()] = value

    return configParams

def find_mp3s(path):
    """
    Takes the folder and returns a list of mp3s in that folder.

    Returns a sorted list of files with the full path name.
    """
    files = []
    for i in os.listdir(path):
        filename = os.path.join(path, i)
        if os.path.isfile(filename):
            basename, ext = os.path.splitext(filename)
            if ext.lower() == '.mp3':
                files.append(filename)

    files.sort()
    return files

def fixMP3Bitrate(mp3Path, outputdirName):
    """
    mp3Path - the path to the directory containing the mp3s that will be
    adjusted by vbrfix

    outputdirName - the name of the directory to store the fixed mp3s - will be
    a subdirectory
    """
    if not os.path.exists(mp3Path):
        raise Exception, '%s does not exist!' % mp3Path

    outputPath = os.path.join(mp3Path, outputdirName)

    #make the output directory
    makeDirectory(outputPath)

    #fix the bit rate on each and every mp3 that comprises the audio book -
    #copying the modified files to the output directory
    mp3files = find_mp3s(mp3Path)

    if not mp3files:
        raise Exception, '%s does not contain mp3s!' % mp3Path

    command = []
    for mp3 in mp3files:
        (dirName, fileName) = os.path.split(mp3)
        newpath = os.path.join(outputPath, fileName)
        command = ['vbrfix', '-allways']
        command.append('%s' % mp3)
        command.append('%s' % newpath)
        RunCommand(command)

def pathExists(path):
    """
    Takes a tuple that contains a folder path and file name and determines
    whether the file exists
    """

    filepath, filename = path
    fullpath = os.path.join(filepath, filename)

    return os.path.exists(fullpath)

def wrapMP3(path):
    """
    Takes the path to a directory containing mp3s to wrap into one mp3

    Returns a tuple containing the path and filename of the wrapped mp3
    """

    if not os.path.exists(path):
        raise Exception('Path does not exist!')

    filename = 'wrap'
    output = os.path.join(path, '%s.mp3' % filename)

    command = ['mp3wrap', '-v', '%s' % output]

    files = find_mp3s(path)

    if files:
        #append the files to the command list
        command = command + files
    else:
        raise Exception, 'No mp3 files to wrap!'

    RunCommand(command)

    return (path,'%s_MP3WRAP.mp3' % filename)

def adjust_aac_gain(path):
    """
    Takes a tuple of file path and file name of an aac file and adjusts its
    gain using aacgain
    """

    filepath, filename = path
    fullpath = os.path.join(filepath, filename)

    if not os.path.exists(fullpath):
        raise Exception, 'Path does not exist!'

    command = ['aacgain']
    command.append('-r')
    command.append('-k')
    command.append('%s' % fullpath)

    RunCommand(command)

    return path

def convert_m4b(path, configParams = None):
    """
    Takes a tuple representing a file path and file name of an mp3
    and attempts to convert it to an m4b file.

    It returns a tuple containing the file path and filename of the result
    """

    filepath, filename = path
    fullpath = os.path.join(filepath, filename)
    mainLogger.debug('Path to mp3 to convert to m4b = %s' % fullpath)

    if not os.path.exists(fullpath):
        raise Exception, 'Path does not exist!'

    output = 'converted.m4b'

    commandMadPlay = ['nice', '-10']
    commandMadPlay.append('madplay')
    commandMadPlay.append('-q')
    commandMadPlay.append('-o')
    commandMadPlay.append('wave:-')
    commandMadPlay.append('%s' % fullpath)

    commandfaac = ['nice', '-10']
    commandfaac.append('faac')
    commandfaac.append('-w')

    if configParams:
        if configParams['artist']:
            commandfaac.append('--artist')
            commandfaac.append('%s' % configParams['artist'])

        if configParams['title']:
            commandfaac.append('--title')
            commandfaac.append('%s' % configParams['title'])

        if configParams['album']:
            commandfaac.append('--album')
            commandfaac.append('%s' % configParams['album'])

        if configParams['year']:
            commandfaac.append('--year')
            commandfaac.append('%s' % configParams['year'])

        if configParams['comment']:
            commandfaac.append('--comment')
            commandfaac.append('%s' % configParams['comment'])

        if configParams['genre']:
            commandfaac.append('--genre')
            commandfaac.append('%s' % configParams['genre'])

        if configParams['track']:
            commandfaac.append('--track')
            commandfaac.append('%s' % configParams['track'])

        if configParams['coverart']:
            commandfaac.append('--cover-art')
            commandfaac.append('%s' % configParams['coverart'])

    commandfaac.append('-q')
    commandfaac.append('80')
    commandfaac.append('-o')
    commandfaac.append('%s' % os.path.join(filepath, output))
    commandfaac.append('-')

    mainLogger.debug('madplay cmd line = %s' % subprocess.list2cmdline(commandMadPlay))
    mainLogger.debug('faac cmd line = %s' % subprocess.list2cmdline(commandfaac))

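    madplayProcess = subprocess.Popen(commandMadPlay, shell=False,
                                                      stdout=subprocess.PIPE)
    #faac's own output is not captured; an unread pipe could fill and stall it
    faacProcess = subprocess.Popen(commandfaac, shell=False,
                                   stdin=madplayProcess.stdout)
    #Close our copy of the pipe so madplay is signalled if faac exits early
    madplayProcess.stdout.close()
    retval = faacProcess.wait()
    madplayProcess.wait()
    mainLogger.debug('faac exit status = %s' % retval)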

    return (filepath, output)

def main():
    """
    Take a number of mp3 bits that comprise an audiobook and convert it to
    an m4b file - an iPod audiobook file format
    """

    global mainLogger #make sure that other methods can use the log

    logoptions = initialize_log_options()
    #NOTE: the options can be pulled from the command line arguments
    logoptions['log file'] = os.path.join(sys.path[0], sys.argv[0] + '.log')
    #logoptions['clean'] = True

    # Setup logger format and output locations
    mainLogger = initialize_logging(logoptions)

    #grab the command line arguments
    options, args = process_command_line()
    mainLogger.debug('len(args) = %s' % len(args))

    mainLogger.info('Loading Configuration Parameters...')
    configParams = loadConfigParameters(args[0])

    #The working folder under the mp3 path
    outputdir = 'output'

    try:
        mainLogger.info('Checking Dependencies...')
        CheckDependencies()

        mainLogger.info('Working on %s' % configParams['path'])
        mainLogger.info('Validating Configuration Parameters...')
        if not os.path.exists(configParams['path']):
            raise Exception, '%s does not exist!' % configParams['path']

        mainLogger.info('Fixing mp3 bitrate...')
        fixMP3Bitrate(configParams['path'], outputdir)

        path = os.path.join(configParams['path'], outputdir)
        mainLogger.debug('Output folder = %s' % path)
        mainLogger.info('Combining mp3s into one big one...')
        output = wrapMP3(path)

        if not pathExists(output):
            raise Exception, 'The wrapped mp3 does not exist!'

        #convert the mp3 to m4b
        mainLogger.info('Converting to audiobook...')
        output = convert_m4b(output, configParams)
        mainLogger.debug('m4b = %s/%s' % output)

        if not pathExists(output):
            raise Exception, 'conversion result does not exist!'

        #rename the output file
        source =  os.path.join(output[0], output[1])
        dest = os.path.join(output[0], '%s.m4b' % configParams['outputfile'])

        mainLogger.info('Renaming the audio book...')
        mainLogger.debug('rename %s to %s' % (source, dest))

        os.rename(source, dest)
        output = (output[0], '%s.m4b' % configParams['outputfile'])

        #adjust the gain of the audiobook
        mainLogger.info('Adjusting the gain...')
        output = adjust_aac_gain(output)

        mainLogger.info('completed %s/%s' % output)
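    except Exception as inst:
        #The message is formatted up front; logging would otherwise try to
        #apply the extra arguments to the exception as a format string
        mainLogger.error('%s occurred while processing %s'
                         % (inst, configParams['path']))
        #exception() logs at the error level and appends the stack trace
        mainLogger.exception('Configuration parameters: %s' % configParams)
        return 1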

    finally:
        #Clean up the files by deleting everything in the output folder except
        #for the .m4b file
        searchFolder = os.path.join(configParams['path'], outputdir)
        files = []
        if os.path.exists(searchFolder):
            for i in os.listdir(searchFolder):
                f = os.path.join(searchFolder, i)
                if os.path.isfile(f):
                    ext = os.path.splitext(f)[1]
                    if ext.lower() != '.m4b':
                        files.append(f)
            for f in files:
                os.remove(f)

    return 0

if __name__ == '__main__':
    status = main()
    sys.exit(status)
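
The hand-off from madplay to faac in convert_m4b is the least obvious part of the script, so here is a minimal sketch of the same piping pattern in isolation. It is written for Python 3 and uses two small Python one-liners as stand-ins for the decoder and the encoder, so it runs without either tool installed. The run_pipeline helper and the stand-in commands are assumptions for illustration and are not part of the script above.

```python
import subprocess
import sys

def run_pipeline(producer_cmd, consumer_cmd):
    """Pipe producer stdout into consumer stdin.

    Returns (producer exit code, consumer exit code, consumer output).
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Close the parent's copy so the producer sees a broken pipe if the
    # consumer exits early
    producer.stdout.close()
    # communicate() drains the consumer's output, so a full pipe cannot stall it
    output, _ = consumer.communicate()
    return producer.wait(), consumer.returncode, output

# Stand-ins: the "decoder" writes some text, the "encoder" upper-cases it
producer_cmd = [sys.executable, '-c', "print('decoded audio')"]
consumer_cmd = [sys.executable, '-c',
                "import sys; sys.stdout.write(sys.stdin.read().upper())"]

result = run_pipeline(producer_cmd, consumer_cmd)
print(result)
```

In the real script the consumer writes its result to a file through its -o option, so there is no output to collect and waiting on both processes is enough.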

Here is an example of a shell script that can be created to call the conversion script:

#!/bin/sh
#A simple shell script to call the mp3 to m4b conversion script on various cfg files
./mp3tom4b.py cfgs/callofthewild.cfg
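
For a larger batch, the same idea can be sketched in Python instead of listing every cfg file by hand. This is only an illustration written for Python 3: convert_all and dry_run are hypothetical names, the script path is whatever name the conversion script was saved under, and the demonstration swaps in a stand-in runner that prints each command rather than executing it. It relies on main() returning 0 on success and 1 on failure, as the script above does.

```python
import glob
import os
import subprocess
import tempfile

def convert_all(script, cfg_dir, runner=subprocess.call):
    """Run script once per .cfg file in cfg_dir and return the cfgs that failed."""
    failed = []
    for cfg in sorted(glob.glob(os.path.join(cfg_dir, '*.cfg'))):
        # Like the shell script, this relies on the script being executable;
        # a non-zero exit status is treated as a failed conversion
        if runner([script, cfg]) != 0:
            failed.append(cfg)
    return failed

def dry_run(command):
    """Stand-in runner: print the command instead of executing it."""
    print(' '.join(command))
    return 0

# Demonstration against an empty placeholder cfg in a temporary folder
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, 'callofthewild.cfg'), 'w').close()
failures = convert_all('./mp3tom4b.py', demo_dir, runner=dry_run)
print('failed: %s' % failures)
```

Dropping the runner argument makes the helper call the conversion script for real, and the returned list shows which books need another look in the log.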