Ruby: A Quick Cleaner for Your AdWords Invoices

If you advertise on Google then you probably have to save and print your monthly AdWords invoices, especially if you want to get the VAT back! In fact, I do this with pretty much all my online services. I need to have copies of invoices for my own records and for end-of-year accounts. It's a pain in the ass, frankly. You have to go to each site, call up the invoice page, launch the printable version (if they have one), save the page and print it. Don't know about you, but what a waste of precious coding time!

Somebody should write a web service to consolidate online service payments. Yeah, sure, the offline payments world has been done: there are sites where you can go and sort it all out. Where's the same solution for my online accounts? Isn't it ironic that the most connected, most virtual services are the ones I'm putting the most physical labour into?

And what really bugs me is that when I save an HTML page, Firefox saves all the ancillary files in a [name]_files folder. Normally this is what you want, I suppose, but for a series of monthly invoices it's not the right thing at all. You end up with loads of folders all containing the same set of CSS, JavaScript, and image files (by the way, can we all agree to call them folders, and not directories? Saves on the typing, you know…). Which is annoying, because you do actually need all those extra files to make the invoices look pretty if you ever want to print them out again. So really what you want is for all the extra files to live in one folder, say, files, and for all the downloaded HTML pages to refer to this folder.

And the other reason for having them all in the same folder is to support version control. Call me paranoid, but I put everything into Subversion. So handy. And multiple files that are all really the same is a completely pointless state of affairs when you add Subversion into the mix. Oy vey!

Up to now I've half-solved this problem whenever it bugged me enough by recording a quick Emacs macro and flying through the files. But you can only record the same macro so many times in your life without cracking up. Time for automation: “Why program by hand in five days what you can spend five years of your life automating?”

Well, let's do it in Ruby and then we can all go home after lunch. Here, for your use, should you have this exact same problem, is a little Ruby script to fix the links in the downloaded HTML pages and copy the ancillary files into a files folder (folder, yeah?). Notice that I don't delete the old files and folders. Trust me: never delete files in a hacked-together five-minute script. You will very much regret it. Delete stuff by hand.

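require 'fileutils'

# Invoice pages are saved as YYYYMMDD.htm; collect the YYYYMMDD stems.
date_file_paths = Dir.entries('.').select { |f|
  f =~ /\A\d{8}\.htm\z/ }
date_names = date_file_paths.map { |f|
  f.match(/(\d{8})/)[1] }
abort 'No YYYYMMDD.htm files found' if date_names.empty?

max = 0
date_names.each { |n|
  # Point every reference to YYYYMMDD_files at the shared files folder.
  content = []
  File.foreach("#{n}.htm") { |line|
    content << line.gsub(/#{n}_files/, 'files')
  }
  File.open("#{n}.htm", "w") { |f|
    f.print content.join
  }
  # Remember the most recent invoice date.
  max = n.to_i if n.to_i > max
}

# Copy the newest set of ancillary files into files; the old folders stay put.
FileUtils.mkdir_p('files')
FileUtils.cp( Dir.entries("#{max}_files").select{ |n|
  n != '.' && n != '..' }.map { |n|
    "#{max}_files/#{n}" }, 'files' )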

Oh yeah, one more thing, I save the files using the naming convention YYYYMMDD.htm (you might have guessed, I suppose).
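If you want to check what the script will do before letting it loose on your real invoices, here is a minimal sketch of the two ideas it rests on: picking out the YYYYMMDD part of a file name, and pointing a link at the shared files folder. The helper names (invoice_stem, rewrite_links) and the sample invoice.css link are made up for illustration; they are not part of the script above.

```ruby
# Hypothetical helpers for illustration only; not part of the original script.
INVOICE_NAME = /\A(\d{8})\.htm\z/

# Returns the YYYYMMDD stem if the file name follows the convention, else nil.
def invoice_stem(file_name)
  match = INVOICE_NAME.match(file_name)
  match && match[1]
end

# Points every reference to "<stem>_files" at the shared "files" folder.
def rewrite_links(html, stem)
  html.gsub("#{stem}_files", 'files')
end

# An invented sample line, roughly what a saved page might contain.
sample = '<link rel="stylesheet" href="20070131_files/invoice.css">'

puts invoice_stem('20070131.htm')        # prints 20070131
puts invoice_stem('notes.htm').inspect   # prints nil
puts rewrite_links(sample, '20070131')   # prints <link rel="stylesheet" href="files/invoice.css">
```

Nothing here touches the disk, so it's safe to paste into irb and play with.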
