better, faster, fatter

A project log for HackaDump

Just some dumb bash scripts to back up all my projects' pages at http://hackaday.io

Yann Guidon / YGDES 12/15/2015 at 01:24

Today I have 19 projects on hackaday.io (even after I asked al1 to take ownership of #PICTIL) and I need to automate more!

So I added more features: the script is a bit parallelised now, it scrapes more pages, and there is more conditional execution to adapt to each project (some have building instructions, others have logs, some have nothing...).

So here is the new version in all its ugliness! (remember kids, don't do this at home, yada yada)

#!/bin/bash

MYHACKERNUMBER=4012 # Change it!
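# (that's the number in your https://hackaday.io/hacker/... profile URL)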

fetchproject() {
  mkdir "$1"
  pushd "$1"

    # Get the main page:
    wget -O main.html "https://hackaday.io/project/$1"

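    # Fetch the building instructions only if the main page mentions that section (in the background):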
    grep -q '<div class="section section-instructions">' main.html &&
      wget -O instructions.html "https://hackaday.io/project/$1/instructions/" &

    # Get the images from the gallery
    wget -O gallery.html "https://hackaday.io/project/$1/gallery"
    grep 'very-small-button">View Full Size</a>' gallery.html |\
    sed -e 's/.*href="//' \
        -e 's/".*//' |\
    tee images.url
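    # If any image URLs were found, download them all into images/ (in the background):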
    [[ "$( < images.url )" ]] && (
      mkdir images
      pushd images
        wget -i ../images.url
      popd
    ) &

    # Get the general description of the project
    detail=$(grep 'show">See all details</a' main.html|sed 's/.*href="/https:\/\/hackaday.io/; s/".*//')
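    # $detail now holds the absolute URL of the details page, or is empty if there is none.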

    if [[ "$detail" ]]; then
      echo "getting $detail"
      wget -O detail.html "$detail"

      # list the logs:
      grep 'https://hackaday.io/project/.*/log/' detail.html|\
      sed -e 's/.*<strong>Logs:<\/strong>//' \
          -e 's/<br>/\n/g' \
          -e 's/<p>/\n/g'|\
      grep '^[0-9]*[.] <a ' |\
      tee index.txt

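      # index.txt now holds the numbered log list ("1. <a href=...>"); keep only the URLs: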
      sed 's/.*href="//' index.txt |\
      sed 's/".*//' |\
      tee logs.url

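      # Download every listed log page into logs/ (in the background):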
      if [[ "$( <  logs.url )" ]]; then
        mkdir logs
        pushd logs
          wget -i ../logs.url &
        popd
      fi
    fi
  popd
}

######### Start here #########

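# Everything from one run goes into a single datestamped directory (e.g. 20151215):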
DATECODE=$(date '+%Y%m%d')
mkdir "$DATECODE"
pushd "$DATECODE"

  wget -O profile.html "https://hackaday.io/hacker/$MYHACKERNUMBER"

  # List all the projects:
  wget -O projects.html "https://hackaday.io/projects/hacker/$MYHACKERNUMBER"
  # Stop before the contributions:
  sed '/contributes to<\/h2>/ q' projects.html |\
  grep 'class="item-link">' |\
  sed -e 's/.*href="\/project\///' -e 's/".*//' |\
  tee projects.names

  ProjectList=$( < projects.names )
  if [[ "$ProjectList" ]]; then
    for PRJNR in $ProjectList
    do
      ( fetchproject $PRJNR ) &
    done
    wait # wait for the backgrounded per-project fetches to finish
  else
    echo "No project found."
  fi
popd
I still have to build a better system to save the logs; I have an idea but...

PS: it's another quick and dirty hack; so far I'm too lazy to look deeply into the API. It's also a problem of language, since bash is not ... adapted. Sue me.
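For the record, going through the official API would look something like this. Purely a hypothetical sketch: it assumes an api_key registered at https://dev.hackaday.io and a v1 "list a user's projects" endpoint, neither of which I have verified, so check the docs before trusting it:

# Hypothetical: list my projects through the API instead of scraping HTML
curl "https://api.hackaday.io/v1/users/$MYHACKERNUMBER/projects?api_key=$APIKEY"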

OTOH the above script works and does not require you to get an API key.
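To try it: paste the whole thing into a file (hackadump.sh is just the name I use here, pick anything), set MYHACKERNUMBER to your own number, then:

chmod +x hackadump.sh
./hackadump.sh
ls 20151215/ # one datestamped directory per run, one subdirectory per project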
