
HackaDump

Just some dumb bash scripts to back up all my projects' pages at http://hackaday.io

This great website invites us to add more and more content, and our precious time gets invested in a "write-only" system. Without a backup and restore function, our projects can only live on their servers and... that's all.

Because it is always too late to back things up, I have written a script to easily dump my work on my own (Linux) computer. It started humbly but it's slowly getting more mature, with each tiny adaptation and fix.

Yet it's just a hack, it lacks features, it doesn't save every detail and it doesn't even use the API (https://dev.hackaday.io). By using this code, you confirm that you understand these limitations and the inherent risks and you take responsibility for all the consequences. Oh, and being fluent with bash scripting is necessary.

Be careful and mindful, kids!

As the #Discrete YASEP grows, logs accumulate: more than 40^W50^W60 logs contain a lot of reflections and experiments, with illustrations of all kinds. With so much work depending on the goodwill of others, it would be very sad if "something happened", right?

"Better safe than sorry", so I asked for a backup function on the #Feedback - Hackaday.io channel (we'll see about the restore function later), but I couldn't wait.

In fact, it's not difficult at all to dump the essential parts of the project, but the script is crude and not optimised for speed; it will break if the website changes its underlying layout; links are saved but not easy to restore; and the styles/presentation are not preserved.

At least I don't have to type/remember everything "if something happens".


The current script has a big shortcoming: it relies on a carefully edited list of log links on the "details" page. Maybe I'll update this one day but so far "it works".

Done: save the personal "pages" (before, only the last 3 were saved, from the user's main page).

It's a bit slow but as is, there is no risk of flooding the server. And speed is not important if it's run every few weeks...

Feedback welcome, until HaD provides us with a clean, official way to backup (and restore) :-)

PS: the project's logo is a screenshot of the source code of the HaD pages, for those who have never looked at that code ;-)


Logs:
1. better, faster, fatter
2. Files are now supported
3. Some updates and enhancements
4. Formatting guidelines
5. Some more script fun
6. 404
7. Broken script

backup_pages_contrib.sh

(20170122) Like the 20160925 version, but also saves the projects I contribute to.

x-shellscript - 7.42 kB - 01/23/2017 at 00:09


backup_pages.sh

Latest version (20160925), cleaner, adapted to the many changes in the page layout

application/x-shellscript - 5.81 kB - 09/25/2016 at 22:56


backup_serial.sh

(*OBSOLETE*) a safer, serialising version, to make errors easier to spot.

x-shellscript - 3.33 kB - 05/10/2016 at 04:15


backup_profile.sh

(*OBSOLETE*) The script (version 20160404)

x-shellscript - 3.27 kB - 04/04/2016 at 06:37


  • 1 × bash
  • 1 × wget
  • 1 × sed
  • 1 × grep
  • 1 × Brain (usually located between your ears)


  • Broken script

    Yann Guidon / YGDES, 09/25/2016 at 15:02

    The layout of the site has changed and I didn't notice that my regular backups were no longer working properly.

    Until today.

    I'm updating the pattern matching, please stay tuned...


    pages: check. Not sure how to deal with more pages, though, but the limit has not been reached yet.

    projects: a few things have been broken, let's investigate step by step.

    • main project page: OK (though could still get some more cleanup)
    • components: OK
    • instructions: OK
    • images/gallery: OK

    TODO:

    • ensmarten wget (inside a function/procedure) so it detects failures and errors that the server does not report as 404!
    • clean up the HTML files to remove most of the HaD formatting and boilerplate. AT LEAST remove that huge HaD logo in ASCII art, which compresses easily but takes a lot of room anyway... (done with a grep one-liner)
    • test whether a file has already been downloaded, and remove identical copies from the previous backups... (meaningful for the big files!)
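A sketch of the third TODO item, assuming each run lives in its own date-named directory (as the script creates) and that all backups sit on one filesystem so hard links work. `dedup_against` is a hypothetical helper, not part of the published scripts: for every file in the new backup, if the same relative path exists in the previous backup with byte-identical content (`cmp -s`), the new copy is replaced by a hard link, so the data is stored only once.

```shell
#!/bin/bash
# Hypothetical helper: replace files in the new backup that are byte-identical
# to the same relative path in the previous backup by hard links to it.
# Usage: dedup_against PREVIOUS_DIR NEW_DIR   (no trailing slashes)
dedup_against() {
  local prev="$1" new="$2" rel
  # -print0 / read -d '' copes with spaces and odd characters in file names
  while IFS= read -r -d '' rel; do
    rel="${rel#"$new"/}"                     # path relative to the new backup
    if [[ -f "$prev/$rel" ]] &&
       ! [[ "$prev/$rel" -ef "$new/$rel" ]] &&   # not already linked
       cmp -s "$prev/$rel" "$new/$rel"; then
      # same content: keep a single copy on disk
      ln -f "$prev/$rel" "$new/$rel"
    fi
  done < <(find "$new" -type f -print0)
}
```

For example, `dedup_against 20160925 20160926` run from the directory that holds the dated backups. Hard links were chosen over deleting the duplicates so that every dated directory still looks like a complete backup on its own.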

    Mon Sep 26 00:50:45 CEST 2016: New version online! It took only 8h to polish it, but it's well worth it. UPDATE YOUR SCRIPTS!

  • 404

    Yann Guidon / YGDES, 05/10/2016 at 04:14

    During my last run, I briefly saw 404 errors but couldn't make sense of them because the script output was scrambled between different commands.

    Over the last few days/weeks, I've noticed more transient errors on hackaday.io, so I have to find a way to wait and retry when a page fails to load the first time...

    Until then, I made a different version with all the parallelism removed, and the output is also saved to a log file for easy grepping. The new file backup_serial.sh is slower but apparently safer.
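A minimal sketch of the serial, logged approach, assuming GNU wget (which prints its progress and errors, such as `ERROR 404: Not Found.`, on stderr). This is not the content of backup_serial.sh; `logged_fetch` and `backup.log` are hypothetical names. Each call runs a single wget and appends everything it prints to the log, plus a `FAILED` marker line carrying wget's exit status (GNU wget returns 8 when the server answers with an error response).

```shell
#!/bin/bash
LOGFILE="backup.log"   # hypothetical log file name

# Usage: logged_fetch OUTPUT_FILE URL
# Runs one wget (no backgrounding, so the output is never interleaved)
# and appends all of its messages to $LOGFILE.
logged_fetch() {
  local status=0
  echo "=== $(date '+%F %T') $2" >> "$LOGFILE"
  wget -O "$1" "$2" >> "$LOGFILE" 2>&1 || status=$?
  if (( status != 0 )); then
    # one easily greppable line per failed download
    echo "FAILED ($status) $2" >> "$LOGFILE"
  fi
  return "$status"
}
```

After a run, `grep '^FAILED' backup.log` lists the URLs worth fetching again.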


    Actually, 404 errors are becoming endemic. One script run can hit a few of them or more, and there is no provision yet to retry... I have to code this, because several independent runs are required to get a good sampling of the data.

    Some wget magic is needed...


    New twist !

    No 404 error this time. The page might load, but its contents will be "something is wrong. please reload the page." I should have taken a screenshot and saved the page to extract its textual signature...

    I must find a way to restart the download when this error occurs too.
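One possible shape for the retry logic, covering both failure modes described above: a non-zero wget exit status (404 and other errors) and a page that loads but only contains the error message. This is a sketch, not the author's code: `wget_retry`, `MAX_TRIES`, `RETRY_DELAY` and `ERROR_SIGNATURE` are hypothetical, and since the exact wording of the error page was never captured, the signature string is a placeholder to check against a saved copy.

```shell
#!/bin/bash
# Hypothetical settings: adjust to taste (and to the server's patience)
MAX_TRIES=4
RETRY_DELAY=10   # seconds between attempts
# ASSUMPTION: placeholder text; the real error page was not recorded,
# so this must be checked against a saved copy of that page.
ERROR_SIGNATURE='something is wrong'

# Usage: wget_retry OUTPUT_FILE URL
# Returns 0 once a page is saved that does not contain ERROR_SIGNATURE,
# 1 after MAX_TRIES failed attempts.
wget_retry() {
  local out="$1" url="$2" try
  for (( try = 1; try <= MAX_TRIES; try++ )); do
    # success = wget exit status 0 AND no error message in the page
    if wget -q -O "$out" "$url" && ! grep -qi "$ERROR_SIGNATURE" "$out"; then
      return 0
    fi
    echo "attempt $try/$MAX_TRIES failed for $url" >&2
    if (( try < MAX_TRIES )); then
      sleep "$RETRY_DELAY"
    fi
  done
  echo "GIVING UP: $url" >&2
  return 1
}
```

A call such as `wget_retry main.html "https://hackaday.io/project/$1"` would replace a plain `wget -O main.html ...`. The loop lives in bash rather than in wget's own `--tries` option because, according to the wget manual, fatal errors such as 404 are not retried by `--tries`, and wget has no way to judge the content of a page that was served successfully.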

  • Some more script fun

    Yann Guidon / YGDES, 04/13/2016 at 12:51

    I just hacked this. Shame on me!

    Let's just say it might be useful to those who test bash on W10...


  • Formatting guidelines

    Yann Guidon / YGDES, 04/04/2016 at 05:08

    I'm lazy.

    I'm too lazy to implement a proper scraper for the log pages, even though making that effort would save me effort later. I have even started to implement a suitable feature for the projects list pages. But so far, the "quick and dirty solution" is to list all the project logs by hand in the "details" page. After all, there are other advantages, including easier navigation.

    The script uses grep and sed to recognise a specific pattern that indicates the start of the list. First, note that the elements are separated by line breaks ("<br>" in HTML), so you have to hit "shift+enter" instead of just "enter" (which generates a new paragraph, "<p>").

    The list starts with a bold keyword, recognised in HTML as "<strong>Logs:</strong>" (click on the bold B in the editing menu).

    Then the rest of the page should be the list of links. Each link starts at the beginning of a line (remember: shift+enter) with a number (the order is not checked) followed by a dot and a space, then a link ("<a ") and a line break. Yeah, these are absolute links, so be careful...

    Overall, the script detects this:

    Logs:
    42. some link
    43. another link

    There are some other minor gotchas, so don't hesitate to look at the scraped and sed'ed files named logs.url if something looks weird.
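To check what the guideline format above turns into, the log-list extraction can be pulled out of the backup script (the one in the "better, faster, fatter" log) into a function and fed a made-up line of HTML. The grep/sed steps are the script's own; the function name `extract_log_urls` and the sample markup used to exercise it are assumptions, not a copy of a real details page. GNU sed is assumed, for the `\n` in the replacement.

```shell
#!/bin/bash
# Print the log URLs listed after "<strong>Logs:</strong>" in the HTML read
# on stdin, one per line, using the same steps as the backup script:
#  1. keep the line(s) that contain log links,
#  2. drop everything up to and including the "Logs:" keyword,
#  3. split on <br> and <p> so that each entry starts its own line,
#  4. keep only the lines that look like "42. <a ...",
#  5. reduce each of them to the URL inside href="...".
extract_log_urls() {
  grep 'https://hackaday.io/project/.*/log/' |
  sed -e 's/.*<strong>Logs:<\/strong>//' \
      -e 's/<br>/\n/g' \
      -e 's/<p>/\n/g' |
  grep '^[0-9]*[.] <a ' |
  sed -e 's/.*href="//' -e 's/".*//'
}
```

Fed a details page written as `<strong>Logs:</strong><br>1. <a href="https://hackaday.io/project/8536/log/1-first">first</a><br>2. <a href=...`, it prints one URL per line, which is what ends up in logs.url.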

    I told you it was dirty...

  • Some updates and enhancements

    Yann Guidon / YGDES, 03/28/2016 at 19:53

    Time for an update !

    • Fixed a parsing issue (the pages have changed a tag from <h2> to <h1>)
    • Support for more than one projects page (I was wondering why not all of my projects were getting saved... Now I look at the "next" link to build the list of projects)
    • Kinder to the server, to avoid triggering the DoS/flood protection of the image server. It's slower, but that's not critical...

    My backups now take several minutes and around 17MB.

    It could be faster: a lot of log pages return "301 Moved Permanently", which could be fixed with better parsing and by reading the logs pages directly (those that group the logs in chunks of 10).
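Following the "next" links, as described above for the projects pages (and as could be done for the log pages grouped by 10), boils down to a loop like the one below. It is only a sketch: the markup of the "next" link is not documented here, so `NEXT_LINK_PATTERN` is a hypothetical placeholder, and `fetch_page` is a hypothetical hook that defaults to plain `wget -q -O-`. The `sleep` between pages keeps the server-friendly pacing mentioned in the third bullet.

```shell
#!/bin/bash
# ASSUMPTION: placeholder pattern; the real markup of the "next" link has to
# be looked up in a saved page before use.
NEXT_LINK_PATTERN='class="next-button"'
PAGE_DELAY=2   # seconds between requests, to stay kind to the server

# Hypothetical hook: print the page at URL $1 on stdout.
fetch_page() { wget -q -O- "$1"; }

# Usage: crawl_pages FIRST_URL PREFIX
# Saves PREFIX1.html, PREFIX2.html, ... and stops when no "next" link is found.
crawl_pages() {
  local url="$1" prefix="$2" n=1 next
  while [[ -n "$url" ]]; do
    fetch_page "$url" > "${prefix}${n}.html" || return 1
    # href of the first line matching the "next" pattern (empty if none)
    next=$(grep -m1 "$NEXT_LINK_PATTERN" "${prefix}${n}.html" |
           sed -n 's/.*href="\([^"]*\)".*/\1/p')
    if [[ "$next" == /* ]]; then
      next="https://hackaday.io$next"   # make site-relative links absolute
    fi
    if [[ -z "$next" || "$next" == "$url" ]]; then
      break                             # last page (or a link to itself)
    fi
    url="$next"
    n=$((n + 1))
    sleep "$PAGE_DELAY"
  done
}
```

For instance, `crawl_pages "https://hackaday.io/projects/hacker/$MYHACKERNUMBER" projects` would save projects1.html, projects2.html and so on, ready for the same grep/sed extraction as a single projects.html.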

  • Files are now supported

    Yann Guidon / YGDES, 01/10/2016 at 12:51

    Hello HaD crowd !

    The admins have now provided us with a 1GB storage area with a nice listing page, similar to the other resources. I have updated the script to fetch everything AND I've put the new script in the download area.

    Fun fact: the next time I back up my projects, the script will download itself, if all goes well ;-)

  • better, faster, fatter

    Yann Guidon / YGDES, 12/15/2015 at 01:24

    Today I have 19 projects on hackaday (even after I asked al1 to take ownership of #PICTIL) and I need to automate more !

    So I added more features, parallelised the script a bit, scraped more pages and added more conditional execution to adapt to each project (some have building instructions, others have logs, some have nothing...)

    So here is the new version in its whole ugliness ! (remember kids, don't do this at home, yada yada)

    #!/bin/bash
    
    MYHACKERNUMBER=4012 # Change it !
    
    # Fetch one project; $1 is the project path as listed in projects.names
    fetchproject() {
      local prj="$1"
      mkdir -p "$prj"
      pushd "$prj" || return
    
        # Get the main page:
        wget -O main.html "https://hackaday.io/project/$prj"
    
        # Not all projects have building instructions:
        grep -q '<div class="section section-instructions">' main.html &&
          wget -O instructions.html "https://hackaday.io/project/$prj/instructions/" &
    
        # Get the images from the gallery
        wget -O gallery.html "https://hackaday.io/project/$prj/gallery"
        grep 'very-small-button">View Full Size</a>' gallery.html |\
        sed -e 's/.*href="//' \
            -e 's/".*//' |\
        tee images.url
        # Download them only if at least one image URL was found
        if [[ -s images.url ]]; then
          ( mkdir -p images
            cd images && wget -i ../images.url ) &
        fi
    
        # Get the general description of the project
        detail=$(grep 'show">See all details</a' main.html | sed 's/.*href="/https:\/\/hackaday.io/; s/".*//')
    
        if [[ "$detail" ]]; then
          echo "getting $detail"
          wget -O detail.html "$detail"
    
          # list the logs (see the "Formatting guidelines" log):
          grep 'https://hackaday.io/project/.*/log/' detail.html |\
          sed -e 's/.*<strong>Logs:<\/strong>//' \
              -e 's/<br>/\n/g' \
              -e 's/<p>/\n/g' |\
          grep '^[0-9]*[.] <a ' |\
          tee index.txt
    
          # keep only the URL of each link
          sed -e 's/.*href="//' -e 's/".*//' index.txt |\
          tee logs.url
    
          if [[ -s logs.url ]]; then
            ( mkdir -p logs
              cd logs && wget -i ../logs.url ) &
          fi
        fi
    
        # Wait for this project's background downloads before leaving
        wait
      popd
    }
    
    ######### Start here #########
    
    DATECODE=$(date '+%Y%m%d')
    mkdir -p "$DATECODE"
    pushd "$DATECODE" || exit 1
    
      wget -O profile.html "https://hackaday.io/hacker/$MYHACKERNUMBER"
    
      # List all the projects:
      wget -O projects.html "https://hackaday.io/projects/hacker/$MYHACKERNUMBER"
      # stop before the contributions:
      sed '/contributes to<\/h2>/ q' projects.html |\
      grep 'class="item-link">' |\
      sed -e 's/.*href="\/project\///' -e 's/".*//' |\
      tee projects.names
    
      ProjectList=$( < projects.names )
      if [[ "$ProjectList" ]]; then
        for PRJNR in $ProjectList
        do
          ( fetchproject "$PRJNR" ) &
        done
        # Don't exit before every project has been downloaded
        wait
      else
        echo "No project found."
      fi
    popd

    I still have to make a better system to save the logs. I have an idea, but...

    PS: it's another quick and dirty hack; so far I'm too lazy to look deeply into the API. It's also a problem of language, since bash is not ... well suited to it. Sue me.

    OTOH the above script works and does not require you to get an API key.


Discussions

Yann Guidon / YGDES wrote 01/19/2022 at 05:07

https://dev.hackaday.io/doc/api exists.

My motivation, though, is not sufficient...

Yann Guidon / YGDES wrote 07/23/2017 at 19:08

The new page layout breaks my script :-(

Yann Guidon / YGDES wrote 07/23/2017 at 19:23

Apparently, WGET_HTML() removes WAY TOO MUCH from the page, the <body> has disappeared...

RoGeorge wrote 07/24/2017 at 03:40

Welcome to the club!

My scripts for gathering statistics were broken too.

Yann Guidon / YGDES wrote 11/11/2016 at 05:09

Well well well, does the script save the page banner picture?...

Dave Gönner wrote 05/21/2016 at 09:49

Thanks for the script mate, works like a charm!

Yann Guidon / YGDES wrote 05/21/2016 at 11:19

Wonderful :-)

Eric Hertz wrote 04/21/2016 at 08:21

My first run was a couple of days ago; it looks pretty useful and grabbed the vast majority of my stuff with only a single-line change :)

There were some error messages, but they didn't interfere with the process. I'll let you know if they're anything important.

Thanks for sharing this, yo!

Yann Guidon / YGDES wrote 04/21/2016 at 08:26

Wow, I'm surprised someone actually ran it at home :-D

Depending on your projects, the script requires some tuning: you have more projects than me, so you should be even more careful not to hammer the server ;-)

Yann Guidon / YGDES wrote 04/21/2016 at 08:35

BTW, did you recover/save all the project logs? I use the Formatting guidelines https://hackaday.io/project/8536/log/35174-formatting-guidelines in the details page of each project; otherwise it won't save everything... You'll have to try and see: check the intermediate files.

Eric Hertz wrote 04/21/2016 at 09:46

Ah hah! Thanks for that heads-up, I just ran it as-is as a quick/emergency backup in case my #"From Nerd to Criminal in Seven Easy Years" turned out more-negative than it did ;)

But, I do want to run a complete backup more regularly, so I'll probably be doing some looking into that in the near future. Why, if you don't mind my asking, don't you just have it download directly from the logs page?

Yann Guidon / YGDES wrote 04/21/2016 at 10:01

Well there is a question of ... programming convenience.

One day I'll figure out the proper algo to harvest the logs correctly. For now I'm lazy but I think I have a hint with the algo that checks the next projects page...

For now I'm routing a PCB :-D

Ivan Lazarevic wrote 11/23/2015 at 19:17

Good idea. Maybe you should try to use the API for this: https://dev.hackaday.io/

Yann Guidon / YGDES wrote 11/23/2015 at 20:29

Damn! Why do I find this AFTER I spent time doing this?

Yann Guidon / YGDES wrote 11/24/2015 at 11:46

I just went to https://dev.hackaday.io/applications and created a key for hackadump.
Maybe I'll figure out how to use the keys in my scripts? :-)

jaromir.sukuba wrote 11/23/2015 at 16:55

Oh, here we go. I have to try it out.

Yann Guidon / YGDES wrote 11/23/2015 at 20:27

use wisely :-)
