404

A project log for HackaDump

Just some dumb bash scripts to back up all my project's pages at http://hackaday.io

Yann Guidon / YGDES • 05/10/2016 at 04:14 • 2 Comments

During my last run, I briefly saw 404 errors but couldn't make sense of them because the script output was scrambled between different commands.

Over the last few days and weeks, I've noticed more transient errors on hackaday.io, so I have to find a way to wait and retry when a page fails to load the first time...

Until then, I made a different version with all parallelising removed; the output is also saved to a log file for easy grepping. The new script, backup_serial.sh, is slower but apparently safer.


Actually, 404 errors are becoming endemic. A single script run can hit a few of them, or more, and there is no provision yet to retry... I have to code this, because several independent runs are required to get a good sampling of the data.

Some wget magic should be done ...
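
Something along these lines could work: a minimal retry wrapper around wget (just a sketch; the function name, attempt count and delays are invented, and since wget treats a 404 as fatal on its own, the looping has to be done by hand):

```bash
# Sketch of a retry wrapper: wget won't retry a 404 by itself,
# so loop manually and wait a little longer between attempts.
fetch() {
  local url="$1" out="$2" attempt
  for attempt in 1 2 3 4 5; do
    wget -q -O "$out" "$url" && return 0
    echo "attempt $attempt failed for $url, retrying..." >&2
    sleep $((attempt * 5))
  done
  echo "giving up on $url" >&2
  return 1
}
```

Call sites would then use something like fetch "$URL" "$FILE" instead of invoking wget directly.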


New twist!

No 404 error this time. The page might load, but the content is just "something is wrong. please reload the page." I should have taken a screenshot and saved the page to extract its textual signature...

I must find a way to restart the download when this error occurs too.
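
One possible fix, assuming the broken page really does contain that sentence: grep the saved file for the error text and treat a match like a failed download, so the same retry logic kicks in.

```bash
# Sketch: detect the "something is wrong" page and report failure,
# so the caller can retry the download just like after a 404.
page_looks_broken() {
  grep -qi "something is wrong" "$1"
}

# e.g. inside the retry loop sketched above:
#   wget -q -O "$out" "$url" && ! page_looks_broken "$out" && return 0
```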

Discussions

Eric Hertz wrote 05/19/2016 at 10:54

Personally, I'm not too fond of parallelizing in scripts... it makes output dang-near unparseable... But there may be ways around that... I recently learned about e.g. 'tee' which could probably be used to output each parallel process into a different log-file... Still, things like watching the scroll-bar in download-processes is nice. Sometimes I create new console windows for separate processes and redirect the output to 'em, but that's a bit OS-specific and a bit confusing because the process isn't actually *running* there, it's just outputting its data there. Oh, and parallelizing makes 'resume' dang-near impossible even if you plan to write a new script to handle a specific failure... But I'm no expert.
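
For what it's worth, that tee idea could look something like this (a rough sketch; the PAGES array and the file names are invented here): each background job writes through tee into its own log file, so the logs stay readable even when the console output interleaves.

```bash
# Sketch of per-process logging: each parallel download writes its own log
# through tee, so the console can scroll freely but each log stays grep-able.
i=0
for url in "${PAGES[@]}"; do
  i=$((i + 1))
  wget -O "page_$i.html" "$url" 2>&1 | tee "download_$i.log" &
done
wait   # wait for all background downloads to finish
```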


Yann Guidon / YGDES wrote 05/19/2016 at 11:45

If you accept the slower operation, the latest script is pretty chill :-)

Now I might have to redesign "$WGET" to make it a function that calls wget repeatedly a few times in case of 404...

But I'm lazy and the latest run has encountered no error so it will be for another day.
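
When that day comes, one way to do it without touching the call sites (just a sketch, guessing at how "$WGET" is currently used in the script) would be to point the variable at a small retry function instead of at wget itself:

```bash
# Sketch: keep the existing "$WGET" call sites, but route them through a
# retrying wrapper by pointing the variable at a shell function.
wget_retry() {
  local try
  for try in 1 2 3; do
    wget "$@" && return 0
    sleep 10    # short pause before retrying after a 404 or other failure
  done
  return 1
}
WGET=wget_retry   # "$WGET" "$URL" now retries a few times before giving up
```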
