It’s good to keep backups of your website’s HTML and other assets. A common way to do backups, if you’re not using a version control system like Git, is to make a zip of the entire document tree. Usually it’ll just get called “website.zip”, or maybe “website-20180810.zip” with the current date.
It’s a fine way to take a snapshot, but don’t leave it on your web server inside your website’s document tree. The document tree is the folder where you upload your files, like /sites/mysite. If you make a zip or tarball or similar and leave it at /sites/mysite/mysite.zip, you’re asking for it to be stolen by bad guys. Maybe you’ve got PHP files in there that contain secrets, like the connection password to your database. Maybe you’ve got original work files, like the .psd files you created your .jpg files from. If you don’t want it seen, don’t put it in your document tree.
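One way to avoid the problem entirely is to have your backup script write the archive somewhere outside the tree in the first place. Here’s a minimal sketch in Python; the paths in the comment are just examples, not a recommendation for any particular layout:

```python
import datetime
import pathlib
import shutil

def backup_site(docroot, backup_dir):
    """Zip the document tree into backup_dir, which must live OUTSIDE
    anything the web server can serve. Returns the archive path."""
    backup_dir = pathlib.Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().strftime("%Y%m%d")
    # shutil.make_archive appends the ".zip" extension itself
    return shutil.make_archive(str(backup_dir / f"site-{stamp}"),
                               "zip", root_dir=docroot)

# e.g. backup_site("/sites/mysite", "/var/backups/mysite")
```

The key point is that the destination directory is a sibling of the document tree, not inside it, so no URL can ever reach the archive.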
“No way, nobody knows it’s there”, you may think. You don’t link to the backup file anywhere, and there’s no directory listing on the server. This idea is called “security through obscurity”, and it’s not security at all. It turns out that the bad guys don’t have to know a file is there. They just have to make a lucky guess.
Here’s what brought this to mind: Today I was looking through the error log for a website I work on and noticed a series of 404s, where someone at the same IP address in China was asking for files that didn’t exist. Here are some of the files that this bot was looking for:
Pretty logical way to start: look for likely filenames, in all three of the .zip, .tar.gz, and .rar formats. Then they went looking for duplicate copies that had a sequence number added.
Then they looked for variations on the site’s name (www.example.com for this example) plus today’s date. Again, they’re trying all the archive formats, now adding .7z.
/www.example.com20180810.zip
/www.example.com20180810.tar.gz
/www.example.com20180810.rar
/www.example.com20180810.7z
Now they’re trying all sorts of variations on the filename, and they’re making guesses as to what a subdirectory might be called.
/www.example.com2018.zip
/www.example.com2018.tar.gz
/www.example.com2018.rar
/www.example.com2018.7z
/www.example.com/examplecom.zip
/www.example.com/examplecom.tar.gz
/www.example.com/examplecom.rar
/www.example.com/examplecom.7z
/www.example.com/_example_com.zip
/www.example.com/_example_com.tar.gz
/www.example.com/_example_com.rar
/www.example.com/_example_com.7z
/www.example.com/examplecom.zip
/www.example.com/examplecom.tar.gz
On Apache, the document folder is often called public_html, so that’s a good thing to try.
/www.example.com/public_html.zip
/www.example.com/public_html.tar.gz
/www.example.com/public_html.rar
/www.example.com/public_html.7z
Maybe the target site uses underscores? It’s worth a shot.
/_example_com20180810.zip
/_example_com20180810.tar.gz
/_example_com20180810.rar
/_example_com20180810.7z
And on and on it went, with guesses at all sorts of naming schemes. And why not? Guesses are free. Here are more of the variations they tried:
/_example_com2018.zip
/_example_com (2).zip
/example20180810.zip
/examplecom20180810.zip
/examplecom.zip
/example_com20180810.zip
/example_com.zip
/example2018.zip
/example/examplecom.zip
/example/_example_com.zip
/example/examplecom.zip
/example/example_com.zip
/example/example.com.zip
/example/example.zip
/example/wwwroot.zip
/example/www.zip
/example/web.zip
/example/public_html.zip
/example/2018.zip
/example/2017.zip
/example.com20180810.zip
/example.com.zip
/example.zip
/example (2).zip
/ww.zip
/wwww.zip
/wwwweb.zip
/wwwroot2018.zip
/wwwroot.zip
/www.zip
/public_html.zip
/htdocs.zip
/ftp.zip
/freehost.zip
/flashfxp.zip
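Lists like these are cheap to produce. A minimal Python sketch shows how a few building blocks multiply into hundreds of guesses (the stems, dates, and extensions below are illustrative, not the bot’s actual wordlist):

```python
from itertools import product

# Illustrative guesses only -- not the bot's real wordlist.
stems = ["www.example.com", "example.com", "example", "examplecom",
         "_example_com", "example_com", "www", "wwwroot",
         "public_html", "htdocs"]
dates = ["", "2017", "2018", "20180810"]
extensions = [".zip", ".tar.gz", ".rar", ".7z"]

candidates = ["/" + stem + date + ext
              for stem, date, ext in product(stems, dates, extensions)]

print(len(candidates))  # 10 stems x 4 dates x 4 extensions = 160 guesses
```

Ten stems, four dates, and four extensions already give 160 URLs to try, and each request costs the attacker essentially nothing.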
In under three minutes, the bot tried 307 different filenames, hunting for a backup file in my root directory. This is why security through obscurity doesn’t work: the bot doesn’t have to know a file is there, it just has to make a lucky guess.
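If you want to see whether your own server is getting this treatment, the probes show up as 404s in the access log. Here’s a small sketch that counts archive-file misses, assuming an Apache/nginx combined-format log; the regex and the example log path are assumptions, so adjust them for your setup:

```python
import re
from collections import Counter

# Matches a 404 for any path ending in a common archive extension,
# in Apache/nginx combined log format (an assumption about your setup).
ARCHIVE_404 = re.compile(r'"GET (\S+\.(?:zip|tar\.gz|rar|7z)) HTTP/[\d.]+" 404')

def archive_probes(log_lines):
    """Return a Counter of archive paths that drew a 404."""
    hits = Counter()
    for line in log_lines:
        m = ARCHIVE_404.search(line)
        if m:
            hits[m.group(1)] += 1
    return hits

# e.g. archive_probes(open("/var/log/apache2/access.log"))
```

A long run of these from one IP address, seconds apart, is exactly the pattern described above.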
Stay safe. Keep your backups out of the document tree, or better yet, off the web server entirely.