The World English Dictionary defines reconnaissance as “the process of obtaining information about the position, activities, resources, etc., of an enemy or potential enemy”. What we mean by the word in hacking or penetration testing is extremely close to that definition: we are, in essence, obtaining information about a potential enemy.
As discussed previously, there are two main types of recon: active and passive. Let's focus on passive reconnaissance and see exactly what kind of information we can dig up.
LOOKING BACK TO SEE THE FUTURE
There are times when, as a penetration tester, you are going to wish you could turn back time and look up things that were on the Internet BEFORE you ever showed up to try to bust into your customer's network. For instance, let's say you know that the customer had an Org Chart posted on its web site six months ago, and that this Org Chart holds crucial data you need to pull off a social engineering attack.
What do you do? Shrug your shoulders and give up?
No way!
You crack open your favorite web browser and point it to www.archive.org. The Internet Archive keeps snapshots of web sites as they change and archives them for you, the user, to browse at any time.
The tool is called the Wayback Machine (yes, Mr. Peabody would be proud), and it's found in the middle of the archive.org website, as shown here:
In this example, we're attempting to use the Wayback Machine to see what Google looked like years ago.
You simply type in the URL of the web site you would like to browse, and the Wayback Machine will present you with a calendar that displays which snapshots are available to you.
Just pick a copy and you'll be able to view it. For instance, here's what Google.com looked like back in 1998.
And here's what Google.com looked like on July 4, 2002.
So, that Org Chart we were talking about before? With the Wayback Machine, that Org Chart's not out of reach at all. You can get the information you need. You can launch that attack. You can have the win.
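The calendar is backed by a public capture index that can also be queried directly, which is handy when you want a list of snapshot URLs rather than clicking through dates. The sketch below assumes the Wayback Machine's CDX endpoint at web.archive.org/cdx/search/cdx and its `url`, `output=json`, `from`, `to` and `limit` parameters; the function names are hypothetical, and only `list_snapshots` makes a network request.

```python
import json
import urllib.parse
import urllib.request

# Assumed public Wayback Machine CDX endpoint (no API key needed).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, year_from=None, year_to=None, limit=50):
    """Build a CDX query URL listing captures of `url`, optionally within a year range."""
    params = {"url": url, "output": "json", "limit": str(limit)}
    if year_from is not None:
        params["from"] = str(year_from)
    if year_to is not None:
        params["to"] = str(year_to)
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_cdx_json(text):
    """Turn a CDX JSON response into browsable snapshot URLs.

    The JSON body is a list of rows; the first row is a header naming the columns.
    """
    rows = json.loads(text) if text.strip() else []
    if len(rows) < 2:
        return []  # no captures (empty body, empty list, or header only)
    header, *records = rows
    ts = header.index("timestamp")
    orig = header.index("original")
    return [f"https://web.archive.org/web/{r[ts]}/{r[orig]}" for r in records]


def list_snapshots(url, **kwargs):
    """Fetch and parse the capture list for `url` (performs a network request)."""
    query = build_cdx_query(url, **kwargs)
    with urllib.request.urlopen(query, timeout=30) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))
```

For example, `list_snapshots("google.com", year_from=1998, year_to=1998, limit=5)` should return up to five 1998 snapshot URLs that open in an ordinary browser, assuming the service is reachable. Keeping the parsing separate from the request makes it easy to check against a saved response.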
NEED A CHECK? YOU GOT CACHE
Google cache is another tool that we smart hackers can use to gather information on a particular target. A quick refresher: Google's ability to offer such a mind-numbingly granular and spot-on search tool comes from its spidering ability and the databases Google keeps of what those spiders come across on the web. These programs crawl the web and index just about everything they come across. Alongside its current index, Google also keeps a cached copy of each page as its spiders last saw it. You, as a web user, can use these cached copies to check out older data.
For instance, let's say a potential target sent out a tweet yesterday that you know contained compromising information (their location, say, or the fact that they were about to go on vacation), and they deleted it this morning. As long as Google's spiders cached the page before it was deleted, you could use the Google cache function to find it.
To use it, simply type the URL you are looking for into Google's search box and prefix it with “cache:”. So, let's say we were looking for the latest cached version of Hack On A Dime's write-up on “Other Essential Peripherals”; we'd put the following URL into the Google search box:
And we'd be presented with the following web page. Note the information banner across the top of the page, telling us when the snapshot was taken and giving us a link to the current page.
CUSTOM DICTIONARIES
Let's talk about logic for a minute.
Pretend that you're a penetration tester and you've discovered that your customer has stood up a WEP-encrypted wireless access point. You want to use some of the techniques we will discuss in later lessons to crack WEP and then join their wireless network as one of its nodes.
Tools like aircrack-ng crack WEP passwords pretty quickly when those passwords are words that occur fairly frequently in the wild. However, suppose our target company has a name outside the norm, let's say “T3aching Guru$”, and their network administrator has stood up a WEP-encrypted wireless access point with a password of “T3achingStudents” or “Guru$Students” or some other variation of their name. It would take aircrack-ng a fairly long time to find these, if it found them at all.
However, we smart hackers can take advantage of public information to speed this process up. How? By creating a custom dictionary that contains this information, including these specialty phrases.
Now, this sounds like a daunting task: trying to figure out all the different phrases our target company could use. However, if they are using any phrases outside the norm of our language, odds are those phrases come from things like their company name or their company's products.
How can we compile a list of all these phrases and ensure we've got a good percentage of them for our own “nefarious” uses?
Well, I ask you this: where does a company advertise itself the most?
That's right: their website. On most companies' web sites, you'll find the most information about them: their products, their founders, the cities they operate in, their customers, and so on.
So, the best play for us smart hackers is to use scripts and tools to download their web site and turn its contents into a custom dictionary. This gets us “the best bang for our buck” when it comes to catching their “proprietary phrases”, as I like to call them.
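The core of that idea fits in a few lines. The sketch below, using only Python's standard library, fetches a single page, keeps its visible text (skipping script and style blocks), and writes each distinct token as one line of a wordlist. It is a hypothetical illustration, not WLAuthor's implementation: it does not crawl links, and the token pattern (which keeps characters such as $ and - so that names like “Guru$” survive) and the minimum length are assumptions to tune per target.

```python
import re
import urllib.request
from html.parser import HTMLParser

# Assumed token pattern: letters, digits and a few characters that show up
# in "proprietary phrases" (e.g. Guru$). Tune this per target.
TOKEN_RE = re.compile(r"[A-Za-z0-9$@_-]+")


class VisibleText(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def words_from_html(html, min_len=4):
    """Return unique tokens, in first-seen order, of at least `min_len` characters."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    seen, words = set(), []
    for token in TOKEN_RE.findall(" ".join(parser.chunks)):
        token = token.strip("-_")  # drop punctuation hanging off the ends
        if len(token) >= min_len and token not in seen:
            seen.add(token)
            words.append(token)
    return words


def fetch_page(url):
    """Download one page (network request); no crawling or link following."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def write_wordlist(words, path):
    """Write one word per line, the plain format wordlist-driven tools accept."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(word + "\n" for word in words)
```

For example, `write_wordlist(words_from_html(fetch_page("http://www.example.com/")), "target.lst")` would produce a one-word-per-line file from that single page (the URL and file name are placeholders). A dedicated tool adds what this sketch leaves out, chiefly following links across the whole site.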
My favorite tool for this job (and one of the quickest, I find) is a script called “WLAuthor” (short for Word List Author). It's a Perl script, and you can find it on the securityexperiment.com web site here:
Here is the help screen for Word List Author (WLAuthor):
A friend of mine runs a web site with a rather unusual name: www.argofax.com.
If this were the name of a target company and we wanted to build a custom dictionary that would include all the various forms of the word “argofax” (a word NOT found in common English), our best bet would be to run a script like WLAuthor against argofax.com.
Now, if we were to look at the resulting argofax.lst file, or run “grep” against it to match certain strings, we would see that our dictionary (which we can now use in aircrack-ng or other password cracking tools) contains a lot of specialized words that don't ordinarily appear in the English language.
So now we've successfully built ourselves a custom dictionary containing entries that are “specialized” for our specific target company.
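A scraped wordlist contains words as they appear on the site. To also cover variations like the “T3achingStudents” and “Guru$Students” examples earlier, scraped words can be run through rule-based mangling, the same idea behind the wordlist rules in crackers such as John the Ripper. The sketch below is hypothetical and is not a description of what WLAuthor does: the substitution table, the suffix list and the function names are all illustrative assumptions.

```python
import re

# Hypothetical substitution table; real admins use many more patterns.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "$"}


def leet_variants(word):
    """The word, one variant per substitutable letter, and one with every substitution."""
    variants = {word}
    everything = word
    for letter, sub in LEET.items():
        if letter in word.lower():
            # Replace every occurrence of this one letter (either case).
            variants.add(re.sub(letter, sub, word, flags=re.IGNORECASE))
            everything = re.sub(letter, sub, everything, flags=re.IGNORECASE)
    variants.add(everything)
    return variants


def case_variants(word):
    """As-is, all lower case, and first letter capitalized."""
    return {word, word.lower(), word.capitalize()}


def mangle(base_words, suffixes=("",)):
    """Combine every leet/case variant of each base word with every suffix."""
    bases = set()
    for word in base_words:
        for variant in leet_variants(word):
            bases |= case_variants(variant)
    return sorted({base + suffix for base in bases for suffix in suffixes})
```

For instance, `mangle(["Teaching", "Gurus"], suffixes=("", "Students"))` includes both “T3achingStudents” and “Guru$Students”. The output grows multiplicatively with the number of rules and suffixes, which is why the sketch substitutes one letter at a time instead of trying every combination.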
ENUMERATING THE NETWORK FROM THE OUTSIDE
Let's face a simple fact: to us smart hackers, information is key. To quote one of my favorite television shows, “The Prisoner”: “We want information.” Be it usernames, passwords, printer names, emails, phone numbers or server names, we want information.
There are a few other tools out in the wild that will let us gather information on a potential target, and they do a fine job. But there is one tool that, in my mind, rises above the rest when it comes to enumerating target data from the outside looking in.
That tool is FOCA, which stands for Fingerprinting Organizations with Collected Archives. What does this mean? Well, FOCA takes three steps that we smart hackers must take and combines them into a single, easy tool. First, FOCA scans a web site for any and all documents uploaded to it: MS Word (doc, docx), MS Excel (xls, xlsx), PDF, and a whole slew of others. Next, once it has enumerated all the docs on the web site, FOCA downloads them to your local hard drive. Finally, FOCA parses the metadata from these docs and presents it in an easy-to-browse interface. It will enumerate all the email addresses found in the docs and their metadata, all the usernames embedded in the metadata, all the shared folders documented in the metadata, and all the printers the metadata contains.
In essence, FOCA allows you, the smart hacker, to enumerate a LOT of information about the network BEFORE YOU EVEN TOUCH IT.
FOCA is Windows-based, but it does run under Wine, so you will be able to run it on your BackTrack-based laptop. Once you've installed it and got it running, you'll find FOCA's interface fairly simple to use.
Step 1. Create a new project
FOCA saves all its data in easy-to-use projects. For every web site you hit with this tool, create a new project to save the resulting data. You will also need to designate a folder where FOCA will save all of the documents it downloads from the web site.
FOCA needs to know which web site you're interested in scanning, or else it won't know where to go to download docs. Enter the URL into the appropriate text box and then hit the button marked "Create". You will then be presented with FOCA's main screen.
Step 2. Enumerate the Documents
FOCA will crawl the web site you're investigating and will detail for you all the files uploaded and hosted by it.
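For a rough idea of what enumerating and then downloading the documents involves, here is a single-page sketch using only Python's standard library. It is an assumption-laden stand-in, not how FOCA itself discovers documents: it only looks at `<a href>` links on one page (no crawling), decides what counts as a document purely by file extension, and the hypothetical `download_all` saves each file under its last path segment, so two files with the same name in different directories would overwrite each other.

```python
import os
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Extensions taken from the formats named in the text; extend as needed.
DOC_EXTENSIONS = (".doc", ".docx", ".xls", ".xlsx", ".pdf")


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def document_links(html, base_url):
    """Absolute, de-duplicated URLs of linked documents found in one page."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    found = []
    for href in parser.hrefs:
        url = urllib.parse.urljoin(base_url, href)  # resolve relative links
        path = urllib.parse.urlparse(url).path.lower()  # ignore ?query strings
        if path.endswith(DOC_EXTENSIONS) and url not in found:
            found.append(url)
    return found


def local_name(url):
    """File name to save a download under: the decoded last path segment."""
    path = urllib.parse.unquote(urllib.parse.urlparse(url).path)
    return os.path.basename(path) or "index"


def download_all(urls, folder):
    """Save each URL into `folder` (network requests; no retries or rate limiting)."""
    os.makedirs(folder, exist_ok=True)
    for url in urls:
        target = os.path.join(folder, local_name(url))
        with urllib.request.urlopen(url, timeout=30) as resp, open(target, "wb") as fh:
            fh.write(resp.read())
```

Note that `document_links` keeps links to other hosts (a CDN, for example); in a real engagement you would filter the list down to hosts that are in scope before downloading anything.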
Step 3. Download the Documents
Once FOCA has finished scanning for documents on the web site, download the documents (sometimes you only want a few; sometimes you'll take everything you can).
A progress bar shows you when the files are being downloaded.
Step 4. Extract Metadata
FOCA will then scan the documents you've downloaded (potentially thousands of them) and enumerate all sorts of useful reconnaissance information that you, the smart hacker, need.
Things like which operating systems are in use on the network:
Things like which software is being used on the network:
Things like usernames:
Things like emails:
Things like Shared Folders:
Things like Printers:
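To see where some of that information comes from, here is a minimal sketch for one family of formats FOCA handles. Office Open XML files (.docx, .xlsx, .pptx) are ZIP archives whose docProps/core.xml and docProps/app.xml parts store properties such as the author, the last person to modify the file, the producing application and its template. The sketch assumes only that layout; legacy .doc/.xls files and PDFs keep their metadata in other structures and would need separate parsers. The function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespaces used by the Office Open XML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Which element to read from which part of the ZIP container.
FIELDS = {
    "docProps/core.xml": {"author": "dc:creator", "last_modified_by": "cp:lastModifiedBy"},
    "docProps/app.xml": {"application": "ep:Application", "app_version": "ep:AppVersion",
                         "company": "ep:Company", "template": "ep:Template"},
}


def ooxml_metadata(source):
    """Read selected properties from a .docx/.xlsx/.pptx (path or binary file object)."""
    found = {}
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        for part, fields in FIELDS.items():
            if part not in names:
                continue  # property parts are optional
            root = ET.fromstring(zf.read(part))
            for key, path in fields.items():
                el = root.find(path, NS)
                if el is not None and el.text and el.text.strip():
                    found[key] = el.text.strip()
    return found


def metadata_from_bytes(data):
    """Same as ooxml_metadata, for a document already held in memory."""
    return ooxml_metadata(io.BytesIO(data))


def collect_users(folder):
    """Distinct author / last-modified-by names across OOXML files under `folder`."""
    users = set()
    for path in Path(folder).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in {".docx", ".xlsx", ".pptx"}:
            continue
        try:
            meta = ooxml_metadata(path)
        except (zipfile.BadZipFile, ET.ParseError):
            continue  # corrupt or mislabeled file: skip it
        users.update(meta[k] for k in ("author", "last_modified_by") if k in meta)
    return sorted(users)
```

Pointed at the folder you designated for downloads, `collect_users` would give a first cut of the username list; the `template` field is worth a look too, since it sometimes records a path on an internal file share.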
More importantly, because share paths take the form \\SERVER\share, we can also enumerate server names from the printer shares and folder shares.
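Printer and folder paths recorded in metadata are typically UNC paths such as \\PRNSRV01\HP-4th-Floor, so the server name is simply the first component after the leading double backslash. A small hypothetical helper for pulling those names out of collected strings might look like this; it assumes plain backslash-separated UNC paths and ignores other forms (for example smb:// URLs).

```python
import re

# \\SERVER\share...: capture everything between the leading "\\" and the next "\".
UNC_RE = re.compile(r"\\\\([^\\/\s]+)\\")


def servers_from_paths(strings):
    """Unique server names found in UNC paths within `strings`, upper-cased and sorted."""
    servers = set()
    for s in strings:
        for name in UNC_RE.findall(s):
            servers.add(name.upper())  # Windows host names are case-insensitive
    return sorted(servers)
```

Given strings such as "\\\\PRNSRV01\\HP-4th-Floor" and "\\\\fs02\\templates\\Normal.dotm" (written with Python escapes), it reports FS02 and PRNSRV01, while ordinary local paths like C:\Users\... are ignored because they lack the leading double backslash.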
SUMMARY
Before taking on any penetration testing task, we need to be smart and do our homework. We need to make sure we have gathered as much information as we can before we even THINK about penetrating the network.
The tools above (and others that are out there) are meant to help us do our job more quickly, more easily and more efficiently. Next up: a quick tutorial on how to install FOCA in a Wine environment on an Ubuntu-based (or BackTrack) box.