Hindustan Times web edition article scraper
why

The Hindustan Times doesn’t trust its readers, at least not readers of its web edition. That’s the only possible explanation for its utterly silly restrictions on saving important articles. Every HT web edition article is just a JPG image, but HT and its consultant, Pressmart, have cobbled together some crazy restrictions that won’t let you save the image. No right clicks. No File > Save As.

Why on earth not? Nobody, at least nobody in his right mind, wants to pass off HT’s work as his own. But to guard against the miscreants who do, HT punishes the few who want to save clippings for genuine work: reference, research, use in litigation (HT does some exceptional work on contemporary issues). So no picture saving. That’s right: the blocking is all done with JavaScript. Try disabling JavaScript, though, and the site doesn’t work at all. On a usability scale of 1 to 10, this must get minus 7.

So here’s a workaround. You don’t have to jump through too many hoops, but you do have to get the page and article numbers. And here’s where it gets really weird. When you click on an article in the HT web edition’s full-page thumbnail, it opens in another window. Ordinarily, that window does not show you the article number or anything else useful. So you need a workaround for that, too. Luckily, that’s a one-time, non-invasive, harmless tweak to some IE and FF settings (see below). Do that, grab the date, page and article numbers, enter them in the form and voilà! A downloadable, savable image.

 
 
how

By default, the HT web edition opens a popup window with the full article you’ve clicked to read.

Typically, this popup does not show you the full URL (all that http:// blather): the usual address bar is missing. And the address bar, with its URL, is exactly what you need to see.

The address bar has been deliberately hidden. You can view it, though, in IE7, IE6 and Firefox. I recommend Firefox (version 2.0+). Here’s how to enable the address bar in each browser.

First off, though, click the button that says ‘newspaper view’.

In the main full-page view, you can also click on Personalize and select newspaper view as your default article view (recommended).

By the way, here’s another interesting glitch: the Personalize menu has the text view and newspaper view options swapped. Select text view as your default and you get newspaper view, and vice versa. Go figure.

IE6

When the popup comes up, just hit the F11 key. That switches the window to what IE calls “full screen” mode, with the address bar and all. Simple. To go back to the default view, hit F11 again. There’s probably another, permanent way of doing this; I just don’t know it.

IE7

If you don’t have IE7, you’re using a pirated copy of Windows. Don’t. Go buy a licensed version and install the IE7 upgrade. IE7 won’t install on an unauthenticated, non-genuine copy. Good for them, and too bad for the cretins who believe that software should be free.

If you’re one of the sensible types and do have IE7, click on Tools, then Internet Options, then go to the Security tab. Click Custom level and go down the list of options till you come to one called “Allow websites to open windows without address or status bars”. Set it to Disable.

(If you can’t see the menu bar, with File, Edit, View and so on, right-click on any toolbar and tick “Menu Bar”.)

This is a one-off change, so I prefer it.

Firefox

Make sure you’re running the latest version.

Click on Tools, then Options, then Content, and tick “Enable JavaScript” (you need it for the site to work). Then click the Advanced button next to it and untick “Disable or replace context menus”.

Hold. Not done yet. Need to do this once, so grit your teeth and hang on.

In the address bar of Firefox, enter about:config (exactly that).

This is a long, long list of additional tweaks that FF lets you make, in alphabetical order.

There is a filter bar that appears below the address bar. Type dom. in the filter. You now have a shorter list.

Find the one called dom.disable_window_open_feature.location. It is set to false by default. Right-click on it and click Toggle. It changes to true.

That’s it.

What to look for in the popup

You already know the date of the newspaper and the edition (Mumbai, Delhi, Chandigarh), but there’s some additional information you need to get to the image.

The article popup’s URL, which you can now see below the title bar, is peculiarly coded. Typically, it looks something like this:

http://epaper.hindustantimes.com/ArticleImage.aspx?article=28_11_2007_001_017&mode=1

What’s important here are the last two three-digit numbers in the article= part: 001 and 017 in this example.

These are, respectively, the page number and the article number. (The first three fields, 28_11_2007, are simply the date: day, month, year.)

THIS IS ALL THAT YOU NEED TO NOTE.
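If you’d rather not pick the numbers out by eye, a few lines of Python can split the URL for you. This is a minimal sketch, assuming every popup URL follows the DD_MM_YYYY_PPP_AAA layout of the example URL above; parse_article_url is a hypothetical helper, not part of the HT site or of the scraper form.

```python
from datetime import date
from urllib.parse import urlparse, parse_qs


def parse_article_url(url: str) -> dict:
    """Split an HT e-paper popup URL into date, page and article number.

    Hypothetical helper. Assumes the 'article' query parameter looks like
    DD_MM_YYYY_PPP_AAA, as in the single example URL quoted above.
    """
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; take the first 'article'
    article = query["article"][0]
    day, month, year, page, article_no = article.split("_")
    return {
        "date": date(int(year), int(month), int(day)),
        "page": page,           # kept as a three-digit string, e.g. "001"
        "article": article_no,  # likewise, e.g. "017"
    }


url = "http://epaper.hindustantimes.com/ArticleImage.aspx?article=28_11_2007_001_017&mode=1"
info = parse_article_url(url)
print(info["date"], info["page"], info["article"])  # 2007-11-28 001 017
```

The page and article numbers are deliberately kept as strings so the leading zeros the form asks for survive. If the site ever changes the pattern, the unpacking or date conversion raises an exception rather than quietly returning the wrong numbers.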

Once you have this, just go to the form, enter the date and edition using the fields and carefully type in the three-digit page and article numbers. Hit the submit button.
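How the form turns your entries into an image isn’t spelled out here, but as a rough illustration, the three pieces can be put back together into a popup-style URL. This is a minimal sketch under several assumptions: that the DD_MM_YYYY_PPP_AAA layout of the one example URL holds for every date, that single-digit days and months are zero-padded (the example can’t confirm this), that mode=1 can simply be reused, and that the edition plays no part in the URL, since it doesn’t appear in the example. build_article_url is a hypothetical name.

```python
from datetime import date

# Base address copied from the example popup URL; an assumption, since the
# site may serve articles from a different path.
BASE_URL = "http://epaper.hindustantimes.com/ArticleImage.aspx"


def build_article_url(issue: date, page: int, article: int, mode: int = 1) -> str:
    """Rebuild an HT e-paper popup URL from a date, page and article number.

    Hypothetical helper. Assumes the DD_MM_YYYY_PPP_AAA layout seen in one
    example URL, zero-padded day and month, and that mode=1 can be reused.
    The edition is not used because it does not appear in that URL.
    """
    if not (1 <= page <= 999 and 1 <= article <= 999):
        raise ValueError("page and article numbers must fit in three digits")
    # :02d and :03d zero-pad, so page 1 becomes "001" and article 17 "017"
    key = (f"{issue.day:02d}_{issue.month:02d}_{issue.year}"
           f"_{page:03d}_{article:03d}")
    return f"{BASE_URL}?article={key}&mode={mode}"


print(build_article_url(date(2007, 11, 28), 1, 17))
# http://epaper.hindustantimes.com/ArticleImage.aspx?article=28_11_2007_001_017&mode=1
```

Taking the numbers as integers and padding them in code is one way to avoid the typing slips the form warns about: 1, 01 and 001 all end up as "001".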

Magic, isn’t it?

Moral of the story: just because you’re paranoid doesn’t mean we’re not out to get you …