Java image scraper using jsoup – unleash the power of jsoup

Image scraper? Yes, it can be done in just a few lines of code with Java and jsoup!

Today, for educational purposes only, I will show you how to create a simple Java console application that scrapes images from websites using the jsoup library.
First of all, download jsoup from its official website. It is a powerful open-source HTML parsing library and the heart of our image scraper.
Once downloaded, add the jar to your project's classpath and you’re good to go. Here is the main class that does all the work for you:
Image scraper:

[code language=”java”] 
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Scraper {

    // The URL of the website
    private static final String webSiteURL = "http://www.website.com/";

    // The path of the folder that you want to save the images to
    private static String folderPath = "";

    public static void main(String[] args) {

        Scanner in = new Scanner(System.in);
        System.out.print("Please enter the folder path: ");
        folderPath = in.nextLine();

        System.out.print("Please enter the amount of pages to be scraped: ");
        final int lastPage = in.nextInt();

        // Start the downloading loop
        for (int i = 1; i <= lastPage; i++) {
            try {
                // Connect to the website and get the HTML.
                // Note the "page/" part: this is how the pages are separated
                Document doc = Jsoup.connect(webSiteURL + "page/" + i).get();

                // Get all elements with the img tag
                Elements img = doc.getElementsByTag("img");

                for (Element el : img) {
                    // For each element get the absolute src URL
                    String src = el.absUrl("src");

                    // absUrl returns an empty string when no absolute URL
                    // could be built, so skip those elements
                    if (src.isEmpty()) {
                        continue;
                    }

                    System.out.println("Image Found!");
                    System.out.println("src attribute is : " + src);

                    getImages(src);
                }

            } catch (IOException ex) {
                System.err.println("There was an error: " + ex);
            }
        }
    }

    private static void getImages(String src) throws IOException {

        // Strip a trailing slash so that a file name can be extracted
        if (src.endsWith("/")) {
            src = src.substring(0, src.length() - 1);
        }

        // The name is everything from the last slash on; the slash is kept
        // so that folderPath + name forms a valid path
        int indexname = src.lastIndexOf("/");
        String name = src.substring(indexname);

        System.out.println(name);

        // Open a URL stream and copy it to the file; try-with-resources
        // closes both streams even if the download fails
        URL url = new URL(src);
        try (InputStream in = url.openStream();
             OutputStream out = new BufferedOutputStream(new FileOutputStream(folderPath + name))) {
            for (int b; (b = in.read()) != -1;) {
                out.write(b);
            }
        }
    }
}
[/code]
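
As a side note, the byte-by-byte copy loop in getImages can also be handed over to the standard library. The following is only a sketch of that idea, not part of the scraper above: the class name StreamSaver and the method save are made up for illustration, and the demo feeds it an in-memory stream instead of a real download so it runs without a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the Scraper class above
class StreamSaver {

    // Copies everything from the stream into targetDir/fileName
    // and returns the path of the written file
    static Path save(InputStream in, Path targetDir, String fileName) throws IOException {
        // Create the target folder if it does not exist yet
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileName);
        // Files.copy reads the stream to its end; an existing file is overwritten
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Demo with three in-memory bytes standing in for a downloaded image
        Path dir = Files.createTempDirectory("scraper-demo");
        try (InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3})) {
            Path saved = save(in, dir, "demo.bin");
            System.out.println(saved + " holds " + Files.size(saved) + " bytes");
        }
    }
}
```

In the scraper, the stream passed to save would be the one returned by url.openStream(), still wrapped in a try-with-resources block so it gets closed.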

Pretty good! Your image scraper is ready!
The above code opens a connection to “website.com” (just an example website) and looks for images on each page. Every image it finds is downloaded into the folder specified by the user. The scraper repeats this for the number of pages the user asked for, requesting one page after another. In our example “page/” is the page separator, so page 3 is fetched from http://www.website.com/page/3. In short, the code downloads every image referenced by an img tag on those pages and saves it to the selected folder.
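
One detail worth a closer look is how the file name is derived from the image URL. Real image URLs often carry a query string (for example "?w=200") or end in a slash, and neither makes a good file name. Here is a minimal sketch of a more careful approach, assuming the URL is syntactically valid. The class ImageFileNames, the method fromUrl and the fallback parameter are made up for this illustration and are not part of jsoup or of the scraper above.

```java
import java.net.URI;

// Hypothetical helper, not part of jsoup or of the Scraper class above
class ImageFileNames {

    // Returns the last path segment of the URL, ignoring the query string
    // and fragment. Falls back to the given default when the path has no
    // usable segment. URI.create throws IllegalArgumentException for
    // malformed input such as URLs containing spaces.
    static String fromUrl(String src, String fallback) {
        String path = URI.create(src).getPath();
        if (path == null) {
            return fallback;
        }
        // Ignore any trailing slashes
        int end = path.length();
        while (end > 0 && path.charAt(end - 1) == '/') {
            end--;
        }
        // The name starts right after the last remaining slash
        int start = path.lastIndexOf('/', end - 1) + 1;
        String name = path.substring(start, end);
        return name.isEmpty() ? fallback : name;
    }

    public static void main(String[] args) {
        // prints cat.jpg
        System.out.println(fromUrl("http://www.website.com/images/cat.jpg?w=200", "image"));
        // prints image
        System.out.println(fromUrl("http://www.website.com/", "image"));
    }
}
```

Unlike the name built in getImages, the result has no leading slash, so it would be joined to the folder with a path separator, for example via java.nio.file.Paths.get(folderPath, name).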

This image scraper tutorial is for educational purposes only. Please use it wisely!
This is just a small part of the power of jsoup.
