Retrieve Data from a Remote Webpage

PHP's file functions are great for opening, reading, writing to, and doing other dirty tricks with files. Any HTML or PHP page is served as text, so we can open it like a file and extract data in many different ways. Here are just a few. Opening a URL this way requires allow_url_fopen to be enabled in php.ini.

Retrieving meta tags from a remote webpage

PHP has a useful function called get_meta_tags() that reads the meta tags of a page into an associative array, keyed by each tag's name attribute in lowercase, so you can pick out the elements you want.

As an example, the following code will snatch the meta data from www.waypoints.ws.

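<?php
// Returns an array keyed by lowercase meta name; a missing tag has no key.
$tags = get_meta_tags('http://www.waypoints.ws/');
$keywords = $tags['keywords'] ?? '';
$description = $tags['description'] ?? '';
$author = $tags['author'] ?? '';

// Escape the remote text before printing it into your own page.
echo "<b>Description:</b> " . htmlspecialchars($description) . "<br>";
echo "<b>Keywords:</b> " . htmlspecialchars($keywords) . "<br>";
echo "<b>Author:</b> " . htmlspecialchars($author) . "<br>";
?>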

This prints the page's description, keywords, and author, each on its own line.

I use a variant of the get_meta_tags() function to display page information as part of a preview function for the fat.ly short URL service.
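
To make the array that get_meta_tags() returns a little more concrete, here is a minimal sketch that runs against a temporary local file instead of a remote URL. The metaTag() helper and the sample page are hypothetical, made up for illustration. It shows two things that are easy to trip over: names are lowercased, and a dot in a name (as in geo.position) becomes an underscore in the array key.

```php
<?php
// Hypothetical helper: return one meta tag value, or a default if the tag is absent.
function metaTag(array $tags, string $key, string $default = ''): string
{
    return $tags[strtolower($key)] ?? $default;
}

// Write a small sample page to a temporary file so the example runs offline.
$html = '<html><head>'
      . '<meta name="Description" content="A sample page">'
      . '<meta name="geo.position" content="-33.86;151.21">'
      . '</head><body></body></html>';
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

$tags = get_meta_tags($file);
unlink($file);

echo metaTag($tags, 'Description'), "\n";       // name="Description" is stored as "description"
echo metaTag($tags, 'geo_position'), "\n";      // "geo.position" is stored as "geo_position"
echo metaTag($tags, 'author', '(none)'), "\n";  // no author tag, so the default is used
```

The same helper works on the array returned for a remote URL; only the argument to get_meta_tags() changes.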

Retrieving title tags from a remote webpage

The get_meta_tags() function only parses data above the closing head tag of the page, so it's relatively quick. It doesn't return the title tag, though, so to get that we have to read the entire page and extract the desired text ourselves. This is far more time-consuming.

First, we open the page for reading. I’m using fopen() but you could just as easily use file_get_contents().

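<?php
$url = "http://www.google.com.au/";
$page = fopen($url, 'r');
if ($page === false) {
    die("Could not open $url");
}
$content = "";
// Read line by line, trimming each line, so the page ends up as one long string.
while (($line = fgets($page, 4096)) !== false) {
    $content .= trim($line);
}
fclose($page);
?>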
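
For comparison, the file_get_contents() route mentioned above might look something like the sketch below. The fetchPage() helper, its timeout default, and the user agent string are assumptions made for illustration. The demo reads a temporary local file so that it runs without a network connection; passing a URL works the same way when allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: read a whole page in one call, or return null on failure.
function fetchPage(string $source, int $timeoutSeconds = 10): ?string
{
    // The "http" options apply to http:// and https:// URLs; local paths ignore them.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleFetcher/1.0', // placeholder value
        ],
    ]);

    // @ suppresses the warning raised on failure; the return value is checked instead.
    $content = @file_get_contents($source, false, $context);
    return $content === false ? null : $content;
}

// Demonstrated on a temporary local file so the example runs offline;
// pass a URL such as "http://www.google.com.au/" to fetch a remote page.
$file = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($file, '<html><head><title>Demo</title></head></html>');

$content = fetchPage($file);
unlink($file);

echo $content === null
    ? "Could not fetch the page.\n"
    : strlen($content) . " bytes fetched.\n";
```

Unlike the fgets() loop, this keeps the page's line breaks, so any pattern that is meant to span lines needs to allow for them.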

Second, we extract the text between the title tags.

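// Match the text between the opening and closing title tags (case-insensitive).
// The ereg functions were removed in PHP 7, so this uses the preg equivalents.
preg_match('/<title[^>]*>(.*?)<\/title>/is', $content, $tmp);
$result = preg_replace('/[[:blank:]]+/', ' ', $tmp[1] ?? '');
echo $result;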
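
Pattern matching on raw HTML can trip over attributes or odd spacing around the title tag. As an alternative technique (not the approach used above), PHP's DOM extension can parse the page and hand back the title element. This is a minimal sketch, assuming the DOM extension is installed; extractTitle() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return the page title, or null if there is none.
function extractTitle(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null;
    }

    // Collapse runs of whitespace (including newlines) into single spaces.
    return trim(preg_replace('/\s+/', ' ', $titles->item(0)->textContent));
}

$sample = "<html><head><TITLE>\n  Example   Page </TITLE></head><body></body></html>";
echo extractTitle($sample), "\n";
```

The trade-off is speed: parsing the whole document costs more than a single pattern match, but it copes with uppercase tags and titles that span several lines.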

Retrieving any text data from a remote webpage

Using the same code as above, we can extract any text on a page that sits between two defined and unique strings. This means that you can effectively snatch portions of remote web pages for inclusion in your own site. Before you do anything remotely resembling this, you should ensure you have permission to do so. Not doing so is theft.

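<?php
$url = "http://www.your-URL-in-here.com/";
$page = fopen($url, 'r');
if ($page === false) {
    die("Could not open $url");
}
$content = "";
while (($line = fgets($page, 4096)) !== false) {
    $content .= trim($line);
}
fclose($page);

// The two unique strings that surround the text you want.
$start = "<text1>";
$end = "</text2>";

// preg_quote() escapes the markers (including "/") so they are matched literally;
// (.*?) stops at the first occurrence of $end rather than the last.
$pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
if (preg_match($pattern, $content, $match)) {
    echo $match[1];
}
?>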
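
When the two markers are fixed strings, a regular expression isn't strictly needed. The sketch below does the same job with strpos() and substr(), which sidesteps escaping altogether. The extractBetween() helper and the sample markup are hypothetical, made up for illustration.

```php
<?php
// Hypothetical helper: return the text between the first $start and the next $end,
// or null if either marker is missing.
function extractBetween(string $content, string $start, string $end): ?string
{
    $from = strpos($content, $start);
    if ($from === false) {
        return null; // start marker not found
    }
    $from += strlen($start);

    $to = strpos($content, $end, $from);
    if ($to === false) {
        return null; // end marker not found after the start marker
    }

    return substr($content, $from, $to - $from);
}

$sample = '<div id="intro"><p>Hello, world.</p></div><p>Other text</p>';

echo extractBetween($sample, '<div id="intro">', '</div>'), "\n";
var_dump(extractBetween($sample, '<span>', '</span>')); // markers absent
```

Returning null for a missing marker lets the caller tell "nothing found" apart from an empty match.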

Retrieving header information from a remote webpage

A developer can extract header information from a remote page using the following code:

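<?php
// Opening an http:// URL fills $http_response_header in the current scope.
$fp = fopen('http://www.google.com', 'r');
if ($fp === false) {
    die('Could not open the URL');
}
print_r($http_response_header);
// or read the same headers from the "wrapper_data" entry of the stream's meta data
$meta_data = stream_get_meta_data($fp);
print_r($meta_data);
fclose($fp);
?>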
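
The header lines come back as a flat list of strings, which is awkward to query. The following sketch turns such a list into a status code plus a name => value map. The parseHeaders() helper and the sample lines are hypothetical; the sample only mimics the shape of $http_response_header so that the example runs offline.

```php
<?php
// Hypothetical helper: split raw header lines into a status code and a name => value map.
function parseHeaders(array $lines): array
{
    $result = ['status' => null, 'headers' => []];

    foreach ($lines as $line) {
        // A status line looks like "HTTP/1.1 200 OK". After a redirect there can be
        // several; each one starts a fresh set of headers, so the last response wins.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $result['status'] = (int) $m[1];
            $result['headers'] = [];
            continue;
        }

        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            // Header names are case-insensitive, so store them in lowercase.
            $result['headers'][strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }

    return $result;
}

// Sample lines in the shape that $http_response_header uses.
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=UTF-8',
];

$parsed = parseHeaders($sample);
echo $parsed['status'], "\n";
echo $parsed['headers']['content-type'], "\n";
```

A header that appears more than once in the same response (Set-Cookie, for example) keeps only its last value in this sketch; collect the values into an array if you need all of them.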




Author: Marty
Category: PHP code
Date: January 8, 2010



About Marty

Marty is a passionate web developer from Sydney, Australia. He owns about 600 websites and makes a healthy living from working the web. As a day job, he works as a pilot for an international airline.

Comments

  1. When i executes your code the following error occurs..
    This page contains the following errors:

    error on line 2 at column 1: Extra content at the end of the document
    Below is a rendering of the page up to the first error.

  2. sorryyyy… my previous comment did not meant for this post… i got that error when i run you this gmail rss code..
    http://www.internoetics.com/2012/08/08/create-a-gmail-rss-feed/#comments

    sorry again
