Hi everybody,

I'm trying to pull data from the web (via RSS) directly into a dataset. This is what I've got so far:

libname in xml 'C:\students.xml';
data students;
set in.students;
run;

But with this approach I first have to download the XML file and then parse it. In another script I've tested this:

%let feed_url = %nrstr(http://www.xing.com/generated/rss/de-net27452-rssfeed2.0.xml);
filename rss_feed url "&feed_url" debug lrecl=32000;
data rss_data;
  infile rss_feed length=len;
  input xml $varying32000. len;
run;

This works okay, but the script puts all the data into one column (xml). Can someone help me combine these scripts so that I can pull the data directly from the web and get a separate column for each piece of information? Additionally, I would like to add an ID (autoincrement-like) to each row.

Thank you for any hints ;)

asked 29 Jan '12, 11:27

ele

edited 06 Nov '12, 14:29

admin ♦♦

Hi Ele,

I think the best way to approach this is with the XML libname engine and an XML map.

You can assign your RSS feed to a library like this:

libname rss xml xmlfileref=bounce xmlmap=xml_map;

Here "bounce" and "xml_map" are filerefs you've set up: "bounce" points to the XML data and "xml_map" to the XML map.

You can create XML maps using the SAS XML Mapper tool, details of which can be found in the following paper: http://www2.sas.com/proceedings/sugi29/119-29.pdf

One thing you'll notice quite quickly if you try to map this RSS feed is that it doesn't have a unique top level element. This is a problem, as SAS expects one.

You can get around this by reformatting the XML before reading it in through the map. The simplest approach is to delete all lines until you find a useful top-level element.

A basic program to do all this would look a bit like this:

/* Simple program to read useful information out of an RSS feed (XML) */
%let feed_url = %nrstr(http://www.xing.com/generated/rss/de-net27452-rssfeed2.0.xml); 
filename rss_feed url "&feed_url" debug lrecl=32000;

/* Bounce the RSS onto a temporary file and remove all the lines until we find a useful root element */
filename bounce temp lrecl=32000;
data _NULL_;
    retain delete 1;
    file bounce;
    infile rss_feed;
    input;
    if _INFILE_ = "<channel>" then delete = 0; * Change this flag to stop deleting records;
    if delete = 0 then put _INFILE_;
run;

/* Define an XML map to read in the data */
filename xml_map temp;
data _null_;
    file xml_map;
    put
            '<?xml version="1.0" encoding="windows-1252" ?>'
        /   '<SXLEMAP xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="TreeMap" version="1.2" xsi:noNamespaceSchemaLocation="http://www.sas.com/xml/schema/sxle12.xsd">'
        /   '    <TABLE name="items">'
        /   '       <TABLE-PATH syntax="XPath">/channel/item</TABLE-PATH>'
        /   '        <COLUMN name="ID" class="ordinal">'
        /   '            <INCREMENT-PATH syntax="XPath">/channel/item</INCREMENT-PATH>'
        /   '            <TYPE>numeric</TYPE>'
        /   '            <DATATYPE>integer</DATATYPE>'
        /   '        </COLUMN>'
        /   '        <COLUMN name="title">'
        /   '            <PATH syntax="XPath">/channel/item/title</PATH>'
        /   '            <TYPE>character</TYPE>'
        /   '            <DATATYPE>string</DATATYPE>'
        /   '            <LENGTH>200</LENGTH>'
        /   '        </COLUMN>'
        /   '    </TABLE>'
        /   '</SXLEMAP>'
    ;
run;

/* Read the parsed data into SAS */
libname rss xml xmlfileref=bounce xmlmap=xml_map;
proc sql;
    create table items as
    select *
    from rss.items
    ;
quit;
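
The map above only reads the title. To get several columns per item, as the question asks, more COLUMN elements can be added inside the TABLE element. Here is a rough sketch of a wider map. It assumes the feed's items carry the standard RSS 2.0 child elements link and pubDate, which I haven't checked against this particular feed. The fileref xml_map_wide and the column lengths are made up for illustration.

```sas
/* Sketch only: a wider XML map with more columns per <item>.          */
/* xml_map_wide is a hypothetical fileref. link and pubDate are        */
/* assumed to exist in the feed (standard RSS 2.0 item elements).      */
/* The LENGTH values are guesses, so adjust them to the real data.     */
filename xml_map_wide temp;
data _null_;
    file xml_map_wide;
    put
            '<?xml version="1.0" encoding="windows-1252" ?>'
        /   '<SXLEMAP xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="TreeMap" version="1.2" xsi:noNamespaceSchemaLocation="http://www.sas.com/xml/schema/sxle12.xsd">'
        /   '    <TABLE name="items">'
        /   '        <TABLE-PATH syntax="XPath">/channel/item</TABLE-PATH>'
        /   '        <COLUMN name="title">'
        /   '            <PATH syntax="XPath">/channel/item/title</PATH>'
        /   '            <TYPE>character</TYPE>'
        /   '            <DATATYPE>string</DATATYPE>'
        /   '            <LENGTH>200</LENGTH>'
        /   '        </COLUMN>'
        /   '        <COLUMN name="link">'
        /   '            <PATH syntax="XPath">/channel/item/link</PATH>'
        /   '            <TYPE>character</TYPE>'
        /   '            <DATATYPE>string</DATATYPE>'
        /   '            <LENGTH>500</LENGTH>'
        /   '        </COLUMN>'
        /   '        <COLUMN name="pubDate">'
        /   '            <PATH syntax="XPath">/channel/item/pubDate</PATH>'
        /   '            <TYPE>character</TYPE>'
        /   '            <DATATYPE>string</DATATYPE>'
        /   '            <LENGTH>40</LENGTH>'
        /   '        </COLUMN>'
        /   '    </TABLE>'
        /   '</SXLEMAP>'
    ;
run;

/* Point the libname at the wider map; "bounce" is the fileref from the program above */
libname rss xml xmlfileref=bounce xmlmap=xml_map_wide;
```

pubDate is read as plain text here to keep the sketch simple. The ID column from the original map can be copied in unchanged if you want it as well.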

Oh yes, you can define a column with "class=ordinal" to count elements too!
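
If you'd rather not do the counting in the map, a row number can also be added afterwards in a DATA step using the automatic variable _N_. A minimal sketch, assuming the "items" table created by the PROC SQL step above; items_numbered and row_id are just names I picked for illustration:

```sas
/* Sketch: number the rows in a DATA step instead of in the XML map. */
/* Assumes the work.items table from the PROC SQL step above.        */
data items_numbered;
    set items;
    row_id = _n_;  /* _N_ counts DATA step iterations: 1, 2, 3, ... */
run;
```

This numbers the rows in the order they are read, so it behaves like an autoincrement only as long as you don't reorder or subset the table before this step.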

Hope that helps, -- Will

answered 31 Jan '12, 06:43

WilliamDobson

thank u very much!!

answered 07 Feb '12, 05:14

ele