About this book
RSS and Atom are the most widely used of many content syndication formats that have developed over the last few years to address the need to distribute and receive streams of content from websites and applications. Sites syndicate content for a broad variety of reasons, from replacing email as a medium for outbound contact to updating satellite sites. Each format has
1.4.2 Other Content and Metadata
Content: Quotations and Pointers
Syndication formats are not content formats; they use existing formats for content: simple text, HTML, XHTML, other XML vocabularies, and also other text and binary media formats. These formats are used for titles, summaries, and the partial or complete reproduction of the content.
One of the characteristics of newsfeed models is that the description itself is defined in as generic a nature as possible. For this reason, it is possible to include any type of content in that description. In a syndication feed, any kind of web content can be sampled and further distributed. That is why RSS and its relatives are also suitable as a universal publication format on the Web.
Metadata in Syndication Formats
Syndication formats serve to exchange information and make it available in different forms. For this reason, they describe the information they contain in a way that allows other users to use it. At the same time, they inform users of the legal and other limits connected to using that information. This metadata includes the identification of publication and update data, the categorization of content, and the identification of writers, authors, and copyright holders.
RSS as a Publication and Syndication Format
Even though all existing feed formats require an element called link, it is possible that the information in a news stream isn’t to be found outside the RSS feed, meaning that the RSS feed not only refers to another resource, but also contains the original information. The description model on which RSS is based, an addressable collection of updatable information objects on the Web, works regardless of whether these objects exist only in the RSS document or refer to other resources on the Web. In principle, every resource on the Web that can be modeled as a collection of updated information objects can be the subject of an RSS feed.
1.5 Syntax: RSS as an XML Format
Many websites identify their newsfeeds through an orange-colored button labeled “XML.” For many users and also for many developers “XML” and “RSS” are synonymous. In fact, all versions of the RSS feed format and Atom are XML applications. Since XML itself is a metalanguage to define languages for the exchange of information on the Web, the feed formats are also often called “XML dialects” or “XML vocabularies”. To date, RSS is the most successful XML vocabulary—except for maybe XHTML, the XML version of HTML.
Standardization and Openness of XML
The biggest advantage of XML in the field of syndication is that XML is a simple, open, and standardized format to exchange information on the Web.
RSS has spread so successfully in recent years not only because it is a particularly effective format, but also because it has established itself as a standard. It acts like a lowest common denominator for updatable information of all kinds, and from the beginning it was accepted as such. Because millions of Internet users use RSS to spread and receive information, applications are possible that profit from network effects and become more useful the more users use them.
This success would not have been possible without the fundamental features of the underlying technology, XML. XML is a text-based format: people can read XML documents without any great difficulty. The content of XML documents can easily be extracted. In addition, XML is not a proprietary technology that is controlled by any software provider. RSS has inherited these advantages from XML; without them, it would not have been able to spread explosively on the Web. The use of a binary format or a proprietary text format would have complicated the development of software that produces or processes RSS, and limited the market for RSS applications. XML makes it easy to define a format for specific needs. All RSS formats consist of a very small group of XML elements and attributes defined for this purpose, and of rules for the hierarchical connections between these elements. With this set of rules (expressed as a RELAX NG or XML Schema), limits for the permitted content of RSS elements can be specified, such as the format in which calendar dates must be given.
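As an illustration of such a content limit, the following minimal sketch checks date strings against the two date formats used by the feed formats: RSS 2.0 dates follow RFC 822, while Atom dates follow RFC 3339. The function names are hypothetical, and the Atom check leans on Python's ISO 8601 parser, which is only an approximation of a full RFC 3339 parser.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_rss2_date(text):
    """Parse an RFC 822 style date (as in an RSS 2.0 pubDate); None if invalid."""
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


def parse_atom_date(text):
    """Parse an RFC 3339 style date (as in an Atom updated element); None if invalid."""
    try:
        # Assumption: a trailing "Z" is rewritten so that older Python
        # versions, which do not accept "Z", can parse the value.
        return datetime.fromisoformat(text.replace("Z", "+00:00"))
    except ValueError:
        return None


rss_date = parse_rss2_date("Sat, 07 Sep 2002 09:42:31 GMT")
atom_date = parse_atom_date("2003-12-13T18:30:02Z")
```

A value that fits neither format, such as "next Tuesday", yields None in both functions, which is the kind of rule a schema can state declaratively.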
Separation of Content and Presentation in XML
XML allows for the content and the presentation of documents to be separated. Many XML formats are content formats; they contain no information about how the documents are supposed to be reproduced visually or acoustically. The DocBook vocabulary for technical documentation, for example, uses an emphasis element for important passages and terms. DocBook doesn’t specify, however, how such sections are to be emphasized in print. Other XML languages are description or presentation vocabularies. SVG (Scalable Vector Graphics) describes graphics, SMIL (Synchronized Multimedia Integration Language) describes time-structured presentations, and XSL-FO (eXtensible Stylesheet Language-Formatting Objects) describes the layout of printed pages in detail.
Semantic Distinctions
RSS is a pure content format. An RSS document doesn’t contain information about how it should be presented to the user. RSS uses XML to distinguish information semantically, and it uses the possibility provided by XML to separate content and presentation.
All RSS formats are pure source-text-based content formats. This means that it is necessary to provide them with additional presentation instructions that can be adapted to the respective presentation medium. The presentation instructions make it easy to present RSS documents in different media or in different contexts.
Transformability
The simplest method to present RSS is to convert it into HTML and then use an HTML browser or a toolkit to display the HTML. On the one hand, XSLT (XSL Transformations; http://www.w3.org/TR/xslt) can be used to transform XML data into HTML; on the other hand, HTML fragments are frequently included as part of the content of RSS documents, so an HTML rendering engine is necessary anyway to display them. Like all XML documents, RSS documents can also be formatted directly with Cascading Style Sheets. Moreover, there are many other presentation methods; Flash can be used, for example. One example of an RSS application that uses Flash is Gush (http://www.2entwine.com).
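The conversion into HTML can be sketched in a few lines. The example below does not use XSLT; it walks the parsed document with Python's standard library instead, which is a stand-in chosen only to keep the sketch self-contained. The sample feed, its titles, and its URLs are invented for illustration.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample feed; titles and URLs are invented for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Channel</title>
    <link>http://www.example.org/</link>
    <description>A hypothetical feed used for illustration.</description>
    <item>
      <title>First entry</title>
      <link>http://www.example.org/1</link>
      <description>Short summary &amp; teaser.</description>
    </item>
    <item>
      <title>Second entry</title>
      <link>http://www.example.org/2</link>
      <description>Another summary.</description>
    </item>
  </channel>
</rss>"""


def rss_to_html(rss_text):
    """Render the channel title and items of an RSS 2.0 document as an HTML fragment."""
    channel = ET.fromstring(rss_text).find("channel")
    lines = ["<h1>%s</h1>" % html.escape(channel.findtext("title", default="")), "<ul>"]
    for item in channel.findall("item"):
        title = html.escape(item.findtext("title", default=""))
        link = html.escape(item.findtext("link", default=""), quote=True)
        # Descriptions are escaped here, so embedded HTML would appear as text.
        summary = html.escape(item.findtext("description", default=""))
        lines.append('  <li><a href="%s">%s</a>: %s</li>' % (link, title, summary))
    lines.append("</ul>")
    return "\n".join(lines)


HTML_FRAGMENT = rss_to_html(SAMPLE_RSS)
```

Escaping every description is the cautious choice for a sketch; a real reader that wants to show quoted HTML markup would hand that content to an HTML rendering engine instead.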
Ability to be Validated
As XML documents, RSS feeds can be checked with standard procedures to determine whether they comply with the rules of the respective format. A document type definition or a schema contains the formal description of the rules that should be checked for compliance.
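Python's standard library has no DTD or schema validator, so the following sketch is only a hand-written stand-in for the procedure described above. It first tests well-formedness and then checks one rule of the RSS 2.0 format, namely that a channel contains title, link, and description. The function name and the wording of the messages are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The three child elements that RSS 2.0 requires in every channel.
REQUIRED_CHANNEL_CHILDREN = ("title", "link", "description")


def check_rss2(text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return ["not well-formed: %s" % err]
    if root.tag != "rss":
        return ["root element is <%s>, expected <rss>" % root.tag]
    channel = root.find("channel")
    if channel is None:
        return ["missing <channel>"]
    return [
        "missing <%s> in <channel>" % name
        for name in REQUIRED_CHANNEL_CHILDREN
        if channel.find(name) is None
    ]


# A hypothetical document that lacks the required link element.
problems = check_rss2(
    '<rss version="2.0"><channel><title>T</title>'
    "<description>D</description></channel></rss>"
)
```

A schema expresses the same rules declaratively, so that any standard validator can apply them without format-specific code.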
Internationalization
A document format that is defined as an XML format can use the methods typical of XML to solve problems of internationalization. XML consistently specifies Unicode as its character set. The Unicode standard assigns a number to every character from all known alphabets and, by doing so, is able to represent texts in any language.
If it is important for the process to specify the language in which a document is created, the xml:lang attribute can be used XML-wide. The newer feeds make use of this option.
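The sketch below shows how an application might read xml:lang from a feed. It assumes an Atom 1.0 document and lets an entry without its own attribute inherit the language of the feed element; the sample feed and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The xml prefix is bound to this namespace by the XML specifications.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feed with two entries in different languages.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>Example Feed</title>
  <entry>
    <title>An English entry</title>
  </entry>
  <entry xml:lang="de">
    <title>Ein Eintrag auf Deutsch: Grüße</title>
  </entry>
</feed>"""


def titles_with_language(feed_text):
    """Return (language, title) pairs; entries inherit the feed's language."""
    feed = ET.fromstring(feed_text)
    feed_lang = feed.get(XML_LANG)
    pairs = []
    for entry in feed.findall(ATOM + "entry"):
        lang = entry.get(XML_LANG, feed_lang)
        pairs.append((lang, entry.findtext(ATOM + "title")))
    return pairs


LANGUAGE_PAIRS = titles_with_language(SAMPLE_ATOM)
```

Because XML is based on Unicode, the German title with its non-ASCII characters needs no special treatment in the document or in the code.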
Extensibility and Namespaces
Extensibility is one of the key aims of XML; the acronym XML doesn’t stand for “Extensible Markup Language” without a good reason. First of all, XML is extensible in that every user can define new element types and attributes, whereas a format like HTML determines the scope of the language.
The developers of all the RSS versions used this feature of XML to define element types like rss (the document or root element of an RSS document), channel, and item.
However, elements and attributes won’t be defined freely any more, if vocabularies like RSS 1.0, RSS 2.0, and Atom are determined and standardized for certain tasks. The formulated and consequently stipulated rules for such vocabularies—in the form of a DTD (Document Type Definition) or a Relax NG or XML schema—allow only certain elements and attributes with determined identifiers in a determined hierarchical order.
The rules for the content permitted in the elements (content models) can nevertheless allow elements of other vocabularies to be embedded in certain locations of a document. This is fundamental for feed formats, because it allows sections formulated in XHTML to be included in a document.
In order to extend documents created in such a vocabulary by adding elements from other vocabularies, a method called the namespace mechanism was developed. All the feed formats described in this book use this mechanism. You need to understand it in order to be able to work productively with these vocabularies. The appendix contains a short introduction to the namespace mechanism (see appendix, section A.3).
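A short sketch may make the mechanism more concrete. It assumes an RSS 2.0 item extended with the creator element of the Dublin Core vocabulary; the feed content is invented, and the dc prefix in the Python mapping is a local choice that only has to point to the same namespace URI as the document.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping used for lookups; the prefix name itself is arbitrary.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical sample feed with one extension element.
SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Channel</title>
    <link>http://www.example.org/</link>
    <description>Hypothetical feed with one extension element.</description>
    <item>
      <title>First entry</title>
      <dc:creator>Jane Doe</dc:creator>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE)
item = root.find("channel/item")

# The extension element is found through its namespace, not through its prefix.
creator = item.findtext("dc:creator", namespaces=NAMESPACES)

# Internally the parser stores each tag together with its namespace URI.
qualified_tags = [child.tag for child in item]
```

The list of qualified tags shows why the mechanism avoids name clashes: the RSS title element has no namespace, while creator is stored together with the Dublin Core namespace URI.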
1.6 Feed Formats and other XML Formats
Syndication Formats are not News Formats
A comparison with the news-specific formats used by news agencies and commercial publishing houses shows that RSS simply can’t be called a news format. The combination NITF/NewsML is increasingly establishing itself there. NITF stands for “News Industry Text Format”. NITF is an XML dialect to identify the components of news content, such as headlines, introductory texts, and names of people and organizations (http://www.nitf.org). NewsML, which stands for News Markup Language, is a format for the “wrapper” of news, with information about release dates, the legal situation, etc. (http://www.newsml.org/pages/index.php). NewsML and NITF are based on the model of news in a journalistic sense. For feed formats, these semantics don’t play an important role; their semantics are considerably more abstract.
NewsML and NITF are neither formats for information about the state of a—modifiable—web resource, nor formats for feeds, that is, for documents that summarize different information objects. RSS differs from NewsML and NITF in that all RSS messages refer to resources on the Web, which are identifiable through a URI. It is characteristic for an RSS document to be linked to a complete resource and that the individual information objects may or may not contain links as well.
Essentially, an RSS document is nothing more than a simple, two-level hierarchy of links that are provided with a title and a description. This pattern is so general that it refers to every resource on the Web that is identifiable, that is: which has a URI, which has components that can be labeled, and which changes with time.
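This two-level pattern can be written down as a small data model. The sketch below assumes RSS 2.0 element names; the class and function names are hypothetical and the sample feed is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Entry:
    """Lower level: one information object, a link with title and description."""
    title: str
    link: str
    description: str


@dataclass
class Feed:
    """Upper level: the resource as a whole, plus the entries it currently lists."""
    title: str
    link: str
    description: str
    entries: List[Entry] = field(default_factory=list)


def _fields(element):
    # Read the three direct children that both levels have in common.
    return tuple(element.findtext(name, default="") for name in ("title", "link", "description"))


def read_feed(rss_text):
    channel = ET.fromstring(rss_text).find("channel")
    feed = Feed(*_fields(channel))
    feed.entries = [Entry(*_fields(item)) for item in channel.findall("item")]
    return feed


# Hypothetical sample feed with a single item.
SAMPLE_FEED = read_feed(
    '<rss version="2.0"><channel>'
    "<title>Example Channel</title><link>http://www.example.org/</link>"
    "<description>Changes on a hypothetical site.</description>"
    "<item><title>First entry</title><link>http://www.example.org/1</link>"
    "<description>Short summary.</description></item>"
    "</channel></rss>"
)
```

Both levels carry the same three fields, which is why the pattern applies to so many kinds of resources.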
Distinction of Message Formats
RSS can also be distinguished from the message formats that have recently been developed for machine-readable data. Well-known formats of this kind are XML-RPC and SOAP. These formats mainly serve to exchange Web data that is normally seen by no one. XML-RPC is used to call functions of programs running on remote computers (see http://www.xmlrpc.com/spec/). SOAP is a format for enveloping any complex message, for example, documents that are exchanged in e-business processes. SOAP serves, for instance, as an envelope format for ebXML messages (see http://xml.coverpages.org/ebXML.html and http://webservices.xml.com/pub/a/ws/2001/04/04/ebXML.html).
Surely it is no coincidence that the American developer Dave Winer significantly influenced RSS as well as XML-RPC and SOAP. These three XML vocabularies are formats for messaging on the Web. None of them needs any exchange technology other than the HTTP protocol; SOAP and XML-RPC can likewise be called end-to-end technologies. For Winer, especially, XML-RPC and SOAP are complementary to RSS in creating complete publication solutions.
RSS is a format for documents that are accessed by people, whereas SOAP is a format for data that is to be processed by machines. Due to their extensibility, all new RSS versions can in fact be used as envelopes for data. At the same time, the semantics of RSS remain: the messages inform about the state of a web resource that can be modeled as a collection of similarly structured information objects.
1.7 The Versions of RSS and Atom: Their Evolution and the Future
If I use the term “RSS” in this book without the version number, it acts as a collective term for “the different RSS versions and Atom” as a group, that is, as a synonym for “feed format”. If I only talk about one of these formats, I use “RSS” with a version number, or the name “Atom.”
In an ideal world, this book would just be an essay that describes a format for the syndication of content, which is easy to use and explain. In fact—apart from the various predecessors—we are dealing with at least three and a half newer formats, which were developed as alternatives for each other, namely, RSS 1.0 and RSS 1.1 (an RSS 1.0 update), RSS 2.0, and Atom.
Many websites still offer feeds in the predecessor formats of RSS 2.0; these feeds have version numbers 0.91, 0.92, and 0.93. In this book, I describe them along with RSS 2.0. The development and discussion of these formats isn’t over; it is frequently discussed in a passionate and fierce manner. After all, because it concerns a key area of the Web’s future development, it also involves influence and money.
Almost all RSS applications can process every form of RSS feed, or at least the relevant ones. The most important reason for this is that the semantic models on which the different syndication formats are based overlap for the most part. In addition, documents in the syndication formats have a flat structure; they don’t involve any deep and complex hierarchies. (Where deeper hierarchies do occur, for example with quoted HTML markup, applications can usually leave the processing to an HTML rendering engine.)
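One way an application can handle several formats is to look at the document element first and then choose the matching processing path. The following sketch assumes the RSS 1.0 and Atom 1.0 namespace URIs and covers only the three format families; the function name and the returned labels are invented for this example.

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"
RSS1_CHANNEL = "{http://purl.org/rss/1.0/}channel"
ATOM_ROOT = "{http://www.w3.org/2005/Atom}feed"


def detect_feed_family(text):
    """Guess the feed family of a document from its root element."""
    root = ET.fromstring(text)
    if root.tag == "rss":
        # RSS 0.9x and RSS 2.0 share the rss root and state their version.
        return "RSS " + root.get("version", "(version not stated)")
    if root.tag == RDF_ROOT and root.find(RSS1_CHANNEL) is not None:
        # An rdf:RDF root alone is not enough, so the channel is checked too.
        return "RSS 1.0"
    if root.tag == ATOM_ROOT:
        return "Atom"
    return "unknown"


family = detect_feed_family('<rss version="0.91"><channel/></rss>')
```

After this step, the overlap of the semantic models means that most of the remaining processing can be shared between the formats.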
The following table includes data on the most important feed and news formats. Here I follow:
- Dave Winer: RSS History, http://blogs.law.harvard.edu/tech/rssVersionHistory
- Mark Pilgrim: The myth of RSS compatibility, http://diveintomark.org/archives/2004/02/04/incompatible-rss
- Sam Ruby: Really Simple Syndication, http://www.intertwingly.net/stories/2002/09/02/reallySimpleSyndication.html
- Edd Dumbill: XML in News Syndication, http://webservices.xml.com/pub/a/ws/2000/07/17/syndication/newsindustry.html, and the news section on XMLNews.org, http://www.xmlnews.org/
In this book I discuss only the following three families of formats:
- RSS 2.0 and its predecessors (RSS 0.91, RSS 0.92, and RSS 0.93)
- RSS 1.0 and RSS 1.1
- Atom
The news industry formats in the strictest sense (NITF, NewsML, ICE, and PRISM) have different tasks from the feed formats of the RSS and Atom family. They serve to exchange content and trade data between commercial partners. All remaining formats either didn’t establish themselves or are irrelevant. This doesn’t mean that they are not interesting. The appendix contains an overview of the Outline Processor Markup Language, OPML, which is used by many aggregators and newsreaders as an addition to RSS (see section A.2, Outline Processor Markup Language).
1.7.1 The Beginnings: MCF, Scripting News, and CDF
The disparate influences that subsequently led to the development of different RSS versions are pretty obvious in the history of the formats. A metadata format, the “Meta Content Framework” (MCF), and news channel formats like the Scripting News format and Microsoft’s Channel Definition Format (CDF) were the predecessors of RSS. For the description of RSS’s history, I primarily follow Ben Hammersley, Content Syndication with RSS, O’Reilly, 2003. The second edition of the book was published in 2005 (see bibliography).
The World Wide Web was developed as a net of texts, linked to each other. The protocols and standards to which the Web owes its astronomical rise, namely HTML and HTTP, describe how web documents are structured and how they are published, modified, and accessed. HTML doesn’t take into account that many of these documents are often, and in many cases regularly, changed and updated. In the Web’s infrastructure, which established itself in the first half of the 1990s, software developers and their clients were concerned with the demands posed by constant changes and updates in resources on the Web. In this manner, the first content management systems and browser add-ons, like the Netscape Sidebar and Java Applets with stock ticker messages, emerged. In the process, it became clear that common formats and protocols that support the constant updating of web resources, would simplify publishers’ and users’ lives and work on the Net. Such formats were developed in the mid 1990s.
Meta Content Framework and Channel Definition Format
The origins of RSS reach back to at least 1995. At the time, Ramanathan V. Guha designed the Meta Content Framework, or MCF. Apple used MCF in an experimental project called ProjectX, later renamed HotSauce. MCF makes it possible to describe sites with metadata that is kept in an MCF file of its own. HotSauce presents this metadata in a form that allows three-dimensional navigation. In 1997, Guha moved to Netscape and met Tim Bray, one of the most important developers behind the XML standard. Together they transformed MCF into an XML-based format. From this collaboration, the Resource Description Framework (RDF) was developed, the basic technology of the Semantic Web.
Simultaneously, Microsoft, together with Pointcast and other companies, also developed an XML-based format to describe websites, which was called Channel Definition Format (CDF). CDF allowed the description of content, publication plans (scheduling), logos, and metadata of a site. It was incorporated in Internet Explorer 4 and acted as the technology basis for Microsoft’s so-called Active Desktop.
UserLand’s Scripting News Format
Perhaps the oldest syndication format in today’s sense is the Scripting News format from UserLand.com (http://my.userland.com/stories/storyReader$11). Dave Winer described it in December 1997 and implemented it publicly. A number of sites still offer newsfeeds in this format, in which every entry is a section with links. Winer tried to capture the basic characteristics of writing on the Web, instead of offering only headlines, as the early RSS versions did. In 1999, Winer included important elements of RSS 0.9 in version 2 of the Scripting News format.
In 1999, Netscape introduced RSS 0.9 as a format to describe information channels and aggregate content. RSS made it possible to publish snapshots of content in the portal “My Netscape”. RSS soon proved to be an effective, simple XML format for the syndication of content beyond this application.
Initially, RSS channels contained only news, but soon new types of content were added. For example, RSS feeds started describing articles in discussion forums, wikis, and new software versions (http://web.resource.org/rss/1.0/spec).
RSS was initially an abbreviation for “RDF Site Summary”. (For information about RSS as “RDF Site Summary” see Chapter 3. For a detailed explanation of the term, see section 3.1 RDF Basics.) With RSS, it was possible to integrate headlines from other sites, with links to these sites, in the portal. The user could personalize the portal and subscribe to a number of sites that offered RSS data. In this manner, My Netscape had at its disposal a great amount of additional content, which kept users on the site longer; the providers of RSS data received additional traffic, the most important goal of many websites in the times of the dot-com boom. Since it is easy to convert RSS to HTML, other sites soon started using the same technology. Slashdot soon used RSS instead of its own headline format, and tools were developed to create and process RSS in the common scripting languages.
The first desktop headline viewers were released in 1999 (Carmen’s Headline Viewer; compare http://www.xml.com/pub/r/91; http://www.headlineviewer.com/; with Ben Hammersley’s article in the Guardian: http://www.guardian.co.uk/online/story/0,3605,781838,00.html). These applications made it possible to download RSS information and then read it without being connected to the Internet. Likewise, RSS directories like syndic8 and other aggregators were developed at about the same time. Dan Libby developed the first version of RSS as a pure RDF application. At Netscape, however, that format was soon considered too complicated, and it was replaced by a simpler vocabulary, which was not usable RDF, but wasn’t a really simple format either. Soon after, Netscape completely abandoned RDF in RSS 0.91. This decision provoked the first split in the development of the syndication formats, a split that lasts until today. One group of developers considers RSS an XML format to exchange news and other content that is updated often. The other group regards it as a metadata format, that is, an instrument to represent knowledge. The debate over whether newsfeed documents should be RDF documents at the same time isn’t over yet.
In the first year of their existence alone, there were 4,000 different RSS feeds to be found on the Web. In 2002, the RSS directory syndic8 broke through the symbolic 10,000 feeds barrier.
1.7.2 RSS 0.91
Soon after, Netscape published RSS 0.91 under the name of Rich Site Summary. RSS 0.91 wasn’t an RDF format anymore; it took on some elements from UserLand’s Scripting News format, most importantly the description element. This allowed RSS to evolve into a format for spreading content, for which it was developed in the first place. Netscape wasn’t involved in further development of the format for very long. UserLand and especially its founder, Dave Winer, successfully propagated RSS as an element of the syndication framework and soon after published version 0.91 under their own copyright. Winer is among the founders of Weblogging and also belongs among the pioneers of the “Semantic Web”.
RSS 0.91 and all its subsequent versions, as well as XML-RPC and the MetaWeblog API, owe their origins to UserLand and Winer. UserLand products like the content management system Manila and the service EditThisPage.com “brought together the world of content syndication and weblogs”, to quote the introduction of the RSS 1.0 specification.
An important novelty of the Netscape RSS 0.91 version compared to RSS 0.90 is the possibility of validating documents of this format against a DTD. Abandoning the RDF characteristics, which couldn’t be used any more at that point, simplified the language compared to its predecessor. The abbreviation RSS now stood for Rich Site Summary or Really Simple Syndication (for more information on the XML elements see also section 2.5.1).
1.7.3 RSS 1.0
In the following years, the split came to a real head in the RSS developer community. Dave Winer’s company, UserLand, controlled RSS 0.91. UserLand was above all interested in keeping the format simple and using it for personal publishing, particularly for the new publishing form of Weblogging.
Other important developers, however, among them Rael Dornfest, who was working as chief technology officer at O’Reilly, wanted to expand the scope of RSS to use it for other purposes and connect it with additional formats. Therefore, they reintroduced RDF and also introduced a new mechanism, XML namespaces. A related specification was published in December of 2000; the developers called the format it described RSS 1.0.
RSS 1.0, which is in no way just an additional RSS version, but an alternative language on its own, is more formally specified than RSS 0.91 and its successors. RSS 1.0 is defined not only as a syntax, but also as a data format. Due to its compatibility with RDF, the metadata framework of the W3C, RSS 1.0 makes the exact description of the relationship between RSS data and metadata of other RDF formats possible.
However, RSS 1.0 and RSS 2.0 don’t differ much with respect to the embedding of content in other formats and the description or non-description, respectively, of the relationship between document formats and publication environments. (Chapter 3 gives a detailed description of RSS 1.0. You will find a reference of its XML elements in section A.4 in the appendix.)
1.7.4 RSS 0.92
Winer answered the publication of RSS 1.0 with RSS 0.92 within two weeks. RSS 1.0 was a modular and extensible syndication vocabulary that could be easily combined with other XML vocabularies and RDF formats. RSS 0.92, on the other hand, was an easy-to-use vocabulary whose limited features were sufficient for the needs of most users of syndication technologies.
From the users’ perspective, RSS 0.92 and RSS 1.0 were compatible. Most RSS parsers could and can process documents in both formats. Parsers for the 0.9x formats, however, can’t understand the RSS 1.0 extension modules, let alone extract RDF data from RSS documents.
All attempts to develop another RSS format acceptable to representatives of both versions failed. Several RSS 1.0 fans held Dave Winer responsible for this. Not only did Winer refuse to define RSS as an RDF format or design it to be RDF compatible, but he also didn’t accept the common practice of discussing a format on a mailing list in order to reach the widest possible consensus with other developers.
Instead, Winer wanted to turn weblogs into discussion forums for the further development of RSS. This procedure allowed him and UserLand to filter the articles. (For more information on the XML elements used by RSS 0.91, see section 2.5.1.)
1.7.5 RSS 0.93
RSS version 0.93, which was published by Winer a year later, already contained most of the elements that belong to today’s up-to-date RSS 2.0. But RSS 0.93 doesn’t have an extension mechanism. This format remains popular even today. (For more information on the XML elements used by RSS 0.93 see section 2.5.3.)
1.7.6 RSS 2.0
In September of 2002, Winer published the specification for RSS 2.0, again without making an effort to reach a consensus with those who participated in the rss-dev mailing list and helped develop RSS 1.0. (Just prior to this, he had published the same RSS 2.0 format as RSS 0.94.) At the same time, Winer declared RSS 2.0 a frozen standard; successor formats weren’t supposed to be published under the name RSS any more. A little later, Winer assigned the rights of RSS to Harvard University—RSS was to be exempt from the suspicion of serving personal or business interests.
Today, RSS 2.0 is the most widely used feed format. It is characteristic of this format that it does not specify, or leaves it to application developers to specify, the connections between RSS data on the one hand and other content formats, data/metadata formats, and publication environments on the other. Essentially, RSS 2.0 defines syntax, whereas meaning and use were determined through examples. The supporters of RSS 2.0 consider this low level of specification one of the format’s biggest advantages, whereas the supporters of alternative RSS versions see it as its prime weakness.
Other formats owe their existence to the fact that RSS 2.0 ignores a lot of problems. The enormous problems encountered during the formal definition of these formats are an argument for, as well as against, this strategy. They are an argument for it because RSS 2.0 works in many different applications and is, together with its predecessor formats, by far the most popular version. The argument against it is that, in practice, problems arise wherever the RSS 2.0 specification is unclear, for example, in the case of document validation. (Chapter 2 gives a detailed description of RSS 2.0. You will find a reference list of the XML elements of RSS 2.0 in the appendix in section A.3.)
1.7.7 From a Syndication to a Publication Format: Atom, the New Alternative
In June of 2003, the Atom roadmap was published. (See http://intertwingly.net/wiki/pie/RoadMap; concerning the date: http://virtuelvis.com/archives/2003/06/index. Initially, the format was called “Echo” and “Pie”.) The goals of this format were to be “100% vendor neutral, implemented by everybody, freely extensible by anybody, and cleanly and thoroughly specified”. Previously, there had been intense debate about RSS 2.0 and the political implications of the fact that Dave Winer had control over the format. (Links for background material: http://diveintomark.org/archives/2003/06/23/a_fresh_start).
At that point, it was clear that “weblogging would become an industry of its own”, as Mark Pilgrim put it: in the future, interoperation would require more than “calling a friend or sending an e-mail”. Mark Pilgrim and Sam Ruby developed the Feed Validator, which checks the newsfeeds of almost all known feed formats for compliance with the standards (http://feedvalidator.org/). In the process, they came across deficits of the RSS 2.0 specification and its predecessors. The specification is unclear on several important points, so in some cases it can’t be decided whether a document complies with it or not. Winer’s attempts to stay in control seemed to be “FUD” to the group of future Atom developers. (Fear, Uncertainty, Doubt: open-source supporters like to use this acronym for a general strategy, applied deliberately but often in vain, of making someone insecure.) At that time, Mark Pilgrim considered RSS 1.0 more or less a failure, or even dead, and some of the people who had backed RSS 1.0 up to that point supported Atom from then on as a new format.
In March of 2004, Dave Winer suggested, unsuccessfully in the end, combining RSS 2.0 and Atom into one format and naming the document element rssAtom (http://blogs.law.harvard.edu/crimson1/2004/03/08). The new format would “differ from RSS as little as possible” and would be developed by an open IETF work group. The specification, which the Atom developers were promising, and the validation service could be used together. Winer’s suggestion differed from the goals of the Atom developers only in that he placed value on maximum backward compatibility with older RSS versions. At that point, however, the discussion had advanced too far already, and Winer didn’t participate. In fact, the Atom developers chose the IETF as the standards body. As the only feed format so far to be backed by an organization that is in part responsible for the development of the Internet, Atom has a good chance of becoming a standard.
The Atom work group followed the path of an exact syntactical specification that clearly defines the connections of Atom-specific information to other information included in the document. Atom is explicitly defined as both a syndication format and a publication format. The “Atom Publishing Protocol” will belong to the Atom standard as well, once it is completed. On the other hand, the connection with metadata formats is not the center of the Atom developers’ attention. The Atom standard as such is independent of the specifications of the Resource Description Framework; however, for some developers it is especially important that Atom and RDF stay compatible. (Chapter 4 gives a detailed description of Atom. You can find a reference list of the XML elements for Atom in section A.7 of the appendix.)
1.7.8 Which Format for Which Purpose?
All three—or four—up-to-date RSS versions offer the same basic functions for the user. The differences with respect to these tasks are easy to balance with modifications and extensions. The formats, however, vary notably in the amount of detail in the specifications, the processing of documents in these formats, and the additional functions they offer:
- RSS 2.0 and its predecessors were defined by referring to the latest technological implementations. The specification doesn’t depend on the way RSS is treated, but—explicitly or implicitly—it refers regularly to the current practice. This is supposed to make the specification simple and easy to implement, and restricts the creativity of software developers as little as possible. (It is for this reason that it is so easy to accuse Dave Winer, one of the format’s founders, of using the format definitions for personal interest or the interests of his company UserLand. It is a design principle of RSS 2.0 to abide primarily by the current practice; as a pioneer of this practice, Winer can’t do anything other than to refer to his own developments.)
- RSS 1.0 and its successor RSS 1.1, on the other hand, are specified in such a way that the content of documents can be processed automatically. An RSS 1.0 or 1.1 document is nothing but a serialization of statements that follow the rules of the Resource Description Framework (RDF). The format uses a semantic model that makes a formal description of the document’s meaning possible. Information that is available in an RSS 1.0 or RSS 1.1 document can easily be connected with other RDF information and used together with it.
- Atom was defined with the technological requirements of newsreaders and authoring systems for weblogs in mind. (See also the Atom Wiki page on use cases: http://www.intertwingly.net/wiki/pie/UseCases.) In the specification, however, the format is described abstractly and independently of how such systems are implemented. It is the goal of the Atom specification to describe the format and the rules completely and clearly for users. Software developers are supposed to be able to decide for certain what is allowed in an Atom document and how documents are exchanged between the client and the server. (This doesn’t mean that the significance of the language elements for a human user, that is, their social function, is clearly determined. It also doesn’t mean that Atom meets its own expectations one hundred percent. If it can’t be decided in Atom and RSS 1.0 whether a certain construct in a document is possible or not, it means that there is a bug in the specification.) Another important difference between Atom on the one hand and RSS 2.0 and 1.0 on the other is that Atom was also developed as a format for authoring documents. For that purpose, the format is used in the context of the architecture of the Web as described in the current specifications of the W3C.
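To make the point about RSS 1.0 more concrete, the following minimal sketch reads the child elements of an RSS 1.0 item as subject/predicate/object statements. It is an illustration only, not a full RDF/XML parser: the sample document is a shortened, hypothetical fragment (a complete RSS 1.0 feed also needs a channel element), the function name item_statements is invented for this example, and the implicit type statement for each item is left out.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RSS1 = "{http://purl.org/rss/1.0/}"

# Hypothetical, shortened RSS 1.0 fragment used only for illustration.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.org/1">
    <title>Hello</title>
    <link>http://example.org/1</link>
  </item>
</rdf:RDF>"""


def item_statements(xml_text):
    """Return (subject, predicate, object) tuples for each item property."""
    root = ET.fromstring(xml_text)
    triples = []
    for item in root.findall(RSS1 + "item"):
        # The rdf:about attribute names the resource the statements are about.
        subject = item.get(RDF + "about")
        for prop in item:
            # ElementTree writes tags as "{namespace}local"; joining both
            # parts gives the property URI used as the predicate.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, (prop.text or "").strip()))
    return triples


for triple in item_statements(SAMPLE):
    print(triple)
```

Each tuple corresponds to one statement about the item, which is what allows the data to be merged with RDF information from other sources.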
If you are reading this book, you are probably using RSS yourself, or at least you want to use it in the future. Considering the different RSS versions used on the Web, you will ask yourself sooner or later which one is right for you.
You will find here a long and a short answer to this question. The long answer is the book itself. As you will see, the advantages and disadvantages of the different syndication formats can’t be summarized in just a few sentences. If the task involves more than producing a simple newsfeed, several aspects have to be considered, such as the existing software, the need to combine RSS with other vocabularies, the method of validating data, future extensibility, and the requirements that result from the use of web services.
The short answer is: users who want to use RSS only as a syndication format have to analyze what data they want to offer. The most important content elements are found in all RSS versions. Those who restrict themselves to these core elements can use any of the formats and automatically convert their feed into one of the other formats, either with software on their own system or with a service offered on the Web, such as Feedburner (http://www.feedburner.com).
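As a rough illustration of why such a conversion is straightforward for the core elements, the following sketch reads title, link, and summary from an RSS 2.0, RSS 1.0, or Atom document into one common structure. It is a minimal example under stated assumptions, not a complete converter: the function name core_items and the sample feeds are hypothetical, the Atom namespace is assumed to be that of Atom 1.0, and dates, authors, and extension modules are ignored.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"   # assumed: Atom 1.0 namespace
RSS1 = "{http://purl.org/rss/1.0/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Hypothetical sample feeds carrying the same item.
RSS2_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>Hello</title><link>http://example.org/1</link>
<description>First post</description></item></channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>Hello</title><link href="http://example.org/1"/>
<summary>First post</summary></entry></feed>"""


def core_items(xml_text):
    """Return a list of dicts with the keys 'title', 'link', and 'summary'."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":                      # RSS 2.0: no namespace
        nodes, ns, summary = root.findall("channel/item"), "", "description"
    elif root.tag == RDF + "RDF":              # RSS 1.0: items below rdf:RDF
        nodes, ns, summary = root.findall(RSS1 + "item"), RSS1, "description"
    elif root.tag == ATOM + "feed":            # Atom: entries below feed
        nodes, ns, summary = root.findall(ATOM + "entry"), ATOM, "summary"
    else:
        raise ValueError("unrecognized feed format: " + root.tag)

    items = []
    for node in nodes:
        link = node.find(ns + "link")
        if link is None:
            href = None
        elif ns == ATOM:
            href = link.get("href")            # Atom keeps the URL in an attribute
        else:
            href = link.text                   # RSS keeps it as element text
        items.append({
            "title": node.findtext(ns + "title"),
            "link": href,
            "summary": node.findtext(ns + summary),
        })
    return items


print(core_items(RSS2_SAMPLE) == core_items(ATOM_SAMPLE))
```

Once the items are in this neutral form, writing them out again in any of the other formats is a matter of generating the corresponding elements.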
Those who are looking for more ways to express themselves have to evaluate which of the versions offers the features they are looking for and is at the same time supported by the software that is supposed to process the data. In terms of expressiveness, the modules of RSS 1.0 are still unmatched at the present time. Anyone who wants to offer multimedia data, for example as a podcast, depends mostly on RSS 2.0 and its extension modules. It is to be expected that the corresponding modules of both formats will soon be integrated into Atom as well.